Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Bilal Dadanlar
You can have a look at sklearn.cross_validation.train_test_split() and some other methods from here: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.cross_validation On Fri, Jun 21, 2013 at 3:59 AM, Joel Nothman jnoth...@student.usyd.edu.au wrote: Please see
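For reference, a minimal sketch of the suggested random split (using the modern sklearn.model_selection module; in 2013 the same function lived in sklearn.cross_validation):

```python
# Randomly split a dataset into training and test sets.
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in 2013

X = np.arange(20).reshape(10, 2)      # 10 samples, 2 features
y = np.array([0, 1] * 5)              # binary labels

# 30% of the samples go to the test set, chosen at random.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
```

Fixing random_state makes the split reproducible; omit it for a different random split each call.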

Re: [Scikit-learn-general] Interface for data imputation

2013-06-21 Thread Mathieu Blondel
On Fri, Jun 21, 2013 at 6:56 AM, Nicolas Trésegnie nicolas.treseg...@gmail.com wrote: - To impute only some of the missing values (rows, columns or a combination) I think this can be added later if you have time. For now, I would rather not clutter the API. For rows, one can just use

[Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Maheshakya Wijewardena
Hi all, I would like to know whether we have bootstrap aggregating functionality in the scikit-learn library. If so, how do I use it? (If it doesn't exist, I would like to implement it in a way that coheres with the learning algorithms we already have in scikit-learn.) Thank you

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Gilles Louppe
Hi, Such ensembles are not implemented at the moment. Gilles On 21 June 2013 09:59, Maheshakya Wijewardena pmaheshak...@gmail.com wrote: Hi all, I would like to know whether we have bootstrap aggregating functionality in the scikit-learn library. If so, how do I use it? (If it doesn't exist
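Bagging later landed in scikit-learn as sklearn.ensemble.BaggingClassifier; until then, a hand-rolled sketch could look like the following (the function name and structure are illustrative, not an existing API):

```python
# Hand-rolled bagging: fit clones of a base estimator on bootstrap samples
# and combine their predictions by majority vote.
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bagging_fit_predict(base_estimator, X, y, X_new, n_estimators=10, seed=0):
    rng = np.random.RandomState(seed)
    all_preds = []
    for _ in range(n_estimators):
        idx = rng.randint(0, len(X), len(X))          # bootstrap sample (with replacement)
        est = clone(base_estimator).fit(X[idx], y[idx])
        all_preds.append(est.predict(X_new))
    all_preds = np.asarray(all_preds, dtype=int)
    # majority vote over the ensemble, per test sample
    return np.apply_along_axis(lambda votes: np.bincount(votes).argmax(),
                               axis=0, arr=all_preds)
```

Because clone() works for any scikit-learn estimator, the same loop applies to learners other than trees, which is exactly the generalization discussed later in this thread.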

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Maheshakya Wijewardena
I'm doing brownfield development for a university project and I'm very interested in this field. If I start implementing that kind of ensemble method, will it fit within the scope of the scikit-learn project? Will it be useful for users? (I've felt the need for it personally. It has improved the

Re: [Scikit-learn-general] Using Random forest classifier after One hot encoding

2013-06-21 Thread Maheshakya Wijewardena
Can anyone give me a sample algorithm for the one-hot encoding used in scikit-learn? On Thu, Jun 20, 2013 at 8:37 PM, Peter Prettenhofer peter.prettenho...@gmail.com wrote: you can try an ordinal encoding instead - just map each categorical value to an integer so that you end up with 8 numerical

Re: [Scikit-learn-general] Using Random forest classifier after One hot encoding

2013-06-21 Thread Peter Prettenhofer
? You already use one-hot encoding in your example (preprocessing.OneHotEncoder). 2013/6/21 Maheshakya Wijewardena pmaheshak...@gmail.com can anyone give me a sample algorithm for one hot encoding used in scikit-learn? On Thu, Jun 20, 2013 at 8:37 PM, Peter Prettenhofer
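A minimal sketch of preprocessing.OneHotEncoder on integer-coded categories (the .toarray() call just makes the sparse result easy to inspect):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical feature with three levels, coded as integers 0..2.
X = np.array([[0], [1], [2], [1]])
enc = OneHotEncoder()
X_onehot = enc.fit_transform(X).toarray()   # one output column per category level
```

Each row of the encoded matrix has exactly one 1, in the column of that sample's category.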

Re: [Scikit-learn-general] Using Random forest classifier after One hot encoding

2013-06-21 Thread Maheshakya Wijewardena
I'd like to analyse it a bit and encode using that method so it works well with random forests in scikit-learn. On Fri, Jun 21, 2013 at 2:08 PM, Peter Prettenhofer peter.prettenho...@gmail.com wrote: ? you already use one-hot encoding in your example ( preprocessing.OneHotEncoder) 2013/6/21

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Olivier Grisel
2013/6/21 Gilles Louppe g.lou...@gmail.com: Hi, Such ensembles are not implemented at the moment. Ensembles of trees have a `bootstrap` parameter that does bagging, although they also randomize the feature selection and, optionally, the split locations. -- Olivier
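A sketch of Olivier's point: with max_features=None a forest considers all features at every split, so bootstrap=True reduces it to plain bagged trees (an illustration, not code from the thread):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.repeat([[0.0], [1.0]], 10, axis=0)
y = np.repeat([0, 1], 10)

# bootstrap=True resamples the training set for each tree;
# max_features=None disables the per-split feature subsampling.
clf = RandomForestClassifier(n_estimators=10, bootstrap=True,
                             max_features=None, random_state=0)
clf.fit(X, y)
```

This gives bagging of trees specifically; bagging an arbitrary base estimator still needed a separate implementation at the time.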

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Maheshakya Wijewardena
So that means bagging can currently only be applied to trees. How about implementing a general module so that it can be applied to more learning algorithms? On Fri, Jun 21, 2013 at 4:17 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/6/21 Gilles Louppe g.lou...@gmail.com: Hi, Such

Re: [Scikit-learn-general] Using Random forest classifier after One hot encoding

2013-06-21 Thread federico vaggi
What do you mean? It's pretty trivial to implement a one-hot encoding; the issue is that if you use a non-sparse format you'll end up with a matrix which is far too large to be practical for anything but trivial examples. On Fri, Jun 21, 2013 at 10:46 AM, Maheshakya Wijewardena

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Andreas Mueller
On 06/21/2013 12:56 PM, Maheshakya Wijewardena wrote: So that means that bagging can only be applied to trees. How about implementing a general module so that it can be applied on more learning algorithms. I think that would be great. You should look at the forest implementation to get

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Maheshakya Wijewardena
Thank you. I'll have a look at the forest implementation, check what can be done, and let you know. I'd also like to have a look at Gilles's code. If it's convenient, can you tell me how you tried to implement it? best Maheshakya On Fri, Jun 21, 2013 at 6:55 PM, Andreas Mueller

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Andreas Mueller
On 06/21/2013 03:37 PM, Maheshakya Wijewardena wrote: Thank you. I'll have a look at the forest implementation, check what can be done, and let you know. I'd also like to have a look at Gilles's code. If it's convenient, can you tell me how you tried to implement it? I think it was mostly removing

Re: [Scikit-learn-general] Bootstrap aggregating

2013-06-21 Thread Maheshakya Wijewardena
Ok, I got it. I'll look at the code and see what can be done. Thank you. On Fri, Jun 21, 2013 at 7:17 PM, Andreas Mueller amuel...@ais.uni-bonn.de wrote: On 06/21/2013 03:37 PM, Maheshakya Wijewardena wrote: Thank you. I'll have a look at the forest implementation and check what can be done

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Gianni Iannelli
Thank you very much for the link!! It's close to what I want to do! In my case I have two classes, for example 0 and 1. I want to keep the class distribution between them similar in the training set (and so also in the test set). I also need the samples to be chosen randomly; I don't care if in one

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Roban Kramer
StratifiedKFold will keep the class distribution the same for you: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html#sklearn.cross_validation.StratifiedKFold There are lots of metrics (score functions, etc.) available:
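A minimal sketch of StratifiedKFold preserving a 2:1 class ratio in every fold (modern sklearn.model_selection API; in 2013 the class lived in sklearn.cross_validation):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 6 + [1] * 3)         # 2:1 class ratio
X = np.arange(18).reshape(9, 2)

skf = StratifiedKFold(n_splits=3)
fold_ratios = []
for train_idx, test_idx in skf.split(X, y):
    fold_ratios.append((int((y[test_idx] == 0).sum()),
                        int((y[test_idx] == 1).sum())))
# every test fold keeps the 2:1 ratio: two samples of class 0, one of class 1
```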

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Gianni Iannelli
StratifiedKFold will keep the class distribution the same for you: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html#sklearn.cross_validation.StratifiedKFold I was looking at this; it says: This cross-validation object is a variation of KFold,

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Roban Kramer
Oh sorry, I was thinking of balanced sets for cross validation, rather than a training and testing split. I don't know of a convenience routine specifically for producing stratified training and testing sets. If both your classes have decent support and the training and testing set sizes aren't
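For the record, StratifiedShuffleSplit (already available in sklearn.cross_validation at the time) produces random train/test splits that preserve the class proportions; a sketch with the modern module path:

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

y = np.array([0] * 8 + [1] * 4)        # minority class is 1/3 of the data
X = np.arange(24).reshape(12, 2)

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, y))
# the 3-sample test set keeps the 1/3 minority proportion: one class-1 sample
```

Later scikit-learn versions also added a stratify= argument to train_test_split, which is the one-call convenience routine asked about here.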

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Gianni Iannelli
Ah ok! Yeah, I was thinking that a 50/50 (or even 40/60) split between the two classes in my dataset would not be a problem, but since the ratio is 1/3 I would prefer to keep the same distribution for both, hence my choice of the train_test_split method. I don't know if there

Re: [Scikit-learn-general] SVM: select the training set randomly

2013-06-21 Thread Gianni Iannelli
Found the error... I post it below. The problem is that metrics.confusion_matrix accepts lists and not numpy.arrays, so I converted everything to lists:

# Compute the confusion matrix
y_testlist_tmp = y_test.transpose().tolist()
y_testlist = y_testlist_tmp[0]
resultlist = result.tolist()

[Scikit-learn-general] Optimization of the SVM parameters

2013-06-21 Thread Gianni Iannelli
Dear All, I'm stuck with a problem and I don't know if it's a bug. I'm defining the optimization parameters C and gamma for my SVM in this way:

C = 10.0 ** numpy.arange(-3, 9)
gamma = 10.0 ** numpy.arange(-6, 4)
param_grid = dict(gamma=gamma, C=C)
svr = svm.SVC(kernel='rbf')
clfopt =
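The snippet is cut off, but it looks like a grid-search setup; a hedged completion of the same idea (assuming GridSearchCV, modern module paths, and the iris data purely for illustration):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Logarithmic grids over C and gamma, as in the thread's snippet.
C_range = 10.0 ** np.arange(-3, 9)
gamma_range = 10.0 ** np.arange(-6, 4)
param_grid = dict(gamma=gamma_range, C=C_range)

# Exhaustive cross-validated search over the grid.
clfopt = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3)
clfopt.fit(X, y)
```

After fitting, clfopt.best_params_ holds the selected C and gamma, and clfopt.best_estimator_ is the refit SVM.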