Re: [Scikit-learn-general] Feature selection != feature elimination?

2016-05-02 Thread Sebastian Raschka
A little question regarding how it's currently handled... So, if I have one of scikit-learn's feature selectors in a pipeline, and it selected e.g. the features idx=[1, 12, 23] after ".fit". Now, if I use ".predict" on that pipeline, wouldn't the feature selector's transform method only pass X[
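
A minimal sketch of the behaviour being asked about (the selector, data, and k here are illustrative, not from the thread): inside predict, the pipeline first applies the selector's transform, so only the selected columns reach the final estimator.
"""
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=30, random_state=0)
pipe = Pipeline([('select', SelectKBest(f_classif, k=3)),
                 ('clf', LogisticRegression(max_iter=1000))])
pipe.fit(X, y)
# predict() first runs the selector's transform, so the classifier
# only ever sees the 3 selected columns:
print(pipe.predict(X[:5]))
print(pipe.named_steps['select'].get_support(indices=True))
"""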

Re: [Scikit-learn-general] Feature selection != feature elimination?

2016-05-02 Thread Philip Tully
Cool, thanks for the feedback! Are there any outstanding PRs addressing something like this, or has anyone on this list been thinking of / working on solutions? I imagine it might be implemented as a step in a pipeline (e.g. FeatureRemover()) and be generally applicable / potentially benefit many sklearners. Not sure i

Re: [Scikit-learn-general] Feature selection != feature elimination?

2016-03-14 Thread Joel Nothman
Currently there is no automatic mechanism for eliminating the generation of features that are not selected downstream. It needs to be achieved manually. On 15 March 2016 at 08:05, Philip Tully wrote: > Hi, > > I'm trying to optimize the time it takes to make a prediction with my > model(s). I re

Re: [Scikit-learn-general] Feature selection

2015-07-08 Thread Herbert Schulz
Ah, the API changes... but now I'm getting something like: import mlxtend.classifier.EnsembleClassifier Traceback (most recent call last): File "<stdin>", line 1, in <module> File "mlxtend/classifier/__init__.py", line 8, in <module> from .ensemble import EnsembleClassifier File "mlxtend/classifier/ensemble.p

Re: [Scikit-learn-general] Feature selection

2015-07-08 Thread Herbert Schulz
Hey, the mlxtend library worked great on my computer. Now I installed it on a server. import mlxtend works fine, but if I want to import the EnsembleClassifier it gives me an error like: from mlxtend.sklearn import EnsembleClassifier : "No module named sklearn". import sklearn works also. Doe

Re: [Scikit-learn-general] Feature selection

2015-06-02 Thread Sebastian Raschka
Hi, Herbert, I can't help you with the accuracy problem since this can be due to many different things. However, there is now a way to combine different classifiers for majority rule voting, the sklearn.ensemble.VotingClassifier. It is not in the current stable release yet but you could get it
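
A hedged sketch of the majority-rule voting Sebastian describes, which also covers the question below about combining an SVM with a tree-based model (assuming a scikit-learn version that ships VotingClassifier; the base estimators are illustrative):
"""
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
# 'hard' voting = majority rule over the individual class predictions.
vote = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('rf', RandomForestClassifier(random_state=0)),
                ('svc', SVC())],
    voting='hard')
vote.fit(X, y)
print(vote.score(X, y))
"""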

Re: [Scikit-learn-general] Feature selection

2015-06-02 Thread Herbert Schulz
Thanks, that helped. But I just can't get a higher accuracy than 45%... don't know why, also with logistic regression and so on. Is there a way to combine, for example, an SVM with a decision tree? Herb On 2 June 2015 at 11:19, Michael Eickenberg wrote: > Some configurations are not implemente

Re: [Scikit-learn-general] Feature selection

2015-06-02 Thread Michael Eickenberg
Some configurations are not implemented or are difficult to evaluate in the dual. Setting dual=True/False doesn't change the result, so please don't vary it as you would vary other parameters. It can, however, sometimes yield a speed-up. Here you should try setting dual=False as a first means of debuggin

Re: [Scikit-learn-general] Feature selection

2015-06-02 Thread Herbert Schulz
Does anyone know why this failure occurs? ValueError: Unsupported set of arguments: loss='l1' and penalty='squared_hinge' are not supported when dual=True, Parameters: penalty='l1', loss='squared_hinge', dual=True. I'm using a LinearSVC (as in Andreas' example code). On 1 June 2015 at 13:38, Herber
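
The supported combinations are easy to check directly; a short sketch (the data is illustrative):
"""
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
# penalty='l1' is only available with loss='squared_hinge' and dual=False:
LinearSVC(penalty='l1', loss='squared_hinge', dual=False).fit(X, y)
# the default penalty='l2' also works in the dual formulation:
LinearSVC(penalty='l2', loss='squared_hinge', dual=True).fit(X, y)
"""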

Re: [Scikit-learn-general] Feature selection

2015-06-01 Thread Herbert Schulz
Cool, thx for that! Herb On 1 June 2015 at 12:16, JAGANADH G wrote: > Hi > > I have listed sklearn feature selection with minimal examples here > > > http://nbviewer.ipython.org/github/jaganadhg/data_science_notebooks/blob/master/sklearn/scikit_learn_feature_selection.ipynb > > Jagan > > On Th

Re: [Scikit-learn-general] Feature selection

2015-06-01 Thread JAGANADH G
Hi I have listed sklearn feature selection with minimal examples here http://nbviewer.ipython.org/github/jaganadhg/data_science_notebooks/blob/master/sklearn/scikit_learn_feature_selection.ipynb Jagan On Thu, May 28, 2015 at 10:14 PM, Herbert Schulz wrote: > Thanks to both of you!!! I really

Re: [Scikit-learn-general] Feature selection

2015-05-28 Thread Herbert Schulz
Thanks to both of you!!! I really appreciate it! I will try everything this weekend. Best regards, Herb On 28 May 2015 at 18:21, Sebastian Raschka wrote: > I agree with Andreas, > typically, a large number of features also shouldn't be a big problem for > random forests in my experience; howev

Re: [Scikit-learn-general] Feature selection

2015-05-28 Thread Sebastian Raschka
I agree with Andreas, typically, a large number of features also shouldn't be a big problem for random forests in my experience; however, it of course depends on the number of trees and training samples. If you suspect that overfitting might be a problem using unregularized classifiers, also co

Re: [Scikit-learn-general] Feature selection

2015-05-28 Thread Andreas Mueller
Hi Herbert. 1) Often reducing the features space does not help with accuracy, and using a regularized classifier leads to better results. 2) To do feature selection, you need two methods: one to reduce the set of features, another that does the actual supervised task (classification here). Ha
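
A minimal sketch of the two-method split Andreas describes, chained in a Pipeline (the transformer and classifier choices are illustrative):
"""
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
model = Pipeline([
    ('reduce', SelectPercentile(f_classif, percentile=50)),  # shrink the feature set
    ('classify', LinearSVC(dual=False)),                     # the supervised task
])
model.fit(X, y)
"""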

Re: [Scikit-learn-general] Feature selection and ranking for different feature types?

2015-05-18 Thread Andreas Mueller
Hi Tim. Nearly everything in scikit-learn will assume numeric features, or one-hot encoded categorical features. You can feed categorical variables encoded as integers, but usually this will not result in the desired behavior. For the ordinal (ordered) data, tree-based methods like the RandomFor
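
A small sketch of the one-hot encoding Andreas recommends for nominal categories (the column and its integer codes are hypothetical):
"""
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One categorical column, stored as integer codes 0..2.
X_cat = np.array([[0], [1], [2], [1]])
X_onehot = OneHotEncoder().fit_transform(X_cat).toarray()
# Each category becomes its own binary column, so a linear model no
# longer reads the codes 0 < 1 < 2 as a meaningful ordering.
print(X_onehot)
"""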

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Gilles Louppe
On 11 February 2015 at 22:22, Timothy Vivian-Griffiths wrote: > Hi Gilles, > > Thank you so much for clearing this up for me. So, am I right in thinking > that the feature selection is carried out for every CV-fold, and then once the > best parameters have been found, the pipeline is then run on the

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Joel Nothman
You could use grid2.best_estimator_.named_steps['feature_selection'].get_support(), or .transform(feature_names) instead of .get_support(). Note for instance that if you have a pipeline of multiple feature selectors, for some reason, .transform(feature_names) remains useful while .get_support() do
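
Joel's suggestion, fleshed out as a self-contained sketch (assuming a current scikit-learn; the step name 'feature_selection' follows his example, everything else is illustrative):
"""
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

data = load_iris()
pipe = Pipeline([('feature_selection', SelectKBest(f_classif)),
                 ('clf', LogisticRegression(max_iter=1000))])
grid2 = GridSearchCV(pipe, {'feature_selection__k': [1, 2, 3]})
grid2.fit(data.data, data.target)

# get_support() gives a boolean mask over the input features:
mask = grid2.best_estimator_.named_steps['feature_selection'].get_support()
print(np.asarray(data.feature_names)[mask])
"""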

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Vlad Niculae
> On 11 Feb 2015, at 16:31, Andy wrote: > > > On 02/11/2015 04:22 PM, Timothy Vivian-Griffiths wrote: >> Hi Gilles, >> >> Thank you so much for clearing this up for me. So, am I right in thinking >> that the feature selection is carried out for every CV-fold, and then once the >> best parameters

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Andy
On 02/11/2015 04:22 PM, Timothy Vivian-Griffiths wrote: > Hi Gilles, > > Thank you so much for clearing this up for me. So, am I right in thinking > that the feature selection is carried out for every CV-fold, and then once the > best parameters have been found, the pipeline is then run on the whole

Re: [Scikit-learn-general] Feature selection and cross validation

2015-02-10 Thread Gilles Louppe
Hi Tim, On 9 February 2015 at 19:54, Timothy Vivian-Griffiths wrote: > Just a quick follow up to some of the previous problems that I have had: > after getting some kind assistance at the PyData London meetup last week, I > found out why I was getting different results using an SVC in R, and it w

Re: [Scikit-learn-general] feature selection

2014-11-02 Thread Andy
On 11/02/2014 04:15 PM, Lars Buitinck wrote: > 2014-11-02 22:09 GMT+01:00 Andy : >>> No. That would be backward stepwise selection. Neither that, nor its >>> forward cousin (find most discriminative feature, then second-most, >>> etc.) are implemented in scikit-learn. >>> >> Isn't RFE the backward

Re: [Scikit-learn-general] feature selection

2014-11-02 Thread Lars Buitinck
2014-11-02 22:09 GMT+01:00 Andy : >> No. That would be backward stepwise selection. Neither that, nor its >> forward cousin (find most discriminative feature, then second-most, >> etc.) are implemented in scikit-learn. >> > Isn't RFE the backward step selection using a maximum number of features?

Re: [Scikit-learn-general] feature selection

2014-11-02 Thread Andy
On 10/20/2014 04:29 PM, Lars Buitinck wrote: > 2014-10-20 22:08 GMT+02:00 George Bezerra : >> Not an expert, but I think the idea is that you remove (or add) features one >> by one, starting from the ones that have the least (or most) impact. >> >> E.g., try removing a feature, if performance impro
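
The backward variant George describes is what RFE implements; a minimal sketch (the dataset and estimator are illustrative):
"""
from sklearn.datasets import make_friedman1
from sklearn.feature_selection import RFE
from sklearn.svm import SVR

X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
# RFE: fit, drop the weakest feature(s) by coef_, refit, and repeat
# until only n_features_to_select remain.
rfe = RFE(SVR(kernel='linear'), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.ranking_)  # 1 marks the selected features
"""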

Re: [Scikit-learn-general] feature selection

2014-10-21 Thread Dayvid Victor
There are feature selection algorithms based on Evolutionary Algorithms, so, despite the exponential search space, you can fix the number of evaluations. Experimentally, this approach has found optimal solutions in Instance/Feature/Classifier selection without exploring the whole search space.

Re: [Scikit-learn-general] feature selection

2014-10-21 Thread Lars Buitinck
2014-10-21 4:14 GMT+02:00 Joel Nothman : > I assume Robert's query is about RFECV. Oh wait, RFE = backward subset selection. I'm an idiot, sorry.

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread Joel Nothman
I assume Robert's query is about RFECV. On 21 October 2014 07:35, Manoj Kumar wrote: > Hi, > > No expert here either, but there are also feature selection classes which > compute the score per feature. > > A simple example would be f_classif, which in a very broad way > measures how a certai

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread Joel Nothman
*Roberto On 21 October 2014 13:14, Joel Nothman wrote: > I assume Robert's query is about RFECV. > > On 21 October 2014 07:35, Manoj Kumar > wrote: > >> Hi, >> >> No expert here either, but there are also feature selection classes which >> compute the score per feature. >> >> A simple example w

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread Manoj Kumar
Hi, No expert here either, but there are also feature selection classes which compute the score per feature. A simple example would be f_classif, which in a very broad way measures how a certain feature varies across all the classes relative to how it varies within a particular class (a naive expla
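
A quick sketch of the per-feature scoring Manoj mentions (iris is illustrative):
"""
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

X, y = load_iris(return_X_y=True)
# One ANOVA F-statistic (and p-value) per feature: a high F means the
# between-class variation dominates the within-class variation.
F, pval = f_classif(X, y)
print(F)
"""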

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread Lars Buitinck
2014-10-20 22:08 GMT+02:00 George Bezerra : > Not an expert, but I think the idea is that you remove (or add) features one > by one, starting from the ones that have the least (or most) impact. > > E.g., try removing a feature, if performance improves, keep it that way and > move on to the next fea

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread George Bezerra
Not an expert, but I think the idea is that you remove (or add) features one by one, starting from the ones that have the least (or most) impact. E.g., try removing a feature, if performance improves, keep it that way and move on to the next feature. It's a greedy approach; not optimal, but avoids

Re: [Scikit-learn-general] Feature selection: floating search algorithm

2014-10-09 Thread Nikolay Mayorov
> Date: Thu, 9 Oct 2014 06:58:46 +0200 > From: peter.z...@gmail.com > To: scikit-learn-general@lists.sourceforge.net > Subject: Re: [Scikit-learn-general] Feature selection: floating search > algorithm > > Hi Nikolay, > > On Wed, Oct 8, 2014 at 10:03 PM, Nikolay Mayorov

Re: [Scikit-learn-general] Feature selection: floating search algorithm

2014-10-08 Thread Pietro
Hi Nikolay, On Wed, Oct 8, 2014 at 10:03 PM, Nikolay Mayorov wrote: > Hi! > > Do you think scikit-learn will benefit from the general algorithm of feature > selection as described by P.Pudil et al. in "Floating search methods in > feature selection"? > > It is a wrapper method which alternates f
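
scikit-learn does not ship Pudil-style floating search, but mlxtend's SequentialFeatureSelector implements this family of algorithms; a hedged sketch (assuming mlxtend is installed; parameter names follow its documented API but may vary across versions):
"""
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
# Sequential Forward Floating Selection (SFFS): after each forward
# inclusion, conditionally exclude features while that improves the score.
sffs = SFS(KNeighborsClassifier(n_neighbors=3),
           k_features=3, forward=True, floating=True,
           scoring='accuracy', cv=5)
sffs = sffs.fit(X, y)
print(sffs.k_feature_idx_)
"""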

Re: [Scikit-learn-general] Feature selection/extraction contribution

2013-10-18 Thread Andreas Mueller
Hi Andrea. Thanks a lot for wanting to contribute. Could you elaborate a bit on the algorithms that you want to implement (i.e. reference paper) and their usage? I haven't heard of them (except Gram-Schmidt, but I'm not sure how that works in this context) and I am sure others could use some detai

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-18 Thread Eustache DIEMERT
> > That said, as Olivier mentioned, the GradientBoostingClassifier could >> implement a "transform", and that might be a good idea. >> > > Ok, then maybe that's something I can tackle if it's not too hairy? > > I tried something really dumb, but it seems to work in my case: """ class ExtGradien

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Nelle Varoquaux
> On Wed, Jul 17, 2013 at 09:09:02AM +0200, Eustache DIEMERT wrote: > > Ok, then for folks like me that come to numpy because of (thanks to) > sklearn, > > why not point to a (few) good tutorials somewhere in the docs? > > Indeed. What would people think of pointing to the scipy-lectures > (http://

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Gael Varoquaux
On Wed, Jul 17, 2013 at 09:09:02AM +0200, Eustache DIEMERT wrote: > Ok, then for folks like me that come to numpy because of (thanks to) sklearn, > why not point to a (few) good tutorials somewhere in the docs? Indeed. What would people think of pointing to the scipy-lectures (http://scipy-lec

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Olivier Grisel
I agree that the narrative feature selection documentation should include an inline toy example to demonstrate how to combine a selector transformer in a pipeline, as this is the canonical way to use feature selection, especially if you want to cross validate the impact of the feature selection hy
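
The kind of inline toy example Olivier is asking for might look like this (a hedged sketch; the selector, k, and classifier are illustrative):
"""
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
pipe = Pipeline([('anova', SelectKBest(f_classif, k=2)),
                 ('svc', LinearSVC(dual=False))])
# Because selection lives inside the pipeline, it is re-fit on each
# training fold; the held-out fold never influences which features stay.
print(cross_val_score(pipe, X, y, cv=5))
"""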

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Eustache DIEMERT
Mmm... Maybe just including the simple pipeline you provide in the feature selection doc [1] would suffice to point to the recommended way to do that? Like a sub-sub-section dubbed "Including feature selection in a prediction pipeline"? What do you think? Would it be too detailed? Should we le

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Eustache DIEMERT
> > Yes. Learn numpy. Seriously, this may sound provocative but it's the > biggest favor you can do yourself. Ok, then for folks like me that come to numpy because of (thanks to) sklearn, why not point to a (few) good tutorials somewhere in the docs? I mean, if it's an implicit requirement, then

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-17 Thread Eustache DIEMERT
2013/7/16 Olivier Grisel > Feature selectors should implement the `Transformer` API so that they > can be used in a Pipeline and make it possible to cross validate them. > > That's what I thought too. Do we have an example of cross-validating feature selection + learning? > The univariate feat

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Gael Varoquaux
On Tue, Jul 16, 2013 at 05:09:09PM +0200, Eustache DIEMERT wrote: > What is missing IMHO is a simple example on how to actually transform the > dataset after the initial feature selection! I beg to disagree. We have a huge amount of examples. Probably too many. We need to move people away from co

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Joel Nothman
Oh, well that's sad! Given that it assigns feature_importances_, is there any reason it should not incorporate the mixin to provide it with transform()? (I assumed that transform was available wherever feature_importances_ was.) On Wed, Jul 17, 2013 at 3:38 PM, Gael Varoquaux < gael.varoqu...@nor

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Gael Varoquaux
Hey Joel, I am afraid that I think that the GradientBoostingClassifier does not implement the transform method. Gaël On Wed, Jul 17, 2013 at 07:42:20AM +1000, Joel Nothman wrote: > Sorry, I made a mistake: unless the classifier has penalty=l1, its default > feature selection threshold (as used i

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Joel Nothman
Sorry, I made a mistake: unless the classifier has penalty=l1, its default feature selection threshold (as used in a pipeline currently) is the mean feature importance score. On Wed, Jul 17, 2013 at 7:11 AM, Joel Nothman wrote: > For your example, Eustache, the following would work (with a dense

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Joel Nothman
For your example, Eustache, the following would work (with a dense or sparse X): """ clf = GradientBoostingClassifier() clf.fit(X, y) clf.fit(clf.transform(X, threshold=1e-3), y) """ Alternatively, use a Pipeline: """ clf = Pipeline([ ('sel', GradientBoostingClassifier()), ('clf', GradientBo
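
For what it's worth, a hedged modern rendering of the same idea: the estimator-level transform(X, threshold=...) used above was later replaced in scikit-learn by the SelectFromModel wrapper, so in a current version the pipeline variant would look roughly like this (illustrative):
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=20, random_state=0)
clf = Pipeline([
    # keep the features whose importance exceeds the threshold
    ('sel', SelectFromModel(GradientBoostingClassifier(), threshold=1e-3)),
    ('clf', GradientBoostingClassifier()),
])
clf.fit(X, y)
"""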

Re: [Scikit-learn-general] feature selection documentation : improvements ?

2013-07-16 Thread Olivier Grisel
Feature selectors should implement the `Transformer` API so that they can be used in a Pipeline and make it possible to cross validate them. The univariate feature selectors already implement the transformer API: http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-sel

Re: [Scikit-learn-general] feature selection & scoring

2013-02-22 Thread Andreas Mueller
On 02/22/2013 12:03 PM, Christian wrote: > Hi, > > when I train a classification model with feature-selected data, I'll > need the selector object and the model object for future scoring. > So I must persist both (i.e. with pickle), right? Yes. But the selector is just a mask of siz
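
A small sketch of the persistence point (illustrative): pickling one Pipeline object keeps the selector mask and the model together for later scoring.
"""
import pickle

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
model = Pipeline([('sel', SelectKBest(k=2)),
                  ('clf', LogisticRegression(max_iter=1000))])
model.fit(X, y)

blob = pickle.dumps(model)        # selector + model in one artifact
restored = pickle.loads(blob)
print(restored.predict(X[:5]))    # scoring needs only the one object
"""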

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread josef . pktd
On Fri, Jun 15, 2012 at 4:50 PM, Yaroslav Halchenko wrote: > > On Fri, 15 Jun 2012, josef.p...@gmail.com wrote: >> https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py#L160 >> looks like a double sum, but wikipedia only has one sum, elementwise product. > > sorry -- I might be slow -- w

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Yaroslav Halchenko
On Fri, 15 Jun 2012, josef.p...@gmail.com wrote: > https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py#L160 > looks like a double sum, but wikipedia only has one sum, elementwise product. sorry -- I might be slow -- what sum? there is only an outer product in 160: Axy = Ax[:, None

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread josef . pktd
On Fri, Jun 15, 2012 at 4:20 PM, Yaroslav Halchenko wrote: > Here is a comparison to output of my code (marked with >): > >  0.00458652660079 0.788017364828 0.00700027844478 0.00483928213727 >> 0.145564526722 0.480124905375 0.422482399359 0.217567496918 > 6.50616752373e-07 7.99461373461e-05 0.0070

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Yaroslav Halchenko
Here is a comparison to output of my code (marked with >): 0.00458652660079 0.788017364828 0.00700027844478 0.00483928213727 > 0.145564526722 0.480124905375 0.422482399359 0.217567496918 6.50616752373e-07 7.99461373461e-05 0.00700027844478 0.0094610687282 > 0.120884106118 0.249205123601 0.4224823

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread josef . pktd
On Fri, Jun 15, 2012 at 3:50 PM, wrote: > On Fri, Jun 15, 2012 at 10:45 AM, Yaroslav Halchenko > wrote: >> >> On Fri, 15 Jun 2012, Satrajit Ghosh wrote: >>>    hi yarik, >>>    here is my attempt: >>>     >>> [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distan

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread josef . pktd
On Fri, Jun 15, 2012 at 10:45 AM, Yaroslav Halchenko wrote: > > On Fri, 15 Jun 2012, Satrajit Ghosh wrote: >>    hi yarik, >>    here is my attempt: >>     >> [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py >>    i'll look at your code in det

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Yaroslav Halchenko
On Fri, 15 Jun 2012, Satrajit Ghosh wrote: >hi yarik, >here is my attempt: > > [1]https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py >i'll look at your code in detail later today to understand the uv=True it is just to compute dCov

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Satrajit Ghosh
hi yarik, here is my attempt: https://github.com/satra/scikit-learn/blob/enh/covariance/sklearn/covariance/distance_covariance.py i'll look at your code in detail later today to understand the uv=True case. cheers, satra On Fri, Jun 15, 2012 at 10:19 AM, Yaroslav Halchenko wrote: > I haven't

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Yaroslav Halchenko
I haven't had a chance to play with it extensively but I have a basic implementation: https://github.com/PyMVPA/PyMVPA/blob/master/mvpa2/misc/dcov.py which still lacks statistical assessment, but provides dCov, dCor values and yes -- it is "inherently multivariate", but since it also could be useful

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Satrajit Ghosh
hi yarik, hm... interesting -- and there is no comparison against "minimizing > independence"? e.g. dCov measure > http://en.wikipedia.org/wiki/Distance_correlation which is really simple > to estimate and as intuitive as a correlation coefficient > thanks for bringing up dCov. have you had a cha

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread xinfan meng
Submitted 5/07; Revised 6/11; Published 5/12 It takes such a long time ... On Fri, Jun 15, 2012 at 8:58 PM, Satrajit Ghosh wrote: > fyi > > -- Forwarded message -- > From: joshua vogelstein > Date: Fri, Jun 15, 2012 at 12:35 AM > > http://jmlr.csail.mit.edu/papers/volume13/song

Re: [Scikit-learn-general] feature selection algo

2012-06-15 Thread Yaroslav Halchenko
hm... interesting -- and there is no comparison against "minimizing independence"? e.g. dCov measure http://en.wikipedia.org/wiki/Distance_correlation which is really simple to estimate and as intuitive as a correlation coefficient On Fri, 15 Jun 2012, Satrajit Ghosh wrote: >fyi >
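
For readers who want to try the statistic Yaroslav points to, a naive O(n^2) numpy sketch of sample distance covariance/correlation for 1-D samples (illustrative; the PyMVPA dcov.py linked earlier in the thread is a fuller implementation):
"""
import numpy as np

def dcor(x, y):
    # Pairwise absolute distances within each sample.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Squared distance covariance/variances are plain means of products.
    dcov2 = (A * B).mean()
    dvar_x, dvar_y = (A * A).mean(), (B * B).mean()
    return np.sqrt(dcov2 / np.sqrt(dvar_x * dvar_y))

rng = np.random.RandomState(0)
x = rng.randn(200)
print(dcor(x, x ** 2))  # nonlinear dependence -> clearly above zero
"""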