Hi Herbert, I can't help you with the accuracy problem, since that can be due to many different things. However, there is now a way to combine different classifiers for majority-rule voting: sklearn.ensemble.VotingClassifier. It is not in the current stable release yet, but you can get it from the scikit-learn dev version on GitHub.
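For example, a minimal (untested) sketch combining an SVM and a decision tree -- the toy data, the base estimators, and the added logistic regression (a third voter avoids frequent ties) are all just placeholders, and the import path assumes the dev version:

    from sklearn.cross_validation import train_test_split
    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # placeholder data; substitute your own X, y
    X, y = make_classification(n_samples=800, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3,
                                                        random_state=0)

    # voting='hard' means majority rule over the predicted class labels
    eclf = VotingClassifier(estimators=[('svm', SVC()),
                                        ('tree', DecisionTreeClassifier()),
                                        ('lr', LogisticRegression())],
                            voting='hard')
    eclf.fit(X_train, y_train)
    print(eclf.score(X_test, y_test))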
Alternatively, if you don't want to install the scikit-learn dev version, you could use the EnsembleClassifier from mlxtend until the next stable release of scikit-learn -- slightly different syntax but the same principle: http://rasbt.github.io/mlxtend/docs/sklearn/ensemble_classifier/ (this is basically the original implementation that was later ported to scikit-learn). Hope that helps.

Best,
Sebastian

> On Jun 2, 2015, at 11:25 AM, Herbert Schulz <hrbrt....@gmail.com> wrote:
>
> Thanks, that helped.
>
> But I just can't get an accuracy higher than 45%... don't know why. The same with logistic regression and so on.
>
> Is there a way to combine, for example, an SVM with a decision tree?
>
> Herb
>
> On 2 June 2015 at 11:19, Michael Eickenberg <michael.eickenb...@gmail.com> wrote:
> Some configurations are not implemented or are difficult to evaluate in the dual. Setting dual=True/False doesn't change the result, so please don't vary it as you would vary other parameters. It can, however, sometimes yield a speed-up. Here you should try setting dual=False as a first means of debugging.
>
> Michael
>
> On Tue, Jun 2, 2015 at 11:04 AM, Herbert Schulz <hrbrt....@gmail.com> wrote:
> Does anyone know why this failure occurs?
>
> ValueError: Unsupported set of arguments: loss='l1' and penalty='squared_hinge' are not supported when dual=True, Parameters: penalty='l1', loss='squared_hinge', dual=True
>
> I'm using a LinearSVC (in Andreas' example code).
>
> On 1 June 2015 at 13:38, Herbert Schulz <hrbrt....@gmail.com> wrote:
> Cool, thanks for that!
>
> Herb
>
> On 1 June 2015 at 12:16, JAGANADH G <jagana...@gmail.com> wrote:
> Hi,
>
> I have listed the sklearn feature selection methods with minimal examples here:
>
> http://nbviewer.ipython.org/github/jaganadhg/data_science_notebooks/blob/master/sklearn/scikit_learn_feature_selection.ipynb
>
> Jagan
>
> On Thu, May 28, 2015 at 10:14 PM, Herbert Schulz <hrbrt....@gmail.com> wrote:
> Thanks to both of you! I really appreciate it! I will try everything this weekend.
>
> Best regards,
>
> Herb
>
> On 28 May 2015 at 18:21, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> I agree with Andreas; in my experience, a large number of features typically shouldn't be a big problem for random forests either. However, it of course depends on the number of trees and training samples.
>
> If you suspect that overfitting might be a problem with unregularized classifiers, also consider "dimensionality reduction"/"feature extraction" techniques to compress the feature space, e.g., linear or kernel PCA, or the other methods listed in the manifold learning section of the scikit-learn website.
>
> However, there are scenarios where you'd want to keep the "original" features (in contrast to, e.g., principal components), and there are scenarios where linear methods such as LinearSVC(penalty='l1') may not work so well (e.g., for non-linear problems). The optimal solution would be to exhaustively test all feature combinations to see which works best; however, this can be quite costly.
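> Just to illustrate the scale, a brute-force version of that exhaustive search might look roughly like this (untested sketch; the LinearSVC scorer and the subset-size cap are arbitrary placeholders):
>
>     from itertools import combinations
>     import numpy as np
>     from sklearn.cross_validation import cross_val_score
>     from sklearn.svm import LinearSVC
>
>     def exhaustive_search(X, y, max_subset_size=3):
>         """Cross-validate every feature subset up to max_subset_size."""
>         best_score, best_subset = -np.inf, None
>         for k in range(1, max_subset_size + 1):
>             for subset in combinations(range(X.shape[1]), k):
>                 score = cross_val_score(LinearSVC(), X[:, list(subset)],
>                                         y, cv=5).mean()
>                 if score > best_score:
>                     best_score, best_subset = score, subset
>         return best_subset, best_score
>
>     # With 2048 features, even subsets of size <= 3 already mean ~1.4
>     # billion candidates -- hence greedy alternatives like the one below.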
> For demonstration purposes, I implemented "sequential backward selection" (http://rasbt.github.io/mlxtend/docs/sklearn/sequential_backward_selection/) some time ago -- a simple greedy alternative to the exhaustive search; maybe you are lucky and it works well in your case. When I find time after my summer projects, I am planning to implement some genetic algorithms for feature selection...
>
> Best,
> Sebastian
>
>> On May 28, 2015, at 11:59 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> Hi Herbert.
>> 1) Often, reducing the feature space does not help with accuracy, and using a regularized classifier leads to better results.
>> 2) To do feature selection, you need two methods: one to reduce the set of features, another that does the actual supervised task (classification here).
>>
>> Have you tried just using the standard classifiers? Clearly you tried the RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a kernel SVC.
>>
>> If you want to do feature selection, you need to do something like this:
>>
>> feature_selector = LinearSVC(penalty='l1', dual=False)  # l1 requires dual=False; or maybe start with SelectKBest()
>> feature_selector.fit(X_train, y_train)
>>
>> X_train_reduced = feature_selector.transform(X_train)
>> X_test_reduced = feature_selector.transform(X_test)
>>
>> classifier = RandomForestClassifier().fit(X_train_reduced, y_train)
>>
>> prediction = classifier.predict(X_test_reduced)
>>
>> Or you use a pipeline, as here (see also the rough sketch at the very bottom of this email):
>> http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
>> Maybe we should add a version without the pipeline to the examples?
>>
>> Cheers,
>> Andy
>>
>> On 05/28/2015 08:32 AM, Herbert Schulz wrote:
>>> Hello,
>>> I'm using scikit-learn for machine learning. I have 800 samples with 2048 features, and therefore I want to reduce my features to hopefully get a better accuracy.
>>>
>>> It is a multiclass problem (classes 0-5), and the features consist of 1's and 0's: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,...,0]
>>>
>>> I'm using the Random Forest Classifier.
>>>
>>> Should I just feature-select the training data? And is it enough if I'm using this code:
>>>
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)
>>>
>>> clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
>>> clf.fit(X_train, y_train).transform(X_train)
>>>
>>> predicted = clf.predict(X_test)
>>> expected = y_test
>>> confusionMatrix = metrics.confusion_matrix(expected, predicted)
>>>
>>> I ask because the accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?
>>>
>>> I'll be very grateful for your help.
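P.S. Since Andy mentioned the pipeline route, here is a rough, untested sketch of what that could look like for your data. Everything in it is an assumption for illustration: SelectKBest with chi2 is just one possible selector (chi2 is applicable because your 0/1 features are nonnegative), k=200 is arbitrary, and the random data stands in for your real X and y:

    import numpy as np
    from sklearn.cross_validation import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import Pipeline

    rng = np.random.RandomState(0)
    X = rng.randint(0, 2, size=(800, 2048))  # placeholder binary features
    y = rng.randint(0, 6, size=800)          # placeholder labels, classes 0-5

    pipe = Pipeline([
        ('select', SelectKBest(chi2, k=200)),  # keep the 200 best features
        ('clf', RandomForestClassifier(n_estimators=200)),
    ])

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)
    pipe.fit(X_train, y_train)  # the selector is fit on the training split only
    print(pipe.score(X_test, y_test))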