Hi

I have listed the scikit-learn feature selection methods with minimal examples here:


http://nbviewer.ipython.org/github/jaganadhg/data_science_notebooks/blob/master/sklearn/scikit_learn_feature_selection.ipynb

Jagan

On Thu, May 28, 2015 at 10:14 PM, Herbert Schulz <hrbrt....@gmail.com>
wrote:

> Thanks to both of you!!! I really appreciate it! I will try everything
> this weekend.
>
> Best regards,
>
> Herb
>
> On 28 May 2015 at 18:21, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> I agree with Andreas;
>> in my experience, a large number of features typically isn't a big problem
>> for random forests either, though it of course depends on the number of
>> trees and training samples.
>>
>> If you suspect that overfitting might be a problem with unregularized
>> classifiers, also consider "dimensionality reduction"/"feature extraction"
>> techniques to compress the feature space, e.g., linear or kernel PCA, or
>> other methods listed in the manifold learning section of the scikit-learn
>> website.
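>>
>> For example, a minimal sketch of such a projection (the number of
>> components is an arbitrary placeholder, not a recommendation):
>>
>> from sklearn.decomposition import PCA  # or KernelPCA for non-linear structure
>>
>> # Fit the projection on the training split only, then apply it to both.
>> pca = PCA(n_components=50)  # 50 is just an example value
>> X_train_pca = pca.fit_transform(X_train)
>> X_test_pca = pca.transform(X_test)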
>>
>> However, there are scenarios where you'd want to keep the "original"
>> features (in contrast to, e.g., principal components), and there are
>> scenarios where linear methods such as LinearSVC(penalty='l1') may not work
>> so well (e.g., for non-linear problems). The optimal solution would be to
>> exhaustively test all feature combinations to see which works best;
>> however, this can be quite costly. For demonstration purposes, I
>> implemented "sequential backward selection" (
>> http://rasbt.github.io/mlxtend/docs/sklearn/sequential_backward_selection/)
>> some time ago; it is a simple greedy alternative to the exhaustive search,
>> so maybe you are lucky and it works well in your case (see the sketch
>> below). When I find time after my summer projects, I am planning to
>> implement some genetic algorithms for feature selection...
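>>
>> The greedy idea fits in a few lines; this is only an illustrative sketch
>> (the function name sbs and the estimator/cv settings are placeholders, not
>> mlxtend's actual API), and with 2048 features it will be slow:
>>
>> from sklearn.cross_validation import cross_val_score  # sklearn.model_selection in newer releases
>> from sklearn.ensemble import RandomForestClassifier
>>
>> def sbs(X, y, k, clf=None):
>>     """Greedily drop features until only k remain (X is a numpy array)."""
>>     if clf is None:
>>         clf = RandomForestClassifier(n_estimators=50)
>>     features = list(range(X.shape[1]))
>>     while len(features) > k:
>>         # Remove the feature whose removal hurts CV accuracy the least.
>>         scores = [(cross_val_score(clf, X[:, [j for j in features if j != i]],
>>                                    y, cv=3).mean(), i) for i in features]
>>         best_score, worst_feature = max(scores)
>>         features.remove(worst_feature)
>>     return features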
>>
>> Best,
>> Sebastian
>>
>>
>> On May 28, 2015, at 11:59 AM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>>  Hi Herbert.
>> 1) Often, reducing the feature space does not help with accuracy, and
>> using a regularized classifier leads to better results.
>> 2) To do feature selection, you need two methods: one to reduce the set
>> of features, another that does the actual supervised task (classification
>> here).
>>
>> Have you tried just using the standard classifiers? Clearly you tried the
>> RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a
>> kernel SVC.
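>>
>> For example, a quick baseline comparison could look like this (just a
>> sketch; cv=5 is arbitrary):
>>
>> from sklearn.cross_validation import cross_val_score
>> from sklearn.svm import LinearSVC, SVC
>> from sklearn.linear_model import LogisticRegression
>>
>> for clf in [LinearSVC(), LogisticRegression(), SVC(kernel='rbf')]:
>>     scores = cross_val_score(clf, X_train, y_train, cv=5)
>>     print(clf.__class__.__name__, scores.mean())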
>>
>> If you want to do feature selection, what you need to do is something
>> like this:
>>
>> from sklearn.svm import LinearSVC
>> from sklearn.ensemble import RandomForestClassifier
>>
>> # l1-penalized LinearSVC requires dual=False; or maybe start with SelectKBest()
>> feature_selector = LinearSVC(penalty='l1', dual=False)
>> feature_selector.fit(X_train, y_train)
>>
>> # Keep only the selected features, for both splits.
>> X_train_reduced = feature_selector.transform(X_train)
>> X_test_reduced = feature_selector.transform(X_test)
>>
>> classifier = RandomForestClassifier().fit(X_train_reduced, y_train)
>>
>> prediction = classifier.predict(X_test_reduced)
>>
>>
>> Or you use a pipeline, as here:
>> http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
>> Maybe we should add a version without the pipeline to the examples?
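>>
>> For example, a minimal pipeline sketch (chi2 suits non-negative/binary
>> features; k=100 is an arbitrary placeholder):
>>
>> from sklearn.pipeline import Pipeline
>> from sklearn.feature_selection import SelectKBest, chi2
>> from sklearn.ensemble import RandomForestClassifier
>>
>> pipe = Pipeline([('select', SelectKBest(chi2, k=100)),
>>                  ('forest', RandomForestClassifier(n_estimators=200))])
>> pipe.fit(X_train, y_train)
>> print(pipe.score(X_test, y_test))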
>>
>> Cheers,
>> Andy
>>
>>
>>
>> On 05/28/2015 08:32 AM, Herbert Schulz wrote:
>>
>> Hello,
>> I'm using scikit-learn for machine learning.
>> I have 800 samples with 2048 features, so I want to reduce the number of
>> features in the hope of getting better accuracy.
>>
>> It is a multiclass problem (classes 0-5), and the features consist of 1s
>> and 0s:  [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]
>>
>> I'm using the Random Forest Classifier.
>>
>> Should I apply feature selection only to the training data? And is this
>> code enough:
>>
>>     X_train, X_test, y_train, y_test = train_test_split(X, y,
>>                                                         test_size=.3)
>>
>>     clf = RandomForestClassifier(n_estimators=200, warm_start=True,
>>                                  criterion='gini', max_depth=13)
>>     clf.fit(X_train, y_train).transform(X_train)
>>
>>     predicted = clf.predict(X_test)
>>     expected = y_test
>>     confusionMatrix = metrics.confusion_matrix(expected, predicted)
>>
>> I ask because the accuracy didn't get any higher. Is everything OK in the
>> code, or am I doing something wrong?
>>
>> I'll be very grateful for your help.


-- 
**********************************
JAGANADH G
http://jaganadhg.in
*ILUGCBE*
http://ilugcbe.org.in