Some configurations are not implemented, or are difficult to solve, in the dual formulation. Setting dual=True or dual=False does not change the result, so please don't vary it the way you would vary other hyperparameters; it can, however, sometimes yield a speed-up. Here, try setting dual=False as a first debugging step.
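Concretely, the argument combination from the traceback below should go through once dual=False is passed. A minimal sketch on made-up data, just to show the working combination:

    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=200, n_features=50, random_state=0)

    # penalty='l1' with the (default) squared hinge loss is only implemented
    # for the primal problem, hence dual=False.
    clf = LinearSVC(penalty='l1', loss='squared_hinge', dual=False)
    clf.fit(X, y)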
Michael

On Tue, Jun 2, 2015 at 11:04 AM, Herbert Schulz <hrbrt....@gmail.com> wrote:

Does anyone know why this failure occurs?

ValueError: Unsupported set of arguments: loss='l1' and penalty='squared_hinge' are not supported when dual=True, Parameters: penalty='l1', loss='squared_hinge', dual=True

I'm using a LinearSVC (in Andreas' example code).

On 1 June 2015 at 13:38, Herbert Schulz <hrbrt....@gmail.com> wrote:

Cool, thanks for that!

Herb

On 1 June 2015 at 12:16, JAGANADH G <jagana...@gmail.com> wrote:

Hi,

I have listed sklearn feature selection with minimal examples here:
http://nbviewer.ipython.org/github/jaganadhg/data_science_notebooks/blob/master/sklearn/scikit_learn_feature_selection.ipynb

Jagan

On Thu, May 28, 2015 at 10:14 PM, Herbert Schulz <hrbrt....@gmail.com> wrote:

Thanks to both of you! I really appreciate it! I will try everything this weekend.

Best regards,

Herb

On 28 May 2015 at 18:21, Sebastian Raschka <se.rasc...@gmail.com> wrote:

I agree with Andreas; typically, a large number of features shouldn't be a big problem for random forests in my experience, though it of course depends on the number of trees and training samples.

If you suspect that overfitting might be a problem with unregularized classifiers, also consider "dimensionality reduction"/"feature extraction" techniques to compress the feature space, e.g., linear or kernel PCA, or other methods listed in the manifold learning section of the scikit-learn website.

However, there are scenarios where you'd want to keep the "original" features (in contrast to, e.g., principal components), and there are scenarios where linear methods such as LinearSVC(penalty='l1') may not work so well (e.g., for non-linear problems). The optimal solution would be to exhaustively test all feature combinations to see which works best, but this can be quite costly. For demonstration purposes, I implemented "sequential backward selection" (http://rasbt.github.io/mlxtend/docs/sklearn/sequential_backward_selection/) some time ago; it is a simple greedy alternative to the exhaustive search, so maybe you are lucky and it works well in your case. When I find time after my summer projects, I am planning to implement some genetic algorithms for feature selection...

Best,
Sebastian
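[As a rough sketch of the feature-extraction route Sebastian describes: the data below is made up as a stand-in for the 800 x 2048 matrix discussed further down, and the number of components and the RBF kernel are arbitrary illustrative choices, not recommendations from the thread.]

    from sklearn.model_selection import train_test_split  # sklearn.cross_validation in the release current at the time of this thread
    from sklearn.datasets import make_classification
    from sklearn.decomposition import KernelPCA
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline

    X, y = make_classification(n_samples=800, n_features=2048, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

    # Compress the feature space first, then classify on the components.
    model = Pipeline([
        ('reduce', KernelPCA(n_components=50, kernel='rbf')),
        ('clf', RandomForestClassifier(n_estimators=200)),
    ])
    model.fit(X_train, y_train)
    print(model.score(X_test, y_test))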
On May 28, 2015, at 11:59 AM, Andreas Mueller <t3k...@gmail.com> wrote:

Hi Herbert.

1) Often, reducing the feature space does not help with accuracy, and using a regularized classifier leads to better results.
2) To do feature selection, you need two methods: one to reduce the set of features, and another that does the actual supervised task (classification here).

Have you tried just using the standard classifiers? Clearly you tried the RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a kernel SVC.

If you want to do feature selection, what you need to do is something like this:

    feature_selector = LinearSVC(penalty='l1')  # or maybe start with SelectKBest()
    feature_selector.fit(X_train, y_train)

    X_train_reduced = feature_selector.transform(X_train)
    X_test_reduced = feature_selector.transform(X_test)

    classifier = RandomForestClassifier().fit(X_train_reduced, y_train)

    prediction = classifier.predict(X_test_reduced)

Or you use a pipeline, as here:
http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
(a short sketch of that setup is included at the end of this thread). Maybe we should add a version without the pipeline to the examples?

Cheers,
Andy

On 05/28/2015 08:32 AM, Herbert Schulz wrote:

Hello,

I'm using scikit-learn for machine learning. I have 800 samples with 2048 features, so I want to reduce the number of features in the hope of getting better accuracy.

It is a multiclass problem (classes 0-5), and the features consist of 1s and 0s: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]

I'm using the Random Forest Classifier.

Should I just feature-select the training data? And is it enough if I'm using this code:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

    clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
    clf.fit(X_train, y_train).transform(X_train)

    predicted = clf.predict(X_test)
    expected = y_test
    confusionMatrix = metrics.confusion_matrix(expected, predicted)

The accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?

I'll be very grateful for your help.
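[For reference, a minimal sketch of the pipeline variant Andy links to above, on made-up data. SelectFromModel is the wrapper used in scikit-learn releases newer than the one in this thread, where the l1-penalized LinearSVC itself was used as the first pipeline step; note dual=False, which is required with penalty='l1' and is what the ValueError at the top of this thread is about.]

    from sklearn.model_selection import train_test_split  # sklearn.cross_validation in the release current at the time of this thread
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=800, n_features=2048, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

    clf = Pipeline([
        # The l1-penalized linear SVM picks the features...
        ('feature_selection', SelectFromModel(LinearSVC(penalty='l1', dual=False))),
        # ...and the random forest is trained on the reduced feature set.
        ('classification', RandomForestClassifier(n_estimators=200)),
    ])
    clf.fit(X_train, y_train)
    print(clf.score(X_test, y_test))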