Hi Herbert.
1) Reducing the feature space often does not improve accuracy; a regularized classifier frequently gives better results. 2) To do feature selection, you need two estimators: one to reduce the set of features, and another that does the actual supervised task (classification here).

Have you tried just using the standard classifiers? Clearly you tried the RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a kernel SVC.
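
As a rough sketch of that comparison (not a benchmark), the snippet below cross-validates the three classifiers mentioned above on all features. The data is a synthetic stand-in for binary features with classes 0-5, and the helper name compare_classifiers, the C values, and cv=3 are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC


def compare_classifiers(X, y, cv=3):
    """Return the mean cross-validated accuracy of a few regularized classifiers."""
    # Hypothetical default settings; C controls the regularization strength
    # and would normally be tuned (e.g. with GridSearchCV).
    candidates = {
        "LinearSVC": LinearSVC(C=1.0, max_iter=5000),
        "LogisticRegression": LogisticRegression(C=1.0, max_iter=1000),
        "SVC (RBF kernel)": SVC(C=1.0, kernel="rbf"),
    }
    return {
        name: cross_val_score(clf, X, y, cv=cv).mean()
        for name, clf in candidates.items()
    }


# Synthetic stand-in data (assumption): binary features, labels 0..5 taken
# from the sum of the first five features. Replace with your own X and y.
rng = np.random.RandomState(0)
X_demo = rng.randint(0, 2, size=(600, 200))
y_demo = X_demo[:, :5].sum(axis=1)

scores = compare_classifiers(X_demo, y_demo)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

If one of these already does well on all 2048 features, feature selection may not be worth the extra step.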

If you want to do feature selection, what you need to do is something like this:

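from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# L1-penalized linear model as the selector (or maybe start with SelectKBest()).
# penalty='l1' requires dual=False; SelectFromModel provides transform() in recent releases.
feature_selector = SelectFromModel(LinearSVC(penalty='l1', dual=False))
feature_selector.fit(X_train, y_train)  # fit the selector on the training data only

X_train_reduced = feature_selector.transform(X_train)
X_test_reduced = feature_selector.transform(X_test)

classifier = RandomForestClassifier().fit(X_train_reduced, y_train)

prediction = classifier.predict(X_test_reduced)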


Or you can use a pipeline, as here: http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
Maybe we should add a version without the pipeline to the examples?
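
For reference, a minimal pipeline sketch might look like the following. It is not a copy of the linked example: the chi2 scoring function, k=50, n_estimators=100, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): binary features, labels 0..5.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(600, 200))
y = X[:, :5].sum(axis=1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The pipeline fits the selector on the training data only and applies the
# same selected columns automatically when predicting on the test data.
pipe = make_pipeline(
    SelectKBest(chi2, k=50),  # chi2 needs non-negative features, e.g. 0/1
    RandomForestClassifier(n_estimators=100, random_state=0),
)
pipe.fit(X_train, y_train)

prediction = pipe.predict(X_test)
print("test accuracy:", pipe.score(X_test, y_test))
```

Because the selector lives inside the pipeline, the whole thing can also be passed to cross_val_score or GridSearchCV without leaking test data into the selection step.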

Cheers,
Andy



On 05/28/2015 08:32 AM, Herbert Schulz wrote:
Hello,
I'm using scikit-learn for machine learning.
I have 800 samples with 2048 features, so I want to reduce the number of features and hopefully get better accuracy.

It is a multiclass problem (classes 0-5), and the features consist of 1s and 0s: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]

I'm using the Random Forest Classifier.

Should I apply feature selection only to the training data? And is it enough if I use this code:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
clf.fit(X_train, y_train).transform(X_train)

predicted = clf.predict(X_test)
expected = y_test
confusionMatrix = metrics.confusion_matrix(expected, predicted)

I ask because the accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?

I'll be very grateful for your help.





------------------------------------------------------------------------------


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
