Hi Herbert.
1) Reducing the feature space often does not help with accuracy, and
using a regularized classifier usually leads to better results.
2) To do feature selection, you need two methods: one to reduce the set
of features, another that does the actual supervised task
(classification here).
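To illustrate point 1), here is a minimal sketch that skips feature selection entirely and instead tunes the regularization strength (C) of a LogisticRegression on all features. The random data is a hypothetical stand-in with the shape described in the question (800 samples, 2048 binary features, classes 0-5). Because the labels are random, the scores will sit near chance; only the procedure is meant to carry over, and the C grid is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data shaped like the question:
# 800 samples, 2048 binary features, 6 classes (labels 0-5).
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)

# Smaller C means stronger L2 regularization; all 2048 features are kept.
scores_by_C = {}
for C in [0.01, 0.1, 1.0]:
    clf = LogisticRegression(C=C, max_iter=1000)
    scores_by_C[C] = cross_val_score(clf, X, y, cv=3).mean()
    print(f"C={C}: mean CV accuracy {scores_by_C[C]:.3f}")
```

On real data, the C with the best cross-validated score is the one to keep; a selection step is only worth adding if it beats that baseline.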
Have you tried just using the standard classifiers? Clearly you tried
the RF, but I'd also try a linear method like
LinearSVC/LogisticRegression or a kernel SVC.
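A sketch of trying these standard classifiers side by side with cross-validation, again on hypothetical random stand-in data of the same shape. The hyperparameters (C values, n_estimators) are arbitrary illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC, LinearSVC

# Hypothetical stand-in data: 800 samples, 2048 binary features, classes 0-5.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)

classifiers = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LinearSVC": LinearSVC(C=0.01),
    "LogisticRegression": LogisticRegression(C=0.1, max_iter=1000),
    "SVC (RBF kernel)": SVC(kernel="rbf", gamma="scale"),
}

# Same 3-fold cross-validation for every model, so the scores are comparable.
results = {}
for name, clf in classifiers.items():
    results[name] = cross_val_score(clf, X, y, cv=3).mean()
    print(f"{name}: {results[name]:.3f}")
```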
If you want to do feature selection, what you need to do is something
like this:
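from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import LinearSVC

# L1-penalized linear model used only to pick features (or maybe start with SelectKBest()).
# penalty='l1' requires dual=False; SelectFromModel keeps the columns with non-zero weights.
feature_selector = SelectFromModel(LinearSVC(penalty='l1', dual=False))
feature_selector.fit(X_train, y_train)  # fit on the training data only
X_train_reduced = feature_selector.transform(X_train)
X_test_reduced = feature_selector.transform(X_test)
classifier = RandomForestClassifier().fit(X_train_reduced, y_train)
prediction = classifier.predict(X_test_reduced)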
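The SelectKBest alternative mentioned in the comment above can be sketched as follows, assuming chi2 as the scoring function (it requires non-negative features, which 0/1 indicators satisfy) and an arbitrary k=200. The data is again a hypothetical random stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 800 samples, 2048 binary features, classes 0-5.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# k=200 is an arbitrary illustrative value; tune it by cross-validation.
feature_selector = SelectKBest(chi2, k=200)
feature_selector.fit(X_train, y_train)  # scores computed from the training split only
X_train_reduced = feature_selector.transform(X_train)  # same column mask
X_test_reduced = feature_selector.transform(X_test)    # applied to both splits

classifier = RandomForestClassifier(n_estimators=200, random_state=0)
classifier.fit(X_train_reduced, y_train)
prediction = classifier.predict(X_test_reduced)
print(X_train_reduced.shape, X_test_reduced.shape)  # (560, 200) (240, 200)
```

This also answers the question of which data to select on: the selector is fitted on the training split only, and the test split is merely transformed, so it has no say in which features are kept.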
Or use a pipeline, as in this example:
http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
Maybe we should add a version without the pipeline to the examples?
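For comparison, a minimal sketch of the pipeline version, assuming the same L1-penalized LinearSVC selector and hypothetical random stand-in data. Passing the whole Pipeline to cross_val_score refits the selector inside each training fold, so the held-out fold never influences which features are selected.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in data: 800 samples, 2048 binary features, classes 0-5.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(800, 2048))
y = rng.randint(0, 6, size=800)

pipe = Pipeline([
    # Step 1: keep features with non-zero weight in an L1-penalized linear SVM.
    ("select", SelectFromModel(LinearSVC(penalty="l1", dual=False))),
    # Step 2: the actual classifier, trained on the reduced features.
    ("classify", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Both steps are refit on each training fold and evaluated on the held-out fold.
scores = cross_val_score(pipe, X, y, cv=3)
print(scores.mean())
```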
Cheers,
Andy
On 05/28/2015 08:32 AM, Herbert Schulz wrote:
Hello,
I'm using scikit-learn for machine learning.
I have 800 samples with 2048 features, so I want to reduce the number of
features, hopefully to get better accuracy.
It is a multiclass problem (classes 0-5), and the features consist of
1s and 0s: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0....,0]
I'm using the Random Forest classifier.
Should I apply feature selection to the training data only? And is the
following code enough?
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)
clf = RandomForestClassifier(n_estimators=200, warm_start=True, criterion='gini', max_depth=13)
clf.fit(X_train, y_train).transform(X_train)
predicted = clf.predict(X_test)
expected = y_test
confusionMatrix = metrics.confusion_matrix(expected, predicted)
I'm asking because the accuracy didn't improve. Is everything OK in the
code, or am I doing something wrong?
I'll be very grateful for your help.
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general