Thanks to both of you! I really appreciate it! I will try everything this weekend.
Best regards,
Herb

On 28 May 2015 at 18:21, Sebastian Raschka <se.rasc...@gmail.com> wrote:

I agree with Andreas: in my experience, a large number of features typically shouldn't be a big problem for random forests either; however, it of course depends on the number of trees and training samples.

If you suspect that overfitting might be a problem with unregularized classifiers, also consider "dimensionality reduction"/"feature extraction" techniques to compress the feature space, e.g., linear or kernel PCA, or the other methods listed in the manifold learning section of the scikit-learn website.
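As a rough illustration (the component count below is a placeholder to tune, not a recommendation), compressing the feature space with linear or kernel PCA in scikit-learn could look like this:

    from sklearn.decomposition import PCA, KernelPCA

    # Linear PCA: project the 2048 binary features onto a smaller subspace.
    pca = PCA(n_components=50)          # placeholder; tune via explained variance or CV
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)  # reuse the projection fitted on the training data

    # Kernel PCA can capture non-linear structure (RBF kernel as an example).
    kpca = KernelPCA(n_components=50, kernel='rbf')
    X_train_kpca = kpca.fit_transform(X_train)
    X_test_kpca = kpca.transform(X_test)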
However, there are scenarios where you'd want to keep the "original" features (in contrast to, e.g., principal components), and there are scenarios where linear methods such as LinearSVC(penalty='l1') may not work so well (e.g., for non-linear problems). The optimal solution would be to exhaustively test all feature combinations to see which works best; however, this can be quite costly. For demonstration purposes, I implemented "sequential backward selection" (http://rasbt.github.io/mlxtend/docs/sklearn/sequential_backward_selection/) some time ago, a simple greedy alternative to the exhaustive search. Maybe you are lucky and it works well in your case. When I find time after my summer projects, I am planning to implement some genetic algorithms for feature selection...

Best,
Sebastian
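For reference, mlxtend's API has changed since the page linked above; in current versions, a sketch of greedy backward selection might look like this (k_features and the estimator settings are placeholders):

    from mlxtend.feature_selection import SequentialFeatureSelector
    from sklearn.ensemble import RandomForestClassifier

    # Greedy backward selection: start from all features and drop the
    # least useful one at a time until k_features remain.
    sbs = SequentialFeatureSelector(
        RandomForestClassifier(n_estimators=200),
        k_features=100,    # placeholder target size; tune for your data
        forward=False,     # backward (rather than forward) selection
        scoring='accuracy',
        cv=5)
    sbs = sbs.fit(X_train, y_train)
    X_train_sel = sbs.transform(X_train)
    X_test_sel = sbs.transform(X_test)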
On May 28, 2015, at 11:59 AM, Andreas Mueller <t3k...@gmail.com> wrote:

Hi Herbert,

1) Often, reducing the feature space does not help with accuracy; using a regularized classifier leads to better results.
2) To do feature selection, you need two methods: one to reduce the set of features, and another that does the actual supervised task (classification here).

Have you tried just using the standard classifiers? Clearly you tried the RF, but I'd also try a linear method like LinearSVC/LogisticRegression or a kernel SVC.

If you want to do feature selection, what you need to do is something like this:

    feature_selector = LinearSVC(penalty='l1', dual=False)  # or maybe start with SelectKBest()
    feature_selector.fit(X_train, y_train)

    # note: recent scikit-learn versions removed transform() from estimators;
    # there, wrap the fitted model in SelectFromModel(feature_selector, prefit=True)
    X_train_reduced = feature_selector.transform(X_train)
    X_test_reduced = feature_selector.transform(X_test)

    classifier = RandomForestClassifier().fit(X_train_reduced, y_train)

    prediction = classifier.predict(X_test_reduced)

Or you use a pipeline, as here (a minimal sketch follows below):
http://scikit-learn.org/dev/auto_examples/feature_selection/feature_selection_pipeline.html
Maybe we should add a version without the pipeline to the examples?

Cheers,
Andy
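A minimal sketch of the pipeline variant (SelectKBest with chi2 is one reasonable scorer for non-negative 0/1 features; the step names and k are placeholders):

    from sklearn.pipeline import Pipeline
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.ensemble import RandomForestClassifier

    # Selector and classifier in one estimator: the selector is fitted on
    # the training data only, so no test-set information leaks in.
    pipe = Pipeline([
        ('select', SelectKBest(chi2, k=200)),  # placeholder k; tune e.g. with GridSearchCV
        ('clf', RandomForestClassifier(n_estimators=200)),
    ])
    pipe.fit(X_train, y_train)
    predicted = pipe.predict(X_test)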
On 05/28/2015 08:32 AM, Herbert Schulz wrote:

Hello,

I'm using scikit-learn for machine learning. I have 800 samples with 2048 features, so I want to reduce my features in the hope of getting better accuracy.

It is a multiclass problem (classes 0-5), and the features consist of 1's and 0's: [1,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,...,0]

I'm using the Random Forest Classifier.

Should I feature-select only the training data? And is it enough if I'm using this code:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

    clf = RandomForestClassifier(n_estimators=200, warm_start=True,
                                 criterion='gini', max_depth=13)
    clf.fit(X_train, y_train).transform(X_train)

    predicted = clf.predict(X_test)
    expected = y_test
    confusionMatrix = metrics.confusion_matrix(expected, predicted)

The accuracy didn't get higher. Is everything OK in the code, or am I doing something wrong?

I'll be very grateful for your help.
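For completeness: one issue in the snippet above is that fit(X_train, y_train).transform(X_train) computes a reduced matrix and then discards it. A hedged sketch of how the advice in the replies could be applied to this code, using SelectFromModel (parameters and thresholds are placeholders, not tuned values):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import train_test_split
    from sklearn import metrics

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.3)

    # Fit the forest, then actually use its feature importances for selection.
    clf = RandomForestClassifier(n_estimators=200, criterion='gini', max_depth=13)
    clf.fit(X_train, y_train)

    selector = SelectFromModel(clf, prefit=True)  # default threshold: mean importance
    X_train_red = selector.transform(X_train)
    X_test_red = selector.transform(X_test)

    # Refit a classifier on the reduced feature set and evaluate it.
    clf2 = RandomForestClassifier(n_estimators=200).fit(X_train_red, y_train)
    predicted = clf2.predict(X_test_red)
    confusion = metrics.confusion_matrix(y_test, predicted)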