Hi Tim. Nearly everything in scikit-learn assumes numeric features or one-hot encoded categorical features. You can feed in categorical variables encoded as integers, but the estimator will then treat the codes as ordered numbers, which is usually not the desired behavior. For ordinal (ordered) data, tree-based methods like RandomForestClassifier should do the right thing (tm). For unordered categorical variables, integer codes will lead to odd splits.
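As a rough illustration of the encoding described above, here is a minimal sketch that one-hot encodes an unordered id column, passes the numeric and ordered columns through unchanged, and fits a RandomForestClassifier. The toy data, the column layout, and all variable names are made up for illustration, and it assumes a scikit-learn version that ships ColumnTransformer (0.20 or later).

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
n_samples = 200

# Hypothetical toy data mirroring the three feature types in the question.
numeric = rng.normal(size=n_samples)                   # plain numeric feature
item_id = rng.choice([99, 100, 101], size=n_samples)   # unordered category coded as integers
version = rng.choice([4, 12, 13], size=n_samples)      # ordered feature coded as integers

X = np.column_stack([numeric, item_id, version])
y = (item_id == 100).astype(int)  # made-up target that depends on the id only

# One-hot encode only the unordered column (index 1); the numeric and
# ordered columns are passed through as they are.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), [1])],
    remainder="passthrough",
)

model = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X, y)

# One importance per encoded column: three one-hot columns for the id,
# followed by the two passthrough columns.
importances = model.named_steps["randomforestclassifier"].feature_importances_
print(importances)
```

Note that after one-hot encoding, the importance of the original id feature is spread over its one-hot columns, so you may want to sum those entries to get one score per original feature.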
The methods from the feature_selection module will not really work with anything but numerical features. If your goal is to classify in the end, you need to transform your features into a numeric representation anyhow.

Hth,
Andy

On 05/16/2015 05:15 PM, Tim wrote:
> Hi all,
>
> In my classification problem,
> some features are numerical (e.g. 10.1, 1), and
> some features are categorical though numerically coded as nonnegative
> numbers (such as id coded as 100, 99), and
> some features are ordered though numerically coded as nonnegative
> numbers (such as versions 12, 13, 4).
>
> Do the attribute feature_importances_ calculated by
> RandomForestClassifier().fit() work with my feature types?
> Can it work with all the feature types, except categorical features coded as
> numbers?
>
> Does the chi squared test in sklearn.feature_selection.chi2(X, y) work with
> my feature types?
> What types of features can it work with and what can it not?
> Can it only work with categorical features and ordered features, not
> numerical features?
>
> Does the test by sklearn.feature_selection.f_classif(X, y) work with my
> feature types?
> Can it only work with numerical features, not categorical or ordered features?
>
> The above three ways all return measurements and ranking of the features. But
> I wonder if the results can be reliable due to different feature types.
> What do you suggest me to do feature selection and feature ranking in my
> problem?
>
> Thanks,
> Tim
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general