Hi Tim.
Nearly everything in scikit-learn will assume numeric features, or 
one-hot encoded categorical features.
You can feed categorical variables encoded as integers, but usually this 
will not result in the desired behavior.
For ordinal (ordered) data encoded as integers, tree-based methods like 
RandomForestClassifier should do the right thing (tm).
For unordered categorical variables, integer codes will lead to odd splits, 
because the trees split on thresholds over an ordering that has no meaning.
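
To make the encoding point concrete, here is a minimal sketch. The toy 
values are made up (loosely following the examples in the question), and it 
assumes a reasonably recent scikit-learn where OneHotEncoder accepts 
arbitrary numeric categories. Only the unordered id column is one-hot 
encoded; the numeric and ordered columns are left as they are.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy data (made up for illustration):
# column 0 is numeric, column 1 is an unordered id, column 2 is an ordered version.
X = np.array([
    [10.1, 100, 12],
    [1.0, 99, 13],
    [3.5, 100, 4],
    [7.2, 42, 13],
])

# One-hot encode only the unordered id column.
encoder = OneHotEncoder(handle_unknown="ignore")
id_onehot = encoder.fit_transform(X[:, [1]]).toarray()

# Keep the numeric and ordinal columns unchanged and append the indicator columns.
X_encoded = np.hstack([X[:, [0, 2]], id_onehot])
print(X_encoded.shape)
```

Each distinct id gets its own 0/1 column, so no artificial order between ids 
is implied.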

The methods from the feature_selection module will not really work with 
anything but numerical features.
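
As a rough sketch of what that means in practice, the snippet below runs 
f_classif on numeric columns and chi2 on 0/1 indicator columns (chi2 
requires non-negative inputs). The data is synthetic and the variable names 
are just for illustration; this is one possible way to apply the tests, not 
a recommendation for your dataset.

```python
import numpy as np
from sklearn.feature_selection import chi2, f_classif

rng = np.random.RandomState(0)
n_samples = 200
y = rng.randint(0, 2, size=n_samples)

# Two numeric features: one shifted by the class label, one pure noise.
x_informative = 2.0 * y + rng.normal(size=n_samples)
x_noise = rng.normal(size=n_samples)
X_numeric = np.column_stack([x_informative, x_noise])

# ANOVA F-test on the numeric features.
f_scores, f_pvalues = f_classif(X_numeric, y)

# A made-up categorical feature with three levels, one-hot encoded by hand
# into non-negative indicator columns.
category = np.arange(n_samples) % 3
X_indicator = (category[:, None] == np.unique(category)).astype(float)

# Chi-squared test on the indicator columns.
chi2_scores, chi2_pvalues = chi2(X_indicator, y)

print(f_scores)
print(chi2_scores)
```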

If your goal is to classify in the end, you need to transform your 
features into a numeric representation anyhow.
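
One way to wire the transformation and the classifier together is sketched 
below. It assumes a scikit-learn version that ships ColumnTransformer 
(0.20 or later), and the data, column positions, and target rule are all 
invented for the example.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
n_samples = 300

# Synthetic columns: numeric, unordered id, ordered version.
numeric = rng.normal(size=n_samples)
ids = rng.choice([42, 99, 100], size=n_samples)
version = rng.choice([4, 12, 13], size=n_samples)
X = np.column_stack([numeric, ids, version])

# Invented target that depends on the numeric feature and on one specific id.
y = ((numeric + (ids == 99)) > 0.5).astype(int)

# One-hot encode the id column (index 1); pass the other columns through.
preprocess = ColumnTransformer(
    [("onehot_id", OneHotEncoder(handle_unknown="ignore"), [1])],
    remainder="passthrough",
)
model = make_pipeline(
    preprocess,
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X, y)

# Importances follow the transformed column order:
# three id indicators first, then numeric, then version.
importances = model.named_steps["randomforestclassifier"].feature_importances_
print(importances)
```

Note that the importances are reported per transformed column, so a 
one-hot encoded variable shows up as several entries.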

Hth,
Andy


On 05/16/2015 05:15 PM, Tim wrote:
> Hi all,
>
> In my classification problem,
> some features are numerical (e.g. 10.1, 1), and
> some features are categorical though numerically coded as nonnegative 
> numbers (such as id coded as 100, 99), and
> some features are ordered though numerically coded as nonnegative 
> numbers (such as versions 12, 13, 4).
>
> Do the attribute feature_importances_ calculated by 
> RandomForestClassifier().fit() work with my feature types?
> Can it work with all the feature types, except categorical features coded as 
> numbers?
>
> Does the chi squared test in sklearn.feature_selection.chi2(X, y)  work with 
> my feature types?
> What types of features can it work with and what can it not?
> Can it only work with categorical features and ordered features, not 
> numerical features?
>
> Does the test by sklearn.feature_selection.f_classif(X, y) work with my 
> feature types?
> Can it only work with numerical features, not categorical or ordered features?
>
> The above three ways all return measurements and ranking of the features. But 
> I wonder if the results can be reliable due to different feature types.
> What do you suggest I do for feature selection and feature ranking in my 
> problem?
>
> Thanks,
> Tim
>
> ------------------------------------------------------------------------------
> One dashboard for servers and applications across Physical-Virtual-Cloud
> Widest out-of-the-box monitoring support with 50+ applications
> Performance metrics, stats and reports that give you Actionable Insights
> Deep dive visibility with transaction tracing using APM Insight.
> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

