Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-26 Thread Gael Varoquaux
The right thing to do would probably be to write a scikit-learn-contrib package for them and see if they gather traction. If they perform well on e.g. Kaggle competitions, we know that we need them in :). Cheers, Gaël On Fri, Jul 21, 2017 at 07:09:03PM -0400, Sebastian Raschka wrote: > Maybe becau

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Sebastian Raschka
Maybe because they are genetic algorithms, which are -- for some reason -- not very popular in the ML field in general :P. (People in bioinformatics seem to use them a lot, though.) Also, the name "Learning Classifier Systems" is a bit weird, I must say: I remember that when Ryan introduce

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Stuart Reynolds
+1 LCS and its many, many variants seem very practical and adaptable. I'm not sure why they haven't gotten traction. Overshadowed by GBM & random forests? On Fri, Jul 21, 2017 at 11:52 AM, Sebastian Raschka wrote: > Just to throw some additional ideas in here. Based on a conversation with a > co

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Raga Markely
Sounds good, Sebastian. Thank you! Raga On Fri, Jul 21, 2017 at 2:52 PM, Sebastian Raschka wrote: > Just to throw some additional ideas in here. Based on a conversation with > a colleague some time ago, I think learning classifier systems ( > https://en.wikipedia.org/wiki/Learning_classifier_sy

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Sebastian Raschka
> Traditionally tree based methods are very good when it comes to categorical > variables and can handle them appropriately. There is a current WIP PR to add > this support to sklearn. I think it's also important to distinguish between nominal and ordinal; it can make a huge difference imho. I.
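
To illustrate the nominal/ordinal distinction raised in this message, here is a minimal sketch using pandas. The column names, categories, and their ordering are hypothetical and chosen only for illustration; they do not come from the thread.

```python
import pandas as pd

# Hypothetical toy data: "size" is ordinal, "country" is nominal.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small"],
    "country": ["US", "FR", "JP", "US"],
})

# Ordinal feature: map to integer codes that preserve the assumed order.
size_order = ["small", "medium", "large"]
df["size_code"] = pd.Categorical(
    df["size"], categories=size_order, ordered=True
).codes

# Nominal feature: one indicator column per category, so no order is implied.
encoded = pd.get_dummies(df[["size_code", "country"]], columns=["country"])

print(encoded)
```

Encoding a nominal feature as integer codes would impose an arbitrary order on it, while one-hot encoding an ordinal feature would discard the order it really has.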

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Sebastian Raschka
Just to throw some additional ideas in here. Based on a conversation with a colleague some time ago, I think learning classifier systems (https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly useful when working with large, sparse binary vectors (like from a one-hot encodin

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Raga Markely
Thank you, Jacob. Appreciate it. Regarding 'perform better', I was referring to better accuracy, precision, recall, F1 score, etc. Thanks, Raga On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber wrote: > Traditionally tree based methods are very good when it comes to > categorical variables and

Re: [scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Jacob Schreiber
Traditionally, tree-based methods are very good when it comes to categorical variables and can handle them appropriately. There is a current WIP PR to add this support to sklearn. I'm not exactly sure what you mean by "perform better", though. Estimators that ignore the categorical aspect of these

[scikit-learn] Classifiers for dataset with categorical features

2017-07-21 Thread Raga Markely
Hello, I am wondering if there are some classifiers that perform better for datasets with categorical features (converted into a sparse input matrix with pd.get_dummies())? The data for the categorical features are nominal (order doesn't matter, e.g. country, occupation, etc.). If you could provide
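
For readers unfamiliar with the setup described in this message, the following is a rough sketch of one-hot encoding nominal features with `pd.get_dummies()` and storing the result as a sparse matrix. The data frame contents are made up for illustration, and the conversion through `scipy.sparse.csr_matrix` is one possible approach, not necessarily what the original poster used.

```python
import pandas as pd
from scipy import sparse

# Hypothetical toy data; the values are illustrative only.
df = pd.DataFrame({
    "country": ["US", "FR", "US", "JP"],
    "occupation": ["engineer", "teacher", "teacher", "engineer"],
})

# One indicator column per (feature, category) pair.
dummies = pd.get_dummies(df, columns=["country", "occupation"])

# Store as a SciPy CSR matrix; only the nonzero indicators are kept.
X = sparse.csr_matrix(dummies.to_numpy(dtype=float))

print(list(dummies.columns))
print(X.shape, X.nnz)
```

Each row has exactly one nonzero entry per original feature, which is why such matrices become very sparse as the number of categories grows.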