Just to throw some additional ideas in here: based on a conversation with a colleague some time ago, I think learning classifier systems (https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly useful when working with large, sparse binary vectors (like those from a one-hot encoding). I am not deep into LCSs myself and only know the basics (I read through the first chapters of the Intro to Learning Classifier Systems draft; the print version will be out later this year). Also, I once saw an interesting poster on a Set Covering Machine algorithm, which the authors benchmarked against SVMs, random forests, and the like on categorical (genomics) data. It looked promising.
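
To make "large, sparse binary vectors" a bit more concrete, here is a minimal sketch of one-hot encoding done by hand on nominal features. The toy rows, the column names, and the one_hot helper are all made up for illustration; pd.get_dummies() from the original question produces the same kind of 0/1 matrix, just more conveniently.

```python
# Toy nominal data; the feature names and values are invented for illustration.
rows = [
    {"country": "US", "occupation": "engineer"},
    {"country": "DE", "occupation": "teacher"},
    {"country": "JP", "occupation": "engineer"},
    {"country": "US", "occupation": "nurse"},
]


def one_hot(rows):
    """Return (names, matrix) with one 0/1 column per (feature, value) pair."""
    # Every distinct (feature, value) pair becomes its own column.
    columns = sorted({(key, value) for row in rows for key, value in row.items()})
    index = {column: i for i, column in enumerate(columns)}

    matrix = []
    for row in rows:
        vector = [0] * len(columns)
        for key, value in row.items():
            vector[index[(key, value)]] = 1
        matrix.append(vector)

    names = [f"{key}_{value}" for key, value in columns]
    return names, matrix


names, matrix = one_hot(rows)

# Fraction of non-zero entries; it shrinks as the number of categories grows.
density = sum(map(sum, matrix)) / (len(matrix) * len(names))

print(names)
for vector in matrix:
    print(vector)
print("density:", density)
```

Each row has exactly one 1 per original feature, so with many distinct countries or occupations almost every entry is 0. That is the sparse binary setting I mean above.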

Best,
Sebastian

> On Jul 21, 2017, at 2:37 PM, Raga Markely <raga.mark...@gmail.com> wrote:
>
> Thank you, Jacob. Appreciate it.
>
> Regarding 'perform better', I was referring to better accuracy, precision,
> recall, F1 score, etc.
>
> Thanks,
> Raga
>
> On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <jmschreibe...@gmail.com>
> wrote:
> Traditionally tree based methods are very good when it comes to categorical
> variables and can handle them appropriately. There is a current WIP PR to add
> this support to sklearn. I'm not exactly sure what you mean that "perform
> better" though. Estimators that ignore the categorical aspect of these
> variables and treat them as discrete will likely perform worse than those
> that treat them appropriately.
>
> On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.mark...@gmail.com> wrote:
> Hello,
>
> I am wondering if there are some classifiers that perform better for datasets
> with categorical features (converted into sparse input matrix with
> pd.get_dummies())? The data for the categorical features are nominal (order
> doesn't matter, e.g. country, occupation, etc).
>
> If you could provide me some references (papers, books, website, etc), that
> would be great.
>
> Thank you very much!
> Raga

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn