Just to throw some additional ideas in here: based on a conversation with a 
colleague some time ago, I think learning classifier systems 
(https://en.wikipedia.org/wiki/Learning_classifier_system) are particularly 
useful when working with large, sparse binary vectors (such as those produced 
by one-hot encoding). I am not really into LCSs and only know the basics (I 
read through the first chapters of the Intro to Learning Classifier Systems 
draft; the print version will be out later this year). 
Also, I once saw an interesting poster on a Set Covering Machine algorithm, 
which the authors benchmarked against SVMs, random forests, and the like on 
categorical (genomics) data. It looked promising.
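
For what it's worth, here is a minimal sketch of how one could compare a 
couple of estimators on one-hot encoded nominal features using the metrics 
mentioned in this thread. The data, column names, and labels are synthetic 
and made up for illustration, and the two models are just examples, not a 
recommendation. I haven't tuned anything, so treat it as a starting point 
only.

```python
import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic nominal data (hypothetical columns and labels, illustration only).
rng = np.random.RandomState(0)
n = 400
df = pd.DataFrame({
    "country": rng.choice(["DE", "US", "JP", "BR"], size=n),
    "occupation": rng.choice(["nurse", "engineer", "teacher"], size=n),
})

# The label depends on an interaction between the two features,
# with roughly 10% of the labels flipped as noise.
y = ((df["country"] == "US") ^ (df["occupation"] == "engineer")).astype(int).to_numpy()
flip = rng.rand(n) < 0.1
y = np.where(flip, 1 - y, y)

# OneHotEncoder returns a SciPy sparse matrix by default.
X_onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(df)
print(sparse.issparse(X_onehot), X_onehot.shape)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}
metrics = ["accuracy", "precision", "recall", "f1"]

results = {}
for name, model in models.items():
    # Encoding inside the pipeline means the encoder is fit on the
    # training part of each cross-validation fold only.
    pipe = make_pipeline(OneHotEncoder(handle_unknown="ignore"), model)
    cv = cross_validate(pipe, df, y, cv=5, scoring=metrics)
    results[name] = {m: cv["test_" + m].mean() for m in metrics}
    print(name, {m: round(v, 3) for m, v in results[name].items()})
```

If the data is already encoded with pd.get_dummies(), passing sparse=True 
there should give sparse-backed columns as well; I used OneHotEncoder here 
mainly because it fits into a Pipeline.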

Best,
Sebastian


> On Jul 21, 2017, at 2:37 PM, Raga Markely <raga.mark...@gmail.com> wrote:
> 
> Thank you, Jacob. Appreciate it.
> 
> Regarding 'perform better', I was referring to better accuracy, precision, 
> recall, F1 score, etc.
> 
> Thanks,
> Raga
> 
> On Fri, Jul 21, 2017 at 2:27 PM, Jacob Schreiber <jmschreibe...@gmail.com> 
> wrote:
> Traditionally, tree-based methods are very good when it comes to categorical 
> variables and can handle them appropriately. There is a current WIP PR to add 
> this support to sklearn. I'm not exactly sure what you mean by "perform 
> better", though. Estimators that ignore the categorical aspect of these 
> variables and treat them as discrete will likely perform worse than those 
> that treat them appropriately.
> 
> On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.mark...@gmail.com> wrote:
> Hello,
> 
> I am wondering if there are some classifiers that perform better for datasets 
> with categorical features (converted into a sparse input matrix with 
> pd.get_dummies())? The data for the categorical features are nominal (order 
> doesn't matter, e.g. country, occupation, etc.).
> 
> If you could provide me with some references (papers, books, websites, etc.), 
> that would be great.
> 
> Thank you very much!
> Raga
> 
> 
> 
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
> 
> 
> 

