Traditionally tree based methods are very good when it comes to categorical
variables and can handle them appropriately. There is a current WIP PR to
add this support to sklearn. I'm not exactly sure what you mean that
"perform better" though. Estimators that ignore the categorical aspect of
these variables and treat them as discrete will likely perform worse than
those that treat them appropriately.

On Fri, Jul 21, 2017 at 8:11 AM, Raga Markely <raga.mark...@gmail.com>
wrote:

> Hello,
>
> I am wondering if there are some classifiers that perform better for
> datasets with categorical features (converted into sparse input matrix with
> pd.get_dummies())? The data for the categorical features are nominal (order
> doesn't matter, e.g. country, occupation, etc).
>
> If you could provide me some references (papers, books, website, etc),
> that would be great.
>
> Thank you very much!
> Raga
>
>
>
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn

Reply via email to