For large enough models (e.g. random forests or gradient boosted tree
ensembles) I would definitely recommend arbitrary integer coding for
the categorical variables.

Try both, use cross-validation and see for yourself.
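
Something along these lines could serve as a starting point. It is only a
rough sketch: I am assuming the alternative to integer coding is one-hot
encoding, the toy data (the "colors" and "cities" columns and the noisy
target) is made up for illustration, and OrdinalEncoder's handle_unknown
option needs a reasonably recent scikit-learn (0.24 or later).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

rng = np.random.RandomState(0)
n_samples = 500

# Hypothetical toy data: two categorical columns stored as strings.
colors = rng.choice(["red", "green", "blue", "black"], size=n_samples)
cities = rng.choice(["paris", "london", "tokyo"], size=n_samples)
X = np.column_stack([colors, cities])

# Made-up target that depends on both columns, with 10% label noise.
y = ((colors == "red") | (cities == "tokyo")).astype(int)
flip = rng.rand(n_samples) < 0.1
y = np.where(flip, 1 - y, y)

# The two codings to compare. Categories unseen during fit are mapped
# to -1 (integer coding) or to an all-zero row (one-hot coding).
encoders = {
    "integer": OrdinalEncoder(
        handle_unknown="use_encoded_value", unknown_value=-1
    ),
    "one-hot": OneHotEncoder(handle_unknown="ignore"),
}

scores = {}
for name, encoder in encoders.items():
    # Putting the encoder in the pipeline means it is refit on each
    # training fold, so nothing leaks from the held-out fold.
    model = make_pipeline(
        encoder, RandomForestClassifier(n_estimators=100, random_state=0)
    )
    scores[name] = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores[name].mean():.3f} +/- {scores[name].std():.3f}")
```

On a toy problem like this the two codings will probably score about the
same; the point is only the comparison harness. Swap in your own X, y and
estimator to see which coding works better for your data.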

-- 
Olivier
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
