On 8 November 2015 at 17:50, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> On Nov 8, 2015, at 11:32 AM, Raphael C <drr...@gmail.com> wrote:
>>
>> In terms of computational efficiency, one-hot encoding combined with
>> the support for sparse feature vectors seems to work well, at least
>> for me. I assume therefore the problem must be in terms of
>> classification accuracy.
>
> One thing comes to mind regarding the different solvers for the linear
> models. E.g., Newton’s method is O(n * d^2), and even gradient descent
> is O(n * d).
>
> For decision trees, I don’t see a substantial difference in terms of
> computational complexity if a categorical feature, let’s say one that can
> take 4 values, is split into 4 binary questions (i.e., using one-hot
> encoding). On the other hand, I think the problem is that the decision
> algorithm does not know that these 4 binary questions “belong” to one
> feature, which could make the decision tree grow much larger in depth
> and width; this is bad for computational efficiency and would more
> likely produce trees with higher variance.
>
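For reference, a minimal sketch of the sparse one-hot path described above (the data and variable names are made up for illustration, and this assumes a reasonably recent scikit-learn): OneHotEncoder returns a scipy sparse matrix by default, and the linear-model solvers accept it directly.

import numpy as np
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
cat = rng.randint(0, 4, size=(1000, 1))        # one categorical feature with 4 levels (toy data)
y = (cat.ravel() == 0).astype(int)             # toy target, purely for illustration

# The 4-level feature becomes 4 sparse indicator columns rather than a dense block.
X = OneHotEncoder().fit_transform(cat)
print(sparse.issparse(X), X.shape)             # sparse matrix of shape (1000, 4)

# The sparse matrix can be passed straight to a linear model.
clf = LogisticRegression().fit(X, y)
print(clf.score(X, y))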
I am unclear what difference it makes for decision trees myself. I am no
expert on the construction algorithms, but I assume they would never split
on a feature that depends 100% on a parent node, since one of the branches
would simply be empty. If that is right, the decision tree should not grow
much larger, although I suppose it might take the construction algorithm
more time to work this out.

It would be great if anyone had a concrete example where one-hot encoding
made a difference for a decision tree (or any classifier built on decision
trees).

Raphael
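A rough sketch of how such a comparison could be set up in scikit-learn (illustrative only: the data and names are invented, and no particular outcome is implied). It fits one tree on the raw integer-coded categorical feature and one on its one-hot expansion, then prints the resulting tree sizes.

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
cat = rng.randint(0, 4, size=(2000, 1))                  # one 4-level categorical feature (toy data)
p = np.take([0.1, 0.4, 0.6, 0.9], cat.ravel())           # class probability depends on the level
y = (rng.rand(2000) < p).astype(int)

X_int = cat                                              # raw integer coding
X_ohe = OneHotEncoder().fit_transform(cat)               # 4 sparse indicator columns

for name, X in [("integer-coded", X_int), ("one-hot", X_ohe)]:
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(name, "depth:", tree.tree_.max_depth,
          "nodes:", tree.tree_.node_count)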