On 8 November 2015 at 17:50, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>
>> On Nov 8, 2015, at 11:32 AM, Raphael C <drr...@gmail.com> wrote:
>>
>> In terms of computational efficiency, one-hot encoding combined with
>> the support for sparse feature vectors seems to work well, at least
>> for me. I therefore assume the problem must be with classification
>> accuracy.
>
> One thing comes to mind regarding the different solvers for the linear
> models. E.g., Newton’s method is O(n * d^2), and even gradient descent
> is O(n * d).
>
> For decision trees, I don’t see a substantial difference in terms of
> computational complexity if a categorical feature, let’s say one that
> can take 4 values, is split into 4 binary questions (i.e., using
> one-hot encoding). On the other hand, I think the problem is that the
> decision algorithm does not know that these 4 binary questions “belong”
> to one feature, which could make the decision tree grow much larger in
> depth and width; this is bad for computational efficiency and would
> more likely produce trees with higher variance.
>
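
(Just to make the setup from my earlier message concrete: roughly the
following sketch, on made-up data, is what I meant by one-hot encoding
combined with sparse feature vectors. The sparse matrix that
OneHotEncoder produces is passed straight to a linear model, here
LogisticRegression.)

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Made-up data: two integer-coded categorical features, random labels.
X_int = rng.randint(0, 10, size=(5000, 2))
y = rng.randint(0, 2, size=5000)

# OneHotEncoder returns a scipy.sparse matrix by default; the linear
# models accept it directly, so the expanded representation stays cheap.
X_sparse = OneHotEncoder().fit_transform(X_int)

clf = LogisticRegression().fit(X_sparse, y)
print(X_sparse.shape, clf.score(X_sparse, y))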

I am unclear myself what difference it makes for decision trees. I am
no expert on the construction algorithms, but I assume they would
never split on a feature that depends 100% on a parent node, as one
branch would just be empty. If that is right, it seems the decision
tree should not grow much larger. It might, of course, take the
construction algorithm more time to work this out.

It would be great if anyone had a concrete example where it made a
difference for a decision tree (or any classifier which uses decision
trees).
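
For concreteness, the kind of comparison I have in mind is roughly the
following (just a sketch on made-up data; I am not claiming it shows a
difference either way):

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)

# Toy data: one categorical feature with 4 levels, stored as integers
# 0..3, and a label that is positive for categories 1 and 3.
X_int = rng.randint(0, 4, size=(1000, 1))
y = (X_int[:, 0] % 2).astype(int)

# The same feature after one-hot encoding: 4 binary columns.
X_onehot = OneHotEncoder().fit_transform(X_int).toarray()

tree_int = DecisionTreeClassifier(random_state=0).fit(X_int, y)
tree_onehot = DecisionTreeClassifier(random_state=0).fit(X_onehot, y)

# Compare how large the two trees end up.
print("integer coded: depth=%d, nodes=%d"
      % (tree_int.tree_.max_depth, tree_int.tree_.node_count))
print("one-hot coded: depth=%d, nodes=%d"
      % (tree_onehot.tree_.max_depth, tree_onehot.tree_.node_count))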

Raphael
