2013/6/20 Lars Buitinck <l.j.buiti...@uva.nl>:
> 2013/6/20 Gilles Louppe <g.lou...@gmail.com>:
>> This looks like the dataset from the Amazon challenge currently
>> running on Kaggle. When one-hot-encoded, you end up with roughly
>> 15000 binary features, which means that the dense representation
>> requires at least 32000*15000*4 bytes to hold in memory (or even
>> twice as much depending on your architecture). I'll let you do the
>> math.
>
> Actually twice as much, even on a 32-bit platform (float size is
> always 64 bits).
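Filling in that math, for reference (a rough estimate, assuming the
32000 x 15000 dense shape quoted above):

    >>> 32000 * 15000 * 4 / 1e9   # float32, in GB
    1.92
    >>> 32000 * 15000 * 8 / 1e9   # float64 (numpy's default dtype), in GB
    3.84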
The decision tree code always uses 32-bit floats:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L38

but you have to cast your data to `dtype=np.float32` in Fortran layout
ahead of time to avoid the memory copy.
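A minimal sketch of that up-front conversion (assuming `X` is your
dense feature array and `y` the target; the choice of estimator here
is just for illustration):

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier

    # Cast once, up front: float32 in Fortran (column-major) layout,
    # matching the dtype the tree code uses internally, so fit() does
    # not allocate a converted copy of X on top of the original.
    X = np.asfortranarray(X, dtype=np.float32)

    clf = ExtraTreesClassifier(n_estimators=100)
    clf.fit(X, y)

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel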