2013/6/20 Lars Buitinck <l.j.buiti...@uva.nl>:
> 2013/6/20 Gilles Louppe <g.lou...@gmail.com>:
>> This looks like the dataset from the Amazon challenge currently
>> running on Kaggle. When one-hot-encoded, you end up with roughly
>> 15000 binary features, which means that the dense representation
>> requires at least 32000*15000*4 bytes to hold in memory (or even
>> twice as much depending on your architecture). I'll let you do the
>> math.
>
> Actually twice as much, even on a 32-bit platform (float size is
> always 64 bits).
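Filling in that math, for reference (a rough estimate, assuming the
32000 x 15000 dense shape quoted above):

    >>> 32000 * 15000 * 4 / 1e9   # float32, in GB
    1.92
    >>> 32000 * 15000 * 8 / 1e9   # float64 (numpy's default dtype), in GB
    3.84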
The decision tree code always uses 32-bit floats:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L38

but you have to cast your data to `dtype=np.float32` in Fortran layout
ahead of time to avoid the memory copy.
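A minimal sketch of that up-front conversion (assuming `X` is your
dense feature array and `y` the target; the choice of estimator here
is just for illustration):

    import numpy as np
    from sklearn.ensemble import ExtraTreesClassifier

    # Cast once, up front: float32 in Fortran (column-major) layout,
    # matching the dtype the tree code uses internally, so fit() does
    # not allocate a converted copy of X on top of the original.
    X = np.asfortranarray(X, dtype=np.float32)

    clf = ExtraTreesClassifier(n_estimators=100)
    clf.fit(X, y)

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel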