The shape of X after encoding is (32769, 16600). It seems that is too big
to be converted into a dense matrix. Can random forests handle this many
features?
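
For scale, a rough back-of-the-envelope estimate of the dense footprint
(only the shape above is real; the rest is just arithmetic):

n_samples, n_features = 32769, 16600
# float64 takes 8 bytes per value, float32 takes 4
print(n_samples * n_features * 8 / 1e9)  # ~4.35 GB dense as float64
print(n_samples * n_features * 4 / 1e9)  # ~2.18 GB dense as float32
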
On Thu, Jun 20, 2013 at 7:31 PM, Olivier Grisel <olivier.gri...@ensta.org> wrote:
> 2013/6/20 Lars Buitinck <l.j.buiti...@uva.nl>:
> > 2013/6/20 Olivier Grisel <olivier.gri...@ensta.org>:
> >>> Actually twice as much, even on a 32-bit platform (float size is
> >>> always 64 bits).
> >>
> >> The decision tree code always uses 32-bit floats:
> >>
> >> https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tree/_tree.pyx#L38
> >>
> >> but you have to cast your data to `dtype=np.float32` in Fortran layout
> >> ahead of time to avoid the memory copy.
> >
> > OneHot produces np.float, though, which is float64.
>
> Alright, but you could convert it to np.float32 before calling toarray.
> In any case, I think this level of sparsity is unsuitable for random
> forests.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>
>
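
If the dense matrix does fit in memory, a minimal sketch following the
float32 / Fortran-layout suggestion above could look like this (X_sparse
and y are placeholder names for the encoder output and the targets):

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Cast to float32 before densifying, and ask for Fortran (column-major)
# layout so the tree code can use the array without making another copy.
X_dense = X_sparse.astype(np.float32).toarray(order='F')

clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_dense, y)
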
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general