Interesting. I've been using DictVectorizer (and so one-hot coded categorical
data) with Random Forests and getting decent results. Is this just
coincidental, and will I see better results if I instead encode each
categorical feature as a single integer column?
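
To make the comparison concrete, here is a rough sketch of the two encodings as I understand them from the message quoted below. The toy data and the helper name integer_code are made up for illustration, and the one-hot matrix is built with plain NumPy as a dense stand-in for what DictVectorizer would return as a sparse matrix.

```python
import numpy as np

# Hypothetical toy data: two categorical string columns, one row per sample.
rows = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
    ["green", "small"],
])


def integer_code(column):
    """Return (categories, codes) such that categories[codes] rebuilds column."""
    # np.unique sorts the distinct values; return_inverse gives, for every
    # element, the index of its value in that sorted array.
    categories, codes = np.unique(column, return_inverse=True)
    return categories, codes


# Encode each column independently.
encoded = [integer_code(rows[:, j]) for j in range(rows.shape[1])]

# Integer-coded layout: one column per original categorical feature.
X_int = np.column_stack([codes for _, codes in encoded])

# One-hot layout for comparison: one 0/1 column per (feature, category) pair.
X_onehot = np.hstack(
    [np.eye(len(cats), dtype=int)[codes] for cats, codes in encoded]
)

print(X_int)
print(X_onehot.shape)
```

Either matrix could then be passed to a random forest's fit method. Note that the integer codes follow the sorted order of the category names, which is arbitrary, and the trees will split on them as if they were ordinary numbers, so which layout works better is probably something to check on the actual data.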
On Sun, Jul 28, 2013 at 9:06 AM, Lars Buitinck <[email protected]> wrote:
> 2013/7/28 Oğuz Yarımtepe <[email protected]>:
> > I had read the scikit preprocessing issues and it seems I should have
> > used DictVectorizer to encode my categorical string values after I put
> > them in a dict format. But I am not sure how I will use the resulting
> > output in the random forest code.
>
> What you get from DictVectorizer is a sparse matrix containing one-hot
> coded categorical values (booleans). Random forests don't support
> those, but fortunately they (should) handle categorical values without
> one-hot coding, so you do something like
>
> np.unique(column, return_inverse=True)
>
> on each column containing categorical values. The second return value
> of this contains an integer index representation of the column.
>
> --
> Lars Buitinck
> Scientific programmer, ILPS
> University of Amsterdam
>
>
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>