2013/7/28 Oğuz Yarımtepe <[email protected]>:
> I had read the scikit preprocessing issues and it seems I should have used
> DictVectorizer to encode my categorical string values after I put them in a
> dict format. But I am not sure how I will use the resulting output in the
> random forest code.

What you get from DictVectorizer is a sparse matrix containing one-hot
coded categorical values (booleans). Random forests don't support
sparse matrices, but fortunately they (should) handle categorical
values without one-hot coding, so you can do something like

    np.unique(column, return_inverse=True)

on each column containing categorical values. The second return value
of this contains an integer index representation of the column.
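
As a rough sketch of the per-column encoding described above (the toy
data and the helper name encode_columns are made up for illustration,
they are not part of scikit-learn):

```python
import numpy as np

# Hypothetical toy data: one row per sample, one column per categorical feature.
data = np.array([
    ["red", "small"],
    ["blue", "large"],
    ["red", "large"],
    ["green", "small"],
])


def encode_columns(table):
    """Replace every column of strings by integer codes using np.unique."""
    encoded = np.empty(table.shape, dtype=np.int64)
    categories = []
    for j in range(table.shape[1]):
        # uniques holds the sorted distinct values of the column;
        # codes[i] is the position of table[i, j] within uniques.
        uniques, codes = np.unique(table[:, j], return_inverse=True)
        encoded[:, j] = codes
        categories.append(uniques)
    return encoded, categories


X, categories = encode_columns(data)
print(X)
print(categories)
```

X is then a dense integer array that can be handed to the forest's
fit method together with your labels, and categories[j][X[:, j]]
recovers the original strings of column j. Note that the codes follow
the sorted order of the values, so the trees will treat that order as
if it were meaningful.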

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general