{"word": vocabulary[word], ...}
The training data looks like [[0.0, 1.0, 'xxx', 'yyy', '13.0', ...], ]
so when I use DictVectorizer and run fit_transform, it will create an
array something like
array([[ 1., 0.],
[ 0., 1.]])
with a different shape and data. I am not sure how I will repla
2013/7/31 Oğuz Yarımtepe :
> How will I use DictVectorizer for the string values above?
It won't do categorical integer coding directly. You can keep a
separate dict of the string values, say vocabulary, then feed
DictVectorizer dicts of the form
{"word": vocabulary[word], ...}
--
Lars Buitinck
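
A minimal sketch of the suggestion above, assuming hypothetical feature names ("word", "count") and toy rows that are not from the original thread. It keeps a separate vocabulary dict that maps each string to an integer code, then feeds DictVectorizer dicts whose values are already numeric, so no one-hot columns are created:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical rows: each is a dict of feature name -> value.
rows = [
    {"word": "xxx", "count": 13.0},
    {"word": "yyy", "count": 2.0},
    {"word": "xxx", "count": 5.0},
]

# Separate dict mapping each distinct string to an integer code,
# assigned in order of first appearance.
vocabulary = {}
for row in rows:
    vocabulary.setdefault(row["word"], len(vocabulary))

# Replace the string by its integer code before vectorizing.
coded_rows = [
    {"word": vocabulary[row["word"]], "count": row["count"]}
    for row in rows
]

# sparse=False returns a dense NumPy array directly.
vectorizer = DictVectorizer(sparse=False)
X = vectorizer.fit_transform(coded_rows)

print(vectorizer.feature_names_)  # column order chosen by DictVectorizer
print(X)
```

Note that the integer codes carry an arbitrary ordering, so the trees will split on them as if they were ordinary numbers. The same vocabulary dict has to be reused when coding new data at prediction time.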
On Mon, Jul 29, 2013 at 12:19 AM, Ross Boucher wrote:
> Interesting, I've been using DictVectorizer (and one hot coded categorical
> data) with Random Forests and getting decent results. Is this just
> coincidental, and will I see better results if I combine the categorical
> data into a single c
Hi,
> What you get from DictVectorizer is a sparse matrix containing one-hot
> coded categorical values (booleans). Random forests don't support
> those, but fortunately they (should) handle categorical values without
> one-hot coding, so you do something like
>
>
I tried with string values and
2013/7/28 Ross Boucher :
> Interesting, I've been using DictVectorizer (and one hot coded categorical
> data) with Random Forests and getting decent results. Is this just
> coincidental, and will I see better results if I combine the categorical
> data into a single column?
The thing is that dense
If the cardinality of the categorical variable is not too big, the
output of the DictVectorizer should be ok if you first convert it to a
dense numpy array (by calling `.toarray()` on the CSR instance).
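
A minimal sketch of that approach, assuming a toy dataset with hypothetical feature names ("color", "size") and labels that are not from the thread. String values are one-hot coded by DictVectorizer, the sparse result is converted with `.toarray()`, and the dense array is passed to a random forest:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical training rows mixing a string feature and a numeric one.
rows = [
    {"color": "red", "size": 1.0},
    {"color": "blue", "size": 2.0},
    {"color": "red", "size": 3.0},
    {"color": "blue", "size": 4.0},
]
labels = [0, 1, 0, 1]

# String values become one column per (feature, value) pair;
# the default output is a scipy sparse matrix.
vectorizer = DictVectorizer()
X_sparse = vectorizer.fit_transform(rows)

# Convert to a dense array before fitting the forest.
X_dense = X_sparse.toarray()

forest = RandomForestClassifier(n_estimators=10, random_state=0)
forest.fit(X_dense, labels)

# New data must go through the same fitted vectorizer (transform, not fit).
new_rows = [{"color": "red", "size": 2.5}]
prediction = forest.predict(vectorizer.transform(new_rows).toarray())
print(vectorizer.feature_names_)
print(prediction)
```

The dense conversion is only reasonable when the number of distinct categorical values is small, as the message says; with high cardinality the dense array can become very large.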
Interesting, I've been using DictVectorizer (and one hot coded categorical
data) with Random Forests and getting decent results. Is this just
coincidental, and will I see better results if I combine the categorical
data into a single column?
On Sun, Jul 28, 2013 at 9:06 AM, Lars Buitinck wrote:
2013/7/28 Oğuz Yarımtepe :
> I had read the scikit preprocessing issues and it seems I should have used
> DictVectorizer to encode my categorical string values after putting them in
> a dict format. But I am not sure how I will use the resulting output in the
> random forest code.
What you get from
Hi,
I am trying to use a random forest on a dataset that also includes string
values. The dataset I used for training is a CSV file that includes some
categorical string values.
I had read the scikit preprocessing issues and it seems I should have used
DictVectorizer to encode my categor