On Mon, Jul 29, 2013 at 12:19 AM, Ross Boucher <[email protected]> wrote:

> Interesting, I've been using DictVectorizer (and one hot coded categorical
> data) with Random Forests and getting decent results. Is this just
> coincidental, and will I see better results if I combine the categorical
> data into a single column?
>
>
Can you give me an example of DictVectorizer and RandomForest usage?
What I do is read a CSV file line by line:

        train = csv_io.read_data(train_file)
        #set the training responses
        self.target = [x[0] for x in train]
        #set the training features
        self.train = [x[1:] for x in train]

and csv_io.read_data is

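    def read_data(file_name):
        """Read a CSV file into a list of rows; non-numeric fields stay strings."""
        samples = []
        with open(file_name) as f:
            for line in f:
                sample = []
                for x in line.strip().split(","):
                    try:
                        # numeric field
                        sample.append(float(x))
                    except ValueError:
                        # categorical field: keep the raw string
                        sample.append(x)
                samples.append(sample)
        # earlier all-numeric version: sample = [float(x) for x in line]
        return samples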

How would I use DictVectorizer for the string values above?
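
A minimal sketch of one way this could look, assuming each row from read_data is a list mixing floats and strings. The helper names rows_to_dicts and fit_forest, the "col0", "col1", ... feature names, and the toy rows are made up for illustration; DictVectorizer and RandomForestClassifier are the scikit-learn classes. DictVectorizer expects one dict per sample, so the rows are converted first:

```python
def rows_to_dicts(rows):
    """Turn each row (a list of floats and strings) into a dict keyed by column."""
    # "col0", "col1", ... are hypothetical feature names; header names would be better.
    return [{"col%d" % i: value for i, value in enumerate(row)} for row in rows]


def fit_forest(rows, target):
    """Vectorize mixed rows and fit a random forest (requires scikit-learn)."""
    # Imported here so rows_to_dicts can be used without scikit-learn installed.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.ensemble import RandomForestClassifier

    vec = DictVectorizer(sparse=False)
    # Numeric values become one column each; every distinct string value in a
    # column becomes its own 0/1 column, named like "col1=red".
    X = vec.fit_transform(rows_to_dicts(rows))
    clf = RandomForestClassifier(n_estimators=10)
    clf.fit(X, target)
    return vec, clf


# Toy rows in the shape read_data produces, with the response column removed.
rows = [[1.0, "red", 3.5], [2.0, "blue", 0.5], [0.0, "red", 1.5]]
dicts = rows_to_dicts(rows)
```

With the snippets above this would be called as something like fit_forest(self.train, self.target). At prediction time the same fitted vectorizer should be reused with vec.transform (not fit_transform) so that the test columns line up with the training columns.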
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
