Hi,

> What you get from DictVectorizer is a sparse matrix containing one-hot
> coded categorical values (booleans). Random forests don't support
> those, but fortunately they (should) handle categorical values without
> one-hot coding, so you do something like
>
>
I tried with string values and had no success. I have a CSV file and
preprocess it like this:

    for line in f:
        sample = []
        line = line.strip().split(",")
        for x in line:
            try:
                sample.append(float(x))
            except ValueError:
                sample.append(str(x))

        samples.append(sample)
        #sample = [float(x) for x in line]
        #samples.append(sample)
    return samples


This is from csv_io.read_data in the code above. When the fit function is
called, this is the error I get:

  File "/usr/lib/pymodules/python2.7/sklearn/ensemble/forest.py", line 294, in fit
    X = array2d(X, dtype=DTYPE, order="F")
  File "/usr/lib/pymodules/python2.7/sklearn/utils/validation.py", line 80, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: SF

Either using string values is not a good idea or I am doing something
wrong. Any ideas?
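
For reference, the traceback shows fit converting the whole input to a
float array, so a raw string such as "SF" cannot pass through. One possible
workaround, sketched here under assumptions, is to replace each string with
an integer code per column before calling fit. The helper name
encode_categoricals and the toy rows are hypothetical and not part of
scikit-learn or of csv_io. Note that the trees would then treat the codes
as ordered numbers, which may or may not suit the data.

```python
def encode_categoricals(samples):
    """Replace string entries with per-column integer codes (as floats).

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    codebooks = {}  # column index -> {string value: integer code}
    encoded = []
    for sample in samples:
        row = []
        for col, value in enumerate(sample):
            if isinstance(value, str):
                codes = codebooks.setdefault(col, {})
                # Assign the next unused code the first time a value is seen.
                value = float(codes.setdefault(value, len(codes)))
            row.append(value)
        encoded.append(row)
    return encoded, codebooks


# Toy rows in the shape produced by the CSV loop above (made-up values).
samples = [[1.0, "SF", 3.5], [2.0, "NY", 4.0], [3.0, "SF", 1.5]]
encoded, codebooks = encode_categoricals(samples)
```

The encoded rows contain only floats, so they should survive the float
conversion that fails in the traceback. The codebooks dict would need to
be reused to encode any test data the same way.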
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
