2011/12/30 Bronco Zaurus <[email protected]>:
> One more way would be computing classification probability for each value
> and plugging the resulting number back into data. For example, let's say
> there are 10 samples with BMW in the training set, and 3 of them are 1
> (true), 7 are 0 (false). So the maximum-likelihood estimate of a BMW sample
> being true is 0.3, and we'd put 0.3 instead of BMW in those 10 samples.
>
> What do you think, is it sound mathematically?
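
[For concreteness, the scheme proposed above — replacing each category with the
empirical P(y=1 | category) from the training set — can be sketched as follows.
The function name and data are made up for illustration; this only restates the
proposal, it is not an endorsement of it:]

```python
from collections import defaultdict

def target_encode(categories, labels):
    # category -> [number of positives, total count]
    counts = defaultdict(lambda: [0, 0])
    for c, y in zip(categories, labels):
        counts[c][0] += y
        counts[c][1] += 1
    # replace each category with its empirical positive rate
    return [counts[c][0] / counts[c][1] for c in categories]

# 10 BMW samples, 3 labeled 1 and 7 labeled 0, as in the example above
cats = ["BMW"] * 10
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(target_encode(cats, labels))  # every BMW becomes 0.3
```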

No. One-hot representation is the way to go with categorical features.
I did some work on a transformer to handle this in the past, but gave
up the project due to lack of time.
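
[A minimal pure-Python sketch of the one-hot representation recommended here:
each distinct category gets its own 0/1 indicator column. In scikit-learn this
is the kind of output DictVectorizer produces; the helper name and sample data
below are invented for illustration:]

```python
def one_hot(values):
    # map each distinct category to its own indicator column,
    # in sorted order so the column layout is deterministic
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows

cats, rows = one_hot(["BMW", "Audi", "BMW"])
print(cats)  # ['Audi', 'BMW']
print(rows)  # [[0, 1], [1, 0], [0, 1]]
```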

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general