2011/12/30 Bronco Zaurus <[email protected]>:
> One more way would be computing the classification probability for each
> value and plugging the resulting number back into the data. For example,
> let's say there are 10 samples with BMW in the training set, and 3 of them
> are 1 (true), 7 are 0 (false). So the maximum-likelihood estimate of a BMW
> sample being true is 0.3, and we'd put 0.3 instead of BMW in these 10
> samples.
>
> What do you think, is it sound mathematically?
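For reference, the substitution proposed above can be sketched in a few lines of plain Python. The helper name `target_encode` and the toy data are made up for illustration; the BMW counts follow the example in the quoted message.

```python
from collections import defaultdict

def target_encode(categories, labels):
    """Replace each category value with the mean label observed for it
    in the training data (the scheme proposed in the quoted message)."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for cat, y in zip(categories, labels):
        totals[cat] += 1
        positives[cat] += y
    rates = {cat: positives[cat] / totals[cat] for cat in totals}
    return [rates[cat] for cat in categories]

# 10 BMW samples, 3 labelled 1 and 7 labelled 0, as in the example above.
cats = ["BMW"] * 10
ys = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(target_encode(cats, ys)[0])  # 0.3
```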
No. One-hot representation is the way to go with categorical features. I
did some work on a transformer to handle this in the past, but gave up the
project due to lack of time.

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
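The one-hot representation recommended above can be sketched without any library at all; this is a minimal stdlib-only version (function name and toy data are made up). In scikit-learn itself, `DictVectorizer` or `OneHotEncoder` perform this expansion.

```python
def one_hot(values):
    """Expand a list of category values into indicator (one-hot) rows.

    Returns the encoded rows and the sorted list of categories, so each
    column position has a stable meaning."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    return ([[1 if index[v] == j else 0 for j in range(len(categories))]
             for v in values],
            categories)

rows, cats = one_hot(["BMW", "Audi", "BMW"])
print(cats)  # ['Audi', 'BMW']
print(rows)  # [[0, 1], [1, 0], [0, 1]]
```

Unlike the mean-substitution scheme in the quoted message, this keeps each category as its own dimension and lets the classifier learn a separate weight per value.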
