Thank you for all the answers. Yes, I'm not dealing with arbitrary strings,
just a set of possible values, so the binary representation seems OK.

Another option would be to compute, for each value, the fraction of training
samples with that value that are labelled 1, and plug that number back into
the data in place of the value. For example, say there are 10 samples with
BMW in the training set, and 3 of them are 1 (true) and 7 are 0 (false). The
maximum likelihood estimate of the probability that a BMW sample is true is
then 3/10 = 0.3, so we'd put 0.3 instead of BMW in these 10 samples.
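
To make the idea concrete, here is a rough sketch in plain Python. The
function name target_rate_encoding and the Audi rows are made up for
illustration; only the BMW counts come from the example above.

```python
from collections import defaultdict


def target_rate_encoding(categories, labels):
    """Map each category to the fraction of its samples labelled 1."""
    counts = defaultdict(int)
    positives = defaultdict(int)
    for cat, y in zip(categories, labels):
        counts[cat] += 1
        positives[cat] += y
    return {cat: positives[cat] / counts[cat] for cat in counts}


# 10 BMW samples, 3 of them true (the example above).
# The 4 Audi samples are hypothetical, added only to have a second value.
makes = ["BMW"] * 10 + ["Audi"] * 4
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 0, 0]

encoding = target_rate_encoding(makes, labels)

# Replace each categorical value with its per-category rate.
encoded_makes = [encoding[m] for m in makes]
```

Since the rates are derived from the labels, I assume they would have to be
computed on the training set only and then reused as-is for test samples.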

What do you think, is it mathematically sound?
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
