Hi Mahouters,

I just posted part 2 of a series on extracting text features for machine 
learning…

http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/

The top five terms (by LLR score) in emails written by Ted are now u_k, v_k, 
sgd, regress, and categori, which is way better than the very first results 
(see the previous blog post): v3, 3, v2, q, and 0.00000.
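For anyone curious what ranking "by LLR score" involves, below is a minimal sketch of Dunning's log-likelihood ratio on a 2x2 contingency table. It follows the entropy-based formulation used by Mahout's LogLikelihood.logLikelihoodRatio(k11, k12, k21, k22). The way the table is filled in (a term's count in one author's emails versus everyone else's) and the top_terms helper are assumptions for illustration only. They are not necessarily what the blog post does.

```python
import math


def x_log_x(x):
    # x * ln(x), with the convention 0 * ln(0) = 0
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of raw counts (Dunning-style)
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: term in target docs      k12: other terms in target docs
    # k21: term in background docs  k22: other terms in background docs
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off
    return max(0.0, 2.0 * (row + col - mat))


def top_terms(target, background, n=5):
    # Hypothetical helper: target/background map term -> count.
    # LLR is two-sided, so keep only terms that are over-represented
    # in the target set before ranking by score.
    t_total = sum(target.values())
    b_total = sum(background.values())
    scores = {}
    for term, k11 in target.items():
        k21 = background.get(term, 0)
        if k11 / t_total <= k21 / b_total:
            continue
        scores[term] = llr(k11, t_total - k11, k21, b_total - k21)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

With toy counts such as top_terms({"sgd": 10, "the": 50}, {"sgd": 1, "the": 500}), only "sgd" survives, because "the" is relatively more common in the background set. A perfectly independent table like llr(1, 1, 1, 1) scores 0.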

Regards,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
