Hi Mahouters,

I just posted part 2 of a series on extracting text features for machine learning…
http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/

The top five terms (by LLR score) in emails written by Ted are now u_k, v_k, sgd, regress, and categori.

That's way better than the very first results (see previous blog post), which were v3, 3, v2, q, and 0.00000.

Regards,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
