Hello everybody,

I'm trying to build a Bayesian classifier using plain text as input data. I found one answer here:
http://search-lucene.com/m/vWCqa1npUU01&subj=Re+How+to+train+naive+bayes+classifier+using+text+files

However, running the asf-examples is not an option for me, as it involves downloading the 200GB archive. I've read the Mahout in Action book, which uses the newsgroups example, and the example presented at http://www.plugtree.com/ham-spam-and-elephants-or-how-to-build-a-spam-filter-server-with-mahout/ also uses the newsgroups processing class.

Is there any tutorial or documentation that explains how to handle input data that looks like this:

train/
    spam/
    ham/
test/
    spam/
    ham/

where spam and ham contain plain text files? Did I miss some documentation? Can someone explain to me how to process such input?

cheers,
Boris
--
42
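
P.S. To make the layout concrete, here is a minimal sketch in plain Python (not Mahout code) of how such a directory could be read into (label, text) pairs. The function name load_labeled_dir is hypothetical, and it assumes that each subdirectory name (spam, ham) is the class label and that the files are UTF-8 text. It only illustrates the input structure; it does not show how Mahout expects the data.

```python
from pathlib import Path


def load_labeled_dir(split_dir):
    """Read split_dir/<label>/<file> into a list of (label, text) pairs.

    Assumption: every subdirectory of split_dir is one class, and its
    name is used as the label. Entries that are not directories or not
    regular files are skipped.
    """
    pairs = []
    for label_dir in sorted(Path(split_dir).iterdir()):
        if not label_dir.is_dir():
            continue
        for path in sorted(label_dir.iterdir()):
            if path.is_file():
                # Undecodable bytes are replaced rather than raising an error.
                text = path.read_text(encoding="utf-8", errors="replace")
                pairs.append((label_dir.name, text))
    return pairs
```

Calling load_labeled_dir("train") and load_labeled_dir("test") would give one list per split, which is the shape of data I would like to feed into the classifier.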
