Hello everybody,

I'm trying to build a Bayesian classifier using plain text as input
data. I've found one answer here:
http://search-lucene.com/m/vWCqa1npUU01&subj=Re+How+to+train+naive+bayes+classifier+using+text+files

But running the asf-examples is not an option for me as it involves
downloading the 200GB archive.

I've read the Mahout in Action book, which uses the newsgroups
example, and the example presented here:
http://www.plugtree.com/ham-spam-and-elephants-or-how-to-build-a-spam-filter-server-with-mahout/
also uses the newsgroups processing class.

Is there any tutorial or documentation that explains how to handle
input data that looks like this:

train/
      spam/
      ham/

test/
     spam/
     ham/

where spam and ham contain plain text files?

Did I miss some documentation? Can someone explain to me how to process such input?
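
To make the question concrete, here is a rough sketch (in Python, outside Mahout) of the kind of preprocessing I imagine for the layout above: walk train/ or test/, use each subdirectory name as the label, and write one document per line as label, tab, text. Whether the Mahout Bayes trainer accepts that one-line-per-document layout is an assumption on my part, and the function names are made up for illustration.

```python
import os
import tempfile


def collect_labeled_docs(root):
    """Return (label, text) pairs; each subdirectory name of root is a label."""
    docs = []
    for label in sorted(os.listdir(root)):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files sitting next to spam/ and ham/
        for name in sorted(os.listdir(label_dir)):
            path = os.path.join(label_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Collapse all whitespace so each document fits on one line.
                text = " ".join(handle.read().split())
            docs.append((label, text))
    return docs


def write_labeled_lines(docs, out_path):
    """Write one 'label<TAB>text' line per document (assumed trainer format)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for label, text in docs:
            out.write(f"{label}\t{text}\n")
```

Calling collect_labeled_docs("train") and passing the result to write_labeled_lines would give a single file for the training set; the same two calls on test/ would give the held-out set.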

cheers,
Boris
-- 
42
