Hi All, I am new to Mahout, so this question might sound a bit stupid. Even so, please humor me. I am trying to build a Bayes classifier engine along the lines of a spam filter, except that it has more than two classes and will contain a network later. I plan to use Mahout for this.
I tried to follow the 20-newsgroups implementation example, but its data format differs from the explanation in "Mahout in Action", Figure 13.2. In that figure, the new examples have predictor variables only and no target variables. The 20-newsgroups example, however, contains target variables in both the training and the test set. For instance, alt.atheism.txt appears in both the training and the test data, and it contains rows in the format "alt.atheism<tab>some_text_to_classify". The result is a matrix showing the Naive Bayes probability of classification.

Am I missing something? Or does the Naive Bayes implementation in Mahout require this data format for test data as well? I have been looking for other implementations, but to no avail. Any help is greatly appreciated.

Thank you,
Vikas
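P.S. To make the format question concrete, here is a minimal sketch of how I read one of those rows. It is plain Python, not Mahout code, and the function names are hypothetical, chosen only for illustration. It assumes each row is exactly "label<tab>text", as in alt.atheism.txt.

```python
def parse_labeled_row(line):
    """Split one 'label<TAB>text' row into (target, predictor_text).

    Assumption: the first tab separates the target variable (the class
    label) from the predictor text; any later tabs belong to the text.
    """
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text


def strip_target(line):
    """Return only the predictor text, i.e. what a truly new,
    unlabeled example would look like (hypothetical helper)."""
    return parse_labeled_row(line)[1]


row = "alt.atheism\tsome_text_to_classify"
label, text = parse_labeled_row(row)
print("target variable:", label)
print("predictor text:", text)
print("unlabeled form:", strip_target(row))
```

In other words, the rows in the test set carry the target variable just like the training rows do, whereas I expected new examples to look like the output of strip_target above.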
