Hello Vikas,

I am having the same problem with the 20-newsgroups example. I will reduce the directory structure to see if that works. Let me know if you come up with a better solution.

Thanks,
-Ahmed

On Mon, Apr 9, 2012 at 11:13 AM, Vikas <[email protected]> wrote:
> Hi All,
>
> I am new to Mahout hence the question might sound a bit stupid.
> Even so, please do humor me.
> I am trying to build a Bayes classifier engine on the lines of a spam
> filter.
> But it has more than 2 classes, and will contain a network later.
> I plan to use Mahout for the same.
>
> I tried to follow the implementation example of 20-newsgroups.
> But the format differs from the explanation in "Mahout in Action", Figure
> 13.2.
> The New Examples have Predictor Variables only, and no Target Variables.
>
> But the example contains target variables in both the training and test
> set.
> An example is, alt.atheism.txt from both training and test data.
> It contains rows of data in the format
> "alt.atheism<tab>some_text_to_classify".
> The result is a matrix showing the Naive Bayes probability of
> classification.
>
> Am I missing something?
> Or does the Naive Bayes implementation using Mahout require this data
> format?
>
> I am still looking for some other implementations, but to no avail.
>
> Any help is greatly appreciated.
>
> Thank you,
> Vikas
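
For reference, here is a minimal sketch of reading rows in the quoted "label<tab>text" format. It is plain Python, not Mahout code. The function names and the sample rows are made up for illustration; only the "alt.atheism<tab>some_text_to_classify" layout comes from the message above. It shows how the label (target variable) and the text (predictor) sit on the same row, which is presumably why the test files also carry labels: they would be needed to score predictions against the true class.

```python
from collections import Counter


def parse_labeled_line(line):
    """Split one 'label<TAB>text' row into a (label, text) pair."""
    label, sep, text = line.rstrip("\n").partition("\t")
    if not sep:
        # No tab found, so the row does not follow the assumed layout.
        raise ValueError("expected 'label<TAB>text', got: %r" % line)
    return label, text


def count_labels(lines):
    """Count how many rows each label has, skipping blank lines."""
    return Counter(parse_labeled_line(l)[0] for l in lines if l.strip())


# Hypothetical rows in the same layout as alt.atheism.txt.
sample = [
    "alt.atheism\tsome_text_to_classify\n",
    "alt.atheism\tanother document\n",
    "sci.space\ta third document\n",
]

if __name__ == "__main__":
    for row in sample:
        print(parse_labeled_line(row))
    print(count_labels(sample))
```

To classify unlabeled text, only the part after the tab would be needed. The label before the tab matters for training and for checking results.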
