Hello Vikas,
I am actually having the same problem with the 20-newsgroups example. I
will reduce the directory structure to see if that works. Let me know if
you come up with a better solution.
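
In case it helps, here is a minimal sketch of how I read the
"label<tab>text" format described below. It is plain Python, not Mahout
code, and toy_classify is a hypothetical stand-in for a trained model. My
assumption is that the label in the test rows is only used to score the
predictions, which would explain why the output is a matrix:

```python
from collections import Counter


def parse_labeled_line(line):
    """Split one 'label<TAB>text' row into (label, text)."""
    label, text = line.rstrip("\n").split("\t", 1)
    return label, text


def confusion_counts(labeled_lines, classify):
    """Count (actual, predicted) label pairs over labeled test rows."""
    counts = Counter()
    for line in labeled_lines:
        actual, text = parse_labeled_line(line)
        # The classifier sees only the text; the label is kept for scoring.
        counts[(actual, classify(text))] += 1
    return counts


def toy_classify(text):
    """Hypothetical stand-in for a trained model; not a Mahout API."""
    return "alt.atheism" if "atheism" in text else "sci.space"


# Made-up rows in the same format as the example files.
test_rows = [
    "alt.atheism\tsome text about atheism",
    "sci.space\tsome text about orbits",
    "sci.space\ta post that mentions atheism",
]
result = confusion_counts(test_rows, toy_classify)
```

Each key in result is an (actual, predicted) pair, so the off-diagonal
entries are the misclassified rows.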

Thanks,
-Ahmed

On Mon, Apr 9, 2012 at 11:13 AM, Vikas <[email protected]> wrote:

> Hi All,
>
> I am new to Mahout, so the question might sound a bit stupid.
> Even so, please do humor me.
> I am trying to build a Bayes classifier engine along the lines of a spam
> filter.
> But it has more than 2 classes, and will contain a network later.
> I plan to use Mahout for this.
>
> I tried to follow the implementation example of 20-newsgroups.
> But the format differs from the explanation in "Mahout in Action", Figure
> 13.2.
> There, the new examples have predictor variables only, and no target variables.
>
> But the example contains target variables in both the training and test
> set.
> One example is alt.atheism.txt, which appears in both the training and test data.
> It contains rows of data in the format
> "alt.atheism<tab>some_text_to_classify".
> The result is a matrix showing the Naive Bayes probability of
> classification.
>
> Am I missing something?
> Or does the Naive Bayes implementation using Mahout require this data
> format?
>
> I have been looking for other implementations, but to no avail.
>
> Any help is greatly appreciated.
>
> Thank you,
> Vikas
>
>
