Hi Jossef,
I can answer your first two questions for you:
> 1) Are these predicted values normal?
Yes, negative scores are normal.
> 2) For now, i'm assuming that the max value 'wins'. is that correct?
That is correct. NaiveBayes uses a winner-takes-all approach to class
assignment, based on the max score across all classes, i.e.:
> {0:-2119.616101368751,1:-2536.217343666528}
will be classified as 0.
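
As a rough illustration (plain Java, not Mahout's own code; the class and
method names are made up for this example), picking the winner amounts to an
argmax over the per-class scores. I'm assuming here that the scores are
log-likelihood-style values, as is usual for Naive Bayes, which is why they
come out negative and why the one closest to zero wins:

```java
public class ArgMaxExample {
    // Hypothetical helper: returns the index of the largest score.
    // Ties go to the lowest index.
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("no scores");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // First row of the output quoted above: class 0 and class 1.
        double[] scores = {-2119.616101368751, -2536.217343666528};
        System.out.println("predicted class index: " + argMax(scores));
    }
}
```

For the row above this picks index 0, since -2119.6 is greater than -2536.2.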
> 3) When i call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java)
> it returns 40 instead of 41 features. Why is that?
This seems odd. Is it possible that something is getting dropped in your
vectorization process?
Could you give a little more information on how you're using this? Could you
also clarify what you're referring to with "line 96 in MahoutTest.java"?
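
In the meantime, one way to check whether the vectorization drops a column is
to count how many feature values each row actually yields before it goes into
the vector. A minimal sketch in plain Java, assuming comma-separated rows with
the class name in the last column (the class and method names are
hypothetical, and this is not based on how your code does the parsing):

```java
public class FeatureCountCheck {
    // Hypothetical helper: splits one CSV row into feature values,
    // treating the last column as the class name.
    static double[] parseFeatures(String csvRow) {
        // Limit -1 keeps trailing empty columns instead of silently dropping them.
        String[] columns = csvRow.split(",", -1);
        double[] features = new double[columns.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(columns[i].trim());
        }
        return features;
    }

    public static void main(String[] args) {
        // Made-up row: 3 features followed by the class name.
        String row = "0.5,1.0,3.0,normal";
        double[] features = parseFeatures(row);
        System.out.println("features in row: " + features.length);
    }
}
```

If a check like this reports 41 for your rows but the model still reports 40,
the difference is more likely to come from how the vector is built or sized
than from the CSV parsing.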
Thanks,
Andy
> From: [email protected]
> Date: Sun, 4 May 2014 23:16:48 +0300
> Subject: Re: Fwd: Mahout Naive Bayes CSV Classification
> To: [email protected]; [email protected]
>
> Hey Sebastian,
>
> Thanks for your reply.
>
> A link to a GitHub gist with my Java code and a small sample from the CSV
> I'm using can be found here:
> https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a
>
>
>
> I wrote code to convert the CSV data (41 features + class name) to a
> RandomAccessSparseVector and append it to a sequence file.
>
> I successfully managed to create a model from the sequence file and to
> run the NaiveBayes classifier with data.
>
>
> My problem is that I get negative results when I call
> 'classifier.classifyFull'
>
> e.g. :
>
>
> {0:-2119.616101368751,1:-2536.217343666528}
> {0:-3210.7575139461096,1:-4569.913127240827}
> {0:-2986.049040829474,1:-3473.9551320126384}
> {0:-2411.582039236549,1:-3487.8547154600456}
> {0:-25620.824856365696,1:-31625.63011412386}
> {0:-4601.922062356241,1:-5019.98413435188}
> {0:-4331.835315861215,1:-4718.881475757016}
> {0:-3568.9589306062785,1:-4132.310969149298}
> ...
> ...
>
>
>
>
> 1) Are these predicted values normal?
> 2) For now, i'm assuming that the max value 'wins'. is that correct?
> 3) When i call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java)
> it returns 40 instead of 41 features. Why is that?
>
>
> Thanks :)
>
>
>
>
>
> On Sun, May 4, 2014 at 2:25 PM, Sebastian Schelter <[email protected]> wrote:
>
> > Hi Jossef,
> >
> > You have to vectorize and normalize your data. The input for naive bayes
> > is a sequencefile containing a Text object as key (your label) and a
> > VectorWritable that holds a vector with the data.
> >
> > Instructions to run NaiveBayes can be found here:
> >
> > https://mahout.apache.org/users/classification/bayesian.html
> >
> > --sebastian
> >
> >
> >
> > On 05/03/2014 07:40 PM, Jossef Harush wrote:
> >
> >> I have these 2 CSV files:
> >>
> >> 1. train-set.csv
> >> 2. test-set.csv
> >>
> >>
> >> Both of them are in the same structure (with different content) and
> >> similar
> >> to this example (http://i.stack.imgur.com/jsckr.png) :
> >>
> >> [image: enter image description here]
> >>
> >> Each column is a feature and the last column - class, is the name of the
> >> class to predict.
> >>
> >> .
> >>
> >> *Can anyone please provide a sample code for:*
> >>
> >> 1. Initializing Naive Bayes with a CSV file (model creation, training,
> >> required pre-processing, etc...)
> >> 2. For a given CSV row - predicting a class
> >>
> >>
> >> Thanks!
> >>
> >> .
> >>
> >> .
> >>
> >> BTW -
> >>
> >> I'm using Mahout 0.9 and Hadoop 2.4 and I've already tried to follow these
> >> links:
> >>
> >> http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
> >> http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
> >>
> >> .
> >>
> >>
> >>
> >
>
>
> --
> Sincerely,
>
> Jossef Harush.
> jossef.com <http://www.jossef.com>