Jossef,

Does your training set have any features with a zero value for all instances?
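
One quick way to check is sketched below. It is only a rough sketch and not Mahout code: the names ZeroColumnCheck and allZeroColumns are made up for illustration, and it assumes the CSV is comma-separated, has no header row, and has numeric feature columns first with the class name in the last column.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): reports feature columns
// whose value is zero in every row of a CSV file.
class ZeroColumnCheck {

    // rows: CSV data lines without a header. The first numFeatures fields
    // are assumed to be numeric; anything after them (e.g. the class name)
    // is ignored.
    static List<Integer> allZeroColumns(List<String> rows, int numFeatures) {
        boolean[] sawNonZero = new boolean[numFeatures];
        for (String row : rows) {
            if (row.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = row.split(",");
            for (int i = 0; i < numFeatures && i < fields.length; i++) {
                if (Double.parseDouble(fields[i].trim()) != 0.0) {
                    sawNonZero[i] = true;
                }
            }
        }
        // Collect the 0-based indices of columns that never held a non-zero value.
        List<Integer> zeroColumns = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            if (!sawNonZero[i]) {
                zeroColumns.add(i);
            }
        }
        return zeroColumns;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the training CSV
        // args[1]: number of feature columns (41 in this thread)
        List<String> rows = Files.readAllLines(Paths.get(args[0]));
        System.out.println("All-zero feature columns (0-based): "
                + allZeroColumns(rows, Integer.parseInt(args[1])));
    }
}
```

Running it as `java ZeroColumnCheck.java train-set.csv 41` should print the indices of any columns that never take a non-zero value in the training set.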

> Date: Mon, 5 May 2014 08:33:37 +0300
> Subject: RE: Mahout Naive Bayes CSV Classification
> From: [email protected]
> To: [email protected]
>
> a link to a github gist with my java code and a small sample from the CSV
> i'm using can be found here:
> https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a
>
> On May 5, 2014 5:53 AM, "Andrew Palumbo" <[email protected]> wrote:
> >
> > Hi Jossef,
> >
> > I can answer your first two questions for you:
> >
> > > 1) Are these predicted values normal?
> >
> > Yes, negative scores are normal.
> >
> > > 2) For now, i'm assuming that the max value 'wins'. is that correct?
> >
> > That is correct, NaiveBayes uses a winner takes all approach to class
> > assignment based on the max score across all classes. ie. :
> >
> > > {0:-2119.616101368751,1:-2536.217343666528}
> >
> > will be classified as 0.
> >
> > > 3) When i call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java)
> > > it returns 40 instead of 41 features. Why is that?
> >
> > This seems odd. Is it possible that something is getting dropped in your
> > vectorization process?
> >
> > Could you give a little more information on how you're using this. Could
> > you please clarify what you're referring to re: (line 96 in
> > MahoutTest.java)
> >
> > Thanks,
> >
> > Andy
> >
> > > From: [email protected]
> > > Date: Sun, 4 May 2014 23:16:48 +0300
> > > Subject: Re: Fwd: Mahout Naive Bayes CSV Classification
> > > To: [email protected]; [email protected]
> > >
> > > Hey Sebastian,
> > >
> > > Thanks for your reply.
> > >
> > > a link to a github gist with my java code and a small sample from the CSV
> > > i'm using can be found here:
> > > https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a
> > >
> > > I wrote code to convert the csv data (41 features + class name) to a
> > > RandomAccessSparseVector and append it into a sequence file
> > >
> > > I successfully managed to create a model from the sequence file and to
> > > run the NaiveBayes classifier with data.
> > >
> > > My problem is that i get negative results when i call
> > > 'classifier.classifyFull'
> > >
> > > e.g. :
> > >
> > > {0:-2119.616101368751,1:-2536.217343666528}
> > > {0:-3210.7575139461096,1:-4569.913127240827}
> > > {0:-2986.049040829474,1:-3473.9551320126384}
> > > {0:-2411.582039236549,1:-3487.8547154600456}
> > > {0:-25620.824856365696,1:-31625.63011412386}
> > > {0:-4601.922062356241,1:-5019.98413435188}
> > > {0:-4331.835315861215,1:-4718.881475757016}
> > > {0:-3568.9589306062785,1:-4132.310969149298}
> > > ...
> > > ...
> > >
> > > 1) Are these predicted values normal?
> > > 2) For now, i'm assuming that the max value 'wins'. is that correct?
> > > 3) When i call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java)
> > > it returns 40 instead of 41 features. Why is that?
> > >
> > > Thanks :)
> > >
> > > On Sun, May 4, 2014 at 2:25 PM, Sebastian Schelter <[email protected]> wrote:
> > >
> > > > Hi Jossef,
> > > >
> > > > You have to vectorize and normalize your data. The input for naive bayes
> > > > is a sequencefile containing a Text object as key (your label) and a
> > > > VectorWritable that holds a vector with the data.
> > > >
> > > > Instructions to run NaiveBayes can be found here:
> > > >
> > > > https://mahout.apache.org/users/classification/bayesian.html
> > > >
> > > > --sebastian
> > > >
> > > > On 05/03/2014 07:40 PM, Jossef Harush wrote:
> > > >
> > > >> I have these 2 CSV files:
> > > >>
> > > >> 1. train-set.csv
> > > >> 2. test-set.csv
> > > >>
> > > >> Both of them are in the same structure (with different content) and
> > > >> similar to this example (http://i.stack.imgur.com/jsckr.png) :
> > > >>
> > > >> [image: enter image description here]
> > > >>
> > > >> Each column is a feature and the last column - class, is the name of the
> > > >> class to predict.
> > > >>
> > > >> .
> > > >>
> > > >> *Can anyone please provide a sample code for:*
> > > >>
> > > >> 1. Initializing Naive Bayes with a CSV file (model creation, training,
> > > >> required pre-processing, etc...)
> > > >> 2. For a given CSV row - predicting a class
> > > >>
> > > >> Thanks!
> > > >>
> > > >> .
> > > >>
> > > >> .
> > > >>
> > > >> BTW -
> > > >>
> > > >> I'm using Mahout 0.9 and Hadoop 2.4 and I've already tried to follow these
> > > >> links:
> > > >>
> > > >> http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
> > > >> http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
> > > >>
> > > >> .
> > > >>
> > > >
> > >
> > > --
> > > Sincerely,
> > >
> > > Jossef Harush.
> > > jossef.com <http://www.jossef.com>
> >
