Hey Sebastian,

Thanks for your reply.

A link to a GitHub gist with my Java code and a small sample of the CSV I'm using can be found here:
https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a

I wrote code to convert the CSV data (41 features + class name) to a RandomAccessSparseVector and append it to a sequence file. I successfully managed to create a model from the sequence file and to run the NaiveBayes classifier on the data.

My problem is that I get negative results when I call 'classifier.classifyFull', e.g.:

{0:-2119.616101368751,1:-2536.217343666528}
{0:-3210.7575139461096,1:-4569.913127240827}
{0:-2986.049040829474,1:-3473.9551320126384}
{0:-2411.582039236549,1:-3487.8547154600456}
{0:-25620.824856365696,1:-31625.63011412386}
{0:-4601.922062356241,1:-5019.98413435188}
{0:-4331.835315861215,1:-4718.881475757016}
{0:-3568.9589306062785,1:-4132.310969149298}
...
...

1) Are these predicted values normal?
2) For now, I'm assuming that the max value 'wins'. Is that correct?
3) When I call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java) it returns 40 instead of 41 features. Why is that?

Thanks :)

On Sun, May 4, 2014 at 2:25 PM, Sebastian Schelter <[email protected]> wrote:

> Hi Jossef,
>
> You have to vectorize and normalize your data. The input for naive bayes
> is a sequencefile containing a Text object as key (your label) and a
> VectorWritable that holds a vector with the data.
>
> Instructions to run NaiveBayes can be found here:
>
> https://mahout.apache.org/users/classification/bayesian.html
>
> --sebastian
>
> On 05/03/2014 07:40 PM, Jossef Harush wrote:
>
>> I have these 2 CSV files:
>>
>> 1. train-set.csv
>> 2. test-set.csv
>>
>> Both of them are in the same structure (with different content) and
>> similar to this example (http://i.stack.imgur.com/jsckr.png):
>>
>> Each column is a feature and the last column - class, is the name of the
>> class to predict.
>>
>> *Can anyone please provide a sample code for:*
>>
>> 1. Initializing Naive Bayes with a CSV file (model creation, training,
>> required pre-processing, etc...)
>> 2. For a given CSV row - predicting a class
>>
>> Thanks!
>>
>> BTW -
>>
>> I'm using Mahout 0.9 and Hadoop 2.4 and I've already tried to follow these
>> links:
>>
>> http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
>> http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

--
Sincerely,

Jossef Harush.
jossef.com <http://www.jossef.com>
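
P.S. To make question 2 concrete, here is a minimal sketch of the "max value wins" assumption. It uses a plain double[] in place of the Vector returned by classifyFull, and the names ArgMaxSketch and argMax are made up for illustration (they are not part of Mahout or of the gist). Whether picking the maximum is the right way to read these scores is exactly what is being asked, so treat that as an assumption, not as confirmed behaviour:

```java
public class ArgMaxSketch {

    // Hypothetical helper: returns the index of the largest score.
    // Assumption: a larger (less negative) score means a more likely label.
    // Ties are resolved in favour of the lowest index.
    static int argMax(double[] scores) {
        if (scores == null || scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // First score pair from the output listed in this email.
        double[] scores = {-2119.616101368751, -2536.217343666528};
        System.out.println("winning label index: " + argMax(scores));
    }
}
```

With the first pair of scores above, index 0 is selected because -2119.6 is greater than -2536.2. Both values being negative does not affect the comparison.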
