Hi Jossef,
You have to vectorize and normalize your data first. The input for Naive
Bayes is a SequenceFile whose key is a Text object (your label) and whose
value is a VectorWritable that holds the feature vector.
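As a rough illustration of the vectorize-and-normalize step, here is a minimal plain-Java sketch. It makes several assumptions that are not confirmed by your files: the CSV has a header line, every feature column is numeric, there are no quoted commas, and min-max scaling to [0, 1] is an acceptable normalization. The names CsvVectorizer, Example, parseRow, normalize and vectorize are hypothetical helpers, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: turn CSV rows (features..., class) into labelled, min-max normalized vectors. */
class CsvVectorizer {

    /** One example: the class label plus its numeric features. */
    static final class Example {
        final String label;
        final double[] features;

        Example(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses one row; the last column is taken as the class label (assumption). */
    static Example parseRow(String line) {
        String[] cols = line.split(",");
        double[] features = new double[cols.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(cols[i].trim());
        }
        return new Example(cols[cols.length - 1].trim(), features);
    }

    /** Rescales every feature column to [0, 1] in place (min-max normalization). */
    static void normalize(List<Example> examples) {
        if (examples.isEmpty()) {
            return;
        }
        int n = examples.get(0).features.length;
        for (int j = 0; j < n; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (Example e : examples) {
                min = Math.min(min, e.features[j]);
                max = Math.max(max, e.features[j]);
            }
            double range = max - min;
            for (Example e : examples) {
                // A constant column carries no information; map it to 0.
                e.features[j] = range == 0 ? 0.0 : (e.features[j] - min) / range;
            }
        }
    }

    /** Skips the header line (assumed present), parses the other rows and normalizes them. */
    static List<Example> vectorize(List<String> csvLines) {
        List<Example> examples = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i);
            if (!line.trim().isEmpty()) {
                examples.add(parseRow(line));
            }
        }
        normalize(examples);
        return examples;
    }
}
```

Each Example would then still have to be wrapped in a Mahout vector (for instance a DenseVector inside a VectorWritable) and written with the label as a Text key to a SequenceFile; that part is not shown here. For the test set you would probably want to reuse the minimum and maximum computed on the training set rather than recompute them.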
Instructions to run NaiveBayes can be found here:
https://mahout.apache.org/users/classification/bayesian.html
--sebastian
On 05/03/2014 07:40 PM, Jossef Harush wrote:
I have these 2 CSV files:
1. train-set.csv
2. test-set.csv
Both of them have the same structure (with different content), similar
to this example (http://i.stack.imgur.com/jsckr.png):
Each column is a feature, and the last column, class, holds the name of the
class to predict.
*Can anyone please provide sample code for:*
1. Initializing Naive Bayes with a CSV file (model creation, training,
required pre-processing, etc.)
2. For a given CSV row - predicting a class
Thanks!
BTW, I'm using Mahout 0.9 and Hadoop 2.4, and I've already tried to follow
these links:
http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/