Hi Jossef,

You have to vectorize and normalize your data first. The input for Naive Bayes is a SequenceFile whose key is a Text object (your label) and whose value is a VectorWritable holding the feature vector.
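
As a rough illustration of the vectorize-and-normalize step, here is a minimal plain-Java sketch. It is not Mahout code: the class name CsvVectorizer, the Row type, and the choice of min-max scaling are assumptions made for the example, and it assumes numeric features, the label in the last column, no quoted commas, and a header line that has already been removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns CSV rows into (label, normalized feature array) pairs. */
final class CsvVectorizer {

    /** One parsed row: the class label plus its numeric features. */
    static final class Row {
        final String label;
        final double[] features;

        Row(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses "f1,f2,...,fn,label"; assumes numeric features and no quoted commas. */
    static Row parse(String line) {
        String[] cols = line.trim().split(",");
        double[] features = new double[cols.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(cols[i].trim());
        }
        return new Row(cols[cols.length - 1].trim(), features);
    }

    /** Min-max scales every column to [0, 1] in place; constant columns become 0. */
    static void normalize(List<Row> rows) {
        if (rows.isEmpty()) {
            return;
        }
        int n = rows.get(0).features.length;
        for (int j = 0; j < n; j++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (Row r : rows) {
                min = Math.min(min, r.features[j]);
                max = Math.max(max, r.features[j]);
            }
            double range = max - min;
            for (Row r : rows) {
                r.features[j] = range == 0.0 ? 0.0 : (r.features[j] - min) / range;
            }
        }
    }

    /** Parses all non-empty data lines and normalizes the resulting rows. */
    static List<Row> vectorize(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                rows.add(parse(line));
            }
        }
        normalize(rows);
        return rows;
    }
}
```

Each Row could then be written to the SequenceFile with the label wrapped in a Text key and the features wrapped as new VectorWritable(new DenseVector(features)); please check those calls against your Mahout version. For the test set you would normally reuse the min and max computed on the training set instead of recomputing them.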

Instructions to run NaiveBayes can be found here:

https://mahout.apache.org/users/classification/bayesian.html

--sebastian


On 05/03/2014 07:40 PM, Jossef Harush wrote:
I have these 2 CSV files:

    1. train-set.csv
    2. test-set.csv

Both have the same structure (with different content), similar to this
example (http://i.stack.imgur.com/jsckr.png):


Each column is a feature, and the last column, class, holds the name of the
class to predict.


*Can anyone please provide sample code for:*

    1. Initializing Naive Bayes with a CSV file (model creation, training,
    required pre-processing, etc...)
    2. For a given CSV row - predicting a class

Thanks!


BTW -

I'm using Mahout 0.9 and Hadoop 2.4, and I've already tried to follow these
links:

http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/


