Hey Sebastian,

Thanks for your reply.

A link to a GitHub gist with my Java code and a small sample of the CSV
I'm using can be found here:
https://gist.github.com/Jossef/e6c8fc0c31f0c2bf036a



I wrote code to convert the CSV data (41 features + class name) to a
RandomAccessSparseVector and append it to a sequence file.
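
A minimal sketch of that conversion step, kept free of Mahout and Hadoop
dependencies so it runs on its own. It assumes every feature column is
numeric and the class name is the last column. A plain Map stands in for
the RandomAccessSparseVector, and the names CsvRowSketch, LabeledRow and
parse are hypothetical, not taken from the gist:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CsvRowSketch {

    // Hypothetical holder for one parsed row: the class label, the
    // non-zero features keyed by column index, and the total feature count.
    static final class LabeledRow {
        final String label;
        final Map<Integer, Double> features;
        final int cardinality;

        LabeledRow(String label, Map<Integer, Double> features, int cardinality) {
            this.label = label;
            this.features = features;
            this.cardinality = cardinality;
        }
    }

    // Parses "f0,f1,...,fN-1,label" and stores only the non-zero values,
    // which is the idea behind a sparse vector.
    static LabeledRow parse(String line) {
        String[] cols = line.split(",");
        if (cols.length < 2) {
            throw new IllegalArgumentException("need at least one feature and a label");
        }
        int numFeatures = cols.length - 1; // last column is the class name
        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 0; i < numFeatures; i++) {
            double value = Double.parseDouble(cols[i].trim());
            if (value != 0.0) {
                features.put(i, value);
            }
        }
        return new LabeledRow(cols[numFeatures].trim(), features, numFeatures);
    }

    public static void main(String[] args) {
        // Made-up row with 4 features, for illustration only.
        LabeledRow row = parse("0,1.5,0,2,normal");
        System.out.println(row.label + " " + row.cardinality + " " + row.features);
    }
}
```

Keeping the feature count (cardinality) separate from the number of stored
entries makes it easy to check that the label column is not counted as a
feature and that no feature column is dropped.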

I successfully managed to create a model from the sequence file and to
run the NaiveBayes classifier on the data.


My problem is that I get negative results when I call
'classifier.classifyFull'

e.g.:


{0:-2119.616101368751,1:-2536.217343666528}
{0:-3210.7575139461096,1:-4569.913127240827}
{0:-2986.049040829474,1:-3473.9551320126384}
{0:-2411.582039236549,1:-3487.8547154600456}
{0:-25620.824856365696,1:-31625.63011412386}
{0:-4601.922062356241,1:-5019.98413435188}
{0:-4331.835315861215,1:-4718.881475757016}
{0:-3568.9589306062785,1:-4132.310969149298}
...
...




1) Are these predicted values normal?
2) For now, I'm assuming that the max value 'wins'. Is that correct?
3) When I call 'naiveBayesModel.numFeatures()' (line 96 in MahoutTest.java)
it returns 40 instead of 41 features. Why is that?
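
To make the assumption in question 2 concrete, here is a minimal sketch
of "max value wins" applied to two of the score rows above. It uses plain
arrays rather than the Vector returned by classifyFull, and the names
ScoreArgMax and argMax are hypothetical:

```java
class ScoreArgMax {

    // Returns the index of the largest score; ties go to the lowest index.
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Two rows copied from the classifier output listed above.
        double[][] rows = {
            {-2119.616101368751, -2536.217343666528},
            {-25620.824856365696, -31625.63011412386},
        };
        for (double[] row : rows) {
            System.out.println("predicted label index: " + argMax(row));
        }
    }
}
```

Under this assumption both rows map to label index 0, since the score
closer to zero is the larger one when all scores are negative.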


Thanks :)





On Sun, May 4, 2014 at 2:25 PM, Sebastian Schelter <[email protected]> wrote:

> Hi Jossef,
>
> You have to vectorize and normalize your data. The input for naive bayes
> is a sequencefile containing a Text object as key (your label) and a
> VectorWritable that holds a vector with the data.
>
> Instructions to run NaiveBayes can be found here:
>
> https://mahout.apache.org/users/classification/bayesian.html
>
> --sebastian
>
>
>
> On 05/03/2014 07:40 PM, Jossef Harush wrote:
>
>> I have these 2 CSV files:
>>
>>     1. train-set.csv
>>     2. test-set.csv
>>
>>
>> Both of them are in the same structure (with different content) and
>> similar
>> to this example (http://i.stack.imgur.com/jsckr.png) :
>>
>> [image: enter image description here]
>>
>> Each column is a feature and the last column - class, is the name of the
>> class to predict.
>>
>> .
>>
>> *Can anyone please provide a sample code for:*
>>
>>     1. Initializing Naive Bayes with a CSV file (model creation, training,
>>     required pre-processing, etc...)
>>     2. For a given CSV row - predicting a class
>>
>>
>> Thanks!
>>
>> .
>>
>> .
>>
>> BTW -
>>
>> I'm using Mahout 0.9 and Hadoop 2.4 and I've already tried to follow these
>> links:
>>
>> http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
>> http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
>>
>> .
>>
>>
>


-- 
Sincerely,

Jossef Harush.
jossef.com <http://www.jossef.com>
