Re: Use Naïve Bayes on a large CSV

2014-02-25 Thread Kevin Moulart
I finally managed to make it run: I had to format the class label in the input file with a / in the name, so I put Yes/1 or No/0 instead of just 1 or 0. But then I noticed when testing the model that it doesn't classify all the data: 14/02/25 16:16:30 INFO mapred.JobClient: Map-Reduce Framework
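
A possible reading of why the "/" matters, sketched in plain Java. This assumes (it is not confirmed anywhere in the thread) that the trainer splits the key on "/" and takes the segment at index 1 as the label; the class and method names are hypothetical and are not part of Mahout.

```java
public class LabelKeySketch {

    // Assumption: the label is the segment at index 1 after splitting the key on "/".
    // A key without any "/" has no such segment, so it is rejected here.
    static String labelOf(String key) {
        String[] parts = key.split("/");
        if (parts.length < 2) {
            throw new IllegalArgumentException("key has no '/'-separated label: " + key);
        }
        return parts[1];
    }

    public static void main(String[] args) {
        System.out.println(labelOf("Yes/1")); // prints 1
        System.out.println(labelOf("No/0"));  // prints 0
    }
}
```

Under that assumption a bare key such as 1 has nothing at index 1, which would be consistent with the job only running once the keys contained a "/".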

Re: Use Naïve Bayes on a large CSV

2014-02-25 Thread Kevin Moulart
For reference, this is the program that creates the sequence file: public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException { Configuration conf = new Configuration(true); FileSystem fs = FileSystem.get(conf); // The input file is not in

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Kevin Moulart
Hi again, I finally decided to go through Java to make a sequence file for Naive Bayes, but I still can't find anywhere that states exactly what should be in the sequence file for Mahout to process it with Naive Bayes. I tried virtually every piece of code I found related to this

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Sebastian Schelter
NaiveBayes expects a SequenceFile as input. The key is the class label as Text; the value is the feature vector as VectorWritable. --sebastian On 02/24/2014 11:51 AM, Kevin Moulart wrote: Hi again, I finally set my mind on going through java to make a sequence file for the naive bayes, but I still
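
The key/value layout described above can be illustrated with a dependency-free sketch of the step that comes just before writing: splitting one CSV row into a label (the future key) and numeric features (the future value). The class name, the Row holder, and the labelColumn parameter are hypothetical. In a real Hadoop/Mahout program the label would be wrapped in a Text and the features in a vector inside a VectorWritable before being appended with a SequenceFile.Writer; that part is left out here so the example runs without those libraries.

```java
import java.util.Arrays;

public class CsvRowSketch {

    /** Hypothetical holder for one record: label becomes the key, features the value. */
    static final class Row {
        final String label;
        final double[] features;

        Row(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Splits one comma-separated line; the cell at labelColumn is the class label. */
    static Row parse(String csvLine, int labelColumn) {
        String[] cells = csvLine.split(",");
        if (labelColumn < 0 || labelColumn >= cells.length) {
            throw new IllegalArgumentException("label column out of range: " + labelColumn);
        }
        double[] features = new double[cells.length - 1];
        String label = null;
        int next = 0;
        for (int i = 0; i < cells.length; i++) {
            if (i == labelColumn) {
                label = cells[i].trim();
            } else {
                // Every non-label cell is assumed to be numeric.
                features[next++] = Double.parseDouble(cells[i].trim());
            }
        }
        return new Row(label, features);
    }

    public static void main(String[] args) {
        Row row = parse("5.1,3.5,1.4,1", 3);
        System.out.println(row.label + " -> " + Arrays.toString(row.features));
    }
}
```

This sketch does no quoting or escaping, so it only suits plain numeric CSV; a file with quoted fields would need a proper CSV parser.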

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Kevin Moulart
Thanks, that's about the clearest answer I've got so far :) 2014-02-24 11:59 GMT+01:00 Sebastian Schelter s...@apache.org: NaiveBayes expects a SequenceFile as input. The key is the class label as Text; the value is the feature vector as VectorWritable. --sebastian On 02/24/2014 11:51 AM, Kevin

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Ted Dunning
Kevin, while this is fresh in your mind, can you prepare a javadoc patch that would have helped you out? And suggest other doc patches as well? On Mon, Feb 24, 2014 at 3:00 AM, Kevin Moulart kevinmoul...@gmail.com wrote: Thanks, that's about the clearest answer I got so far :) 2014-02-24

Re: Use Naïve Bayes on a large CSV

2014-02-24 Thread Kevin Moulart
I'll do that as soon as I manage to make it work ^^', that's a great idea! I'm stuck with this for now: public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException { Configuration conf = new Configuration(true); FileSystem fs =

Use Naïve Bayes on a large CSV

2014-02-20 Thread Kevin Moulart
Hi, I'm trying to apply a Naive Bayes classifier to a large CSV file from the command line. I know I have to feed the classifier a sequence file, so I tried to put my CSV into one using the seqdirectory command, but even when I try with a really small CSV (less than 100 MB) I instantly get an

Re: Use Naïve Bayes on a large CSV

2014-02-20 Thread Suneel Marthi
To convert input CSV to vectors, you can either: a) use CSVIterator, or b) use InputDriver. Either of the above should generate vectors from the input CSV that can then be fed into Mahout classifier/clustering jobs. On Thursday, February 20, 2014 5:57 AM, Kevin Moulart kevinmoul...@gmail.com

Re: Use Naïve Bayes on a large CSV

2014-02-20 Thread Kevin Moulart
Hi, and thanks! Is there a way to do that using the existing command line? 2014-02-20 12:02 GMT+01:00 Suneel Marthi suneel_mar...@yahoo.com: To convert input CSV to vectors, u can either: a) Use CSVIterator b) use InputDriver Either of the above should

Re: Use Naïve Bayes on a large CSV

2014-02-20 Thread Jay Vyas
This relates to a previous question of mine: does Mahout have a concept of adapters that let us read CSV-style data with filters to create the exact format for its various inputs (e.g. the Recommender three-column format)? If not, is it worth a JIRA? On Feb 20, 2014, at 7:50 AM, Kevin