Hi there,

I am relatively new to Hadoop and a greenhorn when it comes to Mahout :)

Basically, I am playing around with Mahout classification. We want to classify
user ratings of our product based not only on the stars but also on the text.
Okay, I guess it is sentiment analysis :)
The overall process is quite clear, so I do not have any problems with the
algorithms themselves.

So, we have a target variable "x" that a very patient person has already
labeled as -1/0/+1.

We also have several predictors, e.g. category, country, language, ratingText,
ratingTitle and a few more.
Country, language and category are categorical; ratingText and ratingTitle are
free-text predictors.
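As far as I understand, a Mahout naive Bayes pipeline like the one in the chimpler post works on one text document per record, so one option is to fold the categorical predictors into that text as prefixed tokens (e.g. "country_de") next to the words of ratingTitle and ratingText. The following is a minimal stdlib-only sketch under that assumption; the class and method names are hypothetical and the tokenization rule is just one simple choice:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical helper: flattens one rating row into a single
 * whitespace-separated document. Categorical predictors become
 * prefixed tokens so they cannot collide with ordinary words.
 */
class RatingDocument {

    /** ("country", "DE") -> "country_de"; inner whitespace becomes "_" to keep one token. */
    static String categoricalToken(String field, String value) {
        return field + "_" + value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", "_");
    }

    /** Lower-cases free text and splits it on runs of non-letter, non-digit characters. */
    static List<String> textTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) { // a leading delimiter produces an empty first element
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Builds the combined document text for one row. */
    static String toDocument(String category, String country, String language,
                             String ratingTitle, String ratingText) {
        List<String> tokens = new ArrayList<>();
        tokens.add(categoricalToken("category", category));
        tokens.add(categoricalToken("country", country));
        tokens.add(categoricalToken("language", language));
        tokens.addAll(textTokens(ratingTitle));
        tokens.addAll(textTokens(ratingText));
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Prints: category_books country_de language_de great read loved it 5 stars
        System.out.println(toDocument("Books", "DE", "de", "Great read!", "Loved it, 5 stars."));
    }
}
```

Title words could also get their own prefix (e.g. "title_great") if they should count separately from body words. Whatever analyzer runs downstream (e.g. in seq2sparse) should be checked to make sure it keeps tokens like "country_de" intact.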


I run an SQL SELECT that returns all these values, and now I want to write them
row by row into a Hadoop SequenceFile so that Mahout can read the data.

How do I store multiple values under the same key, e.g. /category/DATABASEID?

I tried to adapt the SequenceFile writer from this post:
https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
but I don't know how to store the individual predictors.
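The linked post writes one record per tweet, with a key of the form "/category/tweetId" (the class label as first path segment) and the tweet text as value. Since each SequenceFile record is exactly one key/value pair, applying the same layout here would mean one record per rating row, with all predictors folded into the single text value rather than several values per key. A minimal stdlib-only sketch of building such pairs; the label names "negative"/"neutral"/"positive" and the class name are hypothetical:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

/** Hypothetical helper: builds "/label/databaseId" keys, one record per rating row. */
class RatingRecord {

    /** Maps the manually assigned target x (-1/0/+1) to a hypothetical label name. */
    static String labelName(int x) {
        switch (x) {
            case -1: return "negative";
            case 0:  return "neutral";
            case 1:  return "positive";
            default: throw new IllegalArgumentException("unexpected target value: " + x);
        }
    }

    /** Key in the "/label/id" layout, e.g. "/positive/4711". */
    static String key(int x, long databaseId) {
        return "/" + labelName(x) + "/" + databaseId;
    }

    /** One key/value pair per row; the value is the combined document text. */
    static Map.Entry<String, String> toKeyValue(int x, long databaseId, String document) {
        return new SimpleEntry<>(key(x, databaseId), document);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv =
                toKeyValue(1, 4711L, "category_books country_de great read");
        System.out.println(kv.getKey() + "\t" + kv.getValue());
    }
}
```

Each pair would then be appended with Hadoop's SequenceFile.Writer using Text keys and values, as in the linked post; that part is left out here because it needs the Hadoop jars.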

I would appreciate any hints on how to solve this problem!

Many thanks,

Jan
