Hi there,

I am relatively new to Hadoop and a greenhorn concerning Mahout :)

I am playing around with Mahout classification. We want to classify user ratings for our product, based not only on the stars but also on the text, so I guess it is sentiment analysis :) The overall process is clear to me, and I do not have any problems with the algorithms themselves.

We have a target variable "x" that a very patient person has already labelled as -1/0/+1. We also have several predictors, e.g. category, country, language, ratingText, ratingTitle and a few more. Country, language and category are categorical; ratingText and ratingTitle are free-text predictors.

I run an SQL select to fetch all these values, and now I want to write them row by row into a Hadoop sequence file so that Mahout can read the data. How do I arrange multiple values under the same key, e.g. /category/DATABASEID?

I tried to adapt the SequenceFile writer from https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/ but I don't know how to store the individual predictors.

I appreciate any hint on how to solve this problem!

Many thanks,
Jan
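
To make the question concrete, here is a minimal sketch of one possible record layout, not a confirmed Mahout recipe. As far as I can tell, the linked tutorial writes each record as a Text key of the form /label/id and a Text value holding the message. Following that pattern, the categorical predictors could be folded into the value as prefixed tokens next to the free text. The class name RatingRecord, the helpers buildKey and buildValue, the name_value token scheme and the sample values are all hypothetical. The Hadoop write is only indicated in a comment, so the sketch runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RatingRecord {

    // Key layout assumed from the linked tutorial: "/" + label + "/" + id.
    static String buildKey(String label, long databaseId) {
        return "/" + label + "/" + databaseId;
    }

    // Hypothetical encoding: each categorical predictor becomes one token
    // "name_value", followed by the free-text fields, separated by spaces.
    static String buildValue(Map<String, String> categorical, String... texts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : categorical.entrySet()) {
            sb.append(e.getKey()).append('_').append(e.getValue()).append(' ');
        }
        for (String t : texts) {
            if (t != null && !t.trim().isEmpty()) {
                sb.append(t.trim()).append(' ');
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Sample row; in practice these values would come from the SQL result set.
        Map<String, String> categorical = new LinkedHashMap<>();
        categorical.put("category", "games");
        categorical.put("country", "DE");
        categorical.put("language", "de");

        String key = buildKey("+1", 4711L);
        String value = buildValue(categorical, "Great app", "Works fine");

        // With Hadoop available, the pair would be appended to an open
        // SequenceFile.Writer as: writer.append(new Text(key), new Text(value));
        System.out.println(key + "\t" + value);
    }
}
```

Whether tokens such as country_DE survive tokenization unchanged depends on the analyzer used when the text is vectorized, so that part would need to be checked against the actual pipeline.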
