All bad things happen here:
Name                  RecommenderJob-PartialMultiplyMapper-Reducer
User                  oozie
Process User          oozie
Group                 oozie
Mapper Class          PartialMultiplyMapper
Reducer Class         AggregateAndRecommendReducer
Job Input Directory   hdfs://nameservice1/itemrec/temp/partialMultiply
Job Output Directory  hdfs://nameservice1/itemrec/output/

14/07/20 23:57:47 INFO mapred.JobClient: Map input records=3312879
14/07/20 23:57:47 INFO mapred.JobClient: Map output records=3313251
14/07/20 23:57:47 INFO mapred.JobClient: Reduce input records=3313251
14/07/20 23:57:47 INFO mapred.JobClient: Reduce output records=0

Why does Mahout return 0 rows?
It works when booleanData=true (preferences are ignored...?)

2014-07-20 23:19 GMT+04:00 Serega Sheypak <[email protected]>:

> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> users_file:
> --inverted_item_id
> -1
> -2
> -3
> -4
>
> users_items_prefs
> --inverted item_id
> -1 1 1.0
> -2 2 1.0
> -3 3 1.0
> -4 4 1.0
> --user_id item_id pref_value
> 11 1 1.6
> 11 2 1.6
> 123 3 2.0
> 123 4 2.0
> 333 1 2.0
> 333 2 1.6
> --e.t.c.
>
> if I set --booleanData true
> then mahout returns the result.
>
> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <[email protected]>:
>
>> I'm confused about how you're constructing the user file, and why there
>> are negated item ids here.
>>
>> Can you post some more details please, including Mahout version and some
>> sample data sets?
>>
>> > On Jul 20, 2014, at 11:57 AM, Serega Sheypak <[email protected]> wrote:
>> >
>> > Hi, I'm trying to create item similarity.
>> > I gather items which users visit during shopping and then create a file:
>> > user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
>> > user action type and data source)
>> > UNION
>> > -item_id, item_id, 1.0 (from items dictionary)
>> >
>> > and I do provide a userFile, where user_id = -item_id
>> >
>> > The idea is to get item similarity. If any user visits item named "A", I want
>> > to show him items "B", "c", "xxx" using preferences of other users.
>> >
>> > The problem is that the last (???) mapreduce job returns 0 rows:
>> >
>> > Here are my settings:
>> >
>> > sudo -u oozie mahout recommenditembased \
>> >   --input visited_items_with_inverted_items \
>> >   --output result \
>> >   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>> >   --usersFile inverted_items \
>> >   --numRecommendations 500 \
>> >   --booleanData false \
>> >   --maxPrefsPerUser 100 \
>> >   --maxSimilaritiesPerItem 500 \
>> >   --minPrefsPerUser 0 \
>> >   --maxPrefsPerUserInItemSimilarity 30 \
>> >   --threshold 0.91 \
>> >   --tempDir temp
>> >
>> > Some counters... I don't get what they mean....
>> >
>> > 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>> > 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
>> >
>> > 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>> > 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
>> > 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
>> >
>> > 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> > 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
>> >
>> > 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> > 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
>> > 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
>> >
>> > 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
>> > 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
>> > 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
>> > 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
>> >
>> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
>> >
>> > 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
>> > 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
>> > 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
>> > 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
>> >
>> > 14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
>> >
>> > 14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
>> > 14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
>> > 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
>> >
>> > --------
>> > 14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
>> > --------
>> >
>> > why 0???
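
For anyone trying to reproduce this, the input construction described in the original message (real user rows, UNION one pseudo-user per item with user_id = -item_id, plus a users file listing those pseudo-users) might look roughly like the sketch below. This is only an illustration, not the poster's actual code: the function name `build_inputs`, the in-memory lists, and the comma delimiter are all assumptions.

```python
def build_inputs(events, item_ids):
    """Build preference lines and users-file lines as described in the thread.

    events   -- iterable of (user_id, item_id, weight) tuples (hypothetical input)
    item_ids -- iterable of item ids from the items dictionary (hypothetical input)
    """
    # Real user preferences: user_id,item_id,weight (comma delimiter is an assumption)
    prefs = [f"{user_id},{item_id},{weight}" for user_id, item_id, weight in events]

    # One pseudo-user per item: -item_id,item_id,1.0
    for item_id in item_ids:
        prefs.append(f"{-item_id},{item_id},1.0")

    # The users file lists only the pseudo-users (user_id = -item_id)
    users = [str(-item_id) for item_id in item_ids]
    return prefs, users


# Sample rows taken from the data posted in the thread
events = [(11, 1, 1.6), (11, 2, 1.6), (123, 3, 2.0),
          (123, 4, 2.0), (333, 1, 2.0), (333, 2, 1.6)]
prefs, users = build_inputs(events, [1, 2, 3, 4])

print("\n".join(prefs))
print("\n".join(users))
```

Running this on a handful of rows makes it easy to check that every id in the users file also appears as a user id in the preference file before submitting the full job.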
