The version is: CDH-4.7.0-1.cdh4.7.0.p0.40

users_file:
--inverted_item_id
-1
-2
-3
-4

users_items_prefs:
--inverted item_id
-1 1 1.0
-2 2 1.0
-3 3 1.0
-4 4 1.0
--user_id item_id pref_value
11 1 1.6
11 2 1.6
123 3 2.0
123 4 2.0
333 1 2.0
333 2 1.6
--etc.

If I set --booleanData true, then Mahout returns the result.

2014-07-20 23:12 GMT+04:00 Andrew Musselman <[email protected]>:

> I'm confused about how you're constructing the user file, and why there
> are negated item ids here.
>
> Can you post some more details please, including Mahout version and some
> sample data sets?
>
> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <[email protected]> wrote:
> >
> > Hi, I'm trying to create item similarity.
> > I gather the items which users visit during shopping and then create a file:
> > user_id, item_id, weight (where weight can be one of [1.0, 1.6, 1.9],
> > depending on user action type and data source)
> > UNION
> > -item_id, item_id, 1.0 (from the items dictionary)
> >
> > and I provide a userFile, where user_id = -item_id
> >
> > The idea is to get item similarity. If any user visits the item named "A",
> > I want to show him items "B", "c", "xxx" using the preferences of other users.
> >
> > The problem is that the last (???) mapreduce job returns 0 rows.
> >
> > Here are my settings:
> >
> > sudo -u oozie mahout recommenditembased \
> >   --input visited_items_with_inverted_items \
> >   --output result \
> >   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >   --usersFile inverted_items \
> >   --numRecommendations 500 \
> >   --booleanData false \
> >   --maxPrefsPerUser 100 \
> >   --maxSimilaritiesPerItem 500 \
> >   --minPrefsPerUser 0 \
> >   --maxPrefsPerUserInItemSimilarity 30 \
> >   --threshold 0.91 \
> >   --tempDir temp
> >
> > Some counters... I don't get what they mean:
> >
> > 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> > 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
> > 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> > 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
> > 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
> >
> > 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
> > 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
> > 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
> > 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
> > 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
> > 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
> > 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
> >
> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> > 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
> > 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
> > 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
> > 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
> > 14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
> >
> > 14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
> > 14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
> > 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
> > --------
> > 14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
> > --------
> >
> > why 0???
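
For readers following the thread, the input layout described in the original message (real user preferences unioned with one pseudo-user per item, where the pseudo-user id is the negated item id) can be sketched roughly as below. This is only an illustration, not the poster's actual pipeline: the function names `build_prefs`, `build_users_file`, and `to_lines` are hypothetical, the comma-separated output format is an assumption, and the sketch assumes all item ids are positive so that negated ids never collide with real user ids.

```python
def build_prefs(visits, item_ids):
    """Return (user_id, item_id, weight) rows.

    visits   -- iterable of (user_id, item_id, weight) from real user actions
    item_ids -- iterable of item ids from the items dictionary

    Assumption: item ids are positive, so -item_id is a free pseudo-user id.
    """
    rows = list(visits)
    # One pseudo-user per item: "-item_id, item_id, 1.0"
    rows.extend((-item_id, item_id, 1.0) for item_id in sorted(set(item_ids)))
    return rows


def build_users_file(item_ids):
    """Return the pseudo-user ids (user_id = -item_id) to request results for."""
    return [-item_id for item_id in sorted(set(item_ids))]


def to_lines(rows):
    """Format rows as comma-separated text lines (assumed input format)."""
    return [f"{user_id},{item_id},{weight}" for user_id, item_id, weight in rows]


if __name__ == "__main__":
    # Sample data taken from the message above.
    visits = [
        (11, 1, 1.6), (11, 2, 1.6),
        (123, 3, 2.0), (123, 4, 2.0),
        (333, 1, 2.0), (333, 2, 1.6),
    ]
    items = [1, 2, 3, 4]
    print("\n".join(to_lines(build_prefs(visits, items))))
    print("\n".join(str(u) for u in build_users_file(items)))
```

Note that in this layout every pseudo-user carries exactly one preference, which is worth keeping in mind when comparing the --booleanData true and false runs.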
