Serega,

I have two comments:

1. Don't use negative user ids. Since Mahout uses the user id as well as the item id as the row/column index, you'd better use 0, 1, 2, etc. as ids.
2. If you want to get the item similarity information, you can use --outputPathForSimilarityMatrix in the command.
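For comment 1, here is a minimal sketch of remapping ids to 0, 1, 2, ... before running the job. It is plain Python preprocessing, not part of Mahout; the function name remap_ids and the sample rows (taken from the data posted earlier in this thread) are only illustrative, and it assumes the preferences are already parsed into (user_id, item_id, pref_value) tuples.

```python
from typing import Dict, Iterable, List, Tuple

Pref = Tuple[str, str, float]  # (user_id, item_id, pref_value), ids kept as strings


def remap_ids(
    prefs: Iterable[Pref],
) -> Tuple[List[Tuple[int, int, float]], Dict[str, int], Dict[str, int]]:
    """Assign dense ids 0, 1, 2, ... to users and items in order of first appearance.

    Hypothetical helper for illustration only; it is not a Mahout API.
    """
    user_index: Dict[str, int] = {}
    item_index: Dict[str, int] = {}
    remapped: List[Tuple[int, int, float]] = []
    for user, item, value in prefs:
        # setdefault returns the existing id, or stores the next free one.
        u = user_index.setdefault(user, len(user_index))
        i = item_index.setdefault(item, len(item_index))
        remapped.append((u, i, value))
    return remapped, user_index, item_index


# Sample rows in the shape posted in the thread, including the negative "users".
sample = [
    ("-1", "1", 1.0),
    ("-2", "2", 1.0),
    ("11", "1", 1.6),
    ("11", "2", 1.6),
    ("123", "3", 2.0),
]

remapped, user_index, item_index = remap_ids(sample)
for row in remapped:
    print(*row, sep=",")
```

The two dictionaries need to be kept so that the ids in the usersFile can be rewritten with the same mapping and the job output can be translated back to the original ids.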
Regards,
Peng Zhang
M: +86 186-1658-7856

On Jul 21, 2014, at 4:00 AM, Serega Sheypak wrote:

> All bad things happen here:
>
> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> User: oozie
> Process User: oozie
> Group: oozie
> Mapper Class: PartialMultiplyMapper
> Reducer Class: AggregateAndRecommendReducer
> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> Job Output Directory: hdfs://nameservice1/itemrec/output/
>
> 14/07/20 23:57:47 INFO mapred.JobClient: Map input records=3312879
> 14/07/20 23:57:47 INFO mapred.JobClient: Map output records=3313251
> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce input records=3313251
> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce output records=0
>
> Why does mahout returns 0 rows? it works when booleanData=true (preferences
> are ignored...?)
>
> 2014-07-20 23:19 GMT+04:00 Serega Sheypak:
>
>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>> users_file:
>> --inverted_item_id
>> -1
>> -2
>> -3
>> -4
>>
>> users_items_prefs
>> --inverted item_id
>> -1 1 1.0
>> -2 2 1.0
>> -3 3 1.0
>> -4 4 1.0
>> --user_id item_id pref_value
>> 11 1 1.6
>> 11 2 1.6
>> 123 3 2.0
>> 123 4 2.0
>> 333 1 2.0
>> 333 2 1.6
>> --e.t.c.
>>
>> if I set --booleanData true
>> then mahout returns the result.
>>
>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman:
>>
>>> I'm confused about how you're constructing the user file, and why there
>>> are negated item ids here.
>>>
>>> Can you post some more details please, including Mahout version and some
>>> sample data sets?
>>>
>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak wrote:
>>>
>>>> Hi, I'm trying to create item similarity.
>>>> I gather items which users visit during shopping and then create a file:
>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
>>>> user action type and data source)
>>>> UNION
>>>> -item_id, item_id, 1.0 (from items dictionary)
>>>>
>>>> and I do provide a userFile, where user_id = -item_id
>>>>
>>>> The idea is to get item similary. If any user visits item named "A", i want
>>>> to show him items "B", "c", "xxx" using preferences of other users.
>>>>
>>>> The problem is that the last (???) mapreduce job returns 0 rows:
>>>>
>>>> Here are my settings:
>>>>
>>>> sudo -u oozie mahout recommenditembased \
>>>> --input visited_items_with_inverted_items \
>>>> --output result \
>>>> --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>> --usersFile inverted_items \
>>>> --numRecommendations 500 \
>>>> --booleanData false \
>>>> --maxPrefsPerUser 100 \
>>>> --maxSimilaritiesPerItem 500 \
>>>> --minPrefsPerUser 0\
>>>> --maxPrefsPerUserInItemSimilarity 30 \
>>>> --threshold 0.91 \
>>>> --tempDir temp \
>>>>
>>>> Some counters... I don't get what do they mean....
>>>>
>>>> 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>>>> 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
>>>> 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
>>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
>>>>
>>>> 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>> 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
>>>> 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>> 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
>>>> 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
>>>>
>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
>>>>
>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
>>>> --------
>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
>>>> --------
>>>>
>>>> why 0???
