I've inspected the code: our approach won't work with booleanData=false. We are calculating item similarity the wrong way...(((

Thank you.

1. We provide a "fake" user_id and pass --usersFile in order to get recommendations for the "fake" user_id, where user_id is a negative item_id. It worked when we provided user_id->item_id pairs without preference values.
2. Our target is to get item similarities. We tried org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob, but it returns poor results compared to RecommenderJob with our "fake" user_id (inverted item_id).
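For the stated goal (item similarities), Peng's second suggestion further down in the thread is to pass --outputPathForSimilarityMatrix to RecommenderJob, and Serega asks how to pass it. A minimal sketch, assuming the installed Mahout's RecommenderJob accepts that option (it is defined in the RecommenderJob source linked in the thread; older builds may reject it): it is given in the same `--name value` form as `--output`. The helper `recommender_cmd` and the paths `visited_items`, `result` and `item_similarity` are hypothetical; the input is assumed to no longer contain the fake negative users.

```shell
#!/usr/bin/env bash
set -euo pipefail

# recommender_cmd INPUT OUTPUT SIMILARITY_OUT
# Prints (does not run) a RecommenderJob command line that also asks for the
# item-item similarity matrix via --outputPathForSimilarityMatrix, passed in the
# same "--name value" form as --output. Assumes the installed Mahout supports
# that option; older builds may reject it.
recommender_cmd() {
  local input=$1 output=$2 similarity_out=$3
  local -a cmd=(
    mahout recommenditembased
    --input "$input"
    --output "$output"
    --similarityClassname SIMILARITY_LOGLIKELIHOOD
    --outputPathForSimilarityMatrix "$similarity_out"
    --maxSimilaritiesPerItem 500
    --tempDir temp
  )
  # "${cmd[*]}" joins the words with single spaces
  printf '%s\n' "${cmd[*]}"
}

# Hypothetical paths; review the printed line, then run it (e.g. via sudo -u oozie).
recommender_cmd visited_items result item_similarity
```

Printing the command first makes it easy to check the flag spelling before launching a long Hadoop run.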
1. I'll try the option you provided.
2. I will remove the input with fake user_ids and the usersFile with these fake ids.
3. I don't understand how to pass the --outputPathForSimilarityMatrix option to RecommenderJob:
   https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java

2014-07-21 4:58 GMT+04:00 Peng Zhang <[email protected]>:

> Serega,
>
> I have two comments:
> 1. Don’t use negative user ids. Since Mahout uses user id as well as item
> id as the row/column index, you’d better use 0, 1, 2, etc. as ids.
> 2. If you want to get the item similarity information, you can use
> --outputPathForSimilarityMatrix in the command.
>
> Regards,
> Peng Zhang
> M: +86 186-1658-7856
> [email protected]
>
> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <[email protected]>
> wrote:
>
> > All bad things happen here:
> >
> > Name: RecommenderJob-PartialMultiplyMapper-Reducer
> > User: oozie
> > Process User: oozie
> > Group: oozie
> > Mapper Class: PartialMultiplyMapper
> > Reducer Class: AggregateAndRecommendReducer
> > Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> > Job Output Directory: hdfs://nameservice1/itemrec/output/
> >
> > 14/07/20 23:57:47 INFO mapred.JobClient: Map input records=3312879
> > 14/07/20 23:57:47 INFO mapred.JobClient: Map output records=3313251
> > 14/07/20 23:57:47 INFO mapred.JobClient: Reduce input records=3313251
> > 14/07/20 23:57:47 INFO mapred.JobClient: Reduce output records=0
> >
> > Why does Mahout return 0 rows? It works when booleanData=true (preferences
> > are ignored...?)
> >
> > 2014-07-20 23:19 GMT+04:00 Serega Sheypak <[email protected]>:
> >
> >> The version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> >> users_file:
> >> --inverted_item_id
> >> -1
> >> -2
> >> -3
> >> -4
> >>
> >> users_items_prefs
> >> --inverted item_id
> >> -1 1 1.0
> >> -2 2 1.0
> >> -3 3 1.0
> >> -4 4 1.0
> >> --user_id item_id pref_value
> >> 11 1 1.6
> >> 11 2 1.6
> >> 123 3 2.0
> >> 123 4 2.0
> >> 333 1 2.0
> >> 333 2 1.6
> >> --etc.
> >>
> >> If I set --booleanData true,
> >> then Mahout returns the result.
> >>
> >> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <[email protected]>:
> >>
> >>> I'm confused about how you're constructing the user file, and why there
> >>> are negated item ids here.
> >>>
> >>> Can you post some more details please, including Mahout version and some
> >>> sample data sets?
> >>>
> >>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <[email protected]> wrote:
> >>>>
> >>>> Hi, I'm trying to compute item similarity.
> >>>> I gather the items which users visit during shopping and then create a file:
> >>>> user_id, item_id, weight (where weight can be one of [1.0, 1.6, 1.9], depending
> >>>> on the user action type and data source)
> >>>> UNION
> >>>> -item_id, item_id, 1.0 (from the items dictionary)
> >>>>
> >>>> and I provide a usersFile, where user_id = -item_id.
> >>>>
> >>>> The idea is to get item similarity. If any user visits an item named "A", I want
> >>>> to show him items "B", "c", "xxx" using the preferences of other users.
> >>>>
> >>>> The problem is that the last (???) mapreduce job returns 0 rows.
> >>>>
> >>>> Here are my settings:
> >>>>
> >>>> sudo -u oozie mahout recommenditembased \
> >>>>   --input visited_items_with_inverted_items \
> >>>>   --output result \
> >>>>   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>   --usersFile inverted_items \
> >>>>   --numRecommendations 500 \
> >>>>   --booleanData false \
> >>>>   --maxPrefsPerUser 100 \
> >>>>   --maxSimilaritiesPerItem 500 \
> >>>>   --minPrefsPerUser 0 \
> >>>>   --maxPrefsPerUserInItemSimilarity 30 \
> >>>>   --threshold 0.91 \
> >>>>   --tempDir temp
> >>>>
> >>>> Some counters... I don't understand what they mean...
> >>>>
> >>>> 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>> 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
> >>>>
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
> >>>>
> >>>> 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>> 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
> >>>>
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
> >>>>
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
> >>>>
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
> >>>>
> >>>> --------
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
> >>>> --------
> >>>>
> >>>> Why 0???
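Peng's first comment in the thread asks for ids 0, 1, 2, ... instead of negative ids, since Mahout uses the ids as row/column indices. A minimal sketch of that remapping, assuming whitespace-separated `user_id item_id pref_value` lines like the sample in the thread, with the fake negative users already removed as Serega planned. The function name `remap_ids`, the comma-separated output and the two dictionary files are illustrative choices, not something Mahout requires; the dictionaries only exist so that results can be translated back to the original ids.

```shell
#!/usr/bin/env bash
set -euo pipefail

# remap_ids IN OUT USER_DICT ITEM_DICT   (hypothetical helper)
#   IN:        whitespace-separated "user_id item_id pref_value" lines
#   OUT:       "user_idx,item_idx,pref_value" with dense 0-based ids
#   USER_DICT: "original_user_id,user_idx", one line per distinct user
#   ITEM_DICT: "original_item_id,item_idx", one line per distinct item
# Users and items are numbered independently, in order of first appearance.
remap_ids() {
  local in=$1 out=$2 user_dict=$3 item_dict=$4
  awk -v out="$out" -v udict="$user_dict" -v idict="$item_dict" '
    BEGIN { nu = 0; ni = 0 }                 # next free user / item index
    NF < 3 || $1 ~ /^--/ { next }            # skip short lines and "--" annotations
    {
      if (!($1 in uidx)) { uidx[$1] = nu++; print $1 "," uidx[$1] > udict }
      if (!($2 in iidx)) { iidx[$2] = ni++; print $2 "," iidx[$2] > idict }
      print uidx[$1] "," iidx[$2] "," $3 > out
    }
  ' "$in"
}

# Run only when called with the four paths, e.g.:
#   ./remap_ids.sh visited_items.txt input.csv users.dict items.dict
if [[ $# -eq 4 ]]; then
  remap_ids "$@"
fi
```

The remapped file would then be passed to --input, and the item dictionary used to turn the indices in Mahout's output back into real item ids.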
