Hm, that's rather confusing. Which input are you talking about: the input for org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob or the input for org.apache.mahout.cf.taste.hadoop.item.RecommenderJob?
My goal is to get item-item similarity. Right now ItemSimilarityJob returns only a few similarities.
I'm reading this:
https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
and this:
https://mahout.apache.org/users/recommender/userbased-5-minutes.html
Neither page says anything like "Your IDs must be in the range from 0 to the number of rows", for items or for users. Where does this requirement come from?


2014-07-25 23:57 GMT+04:00 Pat Ferrel <[email protected]>:

> I think I did explain below. Your IDs must be in the range from 0 to the
> number of rows - 1 and the same for item IDs. This is done by taking your
> application specific IDs and mapping them to sequential non-negative
> Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> your own code.
>
> For example imagine input of the form
> -92, abc, 1.0
> 75000x, jkl, 2.0
>
> Your first user ID is -92, give it Mahout ID = 0. For your next user ID
> 75000x give it Mahout ID = 1
> Your first item ID is abc, give it Mahout ID = 0. For your next item ID
> jkl give it Mahout ID = 1
> keep doing this the first time you see a unique id from your input. A Map
> will do this for you.
>
> And so on. Then the input to Mahout would be:
> 0,0,1.0
> 1,1,2.0
>
> The output will have Mahout IDs too so you need to map recommendations for
> Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
>
>
> On Jul 25, 2014, at 11:55 AM, Serega Sheypak <[email protected]> wrote:
>
> I'm preparing data using Apache Hive: user_id:long, item_id:long,
> preference[1.0, 2.0]
> I don't understand "For most Mahout jobs you have to prepare your data to
> have Mahout IDs". What are "Mahout IDs"? I tried to follow the Mahout site
> docs, but I didn't find anything there related to Mahout IDs.
> Please explain.
>
>
> 2014-07-25 22:39 GMT+04:00 Pat Ferrel <[email protected]>:
>
> > Sorry I haven’t read this thread carefully but it looks like you may be
> > using the wrong IDs.
> >
> > For most Mahout jobs you have to prepare your data to have Mahout IDs. You
> > do this by looking at each datum and as you see a new unique application
> > specific user or item ID you give it a Mahout ID starting from 0. So Mahout
> > IDs can be thought of as row and column numbers in a matrix. The Mahout IDs
> > for rows will be 0 thru # of rows-1, same for columns.
> >
> > This always requires that you translate into Mahout IDs then after the job
> > is run translate back into your application IDs. You need a bi-directional
> > dictionary of some type. I use a HashBiMap from Guava.
> >
> > Also I’d avoid the threshold for now. If you get that wrong it will mess
> > things up badly and is very hard to tune. It’s there for completeness but I
> > never use it.
> >
> >
> > On Jul 25, 2014, at 12:55 AM, Serega Sheypak <[email protected]> wrote:
> >
> > Hi, nothing helps...
> > I do use mahout 0.9 compiled for CDH 4.7
> > I do provide only positive values
> > I do use itemsimilarityJob and do get 2000 similarities for 1400 unique
> > items
> > Input data is:
> > 16*10^6 preferences
> > 4*10^6 users
> > 0.6*10^ items
> > I do use Pearson correlation and preference values are: 1.0 and 2.0
> >
> >
> > 2014-07-22 9:32 GMT+04:00 Serega Sheypak <[email protected]>:
> >
> >> Ok, I have recompiled mahout 0.9 for CDH 4.7. I'll try this evening.
> >> Right now I don't see how it can help me. As far as I know the stuff I try
> >> to use is pretty old and stable.
> >> Looks like I apply it in a wrong way.
> >>
> >> There is an option for recommenditembased named "--threshold". I do
> >> provide data for recommenditembased with preference values in range
> >> [1.1..2.0].
> >> I set --threshold to 1.2
> >> Is --threshold absolute, so it can be from [1.1 .. 2+], or is it relative,
> >> so it can be [0.0 .. 0.99999]?
> >>
> >>
> >> 2014-07-22 3:54 GMT+04:00 Ted Dunning <[email protected]>:
> >>
> >>> That version is no longer supported. You should upgrade to 0.9
> >>>
> >>>
> >>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <[email protected]> wrote:
> >>>
> >>>> 0.7-cdh4.7.0
> >>>> Anyway, recommenditembased does produce these directories:
> >>>>
> >>>> /recommenditembased/temp/maxValues.bin
> >>>> /recommenditembased/temp/norms.bin
> >>>> /recommenditembased/temp/numNonZeroEntries.bin
> >>>> /recommenditembased/temp/pairwiseSimilarity
> >>>> /recommenditembased/temp/partialMultiply
> >>>> /recommenditembased/temp/prePartialMultiply1
> >>>> /recommenditembased/temp/prePartialMultiply2
> >>>> /recommenditembased/temp/preparePreferenceMatrix
> >>>> /recommenditembased/temp/similarityMatrix
> >>>> /recommenditembased/temp/weights
> >>>>
> >>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
> >>>> need. Right now I try to read it using
> >>>>
> >>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
> >>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
> >>>> '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
> >>>> '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
> >>>> ) as (intId: int, vector:tuple(cardinality:int,
> >>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
> >>>>
> >>>>
> >>>> Looks like the vector is empty... Or I do something wrong.
> >>>>
> >>>>
> >>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <[email protected]>:
> >>>>
> >>>>> Which version of Mahout?
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <[email protected]> wrote:
> >>>>>
> >>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while processing
> >>>>>> Job-Specific
> >>>>>>
> >>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
> >>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
> >>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>> --input \
> >>>>>> hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
> >>>>>> --output \
> >>>>>> hdfs://nameservice1/recommenditembased/output \
> >>>>>> --similarityClassname \
> >>>>>> SIMILARITY_LOGLIKELIHOOD \
> >>>>>> --numRecommendations \
> >>>>>> 500 \
> >>>>>> --booleanData \
> >>>>>> false \
> >>>>>> --maxPrefsPerUser \
> >>>>>> 1000 \
> >>>>>> --maxSimilaritiesPerItem \
> >>>>>> 1000 \
> >>>>>> --minPrefsPerUser \
> >>>>>> 5 \
> >>>>>> --maxPrefsPerUserInItemSimilarity \
> >>>>>> 30 \
> >>>>>> --threshold \
> >>>>>> 1.1 \
> >>>>>> --tempDir \
> >>>>>> hdfs://nameservice1/recommenditembased/temp \
> >>>>>> --outputPathForSimilarityMatrix \
> >>>>>> hdfs://nameservice1/recommenditembased/sim_matrix
> >>>>>>
> >>>>>>
> >>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported.
> >>>>>>
> >>>>>>
> >>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <[email protected]>:
> >>>>>>
> >>>>>>> Serega,
> >>>>>>>
> >>>>>>> See the last line on how to pass the outputPathForSimilarityMatrix option to
> >>>>>>> the recommenditembased command:
> >>>>>>>
> >>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>> --input visited_items_with_inverted_items \
> >>>>>>> --output result \
> >>>>>>> --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>> --usersFile inverted_items \
> >>>>>>> --numRecommendations 500 \
> >>>>>>> --booleanData false \
> >>>>>>> --maxPrefsPerUser 100 \
> >>>>>>> --maxSimilaritiesPerItem 500 \
> >>>>>>> --minPrefsPerUser 0 \
> >>>>>>> --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>> --threshold 0.91 \
> >>>>>>> --tempDir temp \
> >>>>>>> --outputPathForSimilarityMatrix similarityMatri
> >>>>>>>
> >>>>>>>
> >>>>>>> Peng Zhang
> >>>>>>> [email protected]
> >>>>>>>
> >>>>>>>
> >>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> I've inspected the code, our approach wouldn't work with booleanData=false.
> >>>>>>>> We calculate item similarity in the wrong way...(((
> >>>>>>>> Thank you
> >>>>>>>> 1. We provide a "fake" user_id and provide --usersFile in order to get
> >>>>>>>> recommendations for the "fake" user_id, where user_id is a negative item_id. It
> >>>>>>>> worked when we did provide user_id->item_id pairs without preference.
> >>>>>>>> 2. Our target is to get item similarities. We tried
> >>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but it
> >>>>>>>> returns a bad result compared to RecommenderJob with our "fake" user_id
> >>>>>>>> (inverted item_id)
> >>>>>>>>
> >>>>>>>> 1. I'll try the option you provided.
> >>>>>>>> 2. I will remove input with fake user_id and usersFile with these fake ids
> >>>>>>>>
> >>>>>>>> 3.
> >>>>>>>> https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> >>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix option to
> >>>>>>>> RecommenderJob
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <[email protected]>:
> >>>>>>>>
> >>>>>>>>> Serega,
> >>>>>>>>>
> >>>>>>>>> I have two comments:
> >>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as well as item
> >>>>>>>>> id as the row/column index, you’d better use 0, 1, 2, etc as ids
> >>>>>>>>> 2. If you want to get the item similarity information, you can use
> >>>>>>>>> --outputPathForSimilarityMatrix in the command
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Peng Zhang
> >>>>>>>>> M: +86 186-1658-7856
> >>>>>>>>> [email protected]
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> All bad things happen here:
> >>>>>>>>>>
> >>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> >>>>>>>>>> User: oozie
> >>>>>>>>>> Process User: oozie
> >>>>>>>>>> Group: oozie
> >>>>>>>>>> Mapper Class: PartialMultiplyMapper
> >>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
> >>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> >>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
> >>>>>>>>>>
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Map input records=3312879
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Map output records=3313251
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce input records=3313251
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce output records=0
> >>>>>>>>>>
> >>>>>>>>>> Why does mahout return 0 rows? It works when booleanData=true (preferences
> >>>>>>>>>> are ignored...?)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <[email protected]>:
> >>>>>>>>>>
> >>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> >>>>>>>>>>> users_file:
> >>>>>>>>>>> --inverted_item_id
> >>>>>>>>>>> -1
> >>>>>>>>>>> -2
> >>>>>>>>>>> -3
> >>>>>>>>>>> -4
> >>>>>>>>>>>
> >>>>>>>>>>> users_items_prefs
> >>>>>>>>>>> --inverted item_id
> >>>>>>>>>>> -1 1 1.0
> >>>>>>>>>>> -2 2 1.0
> >>>>>>>>>>> -3 3 1.0
> >>>>>>>>>>> -4 4 1.0
> >>>>>>>>>>> --user_id item_id pref_value
> >>>>>>>>>>> 11 1 1.6
> >>>>>>>>>>> 11 2 1.6
> >>>>>>>>>>> 123 3 2.0
> >>>>>>>>>>> 123 4 2.0
> >>>>>>>>>>> 333 1 2.0
> >>>>>>>>>>> 333 2 1.6
> >>>>>>>>>>> --etc.
> >>>>>>>>>>>
> >>>>>>>>>>> if I set --booleanData true
> >>>>>>>>>>> then mahout returns the result.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <[email protected]>:
> >>>>>>>>>>>
> >>>>>>>>>>>> I'm confused about how you're constructing the user file, and why there
> >>>>>>>>>>>> are negated item ids here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can you post some more details please, including Mahout version and some
> >>>>>>>>>>>> sample data sets?
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, I'm trying to create item similarity.
> >>>>>>>>>>>>> I gather items which users visit during shopping and then create a file:
> >>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
> >>>>>>>>>>>>> user action type and data source)
> >>>>>>>>>>>>> UNION
> >>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> and I do provide a usersFile, where user_id = -item_id
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The idea is to get item similarity. If any user visits item named "A", I want
> >>>>>>>>>>>>> to show him items "B", "c", "xxx" using preferences of other users.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here are my settings:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>>>>>>> --input visited_items_with_inverted_items \
> >>>>>>>>>>>>> --output result \
> >>>>>>>>>>>>> --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>>>>>>> --usersFile inverted_items \
> >>>>>>>>>>>>> --numRecommendations 500 \
> >>>>>>>>>>>>> --booleanData false \
> >>>>>>>>>>>>> --maxPrefsPerUser 100 \
> >>>>>>>>>>>>> --maxSimilaritiesPerItem 500 \
> >>>>>>>>>>>>> --minPrefsPerUser 0 \
> >>>>>>>>>>>>> --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>>>>>>>> --threshold 0.91 \
> >>>>>>>>>>>>> --tempDir temp
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Some counters... I don't get what they mean....
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_NEGLECTED=1,798,738
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: USER_RATINGS_USED=12,429,693
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: COOCCURRENCES=35882374
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: PRUNED_COOCCURRENCES=0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map output records=17570268
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input records=5221907
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output records=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output records=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map input records=7528530
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map output records=3313251
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input records=3313251
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output records=3313251
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map input records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map output records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output records=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map output records=3313251
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input records=3313251
> >>>>>>>>>>>>> --------
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce output records=0
> >>>>>>>>>>>>> --------
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> why 0???
> >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >> > >> > > > > > >
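
For reference, the ID translation that Pat Ferrel describes in the quoted messages (give each application-specific user or item ID a sequential integer starting from 0 the first time it is seen, and keep the dictionary so results can be translated back) could be sketched roughly as below. This is only an illustration under assumptions: the class and function names (IdDictionary, encode_preferences, to_csv) are made up for this note and are not part of Mahout, the third sample row is hypothetical, and the sketch says nothing about which Mahout jobs actually need this step.

```python
import csv
import io


class IdDictionary:
    """Hypothetical helper: assigns 0, 1, 2, ... to IDs in first-seen order."""

    def __init__(self):
        self._to_index = {}      # application ID -> sequential ID
        self._to_external = []   # sequential ID -> application ID

    def index_of(self, external_id):
        """Return the sequential ID, creating one on first sight."""
        index = self._to_index.get(external_id)
        if index is None:
            index = len(self._to_external)
            self._to_index[external_id] = index
            self._to_external.append(external_id)
        return index

    def external_of(self, index):
        """Translate a sequential ID back to the application ID."""
        return self._to_external[index]


def encode_preferences(rows, users, items):
    """Yield (user_index, item_index, preference) for each input triple."""
    for user_id, item_id, preference in rows:
        yield users.index_of(user_id), items.index_of(item_id), float(preference)


def to_csv(triples):
    """Render triples as comma-separated lines, one triple per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for triple in triples:
        writer.writerow(triple)
    return buffer.getvalue()


users, items = IdDictionary(), IdDictionary()
raw = [
    ("-92", "abc", "1.0"),     # from the example in the thread
    ("75000x", "jkl", "2.0"),  # from the example in the thread
    ("-92", "jkl", "2.0"),     # hypothetical extra row: a repeated user ID
]
encoded = list(encode_preferences(raw, users, items))
print(to_csv(encoded), end="")
```

After a job has run, users.external_of(...) and items.external_of(...) translate the integer IDs in the output back to the application IDs. Two separate dictionaries are used because user IDs and item IDs are numbered independently in the example from the thread.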
