Thank you! I could have spent my whole life trying to get a result without knowing the requirements for the input data.
BTW: we used mahout 0.7-cdh-4.4...cdh 4.7 org.apache.mahout.cf.taste.hadoop.item.RecommenderJob and did get results close to reality. We just provided long user_id, item_id and didn't do anything special. Why did it work?

2014-07-27 5:18 GMT+04:00 Pat Ferrel <[email protected]>:

> Both those jobs require you create Mahout IDs for users and items. For
> most Hadoop based Mahout jobs, taking either text input or sequence files,
> the IDs must follow the rules mentioned below. There are a few exceptions
> but none you are using. The Wiki was rewritten for 0.9 and so the ID
> requirements may not be documented well. You can file a Jira so someone
> documents this.
>
> BTW spark-itemsimilarity will take any IDs and can read any text-delimited
> file format, unfortunately it’s not quite ready yet.
>
> On Jul 26, 2014, at 3:14 AM, Serega Sheypak <[email protected]>
> wrote:
>
> Hm... rather confusing... You are talking about input for:
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> or
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>
> My target is to get item-item similarity. ItemSimilarityJob right now
> returns few similarities.
>
> I'm reading this:
> https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
> and that:
> https://mahout.apache.org/users/recommender/userbased-5-minutes.html
>
> I don't see anything there about "Your IDs must be in the range from 0 to
> the number of rows" for both items and users. Where does this requirement
> come from?
>
>
> 2014-07-25 23:57 GMT+04:00 Pat Ferrel <[email protected]>:
>
> > I think I did explain below. Your IDs must be in the range from 0 to the
> > number of rows - 1 and the same for item IDs. This is done by taking your
> > application specific IDs and mapping them to sequential non-negative
> > Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> > your own code.
> >
> > For example imagine input of the form
> > -92, abc, 1.0
> > 75000x, jkl, 2.0
> >
> > Your first user ID is -92, give it Mahout ID = 0. For your next user ID
> > 75000x give it Mahout ID = 1
> > Your first item ID is abc, give it Mahout ID = 0. For your next item ID
> > jkl give it Mahout ID = 1
> > keep doing this the first time you see a unique id from your input. A Map
> > will do this for you.
> >
> > And so on. Then the input to Mahout would be:
> > 0,0,1.0
> > 1,1,2.0
> >
> > The output will have Mahout IDs too so you need to map recommendations for
> > Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
> >
> >
> > On Jul 25, 2014, at 11:55 AM, Serega Sheypak <[email protected]>
> > wrote:
> >
> > I'm preparing data using apache hive: user_id:long, item_id:long,
> > preference[1.0, 2.0]
> > I don't understand "For most Mahout jobs you have to prepare your data to
> > have Mahout IDs". What is "Mahout IDs"? I try to follow mahout site docs, I
> > didn't find anything there related to mahout ids.
> > Please explain.
> >
> >
> > 2014-07-25 22:39 GMT+04:00 Pat Ferrel <[email protected]>:
> >
> >> Sorry I haven’t read this thread carefully but it looks like you may be
> >> using the wrong IDs.
> >>
> >> For most Mahout jobs you have to prepare your data to have Mahout IDs. You
> >> do this by looking at each datum and as you see a new unique application
> >> specific user or item ID you give it a Mahout ID starting from 0. So Mahout
> >> ID can be thought of as row and column numbers in a matrix. The Mahout IDs
> >> for rows will be 0 thru # of rows-1 same for columns.
> >>
> >> This always requires that you translate into Mahout IDs then after the job
> >> is run translate back into your application IDs. You need a bi-directional
> >> dictionary of some type. I use a HashBiMap from Guava.
> >>
> >> Also I’d avoid the threshold for now.
> >> If you get that wrong it will mess
> >> things up badly and is very hard to tune. It’s there for completeness but I
> >> never use it.
> >>
> >>
> >> On Jul 25, 2014, at 12:55 AM, Serega Sheypak <[email protected]>
> >> wrote:
> >>
> >> Hi, nothing helps...
> >> I do use mahout 0.9 compiled for CDH 4.7
> >> I do provide only positive values
> >> I do use itemsimilarityJob and do get 2000 similarities for 1400 unique
> >> items
> >> Input data is:
> >> 16*10^6 preferences
> >> 4*10^6 users
> >> 0.6*10^ items
> >> I do use Pearson correlation and preference values are: 1.0 and 2.0
> >>
> >>
> >> 2014-07-22 9:32 GMT+04:00 Serega Sheypak <[email protected]>:
> >>
> >>> Ok, I have recompiled mahout 0.9 for CDH 4.7. I'll try this evening.
> >>> Right now I don't see how it can help me. As far as I know the stuff I try
> >>> to use is pretty old and stable.
> >>> looks like I do apply it in a wrong way.
> >>>
> >>> There is an option for recommenditembased named "--threshold". I do
> >>> provide data for recommenditembased with preference values in range
> >>> [1.1..2.0].
> >>> I set --threshold to 1.2
> >>> --threshold is absolute and can be from [1.1 . .2+] or it's relative and
> >>> can be [0.0 .. 0.99999]?
> >>>
> >>>
> >>> 2014-07-22 3:54 GMT+04:00 Ted Dunning <[email protected]>:
> >>>
> >>> That version is no longer supported.
You should upgrade to 0.9 > >>>> > >>>> > >>>> > >>>> > >>>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak < > >>>> [email protected]> > >>>> wrote: > >>>> > >>>>> 0.7-cdh4.7.0 > >>>>> Anyway, recommenditembased does produce these catalogs: > >>>>> > >>>>> /recommenditembased/temp/maxValues.bin > >>>>> /recommenditembased/temp/norms.bin > >>>>> /recommenditembased/temp/numNonZeroEntries.bin > >>>>> /recommenditembased/temp/pairwiseSimilarity > >>>>> /recommenditembased/temp/partialMultiply > >>>>> /recommenditembased/temp/prePartialMultiply1 > >>>>> /recommenditembased/temp/prePartialMultiply2 > >>>>> /recommenditembased/temp/preparePreferenceMatrix > >>>>> /recommenditembased/temp/similarityMatrix > >>>>> /recommenditembased/temp/weights > >>>>> > >>>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the > > thing > >>>> In > >>>>> eed. Right now I try to read it using > >>>>> > >>>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING > >>>>> com.twitter.elephantbird.pig.load.SequenceFileLoader( > >>>>> '-c com.twitter.elephantbird.pig.util.IntWritableConverter', > >>>>> '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter' > >>>>> ) as (intId: int, vector:tuple(cardinality:int, > >>>>> entries:bag{t:tuple(some_id:long, some_value:double)})); > >>>>> > >>>>> > >>>>> Looks like the vector is empty... Or i do something wrong. > >>>>> > >>>>> > >>>>> > >>>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <[email protected]>: > >>>>> > >>>>>> Which version of Mahout? 
> >>>>>> > >>>>>> > >>>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak < > >>>>> [email protected] > >>>>>>> > >>>>>> wrote: > >>>>>> > >>>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while > >>>>>> processing > >>>>>>> Job-Specific > >>>>>>> > >>>>>>> sudo -u hdfs hadoop fs -rm -r > >>>>>> hdfs://nameservice1/recommenditembased/output > >>>>>>> sudo -u hdfs hadoop fs -rm -r > >>>>> hdfs://nameservice1/recommenditembased/temp > >>>>>>> sudo -u oozie mahout recommenditembased \ > >>>>>>> --input \ > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > > > hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks > >>>>>>> \ > >>>>>>> --output \ > >>>>>>> hdfs://nameservice1/recommenditembased/output \ > >>>>>>> --similarityClassname \ > >>>>>>> SIMILARITY_LOGLIKELIHOOD \ > >>>>>>> --numRecommendations \ > >>>>>>> 500 \ > >>>>>>> --booleanData \ > >>>>>>> false \ > >>>>>>> --maxPrefsPerUser \ > >>>>>>> 1000 \ > >>>>>>> --maxSimilaritiesPerItem \ > >>>>>>> 1000 \ > >>>>>>> --minPrefsPerUser \ > >>>>>>> 5 \ > >>>>>>> --maxPrefsPerUserInItemSimilarity \ > >>>>>>> 30 \ > >>>>>>> --threshold \ > >>>>>>> 1.1 \ > >>>>>>> --tempDir \ > >>>>>>> hdfs://nameservice1/recommenditembased/temp \ > >>>>>>> --outputPathForSimilarityMatrix \ > >>>>>>> > >>>> hdfs://nameservice1/recommenditembased/sim_matrix > >>>>>>> > >>>>>>> > >>>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported. 
> >>>>>>> > >>>>>>> > >>>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <[email protected]>: > >>>>>>> > >>>>>>>> Serega, > >>>>>>>> > >>>>>>>> See the last line on how to pass outputPathForSimilarityMatrix > >>>>> options > >>>>>> to > >>>>>>>> the recommenditembased command: > >>>>>>>> > >>>>>>>> sudo -u oozie mahout recommenditembased \ > >>>>>>>> --input visited_items_with_inverted_items \ > >>>>>>>> > >>>>>>>> --output result \ > >>>>>>>> --similarityClassname SIMILARITY_LOGLIKELIHOOD > >>>> \ > >>>>>>>> --usersFile inverted_items \ > >>>>>>>> --numRecommendations 500 \ > >>>>>>>> --booleanData false \ > >>>>>>>> --maxPrefsPerUser 100 \ > >>>>>>>> --maxSimilaritiesPerItem 500 \ > >>>>>>>> --minPrefsPerUser 0\ > >>>>>>>> --maxPrefsPerUserInItemSimilarity 30 \ > >>>>>>>> --threshold 0.91 \ > >>>>>>>> --tempDir temp \ > >>>>>>>> --outputPathForSimilarityMatrix > >>>> similarityMatri \ > >>>>>>>> > >>>>>>>> > >>>>>>>> Peng Zhang > >>>>>>>> [email protected] > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak < > >>>>> [email protected]> > >>>>>>>> wrote: > >>>>>>>> > >>>>>>>>> I've inspected the code, our approach wouldn't work with > >>>>>>>> booleanData=false. > >>>>>>>>> We do calcualte imte similarity in the wrong way...((( > >>>>>>>>> Thank you > >>>>>>>>> 1. We provide "fake" user_id and provide --usersFile in order to > >>>>> get > >>>>>>>>> recommendations for "fake user_id, where user_id is a negative > >>>>>> item_id. > >>>>>>>> It > >>>>>>>>> worked when we did provide user_id->item_id pairs without > >>>>> preference. > >>>>>>>>> 2. Our target is to get item similarities. We tried > >>>>>>>>> > >>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob > >>>>>> but > >>>>>>>> it > >>>>>>>>> returns bad result comparing to RecommenderJob with our "fake" > >>>>>> user_id > >>>>>>>>> (inverted item_id) > >>>>>>>>> > >>>>>>>>> 1. I'll try the option you provided. > >>>>>>>>> 2. 
I will remove input with fake user_id and usersFile with > >>>> these > >>>>>> fake > >>>>>>>> ids > >>>>>>>>> > >>>>>>>>> 3. > >>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > > > https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java > >>>>>>>>> I don't understand how to pass ---outputPathForSimilarityMatrix > >>>>>> option > >>>>>>> to > >>>>>>>>> RecommenderJob > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <[email protected]>: > >>>>>>>>> > >>>>>>>>>> Seraga, > >>>>>>>>>> > >>>>>>>>>> I have two comments: > >>>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as > >>>> well > >>>>> as > >>>>>>>> item > >>>>>>>>>> id as the row/column index, you’d better use 0, 1, 2, etc as > >>>> ids > >>>>>>>>>> 2. If you want to get the item similarity information, you can > >>>> use > >>>>>>>>>> --outputPathForSimilarityMatrix in the command > >>>>>>>>>> > >>>>>>>>>> Regards, > >>>>>>>>>> Peng Zhang > >>>>>>>>>> M: +86 186-1658-7856 > >>>>>>>>>> [email protected] > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak < > >>>>>> [email protected] > >>>>>>>> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> All bad things happen here: > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Name > >>>>>>>>>>> > >>>>>>>>>>> RecommenderJob-PartialMultiplyMapper-Reducer > >>>>>>>>>>> > >>>>>>>>>>> User > >>>>>>>>>>> > >>>>>>>>>>> oozie > >>>>>>>>>>> > >>>>>>>>>>> Process User > >>>>>>>>>>> > >>>>>>>>>>> oozie > >>>>>>>>>>> > >>>>>>>>>>> Group > >>>>>>>>>>> > >>>>>>>>>>> oozie > >>>>>>>>>>> > >>>>>>>>>>> Mapper Class > >>>>>>>>>>> > >>>>>>>>>>> PartialMultiplyMapper > >>>>>>>>>>> > >>>>>>>>>>> Reducer Class > >>>>>>>>>>> > >>>>>>>>>>> AggregateAndRecommendReducer > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Job Input Directory > >>>>>>>>>>> > >>>>>>>>>>> 
hdfs://nameservice1/itemrec/temp/partialMultiply > >>>>>>>>>>> > >>>>>>>>>>> Job Output Directory > >>>>>>>>>>> > >>>>>>>>>>> hdfs://nameservice1/itemrec/output/ > >>>>>>>>>>> > >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Map input > >>>>>>> records=3312879 > >>>>>>>>>>> > >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Map output > >>>>>>> records=3313251 > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce input > >>>>>>>> records=3313251 > >>>>>>>>>>> > >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient: Reduce output > >>>>>> records=0 > >>>>>>>>>>> > >>>>>>>>>>> Why does mahout returns 0 rows? it works when booleanData=true > >>>>>>>>>> (preferences > >>>>>>>>>>> are ignored...?) > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak < > >>>>>> [email protected] > >>>>>>>> : > >>>>>>>>>>> > >>>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40 > >>>>>>>>>>>> users_file: > >>>>>>>>>>>> --inverted_item_id > >>>>>>>>>>>> -1 > >>>>>>>>>>>> -2 > >>>>>>>>>>>> -3 > >>>>>>>>>>>> -4 > >>>>>>>>>>>> > >>>>>>>>>>>> users_items_prefs > >>>>>>>>>>>> --inverted item_id > >>>>>>>>>>>> -1 1 1.0 > >>>>>>>>>>>> -2 2 1.0 > >>>>>>>>>>>> -3 3 1.0 > >>>>>>>>>>>> -4 4 1.0 > >>>>>>>>>>>> --user_id item_id pref_value > >>>>>>>>>>>> 11 1 1.6 > >>>>>>>>>>>> 11 2 1.6 > >>>>>>>>>>>> 123 3 2.0 > >>>>>>>>>>>> 123 4 2.0 > >>>>>>>>>>>> 333 1 2.0 > >>>>>>>>>>>> 333 2 1.6 > >>>>>>>>>>>> --e.t.c. > >>>>>>>>>>>> > >>>>>>>>>>>> if I set --booleanData true > >>>>>>>>>>>> then mahout returns the result. > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman < > >>>>>>>> [email protected] > >>>>>>>>>>> : > >>>>>>>>>>>> > >>>>>>>>>>>> I'm confused about how you're constructing the user file, and > >>>>> why > >>>>>>>> there > >>>>>>>>>>>>> are negated item ids here. 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Can you post some more details please, including Mahout > >>>> version > >>>>>> and > >>>>>>>>>> some > >>>>>>>>>>>>> sample data sets? > >>>>>>>>>>>>> > >>>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak < > >>>>>>>>>> [email protected]> > >>>>>>>>>>>>> wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Hi, I'm trying to create item similarity. > >>>>>>>>>>>>>> I gather items which users visit during shopping and then > >>>>>> create a > >>>>>>>>>> file: > >>>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, > >>>>> 1.9], > >>>>>>>>>> depends > >>>>>>>>>>>>> on > >>>>>>>>>>>>>> user action type and data source) > >>>>>>>>>>>>>> UNION > >>>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary) > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> and I do provide a userFile, where user_id = -item_id > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The idea is to get item similary. If any user visits item > >>>>> named > >>>>>>>> "A", i > >>>>>>>>>>>>> want > >>>>>>>>>>>>>> to show him items "B", "c", "xxx" using preferences of > >>>> other > >>>>>>> users. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 > >>>>> rows: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Here are my settings: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \ > >>>>>>>>>>>>>> --input visited_items_with_inverted_items > >>>> \ > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> --output result \ > >>>>>>>>>>>>>> --similarityClassname > >>>>> SIMILARITY_LOGLIKELIHOOD > >>>>>> \ > >>>>>>>>>>>>>> --usersFile inverted_items \ > >>>>>>>>>>>>>> --numRecommendations 500 \ > >>>>>>>>>>>>>> --booleanData false \ > >>>>>>>>>>>>>> --maxPrefsPerUser 100 \ > >>>>>>>>>>>>>> --maxSimilaritiesPerItem 500 \ > >>>>>>>>>>>>>> --minPrefsPerUser 0\ > >>>>>>>>>>>>>> --maxPrefsPerUserInItemSimilarity 30 \ > >>>>>>>>>>>>>> --threshold 0.91 \ > >>>>>>>>>>>>>> --tempDir temp \ > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Some counters... 
I don't get what do they mean.... > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient: > >>>>>>>>>>>>>> > >>>>>>> > org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient: USERS=7528530 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > > > org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: > >>>>>>>>>>>>>> USER_RATINGS_NEGLECTED=1,798,738 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: > >>>>>>>>>>>>> USER_RATINGS_USED=12,429,693 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient: > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > > > org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient: ROWS=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >> > > > org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: > >>>>>>> COOCCURRENCES=35882374 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: > >>>>>>> PRUNED_COOCCURRENCES=0 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map input > >>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Map output > >>>>>>>>>> records=17570268 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=5221907 > >>>>>>>>>>>>>> > 
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient: Reduce output > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient: Reduce output > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map input > >>>>>>>> records=7528530 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Map output > >>>>>>>>>> records=3313251 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=3313251 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient: Reduce output > >>>>>>>>>>>>> records=3313251 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map input > >>>>>>>> records=6626130 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Map output > >>>>>>>>>> records=6626130 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=6626130 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient: Reduce output > >>>>>>>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map input > >>>>>>>> records=3312879 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Map output > >>>>>>>>>> records=3313251 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient: Reduce input > >>>>>>>>>>>>> records=3313251 > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> -------- > >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO 
mapred.JobClient: Reduce output > >>>>>>> records=0 > >>>>>>>>>>>>>> -------- > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> why 0??? > >>>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>> > >>> > >>> > >> > >> > > > > > >
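
For anyone finding this thread later, the ID translation Pat describes above (give each application-specific user or item ID a Mahout ID the first time you see it, starting from 0, and keep a way to map back) might be sketched roughly as follows. This is only an illustration in Python, not Mahout code: a plain dict plus a list stands in for the Guava HashBiMap he mentions, build_index and to_mahout_rows are hypothetical helper names, and it assumes the whole ID set fits in memory.

```python
def build_index(raw_ids):
    """Assign 0, 1, 2, ... to each distinct raw ID in first-seen order."""
    forward = {}   # application ID -> Mahout ID
    reverse = []   # Mahout ID -> application ID (list index is the Mahout ID)
    for raw in raw_ids:
        if raw not in forward:
            forward[raw] = len(reverse)
            reverse.append(raw)
    return forward, reverse


def to_mahout_rows(rows):
    """Translate (user, item, pref) rows into rows that use Mahout IDs.

    Users and items get independent ID spaces, as in Pat's example.
    """
    rows = list(rows)
    user_fwd, user_rev = build_index(u for u, _, _ in rows)
    item_fwd, item_rev = build_index(i for _, i, _ in rows)
    mapped = [(user_fwd[u], item_fwd[i], pref) for u, i, pref in rows]
    return mapped, user_rev, item_rev


# The two sample rows from Pat's message.
rows = [("-92", "abc", 1.0), ("75000x", "jkl", 2.0)]
mapped, user_rev, item_rev = to_mahout_rows(rows)

# Lines in the "user,item,pref" text form that would be fed to the job.
for u, i, pref in mapped:
    print(f"{u},{i},{pref}")

# After the job runs, translate Mahout IDs in the output back again.
original_user = user_rev[0]   # Mahout user ID 0 -> "-92"
```

The reverse lists (user_rev, item_rev) have to be stored somewhere durable, since the job output only contains Mahout IDs.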
