Thank you! I could have spent my whole life trying to get a result without knowing
the requirements for the input data.

BTW:
we used mahout 0.7-cdh-4.4...cdh
4.7 org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
and did get results close to reality. We just provided long user_id and
item_id values and didn't do anything special.
Why did it work?


2014-07-27 5:18 GMT+04:00 Pat Ferrel <[email protected]>:

> Both those jobs require you to create Mahout IDs for users and items. For
> most Hadoop based Mahout jobs, taking either text input or sequence files,
> the IDs must follow the rules mentioned below. There are a few exceptions
> but none you are using. The Wiki was rewritten for 0.9 and so the ID
> requirements may not be documented well. You can file a Jira so someone
> documents this.
>
> BTW spark-itemsimilarity will take any IDs and can read any text-delimited
> file format; unfortunately it’s not quite ready yet.
>
> On Jul 26, 2014, at 3:14 AM, Serega Sheypak <[email protected]>
> wrote:
>
> Hm... rather confusing... You are talking about input for:
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
> or
> org.apache.mahout.cf.taste.hadoop.item.RecommenderJob
>
> My target is to get item-item similarity. ItemSimilarityJob right now
> returns few similarities.
>
> I'm reading this:
> https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
> and that:
> https://mahout.apache.org/users/recommender/userbased-5-minutes.html
>
> I don't see anything there about "Your IDs must be in the range from 0 to
> the number of rows" for both items and users. Where does this requirement
> come from?
>
>
> 2014-07-25 23:57 GMT+04:00 Pat Ferrel <[email protected]>:
>
> > I think I did explain below. Your IDs must be in the range from 0 to the
> > number of rows - 1 and the same for item IDs. This is done by taking your
> > application specific IDs and mapping them to sequential non-negative
> > Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> > your own code.
> >
> > For example imagine input of the form
> > -92, abc, 1.0
> > 75000x, jkl, 2.0
> >
> > Your first user ID is -92, give it Mahout ID = 0. For your next user ID
> > 75000x give it Mahout ID = 1
> > Your first item ID is abc, give it Mahout ID = 0. For your next item ID
> > jkl give it Mahout ID = 1
> > keep doing this the first time you see a unique id from your input. A Map
> > will do this for you.
> >
> > And so on. Then the input to Mahout would be:
> > 0,0,1.0
> > 1,1,2.0
> >
> > The output will have Mahout IDs too so you need to map recommendations for
> > Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
> >
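The mapping described above can be sketched in a few lines. This is only an illustration of the first-seen numbering from the example, not code from Mahout itself; the function names `assign_mahout_ids` and `invert` are hypothetical, and a real pipeline would also need to persist both dictionaries so the job output can be translated back.

```python
def assign_mahout_ids(rows):
    """Map application IDs to sequential ints in first-seen order.

    rows: iterable of (user_id, item_id, preference) with arbitrary hashable IDs.
    Returns (mapped_rows, user_map, item_map).
    Hypothetical helper for illustration; not part of Mahout.
    """
    user_map, item_map = {}, {}
    mapped_rows = []
    for user_id, item_id, pref in rows:
        # setdefault keeps an existing ID, or assigns the next free integer.
        u = user_map.setdefault(user_id, len(user_map))
        i = item_map.setdefault(item_id, len(item_map))
        mapped_rows.append((u, i, pref))
    return mapped_rows, user_map, item_map


def invert(id_map):
    """Build the reverse lookup (Mahout ID -> application ID)."""
    return {mahout_id: app_id for app_id, mahout_id in id_map.items()}


# The two sample rows from the message above.
rows = [("-92", "abc", 1.0), ("75000x", "jkl", 2.0)]
mapped_rows, user_map, item_map = assign_mahout_ids(rows)
for u, i, pref in mapped_rows:
    print(f"{u},{i},{pref}")
```

After the job runs, `invert(user_map)` and `invert(item_map)` give the lookups needed to turn Mahout IDs in the output back into application IDs, which is the role the bi-directional dictionary plays elsewhere in this thread.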
> >
> > On Jul 25, 2014, at 11:55 AM, Serega Sheypak <[email protected]>
> > wrote:
> >
> > I'm preparing data using Apache Hive: user_id:long, item_id:long,
> > preference[1.0, 2.0]
> > I don't understand "For most Mahout jobs you have to prepare your data to
> > have Mahout IDs". What are "Mahout IDs"? I tried to follow the Mahout site
> > docs and didn't find anything there related to Mahout IDs.
> > Please explain.
> >
> >
> > 2014-07-25 22:39 GMT+04:00 Pat Ferrel <[email protected]>:
> >
> >> Sorry I haven’t read this thread carefully but it looks like you may be
> >> using the wrong IDs.
> >>
> >> For most Mahout jobs you have to prepare your data to have Mahout IDs. You
> >> do this by looking at each datum and as you see a new unique application
> >> specific user or item ID you give it a Mahout ID starting from 0. So a Mahout
> >> ID can be thought of as a row or column number in a matrix. The Mahout IDs
> >> for rows will be 0 thru # of rows-1, same for columns.
> >>
> >> This always requires that you translate into Mahout IDs then after the job
> >> is run translate back into your application IDs. You need a bi-directional
> >> dictionary of some type. I use a HashBiMap from Guava.
> >>
> >> Also I’d avoid the threshold for now. If you get that wrong it will mess
> >> things up badly and is very hard to tune. It’s there for completeness but I
> >> never use it.
> >>
> >>
> >> On Jul 25, 2014, at 12:55 AM, Serega Sheypak <[email protected]>
> >> wrote:
> >>
> >> Hi, nothing helps...
> >> I do use mahout 0.9 compiled for CDH 4.7
> >> I do provide only positive values
> >> I do use itemsimilarityJob and do get 2000 similarities for 1400 unique
> >> items
> >> Input data is:
> >> 16*10^6 preferences
> >> 4*10^6 users
> >> 0.6*10^ items
> >> I do use Pearson correlation and preference values are: 1.0 and 2.0
> >>
> >>
> >> 2014-07-22 9:32 GMT+04:00 Serega Sheypak <[email protected]>:
> >>
> >>> Ok, I have recompiled mahout 0.9 for CDH 4.7. I'll try this evening.
> >>> Right now I don't see how it can help me. As far as I know the stuff I try
> >>> to use is pretty old and stable.
> >>> Looks like I am applying it in the wrong way.
> >>>
> >>> There is an option for recommenditembased named "--threshold". I do
> >>> provide data for recommenditembased with preference values in range
> >>> [1.1..2.0].
> >>> I set --threshold to 1.2
> >>> Is --threshold absolute, so it can be from [1.1 .. 2+], or is it relative,
> >>> so it can be [0.0 .. 0.99999]?
> >>>
> >>>
> >>> 2014-07-22 3:54 GMT+04:00 Ted Dunning <[email protected]>:
> >>>
> >>>> That version is no longer supported.  You should upgrade to 0.9
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <
> >>>> [email protected]>
> >>>> wrote:
> >>>>
> >>>>> 0.7-cdh4.7.0
> >>>>> Anyway, recommenditembased does produce these directories:
> >>>>>
> >>>>> /recommenditembased/temp/maxValues.bin
> >>>>> /recommenditembased/temp/norms.bin
> >>>>> /recommenditembased/temp/numNonZeroEntries.bin
> >>>>> /recommenditembased/temp/pairwiseSimilarity
> >>>>> /recommenditembased/temp/partialMultiply
> >>>>> /recommenditembased/temp/prePartialMultiply1
> >>>>> /recommenditembased/temp/prePartialMultiply2
> >>>>> /recommenditembased/temp/preparePreferenceMatrix
> >>>>> /recommenditembased/temp/similarityMatrix
> >>>>> /recommenditembased/temp/weights
> >>>>>
> >>>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
> >>>>> need. Right now I try to read it using
> >>>>>
> >>>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
> >>>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
> >>>>>  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
> >>>>>  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
> >>>>> )  as (intId: int, vector:tuple(cardinality:int,
> >>>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
> >>>>>
> >>>>>
> >>>>> Looks like the vector is empty... Or I am doing something wrong.
> >>>>>
> >>>>>
> >>>>>
> >>>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <[email protected]>:
> >>>>>
> >>>>>> Which version of Mahout?
> >>>>>>
> >>>>>>
> >>>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <
> >>>>> [email protected]
> >>>>>>>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while processing
> >>>>>>> Job-Specific
> >>>>>>>
> >>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
> >>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
> >>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>                  --input \
> >>>>>>>                  hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
> >>>>>>>                  --output \
> >>>>>>>                  hdfs://nameservice1/recommenditembased/output \
> >>>>>>>                  --similarityClassname \
> >>>>>>>                  SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>                  --numRecommendations \
> >>>>>>>                  500 \
> >>>>>>>                  --booleanData \
> >>>>>>>                  false \
> >>>>>>>                  --maxPrefsPerUser \
> >>>>>>>                  1000 \
> >>>>>>>                  --maxSimilaritiesPerItem \
> >>>>>>>                  1000 \
> >>>>>>>                  --minPrefsPerUser \
> >>>>>>>                  5 \
> >>>>>>>                  --maxPrefsPerUserInItemSimilarity \
> >>>>>>>                  30 \
> >>>>>>>                  --threshold \
> >>>>>>>                  1.1 \
> >>>>>>>                  --tempDir \
> >>>>>>>                  hdfs://nameservice1/recommenditembased/temp \
> >>>>>>>                  --outputPathForSimilarityMatrix \
> >>>>>>>                  hdfs://nameservice1/recommenditembased/sim_matrix
> >>>>>>>
> >>>>>>>
> >>>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported.
> >>>>>>>
> >>>>>>>
> >>>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <[email protected]>:
> >>>>>>>
> >>>>>>>> Serega,
> >>>>>>>>
> >>>>>>>> See the last line on how to pass the outputPathForSimilarityMatrix option to
> >>>>>>>> the recommenditembased command:
> >>>>>>>>
> >>>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>>                 --input visited_items_with_inverted_items \
> >>>>>>>>                 --output result \
> >>>>>>>>                 --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>>                 --usersFile inverted_items \
> >>>>>>>>                 --numRecommendations 500 \
> >>>>>>>>                 --booleanData false \
> >>>>>>>>                 --maxPrefsPerUser 100 \
> >>>>>>>>                 --maxSimilaritiesPerItem 500 \
> >>>>>>>>                 --minPrefsPerUser 0 \
> >>>>>>>>                 --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>>>                 --threshold 0.91 \
> >>>>>>>>                 --tempDir  temp \
> >>>>>>>>                 --outputPathForSimilarityMatrix similarityMatri
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Peng Zhang
> >>>>>>>> [email protected]
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <
> >>>>> [email protected]>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I've inspected the code, our approach wouldn't work with booleanData=false.
> >>>>>>>>> We do calculate item similarity in the wrong way...(((
> >>>>>>>>> Thank you
> >>>>>>>>> 1. We provide "fake" user_id and provide --usersFile in order to get
> >>>>>>>>> recommendations for "fake" user_id, where user_id is a negative item_id. It
> >>>>>>>>> worked when we did provide user_id->item_id pairs without preference.
> >>>>>>>>> 2. Our target is to get item similarities. We tried
> >>>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but it
> >>>>>>>>> returns a bad result compared to RecommenderJob with our "fake" user_id
> >>>>>>>>> (inverted item_id)
> >>>>>>>>>
> >>>>>>>>> 1. I'll try the option you provided.
> >>>>>>>>> 2. I will remove input with fake user_id and usersFile with these fake ids
> >>>>>>>>>
> >>>>>>>>> 3. https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> >>>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix option to
> >>>>>>>>> RecommenderJob
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <[email protected]>:
> >>>>>>>>>
> >>>>>>>>>> Serega,
> >>>>>>>>>>
> >>>>>>>>>> I have two comments:
> >>>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as well as item
> >>>>>>>>>> id as the row/column index, you’d better use 0, 1, 2, etc as ids
> >>>>>>>>>> 2. If you want to get the item similarity information, you can use
> >>>>>>>>>> --outputPathForSimilarityMatrix in the command
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Peng Zhang
> >>>>>>>>>> M: +86 186-1658-7856
> >>>>>>>>>> [email protected]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <
> >>>>>> [email protected]
> >>>>>>>>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> All bad things happen here:
> >>>>>>>>>>>
> >>>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> >>>>>>>>>>> User: oozie
> >>>>>>>>>>> Process User: oozie
> >>>>>>>>>>> Group: oozie
> >>>>>>>>>>> Mapper Class: PartialMultiplyMapper
> >>>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
> >>>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> >>>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
> >>>>>>>>>>>
> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> >>>>>>>>>>>
> >>>>>>>>>>> Why does mahout return 0 rows? It works when booleanData=true (preferences
> >>>>>>>>>>> are ignored...?)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <
> >>>>>> [email protected]
> >>>>>>>> :
> >>>>>>>>>>>
> >>>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> >>>>>>>>>>>> users_file:
> >>>>>>>>>>>> --inverted_item_id
> >>>>>>>>>>>> -1
> >>>>>>>>>>>> -2
> >>>>>>>>>>>> -3
> >>>>>>>>>>>> -4
> >>>>>>>>>>>>
> >>>>>>>>>>>> users_items_prefs
> >>>>>>>>>>>> --inverted item_id
> >>>>>>>>>>>> -1 1 1.0
> >>>>>>>>>>>> -2 2 1.0
> >>>>>>>>>>>> -3 3 1.0
> >>>>>>>>>>>> -4 4 1.0
> >>>>>>>>>>>> --user_id item_id pref_value
> >>>>>>>>>>>> 11   1 1.6
> >>>>>>>>>>>> 11   2 1.6
> >>>>>>>>>>>> 123 3 2.0
> >>>>>>>>>>>> 123 4 2.0
> >>>>>>>>>>>> 333 1 2.0
> >>>>>>>>>>>> 333 2 1.6
> >>>>>>>>>>>> --e.t.c.
> >>>>>>>>>>>>
> >>>>>>>>>>>> if I set --booleanData true
> >>>>>>>>>>>> then mahout returns the result.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <
> >>>>>>>> [email protected]
> >>>>>>>>>>> :
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I'm confused about how you're constructing the user file, and why there
> >>>>>>>>>>>>> are negated item ids here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can you post some more details please, including Mahout version and some
> >>>>>>>>>>>>> sample data sets?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <
> >>>>>>>>>> [email protected]>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi, I'm trying to create item similarity.
> >>>>>>>>>>>>>> I gather items which users visit during shopping and then create a file:
> >>>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
> >>>>>>>>>>>>>> user action type and data source)
> >>>>>>>>>>>>>> UNION
> >>>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> and I do provide a userFile, where user_id = -item_id
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The idea is to get item similarity. If any user visits item named "A", I want
> >>>>>>>>>>>>>> to show him items "B", "c", "xxx" using preferences of other users.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Here are my settings:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>>>>>>>>               --input visited_items_with_inverted_items \
> >>>>>>>>>>>>>>               --output result \
> >>>>>>>>>>>>>>               --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>>>>>>>>               --usersFile inverted_items \
> >>>>>>>>>>>>>>               --numRecommendations 500 \
> >>>>>>>>>>>>>>               --booleanData false \
> >>>>>>>>>>>>>>               --maxPrefsPerUser 100 \
> >>>>>>>>>>>>>>               --maxSimilaritiesPerItem 500 \
> >>>>>>>>>>>>>>               --minPrefsPerUser 0 \
> >>>>>>>>>>>>>>               --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>>>>>>>>>               --threshold 0.91 \
> >>>>>>>>>>>>>>               --tempDir  temp
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Some counters... I don't get what they mean....
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> >>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> >>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> >>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> >>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --------
> >>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> >>>>>>>>>>>>>> --------
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> why 0???
