RE: Exception during running RowSimilarityJob-Mapper-EntriesToVectorsReducer job

WangRamon Wed, 19 Oct 2011 01:28:27 -0700
Yes, I'm still using version 0.5. The plan is to verify that it works on 0.5 
and get some benchmarks first, then move forward to 0.6. Sebastian, do you 
think the problem is related to Mahout (not Hadoop)? And do you think 0.6 will 
bring us a large performance increase? Thanks.

Cheers,
Ramon

> Date: Wed, 19 Oct 2011 10:20:24 +0200
> From: [email protected]
> To: [email protected]
> Subject: Re: Exception during running 
> RowSimilarityJob-Mapper-EntriesToVectorsReducer job
> 
> It seems like you're still not using Mahout 0.6. Please use the latest
> version and apply appropriate down sampling to your input data. You
> should also try to get access to a cluster with more than 2 machines.
> 
> --sebastian
> 
> On 19.10.2011 10:16, WangRamon wrote:
> > 
> > Hi guys,
> > 
> > I'm continuing to run the test case with a 1GB data file which contains 
> > 600000 users and 2000000 items. All the jobs are running on a two-node 
> > cluster; each node has 32GB RAM and an 8-core CPU. The RecommenderJob 
> > runs until it reaches the 
> > RowSimilarityJob-Mapper-EntriesToVectorsReducer job. See below for the 
> > error log:
> > 
> > 11/10/18 23:09:34 INFO mapred.JobClient:  map 11% reduce 1%
> > 11/10/18 23:12:46 INFO mapred.JobClient:  map 11% reduce 2%
> > 11/10/18 23:13:55 INFO mapred.JobClient:  map 12% reduce 2%
> > 11/10/18 23:18:22 INFO mapred.JobClient:  map 13% reduce 2%
> > 11/10/18 23:22:50 INFO mapred.JobClient:  map 14% reduce 2%
> > 11/10/18 23:27:08 INFO mapred.JobClient:  map 15% reduce 2%
> > 11/10/18 23:28:15 INFO mapred.JobClient:  map 15% reduce 3%
> > 11/10/18 23:31:42 INFO mapred.JobClient:  map 16% reduce 3%
> > 11/10/18 23:33:36 INFO mapred.JobClient: Task Id : 
> > attempt_201110181002_0007_r_000000_0, Status : FAILED
> > java.io.IOException: Task: attempt_201110181002_0007_r_000000_0 - The 
> > reduce copier failed
> >  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
> >  at org.apache.hadoop.mapred.Child.main(Child.java:170)
> > Caused by: java.io.IOException: Intermediate merge failed
> >  at 
> > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
> >  at 
> > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
> > Caused by: java.lang.RuntimeException: java.io.EOFException
> >  at 
> > org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
> >  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
> >  at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
> >  at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
> >  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
> >  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> >  at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
> >  at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
> >  at 
> > org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
> >  ... 1 more
> > Caused by: java.io.EOFException
> >  at java.io.DataInputStream.readByte(DataInputStream.java:250)
> >  at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
> >  at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
> >  at 
> > org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
> >  at 
> > org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
> >  ... 9 more
> > 11/10/18 23:33:37 INFO mapred.JobClient:  map 16% reduce 0%
> > 11/10/18 23:35:57 INFO mapred.JobClient:  map 17% reduce 0%
> > 
> > I googled a lot and found that I should increase the 
> > "mapred.reduce.tasks" property in Hadoop, so I set it to 8 in my 
> > environment and restarted this job only. So far so good: the job is 
> > still running, but it's still a little slow. So here are my questions:
> > 
> > 1) Is the RowSimilarityJob-Mapper-EntriesToVectorsReducer job expected 
> > to be this slow?
> > 2) What does the "mapred.reduce.tasks" property do, and why does it 
> > affect the RowSimilarityJob-Mapper-EntriesToVectorsReducer job? (Maybe I 
> > should ask this second question on the Hadoop user list, but I think 
> > people here are also pros at Hadoop :) )
> > 3) What can I do to speed up this job? Any ideas?
> > 
> > Thanks in advance!
> > 
> > Ramon
>
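
On question 2 above, a rough illustration may help. "mapred.reduce.tasks" sets how many reduce tasks a job runs, and Hadoop's default HashPartitioner sends each map output key to the partition (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below is not Mahout or Hadoop code: the class and method names are made up, and plain Integer row ids stand in for the job's real SimilarityMatrixEntryKey keys, so the even split it shows is an assumption about well-distributed keys. It only shows how the share of data handled by each reducer shrinks as the property goes from 1 to 8.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; only the partition formula mirrors
// Hadoop's default HashPartitioner.
public class ReducePartitionSketch {

    // Partition index for a key, given the number of reduce tasks.
    static int partitionFor(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE keeps the result non-negative.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many of the row ids 0..rows-1 land in each partition.
    // Integer ids are an assumption made for illustration only.
    static int[] rowsPerReducer(int rows, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int row = 0; row < rows; row++) {
            counts[partitionFor(Integer.valueOf(row), numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 600000 matches the user count quoted in the thread.
        System.out.println(Arrays.toString(rowsPerReducer(600000, 1)));
        System.out.println(Arrays.toString(rowsPerReducer(600000, 8)));
    }
}
```

With one reduce task every key goes to the same reducer; with 8, each reducer in this toy run receives 75000 of the 600000 ids. Whether that smaller per-reducer load is what made the failed merge go away in the run above is not something this sketch can confirm.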