Re: Exception during running RowSimilarityJob-Mapper-EntriesToVectorsReducer job

Sebastian Schelter Wed, 19 Oct 2011 01:21:07 -0700

It seems like you're still not using Mahout 0.6. Please use the latest
version and apply appropriate down sampling to your input data. You
should also try to get access to a cluster with more than 2 machines.


--sebastian

On 19.10.2011 10:16, WangRamon wrote:
> Hi guys,
>
> I'm continuing to run the test case with a 1GB data file which contains
> 600000 users and 2000000 items. All the jobs run on a two-node cluster;
> each node has 32GB RAM and an 8-core CPU. The RecommenderJob runs until it
> reaches the RowSimilarityJob-Mapper-EntriesToVectorsReducer job. See below
> for the error log:
>
> 11/10/18 23:09:34 INFO mapred.JobClient:  map 11% reduce 1%
> 11/10/18 23:12:46 INFO mapred.JobClient:  map 11% reduce 2%
> 11/10/18 23:13:55 INFO mapred.JobClient:  map 12% reduce 2%
> 11/10/18 23:18:22 INFO mapred.JobClient:  map 13% reduce 2%
> 11/10/18 23:22:50 INFO mapred.JobClient:  map 14% reduce 2%
> 11/10/18 23:27:08 INFO mapred.JobClient:  map 15% reduce 2%
> 11/10/18 23:28:15 INFO mapred.JobClient:  map 15% reduce 3%
> 11/10/18 23:31:42 INFO mapred.JobClient:  map 16% reduce 3%
> 11/10/18 23:33:36 INFO mapred.JobClient: Task Id : attempt_201110181002_0007_r_000000_0, Status : FAILED
> java.io.IOException: Task: attempt_201110181002_0007_r_000000_0 - The reduce copier failed
>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
>  at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.io.IOException: Intermediate merge failed
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
> Caused by: java.lang.RuntimeException: java.io.EOFException
>  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>  at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
>  at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
>  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
>  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>  at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
>  at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
>  ... 1 more
> Caused by: java.io.EOFException
>  at java.io.DataInputStream.readByte(DataInputStream.java:250)
>  at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
>  at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
>  at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
>  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
>  ... 9 more
> 11/10/18 23:33:37 INFO mapred.JobClient:  map 16% reduce 0%
> 11/10/18 23:35:57 INFO mapred.JobClient:  map 17% reduce 0%
>
> I googled a lot and found that I should increase the "mapred.reduce.tasks"
> property in Hadoop, so I set it to 8 in my environment and restarted only
> this job. So far so good: the job is still running, but it's still a little
> slow. So here are my questions:
>
> 1) Is the RowSimilarityJob-Mapper-EntriesToVectorsReducer job supposed to be
>    this slow?
> 2) What does the "mapred.reduce.tasks" property do, and why does it affect
>    the RowSimilarityJob-Mapper-EntriesToVectorsReducer job? (Maybe I should
>    ask this second question on the Hadoop user list, but I think people here
>    are also pros at Hadoop :) )
> 3) What can I do to speed up this job? Any ideas?
>
> Thanks in advance!
> Ramon
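
On the second question: "mapred.reduce.tasks" sets how many reduce tasks a
job runs. With Hadoop's default HashPartitioner, each map output key is sent
to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so more
reduce tasks means each one has less data to copy and merge. Below is a rough
sketch of that formula in plain Java with no Hadoop dependency. The class
PartitionSketch and the integer keys are hypothetical stand-ins for
illustration; the actual Mahout job may install its own partitioner, so treat
this as an assumption about the default behaviour only.

```java
import java.util.Arrays;

public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many of the keys 0..numKeys-1 land on each reducer.
    static int[] keysPerReducer(int numKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int key = 0; key < numKeys; key++) {
            counts[partition(Integer.valueOf(key), numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Item count from the message, used here only as stand-in keys.
        int numKeys = 2000000;
        // One reduce task: a single reducer receives every key.
        System.out.println(Arrays.toString(keysPerReducer(numKeys, 1)));
        // Eight reduce tasks: the same keys are spread over eight reducers.
        System.out.println(Arrays.toString(keysPerReducer(numKeys, 8)));
    }
}
```

With a single reduce task, one reducer has to fetch and merge the output of
every mapper, which is where the failing in-memory merge in the trace above
was running. Spreading the keys over more reduce tasks shrinks each merge,
though it does not replace down sampling or a larger cluster.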
