First of all, two nodes is a very small Hadoop cluster. It is not uncommon to 
have odd problems with a cluster that small, and you are certainly very 
unlikely to see any significant speedup. 

Are you running out of disk space?
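
If you want to check, something like the following on each node should tell 
you (a sketch assuming a 0.20-era Hadoop CLI; adjust for your install):

    # local disk on the node (map output and shuffle spills land here,
    # under mapred.local.dir)
    df -h

    # HDFS capacity and usage per datanode
    hadoop dfsadmin -report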

Sent from my iPhone

On Oct 19, 2011, at 2:16, WangRamon <[email protected]> wrote:

> Hi guys, I'm continuing to run the test case with a 1GB data file which 
> contains 600,000 users and 2,000,000 items. All the jobs are running on a 
> two-node cluster; each node has 32GB of RAM and an 8-core CPU. The 
> RecommenderJob runs until it reaches the 
> RowSimilarityJob-Mapper-EntriesToVectorsReducer job. See below for the 
> error log:
> 
> 11/10/18 23:09:34 INFO mapred.JobClient:  map 11% reduce 1%
> 11/10/18 23:12:46 INFO mapred.JobClient:  map 11% reduce 2%
> 11/10/18 23:13:55 INFO mapred.JobClient:  map 12% reduce 2%
> 11/10/18 23:18:22 INFO mapred.JobClient:  map 13% reduce 2%
> 11/10/18 23:22:50 INFO mapred.JobClient:  map 14% reduce 2%
> 11/10/18 23:27:08 INFO mapred.JobClient:  map 15% reduce 2%
> 11/10/18 23:28:15 INFO mapred.JobClient:  map 15% reduce 3%
> 11/10/18 23:31:42 INFO mapred.JobClient:  map 16% reduce 3%
> 11/10/18 23:33:36 INFO mapred.JobClient: Task Id : attempt_201110181002_0007_r_000000_0, Status : FAILED
> java.io.IOException: Task: attempt_201110181002_0007_r_000000_0 - The reduce copier failed
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.io.IOException: Intermediate merge failed
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
> Caused by: java.lang.RuntimeException: java.io.EOFException
> at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
> at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
> at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
> at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
> at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
> at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
> at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
> ... 1 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readByte(DataInputStream.java:250)
> at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
> at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
> at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
> at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
> ... 9 more
> 
> 11/10/18 23:33:37 INFO mapred.JobClient:  map 16% reduce 0%
> 11/10/18 23:35:57 INFO mapred.JobClient:  map 17% reduce 0%
> 
> I googled a lot and found that I should increase the "mapred.reduce.tasks" 
> property in Hadoop, so I set it to 8 in my environment and restarted this 
> job only. So far so good, the job is still running now, but it's still a 
> little slow, so here come my questions:
> 
> 1) Is it normal for the RowSimilarityJob-Mapper-EntriesToVectorsReducer job 
> to be this slow?
> 2) What does the "mapred.reduce.tasks" property do, and why does it affect 
> the RowSimilarityJob-Mapper-EntriesToVectorsReducer job? (Maybe I should 
> ask this 2nd question on the Hadoop user list... but I think people here 
> are pros at Hadoop too :) )
> 3) What can I do to increase the speed of this job? Any ideas?
> 
> Thanks in advance!
> Ramon
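
As for question 2: mapred.reduce.tasks tells Hadoop how many reduce tasks to 
run for a job. Unless something overrides it, the stock default is 1, so all 
of the map output headed for EntriesToVectorsReducer funnels through a single 
reducer, and that one task has to absorb the entire shuffle and merge. 
Spreading the work over several reducers usually helps. As a sketch, assuming 
the job is launched through ToolRunner (Mahout's jobs are, so -D generic 
options are picked up), you can set it per job like this; the jar name and 
paths below are placeholders:

    # the -D generic option must come before the job's own flags
    hadoop jar mahout-core-X.Y-job.jar \
      org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
      -Dmapred.reduce.tasks=8 \
      --input /path/to/input --output /path/to/output

You can also set the same property cluster-wide in mapred-site.xml, but the 
per-job -D is usually what you want.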
