Re: Exception during running RowSimilarityJob-Mapper-EntriesToVectorsReducer job

Sebastian Schelter Wed, 19 Oct 2011 01:55:45 -0700
Yes.
On 19.10.2011 10:53, WangRamon wrote:
> 
> Hi Sebastian,
>
> Can Mahout 0.6 work with Hadoop 0.20.2?
>
> Thanks,
> Ramon
>
>> Date: Wed, 19 Oct 2011 10:29:31 +0200
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: Exception during running 
>> RowSimilarityJob-Mapper-EntriesToVectorsReducer job
>>
>> As I'm the author of RowSimilarityJob you should save yourself some time
>> and believe me that the best thing is to move to 0.6 immediately.
>>
>> --sebastian
>>
>> On 19.10.2011 10:27, WangRamon wrote:
>>>
>>> Yes, I'm still using version 0.5. The plan is to verify that it works on
>>> 0.5 and get some benchmarks first, then move forward to 0.6. Sebastian, do
>>> you think the problem is related to Mahout (not Hadoop)? And do you think
>>> 0.6 will bring us a huge performance increase? Thanks.
>>>
>>> Cheers,
>>> Ramon
>>>
>>>> Date: Wed, 19 Oct 2011 10:20:24 +0200
>>>> From: [email protected]
>>>> To: [email protected]
>>>> Subject: Re: Exception during running 
>>>> RowSimilarityJob-Mapper-EntriesToVectorsReducer job
>>>>
>>>> It seems like you're still not using Mahout 0.6. Please use the latest
>>>> version and apply appropriate down sampling to your input data. You
>>>> should also try to get access to a cluster with more than 2 machines.
>>>>
>>>> --sebastian
>>>>
>>>> On 19.10.2011 10:16, WangRamon wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi guys,
>>>>>
>>>>> I'm continuing to run the test case with a 1GB data file that contains
>>>>> 600000 users and 2000000 items. All the jobs run on a two-node cluster;
>>>>> each node has 32GB RAM and an 8-core CPU. The RecommenderJob runs until it
>>>>> reaches the RowSimilarityJob-Mapper-EntriesToVectorsReducer job. See below
>>>>> for the error log:
>>>>>
>>>>> 11/10/18 23:09:34 INFO mapred.JobClient:  map 11% reduce 1%
>>>>> 11/10/18 23:12:46 INFO mapred.JobClient:  map 11% reduce 2%
>>>>> 11/10/18 23:13:55 INFO mapred.JobClient:  map 12% reduce 2%
>>>>> 11/10/18 23:18:22 INFO mapred.JobClient:  map 13% reduce 2%
>>>>> 11/10/18 23:22:50 INFO mapred.JobClient:  map 14% reduce 2%
>>>>> 11/10/18 23:27:08 INFO mapred.JobClient:  map 15% reduce 2%
>>>>> 11/10/18 23:28:15 INFO mapred.JobClient:  map 15% reduce 3%
>>>>> 11/10/18 23:31:42 INFO mapred.JobClient:  map 16% reduce 3%
>>>>> 11/10/18 23:33:36 INFO mapred.JobClient: Task Id : attempt_201110181002_0007_r_000000_0, Status : FAILED
>>>>> java.io.IOException: Task: attempt_201110181002_0007_r_000000_0 - The reduce copier failed
>>>>>  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:380)
>>>>>  at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>> Caused by: java.io.IOException: Intermediate merge failed
>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2576)
>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
>>>>> Caused by: java.lang.RuntimeException: java.io.EOFException
>>>>>  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
>>>>>  at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>>>>>  at org.apache.hadoop.util.PriorityQueue.upHeap(PriorityQueue.java:123)
>>>>>  at org.apache.hadoop.util.PriorityQueue.put(PriorityQueue.java:50)
>>>>>  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:447)
>>>>>  at org.apache.hadoop.mapred.Merger$MergeQueue.merge(Merger.java:381)
>>>>>  at org.apache.hadoop.mapred.Merger.merge(Merger.java:107)
>>>>>  at org.apache.hadoop.mapred.Merger.merge(Merger.java:93)
>>>>>  at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2551)
>>>>>  ... 1 more
>>>>> Caused by: java.io.EOFException
>>>>>  at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>>>>  at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
>>>>>  at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
>>>>>  at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
>>>>>  at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
>>>>>  ... 9 more
>>>>> 11/10/18 23:33:37 INFO mapred.JobClient:  map 16% reduce 0%
>>>>> 11/10/18 23:35:57 INFO mapred.JobClient:  map 17% reduce 0%
>>>>>
>>>>> I googled a lot and found that I should increase the
>>>>> "mapred.reduce.tasks" property in Hadoop, so I set it to 8 in my
>>>>> environment and restarted only this job. So far so good: the job is still
>>>>> running, but it's still a little slow. So here are my questions:
>>>>>
>>>>> 1) Is the RowSimilarityJob-Mapper-EntriesToVectorsReducer job expected to
>>>>>    be this slow?
>>>>> 2) What does the "mapred.reduce.tasks" property do, and why does it
>>>>>    affect the RowSimilarityJob-Mapper-EntriesToVectorsReducer job? (Maybe
>>>>>    I should ask this second question on the Hadoop user list, but I think
>>>>>    people here are pros at Hadoop as well :) )
>>>>> 3) What can I do to speed up this job?
>>>>>
>>>>> Any ideas? Thanks in advance!
>>>>>
>>>>> Ramon
>>>>
>>>                                       
>>
>
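
On question 2 in the quoted message: "mapred.reduce.tasks" sets how many
reduce tasks a job runs, and in Hadoop 0.20 it defaults to 1, so a single
reducer has to copy and merge the output of every mapper. With more reducers
the map output keys are split among them. The sketch below is only an
illustration of that split. It assumes Hadoop's default HashPartitioner
formula and uses plain integer row ids as keys; the class and method names
are made up for the example, and the job in question may well install its own
partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone sketch; it does not depend on Hadoop or Mahout.
public class ReducePartitionSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // clear the sign bit of the hash, then take the remainder
    // by the number of reduce tasks.
    public static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Counts how many of the given keys each reduce task would receive.
    public static int[] keysPerReducer(List<Integer> keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (Integer key : keys) {
            counts[partitionFor(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up row ids 0..15 standing in for map output keys.
        List<Integer> rowIds = new ArrayList<>();
        for (int i = 0; i < 16; i++) {
            rowIds.add(i);
        }
        // One reducer (the default) receives every key.
        System.out.println(Arrays.toString(keysPerReducer(rowIds, 1)));
        // prints [16]
        // Eight reducers share the keys.
        System.out.println(Arrays.toString(keysPerReducer(rowIds, 8)));
        // prints [2, 2, 2, 2, 2, 2, 2, 2]
    }
}
```

Each reducer then shuffles and merges only its own share of the map output,
which is one plausible reason the job got further with 8 reducers than
with 1.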