Hi Sean -

How can you determine that the file size is an issue? I don't see any memory-related exception in the stack trace.

Thanks,

Andy

Sean Owen wrote:
This is a Hadoop issue, not a Mahout issue.

In general it means Hadoop is choking on files that are too large. Use more
mappers and/or reducers.
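For example, on Hadoop 0.20 the reducer count can usually be raised through the generic -D option when launching the job. The invocation below is only a sketch: it assumes the Mahout 0.4 launcher script passes -D properties through to the job and uses placeholder input/output paths.

    # Sketch: request more reduce tasks so each one merges a smaller slice of data
    bin/mahout itemsimilarity \
      -Dmapred.reduce.tasks=20 \
      --input /path/to/input \
      --output /path/to/output

If the -D option does not take effect with your launcher, setting mapred.reduce.tasks in mapred-site.xml (or on the JobConf in a custom driver) accomplishes the same thing.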

On Thu, Jun 23, 2011 at 6:35 PM, Andrew Schein <[email protected]> wrote:

Hi all -

I am getting the following exception while running an itemsimilarity job:

java.io.IOException: Task: attempt_201106201353_0017_r_000000_0 - The reduce copier failed
      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:388)
      at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:396)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
      at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.io.IOException: java.lang.RuntimeException: java.io.EOFException
      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
      at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
      at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
      at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
      at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
      at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
      at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2669)
Caused by: java.io.EOFException
      at java.io.DataInputStream.readByte(DataInputStream.java:250)
      at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
      at org.apache.mahout.math.Varint.readSignedVarInt(Varint.java:140)
      at org.apache.mahout.math.hadoop.similarity.SimilarityMatrixEntryKey.readFields(SimilarityMatrixEntryKey.java:64)
      at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:97)
      ... 7 more

      at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:2673)

The exception only occurs for large data sets (>= 9 GB), which makes it difficult to diagnose.

I am using mahout-distribution-0.4 (0.5 gave me other issues) with
hadoop-0.20.203.0.

Has anyone else encountered this problem?

Thanks,

Andrew

