Hi,

I'm trying to tune a Mahout SSVD job so that it spills less. I'm tuning io.sort.mb, currently set to:

<property>
 <name>io.sort.mb</name>
 <value>1047</value>
</property>


but it fails when I try any larger value, e.g.

<property>
 <name>io.sort.mb</name>
 <value>1247</value>
</property>


According to the Hadoop source code, this value can be as high as 2047:

http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/com.cloudera.hadoop/hadoop-core/0.20.2-320/org/apache/hadoop/mapred/MapTask.java
line 771
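
As far as I can tell, the check at that line boils down to an 11-bit mask on the configured value. The snippet below is my own paraphrase, not the exact Hadoop source; the class and method names (SortMbCheck, isValidSortMb) are made up for illustration. It suggests that 1247 should pass that particular validation:

```java
public class SortMbCheck {
    // Hypothetical helper: mirrors the mask check as I read it in MapTask.
    // 0x7FF is 2047, so any value outside 0..2047 loses bits and is rejected.
    static boolean isValidSortMb(int sortMb) {
        return (sortMb & 0x7FF) == sortMb;
    }

    public static void main(String[] args) {
        int[] candidates = {1047, 1247, 2047, 2048};
        for (int mb : candidates) {
            System.out.println(mb + " -> " + isValidSortMb(mb));
        }
    }
}
```

So the failure does not look like the "Invalid io.sort.mb" rejection; the job starts and only fails later, during a spill.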

With the larger value I'm getting:

java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:261)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper$1.collect(BtJob.java:255)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.flushBlock(SparseRowBlockAccumulator.java:65)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockAccumulator.collect(SparseRowBlockAccumulator.java:75)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:158)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$BtMapper.map(BtJob.java:102)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: next value iterator failed
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:166)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:322)
    at org.apache.mahout.math.hadoop.stochasticsvd.BtJob$OuterProductCombiner.reduce(BtJob.java:302)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1502)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1436)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readByte(DataInputStream.java:267)
    at org.apache.mahout.math.Varint.readUnsignedVarInt(Varint.java:159)
    at org.apache.mahout.math.hadoop.stochasticsvd.SparseRowBlockWritable.readFields(SparseRowBlockWritable.java:60)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
    ... 7 more

By changing this value I've already managed to reduce the number of spills from 100 (with the default value) to 10, and disk usage for my small data set dropped from around 7 GB to around 900 MB.

I've got lots of RAM, and the child JVM heap is set to:
<property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx7000M</value>
</property>
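
As a sanity check on my side, here is a rough sketch of the arithmetic (the class and method names are just mine, and I'm assuming the sort buffer is allocated as a single byte array of io.sort.mb megabytes). It suggests the 1247 MB buffer should fit both in a Java int byte count and well inside the 7000 MB heap, so I don't think I'm simply running out of heap:

```java
public class SortBufferFit {
    // Convert mebibytes to bytes using long arithmetic to avoid int overflow.
    static long megabytesToBytes(long mb) {
        return mb << 20;
    }

    public static void main(String[] args) {
        long sortBufferBytes = megabytesToBytes(1247); // io.sort.mb from above
        long heapBytes = megabytesToBytes(7000);       // -Xmx7000M from above

        System.out.println("sort buffer bytes: " + sortBufferBytes);
        // A Java byte[] is indexed by int, so the size must not exceed Integer.MAX_VALUE.
        System.out.println("fits in int: " + (sortBufferBytes <= Integer.MAX_VALUE));
        System.out.println("fits in heap: " + (sortBufferBytes < heapBytes));
    }
}
```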

Is this a configuration error or a bug?

I can't figure out why I'm getting that exception.
