Hello,
I have written a Pig script and tried to execute it, but partway through
execution it fails with java.io.IOException: Spill failed. I have included the
statements below in my script, and I have also set the classpath for the
hadoop-lzo jar.
1) set mapred.compress.map.output true;
2) set mapred.map.output.compression.codec com.hadoop.compression.lzo.LzopCodec;
The underlying cause reported is java.lang.RuntimeException: native-lzo library
not available, even though I have set the CLASSPATH for the hadoop-lzo jar.
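Since the message names the native library rather than the jar, one thing that
may be worth checking on each task node is whether the native files are present
in the directories the task JVM searches (java.library.path). Below is a
minimal sketch, not a definitive diagnostic. It assumes the hadoop-lzo JNI
library file name starts with libgplcompression and the LZO C library with
liblzo2; the directories in the example are hypothetical and would need to be
replaced with the real ones. The helper name find_native_libs is made up for
illustration.

```python
import os

# Assumed file-name prefixes: hadoop-lzo's JNI library and the LZO C library.
NATIVE_PREFIXES = ("libgplcompression", "liblzo2")


def find_native_libs(library_dirs, prefixes=NATIVE_PREFIXES):
    """Return {prefix: [matching file paths]} for the given directories."""
    found = {prefix: [] for prefix in prefixes}
    for directory in library_dirs:
        # Skip entries that do not exist on this node.
        if not os.path.isdir(directory):
            continue
        for name in sorted(os.listdir(directory)):
            for prefix in prefixes:
                if name.startswith(prefix):
                    found[prefix].append(os.path.join(directory, name))
    return found


if __name__ == "__main__":
    # Hypothetical directories; substitute the task JVM's java.library.path.
    dirs = ["/usr/lib/hadoop/lib/native/Linux-amd64-64", "/usr/lib64"]
    for prefix, paths in find_native_libs(dirs).items():
        print(prefix, "->", paths if paths else "NOT FOUND")
```

An empty list for either prefix would suggest the native side is missing on
that node, independent of whether the jar is on the CLASSPATH.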
After searching for this error, I found the following explanation:
"Since you have a very large number of records, if the individual records are
small it's likely the map task is spilling not because the data buffer is full,
but because the accounting area is full."
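To make sense of that quote, here is a rough back-of-the-envelope sketch of how
the sort buffer is split between record data and per-record accounting. The
numbers are assumptions from my reading of Hadoop 1.x, not something confirmed
for this cluster: io.sort.mb = 100, io.sort.record.percent = 0.05, and 16 bytes
of accounting per record. The function name spill_accounting is made up for
illustration.

```python
def spill_accounting(io_sort_mb=100, record_percent=0.05, bytes_per_record=16):
    """Estimate how the map-side sort buffer is divided (assumed 1.x layout)."""
    total_bytes = io_sort_mb << 20  # io.sort.mb expressed in bytes
    # Portion reserved for accounting, rounded down to whole records.
    acct_bytes = int(total_bytes * record_percent)
    acct_bytes -= acct_bytes % bytes_per_record
    data_bytes = total_bytes - acct_bytes
    max_records = acct_bytes // bytes_per_record
    return {
        "max_records": max_records,
        "data_bytes": data_bytes,
        # Below this average serialized record size, the accounting area
        # fills before the data buffer. The spill threshold scales both
        # areas by the same factor, so it does not change this ratio.
        "breakeven_record_bytes": data_bytes / max_records,
    }


if __name__ == "__main__":
    for key, value in spill_accounting().items():
        print(key, "=", value)
```

Under these assumed defaults the accounting area holds 327,680 records, so
records averaging less than about 304 bytes would trigger spills through the
accounting area first. That affects how often spills happen; it would not by
itself explain a spill that fails.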
So what should I do to avoid the spill failure?
The exception I got is below.
Backend error message
---------------------
java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1213)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1194)
at java.io.DataOutputStream.writeByte(DataOutputStream.java:153)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:555)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
at org.apache.pig.data.BinSedesTuple.write(BinSedesTuple.java:41)
at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1061)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzoCodec.getCompressorType(LzoCodec.java:135)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:101)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1407)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)