From the logs it looks like the issue is not with Pig but with your HDFS.
Either your HDFS is running out of space, or some (or all) nodes in
your cluster can't talk to each other (a network issue?).
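To tell the two cases apart, one quick sanity check is to probe whether each datanode's data-transfer port (50010, per the "firstBadLink" lines below) even accepts TCP connections from the machine running the failing task. A minimal Python sketch (the hosts are the ones from the quoted logs; adjust for your cluster):

```python
import socket

def datanode_reachable(host, port=50010, timeout=5.0):
    """Return True if a TCP connection to the datanode's
    data-transfer port succeeds, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Hosts taken from the "Bad connect ack with firstBadLink" log lines.
    for host in ["10.98.99.197", "10.100.226.63", "10.99.26.80"]:
        status = "reachable" if datanode_reachable(host, timeout=2.0) else "UNREACHABLE"
        print(host, status)
```

If the ports are reachable, look at the disk-space side instead: `hadoop dfsadmin -report` shows the remaining DFS capacity per datanode.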

Ashutosh
On Wed, Dec 8, 2010 at 06:09, jr <[email protected]> wrote:
> Hi guys,
> I'm having some trouble finishing jobs that run smoothly on a smaller
> dataset but always fail at 99% if I try to run the job on the whole
> set.
> I can see a few killed map tasks and a few killed reduce tasks, but quite
> a lot of failed reduce tasks that all show the same exception at the end.
> Here is what I have in the logs:
>
> 2010-12-08 08:44:56,127 INFO org.apache.hadoop.mapred.ReduceTask:
> Ignoring obsolete output of KILLED map-task:
> 'attempt_201012080810_0003_m_000009_1'
> 2010-12-08 08:45:08,152 INFO org.apache.hadoop.mapred.ReduceTask: 
> attempt_201012080810_0003_r_000000_0: Got 1 new map-outputs
> 2010-12-08 08:45:13,103 INFO org.apache.hadoop.mapred.ReduceTask: 
> attempt_201012080810_0003_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 
> dup hosts)
> 2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: header: 
> attempt_201012080810_0003_m_000003_0, compressed len: 3488519, decompressed 
> len: 3488515
> 2010-12-08 08:45:13,241 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 
> 3488515 bytes (3488519 raw bytes) into RAM from 
> attempt_201012080810_0003_m_000003_0
> 2010-12-08 08:45:13,348 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
> low memory handler called (Collection threshold exceeded) init = 
> 5439488(5312K) used = 78403496(76565K) committed = 101908480(99520K) max = 
> 139853824(136576K)
> 2010-12-08 08:45:13,404 INFO org.apache.hadoop.mapred.ReduceTask: Read 
> 3488515 bytes from map-output for attempt_201012080810_0003_m_000003_0
> 2010-12-08 08:45:13,405 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from 
> attempt_201012080810_0003_m_000003_0 -> (142, 21) from 
> ip-10-98-71-195.ec2.internal
> 2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: 
> GetMapEventsThread exiting
> 2010-12-08 08:45:14,241 INFO org.apache.hadoop.mapred.ReduceTask: 
> getMapsEventsThread joined.
> 2010-12-08 08:45:14,242 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram 
> manager
> 2010-12-08 08:45:14,253 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved 
> on-disk merge complete: 2 files left.
> 2010-12-08 08:45:14,254 INFO org.apache.hadoop.mapred.ReduceTask: In-memory 
> merge complete: 64 files left.
> 2010-12-08 08:45:14,312 INFO org.apache.hadoop.mapred.Merger: Merging 64 
> sorted segments
> 2010-12-08 08:45:14,313 INFO org.apache.hadoop.mapred.Merger: Down to the 
> last merge-pass, with 64 segments left of total size: 82947024 bytes
> 2010-12-08 08:45:15,389 INFO org.apache.hadoop.mapred.ReduceTask: Merged 64 
> segments, 82947024 bytes to disk to satisfy reduce memory limit
> 2010-12-08 08:45:15,390 INFO org.apache.hadoop.mapred.ReduceTask: Merging 3 
> files, 214514578 bytes from disk
> 2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 
> segments, 0 bytes from memory into reduce
> 2010-12-08 08:45:15,392 INFO org.apache.hadoop.mapred.Merger: Merging 3 
> sorted segments
> 2010-12-08 08:45:15,397 INFO org.apache.hadoop.mapred.Merger: Down to the 
> last merge-pass, with 3 segments left of total size: 214514566 bytes
> 2010-12-08 08:45:15,489 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader: 
> Loaded native gpl library
> 2010-12-08 08:45:15,522 INFO com.hadoop.compression.lzo.LzoCodec: 
> Successfully loaded & initialized native-lzo library [hadoop-lzo rev 
> 3e7c9dcf0ea0acbde146cb22b236978b344c5525]
> 2010-12-08 08:45:15,530 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:15,534 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:15,544 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:15,562 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:15,564 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:15,568 INFO 
> com.twitter.elephantbird.pig.load.LzoBaseRegexLoader: LzoBaseRegexLoader 
> created.
> 2010-12-08 08:45:37,233 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
> createBlockOutputStream java.io.IOException: Bad connect ack with 
> firstBadLink 10.98.99.197:50010
> 2010-12-08 08:45:37,235 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
> block blk_8615551403563164366_3938
> 2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
> createBlockOutputStream java.io.IOException: Bad connect ack with 
> firstBadLink 10.98.99.197:50010
> 2010-12-08 08:45:43,251 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
> block blk_4074920756844442310_4023
> 2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
> createBlockOutputStream java.io.IOException: Bad connect ack with 
> firstBadLink 10.100.226.63:50010
> 2010-12-08 08:45:49,282 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
> block blk_-681320892856427804_4034
> 2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Exception in 
> createBlockOutputStream java.io.IOException: Bad connect ack with 
> firstBadLink 10.99.26.80:50010
> 2010-12-08 08:45:55,292 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning 
> block blk_-6999793088579291779_4039
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer 
> Exception: java.io.IOException: Unable to create new block.
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2812)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery 
> for block blk_-6999793088579291779_4039 bad datanode[1] nodes == null
> 2010-12-08 08:46:01,294 WARN org.apache.hadoop.hdfs.DFSClient: Could not get 
> block locations. Source file 
> "/tmp/temp664356070/tmp-973959386/_temporary/_attempt_201012080810_0003_r_000000_0/winrar/output/extlink/2010-04/2010-04-00000"
>  - Aborting...
> 2010-12-08 08:46:01,656 WARN org.apache.hadoop.mapred.TaskTracker: Error 
> running child
> org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received 
> error from store function.Bad connect ack with firstBadLink 10.99.26.80:50010
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:140)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.runPipeline(POSplit.java:254)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.processPlan(POSplit.java:236)
>        at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSplit.getNext(POSplit.java:228)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)
>        at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:239)
>        at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:463)
>        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:411)
>        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.io.IOException: Bad connect ack with firstBadLink 
> 10.99.26.80:50010
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at 
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
> 2010-12-08 08:46:01,659 INFO org.apache.hadoop.mapred.TaskRunner: Runnning 
> cleanup for the task
>
> Any idea what this problem is and how to make Hadoop actually finish
> on the larger dataset?
> (I have to admit I have a lot of rather small files (<30MB) as input,
> but this appears to happen when the reducer tries to write its result?)
>
> regards,
> Johannes
