Re: Any reason a bunch of nearly-identical jobs would suddenly stop working?

Mridul Muralidharan Wed, 09 Mar 2011 17:29:55 -0800


Did you try checking the task logs?
There might be more details there.
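
As a rough sketch of what that check could look like, the Python below walks a directory of task logs and lists every line that looks like an error. The function name `find_error_lines` and the patterns are illustrative assumptions, and the log directory is a placeholder: point it at wherever your TaskTrackers write their per-attempt logs.

```python
import os
import re

# Strings that usually mark the underlying cause in a Java task log.
# This list is an assumption; extend it to match your own logs.
ERROR_PATTERN = re.compile(r"Exception|Caused by:|ERROR")


def find_error_lines(log_dir):
    """Return (path, line_number, text) for each suspicious line under log_dir."""
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # errors="replace" keeps the scan going past undecodable bytes.
            with open(path, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if ERROR_PATTERN.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling `find_error_lines("/path/to/task/logs")` (a placeholder path) on each worker node should surface the map-side exception that the Pig client only reports as "Job terminated with anomalous status FAILED".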



Regards,
Mridul

On Wednesday 09 March 2011 04:23 AM, Kris Coward wrote:


So I queued up a batch of jobs last night to run overnight (and into the
day a bit, owing to a bottleneck on the scheduler the way that things
are currently implemented), made sure they were running correctly, went
to sleep, and when I woke up in the morning, they were failing all over
the place.

Since each of these jobs was basically the same pig script being run with
a different set of parameters, I tried re-running it with the
parameters that it had run (successfully) with the night before, and it
also failed. So I started whittling away at steps to try and find the
origin of the failure, until I was even getting a failure loading the
initial data, and dumping it out. Basically, I've reduced things to a
matter of

apa = LOAD '/rawfiles/08556ecf5c6841d59eb702e9762e649a/{1296432000,1296435600,1296439200,1296442800,1296446400,1296450000,1296453600,1296457200,1296460800,1296464400,1296468000,1296471600,1296475200,1296478800,1296482400,1296486000,1296489600,1296493200,1296496800,1296500400,1296504000,1296507600,1296511200,1296514800}/*/apa'
    USING com.twitter.elephantbird.pig.load.LzoTokenizedLoader(',')
    AS (timestamp:long, type:chararray, appkey:chararray, uid:chararray,
        uniq:chararray, shortUniq:chararray, profUid:chararray, addr:chararray,
        ref:chararray);
dump apa;

and after getting all the happy messages from the loader like:

2011-03-08 21:48:46,454 [Thread-12] INFO com.twitter.elephantbird.pig.load.LzoBaseLoadFunc - Got 117 LZO slices in total.
2011-03-08 21:48:48,044 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-03-08 21:50:17,612 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete

It went straight to:

2011-03-08 21:50:17,612 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
2011-03-08 21:50:17,662 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "hdfs://master.hadoop:9000/tmp/temp-2121884028/tmp-268519128"
2011-03-08 21:50:17,664 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-03-08 21:50:17,668 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias apa
Details at logfile: /home/kris/pig_1299620898192.log

And looking at the stack trace in the logfile, I've got:

Pig Stack Trace
---------------
ERROR 1066: Unable to open iterator for alias apa

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias apa
         at org.apache.pig.PigServer.openIterator(PigServer.java:482)
         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
         at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
         at org.apache.pig.Main.main(Main.java:352)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
         at org.apache.pig.PigServer.openIterator(PigServer.java:476)
         ... 6 more
================================================================================

My sysadmin's off on vacation for the week, but left information on the
scripts to restart the cluster, so I tried that, and the problem is
still persisting, so I was hoping someone here might have an idea what's
wrong (and how to fix it).
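
One way the whittling could go further is to test the input hours in smaller groups, in case a single hourly directory is the culprit. The Python below is only a sketch of that idea: the base path and the epoch values (24 hourly steps of 3600 seconds starting at 1296432000) come from the LOAD statement in this message, while the helper names `hour_dirs`, `glob_for` and `split_in_half` are made up for illustration.

```python
BASE = "/rawfiles/08556ecf5c6841d59eb702e9762e649a"
FIRST_HOUR = 1296432000  # first epoch value in the LOAD glob
HOURS = 24               # number of hourly directories in the glob
STEP = 3600              # seconds between consecutive directories


def hour_dirs(first=FIRST_HOUR, hours=HOURS, step=STEP):
    """Return the epoch-named hourly directories covered by the glob."""
    return [first + i * step for i in range(hours)]


def glob_for(epochs, base=BASE):
    """Build a LOAD path that covers only the given hourly directories."""
    return "%s/{%s}/*/apa" % (base, ",".join(str(e) for e in epochs))


def split_in_half(epochs):
    """Split the list so each half can be loaded and dumped separately."""
    mid = len(epochs) // 2
    return epochs[:mid], epochs[mid:]
```

Pasting `glob_for(first_half)` and then `glob_for(second_half)` into the LOAD statement, and repeating on whichever half fails, would isolate a bad directory in about five rounds. If both halves fail, the input data is probably not the cause.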

Thanks,
Kris
