This may be better asked on one of the other Hadoop lists, but as the job in 
question is done with Pig I thought I would start here.  I have a nightly job 
that runs against around 1000 gzip log files.  About once a week one of the 
map tasks fails, reporting some form of gzip error/corruption in the input 
file.  The job still completes successfully because we have set 
mapred.max.map.failures.percent = 1, which allows a few input files to fail 
without aborting the entire job.
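For reference, this is roughly how we set it from the Pig script (a sketch; as far as I know the property can equally be set in mapred-site.xml or passed with -D on the command line):

```
-- in the Pig script, before the job is launched:
-- allow up to 1% of map tasks to fail without killing the job
set mapred.max.map.failures.percent 1;
```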


Sometimes I can find the name of the corrupt input file in the logs for the 
failed map task, available from the Map/Reduce Administration page on port 
50030 of the name node.  Most of the time, however, the file name is not in 
those logs.  I can find the map task attempt id, of the form 
attempt_201102141346_0097_m_000000_0, but I would like to know how, if 
possible, to get from that to the name of the corrupted input file.  Is there 
a Pig/Hadoop file or log somewhere that associates the attempt id with its 
input file?
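For what it's worth, in the clusters I have looked at, the map attempt's syslog (under the tasktracker's userlogs/&lt;attempt_id&gt;/ directory) usually begins with an INFO line from MapTask like "Processing split: &lt;file&gt;:&lt;start&gt;+&lt;length&gt;", which names the input file even when the later gzip stack trace does not. A minimal sketch of pulling the file name out of such a log; the sample log line and paths below are made up for illustration:

```python
import re

# Matches the "Processing split:" INFO line that MapTask typically writes
# near the start of a map attempt's syslog; the split string is usually of
# the form hdfs://host/path/file.gz:start+length
SPLIT_RE = re.compile(r"Processing split:\s+(\S+):\d+\+\d+")

def input_file_from_syslog(syslog_text):
    """Return the input file named in the first 'Processing split' line, or None."""
    match = SPLIT_RE.search(syslog_text)
    return match.group(1) if match else None

# Hypothetical excerpt from userlogs/attempt_201102141346_0097_m_000000_0/syslog
sample = ("2011-02-14 13:52:01,123 INFO org.apache.hadoop.mapred.MapTask: "
          "Processing split: hdfs://namenode/logs/2011-02-13/access-17.gz:0+1048576")

print(input_file_from_syslog(sample))
```

The same thing can of course be done with a plain grep for "Processing split" against the attempt's syslog on the tasktracker that ran it, if you can still reach those logs.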

Thanks,
Scott
