I think the problem is that I need to report progress() from my cleanup task. How can I do this?

The commitJob() in my custom
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter [1] only
receives an org.apache.hadoop.mapreduce.JobContext [2], which has no
getProgressible() the way the old org.apache.hadoop.mapred.JobContext [3]
does. A sketch of the workaround I am considering is below, after the
quoted message.

[1] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.html#commitJob%28org.apache.hadoop.mapreduce.JobContext%29
[2] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/JobContext.html
[3] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobContext.html#getProgressible%28%29

On Sat, Apr 13, 2013 at 2:35 PM, Robert Dyer <[email protected]> wrote:
> What does the job cleanup task do? My understanding was that it just
> cleans up any intermediate/temporary files and moves the reducer output
> to the output directory. Does it do more?
>
> One of my jobs runs, all maps and reduces finish, but then the job
> cleanup task never finishes. Instead it gets killed several times until
> the entire job gets killed:
>
> Task attempt_201303272327_0772_m_000105_0 failed to report status for
> 600 seconds. Killing!
>
> I suppose that since my reducers generate around 20 GB of output,
> moving it simply takes too long?
>
> Is it possible to disable speculative execution *only* for the cleanup
> task?
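
Here is the sketch. I have not verified that this is supported by the
framework; it assumes the JobContext object actually passed to
commitJob() at cleanup time also implements Progressable (for example,
if it is really a TaskAttemptContext underneath). The class name
ProgressReportingCommitter and the 60-second heartbeat interval are
just my own choices for illustration; if the instanceof check fails,
this degrades to a plain commitJob() with no heartbeat:

    import java.io.IOException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
    import org.apache.hadoop.util.Progressable;

    public class ProgressReportingCommitter extends FileOutputCommitter {

      public ProgressReportingCommitter(Path outputPath,
          TaskAttemptContext context) throws IOException {
        super(outputPath, context);
      }

      @Override
      public void commitJob(JobContext context) throws IOException {
        // Assumption: at job-cleanup time the runtime object may also
        // implement Progressable (e.g. it is a TaskAttemptContext).
        ScheduledExecutorService heartbeat = null;
        if (context instanceof Progressable) {
          final Progressable progress = (Progressable) context;
          heartbeat = Executors.newSingleThreadScheduledExecutor();
          heartbeat.scheduleAtFixedRate(new Runnable() {
            public void run() {
              // Report liveness so the cleanup task is not killed for
              // failing to report status.
              progress.progress();
            }
          }, 0, 60, TimeUnit.SECONDS);
        }
        try {
          // The slow part: moving ~20 GB of reducer output into place.
          super.commitJob(context);
        } finally {
          if (heartbeat != null) {
            heartbeat.shutdownNow();
          }
        }
      }
    }

If the context turns out not to be Progressable, this does nothing
extra, so it should at least be safe to try.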
