Hi, With large map output the task tracker can time out because no progress update is reported during the merge. Using io.sort.factor I can tune the merge phase to proceed a bit faster, yet it can still time out when the cluster is very busy. I've increased the task timeout, but now it also takes longer to get rid of hanging threads.
The fetcher thread timeout is mapred.task.timeout / 2. That makes sense, but I guess it would make more sense to reduce the timeout value even further; why would I want to wait so long for a hung thread to be aborted anyway? As it stands, a single mapper can have a huge impact on average throughput. Thoughts? Thanks
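To put rough numbers on the trade-off, here is a minimal sketch of the arithmetic. The helper name and the `divisor` parameter are hypothetical and not an existing setting. The 10-minute figure is what I believe the stock Hadoop default for mapred.task.timeout to be, so treat it as an assumption.

```python
# Assumption: stock Hadoop default for mapred.task.timeout is 10 minutes (in ms).
ASSUMED_DEFAULT_TASK_TIMEOUT_MS = 600_000


def fetcher_thread_timeout_ms(task_timeout_ms: int, divisor: int = 2) -> int:
    """Hypothetical helper: how long a hung fetcher thread is tolerated.

    divisor=2 mirrors the current behaviour described above
    (mapred.task.timeout / 2); a larger divisor models the proposal
    to abort hung threads sooner.
    """
    if task_timeout_ms <= 0 or divisor <= 0:
        raise ValueError("timeout and divisor must be positive")
    return task_timeout_ms // divisor


if __name__ == "__main__":
    # Compare the assumed default with a task timeout raised to 30 minutes.
    for task_timeout in (ASSUMED_DEFAULT_TASK_TIMEOUT_MS, 1_800_000):
        for divisor in (2, 4):
            wait_ms = fetcher_thread_timeout_ms(task_timeout, divisor)
            print(f"task timeout {task_timeout} ms, divisor {divisor}: "
                  f"hung threads kept for {wait_ms} ms")
```

With the task timeout raised to 30 minutes to survive the merge, halving it still leaves hung threads around for 15 minutes, which is the wait I would like to shorten.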

