I reduced the fetcher thread timeout to mapred.task.timeout / 8, which is 150 
seconds in my case. This helps with mappers that have finished the queue but 
remain hanging on some items, and it lets a task running on a heavily loaded 
server end early instead of being killed. My VMs suffer from having RAID-5 
enabled and being short on RAM, so I/O wait is high. Fetcher threads that 
would normally be killed by the task tracker now time out instead. This means 
that whatever has been fetched is saved and no new map task is started, 
which would increase the run time again.
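
To make the numbers concrete, here is a rough sketch of the arithmetic only. 
It is not Nutch's actual code: the helper name is made up for illustration, 
and the task timeout of 1,200,000 ms is simply what 150 seconds at a divisor 
of 8 implies.

```python
# Hypothetical helper, not part of Nutch or Hadoop: it only reproduces the
# "task timeout divided by N" arithmetic discussed in this message.
def fetcher_thread_timeout_ms(task_timeout_ms: int, divisor: int) -> int:
    """Return the fetcher thread timeout in milliseconds."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return task_timeout_ms // divisor


# Assumed value: 150 s at divisor 8 implies mapred.task.timeout = 1,200,000 ms.
TASK_TIMEOUT_MS = 150 * 1000 * 8

for divisor in (2, 8):
    seconds = fetcher_thread_timeout_ms(TASK_TIMEOUT_MS, divisor) / 1000
    print(f"divisor {divisor}: threads time out after {seconds:.0f} s")
```

With the original divisor of 2 the same task timeout gives 600 seconds, so a 
hung thread is given up on four times sooner with a divisor of 8.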

Comments?

> Hi,
> 
> With large map output the task tracker can time out (no progress update
> during merge). Using io.sort.factor I can tune the merge phase to proceed
> a bit faster. Yet it can still time out when the cluster is very busy etc.
> I've increased the task timeout but now it also takes longer to get rid
> of hanging threads.
> 
> The fetcher thread timeout is mapred.task.timeout / 2. That makes sense, but
> I guess it would make more sense to reduce the timeout value even
> further; why would I want to wait so long for it to get aborted anyway?
> Now a single mapper can have a huge impact on avg. throughput.
> 
> Thoughts?
> thanks
