Maysam Yabandeh created MAPREDUCE-6489:
------------------------------------------

             Summary: Fail fast rogue tasks that write too much to local disk
                 Key: MAPREDUCE-6489
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6489
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: task
    Affects Versions: 2.7.1
            Reporter: Maysam Yabandeh


Tasks of rogue jobs can write too much to local disk, negatively affecting 
the jobs running in collocated containers. Ideally, YARN will be able to limit 
the amount of local disk used by each task: YARN-4011. Until then, the MapReduce 
task can fail fast if it writes too much (above a configured threshold) to 
local disk.

As we discussed 
[here|https://issues.apache.org/jira/browse/YARN-4011?focusedCommentId=14902750&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14902750]
 the suggested approach is that the MapReduce task checks the BYTES_WRITTEN 
counter for the local disk and throws an exception when it goes beyond a 
configured value. It is true that the number of bytes written can be larger 
than the disk space actually used, since the counter is cumulative and does 
not decrease when files are deleted or overwritten. However, detecting a rogue 
task does not require the exact value: a very large number of bytes written to 
local disk is a good indication that the task is misbehaving.
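
The check described above could look roughly like the following. This is 
only a minimal sketch under stated assumptions, not the actual patch: the 
class name LocalDiskWriteLimiter, the exception type, and the convention that 
a negative limit disables the check are all hypothetical. The LongSupplier 
stands in for whatever reads the task's BYTES_WRITTEN counter for the local 
file system, so the sketch has no Hadoop dependency.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch of a fail-fast check on bytes written to local disk. */
class LocalDiskWriteLimiter {

    /** Hypothetical exception type; thrown once the limit is exceeded. */
    static class WriteLimitExceededException extends RuntimeException {
        WriteLimitExceededException(String message) {
            super(message);
        }
    }

    // Assumption: supplies the current value of the local-disk BYTES_WRITTEN counter.
    private final LongSupplier bytesWritten;
    // Assumption: configured threshold in bytes; a negative value disables the check.
    private final long limitBytes;

    LocalDiskWriteLimiter(LongSupplier bytesWritten, long limitBytes) {
        this.bytesWritten = bytesWritten;
        this.limitBytes = limitBytes;
    }

    /** Intended to be called periodically, e.g. whenever the task reports progress. */
    void check() {
        if (limitBytes < 0) {
            return; // check disabled
        }
        long written = bytesWritten.getAsLong();
        if (written > limitBytes) {
            // The counter is cumulative, so this is an upper bound on the disk
            // space in use, which is enough to flag a rogue task.
            throw new WriteLimitExceededException(
                "Task wrote " + written + " bytes to local disk, above the limit of "
                    + limitBytes + " bytes");
        }
    }
}
```

Because the exception is unchecked, it would propagate out of the task's main 
loop and fail the attempt without extra plumbing. Whether a failed attempt 
should then be retried is a separate design question that this sketch does not 
address.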



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
