Ashwin Shankar created YARN-4011:

             Summary: Jobs fail since nm-local-dir not cleaned up when rogue job fills up disk
                 Key: YARN-4011
             Project: Hadoop YARN
          Issue Type: Bug
          Components: yarn
    Affects Versions: 2.4.0
            Reporter: Ashwin Shankar

We observed jobs failing because tasks could not launch on nodes, with the 
error "No space left on device".
On digging further, we found a rogue job that had filled up the disk.
Specifically, it wrote a large number of map spill files (for example, 
attempt_1432082376223_461647_m_000421_0_spill_10000.out) to nm-local-dir, 
which filled the disk. The job then failed or was killed, but its files in 
nm-local-dir were not cleaned up.
So the disk remained full, causing subsequent jobs to fail.
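
To help confirm this kind of leak on an affected node, here is a minimal diagnostic sketch (not part of YARN) that totals the bytes held by leftover spill files per job. The function name spill_usage_by_job and the file name pattern are assumptions; the pattern is modeled only on the single spill file name quoted above and may not match other Hadoop versions.

```python
import os
import re
from collections import defaultdict

# Assumption: pattern modeled on the one spill file name quoted in this
# report; other Hadoop versions may name spill files differently.
SPILL_RE = re.compile(r"attempt_(\d+_\d+)_[mr]_\d+_\d+_spill_\d+\.out$")


def spill_usage_by_job(local_dir):
    """Return {job_id_suffix: total_bytes} for spill files under local_dir.

    Hypothetical helper for diagnosis only; it reads sizes and deletes nothing.
    """
    usage = defaultdict(int)
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            match = SPILL_RE.search(name)
            if not match:
                continue
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                # File vanished between listing and stat; skip it.
                continue
            usage[match.group(1)] += size
    return dict(usage)
```

Calling spill_usage_by_job on the node's nm-local-dir path and sorting the result by size should point at the job whose spill files are still occupying the disk.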
