Jason Lowe commented on YARN-4011:

bq. The mapreduce task can check for BYTES_WRITTEN counter and fail fast if it 
is above the configured limit.
I think having the MR framework provide an optional limit for local filesystem 
output is a reasonable request until a more sophisticated solution can be 
implemented by YARN directly.
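
As a rough illustration of the fail-fast check quoted above, here is a minimal sketch in plain Java. It is not existing Hadoop code: the class name LocalOutputLimit, its exception type, and the idea of a configurable byte limit are all hypothetical, and the LongSupplier stands in for however the task would read its local filesystem BYTES_WRITTEN counter.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration only; not part of Hadoop. */
class LocalOutputLimit {

    /** Thrown when the task has written more local bytes than allowed. */
    public static class LimitExceededException extends RuntimeException {
        public LimitExceededException(String message) {
            super(message);
        }
    }

    // Assumed semantics: a negative limit disables the check.
    private final long limitBytes;

    // Stand-in for reading the local filesystem BYTES_WRITTEN counter.
    private final LongSupplier bytesWritten;

    public LocalOutputLimit(long limitBytes, LongSupplier bytesWritten) {
        this.limitBytes = limitBytes;
        this.bytesWritten = bytesWritten;
    }

    /** Intended to be called periodically while the task is running. */
    public void check() {
        if (limitBytes < 0) {
            return; // limit not configured
        }
        long written = bytesWritten.getAsLong();
        if (written > limitBytes) {
            throw new LimitExceededException(
                "Local filesystem bytes written (" + written
                    + ") exceeded the configured limit (" + limitBytes + ")");
        }
    }
}
```

Because the limit would be optional, jobs that do not set it would behave as they do today. A task that trips the check would fail quickly instead of continuing to spill until the disk is full.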

> Jobs fail since nm-local-dir not cleaned up when rogue job fills up disk
> ------------------------------------------------------------------------
>                 Key: YARN-4011
>                 URL: https://issues.apache.org/jira/browse/YARN-4011
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>    Affects Versions: 2.4.0
>            Reporter: Ashwin Shankar
> We observed jobs failing because tasks couldn't launch on nodes due to
> "java.io.IOException No space left on device".
> On digging in further, we found a rogue job that had filled up the disk.
> Specifically, it wrote a lot of map spills (like
> attempt_1432082376223_461647_m_000421_0_spill_10000.out) to nm-local-dir,
> causing the disk to fill up. The job then failed or got killed, but didn't
> clean up these files in nm-local-dir.
> So the disk remained full, causing subsequent jobs to fail.
> This jira is created to address why files under nm-local-dir don't get
> cleaned up when a job fails after filling up the disk.
