I would also just suggest moving up to 3.1.1 and trying again. Barring
that, maybe you can take the error message at its word. My experience
with running Hadoop 3.x jobs is a little limited, but I know that jobs
can write a lot of data into /tmp/hadoop-yarn, and if your nodes can't
absorb much growth in that directory, things will error out, albeit
softly. Noting the way the terasort example behaves in that regard, I
set up my worker nodes to make /tmp/hadoop-yarn a mount point for its
own disk volume, whose size I can preset and on which I can optionally
enable transparent compression via btrfs. Most of the time I'd expect
to get away with giving that volume some token small size, but in
trying to make a 1/5-scale (i.e., 200GB) terasort run, 128GiB with
compression enabled across five workers wasn't enough. 1/10-scale I
could manage, but at 1/5 it would fill up one node's /tmp/hadoop-yarn,
then the next, then the next, and so on. Makes me think that terasort
tries to write the whole dang thing out to the local file system
outside HDFS before making an output file in HDFS.
On 9/17/18 1:55 PM, Eric Badger wrote:
Hi Jonathan,
Have you opened up a YARN JIRA with your findings? If not, that would
be the next step in debugging the issue and coding up a fix. This
certainly sounds like a bug and something that we should get to the
bottom of.
As far as NodeManagers becoming unhealthy goes, a config could be added
to prevent this. But if you're only seeing 1 failure out of millions of
tasks, this seems like it would mask more problems than it fixes. 1
container failing is bad, but a node going bad and failing every
container that runs on it forever until it is shut down is much, much
worse. However, if you think you have a use case that would benefit
from making this behavior configurable, that is something we could
also look into. That would be a separate YARN JIRA as well.
Thanks,
Eric
On Mon, Sep 17, 2018 at 12:37 PM, Jonathan Bender
<jonben...@stripe.com.invalid> wrote:
Hello,
We started using CGroups with the LinuxContainerExecutor recently,
running Apache Hadoop 3.0.0. Occasionally (once out of many
millions of tasks) a YARN container will fail with a message like
the following:
WARN privileged.PrivilegedOperationExecutor: Shell execution
returned exit code: 35. Privileged Execution Operation Stderr:
Could not create container dirsCould not create local files and
directories
Looking at the container-executor source, it's traceable to errors
here:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604
And ultimately to
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672
The root failure seems to be in the underlying mkdir call, but
that exit code / errno is swallowed, so we don't have more details.
We tend to see this when many containers start at the same time
for the same application on a host, and we suspect it may be
related to a race condition around the directories shared between
containers of the same application.
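To illustrate the class of failure we suspect, here is a small
standalone Java analogy. The real code path is the native
container-executor's mkdir(), so this only models the error class;
the paths and class name below are made up. Two "containers" of the
same application race to create a shared application-local directory,
and the loser fails unless "already exists" is tolerated:

import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Standalone analogy of the suspected race; paths are made up.
public class DirRaceSketch {
    public static void main(String[] args) throws Exception {
        Path appDir = Paths.get("/tmp/dir-race-demo/appcache/application_0001");
        Files.createDirectories(appDir.getParent()); // parents exist up front
        Files.deleteIfExists(appDir);                // clean up a previous run

        Runnable container = () -> {
            try {
                // Non-idempotent create: the loser of the race gets a
                // FileAlreadyExistsException, the Java analogue of mkdir()
                // failing with EEXIST.
                Files.createDirectory(appDir);
                System.out.println(Thread.currentThread().getName() + " created the dir");
            } catch (FileAlreadyExistsException e) {
                // Treating "already exists" as success (or using
                // createDirectories, which is idempotent) makes the race
                // harmless instead of failing the container.
                System.out.println(Thread.currentThread().getName() + " lost the race: " + e.getFile());
            } catch (Exception e) {
                // Anything else is a real error; surface the cause rather
                // than swallowing it.
                e.printStackTrace();
            }
        };

        Thread c1 = new Thread(container, "container-1");
        Thread c2 = new Thread(container, "container-2");
        c1.start(); c2.start();
        c1.join(); c2.join();
    }
}

Presumably something analogous on the native side (tolerating EEXIST,
and logging errno when the failure is something else) would address
both the transient failure and the lack of detail in the stderr output.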
Has anyone seen similar failures when using the LinuxContainerExecutor?
This issue is compounded because the LinuxContainerExecutor marks the
node unhealthy in these scenarios:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L566
Under some circumstances this seems appropriate, but since this is
a transient failure (none of these machines were at capacity for
disks, inodes, etc.) we shouldn't take down the NodeManager. The
blacklisting behavior was added as part of
https://issues.apache.org/jira/browse/YARN-6302, which seems
perfectly valid, but perhaps we should make it configurable so
certain users can opt out?
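To make the ask concrete, here is a rough sketch of what an opt-out
might look like. This is purely hypothetical and not the actual
LinuxContainerExecutor code; the property name and exit-code constant
below are invented for illustration, not existing YARN configuration
keys:

import java.util.Properties;

// Hypothetical sketch only; the real key (if any) would be decided in the JIRA.
public class UnhealthyOptOutSketch {
    // Exit code 35 from the log message above, treated as a plain int here.
    static final int COULD_NOT_CREATE_DIRS = 35;
    // Invented, non-existent property name.
    static final String MARK_UNHEALTHY_KEY =
            "yarn.nodemanager.example.mark-unhealthy-on-dir-creation-failure";

    static boolean shouldMarkNodeUnhealthy(int exitCode, Properties conf) {
        // Default of true preserves the current YARN-6302 behavior; operators
        // who hit transient dir-creation failures could opt out.
        boolean markUnhealthy =
                Boolean.parseBoolean(conf.getProperty(MARK_UNHEALTHY_KEY, "true"));
        return exitCode == COULD_NOT_CREATE_DIRS && markUnhealthy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(shouldMarkNodeUnhealthy(35, conf)); // true (default)
        conf.setProperty(MARK_UNHEALTHY_KEY, "false");         // operator opts out
        System.out.println(shouldMarkNodeUnhealthy(35, conf)); // false
    }
}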
Cheers,
Jon