I would also just suggest moving up to 3.1.1 and trying again. Barring
that, maybe you can take the error message at its word. My experience
with running Hadoop 3.x jobs is a little limited, but I know that jobs
can write a lot of data into /tmp/hadoop-yarn, and if your nodes can't
absorb much growth in that directory, things will error out, albeit
softly. Noting the way the terasort example behaves in that regard, I
set up my worker nodes to make /tmp/hadoop-yarn a mount point for its
own disk volume, whose size I can preset and on which I can optionally
enable transparent compression via btrfs. Most of the time I'd expect
to get away with giving that volume some token small size, but in
trying to make a 1/5-scale (i.e., 200GB) terasort run, 128GiB with
compression enabled across five workers wasn't enough. 1/10-scale I
could manage, but at 1/5 it would fill up one node's /tmp/hadoop-yarn,
then the next, then the next, and so on. Makes me think that terasort
tries to write the whole dang thing out to the local file system
outside HDFS before making an output file in HDFS.
On 9/17/18 1:55 PM, Eric Badger wrote:
Hi Jonathan,
Have you opened up a YARN JIRA with your findings? If not, that would
be the next step in debugging the issue and coding up a fix. This
certainly sounds like a bug and something that we should get to the
bottom of.
As far as NodeManagers becoming unhealthy goes, a config could be added
to prevent this. But if you're only seeing 1 failure out of millions of
tasks, this seems like it would mask more problems than it fixes. 1
container failing is bad, but a node going bad and failing every
container that runs on it forever until it is shut down is much, much
worse. However, if you think you have a use case that would benefit
from making this behavior configurable, that is something we could
also look into. That would be a separate YARN JIRA as well.
Thanks,
Eric
On Mon, Sep 17, 2018 at 12:37 PM, Jonathan Bender
<jonben...@stripe.com.invalid> wrote:
Hello,
We started using CGroups with the LinuxContainerExecutor recently,
running Apache Hadoop 3.0.0. Occasionally (once out of many
millions of tasks) a YARN container will fail with a message like
the following:
WARN privileged.PrivilegedOperationExecutor: Shell execution
returned exit code: 35. Privileged Execution Operation Stderr:
Could not create container dirsCould not create local files and
directories
Looking at the container-executor source, it's traceable to errors
here:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L1604
And ultimately to
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c#L672
The root failure seems to be in the underlying mkdir call, but
that exit code / errno is swallowed, so we don't have more details.
We tend to see this when many containers start at the same time
for the same application on a host, and we suspect it may be
related to a race condition around the directories shared between
containers of the same application.
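To illustrate the class of failure we suspect, here is a small
standalone Java analogy. The real code path is the native
container-executor's mkdir(), so this only models the error class;
the paths and class name below are made up. Two "containers" of the
same application race to create a shared application-local directory,
and the loser fails unless "already exists" is tolerated:

import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Standalone analogy of the suspected race; paths are made up.
public class DirRaceSketch {
    public static void main(String[] args) throws Exception {
        Path appDir = Paths.get("/tmp/dir-race-demo/appcache/application_0001");
        Files.createDirectories(appDir.getParent()); // parents exist up front
        Files.deleteIfExists(appDir);                // clean up a previous run

        Runnable container = () -> {
            try {
                // Non-idempotent create: the loser of the race gets a
                // FileAlreadyExistsException, the Java analogue of mkdir()
                // failing with EEXIST.
                Files.createDirectory(appDir);
                System.out.println(Thread.currentThread().getName() + " created the dir");
            } catch (FileAlreadyExistsException e) {
                // Treating "already exists" as success (or using
                // createDirectories, which is idempotent) makes the race
                // harmless instead of failing the container.
                System.out.println(Thread.currentThread().getName() + " lost the race: " + e.getFile());
            } catch (Exception e) {
                // Anything else is a real error; surface the cause rather
                // than swallowing it.
                e.printStackTrace();
            }
        };

        Thread c1 = new Thread(container, "container-1");
        Thread c2 = new Thread(container, "container-2");
        c1.start(); c2.start();
        c1.join(); c2.join();
    }
}

Presumably something analogous on the native side (tolerating EEXIST,
and logging errno when the failure is something else) would address
both the transient failure and the lack of detail in the stderr output.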
Has anyone seen similar failures when using the LinuxContainerExecutor?
This issue is compounded because the LinuxContainerExecutor marks the
node unhealthy in these scenarios:
https://github.com/apache/hadoop/blob/release-3.0.0-RC1/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/LinuxContainerExecutor.java#L566
Under some circumstances this seems appropriate, but since this is
a transient failure (none of these machines were at capacity for
disks, inodes, etc.) we shouldn't take down the NodeManager. The
blacklisting behavior was added as part of
https://issues.apache.org/jira/browse/YARN-6302, which seems
perfectly valid, but perhaps we should make it configurable so
certain users can opt out?
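To make the ask concrete, here is a rough sketch of what an opt-out
might look like. This is purely hypothetical and not the actual
LinuxContainerExecutor code; the property name and exit-code constant
below are invented for illustration, not existing YARN configuration
keys:

import java.util.Properties;

// Hypothetical sketch only; the real key (if any) would be decided in the JIRA.
public class UnhealthyOptOutSketch {
    // Exit code 35 from the log message above, treated as a plain int here.
    static final int COULD_NOT_CREATE_DIRS = 35;
    // Invented, non-existent property name.
    static final String MARK_UNHEALTHY_KEY =
            "yarn.nodemanager.example.mark-unhealthy-on-dir-creation-failure";

    static boolean shouldMarkNodeUnhealthy(int exitCode, Properties conf) {
        // Default of true preserves the current YARN-6302 behavior; operators
        // who hit transient dir-creation failures could opt out.
        boolean markUnhealthy =
                Boolean.parseBoolean(conf.getProperty(MARK_UNHEALTHY_KEY, "true"));
        return exitCode == COULD_NOT_CREATE_DIRS && markUnhealthy;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(shouldMarkNodeUnhealthy(35, conf)); // true (default)
        conf.setProperty(MARK_UNHEALTHY_KEY, "false");         // operator opts out
        System.out.println(shouldMarkNodeUnhealthy(35, conf)); // false
    }
}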
Cheers,
Jon