Miklos Szegedi created YARN-6302:
------------------------------------
Summary: Fail the node, if Linux Container Executor is not
configured properly
Key: YARN-6302
URL: https://issues.apache.org/jira/browse/YARN-6302
Project: Hadoop YARN
Issue Type: Bug
Reporter: Miklos Szegedi
Assignee: Miklos Szegedi
Priority: Minor
We have a cluster in which one node has a misconfigured Linux Container
Executor (LCE). Every AM or regular container launched on that node fails.
The node still has resources available, so it keeps failing apps until the
administrator notices the issue and decommissions the node. AM blacklisting
only helps if the application is already running.
As a possible improvement, when the LCE is used on the cluster and an NM gets
certain errors back from the LCE, such as error 24 (configuration not found),
we should either stop allocating anything on that node or shut the node down
entirely. That kind of problem normally does not fix itself, and it means that
nothing can really run on that node.
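A rough sketch of the idea, not a patch: the class and method names below are
hypothetical and are not part of the YARN code base. It assumes the NM sees
the exit code of each LCE invocation, and that exit code 24 (the one in the
log below) is the only code treated as unrecoverable for now.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical helper; none of these names exist in Hadoop YARN. */
final class LceFailureTracker {
    // Exit codes assumed to indicate a misconfiguration that will not fix
    // itself. Only 24 ("configuration not found") is taken from this report.
    private static final Set<Integer> FATAL_EXIT_CODES = Collections.singleton(24);

    private boolean healthy = true;
    private String diagnostics = "";

    /** Returns true if the exit code is one we consider unrecoverable. */
    static boolean isFatal(int exitCode) {
        return FATAL_EXIT_CODES.contains(exitCode);
    }

    /** Record the exit code of one LCE invocation. */
    void onExecutorExit(int exitCode) {
        if (isFatal(exitCode)) {
            // Once marked unhealthy the node stays that way; an
            // administrator has to fix the configuration and restart it.
            healthy = false;
            diagnostics = "Linux Container Executor returned unrecoverable exit code "
                + exitCode;
        }
    }

    boolean isHealthy() {
        return healthy;
    }

    String getDiagnostics() {
        return diagnostics;
    }
}
```

The NM could then report the node as unhealthy, with the diagnostics string
attached, so that nothing more is scheduled on it. Whether to mark the node
unhealthy or shut it down outright is left open here.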
{code}
Application application_1488920587909_0010 failed 2 times due to AM Container
for appattempt_1488920587909_0010_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: Application application_1488920587909_0010
initialization failed (exitCode=24) with output:
For more detailed output, check the application tracking page:
http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then
click on links to logs of each attempt.
. Failing the application.
{code}