Miklos Szegedi created YARN-6302:
------------------------------------

             Summary: Fail the node, if Linux Container Executor is not 
configured properly
                 Key: YARN-6302
                 URL: https://issues.apache.org/jira/browse/YARN-6302
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Miklos Szegedi
            Assignee: Miklos Szegedi
            Priority: Minor


We have a cluster in which one node has a misconfigured Linux Container 
Executor (LCE). Every AM or regular container launched on that node fails. 
The node still has resources available, so it keeps failing apps until the 
administrator notices the issue and decommissions the node. AM blacklisting 
only helps if the application is already running.

As a possible improvement, when the LCE is used on the cluster and an NM gets 
certain errors back from the LCE, such as error 24 (configuration not found), 
we should either stop allocating anything on the node or shut down the node 
entirely. That kind of problem normally does not fix itself, and it means that 
nothing can really run on that node.
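
A minimal sketch of the idea, not a patch: the class and method names below 
(LceFailurePolicy, onExecutorExit, isNodeUnhealthy) are hypothetical and do 
not exist in the NodeManager code base. It assumes the NM can see the LCE exit 
code and that 24 is treated as unrecoverable; which other codes belong in that 
set is left open.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: remembers whether the Linux Container Executor has
 * returned an exit code that is assumed to be unrecoverable.
 */
final class LceFailurePolicy {
    // Exit code 24 ("configuration not found") is the one named in this report.
    static final int CONFIGURATION_NOT_FOUND = 24;

    private final Set<Integer> fatalExitCodes;
    private volatile boolean nodeUnhealthy = false;
    private volatile String reason = "";

    LceFailurePolicy(Set<Integer> fatalExitCodes) {
        // Defensive copy so later changes by the caller do not affect the policy.
        this.fatalExitCodes =
            Collections.unmodifiableSet(new HashSet<>(fatalExitCodes));
    }

    /**
     * Record one LCE exit code.
     * Returns true if the node should stop accepting containers.
     * Once a fatal code has been seen, the node stays unhealthy.
     */
    boolean onExecutorExit(int exitCode) {
        if (fatalExitCodes.contains(exitCode)) {
            nodeUnhealthy = true;
            reason = "Linux Container Executor failed with exit code " + exitCode;
        }
        return nodeUnhealthy;
    }

    boolean isNodeUnhealthy() {
        return nodeUnhealthy;
    }

    String getReason() {
        return reason;
    }

    public static void main(String[] args) {
        LceFailurePolicy policy = new LceFailurePolicy(
            Collections.singleton(CONFIGURATION_NOT_FOUND));
        policy.onExecutorExit(1);   // ordinary failure: node stays healthy
        policy.onExecutorExit(24);  // misconfiguration: node becomes unhealthy
        System.out.println(policy.isNodeUnhealthy() + ": " + policy.getReason());
    }
}
```

In this sketch the unhealthy flag is sticky on purpose, since the problem does 
not fix itself. Whether the NM would then report itself as unhealthy or shut 
down is the open question of this issue.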

{code}
Application application_1488920587909_0010 failed 2 times due to AM Container 
for appattempt_1488920587909_0010_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: Application application_1488920587909_0010 
initialization failed (exitCode=24) with output:
For more detailed output, check the application tracking page: 
http://node-1.domain.com:8088/cluster/app/application_1488920587909_0010 Then 
click on links to logs of each attempt.
. Failing the application.
{code}



