[ 
https://issues.apache.org/jira/browse/YARN-9923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975252#comment-16975252
 ] 

Adam Antal commented on YARN-9923:
----------------------------------

Thanks for your response [~eyang].

Let's stick to the script approach then. I think we concluded that it's better 
to do this in script form than in Java code, and the script args are sufficient 
to tune the program, so I can accept that.

bq. Personally, I would prefer to avoid multi-script approach. [...]

I'd still argue for the multi-script approach though. Too many running 
threads can indeed be a problem, but we're not talking about 20 scripts here, 
only 2-3, which have negligible overhead. I also think this solution gives 
administrators more flexibility: using multiple scripts (which can still 
source other ones) allows them to assign a script path to a certain team, for 
instance, which effectively decouples responsibility for each set of scripts 
and improves maintainability. By contrast, the script args approach becomes 
much more complicated when a large number of scripts and functions live in a 
single bash script that depends on a handful of input arguments.

bq. Apache common logging is one of real lesson that I learn from Hadoop that 
having too many run away threads making logging expensive and hard to debug 
where is the failure

Just to be sure, we could enforce in the code that no more than, say, 4 
threads are used (meaning at most 4 individual scripts run at a time), if you 
don't reject this solution.

I won't be upset if you believe this proposal would do more harm than good, 
but I still have these concerns about supportability and flexibility for 
users. 

> Introduce HealthReporter interface and implement running Docker daemon checker
> ------------------------------------------------------------------------------
>
>                 Key: YARN-9923
>                 URL: https://issues.apache.org/jira/browse/YARN-9923
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, yarn
>    Affects Versions: 3.2.1
>            Reporter: Adam Antal
>            Assignee: Adam Antal
>            Priority: Major
>         Attachments: YARN-9923.001.patch, YARN-9923.002.patch, 
> YARN-9923.003.patch, YARN-9923.004.patch
>
>
> Currently, if a NodeManager is enabled to allocate Docker containers but the 
> specified binary (docker.binary in the container-executor.cfg) is missing, 
> the container allocation fails with the following error message:
> {noformat}
> Container launch fails
> Exit code: 29
> Exception message: Launch container failed
> Shell error output: sh: <docker binary path, /usr/bin/docker by default>: No 
> such file or directory
> Could not inspect docker network to get type /usr/bin/docker network inspect 
> host --format='{{.Driver}}'.
> Error constructing docker command, docker error code=-1, error 
> message='Unknown error'
> {noformat}
> I suggest adding a property, say "yarn.nodemanager.runtime.linux.docker.check", 
> with the following options:
> - STARTUP: with this option, the NodeManager would not start if the Docker 
> binaries are missing or the Docker daemon is not running (the exception is 
> considered FATAL during startup)
> - RUNTIME: would give a more detailed/user-friendly exception on the 
> NodeManager's side (NM logs) if the Docker binaries are missing or the daemon 
> is not working. This would also prevent further Docker container allocation 
> as long as the binaries are missing or the Docker daemon is not running.
> - NONE (default): preserving the current behaviour, throwing exception during 
> container allocation, carrying on using the default retry procedure.
> ------------------------------------------------------------------------------------------------
> A new interface called {{HealthChecker}} is introduced, which is used in the 
> {{NodeHealthCheckerService}}. Existing implementations like 
> {{LocalDirsHandlerService}} are modified to implement it, giving a clear 
> abstraction of the node's health. The {{DockerHealthChecker}} implements this 
> new interface.



