[ 
https://issues.apache.org/jira/browse/YARN-8523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573946#comment-16573946
 ] 

Zian Chen commented on YARN-8523:
---------------------------------

[~eyang], thanks for raising this feature. This is very useful for live 
debugging and diagnosis of containers. We can add a series of interactive 
commands to let users debug more effectively, such as tail -f on the container 
log, container resource usage, etc. 

To handle the NodeManager restart scenario, we can register an event listener 
on the NodeManager web socket that listens for restart or shutdown signals and 
responds in the xterm.js terminal accordingly (for example, by printing an NM 
restart/shutdown message to the user), then retries the connection several 
times after the typical NM restart interval. If the NM hits an unexpected 
issue and cannot resume its service, that is something the interactive docker 
shell cannot solve by itself, and we should just give the user a reasonable 
alert message describing the current situation (for example, "retry failed 
with timeout, please check the NM log for more information"). I think passing 
commands through the NM web socket and reusing the container-executor security 
check would be a good prototype to build first, without taking on the burden 
of handling a root daemon by carving out another secure channel. 
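
To make the retry behaviour concrete, here is a minimal sketch of the decision 
logic only. All names here (RetryPolicy, Action, nextAction) and the example 
numbers are hypothetical and are not part of YARN or xterm.js; the sketch also 
assumes a fixed wait equal to the typical NM restart interval between attempts.

```typescript
// Hypothetical sketch: none of these names are YARN or xterm.js APIs.

interface RetryPolicy {
  restartIntervalMs: number; // assumed typical NM restart interval
  maxAttempts: number;       // reconnect attempts before giving up
}

type Action =
  | { kind: "retry"; delayMs: number; message: string }
  | { kind: "giveUp"; message: string };

// Decide what to do after the NM web socket closes.
// failedAttempts is the number of reconnect attempts that already failed.
function nextAction(failedAttempts: number, policy: RetryPolicy): Action {
  if (failedAttempts >= policy.maxAttempts) {
    return {
      kind: "giveUp",
      message:
        "Retry failed with timeout, please check the NM log for more information.",
    };
  }
  const attempt = failedAttempts + 1;
  return {
    kind: "retry",
    delayMs: policy.restartIntervalMs,
    message:
      `NM restart/shutdown detected, reconnecting ` +
      `(attempt ${attempt} of ${policy.maxAttempts})...`,
  };
}

// Example policy: wait 10 seconds between attempts, try three times.
const examplePolicy: RetryPolicy = { restartIntervalMs: 10000, maxAttempts: 3 };
```

In the browser, the web socket close handler could call nextAction, print the 
returned message to the terminal, and for a "retry" action schedule a reconnect 
after delayMs. Keeping the decision separate from the socket code makes the 
give-up path easy to test without a running NM.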

> Interactive docker shell
> ------------------------
>
>                 Key: YARN-8523
>                 URL: https://issues.apache.org/jira/browse/YARN-8523
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Eric Yang
>            Priority: Major
>              Labels: Docker
>
> Some applications might require interactive unix command execution to carry 
> out operations.  Container-executor can interface with docker exec to debug 
> or analyze docker containers while the application is running.  It would be 
> nice to support an API to invoke docker exec to perform unix commands and 
> report back the output to the application master.  The application master can 
> distribute and aggregate execution of the commands and record the results in 
> the application master log file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
