[
https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483845#comment-16483845
]
Shane Kumpf commented on YARN-8259:
-----------------------------------
{quote}System administrator can reserve one cpu core for node manager and all
the docker inspect call are counted toward saturating one cpu core{quote}
I'm less concerned about the cpu usage and more about docker's client/server
model and the potential for hangs, which I've seen many times in the past under
load. Personally, I want the /proc route for my systems and am not using
hidepid. Losing a container due to an intermittent docker issue isn't really
acceptable to me when an alternative exists that avoids the issue.
What I could do is implement both the /proc and {{docker inspect}} approaches,
and a configuration switch to choose the implementation for those who use
hidepid (or a system without /proc). Would that be acceptable?
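To make the proposal concrete, here is a minimal sketch of how such a switch could pick between the two checks. It is not taken from the patch and is written in Python only for brevity. The configuration key {{docker.liveness-check.impl}}, its values, and every function and type name are hypothetical. The only real external call is {{docker inspect --format}} reading {{.State.Running}}.

```python
import os
import subprocess
from typing import Callable, Dict, NamedTuple

# Hypothetical configuration key and values, invented for this sketch.
LIVENESS_IMPL_KEY = "docker.liveness-check.impl"
IMPL_PROC = "proc"
IMPL_DOCKER_INSPECT = "docker-inspect"


class ContainerRef(NamedTuple):
    """Hypothetical record of what the checks need to know about a container."""
    container_id: str  # Docker container name or ID
    pid: int           # PID of the container's main process on the host


def alive_via_proc(ref: ContainerRef, proc_root: str = "/proc") -> bool:
    """Alive if <proc_root>/<pid> exists; no Docker daemon round trip.

    Assumption: the entry is visible to the caller, which may not hold
    when /proc is mounted with hidepid or is absent altogether.
    """
    return os.path.isdir(os.path.join(proc_root, str(ref.pid)))


def alive_via_docker_inspect(ref: ContainerRef, timeout_s: float = 30.0) -> bool:
    """Alive if `docker inspect` reports State.Running as true.

    subprocess.TimeoutExpired is deliberately not caught: a hung client
    means "unknown", and the caller should decide how to treat that.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Running}}", ref.container_id],
        capture_output=True, text=True, timeout=timeout_s,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def select_liveness_check(conf: Dict[str, str]) -> Callable[[ContainerRef], bool]:
    """Return the check named by the (hypothetical) config key, defaulting to /proc."""
    impl = conf.get(LIVENESS_IMPL_KEY, IMPL_PROC)
    if impl == IMPL_PROC:
        return alive_via_proc
    if impl == IMPL_DOCKER_INSPECT:
        return alive_via_docker_inspect
    raise ValueError(f"unknown liveness check implementation: {impl!r}")
```

Defaulting to the /proc check reflects the preference stated above. A hidepid system would set the key to {{docker-inspect}} and accept the dependency on the docker client.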
I'm also going to make this a blocker, as all privileged containers are leaked
on NM restart today.
> Revisit liveliness checks for Docker containers
> -----------------------------------------------
>
> Key: YARN-8259
> URL: https://issues.apache.org/jira/browse/YARN-8259
> Project: Hadoop YARN
> Issue Type: Sub-task
> Affects Versions: 3.0.2, 3.2.0, 3.1.1
> Reporter: Shane Kumpf
> Assignee: Shane Kumpf
> Priority: Major
> Labels: Docker
> Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN
> run as user, sending the null signal for liveliness checks could fail. We
> need to reconsider how liveliness checks are handled in the Docker case.