[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16483845#comment-16483845 ]
Shane Kumpf commented on YARN-8259:
-----------------------------------

{quote}System administrator can reserve one cpu core for node manager and all the docker inspect call are counted toward saturating one cpu core{quote}

I'm less concerned about the CPU usage and more about Docker's client/server model and the potential for hangs (many of which I've seen in the past under load). Personally, I want the /proc route for my systems and am not using hidepid. Losing a container due to an intermittent Docker issue isn't really acceptable to me when an alternative exists that avoids the issue.

What I could do is implement both the /proc and {{docker inspect}} approaches, and a configuration switch to choose the implementation for those that use hidepid (or a system without /proc). Would that be acceptable?

I'm also going to make this a blocker, as all privileged containers are leaked on NM restart today.

> Revisit liveliness checks for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-8259
>                 URL: https://issues.apache.org/jira/browse/YARN-8259
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>   Affects Versions: 3.0.2, 3.2.0, 3.1.1
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Major
>              Labels: Docker
>         Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN
> run as user, sending the null signal for liveliness checks could fail. We
> need to reconsider how liveliness checks are handled in the Docker case.
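
For illustration only, the two-implementation idea proposed in the comment (a /proc check and a {{docker inspect}} check behind a configuration switch) might look roughly like the sketch below. This is not the code in YARN-8259.001.patch. The class names, the {{select}} helper, the switch values "proc" and "docker-inspect", and the 10-second timeout are all hypothetical. The only external behaviour assumed is that `docker inspect --format '{{.State.Running}}' <container>` prints `true` for a running container.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class LivenessSketch {

  /** Hypothetical strategy interface; not an existing YARN API. */
  interface LivenessCheck {
    boolean isAlive(String pid, String containerId);
  }

  /** /proc approach: treat the process as alive if <procRoot>/<pid> is a directory. */
  static final class ProcCheck implements LivenessCheck {
    private final Path procRoot;

    ProcCheck(Path procRoot) {
      this.procRoot = procRoot;
    }

    @Override
    public boolean isAlive(String pid, String containerId) {
      // With hidepid enabled, entries owned by other users may be hidden,
      // so this can report "not alive" for a process that is still running.
      return Files.isDirectory(procRoot.resolve(pid));
    }
  }

  /** docker inspect approach: ask the Docker daemon, bounded by a timeout. */
  static final class DockerInspectCheck implements LivenessCheck {
    private final long timeoutSeconds;

    DockerInspectCheck(long timeoutSeconds) {
      this.timeoutSeconds = timeoutSeconds;
    }

    @Override
    public boolean isAlive(String pid, String containerId) {
      ProcessBuilder pb = new ProcessBuilder(
          "docker", "inspect", "--format", "{{.State.Running}}", containerId);
      try {
        Process p = pb.start();
        // Guard against a hung client/daemon instead of blocking forever.
        if (!p.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
          p.destroyForcibly();
          throw new IllegalStateException("docker inspect timed out");
        }
        // Assumption: a non-zero exit is "unknown", not "dead"; the caller decides.
        if (p.exitValue() != 0) {
          throw new IllegalStateException("docker inspect failed: " + p.exitValue());
        }
        try (BufferedReader r = new BufferedReader(
            new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
          return "true".equals(r.readLine());
        }
      } catch (IOException e) {
        throw new IllegalStateException(e);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
  }

  /** Hypothetical configuration switch between the two implementations. */
  static LivenessCheck select(String impl) {
    switch (impl) {
      case "proc":
        return new ProcCheck(Paths.get("/proc"));
      case "docker-inspect":
        return new DockerInspectCheck(10);
      default:
        throw new IllegalArgumentException("unknown liveness check: " + impl);
    }
  }
}
```

In this sketch a timeout or a failed {{docker inspect}} call raises an exception rather than reporting the container as dead, which reflects the concern above about losing a container to an intermittent Docker issue. A real implementation would still need to decide how the NodeManager reacts to that "unknown" state.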