[ https://issues.apache.org/jira/browse/YARN-8259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496492#comment-16496492 ]
Shane Kumpf edited comment on YARN-8259 at 5/31/18 12:51 PM:
-------------------------------------------------------------

I've been doing additional testing here and could use input from the community, as all of the solutions have cons. Here is what I've tested and been considering.

----
1) */proc/pid check as yarn*

Pros:
* No container-executor (c-e) changes
* Works with Docker live restore

Cons:
* Breaks down when hidepid is enabled
* Portability

----
2) */proc/pid or kill -0 as privileged user*

Pros:
* Works with Docker live restore

Cons:
* Circumvents hidepid: because of the elevated privileges, the yarn user can check the existence of any pid.
* Portability (/proc method)

----
3) *docker inspect*

Pros:
* No c-e changes
* Uses the Docker API

Cons:
* Requires retry handling to support Docker live restore.
** In the case of a Docker daemon upgrade, this means the upgrade must complete before the retries are exhausted, which could mean hundreds of retries.

----
4) *Hybrid* (keep the existing kill -0 for non-privileged containers, use docker inspect for privileged containers)

Pros:
* No c-e changes
* Limits the impact on live restore

Cons:
* Requires retry handling to support Docker live restore.
* Different handling based on container type.

----
I believe #2 is a non-starter, as it silently bypasses the hidepid option. I'm leaning towards striking #3 from the list as well: we really need the recovery logic to be solid, so I don't want to unnecessarily impact non-privileged containers, which appear to be working well. At this point, I'm leaning towards #4 or #1 (with docs indicating that the NM user must be whitelisted if hidepid is enabled).

> Revisit liveliness checks for Docker containers
> -----------------------------------------------
>
>                 Key: YARN-8259
>                 URL: https://issues.apache.org/jira/browse/YARN-8259
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>   Affects Versions: 3.0.2, 3.2.0, 3.1.1
>            Reporter: Shane Kumpf
>            Assignee: Shane Kumpf
>            Priority: Blocker
>              Labels: Docker
>         Attachments: YARN-8259.001.patch
>
>
> As privileged containers may execute as a user that does not match the YARN
> run as user, sending the null signal for liveliness checks could fail. We
> need to reconsider how liveliness checks are handled in the Docker case.
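
For illustration only, the hybrid approach (#4) could look roughly like the Python sketch below. This is not the YARN or container-executor implementation; the function names (`null_signal_alive`, `docker_inspect_running`, `container_alive`), the retry count, and the delay are all hypothetical. The only external command assumed is `docker inspect --format '{{.State.Running}}' <container>`, which prints `true` or `false`.

```python
import os
import subprocess
import time


def null_signal_alive(pid):
    """kill -0 style check: send signal 0 as the current user."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process with this pid
    except PermissionError:
        # The pid exists but belongs to another user. A plain `kill -0`
        # exits non-zero here, which is the failure mode described for
        # privileged containers.
        return True
    return True


def docker_inspect_running(container_id, retries=3, delay=1.0,
                           run=subprocess.run):
    """Ask Docker for the container state, retrying on failure.

    Assumption: any non-zero exit is treated as "daemon unavailable".
    A missing container also exits non-zero, so real code would need
    to tell those two cases apart.
    """
    cmd = ["docker", "inspect", "--format", "{{.State.Running}}",
           container_id]
    for _ in range(retries):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result.stdout.strip() == "true"
        time.sleep(delay)  # e.g. daemon restarting during live restore
    raise RuntimeError("docker inspect failed after %d attempts" % retries)


def container_alive(pid, container_id, privileged,
                    inspect=docker_inspect_running):
    """Hybrid: docker inspect for privileged containers, kill -0 otherwise."""
    if privileged:
        return inspect(container_id)
    return null_signal_alive(pid)
```

The `run` and `inspect` parameters exist only so the retry path can be exercised without a Docker daemon. The retry budget is the part that matters for live restore: as noted under #3, it has to outlast a daemon upgrade.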