[ https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929804#comment-16929804 ]
shanyu zhao commented on YARN-9834:
-----------------------------------
[~ashvin] I missed that file during git push, it is added now. Thanks!
> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -----------------------------------------------------------------------------
>
> Key: YARN-9834
> URL: https://issues.apache.org/jira/browse/YARN-9834
> Project: Hadoop YARN
> Issue Type: Bug
> Components: nodemanager
> Affects Versions: 3.1.2
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Priority: Major
>
> Yarn Secure Container in secure mode allows separation of different users'
> local files and container processes running on the same node manager. This
> depends on an out-of-band service such as SSSD/Winbind to sync all domain
> users to the local machine.
> SSSD/Winbind user sync has significant overhead, especially for large
> corporations. Also, when running Yarn inside a Kubernetes cluster (that is,
> with node managers running inside Docker containers), it doesn't make sense
> for each container to sync a whole copy of the domain users.
> We should add a new Yarn configuration that lets us pre-create a pool of
> users on each machine/Docker container. At runtime, Yarn allocates a local
> user to the domain user that submits the application. When all containers
> of that user and all files belonging to that user are deleted, we can
> release the allocation and allow other users to use the same local user to
> run their Yarn containers.
> h2. Design
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user (default: false)
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix (default: "user")
> {code}
> By default this feature is turned off. If we enable it with
> local-user-prefix set to "user", we expect pre-created local users
> user0 - usern, where n equals the value of:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores
> {code}
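As a rough illustration of the naming scheme, the sketch below derives the expected pool of local user names from the configuration keys proposed above. Several parts are assumptions:

- The class and method names are hypothetical.
- Plain `java.util.Properties` stands in for Hadoop's `Configuration` so the example compiles on its own.
- The text above does not say whether the upper bound "usern" is inclusive. This sketch assumes exactly `cpu-vcores` users, numbered from 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: computes the pre-created local user names a node manager would expect. */
class LocalUserPool {
    // Configuration keys taken from the proposal.
    static final String USE_LOCAL_USER =
        "yarn.nodemanager.linux-container-executor.secure-mode.use-local-user";
    static final String LOCAL_USER_PREFIX =
        "yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix";
    static final String CPU_VCORES = "yarn.nodemanager.resource.cpu-vcores";

    /** Returns the expected local user names, or an empty list when the feature is off. */
    static List<String> expectedLocalUsers(Properties conf) {
        List<String> users = new ArrayList<>();
        // The feature defaults to false.
        if (!Boolean.parseBoolean(conf.getProperty(USE_LOCAL_USER, "false"))) {
            return users;
        }
        String prefix = conf.getProperty(LOCAL_USER_PREFIX, "user");
        String vcores = conf.getProperty(CPU_VCORES);
        if (vcores == null) {
            throw new IllegalArgumentException(CPU_VCORES + " must be set");
        }
        // Assumption: one local user per vcore, numbered from 0.
        int n = Integer.parseInt(vcores.trim());
        for (int i = 0; i < n; i++) {
            users.add(prefix + i);
        }
        return users;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(USE_LOCAL_USER, "true");
        conf.setProperty(CPU_VCORES, "4");
        System.out.println(expectedLocalUsers(conf)); // [user0, user1, user2, user3]
    }
}
```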
> We can use an in-memory allocator to keep the domain user to local user
> mapping. When should the mapping be added, and when should it be removed?
> In the node manager, ApplicationImpl implements the state machine for a Yarn
> application's life cycle, and it exists only if the app has at least one
> container running on that node manager. We can hook in the code that adds
> the mapping during application initialization.
> Before removing the mapping, we need to wait for three things:
> 1) All applications of the same user are completed;
> 2) All log handling of those applications (log aggregation or
> non-aggregated handling) is done;
> 3) All pending FileDeletionTasks that use the user's identity are finished.
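One way to track the three release conditions is a reference count per domain user:

- Take a reference at application initialization.
- Take another for each pending log-handling job and each pending FileDeletionTask.
- Drop each reference when that work finishes.

The local user returns to the pool only when the count reaches zero. This is a hypothetical sketch: the class and method names are invented and it is not the allocator from the actual patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory allocator mapping domain users to pooled local users. */
class LocalUserAllocator {
    private final Deque<String> freeUsers;
    private final Map<String, String> localUserOf = new HashMap<>();
    private final Map<String, Integer> refCount = new HashMap<>();

    LocalUserAllocator(List<String> pool) {
        this.freeUsers = new ArrayDeque<>(pool);
    }

    /** Takes one reference for a domain user, allocating a free local user on first use. */
    synchronized String acquire(String domainUser) {
        String local = localUserOf.get(domainUser);
        if (local == null) {
            local = freeUsers.pollFirst();
            if (local == null) {
                throw new IllegalStateException("local user pool exhausted");
            }
            localUserOf.put(domainUser, local);
        }
        refCount.merge(domainUser, 1, Integer::sum);
        return local;
    }

    /** Drops one reference; returns the local user to the pool when none remain. */
    synchronized void release(String domainUser) {
        Integer count = refCount.get(domainUser);
        if (count == null) {
            throw new IllegalStateException("no allocation for " + domainUser);
        }
        if (count > 1) {
            refCount.put(domainUser, count - 1);
        } else {
            refCount.remove(domainUser);
            freeUsers.addLast(localUserOf.remove(domainUser));
        }
    }

    /** Returns the current local user for a domain user, or null if none is allocated. */
    synchronized String lookup(String domainUser) {
        return localUserOf.get(domainUser);
    }

    public static void main(String[] args) {
        LocalUserAllocator alloc = new LocalUserAllocator(List.of("user0", "user1"));
        alloc.acquire("alice@CORP"); // application initialized -> user0
        alloc.acquire("alice@CORP"); // log aggregation scheduled
        alloc.release("alice@CORP"); // application completed
        System.out.println(alloc.lookup("alice@CORP")); // user0 (log handling still pending)
        alloc.release("alice@CORP"); // log aggregation done
        System.out.println(alloc.lookup("alice@CORP")); // null (user0 back in the pool)
    }
}
```

Because one counter covers all three conditions, a late FileDeletionTask keeps the local user reserved even after every application of that domain user has finished.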
> h2. Limitations
> 1) This feature does not support resources with PRIVATE visibility.
> Because PRIVATE resources may be cached in the node manager for a very long
> time, supporting them would be a security problem: a user might be able to
> peek into a previous user's PRIVATE resources. We can modify the code to
> treat all PRIVATE resources as APPLICATION resources.
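The workaround in limitation 1 amounts to a visibility downgrade applied before localization. The sketch below uses a local enum that mirrors the values of Hadoop's `LocalResourceVisibility` (PUBLIC, PRIVATE, APPLICATION) so it compiles without Hadoop on the classpath. The `VisibilityPolicy` name and `effectiveVisibility` method are hypothetical. Where such a check would be hooked into the node manager is not specified above.

```java
/** Stand-in mirroring org.apache.hadoop.yarn.api.records.LocalResourceVisibility. */
enum Visibility { PUBLIC, PRIVATE, APPLICATION }

/** Hypothetical policy: downgrade PRIVATE to APPLICATION when local-user pooling is on. */
class VisibilityPolicy {
    /**
     * With pooling enabled, a PRIVATE resource is treated as APPLICATION-scoped.
     * It is then not kept in a per-user cache that a later holder of the same
     * local user could read.
     */
    static Visibility effectiveVisibility(Visibility requested, boolean useLocalUserPool) {
        if (useLocalUserPool && requested == Visibility.PRIVATE) {
            return Visibility.APPLICATION;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(effectiveVisibility(Visibility.PRIVATE, true));  // APPLICATION
        System.out.println(effectiveVisibility(Visibility.PRIVATE, false)); // PRIVATE
        System.out.println(effectiveVisibility(Visibility.PUBLIC, true));   // PUBLIC
    }
}
```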
> 2) It is recommended to enable DominantResourceCalculator so that no more
> than "cpu-vcores" containers run concurrently on a node manager (the default
> resource calculator only takes memory into account, so vcores would not cap
> the number of containers):
> {code:java}
> yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
> {code}
> 3) Currently this feature does not work with Yarn Node Manager recovery. We
> may add recovery support in the future once we hook into the right calls in
> the recovery flow.
>
--
This message was sent by Atlassian Jira
(v8.3.2#803003)