[ 
https://issues.apache.org/jira/browse/YARN-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932748#comment-16932748
 ] 

shanyu zhao commented on YARN-9834:
-----------------------------------

[~eyang]
{quote}User's home directory may be used by multiple parties. Some application 
may install ssh key or user preference into user home directory. User home 
directory becomes a public space in this design. This breaks user home 
directory privacy for existing applications.{quote}
There is no home directory configured for "domain users", and Yarn containers 
are not supposed to use a home directory. The working directory of a container 
is managed by the Yarn node manager and is deleted when the container 
finishes. Any additional credentials (other than the Hadoop delegation token) 
are also localized to the container working directory and are cleaned up.

{quote}Application can not rely on any group membership lookup because the user 
does not have group membership in the container. To access external posix like 
file system in application, group authorization needs to be custom code like 
going through HDFS client rather than standard file interfaces. This is 
incompatible with most file systems options today.{quote}
Again, for HDFS access the Hadoop delegation token is used. For access to any 
external resource such as JDBC, the credentials (e.g. a keytab file) should be 
localized to the container as a file resource. The only scenario I can think 
of is this: we want to share some content on the local file system of the node 
manager and allow a certain group of users to read it. I don't believe this is 
an important usage scenario, and we can always put that shared content on 
HDFS, rely on resource localization, and manage the ACLs on HDFS.
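For the shared-content-on-HDFS route, a group can be granted access with 
standard HDFS ACLs; a rough sketch (the path and group name are made up for 
illustration):
{code:java}
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.AclEntry;
import org.apache.hadoop.fs.permission.AclEntryScope;
import org.apache.hadoop.fs.permission.AclEntryType;
import org.apache.hadoop.fs.permission.FsAction;

public class SharedContentAcl {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical shared directory and group name, for illustration only.
    Path shared = new Path("/shared/reference-data");
    AclEntry groupRead = new AclEntry.Builder()
        .setScope(AclEntryScope.ACCESS)
        .setType(AclEntryType.GROUP)
        .setName("analytics-team")
        .setPermission(FsAction.READ_EXECUTE)
        .build();
    // Manage group authorization on the HDFS side instead of relying on
    // local group membership inside the container.
    fs.modifyAclEntries(shared, Arrays.asList(groupRead));
  }
}
{code}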

{quote}User can not access raw devices like GPU. Many of the sudo or kernel 
capabilities are only granted permissively to users with sudo rights or proper 
group permission. Node manager does not have the logic to grant user sudo 
rights. If continue on this path, this means node manager becomes root to 
delegate rights of the user. This makes node manager more powerful and more 
dangerous.{quote}
GPU scheduling and isolation rely on cgroups. The "domain users" cannot be 
sudoers; otherwise they could peek into each other's working directories.

I'm trying to offer an alternative to SSSD (the domain user sync): a 
light-weight secure mode for LinuxContainerExecutor that still provides the 
benefit of Yarn Secure Containers (user isolation inside the node manager). I 
believe it works for most scenarios. If there are certain scenarios this 
cannot support, we can call them out; it's not a bad idea to offer an 
additional choice. But I don't want to have any security holes in any 
situation.

> Allow using a pool of local users to run Yarn Secure Container in secure mode
> -----------------------------------------------------------------------------
>
>                 Key: YARN-9834
>                 URL: https://issues.apache.org/jira/browse/YARN-9834
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 3.1.2
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>            Priority: Major
>
> Yarn Secure Container in secure mode allows separation of different users' 
> local files and container processes running on the same node manager. This 
> depends on an out-of-band service such as SSSD/Winbind to sync all domain 
> users to the local machine.
> Winbind user sync has lots of overhead, especially for large corporations. 
> Also, if Yarn runs inside a Kubernetes cluster (meaning node managers run 
> inside Docker containers), it doesn't make sense for each container to join 
> the Active Directory domain and sync a whole copy of the domain users.
> We should add a new configuration to Yarn so that we can pre-create a pool 
> of users on each machine/Docker container. At runtime, Yarn allocates a 
> local user to the domain user that submits the application. When all 
> containers of that user are finished and all files belonging to that user 
> are deleted, we can release the allocation and allow other users to use the 
> same local user to run their Yarn containers.
> h2. Design
> We propose to add these new configurations:
> {code:java}
> yarn.nodemanager.linux-container-executor.secure-mode.use-local-user, 
> defaults to false
> yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix, 
> defaults to "user"{code}
> By default this feature is turned off. If we enable it, with 
> local-user-prefix set to "user", then we expect pre-created local users 
> user0 - usern, where the total number of local users equals:
> {code:java}
> yarn.nodemanager.resource.cpu-vcores {code}
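> As a rough sketch (not an actual implementation), the node manager could read 
> these settings as follows. The two feature keys are just the proposed names 
> above, and the class itself is made up for illustration:
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> 
> // Sketch only: the two feature keys are the proposed names from this JIRA and
> // do not exist in YarnConfiguration; cpu-vcores is an existing setting.
> public class LocalUserPoolSettings {
>   static final String USE_LOCAL_USER =
>       "yarn.nodemanager.linux-container-executor.secure-mode.use-local-user";
>   static final String LOCAL_USER_PREFIX =
>       "yarn.nodemanager.linux-container-executor.secure-mode.local-user-prefix";
>   static final String CPU_VCORES = "yarn.nodemanager.resource.cpu-vcores";
> 
>   final boolean enabled;
>   final String prefix;
>   final int poolSize;   // expected number of pre-created local users
> 
>   LocalUserPoolSettings(Configuration conf) {
>     this.enabled = conf.getBoolean(USE_LOCAL_USER, false);
>     this.prefix = conf.get(LOCAL_USER_PREFIX, "user");
>     this.poolSize = conf.getInt(CPU_VCORES, 8);   // vcores default is 8
>   }
> }
> {code}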
> We can use an in-memory allocator to keep the domain user to local user 
> mapping. 
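> A minimal sketch of such an allocator (class and method names are 
> illustrative, not existing node manager code; the "nonexistuser" fallback for 
> an exhausted pool is described further below):
> {code:java}
> import java.util.ArrayDeque;
> import java.util.Deque;
> import java.util.HashMap;
> import java.util.Map;
> 
> // Sketch of an in-memory domain-user-to-local-user allocator.
> public class LocalUserAllocator {
>   private final Deque<String> freeLocalUsers = new ArrayDeque<>();
>   private final Map<String, String> domainToLocal = new HashMap<>();
> 
>   public LocalUserAllocator(String prefix, int poolSize) {
>     for (int i = 0; i < poolSize; i++) {
>       freeLocalUsers.add(prefix + i);   // e.g. user0 .. user{poolSize-1}
>     }
>   }
> 
>   /** Map a domain user to a pooled local user, reusing an existing mapping. */
>   public synchronized String allocate(String domainUser) {
>     String local = domainToLocal.get(domainUser);
>     if (local != null) {
>       return local;
>     }
>     if (freeLocalUsers.isEmpty()) {
>       // Pool exhausted: hand back a non-existent run-as user so the container
>       // fails on this node and gets relaunched elsewhere (see below).
>       return "nonexistuser";
>     }
>     local = freeLocalUsers.poll();
>     domainToLocal.put(domainUser, local);
>     return local;
>   }
> 
>   /** Return the local user to the pool once the domain user is fully done. */
>   public synchronized void release(String domainUser) {
>     String local = domainToLocal.remove(domainUser);
>     if (local != null) {
>       freeLocalUsers.add(local);
>     }
>   }
> }
> {code}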
> Now, when do we add the mapping, and when do we remove it?
> In the node manager, ApplicationImpl implements the state machine for a Yarn 
> app's life cycle, which exists only if the app has at least one container 
> running on that node manager. We can hook in code to add the mapping during 
> application initialization.
> For removing the mapping, we need to wait for three things:
> 1) All applications of the same user are completed;
> 2) All log handling for those applications (log aggregation or non-aggregated 
> handling) is done;
> 3) All pending FileDeletionTasks that use the user's identity are finished.
> Note that all operations on these reference counts should be synchronized.
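> A rough sketch of that bookkeeping, tracking the three conditions as per-user 
> counters behind synchronized methods (names are again illustrative; only the 
> application hooks are shown, the log-handling and deletion-task hooks would 
> look the same):
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
> 
> // Sketch of the per-user reference counting that gates releasing a mapping:
> // running applications, pending log handling, and pending FileDeletionTasks.
> public class LocalUserRefCount {
>   private static class Counts {
>     int runningApps;
>     int pendingLogHandlers;
>     int pendingDeletionTasks;
> 
>     boolean done() {
>       return runningApps == 0 && pendingLogHandlers == 0
>           && pendingDeletionTasks == 0;
>     }
>   }
> 
>   private final Map<String, Counts> countsByUser = new HashMap<>();
>   private final LocalUserAllocator allocator;
> 
>   public LocalUserRefCount(LocalUserAllocator allocator) {
>     this.allocator = allocator;
>   }
> 
>   public synchronized void appStarted(String domainUser) {
>     countsByUser.computeIfAbsent(domainUser, u -> new Counts()).runningApps++;
>   }
> 
>   public synchronized void appFinished(String domainUser) {
>     Counts c = countsByUser.get(domainUser);
>     if (c != null) {
>       c.runningApps--;
>       releaseIfDone(domainUser, c);
>     }
>   }
> 
>   // Equivalent increment/decrement hooks for log handling and
>   // FileDeletionTask completion would all funnel into releaseIfDone().
>   private void releaseIfDone(String domainUser, Counts c) {
>     if (c.done()) {
>       countsByUser.remove(domainUser);
>       allocator.release(domainUser);   // give the pooled local user back
>     }
>   }
> }
> {code}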
> If all of the local users in the pool are allocated, we'll return 
> "nonexistuser" as the run-as user; this will cause the container to fail to 
> execute, and Yarn will relaunch it on other nodes.
> What about node manager restarts? During ResourceLocalizationService init, it 
> renames the root folders used by the node manager and schedules 
> FileDeletionTasks to delete the contents of those folders. To prevent newly 
> launched Yarn containers from peeking into the yet-to-be-deleted old 
> application folders right after a node manager restart, we can chmod the root 
> folders to 700 right after the rename.
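> The chmod step could look roughly like this, using standard Java NIO 
> permissions (path handling simplified for illustration):
> {code:java}
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.nio.file.attribute.PosixFilePermission;
> import java.nio.file.attribute.PosixFilePermissions;
> import java.util.Set;
> 
> // Sketch: after renaming a root local-dir to its to-be-deleted name, restrict
> // it to the node manager user so containers started after the restart cannot
> // peek into old application folders while deletion is still pending.
> public class LockDownRenamedDir {
>   static void lockDown(String renamedRootDir) throws IOException {
>     Path dir = Paths.get(renamedRootDir);
>     Set<PosixFilePermission> ownerOnly =
>         PosixFilePermissions.fromString("rwx------");   // i.e. chmod 700
>     Files.setPosixFilePermissions(dir, ownerOnly);
>   }
> }
> {code}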
> h2. Limitations
> 1) This feature does not support the PRIVATE resource visibility type. 
> Because PRIVATE resources are potentially cached on the node manager for a 
> very long time, supporting them would be a security problem: a user might be 
> able to peek into a previous user's PRIVATE resources. We can modify the code 
> to treat all PRIVATE resources as APPLICATION type (see the sketch after this 
> list).
> 2) It is recommended to enable the DominantResourceCalculator so that no 
> more than "cpu-vcores" concurrent containers run on a node manager:
> {code:java}
> yarn.scheduler.capacity.resource-calculator
> = org.apache.hadoop.yarn.util.resource.DominantResourceCalculator {code}
> 3) Currently this feature does not work with Yarn node manager recovery. 
> Because the mappings are kept only in memory, they cannot be recovered after 
> a node manager restart.
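> Regarding limitation 1, a possible sketch of the PRIVATE-to-APPLICATION 
> downgrade (LocalResourceVisibility is the existing YARN enum; the helper 
> class is hypothetical):
> {code:java}
> import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
> 
> // Sketch: when the local-user pool feature is enabled, treat PRIVATE
> // resources as APPLICATION-scoped so nothing cached under a pooled local
> // user outlives the application that localized it.
> public class VisibilityDowngrade {
>   static LocalResourceVisibility effectiveVisibility(
>       LocalResourceVisibility requested, boolean useLocalUserPool) {
>     if (useLocalUserPool && requested == LocalResourceVisibility.PRIVATE) {
>       return LocalResourceVisibility.APPLICATION;
>     }
>     return requested;
>   }
> }
> {code}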
>  



