[ 
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054909#comment-14054909
 ] 

Remus Rusanu commented on YARN-2198:
------------------------------------

I have uploaded a first patch so we can start the review discussion. Here is a 
summary of changes:

 - `winutils service` is a new winutils CLI option that causes winutils to 
attach to the SCM (i.e. start as a service) and open an LPC endpoint. This 
service must run with elevated privileges (LocalSystem).
 - an LPC protocol is declared:
{code}
interface Hdpwinutilsvc
{
        typedef struct {
                [string] const wchar_t* cwd;
                [string] const wchar_t* jobName;
                [string] const wchar_t* user;
                [string] const wchar_t* pidFile;
                [string] const wchar_t* cmdLine;
        } CREATE_PROCESS_REQUEST;

        typedef struct {
                LONG_PTR hProcess;
                LONG_PTR hThread;
                LONG_PTR hStdIn;
                LONG_PTR hStdOut;
                LONG_PTR hStdErr;
        } CREATE_PROCESS_RESPONSE;

        error_status_t WinutilsCreateProcessAsUser(
                [in] int nmPid,
                [in] CREATE_PROCESS_REQUEST *request,
                [out] CREATE_PROCESS_RESPONSE **response);
}
{code}
 - hadoop.dll JNI is extended via NativeIO.createTaskAsUser to use the LPC 
mechanism to ask the winutils service to start the containers (and the 
localizer too)
 - The winutils service does not do any S4U impersonation work. It simply 
spawns winutils again, with the appropriate command line for S4U (i.e. 
YARN-1063). The process is created suspended, and the process handle, the main 
thread handle and the stdin/stdout/stderr handles are duplicated into the NM 
process. The LPC call response (out) structure contains all these handles.
 - NM takes ownership of the spawned process, creates Java input/output 
streams around the stdin/stdout/stderr handles and then resumes the process. 
The resumed process does the S4U work, spawns the secure container process and 
waits for the container execution to finish (ditto for localization).
 - The NM uses org.apache.hadoop.io.nativeio.NativeIO.WinutilsProcessStub to 
control the process spawned by the winutils service. This class uses several 
JNI methods to control this process.
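
The ordering in the last two bullets (wrap the handles in streams first, resume 
second, then drain output and wait) can be sketched in plain Java. This is only 
an illustration under stated assumptions: `SuspendedProcess`, `FakeProcess` and 
`runToCompletion` are hypothetical names that do not appear in the patch, and 
the in-memory fake stands in for the JNI-backed stub so the sketch runs without 
Windows or hadoop.dll.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SuspendedLaunchSketch {

    /** Hypothetical view of a process that was created suspended. */
    interface SuspendedProcess {
        InputStream stdout(); // stream over the duplicated stdout handle
        void resume();        // resumes the suspended main thread
        int waitFor();        // blocks until exit, returns the exit code
    }

    /** In-memory stand-in (an assumption) for the JNI-backed process stub. */
    static final class FakeProcess implements SuspendedProcess {
        private boolean resumed;

        @Override
        public InputStream stdout() {
            byte[] data = "container done\n".getBytes(StandardCharsets.UTF_8);
            return new ByteArrayInputStream(data);
        }

        @Override
        public void resume() {
            resumed = true;
        }

        @Override
        public int waitFor() {
            if (!resumed) {
                throw new IllegalStateException("process was never resumed");
            }
            return 0;
        }
    }

    /** Wrap the streams, resume, drain stdout, then wait for the exit code. */
    static String runToCompletion(SuspendedProcess p) {
        InputStream out = p.stdout(); // 1. streams exist before the process runs
        p.resume();                   // 2. only now may the process make progress
        String text;
        try {
            text = new String(out.readAllBytes(), StandardCharsets.UTF_8); // 3.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int exit = p.waitFor();       // 4. collect the exit code
        return exit + ":" + text.trim();
    }

    public static void main(String[] args) {
        System.out.println(runToCompletion(new FakeProcess()));
    }
}
```

Creating the streams before the resume means no output can be produced before 
the NM is ready to read it, which is presumably why the process is created 
suspended in the first place.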

Access check
----------------

1. Service access check. The winutils service checks that the caller is 
permitted to use the elevated create-process feature. The access check is 
performed in the RPC authorization context using AuthZ and the RPC client 
context. Authorization is checked against an ad hoc security descriptor that 
describes the configurable 'allowed' users. Normally this should contain the NM 
(or perhaps the YARN group).

2. The impersonation access check. winutils task createAsUser performs the 
access check on the user being impersonated against the configurable 'allowed' 
and 'denied' lists. The check is done with AuthZ, using an AuthZ context 
derived from the user logon token (LsaLogonUser token, see YARN-1063), against 
an ad hoc security descriptor that describes the two configurable lists 
('allowed' and 'denied'). Note that the access check is not done at the 
winutils service LPC call layer, but at the S4U layer. This way the winutils 
tool cannot be used outside the service call context to bypass the check. 
Admittedly, in that context (i.e. when not going through the winutils service) 
the check prevents something the caller is already allowed to do, so the 
caller could use any other tool of choice (e.g. PowerShell scripts) to do the 
same. A second reason to do it at this layer (some would say the true 
reason...) is that this layer has the proper infrastructure to do the check 
(the logon token handle). Had the check been done at the winutils LPC service 
layer, that code would also have to obtain the logon token just to do the 
check. Doing the check at the S4U layer is both simpler and more intuitive for 
an admin user.
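
To make the intended effect of the 'allowed' and 'denied' lists concrete, here 
is a small Java model of the decision. It is not how the patch does it: the 
patch evaluates a security descriptor through AuthZ, while this sketch only 
mimics the outcome under the assumption that a 'denied' match overrides an 
'allowed' match (as happens when deny ACEs precede allow ACEs in a DACL). The 
class name, method names and the sample users and groups are all hypothetical.

```java
import java.util.Set;

class ImpersonationCheckSketch {

    /** True if the user, or any group the user belongs to, is in the list. */
    static boolean matches(String user, Set<String> groups, Set<String> list) {
        if (list.contains(user)) {
            return true;
        }
        for (String group : groups) {
            if (list.contains(group)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Assumed semantics: a match in 'denied' always wins; otherwise the user
     * must match 'allowed'. A user matching neither list is rejected.
     */
    static boolean mayImpersonate(String user, Set<String> groups,
                                  Set<String> allowed, Set<String> denied) {
        if (matches(user, groups, denied)) {
            return false;
        }
        return matches(user, groups, allowed);
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("hadoopusers");
        Set<String> denied = Set.of("Administrators");

        // Member of an allowed group only.
        System.out.println(mayImpersonate(
                "alice", Set.of("hadoopusers"), allowed, denied));
        // Member of both an allowed and a denied group.
        System.out.println(mayImpersonate(
                "bob", Set.of("hadoopusers", "Administrators"), allowed, denied));
    }
}
```

With these sample lists, a member of an allowed group passes, while a user who 
is also in a denied group is rejected even though the 'allowed' list matches.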

What's not present in this 1.patch:

 - The access check configuration is based on settings in yarn-site.xml. It 
will need to be moved to a separate config file (TBD if XML or not).
 - The NM handling of the spawned process (parse the output, wait for 
completion, handle timeout if any) is a duplicate of 
ShellProcess.ShellCommandExecutor. I tried to refactor the latter to handle an 
injected Process rather than the one it spawns itself, but the change ripples 
through too much other code.
 - The code needs cleanup; it shows the signs of the mighty struggle it took to 
get it to work.

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> ----------------------------------------------------------------------------------------------
>
>                 Key: YARN-2198
>                 URL: https://issues.apache.org/jira/browse/YARN-2198
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>              Labels: security, windows
>         Attachments: YARN-2198.1.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However this 
> executor requires the process launching the container to be LocalSystem or 
> a member of the local Administrators group. Since the process in question 
> is the NodeManager, the requirement means the entire NM must run as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
