Ivan Mitic commented on YARN-2198:

Thanks again Remus for working on this big and important piece of work. I went 
~70% thru the patch and below is my first pass of comments. Will review the 
rest in a day or two.

1. nativeio.c: {code}
#ifdef UNIX
    THROW(env, "java/io/IOException",
      "The function createTaskAsUser is not supported on Unix");
    return -1;
Should we return null here?
2. {code}
  LPCWSTR lpszCwd = NULL, lpszJobName = NULL, 
    lpszUser = NULL, lpszPidFile = NULL, lpszCmdLine = NULL;
Nit: nativeio code uses different naming convention for local variables. Please 
try to be consistent with the rest of the file.
3. nativeio.c: {code}

  if (lpszCwd)     (*env)->ReleaseStringChars(env, cwd, lpszCwd);
  if (lpszJobName) (*env)->ReleaseStringChars(env, jobName, lpszJobName);
  if (lpszUser)    (*env)->ReleaseStringChars(env, user, lpszUser);
  if (lpszPidFile) (*env)->ReleaseStringChars(env, pidFile, lpszPidFile);
  if (lpszCmdLine) (*env)->ReleaseStringChars(env, cmdLine, lpszCmdLine);

  if (dwError != ERROR_SUCCESS) {
    throw_ioe (env, dwError);
Nit: I would move {{throw_ioe}} if check before done:, the code flow will be 
less error prone :)
4. winutils_process_stub.c: Can {{env->NewGlobalRef())) return null/throw? 
Should we handle this?
5. winutils_process_stub.c: You should properly handle the 
{{GetExitCodeProcess()}} failure case.
6. winutils_process_stub.c: {code}
  HANDLE hProcess, hThread;
7. client.c: Are RPC_STATUS error codes compatible with winerror codes? 
(semantic around checking for error)
8. config.cpp: {code}
#define YARN_SITE_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-site.xml"
#define YARN_DEFAULT_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-default.xml"
Wondering if there is a way to get to config files without adding a dependency 
on env variables? We are looking into how to to make hadoop xcopy deployable 
and evn variables are a pain to handle. One way to mitigate is to try to 
resolve the path relatively and if that fails fall back to HADOOP_CONF_DIR.

9. config.cpp: GetConfigValue: {code}
  if (*len) {
    goto done;
This error check is unintuitive. Can you please be more explicit?
10. config.cpp: {code}
DWORD GetConfigValue(__in LPCWSTR keyName, 
  __out size_t* len, __out_bcount(len) LPCWSTR* value) {
Are SAL annotations correct? For strings one would usually use __out_ecount()?
11. config.cpp: {code}
DWORD GetConfigValueFromXmlFile(__in LPCWSTR xmlFile, __in LPCWSTR keyName, 
  __out size_t* outLen, __out_bcount(len) LPCWSTR* outValue) {
SAL annotation {{__out_bcount}}? Also outLen->len in the annotation.
11. config.cpp: GetConfigValueFromXmlFile: 
  hr = CoInitialize(NULL);
  ERROR_CHECK_HRESULT_DONE(hr, L"CoInitialize");
This should be before StringCbPrintf to guarantee that CoInit and CoUninit are 
12. hdpwinutilsvc.idl: Name does not seem appropriate for apache... possibly 
name it just winutilsvc.idl. Should we use spaces in this file for consistency? 
13. winutils.h: {code}
DWORD SplitStringIgnoreSpaceW(__in size_t len, __in_bcount(len) LPCWSTR source, 
  __in WCHAR deli, 
  __out size_t* count, __out_ecount(count) WCHAR*** out);
{code} __in_bcount(len) -> __in_ecount(len)
14. libwinutils.c: SplitStirngIgnoreSpaceW: Looking at this function (and some 
other xml parsing functions) I'm wondering if this is good opportunity to 
introduce unittests for our C code, as the complexity started increasing beyond 
just windows OS calls, where there is little value in unittesting. 
15. libwinutils.c: BuildServiceSecurityDescriptor: {code}
  eas = (EXPLICIT_ACCESS*) alloca(sizeof(EXPLICIT_ACCESS) * (grantSidCount + 
Should we deallocate this when BuildSecurityDescriptor fails?
I don't think it is required to do this now, just wanted to bring it up: if our 
native codebase continues to grow at this pace we should consider introducing 
smart pointers. It is becoming impossibly hard to properly manage the memory in 
all success/failure cases. This becomes more important now that we have long 
running NM native client and winutils service. 
16. What is the behaviour of calling {{winutils service}}. Will this command 
install and start a winutils.exe service under SYSTEM account, and exit?

> Remove the need to run NodeManager as privileged account for Windows Secure 
> Container Executor
> ----------------------------------------------------------------------------------------------
>                 Key: YARN-2198
>                 URL: https://issues.apache.org/jira/browse/YARN-2198
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: Remus Rusanu
>            Assignee: Remus Rusanu
>              Labels: security, windows
>         Attachments: YARN-2198.1.patch, YARN-2198.2.patch
> YARN-1972 introduces a Secure Windows Container Executor. However this 
> executor requires a the process launching the container to be LocalSystem or 
> a member of the a local Administrators group. Since the process in question 
> is the NodeManager, the requirement translates to the entire NM to run as a 
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT 
> service. The NM can run as a low privilege account and communicate with the 
> privileged NT service when it needs to launch a container. This would reduce 
> the surface exposed to the high privileges. 
> There has to exist a secure, authenticated and authorized channel of 
> communication between the NM and the privileged NT service. Possible 
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would 
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform 
> specific inter-process communication channel that satisfies all requirements 
> and is easy to deploy. The privileged NT service would register and listen on 
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop 
> with libwinutils which would host the LPC client code. The client would 
> connect to the LPC port (NtConnectPort) and send a message requesting a 
> container launch (NtRequestWaitReplyPort). LPC provides authentication and 
> the privileged NT service can use authorization API (AuthZ) to validate the 
> caller.

This message was sent by Atlassian JIRA

Reply via email to