[
https://issues.apache.org/jira/browse/YARN-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14091901#comment-14091901
]
Ivan Mitic commented on YARN-2198:
----------------------------------
Thanks again Remus for working on this big and important piece of work. I went
~70% thru the patch and below is my first pass of comments. Will review the
rest in a day or two.
1. nativeio.c: {code}
#ifdef UNIX
THROW(env, "java/io/IOException",
"The function createTaskAsUser is not supported on Unix");
return -1;
#endif
{code}
Should we return null here?
2. {code}
LPCWSTR lpszCwd = NULL, lpszJobName = NULL,
lpszUser = NULL, lpszPidFile = NULL, lpszCmdLine = NULL;
{code}
Nit: nativeio code uses different naming convention for local variables. Please
try to be consistent with the rest of the file.
3. nativeio.c: {code}
done:
if (lpszCwd) (*env)->ReleaseStringChars(env, cwd, lpszCwd);
if (lpszJobName) (*env)->ReleaseStringChars(env, jobName, lpszJobName);
if (lpszUser) (*env)->ReleaseStringChars(env, user, lpszUser);
if (lpszPidFile) (*env)->ReleaseStringChars(env, pidFile, lpszPidFile);
if (lpszCmdLine) (*env)->ReleaseStringChars(env, cmdLine, lpszCmdLine);
if (dwError != ERROR_SUCCESS) {
throw_ioe (env, dwError);
}
{code}
Nit: I would move {{throw_ioe}} if check before done:, the code flow will be
less error prone :)
4. winutils_process_stub.c: Can {{env->NewGlobalRef())) return null/throw?
Should we handle this?
5. winutils_process_stub.c: You should properly handle the
{{GetExitCodeProcess()}} failure case.
6. winutils_process_stub.c: {code}
HANDLE hProcess, hThread;
{code}
Init to INVALID_HANDLE_VALUE?
7. client.c: Are RPC_STATUS error codes compatible with winerror codes?
(semantic around checking for error)
8. config.cpp: {code}
#define YARN_SITE_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-site.xml"
#define YARN_DEFAULT_XML_PATH L"%HADOOP_CONF_DIR%\\yarn-default.xml"
{code}
Wondering if there is a way to get to config files without adding a dependency
on env variables? We are looking into how to to make hadoop xcopy deployable
and evn variables are a pain to handle. One way to mitigate is to try to
resolve the path relatively and if that fails fall back to HADOOP_CONF_DIR.
9. config.cpp: GetConfigValue: {code}
if (*len) {
goto done;
}
{code}
This error check is unintuitive. Can you please be more explicit?
10. config.cpp: {code}
DWORD GetConfigValue(__in LPCWSTR keyName,
__out size_t* len, __out_bcount(len) LPCWSTR* value) {
{code}
Are SAL annotations correct? For strings one would usually use __out_ecount()?
11. config.cpp: {code}
DWORD GetConfigValueFromXmlFile(__in LPCWSTR xmlFile, __in LPCWSTR keyName,
__out size_t* outLen, __out_bcount(len) LPCWSTR* outValue) {
{code}
SAL annotation {{__out_bcount}}? Also outLen->len in the annotation.
11. config.cpp: GetConfigValueFromXmlFile:
{code}
hr = CoInitialize(NULL);
ERROR_CHECK_HRESULT_DONE(hr, L"CoInitialize");
{code}
This should be before StringCbPrintf to guarantee that CoInit and CoUninit are
balanced.
12. hdpwinutilsvc.idl: Name does not seem appropriate for apache... possibly
name it just winutilsvc.idl. Should we use spaces in this file for consistency?
13. winutils.h: {code}
DWORD SplitStringIgnoreSpaceW(__in size_t len, __in_bcount(len) LPCWSTR source,
__in WCHAR deli,
__out size_t* count, __out_ecount(count) WCHAR*** out);
{code} __in_bcount(len) -> __in_ecount(len)
14. libwinutils.c: SplitStirngIgnoreSpaceW: Looking at this function (and some
other xml parsing functions) I'm wondering if this is good opportunity to
introduce unittests for our C code, as the complexity started increasing beyond
just windows OS calls, where there is little value in unittesting.
15. libwinutils.c: BuildServiceSecurityDescriptor: {code}
eas = (EXPLICIT_ACCESS*) alloca(sizeof(EXPLICIT_ACCESS) * (grantSidCount +
denySidCount));
{code}
Should we deallocate this when BuildSecurityDescriptor fails?
I don't think it is required to do this now, just wanted to bring it up: if our
native codebase continues to grow at this pace we should consider introducing
smart pointers. It is becoming impossibly hard to properly manage the memory in
all success/failure cases. This becomes more important now that we have long
running NM native client and winutils service.
16. What is the behaviour of calling {{winutils service}}. Will this command
install and start a winutils.exe service under SYSTEM account, and exit?
> Remove the need to run NodeManager as privileged account for Windows Secure
> Container Executor
> ----------------------------------------------------------------------------------------------
>
> Key: YARN-2198
> URL: https://issues.apache.org/jira/browse/YARN-2198
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Remus Rusanu
> Assignee: Remus Rusanu
> Labels: security, windows
> Attachments: YARN-2198.1.patch, YARN-2198.2.patch
>
>
> YARN-1972 introduces a Secure Windows Container Executor. However this
> executor requires a the process launching the container to be LocalSystem or
> a member of the a local Administrators group. Since the process in question
> is the NodeManager, the requirement translates to the entire NM to run as a
> privileged account, a very large surface area to review and protect.
> This proposal is to move the privileged operations into a dedicated NT
> service. The NM can run as a low privilege account and communicate with the
> privileged NT service when it needs to launch a container. This would reduce
> the surface exposed to the high privileges.
> There has to exist a secure, authenticated and authorized channel of
> communication between the NM and the privileged NT service. Possible
> alternatives are a new TCP endpoint, Java RPC etc. My proposal though would
> be to use Windows LPC (Local Procedure Calls), which is a Windows platform
> specific inter-process communication channel that satisfies all requirements
> and is easy to deploy. The privileged NT service would register and listen on
> an LPC port (NtCreatePort, NtListenPort). The NM would use JNI to interop
> with libwinutils which would host the LPC client code. The client would
> connect to the LPC port (NtConnectPort) and send a message requesting a
> container launch (NtRequestWaitReplyPort). LPC provides authentication and
> the privileged NT service can use authorization API (AuthZ) to validate the
> caller.
--
This message was sent by Atlassian JIRA
(v6.2#6252)