[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-10-17 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208007#comment-16208007
 ] 

Jonathan Hung commented on YARN-6852:
-

Sounds good. Filed YARN-7345 for this.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch, 
> YARN-6852.009.patch
>
>
> This JIRA plans to add support for:
> 1) Isolation in CGroups. (native side).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-10-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206962#comment-16206962
 ] 

Wangda Tan commented on YARN-6852:
--

[~jhung], thanks for reporting this. You're correct. In practice, GPU minor
numbers almost always start at 0 and end at n-1, which is why we didn't see
this issue in testing. Please file a ticket to track it.
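
The point about contiguous minor numbers can be made concrete with a small
check. This is only a sketch: {{minors_match_loop_index}} is a hypothetical
helper name, not something from the patch. It returns 1 exactly when every
minor number in the list equals its position, which is the one case where
writing the loop index and writing the actual minor number give the same
cgroups entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the YARN patch).
 * Returns 1 if minor_devices[i] == i for every i in [0, n), i.e. the list is
 * exactly 0, 1, ..., n-1 in order; returns 0 otherwise. */
int minors_match_loop_index(const int *minor_devices, int n) {
  assert(minor_devices != NULL || n == 0);
  for (int i = 0; i < n; i++) {
    if (minor_devices[i] != i) {
      return 0; /* loop index and real minor number differ here */
    }
  }
  return 1;
}
```

A test fixture such as {{0, 2, 3}} fails this check, so a test built on it
would exercise the case that contiguous minor numbers hide.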




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-10-16 Thread Jonathan Hung (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16206771#comment-16206771
 ] 

Jonathan Hung commented on YARN-6852:
-

Hi [~leftnoteasy], in {{gpu-module.c#internal_handle_gpu_request}}, there's
this code:
{noformat}
+  // Use cgroup helpers to blacklist devices
+  for (int i = 0; i < n_minor_devices_to_block; i++) {
+    char param_value[128];
+    memset(param_value, 0, sizeof(param_value));
+    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+             major_device_number, i);
+
+    int rc = update_cgroups_parameters_func_p("devices", "deny",
+                                              container_id, param_value);
+
+    if (0 != rc) {
+      fprintf(ERRORFILE, "CGroups: Failed to update cgroups\n");
+      return_code = -1;
+      goto cleanup;
+    }
+  }
{noformat}

Is
{noformat}
+    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+             major_device_number, i);
{noformat}
supposed to be
{noformat}
+    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+             major_device_number, minor_devices[i]);
{noformat}
? It seems the value being written is the loop index rather than the actual
minor number.
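
For illustration, the difference can be shown in isolation. The following is
a minimal sketch rather than the patch itself: {{format_deny_entry}} and
{{format_deny_entry_at}} are hypothetical helper names, and it assumes
{{minor_devices}} holds the actual minor numbers to block.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one "devices.deny" entry such as "c 195:3 rwm".
 * Returns 0 on success, -1 if the entry does not fit in buf. */
int format_deny_entry(char *buf, size_t len, int major, int minor) {
  assert(buf != NULL);
  memset(buf, 0, len);
  int n = snprintf(buf, len, "c %d:%d rwm", major, minor);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: format the entry for the i-th device to block.
 * The minor number is read from minor_devices[i]; the loop index i is only
 * a position in that array, not a device number. */
int format_deny_entry_at(char *buf, size_t len, int major,
                         const int *minor_devices, int i) {
  assert(minor_devices != NULL);
  return format_deny_entry(buf, len, major, minor_devices[i]);
}
```

With {{minor_devices = {1, 3}}} and an example major number of 195, index 1
yields {{c 195:3 rwm}}, whereas writing the loop index would yield
{{c 195:1 rwm}}.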




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-10-12 Thread Varun Vasudev (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202313#comment-16202313
 ] 

Varun Vasudev commented on YARN-6852:
-

Sounds good, I'll create a new ticket for that then and close this.




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-10-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16202236#comment-16202236
 ] 

Wangda Tan commented on YARN-6852:
--

[~vvasudev], I would prefer to pull in only the common changes, since the rest
of the GPU logic cannot land in branch-2 (we don't have resource-profile,
etc.). A few changes will be needed; please let me know if you need any help
from my side.




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-13 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16165065#comment-16165065
 ] 

Wangda Tan commented on YARN-6852:
--

[~tangzhankun],

Thanks for adding the reference to the ongoing K8S proposals.

I just quickly read both proposals. To me, hw-accelerator looks like a
long-term goal that can be done 1-2 years from now. IMHO, the use of hardware
accelerators on such platforms (K8S/YARN) is still in an early phase; people
are trying to move some workloads from bare metal or HPC to these platforms.
It becomes an important requirement once more workloads that need GPU/FPGA
have landed. We can either make some non-intrusive changes, like adding node
attributes for device types / versions, or make more comprehensive changes to
support topology, etc. To me the first option is straightforward. The second
option is not only a challenge for device isolation; it also changes how
applications ask for resources and how the scheduler deals with those asks.
The K8S proposal for solving the scheduling problem looks too simple to me;
it won't fit YARN's scheduling performance requirements.

The device manager would be a nice-to-have feature; I will think more about it
while working on YARN-6620. The K8S proposal is very flexible for adding new
resource types, but it is also very heavyweight. For example, different
resource plugins need to implement their own logic to store state, etc. And
managing plugins might be a challenge for today's YARN.




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-12 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16164150#comment-16164150
 ] 

Zhankun Tang commented on YARN-6852:


[~wangda], Yeah. At present, all supported resource types need to be committed
to YARN as first-class citizens.

I'm not quite sure whether it's easy to do, but I'm trying to find a way to
improve the current framework to support a new set of device resources such as
GPU, FPGA, sound card, SSD, etc. with plugins that don't need changes to YARN
core.
I don't have a clear picture in my mind at present. Never mind, these are just
some thoughts I wanted to share after reviewing Kubernetes' in-progress plugin
designs on [hardware-accelerator|https://github.com/kubernetes/community/pull/844] and 
[device-manager|https://github.com/kubernetes/community/blob/master/contributors/design-proposals/device-plugin.md].





[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-12 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163299#comment-16163299
 ] 

Wangda Tan commented on YARN-6852:
--

[~tangzhankun], 

I'm not quite sure what "first-class citizen" means. In my mind, all supported
resource types committed to YARN are first-class citizens.

For every newly added resource type that needs to be isolated/enforced on the
NM side, changing at least the ResourceHandler is unavoidable (so only adding
a .so is not enough), and for resource types that need additional permissions,
c-e changes are required. This is because the models of different resource
types are quite different. For example, GPU support needs binaries like
nvidia-smi to detect the host OS's GPU information, but FPGA may not. And
IIUC, NUMA support needs only ResourceHandler changes and no changes to c-e.




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-11 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162318#comment-16162318
 ] 

Zhankun Tang commented on YARN-6852:


[~wangda], agreed. I'll refactor both the native and Java side code.
One remaining concern is how we determine which resources should be treated as
first-class citizens. I think we may need a flexible way, with minimal code
changes, to enable more hardware accelerator resource types. For instance,
just a .so for every device module?




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-08 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159448#comment-16159448
 ] 

Wangda Tan commented on YARN-6852:
--

[~tangzhankun], 

In general, I'm +1 on the approach of refactoring and reusing as much code as
we can, while keeping the CLI/config independent. The reason is that we should
limit the privileged changes we can make to cgroups. And each module can then
be operated independently.





[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-09-08 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158338#comment-16158338
 ] 

Zhankun Tang commented on YARN-6852:


[~wangda], Thanks for the patch. I'd like to update the FPGA-related patch
based on current progress.
But it seems the c-e.cfg format and the device isolation logic are almost the
same for GPU and FPGA.
The c-e binary mainly provides a cgroups isolation API to the device resource
handler above it by executing a privileged command such as:
{panel}
container-executor gpu --excluded_gpus=0,1 --container_id=container_x_y_z
{panel}
Would it be valuable to make a general module so that we don't need to
implement one native module for each hardware accelerator?
{panel}
container-executor hw-acc --container_id=container_x_y_z --include_hw-acc=0,1 
or --excluded_hw-acc=0,1 
{panel}
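
Whichever module shape is chosen, the native side has to turn a value such as
{{0,1}} into a list of device indices before it can touch cgroups. A minimal
sketch of that step follows. {{parse_device_list}} is a hypothetical name and
this is not the parsing code from the patch; it assumes the value is a plain
comma-separated list of non-negative decimal integers and rejects anything
else, since the input reaches a privileged binary.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a list such as "0,1" into out[].
 * Returns the number of values parsed (0 for an empty string), or -1 if the
 * input is malformed or holds more than max_out values. */
int parse_device_list(const char *s, int *out, int max_out) {
  assert(s != NULL && out != NULL);
  int count = 0;
  const char *p = s;
  if (*p == '\0') {
    return 0; /* empty list: nothing requested */
  }
  for (;;) {
    /* Require a digit so that signs, spaces and empty items are rejected. */
    if (!isdigit((unsigned char)*p)) {
      return -1;
    }
    char *end = NULL;
    errno = 0;
    long v = strtol(p, &end, 10);
    if (errno != 0 || v > INT_MAX) {
      return -1; /* out of range */
    }
    if (count >= max_out) {
      return -1; /* more values than the caller can hold */
    }
    out[count++] = (int)v;
    if (*end == '\0') {
      break;
    }
    if (*end != ',') {
      return -1; /* unexpected separator */
    }
    p = end + 1;
  }
  return count;
}
```

Failing closed on malformed input is a deliberate choice in this sketch: a
list that cannot be parsed in full is rejected rather than partially applied.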





[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-18 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16133872#comment-16133872
 ] 

Hudson commented on YARN-6852:
--

ABORTED: Integrated in Jenkins build Hadoop-trunk-Commit #12214 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/12214/])
YARN-6852. Native code changes to support isolate GPU devices by using (wangda: 
rev 436c2638f9ca1fb8de6a630cb5e91d956ac75216)
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor-common.h
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/common/module-configs.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/CMakeLists.txt
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/path-utils.c
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.c
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/cgroups/test-cgroups-module.cc
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test-path-utils.cc
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.h
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/common/constants.h
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/gpu/gpu-module.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.c
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/common/module-configs.c
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/path-utils.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/utils/string-utils.h
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/modules/gpu/test-gpu-module.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test_main.cc
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/utils/test-string-utils.cc
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/main.c
* (add) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/modules/cgroups/cgroups-operations.c
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/impl/container-executor.h
* (edit) 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/native/container-executor/test/test-container-executor.c





[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129450#comment-16129450
 ] 

Wangda Tan commented on YARN-6852:
--

If there are no objections, I will commit the patch by Friday.




[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129375#comment-16129375
 ] 

Hadoop QA commented on YARN-6852:
-

| (/) *+1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 14m 24s | trunk passed |
| +1 | compile | 0m 44s | trunk passed |
| +1 | mvnsite | 0m 31s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 28s | the patch passed |
| +1 | compile | 0m 41s | the patch passed |
| +1 | cc | 0m 41s | the patch passed |
| +1 | javac | 0m 41s | the patch passed |
| +1 | mvnsite | 0m 28s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
|| || || || Other Tests ||
| +1 | unit | 13m 48s | hadoop-yarn-server-nodemanager in the patch passed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 32m 5s | |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6852 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12882190/YARN-6852.009.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux ffca14155929 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / de462da |
| Default Java | 1.8.0_144 |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16938/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16938/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16129290#comment-16129290
 ] 

Hadoop QA commented on YARN-6852:
-

| (x) *-1 overall* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 22s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 7 new or modified test files. |
|| || || || trunk Compile Tests ||
| +1 | mvninstall | 17m 49s | trunk passed |
| +1 | compile | 0m 41s | trunk passed |
| +1 | mvnsite | 0m 28s | trunk passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 0m 24s | the patch passed |
| +1 | compile | 0m 39s | the patch passed |
| +1 | cc | 0m 39s | the patch passed |
| +1 | javac | 0m 39s | the patch passed |
| +1 | mvnsite | 0m 27s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
|| || || || Other Tests ||
| -1 | unit | 13m 23s | hadoop-yarn-server-nodemanager in the patch failed. |
| +1 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 34m 54s | |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.yarn.server.nodemanager.containermanager.logaggregation.TestLogAggregationService |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6852 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12882180/YARN-6852.008.patch |
| Optional Tests | asflicense compile cc mvnsite javac unit |
| uname | Linux 35a5b0b26756 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 1455306 |
| Default Java | 1.8.0_144 |
| unit | https://builds.apache.org/job/PreCommit-YARN-Build/16936/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/16936/testReport/ |
| modules | C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/16936/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-16 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16128996#comment-16128996
 ] 

Sunil G commented on YARN-6852:
---

+1 on latest patch.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122382#comment-16122382
 ] 

Hadoop QA commented on YARN-6852:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m  
0s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
24s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 39s{color} | 
{color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 25 new + 0 unchanged - 0 fixed = 25 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
39s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 14m  
6s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 49m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881319/YARN-6852.007.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux e62e4e5b9086 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 
13:48:03 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 312e57b |
| Default Java | 1.8.0_144 |
| cc | 
https://builds.apache.org/job/PreCommit-YARN-Build/16840/artifact/patchprocess/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16840/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16840/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch, YARN-6852.007.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-10 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16122138#comment-16122138
 ] 

Hadoop QA commented on YARN-6852:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 7 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 13m 
48s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 38s{color} | 
{color:red} 
hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager
 generated 41 new + 0 unchanged - 0 fixed = 41 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 13m 
19s{color} | {color:green} hadoop-yarn-server-nodemanager in the patch passed. 
{color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 30m 36s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881293/YARN-6852.006.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 3ce5e6484bec 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 
12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / 312e57b |
| Default Java | 1.8.0_131 |
| cc | 
https://builds.apache.org/job/PreCommit-YARN-Build/16832/artifact/patchprocess/diff-compile-cc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16832/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16832/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch, 
> YARN-6852.006.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120841#comment-16120841
 ] 

Hadoop QA commented on YARN-6852:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 6 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 
 1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
49s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
35s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
28s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
30s{color} | {color:red} hadoop-yarn-server-nodemanager in the patch failed. 
{color} |
| {color:red}-1{color} | {color:red} cc {color} | {color:red}  0m 30s{color} | 
{color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 30s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 29s{color} 
| {color:red} hadoop-yarn-server-nodemanager in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
14s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 19m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:14b5c93 |
| JIRA Issue | YARN-6852 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12881065/YARN-6852.005.patch |
| Optional Tests |  asflicense  compile  cc  mvnsite  javac  unit  |
| uname | Linux 67fe8215c245 3.13.0-116-generic #163-Ubuntu SMP Fri Mar 31 
14:13:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh 
|
| git revision | trunk / ac7d060 |
| Default Java | 1.8.0_131 |
| compile | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| cc | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| javac | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/artifact/patchprocess/patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
| unit | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/artifact/patchprocess/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/testReport/ |
| modules | C: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 U: 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/16814/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, 

[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-09 Thread Sunil G (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119593#comment-16119593
 ] 

Sunil G commented on YARN-6852:
---

Thanks [~leftnoteasy] for the patch.

A few minor comments:
In {{get_numbers_split_by_comma}}
# It's better to pass {{input}} as const.
# In the code below
{code}
while (p != NULL) {
  int n = strtol(p, NULL, 0);
  n_numbers++;
{code}
If {{strtol}} fails, we need to check {{errno}} before processing the input, 
correct?
# One more doubt
{code}
  // Use cgroup helpers to blacklist devices
  for (int i = 0; i < n_minor_devices_to_block; i++) {
    char param_value[128];
    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
             major_device_number, i);
{code}
Is {{param_value}} null terminated after the snprintf?
# One more small suggestion
{{update_cgroups_parameters_func_p}} takes inputs like "devices" or "deny". Would 
it be better to define all such "entities" and "verbs" in a common include and 
use them as macros?
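
For illustration, the {{strtol}} check asked about above could look roughly like the sketch below. The helper name parse_non_negative_int is hypothetical and not part of the patch, and it parses base 10 only, whereas the quoted code passes base 0.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of the patch): parse one non-negative
 * decimal integer from str into *out.
 * Returns 0 on success, -1 on empty input, trailing garbage,
 * a negative value, or a value out of range for int.
 */
static int parse_non_negative_int(const char* str, int* out) {
  char* end = NULL;
  errno = 0;  /* strtol only sets errno on failure, so clear it first */
  long value = strtol(str, &end, 10);
  if (end == str || *end != '\0') {
    return -1;  /* no digits consumed, or junk after the number */
  }
  if (errno == ERANGE || value < 0 || value > INT_MAX) {
    return -1;  /* overflow reported by strtol, or does not fit in int */
  }
  *out = (int) value;
  return 0;
}
```

Passing a non-NULL end pointer matters here: with {{strtol(p, NULL, 0)}} an input such as "abc" is indistinguishable from a valid "0".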

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-08 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119277#comment-16119277
 ] 

Miklos Szegedi commented on YARN-6852:
--

+1 (non-binding) Thanks for the latest patch [~wangda].

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-07 Thread Zhankun Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117773#comment-16117773
 ] 

Zhankun Tang commented on YARN-6852:


[~miklos.szeg...@cloudera.com], [~wangda], thanks for the good progress. The 
patch LGTM.
Perhaps one minor benefit of preferring "devices.deny" over "devices.allow" is 
that a blacklist entry newly added to a parent group will propagate to the 
children, while whitelist entries will not?

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117702#comment-16117702
 ] 

Miklos Szegedi commented on YARN-6852:
--

No comments on the design, thanks for the explanation. We can keep 
container-executor but I would not add more - especially not converting 
existing files, since the standard is _.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch, YARN-6852.004.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-07 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16117175#comment-16117175
 ] 

Miklos Szegedi commented on YARN-6852:
--

Thank you [~wangda],
I have a second batch.
I could not find a doc about this, so I will ask: is there a reason to have a 
deny list of devices instead of an allow list?
command-line-parser.c, etc: I think it is more common to have a_b.c style file 
naming. For example in the Linux kernel: attribute_container.c
{code}
int handle_gpu_request(update_cgroups_parameters_func func,
    const char* module_name, int argc, char** argv) {
{code}
Since they are not argv anymore I would use the names paramc, params instead of 
argc, argv.
{code}
int* minor_devices;
{code}
I would call internal_handle_gpu_request only if minor_devices is initialized. 
Currently it may contain garbage from the stack if the 'e' case is never hit.
{code}
  default:
    fprintf(ERRORFILE,
      "Unknown option in gpu command character %d %c, optionindex = %d\n",
      c, c, optind);
    return -1;
    break;
{code}
No need for the break; after the return.
{code}
if (0 == strlen(container_id)) {
{code}
This could be container_id[0]==0
{code}
strncpy(container_id, optarg, MAX_CONTAINER_ID_LEN)
{code}
strncpy does not null terminate the string if the source length is greater than or equal to MAX_CON...
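
A minimal sketch of a bounded copy that always terminates the destination, assuming a hypothetical helper named copy_bounded (it is not in the patch). It rejects over-long input rather than truncating it, which seems the safer choice for an identifier.

```c
#include <string.h>

/*
 * Hypothetical helper: copy src into dst, which holds dst_size bytes.
 * Returns 0 on success, -1 if src plus its terminator does not fit.
 * dst is always null terminated when dst_size > 0.
 */
static int copy_bounded(char* dst, size_t dst_size, const char* src) {
  if (dst_size == 0) {
    return -1;
  }
  size_t len = strlen(src);
  if (len >= dst_size) {
    dst[0] = '\0';
    return -1;  /* reject instead of silently truncating */
  }
  memcpy(dst, src, len + 1);  /* len + 1 copies the terminator too */
  return 0;
}
```

A call site would then look like copy_bounded(container_id, sizeof(container_id), optarg), with the return value checked.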
{code}
free(full_path);
{code}
It is a common practice to
{code}
… goto cleanup;
cleanup:
free(full_path);
return X;
{code}
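
Spelled out, the cleanup pattern above might look like the following sketch. The function append_to_file and its parameters are made up for illustration and are not taken from the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical example: append value to the file dir/name.
 * Every exit path goes through the cleanup label, so full_path and f
 * are released exactly once. Returns 0 on success, -1 on failure.
 */
static int append_to_file(const char* dir, const char* name,
                          const char* value) {
  int rc = -1;
  FILE* f = NULL;
  size_t path_len = strlen(dir) + 1 + strlen(name) + 1;
  char* full_path = malloc(path_len);
  if (!full_path) {
    goto cleanup;
  }
  snprintf(full_path, path_len, "%s/%s", dir, name);

  f = fopen(full_path, "a");
  if (!f) {
    goto cleanup;
  }
  if (fputs(value, f) == EOF) {
    goto cleanup;
  }
  rc = 0;

cleanup:
  if (f && fclose(f) != 0) {
    rc = -1;  /* buffered write errors can surface at close time */
  }
  free(full_path);  /* free(NULL) is a no-op */
  return rc;
}
```

The benefit is that adding a new early exit later cannot introduce a leak, as long as it jumps to the same label.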
-
{code}
  if (!major_number_str || 0 == strlen(major_number_str)) {
{code}
This could be major_number_str[0]==0
{code}
    // Default major number of Nvidia devices
    major_device_number = 195;
{code}
It might be nicer to have a #define for this number
{code}
  if (!allowed_minor_numbers_str || strlen(allowed_minor_numbers_str)) {
    allowed_minor_numbers = NULL;
{code}
Bug: the second condition should be strlen(allowed_minor_numbers_str)==0, or 
simply allowed_minor_numbers_str[0]==0.
get_section_value return values are not freed in internal_handle_gpu_request.
{code}
  if (0 == strlen(container_id)) {
{code}
This could be container_id[0]==0
{code}
  int* minor_devices;
{code}
I think this is never freed.
parse_commandline_opts may leak opts, opts->keys and opts->values when 
returning -1 on errors.
{code}
  if (!opts->keys || !opts->values) {
    fprintf(ERRORFILE, "Failed to malloc keys or values of opts\n");
    return NULL;
  }
{code}
You still need to free keys if values is NULL
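
One way to avoid the partial leak is sketched below. The struct parsed_opts and the functions alloc_opts and free_opts are hypothetical stand-ins, not the names used in the patch.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the options struct discussed above. */
struct parsed_opts {
  char** keys;
  char** values;
  size_t capacity;
};

/*
 * Allocate an options struct with room for n entries (n must be > 0).
 * On any allocation failure everything allocated so far is released
 * and NULL is returned, so the caller never sees a half-built struct.
 */
static struct parsed_opts* alloc_opts(size_t n) {
  if (n == 0) {
    return NULL;
  }
  struct parsed_opts* opts = calloc(1, sizeof(*opts));
  if (!opts) {
    return NULL;
  }
  opts->keys = calloc(n, sizeof(char*));
  opts->values = calloc(n, sizeof(char*));
  if (!opts->keys || !opts->values) {
    free(opts->keys);    /* free(NULL) is harmless, so no extra checks */
    free(opts->values);
    free(opts);
    return NULL;
  }
  opts->capacity = n;
  return opts;
}

/* Release a struct returned by alloc_opts; accepts NULL. */
static void free_opts(struct parsed_opts* opts) {
  if (!opts) {
    return;
  }
  free(opts->keys);
  free(opts->values);
  free(opts);
}
```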
{code}
  if (!(known_parameters && required && has_values)) {
{code}
Ideally, this is a check at the beginning of the function.
{code}
    // make sure param_name start with "--"
    if (0 != strncmp("--", param_name, 2)) {
      fprintf(ERRORFILE, "option %s is not started with \"--\"\n", param_name);
      return NULL;
    }
    ...
    if (param_idx < 0) {
      fprintf(ERRORFILE, "cannot find parameter %s from known parameters\n", param_name);
      return NULL;
    }
{code}
param_name is leaked.
Is parse_commandline_opts used anywhere? If not, please remove it from this 
jira.
{code}
/*
 * if all chars in the input str are numbers
 * return true/false
 */
static int all_numbers(char* input) {
  int len = strlen(input);

  for (int i = 0; i < len; i++) {
    if (input[i] < '0' || input[i] > '9') {
      return 0;
    }
  }
  return 1;
}
{code}
This should be like the code below, otherwise we pass through the string twice:
{code}
/*
 * if all chars in the input str are numbers
 * return true/false
 */
static int all_numbers(char* input) {
  for (; input[0] != 0; input++) {
    if (input[0] < '0' || input[0] > '9') {
      return 0;
    }
  }
  return 1;
}
{code}

{code}
int get_numbers_split_by_comma(char* input, int** numbers, size_t* ret_n_numbers) {
  size_t n_numbers = 1;
  for (int i = 0; i < strlen(input); i++) {
    if (input[i] == ',') {
      n_numbers++;
    }
  }

  (*numbers) = malloc(sizeof(int) * n_numbers);
  if (!(*numbers)) {
    return -1;
  }

  char* input_cpy = malloc(strlen(input));
  strcpy(input_cpy, input);

  char* p = strtok(input_cpy, ",");
  int idx = 0;
  while (p != NULL) {
    int n = atoi(p);
    (*numbers)[idx] = n;
    p = strtok(NULL, ",");
    idx++;
  }

  free(input_cpy);
  *ret_n_numbers = n_numbers;

  return 0;
}
{code}
get_numbers_split_by_comma: Please do not do strlen at every character. It will 
be an O(n^2) algorithm from an O(n).
get_numbers_split_by_comma: You may actually return an array with garbage at 
the end, since a double comma (,,) will not trigger a new 

[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-04 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115055#comment-16115055
 ] 

Wangda Tan commented on YARN-6852:
--

Hi Miklos, really appreciate your thorough reviews, very helpful!

I addressed most of your comments.

A few items that I haven't addressed in the updated patch:
bq. Why do you have cgroup_cfg_section? You could eliminate it and get it all 
the time or just cache cgroups_root.
I still prefer to have it since this can help us get more configs without 
changing major code structure.

bq. int input_argv_idx = 0; the first argument is the process name.
Actually the argc and argv are modified in main.c before passed to modules, I 
removed process name already:
{code}
+return handle_gpu_request(_cgroups_parameters, "gpu", argc - 1,
+   [1]);
{code}
Please let me know if you have any suggestions on the approach. 

bq. opts->keys = malloc(sizeof(char*) * (argc + 1)); Why argc+1 and not argc-1? 
Updated to argc.

bq. required and has_values could be implemented as a bit array instead of a 
byte array. Another option ...
Since container-executor is not a memory-intensive application, I would prefer 
to spend time on changing it only when it is necessary or there are safety 
concerns. :) 

bq. This pattern is C++0x.
I think Varun mentioned this in YARN-6033; it is C99: 
https://stackoverflow.com/a/330867
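
For reference, a small sketch of the C99 feature in question (designated initializers). The struct name section_sketch and its exact layout are assumptions for illustration; only the members size, name and kv_pairs are mentioned in this thread.

```c
#include <stddef.h>

/* Hypothetical stand-in for the struct from the patch. */
struct kv_pair;
struct section_sketch {
  int size;
  char* name;
  struct kv_pair** kv_pairs;
};

/*
 * C99 designated initializer: members that are not named are
 * zero-initialized, so name is NULL here without being listed.
 */
static const struct section_sketch empty_section = {.size = 0,
                                                    .kv_pairs = NULL};
```

Because unnamed members are zero-initialized, an initializer that lists only size and kv_pairs still leaves name as NULL.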

bq. arr[idx] = n; There is no overflow check. This could also be exploitable.
This might not be an issue since we have already checked the input string once:
{code}
  for (int i = 0; i < strlen(input); i++) {
    if (input[i] == ',') {
      n_numbers++;
    }
  }
{code}

bq. container_1 is an invalid container id in the unit tests. They will fail. 
Did you mean we should not fail the check? "container_1" is actually an invalid 
id in YARN. 

bq. There is no indentation after namespace ContainerExecutor
I would prefer not to add extra indentation for namespaces. There are some 
discussions on SO: 
https://stackoverflow.com/questions/713698/c-namespaces-advice

bq. static std::vector cgroups_parameters_invoked; I think you 
should consider std::string here. No need to malloc later
bq. You do not clean up files in the unit tests, do you? Is there a reason?
(TODO) Will include unit test related changes and clean ups in the next patch.

Updated ver.003 patch.

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch, 
> YARN-6852.003.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112193#comment-16112193
 ] 

Miklos Szegedi commented on YARN-6852:
--

…
{{Return a parsed commandline options.}} There is a typo in this sentence.
{code}
struct section empty_executor_cfg = {.size=0, .kv_pairs=NULL};
{code}
This pattern is C++0x, should not be used in standard C. Note: I am not against 
converting the whole tool to C++…
Moreover {{section->name}} is not set to NULL above.
{code}
const struct section* cfg_section;
static int config_initialized = 0;
{code}
I see the same issues as in the groups case. (cfg_section==NULL can be used 
instead of config_initialized, cfg_section can be static, etc.)
n_minor_devices_to_block could be unsigned int, so that the negative check is 
not needed
strtol is a better alternative to atoi
{code}
char param_value[128];
snprintf(param_value, 128, "c %d:%d rwm", major_device_number, i);
{code}
This could be written as:
{code}
snprintf(param_value, sizeof(param_value), "c %d:%d rwm", 
major_device_number, i);
{code}
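
A sketch of the same call wrapped with a truncation check, assuming a hypothetical helper named format_device_entry (not in the patch). In C99, snprintf always null terminates the buffer when its size is greater than zero, and its return value shows whether the output was cut short.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a cgroups devices entry such as
 * "c 195:0 rwm" into buf. Returns 0 on success, -1 if the entry
 * did not fit (snprintf truncated it) or formatting failed.
 */
static int format_device_entry(char* buf, size_t buf_size,
                               int major, int minor) {
  int written = snprintf(buf, buf_size, "c %d:%d rwm", major, minor);
  if (written < 0 || (size_t) written >= buf_size) {
    return -1;  /* written counts the chars needed, excluding the NUL */
  }
  return 0;
}
```

With a 128-byte buffer truncation cannot happen for int arguments, but the check keeps the helper correct if the buffer size is ever reduced.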
I do not see allowed_minor_numbers released anywhere.
{code}
  char container_id[128];
  memset(container_id, 0, 128);
{code}
It should be memset(container_id, 0, sizeof(container_id));
{{strcpy(container_id, optarg);}} This is dangerous without size. Use strncpy.
{{fflush(LOGFILE);}} This avoids caching and can be a performance bottleneck. I 
think it is better to avoid unless there is a good reason.
{code}
  const char *cgroups_param_path;
  const char* cgroups_param_value;
{code}
Misaligned space.
In module_enabled I would name rc something else. You marked 0 as rc success in 
other functions.
all_numbers: You call strlen on every loop iteration, which makes the loop 
O(n^2). You should call it once and cache the value.
all_numbers:
{code}
  if (strlen(input) == 0) {
    return 0;
  }
{code}
This is not necessary.
{code}
  int* arr = (*numbers);
  arr = malloc(sizeof(int) * n_numbers);
{code}
Does this return anything? I think it should be:
{code}
  (*numbers) = malloc(sizeof(int) * n_numbers);
{code}
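
To make the difference concrete, here is a minimal sketch of returning an allocation through an out-parameter. The function alloc_numbers is hypothetical. Assigning malloc's result only to a local copy of the pointer, as in the quoted code, leaves the caller's pointer unchanged and leaks the memory.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate an array of n zeroed ints (n > 0) and
 * hand it back through the out-parameter numbers.
 * Returns 0 on success, -1 on allocation failure.
 */
static int alloc_numbers(int** numbers, size_t n) {
  int* arr = malloc(sizeof(int) * n);
  if (!arr) {
    return -1;
  }
  for (size_t i = 0; i < n; i++) {
    arr[i] = 0;
  }
  *numbers = arr;  /* write through the pointer so the caller sees it */
  return 0;
}
```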
.
{code}
  char* input_cpy = malloc(strlen(input));
  strcpy(input_cpy, input);
{code}
There is no null pointer check.
{{arr[idx] = n;}} There is no overflow check. This could also be exploitable.
get_numbers_split_by_comma will return an array of a single 0 for an empty 
string. It should return ret_n_number=0 instead.
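
The points above about get_numbers_split_by_comma can be sketched together as follows. This is not the patch's code: the signature (a size_t count out-parameter, 0 / -1 return codes) is an assumption made for the example. It writes through the out-parameter, checks both allocations, allocates room for the terminating NUL, guards the array index, and reports zero numbers for an empty string.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Sketch with an assumed signature. On success returns 0, stores a
 * malloc'ed array in *numbers (NULL for empty input) and its length in
 * *n_numbers; the caller frees *numbers. Returns -1 on any error. */
static int get_numbers_split_by_comma(const char *input, int **numbers,
                                      size_t *n_numbers) {
  size_t capacity = 1, idx = 0, len;
  const char *c;
  char *input_cpy, *token;
  int *arr;

  if (input == NULL || numbers == NULL || n_numbers == NULL) {
    return -1;
  }
  *numbers = NULL;
  *n_numbers = 0;
  if (input[0] == '\0') {
    return 0;  /* empty string: zero numbers, not an array holding one 0 */
  }
  for (c = input; *c != '\0'; c++) {
    if (*c == ',') {
      capacity++;  /* at most one more number per comma */
    }
  }
  len = strlen(input);
  input_cpy = malloc(len + 1);  /* +1 for the terminating NUL */
  arr = malloc(sizeof(int) * capacity);
  if (input_cpy == NULL || arr == NULL) {
    free(input_cpy);
    free(arr);
    return -1;
  }
  memcpy(input_cpy, input, len + 1);

  /* strtok modifies its argument, which is why a private copy is used. */
  for (token = strtok(input_cpy, ","); token != NULL;
       token = strtok(NULL, ",")) {
    char *end = NULL;
    long n;

    errno = 0;
    n = strtol(token, &end, 10);
    if (idx >= capacity || errno == ERANGE || end == token ||
        *end != '\0' || n < INT_MIN || n > INT_MAX) {
      free(input_cpy);
      free(arr);
      return -1;
    }
    arr[idx++] = (int) n;
  }
  free(input_cpy);
  *numbers = arr;  /* write through the out-parameter, not a local copy */
  *n_numbers = idx;
  return 0;
}
```

Note that strtok is not reentrant; that is acceptable in a single-threaded tool, but a reentrant tokenizer would be the safer choice elsewhere.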
{code}
if (strlen(p) == 0) {
  return 0;
}
{code}
You could just check p[0]==0
{code}
if (mkdirs(TEST_ROOT, 0755) != 0) {
  exit(1);
}
{code}
This needs some logging to show what happened.
{{fprintf(LOGFILE, "\nTesting %s\n", __func__);}} GTest prints out the function 
name itself.
container_1 is an invalid container id in the unit tests. They will fail.
There is no indentation after {{namespace ContainerExecutor}}
{{static std::vector cgroups_parameters_invoked;}} I think you 
should consider std::string here. No need to malloc later
You do not clean up files in the unit tests, do you? Is there a reason?

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112127#comment-16112127
 ] 

Miklos Szegedi commented on YARN-6852:
--

…
Never mind about the function header comments above. I see them in the header.
{{// Make sure file exist}} The final (s) is missing: it should read "exists".
{code}
f = fopen(full_path, "a");
{code}
Defense in depth: I would make sure full_path does not contain {{..}} For 
example {{/cgroups/cpu,cpuacct/container/../../../etc/passwd}}
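
A minimal sketch of such a check. The helper name {{path_has_dotdot}} is hypothetical; it only inspects the string for a path component that is exactly {{..}} and does not resolve symlinks, so it complements rather than replaces other validation.

```c
#include <string.h>

/* Hypothetical helper (not from the patch): return 1 if any
 * '/'-separated component of path is exactly "..", otherwise 0. */
static int path_has_dotdot(const char *path) {
  const char *p = path;

  while (p != NULL && *p != '\0') {
    size_t len = strcspn(p, "/");  /* length of the current component */
    if (len == 2 && p[0] == '.' && p[1] == '.') {
      return 1;
    }
    p += len;
    if (*p == '/') {
      p++;  /* skip the separator */
    }
  }
  return 0;
}
```

The caller would refuse to open full_path when this returns 1.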
{{if (fprintf(f, "%s", value) < 0)}} You do not close the file upon error of 
this call.
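
One way to make sure the file is closed on every path is to record the error and fall through to a single fclose. This is a sketch only; the helper name {{write_param_value}} and its return codes are assumptions, not the patch's function.

```c
#include <stdio.h>

/* Hypothetical helper (not from the patch): append value to the file at
 * full_path. Returns 0 on success and -1 on failure. The stream is
 * closed on every path once it has been opened. */
static int write_param_value(const char *full_path, const char *value) {
  FILE *f;
  int rc = 0;

  if (full_path == NULL || value == NULL) {
    return -1;
  }
  f = fopen(full_path, "a");
  if (f == NULL) {
    return -1;
  }
  if (fprintf(f, "%s", value) < 0) {
    rc = -1;  /* remember the failure, but still close the stream */
  }
  /* fclose flushes buffered data, so its result matters as well. */
  if (fclose(f) != 0) {
    rc = -1;
  }
  return rc;
}
```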
{code}
#ifdef __FreeBSD__
#define _WITH_GETLINE
#endif
{code}
Is this really needed in the Linux cgroups header file?
I think you do not need to include strings.h
parse_commandline_opts does not free opts and opts->keys, opts->values on error.
{{int input_argv_idx = 0;}} should be declared at the beginning of the function 
or at the start of a brace-delimited block in standard C (C89).
{{while (input_argv_idx < argc)}} could and probably should be replaced by for
{{int input_argv_idx = 0;}} the first argument is the process name. This should 
be {{int input_argv_idx = 1;}}
{code}
opts->keys[opts->n_options] = param_name;
{code}
In general it is not a safe practice to store pointers into argv. Consider 
making a copy here.
opts->values is uninitialized. It may accidentally be dereferenced, so please 
fill it with zeros first.
{{opts->keys = malloc(sizeof(char*) * (argc + 1));}} Why argc+1 and not argc-1?
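
The parse_commandline_opts comments above can be sketched together. The struct layout and the {{--name=value}} argument format are assumptions made for this example (the patch's actual format is not quoted here). The sketch zero-fills with calloc, sizes the arrays to argc - 1, starts at argv[1] in a for loop, copies strings instead of pointing into argv, and frees everything on error.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout, modeled on the opts->keys / opts->values /
 * opts->n_options fields mentioned in the review. */
struct cli_options {
  char **keys;
  char **values;
  int n_options;
};

/* Copy len bytes of s into a new NUL-terminated string (NULL on failure). */
static char *dup_range(const char *s, size_t len) {
  char *copy = malloc(len + 1);
  if (copy != NULL) {
    memcpy(copy, s, len);
    copy[len] = '\0';
  }
  return copy;
}

static void free_cli_options(struct cli_options *opts) {
  int i;
  if (opts == NULL) {
    return;
  }
  for (i = 0; i < opts->n_options; i++) {
    free(opts->keys[i]);
    free(opts->values[i]);
  }
  free(opts->keys);
  free(opts->values);
  free(opts);
}

/* Assumes every argument has the form --name=value.
 * Returns NULL on any error, with nothing leaked. */
static struct cli_options *parse_commandline_opts(int argc, char **argv) {
  int i;
  struct cli_options *opts = calloc(1, sizeof(*opts));

  if (opts == NULL) {
    return NULL;
  }
  if (argc > 1) {
    /* argv[0] is the program name, so argc - 1 slots are enough. */
    opts->keys = calloc((size_t) (argc - 1), sizeof(char *));
    opts->values = calloc((size_t) (argc - 1), sizeof(char *));
    if (opts->keys == NULL || opts->values == NULL) {
      free_cli_options(opts);
      return NULL;
    }
  }
  for (i = 1; i < argc; i++) {
    const char *arg = argv[i];
    const char *eq;
    int n = opts->n_options;

    if (strncmp(arg, "--", 2) != 0 || (eq = strchr(arg, '=')) == NULL) {
      free_cli_options(opts);
      return NULL;
    }
    opts->keys[n] = dup_range(arg + 2, (size_t) (eq - (arg + 2)));
    opts->values[n] = dup_range(eq + 1, strlen(eq + 1));
    opts->n_options = n + 1;  /* count the slot so cleanup frees it */
    if (opts->keys[n] == NULL || opts->values[n] == NULL) {
      free_cli_options(opts);
      return NULL;
    }
  }
  return opts;
}
```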
required and has_values could be implemented as a bit array instead of a byte 
array. Another option: Consider something like an array of 
{code}
struct known_parameter {
  char* name;
  struct { unsigned int required:1; unsigned int has_values:1; } flags;
};
{code}
It saves 7 bytes per parameter.
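
A short sketch of how such a table could be declared and queried. The parameter names and the lookup helper {{find_parameter}} are hypothetical. The actual size saving depends on how the compiler pads the struct, so it is worth checking sizeof on the target platform rather than relying on a fixed number.

```c
#include <stddef.h>
#include <string.h>

struct known_parameter {
  const char *name;
  struct {
    unsigned int required : 1;    /* parameter must be present */
    unsigned int has_values : 1;  /* parameter takes a value */
  } flags;
};

/* Hypothetical entries; the real parameter names are not part of
 * this review comment. */
static const struct known_parameter KNOWN_PARAMETERS[] = {
  { "alpha", { 1, 1 } },
  { "beta",  { 0, 1 } },
  { "gamma", { 0, 0 } },
};

/* Return the table entry for name, or NULL if it is not a known parameter. */
static const struct known_parameter *find_parameter(const char *name) {
  size_t i;
  size_t count = sizeof(KNOWN_PARAMETERS) / sizeof(KNOWN_PARAMETERS[0]);

  if (name == NULL) {
    return NULL;
  }
  for (i = 0; i < count; i++) {
    if (strcmp(KNOWN_PARAMETERS[i].name, name) == 0) {
      return &KNOWN_PARAMETERS[i];
    }
  }
  return NULL;
}
```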
continued...

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-08-02 Thread Miklos Szegedi (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111936#comment-16111936
 ] 

Miklos Szegedi commented on YARN-6852:
--

Thank you for the patch [~wangda]
We now have get_executable.c and cgroups-operations.c. It is up to you, but I 
prefer cgroups_operations.c for consistency.
get_cgroups_path_to_write: This function could really use some comments
{{if (!cgroups_root || strlen(cgroups_root) == 0)}} How about {{if 
(!cgroups_root || cgroups_root[0] == 0)}} it is more common.
{code}
sprintf(output_path, "%s/%s/%s/%s/%s.%s",
  cgroups_root, controller_name, yarn_hierarchy_name,
  group_id, controller_name, param_name);
{code}
Please use snprintf to avoid buffer overflow and potential security/reliability 
issues. Usually the caller is supposed to pass the maximum size as well. You 
also need to handle the snprintf return value (error or truncation).
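
A minimal sketch of the snprintf variant. The parameter list is assumed for the example (the patch's real signature is not quoted in full); it takes the output buffer size from the caller and treats both an encoding error and truncation as failure.

```c
#include <stdio.h>

/* Sketch with an assumed signature: build
 * <root>/<hierarchy>/<yarn hierarchy>/<group>/<hierarchy>.<param>
 * into output_path. Returns 0 on success, -1 on invalid input,
 * an encoding error, or truncation. */
static int get_cgroups_path_to_write(const char *cgroups_root,
                                     const char *hierarchy_name,
                                     const char *yarn_hierarchy_name,
                                     const char *group_id,
                                     const char *param_name,
                                     char *output_path,
                                     size_t output_path_size) {
  int n;

  if (!cgroups_root || cgroups_root[0] == '\0' || !hierarchy_name ||
      !yarn_hierarchy_name || !group_id || !param_name ||
      !output_path || output_path_size == 0) {
    return -1;
  }
  n = snprintf(output_path, output_path_size, "%s/%s/%s/%s/%s.%s",
               cgroups_root, hierarchy_name, yarn_hierarchy_name,
               group_id, hierarchy_name, param_name);
  /* n < 0: encoding error; n >= size: the output was truncated. */
  if (n < 0 || (size_t) n >= output_path_size) {
    return -1;
  }
  return 0;
}
```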
config_initialized is not necessary/redundant. cgroup_cfg_section != NULL 
provides the same meaning.
cgroup_cfg_section should be static as well.
To be accurate controller_name is actually hierarchy_name. There is subsystem 
(cpu) and hierarchy (cpu,cpuacct).
Why do you have cgroup_cfg_section? You could eliminate it and get it all the 
time or just cache cgroups_root.
update_cgroups_parameters needs function header comments as well.
update_cgroups_parameters: Pass the size of full_path to 
get_cgroups_path_to_write; otherwise it may overflow the buffer on the 
stack(!), overwrite the return address to point into the buffer itself, and 
execute arbitrary code as root upon return. full_path should be allocated on 
the heap. It is quite big and may increase the likelihood of stack overflows 
along with vulnerabilities like the one above.
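
As one possible shape for the heap-allocated variant, the sketch below first asks snprintf for the required length (a NULL buffer with size 0 is valid in C99) and then allocates exactly that much. The function name {{alloc_cgroups_path}} and its parameters are assumptions made for the example.

```c
#include <stdio.h>
#include <stdlib.h>

#define CGROUPS_PATH_FMT "%s/%s/%s/%s/%s.%s"

/* Hypothetical helper (not from the patch): return a malloc'ed path
 * string that the caller must free, or NULL on error. */
static char *alloc_cgroups_path(const char *cgroups_root,
                                const char *hierarchy_name,
                                const char *yarn_hierarchy_name,
                                const char *group_id,
                                const char *param_name) {
  int needed;
  char *path;

  if (!cgroups_root || !hierarchy_name || !yarn_hierarchy_name ||
      !group_id || !param_name) {
    return NULL;
  }
  /* First pass: compute the length without writing anything. */
  needed = snprintf(NULL, 0, CGROUPS_PATH_FMT, cgroups_root, hierarchy_name,
                    yarn_hierarchy_name, group_id, hierarchy_name,
                    param_name);
  if (needed < 0) {
    return NULL;
  }
  path = malloc((size_t) needed + 1);
  if (path == NULL) {
    return NULL;
  }
  /* Second pass: format into the exactly sized buffer. */
  if (snprintf(path, (size_t) needed + 1, CGROUPS_PATH_FMT, cgroups_root,
               hierarchy_name, yarn_hierarchy_name, group_id,
               hierarchy_name, param_name) != needed) {
    free(path);
    return NULL;
  }
  return path;
}
```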
continued...

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).






[jira] [Commented] (YARN-6852) [YARN-6223] Native code changes to support isolate GPU devices by using CGroups

2017-07-24 Thread Wangda Tan (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099193#comment-16099193
 ] 

Wangda Tan commented on YARN-6852:
--

[~chris.douglas], could you help to review the approach and patch if you have 
bandwidth?

> [YARN-6223] Native code changes to support isolate GPU devices by using 
> CGroups
> ---
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: Wangda Tan
> Attachments: YARN-6852.001.patch
>
>
> This JIRA plan to add support of:
> 1) Isolation in CGroups. (native side).


