[https://issues.apache.org/jira/browse/YARN-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206771#comment-16206771]
Jonathan Hung commented on YARN-6852:
-------------------------------------
Hi [~leftnoteasy], in {{gpu-module.c#internal_handle_gpu_request}}, there's this code:
{noformat}
+ // Use cgroup helpers to blacklist devices
+ for (int i = 0; i < n_minor_devices_to_block; i++) {
+ char param_value[128];
+ memset(param_value, 0, sizeof(param_value));
+ snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+ major_device_number, i);
+
+ int rc = update_cgroups_parameters_func_p("devices", "deny",
+ container_id, param_value);
+
+ if (0 != rc) {
+ fprintf(ERRORFILE, "CGroups: Failed to update cgroups\n");
+ return_code = -1;
+ goto cleanup;
+ }
+ }
{noformat}
Is
{noformat}
+ snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+ major_device_number, i);
{noformat}
supposed to be
{noformat}
+ snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
+ major_device_number, minor_devices[i]);
{noformat}
? It seems the actual minor device number is not what gets written: the loop index {{i}} is written instead, so the denied devices are always minors 0..n-1 rather than the GPUs that were requested.
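A minimal sketch of the loop with the suggested fix, assuming {{minor_devices}} is an array holding the minor numbers of the GPUs to block. {{record_deny}} is a hypothetical stand-in for {{update_cgroups_parameters_func_p}} that only records each value instead of writing to the cgroup filesystem, and 195 is used as an example major number (commonly the NVIDIA character-device major):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for update_cgroups_parameters_func_p: it records each
 * deny value in memory so the generated entries can be inspected. */
#define MAX_RECORDED 16
static char recorded[MAX_RECORDED][128];
static int n_recorded = 0;

static int record_deny(const char *controller, const char *param,
                       const char *container_id, const char *value) {
  (void)controller;
  (void)param;
  (void)container_id;
  if (n_recorded >= MAX_RECORDED) {
    return -1;
  }
  snprintf(recorded[n_recorded++], sizeof(recorded[0]), "%s", value);
  return 0;
}

/* Deny each listed GPU by its real minor number (assumed to be supplied in
 * minor_devices[]), not by its position in the list. */
static int deny_gpu_devices(int major_device_number, const int *minor_devices,
                            int n_minor_devices_to_block,
                            const char *container_id) {
  assert(n_minor_devices_to_block == 0 || minor_devices != NULL);
  for (int i = 0; i < n_minor_devices_to_block; i++) {
    char param_value[128];
    memset(param_value, 0, sizeof(param_value));
    /* "c MAJOR:MINOR rwm": character device; deny read, write and mknod. */
    snprintf(param_value, sizeof(param_value), "c %d:%d rwm",
             major_device_number, minor_devices[i]); /* minor_devices[i], not i */
    if (record_deny("devices", "deny", container_id, param_value) != 0) {
      fprintf(stderr, "CGroups: Failed to update cgroups\n");
      return -1;
    }
  }
  return 0;
}
```

With minor numbers {2, 3}, this records "c 195:2 rwm" and "c 195:3 rwm"; the loop as currently written would record "c 195:0 rwm" and "c 195:1 rwm", blocking the wrong GPUs and leaving minors 2 and 3 accessible.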
> [YARN-6223] Native code changes to support isolate GPU devices by using
> CGroups
> -------------------------------------------------------------------------------
>
> Key: YARN-6852
> URL: https://issues.apache.org/jira/browse/YARN-6852
> Project: Hadoop YARN
> Issue Type: Sub-task
> Reporter: Wangda Tan
> Assignee: Wangda Tan
> Fix For: 3.0.0-beta1
>
> Attachments: YARN-6852.001.patch, YARN-6852.002.patch,
> YARN-6852.003.patch, YARN-6852.004.patch, YARN-6852.005.patch,
> YARN-6852.006.patch, YARN-6852.007.patch, YARN-6852.008.patch,
> YARN-6852.009.patch
>
>
> This JIRA plans to add support for:
> 1) Isolation in CGroups (native side).
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)