[ https://issues.apache.org/jira/browse/YARN-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bob updated YARN-4508:
----------------------
Description:
In one scenario, mount_cgroup can return success even though mounting the
requested cgroup controller actually failed.
The condition check in the code below should be tightened:
{code}
} else {
  fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
          controller, mount_path, strerror(errno));
  // if controller is already mounted, don't stop trying to mount others
  if (errno != EBUSY) {
    result = -1;
  }
}
{code}
The issue can be reproduced as follows:
1. Start the NM; it mounts the cgroups normally.
2. Manually unmount the cgroups used by the NM.
3. Restart the NM. The NM starts successfully, but containers cannot be
started because the cgroups were not mounted successfully.
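One possible way to tighten the check, as a rough sketch only and not the actual patch: when mount() fails with EBUSY, confirm that the controller really appears in the mount table before treating the failure as harmless. The helpers controller_is_mounted and mount_failure_result are hypothetical names that do not exist in container-executor.c, and the sketch assumes the mount table is a /proc/mounts-style file in which a mounted controller shows up as a mount option of a "cgroup" filesystem entry.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of container-executor.c): returns 1 if
 * `controller` is listed among the mount options of a "cgroup" entry in
 * the given mounts table (e.g. /proc/mounts), 0 otherwise. */
static int controller_is_mounted(const char *mounts_file,
                                 const char *controller) {
  assert(mounts_file != NULL && controller != NULL);

  FILE *fp = fopen(mounts_file, "r");
  if (fp == NULL) {
    return 0;  /* cannot verify, so do not claim it is mounted */
  }

  char line[1024];
  int found = 0;
  while (!found && fgets(line, sizeof(line), fp) != NULL) {
    char fstype[64];
    char opts[512];
    /* Fields: device mountpoint fstype options dump pass */
    if (sscanf(line, "%*s %*s %63s %511s", fstype, opts) != 2) {
      continue;
    }
    if (strcmp(fstype, "cgroup") != 0) {
      continue;
    }
    /* Compare whole comma-separated options so "cpu" != "cpuacct" */
    for (char *tok = strtok(opts, ","); tok != NULL;
         tok = strtok(NULL, ",")) {
      if (strcmp(tok, controller) == 0) {
        found = 1;
        break;
      }
    }
  }
  fclose(fp);
  return found;
}

/* Hypothetical decision helper: given the errno left by a failed mount(),
 * return 0 only when EBUSY is backed by a real mount entry, else -1. */
static int mount_failure_result(int mount_errno, const char *mounts_file,
                                const char *controller) {
  if (mount_errno == EBUSY &&
      controller_is_mounted(mounts_file, controller)) {
    return 0;
  }
  return -1;
}
```

In the else branch above, the assignment could then become something like result = mount_failure_result(errno, "/proc/mounts", controller), with errno saved before any other call that might overwrite it.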
was:
In one scenarios , could result in mount_cgroup return success, but actually
the request cgroup controller mount failed.
Below code should enhance the condition check:
{code}
} else {
fprintf(LOGFILE, "Failed to mount cgroup controller %s at %s - %s\n",
controller, mount_path, strerror(errno));
// if controller is already mounted, don't stop trying to mount others
if (errno != EBUSY) {
result = -1;
}
}
{code}
In below scenarios can reproduce the issue:
1.Start NM, it will mount cgroups normally
2.Manually unmount the cgroups used by NM
3.Restart NM, NM can start successfully , but container cant be started due to
cgroups did not mounted successfully.
> The mount_cgroup method in container-executor.c should enhance mount check
> when mount the request cgroup controller.
> --------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-4508
> URL: https://issues.apache.org/jira/browse/YARN-4508
> Project: Hadoop YARN
> Issue Type: Bug
> Components: yarn
> Affects Versions: 2.6.1, 2.7.1
> Reporter: Bob
> Priority: Minor
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)