[
https://issues.apache.org/jira/browse/YARN-9098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725212#comment-16725212
]
Jim Brennan commented on YARN-9098:
-----------------------------------
[~snemeth] I found the bug (or at least one bug).
In findControllerPathInMountConfig(), you are not looping:
{noformat}
public static String findControllerPathInMountConfig(String controller,
CGroupsMountConfig mountConfig) {
String path = mountConfig.getPathForController(controller);
if (path != null) {
if (new File(path).canRead()) {
return path;
} else {
LOG.warn(String.format(
"Skipping inaccessible cgroup mount point %s", path));
}
}
return null;
}
{noformat}
If the bad entry for the CPU controller comes before the good entry, then you
will "skip" and return null for the CPU controller. This code path should loop
through all of the mountconfig entries so that it will properly skip bad
entries and find good ones. It also makes me concerned about other uses of
mountConfig.getPathForController() - the current code essentially assumes that
there is only one entry for each controller.
The reason I think I am hitting it and you (and precommit) are not is because
this is a hash map, so the ordering is essentially random depending on the file
paths. Since our file paths are different, we get different orderings, and
since in my case the bad entry comes first, the tests fail for me.
I found this while debugging the testMtabParsing() test.
> Separate mtab file reader code and cgroups file system hierarchy parser code
> from CGroupsHandlerImpl and ResourceHandlerModule
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-9098
> URL: https://issues.apache.org/jira/browse/YARN-9098
> Project: Hadoop YARN
> Issue Type: Improvement
> Reporter: Szilard Nemeth
> Assignee: Szilard Nemeth
> Priority: Major
> Attachments: YARN-9098.002.patch, YARN-9098.003.patch,
> YARN-9098.004.patch
>
>
> Separate mtab file reader code and cgroups file system hierarchy parser code
> from CGroupsHandlerImpl and ResourceHandlerModule
> CGroupsHandlerImpl has a method parseMtab that parses an mtab file and stores
> cgroups data.
> CGroupsLCEResourcesHandler also has a method with the same name, with
> identical code.
> The parser code should be extracted from these places and be added in a new
> class as this is a separate responsibility.
> As the output of the file parser is a Map<String, Set<String>>, it's better
> to encapsulate it in a domain object, named 'CGroupsMountConfig' for instance.
> ResourceHandlerModule has a method named parseConfiguredCGroupPath, that is
> responsible for producing the same results (Map<String, Set<String>>) to
> store cgroups data, it does not operate on mtab file, but looking at the
> filesystem for cgroup settings. As the output is the same, CGroupsMountConfig
> should be used here, too.
> Again, this could should not be part of ResourceHandlerModule as it is a
> different responsibility.
> One more thing which is strongly related to the methods above is
> CGroupsHandlerImpl.initializeFromMountConfig: This method processes the
> result of a parsed mtab file or a parsed cgroups filesystem data and stores
> file system paths for all available controllers. This method invokes
> findControllerPathInMountConfig, which is a duplicated in CGroupsHandlerImpl
> and CGroupsLCEResourcesHandler, so it should be moved to a single place. To
> store filesystem path and controller mappings, a new domain object could be
> introduced.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]