[jira] [Updated] (YARN-11743) Cgroup v2 support should fall back to v1 when there are no v2 controllers
[ https://issues.apache.org/jira/browse/YARN-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated YARN-11743:
----------------------------------
    Labels: pull-request-available  (was: )

> Cgroup v2 support should fall back to v1 when there are no v2 controllers
> -------------------------------------------------------------------------
>
>                 Key: YARN-11743
>                 URL: https://issues.apache.org/jira/browse/YARN-11743
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Benjamin Teke
>            Assignee: Peter Szucs
>            Priority: Major
>              Labels: pull-request-available
>
> Cgroup v1/v2 mixed mode support was introduced in YARN-11692; however, it does
> not support an edge case where NM has the cgroup v2 support enabled (using
> {{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to
> true), but there are only cgroup v1 controllers mounted. In larger clusters
> there is a chance that some part of the cluster is already on newer OSes with
> cgroup v2 as a default, and others are still using v1.
> Currently trying to launch an NM with cgroup v2 support enabled will fail if
> there is no cgroup.controllers file present:
> {code:java}
> Failed to initialize controller paths! Exception: java.io.IOException: No cgroup controllers file found in the directory specified: /var/lib/yarn-ce/cgroups
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.readControllersFile(CGroupsV2HandlerImpl.java:130)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.parsePreConfiguredMountPath(CGroupsV2HandlerImpl.java:101)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.initializeControllerPaths(AbstractCGroupsHandler.java:133)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.init(AbstractCGroupsHandler.java:107)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.<init>(AbstractCGroupsHandler.java:103)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:71)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:83)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupV2Handler(ResourceHandlerModule.java:106)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupHandlers(ResourceHandlerModule.java:83)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initCGroupsCpuResourceHandler(ResourceHandlerModule.java:177)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeConfiguredResourceHandlerChain(ResourceHandlerModule.java:334)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.getConfiguredResourceHandlerChain(ResourceHandlerModule.java:383)
>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:314)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:427)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
> {code}
> Basically
> [readControllersFile|https://github.com/apache/hadoop/blob/950b2ff773fa828eb13bed7c3fe6b3d52c7fff18/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2HandlerImpl.java#L127]'s
> thrown error should be handled if the required controllers are mounted in v1:
> {code:java}
>   /**
>    * Parse the cgroup v2 controllers file (cgroup.controllers) to check the enabled controllers.
>    * @param cgroupPath path to the cgroup directory
>    * @return set of enabled and YARN supported controllers.
>    * @throws IOException if the file is not found or cannot be read
>    */
>   public Set<String> readControllersFile(String cgroupPath) throws IOException {
>     File cgroupControllersFile = new File(cgroupPath + Path.SEPARATOR + CGROUP_CONTROLLERS_FILE);
>     if (!cgroupControllersFile.exists()) {
>       throw new IOException("No cgroup controllers file found in the
[jira] [Updated] (YARN-11743) Cgroup v2 support should fall back to v1 when there are no v2 controllers
[ https://issues.apache.org/jira/browse/YARN-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-11743:
---------------------------------
    Description: 
Cgroup v1/v2 mixed mode support was introduced in YARN-11692; however, it does not support an edge case where NM has the cgroup v2 support enabled (using {{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to true), but there are only cgroup v1 controllers mounted. In larger clusters there is a chance that some part of the cluster is already on newer OSes with cgroup v2 as a default, and others are still using v1.

Currently trying to launch an NM with cgroup v2 support enabled will fail if there is no cgroup.controllers file present:
{code:java}
Failed to initialize controller paths! Exception: java.io.IOException: No cgroup controllers file found in the directory specified: /var/lib/yarn-ce/cgroups
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.readControllersFile(CGroupsV2HandlerImpl.java:130)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.parsePreConfiguredMountPath(CGroupsV2HandlerImpl.java:101)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.initializeControllerPaths(AbstractCGroupsHandler.java:133)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.init(AbstractCGroupsHandler.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.<init>(AbstractCGroupsHandler.java:103)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:71)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupV2Handler(ResourceHandlerModule.java:106)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupHandlers(ResourceHandlerModule.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initCGroupsCpuResourceHandler(ResourceHandlerModule.java:177)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeConfiguredResourceHandlerChain(ResourceHandlerModule.java:334)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.getConfiguredResourceHandlerChain(ResourceHandlerModule.java:383)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:314)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:427)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
{code}
Basically [readControllersFile|https://github.com/apache/hadoop/blob/950b2ff773fa828eb13bed7c3fe6b3d52c7fff18/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2HandlerImpl.java#L127]'s thrown error should be handled if the required controllers are mounted in v1.

  was:
Cgroup v1/v2 mixed mode support was introduced in YARN-11692; however, it does not support an edge case where NM has the cgroup v2 support enabled (using {{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to true), but there are only cgroup v1 controllers mounted. In larger clusters there is a chance that some part of the cluster is already on newer OSes with cgroup v2 as a default, and others are still using v1.

Currently trying to launch an NM with cgroup v2 support enabled will fail if there is no cgroup.controllers file present.

> Cgroup v2 support should fall back to v1 when there are no v2 controllers
> -------------------------------------------------------------------------
>
>                 Key: YARN-11743
>                 URL: https://issues.apache.org/jira/browse/YARN-11743
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Benjamin Teke
>            Priority: Major
>
> Cgroup v1/v2 mixed mode support was introduced in YARN-11692; however, it does
> not support an edge case where NM has the cgroup v2 support enabled (using
> {{yarn.nodemanager.linux-container-executor.cgroups.v2.en
[jira] [Updated] (YARN-11743) Cgroup v2 support should fall back to v1 when there are no v2 controllers
[ https://issues.apache.org/jira/browse/YARN-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-11743:
---------------------------------
    Description: 
Cgroup v1/v2 mixed mode support was introduced in YARN-11692; however, it does not support an edge case where NM has the cgroup v2 support enabled (using {{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to true), but there are only cgroup v1 controllers mounted. In larger clusters there is a chance that some part of the cluster is already on newer OSes with cgroup v2 as a default, and others are still using v1.

Currently trying to launch an NM with cgroup v2 support enabled will fail if there is no cgroup.controllers file present:
{code:java}
Failed to initialize controller paths! Exception: java.io.IOException: No cgroup controllers file found in the directory specified: /var/lib/yarn-ce/cgroups
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.readControllersFile(CGroupsV2HandlerImpl.java:130)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.parsePreConfiguredMountPath(CGroupsV2HandlerImpl.java:101)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.initializeControllerPaths(AbstractCGroupsHandler.java:133)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.init(AbstractCGroupsHandler.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.<init>(AbstractCGroupsHandler.java:103)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:71)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupV2Handler(ResourceHandlerModule.java:106)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupHandlers(ResourceHandlerModule.java:83)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initCGroupsCpuResourceHandler(ResourceHandlerModule.java:177)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeConfiguredResourceHandlerChain(ResourceHandlerModule.java:334)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.getConfiguredResourceHandlerChain(ResourceHandlerModule.java:383)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:314)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:427)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
{code}
Basically [readControllersFile|https://github.com/apache/hadoop/blob/950b2ff773fa828eb13bed7c3fe6b3d52c7fff18/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2HandlerImpl.java#L127]'s thrown error should be handled if the required controllers are mounted in v1:
{code:java}
  /**
   * Parse the cgroup v2 controllers file (cgroup.controllers) to check the enabled controllers.
   * @param cgroupPath path to the cgroup directory
   * @return set of enabled and YARN supported controllers.
   * @throws IOException if the file is not found or cannot be read
   */
  public Set<String> readControllersFile(String cgroupPath) throws IOException {
    File cgroupControllersFile = new File(cgroupPath + Path.SEPARATOR + CGROUP_CONTROLLERS_FILE);
    if (!cgroupControllersFile.exists()) {
      throw new IOException("No cgroup controllers file found in the directory specified: " + cgroupPath);
    }

    String enabledControllers = FileUtils.readFileToString(cgroupControllersFile, StandardCharsets.UTF_8);
    Set<String> validCGroups = getValidCGroups();
    Set<String> controllerSet = new HashSet<>(Arrays.asList(enabledControllers.split(" ")));
    // Collect the valid subsystem names
    controllerSet.retainAll(validCGroups);
    if (controllerSet.isEmpty()) {
      LOG.warn("The following cgroup directory doesn't contain any supported controllers: " + cgroupPath);
    }

    return controllerSet;
  }
{code}

  was:
Cgroup v1/v2 mixe
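
For illustration only, the fallback requested in this issue could look roughly like the sketch below. It is not the YARN-11743 patch: the class CgroupFallbackSketch, the Mode enum, the selectMode and parseControllers methods, and the two-entry SUPPORTED set are all hypothetical names chosen for this example, and the sketch assumes that a missing or unusable cgroup.controllers file is enough to choose v1 without verifying the v1 mounts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class CgroupFallbackSketch {

  enum Mode { V2, V1_FALLBACK }

  static final String CGROUP_CONTROLLERS_FILE = "cgroup.controllers";

  // Assumption: a small stand-in for the controllers YARN would accept.
  static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList("cpu", "memory"));

  // Keep only the supported controller names from the file content.
  static Set<String> parseControllers(String fileContent) {
    Set<String> controllers =
        new HashSet<>(Arrays.asList(fileContent.trim().split("\\s+")));
    controllers.retainAll(SUPPORTED);
    return controllers;
  }

  // Mirrors the quoted method's contract: throw if the file is absent.
  static Set<String> readV2Controllers(Path cgroupPath) throws IOException {
    Path file = cgroupPath.resolve(CGROUP_CONTROLLERS_FILE);
    if (!Files.exists(file)) {
      throw new IOException(
          "No cgroup controllers file found in the directory specified: " + cgroupPath);
    }
    return parseControllers(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
  }

  // Handle the IOException instead of letting it abort start-up.
  // Simplification: this does not check that v1 controllers are mounted.
  static Mode selectMode(Path cgroupPath) {
    try {
      return readV2Controllers(cgroupPath).isEmpty() ? Mode.V1_FALLBACK : Mode.V2;
    } catch (IOException e) {
      return Mode.V1_FALLBACK;
    }
  }

  public static void main(String[] args) {
    Path path = Paths.get(args.length > 0 ? args[0] : "/sys/fs/cgroup");
    System.out.println(path + " -> " + selectMode(path));
  }
}
```

Running the class with a cgroup directory as its only argument prints the mode this sketch would choose for that directory. A real fix would also need to confirm that the required v1 controllers are actually mounted before falling back, as the description notes.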