[ 
https://issues.apache.org/jira/browse/YARN-11743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906715#comment-17906715
 ] 

ASF GitHub Bot commented on YARN-11743:
---------------------------------------

p-szucs commented on code in PR #7222:
URL: https://github.com/apache/hadoop/pull/7222#discussion_r1890069838


##########
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/ResourceHandlerModule.java:
##########
@@ -76,10 +76,10 @@ public class ResourceHandlerModule {
   private static volatile CpuResourceHandler
       cGroupsCpuResourceHandler;
 
-  private static void initializeCGroupHandlers(Configuration conf)
-      throws ResourceHandlerException {
+  private static void initializeCGroupHandlers(Configuration conf,
+      CGroupsHandler.CGroupController controller) throws ResourceHandlerException {
     initializeCGroupV1Handler(conf);
-    if (cgroupsV2Enabled) {
+    if (cgroupsV2Enabled && !isMountedInCGroupsV1(controller)) {
       initializeCGroupV2Handler(conf);

Review Comment:
   Thanks for the review @brumi1024! With the current logic we use the v1 
handler to get the v1 controller paths, and initialize the v2 handler only if 
the given controller is not mounted in v1. In a pure v2 setup, though, I don't 
think we use the v1 handler after that check, so there may be a cleaner way to 
determine this.
   
   This is not the pure v2 case, but with the resource plugins we still use 
only the v1 handler in [this](https://github.com/apache/hadoop/pull/7222) part 
of the code, so additional checks would probably be needed there as well if we 
decide not to initialize the v1 handler.
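
   To make the condition from the diff easier to discuss, here is a minimal, 
self-contained sketch of the v2 gate. The class, the enums, and the 
`mountedInV1` set are hypothetical stand-ins (not the actual 
`ResourceHandlerModule` or `CGroupsHandler.CGroupController` API); it only 
models the `cgroupsV2Enabled && !isMountedInCGroupsV1(controller)` check, not 
the unconditional v1 handler initialization.

```java
import java.util.EnumSet;
import java.util.Set;

class HandlerSelectionSketch {
  /** Hypothetical stand-in for CGroupsHandler.CGroupController. */
  enum Controller { CPU, MEMORY, BLKIO }

  /** Which cgroup version a controller would be served by. */
  enum HandlerVersion { V1, V2 }

  /**
   * Models the gate in the diff: v2 is chosen only when v2 support is
   * enabled and the controller is not already mounted in v1.
   * The mountedInV1 set is an assumption standing in for
   * isMountedInCGroupsV1(controller).
   */
  static HandlerVersion select(boolean cgroupsV2Enabled,
      Set<Controller> mountedInV1, Controller controller) {
    if (cgroupsV2Enabled && !mountedInV1.contains(controller)) {
      return HandlerVersion.V2;
    }
    return HandlerVersion.V1;
  }

  public static void main(String[] args) {
    // Mixed mode: only the CPU controller is mounted in v1.
    Set<Controller> v1Mounts = EnumSet.of(Controller.CPU);
    System.out.println(select(true, v1Mounts, Controller.CPU));
    System.out.println(select(true, v1Mounts, Controller.MEMORY));
    System.out.println(select(false, v1Mounts, Controller.MEMORY));
  }
}
```

   In a pure v2 setup the `mountedInV1` set would be empty, so every 
controller takes the v2 branch; that is the case where initializing the v1 
handler first may be unnecessary.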





> Cgroup v2 support should fall back to v1 when there are no v2 controllers
> -------------------------------------------------------------------------
>
>                 Key: YARN-11743
>                 URL: https://issues.apache.org/jira/browse/YARN-11743
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: yarn
>            Reporter: Benjamin Teke
>            Assignee: Peter Szucs
>            Priority: Major
>              Labels: pull-request-available
>
> Cgroup v1/v2 mixed mode support was introduced in YARN-11692, but it does 
> not cover an edge case where the NM has cgroup v2 support enabled 
> ({{yarn.nodemanager.linux-container-executor.cgroups.v2.enabled}} set to 
> true) but only cgroup v1 controllers are mounted. In larger clusters, some 
> nodes may already be on newer OSes with cgroup v2 as the default while 
> others are still using v1. 
> Currently, launching an NM with cgroup v2 support enabled fails if no 
> cgroup.controllers file is present:
> {code:java}
> Failed to initialize controller paths! Exception: 
> java.io.IOException: No cgroup controllers file found in the directory specified: /var/lib/yarn-ce/cgroups
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.readControllersFile(CGroupsV2HandlerImpl.java:130)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.parsePreConfiguredMountPath(CGroupsV2HandlerImpl.java:101)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.initializeControllerPaths(AbstractCGroupsHandler.java:133)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.init(AbstractCGroupsHandler.java:107)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.AbstractCGroupsHandler.<init>(AbstractCGroupsHandler.java:103)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:71)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsV2HandlerImpl.<init>(CGroupsV2HandlerImpl.java:83)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupV2Handler(ResourceHandlerModule.java:106)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeCGroupHandlers(ResourceHandlerModule.java:83)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initCGroupsCpuResourceHandler(ResourceHandlerModule.java:177)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.initializeConfiguredResourceHandlerChain(ResourceHandlerModule.java:334)
>       at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.ResourceHandlerModule.getConfiguredResourceHandlerChain(ResourceHandlerModule.java:383)
>       at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:314)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:427)
>       at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:974)
>       at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:1054)
> {code}
> Basically, the error thrown by 
> [readControllersFile|https://github.com/apache/hadoop/blob/950b2ff773fa828eb13bed7c3fe6b3d52c7fff18/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/linux/resources/CGroupsV2HandlerImpl.java#L127]
> should be handled if the required controllers are mounted in v1:
> {code:java}
>   /**
>    * Parse the cgroup v2 controllers file (cgroup.controllers) to check the enabled controllers.
>    * @param cgroupPath path to the cgroup directory
>    * @return set of enabled and YARN supported controllers.
>    * @throws IOException if the file is not found or cannot be read
>    */
>   public Set<String> readControllersFile(String cgroupPath) throws IOException {
>     File cgroupControllersFile = new File(cgroupPath + Path.SEPARATOR + CGROUP_CONTROLLERS_FILE);
>     if (!cgroupControllersFile.exists()) {
>       throw new IOException("No cgroup controllers file found in the directory specified: " +
>               cgroupPath);
>     }
>     String enabledControllers = FileUtils.readFileToString(cgroupControllersFile,
>         StandardCharsets.UTF_8);
>     Set<String> validCGroups = getValidCGroups();
>     Set<String> controllerSet =
>             new HashSet<>(Arrays.asList(enabledControllers.split(" ")));
>     // Collect the valid subsystem names
>     controllerSet.retainAll(validCGroups);
>     if (controllerSet.isEmpty()) {
>       LOG.warn("The following cgroup directory doesn't contain any supported controllers: " +
>               cgroupPath);
>     }
>     return controllerSet;
>   }
> {code}
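
One way to read the suggestion in the description above is sketched below: 
treat a missing cgroup.controllers file as recoverable when the required 
controller is mounted in v1, and fail only otherwise. This is an illustrative 
assumption, not the change made in the PR; the class name, `chooseVersion`, 
and the `mountedInV1` flag are hypothetical, and only the file name and the 
error message are taken from the quoted code.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

class ControllersFileFallbackSketch {
  static final String CGROUP_CONTROLLERS_FILE = "cgroup.controllers";

  /** Fails the same way readControllersFile does when the file is missing. */
  static void requireControllersFile(String cgroupPath) throws IOException {
    File controllersFile = new File(cgroupPath, CGROUP_CONTROLLERS_FILE);
    if (!controllersFile.exists()) {
      throw new IOException(
          "No cgroup controllers file found in the directory specified: "
              + cgroupPath);
    }
  }

  /**
   * Hypothetical policy: prefer v2, fall back to v1 if the controllers file
   * is missing but the controller is mounted in v1, otherwise rethrow.
   */
  static String chooseVersion(String cgroupPath, boolean mountedInV1) {
    try {
      requireControllersFile(cgroupPath);
      return "v2";
    } catch (IOException e) {
      if (mountedInV1) {
        return "v1";
      }
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // A directory without cgroup.controllers, with the controller in v1.
    System.out.println(chooseVersion("/no/such/cgroup/dir", true));
  }
}
```

The PR diff quoted earlier takes a different route: it checks the v1 mounts 
first and skips the v2 initialization, so the exception is never raised in 
the v1-only case.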



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

