Shane Kumpf created YARN-8037:
---------------------------------

             Summary: CGroupsResourceCalculator excessive warnings on container relaunch
                 Key: YARN-8037
                 URL: https://issues.apache.org/jira/browse/YARN-8037
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Shane Kumpf

When a container is relaunched, the old process no longer exists. When using the {{CGroupsResourceCalculator}}, this results in the warning and exception below being logged every second until the relaunch occurs, which is excessive and fills up the logs.

{code:java}
2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: Failed to parse 12844
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 12844
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_000002/cpuacct.stat (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
	... 4 more
2018-03-16 14:30:33,438 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: Failed to parse cgroups /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.memsw.usage_in_bytes
org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the interim 12844
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457)
Caused by: java.io.FileNotFoundException: /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.usage_in_bytes (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320)
	... 4 more
{code}

At a minimum, we should consider moving the exception to debug level to reduce the noise. Alternatively, it may make sense to stop the existing {{MonitoringThread}} during relaunch.
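
To illustrate the first suggestion, here is a minimal sketch of logging a missing cgroup file at debug level while keeping other failures at warning level. It is not the actual {{CGroupsResourceCalculator}} code: the class {{QuietCgroupReader}}, the method {{readLongOrUnavailable}}, and the {{UNAVAILABLE}} sentinel are hypothetical names, and {{java.util.logging}} (with {{Level.FINE}} standing in for debug) is used only so the example stays self-contained.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop: reads a single numeric cgroup
// file and downgrades the "file is gone" case to a debug-level message.
final class QuietCgroupReader {
    private static final Logger LOG =
        Logger.getLogger(QuietCgroupReader.class.getName());

    // Assumed sentinel meaning "no value could be read".
    static final long UNAVAILABLE = -1L;

    static long readLongOrUnavailable(Path file) {
        try {
            String text = new String(
                Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return Long.parseLong(text);
        } catch (NoSuchFileException e) {
            // Expected while a container is being relaunched: the old
            // process and its cgroup directory no longer exist.
            LOG.log(Level.FINE, "cgroup file not present, skipping: " + file, e);
            return UNAVAILABLE;
        } catch (IOException | NumberFormatException e) {
            // Any other failure is still unexpected and stays a warning.
            LOG.log(Level.WARNING, "Failed to parse " + file, e);
            return UNAVAILABLE;
        }
    }

    public static void main(String[] args) {
        // Pass a path on the command line, or fall back to a path that is
        // assumed not to exist so the debug-level branch is exercised.
        Path file = Paths.get(args.length > 0
            ? args[0]
            : "/nonexistent-yarn-8037/memory.usage_in_bytes");
        System.out.println(readLongOrUnavailable(file));
    }
}
```

With the default {{java.util.logging}} configuration, {{FINE}} messages are not printed, so the once-per-second polling during a relaunch would stay silent unless debug logging is explicitly enabled. Whether the monitoring loop should keep polling at all during relaunch is the separate question raised by the second suggestion.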