[ https://issues.apache.org/jira/browse/YARN-8037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shane Kumpf resolved YARN-8037. ------------------------------- Resolution: Not A Problem > CGroupsResourceCalculator logs excessive warnings on container relaunch > ----------------------------------------------------------------------- > > Key: YARN-8037 > URL: https://issues.apache.org/jira/browse/YARN-8037 > Project: Hadoop YARN > Issue Type: Bug > Reporter: Shane Kumpf > Priority: Major > > When a container is relaunched, the old process no longer exists. When using > the {{CGroupsResourceCalculator}} this results in the warning and exception > below being logged every second until the relaunch occurs, which is excessive > and filling up the logs. > {code:java} > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse 12844 > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.readTotalProcessJiffies(CGroupsResourceCalculator.java:252) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:181) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > /sys/fs/cgroup/cpu,cpuacct/hadoop-yarn/container_e01_1521209613260_0002_01_000002/cpuacct.stat > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.<init>(FileInputStream.java:138) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more > 2018-03-16 14:30:33,438 WARN > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator: > Failed to parse cgroups > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.memsw.usage_in_bytes > org.apache.hadoop.yarn.exceptions.YarnException: The process vanished in the > interim 12844 > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:336) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.getMemorySize(CGroupsResourceCalculator.java:238) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.updateProcessTree(CGroupsResourceCalculator.java:187) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CombinedResourceCalculator.updateProcessTree(CombinedResourceCalculator.java:52) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl$MonitoringThread.run(ContainersMonitorImpl.java:457) > Caused by: java.io.FileNotFoundException: > /sys/fs/cgroup/memory/hadoop-yarn/container_e01_1521209613260_0002_01_000002/memory.usage_in_bytes > (No such file or directory) > at java.io.FileInputStream.open0(Native Method) > at java.io.FileInputStream.open(FileInputStream.java:195) > at java.io.FileInputStream.<init>(FileInputStream.java:138) > at > org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.resources.CGroupsResourceCalculator.processFile(CGroupsResourceCalculator.java:320) > ... 4 more{code} > We should consider moving the exception to debug to reduce the noise at a > minimum. Alternatively, it may make sense to stop the existing > {{MonitoringThread}} during relaunch. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org