[ https://issues.apache.org/jira/browse/YARN-9839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chandni Singh updated YARN-9839:
--------------------------------
Description:
NM fails with the below error even though the ulimit for NM is large.
{code}
2019-09-12 10:27:46,348 ERROR org.apache.hadoop.util.Shell: Caught java.lang.OutOfMemoryError: unable to create new native thread. One possible reason is that ulimit setting of 'max user processes' is too low. If so, do 'ulimit -u <largerNum>' and try again.
2019-09-12 10:27:46,348 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e95_1568242982456_152026_01_000132,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:562)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1441)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1405)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$800(ResourceLocalizationService.java:140)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
{code}
For each container localization request, a {{LocalizerRunner}} thread is created, and each {{LocalizerRunner}} creates a further thread to read the error stream of the shell command that fetches file permission info; that is where the failure occurs. The thread is created in {{Shell.runCommand()}}:
{code}
Thread errThread = new Thread() {
  @Override
  public void run() {
    try {
      String line = errReader.readLine();
      while((line != null) && !isInterrupted()) {
        errMsg.append(line);
        errMsg.append(System.getProperty("line.separator"));
        line = errReader.readLine();
      }
    } catch(IOException ioe) {
      LOG.warn("Error reading the error stream", ioe);
    }
  }
};
{code}

was:
NM fails with the below error even though the ulimit for NM is large.
{code}
2019-09-12 10:27:46,348 ERROR org.apache.hadoop.util.Shell: Caught java.lang.OutOfMemoryError: unable to create new native thread. One possible reason is that ulimit setting of 'max user processes' is too low. If so, do 'ulimit -u <largerNum>' and try again.
2019-09-12 10:27:46,348 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e95_1568242982456_152026_01_000132,5,main] threw an Error. Shutting down now...
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:562)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1441)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1405)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$800(ResourceLocalizationService.java:140)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
{code}
For each container localization request, there is a {{LocalizerRunner}} thread created and each {{LocalizerRunner}} creates another thread to get file permission info which is where we see this failure from.
It is in Shell.java -> {{runCommand()}}
{code}
Thread errThread = new Thread() {
  @Override
  public void run() {
    try {
      String line = errReader.readLine();
      while((line != null) && !isInterrupted()) {
        errMsg.append(line);
        errMsg.append(System.getProperty("line.separator"));
        line = errReader.readLine();
      }
    } catch(IOException ioe) {
      LOG.warn("Error reading the error stream", ioe);
    }
  }
};
{code}
{{LocalizerRunner}} are Threads which are cached in {{ResourceLocalizationService}}. Looking into a possibility if they are not getting removed from the cache.

> NodeManager java.lang.OutOfMemoryError unable to create new native thread
> -------------------------------------------------------------------------
>
>                 Key: YARN-9839
>                 URL: https://issues.apache.org/jira/browse/YARN-9839
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Chandni Singh
>            Assignee: Chandni Singh
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
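The cache suspicion raised in the earlier description (that {{LocalizerRunner}} threads cached in {{ResourceLocalizationService}} may not be removed when they finish) can be sketched minimally. This is a hedged illustration only: all class, field, and method names below are hypothetical stand-ins, not the actual NodeManager code, and it shows only how an unpruned per-container thread cache grows without bound.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LocalizerCacheSketch {
  // Hypothetical stand-in for a map of per-container runner threads;
  // the real ResourceLocalizationService uses different names.
  static final Map<String, Thread> localizerRunners = new ConcurrentHashMap<>();

  // Simulates one localization request: a runner thread is created,
  // cached under the container id, and run to completion.
  static void localize(String containerId, boolean removeWhenDone)
      throws InterruptedException {
    Thread runner = new Thread(() -> { /* localization work */ });
    localizerRunners.put(containerId, runner);
    runner.start();
    runner.join();
    if (removeWhenDone) {
      // Without this cleanup, entries for finished runners accumulate.
      localizerRunners.remove(containerId);
    }
  }

  // Runs 100 requests with no cleanup and returns how many finished
  // runners the cache still references.
  static int runDemo() throws InterruptedException {
    for (int i = 0; i < 100; i++) {
      localize("container_" + i, false);
    }
    return localizerRunners.size();
  }

  public static void main(String[] args) throws InterruptedException {
    // All 100 runners have exited, yet the cache still holds them.
    System.out.println("cached runners: " + runDemo());
  }
}
```

If the real cache behaves like this, references to finished runners pile up on the heap; and if runners additionally block instead of finishing, the live native threads (each of which also spawns an errThread in {{Shell.runCommand()}}) accumulate until native thread creation fails, surfacing as the OutOfMemoryError above even with a large ulimit.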