Chandni Singh created YARN-9839:
-----------------------------------

             Summary: NodeManager java.lang.OutOfMemoryError unable to create new native thread
                 Key: YARN-9839
                 URL: https://issues.apache.org/jira/browse/YARN-9839
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Chandni Singh
            Assignee: Chandni Singh


The NM fails with the error below even though the ulimit for the NM is set to a large value.

{code}
2019-09-12 10:27:46,348 ERROR org.apache.hadoop.util.Shell: Caught java.lang.OutOfMemoryError: unable to create new native thread. One possible reason is that ulimit setting of 'max user processes' is too low. If so, do 'ulimit -u <largerNum>' and try again.
2019-09-12 10:27:46,348 FATAL org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[LocalizerRunner for container_e95_1568242982456_152026_01_000132,5,main] threw an Error.  Shutting down now...
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:717)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:562)
        at org.apache.hadoop.util.Shell.run(Shell.java:482)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:776)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:869)
        at org.apache.hadoop.util.Shell.execCommand(Shell.java:852)
        at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:659)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:634)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.checkLocalDir(ResourceLocalizationService.java:1441)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.getInitializedLocalDirs(ResourceLocalizationService.java:1405)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.access$800(ResourceLocalizationService.java:140)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
{code}

For each container localization request, a {{LocalizerRunner}} thread is 
created, and each {{LocalizerRunner}} in turn starts another thread while 
fetching file permission info. Starting that second thread is where the 
failure occurs. The thread is created in {{Shell.java}}, in {{runCommand()}}:

{code}
    Thread errThread = new Thread() {
      @Override
      public void run() {
        try {
          String line = errReader.readLine();
          while((line != null) && !isInterrupted()) {
            errMsg.append(line);
            errMsg.append(System.getProperty("line.separator"));
            line = errReader.readLine();
          }
        } catch(IOException ioe) {
          LOG.warn("Error reading the error stream", ioe);
        }
      }
    };
{code}
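
To make the cost of this pattern concrete, here is a minimal, self-contained sketch of the same idea: a helper thread that drains an error stream into a buffer. It is not the Hadoop code. The class name {{StderrDrainSketch}}, the method {{drain()}} and the thread name are hypothetical, and a {{StringReader}} stands in for a real process's stderr so the example runs anywhere. The point is only that every call starts one extra thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical illustration only; not part of Hadoop.
public class StderrDrainSketch {

    /** Drains errReader on a dedicated thread and returns the collected text. */
    public static String drain(BufferedReader errReader) {
        // StringBuffer is synchronized, so it is safe to share with the helper thread.
        StringBuffer errMsg = new StringBuffer();
        Thread errThread = new Thread(() -> {
            try {
                String line;
                while ((line = errReader.readLine()) != null
                        && !Thread.currentThread().isInterrupted()) {
                    errMsg.append(line).append(System.lineSeparator());
                }
            } catch (IOException ioe) {
                System.err.println("Error reading the error stream: " + ioe);
            }
        }, "stderr-drain");

        // One extra thread per call. Thread.start() is the call that throws
        // OutOfMemoryError when the JVM cannot obtain a new native thread.
        errThread.start();
        try {
            errThread.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        return errMsg.toString();
    }

    public static void main(String[] args) {
        BufferedReader fakeStderr =
                new BufferedReader(new StringReader("first line\nsecond line"));
        System.out.print(drain(fakeStderr));
    }
}
```

With this shape, N concurrent callers need roughly 2N threads at peak (the callers plus one drain thread each), which is why a large number of live {{LocalizerRunner}} threads could plausibly exhaust the thread limit at this call site.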

{{LocalizerRunner}} instances are threads that are cached in 
{{ResourceLocalizationService}}. I am looking into whether they are failing 
to be removed from the cache.
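
One way to check whether runner threads are piling up is to count live threads by name prefix. The sketch below is an assumption about how one might do that inside a JVM, not something taken from the NodeManager code: {{ThreadCountSketch}} and {{countLiveThreads()}} are hypothetical names, and the demo threads only imitate the {{LocalizerRunner for ...}} naming seen in the log above.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical diagnostic; not part of Hadoop.
public class ThreadCountSketch {

    /** Counts live threads in this JVM whose name starts with the given prefix. */
    public static long countLiveThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);

        // Start a few parked threads named like the runners in the log.
        for (int i = 0; i < 3; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await(); // stay alive until released
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "LocalizerRunner for demo_" + i);
            t.setDaemon(true);
            t.start();
        }

        System.out.println("live LocalizerRunner threads: "
                + countLiveThreads("LocalizerRunner"));
        release.countDown(); // let the demo threads finish
    }
}
```

Against a running NodeManager, a thread dump (for example from {{jstack <pid>}}) grouped by thread name gives the same information without any code changes. A count that keeps growing while few containers are localizing would be consistent with runners not being cleaned up.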

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
