Hi Clay, all,

Thank you for your response and help. I have already solved it. I found that there are several possible causes for "OutOfMemoryError: unable to create new native thread"; running out of mmap ranges may be one of them, but it was not the problem I hit (I run SLS inside Docker).
The key to my issue is this setting in sls-runner.xml:

  <property>
    <name>yarn.sls.runner.pool.size</name>
    <value>100000</value>
  </property>

The value I set is too large. In TaskRunner.java, line 155 on trunk
(https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-sls/src/main/java/org/apache/hadoop/yarn/sls/scheduler/TaskRunner.java#L155):

  executor = new ThreadPoolExecutor(threadPoolSize, threadPoolSize, 0,
      TimeUnit.MILLISECONDS, queue);

The ThreadPoolExecutor constructor signature is:

  ThreadPoolExecutor(int corePoolSize, int maximumPoolSize, long keepAliveTime,
      TimeUnit unit, BlockingQueue<Runnable> workQueue)

so both corePoolSize and maximumPoolSize are set to the same value, threadPoolSize. TaskRunner.start() then calls prestartAllCoreThreads() (visible in the stack trace below), which immediately starts corePoolSize threads. So if a user sets yarn.sls.runner.pool.size too large, the core pool alone triggers the OutOfMemoryError. And in practice, for convenience and as a safety margin, users tend to set a big value.

Here is my proposed solution (a rough sketch follows below the quoted thread):

1. Define a maximum pool size in SLSConfiguration.java and pass it as the second constructor argument:

     executor = new ThreadPoolExecutor(threadPoolSize, maxPoolSize, 0,
         TimeUnit.MILLISECONDS, queue);

2. Additionally, validate the configured threadPoolSize: if it is larger than maxPoolSize, throw an error or log a warning to alert users.

I can submit a patch for this. What do you think?

------------------------------------------------------------------

Hi Sichen,

I would expect you are running out of mmap ranges on most stock Linux kernels. (Each thread takes an mmap slot.) You can increase your vm.max_map_count [1] to see if that helps.

-Clay

[1]: A discussion on effecting the change:
https://www.systutorials.com/241561/maximum-number-of-mmaped-ranges-and-how-to-set-it-on-linux/

On Tue, 24 Jul 2018, 赵思晨(思霖) wrote:
> Hi,
> I am running 200+ jobs, each containing 100 tasks. When I use slsrun.sh
> to start SLS, it fails with this error:
>
> 2018-07-24 04:47:27,957 INFO capacity.CapacityScheduler: Added node 11.178.150.104:1604 clusterResource: <memory:821760000, vCores:15408000, disk: 6099000000M, resource2: 8025G>
> Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:717)
>         at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>         at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1617)
>         at org.apache.hadoop.yarn.sls.scheduler.TaskRunner.start(TaskRunner.java:157)
>         at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:247)
>         at org.apache.hadoop.yarn.sls.SLSRunner.run(SLSRunner.java:950)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>         at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:957)
>
> I set -Xmx20480m and -Xms20480m in hadoop-env.sh, but it still doesn't work.
>
> Can anyone help me?
>
> Thanks in advance,
>
> Sichen
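------------------------------------------------------------------

P.S. For concreteness, here is a minimal sketch of the fix I have in mind, combining points 1 and 2 above. The cap constant and its name are placeholders of my own, not existing fields in SLSConfiguration; a real patch would wire the cap through a new configuration key.

  import java.util.concurrent.DelayQueue;
  import java.util.concurrent.ThreadPoolExecutor;
  import java.util.concurrent.TimeUnit;

  public class PoolSizeCapSketch {
    // Hypothetical cap; a real patch would read this from SLSConfiguration,
    // e.g. via a new key such as yarn.sls.runner.pool.max-size.
    private static final int MAX_POOL_SIZE = 500;

    // Raw DelayQueue mirrors the existing TaskRunner code, whose tasks
    // implement both Runnable and Delayed.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static ThreadPoolExecutor createExecutor(int threadPoolSize,
        DelayQueue queue) {
      if (threadPoolSize > MAX_POOL_SIZE) {
        // Point 2: warn (or throw) instead of letting the JVM die later.
        System.err.println("yarn.sls.runner.pool.size = " + threadPoolSize
            + " exceeds " + MAX_POOL_SIZE + "; clamping");
        threadPoolSize = MAX_POOL_SIZE;
      }
      // prestartAllCoreThreads() starts corePoolSize threads up front, so
      // the clamp must apply to the first argument; capping only the second
      // would not prevent the OutOfMemoryError.
      return new ThreadPoolExecutor(threadPoolSize, MAX_POOL_SIZE, 0,
          TimeUnit.MILLISECONDS, queue);
    }
  }

One caveat on point 1: since the work queue here is unbounded, the executor never grows beyond corePoolSize anyway, so passing a larger maximumPoolSize is mostly cosmetic; clamping the first argument is what actually avoids the OOM.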