Hello,

Using the default crawl script (runtime/local/bin/crawl), the parser crashes while trying to create a new thread after parsing slightly more than 5000 documents.

This only happens if the number of documents to crawl (generate -topN) is set to > 5000.

Monitoring the number of threads created by the Nutch Java process shows it climbing to about 5700 before the crash occurs.
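For reference, this is roughly how I watched the thread count (the pgrep pattern is just what happened to match the job on my machine, so adjust as needed):

    PID=$(pgrep -f org.apache.nutch | head -n 1)
    while kill -0 "$PID" 2>/dev/null; do
        echo "$(date +%T) threads=$(ps -o nlwp= -p "$PID")"
        sleep 5
    done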

I thought that the parser would not create that many threads in the first place. Is this a bug or a misconfiguration? Is there any way to explicitly limit the number of threads used for parsing?
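From the stack trace below it looks like ParseSegment.map goes through ParseUtil.parse/runParser, which submits every document to a thread pool (I assume to enforce parser.timeout). If that is right, maybe disabling the timeout avoids those worker threads; this is only a guess on my side, not a verified fix:

    # untested experiment: disable the parse timeout that seems to drive
    # the per-document worker threads
    $bin/nutch parse $commonOptions $skipRecordsOptions \
        -D parser.timeout=-1 \
        $CRAWL_PATH/segments/$SEGMENT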

I found this thread, where it is recommended to decrease the number of URLs (topN): http://lucene.472066.n3.nabble.com/Nutch-1-6-java-lang-OutOfMemoryError-unable-to-create-new-native-thread-td4044231.html

Is this the only possible solution? Older Nutch versions did not have this problem.
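Since "unable to create new native thread" usually points to an OS-level limit rather than an exhausted heap, I also checked the per-user limit and considered shrinking the thread stacks; both are only guesses from my side:

    ulimit -u    # max user processes/threads; a low value here can trigger this error
    # possibly fit more threads by giving each a smaller stack, e.g.:
    # commonOptions="... -D mapred.child.java.opts='-Xmx1000m -Xss512k' ..."

But even if that buys some headroom, it would not explain why the thread count grows in the first place.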

Parameters:
---------------
numSlaves=1
numTasks=`expr $numSlaves \* 2`
commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"

$bin/nutch parse $commonOptions $skipRecordsOptions $CRAWL_PATH/segments/$SEGMENT

hadoop.log
----------------

2013-10-18 14:57:28,294 INFO parse.ParseSegment - Parsed (0ms):http://www....
2013-10-18 14:57:28,301 WARN mapred.LocalJobRunner - job_local613646134_0001
java.lang.Exception: java.lang.OutOfMemoryError: unable to create new native thread
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:640)
    at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
    at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
    at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
    at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
-----------------

Any help, especially background information on why this happens, is appreciated.

Sybille

