Hi Sybille,

The threads spawned by the parser should be reclaimed once a page has been parsed. The parsing itself is not multi-threaded, so either something is preventing the threads from being deleted, or maybe, as the error suggests, you are running out of memory.
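To illustrate what "something preventing the threads from being deleted" can look like in practice, here is a hypothetical sketch (not the actual Nutch ParseUtil code) of the classic pattern: a fresh single-thread executor per document that is never shut down after a parse timeout, so every timed-out document leaves one live worker thread behind.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Hypothetical sketch of the suspected leak pattern -- NOT the actual
 * Nutch source. A fresh single-thread executor is created per document;
 * if it is never shut down after a parse timeout, each timed-out document
 * leaves one live worker thread behind, until the JVM eventually fails
 * with "unable to create new native thread".
 */
public class ParseLeakSketch {

    // Daemon threads only so this demo JVM can still exit; the leak is
    // visible in the thread count either way.
    static final ThreadFactory DAEMON = r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    };

    /** Leaky variant: the executor is abandoned after a timeout. */
    static void parseWithLeak(Runnable parser) {
        ExecutorService pool = Executors.newSingleThreadExecutor(DAEMON);
        Future<?> task = pool.submit(parser);
        try {
            task.get(50, TimeUnit.MILLISECONDS); // stand-in for parser.timeout
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            task.cancel(true);
            // BUG: missing pool.shutdownNow() -- the worker thread stays
            // alive as an idle core thread of the abandoned pool.
        }
    }

    /** Fixed variant: the pool is always shut down, reclaiming its thread. */
    static void parseFixed(Runnable parser) {
        ExecutorService pool = Executors.newSingleThreadExecutor(DAEMON);
        try {
            pool.submit(parser).get(50, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            // parse failed or timed out; nothing else to do in this sketch
        } finally {
            pool.shutdownNow(); // interrupts the worker and lets it die
        }
    }

    /** Runs `docs` fake parses and reports how many threads were left behind. */
    static long leakedThreads(boolean fixed, int docs) throws Exception {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Runnable hangingParser = () -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { /* aborted */ }
        };
        long before = mx.getThreadCount();
        for (int i = 0; i < docs; i++) {
            if (fixed) parseFixed(hangingParser); else parseWithLeak(hangingParser);
        }
        Thread.sleep(500); // give interrupted workers time to terminate
        return mx.getThreadCount() - before;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("threads left behind without shutdown: " + leakedThreads(false, 20));
        System.out.println("threads left behind with shutdown:    " + leakedThreads(true, 20));
    }
}
```

With 20 simulated slow documents the leaky variant leaves roughly 20 extra threads alive, while the fixed variant returns to the baseline thread count, which matches the symptom of the thread count climbing with the number of parsed documents.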
Do you specify parser.timeout in nutch-site.xml? Are you using any custom HTMLParsingFilter? The number of docs should not affect the memory: the parser runs on one document after the other, so this would indicate a leak. There was a related issue not very long ago: https://issues.apache.org/jira/browse/NUTCH-1640. Can you patch your code accordingly, or use the trunk? I never got to the bottom of it, but I am wondering whether this would fix the issue.

Thanks

Julien

On 18 October 2013 14:32, Sybille Peters <[email protected]> wrote:

> Hello,
>
> Using the default crawl script (runtime/local/bin/crawl), the parser will
> crash trying to create a new thread after parsing slightly more than 5000
> documents.
>
> This only happens if the number of documents to crawl (generate -topN) is
> set to > 5000.
>
> Monitoring the number of threads created by the nutch java process: it
> increases to about 5700 before the crash occurs.
>
> I thought that the parser would not create that many threads in the first
> place. Is this a bug/misconfiguration? Is there any way to limit the
> number of threads explicitly for parsing?
>
> I found this thread, where it is recommended to decrease the number of urls
> (topN):
> http://lucene.472066.n3.nabble.com/Nutch-1-6-java-lang-OutOfMemoryError-unable-to-create-new-native-thread-td4044231.html
>
> Is this the only possible solution? Older nutch versions did not have this
> problem.
>
> Parameters:
> ---------------
> numSlaves=1
> numTasks=`expr $numSlaves \* 2`
> commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
> skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"
>
> $bin/nutch parse $commonOptions $skipRecordsOptions $CRAWL_PATH/segments/$SEGMENT
>
> hadoop.log
> ----------------
>
> 2013-10-18 14:57:28,294 INFO parse.ParseSegment - Parsed (0ms): http://www....
> 2013-10-18 14:57:28,301 WARN mapred.LocalJobRunner - job_local613646134_0001
> java.lang.Exception: java.lang.OutOfMemoryError: unable to create new native thread
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:640)
>     at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>     at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> -----------------
>
> Any help (especially information) is appreciated.
>
> Sybille

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
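For reference, the parser.timeout property Julien asks about is a standard Nutch setting, overridden in conf/nutch-site.xml. The value is in seconds, and -1 disables the time limit; the value 30 below is just an illustrative choice, not a recommendation. A minimal override would look like:

```
<!-- conf/nutch-site.xml: per-document parse timeout in seconds; -1 disables it -->
<property>
  <name>parser.timeout</name>
  <value>30</value>
</property>
```

Disabling the timeout (-1) can be a useful diagnostic here: if the thread count then stops growing, the leak is tied to the timeout path, consistent with NUTCH-1640.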

