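Hi Sybille,

This issue may be caused by the executor service used by the parser, which
is a cached thread pool. A cached pool starts a new thread whenever no idle
thread is available, so when the parse method is called many times in quick
succession it ends up creating a very large number of threads.

Check the code: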

executorService = Executors.newCachedThreadPool(new ThreadFactoryBuilder()
      .setNameFormat("parse-%d").setDaemon(true).build());

One solution is to use a fixed thread pool instead:

int threadPoolSize = 10;

executorService = Executors.newFixedThreadPool(threadPoolSize,
      new ThreadFactoryBuilder()
      .setNameFormat("parse-%d").setDaemon(true).build());
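
To illustrate the difference, here is a minimal, self-contained sketch. It
is not Nutch code: the class PoolGrowthDemo and its helper methods are
hypothetical names, the task count of 50 is arbitrary, and a plain JDK
ThreadFactory stands in for Guava's ThreadFactoryBuilder so that it runs
without extra dependencies. It submits tasks that all block at once and
reports the largest number of threads each kind of pool created.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of Nutch.
public class PoolGrowthDemo {

    // Plain-JDK stand-in for ThreadFactoryBuilder: named daemon threads.
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    // Submits `tasks` tasks that all block on a latch, then returns the
    // largest number of threads the pool held at any point.
    static int peakThreads(ExecutorService pool, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // simulate a parse that is still running
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Both factory methods used below return a ThreadPoolExecutor.
        int peak = ((ThreadPoolExecutor) pool).getLargestPoolSize();
        release.countDown(); // let all tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak;
    }

    // Cached pool: one new thread per task when no idle thread exists.
    static int cachedPeak(int tasks) {
        return peakThreads(Executors.newCachedThreadPool(daemonFactory("parse")), tasks);
    }

    // Fixed pool: at most poolSize threads; extra tasks wait in a queue.
    static int fixedPeak(int poolSize, int tasks) {
        return peakThreads(Executors.newFixedThreadPool(poolSize, daemonFactory("parse")), tasks);
    }

    public static void main(String[] args) {
        System.out.println("cached pool peak threads: " + cachedPeak(50));
        System.out.println("fixed pool peak threads:  " + fixedPeak(10, 50));
    }
}
```

With the cached pool the peak follows the number of outstanding tasks, while
the fixed pool stays at threadPoolSize and queues the rest. Note that the
fixed pool's queue is unbounded, so pending tasks still use some memory, just
far less than a native thread each.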

Thanks.




On Fri, Oct 18, 2013 at 9:32 PM, Sybille Peters <[email protected]> wrote:

> Hello,
>
> using the default crawl script (runtime/local/bin/crawl) the parser will
> crash trying to create a new thread after parsing slightly more than 5000
> documents.
>
> This only happens if the number of documents to crawl (generate -topN) is
> set to > 5000.
>
> Monitoring the number of threads created by the nutch java process: it
> increases to about 5700 before the crash occurs.
>
> I thought that the parser would not create that many threads in the first
> place. Is this a bug/misconfiguration? Is there any way to limit the
> number of threads explicitly for parsing?
>
> I found this thread and it is recommended to decrease the number of urls
> (topN):
> http://lucene.472066.n3.nabble.com/Nutch-1-6-java-lang-OutOfMemoryError-unable-to-create-new-native-thread-td4044231.html
>
> Is this the only possible solution? Older nutch versions did not have this
> problem.
>
> Parameters:
> ---------------
> numSlaves=1
> numTasks=`expr $numSlaves \* 2`
> commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
> skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"
>
> $bin/nutch parse $commonOptions $skipRecordsOptions
> $CRAWL_PATH/segments/$SEGMENT
>
> hadoop.log
> ----------------
>
> 2013-10-18 14:57:28,294 INFO  parse.ParseSegment - Parsed (0ms):
> http://www....
> 2013-10-18 14:57:28,301 WARN  mapred.LocalJobRunner - job_local613646134_0001
> java.lang.Exception: java.lang.OutOfMemoryError: unable to create new native thread
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
> Caused by: java.lang.OutOfMemoryError: unable to create new native thread
>     at java.lang.Thread.start0(Native Method)
>     at java.lang.Thread.start(Thread.java:640)
>     at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>     at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:657)
>     at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>     at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>     at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> -----------------
>
> Any help (especially information) is appreciated.
>
> Sybille
>
>
>


-- 
Don't Grow Old, Grow Up... :-)
