Sorry, I am looking to crawl 400k documents with the crawl. I said 400 in
my last message.
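
For what it's worth, the "unable to create new native thread" OOM below is usually the OS per-user process/thread limit being hit rather than heap exhaustion, so a minimal sketch for checking and raising the soft limit in the shell before starting the crawl (the value 2048 is only an example, not a tested recommendation):

```shell
# Check the current soft limit on user processes;
# on Mac OS X each Java native thread counts against it
ulimit -u

# Raise the soft limit for this shell session only,
# then start the crawl from the same shell
ulimit -S -u 2048
```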


On Sat, Mar 2, 2013 at 2:12 PM, kiran chitturi <[email protected]>wrote:

> Hi!
>
> I am running Nutch 1.6 on a 4 GB Mac OS desktop with Core i5 2.8GHz.
>
> Last night I started a crawl in local mode for 5 seeds with the config
> given below. If the crawl goes well, it should fetch a total of 400
> documents. The crawling is done on a single host that we own.
>
> Config
> ---------------------
>
> fetcher.threads.per.queue - 2
> fetcher.server.delay - 1
> fetcher.throughput.threshold.pages - -1
>
> crawl script settings
> ----------------------------
> timeLimitFetch- 30
> numThreads - 5
> topN - 10000
> mapred.child.java.opts=-Xmx1000m
>
>
> I noticed today that the crawl has stopped due to an error, and I found
> the error below in the logs.
>
> 2013-03-01 21:45:03,767 INFO  parse.ParseSegment - Parsed (0ms):
>> http://scholar.lib.vt.edu/ejournals/JARS/v33n3/v33n3-letcher.htm
>> 2013-03-01 21:45:03,790 WARN  mapred.LocalJobRunner - job_local_0001
>> java.lang.OutOfMemoryError: unable to create new native thread
>>         at java.lang.Thread.start0(Native Method)
>>         at java.lang.Thread.start(Thread.java:658)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
>>         at
>> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>>         at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>         at
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>> (END)
>
>
>
> Has anyone run into the same issue? I am not sure why a new native
> thread cannot be created. The link [0] says it might be due to the
> limit on the number of processes in my OS. Will increasing that limit
> solve the issue?
>
>
> [0] - http://ww2.cs.fsu.edu/~czhang/errors.html
>
> Thanks!
>
> --
> Kiran Chitturi
>



-- 
Kiran Chitturi
