> using a lower value for topN (2000) instead of 10000
That would mean you need 200 rounds, and 200 segments, for 400k documents.
That's a work-around, not a solution!
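
(Rough arithmetic, assuming each round fetches about topN pages:
   400,000 documents / 2,000 per round ≈ 200 generate/fetch/parse rounds,
 each of them writing its own segment.)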

If you find the time, you should trace the process.
It seems to be either a misconfiguration or even a bug.
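
For example, something along these lines (just a sketch: jps and jstack ship with
the JDK, and <PID> stands for the process id of the running Nutch/Hadoop job):

  jps                                               # find the PID of the local job
  jstack <PID>                                      # dump all Java threads
  jstack <PID> | grep -c 'java.lang.Thread.State'   # rough count of Java-level threads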

Sebastian

On 03/03/2013 09:45 PM, kiran chitturi wrote:
> Thanks, Sebastian, for the suggestions. I got around this by using a lower value
> for topN (2000) instead of 10000. I decided to use a lower value for topN with more
> rounds.
> 
> 
> On Sun, Mar 3, 2013 at 3:41 PM, Sebastian Nagel
> <wastl.na...@googlemail.com> wrote:
> 
>> Hi Kiran,
>>
>> There are many possible reasons for the problem. Besides the limit on the
>> number of processes, consider the stack size in the Java VM and in the system
>> (see java -Xss and ulimit -s).
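>>
>> For example (a sketch only; whether these are the binding limits on your
>> Mac OS X box is a guess, and the -Xss value is just an example):
>>
>>   ulimit -s             # per-thread stack size handed out by the OS
>>   ulimit -u             # max. number of processes/threads per user
>>   java -Xss512k ...     # the per-thread Java stack size is set with -Xss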
>>
>> I think in local mode there should be only one mapper and consequently only
>> one thread used for parsing. So the number of processes/threads is hardly the
>> problem, provided that you don't run any other number-crunching tasks in
>> parallel on your desktop.
>>
>> Luckily, you should be able to retry via "bin/nutch parse ..."
>> Then trace the system and the Java process to catch the reason.
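>>
>> E.g. (the segment directory name below is only a placeholder):
>>
>>   bin/nutch parse crawl/segments/20130301204500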
>>
>> Sebastian
>>
>> On 03/02/2013 08:13 PM, kiran chitturi wrote:
>>> Sorry, I am looking to crawl 400k documents with this crawl. I said 400 in
>>> my last message.
>>>
>>>
>>> On Sat, Mar 2, 2013 at 2:12 PM, kiran chitturi <chitturikira...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> I am running Nutch 1.6 on a 4 GB Mac OS desktop with Core i5 2.8GHz.
>>>>
>>>> Last night I started a crawl in local mode for 5 seeds with the config
>>>> given below. If the crawl goes well, it should fetch a total of 400
>>>> documents. The crawling is done on a single host that we own.
>>>>
>>>> Config
>>>> ---------------------
>>>>
>>>> fetcher.threads.per.queue - 2
>>>> fetcher.server.delay - 1
>>>> fetcher.throughput.threshold.pages - -1
>>>>
>>>> crawl script settings
>>>> ----------------------------
>>>> timeLimitFetch- 30
>>>> numThreads - 5
>>>> topN - 10000
>>>> mapred.child.java.opts=-Xmx1000m
>>>>
>>>>
>>>> I noticed today that the crawl stopped due to an error, and I have
>>>> found the error below in the logs.
>>>>
>>>>> 2013-03-01 21:45:03,767 INFO  parse.ParseSegment - Parsed (0ms): http://scholar.lib.vt.edu/ejournals/JARS/v33n3/v33n3-letcher.htm
>>>>> 2013-03-01 21:45:03,790 WARN  mapred.LocalJobRunner - job_local_0001
>>>>> java.lang.OutOfMemoryError: unable to create new native thread
>>>>>         at java.lang.Thread.start0(Native Method)
>>>>>         at java.lang.Thread.start(Thread.java:658)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.addThread(ThreadPoolExecutor.java:681)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.addIfUnderMaximumPoolSize(ThreadPoolExecutor.java:727)
>>>>>         at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:655)
>>>>>         at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
>>>>>         at org.apache.nutch.parse.ParseUtil.runParser(ParseUtil.java:159)
>>>>>         at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:93)
>>>>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:97)
>>>>>         at org.apache.nutch.parse.ParseSegment.map(ParseSegment.java:44)
>>>>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>>>>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
>>>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>>>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
>>>>> (END)
>>>>
>>>>
>>>>
>>>> Did anyone run into the same issue? I am not sure why the new native
>>>> thread is not being created. The link here [0] says that it might be due to
>>>> the limit on the number of processes in my OS. Will increasing it solve
>>>> the issue?
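>>>>
>>>> (For reference, this is what I can check on Mac OS X, assuming these are the
>>>> relevant knobs:
>>>>
>>>>   sysctl kern.maxproc kern.maxprocperuid   # system-wide and per-user process caps
>>>>   launchctl limit maxproc                  # soft/hard limit used by launchd
>>>> )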
>>>>
>>>>
>>>> [0] - http://ww2.cs.fsu.edu/~czhang/errors.html
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>> Kiran Chitturi
>>>>
>>>
>>>
>>>
>>
>>
> 
> 
