Hi, I tried parsing 100,000 URLs with the trunk version of Nutch. However, I still get the same `OutOfMemoryError: Java heap space` exception. Any ideas how to get past this error?
Bartosz Gadzimski wrote:
> manavr wrote:
>> Hi,
>>
>> I have a set of 100,000 URLs that I am trying to crawl and index. The heap
>> size for the child tasktrackers is set to 512MB, and I have disabled PDF
>> and DOC parsing for now. I am running this on Nutch 0.8 on a single RHEL
>> node with the depth set to 1.
>>
>> I get an `OutOfMemoryError: Java heap space` exception while running the
>> parse job. The parse_data directory does not exist at any time during the
>> job execution. Despite several re-runs, I get the same exception
>> repeatedly. When I re-ran the crawl with 20,000 URLs, the entire thing
>> ran fine.
>>
>> Is Nutch known to fail with large sets of URLs? Is there a patch
>> available, or am I missing something?
>>
>> Thanks,
>> Manav
>
> The website offers version 0.9, and trunk (the nightly builds) is almost
> 1.0 (it's very stable).
>
> Download it and try.
>
> Regards,
> Bartosz

--
View this message in context: http://www.nabble.com/OutOfMemory-Exception-in-parsing-tp22178719p22196803.html
Sent from the Nutch - User mailing list archive at Nabble.com.
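Since the child tasktracker heap is capped at 512MB here, one thing worth trying is raising the heap given to each map/reduce child JVM. A minimal sketch, assuming a Hadoop 0.x-era setup like the one Nutch 0.8 ships with, where this is controlled by the `mapred.child.java.opts` property in `conf/hadoop-site.xml` (the `-Xmx1024m` value below is illustrative, not a recommendation):

```xml
<!-- conf/hadoop-site.xml: heap options passed to each map/reduce child JVM.
     -Xmx1024m is an illustrative value; pick one that fits the node's RAM
     and the number of concurrent child tasks. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

After changing this, the tasktrackers need to be restarted for the new setting to take effect on subsequent jobs.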
