manavr writes:
Hi,

I have a set of 100,000 URLs that I am trying to crawl and index. I have the
heap memory size for child tasktrackers set to 512MB, and I have disabled PDF
and DOC parsing for now. I am running this on Nutch-0.8 with 1 RHEL node,
with depth set to 1.
I get an OutOfMemoryError for Java heap space while running the parse
job. The parse_data directory doesn't exist at any point during the job
execution. Despite several re-runs, I get the same exception repeatedly. I
re-ran the crawl with only 20,000 URLs and the entire thing ran fine.
Is Nutch known to fail with large sets of URLs? Is there a patch available,
or am I missing something?
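(For reference, in Nutch 0.8 the child tasktracker heap is set via the Hadoop property below, typically in conf/hadoop-site.xml. This is a sketch of the 512MB setting described above, assuming the standard Hadoop 0.x property name; raising the -Xmx value is the usual first step when the parse job runs out of heap.)

```xml
<!-- conf/hadoop-site.xml: heap size for child task JVMs (maps/reduces). -->
<!-- 512MB as currently configured; increasing this may avoid the OOM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>
```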

Thanks,
Manav
On the website you have version 0.9, and the trunk (nightly builds) is almost at 1.0 (it's very stable).

Download it and give it a try.

Regards,
Bartosz
