Hi, I tried parsing 100,000 URLs with the trunk version of Nutch. However, I still get the same `OutOfMemoryError: Java heap space` exception. Any ideas how to get past this error?
Bartosz Gadzimski wrote:
> manavr wrote:
>> Hi,
>>
>> I have a set of 100,000 URLs that I am trying to crawl and index. The heap
>> size for the child tasktrackers is set to 512MB, and I have disabled PDF
>> and DOC parsing for now. I am running this on Nutch 0.8 on a single RHEL
>> node with the depth set to 1.
>>
>> I get an `OutOfMemoryError: Java heap space` exception while running the
>> parse job. The parse_data directory does not exist at any time during the
>> job execution. Despite several re-runs, I get the same exception
>> repeatedly. When I re-ran the crawl with 20,000 URLs, the entire thing
>> ran fine.
>>
>> Is Nutch known to fail with large sets of URLs? Is there a patch
>> available, or am I missing something?
>>
>> Thanks,
>> Manav
>
> The website offers version 0.9, and trunk (the nightly builds) is almost
> 1.0 (it's very stable).
>
> Download it and try.
>
> Regards,
> Bartosz

--
View this message in context: http://www.nabble.com/OutOfMemory-Exception-in-parsing-tp22178719p22196803.html
Sent from the Nutch - User mailing list archive at Nabble.com.
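Since the child tasktracker heap is capped at 512MB here, one thing worth trying is raising the heap given to each map/reduce child JVM. A minimal sketch, assuming a Hadoop 0.x-era setup like the one Nutch 0.8 ships with, where this is controlled by the `mapred.child.java.opts` property in `conf/hadoop-site.xml` (the `-Xmx1024m` value below is illustrative, not a recommendation):

```xml
<!-- conf/hadoop-site.xml: heap options passed to each map/reduce child JVM.
     -Xmx1024m is an illustrative value; pick one that fits the node's RAM
     and the number of concurrent child tasks. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

After changing this, the tasktrackers need to be restarted for the new setting to take effect on subsequent jobs.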
