In distributed mode you'll have to specify the parameter mapred.child.java.opts in your conf/hadoop-site.xml so that the value is sent to the Hadoop slaves. Alternatively, you can specify it on the command line with: -D mapred.child.java.opts=-Xmx750m
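For reference, a minimal sketch of what that entry could look like in conf/hadoop-site.xml (the property name comes from the message above; the 750 MB value is only an example, tune it to your RAM):

```xml
<!-- conf/hadoop-site.xml: heap for the child JVMs running map/reduce tasks.
     Sketch only; the -Xmx value is an example, adjust to your hardware. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx750m</value>
</property>
```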
I might be wrong, but I think the value set in bin/nutch only affects the processes running on the master, typically the JobTracker and NameNode.

2009/2/25 Koch Martina <[email protected]>

> Did you try to increase the heap size using the -Xmx parameter, e.g. setting
> it to -Xmx2000m or higher, depending on your RAM resources? The default
> setting in the bin/nutch script is 1000 MB.
>
> Kind regards,
> Martina
>
> -----Original Message-----
> From: manavr [mailto:[email protected]]
> Sent: Wednesday, 25 February 2009 06:56
> To: [email protected]
> Subject: Re: OutOfMemory Exception in parsing
>
>
> Hi,
>
> I tried parsing 100,000 urls with the trunk version of Nutch. However, I
> still get the same error, "OutOfMemory Exception" for Java heap space. Any
> ideas how to get past this error?
>
>
> Bartosz Gadzimski wrote:
> >
> > manavr wrote:
> >> Hi,
> >>
> >> I have a set of 100,000 urls that I am trying to crawl and index. I have
> >> the heap size for child tasktrackers set to 512 MB. I have disabled pdf
> >> and doc parsing currently. I am running this on Nutch-0.8 with 1 RHEL
> >> node with depth set to 1.
> >>
> >> I get this OutOfMemoryException for Java heap space while running the
> >> parse job. The parse_data directory doesn't exist at any time during the
> >> job execution. Despite several re-runs, I get the same exception
> >> repeatedly. I re-ran the crawl for 20,000 urls and the entire thing
> >> runs fine.
> >>
> >> Is Nutch known to fail with large sets of urls? Is there a patch
> >> available, or am I missing something?
> >>
> >> Thanks,
> >> Manav
> >>
> > On the website you have version 0.9, and in trunk (nightly builds)
> > almost 1.0 (it's very stable).
> >
> > Download it and try.
> >
> > Regards,
> > Bartosz
> >
>
> --
> View this message in context:
> http://www.nabble.com/OutOfMemory-Exception-in-parsing-tp22178719p22196803.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
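On the master side, Martina's point about the 1000 MB default corresponds to heap-size logic along these lines (a sketch only; it assumes bin/nutch honours a NUTCH_HEAPSIZE environment variable as in the stock script — check your own copy):

```shell
#!/bin/sh
# Sketch of the heap-size selection typically found in bin/nutch.
# Assumption: your version reads NUTCH_HEAPSIZE; verify against the script.
JAVA_HEAP_MAX=-Xmx1000m                      # default: 1000 MB
if [ "$NUTCH_HEAPSIZE" != "" ]; then
  JAVA_HEAP_MAX="-Xmx""$NUTCH_HEAPSIZE""m"   # e.g. NUTCH_HEAPSIZE=2000 gives -Xmx2000m
fi
echo "$JAVA_HEAP_MAX"
```

So exporting NUTCH_HEAPSIZE=2000 before running bin/nutch would raise the master-side heap to 2 GB, while the slave tasks still need mapred.child.java.opts as described above.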
--
DigitalPebble Ltd
http://www.digitalpebble.com
