In distributed mode you'll have to specify the parameter
mapred.child.java.opts in your conf/hadoop-site.xml so that the value is
sent to the Hadoop slaves. Another way to do that is to specify it on the
command line with: -D mapred.child.java.opts=-Xmx750m
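
For reference, a minimal sketch of what that entry could look like in
conf/hadoop-site.xml. It reuses the -Xmx750m value from above purely as an
example; pick a size that fits the RAM on your nodes, and merge the property
into your existing configuration element rather than adding a second one.

```xml
<!-- conf/hadoop-site.xml (fragment) -->
<configuration>
  <property>
    <name>mapred.child.java.opts</name>
    <!-- JVM options for each map/reduce child task; 750 MB is only an example value -->
    <value>-Xmx750m</value>
  </property>
</configuration>
```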

I might be wrong, but I think the value set in bin/nutch only affects the
processes running on the master, typically the JobTracker and NameNode.

2009/2/25 Koch Martina <[email protected]>

> Did you try increasing the heap size with the -Xmx parameter, e.g. setting
> it to -Xmx2000m or higher, depending on your RAM resources? The default
> setting in the bin/nutch script is 1000 MB.
>
> Kind regards,
> Martina
>
> -----Original Message-----
> From: manavr [mailto:[email protected]]
> Sent: Wednesday, 25 February 2009 06:56
> To: [email protected]
> Subject: Re: OutOfMemory Exception in parsing
>
>
> Hi,
>
> I tried parsing 1,00,000 URLs with the trunk version of Nutch. However, I
> still get the same error, an "OutOfMemory Exception" for Java heap space.
> Any ideas on how to get past this error?
>
>
> Bartosz Gadzimski wrote:
> >
> > manavr wrote:
> >> Hi,
> >>
> >> I have a set of 1,00,000 URLs that I am trying to crawl and index. I
> >> have the heap memory size for the tasktracker child tasks set to 512 MB.
> >> I have currently disabled PDF and DOC parsing. I am running this on
> >> Nutch 0.8 on a single RHEL node with depth set to 1.
> >>
> >> I get an OutOfMemoryException for Java heap space while running the
> >> parse job. The parse_data directory doesn't exist at any time during the
> >> job execution. Despite several re-runs, I get the same exception every
> >> time. I re-ran the crawl for 20,000 URLs and the entire thing ran fine.
> >>
> >> Is Nutch known to fail with large sets of URLs? Is there a patch
> >> available, or am I missing something?
> >>
> >> Thanks,
> >> Manav
> >>
> > The website has version 0.9, and trunk (nightly builds) is almost 1.0
> > (it's very stable).
> >
> > Download it and try.
> >
> > Regards,
> > Bartosz
> >
> >
>


-- 
DigitalPebble Ltd
http://www.digitalpebble.com
