Hi,

On 9/11/07, Tomislav Poljak <[EMAIL PROTECTED]> wrote:
> Hi Andrzej,
> I am running the fetcher in non-parsing mode; I have this in nutch-site.xml:
>
> <property>
>   <name>fetcher.parse</name>
>   <value>false</value>
>   <description>If true, fetcher will parse content.</description>
> </property>
>
> Maybe I didn't phrase my question clearly. A couple of fetcher
> threads fail with java.lang.OutOfMemoryError like this (from
> hadoop.log):
>
> 2007-09-09 01:07:24,150 INFO  fetcher.Fetcher - fetching
> http://scholar.google.com/intl/en/scholar/libraries.html
> 2007-09-09 01:07:27,084 INFO  fetcher.Fetcher - fetch of
> http://logging.apache.org/log4j/1.2/faq.html failed with:
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:27,085 INFO  fetcher.Fetcher - fetching
> http://popular.ebay.com/ns/Tickets/Alabama+Tickets.html
> 2007-09-09 01:07:32,151 INFO  fetcher.Fetcher - fetch of
> http://hockey.fantasysports.yahoo.com/hockey/register/createjoin failed
> with: java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:32,817 INFO  fetcher.Fetcher - fetch of
> http://scholar.google.com/intl/en/scholar/libraries.html failed with:
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:32,817 FATAL fetcher.Fetcher -
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:33,380 FATAL fetcher.Fetcher - fetcher
> caught:java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:33,380 FATAL fetcher.Fetcher -
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:33,380 FATAL fetcher.Fetcher - fetcher
> caught:java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:37,865 INFO  fetcher.Fetcher - fetch of
> http://cn.yahoo.com/allservice/index.html failed with:
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:38,019 FATAL fetcher.Fetcher -
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:38,020 FATAL fetcher.Fetcher - fetcher
> caught:java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:42,887 INFO  fetcher.Fetcher - fetch of
> http://popular.ebay.com/ns/Tickets/Alabama+Tickets.html failed with:
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:43,045 FATAL fetcher.Fetcher -
> java.lang.OutOfMemoryError: Java heap space
> 2007-09-09 01:07:43,045 FATAL fetcher.Fetcher - fetcher
> caught:java.lang.OutOfMemoryError: Java heap space
>
> Any ideas why?

Has your fetch been running for a long time? In local mode, Nutch can
leak plugin instances and classes. That only becomes a problem when the
job has many map tasks, because each new map task loads new classes
without, it seems, unloading the older ones.

Related issue: NUTCH-356
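
One way to tell a slow leak from a heap that is simply too small is to
log JVM heap figures at intervals and see whether the "used" number keeps
climbing over the run. Below is a minimal sketch that uses only the
standard java.lang.Runtime API. The class name HeapReport is hypothetical
and not part of Nutch, and the figures are only meaningful when the
methods are called from inside the same JVM as the fetcher (for example
from a locally added log statement), which is an assumption about how you
would wire it in.

```java
import java.util.Locale;

// Hypothetical helper, NOT part of Nutch: reports heap figures of the current JVM.
public class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use by this JVM, in megabytes. */
    public static long usedHeapMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Maximum heap this JVM will attempt to use (bounded by -Xmx), in megabytes. */
    public static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** One-line summary suitable for a log statement. */
    public static String summary() {
        return String.format(Locale.ROOT, "heap used=%dMB max=%dMB",
                usedHeapMb(), maxHeapMb());
    }

    public static void main(String[] args) {
        // Prints the figures for this standalone JVM only.
        System.out.println(summary());
    }
}
```

If the used figure grows steadily with each map task and never drops
after garbage collection, that would be consistent with the leak
described above; if it is high from the start, the heap is more likely
just too small for the thread count.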

>
> Thanks,
>       Tomislav
>
>
> On Mon, 2007-09-10 at 21:30 +0200, Andrzej Bialecki wrote:
> > Tomislav Poljak wrote:
> > > Hi,
> > > so I have dedicated 1000 MB (-Xmx1000m) to the Nutch Java process when
> > > fetching (default settings). With 10 threads I can fetch 25000
> > > URLs, but with 20 threads the fetcher fails with
> > > java.lang.OutOfMemoryError: Java heap space, even on a 15000-URL
> > > fetchlist. Is 20 threads too much for -Xmx1000m, or is something else
> > > wrong? What settings would you recommend (number of threads, how much
> > > RAM) for fetching a list of 100k URLs with the best performance?
> >
> > I routinely run crawls with 100 threads or more. If you're using the
> > fetcher in parsing mode (i.e. it not only fetches but also parses the
> > content) then your problem is likely related to the memory consumption
> > of a parsing plugin (such as PDF or MS Office parsers).
> >
> > I suggest running the fetcher in non-parsing mode (-noParsing
> > command-line option) and then parsing the segment in a separate step
> > (bin/nutch parse).
> >
> >
>
>


-- 
Doğacan Güney
