Hi,

more information would be useful:
- exact Nutch version (2.?)
- how Nutch is called (e.g., via bin/crawl)
- details of the configuration, esp. (see the snippet below)
  -depth
  -topN
  http.content.limit
  fetcher.parse
- storage back-end
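
For what it's worth, the last two config items are properties you can
override in conf/nutch-site.xml. A minimal sketch (the values below
are only examples, not recommendations):

  <property>
    <name>http.content.limit</name>
    <!-- max. bytes fetched per document; -1 = unlimited, which lets a
         single huge file blow up the heap -->
    <value>1048576</value>
  </property>
  <property>
    <name>fetcher.parse</name>
    <!-- parse while fetching (true) or in a separate parse step (false) -->
    <value>false</value>
  </property>

If http.content.limit is -1 the fetcher buffers whole documents of any
size in memory, which is a common source of OOMs.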

In general, something is wrong. Maybe some oversized documents are
being crawled, but even for a large PDF (several MB) a 2 GB heap
should be enough.
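
If you do need a bigger heap, make sure you raise it in the right
place. If I remember the scripts correctly, something like:

  # local (standalone) mode: bin/nutch reads NUTCH_HEAPSIZE (in MB)
  export NUTCH_HEAPSIZE=2000
  bin/crawl ...

  # on a Hadoop cluster the fetcher runs in map/reduce child JVMs, so
  # increase mapred.child.java.opts (e.g. -Xmx2000m) in mapred-site.xml
  # instead; raising the heap of the client JVM doesn't help there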

You can try to identify the documents/URLs which cause the hang-up:
 
http://stackoverflow.com/questions/10331440/nutch-fetcher-aborting-with-n-hung-threads
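
A quick way to narrow it down (the exact log wording may differ
between versions): look at which URLs the fetcher threads were still
working on when the abort happened, e.g.

  # last URLs the fetcher reported before giving up
  grep 'fetching' logs/hadoop.log | tail -n 50
  # the abort message itself
  grep -i 'hung threads' logs/hadoop.log

URLs that show up as 'fetching' but never finish are the candidates.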
Also keep track of:
 https://issues.apache.org/jira/browse/NUTCH-1182

Sebastian

On 04/22/2013 08:18 PM, Bai Shen wrote:
> I'm crawling a local server.  I have Nutch 2 working on a local machine
> with the default 1G heap size.  I got several OOM errors, but the fetch
> eventually finishes.
> 
> In order to get rid of the OOM errors, I moved everything to a machine with
> more memory and increased the heap size to 8G.  However, I'm still getting
> the OOM errors and now I'm having Nutch abort hung threads.  After it
> aborts the hung threads, Nutch itself hangs.
> 
> Any idea what could be causing this or what to look at?  hadoop.log shows
> nothing after the "Aborting with 1 hung threads." message.
> 
> Thanks.
> 
