Hi,

more information would be useful:
- exact Nutch version (2.?)
- how Nutch is called (e.g., via bin/crawl)
- details of the configuration, esp. -depth, -topN, http.content.limit, fetcher.parse
- storage back-end
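(For reference: http.content.limit and fetcher.parse are ordinary properties set in conf/nutch-site.xml. A minimal sketch of the relevant entries, with purely illustrative values, in case it helps to double-check your setup:)

  <!-- conf/nutch-site.xml -- values below are only examples, not recommendations -->
  <property>
    <name>http.content.limit</name>
    <!-- max bytes downloaded per document; -1 means no limit,
         and an unlimited value can blow up the heap on very large files -->
    <value>1048576</value>
  </property>
  <property>
    <name>fetcher.parse</name>
    <!-- true = parse while fetching; false = run parsing as a separate step -->
    <value>false</value>
  </property>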
In general, something is wrong. Maybe some oversized documents are being crawled, but even for a large PDF (several MB) a 2 GB heap should be enough.

You can try to identify the documents/URLs which cause the hang-up:
http://stackoverflow.com/questions/10331440/nutch-fetcher-aborting-with-n-hung-threads

Also keep track of:
https://issues.apache.org/jira/browse/NUTCH-1182

Sebastian

On 04/22/2013 08:18 PM, Bai Shen wrote:
> I'm crawling a local server. I have Nutch 2 working on a local machine
> with the default 1G heap size. I got several OOM errors, but the fetch
> eventually finished.
>
> In order to get rid of the OOM errors, I moved everything to a machine with
> more memory and increased the heap size to 8G. However, I'm still getting
> the OOM errors, and now Nutch is aborting hung threads. After it
> aborts the hung threads, Nutch itself hangs.
>
> Any idea what could be causing this or what to look at? hadoop.log shows
> nothing after the "Aborting with 1 hung threads." message.
>
> Thanks.
>