Hi Ferdy,

If you have these options set on the JVM:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp

then when you hit the "Out of memory" error you get a file on your filesystem with a heap dump taken at the instant of the problem.
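For example (just a throwaway test of mine, nothing Nutch-specific), a class that keeps allocating until the heap fills will leave an .hprof dump in /var/tmp when run with those flags:

// OomDemo.java -- run with:
//   java -Xmx64m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp OomDemo
// Afterwards you should find a java_pid<pid>.hprof file in /var/tmp.
import java.util.ArrayList;
import java.util.List;

public class OomDemo {
    public static void main(String[] args) {
        List<byte[]> sink = new ArrayList<byte[]>();
        while (true) {
            sink.add(new byte[1024 * 1024]); // allocate 1 MB blocks until the heap is exhausted
        }
    }
}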
You can then open the dump with http://www.eclipse.org/mat/ (an Eclipse plugin), a Java heap analyzer that helps you find memory leaks and reduce memory consumption. It can help you understand whether there really is a leak. That said, the first test I would do is simply to increase the memory and see whether the problem goes away.

On Thu, Aug 9, 2012 at 9:07 AM, Ferdy Galema <[email protected]> wrote:

> Hi,
>
> Of course setting a bigger heap sure helps, but most of the time only
> temporarily. Can you see in the logs what type of documents are parsed?
>
> In case of html documents crawled on the wild web, a single document can
> cause the heap to explode. By default the cyberneko parser (in HtmlParser)
> is used for html documents. I hacked this library so that there are limits
> on the number of elements that are loaded during a parse. (I'm still trying
> to find a way to contribute this back into the codebase.)
>
> Ferdy.
>
> On Wed, Aug 8, 2012 at 10:03 PM, Niccolò Becchi <[email protected]> wrote:
>
> > If you are using Nutch on a Hadoop cluster and you have enough memory,
> > try these parameters:
> >
> > <property>
> >   <name>mapred.child.java.opts</name>
> >   <value>-Xmx1600m -XX:-UseGCOverheadLimit
> >     -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp</value>
> > </property>
> >
> > On Wed, Aug 8, 2012 at 9:32 PM, Bai Shen <[email protected]> wrote:
> >
> > > Is this something other people are seeing? I was parsing 10k urls when
> > > I got this exception. I'm running Nutch 2 head as of Aug 6 with the
> > > default memory settings (1 GB).
> > >
> > > Just wondering if anybody else has experienced this on Nutch 2.
> > >
> > > Thanks.
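About Ferdy's point above on limiting what the cyberneko parser loads: I don't know what his actual patch looks like, but just as a rough sketch, NekoHTML's filter hook (the "http://cyberneko.org/html/properties/filters" property) can be used to count elements and abort a pathological parse. Something along these lines (class name and limit are my own):

import java.io.IOException;
import java.io.StringReader;

import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLDocumentFilter;
import org.cyberneko.html.filters.DefaultFilter;
import org.cyberneko.html.parsers.DOMParser;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ElementLimitExample {

    // Counts start tags and aborts the parse once a hard limit is exceeded.
    static class ElementLimitFilter extends DefaultFilter {
        private final int limit;
        private int count;

        ElementLimitFilter(int limit) {
            this.limit = limit;
        }

        public void startElement(QName element, XMLAttributes attrs, Augmentations augs)
                throws XNIException {
            if (++count > limit) {
                // XNIException is unchecked; Xerces aborts the parse and surfaces it
                // (typically wrapped in a SAXException).
                throw new XNIException("More than " + limit + " elements, giving up");
            }
            super.startElement(element, attrs, augs);
        }
    }

    public static void main(String[] args) throws SAXException, IOException {
        DOMParser parser = new DOMParser();
        parser.setProperty("http://cyberneko.org/html/properties/filters",
                new XMLDocumentFilter[] { new ElementLimitFilter(100000) });
        parser.parse(new InputSource(new StringReader("<html><body><p>ok</p></body></html>")));
        System.out.println("Parsed without hitting the limit");
    }
}

The same counter could of course also cover emptyElement() and text nodes; 100000 is just an arbitrary cutoff for the example.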

