Otis, that's a great fix! Unfortunately, I cannot do the same with the HTML parse plugin :(
The StackOverflowError goes away if I increase the stack size to 1024k. I'm
specifying this by including the following in conf/hadoop-env.sh:

    export HADOOP_OPTS="-server -Xss1024k"

But this works only when using the 'local' JobTracker. Don't the tasks on the
slaves pick up the environment from conf/hadoop-env.sh? I stopped and restarted
all the daemons after making the change and have verified that the above line
exists in conf/hadoop-env.sh on all the slaves.

Thanks,
Siddhartha

On Fri, Jun 13, 2008 at 9:43 AM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> Removed the plugin from the config :)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Siddhartha Reddy <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Thursday, June 12, 2008 11:41:17 PM
> > Subject: Re: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> >
> > Hi Otis,
> >
> > Thanks for the reply. I just noticed that I'm also getting
> > StackOverflowErrors due to the JavaScript parsing plugin. Can you please
> > tell me how you worked around this problem?
> >
> > The stack trace:
> > java.lang.StackOverflowError
> >     at org.apache.xerces.dom.ParentNode.getLength(Unknown Source)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:147)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >     at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
> >     ....
> >
> > Thanks,
> > Siddhartha
> >
> > On Thu, Jun 12, 2008 at 9:22 AM, wrote:
> >
> > > I've seen stack overflow errors, but I believe they were due to the
> > > JavaScript parsing plugin.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > > ----- Original Message ----
> > > > From: Siddhartha Reddy
> > > > To: [email protected]
> > > > Sent: Wednesday, June 11, 2008 11:32:16 PM
> > > > Subject: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> > > >
> > > > Hi,
> > > >
> > > > While parsing some pages, I am getting a java.lang.StackOverflowError
> > > > exception due to the recursion in HTMLMetaProcessor.getMetaTagsHelper.
> > > > I'm pasting part of the stack trace below. Unfortunately, I've logic that
> > > > deletes the segment if fetch/parse fails, so I do not know which particular
> > > > web page caused this problem; I'll recrawl the same pages with modified
> > > > logic (that does not delete the segment on failed parsing) and try to find
> > > > the offending URL.
> > > >
> > > > Did anyone encounter such a problem before? Apart from increasing the
> > > > stack size for Java, is there any other possible solution?
> > > >
> > > > java.lang.StackOverflowError
> > > >     at java.lang.Character.toUpperCase(Character.java:4278)
> > > >     at java.lang.String.regionMatches(String.java:1384)
> > > >     at java.lang.String.equalsIgnoreCase(String.java:1120)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >     at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > > >     ....
> > > >
> > > > Thanks,
> > > > Siddhartha
> >
> > --
> > http://sids.in
> > "If you are not having fun, you are not doing it right."

--
http://sids.in
"If you are not having fun, you are not doing it right."
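
On the question in the thread of whether anything other than a bigger stack could help: both traces show a DOM walk recursing once per nesting level, so one possible alternative is to traverse with an explicit stack on the heap. The following is only a rough sketch of that idea, not the Nutch code; the class IterativeDomWalk and its methods countElements and buildDeepDocument are hypothetical names made up for illustration, and the deeply nested test document is an assumption about what the offending pages might look like.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; not taken from HTMLMetaProcessor or JSParseFilter.
public class IterativeDomWalk {

    /** Counts elements named tagName (case-insensitive) without using recursion. */
    static int countElements(Node root, String tagName) {
        int count = 0;
        // Pending nodes live on the heap, so tree depth no longer consumes thread stack.
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equalsIgnoreCase(node.getNodeName())) {
                count++;
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return count;
    }

    /** Builds a chain of nested div elements of the given depth, ending in one meta element. */
    static Document buildDeepDocument(int depth) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element current = doc.createElement("html");
        doc.appendChild(current);
        for (int i = 0; i < depth; i++) {
            Element child = doc.createElement("div");
            current.appendChild(child);
            current = child;
        }
        current.appendChild(doc.createElement("meta"));
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Assumed depth, chosen only to be far deeper than ordinary pages.
        Document doc = buildDeepDocument(20_000);
        System.out.println("meta elements: " + countElements(doc, "meta"));
        System.out.println("div elements:  " + countElements(doc, "div"));
    }
}
```

The trade-off is that the traversal state moves from the thread stack to the heap, so a pathological page costs memory proportional to its size instead of failing the parse. Whether the real plugins can be restructured this way depends on what they do per node, which the traces alone do not show.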
