I've seen stack overflow errors, but I believe they were due to the JavaScript parsing plugin.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Siddhartha Reddy <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, June 11, 2008 11:32:16 PM
> Subject: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
>
> Hi,
>
> While parsing some pages, I am getting a java.lang.StackOverflowError
> exception due to the recursion in HTMLMetaProcessor.getMetaTagsHelper. I'm
> pasting part of the stack trace below. Unfortunately, I've logic that
> deletes the segment if fetch/parse fails, so I do not know which particular
> web page caused this problem; I'll recrawl the same pages with modified
> logic (that does not delete the segment on failed parsing) and try to find
> the offending URL.
>
> Did anyone encounter such a problem before? Apart from increasing the stack
> size for Java, is there any other possible solution?
>
> java.lang.StackOverflowError
> at java.lang.Character.toUpperCase(Character.java:4278)
> at java.lang.String.regionMatches(String.java:1384)
> at java.lang.String.equalsIgnoreCase(String.java:1120)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> ....
>
> Thanks,
> Siddhartha
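
On the question of alternatives to a larger Java stack: one general option is to walk the DOM with an explicit stack on the heap instead of recursing, so a very deeply nested page cannot exhaust the thread stack. The following is only a rough sketch of that idea, not the actual Nutch code. The class IterativeMetaWalker and its methods are hypothetical names, and it assumes the parsed page is available as an org.w3c.dom tree.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; this is not HTMLMetaProcessor.
final class IterativeMetaWalker {

    /** Collects the "content" attribute of every meta element, without recursion. */
    static List<String> collectMetaContent(Node root) {
        List<String> contents = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>(); // explicit stack, lives on the heap
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "meta".equalsIgnoreCase(node.getNodeName())) {
                contents.add(((Element) node).getAttribute("content"));
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return contents;
    }

    /** Builds a chain of nested div elements with a single meta element at the bottom. */
    static Document buildDeepDocument(int depth) throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element meta = doc.createElement("meta");
        meta.setAttribute("name", "robots");
        meta.setAttribute("content", "noindex");
        Node current = meta;
        // Wrap from the inside out so each appendChild call stays cheap.
        for (int i = 0; i < depth; i++) {
            Element wrapper = doc.createElement("div");
            wrapper.appendChild(current);
            current = wrapper;
        }
        doc.appendChild(current);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // A nesting depth like this would likely overflow a recursive walk
        // with the default thread stack size.
        Document doc = buildDeepDocument(100_000);
        System.out.println(collectMetaContent(doc));
    }
}
```

The trade-off is that the traversal state moves from the thread stack to the heap, so the depth limit becomes available memory rather than the -Xss setting. Whether this fits the real getMetaTagsHelper depends on how it handles each node, which the stack trace alone does not show.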
