Hi Otis,
Thanks for the reply. I just noticed that I'm also getting
StackOverflowErrors from the JavaScript parsing plugin. Can you please
tell me how you worked around this problem?
The stack trace:
java.lang.StackOverflowError
	at org.apache.xerces.dom.ParentNode.getLength(Unknown Source)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:147)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
	at org.apache.nutch.parse.js.JSParseFilter.walk(JSParseFilter.java:148)
....
Thanks,
Siddhartha
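The trace above and the quoted one below both end in the same method repeatedly calling itself (JSParseFilter.walk at line 148, HTMLMetaProcessor.getMetaTagsHelper at line 208), which suggests one stack frame per level of DOM nesting. A sufficiently deep page can then exhaust the thread stack. One common workaround is to replace the recursion with an explicit stack, so traversal depth is limited by the heap rather than the thread stack. This is a minimal sketch under that assumption. `IterativeDomWalk`, `findElements` and `buildDeepDocument` are hypothetical names that are not part of Nutch. The sketch uses only the standard `org.w3c.dom` API and does not reproduce what either Nutch method actually extracts.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * Hypothetical helper (not Nutch code): walks a DOM tree with an explicit
 * stack instead of recursion, so depth is bounded by heap, not thread stack.
 */
public class IterativeDomWalk {

    /** Returns every element whose name equals tagName (ignoring case), in document order. */
    public static List<Element> findElements(Node root, String tagName) {
        List<Element> found = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && tagName.equalsIgnoreCase(node.getNodeName())) {
                found.add((Element) node);
            }
            // Push children last-to-first so the first child is popped next (pre-order).
            NodeList children = node.getChildNodes();
            for (int i = children.getLength() - 1; i >= 0; i--) {
                stack.push(children.item(i));
            }
        }
        return found;
    }

    /** Builds <div><div>...<meta/>...</div></div> nested `depth` levels deep, bottom-up. */
    public static Document buildDeepDocument(int depth) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Node current = doc.createElement("meta");
        for (int i = 0; i < depth; i++) {
            Element div = doc.createElement("div");
            div.appendChild(current); // wrap the existing subtree one level deeper
            current = div;
        }
        doc.appendChild(current);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Deep enough that a naive recursive walk would typically overflow a default-sized stack.
        Document doc = buildDeepDocument(100_000);
        System.out.println(findElements(doc, "META").size());
    }
}
```

On the 100,000-level test document, `findElements(doc, "META")` returns the single `meta` element without touching the thread stack. Applying the same pattern to `walk` or `getMetaTagsHelper` would mean patching the plugin source, which is an assumption about how you build Nutch.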
On Thu, Jun 12, 2008 at 9:22 AM, <[EMAIL PROTECTED]> wrote:
> I've seen stack overflow errors, but I believe they were due to the
> JavaScript parsing plugin.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> ----- Original Message ----
> > From: Siddhartha Reddy <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Wednesday, June 11, 2008 11:32:16 PM
> > Subject: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> >
> > Hi,
> >
> > While parsing some pages, I am getting a java.lang.StackOverflowError
> > caused by the recursion in HTMLMetaProcessor.getMetaTagsHelper. I'm
> > pasting part of the stack trace below. Unfortunately, I have logic that
> > deletes the segment if fetching or parsing fails, so I do not know which
> > web page caused this problem. I'll recrawl the same pages with modified
> > logic (that does not delete the segment when parsing fails) and try to
> > find the offending URL.
> >
> > Has anyone encountered this problem before? Apart from increasing the
> > Java stack size, is there any other possible solution?
> >
> > java.lang.StackOverflowError
> > 	at java.lang.Character.toUpperCase(Character.java:4278)
> > 	at java.lang.String.regionMatches(String.java:1384)
> > 	at java.lang.String.equalsIgnoreCase(String.java:1120)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > 	at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
> > ....
> >
> > Thanks,
> > Siddhartha
>
>
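The quoted message mentions increasing the Java stack size. One alternative to raising `-Xss` for every thread in the JVM is to run only the parsing step on a dedicated thread created with the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor. This is a minimal sketch: `BigStackRunner`, `callWithStack` and the deliberately recursive `depth` helper are hypothetical stand-ins, not Nutch code. The `Thread` Javadoc also warns that some JVMs treat `stackSize` as a hint or ignore it, so this helps only where the platform honors the request.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical wrapper (not a Nutch API): runs a task on a dedicated thread that
 * requests a larger stack, instead of raising -Xss for the whole JVM.
 */
public class BigStackRunner {

    public static <T> T callWithStack(long stackBytes, Callable<T> task) throws Exception {
        AtomicReference<T> result = new AtomicReference<>();
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Runnable body = () -> {
            try {
                result.set(task.call());
            } catch (Throwable t) { // also captures StackOverflowError
                failure.set(t);
            }
        };
        // The JVM may round, cap, or ignore stackBytes depending on the platform.
        Thread worker = new Thread(null, body, "big-stack-parse", stackBytes);
        worker.start();
        worker.join(); // join() makes the worker's writes visible to this thread
        Throwable t = failure.get();
        if (t instanceof Exception) {
            throw (Exception) t;
        }
        if (t instanceof Error) {
            throw (Error) t;
        }
        if (t != null) {
            throw new RuntimeException(t);
        }
        return result.get();
    }

    /** Deliberately recursive stand-in for a walk()-style method: one frame per level. */
    public static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    public static void main(String[] args) throws Exception {
        int d = callWithStack(256L * 1024 * 1024, () -> depth(100_000));
        System.out.println(d);
    }
}
```

This only raises the limit rather than removing it, so a page nested deeper than the enlarged stack allows would still fail. An iterative traversal is the more robust option when input depth is unbounded.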
--
http://sids.in
"If you are not having fun, you are not doing it right."