I've seen stack overflow errors, but I believe they were due to the JavaScript 
parsing plugin.
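
As for alternatives to raising the JVM thread stack size (-Xss): one option
is to replace the recursive DOM walk with an explicit stack, so that nesting
depth uses heap rather than call frames. The following is only a minimal
sketch of that idea, not the actual HTMLMetaProcessor code. The class name
IterativeMetaWalk and the method collectMetaNames are hypothetical, and it
only collects the "name" attribute of meta elements for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration; not part of Nutch.
public class IterativeMetaWalk {

    // Visits every node under root in document order using an explicit
    // stack instead of recursion, and returns the "name" attribute of
    // each <meta> element found.
    static List<String> collectMetaNames(Node root) {
        List<String> names = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "meta".equalsIgnoreCase(node.getNodeName())) {
                names.add(((Element) node).getAttribute("name"));
            }
            // Push children last-to-first so they are popped first-to-last.
            for (Node c = node.getLastChild(); c != null; c = c.getPreviousSibling()) {
                pending.push(c);
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        // Build a deeply nested tree bottom-up: 20000 <div> wrappers
        // around a single <meta name="robots"> element.
        Element current = doc.createElement("meta");
        current.setAttribute("name", "robots");
        for (int i = 0; i < 20000; i++) {
            Element parent = doc.createElement("div");
            parent.appendChild(current);
            current = parent;
        }
        doc.appendChild(current);
        System.out.println(collectMetaNames(doc));
    }
}
```

With this shape the depth of the page's markup only affects the size of the
deque, so a pathologically nested page should no longer exhaust the thread
stack during the walk.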


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


----- Original Message ----
> From: Siddhartha Reddy <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, June 11, 2008 11:32:16 PM
> Subject: java.lang.StackOverflowError in HTMLMetaProcessor.getMetaTagsHelper
> 
> Hi,
> 
> While parsing some pages, I am getting a java.lang.StackOverflowError
> exception due to the recursion in HTMLMetaProcessor.getMetaTagsHelper. I'm
> pasting part of the stack trace below. Unfortunately, I have logic that
> deletes the segment if fetch/parse fails, so I do not know which particular
> web page caused this problem; I'll recrawl the same pages with modified
> logic (that does not delete the segment on failed parsing) and try to find
> the offending URL.
> 
> Has anyone encountered this problem before? Apart from increasing the stack
> size for Java, is there any other possible solution?
> 
> java.lang.StackOverflowError
>         at java.lang.Character.toUpperCase(Character.java:4278)
>         at java.lang.String.regionMatches(String.java:1384)
>         at java.lang.String.equalsIgnoreCase(String.java:1120)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
>         at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
>         ....
> 
> Thanks,
> Siddhartha
