Hi,

While parsing some pages, I am getting a java.lang.StackOverflowError caused
by the recursion in HTMLMetaProcessor.getMetaTagsHelper. Part of the stack
trace is pasted below. Unfortunately, I have logic that deletes the segment
if fetching or parsing fails, so I do not know which web page caused this
problem. I'll recrawl the same pages with modified logic (which does not
delete the segment when parsing fails) and try to find the offending URL.

Has anyone encountered this problem before? Apart from increasing the Java
thread stack size (e.g. with the -Xss JVM option), is there any other
possible solution?
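In case it helps the discussion, the kind of alternative I had in mind is replacing the recursion with an explicit stack, so the depth of the DOM no longer matters. Below is a rough, hypothetical sketch using only plain org.w3c.dom from the JDK. It is not the actual HTMLMetaProcessor code, and the names IterativeMetaWalk, findMetaTags and buildDeepDocument are made up for illustration. It only collects <meta> elements and does not reproduce anything else getMetaTagsHelper may do.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class IterativeMetaWalk {

    // Hypothetical helper: collects all <meta> elements under root without
    // recursion. An explicit heap-allocated stack replaces the call stack,
    // so deeply nested pages cannot cause a StackOverflowError here.
    public static List<Element> findMetaTags(Node root) {
        List<Element> metas = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node node = stack.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "meta".equalsIgnoreCase(node.getNodeName())) {
                metas.add((Element) node);
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                stack.push(child);
            }
        }
        return metas;
    }

    // Hypothetical test fixture: <html> containing `depth` nested <div>s with
    // a single <META> at the bottom. The chain is built bottom-up so each
    // appendChild call stays cheap.
    public static Document buildDeepDocument(int depth) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element meta = doc.createElement("META");
        meta.setAttribute("name", "robots");
        meta.setAttribute("content", "noindex");

        Node chain = meta;
        for (int i = 0; i < depth; i++) {
            Element div = doc.createElement("div");
            div.appendChild(chain);
            chain = div;
        }
        Element html = doc.createElement("html");
        html.appendChild(chain);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = buildDeepDocument(100_000);
        List<Element> found = findMetaTags(doc);
        System.out.println(found.size() + " meta tag(s), content="
                + found.get(0).getAttribute("content"));
    }
}
```

The trade-off is a little extra heap usage for the explicit stack, in exchange for not depending on the -Xss setting at all.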

java.lang.StackOverflowError
        at java.lang.Character.toUpperCase(Character.java:4278)
        at java.lang.String.regionMatches(String.java:1384)
        at java.lang.String.equalsIgnoreCase(String.java:1120)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
        at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
        ....

Thanks,
Siddhartha
