Hi,
While parsing some pages, I am getting a java.lang.StackOverflowError
caused by the recursion in HTMLMetaProcessor.getMetaTagsHelper. I'm
pasting part of the stack trace below. Unfortunately, I have logic that
deletes the segment if the fetch/parse fails, so I do not know which
particular web page caused the problem. I'll recrawl the same pages with
modified logic (that does not delete the segment on a failed parse) and
try to find the offending URL.
Has anyone encountered this problem before? Apart from increasing the Java
stack size, is there any other possible solution?
java.lang.StackOverflowError
at java.lang.Character.toUpperCase(Character.java:4278)
at java.lang.String.regionMatches(String.java:1384)
at java.lang.String.equalsIgnoreCase(String.java:1120)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:55)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
at org.apache.nutch.parse.html.HTMLMetaProcessor.getMetaTagsHelper(HTMLMetaProcessor.java:208)
....
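One alternative to a bigger stack that I am considering is to walk the DOM
with an explicit stack instead of recursing, so that deep nesting uses heap
rather than call-stack frames. Below is a rough sketch of the idea, assuming
a plain org.w3c.dom tree. The names IterativeMetaWalk, collectMetaContent
and deepDocument are made up for illustration; this is not the actual Nutch
code, and it only collects the "content" attribute of <meta> elements.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration class, not part of Nutch.
class IterativeMetaWalk {

    /** Collects the "content" attribute of every <meta> element, without recursion. */
    static List<String> collectMetaContent(Node root) {
        List<String> contents = new ArrayList<>();
        Deque<Node> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (node.getNodeType() == Node.ELEMENT_NODE
                    && "meta".equalsIgnoreCase(node.getNodeName())) {
                contents.add(((Element) node).getAttribute("content"));
            }
            // Push children last-to-first so they are popped in document order.
            for (Node child = node.getLastChild(); child != null;
                    child = child.getPreviousSibling()) {
                pending.push(child);
            }
        }
        return contents;
    }

    /** Builds a test document: <html>, then `depth` nested <div>s, then one <meta>. */
    static Document deepDocument(int depth) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element meta = doc.createElement("meta");
        meta.setAttribute("name", "robots");
        meta.setAttribute("content", "noindex");
        // Build from the innermost element outwards, wrapping it in a new <div> each time.
        Element current = meta;
        for (int i = 0; i < depth; i++) {
            Element div = doc.createElement("div");
            div.appendChild(current);
            current = div;
        }
        Element html = doc.createElement("html");
        html.appendChild(current);
        doc.appendChild(html);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        // Nesting depth is bounded by heap here, not by the thread stack size.
        Document doc = deepDocument(50_000);
        System.out.println(collectMetaContent(doc));
    }
}
```

The trade-off is that the pending nodes live in an ArrayDeque on the heap, so
the nesting depth of a page no longer depends on the thread stack size. I have
not tried this against the real getMetaTagsHelper yet.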
Thanks,
Siddhartha