Very intriguing, considering that we teach our students to avoid recursion where possible for this very reason.
Googling reveals
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4675952 and
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5050507
so you could try increasing the Java stack size in bin/nutch (-Xss), or use an alternate regexp if you can.

Just out of curiosity, why does a performance critical program such as Nutch use Sun's backtracking-based regexp implementation rather than an efficient Thompson-based one? Do you need the additional expressiveness provided by PCRE?

 - Godmar

On Mon, Jan 11, 2010 at 11:24 AM, Eric Osgood <e...@lakemeadonline.com> wrote:
> During a crawl of about 3.8M tlds to a depth of 2, when I try to index the
> segments, I get the following error:
>
> java.lang.StackOverflowError
>        at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
> Any help with this error would be much appreciated, I have encountered this
> before.
>
> here is the last 10 lines of the hadoop.log file:
>
> tail -n 10 hadoop.log.2010-01-10
>        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>        at java.util.regex.Pattern$Ques.match(Pattern.java:3691)
>        at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
>        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
>        at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
>        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>        at java.util.regex.Pattern$Ques.match(Pattern.java:3691)
> 2010-01-11 00:31:53,221 WARN  io.UTF8 - truncating long string: 62492 chars,
> starting with java.lang.StackOverf
>
> Eric Osgood
> ---------------------------------------------
> Cal Poly - Computer Engineering, Moon Valley Software
> ---------------------------------------------
> eosg...@calpoly.edu, e...@lakemeadonline.com
> ---------------------------------------------
> www.calpoly.edu/~eosgood, www.lakemeadonline.com
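
P.S. In case it helps to see the failure mode in isolation, here is a minimal sketch. The class name, the two patterns and the input are made up for illustration; they are not the regexps Nutch actually uses. The idea is that java.util.regex matches a repeated group containing an alternation by recursing once per iteration (the Loop/GroupTail/Branch frames in the trace), whereas an equivalent character class is matched with a plain loop. Whether the first pattern really overflows depends on the JVM version, the input length and the stack size, so treat the printed result as something to observe rather than a guarantee.

```java
import java.util.regex.Pattern;

public class RegexStackDemo {

    /** Builds a string of n alternating 'a' and 'b' characters. */
    static String longInput(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((i % 2 == 0) ? 'a' : 'b');
        }
        return sb.toString();
    }

    /** Returns "match", "no match", or "stack overflow" for a full-string match. */
    static String tryMatch(String regex, String input) {
        try {
            return Pattern.compile(regex).matcher(input).matches() ? "match" : "no match";
        } catch (StackOverflowError e) {
            // The matcher recursed too deeply for the current thread stack.
            return "stack overflow";
        }
    }

    public static void main(String[] args) {
        String input = longInput(1_000_000);
        // Repeated group with alternation: matched recursively, one level per iteration.
        System.out.println("(a|b)* -> " + tryMatch("(a|b)*", input));
        // Same language written as a character class: matched iteratively.
        System.out.println("[ab]*  -> " + tryMatch("[ab]*", input));
    }
}
```

If the first line reports a stack overflow, rerunning with a larger thread stack (for example `java -Xss64m RegexStackDemo`) may let it complete, which is the same knob as the -Xss suggestion above. Rewriting the pattern avoids depending on the stack size at all, when the regexp allows it.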