Very intriguing, considering that we teach our students to avoid
recursion where possible for this very reason.

A quick search turns up two Sun bug reports that appear to describe
the same failure:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4675952 and
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5050507
So you could try increasing the Java thread stack size in bin/nutch
(-Xss), or use an alternative regexp if you can.
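
To make the two workarounds concrete, here is a minimal sketch in Java. It is not Nutch code: the class name RegexStackDemo, the helper names, and the patterns "(a|b)*" and "[ab]*" are all made up for illustration, and I am only assuming that the failing Nutch pattern has a similar "alternation inside a repeated group" shape, based on the Loop/Branch/GroupTail frames in the trace. The sketch tries a full match, reports a StackOverflowError instead of dying, and can run the same attempt on a thread with a larger requested stack (the per-thread analogue of -Xss; the JVM treats that size as a hint).

```java
import java.util.regex.Pattern;

final class RegexStackDemo {
    // Build a long input of alternating 'a' and 'b' characters.
    static String longInput(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(i % 2 == 0 ? 'a' : 'b');
        }
        return sb.toString();
    }

    // Attempt a full-input match and report the outcome as a string:
    // "match", "no match", or "stack overflow".
    static String tryMatch(String regex, String input) {
        try {
            return Pattern.compile(regex).matcher(input).matches() ? "match" : "no match";
        } catch (StackOverflowError e) {
            return "stack overflow";
        }
    }

    // Same attempt, but on a worker thread with an explicitly requested
    // stack size in bytes. The JVM may round or ignore this value.
    static String tryMatchWithStack(String regex, String input, long stackBytes)
            throws InterruptedException {
        String[] result = new String[1];
        Thread worker = new Thread(null,
                () -> result[0] = tryMatch(regex, input),
                "regex-worker", stackBytes);
        worker.start();
        worker.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        String input = longInput(200_000);
        // Alternation inside a repeated group: recursion depth grows with
        // the input, so this may overflow the default stack.
        System.out.println(tryMatch("(a|b)*", input));
        // Equivalent character class: matched with a simple loop.
        System.out.println(tryMatch("[ab]*", input));
        // Same risky pattern, but with a much larger requested stack.
        System.out.println(tryMatchWithStack("(a|b)*", input, 256L * 1024 * 1024));
    }
}
```

Whether the first call actually overflows depends on the JVM version and its default stack size, so treat the output as something to observe rather than rely on. Rewriting the alternation as a character class is the more robust fix where the pattern allows it, since a bigger stack only moves the limit.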

Just out of curiosity, why does a performance-critical program such as
Nutch use Sun's backtracking-based regexp implementation rather than
an efficient Thompson-based one?  Do you need the additional
expressiveness provided by PCRE?
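
For anyone unfamiliar with the Thompson approach, here is a rough sketch of the idea in Java. Everything in it is hypothetical: the class NfaSketch and the hand-built automaton for the toy pattern (a|ab)*c are my own illustration, not part of Nutch or of any regexp library. Instead of recursing and backtracking, the matcher tracks the set of NFA states that are live after each character, so it uses constant stack depth and time proportional to input length times the number of states.

```java
import java.util.BitSet;

final class NfaSketch {
    // Hand-built NFA for the toy pattern (a|ab)*c:
    //   state 0: start of an iteration (also where 'c' may appear)
    //   state 2: saw the 'a' of "ab", waiting for 'b'
    //   state 1: accepting, reached after 'c'
    static final int START = 0;
    static final int ACCEPT = 1;
    static final int NEED_B = 2;

    // Advance every live state by one input character.
    static BitSet step(BitSet current, char ch) {
        BitSet next = new BitSet();
        if (current.get(START)) {
            if (ch == 'a') {
                next.set(START);   // took the "a" alternative
                next.set(NEED_B);  // or began the "ab" alternative
            } else if (ch == 'c') {
                next.set(ACCEPT);
            }
        }
        if (current.get(NEED_B) && ch == 'b') {
            next.set(START);       // finished the "ab" alternative
        }
        return next;
    }

    // Full-input match: one pass over the input, no recursion.
    static boolean matches(CharSequence input) {
        BitSet states = new BitSet();
        states.set(START);
        for (int i = 0; i < input.length() && !states.isEmpty(); i++) {
            states = step(states, input.charAt(i));
        }
        return states.get(ACCEPT);
    }

    public static void main(String[] args) {
        System.out.println(matches("aabac"));
        System.out.println(matches("abbc"));
    }
}
```

A real engine would build the automaton from the pattern rather than by hand, and this style cannot express features such as backreferences, which is the trade-off behind the question above.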

 - Godmar

On Mon, Jan 11, 2010 at 11:24 AM, Eric Osgood <e...@lakemeadonline.com> wrote:
> During a crawl of about 3.8M TLDs to a depth of 2, when I try to index the
> segments, I get the following error:
>
> java.lang.StackOverflowError
>        at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
> Any help with this error would be much appreciated; I have encountered this
> before.
>
> Here are the last 10 lines of the hadoop.log file:
>
> tail -n 10 hadoop.log.2010-01-10
>        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>        at java.util.regex.Pattern$Ques.match(Pattern.java:3691)
>        at java.util.regex.Pattern$Branch.match(Pattern.java:4114)
>        at java.util.regex.Pattern$GroupHead.match(Pattern.java:4168)
>        at java.util.regex.Pattern$Loop.match(Pattern.java:4295)
>        at java.util.regex.Pattern$GroupTail.match(Pattern.java:4227)
>        at java.util.regex.Pattern$BranchConn.match(Pattern.java:4078)
>        at java.util.regex.Pattern$Ques.match(Pattern.java:3691)
> 2010-01-11 00:31:53,221 WARN  io.UTF8 - truncating long string: 62492 chars, 
> starting with java.lang.StackOverf
>
>
>
> Eric Osgood
> ---------------------------------------------
> Cal Poly - Computer Engineering, Moon Valley Software
> ---------------------------------------------
> eosg...@calpoly.edu, e...@lakemeadonline.com
> ---------------------------------------------
> www.calpoly.edu/~eosgood, www.lakemeadonline.com
>
>
