Hi all,
Has anybody else seen a java.lang.ArrayIndexOutOfBoundsException
displayed in the Diagnostic Text column of the jobdetail.jsp page
when running 0.8?
It seems to happen occasionally during the invert links phase. The
stack trace looks like:
java.lang.ArrayIndexOutOfBoundsException
    at java.util.zip.CRC32.update(CRC32.java:43)
    at org.apache.nutch.fs.NFSDataInputStream$Checker.read(NFSDataInputStream.java:92)
    at org.apache.nutch.fs.NFSDataInputStream$PositionCache.read(NFSDataInputStream.java:156)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
    at java.io.DataInputStream.readFully(DataInputStream.java:176)
    at org.apache.nutch.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:55)
    at org.apache.nutch.io.DataOutputBuffer.write(DataOutputBuffer.java:89)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:378)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:301)
    at org.apache.nutch.io.SequenceFile$Reader.next(SequenceFile.java:323)
    at org.apache.nutch.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:60)
    at org.apache.nutch.mapred.MapTask$2.next(MapTask.java:106)
    at org.apache.nutch.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.nutch.mapred.MapTask.run(MapTask.java:116)
    at org.apache.nutch.mapred.TaskTracker$Child.main(TaskTracker.java:603)
In our most recent trial, this happened in 15 of 4840 map attempts
(there were also 25 socket timeout errors, leaving 4800 maps that
actually completed).
I see that Rod Taylor reported an error from the same general
location (http://issues.apache.org/jira/browse/NUTCH-170), but his
reported stack had one additional entry,
    org.apache.nutch.segment.SegmentReader$InputFormat$1.next(SegmentReader.java:80)
between the MapTask$2.next and the SequenceFileRecordReader.next calls.
Seems like there might be a bug hiding in this area of the code. I'm
going to wrap some extra debugging around it to get more info when an
error does occur.
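
As a rough illustration of the kind of extra debugging meant here, the sketch below wraps an arbitrary InputStream and, when a read throws ArrayIndexOutOfBoundsException, rethrows it with the buffer length, offset, requested length and stream position attached. The class name DiagnosticInputStream and the "name" label are hypothetical and not part of Nutch; this uses only java.io and makes no assumption about where in the Nutch code such a wrapper would actually be installed.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical debugging wrapper (not part of Nutch). Delegates to the
 * wrapped stream and adds context when a bulk read fails with
 * ArrayIndexOutOfBoundsException. mark/reset are not tracked.
 */
class DiagnosticInputStream extends FilterInputStream {
    private final String name;  // label for the stream, e.g. a file path
    private long position = 0;  // bytes successfully read or skipped so far

    DiagnosticInputStream(InputStream in, String name) {
        super(in);
        this.name = name;
    }

    long getPosition() {
        return position;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b >= 0) {
            position++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        try {
            int n = in.read(buf, off, len);
            if (n > 0) {
                position += n;
            }
            return n;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Keep the original exception as the cause so its stack survives.
            throw new IOException("ArrayIndexOutOfBoundsException reading " + name
                + ": buf.length=" + buf.length
                + ", off=" + off
                + ", len=" + len
                + ", position=" + position, e);
        }
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        if (skipped > 0) {
            position += skipped;
        }
        return skipped;
    }
}
```

Converting to an IOException is a choice made for this sketch only; rethrowing a new ArrayIndexOutOfBoundsException with the same message would keep the failure type unchanged for callers.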
Thanks,
-- Ken
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers