It actually looks more like a segment's index is trashed.

Try the patch below. It logs each index directory as it is processed, so the last directory logged before the exception should be the troubled segment. Then re-index that segment.
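
If the output is long, something like the following could pick out the last index logged before the exception. This is only a sketch: LastIndexFinder is a hypothetical helper, not part of Nutch, and the sample lines are made up. The only thing taken from the patch is the " processing index in: " marker; what follows the marker in real output depends on how the Lucene Directory prints itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: scans captured dedup output for the
// last index directory logged before the exception.
class LastIndexFinder {
    // Marker text taken from the LOG.info call added by the patch.
    static final String MARKER = " processing index in: ";

    // Returns the text following the last marker seen before the first
    // "Exception in thread" line, or null if no marker was seen.
    static String lastProcessedIndex(List<String> logLines) {
        String last = null;
        for (String line : logLines) {
            if (line.startsWith("Exception in thread")) {
                break;
            }
            int at = line.indexOf(MARKER);
            if (at >= 0) {
                last = line.substring(at + MARKER.length()).trim();
            }
        }
        return last;
    }

    public static void main(String[] args) {
        // Illustrative lines only; real output may format the directory differently.
        List<String> sample = Arrays.asList(
            "040921 214816  processing index in: segments/20040829092114/index",
            "040921 214816  processing index in: segments/20040829122947/index",
            "Exception in thread \"main\" java.lang.IndexOutOfBoundsException: Index:");
        System.out.println(lastProcessedIndex(sample));
    }
}
```

A null result means no marker line appeared before the exception, in which case the patched class was probably not the one that ran.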

Doug

Jason Boss wrote:
Any fixes for this or do I start over with a new database?

Jason

[EMAIL PROTECTED] nutch-nightly]# bin/nutch dedup segments dedup.tmp
040921 214812 Clearing old deletions in segments/20040829092114/index
040921 214812 Clearing old deletions in segments/20040829122947/index
040921 214812 Clearing old deletions in segments/20040829124357/index
040921 214812 Clearing old deletions in segments/20040829130541/index
040921 214813 Clearing old deletions in segments/20040829212107/index
040921 214813 Clearing old deletions in segments/20040829225928/index
040921 214813 Clearing old deletions in segments/20040830042947/index
040921 214813 Clearing old deletions in segments/20040830043001/index
040921 214813 Clearing old deletions in segments/20040830065943/index
040921 214813 Clearing old deletions in segments/20040830111830/index
040921 214816 Reading url hashes...
040921 214816 loading file:/root/nutch-nightly/conf/nutch-default.xml
040921 214816 loading file:/root/nutch-nightly/conf/nutch-site.xml
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 12243, Size: 12
  at java.util.ArrayList.RangeCheck(ArrayList.java:507)
  at java.util.ArrayList.get(ArrayList.java:324)
  at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:155)
  at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:66)
  at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:237)
  at net.nutch.indexer.DeleteDuplicates.computeHashes(DeleteDuplicates.java:182)
  at net.nutch.indexer.DeleteDuplicates.deleteUrlDuplicates(DeleteDuplicates.java:149)
  at net.nutch.indexer.DeleteDuplicates.main(DeleteDuplicates.java:264)
[EMAIL PROTECTED] nutch-nightly]#



Index: src/java/net/nutch/indexer/DeleteDuplicates.java
===================================================================
RCS file: /cvsroot/nutch/nutch/src/java/net/nutch/indexer/DeleteDuplicates.java,v
retrieving revision 1.14
diff -u -r1.14 DeleteDuplicates.java
--- src/java/net/nutch/indexer/DeleteDuplicates.java	8 Sep 2004 16:29:12 -0000	1.14
+++ src/java/net/nutch/indexer/DeleteDuplicates.java	21 Sep 2004 16:44:25 -0000
@@ -204,6 +204,7 @@
     try {
       for (int index = 0; index < readers.length; index++) {
         IndexReader reader = readers[index];
+        LOG.info(" processing index in: " + reader.directory());
         int readerMax = reader.maxDoc();
         indexedDoc.index = index;
         for (int doc = 0; doc < readerMax; doc++) {
