It sounds like you're sorting a segment index after dedup, rather than a
merged index. Dedup marks duplicates as deleted, and it looks like there's
a bug in IndexSorter when it reads such documents. But you should be able
to work around it by merging your segment indexes after deduping, so the
merged index contains no deletions.
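To illustrate why merging first should help, here is a minimal sketch in plain Java. It is a toy model only and does not use the Lucene or Nutch APIs: ToyIndex, merge, and sortNaively are hypothetical names made up for this example. It assumes the failure comes from a sorter that visits every document slot without checking whether the slot has been deleted, which is what the stack trace suggests.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DeletedDocsSketch {

    /** Toy stand-in for an index segment: stored docs plus a per-doc deletion flag. */
    static final class ToyIndex {
        final List<String> docs = new ArrayList<>();
        final List<Boolean> deleted = new ArrayList<>();

        void add(String doc) {
            docs.add(doc);
            deleted.add(false);
        }

        /** Dedup-style deletion: the slot stays, it is only flagged. */
        void delete(int n) {
            deleted.set(n, true);
        }

        /** Number of slots, including deleted ones. */
        int maxDoc() {
            return docs.size();
        }

        String document(int n) {
            if (deleted.get(n)) {
                throw new IllegalArgumentException("attempt to access a deleted document");
            }
            return docs.get(n);
        }
    }

    /** Toy merge: copies only live docs, so the result has no deletions. */
    static ToyIndex merge(ToyIndex... segments) {
        ToyIndex merged = new ToyIndex();
        for (ToyIndex segment : segments) {
            for (int i = 0; i < segment.maxDoc(); i++) {
                if (!segment.deleted.get(i)) {
                    merged.add(segment.docs.get(i));
                }
            }
        }
        return merged;
    }

    /** Toy sorter that reads every slot without checking for deletions (the assumed bug). */
    static List<String> sortNaively(ToyIndex index) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < index.maxDoc(); i++) {
            out.add(index.document(i));
        }
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        ToyIndex segment = new ToyIndex();
        segment.add("b");
        segment.add("a");
        segment.add("a");
        segment.delete(2); // dedup flags the duplicate as deleted

        // Sorting the deduped segment directly would throw; sorting the merged copy does not.
        System.out.println(sortNaively(merge(segment)));
    }
}
```

In this toy model, sorting the deduped segment directly fails on the flagged slot, while sorting the merged copy succeeds because the merge has already dropped it.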
Please file a bug in Jira.
Doug
Michael wrote:
When I try to use IndexSorter, I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: attempt to access a deleted document
    at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:282)
    at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:104)
    at org.apache.nutch.indexer.IndexSorter$SortingReader.document(IndexSorter.java:170)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:186)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
    at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:579)
    at org.apache.nutch.indexer.IndexSorter.sort(IndexSorter.java:240)
    at org.apache.nutch.indexer.IndexSorter.main(IndexSorter.java:291)
Does anyone know how to fix this?
Michael
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general