It sounds like you're sorting a segment index after dedup rather than a merged index. There also appears to be a bug in IndexSorter, but you should be able to work around it by merging your segment indexes after dedup, so that the index you sort has no deletions.
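To make the workaround concrete, here is a toy model in Java. It is not Lucene or Nutch code: `ToyIndex`, `compact` and `sortByBoost` are hypothetical names. It assumes that dedup leaves duplicates marked as deleted and that a sorter reads every document number in its new order. Copying only the live documents into a fresh index first, as a segment merge does, means the sorter never reaches a deleted document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CompactThenSort {

    // Hypothetical stand-in for an index: stored docs, a per-doc boost, and deletion flags.
    static final class ToyIndex {
        final String[] docs;
        final float[] boosts;
        final boolean[] deleted;

        ToyIndex(String[] docs, float[] boosts, boolean[] deleted) {
            this.docs = docs;
            this.boosts = boosts;
            this.deleted = deleted;
        }

        int maxDoc() { return docs.length; }

        // Mirrors the check that fails in the stack trace: reading a deleted doc is an error.
        String document(int n) {
            if (deleted[n]) {
                throw new IllegalArgumentException("attempt to access a deleted document");
            }
            return docs[n];
        }
    }

    // "Merge" step: copy only live documents, so the result has no deletions.
    static ToyIndex compact(ToyIndex in) {
        List<Integer> live = new ArrayList<>();
        for (int i = 0; i < in.maxDoc(); i++) {
            if (!in.deleted[i]) live.add(i);
        }
        String[] docs = new String[live.size()];
        float[] boosts = new float[live.size()];
        for (int j = 0; j < live.size(); j++) {
            docs[j] = in.docs[live.get(j)];
            boosts[j] = in.boosts[live.get(j)];
        }
        return new ToyIndex(docs, boosts, new boolean[docs.length]);
    }

    // "Sort" step: new doc n is old doc order[n], highest boost first.
    static String[] sortByBoost(ToyIndex in) {
        Integer[] order = new Integer[in.maxDoc()];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> in.boosts[i]).reversed());
        String[] out = new String[order.length];
        for (int n = 0; n < order.length; n++) {
            out[n] = in.document(order[n]); // throws if order[n] is a deleted doc
        }
        return out;
    }

    public static void main(String[] args) {
        // Four docs; "d" is marked deleted, as a dedup pass might leave it.
        ToyIndex deduped = new ToyIndex(
            new String[] {"a", "b", "c", "d"},
            new float[] {0.5f, 2.0f, 1.0f, 3.0f},
            new boolean[] {false, false, false, true});
        System.out.println(Arrays.toString(sortByBoost(compact(deduped)))); // [b, c, a]
        try {
            sortByBoost(deduped);
        } catch (IllegalArgumentException e) {
            System.out.println("uncompacted: " + e.getMessage());
        }
    }
}
```

Sorting the compacted index succeeds, while sorting the index that still carries the deletion fails with the same message as in the report.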

Please file a bug in Jira.

Doug

Michael wrote:
When I try to use IndexSorter, I get this error:

Exception in thread "main" java.lang.IllegalArgumentException: attempt to access a deleted document
        at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:282)
        at org.apache.lucene.index.FilterIndexReader.document(FilterIndexReader.java:104)
        at org.apache.nutch.indexer.IndexSorter$SortingReader.document(IndexSorter.java:170)
        at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:186)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
        at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:579)
        at org.apache.nutch.indexer.IndexSorter.sort(IndexSorter.java:240)
        at org.apache.nutch.indexer.IndexSorter.main(IndexSorter.java:291)
Does anyone know how to fix this?
Michael



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
