Doug,
> I cannot reproduce this.
I was able to reproduce it on different systems several times.
The important thing is that you use at least two boxes.
Create a crawldb with maybe 100 000 entries.
Generate a segment from it without any limits and count the
entries in the freshly generated segment.
I wrote my own tool to test this using a sequence file reader;
you will see that the generated segment has around 50 000 entries, not
100 000.
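To make the check concrete, here is a minimal sketch of the comparison, not the actual tool. The class and method names are hypothetical, and the per-part counts are placeholder values chosen only to add up to the roughly 50 000 entries described above; in a real run each count would come from iterating one part file of the segment with a sequence file reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compares the number of crawldb entries with the
// number of entries found in a generated segment.
public class SegmentCountCheck {

    /** Sums the per-part record counts of a generated segment. */
    public static long totalEntries(Map<String, Long> countsPerPart) {
        long total = 0;
        for (long count : countsPerPart.values()) {
            total += count;
        }
        return total;
    }

    /** Number of crawldb entries that did not end up in the segment. */
    public static long missingEntries(long crawlDbEntries, Map<String, Long> countsPerPart) {
        return crawlDbEntries - totalEntries(countsPerPart);
    }

    public static void main(String[] args) {
        // Placeholder counts (assumption, not measured data). Replace them
        // with the counts read from each part file of the segment.
        Map<String, Long> parts = new LinkedHashMap<>();
        parts.put("part-00000", 25_000L);
        parts.put("part-00001", 25_000L);

        long crawlDbEntries = 100_000L;
        System.out.println("segment entries: " + totalEntries(parts));
        System.out.println("missing entries: " + missingEntries(crawlDbEntries, parts));
    }
}
```

A non-zero "missing entries" value on an unlimited generate run would indicate the behaviour described here.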
The problem is somehow related to the two boxes.
If you like, I can write a test that makes the problem reproducible,
but it may take some time since there is just too much in the queue.
Stefan
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers