On 12/5/2012 9:19 AM, Erick Erickson wrote:
> Probably what you're seeing is that as segments are merged, deleted
> documents are purged.
> As to how the deleted docs got there in the first place, were you using an
> index that had been populated before?
After sleeping on it, I also realized that it was merges removing the
deleted docs. Then I read your message confirming the idea.
The first thing the indexing program does to every build core before
kicking off DIH is deleteByQuery("*:*"), commit, and optimize. The
full-import is not called with clean=false, so DIH's default of
clean=true should wipe the index a second time.
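
For anyone following along, here is a rough sketch of that sequence as
plain HTTP requests, in the order they would be sent. It only builds the
URLs and XML bodies and sends nothing. The base URL, the core name
"build_core", the helper name, and the "/dataimport" handler path are
assumptions for illustration (the handler path is whatever solrconfig.xml
registers), and this is not the actual indexing program, which uses solrj.

```python
from urllib.parse import urlencode


def reset_and_import_requests(base_url, core):
    """Return (url, body) pairs in send order; a body of None means a GET.

    base_url and core are hypothetical placeholders, not real hosts.
    """
    update_url = f"{base_url}/{core}/update"
    # Assumed handler path; it depends on how DIH is registered.
    dih_url = f"{base_url}/{core}/dataimport"
    return [
        # 1. Mark every document in the core as deleted.
        (update_url, "<delete><query>*:*</query></delete>"),
        # 2. Make the deletes visible.
        (update_url, "<commit/>"),
        # 3. Merge segments, which purges the deleted documents.
        (update_url, "<optimize/>"),
        # 4. Full import; clean=true is the default, spelled out here.
        (dih_url + "?" + urlencode({"command": "full-import", "clean": "true"}), None),
    ]


if __name__ == "__main__":
    for url, body in reset_and_import_requests("http://localhost:8983/solr", "build_core"):
        print(url, body or "(GET)")
```

Spelling out clean=true is redundant with the default, but it makes the
intent obvious when reading logs later.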
After the import is done and my solrj app makes things completely
current, if I do a count(*) on the database table, I do get the same
number (78626805 at this moment) as when I do a distributed search for
*:* on Solr. The anomaly during import concerns me, but it doesn't
appear to be causing any real problems.
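
The arithmetic behind both observations is simple enough to sketch. In a
core's statistics, maxDoc counts deleted documents that have not been
merged away yet and numDocs does not, so the difference is the number of
deleted documents still sitting in the segments. The function names and
the numbers below are made up for illustration and are not taken from my
index.

```python
def deleted_docs(max_doc, num_docs):
    """Documents marked deleted but not yet purged by a merge."""
    return max_doc - num_docs


def counts_match(db_count, shard_num_found):
    """True when the database row count equals the summed per-shard hits.

    Assumes each document lives in exactly one shard.
    """
    return db_count == sum(shard_num_found)


if __name__ == "__main__":
    # Illustrative values only.
    print(deleted_docs(max_doc=1200, num_docs=1000))
    print(counts_match(3000, [1000, 1000, 1000]))
```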
Thanks,
Shawn