Oops, you're right: term listings and counts for deleted docs are adjusted during merges. I had the impression that optimize had some special powers here that merge does not.
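
To make the behavior discussed in this thread concrete, here is a toy model of it. It is only a sketch and not Lucene code: the ToyIndex class and its method names are made up for illustration, and the idf formula, 1 + ln(maxDoc / (docFreq + 1)), is my recollection of what Lucene's default similarity uses, so treat it as an assumption. The idea is that deleting a document only marks it, and the term counts change once a merge rewrites the postings.

```python
import math


class ToyIndex:
    """Toy inverted index with lazy deletes (hypothetical, not Lucene's API)."""

    def __init__(self):
        self.docs = {}        # doc_id -> set of terms in that document
        self.deleted = set()  # doc_ids marked deleted but not yet purged
        self.postings = {}    # term -> set of doc_ids containing the term

    def add(self, doc_id, text):
        terms = set(text.split())
        self.docs[doc_id] = terms
        for term in terms:
            self.postings.setdefault(term, set()).add(doc_id)

    def delete(self, doc_id):
        # Only marks the document; postings and counts are left untouched.
        if doc_id in self.docs:
            self.deleted.add(doc_id)

    def doc_freq(self, term):
        # Still counts documents that are marked deleted but not yet merged away.
        return len(self.postings.get(term, ()))

    def max_doc(self):
        # Also still counts documents that are marked deleted.
        return len(self.docs)

    def idf(self, term):
        # Assumed formula: 1 + ln(maxDoc / (docFreq + 1)).
        return 1.0 + math.log(self.max_doc() / (self.doc_freq(term) + 1))

    def merge(self):
        # Rewrites the index: drops deleted documents and any terms
        # that no longer occur in a remaining document.
        for doc_id in self.deleted:
            for term in self.docs.pop(doc_id):
                self.postings[term].discard(doc_id)
                if not self.postings[term]:
                    del self.postings[term]
        self.deleted.clear()


index = ToyIndex()
index.add(1, "apple banana")
index.add(2, "apple cherry")
index.add(3, "banana")
index.add(4, "banana date")

index.delete(2)
print(index.doc_freq("apple"), "cherry" in index.postings)  # counts unchanged so far

index.merge()
print(index.doc_freq("apple"), "cherry" in index.postings)  # counts adjusted, "cherry" gone
```

In this model the idf of "apple" differs before and after merge(), which is the effect on relevance being discussed. In real Lucene the purge can come from an ordinary merge, optimize, or expungeDeletes.
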
Thank you for bringing expungeDeletes to my attention.

On Sat, Nov 21, 2009 at 7:46 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> On Sat, Nov 21, 2009 at 12:33 AM, Lance Norskog <goks...@gmail.com> wrote:
>> And, terms whose documents have been deleted are not purged. So, you
>> can merge all you like and the index will not shrink back completely.
>
> Under what conditions? Certainly not all, since I just tried a simple
> test and a merge removed the terms that were no longer in any
> documents just fine.
>
>> This is important because the orphan terms affect relevance
>> calculations.
>
> Marking a document as deleted doesn't affect any term statistics (which
> idf uses) until the document is actually removed (which can happen via
> a merge, optimize, or expungeDeletes). That's a Lucene limitation
> unrelated to how many of a term's documents have been deleted. But
> perhaps I don't understand how you're using the term "orphan terms".
>
> -Yonik
> http://www.lucidimagination.com
>

--
Lance Norskog
goks...@gmail.com