On Mon, Jan 6, 2014 at 3:42 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> I think the key optimization when there are no deletions is that you don't
> need to renumber documents and can bulk-copy blocks of contiguous documents,
> and that is independent of merge policy. I think :)

Merging of term vectors and stored fields always bulk-copies
contiguous chunks of non-deleted docs, so in the append-only case
these chunks will be the maximum size and the copy will be efficient.
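
To make "contiguous chunks of non-deleted docs" concrete, here is a
minimal sketch of how such runs could be found from a live-docs
bitset. This is not Lucene's merge code: the class ContiguousRuns and
the method liveRuns are hypothetical names, and java.util.BitSet
stands in for a segment's live docs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
final class ContiguousRuns {

    /**
     * Returns [start, endExclusive) pairs, one per run of consecutive
     * live (non-deleted) docIDs in the range [0, maxDoc).
     */
    static List<int[]> liveRuns(BitSet liveDocs, int maxDoc) {
        List<int[]> runs = new ArrayList<>();
        int doc = liveDocs.nextSetBit(0);
        while (doc >= 0 && doc < maxDoc) {
            // First deleted doc at or after 'doc' ends the run.
            int end = Math.min(liveDocs.nextClearBit(doc), maxDoc);
            runs.add(new int[] {doc, end});
            doc = liveDocs.nextSetBit(end);
        }
        return runs;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(8);
        live.set(0, 8);                       // append-only: nothing deleted
        System.out.println(liveRuns(live, 8).size());
        live.clear(3);                        // one deletion splits the run
        System.out.println(liveRuns(live, 8).size());
    }
}
```

With no deletions the whole segment is a single run, which is the
case where one bulk copy covers everything; each deletion splits a run
and shortens the chunks that can be copied at once.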

We have no codec that implements bulk merging for postings, which
would be interesting to pursue: in the append-only case it's possible,
and merging of postings is normally by far the most time consuming
step of a merge.

Also, no RAM will be used holding the doc mapping, since the docIDs
don't change.
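
As a rough illustration of why no mapping needs to be held in RAM, the
sketch below allocates a per-segment doc map only when something has
been deleted. It is an assumption-laden simplification, not Lucene's
implementation: DocMapSketch, buildDocMap, and remap are hypothetical
names, and docBase is assumed to be the segment's starting docID in
the merged segment.

```java
import java.util.BitSet;

// Hypothetical illustration only; not part of Lucene.
final class DocMapSketch {

    /**
     * Returns null when every doc in [0, maxDoc) is live, so no array
     * is allocated; otherwise returns an old-to-new docID array with
     * -1 for deleted docs. Assumes no bits are set at or beyond maxDoc.
     */
    static int[] buildDocMap(BitSet liveDocs, int maxDoc) {
        if (liveDocs.cardinality() == maxDoc) {
            return null; // identity mapping: docIDs do not change
        }
        int[] map = new int[maxDoc];
        int next = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            map[doc] = liveDocs.get(doc) ? next++ : -1;
        }
        return map;
    }

    /** Maps a segment-local docID into the merged segment, or -1 if deleted. */
    static int remap(int[] docMap, int docBase, int doc) {
        if (docMap == null) {
            return docBase + doc; // no deletions: just offset by the base
        }
        return docMap[doc] < 0 ? -1 : docBase + docMap[doc];
    }
}
```

In the append-only case buildDocMap always returns null, so the only
per-segment state is the integer docBase.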

These benefits are independent of the MergePolicy.

I think TieredMergePolicy will work fine for append-only; I'm not sure
how you'd improve on its approach.  It will in general renumber the
docs, because it can merge non-adjacent segments, so if that's a
problem, apps should use LogByteSizeMergePolicy.

Mike McCandless

http://blog.mikemccandless.com
