On Mon, Jan 6, 2014 at 3:42 PM, Michael Sokolov
<msoko...@safaribooksonline.com> wrote:
> I think the key optimization when there are no deletions is that you don't
> need to renumber documents and can bulk-copy blocks of contiguous documents,
> and that is independent of merge policy. I think :)

Merging of term vectors and stored fields will always use bulk-copy for
contiguous chunks of non-deleted docs, so for the append-only case these
will be the max chunk size and be efficient.

We have no codec that implements bulk merging for postings, which would be
interesting to pursue: in the append-only case it's possible, and merging
of postings is normally by far the most time-consuming step of a merge.

Also, no RAM will be used holding the doc mapping, since the docIDs don't
change. These benefits are independent of the MergePolicy.

I think TieredMergePolicy will work fine for append-only; I'm not sure how
you'd improve on its approach. It will in general renumber the docs, so if
that's a problem, apps should use LogByteSizeMergePolicy.

Mike McCandless

http://blog.mikemccandless.com
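
To make the bulk-copy and doc-mapping points concrete, here is a toy
sketch in Python. It is not Lucene code: `live_runs` and `doc_map` are
hypothetical names, and the model is just one segment's deletion bits as a
list of booleans, ignoring everything else a real merge does. It only shows
why the append-only case collapses to a single contiguous run with no
mapping to hold in RAM.

```python
from typing import List, Optional, Tuple


def live_runs(live: List[bool]) -> List[Tuple[int, int]]:
    """Return [start, end) ranges of contiguous non-deleted docIDs.

    Each range is a block that could be copied in bulk.
    """
    runs = []
    start = None
    for doc_id, is_live in enumerate(live):
        if is_live and start is None:
            start = doc_id                  # a new run of live docs begins
        elif not is_live and start is not None:
            runs.append((start, doc_id))    # a deletion ends the current run
            start = None
    if start is not None:
        runs.append((start, len(live)))     # close the run that reaches the end
    return runs


def doc_map(live: List[bool]) -> Optional[List[int]]:
    """Map old docID -> new docID, with -1 for deleted docs.

    Returns None when nothing is deleted: the mapping is the identity,
    so no array needs to be allocated at all.
    """
    if all(live):
        return None
    mapping = []
    next_id = 0
    for is_live in live:
        if is_live:
            mapping.append(next_id)
            next_id += 1
        else:
            mapping.append(-1)
    return mapping


# Append-only segment: one maximal run, no mapping.
append_only = [True] * 6
# Segment with deletions: several shorter runs, and docs get renumbered.
with_deletes = [True, True, False, True, False, False, True, True]
```

Under these assumptions, `append_only` gives one run covering the whole
segment and no map, while `with_deletes` splits into three runs and needs a
per-document map.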