On 3/10/2016 4:06 PM, Steven White wrote:
> Last question on this topic (maybe): wouldn't a commit at the very end
> take too long on 1 billion items? Wouldn't a commit every, let's say,
> 10,000 items be more efficient?
The behavior I have witnessed suggests that commit speed on a well-tuned index depends more on the autowarm config than anything else. The total size of the index might make a difference, but I suspect that the slow commit times I've seen on large shards are just from the autowarming -- each warming query takes longer when the index is large.

If you have the autoCommit config I recommended, the "last" commit should be very fast: those auto commits will flush data to disk as you index, so the final manual commit only needs to deal with data that has not yet been flushed.

More info than you wanted (TL;DR): even if you don't use autoCommit, you'll find that indexing tons of data without any commit at all *will* cause older segments to be flushed to disk ... but the transaction logs won't be rotated, and that's a whole separate problem.

Thanks,
Shawn
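[The autoCommit config referred to above is not quoted in this message; a typical hard-autocommit setup in solrconfig.xml looks something like the sketch below. The interval shown is illustrative, not necessarily the exact value recommended earlier in the thread.]

```xml
<!-- solrconfig.xml: a hard autoCommit flushes segments to disk and rotates
     the transaction log; with openSearcher=false it does NOT open a new
     searcher, so no autowarming cost is paid on each automatic commit. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit at most every 60 s (illustrative) -->
    <openSearcher>false</openSearcher> <!-- keep these commits cheap: no new searcher -->
  </autoCommit>
</updateHandler>
```

With hard commits handled automatically like this, the one explicit commit at the end of a billion-document load (the commit that does open a new searcher and pays the autowarming cost) only has to deal with the uncommitted tail of the data.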