2013/11/12 Claudio Freire <klaussfre...@gmail.com>: > On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier > <nicolas.barb...@gmail.com> wrote: > >> (Note that K B-trees can be merged by simply scanning all of them >> concurrently, and merging them just like a merge sort merges runs. >> Also, all B-trees except for the first level (of size S) can be >> compacted 100% as there is no need to reserve space for further >> insertions in them.) > > Unless you can guarantee strong correlation of index-order vs > physical-order, scanning multiple indexes in index-order will be quite > slow (random I/O).
As all the bigger trees are written in one pass (as the result of a merge of multiple smaller trees), I would assume that it is rather easy to guarantee it for them. As for the smallest trees (size S), I think it doesn’t matter much as they “fit easily in memory.” Initially I would say that redefining it so that K of them (rather than 1) must still fit in memory is the easy fix. A future optimization could alleviate the need for the redefinition (and would also improve normal B-tree indexes): Somehow make sure that smaller trees (that fit in memory) are typically written out more-or-less in the right order. For that, one could for example postpone determining the ultimate block-order to write-out time. This is similar to what Reiser4 calls “dancing trees,” but unfortunately requires some rejiggering of the abstraction layers on PostgreSQL (I think). Having deferred insertion (which is probably way easier to implement) could conceivably also improve things. <URL:https://en.wikipedia.org/wiki/Dancing_tree> Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers