> On Mar 6, 2018, at 04:57, Craig Ringer <cr...@2ndquadrant.com> wrote:
> On 3 March 2018 at 00:30, Darafei "Komяpa" Praliaskouski <m...@komzpa.net
> <mailto:m...@komzpa.net>> wrote:
> I gave this all some thought and it looks like it all could have not happened
> if Postgres was able to cluster heap insertions by (id, ts) index. We're ok
> with synchronous_commit=off, so amplified write won't immediately hit disk
> and can get cooled down in progress. Clustering doesn't require perfect
> sorting: we need to minimize number of pages fetched, it's ok if the pages
> are not consecutive on disk.
> I'm surprised nobody has mentioned BRIN yet.
> Ever since BRIN was introduced, I've thought it would be very interesting to
> use it + the freespace map for coarse-grained tuple routing in heap inserts.
> Try to find the best-match range with BRIN and look for free space there. I
> think there was discussion of this early on, so you may want to look up the
> BRIN threads.
> The main challenge probably comes when a range is exhausted. Your writes
> would start spilling over into new heap pages and get intermixed again. Some
> support for pre-allocating new ranges would be needed.
Basically, a Poor Man's Clustering solution consists of two parts:
1. Find eligible pages to insert into.
2. Make sure we don't spoil a new page.
First, we define how we want to cluster our data; it can be a mix of
equality, hash, range, and geo operators.
In Darafei's case, an appropriate clustering would be CLUSTER BY (hash(id), ts
asc) or (eq(id), range(ts, '1 day interval')).
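A clustering spec like those can be sketched as a key function: rows that map to the same key should be routed to the same heap pages. This is only an illustration of the idea; the bucket count and the day granularity (standing in for the '1 day interval') are assumptions, not part of the proposal:

```python
from datetime import datetime

HASH_BUCKETS = 64  # illustrative bucket count for hash(id)

def cluster_key(id_, ts):
    """Map a row to a coarse clustering key; rows sharing a key should
    land on the same heap pages.  Day granularity stands in for the
    '1 day interval' of the second example."""
    return (hash(id_) % HASH_BUCKETS, ts.date())

# Two rows for the same id on the same day share a key:
a = cluster_key(42, datetime(2018, 3, 6, 10, 0))
b = cluster_key(42, datetime(2018, 3, 6, 23, 0))
assert a == b
# A row for the same id on the next day gets a different key:
assert a != cluster_key(42, datetime(2018, 3, 7, 1, 0))
```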
To support fast search for pages to insert into, there should be a Bloom index
on id and a BRIN or Btree index on ts
for the first example, and a Btree or Hash index on id and a Btree, BRIN, or
GiST index on ts for the second.
So, we can bitmap scan all the needed indexes to find pages holding tuples with
the clustering keys already present,
AND those bitmaps together, consult the FSM, and insert into those pages if
there is enough free space.
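The lookup step can be sketched as set operations. The data structures here are illustrative stand-ins, not PostgreSQL internals: page bitmaps become Python sets and the FSM becomes a dict of free bytes per page:

```python
def candidate_pages(index_page_sets, fsm_free_bytes, tuple_size):
    """AND together the page bitmaps returned by the per-key index
    scans, then keep only pages the free space map says can still
    hold one more tuple of the given size."""
    pages = set.intersection(*index_page_sets)  # bitmap AND
    return sorted(p for p in pages if fsm_free_bytes.get(p, 0) >= tuple_size)

# Example: the Bloom scan on id and the BRIN scan on ts each yield pages.
bloom_pages = {10, 11, 42, 97}
brin_pages = {11, 42, 43}
fsm = {11: 120, 42: 16, 43: 8000}
print(candidate_pages([bloom_pages, brin_pages], fsm, tuple_size=64))  # [11]
```

Page 42 matches both scans but is dropped because the FSM says it cannot fit another tuple, which is exactly the case where we fall through to finding a fresh page.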
Now, if we have found no pages with the same clustering keys, or those pages
are full, we need to find a free page without spoiling it.
To do so, we estimate the ideal clustering of a page and either insert there or
start a completely new page.
In order not to intermix a page under hash or equality clustering, we should
examine the keys already present and consult the statistics for them.
If the key is in the MCV list, it is better to find a new page; otherwise we
calculate how many tuples per key we have on average
and compare that with the number of tuples on the page plus our new key. I.e.,
if the page we are about to insert into has 30 tuples with 3 clustering keys,
and we conclude that on average there are 20 tuples per key, this means there
will eventually be 30 more tuples for those keys and 20 for our key.
If the page can hold those 80 tuples, we insert there. I believe the existing
statistics are enough to make this work.
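The arithmetic of that worked example, written out (the page capacity at the end is a made-up number; the real one depends on tuple width):

```python
def projected_page_tuples(distinct_keys_on_page, avg_tuples_per_key):
    """Assume every distinct clustering key already on the page, plus the
    new key being inserted, eventually grows to the average tuples-per-key
    taken from the table statistics."""
    return (distinct_keys_on_page + 1) * avg_tuples_per_key

# Worked example from the text: 3 keys on the page, 20 tuples/key on average.
projected = projected_page_tuples(3, 20)
print(projected)  # 80

PAGE_CAPACITY = 100  # illustrative; depends on tuple width in reality
fits = projected <= PAGE_CAPACITY  # True -> insert on this page
```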
For range-based clustering, we should estimate how many tuples there will be on
a page and what interval it covers, and check
whether we would break that interval. For example, with plain ordering
clustering, we insert the value 130; the page has a min value of 100 and
20 tuples. We estimate that on average a page's max value minus its min value
is 100 and its number of tuples is 30, so each tuple extends the
range by roughly 3. Our page, with a min value of 100 and 20 tuples, thus has a
projected closing max value of 200; as our value is 130, we proceed to insert.
I don't know if there are enough stats for this already, or if BRIN can help
here, but it is doable.
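The same worked example as a sketch; the function names are made up, and the per-tuple increment is derived from the two averages rather than being a stored statistic:

```python
def projected_closing_max(page_min, avg_span, avg_tuples_per_page):
    """Estimate the max value the page will cover once full: each tuple
    is assumed to widen the range by avg_span / avg_tuples_per_page
    (roughly 3 in the example), over avg_tuples_per_page tuples."""
    per_tuple = avg_span / avg_tuples_per_page
    return page_min + per_tuple * avg_tuples_per_page

# Worked example: min value 100, avg span 100, avg 30 tuples per page.
page_min, new_value = 100, 130
closing_max = projected_closing_max(page_min, avg_span=100, avg_tuples_per_page=30)
print(closing_max)  # ~200
print(page_min <= new_value <= closing_max)  # True -> insert on this page
```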
Now for the geo part. This is the least thought through: we definitely need to
consult an index for the MBRs and use GiST penalty functions to decide.
Do you think the above rough design is doable and desirable?