On 4 October 2016 at 11:47, Heikki Linnakangas <hlinn...@iki.fi> wrote:

> When we finish writing an initial run to a tape, we keep the tape buffers
> around. That's the reason we need to reserve that memory. But there's no
> fundamental reason we have to do that. As I suggested in [2], we could flush
> the buffers to disk, and only keep in memory the location of the last block
> we wrote. If we need to write another run to the tape, we can reload the
> last incomplete block from the disk, and continue writing.
> Reloading the last block, requires an extra I/O. That's OK. It's quite rare
> to have a multi-pass sort these days, so it's a good bet that you don't need
> to do it. And if you have a very small work_mem, so that you need to do a
> multi-pass sort, having some more memory available for building the initial
> runs is probably worth it, and there's also a good chance that the block is
> still in the OS cache.

Sounds like a good idea.

Why not just make each new run start at a block boundary?
That way we waste on average BLCKSZ/2 disk space per run, which is
negligible but we avoid any need to have code to read back in the last

That also makes it easier to put each run in its own file, which will
surely help with parallelism and compression since we can spread
multiple run files across multiple temp tablespaces.

Can we also please read in the value from the last tuple in a run, so
we have min and max values in memory for each run? That can be used
during the merge step to avoid merging runs unless the value ranges

Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to