Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

Peter Geoghegan Wed, 10 Aug 2016 14:14:51 -0700

On Wed, Aug 10, 2016 at 12:08 PM, Claudio Freire <[email protected]> wrote:
> I think it's a great design, but for that, per-worker final tapes have
> to always be random-access.


Thanks. I don't think I need to live with the randomAccess
restriction, because I can be clever about reading only the first
tuple on each logtape.c block initially. Much later, when the binary
search gets down to seeking within a single block, everything in the
block can be read at once into memory, and we can take the binary
search to that other representation. This latter part only needs to
happen once or twice per partition boundary per worker.

> I'm not hugely familiar with the code, but IIUC there's some penalty
> to making them random-access right?

Yeah, there is. For one thing, you have to store the length of the
tuple twice, to support incremental seeking in both directions. For
another, you cannot perform the final merge on-the-fly; you must
produce a serialized tape as output, which is used subsequently to
support random seeks. There is no penalty when you manage to do the
sort in memory, though (not that that has anything to do with parallel
sort).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel tuplesort, partitioning, merging, and the future

Reply via email to