Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Robert Haas Wed, 12 Oct 2016 11:10:33 -0700

On Fri, Oct 7, 2016 at 5:47 PM, Peter Geoghegan <[email protected]> wrote:
> Work is still needed on:
>
> * Cost model. Should probably attempt to guess final index size, and
> derive calculation of number of workers from that. Also, I'm concerned
> that I haven't given enough thought to the low end, where with default
> settings most CREATE INDEX statements will use at least one parallel
> worker.
>
> * The whole way that I teach nbtsort.c to disallow catalog tables for
> parallel CREATE INDEX due to concerns about parallel safety is in need
> of expert review, preferably from Robert. It's complicated in a way
> that relies on things happening or not happening from a distance.
>
> * Heikki seems to want to change more about logtape.c, and its use of
> indirection blocks. That may evolve, but for now I can only target the
> master branch.
>
> * More extensive performance testing. I think that this V3 is probably
> the fastest version yet, what with Heikki's improvements, but I
> haven't really verified that.


I realize that you are primarily targeting utility commands here, and
that is obviously great, because making index builds faster is very
desirable. However, I'd just like to talk for a minute about how this
relates to parallel query.  With Rushabh's Gather Merge patch, you can
now have a plan that looks like Gather Merge -> Sort -> whatever.
That patch also allows other patterns that are useful completely
independently of this patch, like Finalize GroupAggregate -> Gather
Merge -> Partial GroupAggregate -> Sort -> whatever, but the Gather
Merge -> Sort -> whatever path is very related to what this patch
does.  For example, instead of committing this patch at all, we could
try to funnel index creation through the executor, building a plan of
that shape, and using the results to populate the index.  I'm not
saying that's a good idea, but it could be done.

On the flip side, what if anything can queries hope to get out of
parallel sort that they can't get out of Gather Merge?  One
possibility is that a parallel sort might end up being substantially
faster than Gather Merge-over-non-parallel sort.  In that case, we
obviously should prefer it.  Other possibilities seem a little
obscure.  For example, it's possible that you might want to have all
workers participate in sorting some data and then divide the result of
the sort into equal ranges that are again divided among the workers,
or that you might want all workers to sort and then each worker to
read a complete copy of the output data.  But these don't seem like
particularly mainstream needs, nor do they necessarily seem like
problems that parallel sort itself should be trying to solve.  The
Volcano paper[1], one of the oldest and most-cited sources I can find
for research into parallel execution and with a design fairly similar
to our own executor, describes various variants of what they call
Exchange, of which what we now call Gather is one.  They describe
another variant called Interchange, which acts like a Gather node
without terminating parallelism: every worker process reads the
complete output of an Interchange, which is the union of all rows
produced by all workers running the Interchange's input plan.  That
seems like a better design than coupling such data flows specifically
to parallel sort.

I'd like to think that parallel sort will help lots of queries, as
well as helping utility commands, but I'm not sure it will.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] "Volcano - an Extensible and Parallel Query Evaluation System",
https://pdfs.semanticscholar.org/865b/5f228f08ebac0b68d3a4bfd97929ee85e4b6.pdf
[2] See "C. Variants of the Exchange Operator" on p. 13 of [1]


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Reply via email to