Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

Heikki Linnakangas Thu, 22 Sep 2016 12:52:58 -0700

On 08/02/2016 01:18 AM, Peter Geoghegan wrote:

> No merging in parallel
> ----------------------
>
> Currently, merging worker *output* runs may only occur in the leader
> process. In other words, we always keep n worker processes busy with
> scanning-and-sorting (and maybe some merging), but then all processes
> but the leader process grind to a halt (note that the leader process
> can participate as a scan-and-sort tuplesort worker, just as it will
> everywhere else, which is why I specified "parallel_workers = 7" but
> talked about 8 workers).
>
> One leader process is kept busy with merging these n output runs on
> the fly, so things will bottleneck on that, which you saw in the
> example above. As already described, workers will sometimes merge in
> parallel, but only their own runs -- never another worker's runs. I
> did attempt to address the leader merge bottleneck by implementing
> cross-worker run merging in workers. I got as far as implementing a
> very rough version of this, but initial results were disappointing,
> and so that was not pursued further than the experimentation stage.
>
> Parallel merging is a possible future improvement that could be added
> to what I've come up with, but I don't think that it will move the
> needle in a really noticeable way.

It'd be good if you could overlap the final merges in the workers with
the merge in the leader. ISTM it would be quite straightforward to
replace the final tape of each worker with a shared memory queue, so
that the leader could start merging and returning tuples as soon as it
gets the first tuple from each worker, instead of having to wait for all
the workers to complete first.
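
To make the idea concrete, here is a rough sketch in Python rather than
in tuplesort.c terms. It is a toy model under stated assumptions, not a
proposal for the actual patch: threads stand in for worker processes, a
bounded queue.Queue stands in for a shared memory queue, and the names
worker, drain and leader_merge are made up for illustration. Each worker
sorts its own runs and streams its final merge into its queue instead of
writing a final tape. The leader's merge can emit its first tuple as
soon as every worker has produced one.

```python
import heapq
import queue
import threading

_DONE = object()  # sentinel: marks the end of one worker's sorted output


def worker(runs, out_q):
    """Sort each run, then stream the final merge into a queue.

    The queue stands in for the worker's final tape (assumption: a
    bounded in-process queue models a shared memory queue).
    """
    sorted_runs = [sorted(run) for run in runs]
    for row in heapq.merge(*sorted_runs):  # worker's own final merge
        out_q.put(row)                     # blocks while the queue is full
    out_q.put(_DONE)


def drain(q):
    """Yield one worker's tuples in order until its sentinel arrives."""
    while True:
        item = q.get()  # blocks until the worker has produced something
        if item is _DONE:
            return
        yield item


def leader_merge(partitions, queue_size=4):
    """Merge all workers' output while the workers are still producing."""
    queues = [queue.Queue(maxsize=queue_size) for _ in partitions]
    threads = [
        threading.Thread(target=worker, args=(runs, q), daemon=True)
        for runs, q in zip(partitions, queues)
    ]
    for t in threads:
        t.start()
    # heapq.merge needs only the first item from each input before it
    # can yield, so the leader never waits for a worker to finish.
    yield from heapq.merge(*(drain(q) for q in queues))
    for t in threads:
        t.join()
```

For example, list(leader_merge([[[5, 1, 9], [4, 8]], [[7, 2], [6]],
[[3, 0]]])) returns the ten input values in sorted order. The bounded
queue size is the interesting knob: it limits how far a worker can run
ahead of the leader, which is roughly the trade-off a shared memory
queue of fixed size would impose.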


- Heikki



