> I did a little bit of brief experimentation on this same topic a long
> time ago and didn't see an improvement from boosting the queue size
> beyond 64k but Rafia is testing Gather rather than Gather Merge and,
> as I say, my test was very brief.  I think it would be a good idea to
> try to get a complete picture here.  Does this help on any query that
> returns many tuples through the Gather?  Only the ones that use Gather
> Merge?  Some queries but not others with no obvious pattern?  Only
> this query?

Yes, we need to get answers to those questions. I guess, performance
measurements varying one parameter at a time would help us make right
decision. Some of the relevant parameters I could quickly think of
are: number of tuples received by Gather, size of tuples,
Gather/Gather merge, number of workers. There may be more. I am
guessing that the number of tuples that can fit the queue = (size of
queue - headers)/(size of tuple + per tuple header). Higher the size
of tuple, lesser is the number of tuples that a worker can queue up
and so higher the chances that it will wait for the leader to empty
the queue. For gather merge that varies a lot depending upon how the
data is distributed across workers and it's probably more susceptible
to variations in the number of tuples that fit in the queue. More the
number of workers, busier will be the leader and thus more chance of
workers waiting for the leader to empty the queue. But in that case a
balancing effect will be that each worker will queue lesser number of
rows. Measurements would help us see how these balancing factors play
out actually.

> Blindly adding a GUC because we found one query that would be faster
> with a different value is not the right solution.   If we don't even
> know why a larger value is needed here and (maybe) not elsewhere, then
> how will any user possibly know how to tune the GUC?  And do we really
> want the user to have to keep adjusting a GUC before each query to get
> maximum performance?  I think we need to understand the whole picture
> here, and then decide what to do.  Ideally this would auto-tune, but
> we can't write code for that without a more complete picture of the
> behavior.

We will need a way for user to cap the memory allocated and GUC looks
like a better way to do that. I agree that the GUC as a tuning
parameter will be much less useful.

> BTW, there are a couple of reasons I originally picked 64k here.  One
> is that making it smaller was very noticeably terrible in my testing,
> while making it bigger didn't help much.  The other is that I figured
> 64k was small enough that nobody would care about the memory
> utilization.  I'm not sure we can assume the same thing if we make
> this bigger.  It's probably fine to use a 6.4M tuple queue for each
> worker if work_mem is set to something big, but maybe not if work_mem
> is set to the default of 4MB.

AFAIK, work_mem comes from memory private to the process whereas this
memory will come from the shared memory pool. There are different OS
level settings for those and thus linking size of parallel tuple queue
with work_mem may not always work. But I agree that size of work_mem
is indicative of size of data that needs to be processed in general
and hence can be used as a good estimate of required size of the

Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to