> > I did a little bit of brief experimentation on this same topic a long > time ago and didn't see an improvement from boosting the queue size > beyond 64k but Rafia is testing Gather rather than Gather Merge and, > as I say, my test was very brief. I think it would be a good idea to > try to get a complete picture here. Does this help on any query that > returns many tuples through the Gather? Only the ones that use Gather > Merge? Some queries but not others with no obvious pattern? Only > this query?
Yes, we need to get answers to those questions. I guess, performance measurements varying one parameter at a time would help us make right decision. Some of the relevant parameters I could quickly think of are: number of tuples received by Gather, size of tuples, Gather/Gather merge, number of workers. There may be more. I am guessing that the number of tuples that can fit the queue = (size of queue - headers)/(size of tuple + per tuple header). Higher the size of tuple, lesser is the number of tuples that a worker can queue up and so higher the chances that it will wait for the leader to empty the queue. For gather merge that varies a lot depending upon how the data is distributed across workers and it's probably more susceptible to variations in the number of tuples that fit in the queue. More the number of workers, busier will be the leader and thus more chance of workers waiting for the leader to empty the queue. But in that case a balancing effect will be that each worker will queue lesser number of rows. Measurements would help us see how these balancing factors play out actually. > > Blindly adding a GUC because we found one query that would be faster > with a different value is not the right solution. If we don't even > know why a larger value is needed here and (maybe) not elsewhere, then > how will any user possibly know how to tune the GUC? And do we really > want the user to have to keep adjusting a GUC before each query to get > maximum performance? I think we need to understand the whole picture > here, and then decide what to do. Ideally this would auto-tune, but > we can't write code for that without a more complete picture of the > behavior. We will need a way for user to cap the memory allocated and GUC looks like a better way to do that. I agree that the GUC as a tuning parameter will be much less useful. > > BTW, there are a couple of reasons I originally picked 64k here. One > is that making it smaller was very noticeably terrible in my testing, > while making it bigger didn't help much. The other is that I figured > 64k was small enough that nobody would care about the memory > utilization. I'm not sure we can assume the same thing if we make > this bigger. It's probably fine to use a 6.4M tuple queue for each > worker if work_mem is set to something big, but maybe not if work_mem > is set to the default of 4MB. AFAIK, work_mem comes from memory private to the process whereas this memory will come from the shared memory pool. There are different OS level settings for those and thus linking size of parallel tuple queue with work_mem may not always work. But I agree that size of work_mem is indicative of size of data that needs to be processed in general and hence can be used as a good estimate of required size of the queue. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers