On Mon, May 15, 2017 at 9:23 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
> <kommi.harib...@gmail.com> wrote:
>> This still needs some adjustments to fix the cases where
>> the main backend also does the scan instead of waiting for
>> the workers to finish the job, since the worker-increase logic
>> shouldn't add overhead in that case.
>
> I think it would be pretty crazy to try relaunching workers after
> every tuple, as this patch does. The overhead of that will be very
> high for queries where the number of tuples passing through the Gather
> is large, whereas when the number of tuples passing through Gather is
> small, or where tuples are sent all at once at the end of processing,
> it will not actually be very effective at getting hold of more
> workers.
+1

> A different idea is to have an area in shared memory where
> queries can advertise that they didn't get all of the workers they
> wanted, plus a background process that periodically tries to launch
> workers to help those queries as parallel workers become available.
> It can recheck for available workers after some interval, say 10s.
> There are some problems there -- the process won't have bgw_notify_pid
> pointing at the parallel leader -- but I think it might be best to try
> to solve those problems instead of making it the leader's job to try
> to grab more workers as we go along. For one thing, the background
> process idea can attempt to achieve fairness. Suppose there are two
> processes that didn't get all of their workers; one got 3 of 4, the
> other 1 of 4. When a worker becomes available, we'd presumably like
> to give it to the process that got 1 of 4, rather than having the
> leaders race to see who grabs the new worker first. Similarly, if
> there are four workers available and two queries that each got 1 of 5
> workers they wanted, we'd like to split the workers two and two,
> rather than having one leader grab all four of them. Or at least, I
> think that's what we want.

+1 for a separate process distributing workers. But I am not sure we
want to spend a whole background process on this. Users are expected to
configure enough parallel workers that every parallel query gets enough
of them under normal circumstances, so that process may rarely find
anybody to dispatch an idle worker to. The question then is which
process should do the job. The postmaster, since it's the one that
spawns workers when they die? But the postmaster is too busy to also
run a balancing algorithm, so a new background worker may indeed be
needed.

Also, looking at the patch, it doesn't look like it takes enough care
to build the execution state of a new worker so that it can participate
in an already-running query.
I may be wrong, but aren't the execution state initialization routines
written with the assumption that all the workers start simultaneously?

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company