On Tue, Apr 4, 2017 at 4:13 PM, Andres Freund <and...@anarazel.de> wrote:
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that. Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems better to me.
Knock yourself out. There's no doubt that the way the number of parallel workers is computed is pretty stupid right now, and it obviously needs to get a lot smarter before we can consider doing things like throwing 40 workers at a query. If you throw 2 or 4 workers at a query and it turns out that it doesn't help much, that's sad, but if you throw 40 workers at a query and it turns out that it doesn't help much, or even regresses, that's a lot sadder.

The existing system does try to model startup and tuple transfer overhead during costing, but only as a way of comparing parallel plans to each other or to non-parallel plans, not to work out the right number of workers. It also does not model contention, which it absolutely needs to do. I was kind of hoping that once the first version of parallel query was committed, other developers who care about the query planner would be motivated to improve some of this stuff, but so far that hasn't really happened. This release adds a decent number of new execution capabilities, and there is a lot more work to be done there, but without some serious work on the planner end of things I fear we're never going to be able to get more than ~4x speedup out of parallel query, because we're just too dumb to know how many workers we really ought to be using.

That having been said, I completely and emphatically disagree that this patch ought to be required to be an order-of-magnitude smarter than the existing logic in order to get committed. There are four main things that this patch can hope to accomplish:

1. If we've got an Append node with children that have a non-zero startup cost, it is currently pretty much guaranteed that every worker will pay the startup cost for every child. With Parallel Append, we can spread the workers out across the plans, and once a plan has been finished by however many workers it got, other workers can ignore it, which means that its startup cost need not be paid by those workers.
This case will arise a lot more frequently once we have partition-wise join.

2. When the Append node's children are partial plans, spreading out the workers reduces contention for whatever locks those workers use to coordinate access to shared data.

3. If the Append node represents a scan of a partitioned table, and the partitions are on different tablespaces (or there's just enough I/O bandwidth available in a single tablespace to read more than one of them at once without slowing things down), then spreading out the work gives us I/O parallelism. This is an area where some experimentation and benchmarking is needed, because there is a possibility of regressions if we run several sequential scans on the same spindle in parallel instead of consecutively. We might need to add some logic to try to avoid this, but it's not clear how that logic should work.

4. If the Append node is derived from a UNION ALL query, we can run different branches in different processes even if the plans are not themselves able to be parallelized. This was proposed by Stephen among others as an "easy" case for parallelism, which was maybe a tad optimistic, but it's sad that we're going to release v10 without having done anything about it.

All of those things (except possibly #3) are wins over the status quo even if the way we choose the number of workers is still pretty dumb. It shouldn't get away with being dumber than what we've already got, but it shouldn't have to be radically smarter - or even just radically different - because, if it is, then the results you get when you query a partitioned table will be very different from what you get when you query an unpartitioned table, which is not sensible. I very much agree that doing something smarter than log-scaling on the number of workers is a good project for somebody to do, but it's not *this* project.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers