On Wed, Jan 17, 2018 at 8:53 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> At present, one major use of Cursors is in postgres_fdw.
>
> In that usage, the query executes and returns all the rows. No other
> side execution is possible.
True, although foreign tables using postgres_fdw can't be scanned in parallel for other reasons anyway -- it's the user backend, not any workers we might spin up, that has the connection to the remote server. Also, I'm talking about the external-facing use of cursors by users, not the fact that postgres_fdw uses them internally to talk to other machines.

> How do we make parallel query work for Cursors, if not by Tomas' proposal?
>
> What more restrictive proposal would you prefer?

In all honesty, if I knew how to fix this problem with a reasonable amount of work, I would have done it already. It's a very hard problem.

One idea I've thought about is providing some kind of infrastructure for workers to detach from a parallel query. This could be useful in a variety of situations, including cursors. Let's say the leader is either (a) suspending execution of the query, because it's a cursor, or (b) not able to absorb rows as fast as workers are generating them. In the former situation, we'd like to get rid of all workers; in the latter situation, some workers. In the former situation, getting all workers to shut down cleanly would let us exit parallel mode (and perhaps re-enter it later when we resume execution of the query). In the latter situation, we could avoid wasting workers on queries where the leader can't keep up, so that those worker slots are available to other queries that can benefit from them.

However, this is not simple. In the case of a parallel sequential scan, once a worker claims a page, it must scan all the tuples on that page. No other backend will ever get that page, and therefore if the backend that did claim it doesn't scan the whole thing, the query may return the wrong answer. Every other parallel-aware executor node we have has similar problems: there are points where we could safely stop without changing the final answer to the query, and other points where it is not safe to stop.

One idea is to provide a new callback for parallel-aware executor nodes that tells them to stop returning tuples at the next safe stop point. When we want to get rid of workers, we somehow signal them to invoke this method on every node in the plan tree; at some point, they'll drop off, but the query will still return the right answer. In the worst case, however, the worker might emit an arbitrarily large number of tuples before reaching a safe stop point and exiting (imagine a parallel sequential scan cross-joined to generate_series(1, 100000000000) or the equivalent), which seems pretty crappy.

Or we could go the other way and try to keep the workers running. I don't really like that because it ties down those workers for potentially a long period of time, but that might be acceptable for some users. The main implementation problem is that somehow we'd need to propagate to them an updated version of any state that has changed while the query was suspended, such as new combo CIDs that have been created in the meantime. dshash seems like a useful tool toward such a goal, but there are details to work out, and there are similar problems with everything else that is copied from leader to workers.
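To make the "stop at the next safe stop point" idea slightly more concrete, here's a very rough sketch of the seqscan end of it. Everything named Imaginary* below is invented; only the atomics and BlockNumber bits are real, and the hard part (plumbing the request through every other node type and the rest of the plan tree) is entirely hand-waved:

#include "postgres.h"

#include "port/atomics.h"
#include "storage/block.h"

/* Imaginary shared state, loosely modeled on a parallel scan descriptor. */
typedef struct ImaginaryParallelScanShared
{
    pg_atomic_uint64 next_block;        /* next block number to hand out */
    BlockNumber nblocks;                /* relation size at scan start */
    pg_atomic_uint32 stop_requested;    /* leader asks workers to wind down */
} ImaginaryParallelScanShared;

/*
 * Claim the next block for this participant.  Returns InvalidBlockNumber
 * either when the scan is finished or when a stop has been requested.  A
 * worker that bails out here has already emitted every tuple from every
 * block it claimed, so it can detach without changing the query's answer,
 * as long as somebody eventually scans the remaining blocks.
 */
static BlockNumber
imaginary_scan_next_block(ImaginaryParallelScanShared *shared)
{
    uint64      claimed;

    if (pg_atomic_read_u32(&shared->stop_requested) != 0)
        return InvalidBlockNumber;  /* safe stop point: no block half-done */

    claimed = pg_atomic_fetch_add_u64(&shared->next_block, 1);
    if (claimed >= (uint64) shared->nblocks)
        return InvalidBlockNumber;  /* scan complete */

    return (BlockNumber) claimed;
}

/* Leader side: ask all participants to stop at their next safe point. */
static void
imaginary_request_stop(ImaginaryParallelScanShared *shared)
{
    pg_atomic_write_u32(&shared->stop_requested, 1);
}

A real version would need an equivalent notion of a safe point for every parallel-aware node, which is exactly where the arbitrarily-large-number-of-tuples problem described above comes from.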
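As for the dshash piece of the keep-the-workers-running approach, it might look very roughly like this: a combo CID mapping that lives in the per-query DSA area rather than backend-private memory, so that entries created while the cursor is suspended are visible to the workers when execution resumes. SharedComboCidKey, SharedComboCidEntry, and shared_GetComboCommandId are invented names (the real, backend-private versions live in combocid.c); the dshash calls are the real API, assuming I've remembered the signatures correctly. The reverse mapping from combo CID back to (cmin, cmax) and locking of the counter are hand-waved:

#include "postgres.h"

#include "lib/dshash.h"

/*
 * Invented shared equivalents of the private structures in combocid.c.
 * dshash requires the key to be stored at the start of each entry.
 */
typedef struct SharedComboCidKey
{
    CommandId   cmin;
    CommandId   cmax;
} SharedComboCidKey;

typedef struct SharedComboCidEntry
{
    SharedComboCidKey key;      /* hash key; must come first */
    CommandId   combocid;       /* combo command id assigned to (cmin, cmax) */
} SharedComboCidEntry;

static const dshash_parameters combocid_hash_params = {
    sizeof(SharedComboCidKey),
    sizeof(SharedComboCidEntry),
    dshash_memcmp,
    dshash_memhash,
    -1                          /* tranche id; would be assigned for real */
};

/*
 * The table itself would be created with dshash_create() in the per-query
 * DSA area when the parallel cursor is set up, and attached to by workers.
 * Allocation of the next combo CID value is glossed over here: it needs
 * its own shared counter and locking.
 */
static CommandId
shared_GetComboCommandId(dshash_table *table, CommandId cmin, CommandId cmax,
                         CommandId *next_combocid)
{
    SharedComboCidKey key;
    SharedComboCidEntry *entry;
    bool        found;
    CommandId   result;

    key.cmin = cmin;
    key.cmax = cmax;

    entry = dshash_find_or_insert(table, &key, &found);
    if (!found)
        entry->combocid = (*next_combocid)++;
    result = entry->combocid;
    dshash_release_lock(table, entry);

    return result;
}

Whether that would be enough is another question; combo CIDs are only one of several pieces of state that get copied to workers at startup and could go stale while the cursor sits idle.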
We could possibly prevent these problems from arising by placing draconian restrictions on what a backend is allowed to do while a parallel cursor is open, such as in your follow-on proposal to lock out everything except FETCH. I'm not really that excited about such a thing because it's extremely limiting and still doesn't solve all the problems: in particular, after BEGIN ... DECLARE CURSOR PARALLEL ... FETCH ... FETCH ... syntax error, there is going to be trouble around the state of group locking. It will be very bad if the workers think the transaction is still alive and the leader thinks it is in a new transaction and they're all sharing locks. You also have to worry about what things can be accomplished by protocol messages, not just what can be done via SQL. But it's probably feasible with enough work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company