On Tue, Aug 2, 2016 at 3:41 AM, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> Thank you for the comment.
>
> At Mon, 1 Aug 2016 10:44:56 +0530, Amit Khandekar <amitdkhan...@gmail.com> wrote in <caj3gd9ek4y4sgtsuc_pzkgywlmbrc9qom7m1d8bj99jnw16...@mail.gmail.com>
>> On 21 July 2016 at 15:20, Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
>>>
>>> After some consideration, I found that ExecAsyncWaitForNode cannot be
>>> reentrant, because reentering means that control passes into
>>> async-unaware nodes while some nodes are still not ready, which is an
>>> inconsistent state. To prevent such reentry, I allocated node
>>> identifiers in depth-first order so that the ascendant-descendant
>>> relationship can be checked in a simple way (the nested-set model),
>>> and I call ExecAsyncConfigureWait only for the descendant nodes of the
>>> planstate parameter.
>>
>> We have estate->waiting_nodes containing a mix of async-aware and
>> non-async-aware nodes. I was thinking an asynchrony tree would have
>> only async-aware nodes, with possibly multiple asynchrony sub-trees in
>> a tree. If we somehow restrict the bubbling up of events to only the
>> root of the asynchrony subtree, do you think we can simplify some of
>> the complexities?
>
> The current code prohibits registration of nodes outside the current
> subtree to avoid the reentry disaster.
>
> Indeed, leaving a "waiting node" mark or something like it on every
> root node at the first visit would enable the propagation to stop at
> the root of any async subtree. Nevertheless, when an async child under
> an inactive async root fires, the new tuple is loaded but not consumed,
> so a subsequent firing of the same child leads to a deadlock (without
> result queueing). However, that can be avoided if
> ExecAsyncConfigureWait doesn't register nodes that are in the ready
> state.
Why would a node call ExecAsyncConfigureWait in the first place if it already had a result ready? I think it shouldn't do that.

> On the other hand, two or more asynchronous nodes can share a
> synchronization object. For instance, multiple postgres_fdw scan nodes
> can share one server connection, and only one of them can be in a
> waitable state at a time. If no async child in the current async
> subtree is waitable, execution must be stuck. So I think it is crucial
> for ExecAsyncWaitForNode to force at least one child *in the current
> async subtree* into the waiting state in such a situation. The
> ascendant-descendant relationship is necessary to do that anyway.

This is another example of a situation where waiting only for nodes within a subtree causes problems. Suppose there are two Foreign Scans in completely different parts of the plan tree that are going to use, in alternation, the same connection to the same remote server. When we encounter the first one, it kicks off the query, uses ExecAsyncConfigureWait to register itself as waiting, and returns without becoming ready. When we encounter the second one, it can't kick off the query and therefore has no chance of becoming ready until after the first one has finished with the connection. Suppose we then wait for the second Foreign Scan. Well, we had better wait for the first one, too! If we don't, it will never finish with the connection, so the second node will never get to use it, and now we're in trouble.

I think what we need is for the ConnCacheEntry to have a place to note the ForeignScanState that is using the connection and any other PlanState objects that would like to use it. When one ForeignScanState is done with the ConnCacheEntry, it activates the next one, which then takes over.
That seems simple enough, but there's a problem here for suspended queries: if we stop executing a plan while some scan within that plan is holding onto a ConnCacheEntry, and then we run some other query that wants to use the same one, we've got a problem. Maybe we can get by with letting the other query finish running and then executing our own query, but that might be messy to implement. Another idea is to somehow let any in-progress query finish running before allowing the first query to be suspended; that would need some new infrastructure.

My main point here is that I think waiting for only a subtree is an idea that cannot work out well. Whatever problems are pushing you into that design, we need to confront those problems directly and fix them. There shouldn't be any unsolvable problems in waiting for everything in the whole query, and I'm pretty sure that's going to be a more elegant and better-performing design.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company