On 13 September 2016 at 20:20, Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Aug 29, 2016 at 4:08 AM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > [ new patches ] > > + /* > + * We assume that few nodes are async-aware and async-unaware > + * nodes cannot be revserse-dispatched from lower nodes that > is > + * async-aware. Firing of an async node that is not a > descendant > + * of the planstate will cause such reverse-diaptching to > + * async-aware nodes, which is unexpected behavior for them. > + * > + * For instance, consider an async-unaware Hashjoin(OUTER, > INNER) > + * where the OUTER is running asynchronously but the Hashjoin > is > + * waiting on the async INNER during inner-hash creation. If > the > + * OUTER fires for the case, since anyone is waiting on it, > + * ExecAsyncWaitForNode finally dispatches to the Hashjoin > which > + * is now in the middle of thing its work. > + */ > + if (!IsParent(planstate, node)) > + continue; > > I'm not entirely sure that I understand this comment, but I don't > think it's going in the right direction. Let's start with the example > in the second paragraph. If the hash join is async-unaware, then it > isn't possible for the hash join to be both running the outer side of > the join asynchronously and at the same time waiting on the inner > side. Once it tries to pull the first tuple from the outer side, it's > waiting for that to finish and can't do anything else. So, the inner > side can't possibly get touched in any way until the outer side > finishes. For anything else to happen, the hash join would have to be > async-aware. Even if we did that, I don't think it would be right to > kick off both sides of the hash join at the same time. Right now, if > the outer side turns out to be empty, we never need to build the hash > table, and that's good. > I feel the !IsParent() condition is actually to prevent the infinite wait caused by a re-entrant issue in ExecAsuncWaitForNode() that Kyotaro mentioned earlier. But yes, the comments don't explain exactly how the hash join can cause the re-entrant issue. But I attempted to come up with some testcase which might reproduce the infinite-waiting in ExecAsyncWaitForNode() after removing the !IsParent() check so that the other subtree nodes are also included, but I couldn't reproduce. Kyotaro, is it possible for you to give a testcase that consistently hangs if we revert back the !IsParent() check ? I was also thinking about another possibility where the same plan state node is re-entered, as explained below. > > I don't think it's a good idea to wait for only nodes that are in the > current subtree. For example, consider a plan like this: > > Append > -> Foreign Scan on a > -> Hash Join > -> Foreign Scan on b > -> Hash > -> Seq Scan on x > > Suppose Append and Foreign Scan are parallel-aware but the other nodes > are not. Append kicks off the Foreign Scan on a and then waits for > the hash join to produce a tuple; the hash join kicks off the Foreign > Scan on b and waits for it to return a tuple. If, while we're waiting > for the foreign scan on b, the foreign scan on a needs some attention > - either to produce tuples, or maybe just to call PQconsumeInput() so > that more data can be sent from the other side, I think we need to be > able to do that. There's no real problem here; even if the Append > becomes result-ready before the hash join returns, that is fine. Yes I agree : we should be able to do this. Sine we have all the waiting events in a common estate, there's no harm if we start executing nodes of another sub-tree if we get an event from there. But I am thinking about what would happen when this node from other sub-tree returns result_ready, and then it's parents are called, and then the result gets bubbled up upto the node which had already caused us to call ExecAsyncWaitForNode() in the first place. For e.g., in the above plan which you specified, suppose : 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so is waiting in ExecAsyncWaitForNode(foreign_scan_on_b). 2. The event wait list already has foreign scan on a that is on a different subtree. 3. This foreign scan a happens to be ready, so in ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called, which returns with result_ready. 4. Since it returns result_ready, it's parent node is now inserted in the callbacks array, and so it's parent (Append) is executed. 5. But, this Append planstate is already in the middle of executing Hash join, and is waiting for HashJoin. Is this safe to execute the same plan state when it is already inside its execution ? In other words, is the plan state re-entrant ? I suspect, the new execution may even corrupt the structures with which it was already executing. In usual cases, a tree can contain multiple plan state nodes belonging to the same plan node, but in this case, we are having the same plan state node being executed again while it is already executing. I suspect this can be one reason why Kyotaro might be getting infinite recursion issues. May be we need to prevent a plan state node to re-enter, but allow nodes from any subtree to execute. So propagate the result upwards until we get a node which is already executing.