> -----Original Message-----
> From: Kyotaro HORIGUCHI [mailto:horiguchi.kyot...@lab.ntt.co.jp]
> Sent: Wednesday, October 14, 2015 4:40 PM
> To: Kaigai Kouhei(海外 浩平)
> Cc: fujita.ets...@lab.ntt.co.jp; email@example.com;
> shigeru.han...@gmail.com; robertmh...@gmail.com
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> Hello,
>
> At Wed, 14 Oct 2015 03:07:31 +0000, Kouhei Kaigai <kai...@ak.jp.nec.com> wrote
> in <9a28c8860f777e439aa12e8aea7694f801157...@bpxm15gp.gisp.nec.co.jp>
> > > I noticed that the approach using a column to populate the foreign
> > > scan's slot directly wouldn't work well in some cases. For example,
> > > consider:
> > >
> > > SELECT * FROM verysmall v LEFT JOIN (bigft1 JOIN bigft2 ON bigft1.x =
> > > bigft2.x) ON v.q = bigft1.q AND v.r = bigft2.r FOR UPDATE OF v;
> > >
> > > The best plan is presumably something like this as you said before:
> > >
> > > LockRows
> > >   -> Nested Loop
> > >        -> Seq Scan on verysmall v
> > >        -> Foreign Scan on bigft1 and bigft2
> > >             Remote SQL: SELECT * FROM bigft1 JOIN bigft2 ON bigft1.x =
> > >             bigft2.x AND bigft1.q = $1 AND bigft2.r = $2
> > >
> > > Consider the EvalPlanQual testing to see if the updated version of a
> > > tuple in v satisfies the query. If we use the column in the testing, we
> > > would get the wrong results in some cases.
>
> I have a basic (or maybe silly) question. Is it true that the
> join-inner (the foreignscan in the example) is re-executed with
> the modified value of v.r? I observed for a join case among only
> local tables that previously fetched tuples for the inner are
> simply reused regardless of join types. Even when a refetch
> happens (I haven't confirmed but it would occur in the case of no
> security quals), the tuple is pointed by ctid so the re-join
> between local and remote would fail. Is this wrong?
>
Let's dive into ExecNestLoop().
Once nl_NeedNewOuter is true, ExecProcNode(outerPlan) is called, and then ExecReScan(innerPlan) is called with the new param-info delivered from the outer tuple.
nl_NeedNewOuter is reset right after ExecProcNode(outerPlan) and is set again once a new outer tuple is needed, that is, when the inner scan reaches the end of the relation or a match is found in a semi-join. If a semi-join returns a joined tuple and an EPQ recheck is then applied, the next call can run ExecProcNode(outerPlan) and reset the inner plan's state. That is what I can say from the existing code.

I doubt whether this behavior is right for EPQ rechecks. In the scenario above, the outer relation (verysmall) is updated by a concurrent session, so the param-info has to be updated. However, it does not look to me like the implementation pays attention to this. If ExecNestLoop() is called in the EPQ recheck context, it needs to call ExecProcNode() on both the outer and inner plans to check the visibility of the joined tuple against the latest state. Of course, the underlying scan plans for base relations never advance their scan pointers. Each just returns the tuple in its EPQ slot, and I want ExecNestLoop() to evaluate whether these tuples satisfy the join clause.

> > In this case, does ForeignScan have to be reset prior to ExecProcNode()?
> > Once ExecReScanForeignScan() gets called by ExecNestLoop(), it marks EPQ
> > slot is invalid. So, more or less, ForeignScan needs to kick the remote
> > join again based on the new parameter come from the latest verysmall tuple.
> > Please correct me, if I don't understand correctly.
>
> So, no rescan would happen for the cases, I think. ReScan seems
> to be kicked only for the new(next) outer tuple that causes
> change of parameter, but not kicked for EPQ. I might take you
> wrongly..
>
> > In case of unparameterized ForeignScan case, the cached join-tuple work
> > well because it is independent from verysmall.
> >
> > Once again, if FDW driver is responsible to construct join-tuple from
> > the base relation's tuple cached in EPQ slot, this case don't need to
> > kick remote query again, because all the materials to construct
> > join-tuple are already held locally. Right?
>
> It is definitely right and should be doable. But I think the
> point we are arguing here is what is the desirable behavior.
>
When scanrelid==0, ForeignScan/CustomScan is expected to behave as if a local join were located there. This requires ForeignScan to generate the joined tuple as the result of the remote join, and that tuple may contain multiple junk TLEs that carry whole-row references to the base foreign tables. Given that criterion, the desirable behavior is clear:

1. FDW/CSP picks up the base relations' tuples from the EPQ slots. These are set up from the whole-row reference under early row-lock semantics, or by RefetchForeignRow under late row-lock semantics.
2. Fill ss_ScanTupleSlot according to the xxx_scan_tlist. We may be able to provide a common support function here, because this list keeps the mapping between each attribute of the joined tuple and its source column.
3. Apply the join clauses and base restrictions that were pushed down. setrefs.c initializes the expressions kept in fdw_exprs/custom_exprs to run on ss_ScanTupleSlot, so this is the easiest place to check them.
4. If the joined tuple is still visible after step 3, FDW/CSP returns the joined tuple. Otherwise, it returns an empty slot.

This behavior is entirely compatible with having a local join at the position of the ForeignScan/CustomScan with scanrelid==0. Even if the remote join is parameterized by another relation, we can simply use the param-info delivered from the corresponding outer scan at step 3. EState should already hold the updated parameters, so the FDW driver does not need to care about anything. It is a much less invasive approach with respect to the existing EPQ recheck mechanism. I cannot understand why Fujita-san never "tries" this approach.
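
To make the four steps above concrete, here is a toy model in plain C. It is only a sketch under stated assumptions, not executor code: the types EpqSlots, Params and JoinedRow and the function recheck_foreign_join() are hypothetical names invented for illustration, and the qual is hard-coded to the Remote SQL of the example plan (bigft1.x = bigft2.x AND bigft1.q = $1 AND bigft2.r = $2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are PostgreSQL types or APIs. */
typedef struct { int x; int q; } Ft1Row;      /* a bigft1 tuple */
typedef struct { int x; int r; } Ft2Row;      /* a bigft2 tuple */

/* Hypothetical model of the per-relation EPQ slots. */
typedef struct {
    const Ft1Row *ft1;                        /* NULL if no tuple is cached */
    const Ft2Row *ft2;
} EpqSlots;

/* Hypothetical model of the parameters held in EState:
 * p1 = $1 (v.q) and p2 = $2 (v.r) from the latest verysmall tuple. */
typedef struct { int p1; int p2; } Params;

/* Hypothetical model of the joined tuple built in ss_ScanTupleSlot. */
typedef struct { int ft1_x, ft1_q, ft2_x, ft2_r; } JoinedRow;

/* Step 2: fill the scan tuple from the base tuples, column by column. */
static void fill_scan_tuple(const Ft1Row *a, const Ft2Row *b, JoinedRow *out)
{
    out->ft1_x = a->x;
    out->ft1_q = a->q;
    out->ft2_x = b->x;
    out->ft2_r = b->r;
}

/* Step 3: evaluate the pushed-down quals against the scan tuple. */
static bool pushed_down_quals_pass(const JoinedRow *row, const Params *params)
{
    return row->ft1_x == row->ft2_x &&
           row->ft1_q == params->p1 &&
           row->ft2_r == params->p2;
}

/* Steps 1-4: returns true and fills *out if the joined tuple is still
 * visible; returns false (the "empty slot") otherwise. */
bool recheck_foreign_join(const EpqSlots *epq, const Params *params,
                          JoinedRow *out)
{
    JoinedRow candidate;

    assert(epq != NULL && params != NULL && out != NULL);

    /* Step 1: pick up the base relations' tuples from the EPQ slots. */
    if (epq->ft1 == NULL || epq->ft2 == NULL)
        return false;

    fill_scan_tuple(epq->ft1, epq->ft2, &candidate);     /* step 2 */

    if (!pushed_down_quals_pass(&candidate, params))     /* step 3 */
        return false;

    *out = candidate;                                    /* step 4 */
    return true;
}
```

In this model, with cached tuples bigft1 (x=7, q=1) and bigft2 (x=7, r=2), the recheck passes for parameters (1, 2) and fails once a concurrent update has changed v.r to 3. Nothing in it talks to the remote side, which is the point of the proposal.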
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>