On 2016/12/13 23:13, Ashutosh Bapat wrote:
On Tue, Dec 13, 2016 at 4:15 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
I believe there are probably more problems here, or at least if there
aren't, it's not clear why not.  Because of GetExistingLocalJoinPath's
lack of curiosity about what's underneath the join pathnode it picks,
it seems to me that it's possible for it to return a path tree that
*isn't* all local joins.  If we're looking at, say, a hash path for
a 4-way join, whose left input is a hash path for a 3-way join, whose
left input is a 2-way foreign join, what's stopping that from being
returned as a "local" path tree?

Right, the so called "local join tree" can contain ForeignPath
representing joins between foreign relations. AFAIU what protects us
from getting into problem is that postgresRecheckForeignScan calls
ExecProcNode() on the outer plan. So, even if there are ForeignPaths
in fdw_outerpath tree, corresponding plans would use a local join plan
to apply EPQ recheck. The code in GetExistingLocalJoinPath() avoids
the redirection at least for the inner or outer ForeignPaths.

I think Ashutosh is right.

Any
comment claiming that path tree under fdw_outerpath is entirely
"local" should be removed, if it survives the fix.

+1

Likewise, it seems like the code is trying to reject any custom-path join
types, or at least this barely-intelligible comment seems to imply that:

                 * Just skip anything else. We don't know if corresponding
                 * plan would build the output row from whole-row references
                 * of base relations and execute the EPQ checks.

But this coding fails to notice any such join type that's below the
level of the immediate two join inputs.

I wrote the first version of GetExistingLocalJoinPath (and postgresRecheckForeignScan), to verify that the changes to core for pushing down foreign/custom joins in 9.5 would work well for EPQ rechecks. And I rejected custom paths in that version because I intended to use that function not only for foreign joins but custom joins. (IIRC, for 9.6, I proposed to put GetExistingLocalJoinPath in a more common area such as src/backend/optimizer/path/joinpath.c, but that was ignored...)

I kind of wonder why this infrastructure exists at all; it's not the way
I'd have foreseen handling EPQ for remote joins.  However, while "throw
it away and start again" might be good advice going forward, I suppose
it won't be very popular for applying to 9.6.

One way that we could make things better is to rely on the knowledge
that EPQ isn't asked to evaluate joins for more than one row per input
relation, and therefore using merge or hash join technology is really
overkill.  We could make a tree of simple nestloop joins, which aren't
going to care about sort order, if we could identify the correct join
clauses to apply.  At least some of that could be laid off on the FDW,
which if it's gotten this far at all, ought to know what join clauses
need to be enforced by the foreign join.  So I'm thinking a little bit
in terms of "just collect the foreign scans for all the base rels
included in the join and then build a cross-product nestloop join tree,
applying all the join clauses at the top".  This would have the signal
value that it's guaranteed to succeed and so can be left for later,
rather than having to expensively redo it at each level of join planning.

I am not able to understand how this strategy of applying all join
clauses on top of cross product would work if there are OUTER joins
involved. Wouldn't nullable sides change the result?

I'm not able to understand that, either.

(Hm, that does sound a lot like "throw it away and start again", doesn't
it.  But what we've got here is busted enough that I'm not sure there
are good alternatives.  Maybe for 9.6 we could just rewrite
GetExistingLocalJoinPath, and limp along doing a lot of redundant
computation during planning.)

An alternative I was thinking was (1) to create an equivalent nestloop join path for each foreign/custom join path and store it in fdw_outerpath as-is, except when that join path implements a full join, in which case a merge or hash join path is created, and (2) to apply postgresRecheckForeignScan to the outer subplan created from the fdw_outerpath when executing EPQ rechecks. So, I was thinking to provide a helper function that creates the equivalent local join path for a given foreign/custom join path in #1.

A possible short-term fix may be this: instead of picking any random
path to stick into fdw_outerpath, we choose a path which covers the
pathkeys of ForeignPath. If postgres_fdw was able to come up with a
ForeignPath with those pathkeys, there is high chance that there
exists a local path with those pathkeys. Since we are yet to call
add_path() with the ForeignPath being built, that local path should
exist in the path list. Stick that path in fdw_outerpath. If such a
path doesn't exist, return NULL from GetExistingLocalJoinPath().
postgres_fdw wouldn't push down the join in that case and we will
loose some optimization, but that might be better than throwing an
error.

Seems reasonable.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to