Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

Robert Haas Mon, 01 Feb 2016 20:43:42 -0800

On Mon, Feb 1, 2016 at 6:48 PM, Robert Haas <[email protected]> wrote:
> More when I have a bit more time to look at this...


Why does deparseSelectStmtForRel change the order of the existing
arguments?  I have no issue with adding new arguments as required, but
rearranging the existing argument order doesn't serve any useful
purpose that is immediately apparent.  There's also no real need for
the rel -> foreignrel renaming.

+    /*
+     * If we have constructed the SELECT clause from the targetlist, construct
+     * the retrieved attributes list as continuously increasing list of
+     * integers.
+     */
+    if (retrieved_attrs && tlist)
+    {
+        int i;
+        *retrieved_attrs = NIL;
+        for (i = 1; i <= list_length(tlist); i++)
+            *retrieved_attrs = lappend_int(*retrieved_attrs, i);
+    }

This is really wonky.  First, you pass retrieved_attrs to
deparseSelectSqlForBaseRel, but then you have this code which blows it
away and replaces it if tlist != NIL.  So I guess this will run always
for a join relation, and for a base relation only sometimes.  But
that's certainly not at all clear.  I think you need to find some way
of rearranging this so that it's more obvious what is going on here.

I suggest not renaming the existing deparseTargetList() and instead
coming up with a different name for the new thing you need, maybe
deparseExplicitTargetList().

How about adding another sentence to the header comment for
appendConditions() saying something like "This is used for both WHERE
clauses and for JOIN .. ON"?

+ * The targetlist is list of TargetEntry's which in turn contains Var nodes.

contain.

+/*
+ * Output join name for given join type */

Formatting.  This patch, overall, is badly in need of a pgindent run.

+    /*
+     * XXX
+     * Since the query is being built in recursive manner from bottom up,
+     * the FOR UPDATE/SHARE clause referring the base relations can
not be added
+     * at the top level. They need to be added to the subqueries corresponding
+     * to the base relations. This has an undesirable effect of locking more
+     * rows than specified by user, as it locks even those rows from base
+     * relations which are not part of the final join result. To avoid this
+     * undesirable effect, we need to build the join query without the
+     * subqueries, which for now, seems hard.
+     */

This is a second good reason that we should actually do that work
instead of complaining that it's too hard.  The first one is that the
queries that come out of this right now are too long and hard to read.
I actually don't see why this is all that hard.  Deparsing the target
list is simple enough; you just need to emit tab.col using varno to
determine what tab looks like and varattno to determine what col looks
like.  The trickier part is emitting the FROM clause. But this doesn't
seem all that hard either.  Suppose that when we construct the fpinfo
(PgFdwRelationInfo *) for a relation, we include in it a FROM clause
appropriate to that rel.  So, for a baserel, it's something like "foo
r4" where 4 is foo's RTI.  For a joinrel, do this:

1. Emit the FROM clause constructed for the outer relation,
surrounding it with parentheses if the outer relation is a joinrel.
2. Emit " JOIN ", " LEFT JOIN ", " RIGHT JOIN ", or " FULL JOIN "
according to the join type.
3. Emit the FROM clause constructed for the inner relation,
surrounding it with parentheses if the inner relation is a joinrel.
4. Emit " ON ".
5. Emit the joinqual.

This will produce nice things like (foo r3 JOIN bar r4 ON r3.x = r4.x)
JOIN baz r2 ON r3.y = r2.y

Then, you'd also need to stuff the conditions into the
PgFdwRelationInfo so that those could be added to paths constructed at
higher levels.  But that's not too hard either.  Basically you'd end
up with mostly the same stuff you have now, but the PgFdwRelationInfo
would store a join clause and a set of deparsed quals to be included
in the FROM and WHERE clauses respectively.  And then you'd pull the
information from the inner and outer sides to build up what you need
at the joinrel level.

This would actually be faster than what you've got right now, because
right now you're recursing down the whole join tree all over again at
each new join level, maybe not the best plan.

+    if (foreignrel->reloptkind == RELOPT_JOINREL)
+        return;
+
     /* Add any necessary FOR UPDATE/SHARE. */
     deparseLockingClause(&context);

Generally, I think in these kinds of cases it is better to reverse the
test and get rid of the return statement, like this:

if (foreignrel->reloptkind != RELOPT_JOINREL)
    deparseLockingClause(&context);

The way you wrote it, somebody who wants to add more code to the end
of the function will probably have to make that change anyhow.
Really, your intention here was to skip that code for joins, not to
skip the rest of the function for baserels.

@@ -1424,9 +1875,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
             printRemoteParam(pindex, node->vartype, node->vartypmod, context);
         }
         else
-        {
             printRemotePlaceholder(node->vartype, node->vartypmod, context);
-        }
     }
 }

Useless hunk.

+     * Constructing queries representating SEMI and ANTI joins is hard, hence

Typo.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

Reply via email to