Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

Etsuro Fujita Mon, 07 Mar 2016 04:57:19 -0800

On 2016/03/05 5:45, Robert Haas wrote:

Some comments on the latest version.  I haven't reviewed the
postgres_fdw changes in detail here, so this is just about the core
changes.


Thank you for taking the time to review the patch!

I see that show_plan_tlist checks whether the operation is any of
CMD_INSERT, CMD_UPDATE, or CMD_DELETE.  But practically every place
else where a similar test is needed instead tests whether the
operation is *not* CMD_SELECT.  I think this place should do it that
way, too.

+               resultRelInfo = mtstate->resultRelInfo;
                 for (i = 0; i < nplans; i++)
                 {
                         ExecAuxRowMark *aerm;

+                       /*
+                        * ignore subplan if the FDW pushes down the
command to the remote
+                        * server; the ModifyTable won't have anything
to do except for
+                        * evaluation of RETURNING expressions
+                        */
+                       if (resultRelInfo->ri_FdwPushdown)
+                       {
+                               resultRelInfo++;
+                               continue;
+                       }
+
                         subplan = mtstate->mt_plans[i]->plan;
                         aerm = ExecBuildAuxRowMark(erm, subplan->targetlist);
                         mtstate->mt_arowmarks[i] =
lappend(mtstate->mt_arowmarks[i], aerm);
+                       resultRelInfo++;
                 }


This kind of thing creates a hazard for future people maintaining this
code.  If somebody adds some code to this loop that needs to execute
even when resultRelInfo->ri_FdwPushdown is true, they have to add two
copies of it.  It's much better to move the three lines of logic that
execute only in the non-pushdown case inside of if
(!resultRelInfo->ri_FdwPushdown).

Another option to avoid such a hazard would be to remove the two changesfrom ExecInitModifyTable and create ExecAuxRowMarks and junk filterseven in the pushdown case. I made the changes because we won't useExecAuxRowMarks in that case since we don't need to do EvalPlanQualrechecks and because we won't use junk filters in that case since we doUPDATE/DELETE in the subplan. But the creating cost is enough small, sosimply removing the changes seems like a good idea.

This issue crops up elsewhere as well.  The changes to
ExecModifyTable() have the same problem -- in that case, it might be
wise to move the code that's going to have to be indented yet another
level into a separate function.   That code also says this:

+                       /* No need to provide scan tuple to
ExecProcessReturning. */
+                       slot = ExecProcessReturning(resultRelInfo,
NULL, planSlot);

...but, uh, why not?  The comment says what the code does, but what it
should do is explain why it does it.

As documented in IterateDMLPushdown in fdwhandler.sgml, the reason forthat is that in the pushdown case it's the IterateDMLPushdown'sresponsiblity to get actually inserted/updated/deleted tuples and makethose tuples available to the ExecProcessReturning. I'll add comments.

On a broader level, I'm not very happy with the naming this patch
uses.  Here's an example:

+    <para>
+     If an FDW supports optimizing foreign table updates, it still needs to
+     provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>,
+     <function>IterateDMLPushdown</> and <function>EndDMLPushdown</>
+     described below.
+    </para>

"Optimizing foreign table updates" is both inaccurate (since it
doesn't only optimize updates) and so vague as to be meaningless
unless you already know what it means.  The actual patch uses
terminology like "fdwPushdowns" which is just as bad.  We might push a
lot of things to the foreign side -- sorts, joins, aggregates, limits
-- and this is just one of them.  Worse, "pushdown" is itself
something of a term of art - will people who haven't been following
all of the mammoth, multi-hundred-email threads on this topic know
what that means?  I think we need some better terminology here.

The best thing that I can come up with offhand is "bulk modify".  So
we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify,
EndBulkModify, ExplainBulkModify.  Other suggestions welcome.   The
ResultRelInfo flag could be ri_usesFDWBulkModify.

I'm not sure that "bulk modify" is best. Yeah, this would improve theperformance especially in the bulk-modification case, but would improvethe performance even in the case where an UPDATE/DELETE modifies just asingle row. Let me explain using an example. Without the patch, wehave the following plan for an UPDATE on a foreign table that updates asingle row:


postgres=# explain verbose update foo set a = a + 1 where a = 1;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
   Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1
   ->  Foreign Scan on public.foo  (cost=100.00..101.05 rows=1 width=14)
         Output: (a + 1), b, ctid

Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1))FOR UPDATE

(5 rows)

The plan requires two queries, SELECT and UPDATE, to do the update.(Actually, the plan have additional overheads in creating a cursor forthe SELECT and establishing a prepared statement for the UPDATE.) Butwith the patch, we have:


postgres=# explain verbose update foo set a = a + 1 where a = 1;
                                QUERY PLAN
---------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
   ->  Foreign Update on public.foo  (cost=100.00..101.05 rows=1 width=14)
         Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1))
(3 rows)

The optimized plan requires just a single UPDATE query to do that! So,even in the single-row-modification case the patch could improve theperformance.

How about "Direct Modify"; PlanDirectModify, BeginDirectModify,IterateDirectModify, EndDirectModify, ExplainDirectModify, andri_usesFDWDirectModify.

The documentation
could say something like this:

Some inserts, updates, and deletes to foreign tables can be optimized
by implementing an alternate set of interfaces.  The ordinary
interfaces for inserts, updates, and deletes fetch rows from the
remote server and then modify those rows one at a time.  In some
cases, this row-by-row approach is necessary, but it can be
inefficient.  If it is possible for the foreign server to determine
which rows should be modified without actually retrieving them, and if
there are no local triggers which would affect the operation, then it
is possible to arrange things so that the entire operation is
performed on the remote server.  The interfaces described below make
this possible.


Will update as proposed.

+     Begin executing a foreign table update directly on the remote server.

I think this should say "Prepare to execute a bulk modification
directly on the remote server".  It shouldn't actually begin the
execution phase.

+     End the table update and release resources.  It is normally not important

And I think this one should say "Clean up following a bulk
modification on the remote server".  It's not actually ending the
update; the iterate method already did that.


OK, will fix.

Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

Reply via email to