> I hacked on this for awhile, but there remain two matters on which I'm
> uncertain about the right way forward.
> (1) To acquire the old tuple for UPDATE/DELETE operations, the patch closely
> parallels our handling for INSTEAD OF triggers on views.  It adds a wholerow
> resjunk attribute, from which it constructs a HeapTuple before calling a
> trigger function.  This loses the system columns, an irrelevant
> consideration for views but meaningful for foreign tables.  postgres_fdw
> maintains the "ctid" system column (t_self), but triggers will always see
> an invalid t_self for the old tuple.  One way to fix this is to pass around
> the old tuple data as ROW(ctid, oid, xmin, cmin, xmax, cmax, tableoid,
> wholerow).  That's fairly close to sufficient, but it doesn't cover t_ctid.
> Frankly, I would like to find a principled excuse to not worry about making
> foreign table system columns accessible from triggers.  Supporting system
> columns dramatically affects the mechanism, and what trigger is likely to
> care?  Unfortunately, that argument seems too weak.  Does anyone have a
> cleaner idea for keeping track of the system column values or a stronger
> argument for not bothering?
Regarding to the first suggestion,
I think, it is better not to care about system columns on foreign tables,
because it fully depends on driver's implementation whether FDW fetches
"ctid" from its data source, or not.
Usually, system columns of foreign table, except for tableoid, are nonsense.
Because of implementation reason, postgres_fdw fetches "ctid" of remote
tables on UPDATE / DELETE, it is not a common nature for all FDW drivers.
For example, we can assume an implementation that uses primary key of remote
table to identify the record to be updated or deleted. In this case, local
"ctid" does not have meaningful value.
So, fundamentally, we cannot guarantee FDW driver returns meaningful "ctid"
or other system columns.

> (2) When a foreign table has an AFTER ROW trigger, the FDW's
> ExecForeign{Insert,Update,Delete} callbacks must return a slot covering
> all columns.  Current FDW API documentation says things like this:
>   The data in the returned slot is used only if the INSERT query has a
>   RETURNING clause.  Hence, the FDW could choose to optimize away returning
>   some or all columns depending on the contents of the RETURNING clause.
> Consistent with that, postgres_fdw inspects the returningLists of the
> ModifyTable node to decide which columns are necessary.  This patch has
> rewriteTargetListIU() add a resjunk wholerow var to the returningList of
> any query that will fire an AFTER ROW trigger on a foreign table.  That
> avoids the need to change the FDW API contract or any postgres_fdw code.
> I was pleased about that for a time, but on further review, I'm doubting
> that the benefit for writable FDWs justifies muddying the definition of
> returningList; until now, returningList has been free from resjunk TLEs.
> For example, callers of
> FetchStatementTargetList() may be unprepared to see an all-resjunk list,
> instead of NIL, for a data modification statement that returns nothing.
> If we do keep the patch's approach, I'm inclined to rename returningList.
> However, I more lean toward adding a separate flag to indicate the need
> to return a complete tuple regardless of the RETURNING list.  The benefits
> of overloading returningList are all short-term benefits.  We know that
> the FDW API is still converging, so changing it seems eventually-preferable
> to, and safer than, changing the name or meaning of returningList.
> Thoughts?
> On Thu, Mar 06, 2014 at 09:11:19AM +0100, Ronan Dunklau wrote:
> > Le mercredi 5 mars 2014 22:36:44 Noah Misch a écrit :
> > > Agreed.  More specifically, I see only two scenarios for retrieving
> > > tuples from the tuplestore.  Scenario one is that we need the next
> > > tuple (or pair of tuples, depending on the TriggerEvent).  Scenario
> > > two is that we need the tuple(s) most recently retrieved.  If that's
> > > correct, I'm inclined to rearrange afterTriggerInvokeEvents() and
> > > AfterTriggerExecute() to remember the tuple or pair of tuples most
> > > recently retrieved.  They'll then never call tuplestore_advance()
> > > just to reposition.  Do you see a problem with that?
> >
> > I don't see any problem with that. I don't know how this would be
> > implemented, but it would make sense to avoid those scans, as long as
> > a fresh copy is passed to the trigger: modifications to a tuple
> > performed in an after trigger should not be visible to the next one.
> Trigger functions are not permitted to modify tg_trigtuple or tg_newtuple;
> notice that, for non-foreign triggers, we pass shared_buffers-backed tuples
> in those fields.  Therefore, no copy is necessary.
> > > I was again somewhat tempted to remove ate_tupleindex, perhaps by
> > > defining the four flag bits this way:
> > >
> > > #define AFTER_TRIGGER_DONE                                0x10000000
> > > #define AFTER_TRIGGER_IN_PROGRESS         0x20000000
> > > /* two bits describing the size of and tuple sources for this event
> */
> > > #define AFTER_TRIGGER_TUP_BITS                    0xC0000000
> > > #define AFTER_TRIGGER_FDW_REUSE                   0x00000000
> > > #define AFTER_TRIGGER_FDW_FETCH                   0x40000000
> > > #define AFTER_TRIGGER_1CTID                               0x80000000
> > > #define AFTER_TRIGGER_2CTID                               0xC0000000
> > >
> > > the aforementioned scenarios one and two, respectively.  I think,
> > > though, I'll rate this as needless micro-optimization and not bother;
> opinions welcome.
> > > (The savings is four bytes per foreign table trigger event.)
> >
> > I was already happy with having a lower footprint for foreign table
> > trigger events than for regular trigger events, but if we remove the
> > need for seeking in the tuplestore entirely, it would make sense to get
> rid of the index.
> I'm pleased with how this turned out.  Besides the memory savings in
> question, this removed the INT_MAX limitation and simplified the code
> overall.  I did not observe a notable runtime improvement, though that's
> unsurprising.
> Other notable changes in the attached revision:
> 1. UPDATE/DELETE row-level triggers on foreign tables and INSTEAD OF
> triggers on views have a similar requirement to generate a HeapTuple
> representing the old row.  View triggers did so in nodeModifyTable.c, while
> foreign table triggers did so in trigger.c.  Both were valid choices, but
> the code siting should not be relkind-dependent without good reason.  I
> centralized this in ExecModifyTable().
> 2. Made CREATE TRIGGER forbid INSTEAD OF and TRUNCATE triggers on foreign
> tables.  The documentation already claimed they were unavailable.
> 3. Fixed pointer arithmetic in AfterTriggerBeginQuery()'s MemSet() call.
> 4. Modified GetCurrentFDWTuplestore() to allocate the tuplestore in
> TopTransactionContext.  We explicitly put the events list there
> (specifically, in a documentation-only child of that context), so it seemed
> more consistent to do the same for the associated foreign tuples.  I did
> not find any live bug from the previous coding, because CurrentMemoryContext
> always happened to be one that survived past AfterTriggerEndQuery().
> 5. Updated comments and documentation that still reflected earlier versions
> of the patch, as well as older comments obsoleted by the patch.
> 6. Reverted cosmetic changes, like addition of braces and blank lines, to
> passages of code not otherwise changing.  Please see:
> https://wiki.postgresql.org/wiki/Creating_Clean_Patches
> --
> Noah Misch
> EnterpriseDB
> http://www.enterprisedb.com

NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kai...@ak.jp.nec.com>

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to