RE: Transactions involving multiple postgres foreign servers, take 2

tsunakawa.ta...@fujitsu.com Fri, 11 Sep 2020 02:25:19 -0700

From: Masahiko Sawada <masahiko.saw...@2ndquadrant.com>
> On Tue, 8 Sep 2020 at 13:00, tsunakawa.ta...@fujitsu.com
> <tsunakawa.ta...@fujitsu.com> wrote:
> > 2. 2PC processing is queued and serialized in one background worker.  That
> > severely subdues transaction throughput.  Each backend should perform
> > 2PC.
> 
> Not sure it's safe that each backend perform PREPARE and COMMIT
> PREPARED since the current design is for not leading an inconsistency
> between the actual transaction result and the result the user sees.


As Fujii-san is asking, I also would like to know what situation you think is 
not safe.  Are you worried that the FDW's commit function might call 
ereport(ERROR | FATAL | PANIC)?  If so, can't we stipulate that the FDW 
implementor should ensure that the commit function always returns control to 
the caller?


> But in the future, I think we can have multiple background workers per
> database for better performance.

Does the database in "per database" mean the local database (that applications 
connect to), or the remote database accessed via FDW?

I'm wondering how the FDW and background worker(s) can realize parallel prepare 
and parallel commit.  That is, the coordinator transaction performs:

1. Issues prepare to all participant nodes without waiting for each reply.
2. Waits for replies from all participants.
3. Issues commit to all participant nodes without waiting for each reply.
4. Waits for replies from all participants.

If we just consider PostgreSQL and don't think about FDW, we can use libpq 
async functions -- PQsendQuery, PQconsumeInput, and PQgetResult.  pgbench uses 
them so that one thread can issue SQL statements on multiple connections in 
parallel.
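
As a rough sketch of the four steps above, the following C code sends one 
command to every participant before collecting any reply, once for prepare and 
once for commit.  The Participant struct, its send/wait callbacks, and the 
fake_* demo helpers are hypothetical names made up for illustration; they are 
not part of libpq or the FDW API.  For a PostgreSQL participant, I assume send 
would wrap PQsendQuery and wait would drain PQgetResult.  The command strings 
are placeholders for the real PREPARE TRANSACTION / COMMIT PREPARED statements.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical participant handle; not a PostgreSQL or FDW structure. */
typedef struct Participant
{
    /* Start a command without waiting for its reply. */
    bool        (*send) (struct Participant *p, const char *command);
    /* Block until the reply to the last sent command has arrived. */
    bool        (*wait) (struct Participant *p);
    void       *conn;           /* opaque per-connection state */
} Participant;

/* Send one command to every participant, then collect every reply. */
static bool
run_phase(Participant *parts, size_t n, const char *command)
{
    bool        ok = true;
    size_t      sent = 0;

    /* Steps 1 and 3: issue the command everywhere without waiting. */
    for (size_t i = 0; i < n; i++)
    {
        if (!parts[i].send(&parts[i], command))
        {
            ok = false;
            break;
        }
        sent++;
    }

    /* Steps 2 and 4: wait for replies from everyone we sent to. */
    for (size_t i = 0; i < sent; i++)
    {
        if (!parts[i].wait(&parts[i]))
            ok = false;
    }
    return ok;
}

/* Returns false if either phase failed; recovery is left to the caller. */
static bool
two_phase_commit(Participant *parts, size_t n)
{
    if (!run_phase(parts, n, "PREPARE"))    /* placeholder command */
        return false;
    return run_phase(parts, n, "COMMIT");   /* placeholder command */
}

/* Demo stand-ins: count commands in an int instead of talking to a server. */
static bool
fake_send(Participant *p, const char *command)
{
    (void) command;
    (*(int *) p->conn)++;
    return true;
}

static bool
fake_wait(Participant *p)
{
    (void) p;
    return true;
}
```

With two participants built from fake_send and fake_wait, two_phase_commit() 
returns true and each participant's counter ends at 2, one per phase.  The 
remote work overlaps because no wait happens until every send has been issued.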

But when we consider the FDW interface, plus other DBMSs, how can we achieve 
the parallelism?


> > 3. postgres_fdw cannot detect remote updates when the UDF executed on a
> > remote node updates data.
> 
> I assume that you mean the pushing the UDF down to a foreign server.
> If so, I think we can do this by improving postgres_fdw. In the current patch,
> registering and unregistering a foreign server to a group of 2PC and marking a
> foreign server as updated is FDW responsible. So perhaps if we had a way to
> tell postgres_fdw that the UDF might update the data on the foreign server,
> postgres_fdw could mark the foreign server as updated if the UDF is shippable.

Maybe we can assume that VOLATILE functions update data.  That may be an 
overreaction, though.

Another idea is to add a new value to the ReadyForQuery message in the FE/BE 
protocol.  Say, 'U' if in a transaction block that updated data.  Here we 
consider "updated" as having allocated an XID.

52.7. Message Formats
https://www.postgresql.org/docs/devel/protocol-message-formats.html
--------------------------------------------------
ReadyForQuery (B)

Byte1
Current backend transaction status indicator. Possible values are 'I' if idle 
(not in a transaction block); 'T' if in a transaction block; or 'E' if in a 
failed transaction block (queries will be rejected until block is ended).
--------------------------------------------------
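
To make the idea concrete, here is a minimal sketch of how a client could 
decode the ReadyForQuery status byte if a 'U' value were added.  'U' does not 
exist in the current protocol; it is only the value proposed above.  The 
TxStatus enum and both function names are hypothetical and are not libpq 
identifiers.  The layout checked here (message type 'Z', an Int32 length of 5, 
then the one status byte) is the documented ReadyForQuery format.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side view of the transaction status indicator. */
typedef enum
{
    TX_IDLE,                /* 'I': not in a transaction block */
    TX_IN_BLOCK,            /* 'T': in a transaction block */
    TX_IN_BLOCK_UPDATED,    /* 'U': PROPOSED ONLY, block has allocated an XID */
    TX_FAILED,              /* 'E': in a failed transaction block */
    TX_UNKNOWN              /* malformed message or unrecognized byte */
} TxStatus;

/* Decode one complete ReadyForQuery message: 'Z', Int32(5), Byte1. */
static TxStatus
decode_ready_for_query(const unsigned char *msg, size_t len)
{
    uint32_t    body_len;

    if (msg == NULL || len != 6 || msg[0] != 'Z')
        return TX_UNKNOWN;

    /* The length field is big-endian and counts itself plus the status byte. */
    body_len = ((uint32_t) msg[1] << 24) | ((uint32_t) msg[2] << 16) |
               ((uint32_t) msg[3] << 8) | (uint32_t) msg[4];
    if (body_len != 5)
        return TX_UNKNOWN;

    switch (msg[5])
    {
        case 'I':
            return TX_IDLE;
        case 'T':
            return TX_IN_BLOCK;
        case 'U':
            return TX_IN_BLOCK_UPDATED;
        case 'E':
            return TX_FAILED;
        default:
            return TX_UNKNOWN;
    }
}

/* A caller such as postgres_fdw could use this to mark a server as updated. */
static bool
remote_has_updated(TxStatus status)
{
    return status == TX_IN_BLOCK_UPDATED;
}
```

With such a value, the local node would not need to guess from function 
volatility; it could mark the foreign server as updated whenever the remote 
reports 'U' after a statement.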


Regards
Takayuki Tsunakawa
