postgres_fdw: Provide better emulation of READ COMMITTED behavior

Etsuro Fujita Thu, 05 Dec 2024 01:41:33 -0800

Hi,

postgres_fdw uses REPEATABLE READ isolation level for the remote
transaction when the local transaction has READ COMMITTED isolation
level, for the reason described in the comments for
begin_remote_xact() in connection.c:


/*
 * Start remote transaction or subtransaction, if needed.
 *
 * Note that we always use at least REPEATABLE READ in the remote session.
 * This is so that, if a query initiates multiple scans of the same or
 * different foreign tables, we will get snapshot-consistent results from
 * those scans.  A disadvantage is that we can't provide sane emulation of
 * READ COMMITTED behavior --- it would be nice if we had some other way to
 * control which remote queries share a snapshot.
 */

But as mentioned above, this causes unexpected behavior like this:

S1: CREATE TABLE t1 (c1 TEXT);
S1: CREATE FOREIGN TABLE ft1 (c1 TEXT) SERVER loopback OPTIONS
(table_name 't1');
S1: INSERT INTO ft1 VALUES ('foo');
S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED;
S2: SELECT * FROM ft1;
 c1
-----
 foo
(1 row)

Looks good, but:

S1: INSERT INTO ft1 VALUES ('bar');
S2: SELECT * FROM ft1;
 c1
-----
 foo
(1 row)

The SELECT query would be expected to retrieve not only the row
(‘foo’) but the row (‘bar’) inserted by session S1 just before, but
retrieves only the row (‘foo’).

I would like to propose a simple solution for this issue.  The idea
for the solution is that while postgres_fdw uses REPEATABLE READ for
the remote transaction as before, it refreshes the snapshot for the
transaction so that the same snapshot is shared by remote queries from
within a top-level query (or a set of queries expanded from it by
rewrite rules) received via simple/extended query protocol.  To do
this, changes I would like to make to the core and postgres_fdw code
is:

* Add IDs for iterations of the PostgresMain() loop for extensions
like postgres_fdw to know which top-level query is currently being
processed.
* Add a function pg_refresh_snapshot() to refresh the snapshot for a
REPEATABLE READ transaction.  (This function would cause phantom read,
which is not allowed to occur at REPEATABLE READ isolation level in
Postgres, but is allowed by the SQL standard.)
* Modify postgres_fdw so that when encountering a new top-level query,
it refreshes the snapshot for the remote transaction by using
pg_refresh_snapshot() before sending the first query from within the
top-level query if it is safe to do so.

Attached is a WIP patch for that.  With the patch we have:

S1: INSERT INTO ft1 VALUES ('foo');
S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED;
S2: SELECT * FROM ft1;
 c1
-----
 foo
(1 row)

S1: INSERT INTO ft1 VALUES ('bar');
S2: SELECT * FROM ft1;
 c1
-----
 foo
 bar
(2 rows)

The SELECT query retrieves both rows!

Comments welcome!  Maybe I am missing something, though.

Best regards,
Etsuro Fujita

postgres_fdw-emulate-RC-behavior-WIP.patch
Description: Binary data

postgres_fdw: Provide better emulation of READ COMMITTED behavior

Reply via email to