Hi, postgres_fdw uses REPEATABLE READ isolation level for the remote transaction when the local transaction has READ COMMITTED isolation level, for the reason described in the comments for begin_remote_xact() in connection.c:
/* * Start remote transaction or subtransaction, if needed. * * Note that we always use at least REPEATABLE READ in the remote session. * This is so that, if a query initiates multiple scans of the same or * different foreign tables, we will get snapshot-consistent results from * those scans. A disadvantage is that we can't provide sane emulation of * READ COMMITTED behavior --- it would be nice if we had some other way to * control which remote queries share a snapshot. */ But as mentioned above, this causes unexpected behavior like this: S1: CREATE TABLE t1 (c1 TEXT); S1: CREATE FOREIGN TABLE ft1 (c1 TEXT) SERVER loopback OPTIONS (table_name 't1'); S1: INSERT INTO ft1 VALUES ('foo'); S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED; S2: SELECT * FROM ft1; c1 ----- foo (1 row) Looks good, but: S1: INSERT INTO ft1 VALUES ('bar'); S2: SELECT * FROM ft1; c1 ----- foo (1 row) The SELECT query would be expected to retrieve not only the row (‘foo’) but the row (‘bar’) inserted by session S1 just before, but retrieves only the row (‘foo’). I would like to propose a simple solution for this issue. The idea for the solution is that while postgres_fdw uses REPEATABLE READ for the remote transaction as before, it refreshes the snapshot for the transaction so that the same snapshot is shared by remote queries from within a top-level query (or a set of queries expanded from it by rewrite rules) received via simple/extended query protocol. To do this, changes I would like to make to the core and postgres_fdw code is: * Add IDs for iterations of the PostgresMain() loop for extensions like postgres_fdw to know which top-level query is currently being processed. * Add a function pg_refresh_snapshot() to refresh the snapshot for a REPEATABLE READ transaction. (This function would cause phantom read, which is not allowed to occur at REPEATABLE READ isolation level in Postgres, but is allowed by the SQL standard.) * Modify postgres_fdw so that when encountering a new top-level query, it refreshes the snapshot for the remote transaction by using pg_refresh_snapshot() before sending the first query from within the top-level query if it is safe to do so. Attached is a WIP patch for that. With the patch we have: S1: INSERT INTO ft1 VALUES ('foo'); S2: START TRANSACTION ISOLATION LEVEL READ COMMITTED; S2: SELECT * FROM ft1; c1 ----- foo (1 row) S1: INSERT INTO ft1 VALUES ('bar'); S2: SELECT * FROM ft1; c1 ----- foo bar (2 rows) The SELECT query retrieves both rows! Comments welcome! Maybe I am missing something, though. Best regards, Etsuro Fujita
postgres_fdw-emulate-RC-behavior-WIP.patch
Description: Binary data