Hi, I got an offline report from my colleague Zhibai Song that close_cursor() is called for a freed PGconn, leading to a server crash. Here is a reproducer (the original reproducer he provided is a bit complex, so I simplified it):
create server loopback
foreign data wrapper postgres_fdw
options (dbname 'postgres');
create user mapping for current_user server loopback;
create table t1 (id int, data text);
create foreign table ft1 (id int, data text)
server loopback options (table_name 't1');
insert into ft1 values (1, 'foo');
start transaction;
-- This caches the remote connection's PGconn in PgFdwScanState
declare c1 cursor for select * from ft1;
fetch c1;
id | data
----+------
1 | foo
(1 row)
savepoint s1;
select * from ft1;
id | data
----+------
1 | foo
(1 row)
select pid from pg_stat_activity
where datname = 'postgres'
and application_name = 'postgres_fdw';
pid
-------
91853
(1 row)
-- This terminates the remote session
select pg_terminate_backend(91853);
pg_terminate_backend
----------------------
t
(1 row)
-- This leaves the remote connection's changing_xact_state as true
rollback to s1;
savepoint s1;
-- This calls pgfdw_reject_incomplete_xact_state_change(), freeing
-- the remote connection's PGconn as changing_xact_state is true
select * from ft1;
ERROR: connection to server "loopback" was lost
rollback to s1;
-- This calls close_cursor() on the PGconn cached in PgFdwScanState,
-- which was freed above, leading to a server crash
close c1;
I think the root cause is that it is too early to free the PGconn in
pgfdw_reject_incomplete_xact_state_change() even if the connection is
in a state where we cannot use it any further; I think we should delay
that until abort cleanup (ie, pgfdw_xact_callback()). Attached is a
patch for that.
Best regards,
Etsuro Fujita
fix-connection-handling-in-postgres-fdw.patch
Description: Binary data
