Hi!

During a debugging session I found that queries with a Parallel Seq Scan
hang on current master: the leader waits indefinitely for a signal from
the parallel workers. The query cannot be cancelled, because the leader
does not check for interrupts in the wait loop.

1. How to reproduce:
a) Create table:

CREATE DATABASE expr;
\c expr
CREATE TABLE testexpr(
id INT,
val INT
);
INSERT INTO testexpr (id, val)
SELECT serie as id , MOD(serie, 10) as val
FROM generate_series(1,1000000) as serie;
EXPLAIN (ANALYZE) SELECT * FROM testexpr
WHERE val=1 AND id<30;

b) Start a debugger for the backend of this connection.

c) Run the command (parallel workers must be enabled, as they are in the
default configuration):
EXPLAIN (ANALYZE) SELECT * FROM testexpr
WHERE val=1 AND id<30;

d) The above query starts parallel worker(s). When a worker finishes, it
sends SIGUSR1, which is caught by the debugger. When you dismiss the
signal message, the query appears to continue running, but in fact it is
waiting (in latch.c or in waiteventset.c, depending on the commit).
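
For steps b) and d), the following is a minimal sketch of one way to find
the backend to attach to and to see what the leader is waiting on. It
uses only pg_backend_pid() and the pg_stat_activity view. The leader_pid
column assumes PostgreSQL 13 or later, and <pid> is a placeholder. None of
this output was part of the original report.

```sql
-- In the session under test: PID of the backend to attach the debugger to
-- (for example with "gdb -p <pid>"; <pid> is a placeholder).
SELECT pg_backend_pid();

-- Confirm that parallel query is enabled (the default value is 2).
SHOW max_parallel_workers_per_gather;

-- From a second session, while the query appears stuck: list the leader
-- and any remaining parallel workers together with their wait events.
SELECT pid, leader_pid, backend_type, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE datname = 'expr'
  AND pid <> pg_backend_pid();
```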

2. First commit where the behaviour is reproducible.
I tracked this behaviour down to the following commit:
commit 7202d72787d3b93b692feae62ee963238580c877
Date:   Fri Feb 21 08:03:33 2025 +0100
backend launchers void * arguments for binary data
Change backend launcher functions to take void * for binary data
instead of char *.  This removes the need for numerous casts.
Discussion: https://www.postgresql.org/message-id/flat/fd1fcedb-3492-4fc8-9e3e-74b97f2db6c7%40eisentraut.org


It could be that this patch merely exposed a problem that was already
present in the system. I searched for the first commit with this problem
starting from 6 Jan 2025; two earlier commits hung in the same way, but
neither reproduced the hang when I repeated the test. Starting from the
patch above, the hang reproduces on Linux and macOS.

I am also afraid that the same behaviour will occur for other types of
parallel workers under a debugger (Parallel Hash, etc.).

--
Best regards,

Vladlen Popolitov.

