Re: PostgreSQL 15.5 stops processing user queries

2023-12-26 Thread Andrey Zhidenkov
Sorry, wrong stack trace. Here is a correct one:

(gdb) bt
#0  0x7f9acca1bdf6 in do_futex_wait.constprop () from
/lib64/libpthread.so.0
#1  0x7f9acca1bee8 in __new_sem_wait_slow.constprop.0 () from
/lib64/libpthread.so.0
#2  0x007815a2 in PGSemaphoreLock ()
#3  0x007fca6c in LWLockAcquire ()
#4  0x7f9ac5c385e6 in pgsm_store () from
/usr/pgsql-15/lib/pg_stat_monitor.so
#5  0x7f9ac5c39a6b in pgsm_ExecutorEnd () from
/usr/pgsql-15/lib/pg_stat_monitor.so
#6  0x00658d91 in PortalCleanup ()
#7  0x0095b264 in PortalDrop ()
#8  0x0080c6df in exec_simple_query ()
#9  0x0080de22 in PostgresMain ()
#10 0x0078db80 in ServerLoop ()
#11 0x0078eb34 in PostmasterMain ()
#12 0x0050474d in main ()

It’s seems to be a problem in pg_stat_monitor extension. The similar issue
is described here (in Chinese):
https://blog.csdn.net/qq_43687755/article/details/117592635

On Thu, 21 Dec 2023, 18:32 Tom Lane  wrote:

> Andrey Zhidenkov  writes:
> > We have encountered an issue with our PostgreSQL 15.5 installation. The
> > problem is that
> > PostgreSQL server periodically falls into a state when it accepts new
> > connections but
> > doesn't execute any queries. The session which runs a query hangs and
> > cannot be terminated
> > via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
> > has state "S" and it's
> > not terminated even if the PostgreSQL master process is stopped. No
> matter
> > how we connect to
> > the database - both TCP and unix socket sessions hang but it seems that
> > existing sessions can
> > still execute queries (once we succeeded to connect using pgBouncer which
> > seemed to use an
> > existing connection to the database).
>
> Try using an existing session to capture pg_stat_activity and pg_locks
> information about the stuck session(s).
>
> > Here is a stack trace from gdb for one of the frozen sessions:
>
> This appears to be a trace of a psql process waiting for a query
> result.  It won't teach you much about what the server is doing.
>
> regards, tom lane
>


PostgreSQL 15.5 stops processing user queries

2023-12-21 Thread Andrey Zhidenkov
Hello all,

We have encountered an issue with our PostgreSQL 15.5 installation. The
problem is that
PostgreSQL server periodically falls into a state when it accepts new
connections but
doesn't execute any queries. The session which runs a query hangs and
cannot be terminated
via SIGINT (even if it's just "SELECT 1") - a corresponding linux process
has state "S" and it's
not terminated even if the PostgreSQL master process is stopped. No matter
how we connect to
the database - both TCP and unix socket sessions hang but it seems that
existing sessions can
still execute queries (once we succeeded to connect using pgBouncer which
seemed to use an
existing connection to the database).

Here is a stack trace from gdb for one of the frozen sessions:

(gdb) bt 10
#0  0x7f6d31dbd378 in poll () from /lib64/libc.so.6
#1  0x7f6d3286aee1 in pqSocketCheck.part.2 () from
/usr/pgsql-15/lib/libpq.so.5
#2  0x7f6d3286b054 in pqWaitTimed () from /usr/pgsql-15/lib/libpq.so.5
#3  0x7f6d32867848 in PQgetResult () from /usr/pgsql-15/lib/libpq.so.5
#4  0x00411320 in ExecQueryAndProcessResults
(query=query@entry=0x23a68b0
"select 1;", elapsed_msec=elapsed_msec@entry=0x7ffc2b5840a8,
svpt_gone_p=svpt_gone_p@entry=0x7ffc2b5840a7,
is_watch=is_watch@entry=false, opt=opt@entry=0x0,
printQueryFout=printQueryFout@entry=0x0) at common.c:1426
#5  0x0040feb9 in SendQuery (query=0x23a68b0 "select 1;") at
common.c:1117
#6  0x0040627b in main (argc=, argv=)
at startup.c:384

We're using glibc-2.28-236.0.1.el8.7.x86_64 on this machine and PostgreSQL
15.5:

postgres=# select version();
 version

-
 PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0
20210514 (Red Hat 8.5.0-20), 64-bit
(1 row)

We've tried to recreate a cluster from scratch using a logical dump on new
hardware but it didn't
help though. And unfortunately we could not reproduce the issue, it looks
like it occurs randomly and
when it happens only PostgreSQL restart helps. Also we have number of
machines that run the
same version of PostgreSQL but we have the problem only with one cluster so
maybe it somehow
related to queries that are specific to this cluster.

We also run patroni 2.1.4 on this cluster, for reference (I'm not sure if
it can be related). We checked PostgreSQL logs,
of course - there are no any messages that could be related to the issue as
well.

We will really appreciate any help, thanks!

-- 
With best regards, Andrei Zhidenkov.