pá 22. 8. 2025 v 17:03 odesílatel hubert depesz lubaczewski <
dep...@depesz.com> napsal:

> On Fri, Aug 22, 2025 at 07:33:03AM -0700, Adrian Klaver wrote:
> > On 8/22/25 05:37, hubert depesz lubaczewski wrote:
> > > On Thu, Aug 21, 2025 at 11:17:27AM -0700, Adrian Klaver wrote:
> > > > Have you looked at?:
> > > > https://www.pgbouncer.org/changelog.html#pgbouncer-124x
> > > > To see if anything stands out.
> > > > Then there is:
> > > > https://www.pgbouncer.org/config.html#max_prepared_statements
> > > > The below may also be worth looking at:
> > > > https://github.com/pgbouncer/pgbouncer/pull/1144
> > > > I can't help thinking that there is a caching issue at stake, though
> that is
> > > > just a guess.
> > >
> > > Will check, but we don't use prepared statements, so this doesn't seem
> > > to be part-of-the-problem.
> >
> > In your OP there is:
> >
> > "Every now and then (usually every 3-5 minutes, but not through the whole
> > day), we see situations where every query suddently takes ~ 1 second.
> >
> >
> > I'm talkign about "prepare" for preparing statements. binds. also
> > "discard all".
> > "
> >
> > Is that referring to something else?
>
> No. Same problem. The thing is that it comes and goes.
>
> *BUT*
>
> I got repeatable case today. Is is breaking on its own everyy
> ~ 5 minutes.
>
> So I made test.sql with:
> \timing on
> SELECT 'SELECT now(), pg_sleep(.05);' FROM generate_series(1,100000) i
> \gexec
>
> and ran it like this:
>
> psql -p 5432 -d the_db -f test.sql -qAtX | perl -pe 's/\|\s*$/ :: /'
>
> This bypassed pgbouncer, connected via unix socket, the whole thing got
> simpler.
>
> After 4 minutes of 50ms runtimes I got:
>
> 2025-08-22 14:44:14.013541+00 :: Time: 50.090 ms
> 2025-08-22 14:44:14.063636+00 :: Time: 50.088 ms
> 2025-08-22 14:44:14.11373+00 :: Time: 50.085 ms
> 2025-08-22 14:44:14.163822+00 :: Time: 50.099 ms
> 2025-08-22 14:44:14.213927+00 :: Time: 1393.932 ms (00:01.394)
> 2025-08-22 14:44:15.607872+00 :: Time: 50.171 ms
> 2025-08-22 14:44:15.658097+00 :: Time: 50.209 ms
> 2025-08-22 14:44:15.708353+00 :: Time: 50.212 ms
> 2025-08-22 14:44:15.758589+00 :: Time: 50.170 ms
>
> This is pg_sleep. There are no locks, nothing else. I think this
> disproves idea that the problem is about pgbouncer, and it kinda looks
> like a thing in pg?
>
> Interestingly: ran strace on the pg backend that was doing this loop,
> like this:
> strace -tt -T -ff -p 1266521  -s 256 -o a.log
>
> And it showed something really strange:
>
> …
> 14:57:00.864684 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
> 14:57:00.864739 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
> 14:57:00.914823 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.86+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:00.914849 recvfrom(9,
> "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0",
> 8192, 0, NULL, NULL) = 68 <0.000007>
> 14:57:00.914891 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050064>
> 14:57:00.964978 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.91+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
> 14:57:00.965006 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000007>
>
>
>  14:57:00.965052 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050063>
> 14:57:01.015133 futex(0xffe7a6636538,
> FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 0, NULL, FUTEX_BITSET_MATCH_ANY) =
> 0 <1.354927>
> 14:57:02.370180 write(2, "\0\0\36\1YS\23\0T2025-08-22 14:57:02.370
> UTC,\"postgres\",\"the.db\",1266521,\"[local]\",68a884fb.135359,5,\"SELECT\",2025-08-22
> 14:55:55 UTC,85/0,0,LOG,00000,\"duration: 1405.106 ms  statement: SELECT
> now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);
> 14:57:02.370521 sendto(8, 
> "\2\0\0\0@\0\0\0\216A\0\0\0\0\0\0\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\255\221\33\0\0\0\0\0\262\341\6\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 64, 0, NULL, 0) = 64 <0.000041>
> 14:57:02.370639 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:00.97+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.371063 recvfrom(9,
> "Q\0\0\0CSELECT now()::timestamptz(2), pg_backend_pid(), pg_sleep(.05);\0",
> 8192, 0, NULL, NULL) = 68 <0.000019>
> 14:57:02.371215 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050090>
> 14:57:02.421381 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:02.37+00\0\0\0\0071266521\0\0\0\0C\0\0\0\
> 14:57:02.421480 recvfrom(9, 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1
> EAGAIN (Resource temporarily unavailable) <0.000008>
>
>
>  14:57:02.421507 epoll_pwait(4, [{EPOLLIN, {u32=470567512,
> u64=187651886697048}}], 1, -1, NULL, 8) = 1 <0.000037>
> 14:57:02.421570 recvfrom(9, "Q\0\0\0CSELECT now()::timestamptz(2),
> pg_backend_pid(), pg_sleep(.05);\0", 8192, 0, NULL, NULL) = 68 <0.000012>
> 14:57:02.421673 epoll_pwait(11, [], 1, 50, NULL, 8) = 0 <0.050099>
> 14:57:02.471866 sendto(9,
> "T\0\0\0X\0\3now\0\0\0\0\0\0\0\0\0\4\240\0\10\0\0\0\2\0\0pg_backend_pid\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0002\0\3\0\0\0\0312025-08-22
> 14:57:02.42+00\0\0\0\0071266521\0\0\0\0C\0\0\0\14:57:02.471961 recvfrom(9,
> 0xaaaae7b3fc08, 8192, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily
> unavailable) <0.000008>
> …
>
> As in, normal execution is: recv from, epoll_pwait that takes 0.05s, and
> sendto.
> all clear, nothing fancy.
>
> *BUT*
> in case of this one problematic thing there is futex() call that took
> 1.354s ?! Checked, and in straced ~ 800 queries this is the only futex
> call() ?!
>
> Any idea on what could it be?
>

Few years ago I had a similar experience - some sec mystic lags - the
problem was in virtualization.

Regards

Pavel



>
> Best regards,
>
> depesz
>
>
>
>

Reply via email to