On Tue, Feb 18, 2025 at 2:21 AM Michael Paquier <mich...@paquier.xyz> wrote:
>
> On Mon, Feb 17, 2025 at 11:25:05AM -0500, Tom Lane wrote:
> > This timeout failure on hachi looks suspicious as well:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-02-17%2003%3A05%3A03
> >
> > Might be relevant that they are both aarch64?
>
> Just logged into the host.  The logs of the timed out run are still
> around, and the last information I can see is from lastcommand.log,
> which seems to have frozen in time when the timeout has begun its
> vacuuming work:
> ok 73   + index_including_gist   353 ms
> # parallel group (16 tests): create_cast errors create_aggregate
> drop_if_exists infinite_recurse
>
> gokiburi is on the same host, and it is currently frozen in time when
> trying to fetch a WAL buffer.  One of the stack traces:
> #2  0x000000000084ec48 in WaitEventSetWaitBlock (set=0xd34ce0,
> cur_timeout=-1, occurred_events=0xffffffffadd8, nevents=1) at
> latch.c:1571
> #3  WaitEventSetWait (set=0xd34ce0, timeout=-1,
> occurred_events=occurred_events@entry=0xffffffffadd8,
> nevents=nevents@entry=1, wait_event_info=<optimized out>,
> wait_event_info@entry=134217781) at latch.c:1519
> #4  0x000000000084e964 in WaitLatch (latch=<optimized out>,
> wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1,
> wait_event_info=wait_event_info@entry=134217781) at latch.c:538
> #5  0x000000000085d2f8 in ConditionVariableTimedSleep
> (cv=0xffffec0799b0, timeout=-1, wait_event_info=134217781) at
> condition_variable.c:163
> #6  0x000000000085d1ec in ConditionVariableSleep
> (cv=0xfffffffffffffffc, wait_event_info=1) at condition_variable.c:98
> #7  0x000000000055f4f4 in AdvanceXLInsertBuffer
> (upto=upto@entry=112064880, tli=tli@entry=1, opportunistic=false) at
> xlog.c:2224
> #8  0x0000000000568398 in GetXLogBuffer (ptr=ptr@entry=112064880,
> tli=tli@entry=1) at xlog.c:1710
> #9  0x000000000055c650 in CopyXLogRecordToWAL (write_len=80,
> isLogSwitch=false, rdata=0xcc49b0 <hdr_rdt>, StartPos=<optimized out>,
> EndPos=<optimized out>, tli=1) at xlog.c:1245
> #10 XLogInsertRecord (rdata=rdata@entry=0xcc49b0 <hdr_rdt>,
> fpw_lsn=fpw_lsn@entry=112025520, flags=0 '\000', num_fpi=<optimized
> out>, num_fpi@entry=0, topxid_included=false) at xlog.c:928
> #11 0x000000000056b870 in XLogInsert (rmid=rmid@entry=16 '\020',
> info=<optimized out>, info@entry=16 '\020') at xloginsert.c:523
> #12 0x0000000000537acc in addLeafTuple (index=0xffffebf32950,
> state=0xffffffffd5e0, leafTuple=0xe43870, current=<optimized out>,
> parent=<optimized out>,
>
> So, yes, something looks really wrong with this patch.  Sounds
> plausible to me that some other buildfarm animals could be stuck
> without their owners knowing about it.  It's proving to be a good idea
> to force a timeout value in the configuration file of these animals..
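For context, frames #5 through #7 above show the backend parked in a
condition-variable sleep inside AdvanceXLInsertBuffer while it waits for a
WAL buffer to become available.  As a rough sketch only (the helper name and
the shared flag below are hypothetical, not the patch's actual code), the
canonical wait pattern from src/include/storage/condition_variable.h looks
like this:

/*
 * Minimal sketch of the backend's condition-variable wait pattern.
 * The helper name and the shared "buffer_ready" flag are hypothetical,
 * used only to illustrate the kind of loop the stack trace is sleeping in.
 */
#include "postgres.h"
#include "storage/condition_variable.h"

static void
wait_for_wal_buffer(ConditionVariable *cv, volatile bool *buffer_ready)
{
    ConditionVariablePrepareToSleep(cv);
    while (!*buffer_ready)
        ConditionVariableSleep(cv, 0);  /* 0 stands in for a real wait_event_info */
    ConditionVariableCancelSleep();
}

A waiter like this only resumes once another process makes the condition
true and calls ConditionVariableBroadcast() (or ConditionVariableSignal());
if that wakeup is skipped or the condition never becomes true, the backend
sleeps indefinitely, which matches the frozen runs described above.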
Tom, Michael, thank you for the information.  This patch will be tested
more thoroughly before the next attempt.

------
Regards,
Alexander Korotkov
Supabase