On Mon, Feb 17, 2025 at 11:25:05AM -0500, Tom Lane wrote:
> This timeout failure on hachi looks suspicious as well:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-02-17%2003%3A05%3A03
>
> Might be relevant that they are both aarch64?

Just logged into the host. The logs of the timed out run are still around, and the last information I can see is from lastcommand.log, which seems to have frozen in time when the timeout began its vacuuming work:
ok 73 + index_including_gist 353 ms
# parallel group (16 tests): create_cast errors create_aggregate drop_if_exists infinite_recurse

gokiburi is on the same host, and it is currently frozen in time when trying to fetch a WAL buffer. One of the stack traces:
#2 0x000000000084ec48 in WaitEventSetWaitBlock (set=0xd34ce0, cur_timeout=-1, occurred_events=0xffffffffadd8, nevents=1) at latch.c:1571
#3 WaitEventSetWait (set=0xd34ce0, timeout=-1, occurred_events=occurred_events@entry=0xffffffffadd8, nevents=nevents@entry=1, wait_event_info=<optimized out>, wait_event_info@entry=134217781) at latch.c:1519
#4 0x000000000084e964 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1, wait_event_info=wait_event_info@entry=134217781) at latch.c:538
#5 0x000000000085d2f8 in ConditionVariableTimedSleep (cv=0xffffec0799b0, timeout=-1, wait_event_info=134217781) at condition_variable.c:163
#6 0x000000000085d1ec in ConditionVariableSleep (cv=0xfffffffffffffffc, wait_event_info=1) at condition_variable.c:98
#7 0x000000000055f4f4 in AdvanceXLInsertBuffer (upto=upto@entry=112064880, tli=tli@entry=1, opportunistic=false) at xlog.c:2224
#8 0x0000000000568398 in GetXLogBuffer (ptr=ptr@entry=112064880, tli=tli@entry=1) at xlog.c:1710
#9 0x000000000055c650 in CopyXLogRecordToWAL (write_len=80, isLogSwitch=false, rdata=0xcc49b0 <hdr_rdt>, StartPos=<optimized out>, EndPos=<optimized out>, tli=1) at xlog.c:1245
#10 XLogInsertRecord (rdata=rdata@entry=0xcc49b0 <hdr_rdt>, fpw_lsn=fpw_lsn@entry=112025520, flags=0 '\000', num_fpi=<optimized out>, num_fpi@entry=0, topxid_included=false) at xlog.c:928
#11 0x000000000056b870 in XLogInsert (rmid=rmid@entry=16 '\020', info=<optimized out>, info@entry=16 '\020') at xloginsert.c:523
#12 0x0000000000537acc in addLeafTuple (index=0xffffebf32950, state=0xffffffffd5e0, leafTuple=0xe43870, current=<optimized out>, parent=<optimized out>,

So, yes, something looks really wrong with this patch. Sounds plausible to me that some other buildfarm animals could be stuck without their owners knowing about it. It's proving to be a good idea to force a timeout value in the configuration file of these animals.
--
Michael