On Mon, Feb 17, 2025 at 11:25:05AM -0500, Tom Lane wrote:
> This timeout failure on hachi looks suspicious as well:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hachi&dt=2025-02-17%2003%3A05%3A03
> 
> Might be relevant that they are both aarch64?

I just logged into the host.  The logs of the timed-out run are still
around, and the last information I can see is in lastcommand.log,
which seems to have frozen at the point where the timeout kicked in
and began its cleanup work:
ok 73        + index_including_gist 353 ms
# parallel group (16 tests):  create_cast errors create_aggregate drop_if_exists infinite_recurse

gokiburi is on the same host, and it is currently stuck while trying
to fetch a WAL buffer.  One of the stack traces:
#2  0x000000000084ec48 in WaitEventSetWaitBlock (set=0xd34ce0,
cur_timeout=-1, occurred_events=0xffffffffadd8, nevents=1) at
latch.c:1571
#3  WaitEventSetWait (set=0xd34ce0, timeout=-1,
occurred_events=occurred_events@entry=0xffffffffadd8,
nevents=nevents@entry=1, wait_event_info=<optimized out>,
wait_event_info@entry=134217781) at latch.c:1519
#4  0x000000000084e964 in WaitLatch (latch=<optimized out>,
wakeEvents=wakeEvents@entry=33, timeout=timeout@entry=-1,
wait_event_info=wait_event_info@entry=134217781)     at latch.c:538
#5  0x000000000085d2f8 in ConditionVariableTimedSleep
(cv=0xffffec0799b0, timeout=-1, wait_event_info=134217781) at
condition_variable.c:163
#6  0x000000000085d1ec in ConditionVariableSleep
(cv=0xfffffffffffffffc, wait_event_info=1) at condition_variable.c:98
#7  0x000000000055f4f4 in AdvanceXLInsertBuffer
(upto=upto@entry=112064880, tli=tli@entry=1, opportunistic=false) at
xlog.c:2224
#8  0x0000000000568398 in GetXLogBuffer (ptr=ptr@entry=112064880,
tli=tli@entry=1) at xlog.c:1710
#9  0x000000000055c650 in CopyXLogRecordToWAL (write_len=80,
isLogSwitch=false, rdata=0xcc49b0 <hdr_rdt>, StartPos=<optimized out>,
EndPos=<optimized out>, tli=1)     at xlog.c:1245
#10 XLogInsertRecord (rdata=rdata@entry=0xcc49b0 <hdr_rdt>,
fpw_lsn=fpw_lsn@entry=112025520, flags=0 '\000', num_fpi=<optimized
out>, num_fpi@entry=0,      topxid_included=false) at xlog.c:928
#11 0x000000000056b870 in XLogInsert (rmid=rmid@entry=16 '\020',
info=<optimized out>, info@entry=16 '\020') at xloginsert.c:523
#12 0x0000000000537acc in addLeafTuple (index=0xffffebf32950,
state=0xffffffffd5e0, leafTuple=0xe43870, current=<optimized out>,
parent=<optimized out>,  
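
As a side note for reading the trace, wait_event_info=134217781 in
frames #3 to #5 can be decoded by hand.  Here is a minimal sketch,
assuming the layout used by PostgreSQL's wait event code (wait class
in the top byte, event number within the class in the low 16 bits).
The function name and the table are illustrative only, and the table
lists just the class constants I am sure about.  The name behind a
given event number depends on the branch, so it is left numeric:

```python
# Assumed layout: class = info & 0xFF000000, event number = info & 0xFFFF.
# This table is a hand-copied subset of the PG_WAIT_* class constants.
PG_WAIT_CLASSES = {
    0x01000000: "LWLock",
    0x03000000: "Lock",
    0x04000000: "BufferPin",
    0x05000000: "Activity",
    0x06000000: "Client",
    0x07000000: "Extension",
    0x08000000: "IPC",
    0x09000000: "Timeout",
    0x0A000000: "IO",
}


def decode_wait_event_info(info):
    """Split a raw wait_event_info value into (class name, event number)."""
    class_id = info & 0xFF000000
    event_id = info & 0x0000FFFF
    # Classes missing from the table are reported by their hex value.
    class_name = PG_WAIT_CLASSES.get(class_id, "unknown(0x%08X)" % class_id)
    return class_name, event_id


if __name__ == "__main__":
    # Value taken from frames #3 to #5 of the backtrace.
    print(decode_wait_event_info(134217781))  # ('IPC', 53)
```

So the backend is sleeping on an IPC-class wait event, which is
consistent with the ConditionVariableSleep() call in frame #6.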

So, yes, something looks really wrong with this patch.  It sounds
plausible to me that some other buildfarm animals could be stuck
without their owners knowing about it.  It's proving to be a good idea
to force a timeout value in the configuration file of these animals.
--
Michael
