On 2021/11/17 14:39, Andrey Borodin wrote:
Hi hackers!

$subj was recently observed on one of our installations.

Startup process backtrace
#0  0x00007fd216660d27 in epoll_wait (epfd=525, events=0x55c688dfbde8, 
maxevents=maxevents@entry=1, timeout=timeout@entry=-1)
#1  0x000055c687264be9 in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7ffdf8089f00, cur_timeout=-1, set=0x55c688dfbd78)
#2  WaitEventSetWait (set=set@entry=0x55c688dfbd78, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7ffdf8089f00, nevents=nevents@entry=1, 
wait_event_info=wait_event_info@entry=67108864) at 
/build/../src/backend/storage/ipc/latch.c:1000
#3  0x000055c687265038 in WaitLatchOrSocket (latch=0x7fd1fa735454, 
wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=-1, timeout@entry=0, 
wait_event_info=wait_event_info@entry=67108864)
#4  0x000055c6872650f5 in WaitLatch (latch=<optimized out>, 
wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=0, 
wait_event_info=wait_event_info@entry=67108864
#5  0x000055c687276399 in ProcWaitForSignal 
(wait_event_info=wait_event_info@entry=67108864)
#6  0x000055c68726c898 in ResolveRecoveryConflictWithBufferPin ()
#7  0x000055c6872582c5 in LockBufferForCleanup (buffer=292159)
#8  0x000055c687259447 in ReadBuffer_common (smgr=0x55c688deae40, 
relpersistence=relpersistence@entry=112 'p', forkNum=forkNum@entry=MAIN_FORKNUM, 
blockNum=blockNum@entry=3751242, mode=mode@entry=RBM_ZERO_AND_CLEANUP_LOCK, 
strategy=strategy@entry=0x0, hit=0x7ffdf808a117 "\001")
#9  0x000055c687259b6b in ReadBufferWithoutRelcache (rnode=..., 
forkNum=forkNum@entry=MAIN_FORKNUM, blockNum=blockNum@entry=3751242, 
mode=mode@entry=RBM_ZERO_AND_CLEANUP_LOCK, strategy=strategy@entry=0x0)
#10 0x000055c68705655f in XLogReadBufferExtended (rnode=..., 
forknum=MAIN_FORKNUM, blkno=3751242, mode=RBM_ZERO_AND_CLEANUP_LOCK)
#11 0x000055c687056706 in XLogReadBufferForRedoExtended 
(record=record@entry=0x55c688dd2378, block_id=block_id@entry=0 '\000', 
mode=mode@entry=RBM_NORMAL, get_cleanup_lock=get_cleanup_lock@entry=1 '\001', 
buf=buf@entry=0x7ffdf808a218)
#12 0x000055c68700728b in heap_xlog_clean (record=0x55c688dd2378)
#13 heap2_redo (record=0x55c688dd2378)
#14 0x000055c68704a7eb in StartupXLOG ()

Backend holding the buffer pin:

#0  0x00007fd216660d27 in epoll_wait (epfd=5, events=0x55c688d67ca8, 
maxevents=maxevents@entry=1, timeout=timeout@entry=-1)
#1  0x000055c687264be9 in WaitEventSetWaitBlock (nevents=1, 
occurred_events=0x7ffdf808e070, cur_timeout=-1, set=0x55c688d67c38)
#2  WaitEventSetWait (set=0x55c688d67c38, timeout=timeout@entry=-1, 
occurred_events=occurred_events@entry=0x7ffdf808e070, nevents=nevents@entry=1, 
wait_event_info=wait_event_info@entry=100663297)
#3  0x000055c687185d7e in secure_write (port=0x55c688db4a60, 
ptr=ptr@entry=0x55c688dd338e, len=len@entry=2666)
#4  0x000055c687190f4c in internal_flush ()
#5  0x000055c68719118a in internal_putbytes (s=0x55c68f8dcc35 "", s@entry=0x55c68f8dcc08 
"", len=65)
#6  0x000055c687191262 in socket_putmessage (msgtype=<optimized out>, s=0x55c68f8dcc08 
"", len=<optimized out>)
#7  0x000055c687193431 in pq_endmessage (buf=buf@entry=0x7ffdf808e1a0)
#8  0x000055c686fd1442 in printtup (slot=0x55c6894c2dc0, self=0x55c689326b40)
#9  0x000055c687151962 in ExecutePlan (execute_once=<optimized out>, dest=0x55c689326b40, 
direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, 
operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55c689281a28, 
estate=0x55c6892816f8)
#10 standard_ExecutorRun (queryDesc=0x55c689289628, direction=<optimized out>, 
count=0, execute_once=<optimized out>)
#11 0x00007fd2074100c5 in pgss_ExecutorRun (queryDesc=0x55c689289628, 
direction=ForwardScanDirection, count=0, execute_once=<optimized out>)
#12 0x000055c68728a356 in PortalRunSelect (portal=portal@entry=0x55c688d858b8, 
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, 
dest=dest@entry=0x55c689326b40)
#13 0x000055c68728b988 in PortalRun (portal=portal@entry=0x55c688d858b8, 
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', 
run_once=run_once@entry=1 '\001', dest=dest@entry=0x55c689326b40, 
altdest=altdest@entry=0x55c689326b40, completionTag=0x7ffdf808e580 "")
#14 0x000055c687287425 in exec_simple_query (query_string=0x55c688ec5e38 
"select\n  '/capacity/created-at-counter-by-time-v2' as sensor,\n  
round(extract(epoch from shipment_date))
#15 0x000055c687289418 in PostgresMain (argc=<optimized out>, 
argv=argv@entry=0x55c688dd3e28, dbname=<optimized out>, username=<optimized out>)


I think the problem here is that secure_write() uses infinite timeout.

Is this the same issue as one reported at [1]?

[1] 
https://www.postgresql.org/message-id/adce2c09-3bfc-4666-997a-c21991cb1eb1.mengjuan....@alibaba-inc.com

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION


Reply via email to