Alexander Korotkov <aekorot...@gmail.com> writes:
> I think this indicates unfinished intention to wait for checkpoint
> completion.  But I think both cases (checkpoint finished and
> unfinished) should work correctly.  So, I believe there is a backend
> problem.  I'm trying to reproduce this locally.  Sorry for the
> confusion.

I've found it not hugely difficult to reproduce: on two different
Mac M4 machines, repeated runs of 046_checkpoint_logical_slot.pl
hung up after 30 or so tries.  Backtrace of the stuck query
looks like

 (lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00000001836b01c8 libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x000000018358c6f4 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x000000010070d358 postgres`pg_usleep(microsec=1000) at 
pgsleep.c:50:10 [opt]
    frame #3: 0x00000001002a0508 
postgres`read_local_xlog_page_guts(state=0x0000000126057298, 
targetPagePtr=687882240, reqLen=277, targetRecPtr=<unavailable>, 
cur_page="\U00000018\xd1\U00000005", wait_for_wal=true) at xlogutils.c:950:4 
[opt]
    frame #4: 0x0000000100298f0c 
postgres`ReadPageInternal(state=0x0000000126057298, pageptr=687882240, 
reqLen=277) at xlogreader.c:1061:12 [opt]
    frame #5: 0x0000000100298300 postgres`XLogReadAhead [inlined] 
XLogDecodeNextRecord(state=0x0000000126057298, nonblocking=<unavailable>) at 
xlogreader.c:727:14 [opt]
    frame #6: 0x00000001002980b4 
postgres`XLogReadAhead(state=0x0000000126057298, nonblocking=false) at 
xlogreader.c:974:11 [opt]
    frame #7: 0x0000000100297fd4 
postgres`XLogReadRecord(state=0x0000000126057298, errormsg=0x000000016fc24d08) 
at xlogreader.c:405:3 [opt]
    frame #8: 0x00000001004ee514 
postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true, 
binary=<unavailable>) at logicalfuncs.c:257:13 [opt]
    frame #9: 0x00000001004ee1e8 
postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at 
logicalfuncs.c:333:9 [opt]
    frame #10: 0x00000001003a77ec 
postgres`ExecMakeTableFunctionResult(setexpr=0x000000012603da40, 
econtext=<unavailable>, argContext=<unavailable>, 
expectedDesc=0x00000001260472b0, randomAccess=false) at execSRF.c:234:13 [opt]
    frame #11: 0x00000001003ba02c 
postgres`FunctionNext(node=0x000000012603d6d0) at nodeFunctionscan.c:94:5 [opt]
    frame #12: 0x00000001003b4894 postgres`fetch_input_tuple [inlined] 
ExecProcNode(node=0x000000012603d6d0) at executor.h:313:9 [opt]
    frame #13: 0x00000001003b4878 
postgres`fetch_input_tuple(aggstate=0x000000012603cf28) at nodeAgg.c:563:10 
[opt]
    frame #14: 0x00000001003b2eb4 postgres`ExecAgg [inlined] 
agg_retrieve_direct(aggstate=0x000000012603cf28) at nodeAgg.c:2450:17 [opt]
    frame #15: 0x00000001003b2ab8 postgres`ExecAgg(pstate=0x000000012603cf28) 
at nodeAgg.c:2265:14 [opt]

so apparently it's waiting patiently for WAL that will never appear.
(I do see the 'checkpoint complete' log message before the shutdown
in this particular run.)  I still have this debugger session open
if there's any particular data you'd like me to look at.

                        regards, tom lane


Reply via email to