Hi hackers,

I was stress-testing master (commit e2b35735b00, assertions enabled) with a
workload that does a lot of DDL/DML, including creating and dropping
databases in a tight loop, and the autovacuum launcher kept crashing on me --
every 15-40 minutes or so once it was under load:

  TRAP: failed Assert("pgstat_tracks_io_op(MyBackendType, io_object,
        io_context, io_op)"), File: "pgstat_io.c", Line: 74
  LOG:  autovacuum launcher process (PID ...) was terminated by signal 6:
        Aborted

The postmaster recovers fine, but it just starts another launcher that hits
the exact same assert, so it never really gets out of the loop.

The short version: the launcher is in get_database_list(), doing its seqscan
of pg_database, and on-access pruning kicks in during the scan. Since
b46e1e54d07 ("Allow on-access pruning to set pages all-visible"),
heap_page_prune_opt() pins the visibility map unconditionally once it decides
to prune -- before it ever checks rel_read_only. visibilitymap_pin() isn't
read-only though: if the VM page isn't there yet it extends the fork, and
pg_database has no VM fork, so we end up doing an actual relation extend
(IOOP_EXTEND) from the launcher. pgstat_tracks_io_op() says the launcher
must never do an EXTEND, hence the assertion.

What surprised me is that the launcher's catalog scan isn't even flagged
read-only (table_beginscan_catalog doesn't set SO_HINT_REL_READ_ONLY),
so it never actually intends to set the VM -- it just pins/extends it
anyway.
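
To make the ordering concrete, here is a toy C model. It is not PostgreSQL
code and not a proposed patch: ToyRel, ToyBackendType and every toy_*
function are made up for illustration. It assumes only what is described
above: a pin extends the fork when the VM page is missing, and the launcher
is not allowed to extend.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these names exist in PostgreSQL. */
typedef enum { TOY_BACKEND, TOY_AUTOVAC_LAUNCHER } ToyBackendType;

typedef struct
{
	bool	vm_page_exists;	/* does the VM fork already cover the page? */
	int		extends;		/* number of fork extensions performed */
} ToyRel;

/* Models the rule described above: the launcher must never extend. */
static bool
toy_tracks_extend(ToyBackendType type)
{
	return type != TOY_AUTOVAC_LAUNCHER;
}

/* Models a pin that extends the fork when the page is missing. */
static void
toy_vm_pin(ToyRel *rel)
{
	if (!rel->vm_page_exists)
	{
		rel->vm_page_exists = true;
		rel->extends++;
	}
}

/*
 * pin_unconditionally = true models the ordering described in this mail;
 * false models looking at the read-only hint before pinning.
 * Returns false if the call did I/O that the backend type may not do.
 */
static bool
toy_prune_opt(ToyBackendType type, ToyRel *rel,
			  bool rel_read_only, bool pin_unconditionally)
{
	int		before = rel->extends;

	if (pin_unconditionally || rel_read_only)
		toy_vm_pin(rel);

	if (rel->extends > before)
		return toy_tracks_extend(type);
	return true;
}
```

With a missing VM page and rel_read_only = false, the launcher case fails
only when pin_unconditionally is true; with the hint checked first, no
extend happens at all. A regular backend passes either way.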

Here are the relevant frames:

  #3  ExceptionalCondition ("pgstat_tracks_io_op(...)", "pgstat_io.c", 74)
          at assert.c:65
  #4  pgstat_count_io_op (io_object=IOOBJECT_RELATION,
          io_context=IOCONTEXT_NORMAL, io_op=IOOP_EXTEND, cnt=1, bytes=8192)
          at pgstat_io.c:74
  #5  pgstat_count_io_op_time (...) at pgstat_io.c:160
          at bufmgr.c:3030
  #7  ExtendBufferedRelCommon (... fork=VISIBILITYMAP_FORKNUM ...)
          at bufmgr.c:2774
  #8  ExtendBufferedRelTo (... fork=VISIBILITYMAP_FORKNUM, extend_to=1 ...)
          at bufmgr.c:1099
  #9  vm_extend (vm_nblocks=1, ...) at visibilitymap.c:614
  #10 vm_readbuf (blkno=0, extend=true) at visibilitymap.c:572
  #11 visibilitymap_pin (...) at visibilitymap.c:216
  #12 heap_page_prune_opt (..., rel_read_only=...) at pruneheap.c:339
  #13 heap_prepare_pagescan (...) at heapam.c:638
  #14 heapgettup_pagemode (... ForwardScanDirection ...) at heapam.c:1113
  #15 heap_getnext (...) at heapam.c:1454
  #16 get_database_list () at autovacuum.c:1856
  #17 do_start_worker () at autovacuum.c:1172
  #19 launch_worker (...) at autovacuum.c:1355
  #20 AutoVacLauncherMain (...) at autovacuum.c:780
  #21 postmaster_child_launch (child_type=B_AUTOVAC_LAUNCHER, ...)
          at launch_backend.c:268
  #22 StartChildProcess (type=B_AUTOVAC_LAUNCHER) at postmaster.c:4030
  #23 LaunchMissingBackgroundProcesses () at postmaster.c:3375
  #24 ServerLoop () at postmaster.c:1743
  #25 PostmasterMain (...) at postmaster.c:1415
  #26 main (...) at main.c:231

I haven't been able to boil this down to a clean standalone repro yet -- it
seems to need the launcher to hit get_database_list() at the moment a
pg_database page is prunable and the VM fork still has to grow -- but the path
looks pretty clear from the stack.

Regards,
Ewan
