> 1. src/backend/access/nbtree/nbtinsert.c, line 867: shouldn't this > END_CRIT_SECTION be moved up to before the _bt_wrtbuf call? It seems > to me that an elog during the wrtbuf is not a critical failure. If > this code is correct, then all the other crit sections are wrong, > because all of them release the crit section before writing buffers, > not after. Ok to move it up. > 2. src/backend/commands/vacuum.c, line 1907: does this > START_CRIT_SECTION really have to be here, and not down at line 1935, > just before PageRepairFragmentation()? I really don't like the idea > of turning those elogs that are inside the loop into reasons to force a > system-wide restart. Agreed. > 3. src/backend/access/transam/xlog.c, routine CreateCheckPoint: > does this *entire* routine need to be a critical section? Again, > I fear a shotgun approach will mean a net decrease in reliability, > not an improvement. How much of this code really has to be critical? When postmaster has to create Checkpoint this routine is called from bootstrap.c:BootstrapMain() - ie without normal initialization, so I don't know result of elog(ERROR) in this case -:( Probably I should spend more time in this area in attempt to make Checkpoints rollback-able, but ... Anyway it seems that the real source of elog(ERROR) there is FlushBufferPool(). Maybe we could just initialize Warn_restart in that point? All other places are mostly related to WAL itself - they are more critical and less likely subject to elog(ERROR). > Do you really want a failure in, say, MoveOfflineLogs to take down the > whole database? Well, this one could be changed at least (ie STOP --> LOG). Vadim