On Thu, May 23, 2019 at 2:43 AM Michael Paquier <mich...@paquier.xyz> wrote:
> On Tue, May 21, 2019 at 08:39:18AM -0400, Robert Haas wrote:
> > Yes. I thought I had described it. You create an unlogged table,
> > with an index of a type that does not smgrimmedsync(), your
> > transaction commits, and then the system crashes, losing the _init
> > fork for the index.
>
> The init forks won't magically go away, except in one case for empty
> routines not going through shared buffers.

No magic is required. If you haven't called fsync(), the file might
not be there after a system crash. Going through shared_buffers
guarantees that the file will be fsync()'d before the next checkpoint,
but I'm talking about a scenario where you crash before the next
checkpoint.

> Then, empty routines going through shared buffers fill in one or more
> buffers, mark it/them as empty, dirty it/them, log the page(s) and then
> unlock the buffer(s). If a crash happens after the transaction
> commits, we would still have the init page in WAL, and at the end
> of recovery we would know about it.

Yeah, but the problem is that the current system requires us to know
about it at the *beginning* of recovery. See my earlier remarks:

Suppose we create an unlogged table and then crash. The main fork
makes it to disk, and the init fork does not. Before WAL replay, we
remove any main forks that have init forks, but because the init fork
was lost, that does not happen. Recovery recreates the init fork.
After WAL replay, we try to copy_file() each _init fork to the
corresponding main fork. That fails, because copy_file() expects to be
able to create the target file, and here it can't do that because it
already exists.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
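
The sequence described in the message above can be modelled with a small file-level simulation. This is only an illustrative sketch, not PostgreSQL code: the function names (`reset_before_replay`, `reset_after_replay`, `copy_file_exclusive`, `simulate_crash_recovery`), the file name `16384`, and the file contents are all hypothetical, and the exclusive-create copy is an assumption standing in for the statement that copy_file() expects to be able to create the target file.

```python
import os
import tempfile


def copy_file_exclusive(src, dst):
    """Copy src to dst, failing if dst already exists.

    Assumption: models "copy_file() expects to be able to create the
    target file" by using an exclusive create.
    """
    with open(src, "rb") as fin:
        data = fin.read()
    # O_EXCL makes the open fail with FileExistsError if dst is present.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as fout:
        fout.write(data)


def reset_before_replay(datadir):
    """Before WAL replay: remove any main fork that has an _init fork."""
    for name in os.listdir(datadir):
        if name.endswith("_init"):
            main = os.path.join(datadir, name[: -len("_init")])
            if os.path.exists(main):
                os.remove(main)


def reset_after_replay(datadir):
    """After WAL replay: copy each _init fork to its main fork."""
    for name in sorted(os.listdir(datadir)):
        if name.endswith("_init"):
            src = os.path.join(datadir, name)
            dst = os.path.join(datadir, name[: -len("_init")])
            copy_file_exclusive(src, dst)


def simulate_crash_recovery(init_fork_survived):
    """Run the two reset passes around a simulated WAL replay."""
    with tempfile.TemporaryDirectory() as datadir:
        main = os.path.join(datadir, "16384")  # hypothetical file name
        init = main + "_init"

        # State on disk right after the crash: the main fork made it,
        # the init fork may or may not have.
        with open(main, "wb") as f:
            f.write(b"stale main fork")
        if init_fork_survived:
            with open(init, "wb") as f:
                f.write(b"empty index")

        reset_before_replay(datadir)

        # WAL replay recreates the init fork from the logged page.
        with open(init, "wb") as f:
            f.write(b"empty index")

        try:
            reset_after_replay(datadir)
        except FileExistsError:
            return "copy failed: main fork already exists"
        with open(main, "rb") as f:
            return "main fork reset to: " + f.read().decode()
```

Calling `simulate_crash_recovery(True)` exercises the normal path, where the first pass removes the main fork and the copy succeeds. Calling `simulate_crash_recovery(False)` exercises the case from the message: the first pass finds no init fork, leaves the main fork in place, and the later copy cannot create its target.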