On Thu, Feb 17, 2022 at 4:13 PM Andres Freund <and...@anarazel.de> wrote: > Could you or Dilip outline how it now works, and what exactly makes it safe > etc (e.g. around locking, invalidation processing, snapshots, xid horizons)? > > I just scrolled through the patchset without finding such an explanation, so > it's a bit hard to judge.
That's a good question and it's making me think about a few things I hadn't considered before. Dilip can add more here, but my impression is that most problems are prevented by CREATE DATABASE, with or without this patch, starts by acquiring a ShareLock on the database, preventing new connections, and then making sure there are no existing connections. That means nothing in the target database can be changing, which I think makes a lot of the stuff on your list a non-issue. Any problems that remain have to be the result of something that CREATE DATABASE does having a bad interaction either with something that is completed beforehand or something that begins afterward. There just can't be overlap, and I think that rules out most problems. Now you pointed out earlier one problem that it doesn't fix: unlike the current method, this method involves reading buffers from the template database into shared_buffers, and those buffers, once read, stick around even after the operation finishes. That's not an intrinsic problem, though, because a connection to the database could do the same thing. However, again as you pointed out, it is a problem, though, if we do it with less locking than a real database connection would have done. It seems to me that if there are other problems here, they have to be basically of the same sort: they have to leave the system in some state which is otherwise impossible. Do you see some other kind of hazard, or more examples of how that could happen? I think the leftover buffers in shared_buffers have to be basically the only thing, because apart from that, how is this any different than a file copy? The only other kind of hazard I can think of is: could it be unsafe to try to interpret the contents of a database to which no one else is connected at present due to any of the issues you mention? But what's the hazard exactly? It can't be a problem if we've failed to process sinval messages for the target database, because we're not using any caches anyway. We can't. It can't be unsafe to test visibility of XIDs for that database, because in an alternate universe some backend could have connected to that database and seen the same XIDs. One thing we COULD be doing wrong is using the wrong snapshot to test the visibility of XIDs. The patch uses GetActiveSnapshot(), and I'm thinking that is probably wrong. Shouldn't it be GetLatestSnapshot()? And do we need to worry about snapshots being database-specific? Maybe so. > > But if it is the latter then there's really no point to that kind of cleanup > > work and we should probably just give up now. > > This thread is long. Could you summarize what lead you to consider other > approaches (e.g. looking in the filesystem for relfilenodes) as not feasible / > too ugly / ...? I don't think it's infeasible to look at the filesystem for files and just copy whatever files we find. It's a plausible alternate design. I just don't like it as well. I think that relying on the filesystem contents to tell us what's going on is kind of hacky. The only technical issue I see there is that the WAL logging might require more kludgery, since that mechanism is kind of intertwined with shared_buffers. You'd have to get the right block references into the WAL record, and you have to make sure that checkpoints don't move the redo pointer at an inopportune moment. -- Robert Haas EDB: http://www.enterprisedb.com