On Fri, Nov 24, 2023 at 11:00 AM Les <nagy...@gmail.com> wrote: [snip]
> Writing of WAL files continued after we shut down all clients, and
> restarted the primary PostgreSQL server.
>
> The order was:
>
> 1. shut down all clients
> 2. stop the primary
> 3. start the primary
> 4. primary started to write like mad again
> 5. removed replication slot
> 6. primary stopped madness and deleted all WAL files (except for a few)
>
> How can the primary server generate more and more WAL files (writes) after
> all clients have been shut down and the server was restarted? My only bet
> was the autovacuum. But I ruled that out, because removing a replication
> slot has no effect on the autovacuum (am I wrong?). Now you are saying that
> this looks like a huge rollback. Does rolling back changes require even
> more data to be written to the WAL after server restart? As far as I know,
> if something was not written to the WAL, then it is not something that can
> be rolled back. Does removing a replication slot lessen the amount of data
> needed to be written for a rollback (or for anything else)? It is a fact
> that the primary stopped writing at 1.5GB/sec the moment we removed the
> slot.
>
> I'm not saying that you are wrong. Maybe there was a
> crazy application. I'm just saying that a crazy application cannot be the
> whole picture. It cannot explain this behaviour as a whole. Or maybe I have
> a deep misunderstanding about how WAL files work. On the second occasion,
> the primary was running for a few minutes when pg_wal started to increase.
> We noticed that early, and shut down all clients, then restarted the
> primary server. After the restart, the primary was writing out more WAL
> files for many more minutes, until we dropped the slot again. E.g. it was
> writing much more data after the restart than before the restart; and it
> only stopped (exactly) when we removed the slot.

pg_stat_activity will tell you something about what's happening even after
you think "all clients have been shut down". I'd crank up the logging to at
least:

log_error_verbosity = verbose
log_statement = all
track_activity_query_size = 10240
client_min_messages = notice
log_line_prefix = '%m\t%r\t%u\t%d\t%p\t%i\t%a\t%e\t'
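
For example, a query along these lines (a minimal sketch; the column set
matches recent pg_stat_activity versions, so adjust for your release) would
show every backend the server still sees -- autovacuum workers, walsenders,
background workers -- so you can verify that "all clients are down" actually
holds:

    -- list all server processes still active after the client shutdown
    SELECT pid,
           backend_type,
           usename,
           application_name,
           client_addr,
           state,
           wait_event_type,
           wait_event,
           query
    FROM pg_stat_activity
    ORDER BY backend_type, pid;

If anything other than the expected background processes shows up there while
the WAL is still being written, that's your writer.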