On Thu, Feb 23, 2017 at 9:40 PM, Magnus Hagander <mag...@hagander.net> wrote:
> I'm not sure this logic belongs in pg_receivexlog. If we put the decision
> making there, then we lock ourselves into one "type of policy".
That's not really true. We can add other policies - or extensibility - later. A more accurate statement, ISTM, would be that initially we only support one type of policy. But that's fine; more can be added later.

> Wouldn't this one, along with some other scenarios, be better provided by
> the "run command at end of segment" function that we've talked about before?
> And then that external command could implement whatever aging logic would be
> appropriate for the environment?

I don't think it's bad to have that, but I don't understand the resistance to having a policy that by default lets us keep as much WAL as will fit within our space budget. That seems like an eminently sensible thing to want. Ideally, I'd like to be able to recover any backup, however old, from that point forward to a point of my choosing. But if I run out of disk space, removing the oldest WAL files I have is more sensible than not accepting new ones. Sure, I'll have less ability to go back in time, but I'm less likely to need the data from 3 or 4 backups ago than I am to need the most recent data. I'm only going to go back to an older backup if I can't recover from the most recent one, or if I need some data that was removed or corrupted some time ago. It's good to have that ability for as long as it is sustainable, but when I have to pick, I want the new stuff.

I think we're actually desperately in need of smarter WAL management tools in core, not just in this respect but in a whole bunch of places, and I think size is an excellent thing for those tools to be considering. When Heikki implemented min_wal_size and max_wal_size (88e982302684246e8af785e78a467ac37c76dee9, February 2015) there was quite a bit of discussion about how nice it would be to have a HARD limit on WAL size rather than a soft limit. When the system gets too close to the hard limit, processes trying to write WAL would slow or stop until a checkpoint can be completed, allowing for the removal of WAL.
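For concreteness, the space-budget policy I'm describing above - keep everything until the budget is exceeded, then trim oldest-first rather than refuse new WAL - could be sketched roughly like this (a standalone illustration, not pg_receivexlog code; the helper name is made up, and it ignores timeline history files and .partial segments):

```python
import os

def trim_wal_archive(archive_dir, budget_bytes):
    """Delete oldest WAL segments until the archive fits within budget_bytes.

    For a single timeline, WAL segment file names (24 hex digits) sort
    lexicographically in LSN order, so the smallest name is the oldest
    segment.  Returns the list of removed file names, oldest first.
    """
    segments = sorted(
        f for f in os.listdir(archive_dir)
        if len(f) == 24 and all(c in "0123456789ABCDEF" for c in f)
    )
    total = sum(os.path.getsize(os.path.join(archive_dir, f)) for f in segments)
    removed = []
    # Never remove the newest segment: accepting new WAL takes priority
    # over retaining the ability to go further back in time.
    for f in segments[:-1]:
        if total <= budget_bytes:
            break
        path = os.path.join(archive_dir, f)
        total -= os.path.getsize(path)
        os.remove(path)
        removed.append(f)
    return removed
```

The point of putting this in core rather than in an external command is exactly that this loop is what almost everyone would write anyway.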
Heroku also previously advocated for such a system, to replace their ad-hoc system of SIGSTOPping backends for short periods of time (!) to accomplish the same thing.

When replication slots were added (858ec11858a914d4c380971985709b6d6b7dd6fc, January 2014) we talked about how nice it would be if there were a facility to detect when a replication slot (or combination of slots) was forcing the retention of too much WAL and, when some threshold is exceeded, disable WAL retention for those slots to prevent disk space exhaustion.

When pg_archivecleanup was added (ca65f2190ae20b8bba9aa66e4cab1982b95d109f, 24bfbb5857a1e7ae227b526e64e540752c3b1fe3, June 2010), it documented that it was really only smart enough to handle the case of an archive kept for the benefit of a single standby, and it didn't do anything to help you if that one standby got far enough behind to fill up the disk.

In all of these cases, we're still waiting for something smarter to come along. This is an enormous practical problem. "pg_xlog filled up" is a reasonably common cause of production outages, and "archive directory filled up" is a reasonably common cause of "pg_xlog filled up". I don't mind having a mode where we give the user the tools with which to build their own solution to these problems, but we shouldn't ignore the likelihood that many people will want the same policies, and I'd rather have those commonly-used policies well-implemented in core than implemented at highly varying levels of quality in individual installations.

All IMHO, of course...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers