On Fri, Jan 13, 2017 at 8:45 AM, Alvaro Herrera <alvhe...@2ndquadrant.com> wrote:
> Amit Khandekar wrote:
>> In a server where autovacuum is disabled and its databases reach
>> autovacuum_freeze_max_age limit, an autovacuum is forced to prevent
>> xid wraparound issues. At this stage, when the server is loaded with a
>> lot of DML operations, an exceedingly high number of autovacuum
>> workers keep on getting spawned, and these do not do anything, and
>> then quit.
>
> I think this is the same problem as reported in
> https://www.postgresql.org/message-id/CAMkU=1yE4YyCC00W_GcNoOZ4X2qxF7x5DUAR_kMt-Ta=ypy...@mail.gmail.com
If I understand correctly, and it's possible that I don't, the issues are distinct. The issue in that thread has to do with the autovacuum launcher starting workers over and over again in a tight loop, whereas this issue is about autovacuum workers restarting the launcher over and over again in a tight loop. In that thread, it's the autovacuum launcher that is looping, which can only happen when autovacuum=on. In this thread, the autovacuum launcher is repeatedly exiting and getting restarted, which can only happen when autovacuum=off.

In general, it seems we've been pretty cavalier about just how often it's reasonable to start the autovacuum launcher when autovacuum=off. That code probably doesn't see much real-world use. Foreground processes signal the postmaster only every 64K transactions, which on today's hardware can't happen more than once every couple of seconds if you're not using subtransactions or intentionally burning XIDs, but hardware keeps getting faster, and you might be using subtransactions. Even so, requiring that 65,536 transactions pass between signals does serve as something of a rate limit.

In the case about which Amit is complaining, there's no rate limit at all. As fast as the autovacuum launcher starts up, it spawns a worker and exits; as fast as the worker can determine that it can't do anything useful, it signals the postmaster to start a new launcher. Clearly, some kind of rate control is needed here; the only question is where to put it. I would be tempted to install something directly in postmaster.c: if we get CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) && Shutdown == NoShutdown but we last set start_autovac_launcher = true less than 10 seconds ago, don't do it again (a sketch appears below). That limits us to launching the autovacuum launcher at most six times a minute when autovacuum=off.

You could argue that defeats the point of the SendPostmasterSignal in SetTransactionIdLimit, but I don't think so. If vacuuming the oldest database took less than 10 seconds, then we won't vacuum the next-oldest database until we hit the next 64K transaction ID boundary, but that can only cause a problem if we've got so many databases that we don't get to them all before we run out of transaction IDs, which is almost unthinkable. If you had ten thousand tiny databases that all crossed the threshold at the same instant, it would take you about 640 million transaction IDs (10,000 databases times 64K apiece) to visit them all. If you also had autovacuum_freeze_max_age set very close to the upper limit for that variable, you could conceivably have the system shut down before all of those databases were reached. But that's a pretty artificial scenario, and if someone has that scenario, perhaps they should consider more sensible configuration choices.

I wondered for a while why the existing guard in vac_update_datfrozenxid() isn't sufficient to prevent this problem. That turns out to be due to Tom's commit 794e3e81a0e8068de2606015352c1254cb071a78, which causes ForceTransactionIdLimitUpdate() to always return true once we're past xidVacLimit. The commit doesn't contain much in the way of justification for the change, but I think the issue must be that if the database nearest to wraparound is dropped, we need some mechanism for eventually forcing xidVacLimit to get updated, rather than just spewing warnings.

Another place where we could insert a guard is inside SetTransactionIdLimit itself. This is a little tricky. The easy idea would be just to skip sending the signal if xidVacLimit hasn't advanced, but that's wrong in the case where there are multiple databases with exactly the same oldest XID; vacuuming the first one doesn't change anything. It would be correct -- I think -- to skip sending the signal when xidVacLimit doesn't advance and vac_update_datfrozenxid() didn't change the current database's value either, but that requires passing a flag down the call stack a few levels. That's only mildly ugly, so I'd be fine with it if it were the best fix, but there seem to be better options.

Amit's chosen yet another possible place to insert the guard: teach autovacuum that if a worker skips at least one table due to concurrent autovacuum activity AND ends up vacuuming no tables, don't call vac_update_datfrozenxid(). Since there is or was another worker running, vac_update_datfrozenxid() either already has been called or will be called when that worker finishes, so that seems safe. Note that if his patch were changed to skip vac_update_datfrozenxid() in all cases where we do nothing, rather than only when we skip a table due to concurrent activity, we'd reintroduce the dropped-database problem that was fixed by 794e3e81a0e8068de2606015352c1254cb071a78.
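To make the comparison easier, here are rough sketches of the three possible guards, in the order discussed above. None of this is tested, and every name I introduce is invented for illustration. First, the postmaster-side throttle, somewhere in postmaster.c's SIGUSR1 handling; the static variable and the hard-coded 10 seconds are mine, not existing code:

    /* Invented for this sketch: time of the last request we honored. */
    static pg_time_t last_avlauncher_request = 0;

    if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
        Shutdown == NoShutdown)
    {
        pg_time_t   now = (pg_time_t) time(NULL);

        /*
         * Ignore requests that arrive less than 10 seconds after the
         * last one we honored; that caps launcher starts at six per
         * minute when autovacuum is off.
         */
        if (now - last_avlauncher_request >= 10)
        {
            last_avlauncher_request = now;
            start_autovac_launcher = true;
        }
    }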
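Second, the flag-passing guard in SetTransactionIdLimit() in varsup.c. The datfrozenxid_advanced parameter is hypothetical and would have to be threaded down from vac_update_datfrozenxid() through vac_truncate_clog(); the elided comments stand in for the existing code:

    void
    SetTransactionIdLimit(TransactionId oldest_datfrozenxid,
                          Oid oldest_datoid,
                          bool datfrozenxid_advanced)   /* hypothetical */
    {
        bool        vacLimit_advanced;
        TransactionId curXid;

        /* ... existing computation of xidVacLimit and the other limits ... */

        LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
        /* Did this call actually move the global vacuum-force point? */
        vacLimit_advanced =
            TransactionIdPrecedes(ShmemVariableCache->xidVacLimit, xidVacLimit);
        ShmemVariableCache->xidVacLimit = xidVacLimit;
        /* ... existing updates of the warn/stop limits ... */
        LWLockRelease(XidGenLock);

        /* ... existing WARNING output ... */

        curXid = ReadNewTransactionId();
        if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
            IsUnderPostmaster && !InRecovery)
        {
            /*
             * Only signal if something actually changed; otherwise the
             * new launcher will find nothing to do and we'll be straight
             * back here.
             */
            if (vacLimit_advanced || datfrozenxid_advanced)
                SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
        }
    }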
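And third, the shape of Amit's guard at the end of do_autovacuum() in autovacuum.c; the two booleans are illustrative names of mine, not necessarily what his patch uses:

    bool        did_vacuum = false;
    bool        found_concurrent_worker = false;

    /*
     * ... per-table loop: vacuuming a table sets did_vacuum; skipping
     * one because another worker has claimed it sets
     * found_concurrent_worker ...
     */

    /*
     * Update pg_database.datfrozenxid only if we did some work or saw
     * no concurrent worker; in the latter case that other worker has
     * already done, or will do, the update when it finishes.
     */
    if (did_vacuum || !found_concurrent_worker)
        vac_update_datfrozenxid();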
I'm not entirely sure whether Amit's fix is better or worse than the postmaster-based fix. It seems like a fairly fundamental weakness for the postmaster to have no rate-limiting logic whatsoever here; it should be the postmaster's job to judge whether it's getting swamped with signals, and a fix there would also stop systems with high rates of XID consumption from going bonkers for that reason. On the other hand, if somebody does have a scenario where repeatedly signaling the postmaster to start the launcher in a tight loop is allowing the system to zip through many small databases efficiently, Amit's fix will let that keep working, whereas throttling in the postmaster will make it take longer to get to all of those databases. In many cases, that could be an improvement, since it would tend to spread out the datfrozenxid values better, but I can't quite shake the niggling fear that there might be some case I'm not thinking of where it's problematic. So I don't know.

As for the problem on the other thread, maybe we could extend Amit's approach so that when a worker exits after having skipped some tables but vacuumed none, we blacklist the database for some period of time or some number of iterations: autovacuum workers aren't allowed to choose that database until the blacklist entry expires. That way, once it becomes evident that more autovacuum workers in that database would be useless, other databases get a chance to attract some workers, at least for a while. I'm not sure how to calibrate that exactly, but it's a thought. I think we should fix this problem first, though; it's subject to a narrower and less-speculative repair.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company