On 10/9/18 5:15 AM, Kyotaro HORIGUCHI wrote: > At Fri, 5 Oct 2018 15:35:04 +0900, Michael Paquier <mich...@paquier.xyz> > wrote in <20181005063504.gb14...@paquier.xyz> >> On Fri, Oct 05, 2018 at 12:16:03PM +0900, Michael Paquier wrote: >>> So, I have come back to this stuff, and finished with the attached >>> instead, so as the assertion is in a single place. I find that >>> clearer. The comments have also been improved. Thoughts? >> And so... I have been looking at committing this thing, and while >> testing in-depth I have been able to trigger a case where an autovacuum >> has been able to be not aggressive but anti-wraparound, which is exactly >> what should not be possible, no? I have simply created an instance with >> autovacuum_freeze_max_age = 200000, then ran pgbench with >> autovacuum_freeze_table_age=200000 set for each table, and also ran >> installcheck-world in parallel. This has been able to trigger the >> assertion pretty quickly. > I investigated it and in short, it can happen. > > It is a kind of race consdition between two autovacuum > processes. do_autovacuum() looks into pg_class (using a snapshot) > and vacuum_set_xid_limits() looks into relcache. If concurrent > vacuum happens and one has finished the relation, another gets > relcache invalidation and relfrozenxid is updated. If this > happens between do_autovacuum() and vacuum_set_xid_limits(), the > latter sees newer relfrozenxid than the former. The problem > happens when it moves by more than 5% of > autovacuum_freeze_max_age. > > If lazy_vacuum_rel() sees the situation, the relation is already > aggressively vacuumed by a cocurrent worker. We can just ingore > the state safely but also we know that the vacuum is useless. > > 1. Just allow the case there (and add comment). > Causes redundant anti-wraparound vacuum. > > 2. Skip the relation by the condition. > > I think that we can safely skip the relation in the > case. (attached) > > 3. Ensure that do_autovacuum always sees the same relfrozenxid > with vacuum_set_xid_limits(). > > 4. Prevent concurrent acuuming of the same relation rigorously, > somehow. > > Thoughts? >
I notice that this seems never to have been acted on. I think we should apply this and remove the (confusing) message setting for the case we'll now be avoiding. If not we should at least comment there that this is a case we only expect to see in pathological cases. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services