I wrote: > ProcessStandbyHSFeedbackMessage has a race condition: it thinks it can > call GetOldestXmin and then the result will politely hold still while > it considers what to do next. But in fact, whoever has the oldest xmin > could exit their transaction, allowing the global minimum to advance. > If a VACUUM process then inspects the ProcArray, it could compute an > oldest xmin that is newer than the value that > ProcessStandbyHSFeedbackMessage installs just afterwards. So much > for keeping the data the standby wanted.
Actually, it's worse than that. Clamping the walsender's xmin to GetOldestXmin() is useless, and if it weren't useless, this code would be completely wrong even discounting the bugs I pointed out already. Consider what it is we're trying to do here: we'd like to prevent VACUUMs on the master from deleting dead rows with xmax newer than the xmin the standby is requesting. Setting the walsender's xmin will only affect VACUUMs launched after that moment; anything that's already running will have already determined its threshold xmin, and perhaps already removed rows. You can't retroactively fix that. So clamping the walsender's xmin to GetOldestXmin doesn't actually do anything for data integrity; it only ensures that the value of GetOldestXmin doesn't go backwards on successive calls. But that's possible anyway, as the comments for that function already note. What's worse is that the only thing that is prevented from going backwards is the value of GetOldestXmin with allDbs = true. But that's only used by vacuums on shared catalogs. If we wanted to prevent the values seen by vacuums on non-shared tables from going backwards, we'd have to do some much-more-complex calculation that identified the largest value of GetOldestXmin with allDbs = false, across all databases. And that would *still* be wrong, because then the walsender would only be protecting data that is newer than the newest such value, which might allow data in other databases to be reclaimed too early. You could only really make this work by maintaining per-database xmins, which the standby isn't sending and there's no place to put into the walsender's ProcArray entry either. So I've concluded that there's just no point in the GetOldestXmin clamping, and we should just apply the xmin value we get from the standby if it passes basic sanity checks (the epoch test), and hope that we're not too late to prevent loss of the data the standby wanted. This line of reasoning also suggests that it's a pretty bad idea for the walsender to invalidate its existing xmin if it gets a transiently bogus (out of epoch) value from the standby. I think it should sit on the xmin it has, instead, to avoid potential loss of needed data. regards, tom lane -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers