Simon Riggs wrote:
> On Thu, 2009-02-05 at 11:46 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> So we might end up flushing more often *and* we will be doing it
>>> potentially in the code path of other users.
>>
>> For example, imagine a database that fits completely in shared buffers.
>> If we update at every XLogFileRead, we have to fsync every 16MB of WAL.
>> If we update in XLogFlush the way I described, we only need to update
>> when we flush a page from the buffer cache, which will only happen at
>> restartpoints. That's far fewer updates.
>
> Oh, did you change the bgwriter so it doesn't do normal page cleaning?
No. Ok, that wasn't completely accurate. The page cleaning by bgwriter
will perform XLogFlushes, but that should be pretty insignificant. When
there's little page replacement going on, bgwriter will do a small
trickle of page cleaning, which won't matter much. If there's more page
replacement going on, bgwriter is cleaning up pages that will soon be
replaced, so it's just offsetting work from other backends (or the
startup process in this case).
>> Expanding that example to a database that doesn't fit in cache, you're
>> still replacing the pages from the buffer cache that have been untouched
>> for longest. Such pages will have an old LSN, too, so we shouldn't need
>> to update very often.
>
> They will tend to be written in ascending LSN order, which will mean we
> continually update the control file. Anything out of order does skip a
> write. The better the cache is at finding LRU blocks, the more writes we
> will make.
When minRecoveryPoint is updated, it's not updated to just the LSN that's
being flushed. It's updated to the recptr of the most recently read WAL
record. That's important for avoiding that behavior, just like XLogFlush
normally flushes all of the outstanding WAL, not just up to the requested
LSN.
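To illustrate the point, here's a minimal, self-contained sketch of that update rule. This is not the actual PostgreSQL code: the types and variables are simplified stand-ins, the function name only loosely mirrors the real routine, and the counter stands in for the real control file write and fsync.

```c
#include <stdint.h>

/* Hypothetical simplified sketch -- not the actual PostgreSQL code.
 * XLogRecPtr is modeled as a plain 64-bit WAL position here. */
typedef uint64_t XLogRecPtr;

static XLogRecPtr minRecoveryPoint   = 0; /* as stored in pg_control */
static XLogRecPtr lastReadRecPtr     = 0; /* end of last WAL record read */
static int        controlFileUpdates = 0; /* stands in for real fsyncs */

/* Called before flushing a data page with the given LSN during recovery. */
static void
UpdateMinRecoveryPoint(XLogRecPtr pageLSN)
{
    /* Control file already covers this LSN: nothing to do. */
    if (pageLSN <= minRecoveryPoint)
        return;

    /*
     * Advance minRecoveryPoint all the way to the most recently read
     * WAL record, not just to pageLSN.  Pages flushed later in
     * ascending LSN order are then already covered, so we don't end up
     * updating the control file once per page.
     */
    minRecoveryPoint = lastReadRecPtr;
    controlFileUpdates++;
}
```

With this rule, flushing pages with LSNs 10, 500, and 900 after replaying up to record 1000 costs a single control file update, because the first call advances minRecoveryPoint past the other two.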
I'd like to have the extra protection that this approach gives. If we
let safeStartPoint be ahead of the actual WAL we've replayed, we have
to just assume we're fine if we reach end of WAL before reaching that
point. That assumption falls down if e.g. recovery is stopped, and you go
and remove the last few WAL segments from the archive before restarting
it, or signal pg_standby to trigger failover too early. Tracking the
real safe starting point and enforcing it always protects you from that.
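The enforcement at end of replay amounts to one comparison. Here's a hedged sketch of that check under the same simplified types as above; the function name and return convention are illustrative, not the actual PostgreSQL interface:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical sketch of the end-of-WAL consistency check -- names and
 * error handling are illustrative, not the real PostgreSQL code. */
static const char *
CheckRecoveryConsistency(XLogRecPtr endOfWAL, XLogRecPtr minRecoveryPoint)
{
    /*
     * If replay stopped short of the recorded safe starting point --
     * e.g. because WAL segments were removed from the archive, or
     * failover was triggered too early -- the cluster is not in a
     * consistent state and must not be opened.
     */
    if (endOfWAL < minRecoveryPoint)
        return "WAL ends before reaching a consistent state";

    return NULL;                /* consistent; recovery may finish */
}
```

The point is that the check depends only on what was actually replayed versus what the control file says is required, so removing trailing WAL segments or triggering early failover cannot silently produce an inconsistent database.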
> Doing it this way will require you to remove the existing specific error
> messages about ending before the backup's end time, replacing them with
> more general ones that just say "consistency not reached", which makes
> it harder to figure out what to do about it.
Yeah. If that's an important distinction, we could still save the
original backup stop location somewhere, just so that we can give the
old error message when we've not passed that location. But perhaps a
message like "WAL ends before reaching a consistent state" with a hint
like "Make sure you archive all the WAL created during backup" would
suffice.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers