Sorry..."designed" was a poor choice of words; I meant "not unexpected". Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help!
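
For anyone finding this thread later, here is a minimal sketch of that workaround. It assumes a Python DB-API connection to the primary (for example from psycopg2) opened as a superuser, preferably in autocommit mode. The helper name stop_backup_and_checkpoint is made up for illustration; only pg_stop_backup() and CHECKPOINT come from the discussion.

```python
def stop_backup_and_checkpoint(conn):
    """Finish the base backup, then force a checkpoint on the primary.

    `conn` is assumed to be a DB-API connection to the primary with
    superuser rights. This helper is a hypothetical illustration,
    not part of PostgreSQL or of any driver.
    """
    cur = conn.cursor()
    try:
        # End the base backup; returns the WAL location where it stopped.
        cur.execute("SELECT pg_stop_backup()")
        stop_location = cur.fetchone()[0]
        # Force a checkpoint immediately, instead of waiting up to
        # checkpoint_timeout for one to happen on an idle primary.
        cur.execute("CHECKPOINT")
    finally:
        cur.close()
    return stop_location
```

The point is only the ordering: the checkpoint is requested immediately after the backup ends, so the standby does not have to wait for the next scheduled one.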

On a side note, I am sporadically seeing another error on hot standby startup. I'm not terribly concerned about it, as it is pretty rare and it works on a retry, so it's not a big deal. The error is "FATAL: out-of-order XID insertion in KnownAssignedXids". If you think it might be a bug and are interested in hunting it down, let me know and I'll help any way I can...but if you're not too worried about it then neither am I :)

On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop <ch...@replicon.com> wrote:
>
> > hrmz, still basically the same behaviour. I think it might be a *little*
> > better with this patch. Before when under load it would start up quickly
> > maybe 2 or 3 times out of 10 attempts....with this patch it might be up
> > to 4 or 5 times out of 10...ish...or maybe it was just fluke *shrug*.
> > I'm still only seeing your log statement a single time (I'm running at
> > debug2). I have discovered something though - when the standby is in
> > this state if I force a checkpoint on the primary then the standby comes
> > right up. Is there anything I can check or try for you to help figure
> > this out?....or is it actually as designed that it could take 10-ish
> > minutes to start up even after all clients have disconnected from the
> > primary?
>
> Thanks for testing. The improvements cover specific cases, so it's not
> subject to chance; it's not a performance patch.
>
> It's not "designed" to act the way you describe, but it does.
>
> The reason this occurs is that you have a transaction heavy workload
> with occasional periods of complete quiet and a base backup time that
> is much less than checkpoint_timeout. If your base backup was slower
> the checkpoint would have hit naturally before recovery had reached a
> consistent state. Which seems fairly atypical. I guess you're doing
> this on a test system.
>
> It seems cheap to add in a call to LogStandbySnapshot() after each
> call to pg_stop_backup().
>
> Does anyone think this case is worth adding code for? Seems like one
> more thing to break.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services