On Mon, Oct 17, 2011 at 11:30 PM, Chris Redekop <ch...@replicon.com> wrote:
> Well, on the other hand maybe there is something wrong with the data.
> Here's the test/steps I just did -
> 1. I do the pg_basebackup when the master is under load, hot slave now will
> not start up but warm slave will.
> 2. I start a warm slave and let it catch up to current
> 3. On the slave I change 'hot_standby=on' and do a 'service postgresql
> restart'
> 4. The postgres fails to restart with the same error.
> 5. I turn hot_standby back off and postgres starts back up fine as a warm
> slave
> 6. I then turn off the load, the slave is all caught up, master and slave
> are both sitting idle
> 7. I, again, change 'hot_standby=on' and do a service restart
> 8. Again it fails, with the same error, even though there is no longer any
> load.
> 9. I repeat this warmstart/hotstart cycle a couple more times until to my
> surprise, instead of failing, it successfully starts up as a hot standby
> (this is after maybe 5 minutes or so of sitting idle)
>
> So...given that it continued to fail even after the load had been turned of,
> that makes me believe that the data which was copied over was invalid in
> some way. And when a checkpoint/logrotation/somethingelse occurred when not
> under load it cleared itself up....I'm shooting in the dark here
>
> Anyone have any suggestions/ideas/things to try?

Having dug at this a little -- but not too much -- the problem seems to be
that postgres is reading the commit logs way, way too early, that is to say,
before it has replayed enough WAL to be 'consistent' (the WAL between
pg_start_backup and pg_stop_backup).

I have not been able to reproduce this problem (I think) after the message
from postgres suggesting it has reached a consistent state; at that point I
am able to go into hot-standby mode. The message (this is the errmsg) is
like:

  "consistent recovery state reached at %X/%X"

It doesn't seem meaningful for StartupCLOG (or, indeed, any of the
hot-standby path functionality) to be called before that code is executed,
but right now it is called anyway. I'm not sure if this is simply an
oversight, or indicative of a misplaced assumption somewhere.

Basically, my thought for a fix is to suppress hot_standby = on (in spirit)
until the consistent recovery state is reached.

--
fdr
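
P.S. To make the proposed ordering concrete, here is a toy sketch of "suppress hot_standby until consistency is reached". It is only a model, not PostgreSQL code: the class, its fields, and the integer LSNs are all made up for illustration, and it assumes the end-of-backup WAL position is known to the standby.

```python
from dataclasses import dataclass


@dataclass
class ToyStandby:
    """Toy model of a standby; every name here is hypothetical."""
    hot_standby: bool            # the configured hot_standby setting
    backup_end_lsn: int          # assumed: WAL position where the base backup ended
    replayed_lsn: int = 0        # how far WAL replay has progressed
    reached_consistency: bool = False
    hot_standby_active: bool = False

    def replay_to(self, lsn: int) -> None:
        """Replay WAL up to lsn, then re-evaluate whether hot standby may run."""
        self.replayed_lsn = max(self.replayed_lsn, lsn)
        if self.replayed_lsn >= self.backup_end_lsn:
            # This is where "consistent recovery state reached" would be logged.
            self.reached_consistency = True
        # The gate: commit-log reads and read-only queries wait for consistency,
        # no matter what the configuration asks for.
        self.hot_standby_active = self.hot_standby and self.reached_consistency


standby = ToyStandby(hot_standby=True, backup_end_lsn=100)
standby.replay_to(40)    # still inside the backup's WAL range: behaves as warm
standby.replay_to(100)   # consistency reached: hot standby may now start
```

The point of the sketch is only that the setting becomes effective as a result of reaching consistency, rather than being acted on at startup.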