Re: [HACKERS] 9.1.2 ?
On Wed, Nov 9, 2011 at 6:22 PM, Florian Pflug wrote:
> On Nov 9, 2011, at 23:53 , Daniel Farina wrote:
> > I think a novice user would be scared half to death: I know I was the
> > first time. That's not a great impression for the project to leave
> > for what is not, at its root, a vast defect, and the fact it's
> > occurring for people when they use rsync rather than my very sensitive
> > backup routines is indication that it's not very corner-ey.
>
> Just to emphasize the non-cornerish-ness of this problem, it should be
> mentioned that the HS issue was observed even with backups taken with
> pg_basebackup, if memory serves correctly.

Yes, I personally can reliably reproduce both the clog+subtrans problems using pg_basebackup, and can confirm that the "oldestActiveXid_fixed.v2.patch" does resolve both issues.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
okay, sorry I'm a little confused then. Should I be able to apply both the v2 patch as well as the v3 patch? or is it expected that I'd have to manually do the merge?

On Wed, Nov 2, 2011 at 1:34 AM, Simon Riggs wrote:
> On Wed, Nov 2, 2011 at 2:40 AM, Chris Redekop wrote:
>
> > looks like the v3 patch re-introduces the pg_subtrans issue...
>
> No, I just separated the patches to be clearer about the individual
> changes.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
oops...reply-to-all

-- Forwarded message --
From: Chris Redekop
Date: Wed, Nov 2, 2011 at 8:41 AM
Subject: Re: [HACKERS] Hot Standby startup with overflowed snapshots
To: Simon Riggs

Sure, I've got quite a few logs lying around - I've attached 3 of 'em...let me know if there are any specific things you'd like me to do or look for next time it happens.

On Wed, Nov 2, 2011 at 2:59 AM, Simon Riggs wrote:
> On Fri, Oct 28, 2011 at 3:42 AM, Chris Redekop wrote:
>
> > On a side note I am sporadically seeing another error on hotstandby startup.
> > I'm not terribly concerned about it as it is pretty rare and it will work
> > on a retry so it's not a big deal. The error is "FATAL: out-of-order XID
> > insertion in KnownAssignedXids". If you think it might be a bug and are
> > interested in hunting it down let me know and I'll help any way I can...but
> > if you're not too worried about it then neither am I :)
>
> I'd be interested to see further details of this if you see it again,
> or have access to previous logs.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services

postgresql-2011-10-27_202007.log Description: Binary data
postgresql-2011-10-31_152925.log Description: Binary data
postgresql-2011-11-01_094501.log Description: Binary data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
looks like the v3 patch re-introduces the pg_subtrans issue...

On Tue, Nov 1, 2011 at 9:33 AM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 4:25 PM, Simon Riggs wrote:
>
> > StartupMultiXact() didn't need changing, I thought, but I will review further.
>
> Good suggestion.
>
> On review, StartupMultiXact() could also suffer similar error to the
> clog failure. This was caused *because* MultiXact is not maintained by
> recovery, which I had thought meant it was protected from such
> failure.
>
> Revised patch attached.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Sorry..."designed" was a poor choice of words, I meant "not unexpected". Doing the checkpoint right after pg_stop_backup() looks like it will work perfectly for me, so thanks for all your help!

On a side note I am sporadically seeing another error on hotstandby startup. I'm not terribly concerned about it as it is pretty rare and it will work on a retry so it's not a big deal. The error is "FATAL: out-of-order XID insertion in KnownAssignedXids". If you think it might be a bug and are interested in hunting it down let me know and I'll help any way I can...but if you're not too worried about it then neither am I :)

On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop wrote:
>
> > hrmz, still basically the same behaviour. I think it might be a *little*
> > better with this patch. Before when under load it would start up quickly
> > maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4
> > or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still
> > only seeing your log statement a single time (I'm running at debug2). I
> > have discovered something though - when the standby is in this state if I
> > force a checkpoint on the primary then the standby comes right up. Is there
> > anything I can check or try for you to help figure this out?...or is it
> > actually as designed that it could take 10-ish minutes to start up even
> > after all clients have disconnected from the primary?
>
> Thanks for testing. The improvements cover specific cases, so it's not
> subject to chance; it's not a performance patch.
>
> It's not "designed" to act the way you describe, but it does.
>
> The reason this occurs is that you have a transaction-heavy workload
> with occasional periods of complete quiet and a base backup time that
> is much less than checkpoint_timeout. If your base backup was slower
> the checkpoint would have hit naturally before recovery had reached a
> consistent state. Which seems fairly atypical. I guess you're doing
> this on a test system.
>
> It seems cheap to add in a call to LogStandbySnapshot() after each
> call to pg_stop_backup().
>
> Does anyone think this case is worth adding code for? Seems like one
> more thing to break.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
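The workaround Chris settled on (forcing a checkpoint on the primary immediately after pg_stop_backup()) could be scripted roughly as follows. This is a hedged sketch, not the actual script from this thread: the host, connection options, and paths are placeholders, and `psql`/`rsync` are routed through variables only so the command sequence is easy to see.

```shell
# Sketch of a base-backup driver with the extra CHECKPOINT discussed above.
# The checkpoint after pg_stop_backup() logs a fresh running-xacts snapshot,
# so a hot standby restored from this backup can reach a consistent state
# without waiting out the primary's checkpoint_timeout.
# All host names and paths below are illustrative placeholders.
PSQL=${PSQL:-"psql -h primary -U postgres"}
SRC=${SRC:-"primary:/var/lib/pgsql/data"}
DST=${DST:-"/var/lib/pgsql/standby"}

take_base_backup() {
    $PSQL -c "SELECT pg_start_backup('base', true);" &&   # fast (immediate-checkpoint) start
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"
    rc=$?
    $PSQL -c "SELECT pg_stop_backup();"
    $PSQL -c "CHECKPOINT;"    # the workaround: checkpoint right after stop
    return $rc
}
```

This mirrors what Simon suggested doing inside the server with LogStandbySnapshot() after pg_stop_backup(), just done by hand from the backup script.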
Re: [HACKERS] Hot Standby startup with overflowed snapshots
hrmz, still basically the same behaviour. I think it might be a *little* better with this patch. Before when under load it would start up quickly maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4 or 5 times out of 10...ish...or maybe it was just fluke *shrug*. I'm still only seeing your log statement a single time (I'm running at debug2). I have discovered something though - when the standby is in this state if I force a checkpoint on the primary then the standby comes right up. Is there anything I can check or try for you to help figure this out?...or is it actually as designed that it could take 10-ish minutes to start up even after all clients have disconnected from the primary?

On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop wrote:
>
> > Thanks for the patch Simon, but unfortunately it does not resolve the issue
> > I am seeing. The standby still refuses to finish starting up until long
> > after all clients have disconnected from the primary (>10 minutes). I do
> > see your new log statement on startup, but only once - it does not repeat.
> > Is there any way for me to see what the oldest xid on the standby is via
> > controldata or something like that? The standby does stream to keep up with
> > the primary while the primary has load, and then it becomes idle when the
> > primary becomes idle (when I kill all the connections)...so it appears to
> > be current...but it just doesn't finish starting up.
> > I'm not sure if it's relevant, but after it has sat idle for a couple
> > minutes I start seeing these statements in the log (with the same offset
> > every time):
> > DEBUG: skipping restartpoint, already performed at 9/9520
>
> OK, so it looks like there are 2 opportunities to improve, not just one.
>
> Try this.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Standby startup with overflowed snapshots
Thanks for the patch Simon, but unfortunately it does not resolve the issue I am seeing. The standby still refuses to finish starting up until long after all clients have disconnected from the primary (>10 minutes). I do see your new log statement on startup, but only once - it does not repeat. Is there any way for me to see what the oldest xid on the standby is via controldata or something like that? The standby does stream to keep up with the primary while the primary has load, and then it becomes idle when the primary becomes idle (when I kill all the connections)...so it appears to be current...but it just doesn't finish starting up.

I'm not sure if it's relevant, but after it has sat idle for a couple minutes I start seeing these statements in the log (with the same offset every time):

DEBUG: skipping restartpoint, already performed at 9/9520

On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs wrote:
> Chris Redekop's recent report of slow startup for Hot Standby has made
> me revisit the code there.
>
> Although there isn't a bug, there is a missed opportunity for starting
> up faster which could be the source of Chris' annoyance.
>
> The following patch allows a faster startup in some circumstances.
>
> The patch also alters the log levels for messages and gives a single
> simple message for this situation. The log will now say
>
> LOG: recovery snapshot waiting for non-overflowed snapshot or until
> oldest active xid on standby is at least %u (now %u)
> ...multiple times until snapshot non-overflowed or xid reached...
>
> whereas before the first LOG message shown was
>
> LOG: consistent state delayed because recovery snapshot incomplete
>
> and only later, at DEBUG2, do you see
>
> LOG: recovery snapshot waiting for %u oldest active xid on standby is %u
> ...multiple times until xid reached...
>
> Comments please.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
FYI I have given this patch a good test and can now no longer reproduce either the subtrans or the clog error. Thanks guys!

On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs wrote:
> On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs wrote:
> > On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs wrote:
> >
> >> Brewing a patch now.
> >
> > Latest thinking... confirmations or other error reports please.
> >
> > This fixes both the subtrans and clog bugs in one patch.
>
> I'll be looking to commit that tomorrow afternoon as two separate
> patches with appropriate credits.
>
> --
> Simon Riggs http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> And I think they also reported that if they didn't run hot standby,
> but just normal recovery into a new master, it didn't have the problem
> either, i.e. without hotstandby, recovery ran, properly extended the
> clog, and then ran as a new master fine.

Yes, this is correct...attempting to start as hotstandby will produce the pg_clog error repeatedly, and then without changing anything else, just turning hot standby off, it will start up successfully.

> This fits the OP's observation of the
> problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeing...it's possible it happens more frequently without the immediate checkpoint, but I am seeing it happen even with the immediate checkpoint.

> This is a different problem and has already been reported by one of
> your colleagues in a separate thread, and answered in detail by me
> there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> That isn't a Hot Standby problem, a recovery problem, nor is it certain
> it's a PostgreSQL problem.

Do you have any theories on this that I could help investigate? It happens even when using pg_basebackup, and it persists until another sync is performed, so the files must be in some state that it can't recover from...without understanding the internals, just viewing from an outside perspective, I don't really see how this could not be a PostgreSQL problem.
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> Chris, can you rearrange the backup so you copy the pg_control file as
> the first act after the pg_start_backup?

I tried this and it doesn't seem to make any difference. I also tried the patch and I can no longer reproduce the subtrans error, however instead it now starts up but never gets to the point where it'll accept connections. It starts up, but if I try to do anything I always get "FATAL: the database system is starting up"...even if the load is removed from the primary, the standby still never finishes "starting up". Attached below is a log of one of these startup attempts. In my testing with the patch applied approx 3 in 10 attempts start up successfully, 7 in 10 attempts go into the "db is starting up" state...the pg_clog error is still there, but seems much harder to reproduce now...I've seen it only once since applying the patch (out of probably 50 or 60 under-load startup attempts). It does seem to be "moody" like that tho...it will be very difficult to reproduce for a while, and then it will happen damn-near every time for a while...weirdness.

On a bit of a side note, I've been thinking of changing my scripts so that they perform an initial rsync prior to doing the startbackup-rsync-stopbackup just so that the second rsync will be faster...so that the backup is in progress for a shorter period of time, as while it is running it will stop other standbys from starting up...this shouldn't cause any issues eh?
2011-10-25 13:43:24.035 MDT [15072]: [1-1] LOG: database system was interrupted; last known up at 2011-10-25 13:43:11 MDT
2011-10-25 13:43:24.035 MDT [15072]: [2-1] LOG: creating missing WAL directory "pg_xlog/archive_status"
2011-10-25 13:43:24.037 MDT [15072]: [3-1] LOG: entering standby mode
DEBUG: received replication command: IDENTIFY_SYSTEM
DEBUG: received replication command: START_REPLICATION 2/CF00
2011-10-25 13:43:24.041 MDT [15073]: [1-1] LOG: streaming replication successfully connected to primary
2011-10-25 13:43:24.177 MDT [15092]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:24.781 MDT [15072]: [4-1] DEBUG: checkpoint record is at 2/CF81A478
2011-10-25 13:43:24.781 MDT [15072]: [5-1] DEBUG: redo record is at 2/CF20; shutdown FALSE
2011-10-25 13:43:24.781 MDT [15072]: [6-1] DEBUG: next transaction ID: 0/4634700; next OID: 1188228
2011-10-25 13:43:24.781 MDT [15072]: [7-1] DEBUG: next MultiXactId: 839; next MultiXactOffset: 1686
2011-10-25 13:43:24.781 MDT [15072]: [8-1] DEBUG: oldest unfrozen transaction ID: 1669, in database 1
2011-10-25 13:43:24.781 MDT [15072]: [9-1] DEBUG: transaction ID wrap limit is 2147485316, limited by database with OID 1
2011-10-25 13:43:24.783 MDT [15072]: [10-1] DEBUG: resetting unlogged relations: cleanup 1 init 0
2011-10-25 13:43:24.791 MDT [15072]: [11-1] DEBUG: initializing for hot standby
2011-10-25 13:43:24.791 MDT [15072]: [12-1] LOG: consistent recovery state reached at 2/CF81A4D0
2011-10-25 13:43:24.791 MDT [15072]: [13-1] LOG: redo starts at 2/CF20
2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG: consistent state delayed because recovery snapshot incomplete
2011-10-25 13:43:25.019 MDT [15072]: [15-1] CONTEXT: xlog redo running xacts: nextXid 4634700 latestCompletedXid 4634698 oldestRunningXid 4634336; 130 xacts: 4634336 4634337 4634338 4634339 4634340 4634341 4634342 4634343 4634344 4634345 4634346 4634347 4634348 4634349 4634350 4634351 4634352 4634353 4634354 4634355 4634356 4634357 4634358
4634359 4634360 4634361 4634362 4634363 4634364 4634365 4634366 4634367 4634368 4634369 4634370 4634371 4634515 4634516 4634517 4634518 4634519 4634520 4634521 4634522 4634523 4634524 4634525 4634526 4634527 4634528 4634529 4634530 4634531 4634532 4634533 4634534 4634535 4634536 4634537 4634538 4634539 4634540 4634541 4634542 4634543 4634385 4634386 4634387 4634388 4634389 4634390 4634391 4634392 4634393 4634394 4634395 4634396 4634397 4634398 4634399 4634400 4634401 4634402 4634403 4634404 4634405 4634406 4634407 4634408 4634409 4634410 4634411 4634412 4634413 4634414 4634415 4634416 4634417 4634418 4634419 4634420 4634579 4634580 4634581 4634582 4634583 4634584 4634585 4634586 4634587 4634588 4634589 4634590 4634591 4634592 4634593 4634594 4634595 4634596 4634597 4634598 4634599 4634600 4634601 4634602 4634603 4634604 4634605 4634606 4634607; subxid ovf
2011-10-25 13:43:25.240 MDT [15130]: [1-1] FATAL: the database system is starting up
DEBUG: standby "sync_rep_test" has now caught up with primary
2011-10-25 13:43:26.304 MDT [15167]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:27.366 MDT [15204]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:28.426 MDT [15241]: [1-1] FATAL: the database system is starting up
2011-10-25 13:43:29.461 MDT [15275]: [1-1] FATAL: the database system is starting up
and so on...

On Tue, Oct 25, 2011 at 6:51 AM, Simon Riggs wrote:
> On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug wrote:
> > What I do
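The two-pass rsync idea floated earlier in this message (an initial sync before pg_start_backup() so the in-backup sync only has to ship a delta, keeping the backup window short) might look something like this. A sketch only, with assumed hosts, paths, and connection options, not the poster's actual scripts:

```shell
# Illustrative two-pass sync: pass 1 runs outside the backup window and
# moves the bulk of the data; pass 2 runs between pg_start_backup() and
# pg_stop_backup() and only transfers what changed since pass 1, so other
# standbys are blocked from starting for a much shorter time.
PSQL=${PSQL:-"psql -h primary -U postgres"}
SRC=${SRC:-"primary:/var/lib/pgsql/data"}
DST=${DST:-"/var/lib/pgsql/standby"}

two_pass_clone() {
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"    # pass 1: bulk copy, no backup mode
    $PSQL -c "SELECT pg_start_backup('clone', true);"
    rsync -a --delete --exclude=pg_xlog "$SRC/" "$DST/"    # pass 2: short delta inside the window
    $PSQL -c "SELECT pg_stop_backup();"
}
```

Only the files copied during pass 2 are covered by the backup-mode guarantees, which is fine here because pass 2 re-syncs everything that changed.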
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
Well, on the other hand maybe there is something wrong with the data. Here's the test/steps I just did:

1. I do the pg_basebackup when the master is under load; hot slave now will not start up but warm slave will.
2. I start a warm slave and let it catch up to current.
3. On the slave I change 'hot_standby=on' and do a 'service postgresql restart'.
4. The postgres fails to restart with the same error.
5. I turn hot_standby back off and postgres starts back up fine as a warm slave.
6. I then turn off the load; the slave is all caught up, master and slave are both sitting idle.
7. I, again, change 'hot_standby=on' and do a service restart.
8. Again it fails, with the same error, even though there is no longer any load.
9. I repeat this warmstart/hotstart cycle a couple more times until, to my surprise, instead of failing, it successfully starts up as a hot standby (this is after maybe 5 minutes or so of sitting idle).

So...given that it continued to fail even after the load had been turned off, that makes me believe that the data which was copied over was invalid in some way. And when a checkpoint/logrotation/somethingelse occurred when not under load it cleared itself up...I'm shooting in the dark here. Anyone have any suggestions/ideas/things to try?

On Mon, Oct 17, 2011 at 2:13 PM, Chris Redekop wrote:
> I can confirm that both the pg_clog and pg_subtrans errors do occur when
> using pg_basebackup instead of rsync. The data itself seems to be fine
> because using the exact same data I can start up a warm standby no problem,
> it is just the hot standby that will not start up.
>
> On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop wrote:
>> > > Linas, could you capture the output of pg_controldata *and* increase the
>> > > log level to DEBUG1 on the standby? We should then see nextXid value of
>> > > the checkpoint the recovery is starting from.
>> >
>> > I'll try to do that whenever I'm in that territory again... Incidentally,
>> > recently there was a lot of unrelated-to-this-post work to polish things up
>> > for a talk being given at PGWest 2011 Today :)
>> >
>> > > I also checked what rsync does when a file vanishes after rsync computed the
>> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
>> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
>> > > exit code for precisely that failure case.
>> >
>> > To be precise, my script has logic to accept the exit code 24, just as
>> > stated in PG manual:
>> >
>> > Docs> For example, some versions of rsync return a separate exit code for
>> > Docs> "vanished source files", and you can write a driver script to accept
>> > Docs> this exit code as a non-error case.
>>
>> I also am running into this issue and can reproduce it very reliably. For
>> me, however, it happens even when doing the "fast backup" like so:
>> pg_start_backup('whatever', true)...my traffic is more write-heavy than
>> linas's tho, so that might have something to do with it. Yesterday it
>> reliably errored out on pg_clog every time, but today it is
>> failing sporadically on pg_subtrans (which seems to be past where the
>> pg_clog error was)...the only thing that has changed is that I've changed
>> the log level to debug1...I wouldn't think that could be related though.
>> I've linked the requested pg_controldata and debug1 logs for both errors.
>> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
>> pg_controldata, and then the postgres debug1 log produced from a subsequent
>> startup attempt.
>>
>> pg_clog: http://pastebin.com/mTfdcjwH
>> pg_subtrans: http://pastebin.com/qAXEHAQt
>>
>> Any workarounds would be very appreciated...would copying clog+subtrans
>> before or after the rest of the data directory (or something like that) make
>> any difference?
>>
>> Thanks!
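The warm-to-hot toggle in steps 3-8 above amounts to flipping hot_standby in postgresql.conf and restarting. A minimal sketch of that toggle, with an assumed config path and service name (on this vintage of PostgreSQL the setting only takes effect on a full restart):

```shell
# Sketch of the warm/hot toggle used in the reproduction steps above.
# PGCONF and the service name are assumptions, not values from this thread.
PGCONF=${PGCONF:-/var/lib/pgsql/data/postgresql.conf}

set_hot_standby() {    # usage: set_hot_standby on|off
    # Rewrite the hot_standby line, whether or not it is currently commented out.
    sed -i "s/^[# ]*hot_standby[ ]*=.*/hot_standby = ${1}/" "$PGCONF"
}

# set_hot_standby on  && service postgresql restart   # steps 3 and 7: try hot standby
# set_hot_standby off && service postgresql restart   # step 5: fall back to warm standby
```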
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
I can confirm that both the pg_clog and pg_subtrans errors do occur when using pg_basebackup instead of rsync. The data itself seems to be fine because using the exact same data I can start up a warm standby no problem, it is just the hot standby that will not start up.

On Sat, Oct 15, 2011 at 7:33 PM, Chris Redekop wrote:
> > > Linas, could you capture the output of pg_controldata *and* increase the
> > > log level to DEBUG1 on the standby? We should then see nextXid value of
> > > the checkpoint the recovery is starting from.
> >
> > I'll try to do that whenever I'm in that territory again... Incidentally,
> > recently there was a lot of unrelated-to-this-post work to polish things up
> > for a talk being given at PGWest 2011 Today :)
> >
> > > I also checked what rsync does when a file vanishes after rsync computed the
> > > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > > exit code for precisely that failure case.
> >
> > To be precise, my script has logic to accept the exit code 24, just as
> > stated in PG manual:
> >
> > Docs> For example, some versions of rsync return a separate exit code for
> > Docs> "vanished source files", and you can write a driver script to accept
> > Docs> this exit code as a non-error case.
>
> I also am running into this issue and can reproduce it very reliably. For
> me, however, it happens even when doing the "fast backup" like so:
> pg_start_backup('whatever', true)...my traffic is more write-heavy than
> linas's tho, so that might have something to do with it. Yesterday it
> reliably errored out on pg_clog every time, but today it is
> failing sporadically on pg_subtrans (which seems to be past where the
> pg_clog error was)...the only thing that has changed is that I've changed
> the log level to debug1...I wouldn't think that could be related though.
> I've linked the requested pg_controldata and debug1 logs for both errors.
> Both links contain the output from pg_start_backup, rsync, pg_stop_backup,
> pg_controldata, and then the postgres debug1 log produced from a subsequent
> startup attempt.
>
> pg_clog: http://pastebin.com/mTfdcjwH
> pg_subtrans: http://pastebin.com/qAXEHAQt
>
> Any workarounds would be very appreciated...would copying clog+subtrans
> before or after the rest of the data directory (or something like that) make
> any difference?
>
> Thanks!
Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load
> > Linas, could you capture the output of pg_controldata *and* increase the
> > log level to DEBUG1 on the standby? We should then see nextXid value of
> > the checkpoint the recovery is starting from.
>
> I'll try to do that whenever I'm in that territory again... Incidentally,
> recently there was a lot of unrelated-to-this-post work to polish things up
> for a talk being given at PGWest 2011 Today :)
>
> > I also checked what rsync does when a file vanishes after rsync computed the
> > file list, but before it is sent. rsync 3.0.7 on OSX, at least, complains
> > loudly, and doesn't sync the file. It BTW also exits non-zero, with a special
> > exit code for precisely that failure case.
>
> To be precise, my script has logic to accept the exit code 24, just as
> stated in PG manual:
>
> Docs> For example, some versions of rsync return a separate exit code for
> Docs> "vanished source files", and you can write a driver script to accept
> Docs> this exit code as a non-error case.

I also am running into this issue and can reproduce it very reliably. For me, however, it happens even when doing the "fast backup" like so: pg_start_backup('whatever', true)...my traffic is more write-heavy than linas's tho, so that might have something to do with it. Yesterday it reliably errored out on pg_clog every time, but today it is failing sporadically on pg_subtrans (which seems to be past where the pg_clog error was)...the only thing that has changed is that I've changed the log level to debug1...I wouldn't think that could be related though. I've linked the requested pg_controldata and debug1 logs for both errors. Both links contain the output from pg_start_backup, rsync, pg_stop_backup, pg_controldata, and then the postgres debug1 log produced from a subsequent startup attempt.

pg_clog: http://pastebin.com/mTfdcjwH
pg_subtrans: http://pastebin.com/qAXEHAQt

Any workarounds would be very appreciated...would copying clog+subtrans before or after the rest of the data directory (or something like that) make any difference?

Thanks!
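The docs passage quoted above suggests writing a driver script that accepts rsync's exit code 24 ("some files vanished before they could be transferred") as a non-error, since files under a live data directory can legitimately disappear mid-copy. A minimal sketch of such a wrapper:

```shell
# Wrapper around rsync that treats exit code 24 ("vanished source files")
# as a non-error, per the PostgreSQL docs quoted above; any other non-zero
# exit code is still propagated so a genuinely failed copy aborts the backup.
rsync_backup() {
    rsync -a "$@"
    rc=$?
    if [ "$rc" -eq 24 ]; then
        echo "rsync: vanished source files; treating as non-fatal" >&2
        return 0
    fi
    return "$rc"
}
```

Used in place of a bare rsync inside the pg_start_backup()/pg_stop_backup() window, this keeps the backup script from failing spuriously while still catching real transfer errors.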
Re: [HACKERS] pg_last_xact_insert_timestamp
Thanks for all the feedback guys. Just to throw another monkey wrench in here - I've been playing with Simon's proposed solution of returning 0 when the WAL positions match, and I've come to the realization that even if using pg_last_xact_insert_timestamp, although it would help, we still wouldn't be able to get a 100% accurate "how far behind?" counter...not that this is a big deal, but I know my ops team is going to bitch to me about it :)...take this situation: there's a lull of 30 seconds where there are no transactions committed on the server...the slave is totally caught up, WAL positions match, I'm reporting 0, everything is happy. Then a transaction is committed on the master...before the slave gets it my query hits it and sees that we're 30 seconds behind (when in reality we're <1sec behind). Because of this effect my graph is a little spiky...I mean it's not a huge deal or anything - I can put some sanity checking in my number reporting ("if 1 second ago you were 0 seconds behind, you can't be more than 1 second behind now" sorta thing). But if we wanted to go for the super-ideal solution, there would need to be a way to get the timestamp of pg_stat_replication.replay_location+1 (the first transaction that the slave does not have).

On Thu, Sep 8, 2011 at 7:03 AM, Robert Haas wrote:
> On Thu, Sep 8, 2011 at 6:14 AM, Fujii Masao wrote:
> > OTOH, new function enables users to monitor the delay as a timestamp.
> > For users, a timestamp is obviously easier to handle than LSN, and the delay
> > as a timestamp is more intuitive. So, I think that it's worth adding
> > something like pg_last_xact_insert_timestamp into core for improvement
> > of user-friendliness.
>
> It seems very nice from a usability point of view, but I have to agree
> with Simon's concern about performance. Actually, as of today,
> WALInsertLock is such a gigantic bottleneck that I suspect the
> overhead of this additional bookkeeping would be completely
> unnoticeable. But I'm still reluctant to add more centralized
> spinlocks that everyone has to fight over, having recently put a lot
> of effort into getting rid of some of the ones we've traditionally
> had.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
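The "sanity checking" idea above (the reported lag can't grow faster than the wall-clock time between samples) is easy to express as a clamp on the monitoring side. This is purely reporting-script logic, nothing inside PostgreSQL:

```shell
# Clamp the raw lag reading so it can rise by at most the polling interval
# per sample.  This smooths the spike that appears when the first commit
# after a 30-second quiet period makes the slave look 30 seconds behind
# even though it is really <1s behind.
clamp_lag() {    # usage: clamp_lag RAW_LAG_SECS PREV_REPORTED_SECS POLL_INTERVAL_SECS
    raw=$1; prev=$2; interval=$3
    max=$((prev + interval))
    if [ "$raw" -gt "$max" ]; then
        echo "$max"     # cap the jump at prev + interval
    else
        echo "$raw"     # lag can drop arbitrarily fast
    fi
}
```

For the scenario described above, with a 1-second poll interval: after the 30-second lull, `clamp_lag 30 0 1` reports 1 rather than 30, so the graph shows a gentle ramp instead of a spike.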