Re: [HACKERS] Time-Delayed Standbys
On 2013-12-13 13:44:30 +, Simon Riggs wrote:
> On 13 December 2013 13:22, Andres Freund and...@2ndquadrant.com wrote:
>> On 2013-12-13 13:09:13 +, Simon Riggs wrote:
>>> On 13 December 2013 11:58, Andres Freund and...@2ndquadrant.com wrote:
>>>>> I removed it because it was after the pause. I'll replace it, but before the pause.
>>>> Doesn't after the pause make more sense? If somebody promoted while we were waiting, we want to recognize that before rolling forward? The wait can take a long while, after all.
>>> That would change the way pause currently works, which is out of scope for that patch.
>> But this feature isn't pause itself - it's imo something independent. Note that we currently a) check pause again after recoveryApplyDelay(), b) do check for promotion if the sleep in recoveryApplyDelay() is interrupted. So not checking after the final sleep seems confusing.
> I'm proposing the attached patch.

Looks good, although I'd move it down below the comment ;)

> This patch implements a consistent view of recovery pause, which is that when paused, we don't check for promotion, during or immediately after. That is user-noticeable behaviour and shouldn't be changed without thought and discussion on a separate thread with a clear descriptive title. (I might argue in favour of it myself; I'm not yet decided.)

Some more improvements in that area certainly would be good...

Greetings, Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 21:58, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
> On Thu, Dec 12, 2013 at 3:42 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
>> On Thu, Dec 12, 2013 at 3:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
>>> On 12 December 2013 15:19, Simon Riggs si...@2ndquadrant.com wrote:
>>>> Don't panic guys! I meant UTC offset only. And yes, it may not be needed, will check.
>>> Checked, all non-UTC TZ offsets work without further effort here.
>> Thanks!
> Reviewing the committed patch I noted that the CheckForStandbyTrigger() call after the delay was removed. If we promote the standby during the delay and don't check the trigger immediately after the delay, then we will replay undesired WAL records. The attached patch adds this check.

I removed it because it was after the pause. I'll replace it, but before the pause.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-13 11:56:47 +, Simon Riggs wrote:
> On 12 December 2013 21:58, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
>> Reviewing the committed patch I noted that the CheckForStandbyTrigger() call after the delay was removed. If we promote the standby during the delay and don't check the trigger immediately after the delay, then we will replay undesired WAL records. The attached patch adds this check.
> I removed it because it was after the pause. I'll replace it, but before the pause.

Doesn't after the pause make more sense? If somebody promoted while we were waiting, we want to recognize that before rolling forward? The wait can take a long while, after all.

Greetings, Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 13 December 2013 11:58, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-12-13 11:56:47 +, Simon Riggs wrote:
>> [...]
>> I removed it because it was after the pause. I'll replace it, but before the pause.
> Doesn't after the pause make more sense? If somebody promoted while we were waiting, we want to recognize that before rolling forward? The wait can take a long while, after all.

That would change the way pause currently works, which is out of scope for that patch. I'm happy to discuss such a change, but if agreed, it would need to apply in all cases, not just this one.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-13 13:09:13 +, Simon Riggs wrote:
> On 13 December 2013 11:58, Andres Freund and...@2ndquadrant.com wrote:
>> [...]
>> Doesn't after the pause make more sense? If somebody promoted while we were waiting, we want to recognize that before rolling forward? The wait can take a long while, after all.
> That would change the way pause currently works, which is out of scope for that patch.

But this feature isn't pause itself - it's imo something independent. Note that we currently a) check pause again after recoveryApplyDelay(), b) do check for promotion if the sleep in recoveryApplyDelay() is interrupted. So not checking after the final sleep seems confusing.

Greetings, Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 13 December 2013 13:22, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-12-13 13:09:13 +, Simon Riggs wrote:
>> [...]
>> That would change the way pause currently works, which is out of scope for that patch.
> But this feature isn't pause itself - it's imo something independent. Note that we currently a) check pause again after recoveryApplyDelay(), b) do check for promotion if the sleep in recoveryApplyDelay() is interrupted. So not checking after the final sleep seems confusing.

I'm proposing the attached patch.

This patch implements a consistent view of recovery pause, which is that when paused, we don't check for promotion, during or immediately after. That is user-noticeable behaviour and shouldn't be changed without thought and discussion on a separate thread with a clear descriptive title. (I might argue in favour of it myself; I'm not yet decided.)

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

(Attachment: snippet.patch)
Re: [HACKERS] Time-Delayed Standbys
On Fri, Dec 13, 2013 at 11:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
> [...]
> I'm proposing the attached patch.
>
> This patch implements a consistent view of recovery pause, which is that when paused, we don't check for promotion, during or immediately after. That is user-noticeable behaviour and shouldn't be changed without thought and discussion on a separate thread with a clear descriptive title. (I might argue in favour of it myself; I'm not yet decided.)

In my previous message [1] I attached a patch equal to yours ;-)

Regards,

[1] http://www.postgresql.org/message-id/CAFcNs+qD0AJ=qzhsHD9+v_Mhz0RTBJ=cJPCT_T=ut_jvvnc...@mail.gmail.com

--
Fabrízio de Royes Mello - Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog about IT: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
(2013/12/12 7:23), Fabrízio de Royes Mello wrote:
> On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund and...@2ndquadrant.com wrote:
>> * hot_standby=off: Makes delay usable with wal_level=archive (and thus a lower WAL volume)
>> * standby_mode=off: Configurations that use tools like pg_standby and similar simply don't need standby_mode=on. If you want to trigger failover from within the restore_command you *cannot* set it.
>> * recovery_target_*: It can still make sense if you use pause_at_recovery_target.

I don't find parts of his argument very convincing... We can just set standby_mode=on when we use min_standby_apply_delay with pg_standby and similarly simple tools. However, I tend to agree that we don't need to prohibit anything except standby_mode.

So I'd like to propose changing the parameter name from min_standby_apply_delay to min_recovery_apply_delay. That is more natural for this feature.

Regards,

--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 08:19, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
> [...]
> So I'd like to propose changing the parameter name from min_standby_apply_delay to min_recovery_apply_delay. That is more natural for this feature.

OK

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
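[Editor's note: for illustration, a recovery.conf using the rename agreed above might look like the sketch below. The parameter name is the one proposed in this thread (the name finally committed may differ), and the connection string is invented.]

```conf
# Hypothetical recovery.conf sketch using the proposed rename.
standby_mode = 'on'
primary_conninfo = 'host=master.example.com user=replicator'
min_recovery_apply_delay = '5min'
```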
Re: [HACKERS] Time-Delayed Standbys
On 9 December 2013 10:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
> (2013/12/09 19:35), Pavel Stehule wrote:
>> 2013/12/9 KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp:
>>> Hi Fabrízio, I tested your v4 patch, and send you review comments.
>>>
>>> * Fix typo
>>> 49 -# commited transactions from the master, specify a recovery time delay.
>>> 49 +# committed transactions from the master, specify a recovery time delay.
>>>
>>> * Fix white space
>>> 177 - if (secs <= 0 && microsecs <=0)
>>> 177 + if (secs <= 0 && microsecs <= 0)
>>>
>>> * Add functionality (I propose)
>>> We can set a negative number for min_standby_apply_delay. I think this feature is for worldwide replication situations. For example, the master server is in Japan and the slave server is in San Francisco. Japan time is ahead of San Francisco time, and if we want to delay in this situation, we may need a negative number in min_standby_apply_delay. So I propose that the time-delay conditional branch change as follows:
>>> - if (min_standby_apply_delay > 0)
>>> + if (min_standby_apply_delay != 0)
>>> What do you think? It might also work correctly.
>> what about using an interval instead of absolute time?
> This is because local time is recorded in XLOG. And it has a big cost for calculating global time.

I agree with your request here, but I don't think negative values are the right way to implement that; at least it would not be very usable.

My suggestion would be to add the TZ to the checkpoint record. This way all users of WAL can see the TZ of the master and act accordingly. I'll do a separate patch for that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
(2013/12/12 18:09), Simon Riggs wrote:
> On 9 December 2013 10:54, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
>> [...]
> I agree with your request here, but I don't think negative values are the right way to implement that; at least it would not be very usable.

I think my proposal is the easiest and simplest way to solve this problem. And I believe that someone who cannot calculate the difference between time zones doesn't set up a replication cluster across continents.

> My suggestion would be to add the TZ to the checkpoint record. This way all users of WAL can see the TZ of the master and act accordingly. I'll do a separate patch for that.

That would also be useful in other situations. However, it might lead to long and complicated discussions... I think our hope is to commit this patch in this commit-fest or the next, final commit-fest.

Regards,

--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 10:42, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
> [...]
> That would also be useful in other situations. However, it might lead to long and complicated discussions... I think our hope is to commit this patch in this commit-fest or the next, final commit-fest.

Agreed on no delay for the delay patch, as shown by my commit. Still think we need better TZ handling.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-12 09:09:21 +, Simon Riggs wrote:
>> * Add functionality (I propose)
>> We can set a negative number for min_standby_apply_delay. I think this feature is for worldwide replication situations. For example, the master server is in Japan and the slave server is in San Francisco. Japan time is ahead of San Francisco time, and if we want to delay in this situation, we may need a negative number...
>> This is because local time is recorded in XLOG. And it has a big cost for calculating global time.

Uhm? Isn't the timestamp in commit records actually a TimestampTz? And thus essentially stored as UTC? I don't think this problem actually exists.

> My suggestion would be to add the TZ to the checkpoint record. This way all users of WAL can see the TZ of the master and act accordingly. I'll do a separate patch for that.

Intuitively I'd say that might be useful - but I am not really sure what for. And we don't exactly have a great interface for looking at a checkpoint's data. Maybe add it to the control file instead?

Greetings, Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 11:05, Andres Freund and...@2ndquadrant.com wrote:
>> My suggestion would be to add the TZ to the checkpoint record. This way all users of WAL can see the TZ of the master and act accordingly. I'll do a separate patch for that.
> Intuitively I'd say that might be useful - but I am not really sure what for. And we don't exactly have a great interface for looking at a checkpoint's data. Maybe add it to the control file instead?

That's actually what I had in mind, I just phrased it badly in mid-thought.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
2013/12/12 Simon Riggs si...@2ndquadrant.com:
> [...]
> Agreed on no delay for the delay patch, as shown by my commit.

Our forecast was very accurate... Nice commit, thanks!

Regards,

--
Mitsumasa KONDO
NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
Simon Riggs si...@2ndquadrant.com writes:
> On 12 December 2013 11:05, Andres Freund and...@2ndquadrant.com wrote:
>> Intuitively I'd say that might be useful - but I am not really sure what for. And we don't exactly have a great interface for looking at a checkpoint's data. Maybe add it to the control file instead?
> That's actually what I had in mind, I just phrased it badly in mid-thought.

I don't think you realize what a can of worms that would be. There's no compact representation of a timezone, unless you are only proposing to store the UTC offset; and frankly I'm not particularly seeing the point of that.

regards, tom lane
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 12, 2013 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> [...]
> I don't think you realize what a can of worms that would be. There's no compact representation of a timezone, unless you are only proposing to store the UTC offset; and frankly I'm not particularly seeing the point of that.

+1. I can see the point of storing a timestamp in each checkpoint record, if we don't already, but time zones should be completely irrelevant to this feature. Everything should be reckoned in seconds since the epoch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 15:03, Robert Haas robertmh...@gmail.com wrote:
> On Thu, Dec 12, 2013 at 9:52 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>> I don't think you realize what a can of worms that would be. There's no compact representation of a timezone, unless you are only proposing to store the UTC offset; and frankly I'm not particularly seeing the point of that.
> +1. I can see the point of storing a timestamp in each checkpoint record, if we don't already, but time zones should be completely irrelevant to this feature. Everything should be reckoned in seconds since the epoch.

Don't panic guys! I meant UTC offset only. And yes, it may not be needed, will check.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On 12 December 2013 15:19, Simon Riggs si...@2ndquadrant.com wrote:
> Don't panic guys! I meant UTC offset only. And yes, it may not be needed, will check.

Checked, all non-UTC TZ offsets work without further effort here.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 12, 2013 at 3:39 PM, Simon Riggs si...@2ndquadrant.com wrote:
> On 12 December 2013 15:19, Simon Riggs si...@2ndquadrant.com wrote:
>> Don't panic guys! I meant UTC offset only. And yes, it may not be needed, will check.
> Checked, all non-UTC TZ offsets work without further effort here.

Thanks!

--
Fabrízio de Royes Mello - Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog about IT: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 12, 2013 at 3:42 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote:
> [...]
> Thanks!

Reviewing the committed patch I noted that the CheckForStandbyTrigger() call after the delay was removed. If we promote the standby during the delay and don't check the trigger immediately after the delay, then we will replay undesired WAL records. The attached patch adds this check.

Regards,

--
Fabrízio de Royes Mello - Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog about IT: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index a76aef3..fbc2d2f 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -6835,6 +6835,14 @@ StartupXLOG(void)
 			recoveryApplyDelay();
 
 			/*
+			 * Check for standby trigger to prevent the
+			 * replay of undesired WAL records if the
+			 * slave was promoted during the delay.
+			 */
+			if (CheckForStandbyTrigger())
+				break;
+
+			/*
 			 * We test for paused recovery again here. If
 			 * user sets delayed apply, it may be because
 			 * they expect to pause recovery in case of
Re: [HACKERS] Time-Delayed Standbys
On 11 December 2013 06:36, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
> I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better: do we keep some potential use cases, or protect careless users? And we had better wait for the author's comment...

I'd say just document that it wouldn't make sense to use it for PITR. There may be some use case we can't see yet, so specifically prohibiting a use case that is not dangerous seems too much at this point. I will no doubt be reminded of these words in the future...

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Time-Delayed Standbys
On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
> On 11 December 2013 06:36, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote:
>> I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better: do we keep some potential use cases, or protect careless users? And we had better wait for the author's comment...
> I'd say just document that it wouldn't make sense to use it for PITR. There may be some use case we can't see yet, so specifically prohibiting a use case that is not dangerous seems too much at this point. I will no doubt be reminded of these words in the future...

Hi all, I tend to agree with Simon, but I confess that I don't like delaying a server with standby_mode = 'off'. The main goal of this patch is to delay streaming replication, so if the slave server isn't a hot standby I think it makes no sense to delay it. Mitsumasa suggested adding StandbyModeRequested to the conditional branch to skip this situation. I agree with him! And I'll change 'recoveryDelay' (functions, variables) to 'standbyDelay'.

Regards,

--
Fabrízio de Royes Mello - Consultoria/Coaching PostgreSQL
Timbira: http://www.timbira.com.br
Blog about IT: http://fabriziomello.blogspot.com
LinkedIn profile: http://br.linkedin.com/in/fabriziomello
Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-11 16:37:54 -0200, Fabrízio de Royes Mello wrote: On Wed, Dec 11, 2013 at 6:27 AM, Simon Riggs si...@2ndquadrant.com wrote: I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better, we get something potential or to protect stupid. And we had better to wait author's comment... I'd say just document that it wouldn't make sense to use it for PITR. There may be some use case we can't see yet, so specifically prohibiting a use case that is not dangerous seems too much at this point. I will no doubt be reminded of these words in the future... I tend to agree with Simon, but I confess that I don't liked to delay a server with standby_mode = 'off'. The main goal of this patch is delay the Streaming Replication, so if the slave server isn't a hot-standby I think makes no sense to delay it. Mitsumasa suggested to add StandbyModeRequested in conditional branch to skip this situation. I agree with him! I don't think that position has any merit, sorry: Think about the way this stuff gets setup. The user creates a new basebackup (pg_basebackup, manual pg_start/stop_backup, shutdown primary). Then he creates a recovery conf by either starting from scratch, using --write-recovery-conf or by copying recovery.conf.sample. In none of these cases delay will be configured. So, with that in mind, the only way it could have been configured is by the user *explicitly* writing it into recovery.conf. And now you want to to react to this explicit step by just *silently* ignoring the setting based on some random criteria (arguments have been made about hot_standby=on/off, standby_mode=on/off which aren't directly related). Why on earth would that by a usability improvement? Also, you seem to assume there's no point in configuring it for any of hot_standby=off, standby_mode=off, recovery_target=*. Why? 
There are use cases for all of them:
* hot_standby=off: makes delay usable with wal_level=archive (and thus a lower WAL volume)
* standby_mode=off: configurations that use tools like pg_standby and similar simply don't need standby_mode=on. If you want to trigger failover from within the restore_command you *cannot* set it.
* recovery_target_*: it can still make sense if you use pause_at_recovery_target.
In which scenarios does your restriction actually improve anything? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
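To make the non-streaming setups concrete: a hypothetical recovery.conf for the pg_standby-style case (standby_mode deliberately off, with the delay applied during archive recovery) might look like this. The delay parameter name follows the patch under discussion in this thread, and the archive path is purely illustrative:

```
# Archive recovery driven by pg_standby; standby_mode left off so that
# failover can be triggered from within the restore_command if needed.
standby_mode = 'off'
restore_command = 'pg_standby /mnt/server/archivedir %f %p %r'

# Apply each commit no sooner than 30 minutes after the master committed it.
min_standby_apply_delay = '30min'
```

Silently ignoring min_standby_apply_delay in such a configuration, as the rejected proposal suggested, would discard a setting the administrator wrote explicitly.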
Re: [HACKERS] Time-Delayed Standbys
On Wed, Dec 11, 2013 at 7:47 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think that position has any merit, sorry. Think about the way this stuff gets set up: the user creates a new base backup (pg_basebackup, manual pg_start/stop_backup, shutdown primary). Then he creates a recovery.conf, either starting from scratch, using --write-recovery-conf, or by copying recovery.conf.sample. In none of these cases will a delay be configured. Ok. So, with that in mind, the only way it could have been configured is by the user *explicitly* writing it into recovery.conf. And now you want to react to this explicit step by just *silently* ignoring the setting based on some random criteria (arguments have been made about hot_standby=on/off and standby_mode=on/off, which aren't directly related). Why on earth would that be a usability improvement? Also, you seem to assume there's no point in configuring it for any of hot_standby=off, standby_mode=off, recovery_target=*. Why? There are use cases for all of them: * hot_standby=off: makes delay usable with wal_level=archive (and thus a lower WAL volume) * standby_mode=off: configurations that use tools like pg_standby and similar simply don't need standby_mode=on. If you want to trigger failover from within the restore_command you *cannot* set it. * recovery_target_*: it can still make sense if you use pause_at_recovery_target. In which scenarios does your restriction actually improve anything? Given your arguments I'm forced to revise my understanding of the problem. You are absolutely right in your assertions; I was not seeing the scenario from this perspective. Anyway, we need to improve the docs; any suggestions? Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-10 13:26:27 +0900, KONDO Mitsumasa wrote: (2013/12/09 20:29), Andres Freund wrote: On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote: Let me add my comments. We have to consider three situations: 1. PITR 2. replication standby 3. replication standby with restore_command. I think this patch cannot delay in situation 1. Why? I have three reasons. None of these reasons seem to be of a technical nature, right? 1. It is written in the document. Can we remove it? 2. The name of this feature is Time-delayed *standbys*, not Time-delayed *recovery*. Can we change it? I don't think that'd be a win in clarity. But perhaps somebody else has a better suggestion? 3. I think it is unnecessary in master PITR. And if it can delay in master PITR, it will become a master at an unexpected time instead of continuing recovery. That is meaningless. Master PITR? What's that? All PITR is based on recovery.conf and thus not really a master? Why should we prohibit using this feature in PITR? I don't see any advantage in doing so. If somebody doesn't want the delay, they shouldn't set it in the configuration file. End of story. There's not really that meaningful a distinction between PITR and replication using archive_command, especially when using *pause_after. I think this feature will be used in a lot of scenarios in which PITR is currently used. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
(2013/12/10 18:38), Andres Freund wrote: Master PITR? What's that? All PITR is based on recovery.conf and thus not really a master? Master PITR is PITR with standby_mode = off; it's just recovery from a base backup. The difference between master PITR and a standby is that the former ends up on an independent timeline ID, while the latter keeps the same timeline ID, following the master server. In the first place, their purposes are different. Why should we prohibit using this feature in PITR? I don't see any advantage in doing so. If somebody doesn't want the delay, they shouldn't set it in the configuration file. End of story. Unfortunately, there are a lot of careless users in the world... I think you have such clients, too. There's not really that meaningful a distinction between PITR and replication using archive_command, especially when using *pause_after. It is meaningless in master PITR: it will become a master with a new timeline ID at an unexpected time. I think this feature will be used in a lot of scenarios in which PITR is currently used. We have to judge which is better: do we gain some potential use, or protect careless users? And we had better wait for the author's comment... Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
Hi Fabrízio, I tested your v4 patch; here are my review comments.

* Fix typo
49 -# commited transactions from the master, specify a recovery time delay.
49 +# committed transactions from the master, specify a recovery time delay.

* Fix whitespace
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <= 0)

* Add functionality (I propose)
We can set a negative number for min_standby_apply_delay. I think this feature is for worldwide replication situations. For example, the master server is in Japan and the slave server is in San Francisco; Japan time is ahead of San Francisco time. If we want a delay in this situation, min_standby_apply_delay may need a negative number. So I propose changing the time-delay conditional branch as follows:
- if (min_standby_apply_delay > 0)
+ if (min_standby_apply_delay != 0)
What do you think? It should also work correctly.

* Problem 1
I read your updated documentation; it says PITR is not affected. However, when I run PITR with min_standby_apply_delay=300, the server cannot start. The log is as follows:
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted; last known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory pg_xlog/archive_status
cp: cannot stat `../arc/0002.history':
LOG: starting archive recovery
LOG: restored log file 00010041 from archive
LOG: redo starts at 0/4128
LOG: consistent recovery state reached at 0/41F0
LOG: database system is ready to accept read only connections
LOG: restored log file 00010042 from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation. That's all for now. If you are busy, please fix at your own pace.
I'm busy too, so I'm happy to wait for you as well :-) Regards, -- Mitsumasa KONDO NTT Open Source Software Center
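Mitsumasa's proposal above (change the delay test so that negative values are honoured) can be illustrated outside the server. This is a minimal standalone sketch, not the actual xlog.c code: timestamps are plain microsecond counts standing in for PostgreSQL's TimestampTz, and all function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Microsecond counts standing in for PostgreSQL's TimestampTz. */
typedef int64_t timestamp_us;

/* Behaviour in the patch as posted: a negative delay disables the feature. */
bool delay_active_gt(int64_t delay_ms)
{
    return delay_ms > 0;
}

/* Proposed behaviour: a negative delay is honoured as well. */
bool delay_active_ne(int64_t delay_ms)
{
    return delay_ms != 0;
}

/* Earliest standby-local time at which a commit record may be applied.
 * With a negative delay this lands *before* the recorded commit time. */
timestamp_us apply_not_before(timestamp_us commit_time, int64_t delay_ms)
{
    return commit_time + delay_ms * 1000;
}
```

With delay_ms = -5000, delay_active_gt reports the feature as off, while delay_active_ne keeps it on and moves the apply threshold five seconds earlier than the commit timestamp.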
Re: [HACKERS] Time-Delayed Standbys
2013/12/9 KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp: Hi Fabrízio, I tested your v4 patch; here are my review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery time delay.
49 +# committed transactions from the master, specify a recovery time delay.
* Fix whitespace
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <= 0)
* Add functionality (I propose)
We can set a negative number for min_standby_apply_delay. I think this feature is for worldwide replication situations. For example, the master server is in Japan and the slave server is in San Francisco; Japan time is ahead of San Francisco time. If we want a delay in this situation, min_standby_apply_delay may need a negative number. So I propose changing the time-delay conditional branch as follows:
- if (min_standby_apply_delay > 0)
+ if (min_standby_apply_delay != 0)
What do you think? It should also work correctly.
What about using an interval instead of an absolute time? Regards Pavel
* Problem 1
I read your updated documentation; it says PITR is not affected. However, when I run PITR with min_standby_apply_delay=300, the server cannot start. The log is as follows:
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted; last known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory pg_xlog/archive_status
cp: cannot stat `../arc/0002.history':
LOG: starting archive recovery
LOG: restored log file 00010041 from archive
LOG: redo starts at 0/4128
LOG: consistent recovery state reached at 0/41F0
LOG: database system is ready to accept read only connections
LOG: restored log file 00010042 from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation. That's all for now.
If you are busy, please fix at your own pace. I'm busy too, so I'm happy to wait for you as well :-) Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
(2013/12/09 19:36), KONDO Mitsumasa wrote: * Problem 1 I read your updated documentation; it says PITR is not affected. However, when I run PITR with min_standby_apply_delay=300, the server cannot start. The log is as follows:
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D data2 start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was interrupted; last known up at 2013-12-08 18:57:00 JST
LOG: creating missing WAL directory pg_xlog/archive_status
cp: cannot stat `../arc/0002.history':
LOG: starting archive recovery
LOG: restored log file 00010041 from archive
LOG: redo starts at 0/4128
LOG: consistent recovery state reached at 0/41F0
LOG: database system is ready to accept read only connections
LOG: restored log file 00010042 from archive
FATAL: cannot wait on a latch owned by another process
LOG: startup process (PID 30501) exited with exit code 1
LOG: terminating any other active server processes
We need a recovery flag for controlling the PITR situation. Let me add my comments. We have to consider three situations: 1. PITR 2. replication standby 3. replication standby with restore_command. I think this patch cannot delay in situation 1. So I think you should just add a StandbyModeRequested flag to the conditional branch. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
(2013/12/09 19:35), Pavel Stehule wrote: 2013/12/9 KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp: Hi Fabrízio, I tested your v4 patch; here are my review comments.
* Fix typo
49 -# commited transactions from the master, specify a recovery time delay.
49 +# committed transactions from the master, specify a recovery time delay.
* Fix whitespace
177 - if (secs <= 0 && microsecs <=0)
177 + if (secs <= 0 && microsecs <= 0)
* Add functionality (I propose)
We can set a negative number for min_standby_apply_delay. I think this feature is for worldwide replication situations. For example, the master server is in Japan and the slave server is in San Francisco; Japan time is ahead of San Francisco time. If we want a delay in this situation, min_standby_apply_delay may need a negative number. So I propose changing the time-delay conditional branch as follows:
- if (min_standby_apply_delay > 0)
+ if (min_standby_apply_delay != 0)
What do you think? It should also work correctly.
What about using an interval instead of an absolute time? This is because local time is recorded in the XLOG, and computing a global time would be costly. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote: Add my comment. We have to consider three situations. 1. PITR 2. replication standby 3. replication standby with restore_command I think this patch cannot delay in 1 situation. Why? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 12/04/2013 02:46 AM, Robert Haas wrote: Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. Surely that's the operating system / VM host / sysadmin / whatever's problem? The only way to deal with clock drift that isn't fragile in the face of variable latency, etc, is to basically re-implement (S)NTP in order to find out what the clock difference with the remote is. If we're going to do that, why not just let the OS deal with it? It might well be worth complaining about obvious aberrations like timestamps in the local future - preferably by complaining and not actually dying. It does need to be able to cope with a *skewing* clock, but I'd be surprised if it had any issues there in the first place. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 9 Dec 2013 12:16, Craig Ringer cr...@2ndquadrant.com wrote: The only way to deal with clock drift that isn't fragile in the face of variable latency, etc, is to basically re-implement (S)NTP in order to find out what the clock difference with the remote is. There's actually an entirely different way to deal with clock drift: treat master time and slave time as two incomparable spaces, similar to how you would treat measurements in different units. If you do that, then you can measure and manage the delay the slave adds between receiving and applying a record, and also measure how much master time worth of WAL is pending. These measurements don't depend at all on time synchronization between the servers. The specified feature depends explicitly on conversion between the master and slave time spaces, so it's inevitable that sync would be an issue. It might be nice to print a warning on connection if the time is far out of sync, or to check periodically. But I don't think reimplementing NTP is a good idea.
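The two-clock bookkeeping described above can be sketched in a few lines: the standby stamps, in its own clock, when each record arrived and when it was applied, while the master's commit timestamps are only ever compared with other master timestamps. This is an illustrative standalone sketch with invented names, not the server's actual code:

```c
#include <stdint.h>

typedef int64_t slave_time_us;    /* standby's local clock, microseconds  */
typedef int64_t master_time_us;   /* master's clock, never mixed with it  */

typedef struct WalRecordTiming
{
    master_time_us master_commit;   /* from the WAL record itself      */
    slave_time_us  received_at;     /* stamped locally on receipt      */
    slave_time_us  applied_at;      /* stamped locally on apply        */
} WalRecordTiming;

/* Delay the standby added between receiving and applying a record.
 * Uses only the slave clock, so inter-server clock drift is irrelevant. */
int64_t apply_lag_us(const WalRecordTiming *r)
{
    return r->applied_at - r->received_at;
}

/* How much master-time worth of WAL is pending between two records.
 * Uses only master timestamps, again drift-free. */
int64_t pending_master_time_us(master_time_us oldest_unapplied,
                               master_time_us newest_received)
{
    return newest_received - oldest_unapplied;
}
```

Both measurements stay meaningful however far the two clocks drift apart; only a feature that subtracts a master timestamp from a slave timestamp, as the delay patch does, exposes the drift.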
Re: [HACKERS] Time-Delayed Standbys
(2013/12/09 20:29), Andres Freund wrote: On 2013-12-09 19:51:01 +0900, KONDO Mitsumasa wrote: Let me add my comments. We have to consider three situations: 1. PITR 2. replication standby 3. replication standby with restore_command. I think this patch cannot delay in situation 1. Why? I have three reasons. 1. It is written in the document. Can we remove it? 2. The name of this feature is Time-delayed *standbys*, not Time-delayed *recovery*. Can we change it? 3. I think it is unnecessary in master PITR. And if it can delay in master PITR, it will become a master at an unexpected time instead of continuing recovery. That is meaningless. I'd like to ask what you expect from this feature and how you would use it. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs si...@2ndquadrant.com wrote: - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? I think it makes no sense to execute the delay after aborts and/or restore points, because they do not change data on a standby server. I see no reason to pause for aborts. Aside from the fact that it wouldn't be reliable in corner cases, as Fabrízio says, there's no user-visible effect, just as there's no user-visible effect from replaying a transaction up until just prior to the point where it commits (which we also do). Waiting for restore points seems like it potentially makes sense. If the standby is delayed by an hour, and you create a restore point and wait 55 minutes, you might expect that you can still kill the standby and recover it to that restore point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Time-Delayed Standbys
On Fri, Dec 6, 2013 at 1:36 PM, Robert Haas robertmh...@gmail.com wrote: On Thu, Dec 5, 2013 at 11:07 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs si...@2ndquadrant.com wrote: - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? I think it makes no sense to execute the delay after aborts and/or restore points, because they do not change data on a standby server. I see no reason to pause for aborts. Aside from the fact that it wouldn't be reliable in corner cases, as Fabrízio says, there's no user-visible effect, just as there's no user-visible effect from replaying a transaction up until just prior to the point where it commits (which we also do). Waiting for restore points seems like it potentially makes sense. If the standby is delayed by an hour, and you create a restore point and wait 55 minutes, you might expect that you can still kill the standby and recover it to that restore point. Makes sense. Fixed.
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 9d80256..12aa917 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -142,6 +142,31 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
   </varlistentry>
+  <varlistentry id="min-standby-apply-delay" xreflabel="min_standby_apply_delay">
+   <term><varname>min_standby_apply_delay</varname> (<type>integer</type>)</term>
+   <indexterm>
+    <primary><varname>min_standby_apply_delay</> recovery parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     Specifies the amount of time (in milliseconds, if no unit is specified)
+     which recovery of transaction commits should lag the master. This
+     parameter allows creation of a time-delayed standby. For example, if
+     you set this parameter to <literal>5min</literal>, the standby will
+     replay each transaction commit only when the system time on the standby
+     is at least five minutes past the commit time reported by the master.
+    </para>
+    <para>
+     Note that if the master and standby system clocks are not synchronized,
+     this might lead to unexpected results.
+    </para>
+    <para>
+     This parameter works only for streaming replication deployments.
+     Synchronous replicas and PITR are not affected.
+    </para>
+   </listitem>
+  </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..e8617db 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
 #
 #trigger_file = ''
 #
+# min_standby_apply_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# commited transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is least
+# five minutes past the commit time reported by the master.
+#
+#min_standby_apply_delay = 0
+#
 #---
 # HOT STANDBY PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index b68230d..7ca2f9b 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
+static int min_standby_apply_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
 static void recoveryPausesHere(void);
+static void recoveryDelay(void);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz xtime);
 static void CheckRequiredParameterValues(void);
@@ -5474,6 +5477,19 @@ readRecoveryCommandFile(void)
 			(errmsg_internal("trigger_file = '%s'", TriggerFile)));
 		}
+		else if (strcmp(item->name, "min_standby_apply_delay") == 0)
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 5, 2013 at 1:45 AM, Simon Riggs si...@2ndquadrant.com wrote: On 3 December 2013 18:46, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with a 5-minute delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby had the data replicated nearly instantly; the delayed standby replicated it after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. I had that objection and others. Since then many people have requested this feature and have persuaded me that this is worth having and that my objections are minor points. I now agree with the need for the feature, almost as written. Not recalling the older thread, but if it breaks on clock drift, I think we can fairly easily make that situation good enough. Just have IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to start if the time difference is too great. Yes, that doesn't catch the case when the machines are in perfect sync when they start up and drift *later*, but it will catch the most common cases, I bet.
But I think that's good enough that we can accept the feature, given that *most* people will have ntp, and that it's a very useful feature for those people. But we could help people who run into it because of a simple config error.. Or maybe the suggested patch already does this, in which case ignore that part :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
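The connect-time sanity check Magnus suggests could look roughly like the sketch below. Note that IDENTIFY_SYSTEM does not actually return a master timestamp; this assumes a hypothetically extended reply, and all names and the threshold are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical skew check at walreceiver connect time: master_now_us
 * would come from an extended IDENTIFY_SYSTEM reply, local_now_us from
 * the standby's own clock. Returns true if the standby should refuse
 * to proceed because the clocks disagree by more than max_skew_us. */
bool clock_skew_excessive(int64_t master_now_us,
                          int64_t local_now_us,
                          int64_t max_skew_us)
{
    int64_t diff = master_now_us - local_now_us;

    return llabs(diff) > max_skew_us;
}
```

As noted in the thread, a one-shot check like this catches gross configuration errors (a master and standby minutes apart) but not clocks that start synchronized and drift later, which would require periodic re-checking.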
Re: [HACKERS] Time-Delayed Standbys
On 5 December 2013 08:51, Magnus Hagander mag...@hagander.net wrote: Not recalling the older thread, but if it breaks on clock drift, I think we can fairly easily make that situation good enough. Just have IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to start if the time difference is too great. Yes, that doesn't catch the case when the machines are in perfect sync when they start up and drift *later*, but it will catch the most common cases, I bet. But I think that's good enough that we can accept the feature, given that *most* people will have ntp, and that it's a very useful feature for those people. But we could help people who run into it because of a simple config error... Or maybe the suggested patch already does this, in which case ignore that part :) I think the very nature of *this* feature is that it doesn't *require* the clocks to be exactly in sync, even though that is the basis for measurement. For sane usage this parameter would be set to a minimum of 5 minutes, but more likely 30 minutes, 1 hour or more. In that case, a few seconds of drift either way makes no real difference to this feature. So IMHO, without prejudice to other features that may be more critically reliant on time synchronisation, we are OK to proceed with this specific feature. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
On Thu, Dec 5, 2013 at 7:57 AM, Simon Riggs si...@2ndquadrant.com wrote: On 5 December 2013 08:51, Magnus Hagander mag...@hagander.net wrote: Not recalling the older thread, but if it breaks on clock drift, I think we can fairly easily make that situation good enough. Just have IDENTIFY_SYSTEM return the current timestamp on the master, and refuse to start if the time difference is too great. Yes, that doesn't catch the case when the machines are in perfect sync when they start up and drift *later*, but it will catch the most common cases, I bet. But I think that's good enough that we can accept the feature, given that *most* people will have ntp, and that it's a very useful feature for those people. But we could help people who run into it because of a simple config error... Or maybe the suggested patch already does this, in which case ignore that part :) I think the very nature of *this* feature is that it doesn't *require* the clocks to be exactly in sync, even though that is the basis for measurement. For sane usage this parameter would be set to a minimum of 5 minutes, but more likely 30 minutes, 1 hour or more. In that case, a few seconds of drift either way makes no real difference to this feature. So IMHO, without prejudice to other features that may be more critically reliant on time synchronisation, we are OK to proceed with this specific feature. Hi all, I've seen all your comments. I'm a bit busy with some customer issues (it has been a crazy week), but I'll reply to and/or fix your suggestions later. Thanks for all the review, and sorry for the delay in replying. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 5:33 PM, Simon Riggs si...@2ndquadrant.com wrote: - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? I think it makes no sense to execute the delay after aborts and/or restore points, because they do not change data on a standby server. - don't care about clock drift because it's an admin problem. A few minor points on things: * The code with the comment "Clear any previous recovery delay time" is in the wrong place; move it down or remove it completely. Setting the delay to zero doesn't prevent calling recoveryDelay(), so that logic looks wrong anyway. Fixed. * The loop exit in recoveryDelay() is inelegant; it should break if <= 0. Fixed. * There's a spelling mistake in the sample. Fixed. * The patch has whitespace in one place. Fixed. And one important point... * The delay loop happens AFTER we check for a pause. Which means that if the user notices a problem on a commit, then hits the pause button on the standby, the pause will have no effect and the next commit will be applied anyway. Maybe just one commit, but it's an off-by-one error that removes the benefit of the patch. So I think we need to test this again after we finish delaying: if (xlogctl->recoveryPause) recoveryPausesHere(); Fixed. We need to explain in the docs that this is intended only for use in a live streaming deployment. It will have little or no meaning in PITR. Fixed. I think recovery_time_delay should be called something_apply_delay to highlight the point that it is the apply of records that is delayed, not the receipt. And hence the need to document that sync rep is NOT slowed down by setting this value. Fixed. And to make the name consistent with other parameters, I suggest min_standby_apply_delay. I agree. Fixed!
We also need to document caveats about the patch, in that it only delays on timestamped WAL records and other records may be applied sooner than the delay in some circumstances, so it is not a way to avoid all cancellations. We also need to document that the behaviour of the patch is to apply all data received as quickly as possible once triggered, so the specified delay does not slow down promoting the server to a master. That might also be seen as a negative behaviour, since promotion effectively sets recovery_time_delay to zero. I will handle the additional documentation, if you can update the patch with the main review comments. Thanks. Thanks, your help is welcome. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 9d80256..12aa917 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -142,6 +142,31 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
    </listitem>
   </varlistentry>
+
+  <varlistentry id="min-standby-apply-delay" xreflabel="min_standby_apply_delay">
+   <term><varname>min_standby_apply_delay</varname> (<type>integer</type>)</term>
+   <indexterm>
+    <primary><varname>min_standby_apply_delay</> recovery parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     Specifies the amount of time (in milliseconds, if no unit is specified)
+     which recovery of transaction commits should lag the master. This
+     parameter allows creation of a time-delayed standby. For example, if
+     you set this parameter to <literal>5min</literal>, the standby will
+     replay each transaction commit only when the system time on the standby
+     is at least five minutes past the commit time reported by the master.
+    </para>
+    <para>
+     Note that if the master and standby system clocks are not synchronized,
+     this might lead to unexpected results.
+    </para>
+    <para>
+     This parameter works only for streaming replication deployments.
+     Synchronous replication and PITR are not affected.
+    </para>
+   </listitem>
+  </varlistentry>
+
 </variablelist>
</sect1>

diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..e8617db 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
 #
 #trigger_file = ''
 #
+# min_standby_apply_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# committed transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is at least
+#
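The pause-related review point above can be sketched as a self-contained toy (this is not PostgreSQL source code; `current_time_ms()`, `sleep_ms()`, and `pause_requested` are invented stand-ins for TimestampTz arithmetic, a bounded latch wait, and `xlogctl->recoveryPause`):

```c
#include <stdbool.h>

typedef long long TimestampMs;

static bool pause_requested = false;    /* stands in for xlogctl->recoveryPause */
static TimestampMs fake_now = 0;        /* emulated system clock */

static TimestampMs
current_time_ms(void)
{
    return fake_now;
}

static void
sleep_ms(TimestampMs ms)                /* emulated bounded latch wait */
{
    fake_now += ms;
}

static void
pause_here(void)                        /* stands in for recoveryPausesHere() */
{
}

/* Wait until commit_time + delay has passed; returns the wait iterations. */
static int
apply_delay(TimestampMs commit_time, TimestampMs delay)
{
    int iterations = 0;

    for (;;)
    {
        TimestampMs remaining = commit_time + delay - current_time_ms();

        /* the loop exit the review asks for: break if <= 0 */
        if (remaining <= 0)
            break;

        /* wait in bounded chunks so interrupts can be serviced */
        sleep_ms(remaining < 100 ? remaining : 100);
        iterations++;
    }

    /*
     * Re-check the pause request AFTER the delay, so a pause issued while
     * we slept takes effect before this commit is applied -- the
     * off-by-one the review describes.
     */
    if (pause_requested)
        pause_here();

    return iterations;
}
```

With `fake_now = 1000`, `apply_delay(1000, 500)` performs five 100 ms waits and only then falls through to the final pause check.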
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote: 4) Start the slave, connect to it using psql, and in another session I can see all the archive recovery log Hmm... I thought I had misread your email, but it reproduces again. When I set a small recovery_time_delay (=3), it seemed to work correctly. However, when I set a long recovery_time_delay (=300), it didn't work. My reproduction log follows.

[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up

I attached my postgresql.conf and recovery.conf. They should reproduce it. So, you brought up a standby and it took more time to become consistent because it waited on commits? That's the problem? If so, I don't think that's a bug?
Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
Hi, On 04/12/13 11:13, KONDO Mitsumasa wrote: 1) Clusters - build master - build slave and attach to the master using SR and config recovery_time_delay to 1min. 2) Stop the Slave 3) Run some transactions on the master using pgbench to generate a lot of archives 4) Start the slave, connect to it using psql, and in another session I can see all the archive recovery log Hmm... I thought I had misread your email, but it reproduces again. When I set a small recovery_time_delay (=3), it seemed to work correctly. However, when I set a long recovery_time_delay (=300), it didn't work. […] I'm not sure if I understand your problem correctly. I'll try to summarize, please correct me if I'm wrong: You created a master node and a hot standby with a 300 delay. Then you stopped the standby, did the pgbench run and started the hot standby again. It did not get in line with the master. Is this correct? I don't see a problem here… the standby should not be in sync with the master, it should be delayed. I followed your steps and after 50 minutes (300ms) the standby was at the same level as the master. Did I misunderstand you? Regards, Christian Kruse -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
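Part of the confusion in this exchange is about units: per the patch's documentation, a bare number is taken as milliseconds, so recovery_time_delay = 300 means 300 ms, while 300min would be five hours. A hypothetical sketch of that kind of unit handling (invented helper, not the actual recovery.conf parser):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse a delay value with a millisecond default unit, as the patch's
 * documentation describes ("in milliseconds, if no unit is specified").
 * Returns the delay in milliseconds, or -1 on an unknown unit.
 */
static long
parse_delay_ms(const char *value)
{
    char *end;
    long  n = strtol(value, &end, 10);

    while (*end == ' ')                 /* tolerate a space before the unit */
        end++;

    if (*end == '\0' || strcmp(end, "ms") == 0)
        return n;                       /* bare number: milliseconds */
    if (strcmp(end, "s") == 0)
        return n * 1000L;
    if (strcmp(end, "min") == 0)
        return n * 60L * 1000L;
    if (strcmp(end, "h") == 0)
        return n * 3600L * 1000L;
    return -1;                          /* unknown unit */
}
```

Under this convention, "300" parses to 300 ms but "5min" parses to 300000 ms, which is why a forgotten unit suffix makes the delay look like it "didn't work".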
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Andres Freund and...@2ndquadrant.com On 2013-12-04 11:13:58 +0900, KONDO Mitsumasa wrote: 4) Start the slave and connect to it using psql and in another session I can see all archive recovery log Hmm... I had thought my mistake in reading your email, but it reproduce again. When I sat small recovery_time_delay(=3), it might work collectry. However, I sat long timed recovery_time_delay(=300), it didn't work. My reporduced operation log is under following. [mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432 starting vacuum...end. transaction type: TPC-B (sort of) scaling factor: 10 query mode: simple number of clients: 8 number of threads: 4 duration: 30 s number of transactions actually processed: 68704 latency average: 3.493 ms tps = 2289.196747 (including connections establishing) tps = 2290.175129 (excluding connections establishing) [mitsu-ko@localhost postgresql]$ vim slave/recovery.conf [mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start server starting [mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST LOG: entering standby mode LOG: consistent recovery state reached at 0/5C4D8668 LOG: redo starts at 0/5C4000D8 [mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up FATAL: the database system is starting up FATAL: the database system is starting up FATAL: the database system is starting up FATAL: the database system is starting up [mitsu-ko@localhost postgresql]$ bin/psql -p6543 psql: FATAL: the database system is starting up [mitsu-ko@localhost postgresql]$ bin/psql -p6543 psql: FATAL: the database system is starting up I attached my postgresql.conf and recovery.conf. It will be reproduced. So, you brought up a standby and it took more time to become consistent because it waited on commits? That's the problem? If so, I don't think that's a bug? When it happened, psql cannot connect standby server at all. I think this behavior is not good. 
It should only delay the recovery position; the old, delayed table data should still be readable. Not being able to connect to the server is not the desired behavior. If you think this behavior is best, I will mark the patch Ready for Committer and the committer can improve it. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote: 2013/12/4 Andres Freund and...@2ndquadrant.com When it happened, psql could not connect to the standby server at all. I think this behavior is not good. It should only delay the recovery position; the old, delayed table data should still be readable. That doesn't sound like a good plan - even if the clients cannot connect yet, you can still promote the server. Just not taking the delay into consideration at that point seems like it would surprise users rather badly, in situations where they really cannot afford such surprises. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Christian Kruse christ...@2ndquadrant.com You created a master node and a hot standby with a 300 delay. Then you stopped the standby, did the pgbench run and started the hot standby again. It did not get in line with the master. Is this correct? No. First, I start the master and execute pgbench. Second, I start the standby with a 300ms(50min) delay. Then psql cannot connect to the standby server at all. I'm not sure why the standby did not start. It might be because the delay feature interferes with the REDO loop during the standby's first start-up. I don't see a problem here… the standby should not be in sync with the master, it should be delayed. I followed your steps and after 50 minutes (300ms) the standby was at the same level as the master. I think we should be able to connect to the standby server at any time, regardless of the delay option. Did I misunderstand you? I'm not sure... You might be right, or another, better way might exist. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
Hi, On 2013-12-03 19:33:16 +0000, Simon Riggs wrote: - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? What would the advantage of waiting on anything but commits be? If it's not a commit, the action won't change the state of the database (yes, yes, there are exceptions, but those don't have a timestamp)... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
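Andres' point can be illustrated with a toy version of the check (invented struct and helper, not the actual xact.c code; the info values are intended to match the 9.3-era XLOG_XACT_* constants, and 0x0F is the XLR_INFO_MASK flag area that must be masked out first):

```c
#include <stdbool.h>

#define XLOG_XACT_COMMIT          0x00
#define XLOG_XACT_COMMIT_COMPACT  0x60
#define XLR_INFO_MASK             0x0F

typedef struct
{
    unsigned char xl_info;    /* record type plus flag bits */
    long long     xact_time;  /* commit timestamp; only valid for commits */
} FakeXactRecord;

/*
 * Only commit records both carry a timestamp and make changes visible,
 * so only they establish a new "delay until" time.
 */
static bool
record_sets_delay(const FakeXactRecord *rec, long long delay_ms,
                  long long *delay_until)
{
    unsigned char info = rec->xl_info & ~XLR_INFO_MASK;

    if (info == XLOG_XACT_COMMIT || info == XLOG_XACT_COMMIT_COMPACT)
    {
        *delay_until = rec->xact_time + delay_ms;
        return true;
    }
    /* aborts, restore points, etc.: nothing becomes visible, no wait */
    return false;
}
```

Since an abort or restore point makes no data visible on the standby, sleeping on it would delay replay without protecting anything, which is why the patch only computes the delay for the two commit record types.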
Re: [HACKERS] Time-Delayed Standbys
2013/12/4 Andres Freund and...@2ndquadrant.com On 2013-12-04 22:47:47 +0900, Mitsumasa KONDO wrote: 2013/12/4 Andres Freund and...@2ndquadrant.com When it happened, psql could not connect to the standby server at all. I think this behavior is not good. It should only delay the recovery position; the old, delayed table data should still be readable. That doesn't sound like a good plan - even if the clients cannot connect yet, you can still promote the server. I'm not sure I follow your argument, but doesn't that defeat the purpose of this patch? Just not taking the delay into consideration at that point seems like it would surprise users rather badly, in situations where they really cannot afford such surprises. Hmm... I think users would be surprised... I think it is easy to fix this behavior using a recovery flag, so we had better wait for other comments. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
Robert Haas robertmh...@gmail.com wrote: So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. There are many things that a system admin can get wrong. Failing to supply this feature because the sysadmin might not be running ntpd (or equivalent) correctly seems to me to be like not having the software do fsync because the sysadmin might not have turned off write-back buffering on drives without persistent storage. Either way, poor system management can defeat the feature. Either way, I see no reason to withhold the feature from those who manage their systems in a sane fashion. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
Hi, On 04/12/13 07:22, Kevin Grittner wrote: There are many things that a system admin can get wrong. Failing to supply this feature because the sysadmin might not be running ntpd (or equivalent) correctly seems to me to be like not having the software do fsync because the sysadmin might not have turned off write-back buffering on drives without persistent storage. Either way, poor system management can defeat the feature. Either way, I see no reason to withhold the feature from those who manage their systems in a sane fashion. I agree. But maybe we should add a warning in the documentation about time syncing? Greetings, CK -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
src/backend/access/transam/xlog.c:5889: trailing whitespace.
Re: [HACKERS] Time-Delayed Standbys
On 3 December 2013 18:46, Robert Haas robertmh...@gmail.com wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. I had that objection and others. Since then many people have requested this feature and have persuaded me that this is worth having and that my objections are minor points. I now agree with the need for the feature, almost as written. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Greetings, CK -- Christian Kruse http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.comwrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello
Re: [HACKERS] Time-Delayed Standbys
On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 12/03/2013 10:46 AM, Robert Haas wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. Unless some of those people have changed their minds, I don't think this patch has much future here. I would agree that it is a good idea. Joshua D. Drake -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 2013-12-03 13:46:28 -0500, Robert Haas wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. I really fail to see why clock drift should be this patch's responsibility. It's not like the world would go under^W data corruption would ensue if the clocks drift. Your standby would get delayed imprecisely. Big deal. From what I know of potential users of this feature, they would set it to at the very least 30min - that's WAY above the range for acceptable clock-drift on servers. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Time-Delayed Standbys
On 18 October 2013 19:03, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: The attached patch is a continuation of Robert's work [1]. Reviewing v2... I made some changes: - use of Latches instead of pg_usleep, so we don't have to wake up regularly. OK - call HandleStartupProcInterrupts() before CheckForStandbyTrigger() because it might change the trigger file's location OK - compute recoveryUntilDelayTime in XLOG_XACT_COMMIT and XLOG_XACT_COMMIT_COMPACT checks Why just those? Why not aborts and restore points also? - don't care about clock drift because it's an admin problem. A few minor points: * The code with the comment "Clear any previous recovery delay time" is in the wrong place; move it down or remove it completely. Setting the delay to zero doesn't prevent calling recoveryDelay(), so that logic looks wrong anyway. * The loop exit in recoveryDelay() is inelegant; it should break if <= 0. * There's a spelling mistake in the sample. * The patch has whitespace in one place. And one important point... * The delay loop happens AFTER we check for a pause. Which means that if the user notices a problem on a commit, then hits the pause button on the standby, the pause will have no effect and the next commit will be applied anyway. Maybe just one commit, but it's an off-by-one error that removes the benefit of the patch. So I think we need to test this again after we finish delaying: if (xlogctl->recoveryPause) recoveryPausesHere(); We need to explain in the docs that this is intended only for use in a live streaming deployment. It will have little or no meaning in a PITR. I think recovery_time_delay should be called something_apply_delay to highlight the point that it is the apply of records that is delayed, not the receipt. And hence the need to document that sync rep is NOT slowed down by setting this value.
And to make the name consistent with other parameters, I suggest min_standby_apply_delay. We also need to document caveats about the patch, in that it only delays on timestamped WAL records and other records may be applied sooner than the delay in some circumstances, so it is not a way to avoid all cancellations. We also need to document that the behaviour of the patch is to apply all data received as quickly as possible once triggered, so the specified delay does not slow down promoting the server to a master. That might also be seen as a negative behaviour, since promotion effectively sets recovery_time_delay to zero. I will handle the additional documentation, if you can update the patch with the main review comments. Thanks. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Time-Delayed Standbys
(2013/11/30 5:34), Fabrízio de Royes Mello wrote: On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: * Problem1 Your patch does not add recovery_time_delay to recovery.conf.sample. Please add it. Fixed. OK. That seems fine. * Problem2 When I set up a time-delayed standby and start the standby server, I cannot access the standby server with psql. This is because PG is in its first startup recovery, during which psql cannot connect. I think a time-delayed standby should only delay the recovery position; it must not affect other functionality. I didn't test recovery on the master server with recovery_time_delay. If you have detailed test results for these cases, please send them to me. Well, I could not reproduce the problem that you described. I ran the following test: 1) Clusters - build master - build slave and attach to the master using SR and config recovery_time_delay to 1min. 2) Stop the Slave 3) Run some transactions on the master using pgbench to generate a lot of archives 4) Start the slave, connect to it using psql, and in another session I can see all the archive recovery log Hmm... I thought I had misread your email, but it reproduces again. When I set a small recovery_time_delay (=3), it seemed to work correctly. However, when I set a long recovery_time_delay (=300), it didn't work. My reproduction log follows.

[mitsu-ko@localhost postgresql]$ bin/pgbench -T 30 -c 8 -j4 -p5432
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: simple
number of clients: 8
number of threads: 4
duration: 30 s
number of transactions actually processed: 68704
latency average: 3.493 ms
tps = 2289.196747 (including connections establishing)
tps = 2290.175129 (excluding connections establishing)
[mitsu-ko@localhost postgresql]$ vim slave/recovery.conf
[mitsu-ko@localhost postgresql]$ bin/pg_ctl -D slave start
server starting
[mitsu-ko@localhost postgresql]$ LOG: database system was shut down in recovery at 2013-12-03 10:26:41 JST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/5C4D8668
LOG: redo starts at 0/5C4000D8
[mitsu-ko@localhost postgresql]$ FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up
[mitsu-ko@localhost postgresql]$ bin/psql -p6543
psql: FATAL: the database system is starting up

I attached my postgresql.conf and recovery.conf. They should reproduce it. I think your patch needs recovery flags like ArchiveRecoveryRequested and InArchiveRecovery, etc., because the time-delayed standby only makes sense in a replication situation. And I hope it doesn't hurt standby server startup or archive recovery. Does that differ from your picture? I think this patch has a lot of potential; however, I think standby functionality is more important than this feature, and we might need to discuss what the best behavior is for this patch. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
(2013/12/04 4:00), Andres Freund wrote: On 2013-12-03 13:46:28 -0500, Robert Haas wrote: On Tue, Dec 3, 2013 at 12:36 PM, Fabrízio de Royes Mello fabriziome...@gmail.com wrote: On Tue, Dec 3, 2013 at 2:33 PM, Christian Kruse christ...@2ndquadrant.com wrote: Hi Fabrizio, looks good to me. I did some testing on 9.2.4, 9.2.5 and HEAD. It applies and compiles w/o errors or warnings. I set up a master and two hot standbys replicating from the master, one with 5 minutes delay and one without delay. After that I created a new database and generated some test data: CREATE TABLE test (val INTEGER); INSERT INTO test (val) (SELECT * FROM generate_series(0, 100)); The non-delayed standby nearly instantly had the data replicated, the delayed standby was replicated after exactly 5 minutes. I did not notice any problems, errors or warnings. Thanks for your review Christian... So, I proposed this patch previously and I still think it's a good idea, but it got voted down on the grounds that it didn't deal with clock drift. I view that as insufficient reason to reject the feature, but others disagreed. I really fail to see why clock drift should be this patch's responsibility. It's not like the world would go under^W data corruption would ensue if the clocks drift. Your standby would get delayed imprecisely. Big deal. From what I know of potential users of this feature, they would set it to at the very least 30min - that's WAY above the range for acceptable clock-drift on servers. Yes. I think the purpose of this patch is a long delay on the standby server, not precise, fine-grained delay timing. Regards, -- Mitsumasa KONDO NTT Open Source Software Center
Re: [HACKERS] Time-Delayed Standbys
On Fri, Nov 29, 2013 at 5:49 AM, KONDO Mitsumasa kondo.mitsum...@lab.ntt.co.jp wrote: Hi Royes, I'm sorry for my late review... No problem... I feel potential of your patch in PG replication function, and it might be something useful for all people. I check your patch and have some comment for improvement. I haven't executed detail of unexpected sutuation yet. But I think that under following problem2 is important functionality problem. So I ask you to solve the problem in first. * Regress test No problem. * Problem1 Your patch does not code recovery.conf.sample about recovery_time_delay. Please add it. Fixed. * Problem2 When I set time-delayed standby and start standby server, I cannot access stanby server by psql. It is because PG is in first starting recovery which cannot access by psql. I think that time-delayed standby is only delayed recovery position, it must not affect other functionality. I didn't test recoevery in master server with recovery_time_delay. If you have detail test result of these cases, please send me. Well, I could not reproduce the problem that you described. I run the following test: 1) Clusters - build master - build slave and attach to the master using SR and config recovery_time_delay to 1min. 2) Stop de Slave 3) Run some transactions on the master using pgbench to generate a lot of archives 4) Start the slave and connect to it using psql and in another session I can see all archive recovery log My first easy review of your patch is that all. Thanks. 
Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL Timbira: http://www.timbira.com.br Blog sobre TI: http://fabriziomello.blogspot.com Perfil Linkedin: http://br.linkedin.com/in/fabriziomello Twitter: http://twitter.com/fabriziomello

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index c0c543e..641c9c6 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -135,6 +135,27 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
      </listitem>
     </varlistentry>
+    <varlistentry id="recovery-time-delay" xreflabel="recovery_time_delay">
+     <term><varname>recovery_time_delay</varname> (<type>integer</type>)</term>
+     <indexterm>
+      <primary><varname>recovery_time_delay</> recovery parameter</primary>
+     </indexterm>
+     <listitem>
+      <para>
+       Specifies the amount of time (in milliseconds, if no unit is specified)
+       which recovery of transaction commits should lag the master. This
+       parameter allows creation of a time-delayed standby. For example, if
+       you set this parameter to <literal>5min</literal>, the standby will
+       replay each transaction commit only when the system time on the standby
+       is at least five minutes past the commit time reported by the master.
+      </para>
+      <para>
+       Note that if the master and standby system clocks are not synchronized,
+       this might lead to unexpected results.
+      </para>
+     </listitem>
+    </varlistentry>
+
    </variablelist>
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..97cc7af 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -123,6 +123,17 @@
 #
 #trigger_file = ''
 #
+# recovery_time_delay
+#
+# By default, a standby server keeps restoring XLOG records from the
+# primary as soon as possible. If you want to delay the replay of
+# committed transactions from the master, specify a recovery time delay.
+# For example, if you set this parameter to 5min, the standby will replay
+# each transaction commit only when the system time on the standby is at
+# least five minutes past the commit time reported by the master.
+#
+#recovery_time_delay = 0
+#
 #---
 # HOT STANDBY PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index de19d22..714b1bd 100755
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -218,6 +218,8 @@ static bool recoveryPauseAtTarget = true;
 static TransactionId recoveryTargetXid;
 static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
+static int recovery_time_delay = 0;
+static TimestampTz recoveryDelayUntilTime;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -730,6 +732,7 @@ static void readRecoveryCommandFile(void);
 static void exitArchiveRecovery(TimeLineID endTLI, XLogSegNo endLogSegNo);
 static bool recoveryStopsHere(XLogRecord *record, bool *includeThis);
 static void recoveryPausesHere(void);
+static void recoveryDelay(void);
 static void SetLatestXTime(TimestampTz xtime);
 static void SetCurrentChunkStartTime(TimestampTz
Re: [HACKERS] Time-Delayed Standbys
Hi Royes, I'm sorry for my late review... I see potential in your patch for PG's replication functionality, and it might be useful for many people. I checked your patch and have some comments for improvement. I haven't tested the details of unexpected situations yet, but I think problem 2 below is an important functionality problem, so I ask you to solve that problem first. * Regression test No problem. * Problem 1 Your patch does not update recovery.conf.sample for recovery_time_delay. Please add it. * Problem 2 When I set up a time-delayed standby and start the standby server, I cannot access the standby server by psql. This is because PG is still in the initial recovery phase, during which it cannot be accessed by psql. I think a time-delayed standby should only delay the recovery position; it must not affect other functionality. I didn't test recovery on the master server with recovery_time_delay. If you have detailed test results for these cases, please send them to me. That is all for my first quick review of your patch. Regards, -- Mitsumasa KONDO NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 6:25 PM, Robert Haas robertmh...@gmail.com wrote: I think the time problems are more complex than said. The patch relies upon transaction completion times, but not all WAL records have a time attached to them. Plus you only used commits anyway, not sure why. For the same reason we do that with the recovery_target_* code - replaying something like a heap insert or heap update doesn't change the user-visible state of the database, because the records aren't visible anyway until the commit record is replayed. Some actions aren't even transactional, such as DROP DATABASE, amongst Good point. We'd probably need to add a timestamp to the drop database record, as that's a case that people would likely want to defend against with this feature. others. Consecutive records can be hours apart, so it would be possible to delay on some WAL records but then replay records that happened minutes ago, then wait hours for the next apply. So this patch doesn't do what it claims in all cases. You misread my words above, neglecting the amongst others part. I don't believe you'll be able to do this just by relying on timestamps on WAL records because not all records carry timestamps and we're not going to add them just for this. It's easier to make this work usefully using pg_standby. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 2:56 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 29, 2011 at 9:54 PM, Josh Berkus j...@agliodbs.com wrote: I am not sure exactly how walreceiver handles it if the disk is full. I assume it craps out and eventually retries, so probably what will happen is that, after the standby's pg_xlog directory fills up, walreceiver will sit there and error out until replay advances enough to remove a WAL file and thus permit some more data to be streamed. Nope, it gets stuck and stops there. Replay doesn't advance unless you can somehow clear out some space manually; if the disk is full, the disk is full, and PostgreSQL doesn't remove WAL files without being able to write files first. Manual (or scripted) intervention is always necessary if you reach disk 100% full. Wow, that's a pretty crappy failure mode... but I don't think we need to fix it just on account of this patch. It would be nice to fix, of course. How is that different to running out of space in the main database? If I try to pour a pint of milk into a small cup, I don't blame the cup. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 7:11 PM, Robert Haas robertmh...@gmail.com wrote: I don't really see how that's any different from what happens now. If (for whatever reason) the master is generating WAL faster than a streaming standby can replay it, then the excess WAL is going to pile up someplace, and you might run out of disk space. Time-delaying the standby creates an additional way for that to happen, but I don't think it's an entirely new problem. The only way to control this is with a time delay that can be changed while the server is running. A recovery.conf parameter doesn't allow that, so another way is preferable. I think the time problems are more complex than said. The patch relies upon transaction completion times, but not all WAL records have a time attached to them. Plus you only used commits anyway, not sure why. Some actions aren't even transactional, such as DROP DATABASE, amongst others. Consecutive records can be hours apart, so it would be possible to delay on some WAL records but then replay records that happened minutes ago, then wait hours for the next apply. So this patch doesn't do what it claims in all cases. Similar discussion on max_standby_delay covered exactly that ground and went on for weeks in 9.0. IIRC I presented the same case you just did and we agreed in the end that was not acceptable. I'm not going to repeat it. Please check the archives. So, again +1 for the feature, but -1 for the currently proposed implementation, based upon review. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On 6/30/11 2:00 AM, Simon Riggs wrote: Manual (or scripted) intervention is always necessary if you reach disk 100% full. Wow, that's a pretty crappy failure mode... but I don't think we need to fix it just on account of this patch. It would be nice to fix, of course. How is that different to running out of space in the main database? If I try to pour a pint of milk into a small cup, I don't blame the cup. I have to agree with Simon here. ;-) We can do some things to make this easier for administrators, but there's no way to solve the problem. And the things we could do would have to be advanced optional modes which aren't on by default, so they wouldn't really help the DBA with poor planning skills. Here are my suggestions: 1) Have a utility (pg_archivecleanup?) which checks if we have more than a specific setting's worth of archive logs, and breaks replication and deletes the archive logs if we hit that number. This would also require some way for the standby to stop replicating *without* becoming a standalone server, which I don't think we currently have. 2) Have a setting where, regardless of standby_delay settings, the standby will interrupt any running queries and start applying logs as fast as possible if it hits a certain number of unapplied archive logs. Of course, given the issues we had with standby_delay, I'm not sure I want to complicate it further. I think we've already fixed the biggest issue in 9.1, since we now have a limit on the number of WALs the master will keep if archiving is failing ... yes? That's the only big *avoidable* failure mode we have, where a failing standby effectively shuts down the master. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 1:00 PM, Josh Berkus j...@agliodbs.com wrote: [...] I think we've already fixed the biggest issue in 9.1, since we now have a limit on the number of WALs the master will keep if archiving is failing ... yes? That's the only big *avoidable* failure mode we have, where a failing standby effectively shuts down the master. I'm not sure we changed anything in this area for 9.1. Am I wrong? wal_keep_segments was present in 9.0. Using that instead of archiving is a reasonable way to bound the amount of disk space that can get used, at the cost of possibly needing to rebuild the standby if things get too far behind. Of course, in any version, you could also use an archive_command that will remove old files to make space if the disk is full, with the same downside: if the standby isn't done with those files, you're now in for a rebuild. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 6:45 AM, Simon Riggs si...@2ndquadrant.com wrote: The only way to control this is with a time delay that can be changed while the server is running. A recovery.conf parameter doesn't allow that, so another way is preferable. True. We've talked about making the recovery.conf parameters into GUCs, which would address that concern (and some others). I think the time problems are more complex than said. The patch relies upon transaction completion times, but not all WAL records have a time attached to them. Plus you only used commits anyway, not sure why. For the same reason we do that with the recovery_target_* code - replaying something like a heap insert or heap update doesn't change the user-visible state of the database, because the records aren't visible anyway until the commit record is replayed. Some actions aren't even transactional, such as DROP DATABASE, amongst Good point. We'd probably need to add a timestamp to the drop database record, as that's a case that people would likely want to defend against with this feature. others. Consecutive records can be hours apart, so it would be possible to delay on some WAL records but then replay records that happened minutes ago, then wait hours for the next apply. So this patch doesn't do what it claims in all cases. Similar discussion on max_standby_delay covered exactly that ground and went on for weeks in 9.0. IIRC I presented the same case you just did and we agreed in the end that was not acceptable. I'm not going to repeat it. Please check the archives. I think this case is a bit different. First, max_standby_delay is relevant for any installation using Hot Standby, whereas this is a feature that specifically involves time. Saying that you have to have time synchronization for Hot Standby to work as designed is more of a burden than saying you need time synchronization *if you want to use the time-delayed recovery feature*. 
Second, and maybe more importantly, no one has come up with an idea for how to make this work reliably in the presence of time skew. Perhaps we could provide a simple time-skew correction feature that would work in the streaming case (though probably not nearly as well as running ntpd), but as I understand your argument, you're saying that most people will want to use this with archiving. I don't see how to make that work without time synchronization. In the max_standby_delay case, the requirement is that queries not get cancelled too aggressively while at the same time not letting the standby get too far behind the master, which leaves some flexibility in terms of how we actually make that trade-off, and we eventually found a way that didn't require time synchronization, which was an improvement. But for a time-delayed standby, the requirement at least AIUI is that the state of the standby lag the master by a certain time interval, and I don't see any way to do that without comparing slave timestamps with master timestamps. If we can find a similar clever trick here, great! But I'm not seeing how to do it. Now, another option here is to give up on the idea of a time-delayed standby altogether and instead allow the standby to lag the master by a certain number of WAL segments or XIDs. Of course, if we do that, then we will not have a feature called time-delayed standbys. Instead, we will have a feature called standbys delayed by a certain number of WAL segments (or XIDs). That certainly caters to some of the same use cases, but I think it is severely lacking in the usability department. I bet the first thing most people will do is to try to figure out how to translate between those metrics and time, and I bet we'll get complaints on systems where the activity load is variable and therefore the time lag for a fixed WAL-segment lag or XID-lag is unpredictable.
So I think keeping it defined in terms of time is the right way forward, even though the need for external time synchronization is, certainly, not ideal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On 6/30/11 10:25 AM, Robert Haas wrote: So I think keeping it defined in terms of time is the right way forward, even though the need for external time synchronization is, certainly, not ideal. Actually, when we last had the argument about time synchronization, Kevin Grittner (I believe) pointed out that unsynchronized replication servers have an assortment of other issues ... like any read query involving now(). As the person who originally brought up this hurdle, I felt that his argument defeated mine. Certainly I can't see any logical way to have time delay in the absence of clock synchronization of some kind. Also, I kinda feel like this discussion seems aimed at overcomplicating a feature which only a small fraction of our users will ever use. Let's keep it as simple as possible. As for delay on streaming replication, I'm for it. I think that post-9.1, thanks to pg_basebackup, the number of our users who are doing archive log shipping is going to drop tremendously. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] time-delayed standbys
Josh Berkus j...@agliodbs.com wrote: when we last had the argument about time synchronization, Kevin Grittner (I believe) pointed out that unsynchronized replication servers have an assortment of other issues ... like any read query involving now(). I don't remember making that point, although I think it's a valid one. What I'm sure I pointed out is that we have one central router which synchronizes to a whole bunch of atomic clocks around the world using the normal discard-the-outliers-and-average-the-rest algorithm, and then *every single server and workstation on our network synchronizes to that router*. Our database servers are all running on Linux using ntpd. Our monitoring spams us with email if any of the clocks falls outside nominal bounds. (It's been many years since we had a misconfigured server which triggered that.) I think doing anything in PostgreSQL around this beyond allowing DBAs to trust their server clocks is insane. The arguments for using and trusting ntpd are pretty much identical to the arguments for using and trusting the OS file systems. -Kevin
Re: [HACKERS] time-delayed standbys
Kevin, I think doing anything in PostgreSQL around this beyond allowing DBAs to trust their server clocks is insane. The arguments for using and trusting ntpd are pretty much identical to the arguments for using and trusting the OS file systems. Oh, you don't want to implement our own NTP? Coward! ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 1:51 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: I think doing anything in PostgreSQL around this beyond allowing DBAs to trust their server clocks is insane. The arguments for using and trusting ntpd are pretty much identical to the arguments for using and trusting the OS file systems. Except that implementing our own file system would likely have more benefit and be less work than implementing our own time synchronization, at least if we want it to be reliable. Again, I am not trying to pretend that this is any great shakes. MySQL's version of this feature apparently does somehow compensate for time skew, which I assume must mean that their replication works differently than ours - inter alia, it probably requires a TCP socket connection between the servers. Since we don't require that, it limits our options in this area, but also gives us more options in other areas. Still, if I could think of a way to do this that didn't depend on time synchronization, then I'd be in favor of eliminating that requirement. I just can't; and I'm inclined to think it isn't possible. I wouldn't be opposed to having an option to try to detect time skew between the master and the slave and, say, display that information in pg_stat_replication. It might be useful to have that data for monitoring purposes, and it probably wouldn't even be that much code. However, I'd be a bit hesitant to use that data to correct the amount of time we spend waiting for time-delayed replication, because it would doubtless be extremely imprecise compared to real time synchronization, and considerably more error-prone. IOW, what you said. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On Fri, Jul 1, 2011 at 2:25 AM, Robert Haas robertmh...@gmail.com wrote: Some actions aren't even transactional, such as DROP DATABASE, amongst Good point. We'd probably need to add a timestamp to the drop database record, as that's a case that people would likely want to defend against with this feature. This means that the recovery_target_* code would also need to deal with the DROP DATABASE case. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
On Fri, Jul 1, 2011 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote: [...] I wouldn't be opposed to having an option to try to detect time skew between the master and the slave and, say, display that information in pg_stat_replication. [...] IOW, what you said. I agree with Robert. It's difficult to implement time-synchronization feature which can deal with all the cases, and I'm not sure if that's really worth taking our time. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
Fujii Masao masao.fu...@gmail.com writes: On Fri, Jul 1, 2011 at 2:25 AM, Robert Haas robertmh...@gmail.com wrote: Some actions aren't even transactional, such as DROP DATABASE, amongst Good point. We'd probably need to add a timestamp to the drop database record, as that's a case that people would likely want to defend against with this feature. This means that recovery_target_* code would also need to deal with DROP DATABASE case. there is no problem if you use restore point names... but of course you lose flexibility (ie: you can't restore to 5 minutes before now) mmm... a lazy idea: can't we just create a restore point WAL record *before* we actually drop the database? then we won't need to modify the logic about recovery_target_* (if it is only DROP DATABASE, maybe that's enough rather than complicating the code) and we can provide that protection since 9.1 -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL Soporte 24x7, desarrollo, capacitación y servicios
Re: [HACKERS] time-delayed standbys
On Wed, Jun 15, 2011 at 6:58 AM, Fujii Masao masao.fu...@gmail.com wrote: Or, should we implement a new promote mode which finishes recovery as soon as promotion is requested (i.e., one that does not replay all the available WAL records)? That's not a new feature. We had it in 8.4, but it was removed. Originally, we supported fast failover via a trigger file. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On Thu, Jun 16, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: When the replication connection is terminated, the standby tries to read WAL files from the archive. In this case, there is no walreceiver process, so how does the standby calculate the clock difference? Good question. Also, just because we have streaming replication available doesn't mean that we should force people to use it. It's still perfectly legit to set up a standby that only use archive_command and restore_command, and it would be nice if this feature could still work in such an environment. I anticipate that most people want to use streaming replication, but a time-delayed standby is a good example of a case where you might decide you don't need it. It could be useful to have all the WAL present (but not yet applied) if you're thinking you might want to promote that standby - but my guess is that in many cases, the time-delayed standby will be *in addition* to one or more regular standbys that would be the primary promotion candidates. So I can see someone deciding that they'd rather not have the load of another walsender on the master, and just let the time-delayed standby read from the archive. Even if that were not an issue, I'm still more or less of the opinion that trying to solve the time synchronization problem is a rathole anyway. To really solve this problem well, you're going to need the standby to send a message containing a timestamp, get a reply back from the master that contains that timestamp and a master timestamp, and then compute based on those two timestamps plus the reply timestamp the maximum and minimum possible lag between the two machines. 
Then you're going to need to guess, based on several cycles of this activity, what the actual lag is, and adjust it over time (but not too quickly, unless of course a large manual step has occurred) as the clocks potentially drift apart from each other. This is basically what ntpd does, except that it can be virtually guaranteed that our implementation will suck by comparison. Time synchronization is neither easy nor our core competency, and I think trying to include it in this feature is going to result in a net loss of reliability. This begs the question of why we need this feature at all, in the way proposed. Streaming replication is designed for immediate transfer of WAL. File-based is more about storing them for some later use. It seems strange to pollute the *immediate* transfer route with a delay, when that is easily possible with a small patch to pg_standby that can wait until the filetime delay is X before returning. The main practical problem with this is that most people's WAL partitions aren't big enough to store the delayed WAL files, which is why we provide the file archiving route anyway. So in practical terms this will be unusable, or at least dangerous to use. +1 for the feature concept, but -1 for adding this to streaming replication. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 4:00 AM, Simon Riggs si...@2ndquadrant.com wrote: [...] +1 for the feature concept, but -1 for adding this to streaming replication. As implemented, the feature will work with either streaming replication or with file-based replication. I don't see any value in restricting it to work ONLY with file-based replication. Also, if we were to do it by making pg_standby wait, then the whole thing would be much less accurate, and the delay would become much harder to predict, because you'd be operating on the level of entire WAL segments, rather than individual commit records. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 1:24 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 29, 2011 at 4:00 AM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, Jun 16, 2011 at 7:29 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: When the replication connection is terminated, the standby tries to read WAL files from the archive. In this case, there is no walreceiver process, so how does the standby calculate the clock difference? Good question. Also, just because we have streaming replication available doesn't mean that we should force people to use it. It's still perfectly legit to set up a standby that only use archive_command and restore_command, and it would be nice if this feature could still work in such an environment. I anticipate that most people want to use streaming replication, but a time-delayed standby is a good example of a case where you might decide you don't need it. It could be useful to have all the WAL present (but not yet applied) if you're thinking you might want to promote that standby - but my guess is that in many cases, the time-delayed standby will be *in addition* to one or more regular standbys that would be the primary promotion candidates. So I can see someone deciding that they'd rather not have the load of another walsender on the master, and just let the time-delayed standby read from the archive. Even if that were not an issue, I'm still more or less of the opinion that trying to solve the time synchronization problem is a rathole anyway. To really solve this problem well, you're going to need the standby to send a message containing a timestamp, get a reply back from the master that contains that timestamp and a master timestamp, and then compute based on those two timestamps plus the reply timestamp the maximum and minimum possible lag between the two machines. 
Then you're going to need to guess, based on several cycles of this activity, what the actual lag is, and adjust it over time (but not too quickly, unless of course a large manual step has occurred) as the clocks potentially drift apart from each other. This is basically what ntpd does, except that it can be virtually guaranteed that our implementation will suck by comparison. Time synchronization is neither easy nor our core competency, and I think trying to include it in this feature is going to result in a net loss of reliability. This raises the question of why we need this feature at all, in the way proposed. Streaming replication is designed for immediate transfer of WAL. File based is more about storing them for some later use. It seems strange to pollute the *immediate* transfer route with a delay, when that is easily possible with a small patch to pg_standby that can wait until the filetime delay is X before returning. The main practical problem with this is that most people's WAL partitions aren't big enough to store the delayed WAL files, which is why we provide the file archiving route anyway. So in practical terms this will be unusable, or at least dangerous to use. +1 for the feature concept, but -1 for adding this to streaming replication. As implemented, the feature will work with either streaming replication or with file-based replication. That sounds like the exact opposite of yours and Fujii's comments above. Please explain. I don't see any value in restricting it to work ONLY with file-based replication. As explained above, it won't work in practice because of the amount of file space required. Or, an alternative question: what will you do when it waits so long that the standby runs out of disk space? If you hard-enforce the time delay specified then you just make replication fail during heavy loads. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 1:50 PM, Simon Riggs si...@2ndquadrant.com wrote: As implemented, the feature will work with either streaming replication or with file-based replication. That sounds like the exact opposite of yours and Fujii's comments above. Please explain. I think our comments above were addressing the issue of whether it's feasible to correct for time skew between the master and the slave. Tom was arguing that we should try, but I was arguing that any system we put together is likely to be pretty unreliable (since good time synchronization algorithms are quite complex, and to my knowledge no one here is an expert on implementing them, nor do I think we want that much complexity in the backend) and Fujii was pointing out that it won't work at all if the WAL files are going through the archive rather than through streaming replication, which (if I understand you correctly) will be a more common case than I had assumed. I don't see any value in restricting to work ONLY with file-based replication. As explained above, it won't work in practice because of the amount of file space required. I guess it depends on how busy your system is and how much disk space you have. If using streaming replication causes pg_xlog to fill up on your standby, then you can either (1) put pg_xlog on a larger file system or (2) configure only restore_command and not primary_conninfo, so that only the archive is used. Or, an alternative question: what will you do when it waits so long that the standby runs out of disk space? I don't really see how that's any different from what happens now. If (for whatever reason) the master is generating WAL faster than a streaming standby can replay it, then the excess WAL is going to pile up someplace, and you might run out of disk space. Time-delaying the standby creates an additional way for that to happen, but I don't think it's an entirely new problem. I am not sure exactly how walreceiver handles it if the disk is full. 
I assume it craps out and eventually retries, so probably what will happen is that, after the standby's pg_xlog directory fills up, walreceiver will sit there and error out until replay advances enough to remove a WAL file and thus permit some more data to be streamed. If the standby gets far enough behind the master that the required files are no longer there, then it will switch to the archive, if available. It might be nice to have a mode that only allows streaming replication when the amount of disk space on the standby is greater than or equal to some threshold, but that seems like a topic for another patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
Robert, I don't really see how that's any different from what happens now. If (for whatever reason) the master is generating WAL faster than a streaming standby can replay it, then the excess WAL is going to pile up someplace, and you might run out of disk space. Time-delaying the standby creates an additional way for that to happen, but I don't think it's an entirely new problem. Not remotely new. xlog partition full is currently 75% of the emergency support calls PGX gets from clients on 9.0 (if only they'd pay attention to their nagios alerts!) I am not sure exactly how walreceiver handles it if the disk is full. I assume it craps out and eventually retries, so probably what will happen is that, after the standby's pg_xlog directory fills up, walreceiver will sit there and error out until replay advances enough to remove a WAL file and thus permit some more data to be streamed. Nope, it gets stuck and stops there. Replay doesn't advance unless you can somehow clear out some space manually; if the disk is full, the disk is full, and PostgreSQL doesn't remove WAL files without being able to write files first. Manual (or scripted) intervention is always necessary if you reach disk 100% full. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] time-delayed standbys
On 6/29/11 11:11 AM, Robert Haas wrote: If the standby gets far enough behind the master that the required files are no longer there, then it will switch to the archive, if available. One more thing: As I understand it (and my testing shows this), the standby *prefers* the archive logs, and won't switch to streaming until it reaches the end of the archive logs. This is desirable behavior, as it minimizes the load on the master. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 9:54 PM, Josh Berkus j...@agliodbs.com wrote: I am not sure exactly how walreceiver handles it if the disk is full. I assume it craps out and eventually retries, so probably what will happen is that, after the standby's pg_xlog directory fills up, walreceiver will sit there and error out until replay advances enough to remove a WAL file and thus permit some more data to be streamed. Nope, it gets stuck and stops there. Replay doesn't advance unless you can somehow clear out some space manually; if the disk is full, the disk is full, and PostgreSQL doesn't remove WAL files without being able to write files first. Manual (or scripted) intervention is always necessary if you reach disk 100% full. Wow, that's a pretty crappy failure mode... but I don't think we need to fix it just on account of this patch. It would be nice to fix, of course. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On Wed, Jun 29, 2011 at 11:14 AM, Robert Haas robertmh...@gmail.com wrote: On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: After we run pg_ctl promote, time-delayed replication should be disabled? Otherwise, failover might take very long time when we set recovery_time_delay to high value. PFA a patch that I believe will disable recovery_time_delay after promotion. The only change from the previous version is:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dbf792..41b3ae9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5869,7 +5869,7 @@ pg_is_xlog_replay_paused(PG_FUNCTION_ARGS)
 static void
 recoveryDelay(void)
 {
-	while (1)
+	while (!CheckForStandbyTrigger())
 	{
 		long		secs;
 		int			microsecs;

Thanks for updating the patch! I have a few comments: ISTM recoveryDelayUntilTime needs to be calculated also when replaying the commit *compact* WAL record (i.e., record_info == XLOG_XACT_COMMIT_COMPACT). When the user uses only two-phase commit on the master, ISTM he or she cannot use this feature, because recoveryDelayUntilTime is never set in that case. Is this intentional? Should we disable this feature also after recovery reaches the stop point (specified in recovery_target_xxx)? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 10:56 AM, Robert Haas robertmh...@gmail.com wrote: Nope, it gets stuck and stops there. Replay doesn't advance unless you can somehow clear out some space manually; if the disk is full, the disk is full, and PostgreSQL doesn't remove WAL files without being able to write files first. Manual (or scripted) intervention is always necessary if you reach disk 100% full. Wow, that's a pretty crappy failure mode... but I don't think we need to fix it just on account of this patch. It would be nice to fix, of course. Yeah, we need to fix that as a separate patch. The difficult point is that we cannot delete WAL files until we replay the checkpoint record and a restartpoint occurs. But, if the disk is full, there would be no space to receive the checkpoint record, so we cannot receive any more WAL files. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
On Thu, Jun 30, 2011 at 12:14 PM, Fujii Masao masao.fu...@gmail.com wrote: Should we disable this feature also after recovery reaches the stop point (specified in recovery_target_xxx)? Another comment: it would be very helpful to document the behavior of a delayed standby when promoting or after reaching the stop point. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: After we run pg_ctl promote, time-delayed replication should be disabled? Otherwise, failover might take very long time when we set recovery_time_delay to high value. PFA a patch that I believe will disable recovery_time_delay after promotion. The only change from the previous version is:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1dbf792..41b3ae9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -5869,7 +5869,7 @@ pg_is_xlog_replay_paused(PG_FUNCTION_ARGS)
 static void
 recoveryDelay(void)
 {
-	while (1)
+	while (!CheckForStandbyTrigger())
 	{
 		long		secs;
 		int			microsecs;

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company time-delayed-standby-v2.patch Description: Binary data
Re: [HACKERS] time-delayed standbys
On Fri, Jun 17, 2011 at 11:34 AM, Robert Haas robertmh...@gmail.com wrote: On Thu, Jun 16, 2011 at 10:10 PM, Fujii Masao masao.fu...@gmail.com wrote: According to the above page, one purpose of time-delayed replication is to protect against user mistakes on master. But, when an user notices his wrong operation on master, what should he do next? The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart doesn't cancel that wrong operation. Instead, probably he should shutdown the standby, investigate the timestamp of XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. Something like this procedures should be documented? Or, we should implement new promote mode which finishes a recovery as soon as promote is requested (i.e., not replay all the available WAL records)? I like the idea of a new promote mode; Are you going to implement that mode in this CF? or next one? I wasn't really planning on it - I thought you might want to take a crack at it. The feature is usable without that, just maybe a bit less cool. Right. Certainly, it's too late for any more formal submissions to this CF, but I wouldn't mind reviewing a patch if you want to write one. Okay, I add that into my TODO list. But I might not have enough time to develop that. So, everyone, please feel free to implement that if you want! Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao masao.fu...@gmail.com wrote: When the replication connection is terminated, the standby tries to read WAL files from the archive. In this case, there is no walreceiver process, so how does the standby calculate the clock difference? Good question. Also, just because we have streaming replication available doesn't mean that we should force people to use it. It's still perfectly legit to set up a standby that only use archive_command and restore_command, and it would be nice if this feature could still work in such an environment. I anticipate that most people want to use streaming replication, but a time-delayed standby is a good example of a case where you might decide you don't need it. It could be useful to have all the WAL present (but not yet applied) if you're thinking you might want to promote that standby - but my guess is that in many cases, the time-delayed standby will be *in addition* to one or more regular standbys that would be the primary promotion candidates. So I can see someone deciding that they'd rather not have the load of another walsender on the master, and just let the time-delayed standby read from the archive. Even if that were not an issue, I'm still more or less of the opinion that trying to solve the time synchronization problem is a rathole anyway. To really solve this problem well, you're going to need the standby to send a message containing a timestamp, get a reply back from the master that contains that timestamp and a master timestamp, and then compute based on those two timestamps plus the reply timestamp the maximum and minimum possible lag between the two machines. Then you're going to need to guess, based on several cycles of this activity, what the actual lag is, and adjust it over time (but not too quckly, unless of course a large manual step has occurred) as the clocks potentially drift apart from each other. 
This is basically what ntpd does, except that it can be virtually guaranteed that our implementation will suck by comparison. Time synchronization is neither easy nor our core competency, and I think trying to include it in this feature is going to result in a net loss of reliability. errmsg(parameter \%s\ requires a temporal value, recovery_time_delay), We should s/a temporal/an Integer? It seems strange to ask for an integer when what we want is an amount of time in seconds or minutes... After we run pg_ctl promote, time-delayed replication should be disabled? Otherwise, failover might take very long time when we set recovery_time_delay to high value. Yeah, I think so. http://forge.mysql.com/worklog/task.php?id=344 According to the above page, one purpose of time-delayed replication is to protect against user mistakes on master. But, when an user notices his wrong operation on master, what should he do next? The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart doesn't cancel that wrong operation. Instead, probably he should shutdown the standby, investigate the timestamp of XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. Something like this procedures should be documented? Or, we should implement new promote mode which finishes a recovery as soon as promote is requested (i.e., not replay all the available WAL records)? I like the idea of a new promote mode; and documenting the other approach you mention doesn't sound bad either. Either one sounds like a job for a separate patch, though. The other option is to pause recovery and run pg_dump... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Fri, Jun 17, 2011 at 3:29 AM, Robert Haas robertmh...@gmail.com wrote: Even if that were not an issue, I'm still more or less of the opinion that trying to solve the time synchronization problem is a rathole anyway. To really solve this problem well, you're going to need the standby to send a message containing a timestamp, get a reply back from the master that contains that timestamp and a master timestamp, and then compute based on those two timestamps plus the reply timestamp the maximum and minimum possible lag between the two machines. Then you're going to need to guess, based on several cycles of this activity, what the actual lag is, and adjust it over time (but not too quckly, unless of course a large manual step has occurred) as the clocks potentially drift apart from each other. This is basically what ntpd does, except that it can be virtually guaranteed that our implementation will suck by comparison. Time synchronization is neither easy nor our core competency, and I think trying to include it in this feature is going to result in a net loss of reliability. Agreed. You've already added the note about time synchronization into the document. That's enough, I think. errmsg(parameter \%s\ requires a temporal value, recovery_time_delay), We should s/a temporal/an Integer? It seems strange to ask for an integer when what we want is an amount of time in seconds or minutes... OK. http://forge.mysql.com/worklog/task.php?id=344 According to the above page, one purpose of time-delayed replication is to protect against user mistakes on master. But, when an user notices his wrong operation on master, what should he do next? The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart doesn't cancel that wrong operation. Instead, probably he should shutdown the standby, investigate the timestamp of XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. 
Something like this procedures should be documented? Or, we should implement new promote mode which finishes a recovery as soon as promote is requested (i.e., not replay all the available WAL records)? I like the idea of a new promote mode; Are you going to implement that mode in this CF? or next one? and documenting the other approach you mention doesn't sound bad either. Either one sounds like a job for a separate patch, though. The other option is to pause recovery and run pg_dump... Yes, please. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Thu, Jun 16, 2011 at 10:10 PM, Fujii Masao masao.fu...@gmail.com wrote: According to the above page, one purpose of time-delayed replication is to protect against user mistakes on master. But, when an user notices his wrong operation on master, what should he do next? The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart doesn't cancel that wrong operation. Instead, probably he should shutdown the standby, investigate the timestamp of XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. Something like this procedures should be documented? Or, we should implement new promote mode which finishes a recovery as soon as promote is requested (i.e., not replay all the available WAL records)? I like the idea of a new promote mode; Are you going to implement that mode in this CF? or next one? I wasn't really planning on it - I thought you might want to take a crack at it. The feature is usable without that, just maybe a bit less cool. Certainly, it's too late for any more formal submissions to this CF, but I wouldn't mind reviewing a patch if you want to write one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Wed, Jun 15, 2011 at 12:58 AM, Fujii Masao masao.fu...@gmail.com wrote: http://forge.mysql.com/worklog/task.php?id=344 According to the above page, one purpose of time-delayed replication is to protect against user mistakes on master. But, when an user notices his wrong operation on master, what should he do next? The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart doesn't cancel that wrong operation. Instead, probably he should shutdown the standby, investigate the timestamp of XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. Something like this procedures should be documented? Or, we should implement new promote mode which finishes a recovery as soon as promote is requested (i.e., not replay all the available WAL records)? i would prefer something like pg_ctl promote -m immediate that terminates the recovery -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Thu, Apr 21, 2011 at 12:18 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Apr 20, 2011 at 11:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I am a bit concerned about the reliability of this approach. If there is some network lag, or some lag in processing from the master, we could easily get the idea that there is time skew between the machines when there really isn't. And our perception of the time skew could easily bounce around from message to message, as the lag varies. I think it would be tremendously ironic if the two machines were actually synchronized to the microsecond, but by trying to be clever about it we managed to make the lag-time accurate only to within several seconds. Well, if walreceiver concludes that there is no more than a few seconds' difference between the clocks, it'd probably be OK to take the master timestamps at face value. The problem comes when the skew gets large (compared to the configured time delay, I guess). I suppose. Any bound on how much lag there can be before we start applying the skew correction is going to be fairly arbitrary. When the replication connection is terminated, the standby tries to read WAL files from the archive. In this case, there is no walreceiver process, so how does the standby calculate the clock difference? errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"), Should we s/a temporal/an integer/? After we run pg_ctl promote, time-delayed replication should be disabled? Otherwise, failover might take a very long time when we set recovery_time_delay to a high value. http://forge.mysql.com/worklog/task.php?id=344 According to the above page, one purpose of time-delayed replication is to protect against user mistakes on the master. But, when a user notices his wrong operation on the master, what should he do next? 
The WAL records of his wrong operation might have already arrived at the standby, so neither promote nor restart cancels that wrong operation. Instead, probably he should shut down the standby, investigate the timestamp of the XID of the operation he'd like to cancel, set recovery_target_time and restart the standby. Should something like this procedure be documented? Or, should we implement a new promote mode which finishes a recovery as soon as promote is requested (i.e., does not replay all the available WAL records)? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
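The manual procedure Fujii outlines (shut the standby down, find the timestamp just before the mistake, set a recovery target, restart) would correspond to a recovery.conf along these lines; the archive path and the timestamp are placeholders, not values from this thread:

```
# recovery.conf on the delayed standby (9.x-era syntax)
restore_command = 'cp /path/to/archive/%f "%p"'
# stop replay just before the mistaken operation's commit
recovery_target_time = '2011-06-15 09:30:00'
recovery_target_inclusive = 'false'
```

Replay then halts at the target instead of consuming all available WAL, which is the effect the proposed "immediate promote" mode would achieve without a restart.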
Re: [HACKERS] time-delayed standbys
On 07.05.2011 16:48, Robert Haas wrote: I was able to reproduce something very like this in unpatched master, just by letting recovery pause at a named restore point, and then resuming it. LOG: recovery stopping at restore point stop, time 2011-05-07 09:28:01.652958-04 LOG: recovery has paused HINT: Execute pg_xlog_replay_resume() to continue. (at this point I did pg_xlog_replay_resume()) LOG: redo done at 0/520 PANIC: wal receiver still active LOG: startup process (PID 38762) was terminated by signal 6: Abort trap LOG: terminating any other active server processes I'm thinking that this code is wrong:

if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
{
	SetRecoveryPause(true);
	recoveryPausesHere();
}
reachedStopPoint = true;	/* see below */
recoveryContinue = false;

I think that recoveryContinue = false assignment should not happen if we decide to pause. That is, we should say if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY) { same as now } else recoveryContinue = false. No, recovery stops at that point whether or not you pause. Resuming after stopping at the recovery target doesn't mean that you resume recovery, it means that you resume to end recovery and start up the server (see the 2nd to last paragraph at http://www.postgresql.org/docs/9.1/static/recovery-target-settings.html). It would probably be more useful to allow a new stopping target to be set and continue recovery, but the current pause/resume functions don't allow that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] time-delayed standbys
On 11.05.2011 08:29, Fujii Masao wrote: On Sat, May 7, 2011 at 10:48 PM, Robert Haasrobertmh...@gmail.com wrote: I was able to reproduce something very like this in unpatched master, just by letting recovery pause at a named restore point, and then resuming it. I was able to reproduce the same problem even in 9.0. When the standby reaches the recovery target, it always tries to end the recovery even though walreceiver is still running, which causes the problem. This seems to be an oversight in streaming replication. I should have considered how the standby should work when recovery_target is specified. What about the attached patch? Which stops walreceiver instead of emitting PANIC there only if we've reached the recovery target. I think we can just always call ShutdownWalRcv(). It should be gone if the server was promoted while streaming, but that's just an implementation detail of what the promotion code does. There's no hard reason why it shouldn't be running at that point anymore, as long as we kill it before going any further. Committed a patch to do that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] time-delayed standbys
On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I think we can just always call ShutdownWalRcv(). It should be gone if the server was promoted while streaming, but that's just an implementation detail of what the promotion code does. There's no hard reason why it shouldn't be running at that point anymore, as long as we kill it before going any further. Okay. But I'd like to add the following assertion check just before the ShutdownWalRcv() which you added, in order to detect such a bug as we found this time, i.e., a bug which causes unexpected end of recovery. Thoughts? Assert(reachedStopPoint || !WalRcvInProgress()) Committed a patch to do that. Thanks. Should we backport it to 9.0? 9.0 has the same problem. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] time-delayed standbys
On 11.05.2011 14:16, Fujii Masao wrote:
> On Wed, May 11, 2011 at 6:50 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
>> I think we can just always call ShutdownWalRcv(). It should be gone if the server was promoted while streaming, but that's just an implementation detail of what the promotion code does. There's no hard reason why it shouldn't be running at that point anymore, as long as we kill it before going any further.
>
> Okay. But I'd like to add the following assertion check just before the ShutdownWalRcv() call you added, in order to detect the kind of bug we found this time, i.e., a bug that causes an unexpected end of recovery. Thoughts?
>
>     Assert(reachedStopPoint || !WalRcvInProgress())

There's no unexpected end of recovery here. The recovery ends when we reach the target, as it should. It was the assumption that the WAL receiver can't be running at that point anymore that was wrong. That assertion would work, AFAICS, but I don't think it's something we need to assert. There isn't any harm done if the WAL receiver is still running, as long as we shut it down at that point.

>> Committed a patch to do that.
>
> Thanks. Should we backport it to 9.0? 9.0 has the same problem.

Ah, thanks, missed that. Cherry-picked to 9.0 now as well.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] time-delayed standbys
On Sat, May 7, 2011 at 10:48 PM, Robert Haas robertmh...@gmail.com wrote:
> I was able to reproduce something very like this in unpatched master, just by letting recovery pause at a named restore point, and then resuming it.

I was able to reproduce the same problem even in 9.0. When the standby reaches the recovery target, it always tries to end the recovery even though walreceiver is still running, which causes the problem. This seems to be an oversight in streaming replication. I should have considered how the standby should work when recovery_target is specified.

What about the attached patch? It stops walreceiver, instead of emitting a PANIC there, only if we've reached the recovery target.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

recovery_target_v1.patch
Description: Binary data
Re: [HACKERS] time-delayed standbys
On Sat, Apr 23, 2011 at 9:46 PM, Jaime Casanova ja...@2ndquadrant.com wrote:
> On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas robertmh...@gmail.com wrote:
>> That is, a standby configured such that replay lags a prescribed amount of time behind the master. This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.
>
> This crashes when stopping recovery to a target (I tried with a named restore point and with a point in time) after executing pg_xlog_replay_resume(). Here is the backtrace. I will try to check later but I wanted to report it before...
>
> #0  0xb537 in raise () from /lib/libc.so.6
> #1  0xb777a922 in abort () from /lib/libc.so.6
> #2  0x08393a19 in errfinish (dummy=0) at elog.c:513
> #3  0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver still active") at elog.c:1156
> #4  0x080f04cb in StartupXLOG () at xlog.c:6691
> #5  0x080f2825 in StartupProcessMain () at xlog.c:10050
> #6  0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at bootstrap.c:417
> #7  0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
> #8  0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
> #9  0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199

Sorry for the slow response on this - I was on vacation for a week and my schedule got a big hole in it.

I was able to reproduce something very like this in unpatched master, just by letting recovery pause at a named restore point, and then resuming it.

LOG:  recovery stopping at restore point "stop", time 2011-05-07 09:28:01.652958-04
LOG:  recovery has paused
HINT:  Execute pg_xlog_replay_resume() to continue.

(at this point I did pg_xlog_replay_resume())

LOG:  redo done at 0/520
PANIC:  wal receiver still active
LOG:  startup process (PID 38762) was terminated by signal 6: Abort trap
LOG:  terminating any other active server processes

I'm thinking that this code is wrong:

    if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY)
    {
        SetRecoveryPause(true);
        recoveryPausesHere();
    }
    reachedStopPoint = true;    /* see below */
    recoveryContinue = false;

I think the recoveryContinue = false assignment should not happen if we decide to pause. That is, we should say: if (recoveryPauseAtTarget && standbyState == STANDBY_SNAPSHOT_READY) { same as now } else recoveryContinue = false. I haven't tested that, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [HACKERS] time-delayed standbys
On Tue, Apr 19, 2011 at 9:47 PM, Robert Haas robertmh...@gmail.com wrote:
> That is, a standby configured such that replay lags a prescribed amount of time behind the master. This seemed easy to implement, so I did. Patch (for 9.2, obviously) attached.

This crashes when stopping recovery to a target (I tried with a named restore point and with a point in time) after executing pg_xlog_replay_resume(). Here is the backtrace. I will try to check later but I wanted to report it before...

#0  0xb537 in raise () from /lib/libc.so.6
#1  0xb777a922 in abort () from /lib/libc.so.6
#2  0x08393a19 in errfinish (dummy=0) at elog.c:513
#3  0x083944ba in elog_finish (elevel=22, fmt=0x83d5221 "wal receiver still active") at elog.c:1156
#4  0x080f04cb in StartupXLOG () at xlog.c:6691
#5  0x080f2825 in StartupProcessMain () at xlog.c:10050
#6  0x0811468f in AuxiliaryProcessMain (argc=2, argv=0xbfa326a8) at bootstrap.c:417
#7  0x0827c2ea in StartChildProcess (type=StartupProcess) at postmaster.c:4488
#8  0x08280b85 in PostmasterMain (argc=3, argv=0xa4c17e8) at postmaster.c:1106
#9  0x0821730f in main (argc=3, argv=0xa4c17e8) at main.c:199

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL