[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 5 Jan 2015 17:16:43 +0900, from Michael Paquier michael.paqu...@gmail.com:

> On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> Tue, 30 Dec 2014 21:39:33 +0900, from Michael Paquier michael.paqu...@gmail.com:
>> As I understand, now = (pg_time_t) time(NULL); returns the time in seconds, which is why I multiply the value by 1000 to compare it with restore_command_retry_interval in milliseconds.
>
> This way of doing it is incorrect: you would get a wait time of 1s even for retry intervals lower than 1s. To get the feature working correctly, with sufficient comparison granularity, you need to use TimestampTz to track the current and last failure times, and use the TimestampTz APIs for the difference calculations.
>
>> I am not sure about small retry intervals; in my cases I need an interval bigger than 5 seconds (20-40 seconds). Right now I limit this value to be bigger than 100 milliseconds.
>
> Other people may disagree here, but I don't see any reason to put restrictions on this interval of time.
>
> Attached is a patch fixing the feature to use TimestampTz, updating as well the docs and recovery.conf.sample, which was incorrect, on top of other things I noticed here and there. Alexey, does this new patch look fine for you?
> -- Michael

Hello. Thanks for the help. Yes, the new patch looks fine!

-- Alexey Vasiliev

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Hello. Thanks, now I understand that I should look at another part of the code. Hopefully I made the right changes this time. To avoid modifying the current pg_usleep calculation, I changed the restore_command_retry_interval value to seconds (not milliseconds). In this case the minimum value is 1 second.

Mon, 29 Dec 2014 00:15:03 +0900, from Michael Paquier michael.paqu...@gmail.com:

> On Sat, Dec 27, 2014 at 3:42 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> Thanks for suggestions. Patch updated.
>
> Cool, thanks. I just had an extra look at it.
>
>     +This is useful, if I using for restore of wal logs some
>     +external storage (like AWS S3) and no matter what the slave database
>     +will lag behind the master. The problem, what for each request to
>     +AWS S3 need to pay, what is why for N nodes, which try to get next
>     +wal log each 5 seconds will be bigger price, than for example each
>     +30 seconds.
>
> I reworked this portion of the docs; it was rather incorrect, as the documentation should not use first-person subjects, and I don't believe that referencing any commercial products is a good thing in this context.
>
>     +# specifies an optional timeout after nonzero code of restore_command.
>     +# This can be useful to increase/decrease number of a restore_command calls.
>
> This is still referring to a timeout. That's not good. And the name of the parameter at the top of this comment block is missing.
>
>     +static int restore_command_retry_interval = 5000L;
>
> I think that it would be more adapted to set that to 5000, and multiply by 1L. I am also wondering about having a better lower bound, like 100ms, to avoid some abuse of this feature in the retries.
>
>     + ereport(ERROR,
>     +         (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>     +          errmsg("\"%s\" must be bigger zero",
>     +                 "restore_command_retry_interval")));
>
> I'd rather rewrite that as "must have a strictly positive value".
>
>     -    * Wait for more WAL to arrive. Time out after 5 seconds,
>     +    * Wait for more WAL to arrive. Time out after
>     +    * restore_command_retry_interval (5 seconds by default),
>          * like when polling the archive, to react to a trigger
>          * file promptly.
>          */
>         WaitLatch(&XLogCtl->recoveryWakeupLatch,
>                   WL_LATCH_SET | WL_TIMEOUT,
>     -             5000L);
>     +             restore_command_retry_interval);
>
> I should have noticed earlier, but in its current state your patch actually does not work. What you are doing here is tuning the time the process waits for WAL from stream. In your case, what you want to control is the retry time for restore_command in archive recovery, no?
> -- Michael

-- Alexey Vasiliev

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..38420a5 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,26 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
    </varlistentry>
 
+   <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+    <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      If <varname>restore_command</> returns nonzero exit status code, retry
+      command after the interval of time specified by this parameter.
+      Default value is <literal>5s</>.
+     </para>
+     <para>
+      This is useful, if I using for restore of wal logs some
+      external storage and no matter what the slave database
+      will lag behind the master.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval of restore_command command, if previous return nonzero exit status code.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Tue, 30 Dec 2014 21:39:33 +0900, from Michael Paquier michael.paqu...@gmail.com:

> On Tue, Dec 30, 2014 at 9:33 PM, Michael Paquier michael.paqu...@gmail.com wrote:
>> On Tue, Dec 30, 2014 at 9:10 PM, Alexey Vasiliev leopard...@inbox.ru wrote:
>>> To not modify the current pg_usleep calculation, I changed the restore_command_retry_interval value to seconds (not milliseconds). In this case the minimum value is 1 second.
>> Er, what is the problem with not changing 100L to 1000L? The unit of your parameter is ms AFAIK.
> Of course I meant in the previous version of the patch, not the current one. Wouldn't it be useful to use it with, for example, retry intervals of the order of 100ms~300ms in some cases?
> -- Michael

Thanks, patch changed. As I understand, now = (pg_time_t) time(NULL); returns the time in seconds, which is why I multiply the value by 1000 to compare it with restore_command_retry_interval in milliseconds. I am not sure about small retry intervals; in my cases I need an interval bigger than 5 seconds (20-40 seconds). Right now I limit this value to be bigger than 100 milliseconds.

-- Alexey Vasiliev

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..52cb47d 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
    </varlistentry>
 
+   <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+    <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      If <varname>restore_command</> returns nonzero exit status code, retry
+      command after the interval of time specified by this parameter.
+      Default value is <literal>5s</>.
+     </para>
+     <para>
+      This is useful, if I using for restore of wal logs some
+      external storage and request each 5 seconds for wal logs
+      can be useless and cost additional money.
+      Increasing this parameter allow to increase retry interval of time,
+      if not found new wal logs inside external storage and no matter
+      what the slave database will lag behind the master.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval of restore_command command, if previous return nonzero exit status code.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..67a873c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int restore_command_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 			(errmsg_internal("trigger_file = '%s'", TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval < 100)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must have a bigger 100 milliseconds value",
+								"restore_command_retry_interval")));
+			}
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10495,13 +10518,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * machine, so we've exhausted all the options
[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Thanks for suggestions. Patch updated.

Mon, 22 Dec 2014 12:07:06 +0900, from Michael Paquier michael.paqu...@gmail.com:

> On Tue, Nov 4, 2014 at 6:25 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> Added new patch.
>
> Seems useful to me to be able to tune this interval of time. I would simply reword the documentation as follows: "If restore_command returns nonzero exit status code, retry command after the interval of time specified by this parameter. Default value is 5s."
>
> Also, I think that it would be a good idea to error out if this parameter has a value of, let's say, less than 1s, instead of doing a check for a positive value in the waiting latch. On top of that, the default value of restore_command_retry_interval should be 5000 and not 0, to simplify the code.
> -- Michael

-- Alexey Vasiliev

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..53ccf13 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
    </varlistentry>
 
+   <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+    <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      If <varname>restore_command</> returns nonzero exit status code, retry
+      command after the interval of time specified by this parameter.
+      Default value is <literal>5s</>.
+     </para>
+     <para>
+      This is useful, if I using for restore of wal logs some
+      external storage (like AWS S3) and no matter what the slave database
+      will lag behind the master. The problem, what for each request to
+      AWS S3 need to pay, what is why for N nodes, which try to get next
+      wal log each 5 seconds will be bigger price, than for example each
+      30 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..7ed6f3b 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..02c55a8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int restore_command_retry_interval = 5000L;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 			(errmsg_internal("trigger_file = '%s'", TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval <= 0)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must be bigger zero",
+								"restore_command_retry_interval")));
+			}
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10658,13 +10681,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 
 		/*
-		 * Wait for more WAL to arrive. Time out after 5 seconds,
+		 * Wait for more WAL to arrive. Time out after
+		 * restore_command_retry_interval (5 seconds by default),
 		 * like when polling the archive, to react to a trigger
 		 * file promptly.
 		 */
 		WaitLatch(&XLogCtl->recoveryWakeupLatch,
 				  WL_LATCH_SET | WL_TIMEOUT,
-				  5000L
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 3 Nov 2014 22:55:02 -0200, from Fabrízio de Royes Mello fabriziome...@gmail.com:

> On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> Mon, 3 Nov 2014 19:18:51 -0200, from Fabrízio de Royes Mello fabriziome...@gmail.com:
>> Should I change my patch and send it by email? And also, as I understand, I should change the message ID for https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?
>
> You should just send another version of your patch by email and add a new comment to the commitfest using the "Patch" comment type.

Added new patch.

> And you should add the patch to an open commitfest, not to an in-progress one. The next open one is 2014-12 [1].

Moved. Thanks.

> -- Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> Timbira: http://www.timbira.com.br
> Blog: http://fabriziomello.github.io
> Linkedin: http://br.linkedin.com/in/fabriziomello
> Twitter: http://twitter.com/fabriziomello
> Github: http://github.com/fabriziomello

-- Alexey Vasiliev
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Tue, 4 Nov 2014 14:41:56 +0100, from Andres Freund and...@2ndquadrant.com:

> Hi,
>
> On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
>> * What the patch does in a short paragraph: This patch adds the option recovery_timeout, which helps to control the timeout between restore_command calls that return a nonzero status code. Right now the default value is 5 seconds. This is useful if I use some external storage (like AWS S3) to restore WAL logs, and it does not matter how far the slave database lags behind the master. The problem is that each request to AWS S3 costs money, which is why N nodes that each try to get the next WAL log every 5 seconds cost more than, for example, every 30 seconds. Before, I did this as: if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f %p); then sleep 60; fi. But in this case restarting/stopping the database is slower.
>
> Without saying that the feature is unnecessary, wouldn't this better be solved by using streaming rep most of the time?

But we don't need streaming rep. The master database does not need to know about the slaves (and there is no need to add this little overhead to the master). The slaves read WAL logs from S3, and the same S3 WAL logs are used as backups.

-- Alexey Vasiliev
[HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Hello everyone.

* Project name: Add recovery_timeout option to control timeout of restore_command nonzero status code
* Uniquely identifiable file name, so we can tell the difference between your v1 and v24: 0001-add-recovery_timeout-to-controll-timeout-between-res.patch
* What the patch does in a short paragraph: This patch adds the option recovery_timeout, which helps to control the timeout between restore_command calls that return a nonzero status code. Right now the default value is 5 seconds. This is useful if I use some external storage (like AWS S3) to restore WAL logs, and it does not matter how far the slave database lags behind the master. The problem is that each request to AWS S3 costs money, which is why N nodes that each try to get the next WAL log every 5 seconds cost more than, for example, every 30 seconds. Before, I did this as: if ! (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f %p); then sleep 60; fi. But in this case restarting/stopping the database is slower.
* Whether the patch is for discussion or for application: No such thing.
* Which branch the patch is against: master branch
* Whether it compiles and tests successfully, so we know nothing obvious is broken: compiled and passes tests on my local machine.
* Whether it contains any platform-specific items and if so, has it been tested on other platforms: hopefully, no.
* Confirm that the patch includes regression tests to check the new feature actually works as described: No, it doesn't have tests. I haven't found how to test new config variables.
* Include documentation: added.
* Describe the effect your patch has on performance, if any: it shouldn't affect database performance.

This is my first patch. I am not sure about the name of the option. Maybe it should be called recovery_nonzero_timeout.

-- Alexey Vasiliev

From 35abe56b2497f238a6888fe98c54aa9cb5300866 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev leopard.not.a@gmail.com
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml               | 16
 src/backend/access/transam/recovery.conf.sample |  5
 src/backend/access/transam/xlog.c               | 17
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..bc410b3 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,22 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
    </varlistentry>
 
+   <varlistentry id="restore-timeout" xreflabel="restore_timeout">
+    <term><varname>restore_timeout</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>restore_timeout</> recovery parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      By default, if <varname>restore_command</> return nonzero status,
+      server will retry command again after 5 seconds. This parameter
+      allow to change this time. This parameter is optional. This can
+      be useful to increase/decrease number of a restore_command calls.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..282e898 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_timeout = ''
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..98f0fca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int restore_timeout = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 			(errmsg_internal("trigger_file = '%s'", TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_timeout") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_timeout, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_timeout"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2
[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 03 Nov 2014 14:36:33 -0500, from Peter Eisentraut pete...@gmx.net:

> On 11/3/14 6:04 AM, Alexey Vasiliev wrote:
>> 3. What the patch does in a short paragraph: This patch adds the option recovery_timeout, which helps to control the timeout between restore_command calls that return a nonzero status code. Right now the default value is 5 seconds. This is useful if I use some external storage (like AWS S3) to restore WAL logs, and it does not matter how far the slave database lags behind the master. The problem is that each request to AWS S3 costs money, which is why N nodes that each try to get the next WAL log every 5 seconds cost more than, for example, every 30 seconds.
>
> That seems useful. I would include something about this use case in the documentation.

Ok, I will add this to the patch.

>> This is my first patch. I am not sure about the name of the option. Maybe it should be called recovery_nonzero_timeout.
>
> The option name had me confused. At first I thought this is the time after which a running restore_command invocation gets killed. I think a more precise description might be restore_command_retry_interval.

restore_command_retry_interval - I like this name! Should I change my patch and send it by email? And also, as I understand, I should change the message ID for https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?

Thanks

-- Alexey Vasiliev
[HACKERS] Re[3]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code
Mon, 3 Nov 2014 19:18:51 -0200, from Fabrízio de Royes Mello fabriziome...@gmail.com:

>> Should I change my patch and send it by email? And also, as I understand, I should change the message ID for https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?
>
> You should just send another version of your patch by email and add a new comment to the commitfest using the "Patch" comment type.

Added new patch.

-- Alexey Vasiliev

From a5d941717df71e4c16941b8a02f0dca40a1107a0 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev leopard.not.a@gmail.com
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml               | 24
 src/backend/access/transam/recovery.conf.sample |  5
 src/backend/access/transam/xlog.c               | 20
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..98eb451 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,30 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     </listitem>
    </varlistentry>
 
+   <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+    <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+     <indexterm>
+      <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+     </indexterm>
+    </term>
+    <listitem>
+     <para>
+      By default, if <varname>restore_command</> return nonzero status,
+      server will retry command again after 5 seconds. This parameter
+      allow to change this time. This parameter is optional. This can
+      be useful to increase/decrease number of a restore_command calls.
+     </para>
+     <para>
+      This is useful, if I using for restore of wal logs some
+      external storage (like AWS S3) and no matter what the slave database
+      will lag behind the master. The problem, what for each request to
+      AWS S3 need to pay, what is why for N nodes, which try to get next
+      wal log each 5 seconds will be bigger price, than for example each
+      30 seconds.
+     </para>
+    </listitem>
+   </varlistentry>
+
  </variablelist>
 </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..4eee8f4 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..c26101e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int restore_command_retry_interval = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 			(errmsg_internal("trigger_file = '%s'", TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -11104,13 +9,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 
 		/*
-		 * Wait for more WAL to arrive. Time out after 5 seconds,
+		 * Wait for more WAL to arrive. Time out after
+		 * restore_command_retry_interval (5 seconds by default),
 		 * like when polling the archive, to react to a trigger
 		 * file promptly.
 		 */
 		WaitLatch(&XLogCtl->recoveryWakeupLatch,
 				  WL_LATCH_SET | WL_TIMEOUT,
-				  5000L);
+				  restore_command_retry_interval > 0 ? restore_command_retry_interval : 5000L);
 		ResetLatch(&XLogCtl->recoveryWakeupLatch);
 		break
[HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database
Sunday, 24 November 2013, 14:06 +01:00, from Andres Freund and...@anarazel.de:

> Hi,
>
> On 2013-11-24 16:27:06 +0400, Олексій Васильєв wrote:
>> This is the part where I try to connect to the database: https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L92-L132 , but the SPI functions give an error in the log (it is commented out):
>>
>> 2013-11-24 02:57:43 UTC ERROR: stack depth limit exceeded
>> 2013-11-24 02:57:43 UTC HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
>> 2013-11-24 02:57:43 UTC CONTEXT: SQL statement "SELECT COUNT(*) FROM pg_class;"
>>
>> So I am doing something the wrong way. I will appreciate any help: where I went wrong, a link to an article on how to do it, just a tip, a pull request - anything. Google searching and reading the PostgreSQL sources got me this far.
>
> At the very least you're calling InitPostgres() instead of BackgroundWorkerInitializeConnection(), which you have commented out. And the latter should only be called once in every worker.
>
> Greetings,
>
> Andres Freund

Fixed by using PQconnectdb from libpq-fe.h. Thanks.

-- Alexey Vasiliev
[HACKERS] Re[2]: [HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database
Monday, 25 November 2013, 8:31 +09:00, from Michael Paquier michael.paqu...@gmail.com:

> On Mon, Nov 25, 2013 at 2:10 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> [...]
>> Fixed by using PQconnectdb from libpq-fe.h. Thanks.
>
> You should not need an extra PQconnectdb to connect to a database when using a background worker in your case, as far as I understood. By using that you are requesting a connection to the database without using the internal infrastructure in place, meaning that your bgworker is not connected to the database server from the inside but from the outside, like a normal client would do. Aren't you trying to have a background worker connect to a database when it is initialized with BgWorkerStart_PostmasterStart? Bgworkers using this start-up mode are not eligible to initialize database connections. Please use either BgWorkerStart_ConsistentState or BgWorkerStart_RecoveryFinished, and BackgroundWorkerInitializeConnection will allow a connection to a database without any extra work.
> -- Michael

Thanks, I just tried this. It works if I work with the database in a loop inside the bgw_main function. But if I create threads inside bgw_main and try to connect to the database, I get "stack depth limit exceeded" errors.

The code:
https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L195 - connect to database
https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L100-L131 - the http_event_handler function executes in threads, because it handles HTTP requests. And this code does not work.

BTW, I need to connect to a database depending on the request URL, so executing BackgroundWorkerInitializeConnection at the beginning is not enough. Thanks again.

-- Alexey Vasiliev
[HACKERS] Re[2]: [HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database
Monday, 25 November 2013, 13:31 +09:00, from Michael Paquier michael.paqu...@gmail.com:

> On Mon, Nov 25, 2013 at 1:02 PM, Alexey Vasiliev leopard...@inbox.ru wrote:
>> [...]
>
> There is a design problem with your application when trying to create threads with mg_start in order to grab events. Note that I am not familiar with mongoose, but with bgworkers you cannot simply create new threads that would be able to connect to the server concurrently. A model that would be better suited to bgworkers would be something like:
> - Handle event messages by, for example, opening a port or monitoring the events on a single bgworker launched by bgw_main, with for example a message-queue model (possible with 9.3). Connection to the database would be done with a single connection, managed within the loop of bgw_main.
> - Create a new bgworker dynamically each time a new event comes in (possible only with 9.4~). The top work would be done by a central bgworker initialized with the server, which would treat events and kick new slave workers when necessary.
> Regards,
> -- Michael

Thanks. I will look at how to do this in the best way, following your suggestions.

-- Alexey Vasiliev