[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-05 Thread Alexey Vasiliev
Mon, 5 Jan 2015 17:16:43 +0900 from Michael Paquier michael.paqu...@gmail.com:
 On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev leopard...@inbox.ru wrote:
  Tue, 30 Dec 2014 21:39:33 +0900 from Michael Paquier 
  michael.paqu...@gmail.com:
  As I understand it, now = (pg_time_t) time(NULL); returns time in seconds, 
  which is why I multiply the value by 1000 to compare it with 
  restore_command_retry_interval in milliseconds.
 This way of doing things is incorrect: you would get a wait time of 1s even
 for retries lower than 1s. In order to get the feature working
 correctly and to get sufficient comparison granularity, you need to
 use TimestampTz for the tracking of the current and last failure time
 and to use the APIs from TimestampTz for difference calculations.
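
 (Illustrative aside: the TimestampTz-based tracking described here might look
 roughly like the sketch below. GetCurrentTimestamp(), TimestampDifference(),
 TimestampDifferenceExceeds() and TimestampTzPlusMilliseconds() are existing
 backend APIs; retry_wait_ms() and last_fail_time are hypothetical names, not
 taken from the patch.)

    #include "utils/timestamp.h"

    static TimestampTz last_fail_time = 0;

    /* Milliseconds still to wait before retrying restore_command. */
    static long
    retry_wait_ms(int retry_interval_ms)
    {
        TimestampTz now = GetCurrentTimestamp();
        long        secs;
        int         usecs;

        if (TimestampDifferenceExceeds(last_fail_time, now, retry_interval_ms))
            return 0;           /* interval already elapsed, retry now */

        /* remaining wait = (last failure time + interval) - now */
        TimestampDifference(now,
                            TimestampTzPlusMilliseconds(last_fail_time,
                                                        retry_interval_ms),
                            &secs, &usecs);
        return secs * 1000L + usecs / 1000;
    }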
 
  I am not sure about a small retry interval; in my cases I need an 
  interval bigger than 5 seconds (20-40 seconds). Right now I limit this value 
  to be at least 100 milliseconds.
 Other people may disagree here, but I don't see any reason to put
 restrictions on this interval of time.
 
 Attached is a patch fixing the feature to use TimestampTz, updating as
 well the docs and recovery.conf.sample which was incorrect, on top of
 other things I noticed here and there.
 
 Alexey, does this new patch look fine for you?
 -- 
 Michael
 
 

Hello. Thanks for the help. Yes, the new patch looks fine!

-- 
Alexey Vasiliev


[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-12-30 Thread Alexey Vasiliev
 Hello.

Thanks, I understand that I should look at another part of the code. I hope I 
have made the right changes now.

To avoid modifying the current pg_usleep calculation, I changed the 
restore_command_retry_interval value to seconds (not milliseconds). In this 
case, the minimum value is 1 second.


Mon, 29 Dec 2014 00:15:03 +0900 from Michael Paquier michael.paqu...@gmail.com:
On Sat, Dec 27, 2014 at 3:42 AM, Alexey Vasiliev  leopard...@inbox.ru  wrote:
 Thanks for suggestions.

 Patch updated.

Cool, thanks. I just had an extra look at it.

+This is useful, if I using for restore of wal logs some
+external storage (like AWS S3) and no matter what the slave database
+will lag behind the master. The problem, what for each request to
+AWS S3 need to pay, what is why for N nodes, which try to get next
+wal log each 5 seconds will be bigger price, than for example each
+30 seconds.
I reworked this portion of the docs; it is rather incorrect, as the
documentation should not use first-person subjects, and I don't
believe that referencing any commercial products is a good thing in
this context.

+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
This is still referring to a timeout. That's not good. And the name of
the parameter at the top of this comment block is missing.

+static int restore_command_retry_interval = 5000L;
I think that it would be more adapted to set that to 5000, and
multiply by 1L. I am also wondering about having a better lower bound,
like 100ms to avoid some abuse with this feature in the retries?

+   ereport(ERROR,
+           (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+            errmsg("\"%s\" must be bigger zero",
+                   "restore_command_retry_interval")));
I'd rather rewrite that to "must have a strictly positive value".

-	 * Wait for more WAL to arrive. Time out after 5 seconds,
+	 * Wait for more WAL to arrive. Time out after
+	 * restore_command_retry_interval (5 seconds by default),
 	 * like when polling the archive, to react to a trigger
 	 * file promptly.
 	 */

 	WaitLatch(&XLogCtl->recoveryWakeupLatch,
 			  WL_LATCH_SET | WL_TIMEOUT,
-			  5000L);
+			  restore_command_retry_interval);
I should have noticed earlier, but in its current state your patch
actually does not work. What you are doing here is tuning the time the
process waits for WAL from a stream. In your case what you want to
control is the retry time for a restore_command in archive recovery,
no?
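
(Illustrative aside: a sketch of the distinction drawn here. The loop shape
and the try_restore_command() wrapper are hypothetical; in the backend,
restore_command is actually executed via RestoreArchivedFile(), while the
WaitLatch call quoted above sits in the streaming-replication wait.)

    /* Archive-recovery side: keep re-running restore_command until it
     * succeeds; this is the wait the new parameter should control. */
    for (;;)
    {
        if (try_restore_command() == 0)     /* hypothetical wrapper */
            break;                          /* segment restored */
        /* nonzero exit status: sleep retry_interval_ms, then retry */
        pg_usleep(retry_interval_ms * 1000L);   /* pg_usleep takes usecs */
    }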
-- 
Michael




-- 
Alexey Vasiliev
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..38420a5 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,26 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+  <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    If <varname>restore_command</> returns nonzero exit status code, retry
+    command after the interval of time specified by this parameter.
+    Default value is <literal>5s</>.
+   </para>
+   <para>
+    This is useful, if I using for restore of wal logs some
+    external storage and no matter what the slave database
+    will lag behind the master.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval of restore_command command, if previous return nonzero exit status code.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS

[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-12-30 Thread Alexey Vasiliev
Tue, 30 Dec 2014 21:39:33 +0900 from Michael Paquier michael.paqu...@gmail.com:
 On Tue, Dec 30, 2014 at 9:33 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  On Tue, Dec 30, 2014 at 9:10 PM, Alexey Vasiliev leopard...@inbox.ru 
  wrote:
  To avoid modifying the current pg_usleep calculation, I changed the
  restore_command_retry_interval value to seconds (not milliseconds). In this
  case, the minimum value is 1 second.
  Er, what's the problem with not changing 100L to 1000L? The unit of
  your parameter is ms AFAIK.
 Of course I meant the previous version of the patch, not the current
 one. Wouldn't it be useful to use it with, for example, retry intervals
 on the order of 100ms~300ms for some cases?
 -- 
 Michael
 
 
 

Thanks, patch changed. 

As I understand it, now = (pg_time_t) time(NULL); returns time in seconds, which is 
why I multiply the value by 1000 to compare it with restore_command_retry_interval in 
milliseconds.

I am not sure about a small retry interval; in my cases I need an interval 
bigger than 5 seconds (20-40 seconds). Right now I limit this value to be at least 100 
milliseconds.
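
(Illustrative aside: a sketch of the pattern being discussed, with hypothetical
variable names last_fail and retry. time(NULL) only has whole-second
granularity, which is why sub-second intervals cannot be honored this way.)

    pg_time_t now        = (pg_time_t) time(NULL);      /* whole seconds */
    long      elapsed_ms = (now - last_fail) * 1000L;   /* steps of 1000 ms */

    if (elapsed_ms >= restore_command_retry_interval)   /* interval in ms */
        retry = true;                                   /* hypothetical flag */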

-- 
Alexey Vasiliev

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..52cb47d 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+  <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    If <varname>restore_command</> returns nonzero exit status code, retry
+    command after the interval of time specified by this parameter.
+    Default value is <literal>5s</>.
+   </para>
+   <para>
+    This is useful, if I using for restore of wal logs some
+    external storage and request each 5 seconds for wal logs
+    can be useless and cost additional money.
+    Increasing this parameter allow to increase retry interval of time,
+    if not found new wal logs inside external storage and no matter
+    what the slave database will lag behind the master.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval of restore_command command, if previous return nonzero exit status code.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..67a873c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval < 100)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must have a bigger 100 milliseconds value",
+							"restore_command_retry_interval")));
+			}
+		}
+		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		else if (strcmp(item-name, recovery_min_apply_delay) == 0)
 		{
 			const char *hintmsg;
@@ -10495,13 +10518,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * machine, so we've exhausted all the options

[HACKERS] Re: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-12-26 Thread Alexey Vasiliev
 Thanks for suggestions.

Patch updated.


Mon, 22 Dec 2014 12:07:06 +0900 from Michael Paquier michael.paqu...@gmail.com:
On Tue, Nov 4, 2014 at 6:25 AM, Alexey Vasiliev  leopard...@inbox.ru  wrote:
 Added new patch.
Seems useful to me to be able to tune this interval of time.

I would simply reword the documentation as follows:
"If <varname>restore_command</> returns nonzero exit status code, retry
command after the interval of time specified by this parameter.
Default value is <literal>5s</>."

Also, I think that it would be a good idea to error out if this
parameter has a value of, let's say, less than 1s, instead of doing a
check for a positive value in the waiting latch. On top of that, the
default value of restore_command_retry_interval should be 5000 and not
0 to simplify the code.
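
(Illustrative aside: a sketch of the simplification meant here, assuming the
default lives in the variable rather than at the call site. WaitLatch() takes
its timeout as a long, hence the promotion.)

    /* With the default kept in the variable (5000 ms), the call site can
     * pass the value directly instead of the earlier fallback ternary
     * "interval > 0 ? interval : 5000L": */
    static int restore_command_retry_interval = 5000;   /* ms */
    ...
    WaitLatch(&XLogCtl->recoveryWakeupLatch,
              WL_LATCH_SET | WL_TIMEOUT,
              restore_command_retry_interval * 1L);     /* promote int to long */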
-- 
Michael




-- 
Alexey Vasiliev
diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..53ccf13 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+  <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    If <varname>restore_command</> returns nonzero exit status code, retry
+    command after the interval of time specified by this parameter.
+    Default value is <literal>5s</>.
+   </para>
+   <para>
+    This is useful, if I using for restore of wal logs some
+    external storage (like AWS S3) and no matter what the slave database
+    will lag behind the master. The problem, what for each request to
+    AWS S3 need to pay, what is why for N nodes, which try to get next
+    wal log each 5 seconds will be bigger price, than for example each
+    30 seconds.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..7ed6f3b 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..02c55a8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 5000L;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval <= 0)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("\"%s\" must be bigger zero",
+							"restore_command_retry_interval")));
+			}
+		}
+		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10658,13 +10681,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	}
 
 	/*
-	 * Wait for more WAL to arrive. Time out after 5 seconds,
+	 * Wait for more WAL to arrive. Time out after
+	 * restore_command_retry_interval (5 seconds by default),
 	 * like when polling the archive, to react to a trigger
 	 * file promptly.
 	 */
 	WaitLatch(&XLogCtl->recoveryWakeupLatch,
 			  WL_LATCH_SET | WL_TIMEOUT,
-			  5000L

[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Mon, 3 Nov 2014 22:55:02 -0200 from Fabrízio de Royes Mello 
fabriziome...@gmail.com:
 
 
 
 On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev  leopard...@inbox.ru  wrote:
 
 
 
 
  Mon, 3 Nov 2014 19:18:51 -0200 from Fabrízio de Royes Mello  
  fabriziome...@gmail.com :
  
   
 Should I change my patch and send it by email? And also, as I understand, 
 I should change the message ID for   
 https://commitfest.postgresql.org/action/patch_view?id=1636 , shouldn't I?
   
  
 You should just send another version of your patch by email and add a new 
 comment to the commitfest using the "Patch" comment type.
 
 
  Added new patch.
 
 
 And you should add the patch to an open commitfest, not to an in-progress one. 
 The next open one is 2014-12 [1].
 

Moved. Thanks.

 
 --
 Fabrízio de Royes Mello
 Consultoria/Coaching PostgreSQL
  Timbira:  http://www.timbira.com.br
  Blog:  http://fabriziomello.github.io
  Linkedin:  http://br.linkedin.com/in/fabriziomello
  Twitter:  http://twitter.com/fabriziomello
  Github:  http://github.com/fabriziomello

-- 
Alexey Vasiliev



[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Tue, 4 Nov 2014 14:41:56 +0100 from Andres Freund and...@2ndquadrant.com:
 Hi,
 
 On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
  *  What the patch does in a short paragraph: This patch should add an option 
  recovery_timeout, which helps to control the retry interval after 
  restore_command returns a nonzero status code. Right now the default value is 
  5 seconds. This is useful if I am using some external storage (like AWS S3) 
  to restore WAL logs, and it does not matter that the slave database will lag 
  behind the master. The problem is that each request to AWS S3 costs money, 
  which is why N nodes fetching the next WAL log every 5 seconds will cost more 
  than, for example, every 30 seconds. Before, I did it this way:  if ! 
  (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f 
  %p); then sleep 60; fi . But in this case restart/stop of the database is slower.
 
 Without saying that the feature is unnecessary, wouldn't this better be
 solved by using streaming rep most of the time?

But we don't need streaming replication. The master database does not need to know 
about the slaves (and there is no need to add this little overhead to the master). 
Slaves read WAL logs from S3, and the same S3 WAL logs are used as backups. 

-- 
Alexey Vasiliev


[HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev

Hello everyone.

*  Project name:  Add recovery_timeout option to control timeout of 
restore_command nonzero status code
*  Uniquely identifiable file name, so we can tell difference between your v1 
and v24:  0001-add-recovery_timeout-to-controll-timeout-between-res.patch
*  What the patch does in a short paragraph: This patch should add an option 
recovery_timeout, which helps to control the retry interval after 
restore_command returns a nonzero status code. Right now the default value is 
5 seconds. This is useful if I am using some external storage (like AWS S3) 
to restore WAL logs, and it does not matter that the slave database will lag 
behind the master. The problem is that each request to AWS S3 costs money, 
which is why N nodes fetching the next WAL log every 5 seconds will cost more 
than, for example, every 30 seconds. Before, I did it this way:  if ! 
(/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch %f 
%p); then sleep 60; fi . But in this case restart/stop of the database is 
slower. (See the recovery.conf sketch after this list.)
*  Whether the patch is for discussion or for application: No such thing.
*  Which branch the patch is against: master branch
*  Whether it compiles and tests successfully, so we know nothing obvious is 
broken: compiled and passed tests on my local machine.
*  Whether it contains any platform-specific items and if so, has it been 
tested on other platforms: hopefully not.
*  Confirm that the patch includes regression tests to check the new feature 
actually works as described: No, it doesn't have tests. I didn't find how to 
test new config variables.
*  Include documentation: added.
*  Describe the effect your patch has on performance, if any: it shouldn't 
affect database performance.
This is my first patch. I am not sure about the name of the option. Maybe it 
should be called recovery_nonzero_timeout.
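
(Illustrative aside: a hypothetical recovery.conf excerpt showing how the
proposed parameter would replace the sleep-60 workaround above. The parameter
name and value are as proposed in this thread; the wal-e command line is
illustrative.)

    # before: retry pacing baked into the command itself
    # restore_command = 'if ! (envdir /etc/wal-e.d/env wal-e wal-fetch %f %p); then sleep 60; fi'

    # after: the command stays simple, the server paces the retries
    restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch %f %p'
    restore_command_retry_interval = 30s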

-- 
Alexey Vasiliev

From 35abe56b2497f238a6888fe98c54aa9cb5300866 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev <leopard.not.a@gmail.com>
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between
 restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml               | 16 ++++++++++++++++
 src/backend/access/transam/recovery.conf.sample |  5 +++++
 src/backend/access/transam/xlog.c               | 17 ++++++++++++++++-
 3 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..bc410b3 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,22 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-timeout" xreflabel="restore_timeout">
+  <term><varname>restore_timeout</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_timeout</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    By default, if <varname>restore_command</> return nonzero status,
+    server will retry command again after 5 seconds. This parameter
+    allow to change this time. This parameter is optional. This can
+    be useful to increase/decrease number of a restore_command calls.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..282e898 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_timeout = ''
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..98f0fca 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_timeout = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_timeout") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_timeout, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_timeout"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2

[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev
Mon, 03 Nov 2014 14:36:33 -0500 from Peter Eisentraut pete...@gmx.net:
 On 11/3/14 6:04 AM, Alexey Vasiliev wrote:
   3. What the patch does in a short paragraph: This patch should add
  an option recovery_timeout, which helps to control the retry interval
  after restore_command returns a nonzero status code. Right now the
  default value is 5 seconds. This is useful if I am using some external
  storage (like AWS S3) to restore WAL logs, and it does not matter that
  the slave database will lag behind the master. The problem is that each
  request to AWS S3 costs money, which is why N nodes fetching the next
  WAL log every 5 seconds will cost more than, for example, every
  30 seconds.
 
 That seems useful.  I would include something about this use case in the
 documentation.

Ok, I will add this to the patch.

 
  This is my first patch. I am not sure about the name of the option. Maybe it
  should be called recovery_nonzero_timeout.
 
 The option name had me confused.  At first I thought this is the time
 after which a running restore_command invocation gets killed.  I think a
 more precise description might be restore_command_retry_interval.

restore_command_retry_interval - I like this name!

Should I change my patch and send it by email? And also, as I understand, I 
should change the message ID for 
https://commitfest.postgresql.org/action/patch_view?id=1636, shouldn't I?

Thanks

-- 
Alexey Vasiliev


[HACKERS] Re[3]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev



Mon, 3 Nov 2014 19:18:51 -0200 from Fabrízio de Royes Mello 
fabriziome...@gmail.com:
 
 
  Should I change my patch and send it by email? And also, as I understand, I 
  should change the message ID for  
  https://commitfest.postgresql.org/action/patch_view?id=1636 , shouldn't I?
 
 
 You should just send another version of your patch by email and add a new 
 comment to the commitfest using the "Patch" comment type.


Added new patch.

-- 
Alexey Vasiliev
From a5d941717df71e4c16941b8a02f0dca40a1107a0 Mon Sep 17 00:00:00 2001
From: Alexey Vasiliev <leopard.not.a@gmail.com>
Date: Mon, 3 Nov 2014 00:21:14 +0200
Subject: [PATCH] add recovery_timeout to controll timeout between
 restore_command nonzero

---
 doc/src/sgml/recovery-config.sgml               | 24 ++++++++++++++++++++++++
 src/backend/access/transam/recovery.conf.sample |  5 +++++
 src/backend/access/transam/xlog.c               | 20 ++++++++++++++++++--
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index 0f1ff34..98eb451 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,30 @@ restore_command = 'copy C:\\server\\archivedir\\%f %p'  # Windows
   </listitem>
  </varlistentry>
 
+ <varlistentry id="restore-command-retry-interval" xreflabel="restore_command_retry_interval">
+  <term><varname>restore_command_retry_interval</varname> (<type>integer</type>)
+  <indexterm>
+    <primary><varname>restore_command_retry_interval</> recovery parameter</primary>
+  </indexterm>
+  </term>
+  <listitem>
+   <para>
+    By default, if <varname>restore_command</> return nonzero status,
+    server will retry command again after 5 seconds. This parameter
+    allow to change this time. This parameter is optional. This can
+    be useful to increase/decrease number of a restore_command calls.
+   </para>
+   <para>
+    This is useful, if I using for restore of wal logs some
+    external storage (like AWS S3) and no matter what the slave database
+    will lag behind the master. The problem, what for each request to
+    AWS S3 need to pay, what is why for N nodes, which try to get next
+    wal log each 5 seconds will be bigger price, than for example each
+    30 seconds.
+   </para>
+  </listitem>
+ </varlistentry>
+
 </variablelist>
 
   </sect1>
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 7657df3..4eee8f4 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional timeout after nonzero code of restore_command.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 3c9aeae..c26101e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -233,6 +233,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 0;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -5245,6 +5246,20 @@ readRecoveryCommandFile(void)
 					(errmsg_internal("trigger_file = '%s'",
 									 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+						   &hintmsg))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+						 errmsg("parameter \"%s\" requires a temporal value",
+								"restore_command_retry_interval"),
+						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+					(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+		}
 		else if (strcmp(item-name, recovery_min_apply_delay) == 0)
 		{
 			const char *hintmsg;
@@ -11104,13 +11119,14 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	}
 
 	/*
-	 * Wait for more WAL to arrive. Time out after 5 seconds,
+	 * Wait for more WAL to arrive. Time out after
+	 * restore_command_retry_interval (5 seconds by default),
 	 * like when polling the archive, to react to a trigger
 	 * file promptly.
 	 */
 	WaitLatch(&XLogCtl->recoveryWakeupLatch,
 			  WL_LATCH_SET | WL_TIMEOUT,
-			  5000L);
+			  restore_command_retry_interval > 0 ? restore_command_retry_interval : 5000L);
 	ResetLatch(&XLogCtl->recoveryWakeupLatch);
 	break

[HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database

2013-11-24 Thread Alexey Vasiliev
 Sunday, 24 November 2013, 14:06 +01:00 from Andres Freund 
and...@anarazel.de:
Hi,

On 2013-11-24 16:27:06 +0400, Олексій Васильєв wrote:
 This is the part where I try to connect to the database:   
 https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L92-L132 , but 
 SPI functions give an error in the log (it is commented):
 
 2013-11-24 02:57:43 UTC ERROR:  stack depth limit exceeded
 2013-11-24 02:57:43 UTC HINT:  Increase the configuration parameter 
 "max_stack_depth" (currently 2048kB), after ensuring the platform's stack 
 depth limit is adequate.
 2013-11-24 02:57:43 UTC CONTEXT:  SQL statement "SELECT COUNT(*) FROM 
 pg_class;"
 
 Because I am doing something in the wrong way. I will appreciate any help: 
 where I am going wrong, a link to an article on how to do it, just a tip, a pull 
 request - anything. Google searching and reading the PostgreSQL sources have so far 
 helped me to this point.

At the very least you're calling InitPostgres() instead of
BackgroundWorkerInitializeConnection() which you have commented out. And
the latter should only be called once in every worker.

Greetings,

Andres Freund

Fixed by using PQconnectdb from libpq-fe.h. Thanks.
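
(Illustrative aside: a minimal sketch of the libpq route mentioned above; the
conninfo string and function name are illustrative. As the next message
explains, this connects "from the outside" like a normal client, which is
usually not what a bgworker should do.)

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    query_via_libpq(void)
    {
        /* connects as an ordinary client, not through the bgworker machinery */
        PGconn *conn = PQconnectdb("dbname=postgres");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        /* ... run queries with PQexec() ... */
        PQfinish(conn);
    }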


-- 
Alexey Vasiliev


[HACKERS] Re[2]: [HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database

2013-11-24 Thread Alexey Vasiliev
 Monday, 25 November 2013, 8:31 +09:00 from Michael Paquier 
michael.paqu...@gmail.com:
On Mon, Nov 25, 2013 at 2:10 AM, Alexey Vasiliev  leopard...@inbox.ru  wrote:
 Sunday, 24 November 2013, 14:06 +01:00 from Andres Freund
  and...@anarazel.de :

 Hi,


 On 2013-11-24 16:27:06 +0400, Олексій Васильєв wrote:
 This is the part where I try to connect to the database:
  https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L92-L132 , but
 SPI functions give an error in the log (it is commented):

 2013-11-24 02:57:43 UTC ERROR:  stack depth limit exceeded
 2013-11-24 02:57:43 UTC HINT:  Increase the configuration parameter
 "max_stack_depth" (currently 2048kB), after ensuring the platform's stack
 depth limit is adequate.
 2013-11-24 02:57:43 UTC CONTEXT:  SQL statement "SELECT COUNT(*) FROM
 pg_class;"

 Because I am doing something in the wrong way. I will appreciate any help:
 where I am going wrong, a link to an article on how to do it, just a tip, a pull
 request - anything. Google searching and reading the PostgreSQL sources have so far
 helped me to this point.

 At the very least you're calling InitPostgres() instead of
 BackgroundWorkerInitializeConnection() which you have commented out. And
 the latter should only be called once in every worker.

 Greetings,

 Andres Freund


 Fixed by using PQconnectdb from libpq-fe.h. Thanks.
You should not need an extra PQconnectdb to connect to a database
using a background worker for your case, as far as I understood. By
using that you are requiring a connection to a database without using
the internal infrastructure in place, meaning that your bgworker is
not connected to the database server from the inside but from the
outside, like a normal client would do. Aren't you trying to have a
background worker connect to a database when it is initialized with
BgWorkerStart_PostmasterStart? Bgworkers using this start-up mode are
not eligible to initialize database connections. Please use either
BgWorkerStart_ConsistentState or BgWorkerStart_RecoveryFinished and
BackgroundWorkerInitializeConnection will allow a connection to a
database without any extra work.
-- 
Michael
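
(Illustrative aside: a minimal 9.3-era sketch of the setup described above.
my_worker_main and the worker name are hypothetical; the registration fields
and BackgroundWorkerInitializeConnection() are the bgworker API of that
release.)

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    static void
    my_worker_main(Datum main_arg)
    {
        BackgroundWorkerUnblockSignals();

        /* Connect "from the inside"; legal only because of the start
         * time chosen below, and callable only once per worker. */
        BackgroundWorkerInitializeConnection("postgres", NULL);

        /* ... main loop: do database work here ... */
    }

    void
    _PG_init(void)
    {
        BackgroundWorker worker;

        memset(&worker, 0, sizeof(worker));
        snprintf(worker.bgw_name, BGW_MAXLEN, "my_worker");
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_ConsistentState;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        worker.bgw_main = my_worker_main;
        RegisterBackgroundWorker(&worker);
    }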
Thanks, I just tried this. This works if I work with the database in a loop inside 
the bgw_main function. But if I create threads inside bgw_main and try to connect 
to the database, I get "stack depth limit exceeded" errors. The code:

https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L195  - connect to 
the database
https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L100-L131  - 
the http_event_handler function executes in threads, because it handles http 
requests. And this code does not work. BTW, I need to connect to a database 
depending on the request URL, so executing BackgroundWorkerInitializeConnection 
at the beginning is not enough.

Thanks again.

-- 
Alexey Vasiliev


[HACKERS] Re[2]: [HACKERS] Re[2]: [HACKERS] Connect from background worker thread to database

2013-11-24 Thread Alexey Vasiliev
 Monday, 25 November 2013, 13:31 +09:00 from Michael Paquier 
michael.paqu...@gmail.com:
On Mon, Nov 25, 2013 at 1:02 PM, Alexey Vasiliev  leopard...@inbox.ru  wrote:
 Thanks, I just tried this. This works if I work with the database in a loop inside
 the bgw_main function. But if I create threads inside bgw_main and try to
 connect to the database, I get "stack depth limit exceeded" errors. The code:

  https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L195 - connect to
 the database
  https://github.com/le0pard/pg_web/blob/master/src/pg_web.c#L100-L131 -
 the http_event_handler function executes in threads, because it handles http
 requests. And this code does not work. BTW, I need to connect to a database
 depending on the request URL, so executing BackgroundWorkerInitializeConnection
 at the beginning is not enough.
There is a design problem with your application when trying to create
threads with mg_start in order to grab events. Note that I am not
familiar with mongoose, but with bgworkers you cannot simply create
new threads that would be able to connect to the server concurrently. A
model that would be better suited to bgworkers would be something
like (see the sketch below):
- Handle event messages by, for example, opening a port or monitoring
the events on a single bgworker launched by bgw_main, with for example
a message queue model (possible with 9.3). Connection to the database
would be done with a single connection, managed within the loop of
bgw_main.
- Create a new bgworker dynamically each time a new event comes in
(possible only with 9.4~). The top work would be done by a central
bgworker initialized with the server, which would treat events and kick
off new slave workers when necessary.
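
(Illustrative aside: a sketch of the second model, assuming the 9.4
dynamic-bgworker API. RegisterDynamicBackgroundWorker() is the real entry
point; launch_worker_for_event, the worker name and the library/function
names are hypothetical.)

    #include "postgres.h"
    #include "miscadmin.h"
    #include "postmaster/bgworker.h"

    /* Called by the central worker whenever a new event arrives. */
    static void
    launch_worker_for_event(Datum event_arg)
    {
        BackgroundWorker        worker;
        BackgroundWorkerHandle *handle;

        memset(&worker, 0, sizeof(worker));
        snprintf(worker.bgw_name, BGW_MAXLEN, "event_worker");
        worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
                           BGWORKER_BACKEND_DATABASE_CONNECTION;
        worker.bgw_start_time = BgWorkerStart_ConsistentState;
        worker.bgw_restart_time = BGW_NEVER_RESTART;
        /* dynamic workers are resolved by library/function name */
        snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
        snprintf(worker.bgw_function_name, BGW_MAXLEN, "event_worker_main");
        worker.bgw_main_arg = event_arg;
        worker.bgw_notify_pid = MyProcPid;

        if (!RegisterDynamicBackgroundWorker(&worker, &handle))
            elog(WARNING, "could not start dynamic bgworker");
    }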

Regards,
-- 
Michael



Thanks. I will look at how to do this in the best way, following your suggestions.

-- 
Alexey Vasiliev