[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2015-01-05 Thread Alexey Vasiliev
Mon, 5 Jan 2015 17:16:43 +0900 от Michael Paquier :
> On Wed, Dec 31, 2014 at 12:22 AM, Alexey Vasiliev  wrote:
> > Tue, 30 Dec 2014 21:39:33 +0900 от Michael Paquier 
> > :
> > As I understand now = (pg_time_t) time(NULL); return time in seconds, what 
> > is why I multiply value to 1000 to compare with 
> > restore_command_retry_interval in milliseconds.
> This way of doing is incorrect, you would get a wait time of 1s even
> for retries lower than 1s. In order to get the feature working
> correctly and to get a comparison granularity sufficient, you need to
> use TimetampTz for the tracking of the current and last failure time
> and to use the APIs from TimestampTz for difference calculations.
> 
> > I am not sure about small retry interval of time, in my cases I need 
> > interval bigger 5 seconds (20-40 seconds). Right now I limiting this value 
> > be bigger 100 milliseconds.
> Other people may disagree here, but I don't see any reason to put
> restrictions to this interval of time.
> 
> Attached is a patch fixing the feature to use TimestampTz, updating as
> well the docs and recovery.conf.sample which was incorrect, on top of
> other things I noticed here and there.
> 
> Alexey, does this new patch look fine for you?
> -- 
> Michael
> 
> 

Hello. Thanks for help. Yes, new patch look fine!

-- 
Alexey Vasiliev
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-12-30 Thread Alexey Vasiliev
Tue, 30 Dec 2014 21:39:33 +0900 от Michael Paquier :
> On Tue, Dec 30, 2014 at 9:33 PM, Michael Paquier
>  wrote:
> > On Tue, Dec 30, 2014 at 9:10 PM, Alexey Vasiliev  
> > wrote:
> >> To not modify current pg_usleep calculation, I changed
> >> restore_command_retry_interval value to seconds (not milliseconds). In this
> >> case, min value - 1 second.
> > Er, what the problem with not changing 100L to 1000L? The unit of
> > your parameter is ms AFAIK.
> Of course I meant in the previous version of the patch not the current
> one. Wouldn't it be useful to use it with for example retry intervals
> of the order of 100ms~300ms for some cases?
> -- 
> Michael
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
> 

Thanks, patch changed. 

As I understand now = (pg_time_t) time(NULL); return time in seconds, what is 
why I multiply value to 1000 to compare with restore_command_retry_interval in 
milliseconds.

I am not sure about small retry interval of time, in my cases I need interval 
bigger 5 seconds (20-40 seconds). Right now I limiting this value be bigger 100 
milliseconds.

-- 
Alexey Vasilievdiff --git a/doc/src/sgml/recovery-config.sgml b/doc/src/sgml/recovery-config.sgml
index ef78bc0..52cb47d 100644
--- a/doc/src/sgml/recovery-config.sgml
+++ b/doc/src/sgml/recovery-config.sgml
@@ -145,6 +145,29 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
   
  
 
+ 
+  restore_command_retry_interval (integer)
+  
+restore_command_retry_interval recovery parameter
+  
+  
+  
+   
+If restore_command returns nonzero exit status code, retry
+command after the interval of time specified by this parameter.
+Default value is 5s.
+   
+   
+This is useful, if I using for restore of wal logs some
+external storage and request each 5 seconds for wal logs
+can be useless and cost additional money.
+Increasing this parameter allow to increase retry interval of time,
+if not found new wal logs inside external storage and no matter
+what the slave database will lag behind the master.
+   
+  
+ 
+
 
 
   
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index b777400..5b63f60 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -58,6 +58,11 @@
 #
 #recovery_end_command = ''
 #
+# specifies an optional retry interval of restore_command command, if previous return nonzero exit status code.
+# This can be useful to increase/decrease number of a restore_command calls.
+#
+#restore_command_retry_interval = 5s
+#
 #---
 # RECOVERY TARGET PARAMETERS
 #---
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index e54..67a873c 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -235,6 +235,7 @@ static TimestampTz recoveryTargetTime;
 static char *recoveryTargetName;
 static int	recovery_min_apply_delay = 0;
 static TimestampTz recoveryDelayUntilTime;
+static int 	restore_command_retry_interval = 5000;
 
 /* options taken from recovery.conf for XLOG streaming */
 static bool StandbyModeRequested = false;
@@ -4881,6 +4882,28 @@ readRecoveryCommandFile(void)
 	(errmsg_internal("trigger_file = '%s'",
 	 TriggerFile)));
 		}
+		else if (strcmp(item->name, "restore_command_retry_interval") == 0)
+		{
+			const char *hintmsg;
+
+			if (!parse_int(item->value, &restore_command_retry_interval, GUC_UNIT_MS,
+		   &hintmsg))
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("parameter \"%s\" requires a temporal value",
+"restore_command_retry_interval"),
+		 hintmsg ? errhint("%s", _(hintmsg)) : 0));
+			ereport(DEBUG2,
+	(errmsg_internal("restore_command_retry_interval = '%s'", item->value)));
+
+			if (restore_command_retry_interval < 100)
+			{
+ereport(ERROR,
+		(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+		 errmsg("\"%s\" must have a bigger 100 milliseconds value",
+	"restore_command_retry_interval")));
+			}
+		}
 		else if (strcmp(item->name, "recovery_min_apply_delay") == 0)
 		{
 			const char *hintmsg;
@@ -10495,13 +10518,13 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 * machine, so we've exhausted all the options for
 	 * obtaining the requested WAL. We're going to loop back
 	 * and retry from the archive, but if it hasn't been long
-	 * since last attempt, sleep 5 seconds to avoid
-	 * busy-waiting.
+	 * since last attempt, sleep restore_command_retry_interval
+	 * (by default

[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Tue, 4 Nov 2014 14:41:56 +0100 от Andres Freund :
> Hi,
> 
> On 2014-11-03 14:04:00 +0300, Alexey Vasiliev wrote:
> > *  What the patch does in a short paragraph: This patch should add option 
> > recovery_timeout, which help to control timeout of restore_command nonzero 
> > status code. Right now default value is 5 seconds. This is useful, if I 
> > using for restore of wal logs some external storage (like AWS S3) and no 
> > matter what the slave database will lag behind the master. The problem, 
> > what for each request to AWS S3 need to pay, what is why for N nodes, which 
> > try to get next wal log each 5 seconds will be bigger price, than for 
> > example each 30 seconds. Before I do this in this way: " if ! 
> > (/usr/local/bin/envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-fetch "%f" 
> > "%p"); then sleep 60; fi ". But in this case restart/stop database slower.
> 
> Without saying that the feature is unneccessary, wouldn't this better be
> solved by using streaming rep most of the time?

But we don't need streaming rep. Master database no need to know about slaves 
(and no need to add this little overhead to master). Slaves read wal logs from 
S3 and the same S3 wal logs used as backups. 

-- 
Alexey Vasiliev
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-04 Thread Alexey Vasiliev



Mon, 3 Nov 2014 22:55:02 -0200 от Fabrízio de Royes Mello 
:
> 
> 
> 
> On Mon, Nov 3, 2014 at 7:25 PM, Alexey Vasiliev < leopard...@inbox.ru > wrote:
> >
> >
> >
> >
> > Mon, 3 Nov 2014 19:18:51 -0200 от Fabrízio de Royes Mello < 
> > fabriziome...@gmail.com >:
> > >
> > > >
> > > > Should I change my patch and send it by email? And also as I understand 
> > > > I should change message ID for   
> > > > https://commitfest.postgresql.org/action/patch_view?id=1636 , isn't it?
> > > >
> > >
> > > You should just send another version of your patch by email and add a new 
> > > "comment" to commit-fest using the "Patch" comment type.
> >
> >
> > Added new patch.
> >
> 
> And you should add the patch to an open commit-fest not to an in-progress. 
> The next opened is 2014-12 [1].
> 

Moved. Thanks.

> 
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Timbira:  http://www.timbira.com.br
> >> Blog:  http://fabriziomello.github.io
> >> Linkedin:  http://br.linkedin.com/in/fabriziomello
> >> Twitter:  http://twitter.com/fabriziomello
> >> Github:  http://github.com/fabriziomello

-- 
Alexey Vasiliev

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Fabrízio de Royes Mello
>
> Should I change my patch and send it by email? And also as I understand I
should change message ID for
https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?
>

You should just send another version of your patch by email and add a new
"comment" to commit-fest using the "Patch" comment type.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


[HACKERS] Re[2]: [HACKERS] Patch: add recovery_timeout option to control timeout of restore_command nonzero status code

2014-11-03 Thread Alexey Vasiliev
Mon, 03 Nov 2014 14:36:33 -0500 от Peter Eisentraut :
> On 11/3/14 6:04 AM, Alexey Vasiliev wrote:
> >  3. What the patch does in a short paragraph: This patch should add
> > option recovery_timeout, which help to control timeout of
> > restore_command nonzero status code. Right now default value is 5
> > seconds. This is useful, if I using for restore of wal logs some
> > external storage (like AWS S3) and no matter what the slave database
> > will lag behind the master. The problem, what for each request to
> > AWS S3 need to pay, what is why for N nodes, which try to get next
> > wal log each 5 seconds will be bigger price, than for example each
> > 30 seconds.
> 
> That seems useful.  I would include something about this use case in the
> documentation.

Ok, I will add this in patch.

> 
> > This is my first patch. I am not sure about name of option. Maybe it
> > should called "recovery_nonzero_timeout".
> 
> The option name had me confused.  At first I though this is the time
> after which a running restore_command invocation gets killed.  I think a
> more precise description might be restore_command_retry_interval.

"restore_command_retry_interval" - I like this name!

Should I change my patch and send it by email? And also as I understand I 
should change message ID for 
https://commitfest.postgresql.org/action/patch_view?id=1636, isn't it?

Thanks

-- 
Alexey Vasiliev
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers