Re: [HACKERS] archive_keepalive_command

2012-08-27 Thread Bruce Momjian

Where are we on this?

---

On Mon, Jan 16, 2012 at 01:52:35AM +, Simon Riggs wrote:
 On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
  archive_command and restore_command describe how to ship WAL files
  to/from an archive.
 
  When there is nothing to ship, we delay sending WAL files. When there are
  no WAL files, the standby has no information at all.
 
  To provide some form of keepalive on quiet systems the
  archive_keepalive_command provides a generic hook to implement
  keepalives. This is implemented as a separate command to avoid storing
  keepalive messages in the archive, or at least allow overwrites using
  a single filename like keepalive.
 
  Examples
  archive_keepalive_command = 'arch_cmd keepalive'   # sends a file
  called keepalive to archive, overwrites allowed
  archive_keepalive_command = 'arch_cmd %f.%t.keepalive'  # sends a file
  like 0001000ABFE.20111216143517.keepalive
 
  If there is no WAL file to send, then we send a keepalive file
  instead. Keepalive is a small file that contains the same contents as a
  streaming keepalive message (re: other patch on that).
 
  If no WAL file is available and we are attempting to restore in
  standby_mode, then we execute restore_keepalive_command to see if a
  keepalive file is available. Checks for a file in the specific
  keepalive format and then uses that to update last received info from
  master.
 
  e.g.
  restore_keepalive_command = 'restore_cmd keepalive'   # gets a file
  called keepalive from the archive
 
 Patch.
 
 -- 
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] archive_keepalive_command

2012-08-27 Thread Robert Haas
On Mon, Aug 27, 2012 at 9:48 AM, Bruce Momjian br...@momjian.us wrote:
 Where are we on this?

It didn't make it into 9.2, and the patch hasn't been resubmitted for
9.3.  It's still not really 100% clear to me what problem it lets us
solve that we can't solve otherwise.  Maybe that is just a question of
adding documentation; I don't know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] archive_keepalive_command

2012-03-28 Thread Robert Haas
On Mon, Mar 5, 2012 at 11:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sun, Mar 4, 2012 at 1:20 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 Does this patch have any user-visible effect?  I thought it would make
 pg_last_xact_replay_timestamp() advance, but it does not seem to.  I
 looked through the source a bit, and as best I can tell this only sets
 some internal state which is never used, except under DEBUG2

 Thanks for the review. I'll look into that.

Simon, are you still hoping to get this done for this release?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] archive_keepalive_command

2012-03-05 Thread Simon Riggs
On Sun, Mar 4, 2012 at 1:20 AM, Jeff Janes jeff.ja...@gmail.com wrote:

 Does this patch have any user-visible effect?  I thought it would make
 pg_last_xact_replay_timestamp() advance, but it does not seem to.  I
 looked through the source a bit, and as best I can tell this only sets
 some internal state which is never used, except under DEBUG2

Thanks for the review. I'll look into that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] archive_keepalive_command

2012-03-03 Thread Jeff Janes
On Sun, Jan 15, 2012 at 5:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
 archive_command and restore_command describe how to ship WAL files
 to/from an archive.

 When there is nothing to ship, we delay sending WAL files. When there are
 no WAL files, the standby has no information at all.

 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.


 Patch.

Preliminary review:

Applies with several hunks, and with some fuzz in xlog.h

Builds cleanly and passes make check.

Does not provide documentation, which is needed.

Does not include regression tests, but there is no framework for
testing archiving.

Usability testing:

Does this patch have any user-visible effect?  I thought it would make
pg_last_xact_replay_timestamp() advance, but it does not seem to.  I
looked through the source a bit, and as best I can tell this only sets
some internal state which is never used, except under DEBUG2.

The example archive_keepalive_command given in postgresql.conf.sample
is not usable as given.  If the file is named %f, then there is no
easy way for restore_keepalive_command to retrieve the file because it
would not know the name to use.  So the example given in
postgresql.conf.sample should be more like the one given in
recovery.conf.sample, where it uses a hard-coded name rather than %f.
But in that case, it is not clear what %f might be useful for.
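
The naming problem described above can be made concrete with a minimal
sketch. It assumes a hypothetical expand_command() helper that substitutes
%f and %t into a command template. It is not code from the patch, and the
helper name, its signature, and its return convention are illustrative
assumptions only.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): expand %f, %t and %% in a
 * command template. Returns 0 on success, -1 if the result does not fit.
 */
static int
expand_command(const char *tmpl, const char *walname, const char *stamp,
               char *out, size_t outlen)
{
    size_t      used = 0;
    const char *sp;

    if (outlen == 0)
        return -1;

    for (sp = tmpl; *sp; sp++)
    {
        const char *piece;
        char        single[2];
        size_t      len;

        if (sp[0] == '%' && sp[1] == 'f')
        {
            piece = walname;        /* %f: WAL file name */
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == 't')
        {
            piece = stamp;          /* %t: timestamp, as in the proposal */
            sp++;
        }
        else if (sp[0] == '%' && sp[1] == '%')
        {
            piece = "%";            /* %%: literal percent sign */
            sp++;
        }
        else
        {
            single[0] = *sp;        /* ordinary character, copied as-is */
            single[1] = '\0';
            piece = single;
        }

        len = strlen(piece);
        if (used + len >= outlen)
            return -1;              /* leave room for the terminator */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

With the template 'arch_cmd %f.%t.keepalive', each call produces a
different file name, so a restore_keepalive_command on the standby has no
fixed name to ask for. With 'arch_cmd keepalive' the expansion is identical
every time, which is why the hard-coded form is the one that can work.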


Cheers,

Jeff



Re: [HACKERS] archive_keepalive_command

2012-01-15 Thread Simon Riggs
On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
 archive_command and restore_command describe how to ship WAL files
 to/from an archive.

 When there is nothing to ship, we delay sending WAL files. When there are
 no WAL files, the standby has no information at all.

 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.

 Examples
 archive_keepalive_command = 'arch_cmd keepalive'   # sends a file
 called keepalive to archive, overwrites allowed
 archive_keepalive_command = 'arch_cmd %f.%t.keepalive'  # sends a file
 like 0001000ABFE.20111216143517.keepalive

 If there is no WAL file to send, then we send a keepalive file
 instead. Keepalive is a small file that contains the same contents as a
 streaming keepalive message (re: other patch on that).

 If no WAL file is available and we are attempting to restore in
 standby_mode, then we execute restore_keepalive_command to see if a
 keepalive file is available. Checks for a file in the specific
 keepalive format and then uses that to update last received info from
 master.

 e.g.
 restore_keepalive_command = 'restore_cmd keepalive'   # gets a file
 called keepalive from the archive

Patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/transam/recovery.conf.sample b/src/backend/access/transam/recovery.conf.sample
index 5acfa57..fab288c 100644
--- a/src/backend/access/transam/recovery.conf.sample
+++ b/src/backend/access/transam/recovery.conf.sample
@@ -43,6 +43,13 @@
 #
 #restore_command = ''		# e.g. 'cp /mnt/server/archivedir/%f %p'
 #
+# restore_keepalive_command
+#
+# specifies an optional shell command to download keepalive files
+#  e.g. archive_keepalive_command = 'cp -f %p $ARCHIVE/keepalive /dev/null'
+#  e.g. restore_keepalive_command = 'cp $ARCHIVE/keepalive %p'
+#
+#restore_keepalive_command = ''
 #
 # archive_cleanup_command
 #
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ce659ec..2729141 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -73,8 +73,10 @@ int			CheckPointSegments = 3;
 int			wal_keep_segments = 0;
 int			XLOGbuffers = -1;
 int			XLogArchiveTimeout = 0;
+int			XLogArchiveKeepaliveTimeout = 10;	/* XXX set to 60 before commit */
 bool		XLogArchiveMode = false;
 char	   *XLogArchiveCommand = NULL;
+char	   *XLogArchiveKeepaliveCommand = NULL;
 bool		EnableHotStandby = false;
 bool		fullPageWrites = true;
 bool		log_checkpoints = false;
@@ -188,6 +190,7 @@ static bool restoredFromArchive = false;
 
 /* options taken from recovery.conf for archive recovery */
 static char *recoveryRestoreCommand = NULL;
+static char *recoveryRestoreKeepaliveCommand = NULL;
 static char *recoveryEndCommand = NULL;
 static char *archiveCleanupCommand = NULL;
 static RecoveryTargetType recoveryTarget = RECOVERY_TARGET_UNSET;
@@ -634,6 +637,7 @@ static int	emode_for_corrupt_record(int emode, XLogRecPtr RecPtr);
 static void XLogFileClose(void);
 static bool RestoreArchivedFile(char *path, const char *xlogfname,
 	const char *recovername, off_t expectedSize);
+static void RestoreKeepaliveFile(void);
 static void ExecuteRecoveryCommand(char *command, char *commandName,
 	   bool failOnerror);
 static void PreallocXlogFiles(XLogRecPtr endptr);
@@ -2718,7 +2722,10 @@ XLogFileRead(uint32 log, uint32 seg, int emode, TimeLineID tli,
 													  "RECOVERYXLOG",
 													  XLogSegSize);
 			if (!restoredFromArchive)
+			{
+				RestoreKeepaliveFile();
 				return -1;
+			}
 			break;
 
 		case XLOG_FROM_PG_XLOG:
@@ -3179,6 +3186,192 @@ not_available:
 	return false;
 }
 
+static void
+RestoreKeepaliveFile(void)
+{
+	char		keepalivepath[MAXPGPATH];
+	char		keepaliveRestoreCmd[MAXPGPATH];
+	char	   *dp;
+	char	   *endp;
+	const char *sp;
+	int			rc;
+	bool		signaled;
+	struct stat stat_buf;
+
+	/* In standby mode, restore_command might not be supplied */
+	if (recoveryRestoreKeepaliveCommand == NULL)
+		return;
+
+	snprintf(keepalivepath, MAXPGPATH, XLOGDIR "/archive_status/KEEPALIVE");
+
+	/*
+	 * Make sure there is no existing file in keepalivepath
+	 */
+	if (stat(keepalivepath, &stat_buf) == 0)
+	{
+		if (unlink(keepalivepath) != 0)
+			ereport(FATAL,
+					(errcode_for_file_access(),
+					 errmsg("could not remove file \"%s\": %m",
+							keepalivepath)));
+	}
+
+	/*
+	 * construct the command to be executed
+	 */
+	dp = keepaliveRestoreCmd;
+	endp = keepaliveRestoreCmd + MAXPGPATH - 1;
+	*endp = '\0';
+
+	for (sp = recoveryRestoreKeepaliveCommand; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: relative path of target file */
+					sp++;
+	

Re: [HACKERS] archive_keepalive_command

2011-12-22 Thread Robert Haas
On Mon, Dec 19, 2011 at 1:02 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Dec 12, you said "It also strikes me that anything
 that is based on augmenting the walsender/walreceiver protocol leaves
 anyone who is using WAL shipping out in the cold.  I'm not clear from
 the comments you or Simon have made how important you think that use
 case still is."

 Not wanting to leave anyone out in the cold, I proposed something to
 enhance file based replication also.

Fair enough.

I am still of the opinion that we ought to commit some version of the
pg_last_xact_insert_timestamp patch.  I accept that patch isn't going
to solve every problem, but I still think it's worth having.  If one
of these other solutions comes along and turns out to work great,
that's fine, too; but I don't think any of them are so compelling that
we can credibly say that pg_last_xact_insert_timestamp is useless or
obsolete.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Robert Haas
On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.

This may be stupid of me, but I don't see the point of this.  If you
want keepalives, why use log shipping rather than SR?  Implementing a
really-high-latency method of passing protocol messages through the
archive seems like a complex solution to a non-problem (but, like I
say, I may be missing something).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Greg Smith

On 12/19/2011 08:17 AM, Robert Haas wrote:

If you want keepalives, why use log shipping rather than SR?  Implementing a
really-high-latency method of passing protocol messages through the
archive seems like a complex solution to a non-problem


The problem being addressed is how can people using archiving compute 
time-based lag usefully?  Thinking about an answer to that question 
that made sense for SR drove us toward keepalive timestamp sharing.  
This is trying to introduce a mechanism good enough to do the same thing 
for regular archive recovery.


In the archiving case, the worst case waiting to trip you up is always 
the one where not enough activity happened to generate a new WAL file 
yet.  If people want lag to move correctly in that case anyway, a 
message needs to be transferred from archiver to recovery system.  Simon 
is suggesting that we do that via shipping a new small file in that 
case, rather than trying to muck with putting it into the WAL data or 
something like that.  It's a bit hackish, but a) no more hackish than 
people are used to for PITR, and b) in a way that avoids touching 
database code in the critical path for SR.
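
As a rough illustration of the lag calculation being discussed, here is a
minimal sketch. It assumes, hypothetically, that a keepalive file carries
only the master's clock time when it was written, and that the standby has
already replayed everything archived before it. The struct and function
names are invented for the example and do not appear in the patch.

```c
#include <time.h>

/* Hypothetical standby-side state; field names are illustrative only. */
typedef struct StandbyProgress
{
    time_t      last_replayed_commit;   /* commit time of last replayed xact */
    time_t      last_keepalive_sent;    /* master's clock in the newest
                                         * keepalive file, or 0 if none seen */
} StandbyProgress;

/*
 * Estimate time-based lag in seconds. Whichever of the two timestamps is
 * newer is treated as the point up to which the standby is known to be
 * current. A negative result (clock skew) is clamped to zero.
 */
static double
estimate_lag_seconds(const StandbyProgress *p, time_t now)
{
    time_t      newest = p->last_replayed_commit;
    double      lag;

    if (p->last_keepalive_sent > newest)
        newest = p->last_keepalive_sent;

    lag = difftime(now, newest);
    return (lag < 0) ? 0 : lag;
}
```

Without a keepalive, a quiet master makes the estimate grow without bound
even though the standby has nothing left to replay. With one, the estimate
is capped at the age of the newest keepalive.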


This idea might eliminate the last of the reasons I was speculating on 
for adding more timestamps into the WAL stream.


--
Greg Smith   2ndQuadrant US    g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us




Re: [HACKERS] archive_keepalive_command

2011-12-19 Thread Simon Riggs
On Mon, Dec 19, 2011 at 1:17 PM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Dec 16, 2011 at 10:01 AM, Simon Riggs si...@2ndquadrant.com wrote:
 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.

 This may be stupid of me, but I don't see the point of this.  If you
 want keepalives, why use log shipping rather than SR?

On Dec 12, you said "It also strikes me that anything
that is based on augmenting the walsender/walreceiver protocol leaves
anyone who is using WAL shipping out in the cold.  I'm not clear from
the comments you or Simon have made how important you think that use
case still is."

Not wanting to leave anyone out in the cold, I proposed something to
enhance file based replication also.

In any case, multiple others have requested this feature, so it's worth
doing even if you have changed your mind.

 Implementing a
 really-high-latency method of passing protocol messages through the
 archive seems like a complex solution to a non-problem (but, like I
 say, I may be missing something).

So a) it is a problem, and b) it's not complex.

The proposed method doesn't necessarily use the archive. Allowing
users to specify how the keepalive will work makes it a flexible
solution to a widely recognised problem.

This proposal doesn't replace the protocol keepalive for streaming
replication, it provides exactly the same thing for file based
replication users. Many people use both streaming and file-based, so
need a way to measure latency that acts similarly no matter which one
is currently in use.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
