Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-21 Thread Heikki Linnakangas

On 18.03.2011 13:56, Heikki Linnakangas wrote:

On 18.03.2011 10:48, Heikki Linnakangas wrote:

On 17.03.2011 21:39, Robert Haas wrote:

On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Hmm, good point. It's harmless, but creating the history file in the
first
place sure seems like a waste of time.


The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested multiple concurrent backups, I found that they can end up
with the same checkpoint location and the same history file name.


$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i
done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup


This would cause a serious problem, because a backup-end record
indicating the same START WAL LOCATION can be written by the first
backup before the others finish. So we might wrongly conclude that
we've already reached a consistent state upon reading the backup-end
record (written by the first backup), before reading the last required
WAL file.
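To make the hazard concrete, here is a toy model (plain C with invented names, not PostgreSQL code) of a recovery that declares consistency at the first backup-end record matching its backup_label's start location:

```c
#include <stdint.h>

/* Toy model: a backup-end WAL record simply names the start location of
 * the backup it terminates, and the point where that backup's recovery
 * becomes consistent. */
typedef struct
{
    uint64_t backup_start_lsn;  /* START WAL LOCATION of the backup it ends */
    uint64_t end_lsn;           /* where recovery becomes consistent */
} BackupEndRecord;

/* Recovery scans WAL and stops at the FIRST end record matching its
 * backup_label's start location; returns the LSN where it declares
 * consistency, or 0 if it never does. */
uint64_t consistency_point(uint64_t label_start_lsn,
                           const BackupEndRecord *wal, int nrecords)
{
    for (int i = 0; i < nrecords; i++)
        if (wal[i].backup_start_lsn == label_start_lsn)
            return wal[i].end_lsn;
    return 0;
}
```

With two backups started from the same checkpoint (here, LSN 100), restoring the second backup matches the first one's end record and declares consistency at LSN 150, far before the LSN 500 it actually needs.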

/*
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* page problems, this guarantees that two successive backup runs will
* have different checkpoint positions and hence different history
* file names, even if nothing happened in between.
*
* We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
* fast = true). Otherwise this can take awhile.
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
(fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups get different
checkpoint locations. ISTM that we should change the above code, or
something elsewhere, to ensure that.


Yes, good point.


Here's a patch based on that approach, ensuring that each base backup
uses a different checkpoint as the start location. I think I'll commit
this, rather than invent a new unique ID mechanism for backups. The
latter would need changes in recovery and control file too, and I don't
feel like tinkering with that at this stage.
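The idea can be sketched in miniature (invented names; the real code compares XLogRecPtr values while holding WALInsertLock): remember the last start location handed out, and keep forcing checkpoints until the observed redo pointer moves past it.

```c
#include <stdint.h>

static uint64_t last_backup_start = 0;  /* analogue of XLogCtlInsert.lastBackupStart */
static uint64_t redo_counter = 100;     /* pretend WAL position */

/* Pretend checkpointer: each forced checkpoint yields a later redo pointer. */
static uint64_t force_checkpoint(void)
{
    return redo_counter += 8;
}

/* Called with the redo pointer the backup initially observed. Concurrent
 * checkpoint requests can coalesce, so two backups may observe the SAME
 * pointer; in that case force fresh checkpoints until it is unique. */
uint64_t ensure_unique_start(uint64_t observed_redo)
{
    while (observed_redo <= last_backup_start)
        observed_redo = force_checkpoint();
    last_backup_start = observed_redo;
    return observed_redo;
}
```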


Ok, committed this.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-18 Thread Heikki Linnakangas

On 17.03.2011 21:39, Robert Haas wrote:

On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com  wrote:

On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Hmm, good point. It's harmless, but creating the history file in the first
place sure seems like a waste of time.


The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested multiple concurrent backups, I found that they can end up
with the same checkpoint location and the same history file name.


$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i
done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup


This would cause a serious problem, because a backup-end record
indicating the same START WAL LOCATION can be written by the first
backup before the others finish. So we might wrongly conclude that
we've already reached a consistent state upon reading the backup-end
record (written by the first backup), before reading the last required
WAL file.

/*
 * Force a CHECKPOINT.  Aside from being necessary to prevent 
torn
 * page problems, this guarantees that two successive backup 
runs will
 * have different checkpoint positions and hence different 
history
 * file names, even if nothing happened in between.
 *
 * We use CHECKPOINT_IMMEDIATE only if requested by user (via 
passing
 * fast = true).  Otherwise this can take awhile.
 */
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
  (fast ? CHECKPOINT_IMMEDIATE 
: 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups get different
checkpoint locations. ISTM that we should change the above code, or
something elsewhere, to ensure that.


Yes, good point.


Or we could include the backup label name in the backup-end record, to
prevent a recovery from reading a backup-end record that isn't its own.


Backup labels are not guaranteed to be unique either, so including the 
backup label in the backup-end record doesn't solve the problem. But 
something else, like a backup-start counter in shared memory or a 
process id, would work.


It won't make the history file names unique, though. Now that we rely on 
the end-of-backup record for detecting end-of-backup, the history files 
are just for documentation purposes. Do we want to give up on history 
files for backups performed with pg_basebackup? Or we could include the 
backup counter or something similar in the filename too.
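That last idea, sketched with invented names (in the server, the counter would live in shared memory and be incremented under a lock):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: a backup-start counter appended to the history file name
 * so that concurrent backups sharing a checkpoint location still get
 * distinct file names. */
static unsigned backup_counter = 0;

void history_file_name(char *buf, size_t buflen,
                       const char *walseg, unsigned startoffset)
{
    unsigned id = ++backup_counter;  /* unique per backup start */
    snprintf(buf, buflen, "%s.%08X.%u.backup", walseg, startoffset, id);
}
```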


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-18 Thread Heikki Linnakangas

On 18.03.2011 10:48, Heikki Linnakangas wrote:

On 17.03.2011 21:39, Robert Haas wrote:

On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com
wrote:

On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

Hmm, good point. It's harmless, but creating the history file in the
first
place sure seems like a waste of time.


The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested multiple concurrent backups, I found that they can end up
with the same checkpoint location and the same history file name.


$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i
done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup


This would cause a serious problem, because a backup-end record
indicating the same START WAL LOCATION can be written by the first
backup before the others finish. So we might wrongly conclude that
we've already reached a consistent state upon reading the backup-end
record (written by the first backup), before reading the last required
WAL file.

/*
* Force a CHECKPOINT. Aside from being necessary to prevent torn
* page problems, this guarantees that two successive backup runs will
* have different checkpoint positions and hence different history
* file names, even if nothing happened in between.
*
* We use CHECKPOINT_IMMEDIATE only if requested by user (via passing
* fast = true). Otherwise this can take awhile.
*/
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
(fast ? CHECKPOINT_IMMEDIATE : 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups get different
checkpoint locations. ISTM that we should change the above code, or
something elsewhere, to ensure that.


Yes, good point.


Here's a patch based on that approach, ensuring that each base backup 
uses a different checkpoint as the start location. I think I'll commit 
this, rather than invent a new unique ID mechanism for backups. The 
latter would need changes in recovery and control file too, and I don't 
feel like tinkering with that at this stage.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 15af669..570f02b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -355,10 +355,13 @@ typedef struct XLogCtlInsert
 	 * exclusiveBackup is true if a backup started with pg_start_backup() is
 	 * in progress, and nonExclusiveBackups is a counter indicating the number
 	 * of streaming base backups currently in progress. forcePageWrites is
-	 * set to true when either of these is non-zero.
+	 * set to true when either of these is non-zero. lastBackupStart is the
+	 * latest checkpoint redo location used as a starting point for an online
+	 * backup.
 	 */
 	bool		exclusiveBackup;
 	int			nonExclusiveBackups;
+	XLogRecPtr	lastBackupStart;
 } XLogCtlInsert;
 
 /*
@@ -8809,6 +8812,19 @@ do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 		MAXPGPATH)));
 
 	/*
+	 * Force an XLOG file switch before the checkpoint, to ensure that the WAL
+	 * segment the checkpoint is written to doesn't contain pages with old
+	 * timeline IDs. That would otherwise happen if you called
+	 * pg_start_backup() right after restoring from a PITR archive: the first
+	 * WAL segment containing the startup checkpoint has pages in the
+	 * beginning with the old timeline ID. That can cause trouble at recovery:
+	 * we won't have a history file covering the old timeline if pg_xlog
+	 * directory was not included in the base backup and the WAL archive was
+	 * cleared too before starting the backup.
+	 */
+	RequestXLogSwitch();
+
+	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
 	 * it's quite possible for the backup dump to obtain a torn (partially
@@ -8843,43 +8859,54 @@ do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 	XLogCtl->Insert.forcePageWrites = true;
 	LWLockRelease(WALInsertLock);
 
-	/*
-	 * Force an XLOG file switch before the checkpoint, to 

Re: [HACKERS] Allowing multiple concurrent base backups

2011-03-17 Thread Robert Haas
On Mon, Jan 31, 2011 at 10:45 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Hmm, good point. It's harmless, but creating the history file in the first
 place sure seems like a waste of time.

 The attached patch changes pg_stop_backup so that it doesn't create
 the backup history file if archiving is not enabled.

 When I tested multiple concurrent backups, I found that they can end up
 with the same checkpoint location and the same history file name.

 
 $ for ((i=0; i<4; i++)); do
 pg_basebackup -D test$i -c fast -x -l test$i 
 done

 $ cat test0/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test0

 $ cat test1/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test1

 $ cat test2/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test2

 $ cat test3/backup_label
 START WAL LOCATION: 0/2B0 (file 00010002)
 CHECKPOINT LOCATION: 0/2E8
 START TIME: 2011-02-01 12:12:31 JST
 LABEL: test3

 $ ls archive/*.backup
 archive/00010002.00B0.backup
 

 This would cause a serious problem, because a backup-end record
 indicating the same START WAL LOCATION can be written by the first
 backup before the others finish. So we might wrongly conclude that
 we've already reached a consistent state upon reading the backup-end
 record (written by the first backup), before reading the last required
 WAL file.

                /*
                 * Force a CHECKPOINT.  Aside from being necessary to prevent 
 torn
                 * page problems, this guarantees that two successive backup 
 runs will
                 * have different checkpoint positions and hence different 
 history
                 * file names, even if nothing happened in between.
                 *
                 * We use CHECKPOINT_IMMEDIATE only if requested by user (via 
 passing
                 * fast = true).  Otherwise this can take awhile.
                 */
                RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
                                                  (fast ? CHECKPOINT_IMMEDIATE 
 : 0));

 This problem happens because the above code (in do_pg_start_backup)
 doesn't actually ensure that concurrent backups get different
 checkpoint locations. ISTM that we should change the above code, or
 something elsewhere, to ensure that. Or we could include the backup
 label name in the backup-end record, to prevent a recovery from
 reading a backup-end record that isn't its own.

 Thoughts?

This patch is on the 9.1 open items list, but I don't understand it
well enough to know whether it's correct.  Can someone else pick it
up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-31 Thread Heikki Linnakangas

On 25.01.2011 06:02, Fujii Masao wrote:

On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Hmm, perhaps the code would be more readable if instead of the
forcePageWrites counter that counts exclusive and non-exclusive backups, and
an exclusiveBackup boolean indicating if one of the in-progress backups is
an exclusive one, we had a counter that only counts non-exclusive backups,
plus a boolean indicating if an exclusive backup is in progress in addition
to them.

Attached is a patch for that (against master branch, including only xlog.c).


I read this patch and the previously posted one. They look good.

Comments:

+ * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
+ * function.

Typo: s/do_pg_start_backup/do_pg_stop_backup

It would be helpful to explain this behavior in pg_basebackup.sgml or elsewhere.


Thanks. I've committed this now, fixing that, and hopefully all the 
other issues mentioned in this thread.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-31 Thread Heikki Linnakangas

On 27.01.2011 15:15, Fujii Masao wrote:

When I read the patch, I found that pg_stop_backup removes the backup
history file as soon as it creates it, if archive_mode is not enabled.
This looks like an oversight. Should we prevent pg_stop_backup from
removing the fresh history file? Or should we prevent pg_stop_backup
from creating the history file in the first place, since it's not used
at all if archiving is disabled? (If archiving is enabled, the history
file can be used to clean up the archived files.)


Hmm, good point. It's harmless, but creating the history file in the 
first place sure seems like a waste of time.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-31 Thread Fujii Masao
On Tue, Feb 1, 2011 at 1:31 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Hmm, good point. It's harmless, but creating the history file in the first
 place sure seems like a waste of time.

The attached patch changes pg_stop_backup so that it doesn't create
the backup history file if archiving is not enabled.

When I tested multiple concurrent backups, I found that they can end up
with the same checkpoint location and the same history file name.


$ for ((i=0; i<4; i++)); do
pg_basebackup -D test$i -c fast -x -l test$i 
done

$ cat test0/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test0

$ cat test1/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test1

$ cat test2/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test2

$ cat test3/backup_label
START WAL LOCATION: 0/2B0 (file 00010002)
CHECKPOINT LOCATION: 0/2E8
START TIME: 2011-02-01 12:12:31 JST
LABEL: test3

$ ls archive/*.backup
archive/00010002.00B0.backup


This would cause a serious problem, because a backup-end record
indicating the same START WAL LOCATION can be written by the first
backup before the others finish. So we might wrongly conclude that
we've already reached a consistent state upon reading the backup-end
record (written by the first backup), before reading the last required
WAL file.

/*
 * Force a CHECKPOINT.  Aside from being necessary to prevent 
torn
 * page problems, this guarantees that two successive backup 
runs will
 * have different checkpoint positions and hence different 
history
 * file names, even if nothing happened in between.
 *
 * We use CHECKPOINT_IMMEDIATE only if requested by user (via 
passing
 * fast = true).  Otherwise this can take awhile.
 */
RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT |
  (fast ? CHECKPOINT_IMMEDIATE 
: 0));

This problem happens because the above code (in do_pg_start_backup)
doesn't actually ensure that concurrent backups get different
checkpoint locations. ISTM that we should change the above code, or
something elsewhere, to ensure that. Or we could include the backup
label name in the backup-end record, to prevent a recovery from
reading a backup-end record that isn't its own.

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


not_create_histfile_if_not_arch_v1.patch
Description: Binary data



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-27 Thread Fujii Masao
On Tue, Jan 25, 2011 at 1:02 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Hmm, perhaps the code would be more readable if instead of the
 forcePageWrites counter that counts exclusive and non-exclusive backups, and
 an exclusiveBackup boolean indicating if one of the in-progress backups is
 an exclusive one, we had a counter that only counts non-exclusive backups,
 plus a boolean indicating if an exclusive backup is in progress in addition
 to them.

 Attached is a patch for that (against master branch, including only xlog.c).

 I read this patch and the previously posted one. They look good.

 Comments:

 + * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
 + * function.

 Typo: s/do_pg_start_backup/do_pg_stop_backup

 It would be helpful to explain this behavior in pg_basebackup.sgml or
 elsewhere.

When I read the patch, I found that pg_stop_backup removes the backup
history file as soon as it creates it, if archive_mode is not enabled.
This looks like an oversight. Should we prevent pg_stop_backup from
removing the fresh history file? Or should we prevent pg_stop_backup
from creating the history file in the first place, since it's not used
at all if archiving is disabled? (If archiving is enabled, the history
file can be used to clean up the archived files.)
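The cleanup rule the history file enables can be sketched as a simple name comparison (this is roughly what the pg_archivecleanup contrib tool automates; the file names below are illustrative):

```c
#include <string.h>

/* A WAL segment file name is 24 hex characters and names sort in WAL
 * order, so any archived segment sorting strictly before the segment
 * named in the ".backup" history file (e.g.
 * 000000010000000000000002.000000B0.backup) is no longer needed to
 * restore that backup. */
int segment_is_obsolete(const char *segname, const char *history_file)
{
    char keep[25];
    memcpy(keep, history_file, 24);  /* segment part of the history file name */
    keep[24] = '\0';
    return strcmp(segname, keep) < 0;
}
```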

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Heikki Linnakangas

On 13.01.2011 23:32, Heikki Linnakangas wrote:

Anyway, here's an updated patch with all the known issues fixed.


Another updated patch. Fixed bitrot, and addressed the privilege issue 
Fujii-san raised here: 
http://archives.postgresql.org/message-id/4d380560.3040...@enterprisedb.com. 
I changed the privilege checks so that pg_start/stop_backup() functions 
require superuser privileges again, but not for a base backup via the 
replication protocol (replication privilege is needed to establish a 
replication connection to begin with).


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 60,67 
  
  
  /* File path names (all relative to $PGDATA) */
- #define BACKUP_LABEL_FILE		"backup_label"
- #define BACKUP_LABEL_OLD		"backup_label.old"
  #define RECOVERY_COMMAND_FILE	"recovery.conf"
  #define RECOVERY_COMMAND_DONE	"recovery.done"
  
--- 60,65 
***
*** 338,344  typedef struct XLogCtlInsert
  	XLogPageHeader currpage;	/* points to header of block in cache */
  	char	   *currpos;		/* current insertion point in cache */
  	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
! 	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
  } XLogCtlInsert;
  
  /*
--- 336,343 
  	XLogPageHeader currpage;	/* points to header of block in cache */
  	char	   *currpos;		/* current insertion point in cache */
  	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
! 	int			forcePageWrites;	/* forcing full-page writes for PITR? */
! 	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
  } XLogCtlInsert;
  
  /*
***
*** 8352,8367  pg_start_backup(PG_FUNCTION_ARGS)
  
  	backupidstr = text_to_cstring(backupid);
  
! 	startpoint = do_pg_start_backup(backupidstr, fast);
  
  	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
  	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
  }
  
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast)
  {
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
--- 8351,8388 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	startpoint = do_pg_start_backup(backupidstr, fast, NULL);
  
  	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
  	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
  }
  
+ /*
+  * do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
+  * function. It creates the necessary starting checkpoint and constructs the
+  * backup label file.
+  * 
+  * There are two kinds of backups: exclusive and non-exclusive. An exclusive
+  * backup is started with pg_start_backup(), and there can be only one active
+  * at a time. The backup label file of an exclusive backup is written to
+  * $PGDATA/backup_label, and it is removed by pg_stop_backup().
+  *
+  * A non-exclusive backup is used for the streaming base backups (see
+  * src/backend/replication/basebackup.c). The difference from exclusive backups
+  * is that the backup label file is not written to disk. Instead, its would-be
+  * contents are returned in *labelfile, and the caller is responsible for
+  * including it in the backup archive as 'backup_label'. There can be many
+  * non-exclusive backups active at the same time, and they don't conflict
+  * with an exclusive backup either.
+  *
+  * Every successfully started non-exclusive backup must be stopped by calling
+  * do_pg_stop_backup() or do_pg_abort_backup().
+  */
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  {
+ 	bool		exclusive = (labelfile == NULL);
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
***
*** 8371,8381  do_pg_start_backup(const char *backupidstr, bool fast)
  	uint32		_logSeg;
  	struct stat stat_buf;
  	FILE	   *fp;
  
! 	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  errmsg("must be superuser or replication role to run a backup")));
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
--- 8392,8403 
  	uint32		_logSeg;
  	struct stat stat_buf;
  	FILE	   *fp;
+ 	StringInfoData labelfbuf;
  
! 	if (exclusive && !superuser())
  		ereport(ERROR,
  (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
!  errmsg("must be superuser to run a backup")));
  
  	if (RecoveryInProgress())
  		ereport(ERROR,
***
*** 8389,8394  do_pg_start_backup(const char *backupidstr, bool fast)
--- 8411,8422 
  			  errmsg("WAL level not sufficient for making an online backup"),
   errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));
  
+ 	if (strlen(backupidstr) > MAXPGPATH)
+ 		ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(backup label too long (max %d 

Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Magnus Hagander
On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 13.01.2011 23:32, Heikki Linnakangas wrote:

 Anyway, here's an updated patch with all the known issues fixed.

 Another updated patch. Fixed bitrot, and addressed the privilege issue
 Fujii-san raised here:
 http://archives.postgresql.org/message-id/4d380560.3040...@enterprisedb.com.
 I changed the privilege checks so that pg_start/stop_backup() functions
 require superuser privileges again, but not for a base backup via the
 replication protocol (replication privilege is needed to establish a
 replication connection to begin with).

I'm not entirely sure the replication privilege removal is correct.
Doing that, it's no longer possible to deploy a slave *without* using
pg_basebackup, unless you are superuser. Do we really want to put that
restriction back in?

(And if we do, the docs probably need an update...)

I can't see an explicit check for a user attempting to do
pg_stop_backup() when there is a nonexclusive backup running. Maybe
I'm reading it wrong? The case being when a user has started a backup
with pg_basebackup but then connects and manually does a
pg_stop_backup. ISTM it drops us into a codepath that just doesn't do
the decrement, but also doesn't throw an error?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Heikki Linnakangas

On 24.01.2011 20:22, Magnus Hagander wrote:

On Mon, Jan 24, 2011 at 17:52, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

Another updated patch. Fixed bitrot, and addressed the privilege issue
Fujii-san raised here:
http://archives.postgresql.org/message-id/4d380560.3040...@enterprisedb.com.
I changed the privilege checks so that pg_start/stop_backup() functions
require superuser privileges again, but not for a base backup via the
replication protocol (replication privilege is needed to establish a
replication connection to begin with).


I'm not entirely sure the replication privilege removal is correct.
Doing that, it's no longer possible to deploy a slave *without* using
pg_basebackup, unless you are superuser. Do we really want to put that
restriction back in?


Hmm, I thought we did; I thought that was changed just to make 
pg_basebackup work without superuser privileges. But I guess it makes 
sense apart from that. So the question is: should the 'replication' 
privilege be enough to use pg_start/stop_backup(), or should we require 
superuser access for that? In any case, the replication privilege will 
be enough for pg_basebackup.


Ok, I won't touch that. But then we'll need to decide what to do about 
Fujii's observation 
(http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php):

Both a user with the REPLICATION privilege and a superuser can
call pg_stop_backup, but only a superuser can connect to the server
to cancel an online backup during shutdown; a non-superuser with
the REPLICATION privilege cannot. Is this behavior intentional, or
just an oversight?




I can't see an explicit check for a user attempting to do
pg_stop_backup() when there is a nonexclusive backup running. Maybe
I'm reading it wrong? The case being when a user has started a backup
with pg_basebackup but then connects and manually does a
pg_stop_backup. ISTM it drops us into a codepath that just doesn't do
the decrement, but also doesn't throw an error?


It throws an error later, when it doesn't find backup_label. For better 
or worse, it's always been like that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Magnus Hagander
On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 24.01.2011 20:22, Magnus Hagander wrote:
 I can't see an explicit check for a user attempting to do
 pg_stop_backup() when there is a nonexclusive backup running. Maybe
 I'm reading it wrong? The case being when a user has started a backup
 with pg_basebackup but then connects and manually does a
 pg_stop_backup. ISTM it drops us into a codepath that just doesn't do
 the decrement, but also doesn't throw an error?

 It throws an error later when it won't find backup_label. For better or
 worse, it's always been like that.

Isn't that going to leave us in a broken state, though? As in, a
mistaken pg_stop_backup() will decrement the counter, both breaking the
streaming base backup and also possibly triggering an assert when that
backup eventually wants to do its do_pg_stop_backup()?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Heikki Linnakangas

On 24.01.2011 22:31, Magnus Hagander wrote:

On Mon, Jan 24, 2011 at 21:14, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 24.01.2011 20:22, Magnus Hagander wrote:

I can't see an explicit check for the user ttempting to do
pg_stop_backup() when there is a nonexclusive backup running? Maybe
I'm reading it wrong? The case being when a user has started a backup
with pg_basebackup but then connects and manually does a
pg_stop_backup. ISTM it drops us in a codepath that just doesn't do the
decrement, but also doesn't throw an error?


It throws an error later when it doesn't find backup_label. For better or
worse, it's always been like that.


Isn't that going to leave us in a broken state though? As in a
mistaken pg_stop_backup() will decrement the counter both breaking the
streaming base backup, and also possibly throwing an assert when that
one eventually wants to do its do_pg_stop_backup()?


Umm, no. pg_stop_backup() won't decrement the counter unless there's an 
exclusive backup in operation according to the exclusiveBackup flag.


Hmm, perhaps the code would be more readable if instead of the 
forcePageWrites counter that counts exclusive and non-exclusive backups, 
and an exclusiveBackup boolean indicating if one of the in-progress 
backups is an exclusive one, we had a counter that only counts 
non-exclusive backups, plus a boolean indicating if an exclusive backup 
is in progress in addition to them.


Attached is a patch for that (against master branch, including only xlog.c).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 60,67 
  
  
  /* File path names (all relative to $PGDATA) */
- #define BACKUP_LABEL_FILE		"backup_label"
- #define BACKUP_LABEL_OLD		"backup_label.old"
  #define RECOVERY_COMMAND_FILE	"recovery.conf"
  #define RECOVERY_COMMAND_DONE	"recovery.done"
  
--- 60,65 
***
*** 339,344  typedef struct XLogCtlInsert
--- 337,351 
  	char	   *currpos;		/* current insertion point in cache */
  	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
  	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+ 
+ 	/*
+ 	 * exclusiveBackup is true if a backup started with pg_start_backup() is
+ 	 * in progress, and nonExclusiveBackups is a counter indicating the number
+ 	 * of streaming base backups currently in progress. forcePageWrites is
+ 	 * set to true when either of these is non-zero.
+ 	 */
+ 	bool		exclusiveBackup;
+ 	int			nonExclusiveBackups;
  } XLogCtlInsert;
  
  /*
***
*** 8352,8367  pg_start_backup(PG_FUNCTION_ARGS)
  
  	backupidstr = text_to_cstring(backupid);
  
! 	startpoint = do_pg_start_backup(backupidstr, fast);
  
  	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
  	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
  }
  
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast)
  {
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
--- 8359,8396 
  
  	backupidstr = text_to_cstring(backupid);
  
! 	startpoint = do_pg_start_backup(backupidstr, fast, NULL);
  
  	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
  			 startpoint.xlogid, startpoint.xrecoff);
  	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
  }
  
+ /*
+  * do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
+  * function. It creates the necessary starting checkpoint and constructs the
+  * backup label file.
+  * 
+  * There are two kinds of backups: exclusive and non-exclusive. An exclusive
+  * backup is started with pg_start_backup(), and there can be only one active
+  * at a time. The backup label file of an exclusive backup is written to
+  * $PGDATA/backup_label, and it is removed by pg_stop_backup().
+  *
+  * A non-exclusive backup is used for the streaming base backups (see
+  * src/backend/replication/basebackup.c). The difference from exclusive backups
+  * is that the backup label file is not written to disk. Instead, its would-be
+  * contents are returned in *labelfile, and the caller is responsible for
+  * including it in the backup archive as 'backup_label'. There can be many
+  * non-exclusive backups active at the same time, and they don't conflict
+  * with an exclusive backup either.
+  *
+  * Every successfully started non-exclusive backup must be stopped by calling
+  * do_pg_stop_backup() or do_pg_abort_backup().
+  */
  XLogRecPtr
! do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
  {
+ 	bool		exclusive = (labelfile == NULL);
  	XLogRecPtr	checkpointloc;
  	XLogRecPtr	startpoint;
  	pg_time_t	stamp_time;
***
*** 8371,8381  do_pg_start_backup(const char *backupidstr, bool fast)
  	uint32		_logSeg;
  	struct stat stat_buf;
  	FILE	   *fp;
  
! 	if (!superuser() && !is_authenticated_user_replication_role())
  		ereport(ERROR,
  

Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Fujii Masao
On Tue, Jan 25, 2011 at 5:14 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I'm not entirely sure the replication privilege removal is correct.
 Doing that, it's no longer possible to deploy a slave *without* using
 pg_basebackup, unless you are superuser. Do we really want to put that
 restriction back in?

 Hmm, I thought we do, I thought that was changed just to make pg_basebackup
 work without superuser privileges.

If we encourage users not to use the replication privilege to log in to
the database, putting that restriction back seems reasonable.

 Ok, I won't touch that. But then we'll need to decide what to do about
 Fujii's observation
 (http://archives.postgresql.org/pgsql-hackers/2011-01/msg01934.php):

Yes. If we allow the replication users to call pg_start/stop_backup,
we also allow them to connect to the database even during shutdown
in order to cancel the backup.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-24 Thread Fujii Masao
On Tue, Jan 25, 2011 at 6:02 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Hmm, perhaps the code would be more readable if instead of the
 forcePageWrites counter that counts exclusive and non-exclusive backups, and
 an exclusiveBackup boolean indicating if one of the in-progress backups is
 an exclusive one, we had a counter that only counts non-exclusive backups,
 plus a boolean indicating if an exclusive backup is in progress in addition
 to them.

 Attached is a patch for that (against master branch, including only xlog.c).

I read this patch and previous-posted one. Those look good.

Comments:

+ * do_pg_start_backup is the workhorse of the user-visible pg_stop_backup()
+ * function.

Typo: s/do_pg_start_backup/do_pg_stop_backup

It would be helpful to explain this behavior in pg_basebackup.sgml or elsewhere.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-14 Thread Magnus Hagander
On Thu, Jan 13, 2011 at 22:32, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 13.01.2011 22:57, Josh Berkus wrote:

 On 1/13/11 12:11 PM, Robert Haas wrote:

 That's going to depend on the situation.  If the database fits in
 memory, then it's just going to work.  If it fits on disk, it's less
 obvious whether it'll be good or bad, but an arbitrary limitation here
 doesn't serve us well.

 FWIW, if we had this feature right now in 9.0 we (PGX) would be using
 it.  We run into the case of DB in memory, multiple slaves fairly often
 these days.

 Anyway, here's an updated patch with all the known issues fixed.

It's kind of crude that do_pg_stop_backup() has to parse the backup
label with sscanf() when it's coming from a non-exclusive backup -
we could pass the location as a parameter instead, to make it
cleaner. There might be a point to it being the same in both
codepaths, but I'm not sure it's that important...

Doesn't this code put the backup label in *every* tarfile, and not
just the main one? From what I can tell the code in
SendBackupDirectory() relies on labelfile being NULL in tar files for
other tablespaces, but the caller never sets this.

Finally, I'd move the addition of the labelfile to its own function,
but that's just me ;)


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-14 Thread Simon Riggs
On Thu, 2011-01-13 at 23:32 +0200, Heikki Linnakangas wrote:
 On 13.01.2011 22:57, Josh Berkus wrote:
  On 1/13/11 12:11 PM, Robert Haas wrote:
  That's going to depend on the situation.  If the database fits in
  memory, then it's just going to work.  If it fits on disk, it's less
  obvious whether it'll be good or bad, but an arbitrary limitation here
  doesn't serve us well.
 
  FWIW, if we had this feature right now in 9.0 we (PGX) would be using
  it.  We run into the case of DB in memory, multiple slaves fairly often
  these days.
 
 Anyway, here's an updated patch with all the known issues fixed.

It's good we have this as an option and I like the way you've done this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Ross J. Reedstrom
On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:
 
  It makes it very convenient to set up standbys, without having to worry
  that you'll conflict e.g with a nightly backup. I don't imagine people
  will use streaming base backups for very large databases anyway.
 
 Also, imagine that you're provisioning a 10-node replication cluster on
 EC2.  This would make that worlds easier.

Hmm, perhaps. My concern is that a naive attempt to do that is going to
have 10 base-backups happening at the same time, completely slamming the
master, and none of them completing in a reasonable time. Is this
possible, or will simultaneity buy you hot caches so that backups
#2 - #10 all run faster?

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientist    phone: 713-348-6166
Connexions                http://cnx.org        fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE




Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Robert Haas
On Thu, Jan 13, 2011 at 2:19 PM, Ross J. Reedstrom reeds...@rice.edu wrote:
 On Tue, Jan 11, 2011 at 11:06:18AM -0800, Josh Berkus wrote:

  It makes it very convenient to set up standbys, without having to worry
  that you'll conflict e.g with a nightly backup. I don't imagine people
  will use streaming base backups for very large databases anyway.

 Also, imagine that you're provisioning a 10-node replication cluster on
 EC2.  This would make that worlds easier.

 Hmm, perhaps. My concern is that a naive attempt to do that is going to
 have 10 base-backups happening at the same time, completely slamming the
 master, and none of them completing in a reasonable time. Is this
 possible, or will simultaneity buy you hot caches so that backups
 #2 - #10 all run faster?

That's going to depend on the situation.  If the database fits in
memory, then it's just going to work.  If it fits on disk, it's less
obvious whether it'll be good or bad, but an arbitrary limitation here
doesn't serve us well.

P.S. Your reply-to header is busted.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Josh Berkus
On 1/13/11 12:11 PM, Robert Haas wrote:
 That's going to depend on the situation.  If the database fits in
 memory, then it's just going to work.  If it fits on disk, it's less
 obvious whether it'll be good or bad, but an arbitrary limitation here
 doesn't serve us well.

FWIW, if we had this feature right now in 9.0 we (PGX) would be using
it.  We run into the case of DB in memory, multiple slaves fairly often
these days.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-13 Thread Heikki Linnakangas

On 13.01.2011 22:57, Josh Berkus wrote:

On 1/13/11 12:11 PM, Robert Haas wrote:

That's going to depend on the situation.  If the database fits in
memory, then it's just going to work.  If it fits on disk, it's less
obvious whether it'll be good or bad, but an arbitrary limitation here
doesn't serve us well.


FWIW, if we had this feature right now in 9.0 we (PGX) would be using
it.  We run into the case of DB in memory, multiple slaves fairly often
these days.


Anyway, here's an updated patch with all the known issues fixed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..400e12e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -60,8 +60,6 @@
 
 
 /* File path names (all relative to $PGDATA) */
-#define BACKUP_LABEL_FILE		"backup_label"
-#define BACKUP_LABEL_OLD		"backup_label.old"
 #define RECOVERY_COMMAND_FILE	"recovery.conf"
 #define RECOVERY_COMMAND_DONE	"recovery.done"
 
@@ -338,7 +336,8 @@ typedef struct XLogCtlInsert
 	XLogPageHeader currpage;	/* points to header of block in cache */
 	char	   *currpos;		/* current insertion point in cache */
 	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
-	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+	int			forcePageWrites;	/* forcing full-page writes for PITR? */
+	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
 } XLogCtlInsert;
 
 /*
@@ -8313,16 +8312,38 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
-	startpoint = do_pg_start_backup(backupidstr, fast);
+	startpoint = do_pg_start_backup(backupidstr, fast, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
 			 startpoint.xlogid, startpoint.xrecoff);
 	PG_RETURN_TEXT_P(cstring_to_text(startxlogstr));
 }
 
+/*
+ * do_pg_start_backup is the workhorse of the user-visible pg_start_backup()
+ * function. It creates the necessary starting checkpoint and constructs the
+ * backup label file.
+ * 
+ * There are two kinds of backups: exclusive and non-exclusive. An exclusive
+ * backup is started with pg_start_backup(), and there can be only one active
+ * at a time. The backup label file of an exclusive backup is written to
+ * $PGDATA/backup_label, and it is removed by pg_stop_backup().
+ *
+ * A non-exclusive backup is used for the streaming base backups (see
+ * src/backend/replication/basebackup.c). The difference from exclusive backups
+ * is that the backup label file is not written to disk. Instead, its would-be
+ * contents are returned in *labelfile, and the caller is responsible for
+ * including it in the backup archive as 'backup_label'. There can be many
+ * non-exclusive backups active at the same time, and they don't conflict
+ * with exclusive backups either.
+ *
+ * Every successfully started non-exclusive backup must be stopped by calling
+ * do_pg_stop_backup() or do_pg_abort_backup().
+ */
 XLogRecPtr
-do_pg_start_backup(const char *backupidstr, bool fast)
+do_pg_start_backup(const char *backupidstr, bool fast, char **labelfile)
 {
+	bool		exclusive = (labelfile == NULL);
 	XLogRecPtr	checkpointloc;
 	XLogRecPtr	startpoint;
 	pg_time_t	stamp_time;
@@ -8332,6 +8353,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	uint32		_logSeg;
 	struct stat stat_buf;
 	FILE	   *fp;
+	StringInfoData labelfbuf;
 
 	if (!superuser() && !is_authenticated_user_replication_role())
 		ereport(ERROR,
@@ -8350,6 +8372,12 @@ do_pg_start_backup(const char *backupidstr, bool fast)
			  errmsg("WAL level not sufficient for making an online backup"),
 errhint("wal_level must be set to \"archive\" or \"hot_standby\" at server start.")));
 
+	if (strlen(backupidstr) > MAXPGPATH)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("backup label too long (max %d bytes)",
+						MAXPGPATH)));
+
 	/*
 	 * Mark backup active in shared memory.  We must do full-page WAL writes
 	 * during an on-line backup even if not doing so at other times, because
@@ -8368,15 +8396,19 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	 * ensure adequate interlocking against XLogInsert().
 	 */
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
-	if (XLogCtl->Insert.forcePageWrites)
+	if (exclusive)
 	{
-		LWLockRelease(WALInsertLock);
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("a backup is already in progress"),
-				 errhint("Run pg_stop_backup() and try again.")));
+		if (XLogCtl->Insert.exclusiveBackup)
+		{
+			LWLockRelease(WALInsertLock);
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("a backup is already in progress"),
+					 errhint("Run pg_stop_backup() and try again.")));
+		}
+		XLogCtl->Insert.exclusiveBackup = true;
 	}
-	XLogCtl->Insert.forcePageWrites = true;
+	XLogCtl->Insert.forcePageWrites++;
 	LWLockRelease(WALInsertLock);
 
 	/*
@@ -8393,7 +8425,7 @@ 

Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread marcin mank
On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Seems like either one of these is fairly problematic in that you have to
 have some monstrous kluge to get the backup_label file to appear with
 the right name in the tarfile.  How badly do we actually need this?
 I don't think the use-case for concurrent base backups is all that large
 in practice given the I/O hit it's going to involve.

 I think it can be done cleaner in the tar file injection - I've been
 chatting with Heikki offlist about that. Not sure, but I have a
 feeling it does.

 One point that I'm particularly interested to see how you'll kluge it
 is ensuring that the tarball contains only the desired temp data and not
 also the real $PGDATA/backup_label, should there be a normal base
 backup being done concurrently with the streamed one.

 The whole thing just seems too fragile and dangerous to be worth dealing
 with given that actual usage will be a corner case.  *I* sure wouldn't
 trust it to work when the chips were down.


Maybe if pg_start_backup() notices that there is another backup
running, it should block waiting for the other session to run
pg_stop_backup()? Or have a new function like pg_start_backup_wait()?

Considering that parallel base backups would be I/O-bound (or
network-bound), there is little need to actually run them in parallel.

Greetings
Marcin



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread David Fetter
On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
 On Tue, Jan 11, 2011 at 8:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Magnus Hagander mag...@hagander.net writes:
  On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
  Seems like either one of these is fairly problematic in that you have to
  have some monstrous kluge to get the backup_label file to appear with
  the right name in the tarfile.  How badly do we actually need this?
  I don't think the use-case for concurrent base backups is all that large
  in practice given the I/O hit it's going to involve.
 
  I think it can be done cleaner in the tar file injection - I've been
  chatting with Heikki offlist about that. Not sure, but I have a
  feeling it does.
 
  One point that I'm particularly interested to see how you'll kluge it
  is ensuring that the tarball contains only the desired temp data and not
  also the real $PGDATA/backup_label, should there be a normal base
  backup being done concurrently with the streamed one.
 
  The whole thing just seems too fragile and dangerous to be worth dealing
  with given that actual usage will be a corner case.  *I* sure wouldn't
  trust it to work when the chips were down.
 
 Maybe if pg_start_backup() notices that there is another backup
 running should block waiting for another session to run
 pg_stop_backup() ? Or have a new function like pg_start_backup_wait()
 ?
 
 Considering that parallel base backups would be I/O-bound (or
 network-bound), there is little need to actually run them in parallel

That's not actually true.  Backups at the moment are CPU-bound, and
running them in parallel is one way to make them closer to I/O-bound,
which is what they *should* be.

There are other proposals out there, and some work being done, to make
backups less dependent on CPU, among them:

- Making the on-disk representation smaller
- Making COPY more efficient

As far as I know, none of this work is public yet.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread Heikki Linnakangas

On 12.01.2011 17:15, David Fetter wrote:

On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:

Considering that parallel base backups would be I/O-bound (or
network-bound), there is little need to actually run them in parallel


That's not actually true.  Backups at the moment are CPU-bound, and
running them in parallel is one way to make them closer to I/O-bound,
which is what they *should* be.


That's a different kind of parallel. We're talking about taking 
multiple backups in parallel, each using one process, and you're talking 
about taking one backup using multiple parallel processes or threads.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread David Fetter
On Wed, Jan 12, 2011 at 05:17:38PM +0200, Heikki Linnakangas wrote:
 On 12.01.2011 17:15, David Fetter wrote:
 On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
 Considering that parallel base backups would be I/O-bound (or
 network-bound), there is little need to actually run them in parallel
 
 That's not actually true.  Backups at the moment are CPU-bound, and
 running them in parallel is one way to make them closer to I/O-bound,
 which is what they *should* be.
 
 That's a different kind of parallel. We're talking about taking
 multiple backups in parallel, each using one process, and you're
 talking about taking one backup using multiple parallel processes or
 threads.

Good point.  The idea that IO/network bandwidth is always saturated by
one backup just isn't true, though.  Take the case of multiple
independent NICs, e.g.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread Aidan Van Dyk
On Wed, Jan 12, 2011 at 10:15 AM, David Fetter da...@fetter.org wrote:

 Considering that parallel base backups would be I/O-bound (or
 network-bound), there is little need to actually run them in parallel

 That's not actually true.  Backups at the moment are CPU-bound, and
 running them in parallel is one way to make them closer to I/O-bound,
 which is what they *should* be.

Remember, we're talking about filesystem base backups here.  If your
CPU can't handle a stream from disk -> network, byte for byte (maybe
encrypting it), then you've spent *WAY* too much on your storage
sub-system, and way too little on CPU.

I can see trying to parallelize the base backup such that each
table-space could be run concurrently, but that's about it.

 There are other proposals out there, and some work being done, to make
 backups less dependent on CPU, among them:

 - Making the on-disk representation smaller
 - Making COPY more efficient

 As far as I know, none of this work is public yet.

pg_dump is another story.  But it's not related to base backups for
PIT Recovery/Replication.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 On 12.01.2011 17:15, David Fetter wrote:
 On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
 Considering that parallel base backups would be I/O-bound (or
 network-bound), there is little need to actually run them in parallel
 
 That's not actually true.  Backups at the moment are CPU-bound, and
 running them in parallel is one way to make them closer to I/O-bound,
 which is what they *should* be.

 That's a different kind of parallel. We're talking about taking 
 multiple backups in parallel, each using one process, and you're talking 
 about taking one backup using multiple parallel processes or threads.

Even more to the point, you're confusing pg_dump with a base backup.
The reason pg_dump eats a lot of CPU is primarily COPY's data conversion
and formatting requirements, none of which will happen in a base backup
(streaming or otherwise).

regards, tom lane



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-12 Thread David Fetter
On Wed, Jan 12, 2011 at 10:24:31AM -0500, Tom Lane wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  On 12.01.2011 17:15, David Fetter wrote:
  On Wed, Jan 12, 2011 at 10:26:05AM +0100, marcin mank wrote:
  Considering that parallel base backups would be I/O-bound (or
  network-bound), there is little need to actually run them in
  parallel
  
  That's not actually true.  Backups at the moment are CPU-bound,
  and running them in parallel is one way to make them closer to
  I/O-bound, which is what they *should* be.
 
  That's a different kind of parallel. We're talking about taking
  multiple backups in parallel, each using one process, and you're
  talking about taking one backup using multiple parallel processes
  or threads.
 
 Even more to the point, you're confusing pg_dump with a base backup.
 The reason pg_dump eats a lot of CPU is primarily COPY's data
 conversion and formatting requirements, none of which will happen in
 a base backup (streaming or otherwise).

Oops.  That'll teach me to post before coffee. :P

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



[HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas
Now that we have a basic over-the-wire base backup capability in 
walsender, it would be nice to allow taking multiple base backups at the 
same time. It might not seem very useful at first, but it makes it 
easier to set up standbys for small databases. At the moment, if you 
want to set up two standbys, you have to either take a single base 
backup and distribute it to both standbys, or somehow coordinate that 
they don't try to take the base backup at the same time. Also, you don't 
want initializing a standby to conflict with a nightly backup cron script.


So, this patch modifies the internal do_pg_start/stop_backup functions 
so that in addition to the traditional mode of operation, where a 
backup_label file is created in the data directory where it's backed up 
along with all other files, the backup label file can instead be returned 
to the caller, and the caller is responsible for including it in the backup. 
The code in replication/basebackup.c includes it in the tar file that's 
streamed to the client, as backup_label.


To make that safe, I've changed forcePageWrites into an integer. 
Whenever a backup is started, it's incremented, and when one ends, it's 
decremented. When forcePageWrites == 0, there's no backup in progress.


The user-visible pg_start_backup() function is not changed. You can only 
have one backup started that way in progress at a time. But you can run 
streaming base backups at the same time as a traditional pg_start_backup().


I implemented this in two ways, and can't decide which I like better:

1. The contents of the backup label file are returned to the caller of 
do_pg_start_backup() as a palloc'd string.


2. do_pg_start_backup() creates a temporary file that the backup label 
is written to (instead of backup_label).


Implementation 1 changes more code, as pg_start/stop_backup() need to be 
changed to write/read from memory instead of file, but the result isn't 
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Patches for both approaches attached. They're also available in my 
github repository at g...@github.com:hlinnaka/postgres.git.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 5b6a230..cc4d46e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -338,7 +338,8 @@ typedef struct XLogCtlInsert
 	XLogPageHeader currpage;	/* points to header of block in cache */
 	char	   *currpos;		/* current insertion point in cache */
 	XLogRecPtr	RedoRecPtr;		/* current redo point for insertions */
-	bool		forcePageWrites;	/* forcing full-page writes for PITR? */
+	int			forcePageWrites;	/* forcing full-page writes for PITR? */
+	bool		exclusiveBackup;	/* a backup was started with pg_start_backup() */
 } XLogCtlInsert;
 
 /*
@@ -8313,7 +8314,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 
 	backupidstr = text_to_cstring(backupid);
 
-	startpoint = do_pg_start_backup(backupidstr, fast);
+	startpoint = do_pg_start_backup(backupidstr, fast, true, NULL);
 
 	snprintf(startxlogstr, sizeof(startxlogstr), "%X/%X",
 			 startpoint.xlogid, startpoint.xrecoff);
@@ -8321,7 +8322,7 @@ pg_start_backup(PG_FUNCTION_ARGS)
 }
 
 XLogRecPtr
-do_pg_start_backup(const char *backupidstr, bool fast)
+do_pg_start_backup(const char *backupidstr, bool fast, bool exclusive, char **labelfile)
 {
 	XLogRecPtr	checkpointloc;
 	XLogRecPtr	startpoint;
@@ -8332,6 +8333,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	uint32		_logSeg;
 	struct stat stat_buf;
 	FILE	   *fp;
+	StringInfoData labelfbuf;
 
 	if (!superuser() && !is_authenticated_user_replication_role())
 		ereport(ERROR,
@@ -8368,15 +8370,19 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	 * ensure adequate interlocking against XLogInsert().
 	 */
 	LWLockAcquire(WALInsertLock, LW_EXCLUSIVE);
-	if (XLogCtl->Insert.forcePageWrites)
+	if (exclusive)
 	{
-		LWLockRelease(WALInsertLock);
-		ereport(ERROR,
-				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
-				 errmsg("a backup is already in progress"),
-				 errhint("Run pg_stop_backup() and try again.")));
+		if (XLogCtl->Insert.exclusiveBackup)
+		{
+			LWLockRelease(WALInsertLock);
+			ereport(ERROR,
+					(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+					 errmsg("a backup is already in progress"),
+					 errhint("Run pg_stop_backup() and try again.")));
+		}
+		XLogCtl->Insert.exclusiveBackup = true;
 	}
-	XLogCtl->Insert.forcePageWrites = true;
+	XLogCtl->Insert.forcePageWrites++;
 	LWLockRelease(WALInsertLock);
 
 	/*
@@ -8393,7 +8399,7 @@ do_pg_start_backup(const char *backupidstr, bool fast)
 	RequestXLogSwitch();
 
 	/* Ensure we release forcePageWrites if fail below */
-	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) 0);
+	PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive));
 	{
 		/*
 		 * Force a CHECKPOINT.	Aside from being necessary to prevent torn
@@ -8420,54 

Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I implemented this in two ways, and can't decide which I like better:

 1. The contents of the backup label file are returned to the caller of 
 do_pg_start_backup() as a palloc'd string.

 2. do_pg_start_backup() creates a temporary file that the backup label 
 is written to (instead of backup_label).

 Implementation 1 changes more code, as pg_start/stop_backup() need to be 
 changed to write/read from memory instead of file, but the result isn't 
 any more complicated. Nevertheless, I somehow feel more comfortable with 2.

Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.  How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 I implemented this in two ways, and can't decide which I like better:

 1. The contents of the backup label file are returned to the caller of
 do_pg_start_backup() as a palloc'd string.

 2. do_pg_start_backup() creates a temporary file that the backup label
 is written to (instead of backup_label).

 Implementation 1 changes more code, as pg_start/stop_backup() need to be
 changed to write/read from memory instead of file, but the result isn't
 any more complicated. Nevertheless, I somehow feel more comfortable with 2.

 Seems like either one of these is fairly problematic in that you have to
 have some monstrous kluge to get the backup_label file to appear with
 the right name in the tarfile.  How badly do we actually need this?
 I don't think the use-case for concurrent base backups is all that large
 in practice given the I/O hit it's going to involve.

I think it can be done cleaner in the tar file injection - I've been
chatting with Heikki offlist about that. Not sure, but I have a
feeling it can.

As for the use-case, it doesn't necessarily involve a huge I/O hit if
either the entire DB is in RAM (not all that uncommon - though then
the backup finishes quickly as well), or if the *client* is slow, thus
making the server backlogged on sending the base backup.

Having it possible to do it concurrently also makes for *much* easier
use of the feature. It might be just about overlapping with a few
seconds, for example - which would probably have no major effect, but
takes a lot more code on the other end to work around.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:51, Tom Lane wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

I implemented this in two ways, and can't decide which I like better:



1. The contents of the backup label file are returned to the caller of
do_pg_start_backup() as a palloc'd string.



2. do_pg_start_backup() creates a temporary file that the backup label
is written to (instead of backup_label).



Implementation 1 changes more code, as pg_start/stop_backup() need to be
changed to write/read from memory instead of file, but the result isn't
any more complicated. Nevertheless, I somehow feel more comfortable with 2.


Seems like either one of these is fairly problematic in that you have to
have some monstrous kluge to get the backup_label file to appear with
the right name in the tarfile.


Oh. I'm surprised you feel that way - that part didn't feel ugly or 
kludgey at all to me.



 How badly do we actually need this?
I don't think the use-case for concurrent base backups is all that large
in practice given the I/O hit it's going to involve.


It makes it very convenient to set up standbys, without having to worry 
that you'll conflict e.g with a nightly backup. I don't imagine people 
will use streaming base backups for very large databases anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Josh Berkus

 It makes it very convenient to set up standbys, without having to worry
 that you'll conflict e.g with a nightly backup. I don't imagine people
 will use streaming base backups for very large databases anyway.

Also, imagine that you're provisioning a 10-node replication cluster on
EC2.  This would make that worlds easier.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Jan 11, 2011 at 19:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Seems like either one of these is fairly problematic in that you have to
 have some monstrous kluge to get the backup_label file to appear with
 the right name in the tarfile.  How badly do we actually need this?
 I don't think the use-case for concurrent base backups is all that large
 in practice given the I/O hit it's going to involve.

 I think it can be done cleaner in the tar file injection - I've been
 chatting with Heikki offlist about that. Not sure, but I have a
 feeling it does.

One point that I'm particularly interested to see how you'll kluge it
is ensuring that the tarball contains only the desired temp data and not
also the real $PGDATA/backup_label, should there be a normal base
backup being done concurrently with the streamed one.

The whole thing just seems too fragile and dangerous to be worth dealing
with given that actual usage will be a corner case.  *I* sure wouldn't
trust it to work when the chips were down.

regards, tom lane



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 20:17, Heikki Linnakangas wrote:

Patches for both approaches attached. They're also available in my
github repository at g...@github.com:hlinnaka/postgres.git.


Just so people won't report the same issues again, a couple of bugs have 
already cropped up (thanks Magnus):


* a backup_label file in the data directory should now not be tarred up 
- we're injecting a different backup_label file there.


* pg_start_backup_callback needs to be updated to just decrement 
forcePageWrites, not reset it to 'false'


* pg_stop_backup() decrements forcePageWrites twice. oops.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

 Now that we have a basic over-the-wire base backup capability in walsender,
 it would be nice to allow taking multiple base backups at the same time.

I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 21:50, Dimitri Fontaine wrote:

Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:


Now that we have a basic over-the-wire base backup capability in walsender,
it would be nice to allow taking multiple base backups at the same time.


I would prefer to be able to take a base backup from a standby, or is
that already possible?  What about cascading the wal shipping?  Maybe
those ideas are much bigger projects, but if I don't ask, I don't get to
know :)


Yeah, those are bigger projects..

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:
 So, this patch modifies the internal do_pg_start/stop_backup functions 
 so that in addition to the traditional mode of operation, where a 
 backup_label file is created in the data directory and backed up 
 along with all other files, the backup label file can be returned to the 
 caller, and the caller is responsible for including it in the backup. 
 The code in replication/basebackup.c includes it in the tar file that's 
 streamed to the client, as backup_label.

Perhaps we can use this more intelligent form of base backup to
differentiate between:
 a. a primary that has crashed while a backup was in progress; and
 b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

 1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
 2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?

Regards,
Jeff Davis




Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 22:16, Jeff Davis wrote:

On Tue, 2011-01-11 at 20:17 +0200, Heikki Linnakangas wrote:

So, this patch modifies the internal do_pg_start/stop_backup functions
so that in addition to the traditional mode of operation, where a
backup_label file is created in the data directory and backed up
along with all other files, the backup label file can be returned to the
caller, and the caller is responsible for including it in the backup.
The code in replication/basebackup.c includes it in the tar file that's
streamed to the client, as backup_label.


Perhaps we can use this more intelligent form of base backup to
differentiate between:
  a. a primary that has crashed while a backup was in progress; and
  b. an online backup that is being restored.

Allowing the user to do an unrestricted file copy as a base backup
doesn't allow us to make that differentiation. That led to the two bugs
that we fixed in StartupXLOG(). And right now there are still two
problems remaining (albeit less severe):

  1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
  2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?


It won't change the situation for pg_start_backup(), but with the patch 
the base backups done via streaming won't have those issues, because 
backup_label is not created (with that name) in the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
1. If it's a primary recovering from a crash, and there is a
  backup_label file, and the WAL referenced in the backup_label exists,
  then it does a bunch of extra work during recovery; and
2. In the same situation, if the WAL referenced in the backup_label
  does not exist, then it PANICs with a HINT to tell you to remove the
  backup_label.
 
  Is this an opportunity to solve these problems and simplify the code?
 
 It won't change the situation for pg_start_backup(), but with the patch 
 the base backups done via streaming won't have those issues, because 
 backup_label is not created (with that name) in the master.

Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.

Regards,
Jeff Davis





Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Magnus Hagander
On Tue, Jan 11, 2011 at 22:51, Jeff Davis pg...@j-davis.com wrote:
 On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:
    1. If it's a primary recovering from a crash, and there is a
  backup_label file, and the WAL referenced in the backup_label exists,
  then it does a bunch of extra work during recovery; and
    2. In the same situation, if the WAL referenced in the backup_label
  does not exist, then it PANICs with a HINT to tell you to remove the
  backup_label.
 
  Is this an opportunity to solve these problems and simplify the code?

 It won't change the situation for pg_start_backup(), but with the patch
 the base backups done via streaming won't have those issues, because
 backup_label is not created (with that name) in the master.

 Do you think we should change the backup protocol for normal base
 backups to try to fix this? Or do you think that the simplicity of
 unrestricted file copy is worth these problems?

 We could probably make some fairly minor changes, like making a file on
 the primary and telling users to exclude it from any base backup. The
 danger, of course, is that they do copy it, and their backup is
 compromised.


I think keeping the flexibility is important. If it does add an extra
step I think that's ok once we have pg_basebackup, but it must be
reasonably *safe*. Corrupt backups from forgetting to exclude a file
seems not so.

But if the problem is you forgot to exclude it, can't you just remove
it at a later time?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Robert Haas
On Jan 11, 2011, at 2:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 The whole thing just seems too fragile and dangerous to be worth dealing
 with given that actual usage will be a corner case.  *I* sure wouldn't
 trust it to work when the chips were down.

I hope this assessment proves to be incorrect, because like Magnus and Heikki, 
I think this will be a major usability improvement. It takes us from "there's a 
way to do that" to "it just works".

...Robert


Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread David Fetter
On Tue, Jan 11, 2011 at 05:06:34PM -0500, Robert Haas wrote:
 On Jan 11, 2011, at 2:07 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  The whole thing just seems too fragile and dangerous to be worth dealing
  with given that actual usage will be a corner case.  *I* sure wouldn't
  trust it to work when the chips were down.
 
 I hope this assessment proves to be incorrect, because like Magnus and 
 Heikki, I think this will be a major usability improvement. It takes us from 
 "there's a way to do that" to "it just works".

(It just works)++ :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Jeff Davis
On Tue, 2011-01-11 at 23:07 +0100, Magnus Hagander wrote: 
 I think keeping the flexibility is important. If it does add an extra
 step I think that's ok once we have pg_basebackup, but it must be
 reasonably *safe*. Corrupt backups from forgetting to exclude a file
 seems not so.

Agreed.

 But if the problem is you forgot to exclude it, can't you just remove
 it at a later time?

If you think you are recovering the primary, and it's really the backup,
then you get corruption. It's too late to remove a file after that
(unless you have a backup of your backup ;) ).

If you think you are restoring a backup, and it's really a primary that
crashed, then you run into one of the two problems that I mentioned
(which are less severe than corruption, but very annoying).

Regards,
Jeff Davis




Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Cédric Villemain
2011/1/11 Heikki Linnakangas heikki.linnakan...@enterprisedb.com:
 On 11.01.2011 21:50, Dimitri Fontaine wrote:

 Heikki Linnakangasheikki.linnakan...@enterprisedb.com  writes:

 Now that we have a basic over-the-wire base backup capability in
 walsender,
 it would be nice to allow taking multiple base backups at the same time.

 I would prefer to be able to take a base backup from a standby, or is
 that already possible?  What about cascading the wal shipping?  Maybe
 those ideas are much bigger projects, but if I don't ask, I don't get to
 know :)

 Yeah, those are bigger projects..

Way more interesting, IMHO.
A feature to allow multiple backups at the same time looks useless to me.
If the DB is small, then it is not that hard to do that before or after a
possible nightly backup. If the DB is large, is there any need to comment?

The only case I find interesting is if you can use the bandwidth of
more than one ethernet device; otherwise I would use rsync between nodes
after the initial base backup, pretty sure.






-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Allowing multiple concurrent base backups

2011-01-11 Thread Heikki Linnakangas

On 11.01.2011 23:51, Jeff Davis wrote:

On Tue, 2011-01-11 at 22:56 +0200, Heikki Linnakangas wrote:

   1. If it's a primary recovering from a crash, and there is a
backup_label file, and the WAL referenced in the backup_label exists,
then it does a bunch of extra work during recovery; and
   2. In the same situation, if the WAL referenced in the backup_label
does not exist, then it PANICs with a HINT to tell you to remove the
backup_label.

Is this an opportunity to solve these problems and simplify the code?


It won't change the situation for pg_start_backup(), but with the patch
the base backups done via streaming won't have those issues, because
backup_label is not created (with that name) in the master.


Do you think we should change the backup protocol for normal base
backups to try to fix this? Or do you think that the simplicity of
unrestricted file copy is worth these problems?

We could probably make some fairly minor changes, like making a file on
the primary and telling users to exclude it from any base backup. The
danger, of course, is that they do copy it, and their backup is
compromised.


Yeah, I don't think it's a good idea to provide such a foot-gun.

Last time we discussed this there was the idea of creating a file in 
$PGDATA, and removing read-permissions from it, so that it couldn't be 
accidentally included in the backup. Even with that safeguard, it 
doesn't feel very safe - a backup running as root or some other special 
privileges might still be able to read it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
