Re: [HACKERS] Hot Standby 0.2.1

2009-10-09 Thread Bruce Momjian
Simon Riggs wrote:
 You have posted patches that I have said I don't agree with. My name is
 going to be on this when it goes in, so I don't think it makes any sense
 to force that commit to include changes I don't agree with. I cannot
 prevent you making changes afterwards, nor would I wish to. I'd like you
 to respond sensibly to comments on those. We should work together on a
 consensus basis, especially since I know you have not fully tested your
 changes (either). Your error rate might be lower than mine, but it is
 non-zero.

The commit message and release notes mention might have just Simon's
name, or multiple people.

The hot patch commit is going to have multiple people involved before it
is committed, so if Simon is worried that the patch will have ideas in
it he does not agree with, perhaps we can make sure the commit and
release note items include Heikki's name as well.  Normally if a
committer makes signficant changes to a patch, the committer's name is
also added to the commmit message, and I suggest we do the same thing
here with hot standby.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-09 Thread Robert Haas

On Oct 9, 2009, at 1:21 PM, Bruce Momjian br...@momjian.us wrote:


Simon Riggs wrote:
You have posted patches that I have said I don't agree with. My  
name is
going to be on this when it goes in, so I don't think it makes any  
sense

to force that commit to include changes I don't agree with. I cannot
prevent you making changes afterwards, nor would I wish to. I'd  
like you
to respond sensibly to comments on those. We should work together  
on a
consensus basis, especially since I know you have not fully tested  
your

changes (either). Your error rate might be lower than mine, but it is
non-zero.


The commit message and release notes mention might have just Simon's
name, or multiple people.

The hot patch commit is going to have multiple people involved  
before it

is committed, so if Simon is worried that the patch will have ideas in
it he does not agree with, perhaps we can make sure the commit and
release note items include Heikki's name as well.  Normally if a
committer makes signficant changes to a patch, the committer's name is
also added to the commmit message, and I suggest we do the same thing
here with hot standby.


I think this is a weakness of our current style of heavy-weight  
commits.  I don't have a great suggestion for fixing it, though.  Even  
if we move to git, a major feature like this has such a complex  
development history that I'm queasy about slurping it in unsquashed.  
But at least for simple features I think that there would be a value  
in separating the patch author's work from the committer's adjustments.


I realize (now) that this would complicate the release note generation  
process somewhat, based on our current process, and there might be  
other downsides as well. All the same, I think it has enough value to  
make it worth thinking about whether there's some way to make it work.


...Robert

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-09 Thread Andrew Dunstan



Robert Haas wrote:
But at least for simple features I think that there would be a value 
in separating the patch author's work from the committer's adjustments.





That is just going to make life harder for committers.

There are plenty of things with my name on them that are not exactly 
what I submitted. I think that's true of just about everybody. Mostly 
things changed hae improved, but not always. I don't think we should be 
too proprietary about patches. As far as I'm concerned, credit goes to 
the submitter and blame if any to the committer.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-09 Thread Joshua D. Drake
On Fri, 2009-10-09 at 14:05 -0400, Andrew Dunstan wrote:
 
 Robert Haas wrote:
  But at least for simple features I think that there would be a value 
  in separating the patch author's work from the committer's adjustments.
 
 
 
 That is just going to make life harder for committers.
 
 There are plenty of things with my name on them that are not exactly 
 what I submitted. I think that's true of just about everybody. Mostly 
 things changed hae improved, but not always. I don't think we should be 
 too proprietary about patches. As far as I'm concerned, credit goes to 
 the submitter and blame if any to the committer.

+1

 
 cheers
 
 andrew
 
-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - 
Salamander


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-09 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Robert Haas wrote:
  But at least for simple features I think that there would be a value 
  in separating the patch author's work from the committer's adjustments.
 
 
 
 That is just going to make life harder for committers.
 
 There are plenty of things with my name on them that are not exactly 
 what I submitted. I think that's true of just about everybody. Mostly 
 things changed hae improved, but not always. I don't think we should be 
 too proprietary about patches. As far as I'm concerned, credit goes to 
 the submitter and blame if any to the committer.

Agreed.  

Simon is right that if only his name is on the commit, there is an
assumption that the committer made no changes, or only cosmetic ones.
For hot standby, I think the committer is making significant changes
(that could lead to bugs) and hence the committer's name should be in
the commit.  Sometimes we say adjusted by committer, but in this case
I think Heikki is doing more than adjustmants --- nothing wrong with
that --- it should just be documented in the commit message.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-08 Thread Simon Riggs

On Wed, 2009-10-07 at 23:19 -0400, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  I've made public the version I'm working on. That's the version I'm
  ultimately going to commit. It would be a lot more helpful if you
  provided these patches over that version. Otherwise I have to refactor
  them over that codebase, possibly introducing new bugs.
 
 Actually, it makes most sense if you push your changes directly to my
 git repository. I just granted you write access to it. Please do these
 changes against the version currently there, and push any fixes there
 directly.

OK, that makes sense. Thank you.

I would still like you to make a clear statement that the contents of
that repository are BSD licenced open source contributions. I have a
contractual responsibility to ensure the code I write is licenced in
that way. I have already mentioned I'm not looking at it yet for that
reason, so I am unaware of any changes not posted to the list.

You have posted patches that I have said I don't agree with. My name is
going to be on this when it goes in, so I don't think it makes any sense
to force that commit to include changes I don't agree with. I cannot
prevent you making changes afterwards, nor would I wish to. I'd like you
to respond sensibly to comments on those. We should work together on a
consensus basis, especially since I know you have not fully tested your
changes (either). Your error rate might be lower than mine, but it is
non-zero.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-08 Thread Heikki Linnakangas
Simon Riggs wrote:
 I would still like you to make a clear statement that the contents of
 that repository are BSD licenced open source contributions.

Ok. All the content in the repository at
git://git.postgresql.org/git/users/heikki/postgres.git is released under
the same BSD license as PostgreSQL.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-08 Thread Simon Riggs

On Thu, 2009-10-08 at 07:33 -0400, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  I would still like you to make a clear statement that the contents of
  that repository are BSD licenced open source contributions.
 
 Ok. All the content in the repository at
 git://git.postgresql.org/git/users/heikki/postgres.git is released under
 the same BSD license as PostgreSQL.

Thank you. I presume that the copyright is novated also.

I'll get cracking on some changes.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-08 Thread Simon Riggs
On Thu, 2009-10-08 at 12:49 +0100, Simon Riggs wrote:

 I'll get cracking on some changes.

This will probably be next week now, just in case you're wondering when
I'll start adding patches.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-07 Thread Simon Riggs

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
 It looks like the standby tries to remove XID 4323 from the
 known-assigned hash table, but it's not there because it was removed
 and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
 guess we should just not throw an error in that case, but is there a
 way we could catch that narrow case and still keep the check in
 KnownAssignedXidsRemove()? It seems like the check might help catch
 other bugs, so I'd rather keep it if possible.

Fix attached.

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index f6f50be..031a6a2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2561,8 +2561,15 @@ KnownAssignedXidsRemove(TransactionId xid, bool report_error)
 
 	if (!found  report_error)
 	{
-		KnownAssignedXidsDisplay(LOG);
-		elog(ERROR, cannot remove KnownAssignedXid %u, xid);
+		/*
+		 * Check to see whether we have updated subtrans with this xid.
+		 * If we did, its OK that it is no longer present in KnownAssignedXids
+		 */
+		if (!TransactionIdIsValid(SubTransGetParent(xid)))
+		{
+			KnownAssignedXidsDisplay(LOG);
+			elog(ERROR, cannot remove KnownAssignedXid %u, xid);
+		}
 	}
 }
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-07 Thread Simon Riggs

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
 we need be careful to avoid putting any extra work into the normal
 recovery path. Otherwise bugs in hot standby related code can cause
 crash recovery to fail.

Re-checked code and found a couple of additional places that needed
tests 
if (InHotStandby)

-- 
 Simon Riggs   www.2ndQuadrant.com
diff --git a/src/backend/commands/tablespace.c b/src/backend/commands/tablespace.c
index bef0df1..b14d90a 100644
--- a/src/backend/commands/tablespace.c
+++ b/src/backend/commands/tablespace.c
@@ -1345,40 +1345,48 @@ tblspc_redo(XLogRecPtr lsn, XLogRecord *record)
 		{
 			VirtualTransactionId *temp_file_users;
 
-			/*
-			 * Standby users may be currently using this tablespace for
-			 * for their temporary files. We only care about current
-			 * users because temp_tablespace parameter will just ignore
-			 * tablespaces that no longer exist.
-			 *
-			 * Ask everybody to cancel their queries immediately so
-			 * we can ensure no temp files remain and we can remove the
-			 * tablespace. Nuke the entire site from orbit, it's the only
-			 * way to be sure.
-			 *
-			 * XXX: We could work out the pids of active backends
-			 * using this tablespace by examining the temp filenames in the
-			 * directory. We would then convert the pids into VirtualXIDs
-			 * before attempting to cancel them.
-			 *
-			 * We don't wait for commit because drop tablespace is
-			 * non-transactional.
-			 */
-			temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
-		InvalidOid,
-		false);
-			ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
-	drop tablespace,
-	CONFLICT_MODE_ERROR_IF_NOT_IDLE,
-	InvalidXLogRecPtr);
+			if (InHotStandby)
+			{
+/*
+ * Standby users may be currently using this tablespace for
+ * for their temporary files. We only care about current
+ * users because temp_tablespace parameter will just ignore
+ * tablespaces that no longer exist.
+ *
+ * Ask everybody to cancel their queries immediately so
+ * we can ensure no temp files remain and we can remove the
+ * tablespace. Nuke the entire site from orbit, it's the only
+ * way to be sure.
+ *
+ * XXX: We could work out the pids of active backends
+ * using this tablespace by examining the temp filenames in the
+ * directory. We would then convert the pids into VirtualXIDs
+ * before attempting to cancel them.
+ *
+ * We don't wait for commit because drop tablespace is
+ * non-transactional.
+ */
+temp_file_users = GetConflictingVirtualXIDs(InvalidTransactionId,
+			InvalidOid,
+			false);
+ResolveRecoveryConflictWithVirtualXIDs(temp_file_users,
+		drop tablespace,
+		CONFLICT_MODE_ERROR_IF_NOT_IDLE,
+		InvalidXLogRecPtr);
 
-			/*
-			 * If we did recovery processing then hopefully the
-			 * backends who wrote temp files should have cleaned up and
-			 * exited by now. So lets recheck before we throw an error.
-			 * If !process_conflicts then this will just fail again.
-			 */
-			if (!remove_tablespace_directories(xlrec-ts_id, true))
+/*
+ * If we did recovery processing then hopefully the
+ * backends who wrote temp files should have cleaned up and
+ * exited by now. So lets recheck before we throw an error.
+ * If !process_conflicts then this will just fail again.
+ */
+if (!remove_tablespace_directories(xlrec-ts_id, true))
+	ereport(ERROR,
+		(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+		 errmsg(tablespace %u is not empty,
+		xlrec-ts_id)));
+			}
+			else
 ereport(ERROR,
 	(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
 	 errmsg(tablespace %u is not empty,
diff --git a/src/backend/utils/cache/inval.c b/src/backend/utils/cache/inval.c
index a8cce99..a23c7a0 100644
--- a/src/backend/utils/cache/inval.c
+++ b/src/backend/utils/cache/inval.c
@@ -1820,13 +1820,15 @@ relation_redo(XLogRecPtr lsn, XLogRecord *record)
 	{
 		xl_rel_inval *xlrec = (xl_rel_inval *) XLogRecGetData(record);
 
-		relation_redo_inval(xlrec);
+		if (InHotStandby)
+			relation_redo_inval(xlrec);
 	}
 	else if (info == XLOG_RELATION_LOCK)
 	{
 		xl_rel_lock *xlrec = (xl_rel_lock *) XLogRecGetData(record);
 
-		relation_redo_locks(xlrec, 1);
+		if (InHotStandby)
+			relation_redo_locks(xlrec, 1);
 	}
 	else
 		elog(PANIC, relation_redo: unknown op code %u, info);

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-07 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
 It looks like the standby tries to remove XID 4323 from the
 known-assigned hash table, but it's not there because it was removed
 and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
 guess we should just not throw an error in that case, but is there a
 way we could catch that narrow case and still keep the check in
 KnownAssignedXidsRemove()? It seems like the check might help catch
 other bugs, so I'd rather keep it if possible.
 
 Fix attached.

I fixed that on Friday already, in a slightly different manner. Do you
see a problem with that approach?

I've made public the version I'm working on. That's the version I'm
ultimately going to commit. It would be a lot more helpful if you
provided these patches over that version. Otherwise I have to refactor
them over that codebase, possibly introducing new bugs.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-07 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
 I've made public the version I'm working on. That's the version I'm
 ultimately going to commit. It would be a lot more helpful if you
 provided these patches over that version. Otherwise I have to refactor
 them over that codebase, possibly introducing new bugs.

Actually, it makes most sense if you push your changes directly to my
git repository. I just granted you write access to it. Please do these
changes against the version currently there, and push any fixes there
directly.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-02 Thread Simon Riggs

On Sun, 2009-09-27 at 14:59 +0300, Heikki Linnakangas wrote:
 The locking in smgr_redo_commit and smgr_redo_abort doesn't look right.
 First of all, smgr_redo_abort is not holding XidGenLock and
 ProcArrayLock while modifying ShmemVariableCache-nextXid and
 ShmemVariableCache-latestCompletedXid, respectively, like
 smgr_redo_commit is. Attached patch fixes that.
 
 But I think there's a little race condition in there anyway, as they
 remove the finished xids from known-assigned-xids list by calling
 ExpireTreeKnownAssignedTransactionIds, and
 ShmemVariableCache-latestCompletedXid is updated only after that. We're
 not holding ProcArrayLock between those two steps, like we do during
 normal transaction commit. I'm not sure what kind of issues that can
 lead to, but it seems like it can lead to broken snapshots if someone
 calls GetSnapshotData() between those steps.

I confess I didn't know what you were talking about when you wrote this
and was expecting you to have a coffee and retract. I realise now you
meant xact_redo_commit() rather than smgr_ and it makes sense at last.

I've just committed about half of your patch exactly, but not all.

I've avoided making the changes to
ShmemVariableCache-latestCompletedXid directly and moved the update to
ExpireTreeKnownAssignedTransactionIds() which already had access to
max_xid while holding ProcArrayLock. Hopefully that resolves your
comment about race condition.

Also, I noticed that you removed the line 
TransactionIdAdvance(ShmemVariableCache-nextXid);
in xact_redo_abort(). That looks like an error to me, since this follows
neither the comment nor how it is coded in xact_redo_commit(). Let me
know if there was some other reason for that line removal and I'll make
the change again.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-10-01 Thread Simon Riggs

On Sun, 2009-09-27 at 15:35 +0300, Heikki Linnakangas wrote:
 TransactionIdIsInProgress() doesn't consult the known-assigned-xids
 structure. That's a problem: in the standby, TransactionIdIsInProgress()
 can return false for a transaction that is still running in the master.
 HeapTupleSatisfies* functions can incorrectly set HEAP_XMIN/XMAX_INVALID
 hint bits for xids that commit later on.

Agreed. Will fix.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-27 Thread Heikki Linnakangas
The locking in smgr_redo_commit and smgr_redo_abort doesn't look right.
First of all, smgr_redo_abort is not holding XidGenLock and
ProcArrayLock while modifying ShmemVariableCache-nextXid and
ShmemVariableCache-latestCompletedXid, respectively, like
smgr_redo_commit is. Attached patch fixes that.

But I think there's a little race condition in there anyway, as they
remove the finished xids from known-assigned-xids list by calling
ExpireTreeKnownAssignedTransactionIds, and
ShmemVariableCache-latestCompletedXid is updated only after that. We're
not holding ProcArrayLock between those two steps, like we do during
normal transaction commit. I'm not sure what kind of issues that can
lead to, but it seems like it can lead to broken snapshots if someone
calls GetSnapshotData() between those steps.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
From 9d6ef0a3eb901b1d2182b3f2efd5c75d52e88cba Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas hei...@enterprisedb.com
Date: Sun, 27 Sep 2009 14:51:43 +0300
Subject: [PATCH] Make locking in smgr_redo_abort reflect that in smgr_redo_commit.

---
 src/backend/access/transam/xact.c |   22 --
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index a696857..92a7c43 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -4502,10 +4502,7 @@ xact_redo_commit(xl_xact_commit *xlrec, TransactionId xid,
 		/*
 		 * Release locks, if any. We do this for both two phase and normal
 		 * one phase transactions. In effect we are ignoring the prepare
-		 * phase and just going straight to lock release. This explains
-		 * why the twophase_postcommit_standby_callbacks[] do not invoke
-		 * a special routine to handle locks - that is performed here
-		 * instead.
+		 * phase and just going straight to lock release.
 		 */
 		RelationReleaseRecoveryLockTree(xid, xlrec-nsubxacts, sub_xids);
 	}
@@ -4593,12 +4590,25 @@ xact_redo_abort(xl_xact_abort *xlrec, TransactionId xid)
 	}
 
 	/* Make sure nextXid is beyond any XID mentioned in the record */
+	/* We don't expect anyone else to modify nextXid, hence we
+	 * don't need to hold a lock while checking this. We still acquire
+	 * the lock to modify it, though.
+	 */
 	if (TransactionIdFollowsOrEquals(max_xid,
 	 ShmemVariableCache-nextXid))
 	{
+		LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
 		ShmemVariableCache-nextXid = max_xid;
-		ShmemVariableCache-latestCompletedXid = ShmemVariableCache-nextXid;
-		TransactionIdAdvance(ShmemVariableCache-nextXid);
+		LWLockRelease(XidGenLock);
+	}
+
+	/* Same here, don't use lock to test, but need one to modify */
+	if (TransactionIdFollowsOrEquals(max_xid,
+	 ShmemVariableCache-latestCompletedXid))
+	{
+		LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
+		ShmemVariableCache-latestCompletedXid = max_xid;
+		LWLockRelease(ProcArrayLock);
 	}
 
 	/* Make sure files supposed to be dropped are dropped */
-- 
1.6.3.3


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-27 Thread Heikki Linnakangas
TransactionIdIsInProgress() doesn't consult the known-assigned-xids
structure. That's a problem: in the standby, TransactionIdIsInProgress()
can return false for a transaction that is still running in the master.
HeapTupleSatisfies* functions can incorrectly set HEAP_XMIN/XMAX_INVALID
hint bits for xids that commit later on.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Heikki Linnakangas
Looking at the startup sequence now.

I see that you modified ExtendSUBTRANS so that it doesn't wipe out
previously set values if it's called with out-of-order xids. I guess
that works, although I think it can leave pages unzeroed if it's called
with a large enough gap between xids, so that the previous one was on
e.g page 10 and the next one on page 12. Page 11 would not be zeroed,
AFAICS. Not sure if such big gaps in the xid sequence are possible, but
seems so if you have a very large max_connections setting and a lot of
subtransactions.

Nevertheless, I'd feel better if we kept ExtendSUBTRANS unmodified, and
instead arranged things so that ExtendSUBTRANS is always called in
xid-order. We can call it from RecordKnownAssignedTransactionIds, which
handles the out-of-order problem for other purposes already.

I think we have similar problem with clog. ExtendCLOG is currently not
called during recovery, so isn't it possible for the problem with
subtransaction commits that's described in the comments in StartupCLOG
to arise during hot standby? Again, I think we should call ExtendCLOG()
in RecordKnownAssignedTransactionIds. RecordKnownAssignedTransactionIds
is the hot standby version of GetNewTransactionId(), so to speak.

If we go with that, I think we'll need to call StartupSUBTRANS/CLOG
earlier in the startup sequence too, when we establish the startup
snapshot and let backends in.

Thoughts?

Other stuff:

- I removed the new DBConsistentUpToLSN() function and moved its
functionality into XLogNeedsFlush(). Since XLogFlush() updates
minRecoveryPoint instead of flushing WAL during recovery, it makes sense
for XLogNeedsFlush() to correspondingly check if minRecoveryPoint needs
to be updated when in recovery. That's a better definition for the call
in bufmgr.c too.

- I went ahead with the changes with RecoveryInfoLock and tracking the
number of held AccessExclusive locks in lmgr.c instead of proc array.

Can we find a better name for loggable locks? It means locks that need
to be WAL logged when acquired, for hot standby, and loggable lock
doesn't sound right for that. Loggable implies that it can be logged,
but it really means that it *must* be logged.

Keep an eye on my git repository for latest changes.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Thu, 2009-09-24 at 13:33 +0300, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  The problem becomes a lot easier if we accept that it's OK to have a
  lock included in the running-xacts snapshot and also appear in a
  XLOG_RELATION_LOCK record later. The standby should handle that
  gracefully already. If we just remove RecoveryInfoLock, that can happen,
  but it still won't be possible for a lock to be missed out which is what
  we really care about.
 
 I see the problem with that now. Without the lock, it's possible that
 the XLOG_RELATION_LOCK WAL record is written before the
 XLOG_RUNNING_XACTS record. If the lock is not included in the snapshot,
 we want the lock WAL record to be after the snapshot record.
 
 So i guess we'll need the RecoveryInfoLock. But we don't need to hold it
  across the wait. I think it's enough to acquire it just before writing
 the WAL record in LockAcquire. That will ensure that the WAL record
 isn't written until the snapshot is completely finished.

I will think some more on that. I remember thinking there was a reason
why that wasn't enough, possibly to do with no-wait locks which I
remember caused me to change that code.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:

 Rather than keep the numHeldLocks counters per-proc in proc array, I
 think it would be simpler to have a single (or one per lock partition)
 counter in shared memory in lock.c. It's just an optimization to make it
 faster to find out that there is no loggable AccessExclusiveLocks in the
 system, so it really rather belongs into the lock manager.

What lock would protect that value? The whole purpose is to avoid taking
the LockMgrLocks and to give something that is accessible by the locks
already held by GetRunningTransactionData().

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  Simon Riggs wrote:
  On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
  I note that we don't emit RunningXacts after a shutdown checkpoint. So
  if recovery starts at a shutdown checkpoint, we don't let read-only
  backends in until the first online checkpoint. Could we treat a shutdown
  checkpoint as a snapshot with no transactions running? Or do prepared
  transactions screw that up?
  We could, but I see no requirement for starting HS from a backup taken
  on a shutdown database. It's just another special case to test and since
  we already have significant number of important test cases I'd say add
  this later.
  
  There's also a related issue that if a backend holding
  AccessExclusiveLock crashes without writing an abort WAL record, the
  lock is never released in the standby. We handle the expiration of xids
  at replay of running-xacts records, but AFAICS we don't do that for locks.
 
 Ah, scratch that, I now see that we do call
 XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
 clears all recovery locks. But doesn't that prematurely clear all locks
 belonging to prepared transactions as well?

Much better to read your second post(s). :-)

Yes, you have found a(nother) issue. This was the first one that gave me
pause to think of the answer. The locks currently aren't tracked as to
whether they are 2PC or not, so we would need to store that info also so
that we can selectively release locks later.

Question: is it possible to do a fast shutdown when we have a prepared
transaction? Would it be better to take a different approach there for
prepared transactions? It seems strange to write a shutdown checkpoint
when the system isn't yet clean.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
 
 Rather than keep the numHeldLocks counters per-proc in proc array, I
 think it would be simpler to have a single (or one per lock partition)
 counter in shared memory in lock.c. It's just an optimization to make it
 faster to find out that there is no loggable AccessExclusiveLocks in the
 system, so it really rather belongs into the lock manager.
 
 What lock would protect that value? The whole purpose is to avoid taking
 the LockMgrLocks and to give something that is accessible by the locks
 already held by GetRunningTransactionData().

The lock partition lock (so we really need one counter per partition, a
single counter would need additional locking). We're already holding
that in LockAcquire/LockRelease when we need to increment/decrement the
counter.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Fri, 2009-09-25 at 11:49 +0300, Heikki Linnakangas wrote:
 Looking at the startup sequence now.
 
 I see that you modified ExtendSUBTRANS so that it doesn't wipe out
 previously set values if it's called with out-of-order xids. I guess
 that works, although I think it can leave pages unzeroed if it's called
 with a large enough gap between xids, so that the previous one was on
 e.g page 10 and the next one on page 12. Page 11 would not be zeroed,
 AFAICS. Not sure if such big gaps in the xid sequence are possible, but
 seems so if you have a very large max_connections setting and a lot of
 subtransactions.

Yeh, it happens and I've seen it - which is why the code is there.

 Nevertheless, I'd feel better if we kept ExtendSUBTRANS unmodified, and
 instead arranged things so that ExtendSUBTRANS is always called in
 xid-order. We can call it from RecordKnownAssignedTransactionIds, which
 handles the out-of-order problem for other purposes already.
 
 I think we have similar problem with clog. ExtendCLOG is currently not
 called during recovery, so isn't it possible for the problem with
 subtransaction commits that's described in the comments in StartupCLOG
 to arise during hot standby? Again, I think we should call ExtendCLOG()
 in RecordKnownAssignedTransactionIds. RecordKnownAssignedTransactionIds
 is the hot standby version of GetNewTransactionId(), so to speak.

OK. We record xids in sequence, so it is now a much more natural place
to do this. Zeroing them makes them dirty, unfortunately, but that is
another issue.

RecordKnownAssignedTransactionIds() only works once the snapshot has
been initialised. That could leave a gap, so we will need to run it
continuously if InHotStandby. Which may have knock-on changes also.

 If we go with that, I think we'll need to call StartupSUBTRANS/CLOG
 earlier in the startup sequence too, when we establish the startup
 snapshot and let backends in.

Yes, I think an earlier patch had that, but it turns out that there
really isn't anything for those functions to do. Really we could rename
those functions EndOfRecoverySUBTRANS and EndOfRecoveryCLOG to
illustrate their role better.

 - I removed the new DBConsistentUpToLSN() function and moved its
 functionality into XLogNeedsFlush(). Since XLogFlush() updates
 minRecoveryPoint instead of flushing WAL during recovery, it makes sense
 for XLogNeedsFlush() to correspondingly check if minRecoveryPoint needs
 to be updated when in recovery. That's a better definition for the call
 in bufmgr.c too.
 
 - I went ahead with the changes with RecoveryInfoLock and tracking the
 number of held AccessExclusive locks in lmgr.c instead of proc array.
 
 Can we find a better name for loggable locks? It means locks that need
 to be WAL logged when acquired, for hot standby, and loggable lock
 doesn't sound right for that. Loggable implies that it can be logged,
 but it really means that it *must* be logged.

The distinction of loggable is somewhat false since we rely on the
fact that only one person is holding it. So we may as well just call em
what they are: AccessExclusiveLocks.

 Keep an eye on my git repository for latest changes.

No, I'm not doing that. If you want to submit changes, please do so to
the list or just mention what needs work and I will do it. Having
multiple versions of a patch isn't helpful, as I have already said. I
have already been burned multiple times by accepting trial code and you
yourself have re-written particular parts multiple times. I am very,
very grateful for your reviews and detailed suggestions, so this comment
is only about coherency and not tripping each other up. (If you want to
editorialize, it needs to be just prior to commit, but making changes
to a patch just prior to commit has historically shown to introduce bugs
where there weren't any before).

There's enough changes already to demand a full re-test, so everything
discovered so far plus testing is about 2 weeks work. I will commit
things onto git as agreed as I complete coding but that won't imply full
testing.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Fri, 2009-09-25 at 13:23 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
  XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
  clears all recovery locks. But doesn't that prematurely clear all locks
  belonging to prepared transactions as well?
  
  Much better to read your second post(s). :-)
  
  Yes, you have found a(nother) issue. This was the first one that gave me
  pause to think of the answer. The locks currently aren't tracked as to
  whether they are 2PC or not, so we would need to store that info also so
  that we can selectively release locks later.
  
  Question: is it possible to do a fast shutdown when we have a prepared
  transaction?
 
 Yes.
 
  Would it be better to take a different approach there for
  prepared transactions? It seems strange to write a shutdown checkpoint
  when the system isn't yet clean.
 
 Hmm, I guess you could define prepared transactions as active backends
 from the shutdown point of view, and wait for them to finish. I can see
 one problem, though: Once you issue shutdown, fast or smart, we no
 longer accept new connections. So you can't connect to issue the
 ROLLBACK/COMMIT PREPARED anymore. Anyway, it would be a change from the
 current behavior, so it would be better to cope with prepared
 transactions in the standby.

Definitely need to cope with them for Hot Standby. My point was general
one to say that behaviour is very non-useful for users with prepared
transactions. It just causes manual effort by a DBA each time the system
is shutdown.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Heikki Linnakangas
Simon Riggs wrote:
 Definitely need to cope with them for Hot Standby. My point was general
 one to say that behaviour is very non-useful for users with prepared
 transactions. It just causes manual effort by a DBA each time the system
 is shutdown.

The transaction manager is supposed to reconnect and finish any in-doubt
transactions, eventually.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Wed, 2009-09-23 at 17:45 +0300, Heikki Linnakangas wrote:
 XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
 clears all recovery locks. But doesn't that prematurely clear all locks
 belonging to prepared transactions as well?
 
 Much better to read your second post(s). :-)
 
 Yes, you have found a(nother) issue. This was the first one that gave me
 pause to think of the answer. The locks currently aren't tracked as to
 whether they are 2PC or not, so we would need to store that info also so
 that we can selectively release locks later.
 
 Question: is it possible to do a fast shutdown when we have a prepared
 transaction?

Yes.

 Would it be better to take a different approach there for
 prepared transactions? It seems strange to write a shutdown checkpoint
 when the system isn't yet clean.

Hmm, I guess you could define prepared transactions as active backends
from the shutdown point of view, and wait for them to finish. I can see
one problem, though: Once you issue shutdown, fast or smart, we no
longer accept new connections. So you can't connect to issue the
ROLLBACK/COMMIT PREPARED anymore. Anyway, it would be a change from the
current behavior, so it would be better to cope with prepared
transactions in the standby.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Fri, 2009-09-25 at 13:14 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Wed, 2009-09-23 at 19:07 +0300, Heikki Linnakangas wrote:
  
  Rather than keep the numHeldLocks counters per-proc in proc array, I
  think it would be simpler to have a single (or one per lock partition)
  counter in shared memory in lock.c. It's just an optimization to make it
  faster to find out that there is no loggable AccessExclusiveLocks in the
  system, so it really rather belongs into the lock manager.
  
  What lock would protect that value? The whole purpose is to avoid taking
  the LockMgrLocks and to give something that is accessible by the locks
  already held by GetRunningTransactionData().
 
 The lock partition lock (so we really need one counter per partition, a
 single counter would need additional locking). We're already holding
 that in LockAcquire/LockRelease when we need to increment/decrement the
 counter.

Again: The whole purpose is to avoid taking those locks. Why would we
put something behind a lock we are trying to avoid taking?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Heikki Linnakangas
In XidInMVCCSnapshot:
   if (snapshot-takenDuringRecovery)
   {
   /*
* If the snapshot contains full subxact data, the fastest way 
 to check
* things is just to compare the given XID against both subxact 
 XIDs and
* top-level XIDs.  If the snapshot overflowed, we have to 
 use pg_subtrans
* to convert a subxact XID to its parent XID, but then we need 
 only look
* at top-level XIDs not subxacts.
*/
...
   }
   else
   {
   int32   j;
 
   /*
* 
* In recovery we store all xids in the subxact array because 
 this
* is by far the bigger array and we mostly don't know which 
 xids
* are top-level and which are subxacts. The xip array is empty.
*
* We start by searching subtrans, if we overflowed.
*/
...
   }

Hang on, isn't this 180 degrees backwards?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Fri, 2009-09-25 at 15:50 +0300, Heikki Linnakangas wrote:

 Hang on, isn't this 180 degrees backwards?

A witty riposte escapes me; yes, the if test is !correct.

What I find amazing is that it passed the test where I put it doing make
installcheck in an infinite loop for a long time. I guess that means the
regression tests hardly touch the concurrency code at all, which now I
think about it makes sense but I still find that very worrying. :-(

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Alvaro Herrera
Simon Riggs wrote:

 What I find amazing is that it passed the test where I put it doing make
 installcheck in an infinite loop for a long time. I guess that means the
 regression tests hardly touch the concurrency code at all, which now I
 think about it makes sense but I still find that very worrying. :-(

Try installcheck-parallel, which should be a bit better.  It's probably
not yet good enough though because it always runs the same tests
concurrently.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Simon Riggs

On Fri, 2009-09-25 at 16:30 -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
 
  What I find amazing is that it passed the test where I put it doing make
  installcheck in an infinite loop for a long time. I guess that means the
  regression tests hardly touch the concurrency code at all, which now I
  think about it makes sense but I still find that very worrying. :-(
 
 Try installcheck-parallel, which should be a bit better.  It's probably
 not yet good enough though because it always runs the same tests
 concurrently.

Good tip, thanks.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Andrew Dunstan



Alvaro Herrera wrote:

Simon Riggs wrote:

  

What I find amazing is that it passed the test where I put it doing make
installcheck in an infinite loop for a long time. I guess that means the
regression tests hardly touch the concurrency code at all, which now I
think about it makes sense but I still find that very worrying. :-(



Try installcheck-parallel, which should be a bit better.  It's probably
not yet good enough though because it always runs the same tests
concurrently.

  


It is also quite easy to set up your own schedule.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-25 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 Alvaro Herrera wrote:

 Try installcheck-parallel, which should be a bit better.  It's probably
 not yet good enough though because it always runs the same tests
 concurrently.
 
 It is also quite easy to set up your own schedule.

... except you have to be careful with hidden test dependencies :-(

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-24 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
 The problem becomes a lot easier if we accept that it's OK to have a
 lock included in the running-xacts snapshot and also appear in a
 XLOG_RELATION_LOCK record later. The standby should handle that
 gracefully already. If we just remove RecoveryInfoLock, that can happen,
 but it still won't be possible for a lock to be missed out which is what
 we really care about.

I see the problem with that now. Without the lock, it's possible that
the XLOG_RELATION_LOCK WAL record is written before the
XLOG_RUNNING_XACTS record. If the lock is not included in the snapshot,
we want the lock WAL record to be after the snapshot record.

So i guess we'll need the RecoveryInfoLock. But we don't need to hold it
 across the wait. I think it's enough to acquire it just before writing
the WAL record in LockAcquire. That will ensure that the WAL record
isn't written until the snapshot is completely finished.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
Looking at the way cache invalidations are handled in two-phase
transactions, it would be simpler if we store the shared cache
invalidation messages in the twophase state file header, like we store
deleted relations and subxids. This allows them to be copied to the
COMMIT_PREPARED WAL record, so that we don't need treat twophase commits
differently than regular ones in xact_redo_commit. As the patch stands,
the new
xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
mechanism is duplicated functionality with
AtPrepare_Inval/-PersistInvalidationMessage - both materialize the
pending shared invalidation messages so that they can be written to
disk. I did that in my git branch.

I wonder if we might have alignment issues with the
SharedInvalidationMessages being stored in WAL records, following the
subxids. All the data stored in that record have 4-byte alignment at the
moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
would have trouble. Probably not worth changing code, it's highly
unlikely that SharedInvalidationMessage will ever need 8-byte alignment,
but perhaps a comment somewhere would be in order.

I note that we don't emit RunningXacts after a shutdown checkpoint. So
if recovery starts at a shutdown checkpoint, we don't let read-only
backends in until the first online checkpoint. Could we treat a shutdown
checkpoint as a snapshot with no transactions running? Or do prepared
transactions screw that up?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
The logic in the lock manager to track the number of held
AccessExclusiveLocks (with ProcArrayIncrementNumHeldLocks and
ProcArrayDecrementNumHeldLocks) seems to be broken. I added an Assertion
into ProcArrayDecrementNumHeldLocks:

--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -1401,6 +1401,7 @@ ProcArrayIncrementNumHeldLocks(PGPROC *proc)
 void
 ProcArrayDecrementNumHeldLocks(PGPROC *proc)
 {
+   Assert(proc-numHeldLocks  0);
proc-numHeldLocks--;
 }

This tripped the assertion:

postgres=# CREATE TABLE foo (id int4 primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
foo_pkey for table foo
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Making matters worse, the primary server refuses to startup up after
that, tripping the assertion again in crash recovery:

$ bin/postmaster -D data
LOG:  database system was interrupted while in recovery at 2009-09-23
11:56:15 EEST
HINT:  This probably means that some data is corrupted and you will have
to use the last backup for recovery.
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  redo starts at 0/3270
LOG:  REDO @ 0/3270; LSN 0/32AC: prev 0/3220; xid 0; len 32:
Heap2 - clean: rel 1663/11562/1249; blk 32 remxid 4352
LOG:  consistent recovery state reached
LOG:  REDO @ 0/32AC; LSN 0/32CC: prev 0/3270; xid 0; len 4:
XLOG - nextOid: 24600
LOG:  REDO @ 0/32CC; LSN 0/32F4: prev 0/32AC; xid 0; len 12:
Storage - file create: base/11562/16408
LOG:  REDO @ 0/32F4; LSN 0/3200011C: prev 0/32CC; xid 4364; len
12: Relation - exclusive relation lock: xid 4364 db 11562 rel 16408
LOG:  REDO @ 0/3200011C; LSN 0/320001D8: prev 0/32F4; xid 4364; len
159: Heap - insert: rel 1663/11562/1259; tid 5/4
...
LOG:  REDO @ 0/32004754; LSN 0/32004878: prev 0/320046A8; xid 4364; len
264: Transaction - commit: 2009-09-23 11:55:51.888398+03; 15 inval
msgs:catcache id38 catcache id37 catcache id38 catcache id37 catcache
id38 catcache id37 catcache id7 catcache id6 catcache id26 smgr relcache
smgr relcache smgr relcache
TRAP: FailedAssertion(!(proc-numHeldLocks  0), File: procarray.c,
Line: 1404)
LOG:  startup process (PID 27430) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

I'm sure that's just a simple bug somewhere, but it highlights that we
need be careful to avoid putting any extra work into the normal recovery
path. Otherwise bugs in hot standby related code can cause crash
recovery to fail.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Simon Riggs

On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:

 Looking at the way cache invalidations are handled in two-phase
 transactions, it would be simpler if we store the shared cache
 invalidation messages in the twophase state file header, like we store
 deleted relations and subxids. This allows them to be copied to the
 COMMIT_PREPARED WAL record, so that we don't need treat twophase commits
 differently than regular ones in xact_redo_commit. As the patch stands,
 the new
 xactGetCommittedInvalidationMessages/MakeSharedInvalidMessagesArray
 mechanism is duplicated functionality with
 AtPrepare_Inval/-PersistInvalidationMessage - both materialize the
 pending shared invalidation messages so that they can be written to
 disk. I did that in my git branch.

We could, but the prepared transaction path already contains special
case code anyway, so we aren't reducing number of test cases required.
This looks like a possible area for refactoring, but I don't see the
need for pre-factoring. i.e. don't rework before commit, rework after.

 I wonder if we might have alignment issues with the
 SharedInvalidationMessages being stored in WAL records, following the
 subxids. All the data stored in that record have 4-byte alignment at the
 moment, but if SharedInvalidationMessage ever needs 8-byte alignment, we
 would have trouble. Probably not worth changing code, it's highly
 unlikely that SharedInvalidationMessage will ever need 8-byte alignment,
 but perhaps a comment somewhere would be in order.

It's a possible source of bugs, but there are no issues there AFAICS.
The technique of multiple arrays on a WAL record wasn't invented by this
patch.

 I note that we don't emit RunningXacts after a shutdown checkpoint. So
 if recovery starts at a shutdown checkpoint, we don't let read-only
 backends in until the first online checkpoint. Could we treat a shutdown
 checkpoint as a snapshot with no transactions running? Or do prepared
 transactions screw that up?

We could, but I see no requirement for starting HS from a backup taken
on a shutdown database. It's just another special case to test and since
we already have significant number of important test cases I'd say add
this later.


That seems to have reflected all of your points on this post, though
thanks for the comments. I'm keen to reduce complexity in areas that
have caused bugs, but for stuff that is working I am tempted to leave
alone on such a big patch. Anything we can do to avoid re-testing
sections of code and/or reduce the number of tests required is going to
increase stability. 

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Simon Riggs

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
 it highlights that we
 need be careful to avoid putting any extra work into the normal
 recovery
 path. Otherwise bugs in hot standby related code can cause crash
 recovery to fail.

Excellent point. I will put in additional protective code there.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Simon Riggs

On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
 seems to be broken

Agreed. 

Patch withdrawn for correction and rework. Nothing serious, but not much
point doing further testing to all current issues resolved.

Tracking of issues raised and later solved via Wiki.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
 I note that we don't emit RunningXacts after a shutdown checkpoint. So
 if recovery starts at a shutdown checkpoint, we don't let read-only
 backends in until the first online checkpoint. Could we treat a shutdown
 checkpoint as a snapshot with no transactions running? Or do prepared
 transactions screw that up?
 
 We could, but I see no requirement for starting HS from a backup taken
 on a shutdown database. It's just another special case to test and since
 we already have significant number of important test cases I'd say add
 this later.

There's also a related issue that if a backend holding
AccessExclusiveLock crashes without writing an abort WAL record, the
lock is never released in the standby. We handle the expiration of xids
at replay of running-xacts records, but AFAICS we don't do that for locks.

It shouldn't be much code to clear those states at shutdown checkpoint,
just a few lines to call the right functions methinks.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
Heikki Linnakangas wrote:
 Simon Riggs wrote:
 On Wed, 2009-09-23 at 11:13 +0300, Heikki Linnakangas wrote:
 I note that we don't emit RunningXacts after a shutdown checkpoint. So
 if recovery starts at a shutdown checkpoint, we don't let read-only
 backends in until the first online checkpoint. Could we treat a shutdown
 checkpoint as a snapshot with no transactions running? Or do prepared
 transactions screw that up?
 We could, but I see no requirement for starting HS from a backup taken
 on a shutdown database. It's just another special case to test and since
 we already have significant number of important test cases I'd say add
 this later.
 
 There's also a related issue that if a backend holding
 AccessExclusiveLock crashes without writing an abort WAL record, the
 lock is never released in the standby. We handle the expiration of xids
 at replay of running-xacts records, but AFAICS we don't do that for locks.

Ah, scratch that, I now see that we do call
XactClearRecoveryTransactions() when we see a shutdown checkpoint, which
clears all recovery locks. But doesn't that prematurely clear all locks
belonging to prepared transactions as well?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Jeff Janes
On Tue, Sep 22, 2009 at 10:02 AM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Heikki Linnakangas escribió:
 Simon Riggs wrote:
  On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
  jjanes=# begin;
  BEGIN
  jjanes=# lock table pgbench_history in access exclusive mode;
  LOCK TABLE
  jjanes=# select count(*) from pgbench_history;
   count
  
   519104
  (1 row)
 
  jjanes=# select count(*) from pgbench_history;
   count
  
   527814
  (1 row)
 
  Is this the expected behavior?
 
  By me, yes. WAL replay does not require a table lock to progress. Any
  changes are protected with block-level locks. It does acquire a table
  lock and cancel conflicting queries when it is about to replay something
  that would cause a query to explode, such as dropping a table, as
  explained in docs.

 That is rather surprising. You can't get that result in a normal server,
 which I think is enough of a reason to disallow it. If you do LOCK TABLE
 ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
 nose.

Right.  I'd rather be denied the lock than have it granted to me but
not do the same thing it does on a primary---prevent all changes to
the locked table.

 I think the fallout from that argument is that WAL replay should hold
 table-level locks (otherwise the workaround to this problem is too
 special-casey).  I'm not sure about that.  If I really wanted to get
 consistent results, I'd use a serializable transaction.


Unfortunately, isolation level serializable is not truly
serializable.  Usually it is good enough, but when it isn't good
enough and you need an explicit table lock (a very rare but not
nonexistent situation), I think it should either lock the table in the
manner it would do on the primary, or throw an error.  I think that
silently changing the behavior between primary and standby is not a
good thing.

Cheers,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 Unfortunately, isolation level serializable is not truly
 serializable.  Usually it is good enough, but when it isn't good
 enough and you need an explicit table lock (a very rare but not
 nonexistent situation), I think it should either lock the table in the
 manner it would do on the primary, or throw an error.  I think that
 silently changing the behavior between primary and standby is not a
 good thing.

+1 --- this proposal made me acutely uncomfortable, too.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Wed, 2009-09-23 at 12:07 +0300, Heikki Linnakangas wrote:
 seems to be broken
 
 Agreed. 

Looking at the relation lock stuff a bit more...

When someone tries to acquire an AccessExclusiveLock, but can't get it
immediately, we sleep while holding RecoveryInfoLock. That doesn't
affect normal queries, but it will in turn block any attempt to get a
running-xacts snapshot, and will thus block checkpoint from finishing.

It could take a very long time until you get an AccessExclusiveLock on a
busy table, so that seems pretty bad. I think we need to think a bit
harder about that locking.

The problem becomes a lot easier if we accept that it's OK to have a
lock included in the running-xacts snapshot and also appear in a
XLOG_RELATION_LOCK record later. The standby should handle that
gracefully already. If we just remove RecoveryInfoLock, that can happen,
but it still won't be possible for a lock to be missed out which is what
we really care about.


Rather than keep the numHeldLocks counters per-proc in proc array, I
think it would be simpler to have a single (or one per lock partition)
counter in shared memory in lock.c. It's just an optimization to make it
faster to find out that there is no loggable AccessExclusiveLocks in the
system, so it really rather belongs into the lock manager.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Josh Berkus
Simon,

 Patch withdrawn for correction and rework. Nothing serious, but not much
 point doing further testing to all current issues resolved.

:-(

Good thing we went for 4 CFs.

Is there a GIT branch of Simon's current working version up somewhere?

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Heikki Linnakangas
Josh Berkus wrote:
 Patch withdrawn for correction and rework. Nothing serious, but not much
 point doing further testing to all current issues resolved.
 
 :-(
 
 Good thing we went for 4 CFs.

I think we should try to hammer this in in this commitfest. None of the
issues found this far are too serious, nothing that requires major rewrites.

 Is there a GIT branch of Simon's current working version up somewhere?

I have my git repository at git.postgresql.org, branch 'hs-riggs'. I've
pushed a number of small changes there over Simon's patch.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-23 Thread Josh Berkus
Heikki,

 I think we should try to hammer this in in this commitfest. None of the
 issues found this far are too serious, nothing that requires major rewrites.

It would certainly be valuable to get users testing it starting with
Alpha2 instead of waiting 2 months.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
 On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
 
 is this that we seem to be missing conflict
 resolution for GiST index tuples deleted by the kill_prior_tuples
 mechanism. Unless I'm missing something, we need similar handling there
 that we have in b-tree.
 OK, I agree with that. Straightforward change. Thanks very much.

 I marked the comment to indicate that the handling for GIST and GIN
 indexes looked dubious to me also. I had the earlier it is safe
 comments but that was before we looked at the kill prior tuples issue.
 
 ISTM I looked at this too quickly.
 
 kill_prior_tuple is only ever set by these lines, after scan starts:
 
 if (!scan-xactStartedInRecovery)
 scan-kill_prior_tuple = scan-xs_hot_dead;
 
 which is set in indexam.c, not within any particular am. So the coding,
 as submitted, covers all index types, current and future.

That just stops more tuples from being killed in the standby. I was
thinking that we need similar conflict resolution in GiST that we do
with b-tree delete records, to stop killed tuples from being deleted
while they might still be needed in the standby.

But looking closer at GiST, it seems that GiST doesn't actually do that;
killed tuples are not removed at page splits, but only by VACUUM. So
that's not an issue after all.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Simon Riggs

On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
 On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs si...@2ndquadrant.com wrote:
 
  OK, here is the latest version of the Hot Standby patchset. This is
  about version 30+ by now, but we should regard this as 0.2.1
  Patch against CVS HEAD (now): clean apply, compile, no known bugs.
 
  OVERVIEW
 
  You can download PDF versions of the fine manual is here
  http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf
 
 
 From this doc:
 
 In recovery, transactions will not be permitted to take any lock
 higher other than
 AccessShareLock or AccessExclusiveLock. In addition, transactions may never
 assign a TransactionId and may never write WAL. The LOCK TABLE command by
 default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
 the standby and requests a specific lock type other than AccessShareLock will 
 be
 rejected.
 
 The first sentence seems to say that clients on the stand-by can take
 ACCESS EXCLUSIVE, while the last sentence seems to say that they
 cannot do so.

You are right to pick up that discrepancy, as Heikki did also.

The root cause of that discrepancy is a change in patch behaviour
between January and now that I will use your post to highlight and
discuss, if you don't mind. (and yes, the docs need to be corrected)

Initially, it seemed that it was certain that a read-only backend could
not take an AccessExclusiveLock. On further thought, there is no
particular reason to block AccessExclusiveLocks themselves, just that
most things you might do while holding one are banned. But the lock
itself is fine. (Any challenge on that?)

AccessExclusiveLocks can be used to serialize the actions of other
backends. That is a very common use case, so my concern was that LOCK
TABLE would be a no-op unless we allowed AccessExclusiveLock, so the
patch does allow it.

 I did a little experiment on a hot standby instance.  I expected that
 either I would be denied the lock altogether, or the lock would cause
 WAL replay to be paused until either I committed or was forcibly
 canceled.  But neither happened, I was granted the lock but WAL replay
 continued anyway.
 
 jjanes=# begin;
 BEGIN
 jjanes=# lock table pgbench_history in access exclusive mode;
 LOCK TABLE
 jjanes=# select count(*) from pgbench_history;
  count
 
  519104
 (1 row)
 
 jjanes=# select count(*) from pgbench_history;
  count
 
  527814
 (1 row)
 
 Is this the expected behavior?

By me, yes. WAL replay does not require a table lock to progress. Any
changes are protected with block-level locks. It does acquire a table
lock and cancel conflicting queries when it is about to replay something
that would cause a query to explode, such as dropping a table, as
explained in docs.

So this is not a bug.

The explanation of how the above sequence of events occurs is that the
backend acquires AccessExclusiveLock - please check on other session in
pg_locks. WAL replay continues by the Startup process, inserting further
rows into the pgbench_history table as a series of transactions. The
second select takes a later snapshot than the first and so sees more
data than the first select, hence a larger count. (And I am pleased to
see that recovery is progressing quickly even while your queries run).

So not a bug, but just one of many possible behaviours we could enforce.
1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
2. Allow AccessExclusiveLocks but have them pause WAL replay
3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
no-op because it will not be able to serialize anything)

So the patch originally implemented (3) but now implements (1).

I would say that (2) is very undesirable because it puts WAL replay in
the control of non-superusers. That could mean LOCK TABLE implicitly
alters the high availability of the standby, and might even be used
maliciously to do that.

I'm open to views on whether we should use (1) or (3). Comments?

Implementing either is no problem and we have a straight choice. We may
even wish to review that again later from additional feedback.

(Jeff, you have also helped me understand that there is a bug in the way
serializable transactions are cancelled, which is easily corrected.
Thanks for that unexpected windfall, but it is otherwise unrelated to
your comments.)

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
 jjanes=# begin;
 BEGIN
 jjanes=# lock table pgbench_history in access exclusive mode;
 LOCK TABLE
 jjanes=# select count(*) from pgbench_history;
  count
 
  519104
 (1 row)

 jjanes=# select count(*) from pgbench_history;
  count
 
  527814
 (1 row)

 Is this the expected behavior?
 
 By me, yes. WAL replay does not require a table lock to progress. Any
 changes are protected with block-level locks. It does acquire a table
 lock and cancel conflicting queries when it is about to replay something
 that would cause a query to explode, such as dropping a table, as
 explained in docs.

That is rather surprising. You can't get that result in a normal server,
which I think is enough of a reason to disallow it. If you do LOCK TABLE
ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
nose.

 So not a bug, but just one of many possible behaviours we could enforce.
 1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
 2. Allow AccessExclusiveLocks but have them pause WAL replay
 3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
 no-op because it will not be able to serialize anything)
 
 So the patch originally implemented (3) but now implements (1).
 
 I would say that (2) is very undesirable because it puts WAL replay in
 the control of non-superusers. That could mean LOCK TABLE implicitly
 alters the high availability of the standby, and might even be used
 maliciously to do that.

I don't see a problem with (2) as long as the locker is kicked out after
max_standby_delay, like a long-running query. That's what I would
expect. I'm fine with (3) as well, but (1) does seem rather suprising
behavior.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Simon Riggs

On Tue, 2009-09-22 at 11:04 +0300, Heikki Linnakangas wrote:
  
  By me, yes. WAL replay does not require a table lock to progress. Any
  changes are protected with block-level locks. It does acquire a table
  lock and cancel conflicting queries when it is about to replay something
  that would cause a query to explode, such as dropping a table, as
  explained in docs.
 
 That is rather surprising. You can't get that result in a normal server,
 which I think is enough of a reason to disallow it. If you do LOCK TABLE
 ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
 nose.

OK, normality is a reasonable argument against. So (1) is only a
partial implementation of serializing the table.

  So not a bug, but just one of many possible behaviours we could enforce.
  1. Allow AccessExclusiveLocks yet they do not interrupt WAL replay
  2. Allow AccessExclusiveLocks but have them pause WAL replay
  3. Disallow AccessExclusiveLocks (and so LOCK TABLE is effectively a
  no-op because it will not be able to serialize anything)
  
  So the patch originally implemented (3) but now implements (1).
  
  I would say that (2) is very undesirable because it puts WAL replay in
  the control of non-superusers. That could mean LOCK TABLE implicitly
  alters the high availability of the standby, and might even be used
  maliciously to do that.
 
 I don't see a problem with (2) as long as the locker is kicked out after
 max_standby_delay, like a long-running query. That's what I would
 expect. I'm fine with (3) as well, but (1) does seem rather suprising
 behavior.

(2) gives other problems because it would force us to check for
conflicting locks for each heap  index WAL record to ensure that the
lock was honoured. We could optimize that but it's still going to cost.

I'd rather leave things at (3) for now and wait for further feedback.
Start strict, relax later.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Heikki Linnakangas
In testing, it looks like there's still something wrong with the
subtransaction handling. I created a test function to create a large
number of subtransactions:

CREATE LANGUAGE plpgsql;
CREATE TABLE bar (id int4);

CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
plpgsql AS $$
BEGIN
  IF n = 0 THEN RETURN; END IF;
  INSERT INTO bar VALUES (n);
  PERFORM subxids(n - 1);
  RETURN;
EXCEPTION WHEN raise_exception THEN NULL; END;
$$;

And used that to created 100 nested subtransactions in the primary server:

SELECT subxids(100);


I got this in the standby:

...
LOG:  REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
2659 2660 2661 2662 0
LOG:  extend subtrans  xid 2600 page 1 last_page 0
CONTEXT:  rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
2658 2659 2660 2661 2662 0
TRAP: FailedAssertion(!(((newestXact) = ((TransactionId) 3))), File:
subtrans.c, Line: 299)
LOG:  startup process (PID 28594) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes

Apparently an InvalidXid is sneaking into the unreported xid array
collected in the primary.

I actually bumped into this while I was testing a simpler implementation
of the logic to collect the unreported xids in the primary. As the patch
stands, we keep track of how many of the childXids at each
subtransaction level have already been reported, but it occurs to me
that it would be a lot simpler to populate a separate array of
unreported xids on-the-fly, as xids are assigned. With the simpler
implementation I bumped into another issue (see below), which is why I
wanted to test if it worked without the simplification.

So with the simpler logic, I had this problem. When I do this in the
primary:

postgres=# SELECT subxids(1);
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter max_stack_depth, after
ensuring the platform's stack depth limit is adequate.
CONTEXT:  SQL statement SELECT  subxids( $1  - 1)
PL/pgSQL function subxids line 4 at PERFORM
SQL statement SELECT  subxids( $1  - 1)
...

I get this in the standby:

...
LOG:  REDO @ 0/1000B6C4; LSN 0/1000B6F0: prev 0/1000B698; xid 4325; len
16: Transaction - abort: 2009-09-22 12:45:00.938243+03
LOG:  record known xact 4325 latestObservedXid 4334
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG:  remove KnownAssignedXid 4325
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG:  REDO @ 0/1000B6F0; LSN 0/1000B71C: prev 0/1000B6C4; xid 4324; len
16: Transaction - abort: 2009-09-22 12:45:00.938438+03
LOG:  record known xact 4324 latestObservedXid 4334
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG:  remove KnownAssignedXid 4324
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG:  REDO @ 0/1000B71C; LSN 0/1000B748: prev 0/1000B6F0; xid 4323; len
16: Transaction - abort: 2009-09-22 12:45:00.938632+03
LOG:  record known xact 4323 latestObservedXid 4334
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG:  remove KnownAssignedXid 4323
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG:  1 KnownAssignedXids 3619
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
FATAL:  cannot remove KnownAssignedXid 4323
CONTEXT:  rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG:  startup process (PID 31284) exited with exit code 1
LOG:  terminating any other active server processes

It looks like the standby tries to remove XID 4323 from the
known-assigned hash table, but it's not there because it was removed and
set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I guess we
should just not throw an error in that case, but is there a way we could
catch that narrow case and still keep the check in
KnownAssignedXidsRemove()? It seems like the check might help catch
other bugs, so I'd rather keep it if possible.

I've pushed the simplified code to my git repository if you want to take
a look.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Simon Riggs

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

 In testing, it looks like there's still something wrong with the
 subtransaction handling. I created a test function to create a large
 number of subtransactions:

OK, looking at this now. Thanks for the report.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Simon Riggs

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:

 It looks like the standby tries to remove XID 4323 from the
 known-assigned hash table, but it's not there because it was removed
 and set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I
 guess we should just not throw an error in that case, but is there a
 way we could catch that narrow case and still keep the check in
 KnownAssignedXidsRemove()? It seems like the check might help catch
 other bugs, so I'd rather keep it if possible.

Yes, a certain test is important and I'm glad we've (almost) achieved
it.

ISTM we can say if not in KnownAssignedXids then check pg_subtrans. If
not in either then report error. May need to add info to the abort
record to check xtop also.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Simon Riggs

On Tue, 2009-09-22 at 12:53 +0300, Heikki Linnakangas wrote:
 In testing, it looks like there's still something wrong with the
 subtransaction handling. I created a test function to create a large
 number of subtransactions:
 
 CREATE LANGUAGE plpgsql;
 CREATE TABLE bar (id int4);
 
 CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
 plpgsql AS $$
 BEGIN
   IF n = 0 THEN RETURN; END IF;
   INSERT INTO bar VALUES (n);
   PERFORM subxids(n - 1);
   RETURN;
 EXCEPTION WHEN raise_exception THEN NULL; END;
 $$;
 
 And used that to created 100 nested subtransactions in the primary server:
 
 SELECT subxids(100);
 
 
 I got this in the standby:
 
 ...
 LOG:  REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
 264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
 2659 2660 2661 2662 0
 LOG:  extend subtrans  xid 2600 page 1 last_page 0
 CONTEXT:  rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
 2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
 2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
 2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
 2658 2659 2660 2661 2662 0
 TRAP: FailedAssertion(!(((newestXact) = ((TransactionId) 3))), File:
 subtrans.c, Line: 299)
 LOG:  startup process (PID 28594) was terminated by signal 6: Aborted
 LOG:  terminating any other active server processes
 
 Apparently an InvalidXid is sneaking into the unreported xid array
 collected in the primary.
 
 I actually bumped into this while I was testing a simpler implementation
 of the logic to collect the unreported xids in the primary. As the patch
 stands, we keep track of how many of the childXids at each
 subtransaction level have already been reported, but it occurs to me
 that it would be a lot simpler to populate a separate array of
 unreported xids on-the-fly, as xids are assigned. With the simpler
 implementation I bumped into another issue (see below), which is why I
 wanted to test if it worked without the simplification.

The bug seems an off-by-one error and would seem easily corrected in any
case; there is no fundamental problem revealed.

I prefer your suggested approach, since it avoids that rather complex
looking code that did the grovelling thru the nested xact states.

I'll get back to you with more on this tomorrow.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Alvaro Herrera
Heikki Linnakangas escribió:
 Simon Riggs wrote:
  On Mon, 2009-09-21 at 19:42 -0700, Jeff Janes wrote:
  jjanes=# begin;
  BEGIN
  jjanes=# lock table pgbench_history in access exclusive mode;
  LOCK TABLE
  jjanes=# select count(*) from pgbench_history;
   count
  
   519104
  (1 row)
 
  jjanes=# select count(*) from pgbench_history;
   count
  
   527814
  (1 row)
 
  Is this the expected behavior?
  
  By me, yes. WAL replay does not require a table lock to progress. Any
  changes are protected with block-level locks. It does acquire a table
  lock and cancel conflicting queries when it is about to replay something
  that would cause a query to explode, such as dropping a table, as
  explained in docs.
 
 That is rather surprising. You can't get that result in a normal server,
 which I think is enough of a reason to disallow it. If you do LOCK TABLE
 ACCESS EXCLUSIVE, you wouldn't expect the contents to change under your
 nose.

I think the fallout from that argument is that WAL replay should hold
table-level locks (otherwise the workaround to this problem is too
special-casey).  I'm not sure about that.  If I really wanted to get
consistent results, I'd use a serializable transaction.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-22 Thread Bruce Momjian
Simon Riggs wrote:
 
 OK, here is the latest version of the Hot Standby patchset. This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known bugs.
 
 OVERVIEW

Anyone who is interested in how the hot standby behaves should read the
following excellent PDF Simon produced.  It goes into great detail of
the slave's read-only transactions and how the standby behaves during
continuous slave recovery:

 You can download PDF versions of the fine manual is here
 http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf

The function call docs are at:

 http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Heikki Linnakangas
Simon Riggs wrote:
 OK, here is the latest version of the Hot Standby patchset. This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known bugs.

Thanks! Attached is some minor comment and fixes, and some dead code
removal. Also available in my git repository, branch 'hs-riggs'.

The documentation talks about setting and checking
default_transaction_read_only, but I think it doesn't say anything about
transaction_read_only, which I find odd. This in particular:

 Users will be able to tell whether their session is read-only by
 +   issuing SHOW default_transaction_read_only

seems misleading, as you might have default_transaction_read_only=on,
but still be able to do SET transaction_read_only, so the *session*
isn't necessarily read-only.

The only bug I've found is this that we seem to be missing conflict
resolution for GiST index tuples deleted by the kill_prior_tuples
mechanism. Unless I'm missing something, we need similar handling there
that we have in b-tree.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 91917cf..2257ec6 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -2099,9 +2099,9 @@ if (!triggered)
 
para
 	In recovery, transactions will not be permitted to take any lock higher
-	other than AccessShareLock or AccessExclusiveLock. In addition,
-	transactions may never assign a TransactionId and may never write WAL.
-	The LOCK TABLE command by default applies an AccessExclusiveLock. 
+	than AccessShareLock. In addition, transactions may never assign a
+TransactionId and may never write WAL.
+	The LOCK TABLE command by default applies an AccessExclusiveLock.
 	Any LOCK TABLE command that runs on the standby and requests a specific
 	lock type other than AccessShareLock will be rejected.
/para
@@ -2168,8 +2168,8 @@ if (!triggered)
 
para
 	An example of the above would be an Administrator on Primary server
-	runs a DROP TABLE command that refers to a table currently in use by
-	a User query on the standby server.
+	runs a DROP TABLE command on a table that's currently being queried
+	in the standby server.
/para
 
para
@@ -2198,9 +2198,9 @@ if (!triggered)
para
 	We have a number of choices for resolving query conflicts.  The default
 	is that we wait and hope the query completes. If the recovery is not paused,
-	then the server will wait automatically until the server the lag between
+	then the server will wait automatically until the lag between
 	primary and standby is at most max_standby_delay seconds. Once that grace
-	period expires, we then take one of the following actions:
+	period expires, we take one of the following actions:
 
 	  itemizedlist
 	   listitem
@@ -2213,7 +2213,7 @@ if (!triggered)
 	para
 		 If the conflict is caused by cleanup records we tell the standby query
 	 	 that a conflict has occurred and that it must cancel itself to avoid the
-	 	 risk that it attempts to silently fails to read relevant data because
+		 risk that it silently fails to read relevant data because
 	 	 that data has been removed. (This is very similar to the much feared
 		 error message snapshot too old).
 	/para
@@ -,7 +,7 @@ if (!triggered)
 		 Note also that this means that idle-in-transaction sessions are never
 		 canceled except by locks. Users should be clear that tables that are
 		 regularly and heavily updated on primary server will quickly cause
-		 cancellation of any longer running queries made against those tables.
+		 cancellation of any longer running queries in the standby.
 	/para
 
 	para
@@ -2235,7 +2235,7 @@ if (!triggered)
/para
 
para
-	Other remdial actions exist if the number of cancelations is unacceptable.
+	Other remedial actions exist if the number of cancelations is unacceptable.
 	The first option is to connect to primary server and keep a query active
 	for as long as we need to run queries on the standby. This guarantees that
 	a WAL cleanup record is never generated and we don't ever get query
@@ -2283,7 +2283,7 @@ if (!triggered)
titleAdministrator's Overview/title
 
para
-	If there is a recovery.conf file present then the will start in Hot Standby
+	If there is a recovery.conf file present the server will start in Hot Standby
 	mode by default, though this can be disabled by setting
 	recovery_connections = off in recovery.conf. The server may take some
 	time to enable recovery connections since the server must first complete
@@ -2329,7 +2329,7 @@ LOG:  database system is ready to accept read only connections
 	A set of functions allow superusers to control the flow of recovery
 	are described in xref linkend=functions-recovery-control-table.
 	These functions allow you to pause and continue recovery, as well
-	as dynamically set new recovery targets wile recovery progresses.
+	as dynamically set new 

Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Simon Riggs

On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:

 The only bug I've found 

!

 is this that we seem to be missing conflict
 resolution for GiST index tuples deleted by the kill_prior_tuples
 mechanism. Unless I'm missing something, we need similar handling there
 that we have in b-tree.

OK, I agree with that. Straightforward change. Thanks very much.

I marked the comment to indicate that the handling for GIST and GIN
indexes looked dubious to me also. I had the earlier it is safe
comments but that was before we looked at the kill prior tuples issue.

Re-reading code for GIN also, I note that there isn't any further work
because we don't kill prior tuples ever. Also, there is no special
handling of the GIN b-tree posting tree because VACUUM scans that in
logical sequence, rather than the physical sequence in nbtree.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Simon Riggs

On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:
 The documentation talks about setting and checking
 default_transaction_read_only, but I think it doesn't say anything
 about
 transaction_read_only, which I find odd. This in particular:
 
  Users will be able to tell whether their session is read-only by
  +   issuing SHOW default_transaction_read_only
 
 seems misleading, as you might have default_transaction_read_only=on,
 but still be able to do SET transaction_read_only, so the *session*
 isn't necessarily read-only.

Yes, clearly missing a check there. Those two operations should be
blocked at higher level, using PreventCommandDuringRecovery() and I
confess that I thought they already were. Doesn't crash because of the
other checks in place, but gives wrong error message.

Thanks for penetration testing the patch.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Simon Riggs

On Mon, 2009-09-21 at 14:01 +0100, Simon Riggs wrote:
 On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:

  is this that we seem to be missing conflict
  resolution for GiST index tuples deleted by the kill_prior_tuples
  mechanism. Unless I'm missing something, we need similar handling there
  that we have in b-tree.
 
 OK, I agree with that. Straightforward change. Thanks very much.
 
 I marked the comment to indicate that the handling for GIST and GIN
 indexes looked dubious to me also. I had the earlier it is safe
 comments but that was before we looked at the kill prior tuples issue.

ISTM I looked at this too quickly.

kill_prior_tuple is only ever set by these lines, after scan starts:

if (!scan-xactStartedInRecovery)
scan-kill_prior_tuple = scan-xs_hot_dead;

which is set in indexam.c, not within any particular am. So the coding,
as submitted, covers all index types, current and future.

AFAICS there is no bug, unless you have a test case or can explain
further?

Worth raising as a query because it forced me to re-check how GIST and
GIN work and am happy again now.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Robert Haas
On Mon, Sep 21, 2009 at 9:01 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Mon, 2009-09-21 at 13:50 +0300, Heikki Linnakangas wrote:

 The only bug I've found

 !

Yeah, wow.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Bruce Momjian
Simon Riggs wrote:
 
 OK, here is the latest version of the Hot Standby patchset. This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known bugs.

Wow, great!  Simon has allowed us to pass a great milestone in Postgres
development.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-21 Thread Jeff Janes
On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs si...@2ndquadrant.com wrote:

 OK, here is the latest version of the Hot Standby patchset. This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known bugs.

 OVERVIEW

 You can download PDF versions of the fine manual is here
 http://wiki.postgresql.org/images/0/01/Hot_Standby_main.pdf


From this doc:

In recovery, transactions will not be permitted to take any lock
higher other than
AccessShareLock or AccessExclusiveLock. In addition, transactions may never
assign a TransactionId and may never write WAL. The LOCK TABLE command by
default applies an AccessExclusiveLock. Any LOCK TABLE command that runs on
the standby and requests a specific lock type other than AccessShareLock will be
rejected.

The first sentence seems to say that clients on the stand-by can take
ACCESS EXCLUSIVE, while the last sentence seems to say that they
cannot do so.

I did a little experiment on a hot standby instance.  I expected that
either I would be denied the lock altogether, or the lock would cause
WAL replay to be paused until either I committed or was forcibly
canceled.  But neither happened, I was granted the lock but WAL replay
continued anyway.

jjanes=# begin;
BEGIN
jjanes=# lock table pgbench_history in access exclusive mode;
LOCK TABLE
jjanes=# select count(*) from pgbench_history;
 count

 519104
(1 row)

jjanes=# select count(*) from pgbench_history;
 count

 527814
(1 row)

Is this the expected behavior?

Thanks,

Jeff

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Simon Riggs

On Thu, 2009-09-17 at 19:01 -0400, Robert Haas wrote:
 
  I'm going to put the index-only scans aside for now to focus on hot
  standby and streaming replication. Both are big patches, so there's
  plenty of work in those two alone, and not only for me.
 
 What is the best way to attack this?  Should I keep reviewing
 index-only scans so that you have feedback for when you get back to
 it, or should I move on to something else?  If something else, does it
 make more sense for me to look at HS since I did a bit of work with a
 previous version, or would it be better for me to just pick one of the
 other patches from the CommitFest and work on that?
 
 Also, stepping back from me personally, should we try to assign some
 additional reviewers to these patches?  Is there some way we can
 divide up review tasks among multiple people so that we're not
 repeating each others work?
 
 Thoughts appreciated, from Heikki, Simon, or others.

I think this is a great opportunity to widen the pool of people
contributing to reviews.

I suggest the creation of a second group of people, performing
round-robin testing of patches. These people would be able to verify
* documentation matches implemented features (does it do what it says on
the tin?)
* usability of explicit features (do the features work well?)
* usability gap of unimplemented features (what else do we need?)
* are there any bugs?

These questions are often quickly answered for smaller patches, but HS's
scope mean that such a task properly executed could take a full week, if
not longer.

Second group of people are just as skilled Postgres people as reviewers,
in some cases more so, apart from they have less detailed knowledge of
the codebase. We have many such people and it would be good to encourage
them to perform thorough reviews rather than tire kicking.

I'm not sure that Heikki needs additional reviewers. He now has
significant knowledge of the patch and is good at focusing on key
aspects of the internals. Other code reviewers are welcome, of course.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Also, stepping back from me personally, should we try to assign some
 additional reviewers to these patches?  Is there some way we can
 divide up review tasks among multiple people so that we're not
 repeating each others work?

 Thoughts appreciated, from Heikki, Simon, or others.

How about this proposal:
  http://archives.postgresql.org/pgsql-hackers/2009-08/msg00764.php

Regards,
-- 
dim

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Jeff Janes
On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs si...@2ndquadrant.com wrote:


 OK, here is the latest version of the Hot Standby patchset. This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known bugs.



Hi Simon,

Is there a reason that you remove the WAL_DEBUG shown below?


*** begin:;
*** 899,923 
FIN_CRC32(rdata_crc);
record-xl_crc = rdata_crc;

- #ifdef WAL_DEBUG
-   if (XLOG_DEBUG)
-   {
-   StringInfoData buf;
-
-   initStringInfo(buf);
-   appendStringInfo(buf, INSERT @ %X/%X: ,
-RecPtr.xlogid,
RecPtr.xrecoff);
-   xlog_outrec(buf, record);
-   if (rdata-data != NULL)
-   {
-   appendStringInfo(buf,  - );
-   RmgrTable[record-xl_rmid].rm_desc(buf,
record-xl_info, rdata-data);
-   }
-   elog(LOG, %s, buf.data);
-   pfree(buf.data);
-   }
- #endif
-
/* Record begin of record in appropriate places */
ProcLastRecPtr = RecPtr;
Insert-PrevRecord = RecPtr;
--- 947,952 



Thanks,

Jeff


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Simon Riggs

On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
 On Tue, Sep 15, 2009 at 2:41 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
 OK, here is the latest version of the Hot Standby patchset.
 This is
 about version 30+ by now, but we should regard this as 0.2.1
 Patch against CVS HEAD (now): clean apply, compile, no known
 bugs.

 Is there a reason that you remove the WAL_DEBUG shown below?

WAL_DEBUG is not removed by the patch, though that section of code is
removed, as you observe. I recall an earlier bug report by
me/conversation on hackers about how that section of code was
irrecoverably broken, since it's calling an rmgr routine while not in
recovery and also assuming the data is fully accessible at that point,
which it is not.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
 Is there a reason that you remove the WAL_DEBUG shown below?

 WAL_DEBUG is not removed by the patch, though that section of code is
 removed, as you observe. I recall an earlier bug report by
 me/conversation on hackers about how that section of code was
 irrecoverably broken, since it's calling an rmgr routine while not in
 recovery and also assuming the data is fully accessible at that point,
 which it is not.

Wouldn't it be sufficient to remove the rm_desc() call?  I agree
that that's broken, but the rest doesn't seem to be.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Simon Riggs

On Fri, 2009-09-18 at 11:14 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Fri, 2009-09-18 at 07:23 -0700, Jeff Janes wrote:
  Is there a reason that you remove the WAL_DEBUG shown below?
 
  WAL_DEBUG is not removed by the patch, though that section of code is
  removed, as you observe. I recall an earlier bug report by
  me/conversation on hackers about how that section of code was
  irrecoverably broken, since it's calling an rmgr routine while not in
  recovery and also assuming the data is fully accessible at that point,
  which it is not.
 
 Wouldn't it be sufficient to remove the rm_desc() call?  I agree
 that that's broken, but the rest doesn't seem to be.

That would make sense also. Previous action just because that was
earlier consensus. Will change.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-18 Thread Marcos Luis Ortiz Valmaseda
I want to help on this area, but I need a mentor for this.
For example, Heikki will be a excellent mentor for me.

Following the theme, I think that we have to wide all questions for the process 
of the acceptance of a patch on the same way that you Simon.

We could write new requirements with all these ideas. Don´t you think?

Regards

The hurry is enemy of the success: for that reason...Be patient

Ing. Marcos L. Ortiz Valmaseda
Línea Soporte y Despliegue
Centro de Tecnologías de Almacenamiento y Análisis de Datos (CENTALAD)

Linux User # 418229
PostgreSQL User
http://www.postgresql.org
http://www.planetpostgresql.org/
http://www.postgresql-es.org/


- Mensaje original -
De: Simon Riggs si...@2ndquadrant.com
Para: Robert Haas robertmh...@gmail.com
CC: Heikki Linnakangas heikki.linnakan...@enterprisedb.com, Josh Berkus 
j...@agliodbs.com, pgsql-hackers@postgresql.org
Enviados: Jueves, 17 de Septiembre 2009 20:53:24 GMT -10:00 Hawai
Asunto: Re: [HACKERS] Hot Standby 0.2.1


On Thu, 2009-09-17 at 19:01 -0400, Robert Haas wrote:
 
  I'm going to put the index-only scans aside for now to focus on hot
  standby and streaming replication. Both are big patches, so there's
  plenty of work in those two alone, and not only for me.
 
 What is the best way to attack this?  Should I keep reviewing
 index-only scans so that you have feedback for when you get back to
 it, or should I move on to something else?  If something else, does it
 make more sense for me to look at HS since I did a bit of work with a
 previous version, or would it be better for me to just pick one of the
 other patches from the CommitFest and work on that?
 
 Also, stepping back from me personally, should we try to assign some
 additional reviewers to these patches?  Is there some way we can
 divide up review tasks among multiple people so that we're not
 repeating each others work?
 
 Thoughts appreciated, from Heikki, Simon, or others.

I think this is a great opportunity to widen the pool of people
contributing to reviews.

I suggest the creation of a second group of people, performing
round-robin testing of patches. These people would be able to verify
* documentation matches implemented features (does it do what it says on
the tin?)
* usability of explicit features (do the features work well?)
* usability gap of unimplemented features (what else do we need?)
* are there any bugs?

These questions are often quickly answered for smaller patches, but HS's
scope mean that such a task properly executed could take a full week, if
not longer.

Second group of people are just as skilled Postgres people as reviewers,
in some cases more so, apart from they have less detailed knowledge of
the codebase. We have many such people and it would be good to encourage
them to perform thorough reviews rather than tire kicking.

I'm not sure that Heikki needs additional reviewers. He now has
significant knowledge of the patch and is good at focusing on key
aspects of the internals. Other code reviewers are welcome, of course.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-17 Thread Heikki Linnakangas
Robert Haas wrote:
 On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that Simon has submitted this, can some of the heavy-hitters here
 review it?  Heikki?

 Nobody's name is next to it.
 
 I don't think anyone is planning to ignore this patch, but it wasn't
 included in the first round of round-robin reviewing assignments
 because it wasn't submitted until the following day, after the
 announced deadline for submissions had already passed.  So most people
 are probably busy with with some other patch at the moment, but that's
 a temporary phenomenon.

Right, I've added myself as reviewer now.

  This is a pretty small CommitFest, so there
 shouldn't be any shortage of reviewers, though Heikki's time may be
 stretched a little thin, since Streaming Replication is also in the
 queue, and he is working on index-only scans.  That's really for him
 to comment on, though.

I'm going to put the index-only scans aside for now to focus on hot
standby and streaming replication. Both are big patches, so there's
plenty of work in those two alone, and not only for me.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-17 Thread Simon Riggs

On Thu, 2009-09-17 at 09:54 +0300, Heikki Linnakangas wrote:

   This is a pretty small CommitFest, so there
  shouldn't be any shortage of reviewers, though Heikki's time may be
  stretched a little thin, since Streaming Replication is also in the
  queue, and he is working on index-only scans.  That's really for him
  to comment on, though.
 
 I'm going to put the index-only scans aside for now to focus on hot
 standby and streaming replication. Both are big patches, so there's
 plenty of work in those two alone, and not only for me.

That's very good of you, thanks.

It was already clear to a few people that your time would bottleneck
trying to review both at the same time. I personally wasn't expecting
you to jump into immediate action on HS. We have time.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-17 Thread Robert Haas
On Thu, Sep 17, 2009 at 2:54 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that Simon has submitted this, can some of the heavy-hitters here
 review it?  Heikki?

 Nobody's name is next to it.

 I don't think anyone is planning to ignore this patch, but it wasn't
 included in the first round of round-robin reviewing assignments
 because it wasn't submitted until the following day, after the
 announced deadline for submissions had already passed.  So most people
 are probably busy with with some other patch at the moment, but that's
 a temporary phenomenon.

 Right, I've added myself as reviewer now.

  This is a pretty small CommitFest, so there
 shouldn't be any shortage of reviewers, though Heikki's time may be
 stretched a little thin, since Streaming Replication is also in the
 queue, and he is working on index-only scans.  That's really for him
 to comment on, though.

 I'm going to put the index-only scans aside for now to focus on hot
 standby and streaming replication. Both are big patches, so there's
 plenty of work in those two alone, and not only for me.

What is the best way to attack this?  Should I keep reviewing
index-only scans so that you have feedback for when you get back to
it, or should I move on to something else?  If something else, does it
make more sense for me to look at HS since I did a bit of work with a
previous version, or would it be better for me to just pick one of the
other patches from the CommitFest and work on that?

Also, stepping back from me personally, should we try to assign some
additional reviewers to these patches?  Is there some way we can
divide up review tasks among multiple people so that we're not
repeating each others work?

Thoughts appreciated, from Heikki, Simon, or others.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-16 Thread Bernd Helmle



--On 15. September 2009 22:41:59 +0100 Simon Riggs si...@2ndquadrant.com 
wrote:



http://wiki.postgresql.org/images/0/01/Hot_Standby_Recovery_Functions.pdf


This doesn't work for me, it seems the correct link is

http://wiki.postgresql.org/images/1/10/Hot_Standby_Recovery_Functions.pdf 
?


--
Thanks

Bernd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-16 Thread Josh Berkus
All,

Now that Simon has submitted this, can some of the heavy-hitters here
review it?  Heikki?

Nobody's name is next to it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-16 Thread Robert Haas
On Wed, Sep 16, 2009 at 6:05 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that Simon has submitted this, can some of the heavy-hitters here
 review it?  Heikki?

 Nobody's name is next to it.

I don't think anyone is planning to ignore this patch, but it wasn't
included in the first round of round-robin reviewing assignments
because it wasn't submitted until the following day, after the
announced deadline for submissions had already passed.  So most people
are probably busy with with some other patch at the moment, but that's
a temporary phenomenon.  This is a pretty small CommitFest, so there
shouldn't be any shortage of reviewers, though Heikki's time may be
stretched a little thin, since Streaming Replication is also in the
queue, and he is working on index-only scans.  That's really for him
to comment on, though.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hot Standby 0.2.1

2009-09-15 Thread David Fetter
On Tue, Sep 15, 2009 at 10:41:59PM +0100, Simon Riggs wrote:
 
 OK, here is the latest version of the Hot Standby patchset.  This is
 about version 30+ by now, but we should regard this as 0.2.1 Patch
 against CVS HEAD (now): clean apply, compile, no known bugs.

Kudos

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers