Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-12 Thread Michael Paquier
On Fri, Dec 12, 2014 at 9:38 PM, Heikki Linnakangas
 wrote:
> On 12/12/2014 04:29 AM, Michael Paquier wrote:
>>
>> On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
>> hlinnakan...@vmware.com> wrote:
>>
>>> I propose the attached (I admit I haven't tested it).
>>>
>> Actually if you do it this way I think that it would be worth adding the
>> small optimization Fujii-san mentioned upthread: if priority is equal to
>> 1,
>> we leave the loop earlier and return immediately the pointer. All those
>> things gathered give the patch attached, that I actually tested FWIW with
>> multiple standbys and multiple entries in s_s_names.
>
>
> Ok, committed.
Thanks!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-12 Thread Heikki Linnakangas

On 12/12/2014 04:29 AM, Michael Paquier wrote:

On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:


I propose the attached (I admit I haven't tested it).


Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.


Ok, committed.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-11 Thread Michael Paquier
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas <
hlinnakan...@vmware.com> wrote:

> * I don't like filling the priorities-array in
> SyncRepGetSynchronousStandby(). It might be convenient for the one caller
> that needs it, but it seems pretty ad hoc.
>
> In fact, I don't really see the point of having the array at all. You
> could just print the priority in the main loop in
> pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock
> while reading the priority, but it seems over-zealous to be so strict about
> that in pg_stat_wal_senders(), since it's just an informational view, and
> priority changes so rarely that it's highly unlikely to hit a race
> condition there. Also note that when you change the list of synchronous
> standbys in the config file, and SIGHUP, the walsender processes will
> update their priorities in random order. So the idea that you get a
> consistent snapshot of the priorities is a bit bogus anyway. Also note that
> between filling the priorities array and the main loop in
> pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's
> pid. With the current coding, you'll print an uninitialized value from the
> array if that happens. So the only thing that holding the SyncRepLock
> really protects from is a "torn" read of the value, if reading an int is
> not atomic. We could use the spinlock to protect from that if we care.
>

That's fair, and it simplifies the whole process as well as the API able to
fetch the synchronous WAL sender.


> * Would be nicer to return a pointer, than index into the wal senders
> array. That's what the caller really wants.
>
> I propose the attached (I admit I haven't tested it).
>

Actually if you do it this way I think that it would be worth adding the
small optimization Fujii-san mentioned upthread: if priority is equal to 1,
we leave the loop earlier and return immediately the pointer. All those
things gathered give the patch attached, that I actually tested FWIW with
multiple standbys and multiple entries in s_s_names.

Regards,
-- 
Michael
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..34530a0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -358,6 +358,62 @@ SyncRepInitConfig(void)
 }
 
 /*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+	WalSnd	   *result = NULL;
+	int			result_priority = 0;
+	int			i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+		int			this_priority;
+
+		/* Must be active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Must be streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Must be synchronous */
+		this_priority = walsnd->sync_standby_priority;
+		if (this_priority == 0)
+			continue;
+
+		/* Must have a lower priority value than any previous ones */
+		if (result != NULL && result_priority <= this_priority)
+			continue;
+
+		/* Must have a valid flush position */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		result = (WalSnd *) walsnd;
+		result_priority = this_priority;
+
+		/*
+		 * If priority is equal to 1, we are sure that there are no other
+		 * WAL senders that could be synchronous with a lower prioroty,
+		 * hence leave immediately with this one.
+		 */
+		if (this_priority == 1)
+			return result;
+	}
+
+	return result;
+}
+
+/*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
  *
@@ -368,11 +424,9 @@ void
 SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
-	volatile WalSnd *syncWalSnd = NULL;
+	WalSnd	   *syncWalSnd;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +442,13 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we 

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-12-11 Thread Heikki Linnakangas

On 11/18/2014 11:23 PM, Michael Paquier wrote:

On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs  wrote:


Can we just wait on this patch until we have the whole feature?


Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.



We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.


Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.


Some comments on this patch:

* I don't like filling the priorities-array in 
SyncRepGetSynchronousStandby(). It might be convenient for the one 
caller that needs it, but it seems pretty ad hoc.


In fact, I don't really see the point of having the array at all. You 
could just print the priority in the main loop in 
pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock 
while reading the priority, but it seems over-zealous to be so strict 
about that in pg_stat_wal_senders(), since it's just an informational 
view, and priority changes so rarely that it's highly unlikely to hit a 
race condition there. Also note that when you change the list of 
synchronous standbys in the config file, and SIGHUP, the walsender 
processes will update their priorities in random order. So the idea that 
you get a consistent snapshot of the priorities is a bit bogus anyway. 
Also note that between filling the priorities array and the main loop in 
pg_stat_get_wal_senders(), a new WAL sender might connect and set a 
slot's pid. With the current coding, you'll print an uninitialized value 
from the array if that happens. So the only thing that holding the 
SyncRepLock really protects from is a "torn" read of the value, if 
reading an int is not atomic. We could use the spinlock to protect from 
that if we care.


* Would be nicer to return a pointer, than index into the wal senders 
array. That's what the caller really wants.


I propose the attached (I admit I haven't tested it).

- Heikki

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..da89a3b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -358,6 +358,54 @@ SyncRepInitConfig(void)
 }
 
 /*
+ * Find the WAL sender servicing the synchronous standby with the lowest
+ * priority value, or NULL if no synchronous standby is connected. If there
+ * are multiple nodes with the same lowest priority value, the first node
+ * found is selected. The caller must hold SyncRepLock.
+ */
+WalSnd *
+SyncRepGetSynchronousStandby(void)
+{
+	WalSnd	   *result = NULL;
+	int			result_priority = 0;
+	int			i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+		int			this_priority;
+
+		/* Must be active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Must be streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Must be synchronous */
+		this_priority = walsnd->sync_standby_priority;
+		if (this_priority == 0)
+			continue;
+
+		/* Must have a lower priority value than any previous ones */
+		if (result != NULL && result_priority <= this_priority)
+			continue;
+
+		/* Must have a valid flush position */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		result = (WalSnd *) walsnd;
+		result_priority = this_priority;
+	}
+
+	return result;
+}
+
+/*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
  *
@@ -368,11 +416,9 @@ void
 SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
-	volatile WalSnd *syncWalSnd = NULL;
+	WalSnd	   *syncWalSnd;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +434,13 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * cha

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-18 Thread Michael Paquier
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs  wrote:

> Can we just wait on this patch until we have the whole feature?
>
Well, this may take some time to even define, and even if goals are clearly
defined this may take even more time to have a prototype to discuss about.


> We quite often break larger patches down into smaller ones, but if
> they arrive in lots of small pieces its more difficult to understand
> and correct issues that are happening on the larger scale. Churning
> the same piece of code multiple times is rather mind numbing.
>
Hm. I think that we are losing ourselves in this thread. The primary goal
is to remove a code duplication between syncrep.c and walsender,c that
exists since 9.1. Would it be possible to focus only on that for now?
That's still the topic of this CF.
-- 
Michael


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-18 Thread Simon Riggs
On 18 November 2014 00:02, Michael Paquier  wrote:

>> I've not read the patches yet. But while I was reading the code around
>> sync node detection, I thought that it might be good idea to treat the
>> node with priority as special. That is, if we find the active node with
>> the priority 1, we can immediately go out of the sync-node-detection-loop.
>>
>> In many cases we can expect that the sync standby has the priority 1.
>> If yes, treating the priority 1 as special is good way to do, I think. 
>> Thought?

Agreed

> That would really save some cycles. Now we still need to fetch the
> priority numbers of all nodes for pg_stat_get_wal_senders, so in this
> case scanning all the entries in the WAL sender array is necessary.
> This optimization is orthogonal to the refactoring patch btw, no?

To the refactoring patch, possibly. But not to the changes you're
planning to make.

ISTM that we should remember the priority list of nodes and check them
in that order by direct lookups to array elements. That will work for
1 or more nodes and it also works better with large numbers of
walsenders.

Can we just wait on this patch until we have the whole feature?

We quite often break larger patches down into smaller ones, but if
they arrive in lots of small pieces its more difficult to understand
and correct issues that are happening on the larger scale. Churning
the same piece of code multiple times is rather mind numbing.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Michael Paquier
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao  wrote:
> On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
>  wrote:
>> On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
>>  wrote:
>>> I'll send an updated patch tomorrow.
>>
>> Here are updated versions. I divided the patch into two for clarity,
>> the first one is the actual refactoring patch, renaming
>> SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
>> like updating synchronous to sync in the comments as you mentioned)
>> such as the namings have no conflicts.
>>
>> The second one updates the syncrep code, including WAL sender and WAL
>> receiver, and its surroundings to use the term "node" instead of
>> "standby". This brings in the code the idea that a node using
>> replication APIs is not necessarily a standby, making it more generic.
>> This can be applied on top of the refactoring patch. If any other
>> folks (Heikki or Fujii-san?) have comments about this idea feel free.
>
> I've not read the patches yet. But while I was reading the code around
> sync node detection, I thought that it might be good idea to treat the
> node with priority as special. That is, if we find the active node with
> the priority 1, we can immediately go out of the sync-node-detection-loop.
>
> In many cases we can expect that the sync standby has the priority 1.
> If yes, treating the priority 1 as special is good way to do, I think. 
> Thought?

That would really save some cycles. Now we still need to fetch the
priority numbers of all nodes for pg_stat_get_wal_senders, so in this
case scanning all the entries in the WAL sender array is necessary.
This optimization is orthogonal to the refactoring patch btw, no?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Fujii Masao
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier
 wrote:
> On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
>  wrote:
>> I'll send an updated patch tomorrow.
>
> Here are updated versions. I divided the patch into two for clarity,
> the first one is the actual refactoring patch, renaming
> SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
> like updating synchronous to sync in the comments as you mentioned)
> such as the namings have no conflicts.
>
> The second one updates the syncrep code, including WAL sender and WAL
> receiver, and its surroundings to use the term "node" instead of
> "standby". This brings in the code the idea that a node using
> replication APIs is not necessarily a standby, making it more generic.
> This can be applied on top of the refactoring patch. If any other
> folks (Heikki or Fujii-san?) have comments about this idea feel free.

I've not read the patches yet. But while I was reading the code around
sync node detection, I thought that it might be good idea to treat the
node with priority as special. That is, if we find the active node with
the priority 1, we can immediately go out of the sync-node-detection-loop.

In many cases we can expect that the sync standby has the priority 1.
If yes, treating the priority 1 as special is good way to do, I think. Thought?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Michael Paquier
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs  wrote:
> On 16 November 2014 12:07, Michael Paquier  wrote:
>
>> Let's work
>> the multiple node thing once we have a better spec of how to do it,
>> visibly using a dedicated micro-language within s_s_names.
>
> Hmm, please make sure that is a new post. That is easily something I
> could disagree with, even though I support the need for more
> functionality.
Sure. I am not really on that yet though :)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-17 Thread Simon Riggs
On 16 November 2014 12:07, Michael Paquier  wrote:

> Let's work
> the multiple node thing once we have a better spec of how to do it,
> visibly using a dedicated micro-language within s_s_names.

Hmm, please make sure that is a new post. That is easily something I
could disagree with, even though I support the need for more
functionality.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-16 Thread Michael Paquier
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier
 wrote:
> I'll send an updated patch tomorrow.

Here are updated versions. I divided the patch into two for clarity,
the first one is the actual refactoring patch, renaming
SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha,
like updating synchronous to sync in the comments as you mentioned)
such as the namings have no conflicts.

The second one updates the syncrep code, including WAL sender and WAL
receiver, and its surroundings to use the term "node" instead of
"standby". This brings in the code the idea that a node using
replication APIs is not necessarily a standby, making it more generic.
This can be applied on top of the refactoring patch. If any other
folks (Heikki or Fujii-san?) have comments about this idea feel free.
Regards,
-- 
Michael
From 51e9988ff44b7c2b716e3a0da3f1d1c9359a1d79 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH 1/2] Refactor code to detect synchronous node in WAL sender
 array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 97 +++--
 src/backend/replication/walsender.c | 35 +++--
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 78 insertions(+), 55 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..f838ad0 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of sync standby in the array referencing all the WAL
+ * senders, or -1 if no sync node can be found. The caller of this function
+ * should take a lock on SyncRepLock. If there are multiple nodes with
+ * the same lowest priority value, the first node found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that can
+ * be used to retrieve the priority of each WAL sender. Its inclusion in
+ * this function has the advantage to limit the scan of the WAL sender
+ * array to one pass, limiting the amount of cycles SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousStandby(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find the sync node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+0 : walsnd->sync_standby_priority;
+
+		/* Proceed to next if not active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Proceed to next if not streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Proceed to next one if asynchronous */
+		if (walsnd->sync_standby_priority == 0)
+			continue;
+
+		/* Proceed to next one if priority conditions not satisfied */
+		if (priority != 0 &&
+			priority <= walsnd->sync_standby_priority)
+			continue;
+
+		/* Proceed to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential sync candidate, choose it as the sync
+		 * node for the time being before going through the other nodes
+		 * listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +433,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int			sync_node;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +451,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standby.
 	 */
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	sync_node = SyncRepGetSynchronousStandby(NULL);
 
-	for (i = 0; i < max_wal_senders; i++)
-	{
-		/* use volatile pointer to prevent code rearrangement */
-		volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
-		if (walsnd->pid != 0 &&
-			walsnd->state == WALSNDSTATE_STREAMING &&
-			walsnd->sync_standby_priority > 0 &&
-			(priority == 0 ||
-			 priority > walsnd->sync_standby_priority) &&
-			!XLogRecPtrIsInvalid

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-16 Thread Michael Paquier
Thanks for the comments!

On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs  wrote:
> On 31 October 2014 00:27, Michael Paquier  wrote:
>> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby  wrote:
>>>
>>> If we stick with this version I'd argue it makes more sense to just stick
>>> the sync_node = and priority = statements into the if block and ditch the
>>> continue. 
>>
>> Let's go with the cleaner version then, I'd prefer code that can be read
>> easily for this code path. Switching back is not much complicated either.
>
> I like where you are headed with this feature set. I will do my best
> to comment and review so we get something in 9.5.
>
> Some review comments
>
> * The new function calls them a Node rather than a Standby. That is a
> good change, but it needs to be applied consistently, so we no longer
> use the word Standby anywhere near the sync rep code. I'd prefer if we
> continue to use the abbreviation sync for synchronous, since its less
> typing and avoids spelling mistakes in code and comments.
Yes I guess this makes sense.

> * Rationale...
> Robert Haas wrote
>> Interestingly, the syncrep code has in some of its code paths the idea
>> that a synchronous node is unique, while other code paths assume that
>> there can be multiple synchronous nodes. If that's fine I think that
>> it would be better to just make the code multiple-sync node aware, by
>> having a single function call in walsender.c and syncrep.c that
>> returns an integer array of WAL sender positions (WalSndCtl). as that
>> seems more extensible long-term.
>
> I thought this was the rationale for the patch, but it doesn't do
> this. If you disagree with Robert, it would be best to say so now,
> since I would guess he's expecting the patch to be doing as he asked.
> Or perhaps I misunderstand.

This comment is originally from me :)
http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6vrqo5n_t...@mail.gmail.com
Changing with my older opinion, I wrote the patch on this thread with
in mind the idea to keep the code as simple as possible, so for now it
is better to still think that we have a single sync node. Let's work
the multiple node thing once we have a better spec of how to do it,
visibly using a dedicated micro-language within s_s_names.

> * Multiple sync standbys were originally avoided because of the code
> cost and complexity at ReleaseWaiters(). I'm very keen to keep the
> code at that point very robust, so simplicity is essential. It would
> also be useful to have some kind of micro-benchmark that allows us to
> understand the overhead of various configs. Jim mentions he isn't sure
> of the performance there. I don't see too much a problem with this
> patch, but the later patches will require this, so we may as well do
> this soon.
Yes I have been playing with that with the patch series of the
previous commit fest, with something not that much kludgy.

> * We probably don't need a comment like /* Cleanup */ inserted above a
> pfree. ;-)
Sure. I'll send an updated patch tomorrow.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-11-16 Thread Simon Riggs
On 31 October 2014 00:27, Michael Paquier  wrote:
> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby  wrote:
>>
>> If we stick with this version I'd argue it makes more sense to just stick
>> the sync_node = and priority = statements into the if block and ditch the
>> continue. 
>
> Let's go with the cleaner version then, I'd prefer code that can be read
> easily for this code path. Switching back is not much complicated either.

I like where you are headed with this feature set. I will do my best
to comment and review so we get something in 9.5.

Some review comments

* The new function calls them a Node rather than a Standby. That is a
good change, but it needs to be applied consistently, so we no longer
use the word Standby anywhere near the sync rep code. I'd prefer if we
continue to use the abbreviation sync for synchronous, since its less
typing and avoids spelling mistakes in code and comments.

* Rationale...
Robert Haas wrote
> Interestingly, the syncrep code has in some of its code paths the idea
> that a synchronous node is unique, while other code paths assume that
> there can be multiple synchronous nodes. If that's fine I think that
> it would be better to just make the code multiple-sync node aware, by
> having a single function call in walsender.c and syncrep.c that
> returns an integer array of WAL sender positions (WalSndCtl). as that
> seems more extensible long-term.

I thought this was the rationale for the patch, but it doesn't do
this. If you disagree with Robert, it would be best to say so now,
since I would guess he's expecting the patch to be doing as he asked.
Or perhaps I misunderstand.

* Multiple sync standbys were originally avoided because of the code
cost and complexity at ReleaseWaiters(). I'm very keen to keep the
code at that point very robust, so simplicity is essential. It would
also be useful to have some kind of micro-benchmark that allows us to
understand the overhead of various configs. Jim mentions he isn't sure
of the performance there. I don't see too much a problem with this
patch, but the later patches will require this, so we may as well do
this soon.

* We probably don't need a comment like /* Cleanup */ inserted above a
pfree. ;-)

I see this should get committed in next few days, even outside the CF.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby  wrote:

> If we stick with this version I'd argue it makes more sense to just stick
> the sync_node = and priority = statements into the if block and ditch the
> continue. 
>
Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not much complicated either.
-- 
Michael
From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 101 ++--
 src/backend/replication/walsender.c |  36 +++--
 src/include/replication/syncrep.h   |   1 +
 3 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..848c783 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,72 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+0 : walsnd->sync_standby_priority;
+
+		/* Proceed to next if not active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Proceed to next if not streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Proceed to next one if asynchronous */
+		if (walsnd->sync_standby_priority == 0)
+			continue;
+
+		/* Proceed to next one if priority conditions not satisfied */
+		if (priority != 0 &&
+			priority <= walsnd->sync_standby_priority)
+			continue;
+
+		/* Proceed to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int			sync_node;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standbys.
 	 */
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	sync_node = SyncRepGetSynchronousNode(NULL);
 
-	for (i = 0; i < max_wal_senders; i++)
-	{
-		/* use volatile pointer to prevent code rearrangement */
-		volatile WalSnd *walsnd = &walsndctl->walsnds[i];
-
-		if (walsnd->pid != 0 &&
-			walsnd->state == WALSNDSTATE_STREAMING &&
-			walsnd->sync_standby_priority > 0 &&
-			(priority == 0 ||
-			 priority > walsnd->sync_standby_priority) &&
-			!XLogRecPtrIsInvalid(walsnd->flush))
-		{
-			priority = 

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Jim Nasby

On 10/30/14, 8:05 AM, Michael Paquier wrote:

This switches from using a single if() with multiple conditions &&'d
together to a bunch of if() continue's. I don't know if those will perform
the same, and AFAIK this is pretty performance critical.

Well, we could still use the old notation with a single if(). That's
not much complicated to change.


I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what 
the compiler will do with it though.

If we stick with this version I'd argue it makes more sense to just stick the 
sync_node = and priority = statements into the if block and ditch the continue. 

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
Thanks for your review! (No worries for creating a new thread, I don't
mind as this is not a huge patch. I think you could have downloaded
the mbox from postgresql.org btw).

On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby  wrote:
> SyncRepGetSynchronousNode():
> IMHO, the top comment should mention that if there are multiple standbys of
> the same priority it returns the first one.
OK. Corrected.

> This switches from using a single if() with multiple conditions &&'d
> together to a bunch of if() continue's. I don't know if those will perform
> the same, and AFAIK this is pretty performance critical.
Well, we could still use the old notation with a single if(). That's
not much complicated to change.

> In the comments, "Process" should be
> "Proceed".
Noted. Thanks for correcting my Frenglish.

> pg_stat_get_wal_senders():
> The original version only loops through walsnds[] one time; the new code
> loops twice, both times while holding SyncRepLock(shared). This will block
> transaction commit if syncrep is enabled. That's not great, but presumably
> this function isn't going to be called terribly often, so maybe it's OK. (On
> a side note, perhaps we should add a warning to the documentation for
> pg_stat_replication warning not to hammer it...)
Hm. For code readability I did not really want to do so, but let's do
this: let's add a new argument in SyncRepGetSynchronousNode to
retrieve the priority of each WAL sender and do a single pass through
the array such as there is no performance difference.

Updated patch attached.
Regards,
-- 
Michael
From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 99 +++--
 src/backend/replication/walsender.c | 36 +++---
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..70c799b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * sync_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage to limit the scan
+ * of the WAL sender array to one pass, limiting the amount of cycles
+ * SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+0 : walsnd->sync_standby_priority;
+
+		/*
+		 * Proceed immediately to next WAL sender if one of those
+		 * conditions is satisfied:
+		 * - Not active with PID equal to 0.
+		 * - Not streaming.
+		 * - Priority conditions not satisfied. If priority is not 0 it
+		 *   means that one potential synchronous node has already been
+		 *   found. The node selected needs as well to have the lowest
+		 *   priority in the list of WAL senders.
+		 * - Stream flush position is invalid.
+		 */
+		if (walsnd->pid == 0 ||
+			walsnd->state != WALSNDSTATE_STREAMING ||
+			walsnd->sync_standby_priority == 0 ||
+			(priority != 0 &&
+			 priority <= walsnd->sync_standby_priority) ||
+			XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}

[HACKERS] Review of Refactoring code for sync node detection

2014-10-29 Thread Jim Nasby

Original message (sorry, don't have a copy to reply to :( ): 
http://www.postgresql.org/message-id/CAB7nPqThX=WvuGA1J-_CQ293dK3FmUivuYkNvHR0W5xjEc=o...@mail.gmail.com
Commitfest URL: https://commitfest.postgresql.org/action/patch_view?id=1579

Summary:
I'd like someone to chime in on the potential performance impact. Other than 
that, looks good modulo a few grammar fixes.

Patch applies cleanly and passes make check.

Searching for both SyncRepLock and sync_standby_priority I find no other code 
this patch needs to modify.

SyncRepGetSynchronousNode():
IMHO, the top comment should mention that if there are multiple standbys of the 
same priority it returns the first one.

This switches from using a single if() with multiple conditions &&'d together 
to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK 
this is pretty performance critical.

In the comments, "Process" should be "Proceed".

The original search logic in SyncRepReleaseWaiters is correctly copied and 
negated.

pg_stat_get_wal_senders():
The original version only loops through walsnds[] one time; the new code loops 
twice, both times while holding SyncRepLock(shared). This will block 
transaction commit if syncrep is enabled. That's not great, but presumably this 
function isn't going to be called terribly often, so maybe it's OK. (On a side 
note, perhaps we should add a warning to the documentation for 
pg_stat_replication warning not to hammer it...)


+   /* Get first the priorities on each standby as long as we hold a lock */
should be
+   /* First get the priority of each standby as long as we hold a lock */
or even just
+   /* First get the priority of each standby */
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers