Re: [HACKERS] Review of Refactoring code for sync node detection
On Fri, Dec 12, 2014 at 9:38 PM, Heikki Linnakangas wrote: > On 12/12/2014 04:29 AM, Michael Paquier wrote: >> >> On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas < >> hlinnakan...@vmware.com> wrote: >> >>> I propose the attached (I admit I haven't tested it). >>> >> Actually if you do it this way I think that it would be worth adding the >> small optimization Fujii-san mentioned upthread: if priority is equal to >> 1, >> we leave the loop earlier and return immediately the pointer. All those >> things gathered give the patch attached, that I actually tested FWIW with >> multiple standbys and multiple entries in s_s_names. > > > Ok, committed. Thanks! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On 12/12/2014 04:29 AM, Michael Paquier wrote: On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: I propose the attached (I admit I haven't tested it). Actually if you do it this way I think that it would be worth adding the small optimization Fujii-san mentioned upthread: if priority is equal to 1, we leave the loop earlier and return immediately the pointer. All those things gathered give the patch attached, that I actually tested FWIW with multiple standbys and multiple entries in s_s_names. Ok, committed. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Thu, Dec 11, 2014 at 10:07 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: > * I don't like filling the priorities-array in > SyncRepGetSynchronousStandby(). It might be convenient for the one caller > that needs it, but it seems pretty ad hoc. > > In fact, I don't really see the point of having the array at all. You > could just print the priority in the main loop in > pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock > while reading the priority, but it seems over-zealous to be so strict about > that in pg_stat_wal_senders(), since it's just an informational view, and > priority changes so rarely that it's highly unlikely to hit a race > condition there. Also note that when you change the list of synchronous > standbys in the config file, and SIGHUP, the walsender processes will > update their priorities in random order. So the idea that you get a > consistent snapshot of the priorities is a bit bogus anyway. Also note that > between filling the priorities array and the main loop in > pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's > pid. With the current coding, you'll print an uninitialized value from the > array if that happens. So the only thing that holding the SyncRepLock > really protects from is a "torn" read of the value, if reading an int is > not atomic. We could use the spinlock to protect from that if we care. > That's fair, and it simplifies the whole process as well as the API able to fetch the synchronous WAL sender. > * Would be nicer to return a pointer, than index into the wal senders > array. That's what the caller really wants. > > I propose the attached (I admit I haven't tested it). > Actually if you do it this way I think that it would be worth adding the small optimization Fujii-san mentioned upthread: if priority is equal to 1, we leave the loop earlier and return immediately the pointer. All those things gathered give the patch attached, that I actually tested FWIW with multiple standbys and multiple entries in s_s_names. Regards, -- Michael diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..34530a0 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -358,6 +358,62 @@ SyncRepInitConfig(void) } /* + * Find the WAL sender servicing the synchronous standby with the lowest + * priority value, or NULL if no synchronous standby is connected. If there + * are multiple nodes with the same lowest priority value, the first node + * found is selected. The caller must hold SyncRepLock. + */ +WalSnd * +SyncRepGetSynchronousStandby(void) +{ + WalSnd *result = NULL; + int result_priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + int this_priority; + + /* Must be active */ + if (walsnd->pid == 0) + continue; + + /* Must be streaming */ + if (walsnd->state != WALSNDSTATE_STREAMING) + continue; + + /* Must be synchronous */ + this_priority = walsnd->sync_standby_priority; + if (this_priority == 0) + continue; + + /* Must have a lower priority value than any previous ones */ + if (result != NULL && result_priority <= this_priority) + continue; + + /* Must have a valid flush position */ + if (XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + result = (WalSnd *) walsnd; + result_priority = this_priority; + + /* + * If priority is equal to 1, we are sure that there are no other + * WAL senders that could be synchronous with a lower prioroty, + * hence leave immediately with this one. + */ + if (this_priority == 1) + return result; + } + + return result; +} + +/* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. * @@ -368,11 +424,9 @@ void SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; - volatile WalSnd *syncWalSnd = NULL; + WalSnd *syncWalSnd; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +442,13 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first mentioned standby. If you change this, also - * change pg_stat_get_wal_senders(). + * then we
Re: [HACKERS] Review of Refactoring code for sync node detection
On 11/18/2014 11:23 PM, Michael Paquier wrote: On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs wrote: Can we just wait on this patch until we have the whole feature? Well, this may take some time to even define, and even if goals are clearly defined this may take even more time to have a prototype to discuss about. We quite often break larger patches down into smaller ones, but if they arrive in lots of small pieces its more difficult to understand and correct issues that are happening on the larger scale. Churning the same piece of code multiple times is rather mind numbing. Hm. I think that we are losing ourselves in this thread. The primary goal is to remove a code duplication between syncrep.c and walsender,c that exists since 9.1. Would it be possible to focus only on that for now? That's still the topic of this CF. Some comments on this patch: * I don't like filling the priorities-array in SyncRepGetSynchronousStandby(). It might be convenient for the one caller that needs it, but it seems pretty ad hoc. In fact, I don't really see the point of having the array at all. You could just print the priority in the main loop in pg_stat_get_wal_senders(). Yeah, nominally you need to hold SyncRepLock while reading the priority, but it seems over-zealous to be so strict about that in pg_stat_wal_senders(), since it's just an informational view, and priority changes so rarely that it's highly unlikely to hit a race condition there. Also note that when you change the list of synchronous standbys in the config file, and SIGHUP, the walsender processes will update their priorities in random order. So the idea that you get a consistent snapshot of the priorities is a bit bogus anyway. Also note that between filling the priorities array and the main loop in pg_stat_get_wal_senders(), a new WAL sender might connect and set a slot's pid. With the current coding, you'll print an uninitialized value from the array if that happens. So the only thing that holding the SyncRepLock really protects from is a "torn" read of the value, if reading an int is not atomic. We could use the spinlock to protect from that if we care. * Would be nicer to return a pointer, than index into the wal senders array. That's what the caller really wants. I propose the attached (I admit I haven't tested it). - Heikki diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..da89a3b 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -358,6 +358,54 @@ SyncRepInitConfig(void) } /* + * Find the WAL sender servicing the synchronous standby with the lowest + * priority value, or NULL if no synchronous standby is connected. If there + * are multiple nodes with the same lowest priority value, the first node + * found is selected. The caller must hold SyncRepLock. + */ +WalSnd * +SyncRepGetSynchronousStandby(void) +{ + WalSnd *result = NULL; + int result_priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + int this_priority; + + /* Must be active */ + if (walsnd->pid == 0) + continue; + + /* Must be streaming */ + if (walsnd->state != WALSNDSTATE_STREAMING) + continue; + + /* Must be synchronous */ + this_priority = walsnd->sync_standby_priority; + if (this_priority == 0) + continue; + + /* Must have a lower priority value than any previous ones */ + if (result != NULL && result_priority <= this_priority) + continue; + + /* Must have a valid flush position */ + if (XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + result = (WalSnd *) walsnd; + result_priority = this_priority; + } + + return result; +} + +/* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. * @@ -368,11 +416,9 @@ void SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; - volatile WalSnd *syncWalSnd = NULL; + WalSnd *syncWalSnd; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +434,13 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first mentioned standby. If you change this, also - * cha
Re: [HACKERS] Review of Refactoring code for sync node detection
On Tue, Nov 18, 2014 at 6:33 PM, Simon Riggs wrote: > Can we just wait on this patch until we have the whole feature? > Well, this may take some time to even define, and even if goals are clearly defined this may take even more time to have a prototype to discuss about. > We quite often break larger patches down into smaller ones, but if > they arrive in lots of small pieces its more difficult to understand > and correct issues that are happening on the larger scale. Churning > the same piece of code multiple times is rather mind numbing. > Hm. I think that we are losing ourselves in this thread. The primary goal is to remove a code duplication between syncrep.c and walsender,c that exists since 9.1. Would it be possible to focus only on that for now? That's still the topic of this CF. -- Michael
Re: [HACKERS] Review of Refactoring code for sync node detection
On 18 November 2014 00:02, Michael Paquier wrote: >> I've not read the patches yet. But while I was reading the code around >> sync node detection, I thought that it might be good idea to treat the >> node with priority as special. That is, if we find the active node with >> the priority 1, we can immediately go out of the sync-node-detection-loop. >> >> In many cases we can expect that the sync standby has the priority 1. >> If yes, treating the priority 1 as special is good way to do, I think. >> Thought? Agreed > That would really save some cycles. Now we still need to fetch the > priority numbers of all nodes for pg_stat_get_wal_senders, so in this > case scanning all the entries in the WAL sender array is necessary. > This optimization is orthogonal to the refactoring patch btw, no? To the refactoring patch, possibly. But not to the changes you're planning to make. ISTM that we should remember the priority list of nodes and check them in that order by direct lookups to array elements. That will work for 1 or more nodes and it also works better with large numbers of walsenders. Can we just wait on this patch until we have the whole feature? We quite often break larger patches down into smaller ones, but if they arrive in lots of small pieces its more difficult to understand and correct issues that are happening on the larger scale. Churning the same piece of code multiple times is rather mind numbing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Tue, Nov 18, 2014 at 3:03 AM, Fujii Masao wrote: > On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier > wrote: >> On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier >> wrote: >>> I'll send an updated patch tomorrow. >> >> Here are updated versions. I divided the patch into two for clarity, >> the first one is the actual refactoring patch, renaming >> SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha, >> like updating synchronous to sync in the comments as you mentioned) >> such as the namings have no conflicts. >> >> The second one updates the syncrep code, including WAL sender and WAL >> receiver, and its surroundings to use the term "node" instead of >> "standby". This brings in the code the idea that a node using >> replication APIs is not necessarily a standby, making it more generic. >> This can be applied on top of the refactoring patch. If any other >> folks (Heikki or Fujii-san?) have comments about this idea feel free. > > I've not read the patches yet. But while I was reading the code around > sync node detection, I thought that it might be good idea to treat the > node with priority as special. That is, if we find the active node with > the priority 1, we can immediately go out of the sync-node-detection-loop. > > In many cases we can expect that the sync standby has the priority 1. > If yes, treating the priority 1 as special is good way to do, I think. > Thought? That would really save some cycles. Now we still need to fetch the priority numbers of all nodes for pg_stat_get_wal_senders, so in this case scanning all the entries in the WAL sender array is necessary. This optimization is orthogonal to the refactoring patch btw, no? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Mon, Nov 17, 2014 at 2:20 PM, Michael Paquier wrote: > On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier > wrote: >> I'll send an updated patch tomorrow. > > Here are updated versions. I divided the patch into two for clarity, > the first one is the actual refactoring patch, renaming > SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha, > like updating synchronous to sync in the comments as you mentioned) > such as the namings have no conflicts. > > The second one updates the syncrep code, including WAL sender and WAL > receiver, and its surroundings to use the term "node" instead of > "standby". This brings in the code the idea that a node using > replication APIs is not necessarily a standby, making it more generic. > This can be applied on top of the refactoring patch. If any other > folks (Heikki or Fujii-san?) have comments about this idea feel free. I've not read the patches yet. But while I was reading the code around sync node detection, I thought that it might be good idea to treat the node with priority as special. That is, if we find the active node with the priority 1, we can immediately go out of the sync-node-detection-loop. In many cases we can expect that the sync standby has the priority 1. If yes, treating the priority 1 as special is good way to do, I think. Thought? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Mon, Nov 17, 2014 at 10:00 PM, Simon Riggs wrote: > On 16 November 2014 12:07, Michael Paquier wrote: > >> Let's work >> the multiple node thing once we have a better spec of how to do it, >> visibly using a dedicated micro-language within s_s_names. > > Hmm, please make sure that is a new post. That is easily something I > could disagree with, even though I support the need for more > functionality. Sure. I am not really on that yet though :) -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On 16 November 2014 12:07, Michael Paquier wrote: > Let's work > the multiple node thing once we have a better spec of how to do it, > visibly using a dedicated micro-language within s_s_names. Hmm, please make sure that is a new post. That is easily something I could disagree with, even though I support the need for more functionality. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Sun, Nov 16, 2014 at 9:07 PM, Michael Paquier wrote: > I'll send an updated patch tomorrow. Here are updated versions. I divided the patch into two for clarity, the first one is the actual refactoring patch, renaming SyncRepGetSynchronousNode to SyncRepGetSynchronousStandby (+alpha, like updating synchronous to sync in the comments as you mentioned) such as the namings have no conflicts. The second one updates the syncrep code, including WAL sender and WAL receiver, and its surroundings to use the term "node" instead of "standby". This brings in the code the idea that a node using replication APIs is not necessarily a standby, making it more generic. This can be applied on top of the refactoring patch. If any other folks (Heikki or Fujii-san?) have comments about this idea feel free. Regards, -- Michael From 51e9988ff44b7c2b716e3a0da3f1d1c9359a1d79 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 30 Oct 2014 22:01:10 +0900 Subject: [PATCH 1/2] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code. --- src/backend/replication/syncrep.c | 97 +++-- src/backend/replication/walsender.c | 35 +++-- src/include/replication/syncrep.h | 1 + 3 files changed, 78 insertions(+), 55 deletions(-) diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..f838ad0 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -357,6 +357,70 @@ SyncRepInitConfig(void) } } + +/* + * Obtain position of sync standby in the array referencing all the WAL + * senders, or -1 if no sync node can be found. The caller of this function + * should take a lock on SyncRepLock. If there are multiple nodes with + * the same lowest priority value, the first node found is selected. + * sync_priority is a preallocated array of size max_wal_senders that can + * be used to retrieve the priority of each WAL sender. Its inclusion in + * this function has the advantage to limit the scan of the WAL sender + * array to one pass, limiting the amount of cycles SyncRepLock is taken. + */ +int +SyncRepGetSynchronousStandby(int *node_priority) +{ + int sync_node = -1; + int priority = 0; + int i; + + /* Scan WAL senders and find the sync node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + + /* First get the priority of this WAL sender */ + if (node_priority) + node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? +0 : walsnd->sync_standby_priority; + + /* Proceed to next if not active */ + if (walsnd->pid == 0) + continue; + + /* Proceed to next if not streaming */ + if (walsnd->state != WALSNDSTATE_STREAMING) + continue; + + /* Proceed to next one if asynchronous */ + if (walsnd->sync_standby_priority == 0) + continue; + + /* Proceed to next one if priority conditions not satisfied */ + if (priority != 0 && + priority <= walsnd->sync_standby_priority) + continue; + + /* Proceed to next one if flush position is invalid */ + if (XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + /* + * We have a potential sync candidate, choose it as the sync + * node for the time being before going through the other nodes + * listed in the WAL sender array. + */ + sync_node = i; + + /* Update priority to current value of WAL sender */ + priority = walsnd->sync_standby_priority; + } + + return sync_node; +} + /* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. @@ -369,10 +433,9 @@ SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; volatile WalSnd *syncWalSnd = NULL; + int sync_node; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +451,14 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first mentioned standby. If you change this, also - * change pg_stat_get_wal_senders(). + * then we use the first mentioned standby. */ LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); + sync_node = SyncRepGetSynchronousStandby(NULL); - for (i = 0; i < max_wal_senders; i++) - { - /* use volatile pointer to prevent code rearrangement */ - volatile WalSnd *walsnd = &walsndctl->walsnds[i]; - - if (walsnd->pid != 0 && - walsnd->state == WALSNDSTATE_STREAMING && - walsnd->sync_standby_priority > 0 && - (priority == 0 || - priority > walsnd->sync_standby_priority) && - !XLogRecPtrIsInvalid
Re: [HACKERS] Review of Refactoring code for sync node detection
Thanks for the comments! On Sun, Nov 16, 2014 at 8:32 PM, Simon Riggs wrote: > On 31 October 2014 00:27, Michael Paquier wrote: >> On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby wrote: >>> >>> If we stick with this version I'd argue it makes more sense to just stick >>> the sync_node = and priority = statements into the if block and ditch the >>> continue. >> >> Let's go with the cleaner version then, I'd prefer code that can be read >> easily for this code path. Switching back is not much complicated either. > > I like where you are headed with this feature set. I will do my best > to comment and review so we get something in 9.5. > > Some review comments > > * The new function calls them a Node rather than a Standby. That is a > good change, but it needs to be applied consistently, so we no longer > use the word Standby anywhere near the sync rep code. I'd prefer if we > continue to use the abbreviation sync for synchronous, since its less > typing and avoids spelling mistakes in code and comments. Yes I guess this makes sense. > * Rationale... > Robert Haas wrote >> Interestingly, the syncrep code has in some of its code paths the idea >> that a synchronous node is unique, while other code paths assume that >> there can be multiple synchronous nodes. If that's fine I think that >> it would be better to just make the code multiple-sync node aware, by >> having a single function call in walsender.c and syncrep.c that >> returns an integer array of WAL sender positions (WalSndCtl). as that >> seems more extensible long-term. > > I thought this was the rationale for the patch, but it doesn't do > this. If you disagree with Robert, it would be best to say so now, > since I would guess he's expecting the patch to be doing as he asked. > Or perhaps I misunderstand. This comment is originally from me :) http://www.postgresql.org/message-id/CAB7nPqTtmSo0YX1_3_PykpKbdGUi3UeFd0_=2-6vrqo5n_t...@mail.gmail.com Changing with my older opinion, I wrote the patch on this thread with in mind the idea to keep the code as simple as possible, so for now it is better to still think that we have a single sync node. Let's work the multiple node thing once we have a better spec of how to do it, visibly using a dedicated micro-language within s_s_names. > * Multiple sync standbys were originally avoided because of the code > cost and complexity at ReleaseWaiters(). I'm very keen to keep the > code at that point very robust, so simplicity is essential. It would > also be useful to have some kind of micro-benchmark that allows us to > understand the overhead of various configs. Jim mentions he isn't sure > of the performance there. I don't see too much a problem with this > patch, but the later patches will require this, so we may as well do > this soon. Yes I have been playing with that with the patch series of the previous commit fest, with something not that much kludgy. > * We probably don't need a comment like /* Cleanup */ inserted above a > pfree. ;-) Sure. I'll send an updated patch tomorrow. Regards, -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On 31 October 2014 00:27, Michael Paquier wrote: > On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby wrote: >> >> If we stick with this version I'd argue it makes more sense to just stick >> the sync_node = and priority = statements into the if block and ditch the >> continue. > > Let's go with the cleaner version then, I'd prefer code that can be read > easily for this code path. Switching back is not much complicated either. I like where you are headed with this feature set. I will do my best to comment and review so we get something in 9.5. Some review comments * The new function calls them a Node rather than a Standby. That is a good change, but it needs to be applied consistently, so we no longer use the word Standby anywhere near the sync rep code. I'd prefer if we continue to use the abbreviation sync for synchronous, since its less typing and avoids spelling mistakes in code and comments. * Rationale... Robert Haas wrote > Interestingly, the syncrep code has in some of its code paths the idea > that a synchronous node is unique, while other code paths assume that > there can be multiple synchronous nodes. If that's fine I think that > it would be better to just make the code multiple-sync node aware, by > having a single function call in walsender.c and syncrep.c that > returns an integer array of WAL sender positions (WalSndCtl). as that > seems more extensible long-term. I thought this was the rationale for the patch, but it doesn't do this. If you disagree with Robert, it would be best to say so now, since I would guess he's expecting the patch to be doing as he asked. Or perhaps I misunderstand. * Multiple sync standbys were originally avoided because of the code cost and complexity at ReleaseWaiters(). I'm very keen to keep the code at that point very robust, so simplicity is essential. It would also be useful to have some kind of micro-benchmark that allows us to understand the overhead of various configs. Jim mentions he isn't sure of the performance there. I don't see too much a problem with this patch, but the later patches will require this, so we may as well do this soon. * We probably don't need a comment like /* Cleanup */ inserted above a pfree. ;-) I see this should get committed in next few days, even outside the CF. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby wrote: > If we stick with this version I'd argue it makes more sense to just stick > the sync_node = and priority = statements into the if block and ditch the > continue. > Let's go with the cleaner version then, I'd prefer code that can be read easily for this code path. Switching back is not much complicated either. -- Michael From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 30 Oct 2014 22:01:10 +0900 Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code. --- src/backend/replication/syncrep.c | 101 ++-- src/backend/replication/walsender.c | 36 +++-- src/include/replication/syncrep.h | 1 + 3 files changed, 82 insertions(+), 56 deletions(-) diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..848c783 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -357,6 +357,72 @@ SyncRepInitConfig(void) } } + +/* + * Obtain position of synchronous standby in the array referencing all + * the WAL senders, or -1 if no synchronous node can be found. The caller + * of this function should take a lock on SyncRepLock. If there are + * multiple nodes with the same lowest priority value, the first node + * found is selected. + * sync_priority is a preallocated array of size max_wal_senders that + * can be used to retrieve the priority of each WAL sender. Its + * inclusion in this function has the advantage to limit the scan + * of the WAL sender array to one pass, limiting the amount of cycles + * SyncRepLock is taken. + */ +int +SyncRepGetSynchronousNode(int *node_priority) +{ + int sync_node = -1; + int priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + + /* First get the priority of this WAL sender */ + if (node_priority) + node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? +0 : walsnd->sync_standby_priority; + + /* Proceed to next if not active */ + if (walsnd->pid == 0) + continue; + + /* Proceed to next if not streaming */ + if (walsnd->state != WALSNDSTATE_STREAMING) + continue; + + /* Proceed to next one if asynchronous */ + if (walsnd->sync_standby_priority == 0) + continue; + + /* Proceed to next one if priority conditions not satisfied */ + if (priority != 0 && + priority <= walsnd->sync_standby_priority) + continue; + + /* Proceed to next one if flush position is invalid */ + if (XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + /* + * We have a potential synchronous candidate, choose it as the + * synchronous node for the time being before going through the + * other nodes listed in the WAL sender array. + */ + sync_node = i; + + /* Update priority to current value of WAL sender */ + priority = walsnd->sync_standby_priority; + } + + return sync_node; +} + /* * Update the LSNs on each queue based upon our latest state. This * implements a simple policy of first-valid-standby-releases-waiter. @@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void) { volatile WalSndCtlData *walsndctl = WalSndCtl; volatile WalSnd *syncWalSnd = NULL; + int sync_node; int numwrite = 0; int numflush = 0; - int priority = 0; - int i; /* * If this WALSender is serving a standby that is not on the list of @@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void) /* * We're a potential sync standby. Release waiters if we are the highest * priority standby. If there are multiple standbys with same priorities - * then we use the first mentioned standby. If you change this, also - * change pg_stat_get_wal_senders(). + * then we use the first mentioned standbys. */ LWLockAcquire(SyncRepLock, LW_EXCLUSIVE); + sync_node = SyncRepGetSynchronousNode(NULL); - for (i = 0; i < max_wal_senders; i++) - { - /* use volatile pointer to prevent code rearrangement */ - volatile WalSnd *walsnd = &walsndctl->walsnds[i]; - - if (walsnd->pid != 0 && - walsnd->state == WALSNDSTATE_STREAMING && - walsnd->sync_standby_priority > 0 && - (priority == 0 || - priority > walsnd->sync_standby_priority) && - !XLogRecPtrIsInvalid(walsnd->flush)) - { - priority =
Re: [HACKERS] Review of Refactoring code for sync node detection
On 10/30/14, 8:05 AM, Michael Paquier wrote: This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK this is pretty performance critical. Well, we could still use the old notation with a single if(). That's not much complicated to change. I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what the compiler will do with it though. If we stick with this version I'd argue it makes more sense to just stick the sync_node = and priority = statements into the if block and ditch the continue. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review of Refactoring code for sync node detection
Thanks for your review! (No worries for creating a new thread, I don't mind as this is not a huge patch. I think you could have downloaded the mbox from postgresql.org btw). On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby wrote: > SyncRepGetSynchronousNode(): > IMHO, the top comment should mention that if there are multiple standbys of > the same priority it returns the first one. OK. Corrected. > This switches from using a single if() with multiple conditions &&'d > together to a bunch of if() continue's. I don't know if those will perform > the same, and AFAIK this is pretty performance critical. Well, we could still use the old notation with a single if(). That's not much complicated to change. > In the comments, "Process" should be > "Proceed". Noted. Thanks for correcting my Frenglish. > pg_stat_get_wal_senders(): > The original version only loops through walsnds[] one time; the new code > loops twice, both times while holding SyncRepLock(shared). This will block > transaction commit if syncrep is enabled. That's not great, but presumably > this function isn't going to be called terribly often, so maybe it's OK. (On > a side note, perhaps we should add a warning to the documentation for > pg_stat_replication warning not to hammer it...) Hm. For code readability I did not really want to do so, but let's do this: let's add a new argument in SyncRepGetSynchronousNode to retrieve the priority of each WAL sender and do a single pass through the array such as there is no performance difference. Updated patch attached. Regards, -- Michael From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 30 Oct 2014 22:01:10 +0900 Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array This patch is made to remove code duplication in walsender.c and syncrep.c in order to detect what is the node with the lowest strictly-positive priority, facilitating maintenance of this code. --- src/backend/replication/syncrep.c | 99 +++-- src/backend/replication/walsender.c | 36 +++--- src/include/replication/syncrep.h | 1 + 3 files changed, 80 insertions(+), 56 deletions(-) diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c index aa54bfb..70c799b 100644 --- a/src/backend/replication/syncrep.c +++ b/src/backend/replication/syncrep.c @@ -5,7 +5,7 @@ * Synchronous replication is new as of PostgreSQL 9.1. * * If requested, transaction commits wait until their commit LSN is - * acknowledged by the sync standby. + * acknowledged by the synchronous standby. * * This module contains the code for waiting and release of backends. * All code in this module executes on the primary. The core streaming @@ -357,6 +357,70 @@ SyncRepInitConfig(void) } } + +/* + * Obtain position of synchronous standby in the array referencing all + * the WAL senders, or -1 if no synchronous node can be found. The caller + * of this function should take a lock on SyncRepLock. If there are + * multiple nodes with the same lowest priority value, the first node + * found is selected. + * sync_priority is a preallocated array of size max_wal_senders that + * can be used to retrieve the priority of each WAL sender. Its + * inclusion in this function has the advantage to limit the scan + * of the WAL sender array to one pass, limiting the amount of cycles + * SyncRepLock is taken. + */ +int +SyncRepGetSynchronousNode(int *node_priority) +{ + int sync_node = -1; + int priority = 0; + int i; + + /* Scan WAL senders and find synchronous node if any */ + for (i = 0; i < max_wal_senders; i++) + { + /* Use volatile pointer to prevent code rearrangement */ + volatile WalSnd *walsnd = &WalSndCtl->walsnds[i]; + + /* First get the priority of this WAL sender */ + if (node_priority) + node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? +0 : walsnd->sync_standby_priority; + + /* + * Proceed immediately to next WAL sender if one of those + * conditions is satisfied: + * - Not active with PID equal to 0. + * - Not streaming. + * - Priority conditions not satisfied. If priority is not 0 it + * means that one potential synchronous node has already been + * found. The node selected needs as well to have the lowest + * priority in the list of WAL senders. + * - Stream flush position is invalid. + */ + if (walsnd->pid == 0 || + walsnd->state != WALSNDSTATE_STREAMING || + walsnd->sync_standby_priority == 0 || + (priority != 0 && + priority <= walsnd->sync_standby_priority) || + XLogRecPtrIsInvalid(walsnd->flush)) + continue; + + /* + * We have a potential synchronous candidate, choose it as the + * synchronous node for the time being before going through the + * other nodes listed in the WAL sender array. + */ + sync_node = i; + + /* Update priority to current value of WAL sender */ + priority = walsnd->sync_standby_priority; + } + + return sync_node; +}
[HACKERS] Review of Refactoring code for sync node detection
Original message (sorry, don't have a copy to reply to :( ): http://www.postgresql.org/message-id/CAB7nPqThX=WvuGA1J-_CQ293dK3FmUivuYkNvHR0W5xjEc=o...@mail.gmail.com Commitfest URL: https://commitfest.postgresql.org/action/patch_view?id=1579 Summary: I'd like someone to chime in on the potential performance impact. Other than that, looks good modulo a few grammar fixes. Patch applies cleanly and passes make check. Searching for both SyncRepLock and sync_standby_priority I find no other code this patch needs to modify. SyncRepGetSynchronousNode(): IMHO, the top comment should mention that if there are multiple standbys of the same priority it returns the first one. This switches from using a single if() with multiple conditions &&'d together to a bunch of if() continue's. I don't know if those will perform the same, and AFAIK this is pretty performance critical. In the comments, "Process" should be "Proceed". The original search logic in SyncRepReleaseWaiters is correctly copied and negated. pg_stat_get_wal_senders(): The original version only loops through walsnds[] one time; the new code loops twice, both times while holding SyncRepLock(shared). This will block transaction commit if syncrep is enabled. That's not great, but presumably this function isn't going to be called terribly often, so maybe it's OK. (On a side note, perhaps we should add a warning to the documentation for pg_stat_replication warning not to hammer it...) + /* Get first the priorities on each standby as long as we hold a lock */ should be + /* First get the priority of each standby as long as we hold a lock */ or even just + /* First get the priority of each standby */ -- Jim C. Nasby, Data Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers