Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Robert Haas
On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
 On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The WALSender deliberately does *not* wake waiting users if the standby
  disconnects. Doing so would break the whole reason for having sync rep
  in the first place. What we do is allow a potential standby to take over
  the role of sync standby, if one is available. Or the failing standby
  can reconnect and then release waiters.
 
  If there is a potential standby when the synchronous standby has gone, I
  agree that it's not a good idea to release the waiting backends right away.
  In this case, those backends should wait for the next synchronous standby.
 
  On the other hand, if there is no potential standby, I think that the
  waiting backends should not wait for the timeout and should wake up as
  soon as the synchronous standby has gone. Otherwise, those backends are
  suspended for a long time (i.e., until the timeout expires), which would
  reduce availability, I'm afraid.
 
  Keeping those backends waiting for the failed standby to reconnect is an
  idea. But this looks like the behavior for allow_standalone_primary = off.
  If allow_standalone_primary = on, it looks more natural to make the
  primary work alone without waiting for the timeout.

 Also I think that the waiting backends should be released as soon as the
 last synchronous standby switches to asynchronous mode. Since there is
 no standby which is planning to reconnect, obviously they no longer need
 to wait.

 I've not done this, but we could.

 It can't run in a WALSender, so this code would need to live in either
 WALWriter or BgWriter.

I would have thought that the last WALSender to switch to async would
have been responsible for doing this at that time.  Why doesn't that
work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Simon Riggs
On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote:
 On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
  On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
   On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:

 
  Also I think that the waiting backends should be released as soon as the
  last synchronous standby switches to asynchronous mode. Since there is
  no standby which is planning to reconnect, obviously they no longer need
  to wait.
 
  I've not done this, but we could.
 
  It can't run in a WALSender, so this code would need to live in either
  WALWriter or BgWriter.
 
 I would have thought that the last WALSender to switch to async would
 have been responsible for doing this at that time.  Why doesn't that
 work?

The main time we get extended waits is when there are no WALsenders.
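
One way to read this: a release done by the last WALSender only covers
backends that are already waiting at that moment. Backends that commit
afterwards queue up with no WALSender left to wake them, so a longer-lived
process would have to check. Below is a toy Python model of that reading.
Every name in it (WaitQueue, on_walsender_exit, background_check) is invented
for illustration and is not taken from the patch.

```python
class WaitQueue:
    """Toy model of backends waiting for synchronous replication."""

    def __init__(self) -> None:
        self.walsenders = 0   # number of running WALSender processes
        self.waiting = []     # backends blocked waiting for a standby reply

    def commit(self, backend: str) -> None:
        # A committing backend queues up and waits.
        self.waiting.append(backend)

    def release_all(self) -> list:
        # Wake every backend currently in the queue.
        released, self.waiting = self.waiting, []
        return released


def on_walsender_exit(queue: WaitQueue) -> list:
    """Hypothetical release hook run by a WALSender as it exits."""
    queue.walsenders -= 1
    if queue.walsenders == 0:
        return queue.release_all()   # last one out wakes the current waiters
    return []


def background_check(queue: WaitQueue) -> list:
    """Hypothetical periodic check in a long-lived process (e.g. WALWriter)."""
    if queue.walsenders == 0:
        return queue.release_all()
    return []


q = WaitQueue()
q.walsenders = 1
q.commit("backend A")
first = on_walsender_exit(q)     # the last WALSender wakes backend A
q.commit("backend B")            # commits after every WALSender is gone
second = background_check(q)     # only a separate process can wake backend B
```

In this model backend B would wait until the timeout if the release logic
lived only in the WALSender exit path.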

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 




Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-06 Thread Simon Riggs
On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: 
 On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The WALSender deliberately does *not* wake waiting users if the standby
  disconnects. Doing so would break the whole reason for having sync rep
  in the first place. What we do is allow a potential standby to take over
  the role of sync standby, if one is available. Or the failing standby
  can reconnect and then release waiters.
 
  If there is a potential standby when the synchronous standby has gone, I
  agree that it's not a good idea to release the waiting backends right away.
  In this case, those backends should wait for the next synchronous standby.
 
  On the other hand, if there is no potential standby, I think that the
  waiting backends should not wait for the timeout and should wake up as
  soon as the synchronous standby has gone. Otherwise, those backends are
  suspended for a long time (i.e., until the timeout expires), which would
  reduce availability, I'm afraid.
 
  Keeping those backends waiting for the failed standby to reconnect is an
  idea. But this looks like the behavior for allow_standalone_primary = off.
  If allow_standalone_primary = on, it looks more natural to make the
  primary work alone without waiting for the timeout.
 
 Also I think that the waiting backends should be released as soon as the
 last synchronous standby switches to asynchronous mode. Since there is
 no standby which is planning to reconnect, obviously they no longer need
 to wait.

I've not done this, but we could.

It can't run in a WALSender, so this code would need to live in either
WALWriter or BgWriter.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 





How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Fujii Masao
On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The WALSender deliberately does *not* wake waiting users if the standby
 disconnects. Doing so would break the whole reason for having sync rep
 in the first place. What we do is allow a potential standby to take over
 the role of sync standby, if one is available. Or the failing standby
 can reconnect and then release waiters.

 If there is a potential standby when the synchronous standby has gone, I
 agree that it's not a good idea to release the waiting backends right away.
 In this case, those backends should wait for the next synchronous standby.

 On the other hand, if there is no potential standby, I think that the
 waiting backends should not wait for the timeout and should wake up as
 soon as the synchronous standby has gone. Otherwise, those backends are
 suspended for a long time (i.e., until the timeout expires), which would
 reduce availability, I'm afraid.

 Keeping those backends waiting for the failed standby to reconnect is an
 idea. But this looks like the behavior for allow_standalone_primary = off.
 If allow_standalone_primary = on, it looks more natural to make the
 primary work alone without waiting for the timeout.

Also I think that the waiting backends should be released as soon as the
last synchronous standby switches to asynchronous mode. Since there is
no standby which is planning to reconnect, obviously they no longer need
to wait.
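
To make the cases above concrete, here is a rough Python sketch of the
release rule being proposed. It is only an illustration under assumptions,
not code from the patch: PrimaryState, potential_standbys and
should_release_waiters are made-up names, and only allow_standalone_primary
comes from this discussion.

```python
from dataclasses import dataclass


@dataclass
class PrimaryState:
    # All fields are hypothetical; they only model the cases in this thread.
    sync_standby_connected: bool     # a synchronous standby is currently attached
    potential_standbys: int          # connected standbys that could take over as sync
    allow_standalone_primary: bool   # setting named in this thread


def should_release_waiters(state: PrimaryState) -> bool:
    """Return True if backends waiting on sync rep should be woken right now."""
    if state.sync_standby_connected:
        # Normal operation: waiters are released by the standby's replies instead.
        return False
    if state.potential_standbys > 0:
        # A potential standby can take over, so keep waiting for it.
        return False
    # No sync standby and nobody to take over: release only if the primary
    # is allowed to run alone; otherwise keep waiting for a reconnect.
    return state.allow_standalone_primary
```

With no synchronous standby, no potential standby, and
allow_standalone_primary = on, the function returns True; that is the case
where the backends should wake up without waiting for the timeout.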

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
