Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Robert Haas
On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
 On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The WALSender deliberately does *not* wake waiting users if the standby
  disconnects. Doing so would break the whole reason for having sync rep
  in the first place. What we do is allow a potential standby to takeover
  the role of sync standby, if one is available. Or the failing standby
  can reconnect and then release waiters.
 
  If there is potential standby when synchronous standby has gone, I agree
  that it's not good idea to release the waiting backends soon. In this case,
  those backends should wait for next synchronous standby.
 
  On the other hand, if there is no potential standby, I think that the 
  waiting
  backends should not wait for the timeout and should wake up as soon as
  synchronous standby has gone. Otherwise, those backends suspend for
  a long time (i.e., until the timeout expires), which would decrease the
  high-availability, I'm afraid.
 
  Keeping those backends waiting for the failed standby to reconnect is an
  idea. But this looks like the behavior for allow_standalone_primary = 
  off.
  If allow_standalone_primary = on, it looks more natural to make the
  primary work alone without waiting the timeout.

 Also I think that the waiting backends should be released as soon as the
 last synchronous standby switches to asynchronous mode. Since there is
 no standby which is planning to reconnect, obviously they no longer need
 to wait.

 I've not done this, but we could.

 It can't run in a WALSender, so this code would need to live in either
 WALWriter or BgWriter.

I would have thought that the last WALSender to switch to async would
have been responsible for doing this at that time.  Why doesn't that
work?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-07 Thread Simon Riggs
On Mon, 2011-03-07 at 13:15 -0500, Robert Haas wrote:
 On Sun, Mar 6, 2011 at 5:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote:
  On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
   On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com 
   wrote:

 
  Also I think that the waiting backends should be released as soon as the
  last synchronous standby switches to asynchronous mode. Since there is
  no standby which is planning to reconnect, obviously they no longer need
  to wait.
 
  I've not done this, but we could.
 
  It can't run in a WALSender, so this code would need to live in either
  WALWriter or BgWriter.
 
 I would have thought that the last WALSender to switch to async would
 have been responsible for doing this at that time.  Why doesn't that
 work?

The main time we get extended waits is when there are no WALsenders.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-06 Thread Simon Riggs
On Fri, 2011-03-04 at 16:57 +0900, Fujii Masao wrote: 
 On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
  On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
  The WALSender deliberately does *not* wake waiting users if the standby
  disconnects. Doing so would break the whole reason for having sync rep
  in the first place. What we do is allow a potential standby to takeover
  the role of sync standby, if one is available. Or the failing standby
  can reconnect and then release waiters.
 
  If there is potential standby when synchronous standby has gone, I agree
  that it's not good idea to release the waiting backends soon. In this case,
  those backends should wait for next synchronous standby.
 
  On the other hand, if there is no potential standby, I think that the 
  waiting
  backends should not wait for the timeout and should wake up as soon as
  synchronous standby has gone. Otherwise, those backends suspend for
  a long time (i.e., until the timeout expires), which would decrease the
  high-availability, I'm afraid.
 
  Keeping those backends waiting for the failed standby to reconnect is an
  idea. But this looks like the behavior for allow_standalone_primary = off.
  If allow_standalone_primary = on, it looks more natural to make the
  primary work alone without waiting the timeout.
 
 Also I think that the waiting backends should be released as soon as the
 last synchronous standby switches to asynchronous mode. Since there is
 no standby which is planning to reconnect, obviously they no longer need
 to wait.

I've not done this, but we could.

It can't run in a WALSender, so this code would need to live in either
WALWriter or BgWriter.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Simon Riggs
On Thu, 2011-03-03 at 02:14 -0500, Tom Lane wrote:
 Fujii Masao masao.fu...@gmail.com writes:
  On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com wrote:
  To achieve the effect Fujii is looking for, we would have to silently drop
  the connection. That would correctly leave the client not knowing whether
  the transaction committed or not.
 
  Yeah, this seems to make more sense.
 
 It was pointed out that sending an ERROR would not do because it would
 likely lead to client code assuming the transaction failed, which might
 or might not be the case.  But maybe we could send a WARNING and then
 close the connection?  That would give humans a clue what had happened,
 but not do anything to the state of automated clients.

So when we perform a Fast Shutdown we want to do something fairly
similar to quickdie()?

Please review the attached patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 
*** a/src/backend/replication/syncrep.c
--- b/src/backend/replication/syncrep.c
***
*** 63,71 
  
  /* User-settable parameters for sync rep */
  bool	sync_rep_mode = false;			/* Only set in user backends */
! int		sync_rep_timeout = 120;	/* Only set in user backends */
  char 	*SyncRepStandbyNames;
  
  
  #define	IsOnSyncRepQueue()		(MyProc-lwWaiting)
  
--- 63,72 
  
  /* User-settable parameters for sync rep */
  bool	sync_rep_mode = false;			/* Only set in user backends */
! int		sync_rep_timeout = 120;			/* Only set in user backends */
  char 	*SyncRepStandbyNames;
  
+ bool	WaitingForSyncRep = false;	/* Global state for some exit methods */
  
  #define	IsOnSyncRepQueue()		(MyProc-lwWaiting)
  
***
*** 202,207  SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN)
--- 203,209 
  			MyProc-waitLSN = XactCommitLSN;
  			SyncRepAddToQueue();
  			LWLockRelease(SyncRepLock);
+ 			WaitingForSyncRep = true;
  
  			/*
  			 * Alter ps display to show waiting for sync rep.
***
*** 241,246  SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN)
--- 243,249 
  			{
  SyncRepRemoveFromQueue();
  LWLockRelease(SyncRepLock);
+ WaitingForSyncRep = false;
  
  /*
   * Reset our waitLSN.
***
*** 248,254  SyncRepWaitOnQueue(XLogRecPtr XactCommitLSN)
  MyProc-waitLSN.xlogid = 0;
  MyProc-waitLSN.xrecoff = 0;
  
- 
  if (new_status)
  {
  	/* Reset ps display */
--- 251,256 
*** a/src/backend/tcop/postgres.c
--- b/src/backend/tcop/postgres.c
***
*** 2902,2907  ProcessInterrupts(void)
--- 2902,2935 
  			ereport(FATAL,
  	(errcode(ERRCODE_ADMIN_SHUTDOWN),
  	 errmsg(terminating autovacuum process due to administrator command)));
+ 		else if (WaitingForSyncRep)
+ 		{
+ 			/*
+ 			 * This must NOT be a FATAL message. We want the state of the
+ 			 * transaction being aborted to be indeterminate to ensure that
+ 			 * the transaction completion guarantee is never broken.
+ 			 */
+ 			ereport(WARNING,
+ 	(errcode(ERRCODE_ADMIN_SHUTDOWN),
+ 	 errmsg(terminating connection because fast shutdown is requested),
+ 			errdetail(This connection requested synchronous replication at commit
+ 	   yet confirmation of replication has not been received.
+ 	   The transaction has committed locally and might be committed
+ 	   on recently disconnected standby servers also.)));
+ 
+ 			/*
+ 			 * We DO NOT want to run proc_exit() callbacks -- we're here because
+ 			 * we are shutting down and don't want any code to stall or
+ 			 * prevent that.
+ 			 */
+ 			on_exit_reset();
+ 
+ 			/*
+ 			 * Note we do exit(0) not exit(0). This is to avoid forcing
+ 			 * postmaster into a system reset cycle.
+ 			 */
+ 			exit(0);
+ 		}
  		else if (RecoveryConflictPending  RecoveryConflictRetryable)
  		{
  			pgstat_report_recovery_conflict(RecoveryConflictReason);
*** a/src/include/miscadmin.h
--- b/src/include/miscadmin.h
***
*** 78,83  extern PGDLLIMPORT volatile uint32 CritSectionCount;
--- 78,86 
  /* in tcop/postgres.c */
  extern void ProcessInterrupts(void);
  
+ /* in replication/syncrep.c */
+ extern bool WaitingForSyncRep;
+ 
  #ifndef WIN32
  
  #define CHECK_FOR_INTERRUPTS() \

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 I don't understand how synchronous replication with
 allow_standalone_primary=on gives you ANY extra nines.  AFAICS, the
 only point of having synchronous replication is that you wait to
 acknowledge the commit to the client until the commit record has been
 replicated.  Doing that only when the standby happens to be connected
 doesn't seem like it helps much.

Because you're still thinking in terms of data availability, rather than
in terms of service availability.  With the later in mind, what you want
is to be able to continue servicing from the standby should the primary
crash, and you want a good guarantee about the standby's data.

Of course in such a situation you will have some monitoring to ensure
that the standby remains in sync, and you want to know that at failover
time.  But a standby failure, when you want service availability, should
never bring the service down.  It's what happens, though, if you've been
setting up *data* availability, because there there's no choice.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Robert Haas
On Wed, Mar 2, 2011 at 5:10 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 On 2011-03-02 21:26, Kevin Grittner wrote:

 I think including synchronous is OK as long as it's properly
 qualified.  Off-hand thoughts in no particular order:

 semi-synchronous
 conditionally synchronous
 synchronous with automatic failover to standalone

 It would be good to name the concept equal to how other DBMSses call it, if
 they have a similar concept - don't know if Mysql's semisynchronous
 replication is the same, but after a quick read it sounds like it does.

Here's the link:

http://dev.mysql.com/doc/refman/5.5/en/replication-semisync.html

I think this is mostly about how many slaves have to ack the commit.
It's not entirely clear to me what happens if a slave is set up but
not connected at the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


How should the primary behave when the sync standby goes away? Re: [HACKERS] Sync Rep v17

2011-03-03 Thread Fujii Masao
On Wed, Mar 2, 2011 at 11:30 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The WALSender deliberately does *not* wake waiting users if the standby
 disconnects. Doing so would break the whole reason for having sync rep
 in the first place. What we do is allow a potential standby to takeover
 the role of sync standby, if one is available. Or the failing standby
 can reconnect and then release waiters.

 If there is potential standby when synchronous standby has gone, I agree
 that it's not good idea to release the waiting backends soon. In this case,
 those backends should wait for next synchronous standby.

 On the other hand, if there is no potential standby, I think that the waiting
 backends should not wait for the timeout and should wake up as soon as
 synchronous standby has gone. Otherwise, those backends suspend for
 a long time (i.e., until the timeout expires), which would decrease the
 high-availability, I'm afraid.

 Keeping those backends waiting for the failed standby to reconnect is an
 idea. But this looks like the behavior for allow_standalone_primary = off.
 If allow_standalone_primary = on, it looks more natural to make the
 primary work alone without waiting the timeout.

Also I think that the waiting backends should be released as soon as the
last synchronous standby switches to asynchronous mode. Since there is
no standby which is planning to reconnect, obviously they no longer need
to wait.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:

 No, I've never wished wait-forever option for now. I'd like to make
 the primary work alone when there is no connected standby, for
 high-availability. 

allow_standalone_primary seems to need to be better through than it is
now, yet neither of us think its worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:
 On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote:
   SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
   if the standby crashes while a transaction is waiting for replication,
   it waits infinitely.
 
  Will think on this.
 
  The behaviour seems correct to me:

  If allow_standalone_primary = on then we sit and wait until we hit
  client timeout, which occurs even after last standby has gone.
 
 In that case, why do backends need to wait until the timeout occurs?
 We can make those backends resume their transaction as soon as
 the last standby has gone. No?

I'm getting a little confused as to what you're asking for regarding
shutdowns and WALSender disconnects.

The current behaviour is that when a reply is received we attempt to
wake up backends waiting for an LSN. If that reply is not received
within a timeout then we just return to user anyway. You can wait
forever, if you choose, by setting the timeout to 0.

The WALSender deliberately does *not* wake waiting users if the standby
disconnects. Doing so would break the whole reason for having sync rep
in the first place. What we do is allow a potential standby to takeover
the role of sync standby, if one is available. Or the failing standby
can reconnect and then release waiters.

If we shutdown, then we wait for the shutdown commit record to be
transferred to our standby, so a normal or fast shutdown of the master
always leaves all connected standbys up to date. We already do that, so
sync rep doesn't touch that behaviour. If a standby is disconnected,
then it doesn't receive the shutdown checkpoint record.

The wait state for a commit does not persist when we shutdown and
restart.

Can you restate which bits of the above you think need to be changed?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga

On 2011-03-02 11:40, Simon Riggs wrote:


allow_standalone_primary seems to need to be better through than it is
now, yet neither of us think its worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.
As somebody who actually thought having it was a good idea, +1 for 
remove. I can monitor having one or two sync standbys at all times and 
let bells ring when they fail, but especially when things might break 
and the standbys are gone, having allow_standalone_primary set to off is 
a very efficient way to limit your options then to effectively zero with 
the master.


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 6:22 AM, Simon Riggs si...@2ndquadrant.com wrote:
 The WALSender deliberately does *not* wake waiting users if the standby
 disconnects. Doing so would break the whole reason for having sync rep
 in the first place. What we do is allow a potential standby to takeover
 the role of sync standby, if one is available. Or the failing standby
 can reconnect and then release waiters.

If the transaction would have been allowed to commit without waiting
had the standby not been connected in the first place, then presumably
it should also be allowed to commit if the standby disconnects later,
too.  Otherwise, it doesn't seem very consistent.  A commit should
either wait for a disconnected standby to reconnect, or it should not
wait.  It shouldn't wait in some situations but not others, I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 8:22 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The WALSender deliberately does *not* wake waiting users if the standby
 disconnects. Doing so would break the whole reason for having sync rep
 in the first place. What we do is allow a potential standby to takeover
 the role of sync standby, if one is available. Or the failing standby
 can reconnect and then release waiters.

If there is potential standby when synchronous standby has gone, I agree
that it's not good idea to release the waiting backends soon. In this case,
those backends should wait for next synchronous standby.

On the other hand, if there is no potential standby, I think that the waiting
backends should not wait for the timeout and should wake up as soon as
synchronous standby has gone. Otherwise, those backends suspend for
a long time (i.e., until the timeout expires), which would decrease the
high-availability, I'm afraid.

Keeping those backends waiting for the failed standby to reconnect is an
idea. But this looks like the behavior for allow_standalone_primary = off.
If allow_standalone_primary = on, it looks more natural to make the
primary work alone without waiting the timeout.

 If we shutdown, then we wait for the shutdown commit record to be
 transferred to our standby, so a normal or fast shutdown of the master
 always leaves all connected standbys up to date. We already do that, so
 sync rep doesn't touch that behaviour. If a standby is disconnected,
 then it doesn't receive the shutdown checkpoint record.

 The wait state for a commit does not persist when we shutdown and
 restart.

 Can you restate which bits of the above you think need to be changed?

What I'm thinking is: when the waiting backends are released because
of the timeout while the fast shutdown is being done in the master,
those backends should not return the success indication to the client.
Of course, in that case, WAL has already been flushed in the master,
but I think that those backends should exit with FATAL error before
returning the success. This is for avoiding breaking the synchronous
replication rule, i.e., all the transaction which the client knows as
committed must be committed in the synchronous standby after failover.

If we allow those backends to return the success in that situation, the
following scenario which can cause a data loss can happen.

1. The primary is running with allow_standalone_primary = on. There
is only one (synchronous) standby connected.
2. The replication connection is closed because of the network outage.
3. While some backends are waiting for replication, the user requests
fast shutdown in the master.
4. Since the timeout expires, those backends stop waiting and return
the success indication to the client (but not replicated to the standby).
5. Since there is no backend waiting for replication, fast shutdown
completes.
6. The clusterware like pacemaker detects the death of the primary
and triggers the failover.
7. New primary doesn't have some transactions committed to the
client, i.e., transaction lost happens!!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Wed, Mar 2, 2011 at 7:40 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:

 No, I've never wished wait-forever option for now. I'd like to make
 the primary work alone when there is no connected standby, for
 high-availability.

 allow_standalone_primary seems to need to be better through than it is
 now, yet neither of us think its worth having.

 If the people that want it can think it through a little better then it
 might make this release, but I propose to remove it from this current
 patch to allow us to commit with greater certainty and fewer bugs.

+1 to remove the wait-forever behavior and the parameter for 9.1.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas

On 02.03.2011 12:40, Simon Riggs wrote:

allow_standalone_primary seems to need to be better through than it is
now, yet neither of us think its worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.


If you leave it out, then let's rename the feature to semi-synchronous 
replication or such. The point of synchronous replication is 
zero-data-loss, and you don't achieve that with allow_standalone_primary=on.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masao masao.fu...@gmail.com wrote:
 What I'm thinking is: when the waiting backends are released because
 of the timeout while the fast shutdown is being done in the master,
 those backends should not return the success indication to the client.
 Of course, in that case, WAL has already been flushed in the master,
 but I think that those backends should exit with FATAL error before
 returning the success. This is for avoiding breaking the synchronous
 replication rule, i.e., all the transaction which the client knows as
 committed must be committed in the synchronous standby after failover.

That seems like an extremely bad idea.  Now any client that assumes
that FATAL means his transaction didn't commit is broken.  Clients
should be entitled to assume that a successful COMMIT means the
transaction committed (with whatever the operative durability
guarantee is) and that an error means it rolled back.  If the
connection is closed before either one of those things happens, the
client can't assume anything.

It might be reasonable to COMMIT but also issue a warning message, or
to just close the connection without telling the client what happened,
but sending an error seems poor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 02.03.2011 12:40, Simon Riggs wrote:

 allow_standalone_primary seems to need to be better through than it is
 now, yet neither of us think its worth having.

 If the people that want it can think it through a little better then it
 might make this release, but I propose to remove it from this current
 patch to allow us to commit with greater certainty and fewer bugs.

 If you leave it out, then let's rename the feature to semi-synchronous
 replication or such. The point of synchronous replication is
 zero-data-loss, and you don't achieve that with allow_standalone_primary=on.

I think that'd just be adding confusion.  Replication will still be
synchronous; it'll just be more likely to be not happening when you
think it is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas

On 02.03.2011 17:07, Robert Haas wrote:

On Wed, Mar 2, 2011 at 9:30 AM, Fujii Masaomasao.fu...@gmail.com  wrote:

What I'm thinking is: when the waiting backends are released because
of the timeout while the fast shutdown is being done in the master,
those backends should not return the success indication to the client.
Of course, in that case, WAL has already been flushed in the master,
but I think that those backends should exit with FATAL error before
returning the success. This is for avoiding breaking the synchronous
replication rule, i.e., all the transaction which the client knows as
committed must be committed in the synchronous standby after failover.


That seems like an extremely bad idea.  Now any client that assumes
that FATAL means his transaction didn't commit is broken.  Clients
should be entitled to assume that a successful COMMIT means the
transaction committed (with whatever the operative durability
guarantee is) and that an error means it rolled back.  If the
connection is closed before either one of those things happens, the
client can't assume anything.


To achieve the effect Fujii is looking for, we would have to silently 
drop the connection. That would correctly leave the client not knowing 
whether the transaction committed or not.



It might be reasonable to COMMIT but also issue a warning message, or
to just close the connection without telling the client what happened,
but sending an error seems poor.


Yeah, I guess that would work too, if the client knows to watch out for 
those warnings.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas

On 02.03.2011 17:07, Robert Haas wrote:

On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

On 02.03.2011 12:40, Simon Riggs wrote:


allow_standalone_primary seems to need to be better through than it is
now, yet neither of us think its worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.


If you leave it out, then let's rename the feature to semi-synchronous
replication or such. The point of synchronous replication is
zero-data-loss, and you don't achieve that with allow_standalone_primary=on.


I think that'd just be adding confusion.  Replication will still be
synchronous; it'll just be more likely to be not happening when you
think it is.


The defining property of synchronous replication is that when a 
transaction is acknowledged as committed to the client, it has also been 
replicated to the standby. You don't achieve that with 
allow_standalone_primary=on, plain and simple. That's fine for a lot of 
people, most people don't actually want synchronous replication because 
they're not willing to pay the availability penalty. But IMHO it would 
be disingenuous to call it synchronous replication if you can't achieve 
zero data loss with it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 To achieve the effect Fujii is looking for, we would have to silently drop
 the connection. That would correctly leave the client not knowing whether
 the transaction committed or not.

+1

 It might be reasonable to COMMIT but also issue a warning message, or
 to just close the connection without telling the client what happened,
 but sending an error seems poor.

 Yeah, I guess that would work too, if the client knows to watch out for
 those warnings.

-1

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 
 The defining property of synchronous replication is that when a 
 transaction is acknowledged as committed to the client, it has
 also been replicated to the standby. You don't achieve that with 
 allow_standalone_primary=on, plain and simple. That's fine for a
 lot of people, most people don't actually want synchronous
 replication because they're not willing to pay the availability
 penalty. But IMHO it would be disingenuous to call it synchronous
 replication if you can't achieve zero data loss with it.
 
Right.  While there may be more people who favor high availability
than the guarantees of synchronous replication, let's not blur the
lines by mislabeling things.  It's not synchronous replication if a
commit returns successfully without the data being persisted on a
second server.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Aidan Van Dyk
On Wed, Mar 2, 2011 at 2:30 PM, Fujii Masao masao.fu...@gmail.com wrote:

 1. The primary is running with allow_standalone_primary = on. There
    is only one (synchronous) standby connected.

OK.  Explicitly configured to allow the master to report as commited
stuff which isn't on a/any slave.

 7. New primary doesn't have some transactions committed to the
    client, i.e., transaction lost happens!!

And this is a surprise?

I'm not saying there isn't a better way to to sequence/control a
shutdown to make this risk less, but isn't that the whole point of the
allow_standalone_primary debate/option?

If there isn't a sync slave for whatever reason, just march on, I'll
deal with the transactions that are committed and not replicated some
other way.

I guess complaining that it shouldn't be possible to just march on
when no sync slave is available is one possible way oof dealing
with them ;-)

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Merlin Moncure
On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 To achieve the effect Fujii is looking for, we would have to silently drop
 the connection. That would correctly leave the client not knowing whether
 the transaction committed or not.

 +1

 It might be reasonable to COMMIT but also issue a warning message, or
 to just close the connection without telling the client what happened,
 but sending an error seems poor.

 Yeah, I guess that would work too, if the client knows to watch out for
 those warnings.

 -1

yeah, unless by warning, you meant 'error'.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 12:39 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Mar 2, 2011 at 9:58 AM, Dimitri Fontaine dimi...@2ndquadrant.fr 
 wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 To achieve the effect Fujii is looking for, we would have to silently drop
 the connection. That would correctly leave the client not knowing whether
 the transaction committed or not.

 +1

 It might be reasonable to COMMIT but also issue a warning message, or
 to just close the connection without telling the client what happened,
 but sending an error seems poor.

 Yeah, I guess that would work too, if the client knows to watch out for
 those warnings.

 -1

 yeah, unless by warning, you meant 'error'.

Well, as mentioned upthread, throwing an error when the transaction is
actually committed seems poor.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 17:23 +0200, Heikki Linnakangas wrote:
 On 02.03.2011 17:07, Robert Haas wrote:
  On Wed, Mar 2, 2011 at 9:53 AM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com  wrote:
  On 02.03.2011 12:40, Simon Riggs wrote:
 
  allow_standalone_primary seems to need to be better through than it is
  now, yet neither of us think its worth having.
 
  If the people that want it can think it through a little better then it
  might make this release, but I propose to remove it from this current
  patch to allow us to commit with greater certainty and fewer bugs.
 
  If you leave it out, then let's rename the feature to semi-synchronous
  replication or such. The point of synchronous replication is
  zero-data-loss, and you don't achieve that with 
  allow_standalone_primary=on.
 
  I think that'd just be adding confusion.  Replication will still be
  synchronous; it'll just be more likely to be not happening when you
  think it is.
 
 The defining property of synchronous replication is that when a 
 transaction is acknowledged as committed to the client, it has also been 
 replicated to the standby. You don't achieve that with 
 allow_standalone_primary=on, plain and simple. That's fine for a lot of 
 people, most people don't actually want synchronous replication because 
 they're not willing to pay the availability penalty. But IMHO it would 
 be disingenuous to call it synchronous replication if you can't achieve 
 zero data loss with it.

I agree with some of what you say, but that focuses on one corner case
and not on the whole solution.

Yes, if we pick the allow_standalone_primary=on behaviour AND you are
stupid enough to lose *all* of your sync standbys then you may get data
loss.

What I've spent a lot of time doing is trying to ensure that we never
lose all our sync standbys, via clear recommendation to use multiple
servers AND functionality to allow that the standbys work together to
give true high availability without data loss. The current design allows
for an arbitrary number of potential standby servers so your risk of
data loss can be as many 9s as you care to make it.

Not really bothered what we call it, but if you intend to make a song
and dance about whether it is proper or not, then I would object to
that because you're not presenting the full situation.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote:
 On 02.03.2011 12:40, Simon Riggs wrote:
  allow_standalone_primary seems to need to be better through than it is
  now, yet neither of us think its worth having.
 
  If the people that want it can think it through a little better then it
  might make this release, but I propose to remove it from this current
  patch to allow us to commit with greater certainty and fewer bugs.
 
 If you leave it out, then let's rename the feature to semi-synchronous 
 replication or such. The point of synchronous replication is 
 zero-data-loss, and you don't achieve that with allow_standalone_primary=on.

The reason I have suggested leaving that parameter out is because the
behaviour is not fully specified and Yeb has reported cases that don't
(yet) make sense. If you want to fully specify it then we could yet add
it, assuming we have time.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Heikki Linnakangas

On 02.03.2011 21:48, Simon Riggs wrote:

On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote:

On 02.03.2011 12:40, Simon Riggs wrote:

allow_standalone_primary seems to need to be better through than it is
now, yet neither of us think its worth having.

If the people that want it can think it through a little better then it
might make this release, but I propose to remove it from this current
patch to allow us to commit with greater certainty and fewer bugs.


If you leave it out, then let's rename the feature to semi-synchronous
replication or such. The point of synchronous replication is
zero-data-loss, and you don't achieve that with allow_standalone_primary=on.


The reason I have suggested leaving that parameter out is because the
behaviour is not fully specified and Yeb has reported cases that don't
(yet) make sense. If you want to fully specify it then we could yet add
it, assuming we have time.


Fair enough. All I'm saying is that if we end up shipping without that 
parameter (implying allow_standalone_primary=on), we need to call the 
feature something else. The GUCs and code can probably stay as it is, 
but we shouldn't use the term synchronous replication in the 
documentation, and release notes and such.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
 
 All I'm saying is that if we end up shipping without that
 parameter (implying allow_standalone_primary=on), we need to call
 the feature something else. The GUCs and code can probably stay as
 it is, but we shouldn't use the term synchronous replication in
 the documentation, and release notes and such.
 
I think including synchronous is OK as long as it's properly
qualified.  Off-hand thoughts in no particular order:
 
semi-synchronous
conditionally synchronous
synchronous with automatic failover to standalone
 
Perhaps the qualifications can be relaxed in some places but not
others?  The documentation should certainly emphasize that there is
no guarantee that a successful commit means that the data is on at
least two separate servers, if no such guarantee exists.
 
If we expect that losing all replicas is such a terribly thin long
shot that we don't need to worry about this difference, it is such a
long shot that we don't need to worry about wait forever behavior,
either; and we should just implement *that* so that no qualification
is needed.  I think that is an odd assumption, though; and I think a
HA failover to weaker persistence guarantees in exchange for
increased up-time would be popular.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Joshua D. Drake
On Wed, 2011-03-02 at 14:26 -0600, Kevin Grittner wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote:
  
  All I'm saying is that if we end up shipping without that
  parameter (implying allow_standalone_primary=on), we need to call
  the feature something else. The GUCs and code can probably stay as
  it is, but we shouldn't use the term synchronous replication in
  the documentation, and release notes and such.
  
 I think including synchronous is OK as long as it's properly
 qualified.  Off-hand thoughts in no particular order:
  
 semi-synchronous 

You mean asynchronous

 conditionally synchronous

You mean asynchronous

JD




-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote:
 On 02.03.2011 21:48, Simon Riggs wrote:
  On Wed, 2011-03-02 at 16:53 +0200, Heikki Linnakangas wrote:
  On 02.03.2011 12:40, Simon Riggs wrote:
  allow_standalone_primary seems to need to be better through than it is
  now, yet neither of us think its worth having.
 
  If the people that want it can think it through a little better then it
  might make this release, but I propose to remove it from this current
  patch to allow us to commit with greater certainty and fewer bugs.
 
  If you leave it out, then let's rename the feature to semi-synchronous
  replication or such. The point of synchronous replication is
  zero-data-loss, and you don't achieve that with 
  allow_standalone_primary=on.
 
  The reason I have suggested leaving that parameter out is because the
  behaviour is not fully specified and Yeb has reported cases that don't
  (yet) make sense. If you want to fully specify it then we could yet add
  it, assuming we have time.
 
 Fair enough. All I'm saying is that if we end up shipping without that 
 parameter (implying allow_standalone_primary=on), we need to call the 
 feature something else. The GUCs and code can probably stay as it is, 
 but we shouldn't use the term synchronous replication in the 
 documentation, and release notes and such.

allow_standalone_primary=off means wait forever. It does nothing to
reduce data loss since you can't replicate to a server that isn't there.

As we discussed it, allow_standalone_primary=off was not a persistent
state, so shutting down the database would simply leave the data
committed. Which gives the same problem, but implicitly.

Truly synchronous requires two-phase commit, which this never was. So
the absence or presence of the poorly specified parameter called
allow_standalone_primary should have no bearing on what we call this
feature.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan



On 03/02/2011 03:39 PM, Simon Riggs wrote:

Truly synchronous requires two-phase commit, which this never was. So
the absence or presence of the poorly specified parameter called
allow_standalone_primary should have no bearing on what we call this
feature.



I haven't been following this very closely, but to me this screams out 
that we simply must not call it synchronous.


Just my $0.02 worth.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 allow_standalone_primary=off means wait forever. It does nothing
 to reduce data loss since you can't replicate to a server that
 isn't there.
 
Unless you're pulling from some persistent source which will then
feel free to discard what you have retrieved.  You can't assume
otherwise; it's not that rare a pattern for interfaces.  We use such
a pattern for accepting criminal complaints from district attorneys
and sending warrant information to police agencies.  Waiting a long
time (it won't *actually* be forever) is better than losing
information.
 
In other words, making a persistence promise which is not kept can
lose data on the client side, if the clients actually trust your
guarantees.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 3:45 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 allow_standalone_primary=off means wait forever. It does nothing
 to reduce data loss since you can't replicate to a server that
 isn't there.

 Unless you're pulling from some persistent source which will then
 feel free to discard what you have retrieved.  You can't assume
 otherwise; it's not that rare a pattern for interfaces.  We use such
 a pattern for accepting criminal complaints from district attorneys
 and sending warrant information to police agencies.  Waiting a long
 time (it won't *actually* be forever) is better than losing
 information.

 In other words, making a persistence promise which is not kept can
 lose data on the client side, if the clients actually trust your
 guarantees.

I agree.  I assumed that when Simon was talking about removing
allow_standalone_primary, he meant making the code always behave as if
it were turned OFF.  The common scenario here is bound to be:

1. Everything is humming along.
2. The network link between the master and standby drops.
3. Then it comes back up again.

After (2) and before (3), what should the behavior the master be?  It
seems clear to me that it should WAIT.  Otherwise, a crash on the
master now leaves you with transactions that were confirmed committed
but not actually replicated to the standby.  If you were OK with that
scenario, you would have used asynchronous replication in the first
place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote:

 I assumed that when Simon was talking about removing
 allow_standalone_primary, he meant making the code always behave as if
 it were turned OFF. 

That is the part that is currently not fully specified, so no that is
not currently included in the patch.

That isn't double-talk for and I will not include it.

What I mean is I'd rather have something than nothing, whatever we
decide to call it.

But the people that want it had better come up with a clear definition
of how it will actually work, covering all cases, not just splish
splash, it kinda works and we'll let Simon will work out the rest.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2011-03-02 at 15:50 -0500, Robert Haas wrote:
 
 I assumed that when Simon was talking about removing
 allow_standalone_primary, he meant making the code always behave
 as if it were turned OFF. 
 
 That is the part that is currently not fully specified, so no that
 is not currently included in the patch.
 
 That isn't double-talk for and I will not include it.
 
 What I mean is I'd rather have something than nothing, whatever we
 decide to call it.
 
+1 on that.
 
 But the people that want it had better come up with a clear
 definition of how it will actually work
 
What is ill-defined?  I would have thought that the commit request
would hang indefinitely until the server was able to provide its
usual guarantees.  I'm not clear on what cases aren't covered by
that.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote:
 
 On 03/02/2011 03:39 PM, Simon Riggs wrote:
  Truly synchronous requires two-phase commit, which this never was. So
  the absence or presence of the poorly specified parameter called
  allow_standalone_primary should have no bearing on what we call this
  feature.
 
 
 I haven't been following this very closely, but to me this screams out 
 that we simply must not call it synchronous.

As long as we describe it via its characteristics, then I'll be happy:

* significantly reduces the possibility of data loss in a sensibly
configured cluster

* allow arbitrary N+k resilience that can meet and easily exceed
 5 nines data durability

* isn't two phase commit

* isn't a magic bullet that will protect your data even after your
hardware fails or is disconnected

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote:
 Fair enough. All I'm saying is that if we end up shipping without that 
 parameter (implying allow_standalone_primary=on), we need to call the 
 feature something else. The GUCs and code can probably stay as it is, 
 but we shouldn't use the term synchronous replication in the 
 documentation, and release notes and such.

 allow_standalone_primary=off means wait forever. It does nothing to
 reduce data loss since you can't replicate to a server that isn't there.

This is irrelevant to the point.  The point is that sync rep implies
that we will not *tell a client* its data is committed unless the commit
is down to disk in two places.  I agree with the people saying that not
having this parameter makes it not real sync rep.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 1. Everything is humming along.
 2. The network link between the master and standby drops.
 3. Then it comes back up again.

 After (2) and before (3), what should the behavior the master be?  It
 seems clear to me that it should WAIT.  Otherwise, a crash on the

That just means you want data high availability, not service HA.  Some
people want the *service* to stay available in such a situation.

 master now leaves you with transactions that were confirmed committed
 but not actually replicated to the standby.  If you were OK with that
 scenario, you would have used asynchronous replication in the first
 place.

What is so hard to understand in worst case scenario being different
than expected conditions.  We all know that getting the last percent
is more expensive than getting the 99 first one.  We have no reason to
force people into building for the last percent whatever their context.

So, what cases need to be defined wrt forbidding the primary to continue
alone?

 - in flight commit

   blocked until we can offer the asked for durability, wait forever

 - shutdown request

   blocked until standby acknowledge the final checkpoint

   are immediate shutdown requests permitted? what do they do?

What other cases are to be fully designed here?  Please note that above
list is just a way to start the design, not a definitive proposal from
me.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Andrew Dunstan



On 03/02/2011 04:13 PM, Simon Riggs wrote:

On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote:

On 03/02/2011 03:39 PM, Simon Riggs wrote:

Truly synchronous requires two-phase commit, which this never was. So
the absence or presence of the poorly specified parameter called
allow_standalone_primary should have no bearing on what we call this
feature.


I haven't been following this very closely, but to me this screams out
that we simply must not call it synchronous.

As long as we describe it via its characteristics, then I'll be happy:

* significantly reduces the possibility of data loss in a sensibly
configured cluster

* allow arbitrary N+k resilience that can meet and easily exceed
  5 nines data durability

* isn't two phase commit

* isn't a magic bullet that will protect your data even after your
hardware fails or is disconnected




Ok, so let's call it enhanced safety or something else that isn't a 
term of art.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Robert Haas
On Wed, Mar 2, 2011 at 4:19 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
 1. Everything is humming along.
 2. The network link between the master and standby drops.
 3. Then it comes back up again.

 After (2) and before (3), what should the behavior the master be?  It
 seems clear to me that it should WAIT.  Otherwise, a crash on the

 That just means you want data high availability, not service HA.  Some
 people want the *service* to stay available in such a situation.

 master now leaves you with transactions that were confirmed committed
 but not actually replicated to the standby.  If you were OK with that
 scenario, you would have used asynchronous replication in the first
 place.

 What is so hard to understand in worst case scenario being different
 than expected conditions.  We all know that getting the last percent
 is more expensive than getting the 99 first one.  We have no reason to
 force people into building for the last percent whatever their context.

I don't understand how synchronous replication with
allow_standalone_primary=on gives you ANY extra nines.  AFAICS, the
only point of having synchronous replication is that you wait to
acknowledge the commit to the client until the commit record has been
replicated.  Doing that only when the standby happens to be connected
doesn't seem like it helps much.

If the master is up, then it doesn't really matter what the standby
does; we don't need high availability in that case, because we have
just plain regular old availability.

If the master goes down, then we need to know that we haven't lost any
confirmed-committed transactions.  With allow_standalone_primary=off,
we don't know that.  They might be, or they might not be.  Even if we
have 100 separate standbys, there is no way of knowing whether there
was a time period just before the crash during which the master
couldn't get out to the Internet, and some commits by clients on the
local network went through.  Maybe with some careful network
engineering you can convince yourself that that isn't very likely, but
I sure wouldn't bet on it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Yeb Havinga

On 2011-03-02 21:26, Kevin Grittner wrote:


I think including synchronous is OK as long as it's properly
qualified.  Off-hand thoughts in no particular order:

semi-synchronous
conditionally synchronous
synchronous with automatic failover to standalone
It would be good to name the concept equal to how other DBMSses call it, 
if they have a similar concept - don't know if Mysql's semisynchronous 
replication is the same, but after a quick read it sounds like it does.


regards,
Yeb Havinga

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:16 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Wed, 2011-03-02 at 22:10 +0200, Heikki Linnakangas wrote:
  Fair enough. All I'm saying is that if we end up shipping without that 
  parameter (implying allow_standalone_primary=on), we need to call the 
  feature something else. The GUCs and code can probably stay as it is, 
  but we shouldn't use the term synchronous replication in the 
  documentation, and release notes and such.
 
  allow_standalone_primary=off means wait forever. It does nothing to
  reduce data loss since you can't replicate to a server that isn't there.
 
 This is irrelevant to the point.  The point is that sync rep implies
 that we will not *tell a client* its data is committed unless the commit
 is down to disk in two places.  I agree with the people saying that not
 having this parameter makes it not real sync rep.

Depends what you think the point is.

Your comments go exactly to *my* point which is that the behaviour I'm
looking to commit maximises data durability *and* availability and is
the real choice that people will make. Claiming it isn't real will
make people scared of it, when its actually what they have been waiting
for. There is nothing half-arsed or unreal about what is being
delivered.

Let's not get hung up on textbook definitions, lets do the useful thing.

And to repeat, I am not against the other way. It has its place, in a
few cases. And I'm not against including it in this release either,
given we have a good definition.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Kevin Grittner
Yeb Havinga yebhavi...@gmail.com wrote:
 On 2011-03-02 21:26, Kevin Grittner wrote:

 I think including synchronous is OK as long as it's properly
 qualified.  Off-hand thoughts in no particular order:

 semi-synchronous
 conditionally synchronous
 synchronous with automatic failover to standalone
 It would be good to name the concept equal to how other DBMSses
 call it,  if they have a similar concept - don't know if Mysql's
 semisynchronous replication is the same, but after a quick read it
 sounds like it does.
 
I had no idea MySQL used that terminology; it just seemed apt for
describing a setup which is synchronous except when it isn't. Using
the same terminology for equivalent functionality has its pluses,
but might there be an trademark or other IP issues here?
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Wed, 2011-03-02 at 16:24 -0500, Andrew Dunstan wrote:
 
 On 03/02/2011 04:13 PM, Simon Riggs wrote:
  On Wed, 2011-03-02 at 15:44 -0500, Andrew Dunstan wrote:
  On 03/02/2011 03:39 PM, Simon Riggs wrote:
  Truly synchronous requires two-phase commit, which this never was. So
  the absence or presence of the poorly specified parameter called
  allow_standalone_primary should have no bearing on what we call this
  feature.
 
  I haven't been following this very closely, but to me this screams out
  that we simply must not call it synchronous.
  As long as we describe it via its characteristics, then I'll be happy:
 
  * significantly reduces the possibility of data loss in a sensibly
  configured cluster
 
  * allow arbitrary N+k resilience that can meet and easily exceed
5 nines data durability
 
  * isn't two phase commit
 
  * isn't a magic bullet that will protect your data even after your
  hardware fails or is disconnected
 
 
 
 Ok, so let's call it enhanced safety or something else that isn't a 
 term of art.

Good plan.

Oracle avoided the whole issue by referring to the two modes as maximum
availability and maximum protection. I'm not sure if that is patented
or copyright etc, but I'm guessing they had this exact same discussion,
just a little less friendly.

Perhaps we can coin the term High Durability. 

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 5:50 AM, Robert Haas robertmh...@gmail.com wrote:
 I agree.  I assumed that when Simon was talking about removing
 allow_standalone_primary, he meant making the code always behave as if
 it were turned OFF.

I feel the same thing.. Despite his saying, the patch implements
sync_replication_timeout_client, and if its value is too large,
the primary behaves like wait-forever. Though the primary
behaves like standalone only when it's started first without
connected standbys. So if we have
sync_replication_timeout_client, allow_standalone_primary
looks no longer useful.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 6:33 AM, Robert Haas robertmh...@gmail.com wrote:
 I don't understand how synchronous replication with
 allow_standalone_primary=on gives you ANY extra nines.

When you start the primary (or when there is one connected standby and
it crashes), allow_standalone_primary = on allows the database service
to proceed. OTOH, setting the parameter to off keeps the service stopping
until new standby has connected and has caught up with the primary. This
would cause long service down time, and decrease the availability.

Of course, running the primary alone has the risk. If its disk gets corrupted
before new standby appears, some committed transactions are lost. But
we can decrease this risk to a certain extent by using RAID or something
to the storage. So I think that some systems can accept the risk and prefer
the availability of the database service. Josh explained clearly before why
allow_standalone_primary = off is required for his case.
http://archives.postgresql.org/message-id/4CAE2488.9020207%40agliodbs.com

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Fujii Masao
On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 To achieve the effect Fujii is looking for, we would have to silently drop
 the connection. That would correctly leave the client not knowing whether
 the transaction committed or not.

Yeah, this seems to make more sense.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Simon Riggs
On Thu, 2011-03-03 at 13:35 +0900, Fujii Masao wrote:
 On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  To achieve the effect Fujii is looking for, we would have to silently drop
  the connection. That would correctly leave the client not knowing whether
  the transaction committed or not.
 
 Yeah, this seems to make more sense.

How do you propose we do that?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-02 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Thu, Mar 3, 2011 at 12:11 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 To achieve the effect Fujii is looking for, we would have to silently drop
 the connection. That would correctly leave the client not knowing whether
 the transaction committed or not.

 Yeah, this seems to make more sense.

It was pointed out that sending an ERROR would not do because it would
likely lead to client code assuming the transaction failed, which might
or might not be the case.  But maybe we could send a WARNING and then
close the connection?  That would give humans a clue what had happened,
but not do anything to the state of automated clients.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
 Thanks for update of the patch!
 
 On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
  SyncRepRemoveFromQueue seems not to be as short-term as we can
  use the spinlock. Instead, LW lock should be used there.
 
 You seem to have forgotten to fix the above-mentioned issue.

Not forgotten.

 A spinlock can be used only for very short-term operation like
 read/write of some shared-variables. The operation on the queue
 is not short, so should be protected by LWLock, I think.

There's no need to sleep while holding locks and the operations are very
short in most cases. The code around it isn't trivial, but that's no
reason to use LWlocks.

LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
that in the current code. Other views welcome.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 16:28 +0900, Fujii Masao wrote:
 On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs si...@2ndquadrant.com wrote:
  PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
  replication as well as COMMIT PREPARED?
 
  PREPARE - Yes
  ROLLBACK - No
 
  Further discussion welcome
 
 If we don't make ROLLBACK PREPARED wait for replication, we might need to
 issue ROLLBACK PREPARED to new master again after failover, even if we've
 already received a success indication of ROLLBACK PREPARED from old master.
 This looks strange to me because, OTOH, in simple COMMIT/ROLLBACK case,
 we don't need to issue that to new master again after failover.

OK

  What if fast shutdown is requested while RecordTransactionCommit
  is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
  until replication has been successfully done (i.e., until at least one
  synchronous standby has connected to the master especially if
  allow_standalone_primary is disabled). Is this OK?
 
  A behaviour - important, though needs further discussion.
 
 One of the scenarios which I'm concerned is:
 
 1. The primary is running with allow_standalone_primary = on.
 2. While some backends are waiting for replication, the user requests
 fast shutdown.
 3. Since the timeout expires, those backends stop waiting and return the 
 success
 indication to the client (but not replicated to the standby).
 4. Since there is no backend waiting for replication, fast shutdown completes.
 5. The clusterware like pacemaker detects the death of the primary and
 triggers the
 failover.
 6. New primary doesn't have some transactions committed to the client, i.e.,
 transaction lost happens!!
 
 To avoid such a transaction lost, we should prevent the primary from
 returning the
 success indication to the client while fast shutdown is being executed, even 
 if
 allow_standalone_primary is enabled, I think. Thought?

Walsenders don't shutdown until after they have sent the shutdown
checkpoint.

We could make them wait for the reply from the standby at that point.

I'll think some more, not convinced yet.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 A spinlock can be used only for very short-term operation like
 read/write of some shared-variables. The operation on the queue
 is not short, so should be protected by LWLock, I think.

 There's no need to sleep while holding locks and the operations are very
 short in most cases. The code around it isn't trivial, but that's no
 reason to use LWlocks.

 LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
 that in the current code. Other views welcome.

The following description in in src/backend/storage/lmgr/README
leads me to further think that the spinlock should not be used there.
Because, in the patch, while the spinlock is being held, whole of the
waiting list (if there are many concurrent running transactions, this
might be large) can be searched, SetLatch can be called and
elog(WARNING) can be called (though this should not happen).


* Spinlocks.  These are intended for *very* short-term locks.  If a lock
is to be held more than a few dozen instructions, or across any sort of
kernel call (or even a call to a nontrivial subroutine), don't use a
spinlock. Spinlocks are primarily used as infrastructure for lightweight
locks. They are implemented using a hardware atomic-test-and-set
instruction, if available.  Waiting processes busy-loop until they can
get the lock. There is no provision for deadlock detection, automatic
release on error, or any other nicety.  There is a timeout if the lock
cannot be gotten after a minute or so (which is approximately forever in
comparison to the intended lock hold time, so this is certainly an error
condition).


Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 4:56 PM, Simon Riggs si...@2ndquadrant.com wrote:
  If allow_standalone_primary = on then we sit and wait until we hit
  client timeout, which occurs even after last standby has gone.

 In that case, why do backends need to wait until the timeout occurs?
 We can make those backends resume their transaction as soon as
 the last standby has gone. No?

 The guarantee provided is that we will wait for up to client timeout for
 the sync standby to confirm. If we stop waiting right at the point that
 an event occurs, it breaks the whole purpose of the feature.

ISTM that at least people (including me) who want to use synchronous
replication for high-availability don't like this behavior. Because, when
there is one synchronous standby and it's shut down, the transactions
would get blocked until the timeout happens. The system down time
gets longer.

Of course, if walsender doesn't detect the termination of replication
connection, I can't complain that transactions wait until the timeout
happens. But, otherwise, releasing the waiting transactions would
be help for high-availability, I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Fujii Masao
On Tue, Mar 1, 2011 at 5:29 PM, Simon Riggs si...@2ndquadrant.com wrote:
  What if fast shutdown is requested while RecordTransactionCommit
  is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
  until replication has been successfully done (i.e., until at least one
  synchronous standby has connected to the master especially if
  allow_standalone_primary is disabled). Is this OK?
 
  A behaviour - important, though needs further discussion.

 One of the scenarios which I'm concerned is:

 1. The primary is running with allow_standalone_primary = on.
 2. While some backends are waiting for replication, the user requests
 fast shutdown.
 3. Since the timeout expires, those backends stop waiting and return the 
 success
     indication to the client (but not replicated to the standby).
 4. Since there is no backend waiting for replication, fast shutdown 
 completes.
 5. The clusterware like pacemaker detects the death of the primary and
 triggers the
     failover.
 6. New primary doesn't have some transactions committed to the client, i.e.,
     transaction lost happens!!

 To avoid such a transaction lost, we should prevent the primary from
 returning the
 success indication to the client while fast shutdown is being executed, even 
 if
 allow_standalone_primary is enabled, I think. Thought?

 Walsenders don't shutdown until after they have sent the shutdown
 checkpoint.

 We could make them wait for the reply from the standby at that point.

Right. But what if the replication connection is closed before #3?
In this case, obviously walsender cannot send WAL up to the
shutdown checkpoint.

 I'll think some more, not convinced yet.

Thanks! I'll think more, too, to make sync rep more reliable!

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Yeb Havinga

On 2011-02-28 20:32, Simon Riggs wrote:


I have reworded it to see if that improves the explanation
Code available at git://github.com/simon2ndQuadrant/postgres.git


  If a standby is removed from the list of servers then it will stop
  being the synchronous standby, allowing another to take it's place.
How can I see at the master which one is the synchronous standby? It 
would be nice if it was an extra attribute in the pg_stat_replication 
view. As part of understanding syncrep better, we're going to work on that.


regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Robert Haas
On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
 Thanks for update of the patch!

 On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
  SyncRepRemoveFromQueue seems not to be as short-term as we can
  use the spinlock. Instead, LW lock should be used there.

 You seem to have forgotten to fix the above-mentioned issue.

 Not forgotten.

 A spinlock can be used only for very short-term operation like
 read/write of some shared-variables. The operation on the queue
 is not short, so should be protected by LWLock, I think.

 There's no need to sleep while holding locks and the operations are very
 short in most cases. The code around it isn't trivial, but that's no
 reason to use LWlocks.

 LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
 that in the current code. Other views welcome.

An LWLock is a lot safer, in general, than a spinlock.  A spinlock
mustn't do anything that could emit an error or abort (among other
things).  I doubt that the performance cost of using an LWLock rather
than a spin lock here is enough to matter, and the spin lock seems
more likely to result in hard-to-find bugs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
 A spinlock can be used only for very short-term operation like
 read/write of some shared-variables. The operation on the queue
 is not short, so should be protected by LWLock, I think.

 There's no need to sleep while holding locks and the operations are very
 short in most cases. The code around it isn't trivial, but that's no
 reason to use LWlocks.

What does in most cases mean?

 LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
 that in the current code. Other views welcome.

Simon, that is absolutely NOT acceptable.  Spinlocks are to be used only
for short straight-line code segments.  If the lock has any potential to
be held for more than nanoseconds, use an LWLock.  The contention costs
of the shortcut you propose are too high.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Mar 1, 2011 at 3:21 AM, Simon Riggs si...@2ndquadrant.com wrote:
 LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
 that in the current code. Other views welcome.

 An LWLock is a lot safer, in general, than a spinlock.  A spinlock
 mustn't do anything that could emit an error or abort (among other
 things).  I doubt that the performance cost of using an LWLock rather
 than a spin lock here is enough to matter, and the spin lock seems
 more likely to result in hard-to-find bugs.

Well, stuck spinlocks aren't exactly hard to identify.  But I agree that
the lack of any release-on-error infrastructure is a killer reason not
to use a spinlock for anything but short straight-line code segments.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-03-01 Thread Simon Riggs
On Tue, 2011-03-01 at 10:02 -0500, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2011-03-01 at 15:51 +0900, Fujii Masao wrote:
  A spinlock can be used only for very short-term operation like
  read/write of some shared-variables. The operation on the queue
  is not short, so should be protected by LWLock, I think.
 
  There's no need to sleep while holding locks and the operations are very
  short in most cases. The code around it isn't trivial, but that's no
  reason to use LWlocks.
 
 What does in most cases mean?
 
  LWlocks are just spinlocks plus sem sleeps, so I don't see the need for
  that in the current code. Other views welcome.
 
 Simon, that is absolutely NOT acceptable.  Spinlocks are to be used only
 for short straight-line code segments.  If the lock has any potential to
 be held for more than nanoseconds, use an LWLock.  The contention costs
 of the shortcut you propose are too high.

No problem to change.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Yeb Havinga

On 2011-02-25 20:40, Jaime Casanova wrote:

On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havingayebhavi...@gmail.com  wrote:

I also did some initial testing on this patch and got the queue related
errors with  1 clients. With the code change from Jaime above I still got a
lot of 'not on queue warnings'.

I tried to understand how the queue was supposed to work - resulting in the
changes below that also incorporates a suggestion from Fujii upthread, to
early exit when myproc was found

yes, looking at the code, the warning and your patch... it seems yours
is the right solution...
I'm compiling right now to test again and see the effects, Robert
maybe you can test your failure case again? i'm really sure it's
related to this...
I did some more testing over the weekend with this patched v17 patch. 
Since you've posted a v18 patch, let me write some findings with the v17 
patch before continuing with the v18 patch.


The tests were done on a x86_64 platform, 1Gbit network interfaces, 3 
servers. Non default configuration changes are copy pasted at the end of 
this mail.


1) no automatic switch to other synchronous standby
- start master server, add synchronous standby 1
- change allow_standalone_primary to off
- add second synchronous standby
- wait until pg_stat_replication shows both standby's are in STREAMING state
- stop standby 1
what happens is that the master stalls, where I expected that it 
would've switched to standby 2 acknowledge commits.


The following thing was pilot error, but since I was test-piloting a new 
plane, I still think it might be usual feedback. In my opinion, any 
number and order of pg_ctl stops and starts on both the master and 
standby servers, as long as they are not with -m immediate, should never 
cause the state I reached.


2) reaching some sort of shutdown deadlock state
- start master server, add synchronous standby
- change allow_standalone_primary to off
then I did all sorts of test things, everything still ok. Then I wanted 
to shutdown everything, and maybe because of some symmetry (stack like) 
I did the following because I didn't think it through
- pg_ctl stop on standby (didn't actualy wait until done, but 
immediately in other terminal)

- pg_ctl stop on master
O wait.. master needs to sync transactions
- start standby again. but now: FATAL:  the database system is shutting down

There is no clean way to get out of this situation. 
allow_standalone_primary in the face of shutdowns might be tricky. Maybe 
shutdown must be prohibited to enter the shutting down phase in 
allow_standalone_primary = off together with no sync standby, that would 
allow for the sync standby to attach again.


3) PANIC on standby server
At some point a standby suddenly disconnected after I started a new 
pgbench run on a existing master/standby pair, with the following error 
in the logfile.


LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
PANIC:  XX000: heap_update_redo: failed to add tuple
CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 
305453/102

LOCATION:  heap_xlog_update, heapam.c:4724
LOG:  0: startup process (PID 32597) was terminated by signal 6: Aborted

This might be due to pilot error as well; I did a several tests over the 
weekend and after this error I was more alert on remembering immediate 
shutdowns/starting with a clean backup after that, and didn't see 
similar errors since.


4) The performance of the syncrep seems to be quite an improvement over 
the previous syncrep patches, I've seen tps-ses of O(650) where the 
others were more like O(20). The O(650) tps is limited by the speed of 
the standby server I used-at several times the master would halt only 
because of heavy disk activity at the standby. A warning in the docs 
might be right: be sure to use good IO hardware for your synchronous 
replicas! With that bottleneck gone, I suspect the current syncrep 
version can go beyond 1000tps over 1 Gbit.


regards,
Yeb Havinga

recovery.conf:
standby_mode = 'on'
primary_conninfo = 'host=mg73 user=repuser password=pwd 
application_name=standby1'

trigger_file = '/tmp/postgresql.trigger.5432'

postgresql.conf nondefault parameters:
log_error_verbosity = verbose
log_min_messages = warning
log_min_error_statement = warning
listen_addresses = '*'# what IP address(es) to listen on;
search_path='\$user\, public, hl7'
archive_mode = on
archive_command = 'test ! -f /data/backup_in_progress || cp -i %p 
/archive/%f  /dev/null'

checkpoint_completion_target = 0.9
checkpoint_segments = 16
default_statistics_target = 500
constraint_exclusion = on
max_connections = 120
maintenance_work_mem = 128MB
effective_cache_size = 1GB
work_mem = 44MB
wal_buffers = 8MB
shared_buffers = 128MB
wal_level = 'archive'
max_wal_senders = 4
wal_keep_segments = 1000 # 16000MB (for production increase this)
synchronous_standby_names = 'standby1,standby2,standby3'
synchronous_replication = on
allow_standalone_primary = off


--
Sent via pgsql-hackers mailing 

Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 18:06 +0900, Fujii Masao wrote:

 Thanks for the patch!

Thanks for the review.

Code available at git://github.com/simon2ndQuadrant/postgres.git

 PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
 replication as well as COMMIT PREPARED?

PREPARE - Yes
ROLLBACK - No

Further discussion welcome

 What if fast shutdown is requested while RecordTransactionCommit
 is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
 until replication has been successfully done (i.e., until at least one
 synchronous standby has connected to the master especially if
 allow_standalone_primary is disabled). Is this OK?

A behaviour - important, though needs further discussion.

 We should emit WARNING when the synchronous standby with
 wal_receiver_status_interval = 0 connects to the master. Because,
 in that case, a transaction unexpectedly would wait for replication
 infinitely.

This can't happen because a WALSender only activates as a sync standby
once it has received a reply from the chosen standby.

 + /* Need a modifiable copy of string */
 + rawstring = pstrdup(SyncRepStandbyNames);
 +
 + /* Parse string into list of identifiers */
 + if (!SplitIdentifierString(rawstring, ',', elemlist))
 
 pfree(rawstring) and list_free(elemlist) should be called also if
 SplitIdentifierString returns TRUE. Otherwise, memory-leak would
 happen.

Fixed, thanks

 + ereport(FATAL,
 + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 +errmsg(invalid list syntax for parameter
 \synchronous_standby_names\)));
 + return false;
 
 return false is not required here though that might be harmless.

Compiler likes it.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:

 1) no automatic switch to other synchronous standby
 - start master server, add synchronous standby 1
 - change allow_standalone_primary to off
 - add second synchronous standby
 - wait until pg_stat_replication shows both standby's are in STREAMING state
 - stop standby 1
 what happens is that the master stalls, where I expected that it 
 would've switched to standby 2 acknowledge commits.
 
 The following thing was pilot error, but since I was test-piloting a new 
 plane, I still think it might be usual feedback. In my opinion, any 
 number and order of pg_ctl stops and starts on both the master and 
 standby servers, as long as they are not with -m immediate, should never 
 cause the state I reached.

The behaviour of allow_synchronous_standby = off is pretty much
untested and does seem to have various gotchas in there.

 2) reaching some sort of shutdown deadlock state
 - start master server, add synchronous standby
 - change allow_standalone_primary to off
 then I did all sorts of test things, everything still ok. Then I wanted 
 to shutdown everything, and maybe because of some symmetry (stack like) 
 I did the following because I didn't think it through
 - pg_ctl stop on standby (didn't actualy wait until done, but 
 immediately in other terminal)
 - pg_ctl stop on master
 O wait.. master needs to sync transactions
 - start standby again. but now: FATAL:  the database system is shutting down
 
 There is no clean way to get out of this situation. 
 allow_standalone_primary in the face of shutdowns might be tricky. Maybe 
 shutdown must be prohibited to enter the shutting down phase in 
 allow_standalone_primary = off together with no sync standby, that would 
 allow for the sync standby to attach again.

The behaviour of allow_synchronous_standby = off is not something I'm
worried about personally and I've argued all along it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work that might help matters. I'm inclined to remove it before
commit if it can't work cleanly, to be re-added at a later date if it
makes sense.

 
 3) PANIC on standby server
 At some point a standby suddenly disconnected after I started a new 
 pgbench run on a existing master/standby pair, with the following error 
 in the logfile.
 
 LOCATION:  libpqrcv_connect, libpqwalreceiver.c:171
 PANIC:  XX000: heap_update_redo: failed to add tuple
 CONTEXT:  xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new 
 305453/102
 LOCATION:  heap_xlog_update, heapam.c:4724
 LOG:  0: startup process (PID 32597) was terminated by signal 6: Aborted
 
 This might be due to pilot error as well; I did a several tests over the 
 weekend and after this error I was more alert on remembering immediate 
 shutdowns/starting with a clean backup after that, and didn't see 
 similar errors since.

Good. There are no changes in the patch for that section of code.

 4) The performance of the syncrep seems to be quite an improvement over 
 the previous syncrep patches, I've seen tps-ses of O(650) where the 
 others were more like O(20). The O(650) tps is limited by the speed of 
 the standby server I used-at several times the master would halt only 
 because of heavy disk activity at the standby. A warning in the docs 
 might be right: be sure to use good IO hardware for your synchronous 
 replicas! With that bottleneck gone, I suspect the current syncrep 
 version can go beyond 1000tps over 1 Gbit.

Good, thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 22:08 +0900, Fujii Masao wrote:
 On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao masao.fu...@gmail.com wrote:
  I've read about two-tenths of the patch, so I'll submit another comments
  about the rest later. Sorry for the slow reviewing...
 
 Here are another comments:

Thanks for your comments
Code available at git://github.com/simon2ndQuadrant/postgres.git

 + {synchronous_standby_names, PGC_SIGHUP, WAL_REPLICATION,
 + gettext_noop(List of potential standby names to 
 synchronise with.),
 + NULL,
 + GUC_LIST_INPUT | GUC_IS_NAME
 
 Why did you add GUC_IS_NAME here? I don't think that it's reasonable
 to limit the length of this parameter to 63. Because dozens of standby
 names might be specified in the parameter.

OK, misunderstanding by me causing bug. Fixed

 SyncRepQueue-qlock should be initialized by calling SpinLockInit?

Fixed

 + * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

 Typo: s/2010/2011

Fixed

 sync_replication_timeout_client would mess up the wait-forever option.
 So, when allow_standalone_primary is disabled, ISTM that
 sync_replication_timeout_client should have no effect.

Agreed, done.

 Please check max_wal_senders before calling SyncRepWaitForLSN for
 non-replication case.

SyncRepWaitForLSN() handles this

 SyncRepRemoveFromQueue seems not to be as short-term as we can
 use the spinlock. Instead, LW lock should be used there.
 
 + old_status = get_ps_display(len);
 + new_status = (char *) palloc(len + 21 + 1);
 + memcpy(new_status, old_status, len);
 + strcpy(new_status + len,  waiting for sync rep);
 + set_ps_display(new_status, false);
 + new_status[len] = '\0'; /* truncate off  waiting */
 
 Updating the PS display should be skipped if update_process_title is false.

Fixed.

 + /*
 +  * XXX extra code needed here to maintain sorted invariant.
 
 Yeah, such a code is required. I think that we can shorten the time
 it takes to find an insert position by searching the list backwards.
 Because the given LSN is expected to be relatively new in the queue.

Sure, just skipped that because of time pressure. Will add.

 +  * Our approach should be same as racing car - slow in, fast 
 out.
 +  */
 
 Really? Even when removing the entry from the queue, we need
 to search the queue as well as we do in the add-entry case.
 Why don't you make walsenders remove the entry from the queue,
 instead?

This models wakeup behaviour of LWlocks

 + longtimeout = SyncRepGetWaitTimeout();
 snip
 + bool timeout = false;
 snip
 + else if (timeout  0 
 + 
 TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
 + 
 now, timeout))
 + {
 + release = true;
 + timeout = true;
 + }
 
 You seem to mix up two timeout variables.

Yes, good catch. Fixed.

 + if (proc-lwWaitLink == MyProc)
 + {
 + /*
 +  * Remove ourselves from middle of queue.
 +  * No need to touch head or tail.
 +  */
 + proc-lwWaitLink = MyProc-lwWaitLink;
 + }
 
 When we find ourselves, we should break out of the loop soon,
 instead of continuing the loop to the end?

Incorporated in Yeb's patch

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-02-22 at 14:38 +0900, Fujii Masao wrote:
 On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote:
  I've read about a tenth of the patch, so I'll submit another comments
  about the rest later.
 
 Here are another comments:

Thanks for your comments
Code available at git://github.com/simon2ndQuadrant/postgres.git

 SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
 if the standby crashes while a transaction is waiting for replication,
 it waits infinitely.

Will think on this.

 sync_rep_service and potential_sync_standby are not required to be in the
 WalSnd shmem because only walsender accesses them.

For use in debug, if not later monitoring

 +static bool
 +SyncRepServiceAvailable(void)
 +{
 + bool result = false;
 +
 + SpinLockAcquire(WalSndCtl-ctlmutex);
 + result = WalSndCtl-sync_rep_service_available;
 + SpinLockRelease(WalSndCtl-ctlmutex);
 +
 + return result;
 +}

Fixed

 volatile pointer needs to be used to prevent code rearrangement.
 
 + slock_t ctlmutex;   /* locks shared variables shown 
 above */
 
 cltmutex should be initialized by calling SpinLockInit.

Fixed

 + /*
 +  * Stop providing the sync rep service, even if there 
 are
 +  * waiting backends.
 +  */
 + {
 + SpinLockAcquire(WalSndCtl-ctlmutex);
 + WalSndCtl-sync_rep_service_available = false;
 + SpinLockRelease(WalSndCtl-ctlmutex);
 + }
 
 sync_rep_service_available should be set to false only when
 there is no synchronous walsender.

The way I had it is correct because  if (MyWalSnd-sync_rep_service)
then if we're the sync walsender, so if we stop being it, then there
isn't one. A potential walsender might then become the sync walsender.

I think you'd like it if there was no gap at the point the potential wal
sender takes over? Just not sure how to do that robustly at present.
Will think some more.

 + /*
 +  * When we first start replication the standby will be behind the 
 primary.
 +  * For some applications, for example, synchronous replication, it is
 +  * important to have a clear state for this initial catchup mode, so we
 +  * can trigger actions when we change streaming state later. We may stay
 +  * in this state for a long time, which is exactly why we want to be
 +  * able to monitor whether or not we are still here.
 +  */
 + WalSndSetState(WALSNDSTATE_CATCHUP);
 +
 
 The above has already been committed. Please remove that from the patch.

Removed

 I don't like calling SyncRepReleaseWaiters for each feedback because
 I guess that it's too frequent. How about receiving all the feedbacks 
 available
 from the socket, and then calling SyncRepReleaseWaiters as well as
 walreceiver does?

Possibly, but an optimisation for later when we have behaviour correct.

 + boolownLatch;   /* do we own the above latch? */
 
 We can just remove the ownLatch flag.

Agreed, removed

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:

 --- a/src/backend/replication/syncrep.c
 +++ b/src/backend/replication/syncrep.c
 @@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
  }
  else
  {
 +   bool found = false;
 +
  while (proc-lwWaitLink != NULL)
  {
  /* Are we the next proc in our traversal of the 
 queue? */
 @@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
   * No need to touch head or tail.
   */
  proc-lwWaitLink = MyProc-lwWaitLink;
 +   found = true;
 +   break;
  }
 
 -   if (proc-lwWaitLink == NULL)
 -   elog(WARNING, could not locate 
 ourselves on wait queue);
  proc = proc-lwWaitLink;
  }
 +   if (!found)
 +   elog(WARNING, could not locate ourselves on 
 wait queue);
 
 -   if (proc-lwWaitLink == NULL)   /* At tail */
 +   /* If MyProc was removed from the tail, maintain list 
 invariant head==tail */
 +   if (proc-lwWaitLink == NULL)
  {
 -   Assert(proc == MyProc);
 -   /* Remove ourselves from tail of queue */
 +   Assert(proc != MyProc); /* impossible since that 
 is the head=MyProc branch above */
  Assert(queue-tail == MyProc);
  queue-tail = proc;
  proc-lwWaitLink = NULL;

Used your suggested fix
Code available at git://github.com/simon2ndQuadrant/postgres.git

 I needed to add this to make the documentation compile
 
 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml
 @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
   You should also consider setting varnamehot_standby_feedback/
   as an alternative to using this parameter.
 /para
 + /listitem
 + /varlistentry
 + /variablelist/sect2
 
 sect2 id=runtime-config-sync-rep

Separate bug, will fix

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Thu, 2011-02-24 at 18:13 -0800, Daniel Farina wrote:

 I have also reproduced this. Notably, things seem fine as long as
 pgbench is confined to one backend, but as soon as two are used (-c 2)
 by the feature I can get segfaults.

Sorry that you all experienced this. I wasn't able to get concurrent
queue accesses even with -c 8, so I spent about half a day last week
investigating a possible spinlock locking flaw. That meant the code in
that area was untested, which is most obvious now. I guess that means I
should test on different hardware in future.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-21 at 21:35 +0900, Tatsuo Ishii wrote:
  Well, good news all round.
  
  v17 implements what I believe to be the final set of features for sync
  rep. This one I'm actually fairly happy with. It can be enjoyed best at
  DEBUG3.
  
  The patch is very lite touch on a few areas of code, plus a chunk of
  specific code, all on master-side. Pretty straight really. I'm sure
  problems will be found, its not long since I completed this; thanks to
  Daniel Farina for your help with patch assembly.
 
 +   primaryvarnamesynchronous_standby_names/ configuration 
 parameter/primary
 +  /indexterm
 +  listitem
 +   para
 +Specifies a list of standby names that can become the sole
 +synchronous standby. Other standby servers connect that are also on
 +the list become potential standbys. If the current synchronous 
 standby
 +goes away it will be replaced with one of the potential standbys.
 +Specifying more than one standby name can allow very high 
 availability.
 +   /para
 
 Can anybody please enlighten me? I do not quite follow Other standby
 servers connect that are also on the list become potential standbys
 part.
 
 Can I read this as Other standby servers that are also on the list
 become potential synchrnous standbys?

Yes


I have reworded it to see if that improves the explanation
Code available at git://github.com/simon2ndQuadrant/postgres.git

untagged text included here for clarity

 synchronous_standby_names

 Specifies a list of standby names that can become the sole
 synchronous standby.  At any time there can be only one synchronous
 standby server.  The first standby to connect that is listed here
 will become the synchronous standby server.  Other standby servers
 that connect will then become potential synchronous standbys.
 If the current synchronous standby disconnects for whatever reason
 it will be replaced with one of the potential standbys.
 Specifying more than one standby name can allow very high availability.

 The standby name is currently taken as the application_name of the
 standby, as set in the primary_conninfo on the standby. Names are
 not enforced for uniqueness, though clearly that could lead to
 confusion and misconfiguration. Specifying multiple standbys with the
 same name does not allow more than one standby to be the current
 synchronous standby.

 If a standby is removed from the list of servers then it will stop
 being the synchronous standby, allowing another to take it's place.
 Standbys may also be added to the list without restarting the server.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Fri, 2011-02-25 at 16:41 +0100, Yeb Havinga wrote:

 I needed to add this to make the documentation compile
 
 --- a/doc/src/sgml/config.sgml
 +++ b/doc/src/sgml/config.sgml
 @@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
   You should also consider setting
 varnamehot_standby_feedback/
   as an alternative to using this parameter.
 /para
 + /listitem
 + /varlistentry
 + /variablelist/sect2
 
 sect2 id=runtime-config-sync-rep

Corrected, thanks.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 22:52 -0500, Robert Haas wrote:
 On Sat, Feb 19, 2011 at 3:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
  First, we should be clear to explain that you are referring to the fact
  that the request
   synchronous_commit = off
   synchronous_replication = on
  makes no sense in the way the replication system is currently designed,
  even though it is a wish-list item to make it work in 9.2+
 
 What exactly do you mean by make it work?  We can either (1) wait
 for the local commit and the remote commit (synchronous_commit=on,
 synchronous_replication=on), (2) wait for the local commit only
 (synchronous_commit=on, synchronous_replication=off), or (3) wait for
 neither (synchronous_commit=off, synchronous_replication=off).
 There's no fourth possible behavior, AFAICS.

Currently, no, since as we discussed earlier we currently need to fsync
WAL locally before it gets sent to standby.

 The question is whether synchronous_commit=off,
 synchronous_replication=on should behave like (1) or (3)

Yes, that is the right question.

 You have it as #1; I'm arguing
 it should be #3.  I realize it's an arguable point; I'm just arguing
 for what makes most sense to me.

Various comments follow on thread. We can pick this up once we've
committed the main patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote:

 I believe the problem is that the definition of IsOnSyncRepQueue is
 bogus, so that the loop in SyncRepWaitOnQueue always takes the first
 branch.

Sorry, don't see that. Jaime/Yeb fix applied.

 It was a little confusing to me setting this up that setting only
 synchronous_replication did nothing; I had to also set
 synchronous_standby_names.  We might need a cross-check there.  

I'm inclined to make an empty synchronous_standby_names mean that any
standby can become the sync standby. That seems more useful behaviour
and avoids the need for a cross-check (what exactly would we check??).

 I
 believe the docs for synchronous_replication also need some updating;
 this part appears to be out of date:
 
 +between primary and standby. The commit wait will last until
 the
 +first reply from any standby. Multiple standby servers allow
 +increased availability and possibly increase performance as
 well.

Agreed

 The words on the primary in the next sentence may not be necessary
 any more either, as I believe this parameter now has no effect
 anywhere else. 

Agreed

Docs changed: git://github.com/simon2ndQuadrant/postgres.git

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:13 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Sat, 2011-02-19 at 23:26 -0500, Robert Haas wrote:

 I believe the problem is that the definition of IsOnSyncRepQueue is
 bogus, so that the loop in SyncRepWaitOnQueue always takes the first
 branch.

 Sorry, don't see that. Jaime/Yeb fix applied.

 It was a little confusing to me setting this up that setting only
 synchronous_replication did nothing; I had to also set
 synchronous_standby_names.  We might need a cross-check there.

 I'm inclined to make an empty synchronous_standby_names mean that any
 standby can become the sync standby. That seems more useful behaviour
 and avoids the need for a cross-check (what exactly would we check??).

Hmm, that is a little surprising but might be reasonable.  My thought
was that we would check that if synchronous_replication=on then
synchronous_standbys must be non-empty.  I think there ought to be
some way for the admin to turn synchronous replication *off* though,
in a way that an individual user cannot override.  How will we do
that?

 Docs changed: git://github.com/simon2ndQuadrant/postgres.git

I'm hoping you're going to post an updated patch once the current rash
of updates is all done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:

  Docs changed: git://github.com/simon2ndQuadrant/postgres.git
 
 I'm hoping you're going to post an updated patch once the current rash
 of updates is all done.

Immediately prior to commit, yes. 

Everybody else has been nudging me towards developing in public view,
commit by commit on a public repo. So that's what I'm doing now, as
promised earlier. That should help people object to specific commits if
they no likey.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Robert Haas
On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:

  Docs changed: git://github.com/simon2ndQuadrant/postgres.git

 I'm hoping you're going to post an updated patch once the current rash
 of updates is all done.

 Immediately prior to commit, yes.

 Everybody else has been nudging me towards developing in public view,
 commit by commit on a public repo. So that's what I'm doing now, as
 promised earlier. That should help people object to specific commits if
 they no likey.

It took a few days for the problems with the last version to shake
out.  I think you should give people about that much time again.  It's
not realistic to suppose that everyone will follow your git repo in
detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 16:55 -0500, Robert Haas wrote:
 On Mon, Feb 28, 2011 at 4:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
  On Mon, 2011-02-28 at 16:22 -0500, Robert Haas wrote:
 
   Docs changed: git://github.com/simon2ndQuadrant/postgres.git
 
  I'm hoping you're going to post an updated patch once the current rash
  of updates is all done.
 
  Immediately prior to commit, yes.
 
  Everybody else has been nudging me towards developing in public view,
  commit by commit on a public repo. So that's what I'm doing now, as
  promised earlier. That should help people object to specific commits if
  they no likey.
 
 It took a few days for the problems with the last version to shake
 out.  I think you should give people about that much time again.  It's
 not realistic to suppose that everyone will follow your git repo in
 detail.

Yeh, I'm not rushing to commit. And even afterwards I expect comments
that will mean I'm editing this for next month at least.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote:
  SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
  if the standby crashes while a transaction is waiting for replication,
  it waits infinitely.
 
 Will think on this.

The behaviour seems correct to me:

If allow_standalone_primary = off then you wish to wait forever (at your
request...)

If allow_standalone_primary = on then we sit and wait until we hit
client timeout, which occurs even after last standby has gone.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote:
  SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
  if the standby crashes while a transaction is waiting for replication,
  it waits infinitely.

 Will think on this.

 The behaviour seems correct to me:

 If allow_standalone_primary = off then you wish to wait forever (at your
 request...)

No, I've never wished wait-forever option for now. I'd like to make
the primary work alone when there is no connected standby, for
high-availability.

 If allow_standalone_primary = on then we sit and wait until we hit
 client timeout, which occurs even after last standby has gone.

In that case, why do backends need to wait until the timeout occurs?
We can make those backends resume their transaction as soon as
the last standby has gone. No?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
Thanks for update of the patch!

On Tue, Mar 1, 2011 at 3:40 AM, Simon Riggs si...@2ndquadrant.com wrote:
 SyncRepRemoveFromQueue seems not to be as short-term as we can
 use the spinlock. Instead, LW lock should be used there.

You seem to have forgotten to fix the above-mentioned issue.
A spinlock can be used only for very short-term operation like
read/write of some shared-variables. The operation on the queue
is not short, so should be protected by LWLock, I think.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Fujii Masao
On Tue, Mar 1, 2011 at 3:39 AM, Simon Riggs si...@2ndquadrant.com wrote:
 PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
 replication as well as COMMIT PREPARED?

 PREPARE - Yes
 ROLLBACK - No

 Further discussion welcome

If we don't make ROLLBACK PREPARED wait for replication, we might need to
issue ROLLBACK PREPARED to new master again after failover, even if we've
already received a success indication of ROLLBACK PREPARED from old master.
This looks strange to me because, OTOH, in simple COMMIT/ROLLBACK case,
we don't need to issue that to new master again after failover.

 What if fast shutdown is requested while RecordTransactionCommit
 is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
 until replication has been successfully done (i.e., until at least one
 synchronous standby has connected to the master especially if
 allow_standalone_primary is disabled). Is this OK?

 A behaviour - important, though needs further discussion.

One of the scenarios which I'm concerned is:

1. The primary is running with allow_standalone_primary = on.
2. While some backends are waiting for replication, the user requests
fast shutdown.
3. Since the timeout expires, those backends stop waiting and return the success
indication to the client (but not replicated to the standby).
4. Since there is no backend waiting for replication, fast shutdown completes.
5. The clusterware like pacemaker detects the death of the primary and
triggers the
failover.
6. New primary doesn't have some transactions committed to the client, i.e.,
transaction lost happens!!

To avoid such a transaction lost, we should prevent the primary from
returning the
success indication to the client while fast shutdown is being executed, even if
allow_standalone_primary is enabled, I think. Thought?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-28 Thread Simon Riggs
On Tue, 2011-03-01 at 15:25 +0900, Fujii Masao wrote:
 On Tue, Mar 1, 2011 at 9:19 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Mon, 2011-02-28 at 18:40 +, Simon Riggs wrote:
   SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
   if the standby crashes while a transaction is waiting for replication,
   it waits infinitely.
 
  Will think on this.
 
  The behaviour seems correct to me:
 
  If allow_standalone_primary = off then you wish to wait forever (at your
  request...)
 
 No, I've never wished wait-forever option for now. I'd like to make
 the primary work alone when there is no connected standby, for
 high-availability.

Good news, please excuse that reference.

  If allow_standalone_primary = on then we sit and wait until we hit
  client timeout, which occurs even after last standby has gone.
 
 In that case, why do backends need to wait until the timeout occurs?
 We can make those backends resume their transaction as soon as
 the last standby has gone. No?

The guarantee provided is that we will wait for up to client timeout for
the sync standby to confirm. If we stop waiting right at the point that
an event occurs, it breaks the whole purpose of the feature.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Yeb Havinga

On 2011-02-22 20:43, Jaime Casanova wrote:


you can make this happen more easily, i just run pgbench -n -c10 -j10
test and qot that warning and sometimes a segmentation fault and
sometimes a failed assertion

and the problematic code starts at
src/backend/replication/syncrep.c:277, here my suggestions on that
code.
still i get a failed assertion because of the second Assert (i think
we should just remove that one)

*** SyncRepRemoveFromQueue(void)
*** 288,299 

 if (proc-lwWaitLink == NULL)
 elog(WARNING, could not locate
ourselves on wait queue);
!   proc = proc-lwWaitLink;
 }

 if (proc-lwWaitLink == NULL)   /* At tail */
 {
!   Assert(proc == MyProc);
 /* Remove ourselves from tail of queue */
 Assert(queue-tail == MyProc);
 queue-tail = proc;
--- 288,300 

 if (proc-lwWaitLink == NULL)
 elog(WARNING, could not locate
ourselves on wait queue);
!   else
!   proc = proc-lwWaitLink;
 }

 if (proc-lwWaitLink == NULL)   /* At tail */
 {
!   Assert(proc != MyProc);
 /* Remove ourselves from tail of queue */
 Assert(queue-tail == MyProc);
 queue-tail = proc;

I also did some initial testing on this patch and got the queue related 
errors with  1 clients. With the code change from Jaime above I still 
got a lot of 'not on queue warnings'.


I tried to understand how the queue was supposed to work - resulting in 
the changes below that also incorporates a suggestion from Fujii 
upthread, to early exit when myproc was found.


With the changes below all seems to work without warnings. I now see 
that the note about the list invariant is too short, better was: if 
queue length = 1 then head = tail


--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -274,6 +274,8 @@ SyncRepRemoveFromQueue(void)
}
else
{
+   bool found = false;
+
while (proc-lwWaitLink != NULL)
{
/* Are we the next proc in our traversal of the 
queue? */

@@ -284,17 +286,19 @@ SyncRepRemoveFromQueue(void)
 * No need to touch head or tail.
 */
proc-lwWaitLink = MyProc-lwWaitLink;
+   found = true;
+   break;
}

-   if (proc-lwWaitLink == NULL)
-   elog(WARNING, could not locate 
ourselves on wait queue);

proc = proc-lwWaitLink;
}
+   if (!found)
+   elog(WARNING, could not locate ourselves on 
wait queue);


-   if (proc-lwWaitLink == NULL)   /* At tail */
+   /* If MyProc was removed from the tail, maintain list 
invariant head==tail */

+   if (proc-lwWaitLink == NULL)
{
-   Assert(proc == MyProc);
-   /* Remove ourselves from tail of queue */
+   Assert(proc != MyProc); /* impossible since that 
is the head=MyProc branch above */

Assert(queue-tail == MyProc);
queue-tail = proc;
proc-lwWaitLink = NULL;

I needed to add this to make the documentation compile

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2010,6 +2010,9 @@ SET ENABLE_SEQSCAN TO OFF;
 You should also consider setting varnamehot_standby_feedback/
 as an alternative to using this parameter.
/para
+ /listitem
+ /varlistentry
+ /variablelist/sect2

sect2 id=runtime-config-sync-rep

regards,
Yeb Havinga


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jaime Casanova
On Fri, Feb 25, 2011 at 10:41 AM, Yeb Havinga yebhavi...@gmail.com wrote:

 I also did some initial testing on this patch and got the queue related
 errors with  1 clients. With the code change from Jaime above I still got a
 lot of 'not on queue warnings'.

 I tried to understand how the queue was supposed to work - resulting in the
 changes below that also incorporates a suggestion from Fujii upthread, to
 early exit when myproc was found.


yes, looking at the code, the warning and your patch... it seems yours
is the right solution...
I'm compiling right now to test again and see the effects, Robert
maybe you can test your failure case again? i'm really sure it's
related to this...

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Jeff Davis
On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
 Oh, yes, this reproduces past shutdowns/startups, and there's quite a
 few txids before I catch up. I'm also comfortable poking around with
 gdb (I have already recompiled with debugging symbols and
 optimizations off and was poking around, especially at
 MemoryContextStats(TopMemoryContext), but was not rewarded.

Where is all of that memory going during recovery? Recovery shouldn't
use much memory at all, as far as I can tell.

What's even allocating memory at all?

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-25 Thread Daniel Farina
On Fri, Feb 25, 2011 at 4:52 PM, Jeff Davis pg...@j-davis.com wrote:
 On Wed, 2011-02-23 at 22:42 -0800, Daniel Farina wrote:
 Oh, yes, this reproduces past shutdowns/startups, and there's quite a
 few txids before I catch up. I'm also comfortable poking around with
 gdb (I have already recompiled with debugging symbols and
 optimizations off and was poking around, especially at
 MemoryContextStats(TopMemoryContext), but was not rewarded.

 Where is all of that memory going during recovery? Recovery shouldn't
 use much memory at all, as far as I can tell.

 What's even allocating memory at all?

I noticed this is RSS fooling with me. As pages get touched in shared
memory, for some reason RSS was constantly getting increased, along
with SHR at the same time.

Still, the long recovery time was mystifying to me, considering the
lack of unclean shutdowns.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Fujii Masao
On Tue, Feb 22, 2011 at 2:38 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I've read about two-tenths of the patch, so I'll submit another comments
 about the rest later. Sorry for the slow reviewing...

Here are another comments:

+   {synchronous_standby_names, PGC_SIGHUP, WAL_REPLICATION,
+   gettext_noop(List of potential standby names to 
synchronise with.),
+   NULL,
+   GUC_LIST_INPUT | GUC_IS_NAME

Why did you add GUC_IS_NAME here? I don't think that it's reasonable
to limit the length of this parameter to 63. Because dozens of standby
names might be specified in the parameter.

SyncRepQueue-qlock should be initialized by calling SpinLockInit?

+ * Portions Copyright (c) 2010-2010, PostgreSQL Global Development Group

Typo: s/2010/2011

sync_replication_timeout_client would mess up the wait-forever option.
So, when allow_standalone_primary is disabled, ISTM that
sync_replication_timeout_client should have no effect.

Please check max_wal_senders before calling SyncRepWaitForLSN for
non-replication case.

SyncRepRemoveFromQueue seems not to be as short-term as we can
use the spinlock. Instead, LW lock should be used there.

+   old_status = get_ps_display(len);
+   new_status = (char *) palloc(len + 21 + 1);
+   memcpy(new_status, old_status, len);
+   strcpy(new_status + len,  waiting for sync rep);
+   set_ps_display(new_status, false);
+   new_status[len] = '\0'; /* truncate off  waiting */

Updating the PS display should be skipped if update_process_title is false.

+   /*
+* XXX extra code needed here to maintain sorted invariant.

Yeah, such a code is required. I think that we can shorten the time
it takes to find an insert position by searching the list backwards.
Because the given LSN is expected to be relatively new in the queue.

+* Our approach should be same as racing car - slow in, fast 
out.
+*/

Really? Even when removing the entry from the queue, we need
to search the queue as well as we do in the add-entry case.
Why don't you make walsenders remove the entry from the queue,
instead?

+   longtimeout = SyncRepGetWaitTimeout();
snip
+   bool timeout = false;
snip
+   else if (timeout  0 
+   
TimestampDifferenceExceeds(GetCurrentTransactionStopTimestamp(),
+   
now, timeout))
+   {
+   release = true;
+   timeout = true;
+   }

You seem to mix up two timeout variables.

+   if (proc-lwWaitLink == MyProc)
+   {
+   /*
+* Remove ourselves from middle of queue.
+* No need to touch head or tail.
+*/
+   proc-lwWaitLink = MyProc-lwWaitLink;
+   }

When we find ourselves, we should break out of the loop soon,
instead of continuing the loop to the end?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
On Tue, Feb 22, 2011 at 11:43 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas robertmh...@gmail.com wrote:

 DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
 DEBUG:  released 0 procs up to 0/3014690
 DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
 DEBUG:  released 2 procs up to 0/3027BC8
 WARNING:  could not locate ourselves on wait queue
 server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 The connection to the server was lost. Attempting reset: DEBUG:

 you can make this happen more easily, i just run pgbench -n -c10 -j10
 test and qot that warning and sometimes a segmentation fault and
 sometimes a failed assertion

I have also reproduced this. Notably, things seem fine as long as
pgbench is confined to one backend, but as soon as two are used (-c 2)
by the feature I can get segfaults.

In the UI department, I am finding it somewhat difficult to affirm
that I am, in fact, synchronously replicating anything in the HA
scenario (where I never want to block. However, by enjoying the patch
at DEBUG3 and running what I think to be syncrepped and non-syncrepped
runs, I believe that I am not committing user error (seeing syncrep
waiting vs. lack thereof).  This is in part hard to confirm because
the single-backend performance (if DEBUG3 is to be believed, I will
write a real torture test later) of syncrep is actually very good, I
was expecting a more perceptible performance dropoff. But then again,
I imagine the real kicker will happen when we can run concurrent
backends. Also, Amazon EBS doesn't have the fastest disks, and within
an availability zone network latency is awfully low.

I can't quite explain what I was seeing before w.r.t.  memory usage,
and more pressingly, a very slow recover. I turned off hot standby and
was messing around and, before I knew it, the server was caught up. I
do not know if that was just coincidence (probably) or overhead
imposed by HS. The very high RES number was linux fooling me, as most
of it was SHR and in SHMEM.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Daniel Farina
With some more fooling around, I have also managed to get this elog(WARNING)

if (proc-lwWaitLink == NULL)
elog(WARNING, could not locate ourselves on 
wait queue);

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-24 Thread Jaime Casanova
On Mon, Feb 21, 2011 at 4:06 AM, Fujii Masao masao.fu...@gmail.com wrote:

 PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
 replication as well as COMMIT PREPARED?


maybe ROLLBACK PREPARED but i'm not sure... i'm pretty sure we don't
need to wait for PREPARE TRANSACTION, but i could be wrong

 What if fast shutdown is requested while RecordTransactionCommit
 is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
 until replication has been successfully done (i.e., until at least one
 synchronous standby has connected to the master especially if
 allow_standalone_primary is disabled). Is this OK?


i guess that is debatable, IMHO if there is a synch standby then wait
(because the user request durability of the data),
if there isn't a synch rep wait until the timeout (even if
allow_standalone_primary is disabled) because probably this is
a miss configuration or an special case i'm trying to handle (network
broke, data center of standbies explode, etc).

 We should emit WARNING when the synchronous standby with
 wal_receiver_status_interval = 0 connects to the master. Because,
 in that case, a transaction unexpectedly would wait for replication
 infinitely.


actually i think we should reject such standby as a synch standby, and
look for another one in the synchronous_standby_names list

--
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Well, good news all round.

 v17 implements what I believe to be the final set of features for sync
 rep. This one I'm actually fairly happy with. It can be enjoyed best at
 DEBUG3.

I've been messing with this patch and am wondering if this behavior is expected:

I've been frobbing the server around (I was messing around with the
syncrep feature, but do not know if this is related just yet), and
came upon a case I do not expect: it would appear that prior to
establishing a connection to do streaming replication, the startup
process (which is recovering) is very slowly catching up (or so it
would be indicated by txid_current_snapshot()) and eating up enormous
amounts of memory, such as 6GB at a time in RES,  monotonically
increasing. Furthermore, the incrementation of the txid_snapshot is
very slow, and it doesn't seem like I'm coming close to making full
use of my resources: cpu and block devices are not very busy.  There
may have been a brief spurt of pgbench activity that would generate
such WAL traffic to replay.

I have not done a hard shutdown to my knowledge, and the server does
allow me to query relatively quickly as a standby.

Looks like I'm about to hit 7+GB. Is there something I'm missing?

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-23 Thread Daniel Farina
On Wed, Feb 23, 2011 at 10:39 PM, Daniel Farina dan...@heroku.com wrote:
 On Fri, Feb 18, 2011 at 4:06 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Well, good news all round.

 v17 implements what I believe to be the final set of features for sync
 rep. This one I'm actually fairly happy with. It can be enjoyed best at
 DEBUG3.

 I've been messing with this patch and am wondering if this behavior is 
 expected:

 I've been frobbing the server around (I was messing around with the
 syncrep feature, but do not know if this is related just yet), and
 came upon a case I do not expect: it would appear that prior to
 establishing a connection to do streaming replication, the startup
 process (which is recovering) is very slowly catching up (or so it
 would be indicated by txid_current_snapshot()) and eating up enormous
 amounts of memory, such as 6GB at a time in RES,  monotonically
 increasing. Furthermore, the incrementation of the txid_snapshot is
 very slow, and it doesn't seem like I'm coming close to making full
 use of my resources: cpu and block devices are not very busy.  There
 may have been a brief spurt of pgbench activity that would generate
 such WAL traffic to replay.

 I have not done a hard shutdown to my knowledge, and the server does
 allow me to query relatively quickly as a standby.

Oh, yes, this reproduces past shutdowns/startups, and there's quite a
few txids before I catch up. I'm also comfortable poking around with
gdb (I have already recompiled with debugging symbols and
optimizations off and was poking around, especially at
MemoryContextStats(TopMemoryContext), but was not rewarded.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Marti Raudsepp
On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote:
 +       SpinLockAcquire(WalSndCtl-ctlmutex);
 +       result = WalSndCtl-sync_rep_service_available;
 +       SpinLockRelease(WalSndCtl-ctlmutex);

 volatile pointer needs to be used to prevent code rearrangement.

I don't think that's necessary. Spinlock functions already prevent
reordering using __asm__ __volatile__

Otherwise, surely they would be utterly broken?

Regards,
Marti

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Andres Freund
On Tuesday 22 February 2011 09:59:21 Marti Raudsepp wrote:
 On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote:
  +   SpinLockAcquire(WalSndCtl-ctlmutex);
  +   result = WalSndCtl-sync_rep_service_available;
  +   SpinLockRelease(WalSndCtl-ctlmutex);
  
  volatile pointer needs to be used to prevent code rearrangement.
 
 I don't think that's necessary. Spinlock functions already prevent
 reordering using __asm__ __volatile__
 
 Otherwise, surely they would be utterly broken?
Its not the spinlock thats the problem but that result may be already loaded 
into a register. Thats not prohibited by __asm__ __volatile__.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Greg Smith

Daniel Farina wrote:

As it will be somewhat hard to prove the durability guarantees of
commit without special heroics, unless someone can suggest a
mechanism.


Could you introduce a hack creating deterministic server side crashes in 
order to test this out?  The simplest thing that comes to mind is a rule 
like kick shared memory in the teeth to force a crash after every 100 
commits, then see if #100 shows up as expected.  Pick two different 
small numbers for the interval and you could probably put that on both 
sides to simulate all sorts of badness.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Tom Lane
Marti Raudsepp ma...@juffo.org writes:
 On Tue, Feb 22, 2011 at 07:38, Fujii Masao masao.fu...@gmail.com wrote:
 +   SpinLockAcquire(WalSndCtl-ctlmutex);
 +   result = WalSndCtl-sync_rep_service_available;
 +   SpinLockRelease(WalSndCtl-ctlmutex);

 volatile pointer needs to be used to prevent code rearrangement.

 I don't think that's necessary. Spinlock functions already prevent
 reordering using __asm__ __volatile__

You're mistaken.  We started using that volatile-pointer convention
after noting that some compilers would misoptimize the code otherwise.

It's not a problem with LWLock-protected stuff because the LWLock calls
are actual out-of-line function calls, and the compiler knows it doesn't
know what those functions might do.  But gcc is a lot more willing to
reorder stuff around asm operations, so you can't assume that
SpinLockAcquire/SpinLockRelease are equally safe.  The way to prevent
optimization bugs is to make sure that the fetches/stores protected by a
spinlock are done through volatile pointers.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Jaime Casanova
On Sat, Feb 19, 2011 at 11:26 PM, Robert Haas robertmh...@gmail.com wrote:

 DEBUG:  write 0/3027BC8 flush 0/3014690 apply 0/3014690
 DEBUG:  released 0 procs up to 0/3014690
 DEBUG:  write 0/3027BC8 flush 0/3027BC8 apply 0/3014690
 DEBUG:  released 2 procs up to 0/3027BC8
 WARNING:  could not locate ourselves on wait queue
 server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 The connection to the server was lost. Attempting reset: DEBUG:

you can make this happen more easily, i just run pgbench -n -c10 -j10
test and qot that warning and sometimes a segmentation fault and
sometimes a failed assertion

and the problematic code starts at
src/backend/replication/syncrep.c:277, here my suggestions on that
code.
still i get a failed assertion because of the second Assert (i think
we should just remove that one)

*** SyncRepRemoveFromQueue(void)
*** 288,299 

if (proc-lwWaitLink == NULL)
elog(WARNING, could not locate
ourselves on wait queue);
!   proc = proc-lwWaitLink;
}

if (proc-lwWaitLink == NULL)   /* At tail */
{
!   Assert(proc == MyProc);
/* Remove ourselves from tail of queue */
Assert(queue-tail == MyProc);
queue-tail = proc;
--- 288,300 

if (proc-lwWaitLink == NULL)
elog(WARNING, could not locate
ourselves on wait queue);
!   else
!   proc = proc-lwWaitLink;
}

if (proc-lwWaitLink == NULL)   /* At tail */
{
!   Assert(proc != MyProc);
/* Remove ourselves from tail of queue */
Assert(queue-tail == MyProc);
queue-tail = proc;

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte y capacitación de PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-22 Thread Daniel Farina
On Tue, Feb 22, 2011 at 5:04 AM, Greg Smith g...@2ndquadrant.com wrote:
 Daniel Farina wrote:

 As it will be somewhat hard to prove the durability guarantees of
 commit without special heroics, unless someone can suggest a
 mechanism.

 Could you introduce a hack creating deterministic server side crashes in
 order to test this out?  The simplest thing that comes to mind is a rule
 like kick shared memory in the teeth to force a crash after every 100
 commits, then see if #100 shows up as expected.  Pick two different small
 numbers for the interval and you could probably put that on both sides to
 simulate all sorts of badness.

I probably could via function, would a kill -9 also be of interest to you?

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Sat, Feb 19, 2011 at 9:06 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Well, good news all round.

 v17 implements what I believe to be the final set of features for sync
 rep. This one I'm actually fairly happy with. It can be enjoyed best at
 DEBUG3.

 The patch is very lite touch on a few areas of code, plus a chunk of
 specific code, all on master-side. Pretty straight really. I'm sure
 problems will be found, its not long since I completed this; thanks to
 Daniel Farina for your help with patch assembly.

 Which is just as well, because the other news is that I'm off on holiday
 for a few days, which is most inconvenient. I won't be committing this
 for at least a week and absent from the list. OTOH, I think its ready
 for a final review and commit, so I'm OK if you commit or OK if you
 leave it for me.

Thanks for the patch!

Here are the comments:


PREPARE TRANSACTION and ROLLBACK PREPARED should wait for
replication as well as COMMIT PREPARED?

What if fast shutdown is requested while RecordTransactionCommit
is waiting in SyncRepWaitForLSN? ISTM fast shutdown cannot complete
until replication has been successfully done (i.e., until at least one
synchronous standby has connected to the master especially if
allow_standalone_primary is disabled). Is this OK?

We should emit WARNING when the synchronous standby with
wal_receiver_status_interval = 0 connects to the master. Because,
in that case, a transaction unexpectedly would wait for replication
infinitely.

+   /* Need a modifiable copy of string */
+   rawstring = pstrdup(SyncRepStandbyNames);
+
+   /* Parse string into list of identifiers */
+   if (!SplitIdentifierString(rawstring, ',', elemlist))

pfree(rawstring) and list_free(elemlist) should be called also if
SplitIdentifierString returns TRUE. Otherwise, memory-leak would
happen.

+   ereport(FATAL,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+  errmsg(invalid list syntax for parameter
\synchronous_standby_names\)));
+   return false;

return false is not required here though that might be harmless.


I've read about a tenth of the patch, so I'll submit another comments
about the rest later.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Tatsuo Ishii
 Well, good news all round.
 
 v17 implements what I believe to be the final set of features for sync
 rep. This one I'm actually fairly happy with. It can be enjoyed best at
 DEBUG3.
 
 The patch is very lite touch on a few areas of code, plus a chunk of
 specific code, all on master-side. Pretty straight really. I'm sure
 problems will be found, its not long since I completed this; thanks to
 Daniel Farina for your help with patch assembly.

+   primaryvarnamesynchronous_standby_names/ configuration 
parameter/primary
+  /indexterm
+  listitem
+   para
+Specifies a list of standby names that can become the sole
+synchronous standby. Other standby servers connect that are also on
+the list become potential standbys. If the current synchronous standby
+goes away it will be replaced with one of the potential standbys.
+Specifying more than one standby name can allow very high availability.
+   /para

Can anybody please enlighten me? I do not quite follow Other standby
servers connect that are also on the list become potential standbys
part.

Can I read this as Other standby servers that are also on the list
become potential synchrnous standbys?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Daniel Farina
On Mon, Feb 21, 2011 at 4:35 AM, Tatsuo Ishii is...@postgresql.org wrote:
 Well, good news all round.

Hello on this thread,

I'm taking a look at replication timeout with non-blocking which would
be nice but not required for this patch, in my understanding. But
before that, we're going to put this patch through some exercise via
pgbench to determine two things:

* How much performance is impacted

* Does it crash?

As it will be somewhat hard to prove the durability guarantees of
commit without special heroics, unless someone can suggest a
mechanism. Expect some results Real Soon.

-- 
fdr

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Tue, Feb 22, 2011 at 7:55 AM, Daniel Farina dan...@heroku.com wrote:
 I'm taking a look at replication timeout with non-blocking which would
 be nice but not required for this patch, in my understanding.

Why do you think so? You think sync_replication_timeout_client is sufficient
for sync rep?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I've read about a tenth of the patch, so I'll submit another comments
 about the rest later.

Here are another comments:

SyncRepReleaseWaiters should be called when walsender exits. Otherwise,
if the standby crashes while a transaction is waiting for replication,
it waits infinitely.

sync_rep_service and potential_sync_standby are not required to be in the
WalSnd shmem because only walsender accesses them.

+static bool
+SyncRepServiceAvailable(void)
+{
+   bool result = false;
+
+   SpinLockAcquire(WalSndCtl-ctlmutex);
+   result = WalSndCtl-sync_rep_service_available;
+   SpinLockRelease(WalSndCtl-ctlmutex);
+
+   return result;
+}

volatile pointer needs to be used to prevent code rearrangement.

+   slock_t ctlmutex;   /* locks shared variables shown 
above */

cltmutex should be initialized by calling SpinLockInit.

+   /*
+* Stop providing the sync rep service, even if there 
are
+* waiting backends.
+*/
+   {
+   SpinLockAcquire(WalSndCtl-ctlmutex);
+   WalSndCtl-sync_rep_service_available = false;
+   SpinLockRelease(WalSndCtl-ctlmutex);
+   }

sync_rep_service_available should be set to false only when
there is no synchronous walsender.

+   /*
+* When we first start replication the standby will be behind the 
primary.
+* For some applications, for example, synchronous replication, it is
+* important to have a clear state for this initial catchup mode, so we
+* can trigger actions when we change streaming state later. We may stay
+* in this state for a long time, which is exactly why we want to be
+* able to monitor whether or not we are still here.
+*/
+   WalSndSetState(WALSNDSTATE_CATCHUP);
+

The above has already been committed. Please remove that from the patch.

I don't like calling SyncRepReleaseWaiters for each feedback because
I guess that it's too frequent. How about receiving all the feedbacks available
from the socket, and then calling SyncRepReleaseWaiters as well as
walreceiver does?

+   boolownLatch;   /* do we own the above latch? */

We can just remove the ownLatch flag.


I've read about two-tenths of the patch, so I'll submit another comments
about the rest later. Sorry for the slow reviewing...

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sync Rep v17

2011-02-21 Thread Fujii Masao
On Mon, Feb 21, 2011 at 9:35 PM, Tatsuo Ishii is...@postgresql.org wrote:
 +       primaryvarnamesynchronous_standby_names/ configuration 
 parameter/primary
 +      /indexterm
 +      listitem
 +       para
 +        Specifies a list of standby names that can become the sole
 +        synchronous standby. Other standby servers connect that are also on
 +        the list become potential standbys. If the current synchronous 
 standby
 +        goes away it will be replaced with one of the potential standbys.
 +        Specifying more than one standby name can allow very high 
 availability.
 +       /para

 Can anybody please enlighten me? I do not quite follow Other standby
 servers connect that are also on the list become potential standbys
 part.

 Can I read this as Other standby servers that are also on the list
 become potential synchrnous standbys?

I read that in the same way.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >