Re: [HACKERS] Synchronous replication

2010-12-13 Thread Robert Haas
On Wed, Sep 15, 2010 at 8:39 AM, Fujii Masao masao.fu...@gmail.com wrote:
 I rebased the patch against current HEAD because it conflicted with
 recent commits about a latch.

Can you please rebase this again?  It no longer applies.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-15 Thread Erik Rijkers
On Wed, September 15, 2010 11:58, Fujii Masao wrote:
 On Wed, Sep 15, 2010 at 6:38 AM, David Fetter da...@fetter.org wrote:
 Now that the latch patch is in, when do you think you'll be able to use it
 instead of the poll loop?

 Here is the updated version, which uses a latch in the communication from
 walsender to backend. I haven't changed the others, because walsender
 already uses one in HEAD, and Heikki has already proposed a patch that
 replaces the poll loop between walreceiver and the startup process with
 a latch.


(synchrep_0915-2.patch: the patch applies cleanly;
compile, check, and install all complete without problems.)

How does one enable synchronous replication with this patch?
With previous versions I could do (in standby's recovery.conf):

replication_mode = 'recv'

but not anymore, apparently.

(sorry, I have probably overlooked part of the discussion;
-hackers is getting too high-volume for me... )

thanks,


Erik Rijkers




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-15 Thread Erik Rijkers
Never mind...  I see standbys.conf is now used.

sorry for the noise...


Erik Rijkers

On Thu, September 16, 2010 01:12, Erik Rijkers wrote:
 On Wed, September 15, 2010 11:58, Fujii Masao wrote:
 On Wed, Sep 15, 2010 at 6:38 AM, David Fetter da...@fetter.org wrote:
 Now that the latch patch is in, when do you think you'll be able to use it
 instead of the poll loop?

 Here is the updated version, which uses a latch in the communication from
 walsender to backend. I haven't changed the others, because walsender
 already uses one in HEAD, and Heikki has already proposed a patch that
 replaces the poll loop between walreceiver and the startup process with
 a latch.


 ( synchrep_0915-2.patch; patch applies cleanly;
 compile, check and install are without problem)

 How does one enable synchronous replication with this patch?
 With previous versions I could do (in standby's recovery.conf):

 replication_mode = 'recv'

 but not anymore, apparently.

 (sorry, I have probably overlooked part of the discussion;
 -hackers is getting too high-volume for me... )

 thanks,


 Erik Rijkers








Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-14 Thread David Fetter
On Fri, Sep 10, 2010 at 11:52:20AM +0900, Fujii Masao wrote:
 On Fri, Sep 3, 2010 at 3:42 PM, Fujii Masao masao.fu...@gmail.com wrote:
  Here is the proposed detailed design:
 
  standbys.conf
  =
  # This is not initialized by initdb, so users need to create it under $PGDATA.
     * The template is located in the PREFIX/share directory.
 
  # This is read by the postmaster at startup, as pg_hba.conf is.
     * In an EXEC_BACKEND environment, each walsender must read it at startup.
     * This is ignored when max_wal_senders is zero.
     * FATAL is emitted when standbys.conf doesn't exist even if
       max_wal_senders is positive.
 
  # SIGHUP makes only the postmaster re-read standbys.conf.
     * New configuration doesn't affect the existing connections to the
       standbys, i.e., it's used only for subsequent connections.
     * XXX: Should the existing connections react to new configuration? What
       if new standbys.conf doesn't have the standby_name of the existing
       connection?
 
  # The connection from the standby is rejected if its standby_name is not
    listed in standbys.conf.
     * Multiple standbys with the same name are allowed.
 
  # The valid values of SYNCHRONOUS field are async, recv, fsync and replay.
 
  standby_name
  
  # This is a new string-typed parameter in recovery.conf.
     * XXX: Should standby_name and standby_mode be merged?
 
  # Walreceiver sends this to the master when establishing the connection.
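Putting the pieces of the design above together, a standbys.conf could look roughly like this (a hypothetical sketch based only on the description in this thread; the standby names and column layout are assumptions, not the shipped template):

```
# standbys.conf -- hypothetical sketch, per the design above
# STANDBY_NAME    SYNCHRONOUS
reporting         async    # master never waits for this standby
backup1           recv     # wait until the standby has received the WAL
backup2           fsync    # wait until the standby has fsync'd the WAL
hot1              replay   # wait until the standby has applied the WAL
```

Each standby would then name itself in its recovery.conf, e.g. standby_name = 'backup1', so the master can match the incoming connection against this file.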
 
 The attached patch implements the above and simple synchronous replication
 feature, which doesn't include quorum commit capability. The replication
 mode (async, recv, fsync, replay) can be specified on a per-standby basis,
 in standbys.conf.
 
 The patch still uses a poll loop in the backend, walsender, startup process
 and walreceiver. Once the latch feature Heikki proposed has been committed,
 I'll replace those with latches.

Now that the latch patch is in, when do you think you'll be able to use it
instead of the poll loop?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-09 Thread Fujii Masao
On Fri, Sep 10, 2010 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote:
 The attached patch implements the above and simple synchronous replication
 feature, which doesn't include quorum commit capability. The replication
 mode (async, recv, fsync, replay) can be specified on a per-standby basis,
 in standbys.conf.

 The patch still uses a poll loop in the backend, walsender, startup process
 and walreceiver. Once the latch feature Heikki proposed has been committed,
 I'll replace those with latches.
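The poll-loop-versus-latch difference being discussed can be sketched with a small simulation, using Python's threading.Event as a stand-in for the latch (an illustration of the idea only, not PostgreSQL's actual latch API):

```python
import threading
import time

# With a poll loop, the waiter wakes up every `interval` seconds to check a
# flag, so each wakeup can be delayed by up to a whole interval.
def poll_wait(flag, interval=0.1):
    while not flag["set"]:
        time.sleep(interval)

# With a latch, the waiter sleeps until explicitly woken and reacts at once.
latch = threading.Event()

def wake_after(delay):
    time.sleep(delay)
    latch.set()               # analogous to SetLatch() in the walsender

threading.Thread(target=wake_after, args=(0.05,)).start()

start = time.monotonic()
latch.wait()                  # analogous to WaitLatch() in the backend
elapsed = time.monotonic() - start
print(f"woken after {elapsed:.3f}s")   # ~0.05s, not rounded up to a poll tick
```

This latency matters here because, as Heikki notes elsewhere in the thread, any delay in the acknowledgment path adds directly to synchronous commit time.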

 The documentation has not been fully updated yet. I'll keep working on it
 until the deadline of the next CF.

BTW, the latest code is available in my git repository too:

   git://git.postgresql.org/git/users/fujii/postgres.git
   branch: synchrep

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Heikki Linnakangas

On 06/09/10 17:14, Simon Riggs wrote:

 On Mon, 2010-09-06 at 16:14 +0300, Heikki Linnakangas wrote:

   The standby is sending a stream of messages to the master with current
   LSN positions at the time the message is sent. Given a synchronous
   transaction, the master would wait until the feedback stream reports
   that the current transaction is in the past compared to the streamed
   last known synced one (or the same).

  That doesn't really answer the question: *when* does standby send back
  the acknowledgment?

 I think you should explain when you think this happens in your proposal.

 Are you saying that you think the standby should send back one message
 for every transaction? That you do not think we should buffer the return
 messages?

For the sake of argument, yes that's what I was thinking. Now please
explain how *you're* thinking it should work.

 You seem to be proposing a design for responsiveness to a single
 transaction, not for overall throughput. That's certainly a design
 choice, but it wouldn't be my recommendation that we did that.

Sure, if there's more traffic, you can combine things. For example, if
one fsync in the standby flushes more than one commit record, you only
need one acknowledgment for all of them.

But don't dodge the question!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Simon Riggs
On Tue, 2010-09-07 at 09:27 +0300, Heikki Linnakangas wrote:
 On 06/09/10 17:14, Simon Riggs wrote:
  On Mon, 2010-09-06 at 16:14 +0300, Heikki Linnakangas wrote:
 
  The standby is sending a stream of messages to the master with current
  LSN positions at the time the message is sent. Given a synchronous
  transaction, the master would wait until the feedback stream reports
  that the current transaction is in the past compared to the streamed
  last known synced one (or the same).
 
  That doesn't really answer the question: *when* does standby send back
  the acknowledgment?
 
  I think you should explain when you think this happens in your proposal.
 
  Are you saying that you think the standby should send back one message
  for every transaction? That you do not think we should buffer the return
  messages?
 
 For the sake of argument, yes that's what I was thinking. Now please 
 explain how *you're* thinking it should work.

The WAL is sent from master to standby in 8192 byte chunks, frequently
including multiple commits. From standby, one reply per chunk. If we
need to wait for apply while nothing else is received, we do. 
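The one-reply-per-chunk scheme can be sketched as a small simulation (record sizes and framing here are illustrative assumptions, not the actual streaming-replication wire protocol):

```python
# Sketch: commit records are streamed in fixed-size chunks; the standby sends
# one acknowledgment (its current end-of-WAL LSN) per chunk rather than one
# per commit record. Sizes are illustrative only.
CHUNK_SIZE = 8192

def stream(records, chunk_size=CHUNK_SIZE):
    """Group (lsn, size) records into chunks; yield one ack per chunk."""
    chunk, used = [], 0
    for lsn, size in records:
        if used + size > chunk_size and chunk:
            yield chunk[-1]          # one reply per chunk: its highest LSN
            chunk, used = [], 0
        chunk.append(lsn)
        used += size
    if chunk:
        yield chunk[-1]

# Ten 1000-byte commit records at LSNs 1..10: two chunks, so two acks.
records = [(lsn, 1000) for lsn in range(1, 11)]
acks = list(stream(records))
print(acks)   # [8, 10] -- eight records fit in the first 8192-byte chunk
```

The trade-off debated downthread is visible here: ten commits generate only two replies, which saves messages but means an individual commit may wait for its chunk's reply rather than its own.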

  You seem to be proposing a design for responsiveness to a single
  transaction, not for overall throughput. That's certainly a design
  choice, but it wouldn't be my recommendation that we did that.
 
 Sure, if there's more traffic, you can combine things. For example, if 
 one fsync in the standby flushes more than one commit record, you only 
 need one acknowledgment for all of them.

 But don't dodge the question!

Given that I've previously outlined the size and contents of request
packets, their role and frequency I don't think I've dodged anything; in
fact, I've almost outlined the whole design for you. 

I am coding something to demonstrate the important aspects I've
espoused, just as you have done in the past when I didn't appreciate
and/or understand your ideas. That seems like the best way forward,
rather than wrangling through all the "that can't work" responses, which
actually takes longer.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services





Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Heikki Linnakangas

On 07/09/10 12:47, Simon Riggs wrote:

 The WAL is sent from master to standby in 8192 byte chunks, frequently
 including multiple commits. From standby, one reply per chunk. If we
 need to wait for apply while nothing else is received, we do.


Ok, thank you. The obvious performance problem is that even if you 
define a transaction to use synchronization level 'recv', and there's no 
other concurrent transactions running, you actually need to wait until 
it's applied. If you have only one client, there is no difference 
between the levels, you always get the same performance hit you get with 
'apply'. With more clients, you get some benefit, but there's still 
plenty of delays compared to the optimum.


Also remember that there can be a very big gap between when a record is 
fsync'd and when it's applied, if the recovery needs to wait for a hot 
standby transaction to finish.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Simon Riggs
On Tue, 2010-09-07 at 13:11 +0300, Heikki Linnakangas wrote:
 The obvious performance problem 

Is not obvious at all, and you misunderstand again. This emphasises the
need for me to show code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-09-07 at 09:27 +0300, Heikki Linnakangas wrote:
 For the sake of argument, yes that's what I was thinking. Now please 
 explain how *you're* thinking it should work.

 The WAL is sent from master to standby in 8192 byte chunks, frequently
 including multiple commits. From standby, one reply per chunk. If we
 need to wait for apply while nothing else is received, we do. 

That premise is completely false.  SR does not send WAL in page units.
If it did, it would have the same performance problems as the old
WAL-file-at-a-time implementation, just with slightly smaller
granularity.

regards, tom lane



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Simon Riggs
On Tue, 2010-09-07 at 10:47 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-09-07 at 09:27 +0300, Heikki Linnakangas wrote:
  For the sake of argument, yes that's what I was thinking. Now please 
  explain how *you're* thinking it should work.
 
  The WAL is sent from master to standby in 8192 byte chunks, frequently
  including multiple commits. From standby, one reply per chunk. If we
  need to wait for apply while nothing else is received, we do. 
 
 That premise is completely false.  SR does not send WAL in page units.
 If it did, it would have the same performance problems as the old
 WAL-file-at-a-time implementation, just with slightly smaller
 granularity.

There's no dependence on pages in that proposal, so I don't understand.

What aspect of the above would you change? and to what?

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Tue, 2010-09-07 at 10:47 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 The WAL is sent from master to standby in 8192 byte chunks, frequently
 including multiple commits. From standby, one reply per chunk. If we
 need to wait for apply while nothing else is received, we do. 
 
 That premise is completely false.  SR does not send WAL in page units.
 If it did, it would have the same performance problems as the old
 WAL-file-at-a-time implementation, just with slightly smaller
 granularity.

 There's no dependence on pages in that proposal, so don't understand.

Oh, well you certainly didn't explain it well then.

What I *think* you're saying is that the slave doesn't send per-commit
messages, but instead processes the WAL as it's received and then sends
a here's-where-I-am status message back upstream immediately before going
to sleep waiting for the next chunk.  That's fine as far as the protocol
goes, but I'm not convinced that it really does all that much in terms
of improving performance.  You still have the problem that the master
has to fsync its WAL before it can send it to the slave.  Also, the
slave won't know whether it ought to fsync its own WAL before replying.

regards, tom lane



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Robert Haas
On Tue, Sep 7, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Oh, well you certainly didn't explain it well then.

 What I *think* you're saying is that the slave doesn't send per-commit
 messages, but instead processes the WAL as it's received and then sends
 a heres-where-I-am status message back upstream immediately before going
 to sleep waiting for the next chunk.  That's fine as far as the protocol
 goes, but I'm not convinced that it really does all that much in terms
 of improving performance.  You still have the problem that the master
 has to fsync its WAL before it can send it to the slave.

We have that problem in all of these proposals, don't we?  We
certainly have no infrastructure to handle the slave getting ahead of
the master in the WAL stream.

 Also, the
 slave won't know whether it ought to fsync its own WAL before replying.

Right.  And whether it ought to replay it before replying.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Simon Riggs
On Tue, 2010-09-07 at 11:41 -0400, Tom Lane wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  On Tue, 2010-09-07 at 10:47 -0400, Tom Lane wrote:
  Simon Riggs si...@2ndquadrant.com writes:
  The WAL is sent from master to standby in 8192 byte chunks, frequently
  including multiple commits. From standby, one reply per chunk. If we
  need to wait for apply while nothing else is received, we do. 
  
  That premise is completely false.  SR does not send WAL in page units.
  If it did, it would have the same performance problems as the old
  WAL-file-at-a-time implementation, just with slightly smaller
  granularity.
 
  There's no dependence on pages in that proposal, so don't understand.
 
 Oh, well you certainly didn't explain it well then.
 
 What I *think* you're saying is that the slave doesn't send per-commit
 messages, but instead processes the WAL as it's received and then sends
 a heres-where-I-am status message back upstream immediately before going
 to sleep waiting for the next chunk.  That's fine as far as the protocol
 goes, but I'm not convinced that it really does all that much in terms
 of improving performance.  You still have the problem that the master
 has to fsync its WAL before it can send it to the slave.  Also, the
 slave won't know whether it ought to fsync its own WAL before replying.

Yes, apart from last sentence. Please wait for the code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Robert Haas
On Tue, Sep 7, 2010 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
 What I *think* you're saying is that the slave doesn't send per-commit
 messages, but instead processes the WAL as it's received and then sends
 a heres-where-I-am status message back upstream immediately before going
 to sleep waiting for the next chunk.  That's fine as far as the protocol
 goes, but I'm not convinced that it really does all that much in terms
 of improving performance.  You still have the problem that the master
 has to fsync its WAL before it can send it to the slave.  Also, the
 slave won't know whether it ought to fsync its own WAL before replying.

 Yes, apart from last sentence. Please wait for the code.

So, we're going around and around in circles here because you're
repeatedly refusing to explain how the slave will know WHEN to send
acknowledgments back to the master without knowing which sync rep
level is in use.  It seems to be perfectly evident to everyone else
here that there are only two ways for this to work: either the value
is configured on the standby, or there's a registration system on the
master and the master tells the standby its wishes.  Instead of asking
the entire community to wait for an unspecified period of time for you
to write code that will handle this in an unspecified way, how about
answering the question?  We've wasted far too much time arguing about
this already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Simon Riggs
On Tue, 2010-09-07 at 12:07 -0400, Robert Haas wrote:
 On Tue, Sep 7, 2010 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
  What I *think* you're saying is that the slave doesn't send per-commit
  messages, but instead processes the WAL as it's received and then sends
  a heres-where-I-am status message back upstream immediately before going
  to sleep waiting for the next chunk.  That's fine as far as the protocol
  goes, but I'm not convinced that it really does all that much in terms
  of improving performance.  You still have the problem that the master
  has to fsync its WAL before it can send it to the slave.  Also, the
  slave won't know whether it ought to fsync its own WAL before replying.
 
  Yes, apart from last sentence. Please wait for the code.
 
 So, we're going around and around in circles here because you're
 repeatedly refusing to explain how the slave will know WHEN to send
 acknowledgments back to the master without knowing which sync rep
 level is in use.  It seems to be perfectly evident to everyone else
 here that there are only two ways for this to work: either the value
 is configured on the standby, or there's a registration system on the
 master and the master tells the standby its wishes.  Instead of asking
 the entire community to wait for an unspecified period of time for you
 to write code that will handle this in an unspecified way, how about
 answering the question?  We've wasted far too much time arguing about
 this already.

Every time I explain anything, someone runs around shouting "but
that can't work!". I'm sorry, but again your logic is poor, and the bias
against properly considering viable alternatives is the only thing
perfectly evident. So yes, I agree, it is a waste of time discussing it
until I show working code.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Robert Haas
On Tue, Sep 7, 2010 at 2:15 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Every time I explain anything, I get someone run around shouting but
 that can't work!. I'm sorry, but again your logic is poor and the bias
 against properly considering viable alternatives is the only thing
 perfectly evident. So yes, I agree, it is a waste of time discussing it
 until I show working code.

Obviously you don't agree, because that's the exact opposite of what
I just said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-07 Thread Bruce Momjian
Robert Haas wrote:
 On Tue, Sep 7, 2010 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
  What I *think* you're saying is that the slave doesn't send per-commit
  messages, but instead processes the WAL as it's received and then sends
  a heres-where-I-am status message back upstream immediately before going
  to sleep waiting for the next chunk.  That's fine as far as the protocol
  goes, but I'm not convinced that it really does all that much in terms
  of improving performance.  You still have the problem that the master
  has to fsync its WAL before it can send it to the slave.  Also, the
  slave won't know whether it ought to fsync its own WAL before replying.
 
  Yes, apart from last sentence. Please wait for the code.
 
 So, we're going around and around in circles here because you're
 repeatedly refusing to explain how the slave will know WHEN to send
 acknowledgments back to the master without knowing which sync rep
 level is in use.  It seems to be perfectly evident to everyone else
 here that there are only two ways for this to work: either the value
 is configured on the standby, or there's a registration system on the
 master and the master tells the standby its wishes.  Instead of asking
 the entire community to wait for an unspecified period of time for you
 to write code that will handle this in an unspecified way, how about
 answering the question?  We've wasted far too much time arguing about
 this already.

Ideally I would like the sync method to be set on each slave, and have
some method for the master to query the sync mode of all the slaves, e.g.
appname.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-06 Thread Dimitri Fontaine

Disclaimer : I have understood things in a way that allows me to answer
here, I don't know at all if that's the way it's meant to be understood.

Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 (scratches head..) What's the point of differentiating
 received/fsynced/replayed, if the master receives the ack for all of them at
 the same time?

It wouldn't be, the way I understand Simon's proposal.

What's happening is that the feedback channel is periodically sending an
array of 3 LSNs: the currently last received, fsync()ed, and applied ones.

Now what you're saying is that we should feed back this information
after each recovery step forward, what Simon is saying is that we could
have a looser coupling between the slave activity and the feedback
channel to the master.

That means the master will not see all the slave's restoring activity,
but as the LSNs are a monotonic sequence that's not a problem: we can use
>= rather than = in the wait-and-wakeup loop on the master.
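That wait-and-wakeup comparison can be sketched as follows (hypothetical names; plain integers stand in for LSNs):

```python
from dataclasses import dataclass

# The standby periodically reports how far it has gotten; LSNs only move
# forward, so the master can compare with >= instead of waiting to see a
# commit's exact LSN in a reply. All names here are illustrative.
@dataclass
class StandbyReply:
    received: int   # last WAL position received
    fsynced: int    # last WAL position fsync()ed to disk
    applied: int    # last WAL position replayed

def commit_done(commit_lsn, reply, level):
    """Has this commit reached the requested synchronization level?"""
    reported = {"recv": reply.received,
                "fsync": reply.fsynced,
                "replay": reply.applied}[level]
    return reported >= commit_lsn    # >=, not ==: replies may skip past us

# A reply reporting a position past our commit still wakes us, even though
# the standby never reported LSN 100 exactly.
reply = StandbyReply(received=120, fsynced=110, applied=95)
print(commit_done(100, reply, "recv"))     # True: already received
print(commit_done(100, reply, "replay"))   # False: not yet applied
```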

 Let's try this with an example: In the master, I do stuff and commit a
 transaction. I want to know when the transaction is fsynced in the
 standby. The WAL is sent to the standby, up to the commit record.
[...]
 So, when does standby send the single message back to the master?

The standby is sending a stream of messages to the master with current
LSN positions at the time the message is sent. Given a synchronous
transaction, the master would wait until the feedback stream reports
that the current transaction is in the past compared to the streamed
last known synced one (or the same).

Hope this helps, regards,
-- 
dim



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-06 Thread Heikki Linnakangas

On 06/09/10 16:03, Dimitri Fontaine wrote:

 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
  (scratches head..) What's the point of differentiating
  received/fsynced/replayed, if the master receives the ack for all of them at
  the same time?

 It wouldn't be, the way I understand Simon's proposal.

 What's happening is that the feedback channel is periodically sending an
 array of 3 LSNs: the currently last received, fsync()ed, and applied ones.


Periodically is a performance problem. The bottleneck in synchronous 
replication is typically the extra round-trip between master and 
standby, as the master needs to wait for the acknowledgment. Any delays 
in sending that acknowledgment lead directly to a decrease in 
performance. That's also why we need to eliminate the polling loops in 
walsender and walreceiver, and make them react immediately when there's 
work to do.



  Let's try this with an example: In the master, I do stuff and commit a
  transaction. I want to know when the transaction is fsynced in the
  standby. The WAL is sent to the standby, up to the commit record.

  [...]

  So, when does standby send the single message back to the master?

 The standby is sending a stream of messages to the master with current
 LSN positions at the time the message is sent. Given a synchronous
 transaction, the master would wait until the feedback stream reports
 that the current transaction is in the past compared to the streamed
 last known synced one (or the same).


That doesn't really answer the question: *when* does standby send back 
the acknowledgment?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-06 Thread Simon Riggs
On Mon, 2010-09-06 at 16:14 +0300, Heikki Linnakangas wrote:
 
  The standby is sending a stream of messages to the master with current
  LSN positions at the time the message is sent. Given a synchronous
  transaction, the master would wait until the feedback stream reports
  that the current transaction is in the past compared to the streamed
  last known synced one (or the same).
 
 That doesn't really answer the question: *when* does standby send back 
 the acknowledgment?

I think you should explain when you think this happens in your proposal.

Are you saying that you think the standby should send back one message
for every transaction? That you do not think we should buffer the return
messages?

You seem to be proposing a design for responsiveness to a single
transaction, not for overall throughput. That's certainly a design
choice, but it wouldn't be my recommendation that we did that.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-06 Thread Robert Haas
On Mon, Sep 6, 2010 at 10:14 AM, Simon Riggs si...@2ndquadrant.com wrote:
 That doesn't really answer the question: *when* does standby send back
 the acknowledgment?

 I think you should explain when you think this happens in your proposal.

 Are you saying that you think the standby should send back one message
 for every transaction? That you do not think we should buffer the return
 messages?

That's certainly what I was assuming - I can't speak for anyone else, of course.

 You seem to be proposing a design for responsiveness to a single
 transaction, not for overall throughput. That's certainly a design
 choice, but it wouldn't be my recommendation that we did that.

Gee, I thought that if we tried to buffer the messages, you'd end up
*reducing* overall throughput.  Suppose we have a busy system.  The
number of simultaneous transactions in flight is limited by
max_connections.  So it seems to me that if each transaction takes X%
longer to commit, then throughput will be reduced by X%.  And as
you've said, batching responses will make individual transactions less
responsive.  The corresponding advantage of batching the responses is
that you reduce consumption of network bandwidth, but I don't think
that's normally where the bottleneck will be.

Of course, you might be able to opportunistically combine messages, if
additional transactions become ready to acknowledge after the first
one has become ready but before the acknowledgement has actually been
sent.  But waiting to try to increase the batch size doesn't seem
right.
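Robert's opportunistic-combining idea can be sketched very simply. Since WAL positions are monotonic, a standby that finds several commit records ready while a previous ack is in flight can reply with just the newest LSN; one message covers the whole backlog with no added delay. An illustrative sketch, not actual walreceiver code:

```python
def drain_acks(ready_lsns):
    """Collapse all currently-ready acknowledgements into one reply.

    ready_lsns holds WAL positions whose records have just become
    durable, in arrival order.  WAL positions are monotonic, so one
    reply carrying the maximum LSN implicitly acknowledges every
    earlier position as well; nothing waits for a batch window.
    """
    if not ready_lsns:
        return None              # nothing to acknowledge yet
    ack = max(ready_lsns)        # one reply covers the whole backlog
    ready_lsns.clear()
    return ack
```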

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
 On Thu, Sep 2, 2010 at 11:32 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  I understand what you're after, the idea of being able to set
  synchronization level on a per-transaction basis is cool. But I haven't seen
  a satisfactory design for it. I don't understand how it would work in
  practice. Even though it's cool, having different kinds of standbys
  connected is a more common scenario, and the design needs to accommodate
  that too. I'm all ears if you can sketch a design that can do that.
 
 That design would affect what the standby should reply. If we choose
 async/recv/fsync/replay on a per-transaction basis, the standby
 should send multiple LSNs and the master needs to decide when
 replication has been completed. OTOH, if we choose just sync/async,
 the standby has only to send one LSN.
 
 The former seems to be more useful, but triples the number of ACK
 from the standby. I'm not sure whether its overhead is ignorable,
 especially when the distance between the master and the standby is
 very long.

No, it doesn't. There is no requirement for additional messages. It just
adds 16 bytes onto the reply message, maybe 24. If there is a noticeable
overhead from that, shoot me. 
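To make Simon's arithmetic concrete: the reply message could carry the received, fsynced and replayed positions as three 8-byte LSNs in a single packet, which is the 24 bytes he mentions. A hypothetical wire layout (field names and format are illustrative, not the actual protocol):

```python
import struct

# One standby reply: received, fsynced and replayed LSNs, each a
# 64-bit position, packed in network byte order.
REPLY_FMT = "!QQQ"

def pack_reply(received_lsn, fsynced_lsn, replayed_lsn):
    """Encode a single reply carrying all three feedback positions."""
    return struct.pack(REPLY_FMT, received_lsn, fsynced_lsn, replayed_lsn)

def unpack_reply(payload):
    """Decode a reply back into its three positions."""
    return struct.unpack(REPLY_FMT, payload)
```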

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Fujii Masao
On Thu, Sep 2, 2010 at 7:24 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I propose a configuration file standbys.conf, in the master:

 # STANDBY NAME    SYNCHRONOUS   TIMEOUT
 importantreplica  yes           100ms
 tempcopy          no            10s

 Seems good. In fact, instead of yes/no, should async/recv/fsync/replay be
 specified in the SYNCHRONOUS field?

 OTOH, something like standby_name parameter should be introduced in
 recovery.conf.

 We should allow multiple standbys with the same name? Probably yes.
 We might need to add NUMBER field into the standbys.conf, in the future.

Here is the proposed detailed design:

standbys.conf
=
# This is not initialized by initdb, so users need to create it under $PGDATA.
* The template is located in the PREFIX/share directory.

# This is read by postmaster at startup, just as pg_hba.conf is.
* In an EXEC_BACKEND environment, each walsender must read it at startup.
* This is ignored when max_wal_senders is zero.
* FATAL is emitted when standbys.conf doesn't exist even though max_wal_senders
  is positive.

# SIGHUP makes only the postmaster re-read standbys.conf.
* New configuration doesn't affect the existing connections to the standbys,
  i.e., it's used only for subsequent connections.
* XXX: Should the existing connections react to the new configuration? What if
  the new standbys.conf doesn't have the standby_name of an existing
  connection?

# The connection from the standby is rejected if its standby_name is not listed
  in standbys.conf.
* Multiple standbys with the same name are allowed.

# The valid values of SYNCHRONOUS field are async, recv, fsync and replay.

standby_name

# This is new string-typed parameter in recovery.conf.
* XXX: Should standby_name and standby_mode be merged?

# Walreceiver sends this to the master when establishing the connection.
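A minimal reader for the proposed standbys.conf format might look like the following sketch (hypothetical; comments and blank lines are skipped, and duplicate names are allowed, per the design above):

```python
VALID_LEVELS = {"async", "recv", "fsync", "replay"}

def parse_standbys_conf(text):
    """Parse lines of 'NAME LEVEL TIMEOUT' into a list of entries.

    Duplicate standby names are allowed by the proposed design, so the
    result is a list of tuples rather than a dict keyed by name.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue                          # skip blank lines
        name, level, timeout = line.split()
        if level not in VALID_LEVELS:
            raise ValueError("bad synchronization level: %s" % level)
        entries.append((name, level, timeout))
    return entries
```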

Comments? Is the above too complicated for a first step? If so, I'd
propose instead just introducing a new recovery.conf parameter, like
replication_mode, specifying the synchronization level.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the 
standby, it needs to acknowledge it 1) when it has received it, 2) when 
it fsyncs it to disk and 3) when it's replayed. I don't see how you can 
get around that.


Perhaps you can save a bit by combining multiple messages together, like 
in Nagle's algorithm, but then you introduce extra delays which is 
exactly what you don't want.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Fujii Masao
On Fri, Sep 3, 2010 at 3:36 PM, Simon Riggs si...@2ndquadrant.com wrote:
 The former seems to be more useful, but triples the number of ACK
 from the standby. I'm not sure whether its overhead is ignorable,
 especially when the distance between the master and the standby is
 very long.

 No, it doesn't. There is no requirement for additional messages. It just
 adds 16 bytes onto the reply message, maybe 24. If there is a noticeable
 overhead from that, shoot me.

The reply message would be sent at least three times per WAL chunk,
i.e., when the standby has received, synced and replayed it. So ISTM
that additional messaging does happen, though I'm not sure whether it
really harms performance...

You'd like to choose async/recv/fsync/replay on a per-transaction basis
rather than async/sync?

Even when async is chosen as the synchronization level in standbys.conf,
can it be changed to another level within a transaction? If so, the standby
has to send replies even though async was chosen, and most of those
replies might be ignored by the master.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:
 On 03/09/10 09:36, Simon Riggs wrote:
  On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
  That design would affect what the standby should reply. If we choose
  async/recv/fsync/replay on a per-transaction basis, the standby
  should send multiple LSNs and the master needs to decide when
  replication has been completed. OTOH, if we choose just sync/async,
  the standby has only to send one LSN.
 
  The former seems to be more useful, but triples the number of ACK
  from the standby. I'm not sure whether its overhead is ignorable,
  especially when the distance between the master and the standby is
  very long.
 
  No, it doesn't. There is no requirement for additional messages.
 
 Please explain how you do it then. When a commit record is sent to the 
 standby, it needs to acknowledge it 1) when it has received it, 2) when 
 it fsyncs it to disk and 3) when it's replayed. I don't see how you can 
 get around that.
 
 Perhaps you can save a bit by combining multiple messages together, like 
 in Nagle's algorithm, but then you introduce extra delays which is 
 exactly what you don't want.

From my perspective, you seem to be struggling to find reasons why this
should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 10:45, Simon Riggs wrote:

On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the
standby, it needs to acknowledge it 1) when it has received it, 2) when
it fsyncs it to disk and 3) when it's replayed. I don't see how you can
get around that.

Perhaps you can save a bit by combining multiple messages together, like
in Nagle's algorithm, but then you introduce extra delays which is
exactly what you don't want.



From my perspective, you seem to be struggling to find reasons why this

should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.


Huh? You made a very clear claim above that you don't need additional 
messages. I explained why I don't think that's true, and asked you to 
explain why you think it is true. Whether the claim is true or not does 
not depend on my attitude.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Simon Riggs
On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote:
 On 03/09/10 10:45, Simon Riggs wrote:
  On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:
  On 03/09/10 09:36, Simon Riggs wrote:
  On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:
  That design would affect what the standby should reply. If we choose
  async/recv/fsync/replay on a per-transaction basis, the standby
  should send multiple LSNs and the master needs to decide when
  replication has been completed. OTOH, if we choose just sync/async,
  the standby has only to send one LSN.
 
  The former seems to be more useful, but triples the number of ACK
  from the standby. I'm not sure whether its overhead is ignorable,
  especially when the distance between the master and the standby is
  very long.
 
  No, it doesn't. There is no requirement for additional messages.
 
  Please explain how you do it then. When a commit record is sent to the
  standby, it needs to acknowledge it 1) when it has received it, 2) when
  it fsyncs it to disk and 3) when it's replayed. I don't see how you can
  get around that.
 
  Perhaps you can save a bit by combining multiple messages together, like
  in Nagle's algorithm, but then you introduce extra delays which is
  exactly what you don't want.
 
  From my perspective, you seem to be struggling to find reasons why this
  should not happen, rather than seeing the alternatives that would
  obviously present themselves if your attitude was a positive one. We
  won't make any progress with this style of discussion.
 
 Huh? You made a very clear claim above that you don't need additional 
 messages. I explained why I don't think that's true, and asked you to 
 explain why you think it is true. Whether the claim is true or not does 
 not depend on my attitude.

Why exactly would we need to send 3 messages when we could send 1? 
Replace your statements of "it needs to" with "why would it" instead.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-03 Thread Heikki Linnakangas

On 03/09/10 13:20, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:33 +0300, Heikki Linnakangas wrote:

On 03/09/10 10:45, Simon Riggs wrote:

On Fri, 2010-09-03 at 09:55 +0300, Heikki Linnakangas wrote:

On 03/09/10 09:36, Simon Riggs wrote:

On Fri, 2010-09-03 at 12:50 +0900, Fujii Masao wrote:

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACK
from the standby. I'm not sure whether its overhead is ignorable,
especially when the distance between the master and the standby is
very long.


No, it doesn't. There is no requirement for additional messages.


Please explain how you do it then. When a commit record is sent to the
standby, it needs to acknowledge it 1) when it has received it, 2) when
it fsyncs it to disk and 3) when it's replayed. I don't see how you can
get around that.

Perhaps you can save a bit by combining multiple messages together, like
in Nagle's algorithm, but then you introduce extra delays which is
exactly what you don't want.



 From my perspective, you seem to be struggling to find reasons why this

should not happen, rather than seeing the alternatives that would
obviously present themselves if your attitude was a positive one. We
won't make any progress with this style of discussion.


Huh? You made a very clear claim above that you don't need additional
messages. I explained why I don't think that's true, and asked you to
explain why you think it is true. Whether the claim is true or not does
not depend on my attitude.


Why exactly would we need to send 3 messages when we could send 1?
Replace your statements of "it needs to" with "why would it" instead.


(scratches head..) What's the point of differentiating 
received/fsynced/replayed, if the master receives the ack for all of 
them at the same time?


Let's try this with an example: In the master, I do stuff and commit a 
transaction. I want to know when the transaction is fsynced in the 
standby. The WAL is sent to the standby, up to the commit record.


Upthread you said that:

 The standby does *not* need
 to know the wishes of transactions on the master.

So, when does standby send the single message back to the master?
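For illustration only: if the standby simply streams back its newest position for each level whenever it has something new to report, the master alone decides when each transaction's wait ends, whatever level that transaction asked for, so no per-transaction wishes need to reach the standby. A hypothetical master-side check:

```python
def commit_is_done(commit_lsn, level, feedback):
    """Decide whether a committing backend may stop waiting.

    feedback maps a level name ('recv', 'fsync', 'replay') to the
    newest LSN the standby has reported for that level.  The commit's
    wait ends once the standby's position for the requested level has
    passed the commit record.
    """
    if level == "async":
        return True                      # async commits never wait
    return feedback.get(level, 0) >= commit_lsn
```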

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-02 Thread Dimitri Fontaine
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Hmm, that's clever. I was thinking that you'd initialize the standby from an
 existing backup, and in that context the standby would not need to connect
 to the master except via the replication connection. To take a base backup,
 you'll need not only that but also access to the filesystem in the master,
 ie. shell access.

In fact you don't need shell access here, it's rather easy to stream the
base backup from the libpq connection, as implemented here :

  http://github.com/dimitri/pg_basebackup

 There's been some talk of being able to stream a base backup over the
 replication connection too, which would be extremely handy. 

Yes please ! :)
-- 
dim



Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-02 Thread Itagaki Takahiro
On Thu, Sep 2, 2010 at 6:41 PM, Dimitri Fontaine dfonta...@hi-media.com wrote:
 In fact you don't need shell access here, it's rather easy to stream the
 base backup from the libpq connection, as implemented here :

  http://github.com/dimitri/pg_basebackup

 There's been some talk of being able to stream a base backup over the
 replication connection too, which would be extremely handy.

 Yes please ! :)

One issue with the base-backup function is that the operation would be
one long transaction. So non-transactional special commands, like
VACUUM, would be better in terms of performance; for example, CREATE
or ALTER REPLICATION.

Of course, a function-based approach is more flexible and
less invasive to the SQL parser. There are trade-offs.

-- 
Itagaki Takahiro



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Fujii Masao
On Wed, Sep 1, 2010 at 7:23 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 That requirement falls out from the handling of disconnected standbys. If a
 standby is not connected, what does the master do with commits? If the
 answer is anything else than acknowledge them to the client immediately, as
 if the standby never existed, the master needs to know what standby servers
 exist. Otherwise it can't know if all the standbys are connected or not.

Thanks. I understood why the registration is required.

 I'd like to keep this as simple as possible, yet flexible so that with
 enough scripting and extensions, you can get all sorts of behavior. I think
 quorum commit falls into the extension category; if your setup is
 complex enough, it's going to be impossible to represent that in our config
 files no matter what. But if you write a little proxy, you can implement
 arbitrary rules there.

Agreed.

 I think recv/fsync/replay should be specified in the standby. It has no
 direct effect on the master, the master would just relay the setting to the
 standby when it connects, or the standby would send multiple XLogRecPtrs and
 let the master decide when the WAL is persistent enough.

The latter seems wasteful since the master uses only one XLogRecPtr even if
the standby sends multiple ones. So I prefer the former design, which also
makes the code and design very simple and lets us easily write the proxy.

 sync vs async on the other hand should be specified in the master, because
 it has a direct impact on the behavior of commits in the master.

 I propose a configuration file standbys.conf, in the master:

 # STANDBY NAME    SYNCHRONOUS   TIMEOUT
 importantreplica  yes           100ms
 tempcopy          no            10s

Seems good. In fact, instead of yes/no, should async/recv/fsync/replay be
specified in the SYNCHRONOUS field?

OTOH, something like standby_name parameter should be introduced in
recovery.conf.

We should allow multiple standbys with the same name? Probably yes.
We might need to add NUMBER field into the standbys.conf, in the future.

 Yeah, though of course you might want to set that per-standby too..

Yep.

 Let's step back a bit and ask what would be the simplest thing that you
 could call synchronous replication in good conscience, and also be useful
 at least to some people. Let's leave out the down mode, because that
 requires registration. We'll probably have to do registration at some point,
 but let's take as small steps as possible.

Agreed.

 Without the down mode in the master, frankly I don't see the point of the
 recv and fsync levels in the standby. Either way, when the master
 acknowledges a commit to the client, you don't know if it has made it to the
 standby yet because the replication connection might be down for some
 reason.

True. We cannot know whether the standby can be promoted to master
without any data loss when the master crashes, because the standby might
have been disconnected earlier for some reason and be missing the latest
data.

But the situation would be the same even when 'replay' mode is chosen.
Though we might be able to check whether the latest transaction has been
replicated to the standby by running a read-only query on the standby,
it's actually difficult to do that. How can we know the content of the
latest transaction?

Also, even when 'recv' or 'fsync' is chosen, we might be able to check
that by calling pg_last_xlog_receive_location() on the standby. But a
similar question occurs to me: How can we know the LSN of the latest
transaction?

I'm thinking of introducing a new parameter specifying a command which
is executed when the standby is disconnected. This command is executed
by walsender before resuming the transaction processing which has
been suspended by the disconnection. For example, if a STONITH command
against the standby is supplied, we can prevent a standby that lacks
the latest data from becoming the master, by forcibly shutting such a
delayed standby down. Thoughts?
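The disconnect-command idea could look roughly like this sketch (the parameter, the %s substitution, and the hook itself are hypothetical, not an existing PostgreSQL feature):

```python
import shlex
import subprocess

def on_standby_disconnect(standby_name, disconnect_command=None):
    """Hypothetical walsender hook: run a user-configured command
    (e.g. a STONITH script) before the suspended transactions on the
    master are resumed.  '%s' in the command string is replaced with
    the name of the disconnected standby."""
    if disconnect_command is None:
        return True   # nothing configured; resume immediately
    argv = [arg.replace("%s", standby_name)
            for arg in shlex.split(disconnect_command)]
    return subprocess.run(argv).returncode == 0
```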

 That leaves us the 'replay' mode, which *is* useful, because it gives you
 the guarantee that when the master acknowledges a commit, it will appear
 committed in all hot standby servers that are currently connected. With that
 guarantee you can build a reliable cluster with something pgpool-II where
 all writes go to one node, and reads are distributed to multiple nodes.

I'm concerned that conflicts between read-only queries and recovery might
harm performance on the master in 'replay' mode. If a conflict occurs,
all running transactions on the master have to wait for it to disappear,
which can take very long. Of course, even without a conflict, waiting
until the standby has received, fsync'd, read and replayed the WAL can
take a long time. So I'd like to also support 'recv' and 'fsync'.
I believe that implementing those two modes is neither complicated nor
difficult.

 I'm not sure what we should aim for in the first phase. But if you want as
 little code as possible yet 

Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-02 Thread Thom Brown
On 30 August 2010 13:14, Fujii Masao masao.fu...@gmail.com wrote:
 I think that the advantage of registering standbys is that we can
 specify which WAL files the master has to keep for the upcoming
 standby. IMO, it's usually called together with pg_start_backup
 as follows:

    SELECT register_standby('foo', pg_start_backup())

 This requests the master to keep all the WAL files following the
 backup starting location which pg_start_backup returns. Now we
 can do that by using wal_keep_segments, but it's not easy to set
 because it's difficult to predict how many WAL files the standby
 will require.

+1  I don't like the idea of having to guess how many WAL files you
think you'll need to keep around.

And if these standby instances have to register, could there be a view
to list subscriber information?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935



Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-02 Thread Dimitri Fontaine
Itagaki Takahiro itagaki.takah...@gmail.com writes:
  http://github.com/dimitri/pg_basebackup

 There's been some talk of being able to stream a base backup over the
 replication connection too, which would be extremely handy.

 Yes please ! :)

 One issue with the base-backup function is that the operation would be
 one long transaction. So non-transactional special commands, like
 VACUUM, would be better in terms of performance; for example, CREATE
 or ALTER REPLICATION.

Well, you still need to stream the data to the client in a format it
will understand. Would that be the plan of your command proposal?

 Of course, function-based approach is more flexible and
 less invasive to the SQL parser. There are trade-offs.

Well that was easier for a proof-of-concept at least.
-- 
Dimitri Fontaine
PostgreSQL DBA, Architecte



Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-02 Thread Itagaki Takahiro
On Thu, Sep 2, 2010 at 7:54 PM, Dimitri Fontaine dfonta...@hi-media.com wrote:
 One issue of the base backup function is that the operation will
 be a long transaction. So, non-transactional special commands,
 as like as VACUUM, would be better in terms of performance.
 For example, CREATE or ALTER REPLICATION.

 Well, you still need to stream the data to the client in a format it
 will understand.

True, but using a libpq connection might not be the most important thing.
The simplest proof-of-concept might be system("rsync") in the function ;-)

 Would that be the plan of your command proposal?

What I meant was that function-based maintenance does not work well in
some cases. I've heard that pg_start_backup (with no fast checkpoint)
once caused a table-bloat problem because it held a transaction open
for 20+ minutes. The backup function would have a similar issue.

-- 
Itagaki Takahiro



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Simon Riggs
On Thu, 2010-09-02 at 19:24 +0900, Fujii Masao wrote:
 On Wed, Sep 1, 2010 at 7:23 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  That requirement falls out from the handling of disconnected standbys. If a
  standby is not connected, what does the master do with commits? If the
  answer is anything else than acknowledge them to the client immediately, as
  if the standby never existed, the master needs to know what standby servers
  exist. Otherwise it can't know if all the standbys are connected or not.
 
 Thanks. I understood why the registration is required.

I don't. There is a simpler design that does not require registration.

Please explain why we need registration, with an explanation that does
not presume it as a requirement.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Heikki Linnakangas

On 02/09/10 15:03, Simon Riggs wrote:

On Thu, 2010-09-02 at 19:24 +0900, Fujii Masao wrote:

On Wed, Sep 1, 2010 at 7:23 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

That requirement falls out from the handling of disconnected standbys. If a
standby is not connected, what does the master do with commits? If the
answer is anything else than acknowledge them to the client immediately, as
if the standby never existed, the master needs to know what standby servers
exist. Otherwise it can't know if all the standbys are connected or not.


Thanks. I understood why the registration is required.


I don't. There is a simpler design that does not require registration.

Please explain why we need registration, with an explanation that does
not presume it as a requirement.


Please explain how you would implement "don't acknowledge commits until 
they're replicated to all standbys" without standby registration.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Simon Riggs
On Thu, 2010-09-02 at 15:15 +0300, Heikki Linnakangas wrote:
 On 02/09/10 15:03, Simon Riggs wrote:
  On Thu, 2010-09-02 at 19:24 +0900, Fujii Masao wrote:
  On Wed, Sep 1, 2010 at 7:23 PM, Heikki Linnakangas
  heikki.linnakan...@enterprisedb.com  wrote:
  That requirement falls out from the handling of disconnected standbys. If 
  a
  standby is not connected, what does the master do with commits? If the
  answer is anything else than acknowledge them to the client immediately, 
  as
  if the standby never existed, the master needs to know what standby 
  servers
  exist. Otherwise it can't know if all the standbys are connected or not.
 
  Thanks. I understood why the registration is required.
 
  I don't. There is a simpler design that does not require registration.
 
  Please explain why we need registration, with an explanation that does
  not presume it as a requirement.
 
 Please explain how you would implement "don't acknowledge commits until 
 they're replicated to all standbys" without standby registration.

"All standbys" has no meaning without registration. It is not a question
that needs an answer.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Robert Haas
On Thu, Sep 2, 2010 at 8:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
 "All standbys" has no meaning without registration. It is not a question
 that needs an answer.

Tell that to the DBA.  I bet s/he knows what "all standbys" means.
The fact that the system doesn't know something doesn't make it
unimportant.

I agree that we don't absolutely need standby registration for some
really basic version of synchronous replication.  But I think we'd be
better off biting the bullet and adding it.  I think that without it
we're going to resort to a series of increasingly grotty and
user-unfriendly hacks to make this work.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 Tell that to the DBA.  I bet s/he knows what "all standbys" means.
 The fact that the system doesn't know something doesn't make it
 unimportant.

Well, as a DBA I think I'd much prefer to attribute votes to each
standby so that each ack is weighted. Let me explain in more detail the
setup I'm thinking about.

The transaction on the master wants a certain service level (async,
recv, fsync, replay) and a certain number of votes. As proposed earlier,
the standby would feed back the last XID known locally in each state
(received, synced, replayed) and its current weight, and the master
would arbitrate given that information.

That's highly flexible: you can have slaves join the party at any point
in time, and change two user GUCs (set by session, transaction, function,
database, role, in postgresql.conf) to set up the service level target
you want to ensure, from the master.

  (We could go as far as wanting fsync:2,replay:1 as a service level.)

From that you have both the "fail when a slave disappears" and the
"please don't shut the service down if a slave disappears" settings, per
transaction, and per slave too (that depends on its weight, remember).

  (You can setup the slave weights as powers of 2 and have the service
   level be masks to allow you to choose precisely which slave will ack
   your fsync service level, and you can switch this slave at run time
   easily — sounds cleverer, but also easier to implement given
   the flexibility it gives — precedents in PostgreSQL? the PITR and WAL
   Shipping facilities are hard to use, full of traps, but very
   flexible).

You can even give some more weight to one slave while you're maintaining
another, so that the master just doesn't complain.

I see a need for very dynamic *and decentralized* replication topology
setup, I fail to see a need for a centralized registration based setup.
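
A toy sketch of the weighted-vote arbitration described above (invented
function names, nothing from an actual patch): each standby carries a
weight, and with power-of-two weights a service level becomes a bitmask
naming exactly which standbys must ack.

```python
def quorum_reached(acked_weights, required_votes):
    """True once the summed weight of acking standbys meets the target."""
    return sum(acked_weights) >= required_votes

def mask_satisfied(acked_mask, required_mask):
    """With power-of-two weights, a service level can instead be a mask
    naming exactly which standbys must ack."""
    return (acked_mask & required_mask) == required_mask

# Three standbys weighted 1, 2, 4; the transaction wants 3 votes:
assert quorum_reached([1, 2], 3)          # standbys 1 and 2 together suffice
assert not quorum_reached([2], 3)

# Mask form: require acks from exactly the standbys weighted 1 and 4:
assert mask_satisfied(1 | 4, 1 | 4)
assert not mask_satisfied(2, 1 | 4)       # standby 2 alone doesn't count
```

The same master-side check then covers both the simple vote count and the
"choose precisely which slave acks" variant.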

 I agree that we don't absolutely need standby registration for some
 really basic version of synchronous replication.  But I think we'd be
 better off biting the bullet and adding it.

What does that mechanism allow us to implement we can't do without?
-- 
dim



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Simon Riggs
On Thu, 2010-09-02 at 08:59 -0400, Robert Haas wrote:
 On Thu, Sep 2, 2010 at 8:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
  "All standbys" has no meaning without registration. It is not a question
  that needs an answer.
 
 Tell that to the DBA.  I bet s/he knows what "all standbys" means.
 The fact that the system doesn't know something doesn't make it
 unimportant.

 I agree that we don't absolutely need standby registration for some
 really basic version of synchronous replication.  But I think we'd be
 better off biting the bullet and adding it.  I think that without it
 we're going to resort to a series of increasingly grotty and
 user-unfriendly hacks to make this work.

I'm personally quite happy to have server registration.

My interest is in ensuring we have master-controlled robustness, which
is so far being ignored because "we need simple". Referring to the above, we
are clearly quite willing to go beyond the most basic implementation, so
there's no further argument to exclude it for that reason.

The implementation of master-controlled robustness is no more difficult
than the alternative.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Robert Haas
On Thu, Sep 2, 2010 at 10:06 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, 2010-09-02 at 08:59 -0400, Robert Haas wrote:
 On Thu, Sep 2, 2010 at 8:44 AM, Simon Riggs si...@2ndquadrant.com wrote:
  "All standbys" has no meaning without registration. It is not a question
  that needs an answer.

 Tell that to the DBA.  I bet s/he knows what "all standbys" means.
 The fact that the system doesn't know something doesn't make it
 unimportant.

 I agree that we don't absolutely need standby registration for some
 really basic version of synchronous replication.  But I think we'd be
 better off biting the bullet and adding it.  I think that without it
 we're going to resort to a series of increasingly grotty and
 user-unfriendly hacks to make this work.

 I'm personally quite happy to have server registration.

OK, thanks for clarifying.

 My interest is in ensuring we have master-controlled robustness, which
 is so far being ignored because "we need simple". Referring to the above, we
 are clearly quite willing to go beyond the most basic implementation, so
 there's no further argument to exclude it for that reason.

 The implementation of master-controlled robustness is no more difficult
 than the alternative.

But I'm not sure I quite follow this part.  I don't think I know what
you mean by master-controlled robustness.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Heikki Linnakangas

On 02/09/10 17:06, Simon Riggs wrote:

On Thu, 2010-09-02 at 08:59 -0400, Robert Haas wrote:

On Thu, Sep 2, 2010 at 8:44 AM, Simon Riggs si...@2ndquadrant.com wrote:

"All standbys" has no meaning without registration. It is not a question
that needs an answer.


Tell that to the DBA.  I bet s/he knows what "all standbys" means.
The fact that the system doesn't know something doesn't make it
unimportant.



I agree that we don't absolutely need standby registration for some
really basic version of synchronous replication.  But I think we'd be
better off biting the bullet and adding it.  I think that without it
we're going to resort to a series of increasingly grotty and
user-unfriendly hacks to make this work.


I'm personally quite happy to have server registration.

My interest is in ensuring we have master-controlled robustness, which
is so far being ignored because "we need simple". Referring to the above, we
are clearly quite willing to go beyond the most basic implementation, so
there's no further argument to exclude it for that reason.

The implementation of master-controlled robustness is no more difficult
than the alternative.


I understand what you're after, the idea of being able to set 
synchronization level on a per-transaction basis is cool. But I haven't 
seen a satisfactory design for it. I don't understand how it would work 
in practice. Even though it's cool, having different kinds of standbys 
connected is a more common scenario, and the design needs to accommodate 
that too. I'm all ears if you can sketch a design that can do that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Joshua Tolley
On Wed, Sep 01, 2010 at 04:53:38PM +0900, Fujii Masao wrote:
 - down
   When that situation occurs, the master shuts down immediately.
   Though this is unsafe for the system requiring high availability,
   as far as I recall, some people wanted this mode in the previous
   discussion.

Oracle provides this, among other possible configurations; perhaps that's why
it came up earlier.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-02 Thread Fujii Masao
On Thu, Sep 2, 2010 at 11:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I understand what you're after, the idea of being able to set
 synchronization level on a per-transaction basis is cool. But I haven't seen
 a satisfactory design for it. I don't understand how it would work in
 practice. Even though it's cool, having different kinds of standbys
 connected is a more common scenario, and the design needs to accommodate
 that too. I'm all ears if you can sketch a design that can do that.

That design would affect what the standby should reply. If we choose
async/recv/fsync/replay on a per-transaction basis, the standby
should send multiple LSNs and the master needs to decide when
replication has been completed. OTOH, if we choose just sync/async,
the standby has only to send one LSN.

The former seems to be more useful, but triples the number of ACKs
from the standby. I'm not sure whether that overhead is negligible,
especially when the distance between the master and the standby is
very long.
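
The trade-off above can be sketched as follows (a simplified illustration
with invented names; the real message format was still under discussion):
the standby reports one LSN per state, and the master compares the LSN for
the transaction's requested level against the commit record's LSN.

```python
def commit_replicated(commit_lsn, level, standby_ack):
    """standby_ack maps state name -> highest LSN known in that state."""
    if level == "async":
        return True                       # async never waits
    return standby_ack.get(level, 0) >= commit_lsn

# Standby has received up to 500, fsynced up to 420, replayed up to 400:
ack = {"recv": 500, "fsync": 420, "replay": 400}
assert commit_replicated(450, "recv", ack)        # already received, done
assert not commit_replicated(450, "fsync", ack)   # not yet flushed
assert commit_replicated(450, "async", ack)       # never waits
```

With only sync/async, the dictionary collapses to a single LSN per
standby, which is the cheaper ACK Fujii describes.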

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Fujii Masao
On Wed, Sep 1, 2010 at 2:33 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Once we're done with that, all the big questions are still left.

Yeah, let's discuss those topics :)

 How to configure it?

Before discussing that, we should determine whether registering
standbys in the master is really required. It affects configuration a lot.
Heikki thinks that it's required, but I'm still unclear about why and
how.

Why do standbys need to be registered in the master? What information
should be registered?

 What does synchronous replication mean, when is a transaction
 acknowledged as committed?

I proposed four synchronization levels:

1. async
  doesn't make transaction commit wait for replication, i.e.,
  asynchronous replication. This mode has been already supported in
  9.0.

2. recv
  makes transaction commit wait until the standby has received WAL
  records.

3. fsync
  makes transaction commit wait until the standby has received and
  flushed WAL records to disk

4. replay
  makes transaction commit wait until the standby has replayed WAL
  records after receiving and flushing them to disk

OTOH, Simon proposed the quorum commit feature. I think that both
are required for our various use cases. Thoughts?

 What to do if a standby server dies and never
 acknowledges a commit?

The master's reaction to that situation should be configurable. So
I'd propose a new configuration parameter specifying the reaction.
Valid values are:

- standalone
  When the master has waited for the ACK much longer than the timeout
  (or detected the failure of the standby), it closes the connection
  to the standby and restarts transactions.

- down
  When that situation occurs, the master shuts down immediately.
  Though this is unsafe for the system requiring high availability,
  as far as I recall, some people wanted this mode in the previous
  discussion.
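
Schematically, the two reactions might look like this (a toy model with
invented names, not server code): the master waits for the standby's ACK
up to the timeout, then either detaches the standby and continues
standalone, or shuts down.

```python
import queue

def wait_for_ack(acks, timeout_s, on_timeout="standalone"):
    """Wait for the standby's ACK; apply the configured reaction on timeout."""
    try:
        return acks.get(timeout=timeout_s)            # ACK arrived in time
    except queue.Empty:
        if on_timeout == "standalone":
            return None      # drop the standby, keep committing locally
        raise SystemExit("standby lost; shutting down")   # 'down' mode

acks = queue.Queue()
acks.put("0/1234")
assert wait_for_ack(acks, 0.01) == "0/1234"   # normal case
assert wait_for_ack(acks, 0.01) is None       # timeout -> standalone
```

The 'down' branch is the mode that only makes sense once the master knows
which standbys are supposed to exist, i.e. with registration.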

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Heikki Linnakangas

On 01/09/10 10:53, Fujii Masao wrote:

Before discussing that, we should determine whether registering
standbys in the master is really required. It affects configuration a lot.
Heikki thinks that it's required, but I'm still unclear about why and
how.

Why do standbys need to be registered in the master? What information
should be registered?


That requirement falls out from the handling of disconnected standbys. 
If a standby is not connected, what does the master do with commits? If 
the answer is anything other than acknowledging them to the client 
immediately, as if the standby never existed, the master needs to know 
what standby servers exist. Otherwise it can't know if all the standbys 
are connected or not.



What does synchronous replication mean, when is a transaction
acknowledged as committed?


I proposed four synchronization levels:

1. async
   doesn't make transaction commit wait for replication, i.e.,
   asynchronous replication. This mode has been already supported in
   9.0.

2. recv
   makes transaction commit wait until the standby has received WAL
   records.

3. fsync
   makes transaction commit wait until the standby has received and
   flushed WAL records to disk

4. replay
   makes transaction commit wait until the standby has replayed WAL
   records after receiving and flushing them to disk

OTOH, Simon proposed the quorum commit feature. I think that both
are required for our various use cases. Thoughts?


I'd like to keep this as simple as possible, yet flexible so that with 
enough scripting and extensions, you can get all sorts of behavior. I 
think quorum commit falls into the extension category; if your setup 
is complex enough, it's going to be impossible to represent that in our 
config files no matter what. But if you write a little proxy, you can 
implement arbitrary rules there.


I think recv/fsync/replay should be specified in the standby. It has no 
direct effect on the master, the master would just relay the setting to 
the standby when it connects, or the standby would send multiple 
XLogRecPtrs and let the master decide when the WAL is persistent enough. 
And what if you write a proxy that has some other meaning of "persistent 
enough"? Like when it has been written to the OS buffers but not yet 
fsync'd, or when it has been fsync'd to at least one standby and 
received by at least three others. recv/fsync/replay is not going to 
represent that behavior well.


sync vs async on the other hand should be specified in the master, 
because it has a direct impact on the behavior of commits in the master.


I propose a configuration file standbys.conf, in the master:

# STANDBY NAME     SYNCHRONOUS   TIMEOUT
importantreplica   yes           100ms
tempcopy           no            10s

Or perhaps this should be stored in a system catalog.
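
For illustration, that format is simple enough to parse in a few lines
(assuming whitespace-separated columns and '#' comments; the actual file
format was never finalized in this thread):

```python
def parse_standbys_conf(text):
    """Parse the proposed standbys.conf into {name: settings}."""
    standbys = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, synchronous, timeout = line.split()
        standbys[name] = {"synchronous": synchronous == "yes",
                          "timeout": timeout}
    return standbys

conf = """\
# STANDBY NAME     SYNCHRONOUS   TIMEOUT
importantreplica   yes           100ms
tempcopy           no            10s
"""
standbys = parse_standbys_conf(conf)
assert standbys["importantreplica"]["synchronous"] is True
assert standbys["tempcopy"]["timeout"] == "10s"
```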


What to do if a standby server dies and never
acknowledges a commit?


The master's reaction to that situation should be configurable. So
I'd propose a new configuration parameter specifying the reaction.
Valid values are:

- standalone
   When the master has waited for the ACK much longer than the timeout
   (or detected the failure of the standby), it closes the connection
   to the standby and restarts transactions.

- down
   When that situation occurs, the master shuts down immediately.
   Though this is unsafe for the system requiring high availability,
   as far as I recall, some people wanted this mode in the previous
   discussion.


Yeah, though of course you might want to set that per-standby too..


Let's step back a bit and ask what would be the simplest thing that you 
could call synchronous replication in good conscience, and also be 
useful at least to some people. Let's leave out the "down" mode, because 
that requires registration. We'll probably have to do registration at 
some point, but let's take as small steps as possible.


Without the "down" mode in the master, frankly I don't see the point of 
the "recv" and "fsync" levels in the standby. Either way, when the 
master acknowledges a commit to the client, you don't know if it has 
made it to the standby yet because the replication connection might be 
down for some reason.


That leaves us the 'replay' mode, which *is* useful, because it gives 
you the guarantee that when the master acknowledges a commit, it will 
appear committed in all hot standby servers that are currently 
connected. With that guarantee you can build a reliable cluster with 
something like pgpool-II where all writes go to one node, and reads are 
distributed to multiple nodes.


I'm not sure what we should aim for in the first phase. But if you want 
as little code as possible yet have something useful, I think 'replay' 
mode with no standby registration is the way to go.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: register/unregister standby Re: [HACKERS] Synchronous replication

2010-09-01 Thread Heikki Linnakangas

On 30/08/10 15:14, Fujii Masao wrote:

I think that the advantage of registering standbys is that we can
specify which WAL files the master has to keep for the upcoming
standby. IMO, it's usually called together with pg_start_backup
as follows:

 SELECT register_standby('foo', pg_start_backup())

This requests the master to keep all the WAL files following the
backup starting location which pg_start_backup returns.


Hmm, that's clever. I was thinking that you'd initialize the standby 
from an existing backup, and in that context the standby would not need 
to connect to the master except via the replication connection. To take 
a base backup, you'll need not only that but also access to the 
filesystem on the master, i.e. shell access.


There's been some talk of being able to stream a base backup over the 
replication connection too, which would be extremely handy. And that 
makes my point even stronger that registering a standby should be 
possible via the replication connection.


Of course, you could well expose the functionality as both a built-in 
function and a command in replication mode, so this detail isn't really 
that important right now.



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Robert Haas
On Wed, Sep 1, 2010 at 6:23 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I'm not sure what we should aim for in the first phase. But if you want as
 little code as possible yet have something useful, I think 'replay' mode
 with no standby registration is the way to go.

IMHO, less is more.  Trying to do too much at once can cause us to
miss the release window (and can also create more bugs).  We just need
to leave the door open to adding later whatever we leave out now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Simon Riggs
On Wed, 2010-09-01 at 08:33 +0300, Heikki Linnakangas wrote:
 On 01/09/10 04:02, Robert Haas wrote:
   See the thread on interruptible sleeps.  The problem
  right now is that there are some polling loops that act to throttle
  the maximum rate at which a node doing sync rep can make forward
  progress, independent of the capabilities of the hardware.
 
 To be precise, the polling doesn't affect the bandwidth the 
 replication can handle, but it introduces a delay wh

We're sending the WAL data in batches. We can't really escape from the
fact that we're effectively using group commit when we use synch rep.
That will necessarily increase delay and require more sessions to get
same throughput.

   Those need
  to be replaced with a system that doesn't inject unnecessary delays
  into the process, which is what Heikki is working on.
 
 Right.

 Once we're done with that, all the big questions are still left. How to 
 configure it? What does synchronous replication mean, when is a 
 transaction acknowledged as committed? What to do if a standby server 
 dies and never acknowledges a commit? All these issues have been 
 discussed, but there is no consensus yet.

That sounds an awful lot like performance tuning first and the feature
additions last.

And if you're in the middle of performance tuning, surely some objective
performance tests would help us, no?

IMHO we should be concentrating on how to add the next features because
it's clear to me that if you do things in the wrong order you'll be
wasting time. And we don't have much of that, ever.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-09-01 Thread Simon Riggs
On Wed, 2010-09-01 at 13:23 +0300, Heikki Linnakangas wrote:
 On 01/09/10 10:53, Fujii Masao wrote:
  Before discussing that, we should determine whether registering
  standbys in the master is really required. It affects configuration a lot.
  Heikki thinks that it's required, but I'm still unclear about why and
  how.
 
  Why do standbys need to be registered in the master? What information
  should be registered?
 
 That requirement falls out from the handling of disconnected standbys. 
 If a standby is not connected, what does the master do with commits? If 
 the answer is anything other than acknowledging them to the client 
 immediately, as if the standby never existed, the master needs to know 
 what standby servers exist. Otherwise it can't know if all the standbys 
 are connected or not.

"All the standbys" presupposes that we know what they are, i.e. we have
registered them, so I see that argument as circular. Quorum commit does
not need registration, so quorum commit is the easy-to-implement
option and registration is the more complex later feature. I don't have
a problem with adding registration later and believe it can be done
later without issues.

  What does synchronous replication mean, when is a transaction
  acknowledged as committed?
 
  I proposed four synchronization levels:
 
  1. async
 doesn't make transaction commit wait for replication, i.e.,
 asynchronous replication. This mode has been already supported in
 9.0.
 
  2. recv
 makes transaction commit wait until the standby has received WAL
 records.
 
  3. fsync
 makes transaction commit wait until the standby has received and
 flushed WAL records to disk
 
  4. replay
 makes transaction commit wait until the standby has replayed WAL
 records after receiving and flushing them to disk
 
  OTOH, Simon proposed the quorum commit feature. I think that both
  are required for our various use cases. Thoughts?
 
 I'd like to keep this as simple as possible, yet flexible so that with 
 enough scripting and extensions, you can get all sorts of behavior. I 
 think quorum commit falls into the extension category; if your setup 
 is complex enough, it's going to be impossible to represent that in our 
 config files no matter what. But if you write a little proxy, you can 
 implement arbitrary rules there.
 
 I think recv/fsync/replay should be specified in the standby. 

I think the wait mode (i.e. recv/fsync/replay or others) should be
specified in the master. This allows the application to specify whatever
level of protection it requires, and also allows the behaviour to be
different for user-specifiable parts of the application. As soon as you
set this on the standby, you have a one-size-fits-all approach to
synchronisation.

We already know performance of synchronous rep is poor, which is exactly
why I want to be able to control it at the application level. Fine
grained control is important, otherwise we may as well just use DRBD and
skip this project completely, since we already have that. It will also
be a feature that no other database has, taking us truly beyond what has
gone before.

The master/standby decision is not something that is easily changed.
Whichever we decide now will be the thing we stick with.

 It has no 
 direct effect on the master, the master would just relay the setting to 
 the standby when it connects, or the standby would send multiple 
 XLogRecPtrs and let the master decide when the WAL is persistent enough. 
 And what if you write a proxy that has some other meaning of persistent 
 enough? Like when it has been written to the OS buffers but not yet 
 fsync'd, or when it has been fsync'd to at least one standby and 
 received by at least three others. recv/fsync/replay is not going to 
 represent that behavior well.
 
 sync vs async on the other hand should be specified in the master, 
 because it has a direct impact on the behavior of commits in the master.
 



 I propose a configuration file standbys.conf, in the master:
 
 # STANDBY NAME     SYNCHRONOUS   TIMEOUT
 importantreplica   yes           100ms
 tempcopy           no            10s
 
 Or perhaps this should be stored in a system catalog.

That part sounds like complexity that can wait until later. I would not
object if you really want this, but would prefer it to look like this:

# STANDBY NAME     DEFAULT_WAIT_MODE   TIMEOUT
importantreplica   sync                100ms
tempcopy           async               10s

You don't *have* to use the application level control if you don't want
it. But it's an important capability for real-world apps, since the
alternative is deliberately splitting an application across two database
servers each with different wait modes.

  What to do if a standby server dies and never
  acknowledges a commit?
 
  The master's reaction to that situation should be configurable. So
  I'd propose a new configuration parameter specifying the reaction.
  Valid values are:
 
  - standalone
 When 

Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread Bruce Momjian
fazool mein wrote:
 Hello everyone,
 
 I'm interested in benchmarking synchronous replication, to see how
 performance degrades compared to asynchronous streaming replication.
 
 I browsed through the archive of emails, but things still seem unclear. Do
 we have a final agreed upon patch that I can use? Any links for that?

No.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread David Fetter
On Tue, Aug 31, 2010 at 05:44:15PM -0400, Bruce Momjian wrote:
 fazool mein wrote:
  Hello everyone,
  
  I'm interested in benchmarking synchronous replication, to see how
  performance degrades compared to asynchronous streaming replication.
  
  I browsed through the archive of emails, but things still seem unclear. Do
  we have a final agreed upon patch that I can use? Any links for that?
 
 No.

That was a mite brusque and not super informative.

There are patches, and the latest from Fujii Masao is probably worth
looking at :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread Robert Haas
On Tue, Aug 31, 2010 at 6:24 PM, David Fetter da...@fetter.org wrote:
 On Tue, Aug 31, 2010 at 05:44:15PM -0400, Bruce Momjian wrote:
 fazool mein wrote:
  Hello everyone,
 
  I'm interested in benchmarking synchronous replication, to see how
  performance degrades compared to asynchronous streaming replication.
 
  I browsed through the archive of emails, but things still seem unclear. Do
  we have a final agreed upon patch that I can use? Any links for that?

 No.

 That was a mite brusque and not super informative.

 There are patches, and the latest from Fujii Masao is probably worth
 looking at :)

I am pretty sure, however, that the performance will be terrible at
this point.  Heikki is working on fixing that, but it ain't done yet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company




Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread David Fetter
On Tue, Aug 31, 2010 at 08:34:31PM -0400, Robert Haas wrote:
 On Tue, Aug 31, 2010 at 6:24 PM, David Fetter da...@fetter.org wrote:
  On Tue, Aug 31, 2010 at 05:44:15PM -0400, Bruce Momjian wrote:
  fazool mein wrote:
   Hello everyone,
  
   I'm interested in benchmarking synchronous replication, to see
   how performance degrades compared to asynchronous streaming
   replication.
  
   I browsed through the archive of emails, but things still seem
   unclear. Do we have a final agreed upon patch that I can use?
   Any links for that?
 
  No.
 
  That was a mite brusque and not super informative.
 
  There are patches, and the latest from Fujii Masao is probably
  worth looking at :)
 
 I am pretty sure, however, that the performance will be terrible at
 this point.  Heikki is working on fixing that, but it ain't done
 yet.

Is this something for an eDB feature, or for community PostgreSQL,
or...?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread Robert Haas
On Tue, Aug 31, 2010 at 8:45 PM, David Fetter da...@fetter.org wrote:
 I am pretty sure, however, that the performance will be terrible at
 this point.  Heikki is working on fixing that, but it ain't done
 yet.

 Is this something for an eDB feature, or for community PostgreSQL,
 or...?

It's an EDB feature in the sense that Heikki is developing it as part
of his employment with EDB, but it will be committed to community
PostgreSQL.  See the thread on interruptible sleeps.  The problem
right now is that there are some polling loops that act to throttle
the maximum rate at which a node doing sync rep can make forward
progress, independent of the capabilities of the hardware.  Those need
to be replaced with a system that doesn't inject unnecessary delays
into the process, which is what Heikki is working on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread Fujii Masao
On Wed, Sep 1, 2010 at 9:34 AM, Robert Haas robertmh...@gmail.com wrote:
 There are patches, and the latest from Fujii Masao is probably worth
 looking at :)

 I am pretty sure, however, that the performance will be terrible at
 this point.  Heikki is working on fixing that, but it ain't done yet.

Yep. The latest WIP code is available in my git repository, but it's
not worth benchmarking yet. I'll need to merge Heikki's effort and
the synchronous replication patch.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread fazool mein
Thanks!

I'll wait for the merging then; there is no point in benchmarking otherwise.

Regards

On Tue, Aug 31, 2010 at 6:06 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Wed, Sep 1, 2010 at 9:34 AM, Robert Haas robertmh...@gmail.com wrote:
  There are patches, and the latest from Fujii Masao is probably worth
  looking at :)
 
  I am pretty sure, however, that the performance will be terrible at
  this point.  Heikki is working on fixing that, but it ain't done yet.

 Yep. The latest WIP code is available in my git repository, but it's
 not worth benchmarking yet. I'll need to merge Heikki's effort and
 the synchronous replication patch.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

 Regards,

 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center



Re: [HACKERS] Synchronous replication - patch status inquiry

2010-08-31 Thread Heikki Linnakangas

On 01/09/10 04:02, Robert Haas wrote:

 See the thread on interruptible sleeps.  The problem
right now is that there are some polling loops that act to throttle
the maximum rate at which a node doing sync rep can make forward
progress, independent of the capabilities of the hardware.


To be precise, the polling doesn't affect the bandwidth the 
replication can handle, but it introduces a delay, which is what hurts 
the latency of each synchronously-committed transaction.



 Those need
to be replaced with a system that doesn't inject unnecessary delays
into the process, which is what Heikki is working on.


Right.

Once we're done with that, all the big questions are still left. How to 
configure it? What does synchronous replication mean, when is a 
transaction acknowledged as committed? What to do if a standby server 
dies and never acknowledges a commit? All these issues have been 
discussed, but there is no consensus yet.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



register/unregister standby Re: [HACKERS] Synchronous replication

2010-08-30 Thread Fujii Masao
On Tue, Aug 10, 2010 at 5:58 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 05/08/10 17:14, Fujii Masao wrote:

 I'm thinking to make users register and unregister each standby via SQL
 functions like register_standby() and unregister_standby():

 The register/unregister facility should be accessible from the streaming
 replication connection, so that you don't need to connect to any particular
 database in addition to the streaming connection.

Probably I've not understood your point correctly yet.

I think that the advantage of registering standbys is that we can
specify which WAL files the master has to keep for the upcoming
standby. IMO, it's usually called together with pg_start_backup
as follows:

SELECT register_standby('foo', pg_start_backup())

This requests the master to keep all the WAL files following the
backup starting location which pg_start_backup returns. Now we
can do that by using wal_keep_segments, but it's not easy to set
because it's difficult to predict how many WAL files the standby
will require.
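The retention rule this implies is simple: each registered standby pins the oldest WAL location it still needs, and the master may only recycle segments older than the minimum of those. A minimal Python sketch of that idea (the function name, standby names, and integer "LSNs" are purely illustrative, not anything in the patch):

```python
# Registered standbys, mapped to the oldest WAL location each still needs.
def removable_horizon(current_lsn, registered_standbys):
    """Return the location below which WAL segments could be recycled."""
    if not registered_standbys:
        return current_lsn           # no registrations: keep nothing extra
    return min(registered_standbys.values())

standbys = {"foo": 0x3000000, "bar": 0x5000000}
print(hex(removable_horizon(0x8000000, standbys)))   # → 0x3000000
```

With wal_keep_segments you have to guess this horizon in advance; with registration the master can compute it exactly.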

So I've thought that the register/unregister facility should be
used from the normal client connection. Why do you think it should
be accessible from the SR connection?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-16 Thread Heikki Linnakangas

On 05/08/10 13:40, Fujii Masao wrote:

On Wed, Aug 4, 2010 at 12:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

There's some race conditions with the signaling. If another process finishes
XLOG flush and sends the signal when a walsender has just finished one
iteration of its main loop, walsender will reset xlogsend_requested and go
to sleep. It should not sleep but send the pending WAL immediately.


Yep. To avoid that race condition, xlogsend_requested should be reset to
false after sleep and before calling XLogSend(). I attached the updated
version of the patch.


There's still a small race condition: if you receive the signal just 
before entering pg_usleep(), it will not be interrupted.


Of course, on platforms where signals don't interrupt sleep, the problem 
is even bigger. Magnus reminded me that we can use select() instead of 
pg_usleep() on such platforms, but that's still vulnerable to the race 
condition.


ppoll() or pselect() could be used, but I don't think they're fully 
portable. I think we'll have to resort to the self-pipe trick mentioned 
in the Linux select(3) man page:



  On systems that  lack  pselect(),  reliable  (and
   more  portable)  signal  trapping  can  be achieved using the self-pipe
   trick (where a signal handler writes a byte to a pipe whose  other  end
   is monitored by select() in the main program.)
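For illustration, the self-pipe trick looks roughly like this (a Python sketch of the C technique; in walsender this would of course be plain write()/select() in C, and the timer stands in for the process signalling us):

```python
import os
import select
import signal

# Create the self-pipe: the signal handler writes one byte to the write
# end, while the main loop monitors the read end with select().
pipe_r, pipe_w = os.pipe()
os.set_blocking(pipe_w, False)

def handler(signum, frame):
    try:
        os.write(pipe_w, b"\0")   # wake up the select() below
    except BlockingIOError:
        pass                      # pipe full: a wakeup is already pending

signal.signal(signal.SIGALRM, handler)
signal.setitimer(signal.ITIMER_REAL, 0.1)   # the "signal" arrives in 100 ms

# The "sleep": nominally up to 10 s, but the self-pipe byte ends it early.
readable, _, _ = select.select([pipe_r], [], [], 10.0)
if pipe_r in readable:
    os.read(pipe_r, 1)            # drain the wakeup byte
    print("woken by signal")
```

The key point is that a signal arriving just before the select() still leaves a byte in the pipe, so the sleep returns immediately instead of racing.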


Another idea is to use something different than Unix signals, like 
ProcSendSignal/ProcWaitForSignal which are implemented using semaphores.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-09 Thread Heikki Linnakangas

On 05/08/10 17:14, Fujii Masao wrote:

I'm thinking to make users register and unregister each standby via SQL
functions like register_standby() and unregister_standby():


The register/unregister facility should be accessible from the streaming 
replication connection, so that you don't need to connect to any 
particular database in addition to the streaming connection.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-09 Thread Heikki Linnakangas

On 01/08/10 15:30, Greg Stark wrote:

On Sun, Aug 1, 2010 at 7:11 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I don't think any of this quorum stuff makes much sense without explicitly
registering standbys in the master.


This doesn't have to be done manually. The streaming protocol could
include the standby sending its system id to the master. The master
could just keep a list of system ids with the last record they've been
sent and the last they've confirmed receipt, fsync, application,
whatever the protocol covers. If the same system reconnects it just
overwrites the existing data for that system id.


Systemid doesn't work for that. Systemid is assigned at initdb time, so 
all the standbys have the same systemid as the master.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-09 Thread Heikki Linnakangas
I wonder if we can continue to rely on the pg_sleep() loop for sleeping 
in walsender. On those platforms where signals don't interrupt sleep, 
sending the signal is not going to promptly wake up walsender. That was 
fine before, but any delay is going to be poison to synchronous 
replication performance.


Thoughts?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-09 Thread Bruce Momjian
Fujii Masao wrote:
 On Wed, Aug 4, 2010 at 10:38 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  Then you risk running out of disk space. Similar to having an archive
  command that fails for some reason.
 
  That's one reason the registration should not be too automatic - there are
  serious repercussions if the standby just disappears. If the standby is a
  synchronous one, the master will stop committing or delay acknowledging
  commits, depending on the configuration, and the master needs to keep extra
  WAL around.
 
 Umm... in addition to registration of each standby, I think we should allow
 users to set the upper limit of the number of WAL files kept in pg_xlog to
 avoid running out of disk space. If it exceeds the upper limit, the master
 disconnects too old standbys from the cluster and removes all the WAL files
 not required for current connected standbys. If you don't want any standby
 to disappear unexpectedly because of the upper limit, you can set it to 0
 (= no limit).
 
 I'm thinking to make users register and unregister each standby via SQL
 functions like register_standby() and unregister_standby():
 
 void register_standby(standby_name text, streaming_start_lsn text)
 void unregister_standby(standby_name text)
 
 Note that standby_name should be specified in recovery.conf of each
 standby.
 
 By using them we can easily specify which WAL files are unremovable because
 of new standby when taking the base backup for it as follows:
 
 SELECT register_standby('foo', pg_start_backup())

I know there has been discussion about how to identify the standby
servers --- how about using the connection application_name in
recovery.conf:

primary_conninfo = 'host=localhost port=5432 application_name=slave1'

The good part is that once recovery.conf goes away because it isn't a
standby anymore, the application_name is gone.

An even more interesting approach would be to specify the replication
mode in the application_name:

primary_conninfo = 'host=localhost port=5432 application_name=replay'

and imagine being able to view the status of standby servers from
pg_stat_activity.  (Right now standby servers do not appear in
pg_stat_activity.)

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Synchronous replication

2010-08-05 Thread Fujii Masao
On Wed, Aug 4, 2010 at 12:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 There's some race conditions with the signaling. If another process finishes
 XLOG flush and sends the signal when a walsender has just finished one
 iteration of its main loop, walsender will reset xlogsend_requested and go
 to sleep. It should not sleep but send the pending WAL immediately.

Yep. To avoid that race condition, xlogsend_requested should be reset to
false after sleep and before calling XLogSend(). I attached the updated
version of the patch.
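The ordering matters because a wakeup request can arrive while WAL is being sent. A toy sketch of the fixed ordering, using a Python event in place of the signal flag (illustration only, not the patch's actual code):

```python
import threading

# Stand-in for xlogsend_requested: set by other processes on XLOG flush.
wakeup_requested = threading.Event()
sent = []

def send_pending_wal():
    sent.append("send")
    # Simulate a flush finishing *while* we are sending: it requests
    # another wakeup.
    wakeup_requested.set()

def walsender_iteration():
    wakeup_requested.wait(timeout=1.0)   # the sleep in the main loop
    wakeup_requested.clear()             # reset AFTER sleep, BEFORE sending
    send_pending_wal()

wakeup_requested.set()     # a flush signalled us earlier
walsender_iteration()
# Because the flag was cleared before the send, the request that arrived
# during the send survives, and the next iteration won't go to sleep.
print(wakeup_requested.is_set())   # → True
```

Clearing the flag after the send instead would silently drop that in-flight request, which is exactly the race Heikki pointed out.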

Of course, the code is also available in my git repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: wakeup-walsnd

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


change_poll_loop_in_walsender_0805.patch
Description: Binary data



Re: [HACKERS] Synchronous replication

2010-08-05 Thread Fujii Masao
On Wed, Aug 4, 2010 at 10:38 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Then you risk running out of disk space. Similar to having an archive
 command that fails for some reason.

 That's one reason the registration should not be too automatic - there are
 serious repercussions if the standby just disappears. If the standby is a
 synchronous one, the master will stop committing or delay acknowledging
 commits, depending on the configuration, and the master needs to keep extra
 WAL around.

Umm... in addition to registration of each standby, I think we should allow
users to set the upper limit of the number of WAL files kept in pg_xlog to
avoid running out of disk space. If it exceeds the upper limit, the master
disconnects standbys that are too far behind and removes all the WAL files
not required for current connected standbys. If you don't want any standby
to disappear unexpectedly because of the upper limit, you can set it to 0
(= no limit).

I'm thinking to make users register and unregister each standby via SQL
functions like register_standby() and unregister_standby():

void register_standby(standby_name text, streaming_start_lsn text)
void unregister_standby(standby_name text)

Note that standby_name should be specified in recovery.conf of each
standby.

By using them we can easily specify which WAL files are unremovable because
of a new standby when taking the base backup for it as follows:

SELECT register_standby('foo', pg_start_backup())

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-04 Thread Heikki Linnakangas

On 27/07/10 13:29, Fujii Masao wrote:

On Tue, Jul 27, 2010 at 7:39 PM, Yeb Havinga yebhavi...@gmail.com  wrote:

Fujii Masao wrote:
I noted the changes in XlogSend where instead of *caughtup = true/false it
now returns !MyWalSnd->sndrqst. That value is initialized to false in that
procedure and it cannot be changed to true during execution of that
procedure, or can it?


That value is set to true in WalSndWakeup(). If WalSndWakeup() is called
after initialization of that value in XLogSend(), *caughtup is set to false.


There's some race conditions with the signaling. If another process 
finishes XLOG flush and sends the signal when a walsender has just 
finished one iteration of its main loop, walsender will reset 
xlogsend_requested and go to sleep. It should not sleep but send the 
pending WAL immediately.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-04 Thread Heikki Linnakangas

On 02/08/10 11:45, Fujii Masao wrote:

On Sun, Aug 1, 2010 at 3:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com  wrote:

I don't think any of this quorum stuff makes much sense without explicitly
registering standbys in the master.


I'm not sure if this is a good idea. This requires users to do more
manual operations than ever when setting up the replication; assign
unique name (or ID) to each standby, register them in the master,
specify the names in each recovery.conf (or elsewhere), and remove
the registration from the master when getting rid of the standby.

But this is similar to how MySQL replication is set up, so some
people (excluding me) may be familiar with it.


That would also solve the fuzziness with wal_keep_segments - if the master
knew what standbys exist, it could keep track of how far each standby has
received WAL, and keep just enough WAL for each standby to catch up.


What if the registered standby stays down for a long time?


Then you risk running out of disk space. Similar to having an archive 
command that fails for some reason.


That's one reason the registration should not be too automatic - there 
are serious repercussions if the standby just disappears. If the standby 
is a synchronous one, the master will stop committing or delay 
acknowledging commits, depending on the configuration, and the master 
needs to keep extra WAL around.


Of course, we can still support unregistered standbys, with the current 
semantics.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Fujii Masao
On Sun, Aug 1, 2010 at 3:11 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I don't think any of this quorum stuff makes much sense without explicitly
 registering standbys in the master.

I'm not sure if this is a good idea. This requires users to do more
manual operations than ever when setting up the replication; assign
unique name (or ID) to each standby, register them in the master,
specify the names in each recovery.conf (or elsewhere), and remove
the registration from the master when getting rid of the standby.

But this is similar to how MySQL replication is set up, so some
people (excluding me) may be familiar with it.

 That would also solve the fuzziness with wal_keep_segments - if the master
 knew what standbys exist, it could keep track of how far each standby has
 received WAL, and keep just enough WAL for each standby to catch up.

What if the registered standby stays down for a long time?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Fujii Masao
On Sun, Aug 1, 2010 at 9:51 PM, Robert Haas robertmh...@gmail.com wrote:
 Perhaps someone will claim that nobody wants to do that anyway (which
 I don't believe, BTW), but even in simpler cases it would be nicer to
 have an explicit policy rather than - in effect - inferring a policy
 from a soup of GUC settings.  For example, if you want one synchronous
 standby (A) and two asynchronous standbys (B and C).  You can say
 quorum=1 on the master and then configure vote=1 on A and vote=0 on B
 and C, but now you have to look at four machines to figure out what
 the policy is, and a change on any one of those machines can break it.
  ISTM that if you can just write synchronous_standbys=A on the master,
 that's a whole lot more clear and less error-prone.

Some standbys may become master later by failover. So we would
need to write something like synchronous_standbys=A on not only
the current master but also those standbys. Changing
synchronous_standbys would require change on all those servers.
Or the master should replicate even that change to the standbys?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 5:02 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Aug 1, 2010 at 9:51 PM, Robert Haas robertmh...@gmail.com wrote:
 Perhaps someone will claim that nobody wants to do that anyway (which
 I don't believe, BTW), but even in simpler cases it would be nicer to
 have an explicit policy rather than - in effect - inferring a policy
 from a soup of GUC settings.  For example, if you want one synchronous
 standby (A) and two asynchronous standbys (B and C).  You can say
 quorum=1 on the master and then configure vote=1 on A and vote=0 on B
 and C, but now you have to look at four machines to figure out what
 the policy is, and a change on any one of those machines can break it.
  ISTM that if you can just write synchronous_standbys=A on the master,
 that's a whole lot more clear and less error-prone.

 Some standbys may become master later by failover. So we would
 need to write something like synchronous_standbys=A on not only
 the current master but also those standbys. Changing
 synchronous_standbys would require change on all those servers.
 Or the master should replicate even that change to the standbys?

Let's not get *the manner of specifying the policy* confused with *the
need to update the policy when the master changes*.  It doesn't seem
likely you would want the same value for  synchronous_standbys on all
your machines.  In the most common configuration, you'd probably have:

on A: synchronous_standbys=B
on B: synchronous_standbys=A

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Fujii Masao
On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote:
 Let's not get *the manner of specifying the policy* confused with *the
 need to update the policy when the master changes*.  It doesn't seem
 likely you would want the same value for  synchronous_standbys on all
 your machines.  In the most common configuration, you'd probably have:

 on A: synchronous_standbys=B
 on B: synchronous_standbys=A

Oh, true. But, what if we have another synchronous standby called C?
We specify the policy as follows?:

on A: synchronous_standbys=B,C
on B: synchronous_standbys=A,C
on C: synchronous_standbys=A,B

We would need to change the setting on both A and B when we want to
change the name of the third standby from C to D, for example. No?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 7:06 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote:
 Let's not get *the manner of specifying the policy* confused with *the
 need to update the policy when the master changes*.  It doesn't seem
 likely you would want the same value for  synchronous_standbys on all
 your machines.  In the most common configuration, you'd probably have:

 on A: synchronous_standbys=B
 on B: synchronous_standbys=A

 Oh, true. But, what if we have another synchronous standby called C?
 We specify the policy as follows?:

 on A: synchronous_standbys=B,C
 on B: synchronous_standbys=A,C
 on C: synchronous_standbys=A,B

 We would need to change the setting on both A and B when we want to
 change the name of the third standby from C to D, for example. No?

Sure.  If you give the standbys names, then if people change the
names, they'll have to update their configuration.  But I can't see
that as an argument against doing it.  You can remove the possibility
that someone will have a hassle if they rename a server by not
allowing them to give it a name in the first place, but that doesn't
seem like a win from a usability perspective.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Fujii Masao
On Mon, Aug 2, 2010 at 8:32 PM, Robert Haas robertmh...@gmail.com wrote:
 Sure.  If you give the standbys names, then if people change the
 names, they'll have to update their configuration.  But I can't see
 that as an argument against doing it.  You can remove the possibility
 that someone will have a hassle if they rename a server by not
 allowing them to give it a name in the first place, but that doesn't
 seem like a win from a usability perspective.

I'm just comparing your idea (i.e., set synchronous_standbys on
each possible master) with my idea (i.e., set replication_mode on
each standby). Though your idea has the advantage described in the
following post, it seems to make the setup of the standbys more
complicated, as I described. So I'm trying to come up with a better idea.
http://archives.postgresql.org/pgsql-hackers/2010-08/msg7.php

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-02 Thread Yeb Havinga

Fujii Masao wrote:

On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote:
  

Let's not get *the manner of specifying the policy* confused with *the
need to update the policy when the master changes*.  It doesn't seem
likely you would want the same value for  synchronous_standbys on all
your machines.  In the most common configuration, you'd probably have:

on A: synchronous_standbys=B
on B: synchronous_standbys=A



Oh, true. But, what if we have another synchronous standby called C?
We specify the policy as follows?:

on A: synchronous_standbys=B,C
on B: synchronous_standbys=A,C
on C: synchronous_standbys=A,B

We would need to change the setting on both A and B when we want to
change the name of the third standby from C to D, for example. No?
  
What if the master is named as well in the 'pool of servers that are in 
sync'? In the scenario above this pool would be A,B,C. Working with this 
concept has as benefit that the setting can be copied to all other 
servers as well, and is invariant under any number of failures or 
switchovers. The same could also hold for quorum expressions like A && 
(B || C), if A,B,C are either master or standby.


I initially thought that once the definitions could be the same on all 
servers, having them in a system catalog would be a good thing. However, 
that'd probably be hard to set up, and in the case of failures during a 
change of the parameters it could become very messy.


regards,
Yeb Havinga




Re: [HACKERS] Synchronous replication

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 8:57 AM, Yeb Havinga yebhavi...@gmail.com wrote:
 Fujii Masao wrote:

 On Mon, Aug 2, 2010 at 7:53 PM, Robert Haas robertmh...@gmail.com wrote:


 Let's not get *the manner of specifying the policy* confused with *the
 need to update the policy when the master changes*.  It doesn't seem
 likely you would want the same value for  synchronous_standbys on all
 your machines.  In the most common configuration, you'd probably have:

 on A: synchronous_standbys=B
 on B: synchronous_standbys=A


 Oh, true. But, what if we have another synchronous standby called C?
 We specify the policy as follows?:

 on A: synchronous_standbys=B,C
 on B: synchronous_standbys=A,C
 on C: synchronous_standbys=A,B

 We would need to change the setting on both A and B when we want to
 change the name of the third standby from C to D, for example. No?


 What if the master is named as well in the 'pool of servers that are in
 sync'? In the scenario above this pool would be A,B,C. Working with this
 concept has as benefit that the setting can be copied to all other servers
 as well, and is invariant under any number of failures or switchovers. The
 same could also hold for quorum expressions like A && (B || C), if A,B,C are
 either master or standby.

 I initially thought that once the definitions could be the same on all
 servers, having them in a system catalog would be a good thing. However,
 that'd probably be hard to set up, and in the case of failures during a
 change of the parameters it could become very messy.

Yeah, I think this information has to be stored either in GUCs or in a
flat-file somewhere.  Putting it in a system catalog will cause major
problems when trying to get a down system back up, I think.

I suspect that for complex setups, people will need to use some kind
of cluster-ware to update the settings as nodes go up and down.  But I
think it will still be simpler if the nodes are named.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication

2010-08-01 Thread Heikki Linnakangas

On 27/07/10 16:12, Joshua Tolley wrote:

My concern is that in a quorum system, if the quorum number is less than the
total number of replicas, there's no way to know *which* replicas composed the
quorum for any given transaction, so we can't know which servers to fail to if
the master dies.


In fact, it's possible for one standby to sync up to X, then disconnect 
and reconnect, and have the master count it a second time in the quorum, 
especially if the master doesn't notice that the standby disconnected, 
e.g. a network problem.


I don't think any of this quorum stuff makes much sense without 
explicitly registering standbys in the master.


That would also solve the fuzziness with wal_keep_segments - if the 
master knew what standbys exist, it could keep track of how far each 
standby has received WAL, and keep just enough WAL for each standby to 
catch up.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Synchronous replication

2010-08-01 Thread Greg Stark
On Sun, Aug 1, 2010 at 7:11 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 In fact, it's possible for one standby to sync up to X, then disconnect and
 reconnect, and have the master count it a second time in the quorum.
 Especially if the master doesn't notice that the standby disconnected, e.g. a
 network problem.

 I don't think any of this quorum stuff makes much sense without explicitly
 registering standbys in the master.

This doesn't have to be done manually. The streaming protocol could
include the standby sending its system id to the master. The master
could just keep a list of system ids with the last record they've been
sent and the last they've confirmed receipt, fsync, application,
whatever the protocol covers. If the same system reconnects it just
overwrites the existing data for that system id.

-- 
greg



Re: [HACKERS] Synchronous replication

2010-08-01 Thread Robert Haas
On Sun, Aug 1, 2010 at 8:30 AM, Greg Stark gsst...@mit.edu wrote:
 On Sun, Aug 1, 2010 at 7:11 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 In fact, it's possible for one standby to sync up to X, then disconnect and
 reconnect, and have the master count it second time in the quorum.
 Especially if the master doesn't notice that the standby disconnected, e.g a
 network problem.

 I don't think any of this quorum stuff makes much sense without explicitly
 registering standbys in the master.

 This doesn't have to be done manually. The streaming protocol could
 include the standby sending its system id to the master. The master
 could just keep a list of system ids with the last record they've been
 sent and the last they've confirmed receipt, fsync, application,
 whatever the protocol covers. If the same system reconnects it just
 overwrites the existing data for that system id.

That seems entirely too clever.  Where are you going to store this
data?  What if you want to clean out the list?

I've felt from the beginning that the idea of doing synchronous
replication without having an explicit notion of what standbys are out
there was not on very sound footing, and I think the difficulties of
making quorum commit work properly are only further evidence of that.
Much has been made of the notion of "wait for N votes, but allow
standbys to explicitly give up their vote", but that's still not fully
general - for example, you can't implement A && (B || C).

Perhaps someone will claim that nobody wants to do that anyway (which
I don't believe, BTW), but even in simpler cases it would be nicer to
have an explicit policy rather than - in effect - inferring a policy
from a soup of GUC settings.  For example, if you want one synchronous
standby (A) and two asynchronous standbys (B and C).  You can say
quorum=1 on the master and then configure vote=1 on A and vote=0 on B
and C, but now you have to look at four machines to figure out what
the policy is, and a change on any one of those machines can break it.
 ISTM that if you can just write synchronous_standbys=A on the master,
that's a whole lot more clear and less error-prone.
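The gap between a unit-vote quorum and an explicit policy can be checked exhaustively. Here is a toy Python sketch (not from any patch; all names invented for illustration) showing that no combination of a quorum count and per-standby 0-or-1 votes reproduces the policy A && (B || C):

```python
from itertools import combinations, product

STANDBYS = ("A", "B", "C")

def subsets(names):
    # All possible sets of standbys that might have acked a commit.
    return [set(c) for r in range(len(names) + 1)
            for c in combinations(names, r)]

def target(acked):
    # The policy in question: A && (B || C)
    return "A" in acked and bool({"B", "C"} & acked)

def quorum_matches(votes, quorum):
    # Does "commit once the acked standbys hold >= quorum votes"
    # agree with target() for every possible set of acked standbys?
    return all(
        (sum(votes[s] for s in acked) >= quorum) == target(acked)
        for acked in subsets(STANDBYS)
    )

# Search every 0/1 vote assignment and every quorum value.
found = [
    (dict(zip(STANDBYS, v)), quorum)
    for v in product((0, 1), repeat=3)
    for quorum in range(4)
    if quorum_matches(dict(zip(STANDBYS, v)), quorum)
]
print(found)  # -> [] : no quorum/vote setting encodes A && (B || C)
```

Interestingly, weighted votes would express this particular policy (e.g. vote=3 for A, vote=2 for B and C, quorum=5), so the limitation is specific to the 0/1 vote scheme discussed here.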

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication

2010-08-01 Thread Fujii Masao
On Sun, Aug 1, 2010 at 9:30 PM, Greg Stark gsst...@mit.edu wrote:
 This doesn't have to be done manually.

Agreed, if we register standbys in the master.

 The streaming protocol could
 include the standby sending its system id to the master. The master
 could just keep a list of system ids with the last record they've been
 sent and the last they've confirmed receipt, fsync, application,
 whatever the protocol covers. If the same system reconnects it just
 overwrites the existing data for that system id.

Since every standby has the same system id, we cannot distinguish
them by that id. ISTM that the master should assign a unique id
to each standby, and they should save it in pg_control.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-08-01 Thread Robert Haas
On Sun, Aug 1, 2010 at 10:08 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Sun, Aug 1, 2010 at 9:30 PM, Greg Stark gsst...@mit.edu wrote:
 This doesn't have to be done manually.

 Agreed, if we register standbys in the master.

 The streaming protocol could
 include the standby sending its system id to the master. The master
 could just keep a list of system ids with the last record they've been
 sent and the last they've confirmed receipt, fsync, application,
 whatever the protocol covers. If the same system reconnects it just
 overwrites the existing data for that system id.

 Since every standby has the same system id, we cannot distinguish
 them by that id. ISTM that the master should assign the unique id
 for each standby, and they should save it in pg_control.

Another option might be to let the user name them.

standby_name='near'
standby_name='far1'
standby_name='far2'

...or whatever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] Synchronous replication

2010-07-27 Thread Yeb Havinga

Fujii Masao wrote:

On Mon, Jul 26, 2010 at 8:25 PM, Robert Haas robertmh...@gmail.com wrote:
  

On Mon, Jul 26, 2010 at 6:48 AM, Marko Tiikkaja
marko.tiikk...@cs.helsinki.fi wrote:


On 7/26/10 1:44 PM +0300, Fujii Masao wrote:
  

On Mon, Jul 26, 2010 at 6:36 PM, Yeb Havingayebhavi...@gmail.com  wrote:


I wasn't entirely clear. My suggestion was to have only

  acknowledge_commit = {no|recv|fsync|replay}

instead of

  replication_mode = {async|recv|fsync|replay}
  

Okay, I'll change the patch accordingly.


For what it's worth, I think replication_mode is a lot clearer.
Acknowledge_commit sounds like it would do something similar to
asynchronous_commit.
  

I agree.



As the result of the vote, I'll leave the parameter replication_mode
as it is.
  
I'd like to bring forward another suggestion (please tell me when it is 
becoming spam). My feeling about replication_mode as is, is that it says 
in the same parameter something about async versus sync, as well as, if 
sync, which method of feedback to the master. OTOH, having two parameters 
would need documentation stating that the feedback method may only be set 
if the replication_mode is sync, as well as checks. So it is actually good 
to have it all in one parameter.


But somehow the shoe pinches, because async feels different from the 
other three values. There is a way to move async out of the enumeration:


synchronous_replication_mode = off | recv | fsync | replay

This also looks a bit like the synchronous_replication = N # similar in 
name to synchronous_commit Simon Riggs proposed in 
http://archives.postgresql.org/pgsql-hackers/2010-05/msg01418.php


regards,
Yeb Havinga



PS: Please bear with me, I thought a bit about a way to make clear what 
deduction users must make when figuring out if the replication mode is 
synchronous. That question might be important when counting 'which 
servers are the synchronous standbys' to debug quorum settings.


replication_mode

from the assumption !async -> sync
and !async -> recv|fsync|replay
to infer recv|fsync|replay -> synchronous replication.

synchronous_replication_mode

from the assumption !off -> on
and !off -> recv|fsync|replay
to infer recv|fsync|replay -> synchronous replication.

I think the last inference is more easily made by humans, since everybody 
will make the !off -> on assumption, but not the !async -> sync one without 
having it verified in the documentation.





Re: [HACKERS] Synchronous replication

2010-07-27 Thread Yeb Havinga

Joshua Tolley wrote:

Perhaps I'm hijacking the wrong thread for this, but I wonder if the quorum
idea is really the best thing for us.
For reference: it appeared in a long thread a while ago 
http://archives.postgresql.org/pgsql-hackers/2010-05/msg01226.php.

In short, there are three different modes: availability,
performance, and protection. Protection appears to mean that at least one
standby has applied the log; availability means at least one standby has
received the log info
  
Maybe we could do both, by describing in the documentation use cases for 
the availability, performance and protection setups, and how they would 
be reflected in the standby-related parameters.


regards,
Yeb Havinga




Re: [HACKERS] Synchronous replication

2010-07-27 Thread Yeb Havinga

Fujii Masao wrote:

The attached patch changes the backend so that it signals walsender to
wake up from the sleep and send WAL immediately. It doesn't include any
other synchronous replication stuff.
  

Hello Fujii,

I noted the changes in XLogSend where, instead of *caughtup = true/false, 
it now returns !MyWalSnd->sndrqst. That value is initialized to false in 
that procedure and it cannot be changed to true during execution of that 
procedure, or can it?


regards,
Yeb Havinga




Re: [HACKERS] Synchronous replication

2010-07-27 Thread Fujii Masao
On Tue, Jul 27, 2010 at 7:39 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 Fujii Masao wrote:

 The attached patch changes the backend so that it signals walsender to
 wake up from the sleep and send WAL immediately. It doesn't include any
 other synchronous replication stuff.


 Hello Fujii,

Thanks for the review!

 I noted the changes in XLogSend where, instead of *caughtup = true/false, it
 now returns !MyWalSnd->sndrqst. That value is initialized to false in that
 procedure and it cannot be changed to true during execution of that
 procedure, or can it?

That value is set to true in WalSndWakeup(). If WalSndWakeup() is called
after initialization of that value in XLogSend(), *caughtup is set to false.
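The handshake being described - clear the request flag before sending, then report "caught up" only if no new request arrived in the meantime - can be modelled in a few lines. This is a toy Python sketch of the pattern only (the real code is C over shared memory, using MyWalSnd->sndrqst; the names here are hypothetical):

```python
class WalSenderState:
    """Toy model of the walsender wakeup flag."""

    def __init__(self):
        self.send_request = False   # set by a committing backend

    def wal_snd_wakeup(self):
        # Backend side: request an immediate WAL send.
        self.send_request = True

    def xlog_send(self, send_pending_wal):
        # Sender side: clear the flag *before* sending, so a request
        # arriving during the send is not lost.
        self.send_request = False
        send_pending_wal()
        # Caught up only if no new request arrived while we were busy.
        return not self.send_request

state = WalSenderState()

# A wakeup that arrives mid-send forces another loop iteration:
print(state.xlog_send(lambda: state.wal_snd_wakeup()))  # -> False

# With no concurrent wakeup, the sender reports caught up:
print(state.xlog_send(lambda: None))  # -> True
```

The point of clearing the flag first is exactly the race Yeb asked about: another process can set it between the initialization and the return.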

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-07-27 Thread Fujii Masao
On Tue, Jul 27, 2010 at 5:42 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 I'd like to bring forward another suggestion (please tell me when it is
 becoming spam). My feeling about replication_mode as is, is that is says in
 the same parameter something about async or sync, as well as, if sync, which
 method of feedback to the master. OTOH having two parameters would need
 documentation that the feedback method may only be set if the
 replication_mode was sync, as well as checks. So it is actually good to have
 it all in one parameter

 But somehow the shoe pinches, because async feels different from the other
 three parameters. There is a way to move async out of the enumeration:

 synchronous_replication_mode = off | recv | fsync | replay

ISTM that we need to get more feedback from users to determine which
is the best. So, how about leaving the parameter as it is and revisiting
this topic later? Since it's not difficult to change the parameter later,
we will not regret it even if we delay that decision.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-07-27 Thread Yeb Havinga

Fujii Masao wrote:

I noted the changes in XLogSend where, instead of *caughtup = true/false, it
now returns !MyWalSnd->sndrqst. That value is initialized to false in that
procedure and it cannot be changed to true during execution of that
procedure, or can it?



That value is set to true in WalSndWakeup(). If WalSndWakeup() is called
after initialization of that value in XLogSend(), *caughtup is set to false.
  

Ah, so it can be changed by another backend process.

Another question:

Is there a reason not to send the signal in XLogFlush() itself, so it 
would be called at


CreateCheckPoint(), EndPrepare(), FlushBuffer(), 
RecordTransactionAbortPrepared(), RecordTransactionCommit(), 
RecordTransactionCommitPrepared(), RelationTruncate(), 
SlruPhysicalWritePage(), write_relmap_file(), WriteTruncateXlogRec(), 
and xact_redo_commit().


regards,
Yeb Havinga




Re: [HACKERS] Synchronous replication

2010-07-27 Thread Joshua Tolley
On Tue, Jul 27, 2010 at 01:41:10PM +0900, Fujii Masao wrote:
 On Tue, Jul 27, 2010 at 12:36 PM, Joshua Tolley eggyk...@gmail.com wrote:
  Perhaps I'm hijacking the wrong thread for this, but I wonder if the quorum
  idea is really the best thing for us. I've been thinking about Oracle's way 
  of
  doing things[1]. In short, there are three different modes: availability,
  performance, and protection. Protection appears to mean that at least one
  standby has applied the log; availability means at least one standby has
  received the log info (it doesn't specify whether that info has been fsynced
  or applied, but presumably does not mean applied, since it's distinct from
  protection mode); performance means replication is asynchronous. I'm not
  sure this method is perfect, but it might be simpler than the quorum 
  behavior
  that has been considered, and adequate for actual use cases.
 
 In my case, I'd like to set up one synchronous standby on the near rack for
 high-availability, and one asynchronous standby on the remote site for 
 disaster
 recovery. Can Oracle's way cover the case?

I don't think it can support the case you're interested in, though I'm not
terribly expert on it. I'm definitely not arguing for the syntax Oracle uses,
or something similar; I much prefer the flexibility we're proposing, and agree
with Yeb Havinga in another email who suggests we spell out in documentation
some recipes for achieving various possible scenarios given whatever GUCs we
settle on.

 availability mode with two standbys might create a sort of similar 
 situation.
 That is, since the ACK from the near standby arrives in first, the near 
 standby
 acts synchronous and the remote one does asynchronous. But the ACK from the
 remote standby can arrive in first, so it's not guaranteed that the near 
 standby
 has received the log info before transaction commit returns a success to the
 client. In this case, we have to failover to the remote standby even if it's 
 not
 under control of a clusterware. This is a problem for me.

My concern is that in a quorum system, if the quorum number is less than the
total number of replicas, there's no way to know *which* replicas composed the
quorum for any given transaction, so we can't know which servers to fail over
to if the master dies. This isn't different from Oracle, where it looks like
essentially the quorum value is always 1. Your scenario shows that all
replicas are not created equal, and that sometimes we'll be interested in WAL
getting committed on a specific subset of the available servers. If I had two
nearby replicas called X and Y, and one at a remote site called Z, for
instance, I'd set quorum to 2, but really I'd want to say wait for server X
and Y before committing, but don't worry about Z.

I have no idea how to set up our GUCs to encode a situation like that :)

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] Synchronous replication

2010-07-27 Thread Fujii Masao
On Tue, Jul 27, 2010 at 8:48 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 Is there a reason not to send the signal in XLogFlush() itself, so it would be
 called at

 CreateCheckPoint(), EndPrepare(), FlushBuffer(),
 RecordTransactionAbortPrepared(), RecordTransactionCommit(),
 RecordTransactionCommitPrepared(), RelationTruncate(),
 SlruPhysicalWritePage(), write_relmap_file(), WriteTruncateXlogRec(), and
 xact_redo_commit().

Yes, it's because there is no need to send WAL immediately anywhere other
than in the following functions:

* EndPrepare()
* RecordTransactionAbortPrepared()
* RecordTransactionCommit()
* RecordTransactionCommitPrepared()

Some functions call XLogFlush() to follow the basic WAL rule. In the
standby, WAL records are always flushed to disk prior to any corresponding
data-file change. So, we don't need to replicate the result of XLogFlush()
immediately for the WAL rule.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-07-27 Thread Fujii Masao
On Tue, Jul 27, 2010 at 10:12 PM, Joshua Tolley eggyk...@gmail.com wrote:
 I don't think it can support the case you're interested in, though I'm not
 terribly expert on it. I'm definitely not arguing for the syntax Oracle uses,
 or something similar; I much prefer the flexibility we're proposing, and agree
 with Yeb Havinga in another email who suggests we spell out in documentation
 some recipes for achieving various possible scenarios given whatever GUCs we
 settle on.

Agreed. I'll add it to my TODO list.

 My concern is that in a quorum system, if the quorum number is less than the
 total number of replicas, there's no way to know *which* replicas composed the
 quorum for any given transaction, so we can't know which servers to fail to if
 the master dies.

What about checking the current WAL receive location of each standby by
using pg_last_xlog_receive_location()? The standby which has the newest
location should be failed over to.
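Comparing pg_last_xlog_receive_location() results across standbys boils down to parsing the 'hi/lo' hex form of an xlog location into a comparable number. A hypothetical Python sketch (standby names and location values are made up; fetching the locations from each standby is left out):

```python
def parse_lsn(lsn):
    """Parse an xlog location such as '0/3000000', as returned by
    pg_last_xlog_receive_location(), into a comparable integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def newest_standby(locations):
    """locations: {standby_name: lsn_text}; pick the failover target
    as the standby with the newest received location."""
    return max(locations, key=lambda name: parse_lsn(locations[name]))

locs = {"near": "0/5000268", "far1": "0/4FFFFF0", "far2": "0/5000130"}
print(newest_standby(locs))  # -> 'near'
```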

 This isn't different from Oracle, where it looks like
 essentially the quorum value is always 1. Your scenario shows that all
 replicas are not created equal, and that sometimes we'll be interested in WAL
 getting committed on a specific subset of the available servers. If I had two
 nearby replicas called X and Y, and one at a remote site called Z, for
 instance, I'd set quorum to 2, but really I'd want to say wait for server X
 and Y before committing, but don't worry about Z.

 I have no idea how to set up our GUCs to encode a situation like that :)

Yeah, quorum commit alone cannot cover that situation. I think that
the current approach (i.e., quorum commit plus replication mode per standby)
would cover it. In your example, you can choose recv, fsync or
replay as replication_mode in X and Y, and choose async in Z.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-07-27 Thread Joshua Tolley
On Tue, Jul 27, 2010 at 10:53:45PM +0900, Fujii Masao wrote:
 On Tue, Jul 27, 2010 at 10:12 PM, Joshua Tolley eggyk...@gmail.com wrote:
  My concern is that in a quorum system, if the quorum number is less than the
  total number of replicas, there's no way to know *which* replicas composed 
  the
  quorum for any given transaction, so we can't know which servers to fail to 
  if
  the master dies.
 
 What about checking the current WAL receive location of each standby by
 using pg_last_xlog_receive_location()? The standby which has the newest
 location should be failed over to.

That makes sense. Thanks.

  This isn't different from Oracle, where it looks like
  essentially the quorum value is always 1. Your scenario shows that all
  replicas are not created equal, and that sometimes we'll be interested in 
  WAL
  getting committed on a specific subset of the available servers. If I had 
  two
  nearby replicas called X and Y, and one at a remote site called Z, for
  instance, I'd set quorum to 2, but really I'd want to say wait for server X
  and Y before committing, but don't worry about Z.
 
  I have no idea how to set up our GUCs to encode a situation like that :)
 
 Yeah, quorum commit alone cannot cover that situation. I think that
 current approach (i.e., quorum commit plus replication mode per standby)
 would cover that. In your example, you can choose recv, fsync or
 replay as replication_mode in X and Y, and choose async in Z.

Clearly I need to read through the GUCs and docs better. I'll try to keep
quiet until that's finished :)


--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [HACKERS] Synchronous replication

2010-07-27 Thread Dimitri Fontaine
Le 27 juil. 2010 à 15:12, Joshua Tolley eggyk...@gmail.com a écrit :
 My concern is that in a quorum system, if the quorum number is less than the
 total number of replicas, there's no way to know *which* replicas composed the
 quorum for any given transaction, so we can't know which servers to fail to if
 the master dies. This isn't different from Oracle, where it looks like
 essentially the quorum value is always 1. Your scenario shows that all
 replicas are not created equal, and that sometimes we'll be interested in WAL
 getting committed on a specific subset of the available servers. If I had two
 nearby replicas called X and Y, and one at a remote site called Z, for
 instance, I'd set quorum to 2, but really I'd want to say wait for server X
 and Y before committing, but don't worry about Z.
 
 I have no idea how to set up our GUCs to encode a situation like that :)

You make it so that Z does not take a vote, by setting it async.

Regards,
-- 
dim


Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Thu, Jul 22, 2010 at 5:37 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 Fujii Masao wrote:

 How should the synchronous replication behave when the number of connected
 standby servers is less than quorum?

 1. Ignore quorum. The current patch adopts this. If the ACKs from all
   connected standbys have arrived, transaction commit is successful
   even if the number of standbys is less than quorum. If there is no
   connected standby, transaction commit always is successful without
   regard to quorum.

 2. Observe quorum. Aidan wants this. Until the number of connected
   standbys has become more than or equal to quorum, transaction commit
   waits.

 Which is the right behavior of quorum commit? Or we should add new
 parameter specifying the behavior of quorum commit?


 Initially I also expected the quorum to behave like described by
 Aidan/option 2.

OK. But some people (including me) would like to prevent the master
from halting when a standby fails, so I think that 1. also should
be supported. So I'm inclined to add a new parameter specifying the
behavior of quorum commit when the number of synchronous standbys
becomes less than quorum.

 Also, IMHO the name quorum is a bit short, like having
 maximum but not saying a max_something.

 quorum_min_sync_standbys
 quorum_max_sync_standbys

What about quorum_standbys?

 The question remains what are the sync standbys? Does it mean not-async?

It's the standby which sets replication_mode to recv, fsync, or replay.

 Intuitively by looking at the enumeration of replication_mode I'd think that
 the sync standbys are all standbys that operate in a non-async mode. That
 would be clearer with a boolean sync (or not) and, for sync standbys, the
 replication_mode specified.

You mean that something like synchronous_replication as the recovery.conf
parameter should be added in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I don't like to do that.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Synchronous replication

2010-07-26 Thread Yeb Havinga

Fujii Masao wrote:

Intuitively by looking at the enumeration of replication_mode I'd think that
the sync standbys are all standbys that operate in a non-async mode. That
would be clearer with a boolean sync (or not) and, for sync standbys, the
replication_mode specified.



You mean that something like synchronous_replication as the recovery.conf
parameter should be added in addition to replication_mode? Since increasing
the number of similar parameters would confuse users, I don't like do that.
  
I think what would be confusing is a mismatch between 
implemented concepts and parameters.


1 does the master wait for standby servers on commit?
2 how many acknowledgements must the master receive before it can continue?
3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
4 when do standby servers acknowledge a commit?
5 does it only wait when the standbys are connected, or also when they 
are not connected?

6..?

When trying to match parameter names for the concepts above:
1 - does not exist, but can be answered with quorum_standbys > 0
2 - quorum_standbys
3 - yes, if replication_mode != async (here is where I thought I had to 
think too much)

4 - replication modes recv, fsync and replay, but not async
5 - Zoltan's strict_sync_replication parameter

Just an idea, what about
for 4: acknowledge_commit = {no|recv|fsync|replay}
then 3 = yes, if acknowledge_commit != no

regards,
Yeb Havinga




Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Mon, Jul 26, 2010 at 5:27 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 Fujii Masao wrote:

 Intuitively by looking at the enumeration of replication_mode I'd think
 that
 the sync standbys are all standby's that operate in a not async mode.
 That
 would be clearer with a boolean sync (or not) and for sync standbys the
 replication_mode specified.


 You mean that something like synchronous_replication as the recovery.conf
 parameter should be added in addition to replication_mode? Since
 increasing
 the number of similar parameters would confuse users, I don't like do
 that.


 I think what would be confusing if there is a mismatch between implemented
 concepts and parameters.

 1 does the master wait for standby servers on commit?
 2 how many acknowledgements must the master receive before it can continue?
 3 is a standby server a synchronous one, i.e. does it acknowledge a commit?
 4 when do standby servers acknowledge a commit?
 5 does it only wait when the standby's are connected, or also when they are
 not connected?
 6..?

 When trying to match parameter names for the concepts above:
 1 - does not exist, but can be answered with quorum_standbys > 0
 2 - quorum_standbys
 3 - yes, if replication_mode != async (here is where I thought I had to think
 too much)
 4 - replication modes recv, fsync and replay, but not async
 5 - Zoltan's strict_sync_replication parameter

 Just an idea, what about
 for 4: acknowledge_commit = {no|recv|fsync|replay}
 then 3 = yes, if acknowledge_commit != no

Thanks for the clarification.

I still like

replication_mode = {async|recv|fsync|replay}

rather than

synchronous_replication = {on|off}
acknowledge_commit = {no|recv|fsync|replay}

because the former is more intuitive for me and I don't want
to increase the number of parameters.

We need to hear from some users in this respect. If most want
the latter, of course, I'd love to adopt it.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



quorum commit Re: [HACKERS] Synchronous replication

2010-07-26 Thread Fujii Masao
On Thu, Jul 22, 2010 at 5:37 PM, Yeb Havinga yebhavi...@gmail.com wrote:
 Fujii Masao wrote:

 How should the synchronous replication behave when the number of connected
 standby servers is less than quorum?

 1. Ignore quorum. The current patch adopts this. If the ACKs from all
   connected standbys have arrived, transaction commit is successful
   even if the number of standbys is less than quorum. If there is no
   connected standby, transaction commit always is successful without
   regard to quorum.

 2. Observe quorum. Aidan wants this. Until the number of connected
   standbys has become more than or equal to quorum, transaction commit
   waits.

 Which is the right behavior of quorum commit? Or we should add new
 parameter specifying the behavior of quorum commit?


 Initially I also expected the quorum to behave like described by
 Aidan/option 2.

I have another question about the detailed design of quorum commit.

In the following case, how should quorum commit behave?

1. quorum_standbys = 2; there are three connected synchronous standbys
2. One standby sends the ACK back and fails
3. The ACK arrives from another standby
4. How should quorum commit behave?

(a) Transaction commit returns a success since the master has already
received two ACKs
(b) Transaction commit waits for the last ACK since only one of
currently connected standbys has sent the ACK
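The two behaviors can be stated precisely as counting rules. A toy Python sketch of the scenario (standby names hypothetical):

```python
def option_a(all_acks, quorum):
    # (a) Count every ACK received for this commit, even from
    # standbys that have since disconnected.
    return len(all_acks) >= quorum

def option_b(all_acks, connected, quorum):
    # (b) Count only ACKs from standbys that are still connected.
    return len(all_acks & connected) >= quorum

# The scenario above: quorum_standbys = 2, three standbys S1..S3;
# S1 acks and then fails, then S2's ACK arrives.
acks = {"S1", "S2"}
connected = {"S2", "S3"}
print(option_a(acks, 2))             # -> True : commit returns
print(option_b(acks, connected, 2))  # -> False: still waiting for S3's ACK
```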

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: quorum commit Re: [HACKERS] Synchronous replication

2010-07-26 Thread Yeb Havinga

Fujii Masao wrote:

In the following case, how should quorum commit behave?

1. quorum_standbys = 2; there are three connected synchronous standbys
2. One standby sends the ACK back and fails
3. The ACK arrives from another standby
4. How should quorum commit behave?

(a) Transaction commit returns a success since the master has already
received two ACKs
(b) Transaction commit waits for the last ACK since only one of
currently connected standbys has sent the ACK
  
I'd opt for option (b) if that doesn't make the code very complex, or 
expensive (to check connected state when reaching quorum).


regards,
Yeb Havinga




  1   2   3   >