Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-16 Thread Markus Wanner

Hi,

Greg Stark wrote:

I think my definition would be that a query against the replica will
produce the same result as a query against the master -- and that that
will be the case even after a system failure. That might not
necessarily mean that the log entry is fsynced on the replica, only
that it's fsynced in a location where the replica will have access to
it when it runs recovery.


I tend to agree with that definition of synchrony for replicated
databases. However, let me point to an earlier thread around the same
topic:
http://archives.postgresql.org/message-id/4942ecf7.5040...@bluegap.ch

You will definitely find different definitions and requirements of what
synchronous replication means there. It convinced me that synchronous
is more of a marketing term in this area and is better avoided in
technical documents and discussions, or needs explanation.

As far as marketing goes, there are the customers who absolutely want
synchronous replication for its consistency and then there are the 
others who absolutely don't want it due to its unusably high latency.


Regards

Markus Wanner


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-16 Thread Greg Smith

Markus Wanner wrote:

You will definitely find different definitions and requirements of what
synchronous replication means there. 
To quote from the Wikipedia entry on Database Replication that Simon 
pointed to during the earlier discussion, 
http://en.wikipedia.org/wiki/Database_replication


Synchronous replication - guarantees zero data loss by the means of 
atomic write operation, i.e. write either completes on both sides or not 
at all. Write is not considered complete until acknowledgement by both 
local and remote storage.


That last part is the critical one:  acknowledgement by both local and 
remote storage is required before you can label something truly 
synchronous replication.  In implementation terms, that means you must 
have both local and slave fsync calls finish to be considered truly 
synchronous.  That part is not ambiguous at all.


There's a definition of the weaker form in there too, which is where the 
ambiguity is at:


Semi-synchronous replication - this usually means that a write is 
considered complete as soon as local storage acknowledges it and a 
remote server acknowledges that it has received the write either into 
memory or to a dedicated log file.


I don't consider that really synchronous replication anymore, but as you 
say it's been strengthened by marketing enough to be a valid industry 
term at this point.  Since it's already gained traction we might use it, 
as long as it's defined properly and its trade-offs vs. a true 
synchronous implementation are documented.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-16 Thread Markus Wanner

Hi,

Quoting Greg Smith g...@2ndquadrant.com:
Synchronous replication - guarantees zero data loss by the means  
of atomic write operation, i.e. write either completes on both sides  
or not at all. Write is not considered complete until  
acknowledgement by both local and remote storage.


Note that a storage acknowledge (hopefully) guarantees durability, but  
it does not necessarily mean that the transactional changes are  
immediately visible on a remote node. Which is what you had in your  
definition.


My point is that there are at least three things that can run
synchronously or not with respect to distributed databases:


 1. conflict detection and handling (for consistency)
 2. storage acknowledgement (for durability)
 3. effective application of changes (for visibility across nodes)

That last part is the critical one:  acknowledgement by both local  
and remote storage is required before you can label something truly  
synchronous replication.  In implementation terms, that means you  
must have both local and slave fsync calls finish to be considered  
truly synchronous.  That part is not ambiguous at all.


I personally agree 100%. (Given it implies a congruent conflict  
handling *before* the disk write. Having conflicting transactional  
changes on the disk wouldn't help much at recovery time).


(And yes, this means I think the effective application of changes can  
be deferred. IMO the load balancer and/or the application should take  
care not to send transactions from the same session to different nodes).



Semi-synchronous replication


..is plain nonsense to my ears. Either something is synchronous or it
is not. No half, no semi, no virtual synchrony. To have any technical  
relevance, one needs to add *what* is synchronous and what not.


In that spirit I have to admit that the term 'eager' that I'm  
currently using to describe Postgres-R may not be any more helpful. I  
take it to mean synchrony of 1. and 2., but not 3.
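
To make the three aspects concrete, here is a minimal Python sketch. The names `ReplicationSynchrony` and `EAGER` are hypothetical and appear nowhere in Postgres-R or PostgreSQL. The class only records which aspects complete before a commit is reported; it does not implement any of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationSynchrony:
    """Which of the three aspects run synchronously with the commit."""
    conflict_handling: bool   # 1. consistency
    storage_ack: bool         # 2. durability
    change_application: bool  # 3. visibility across nodes

    def describe(self) -> str:
        parts = {
            "conflict handling": self.conflict_handling,
            "storage acknowledgement": self.storage_ack,
            "application of changes": self.change_application,
        }
        sync = [name for name, on in parts.items() if on]
        deferred = [name for name, on in parts.items() if not on]
        return (f"synchronous: {', '.join(sync) or 'nothing'}; "
                f"deferred: {', '.join(deferred) or 'nothing'}")


# "Eager" in the sense used above: 1. and 2. synchronous, 3. deferred.
EAGER = ReplicationSynchrony(True, True, False)
print(EAGER.describe())
```

Spelling out the three flags avoids the ambiguity of a single "synchronous" label.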


Regards

Markus Wanner




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-13 Thread Fujii Masao
On Fri, Nov 13, 2009 at 3:17 PM, Greg Smith g...@2ndquadrant.com wrote:
 Yeah, that's the other parts of the industry I was referring to.  MySQL
 uses semi-synchronous to distinguish between its completely asynchronous
 default replication mode and one where it provides a somewhat safer
 implementation.  The description reads more as asynchronous with some
 synchronous elements, not one style of synchronous implementation.  None
 of their documentation wanders into the problem area here by calling it a
 true synchronous solution when it's really not--MySQL Cluster is their
 synchronous vehicle.
 It's fine to adopt the term semi-synchronous, as it's become quite popular
 and people are going to label the PG implementation with it regardless of
 what is settled on here.  But we should all try to be careful to use it as
 correctly as possible.

OK. Let's decide later what the recv ACK and fsync ACK
synchronization modes should be called.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
Hi,

On Thu, Nov 12, 2009 at 4:32 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Fujii Masao wrote:
 The problem is that fsync needs to be issued too frequently, which would
 be harmless in asynchronous replication, but not in synchronous one.
 A transaction would have to wait for the primary's and standby's fsync
 before returning a success to a client.

 So I'm inclined to change the startup process and bgwriter, instead of
 walreceiver, so as to fsync the WAL for the WAL rule.

 Let's keep it simple for now. Just make the walreceiver do the fsync. We
 can optimize later. For now, we're only going to have async mode anyway.

Okay, I'll do that: the walreceiver issues an fsync each time WAL
records arrive, and the startup process replays only records that have
already been fsynced.
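
The simple scheme agreed on here can be sketched as a toy model. This is an illustration in Python, not the patch itself: `StandbyWal` and its fields are hypothetical, records are counted rather than addressed by real LSNs, and the fsync is simulated by a counter.

```python
class StandbyWal:
    """Toy model of the standby's WAL: a record list plus a flush pointer."""

    def __init__(self):
        self.records = []       # records written by the walreceiver
        self.flushed_upto = 0   # number of records known to be fsynced
        self.replayed_upto = 0  # number of records applied by the startup process
        self.fsync_calls = 0    # simulated fsync count

    def receive(self, record):
        # walreceiver: write the record, then fsync immediately.
        self.records.append(record)
        self.fsync_calls += 1
        self.flushed_upto = len(self.records)

    def replay(self):
        # startup process: apply only records up to the flush pointer.
        applied = self.records[self.replayed_upto:self.flushed_upto]
        self.replayed_upto = self.flushed_upto
        return applied


wal = StandbyWal()
wal.receive("record-1")
wal.receive("record-2")
applied = wal.replay()
```

The model makes the cost visible: one simulated fsync per arrival, which is the overhead discussed in the rest of the thread.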

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Simon Riggs
On Thu, 2009-11-12 at 17:03 +0900, Fujii Masao wrote:

 On Thu, Nov 12, 2009 at 4:32 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  Fujii Masao wrote:
  The problem is that fsync needs to be issued too frequently, which would
  be harmless in asynchronous replication, but not in synchronous one.
  A transaction would have to wait for the primary's and standby's fsync
  before returning a success to a client.
 
  So I'm inclined to change the startup process and bgwriter, instead of
  walreceiver, so as to fsync the WAL for the WAL rule.
 
  Let's keep it simple for now. Just make the walreceiver do the fsync. We
  can optimize later. For now, we're only going to have async mode anyway.
 
 Okay, I'll do that; the walreceiver issues the fsync for each arrival of
 the WAL records, and the startup process replays only the records already
 fsynced.

I agree with you, though it has taken some time to understand what you
said and at first my reaction was to disagree. I think the responses you
got on this are because you dived straight in with a question before
explaining other things around this.

We already have a number of options for how to handle incoming WAL. We
can choose to fsync or not when WAL arrives. Choosing *not* to fsync
would be the typical choice because it provides reasonable performance;
fsyncing after each transaction commit would be worse. In any case, if
WAL receiver does the fsyncs then we will get worse performance. If we
reduce the number of fsyncs it does we just get spiky behaviour around
the fsyncs.

If recovery starts reading WAL records that have not been fsynced then
we may need to flush a shared buffer to disk that depends upon a
non-fsynced(yet) WAL record. Fsyncing WAL after *every* WAL record is
going to make performance suck even worse and is completely out of the
question. So implementing the fsync-WAL-before-buffer-flush rule during
recovery makes much more sense. It's also only a small change in
XLogFlush().
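
A minimal sketch of the fsync-WAL-before-buffer-flush rule, written as a Python toy model rather than PostgreSQL C. `RecoveryState` and its methods are hypothetical names and LSNs are plain integers. The point is only that the fsync is deferred until a data page that depends on unflushed WAL has to be written.

```python
class RecoveryState:
    """Toy model: WAL is written eagerly but fsynced only when a page needs it."""

    def __init__(self):
        self.wal_written_lsn = 0  # highest LSN written, not necessarily durable
        self.wal_flushed_lsn = 0  # highest LSN known to be fsynced
        self.wal_fsyncs = 0       # simulated fsync count

    def write_wal(self, lsn):
        # Cheap path: write() only, no fsync.
        self.wal_written_lsn = max(self.wal_written_lsn, lsn)

    def flush_wal_upto(self, lsn):
        # fsync only if the requested LSN is not durable yet.
        if lsn > self.wal_flushed_lsn:
            self.wal_fsyncs += 1
            self.wal_flushed_lsn = self.wal_written_lsn

    def flush_buffer(self, page_lsn):
        # WAL rule: the WAL describing a page must be durable before the page is.
        self.flush_wal_upto(page_lsn)
        # Holds as long as pages only depend on WAL that was already written.
        assert self.wal_flushed_lsn >= page_lsn
        return page_lsn  # stands in for writing the data page


state = RecoveryState()
for lsn in range(1, 101):
    state.write_wal(lsn)
state.flush_buffer(40)
state.flush_buffer(90)
```

In this model, writing 100 records costs no fsync, the first page flush triggers one, and a later flush for an LSN that is already durable costs none.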

Another way of doing this would be to only allow recovery to progress as
far as has been fsynced. That seems a more plausible approach, but would
lead to delays if we had a small number of long write transactions. The
benefit of streaming is that it potentially allows us to keep as near to
real-time recovery as possible.

So overall, yes, we need to do as you suggested: implement WAL rule in
recovery. WALreceiver smoothly does write(), Startup replays and we
leave the WAL file fsyncs to be performed by the bgwriter. 

But I also agree with Heikki. Let's plan to do this later in this
release.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Thu, Nov 12, 2009 at 6:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I agree with you, though it has taken some time to understand what you
 said and at first my reaction was to disagree. I think the responses you
 got on this are because you dived straight in with a question before
 explaining other things around this.

Thanks for clarifying this topic ;)

 If recovery starts reading WAL records that have not been fsynced then
 we may need to flush a shared buffer to disk that depends upon a
 non-fsynced(yet) WAL record. Fsyncing WAL after *every* WAL record is
 going to make performance suck even worse and is completely out of the
 question. So implementing the fsync-WAL-before-buffer-flush rule during
 recovery makes much more sense. It's also only a small change in
 XLogFlush().

Agreed. This approach has less impact on performance.

But, as I said in my first post on this thread, even such infrequent
fsync-WAL-before-buffer-flush calls might cause a response time spike on
the primary, because the walreceiver must sleep during that fsync. I think
that leaving the WAL-logging business to another process, like the
walwriter, is a good idea for further reducing the impact on the
walreceiver. In the typical case:

* The walreceiver receives WAL records, returns the ACK to the primary,
  saves them in the wal_buffers, and lets the startup process know
  the arrival.

* The walwriter writes and fsyncs the WAL records in the wal_buffers.

* The startup process applies the WAL records in the wal_buffers
  when it receives the notice of the arrival.

* The startup process and bgwriter fsync the WAL before the buffer
  flush.
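
The division of labour in the list above could look roughly like this. It is a single-threaded Python toy with hypothetical names (`Standby`, `flush_data_page`). Real processes would run concurrently and need locking, which is omitted here.

```python
class Standby:
    """Toy model of the proposed walreceiver / walwriter / startup split."""

    def __init__(self):
        self.wal_buffers = []  # received WAL records, in arrival order
        self.flushed = 0       # records written and fsynced so far
        self.applied = 0       # records replayed so far

    def walreceiver(self, record):
        # Buffer the record and ACK at once; no fsync on this path.
        self.wal_buffers.append(record)
        return len(self.wal_buffers)  # ACK: position received

    def walwriter(self):
        # Background write + fsync of everything buffered so far.
        self.flushed = len(self.wal_buffers)

    def startup(self):
        # Replay straight from wal_buffers, possibly ahead of the fsync.
        self.applied = len(self.wal_buffers)

    def flush_data_page(self, page_lsn):
        # WAL rule for the startup process and bgwriter:
        # make the WAL durable before the data page goes to disk.
        if page_lsn > self.flushed:
            self.walwriter()
        return self.flushed >= page_lsn


standby = Standby()
ack = standby.walreceiver("record-1")
standby.startup()
replayed_before_fsync = standby.applied == 1 and standby.flushed == 0
page_ok = standby.flush_data_page(1)
```

The ACK is returned before any fsync, so the primary does not wait on the standby's disk; durability is enforced only at the point where a data page is flushed.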

Of course, since this approach is too complicated, it's out of the scope
of the development for v8.5.

 But I also agree with Heikki. Let's plan to do this later in this
 release.

Okay. I won't implement anything around this topic until the core part of
asynchronous replication has been committed.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Simon Riggs
On Thu, 2009-11-12 at 21:45 +0900, Fujii Masao wrote:

 But, as I said on my first post on this thread, even such low-frequent
 fsync-WAL-before-buffer-flush might cause a response time spike on the
 primary because the walreceiver must sleep during that fsync. I think
 that leaving the WAL-logging business to another process like walwriter
 is a good idea for reducing further the impact on the walreceiver; In
 typical case,

Agree completely.

 Of course, since this approach is too complicated, it's out of the scope
 of the development for v8.5.

It's out of scope for phase 1, certainly.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 The problem is that fsync needs to be issued too frequently, which would
 be harmless in asynchronous replication, but not in synchronous one.
 A transaction would have to wait for the primary's and standby's fsync
 before returning a success to a client.

Surely that is exactly what is *required* if the user has asked for
synchronous replication.

regards, tom lane



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Tom Lane wrote:

Fujii Masao masao.fu...@gmail.com writes:
  

The problem is that fsync needs to be issued too frequently, which would
be harmless in asynchronous replication, but not in synchronous one.
A transaction would have to wait for the primary's and standby's fsync
before returning a success to a client.



Surely that is exactly what is *required* if the user has asked for
synchronous replication.
  
This is a distressingly common thing people get wrong about replication.  
You can either have synchronous replication, which as you say has to be 
slow:  you must wait for an fsync ACK from the secondary and a return 
trip before you can say something is committed on the primary.  Or you 
can get better performance by not waiting for all of those things, but 
the minute you do that it's *not* synchronous replication anymore.  You 
can't get high-performance and true synchronous behavior; you have to 
pick one.  The best you can do if you need both is work on accelerating 
fsync everywhere using the standard battery-backed write cache technique.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Fri, Nov 13, 2009 at 1:49 AM, Greg Smith g...@2ndquadrant.com wrote:
 This is a distressingly common thing people get wrong about replication.  You
 can either have synchronous replication, which as you say has to be slow:
 you must wait for an fsync ACK from the secondary and a return trip before
 you can say something is committed on the primary.  Or you can get better
 performance by not waiting for all of those things, but the minute you do
 that it's *not* synchronous replication anymore.  You can't get
 high-performance and true synchronous behavior; you have to pick one.  The
 best you can do if you need both is work on accelerating fsync everywhere
 using the standard battery-backed write cache technique.

What bothers me is that such frequent fsyncs would hurt even
semi-synchronous replication, not just the synchronous mode. (By
semi-synchronous I mean that you must wait for a *recv* ACK from the
secondary and a return trip before you can say something is committed on
the primary; this corresponds to DRBD's protocol B.) Personally, I think
that semi-synchronous replication is sufficient for HA.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Aidan Van Dyk
* Fujii Masao masao.fu...@gmail.com [091112 20:52]:

 Personally, I think that semi-synchronous replication is sufficient for HA.

Often, but that's not synchronous replication so don't call it such...

-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

Personally, I think that semi-synchronous replication is sufficient for HA.
  
Whether or not you think it's sufficient for what you have in mind, 
synchronous replication requires a return ACK from the secondary 
before you say things are committed on the primary.  If you don't do 
that, it's not true sync replication anymore; it's asynchronous 
replication.  Plenty of people decide that a local commit combined with 
a promise to synchronize as soon as possible to the slave is good enough 
for their apps, which as you say is getting referred to as 
semi-synchronous replication nowadays.  That's an awful name though, 
because it's not true--that's asynchronous replication, just aiming for 
minimal lag.  It's OK to say that's what you want, but you can't say 
it's really a synchronous commit anymore if you do things that way.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Fri, Nov 13, 2009 at 10:58 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 * Fujii Masao masao.fu...@gmail.com [091112 20:52]:

 Personally, I think that semi-synchronous replication is sufficient for HA.

 Often, but that's not synchronous replication so don't call it such...

Hmm, I'm not sure about your definition of synchronous. If the
primary waits for a *redo* ACK from the standby before reporting
a transaction's success to the client, can SR be called
synchronous?

This is one of the TODO items of SR for v8.5.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Fri, Nov 13, 2009 at 11:15 AM, Greg Smith g...@2ndquadrant.com wrote:
 Whether or not you think it's sufficient for what you have in mind,
 synchronous replication requires a return ACK from the secondary before
 you say things are committed on the primary.  If you don't do that, it's not
 true sync replication anymore; it's asynchronous replication.  Plenty of
 people decide that a local commit combined with a promise to synchronize as
 soon as possible to the slave is good enough for their apps, which as you
 say is getting referred to as semi-synchronous replication nowadays.
  That's an awful name though, because it's not true--that's asynchronous
 replication, just aiming for minimal lag.  It's OK to say that's what you
 want, but you can't say it's really a synchronous commit anymore if you do
 things that way.

Umm... what is your definition of synchronous? I'm planning to provide
the following four synchronization modes for v8.5. Does this match what
you have in mind?

  The primary waits for ... before returning success of a transaction:
  * nothing   - asynchronous replication
  * recv ACK  - semi-synchronous replication
  * fsync ACK - semi-synchronous replication
  * redo ACK  - synchronous replication

Or, in synchronous replication, must we wait for both an fsync ACK and a
redo ACK?
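
The four modes above can be written down as a small decision function. This is a sketch in Python with hypothetical names (`SyncMode`, `can_report_commit`), not a proposed API. It assumes each mode waits for exactly one kind of ACK; whether the redo mode must also wait for an fsync ACK is the open question asked above, and the sketch does not settle it.

```python
from enum import Enum


class SyncMode(Enum):
    ASYNC = "nothing"
    RECV = "recv ACK"
    FSYNC = "fsync ACK"
    REDO = "redo ACK"


def can_report_commit(mode, received, fsynced, replayed):
    """Return True if the primary may report success to the client.

    received / fsynced / replayed say which ACKs the standby has sent
    for the transaction's commit record so far.
    """
    if mode is SyncMode.ASYNC:
        return True        # never waits for the standby
    if mode is SyncMode.RECV:
        return received    # standby has the record in memory
    if mode is SyncMode.FSYNC:
        return fsynced     # standby has the record on disk
    return replayed        # standby has applied the record


# A record that has arrived on the standby but is not yet on its disk:
print(can_report_commit(SyncMode.RECV, True, False, False))
print(can_report_commit(SyncMode.FSYNC, True, False, False))
```

Reading the function top to bottom gives the latency ordering: each mode waits for strictly more work on the standby than the one before it.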

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Stark
On Fri, Nov 13, 2009 at 2:37 AM, Fujii Masao masao.fu...@gmail.com wrote:
 Umm... what is your definition of synchronous? I'm planning to provide
 four synchronization modes as follows, for v8.5. Does this fit in your

I think my definition would be that a query against the replica will
produce the same result as a query against the master -- and that that
will be the case even after a system failure. That might not
necessarily mean that the log entry is fsynced on the replica, only
that it's fsynced in a location where the replica will have access to
it when it runs recovery.

I do have a different question though. What do you plan to do if
there's a failure when they're out of sync? The master hasn't
responded to the commit yet because it's still waiting on the replica
to respond but it has already recorded the commit itself. When it
comes back up it's out of sync with the replica and has to resend
those records? What if the replica has already received it and it was
the confirmation which was lost?

-- 
greg



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

Umm... what is your definition of synchronous? I'm planning to provide
four synchronization modes as follows, for v8.5. Does this fit in your
thought?

  The primary waits ... before returning success of a transaction;
  * nothing   - asynchronous replication
  * recv ACK  - semi-synchronous replication
  * fsync ACK - semi-synchronous replication
  * redo ACK  - synchronous replication

Or, in synchronous replication, we must wait a fsync and a redo ACK?
  
Right, those are the possibilities, all four of them have valid use 
cases in the field and are worth implementing.  I don't like the label 
semi-synchronous replication myself, but it's a valuable feature to 
implement, and that is unfortunately the term other parts of the 
industry use for that approach.


But everyone needs to be extremely careful with the terminology here:  
if you say synchronous replication, that *only* means what you're 
labeling redo ACK (WAL ACK really).  Synchronous replication 
should not be used as a group term that includes the semi-synchronous 
variations, which are in fact asynchronous despite their marketing 
name.  If someone means semi-synchronous, but they say synchronous 
thinking it's a shared term also applicable to the semi-synchronous 
variations here, that's just going to be confusing for everyone.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Fri, Nov 13, 2009 at 11:54 AM, Greg Stark gsst...@mit.edu wrote:
 I think my definition would be that a query against the replica will
 produce the same result as a query against the master -- and that that
 will be the case even after a system failure. That might not
 necessarily mean that the log entry is fsynced on the replica, only
 that it's fsynced in a location where the replica will have access to
 it when it runs recovery.

Agreed.

 I do have a different question though. What do you plan to do if
 there's a failure when they're out of sync? The master hasn't
 responded to the commit yet because it's still waiting on the replica
 to respond but it has already recorded the commit itself. When it
 comes back up it's out of sync with the replica and has to resend
 those records? What if the replica has already received it and it was
 the confirmation which was lost?

If the connection is not closed, resending is not required because
TCP guarantees that such records eventually arrive at the standby.

Otherwise, the standby reconnects to the primary and asks for the
missing records, so they are resent. Since only the missing records
are requested, the records already received should not reach the
standby again, I think.
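
The reconnect case can be sketched as follows, assuming the standby reports the last position it received and the primary sends only what comes after it. The function name and the integer LSNs are hypothetical simplifications.

```python
def records_to_resend(primary_wal, standby_last_received):
    """Return the WAL records the standby is still missing.

    primary_wal: list of (lsn, payload) tuples in LSN order.
    standby_last_received: highest LSN the standby already holds.
    """
    return [(lsn, payload) for lsn, payload in primary_wal
            if lsn > standby_last_received]


primary_wal = [(1, "insert"), (2, "update"), (3, "commit")]

# The standby received the commit record but its confirmation was lost:
# nothing is resent, because the standby asks only for LSNs above 3.
lost_ack_case = records_to_resend(primary_wal, 3)

# The standby missed the commit record: only that record is resent.
missing_case = records_to_resend(primary_wal, 2)
```

Because the standby drives the request from its own position, a lost confirmation does not cause duplicate delivery in this model.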

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Fujii Masao
On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote:
 Right, those are the possibilities, all four of them have valid use cases in
 the field and are worth implementing.  I don't like the label
 semi-synchronous replication myself, but it's a valuable feature to
 implement, and that is unfortunately the term other parts of the industry
 use for that approach.

BTW, MySQL and DRBD use the term semi-synchronous:
http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
http://www.drbd.org/users-guide/s-replication-protocols.html

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Robert Hodges
Hi Greg and Fujii, 

Just a point on terminology:  there's a difference in the usage of
semi-synchronous between DRBD and MySQL semi-synchronous replication, which
was originally developed by Google.

In the Google case semi-synchronous replication is a quorum algorithm where
clients receive a commit notification only after at least one of N slaves
has received the replication event.  In the DRBD case semi-synchronous means
that events have reached the slave but are not necessarily durable.  There's
no quorum.  

Of these two usages the Google semi-sync approach is the more interesting
because it avoids the availability problems associated with fully
synchronous operation but gets most of the durability benefits.
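
The quorum rule described for the Google design can be sketched in a few lines of Python. `commit_acknowledged` and the slave names are hypothetical, and `required=1` reflects the "at least one of N slaves" wording above. Real implementations also need timeouts, which are left out.

```python
def commit_acknowledged(acks, required=1):
    """Return True once enough slaves have received the replication event.

    acks: mapping of slave name -> True if that slave has confirmed receipt.
    required: how many confirmations the client's commit must wait for.
    """
    confirmed = sum(1 for received in acks.values() if received)
    return confirmed >= required


# Two of three slaves are slow or down; one confirmation is enough.
acks = {"slave-a": False, "slave-b": True, "slave-c": False}
print(commit_acknowledged(acks))
```

With `required=1` a single slow or failed slave does not block commits, which is the availability advantage over waiting for every slave.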

Cheers, Robert

On 11/12/09 9:29 PM PST, Fujii Masao masao.fu...@gmail.com wrote:

 On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote:
 Right, those are the possibilities, all four of them have valid use cases in
 the field and are worth implementing.  I don't like the label
 semi-synchronous replication myself, but it's a valuable feature to
 implement, and that is unfortunately the term other parts of the industry
 use for that approach.
 
 BTW, MySQL and DRBD use the term semi-synchronous:
 http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
 http://www.drbd.org/users-guide/s-replication-protocols.html
 
 Regards,
 
 --
 Fujii Masao
 NIPPON TELEGRAPH AND TELEPHONE CORPORATION
 NTT Open Source Software Center
 




Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-12 Thread Greg Smith

Fujii Masao wrote:

On Fri, Nov 13, 2009 at 1:49 PM, Greg Smith g...@2ndquadrant.com wrote:
  

Right, those are the possibilities, all four of them have valid use cases in
the field and are worth implementing.  I don't like the label
semi-synchronous replication myself, but it's a valuable feature to
implement, and that is unfortunately the term other parts of the industry
use for that approach.



BTW, MySQL and DRBD use the term semi-synchronous:
http://forge.mysql.com/wiki/ReplicationFeatures/SemiSyncReplication
http://www.drbd.org/users-guide/s-replication-protocols.html
  
Yeah, that's the other parts of the industry I was referring to.  
MySQL uses semi-synchronous to distinguish between its completely 
asynchronous default replication mode and one where it provides a 
somewhat safer implementation.  The description reads more as 
asynchronous with some synchronous elements, not one style of 
synchronous implementation.  None of their documentation wanders into 
the problem area here by calling it a true synchronous solution when 
it's really not--MySQL Cluster is their synchronous vehicle. 

It's fine to adopt the term semi-synchronous, as it's become quite 
popular and people are going to label the PG implementation with it 
regardless of what is settled on here.  But we should all try to be 
careful to use it as correctly as possible.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




[HACKERS] write ahead logging in standby (streaming replication)

2009-11-11 Thread Fujii Masao
Hi,

Should the standby also follow the WAL rule during recovery?
The current patch doesn't care about the write order of the data page
and WAL in the standby. So, after both servers fail, restarting the
ex-standby by itself might corrupt the data.

If the standby follows the WAL rule, the walreceiver might be delayed in
writing WAL records until the startup process's or bgwriter's fsync
has finished. I'm a bit concerned that such a delay might
increase the performance overhead on the primary.

Thoughts?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-11 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 Should the standby also have to follow the WAL rule during recovery?
 The current patch doesn't care about the write order of the data page
 and WAL in the standby. So, after both servers fail, restarting the
 ex-standby by itself might corrupt the data.

Surely the receiver should fsync the WAL itself to disk before
acknowledging it.  Assuming you've done that, I don't see any
corruption risk.

regards, tom lane



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-11 Thread Fujii Masao
On Thu, Nov 12, 2009 at 12:03 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Fujii Masao masao.fu...@gmail.com writes:
 Should the standby also have to follow the WAL rule during recovery?
 The current patch doesn't care about the write order of the data page
 and WAL in the standby. So, after both servers fail, restarting the
 ex-standby by itself might corrupt the data.

 Surely the receiver should fsync the WAL itself to disk before
 acknowledging it.  Assuming you've done that, I don't see any
 corruption risk.

Does "acknowledging it" mean letting the startup process know that WAL
records have arrived? If so, I agree that there is no risk of data corruption.

The problem is that fsync needs to be issued too frequently, which would
be harmless in asynchronous replication, but not in synchronous one.
A transaction would have to wait for the primary's and standby's fsync
before returning a success to a client.

So I'm inclined to change the startup process and bgwriter, instead of
walreceiver, so as to fsync the WAL for the WAL rule.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] write ahead logging in standby (streaming replication)

2009-11-11 Thread Heikki Linnakangas
Fujii Masao wrote:
 The problem is that fsync needs to be issued too frequently, which would
 be harmless in asynchronous replication, but not in synchronous one.
 A transaction would have to wait for the primary's and standby's fsync
 before returning a success to a client.
 
 So I'm inclined to change the startup process and bgwriter, instead of
 walreceiver, so as to fsync the WAL for the WAL rule.

Let's keep it simple for now. Just make the walreceiver do the fsync. We
can optimize later. For now, we're only going to have async mode anyway.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
