Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-19 Thread Robert Haas
On Mon, Oct 18, 2010 at 4:31 AM, Fujii Masao masao.fu...@gmail.com wrote:
 But even if we did that, note that WAL in A might still be ahead of
 that in B. For example, A might crash right after writing WAL to disk
 but before sending it to B. So when we restart the old master A as a
 standby after failover, we would need to delete the WAL files (in A)
 that are inconsistent with the WAL sequence in B.

Right.  There's no way to make it categorically safe to turn A into a
standby, because there's no way to guarantee that the fsyncs of the
WAL happen at the same femtosecond on both machines.  What we should
be looking for is a reliable way to determine whether or not it is in
fact safe.  Timelines are intended to provide that, but there are
holes, so they don't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-18 Thread Fujii Masao
On Thu, Oct 14, 2010 at 8:23 AM, fazool mein fazoolm...@gmail.com wrote:
 The concept of time line makes sense to me in the case of asynchronous
 replication. But in case of synchronous replication, I am not so sure.

 When a standby connects to the primary, it checks if both have the same time
 line. If not, it doesn't start.

 Now, consider the following scenario. The primary (call it A) fails, the
 standby (call it B), via a trigger file, comes out of recovery mode
 (increments time line id to, say, 2), and morphs into a primary. Now, let's say
 we start the old primary A as a standby, to connect to the new primary B
 (which previously was a standby). As the code is at the moment, the old
 primary A will not be allowed to connect to the new primary B because A's
 timelineid (1) is not equal to that of the new primary B (2). Hence, we
 need to create a backup again, and set up the standby from scratch.

Yep.

 In the above scenario, if the system was using asynchronous replication,
 time lines would have saved us from doing something wrong. But, if we are
 using synchronous replication, we know that both A and B would have been in
 sync before the failure. In this case, being forced to create a new standby from
 scratch because of time lines might not be very desirable if the database is
 huge.

At least in my sync rep patch, the data buffer flush waits until WAL has
been written to disk, but not until WAL has arrived at the standby.
So the database in A might be ahead of that in B, even with sync rep. To
avoid this, should we make the buffer flush also wait for replication?

But even if we did that, note that WAL in A might still be ahead of
that in B. For example, A might crash right after writing WAL to disk
but before sending it to B. So when we restart the old master A as a
standby after failover, we would need to delete the WAL files (in A)
that are inconsistent with the WAL sequence in B.
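
As a rough illustration of that last point (a sketch only, not code from
any patch): assuming WAL positions can be compared as plain integers and
that the position at which B was promoted is known, the WAL that A would
have to discard could be worked out as below. The names wal_to_discard,
a_flushed_lsn and b_switch_lsn are hypothetical.

```python
from typing import NamedTuple, Optional


class Divergence(NamedTuple):
    keep_up_to: int      # last WAL position that A still shares with B
    discard_bytes: int   # amount of WAL on A beyond that position


def wal_to_discard(a_flushed_lsn: int, b_switch_lsn: int) -> Optional[Divergence]:
    """Return what A must discard before it can follow B, or None.

    a_flushed_lsn: how far A had written WAL to disk when it crashed.
    b_switch_lsn:  the position at which B left recovery and was promoted.
    Both are assumed to be plain, comparable integers.
    """
    if a_flushed_lsn <= b_switch_lsn:
        # A never wrote anything that B did not also have.
        return None
    return Divergence(keep_up_to=b_switch_lsn,
                      discard_bytes=a_flushed_lsn - b_switch_lsn)
```

This only covers the WAL itself. Data pages that A already flushed on
the basis of that WAL are the separate problem mentioned above.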

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-18 Thread Dimitri Fontaine
Fujii Masao masao.fu...@gmail.com writes:
 But even if we did that, note that WAL in A might still be ahead of
 that in B. For example, A might crash right after writing WAL to disk
 but before sending it to B. So when we restart the old master A as a
 standby after failover, we would need to delete the WAL files (in A)
 that are inconsistent with the WAL sequence in B.

The idea of sending the current last applied LSN from master to slave
has been discussed already. It would allow sending the WAL content in
parallel with its local fsync() on the master; the standby would refrain
from applying any WAL segment until it knows the master is past that.
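
To make the hold-back behavior concrete, here is a minimal sketch,
assuming integer WAL positions and a master that reports how far its
own local flush has reached. The class and method names (Standby,
receive, report_master_flush) are made up for illustration and do not
correspond to anything in the tree.

```python
from collections import deque


class Standby:
    """Toy standby that buffers WAL until the master has flushed it."""

    def __init__(self):
        self.pending = deque()       # (end_lsn, payload), received but held back
        self.applied = []            # payloads applied so far, in order
        self.master_flushed_lsn = 0  # last flush position reported by the master

    def receive(self, end_lsn, payload):
        # WAL may arrive before the master has fsync()ed it locally.
        self.pending.append((end_lsn, payload))
        self._apply_safe()

    def report_master_flush(self, lsn):
        # The master tells us how far its own fsync() has got.
        self.master_flushed_lsn = max(self.master_flushed_lsn, lsn)
        self._apply_safe()

    def _apply_safe(self):
        # Apply only records that end at or before the master's flush point.
        while self.pending and self.pending[0][0] <= self.master_flushed_lsn:
            _, payload = self.pending.popleft()
            self.applied.append(payload)
```

With this, a record that the master sent but never managed to flush
stays in the pending queue and is never applied on the standby.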

Now, given such a behavior, that would mean that when A joins again as a
standby, it would have to ask B for the current last applied LSN too,
and would notice the timeline change. Maybe by adding a facility to
request the last LSN of the previous timeline, and with the behavior
above applied there (skipping now-known-future-WALs in recovery), that
would work automatically?

There's still the problem of WAL that has been applied before
recovery; I don't know that we can do anything there. But maybe we could
also tweak the CHECKPOINT mechanism not to advance the restart point
until we know the standbys have already replayed everything up to the
restart point?

-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-18 Thread fazool mein
I believe we should come up with a universal solution that will solve
potential future problems as well (for example, if in sync replication, we
decide to send writes to standbys in parallel to writing on local disk).

The ideal thing would be to have an id that is incremented on every failure,
and is stored in the WAL. Whenever a standby connects to the primary, it
should send the point p in WAL where streaming should start, plus the id. If
the id is the same at the primary at point p, things are good. Else, we
should tell the standby to either create a new copy from scratch, or delete
some WALs.
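
A minimal sketch of that handshake, assuming integer WAL positions and
that the primary remembers at which position each id came into force.
All names here (Primary, record_failover, id_at, handshake) are
hypothetical and only meant to show the check itself.

```python
from bisect import bisect_right


class Primary:
    """Toy primary that remembers which id was in force at each WAL position."""

    def __init__(self):
        # (start_lsn, id): the id that applies to WAL at or after start_lsn.
        # Entries are appended in increasing start_lsn order.
        self.history = [(0, 1)]

    def record_failover(self, switch_lsn):
        # Each failure/failover starts a new id at the switch position.
        last_id = self.history[-1][1]
        self.history.append((switch_lsn, last_id + 1))

    def id_at(self, lsn):
        # Find the last history entry whose start_lsn is <= lsn.
        starts = [start for start, _ in self.history]
        return self.history[bisect_right(starts, lsn) - 1][1]

    def handshake(self, start_lsn, standby_id):
        # The standby sends the point p where streaming should start, plus its id.
        if self.id_at(start_lsn) == standby_id:
            return "stream"
        return "resync"  # new base copy, or delete some WAL, as discussed
```

For example, after a failover recorded at position 1000, a standby
asking to start at 500 with id 1 would be allowed to stream, while one
asking to start at 1200 with id 1 would be told to resync.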

@David
 One way to get them in sync without starting from scratch is to use
 rsync from A to B.  This works in the asynchronous case, too. :)

The problem mainly is detecting when one can rsync/stream and when not.

Regards




Re: [HACKERS] Timeline in the light of Synchronous replication

2010-10-17 Thread David Fetter
On Wed, Oct 13, 2010 at 04:23:57PM -0700, fazool mein wrote:
 Hello guys,
 
 The concept of time line makes sense to me in the case of asynchronous
 replication. But in case of synchronous replication, I am not so sure.
 
 When a standby connects to the primary, it checks if both have the same time
 line. If not, it doesn't start.
 
 Now, consider the following scenario. The primary (call it A) fails, the
 standby (call it B), via a trigger file, comes out of recovery mode
 (increments time line id to, say, 2), and morphs into a primary. Now, let's say
 we start the old primary A as a standby, to connect to the new primary B
 (which previously was a standby). As the code is at the moment, the old
 primary A will not be allowed to connect to the new primary B because A's
 timelineid (1) is not equal to that of the new primary B (2). Hence, we
 need to create a backup again, and set up the standby from scratch.

Yes.

 In the above scenario, if the system was using asynchronous
 replication, time lines would have saved us from doing something
 wrong. But, if we are using synchronous replication, we know that
 both A and B would have been in sync before the failure. In this
 case, being forced to create a new standby from scratch because of time
 lines might not be very desirable if the database is huge.

One way to get them in sync without starting from scratch is to use
rsync from A to B.  This works in the asynchronous case, too. :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


[HACKERS] Timeline in the light of Synchronous replication

2010-10-16 Thread fazool mein
Hello guys,

The concept of time line makes sense to me in the case of asynchronous
replication. But in case of synchronous replication, I am not so sure.

When a standby connects to the primary, it checks if both have the same time
line. If not, it doesn't start.

Now, consider the following scenario. The primary (call it A) fails, the
standby (call it B), via a trigger file, comes out of recovery mode
(increments time line id to, say, 2), and morphs into a primary. Now, let's say
we start the old primary A as a standby, to connect to the new primary B
(which previously was a standby). As the code is at the moment, the old
primary A will not be allowed to connect to the new primary B because A's
timelineid (1) is not equal to that of the new primary B (2). Hence, we
need to create a backup again, and set up the standby from scratch.

In the above scenario, if the system was using asynchronous replication,
time lines would have saved us from doing something wrong. But, if we are
using synchronous replication, we know that both A and B would have been in
sync before the failure. In this case, being forced to create a new standby from
scratch because of time lines might not be very desirable if the database is
huge.

Your comments on the above will be appreciated.

Regards