Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Jeff Janes
On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund <and...@2ndquadrant.com> wrote:

 On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
  But if the transaction would not have otherwise generated WAL (i.e. a
  select that did not have to do any HOT pruning, or an update with zero
 rows
  matching the where condition), doesn't it now have to flush and wait when
  it would otherwise not?

 We short circuit that if there's no xid assigned. Check
 RecordTransactionCommit().


It looks like that only short-circuits the flush if there is both no xid
assigned and !wrote_xlog.  (line 1054 of xact.c)

I do see stalls on fdatasync on flush from select statements which had no
xid, but did generate xlog due to HOT pruning; I don't see why WAL logging
hint bits would be different.

Cheers,

Jeff


Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Andres Freund
On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
 On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund <and...@2ndquadrant.com> wrote:
 
  On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
   But if the transaction would not have otherwise generated WAL (i.e. a
   select that did not have to do any HOT pruning, or an update with zero
  rows
   matching the where condition), doesn't it now have to flush and wait when
   it would otherwise not?
 
  We short circuit that if there's no xid assigned. Check
  RecordTransactionCommit().
 
 
 It looks like that only short-circuits the flush if there is both no xid
 assigned and !wrote_xlog.  (line 1054 of xact.c)

Hm. Indeed. Why don't we just always use the async commit behaviour for
that? I don't really see any significant dangers from doing so?

It's also rather odd to use the sync rep mechanisms in such
scenarios... The if() really should test markXidCommitted instead of
wrote_xlog.
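
For reference, the relevant logic in RecordTransactionCommit() looks
roughly like this (paraphrased from 9.3-era xact.c, not the exact source):

    if (!markXidCommitted)
    {
        /* no xid assigned: bail out early only if no WAL was written */
        if (!wrote_xlog)
            goto cleanup;
    }
    ...
    if ((wrote_xlog && synchronous_commit > SYNCHRONOUS_COMMIT_OFF) ||
        forceSyncCommit || nrels > 0)
        XLogFlush(XactLastRecEnd);           /* xid-less pruning pays this */
    else
        XLogSetAsyncXactLSN(XactLastRecEnd); /* async commit path */

Testing markXidCommitted there would send xid-less transactions down the
async path.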

 I do see stalls on fdatasync on flush from select statements which had no
 xid, but did generate xlog due to HOT pruning; I don't see why WAL logging
 hint bits would be different.

Are the stalls at commit or while the select is running? If wal_buffers
is filled too fast, which can easily happen if loads of pages are hinted
and wal logged, that will happen independently from
RecordTransactionCommit().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Jeff Janes
On Thu, Jan 16, 2014 at 9:37 AM, Andres Freund <and...@2ndquadrant.com> wrote:

 On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
  On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 
   On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
But if the transaction would not have otherwise generated WAL (i.e. a
select that did not have to do any HOT pruning, or an update with
 zero
   rows
matching the where condition), doesn't it now have to flush and wait
 when
it would otherwise not?
  
   We short circuit that if there's no xid assigned. Check
   RecordTransactionCommit().
  
 
  It looks like that only short-circuits the flush if there is both no xid
  assigned and !wrote_xlog.  (line 1054 of xact.c)

 Hm. Indeed. Why don't we just always use the async commit behaviour for
 that? I don't really see any significant dangers from doing so?


I think the argument is that drawing the next value from a sequence can
generate xlog that needs to be flushed, but doesn't assign an xid.

I would think the sequence should flush that record before it hands out the
value, not before the commit, but...
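
A minimal illustration (whether a given nextval() call actually emits a
WAL record depends on how many values are still covered by the last
sequence WAL record):

CREATE SEQUENCE s;

BEGIN;
SELECT nextval('s');  -- may write a sequence WAL record, assigns no xid
COMMIT;               -- wrote_xlog is true, so the commit still flushes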





 It's also rather odd to use the sync rep mechanisms in such
 scenarios... The if() really should test markXidCommitted instead of
 wrote_xlog.

  I do see stalls on fdatasync on flush from select statements which had
  no xid, but did generate xlog due to HOT pruning; I don't see why WAL
  logging hint bits would be different.

 Are the stalls at commit or while the select is running? If wal_buffers
 is filled too fast, which can easily happen if loads of pages are hinted
 and wal logged, that will happen independently from
 RecordTransactionCommit().


In the real world, I'm not sure what the distribution is.

But in my present test case, they are coming almost exclusively from
RecordTransactionCommit.

I use pgbench -T10 in a loop to generate dirty data and checkpoints (with
synchronous_commit on but with a BBU), and then to probe the consequences I
use:

pgbench -T10 -S -n --startup='set synchronous_commit='$f

(where --startup is an extension to pgbench proposed a few months ago)
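
For reference, roughly the same probe can be run without the patched
pgbench by setting the GUC per session through PGOPTIONS:

for f in on off; do
    PGOPTIONS="-c synchronous_commit=$f" pgbench -T10 -S -n
done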

Running the select-only query with synchronous_commit off almost completely
isolates it from the checkpoint drama that otherwise has a massive effect
on it.  With synchronous_commit=on, it goes from 6000 tps normally to 30
tps during the checkpoint sync; with synchronous_commit=off it might dip to
4000 or so during the worst of it.

(To be clear, this is about the pruning, not the logging of the hint bits)

Cheers,

Jeff


Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Andres Freund
On 2014-01-16 11:01:29 -0800, Jeff Janes wrote:
 I think the argument is that drawing the next value from a sequence can
 generate xlog that needs to be flushed, but doesn't assign an xid.

Then that should assign an xid. Which would yield correct behaviour with
async commit where it's currently *not* causing a WAL flush at all
unless a page boundary is crossed.

I've tried arguing that way before...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2014-01-16 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 I think the argument is that drawing the next value from a sequence can
 generate xlog that needs to be flushed, but doesn't assign an xid.

 I would think the sequence should flush that record before it hands out the
 value, not before the commit, but...

IIRC the argument was that we'd flush WAL before any use of the value
could make it to disk.  Which is true if you're just inserting it into
a table; perhaps less so if the client is doing something external to
the database with it.  (But it'd be reasonable to say that clients
who want a guaranteed-good serial for such purposes should have to
commit the transaction that created the value.)
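
Concretely (a sketch; s and t are illustrative):

-- Covered case: the value can only reach disk via the INSERT, and the
-- sequence-advance WAL is flushed before that table page can be written.
INSERT INTO t VALUES (nextval('s'));

-- Less covered: the client takes the value out of the database before
-- any flush; it's only guaranteed durable once the transaction commits.
SELECT nextval('s');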

regards, tom lane




Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-22 Thread Bruce Momjian
On Thu, Nov 21, 2013 at 11:43:34PM +0100, Andres Freund wrote:
 On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
  But if the transaction would not have otherwise generated WAL (i.e. a
  select that did not have to do any HOT pruning, or an update with zero rows
  matching the where condition), doesn't it now have to flush and wait when
  it would otherwise not?
 
 We short circuit that if there's no xid assigned. Check
 RecordTransactionCommit().

OK, that was my question, now answered.  Thanks.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-21 Thread Jeff Janes
On Thu, Nov 21, 2013 at 12:31 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

 Bruce Momjian wrote:
  On Thu, Oct 24, 2013 at 11:14:14PM +0300, Heikki Linnakangas wrote:
   On 24.10.2013 23:07, Josh Berkus wrote:

   What kind of overhead are we talking about here?
  
   One extra WAL record whenever a hint bit is set on a page, for the
   first time after a checkpoint. In other words, a WAL record needs to
   be written in the same circumstances as with page checksums, but the
   WAL records are much smaller as they don't need to contain a full
   page image, just the block number of the changed block.
  
   Or maybe we'll write the full page image after all, like with page
   checksums, just without calculating the checksums. It might be
   tricky to skip the full-page image, because then a subsequent change
   of the page (which isn't just a hint-bit update) needs to somehow
   know it needs to take a full page image even though a WAL record for
   it was already written.
 
  Sorry to be replying late to this, but while I am not worried about the
  additional WAL volume, does this change require the transaction to now
  wait for a WAL sync to disk before continuing?

 I don't think so.  There's extra WAL written, but there's no
 flush-and-wait until end of transaction (as has always been).



But if the transaction would not have otherwise generated WAL (i.e. a
select that did not have to do any HOT pruning, or an update with zero rows
matching the where condition), doesn't it now have to flush and wait when
it would otherwise not?

Cheers,

Jeff


Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-21 Thread Andres Freund
On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
 But if the transaction would not have otherwise generated WAL (i.e. a
 select that did not have to do any HOT pruning, or an update with zero rows
 matching the where condition), doesn't it now have to flush and wait when
 it would otherwise not?

We short circuit that if there's no xid assigned. Check
RecordTransactionCommit().

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-21 Thread Alvaro Herrera
Bruce Momjian escribió:
 On Thu, Oct 24, 2013 at 11:14:14PM +0300, Heikki Linnakangas wrote:
  On 24.10.2013 23:07, Josh Berkus wrote:

  What kind of overhead are we talking about here?
  
  One extra WAL record whenever a hint bit is set on a page, for the
  first time after a checkpoint. In other words, a WAL record needs to
  be written in the same circumstances as with page checksums, but the
  WAL records are much smaller as they don't need to contain a full
  page image, just the block number of the changed block.
  
  Or maybe we'll write the full page image after all, like with page
  checksums, just without calculating the checksums. It might be
  tricky to skip the full-page image, because then a subsequent change
  of the page (which isn't just a hint-bit update) needs to somehow
  know it needs to take a full page image even though a WAL record for
  it was already written.
 
 Sorry to be replying late to this, but while I am not worried about the
 additional WAL volume, does this change require the transaction to now
 wait for a WAL sync to disk before continuing?

I don't think so.  There's extra WAL written, but there's no
flush-and-wait until end of transaction (as has always been).

 I thought that was the down-side to WAL logging hint bits, not the WAL
 volume itself.

I don't think this is true either.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-12 Thread Bruce Momjian
On Thu, Oct 24, 2013 at 11:14:14PM +0300, Heikki Linnakangas wrote:
 On 24.10.2013 23:07, Josh Berkus wrote:
 On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
 On 24.10.2013 20:39, Josh Berkus wrote:
 On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
 If we do what you are suggesting, it seems like a single line patch
 to me.
 In XLogSaveBufferForHint(), we probably need to look at this
 additional GUC
 to decide whether or not to backup the block.
 
 Wait, what?  Why are we having an additional GUC?
 
 I'm opposed to the idea of having a GUC to enable failback.  When would
 anyone using replication ever want to disable that?
 
 For example, if you're not replicating for high availability purposes,
 but to keep a reporting standby up-to-date.
 
 What kind of overhead are we talking about here?  You probably said, but
 I've had a mail client meltdown and lost a lot of my -hackers emails.
 
 One extra WAL record whenever a hint bit is set on a page, for the
 first time after a checkpoint. In other words, a WAL record needs to
 be written in the same circumstances as with page checksums, but the
 WAL records are much smaller as they don't need to contain a full
 page image, just the block number of the changed block.
 
 Or maybe we'll write the full page image after all, like with page
 checksums, just without calculating the checksums. It might be
 tricky to skip the full-page image, because then a subsequent change
 of the page (which isn't just a hint-bit update) needs to somehow
 know it needs to take a full page image even though a WAL record for
 it was already written.

Sorry to be replying late to this, but while I am not worried about the
additional WAL volume, does this change require the transaction to now
wait for a WAL sync to disk before continuing?  I thought that was the
down-side to WAL logging hint bits, not the WAL volume itself.

-- 
  Bruce Momjian  br...@momjian.us    http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Patch for fail-back without fresh backup

2013-11-02 Thread Sawada Masahiko
On Fri, Oct 25, 2013 at 8:08 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-24 13:51:52 -0700, Josh Berkus wrote:

 It entirely depends on your workload. If it happens to be something
 like:
 INSERT INTO table (lots_of_data);
 CHECKPOINT;
 SELECT * FROM TABLE;

 i.e. there's a checkpoint between loading the data and reading it - not
 exactly all that uncommon - we'll need to log something for every
 page. That can be rather noticeable. Especially as I think it will be
 rather hard to log anything but a real FPI.

 I really don't think everyone will want this. I am absolutely not
 against providing an option to log enough information to make pg_rewind
 work, but I think providing a command to do *safe* *planned* failover
 will help in many more cases.


I think it is better to provide an option to log enough information,
such as a new wal_level. But if the user doesn't realize that until it's
too late, could such information be carried in the checkpoint record?
For example, if the checkpoint record contained the wal_level, we could
use it to inform the user.

BTW, is this information useful only for pg_rewind, or for anything
else? (Sorry if this has already been discussed.)
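
(For reference, a wal_level change is already WAL-logged via an
XLOG_PARAMETER_CHANGE record; from memory of the 9.3-era headers, its
shape is roughly:

typedef struct xl_parameter_change
{
    int     MaxConnections;
    int     max_prepared_xacts;
    int     max_locks_per_xact;
    int     wal_level;
} xl_parameter_change;

so a similar mechanism could carry the new setting.)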

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-25 Thread Michael Paquier
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander mag...@hagander.net wrote:
 In fact I've been considering suggesting we might want to retire the
 difference between archive and hot_standby as wal_level, because the
 difference is usually so small. And the advantage of hot_standby is in
 almost every case worth it. Even in the archive recovery mode, being
 able to do pause_at_recovery_target is extremely useful. And as you
 say in (c) above, many users don't realize that until it's too late.
+1 on removing archive from wal_level. Having both archive and
hot_standby for wal_level is confusing, and if I recall correctly
hot_standby and archive have been kept as possible settings only to
protect people from bugs that the newly-introduced hot_standby could
introduce due to the few WAL records it adds. But it has been a couple
of releases since there have been no such bugs, no?
-- 
Michael




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-25 Thread Andres Freund
On 2013-10-24 13:51:52 -0700, Josh Berkus wrote:
 On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
  One extra WAL record whenever a hint bit is set on a page, for the first
  time after a checkpoint. In other words, a WAL record needs to be
  written in the same circumstances as with page checksums, but the WAL
  records are much smaller as they don't need to contain a full page
  image, just the block number of the changed block.
  
  Or maybe we'll write the full page image after all, like with page
  checksums, just without calculating the checksums. It might be tricky to
  skip the full-page image, because then a subsequent change of the page
  (which isn't just a hint-bit update) needs to somehow know it needs to
  take a full page image even though a WAL record for it was already written.
 
 I think it would be worth estimating what this actually looks like in
 terms of log write quantity.  My inclication is to say that if it
 increases log writes less than 10%, we don't need to provide an option
 to turn it off.

It entirely depends on your workload. If it happens to be something
like:
INSERT INTO table (lots_of_data);
CHECKPOINT;
SELECT * FROM TABLE;

i.e. there's a checkpoint between loading the data and reading it - not
exactly all that uncommon - we'll need to log something for every
page. That can be rather noticeable. Especially as I think it will be
rather hard to log anything but a real FPI.

I really don't think everyone will want this. I am absolutely not
against providing an option to log enough information to make pg_rewind
work, but I think providing a command to do *safe* *planned* failover
will help in many more cases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-25 Thread Andres Freund
On 2013-10-24 22:57:29 +0200, Magnus Hagander wrote:
 In fact I've been considering suggesting we might want to retire the
 difference between archive and hot_standby as wal_level, because the
 difference is usually so small. And the advantage of hot_standby is in
 almost every case worth it. Even in the archive recovery mode, being
 able to do pause_at_recovery_target is extremely useful. And as you
 say in (c) above, many users don't realize that until it's too late.

+1.

On 2013-10-25 15:16:30 +0900, Michael Paquier wrote:
 But it has been a couple of releases since there have been no such
 bugs, no?

One 'no' too much? Anyway, I think there have been more recent ones, but
it's infrequent enough that we can remove the level anyway.

FWIW, I've wondered if we shouldn't remove most of the EnableHotStandby
checks in xlog.c. There are way too many differences in how StartupXLOG
behaves depending on HS.
E.g. I quite dislike that we do stuff like StartupCLOG at entirely
different times during recovery.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Mon, Oct 21, 2013 at 7:10 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:



 I agree with you.
 If writing FPWs is not a large performance degradation, one idea is to
 write FPWs with the same timing as when checksums are enabled.
 E.g., if we support a new wal_level, the system writes an FPW when a
 simple SELECT updates hint bits, but the checksum function stays
 disabled. Thoughts?


I wonder if it's too much for this purpose. In fact, we just need a way to
know that a block could have been written on the master which the standby
never saw. So even WAL logging just the block id should be good enough for
pg_rewind to be able to detect and later copy that block from the new
master. Having said that, I don't know if there is general advantage of WAL
logging the exact hint bit update operation for other reasons.

Another difference AFAICS is that checksum feature needs the block to be
backed up only after the first time a hint bit is updated after checkpoint.
But for something like pg_rewind to work, we will need to WAL log every
hint bit update on a page. So we would want to keep it as short as possible.
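
Something as small as this record body would suffice (purely
illustrative, not from any patch):

typedef struct xl_hint_block
{
    RelFileNode node;       /* relation the hinted page belongs to */
    ForkNumber  forknum;    /* fork containing the page */
    BlockNumber block;      /* block number of the hinted page */
} xl_hint_block;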

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 13:02, Pavan Deolasee wrote:

Another difference AFAICS is that checksum feature needs the block to be
backed up only after the first time a hint bit is updated after checkpoint.
But for something like pg_rewind to work, we will need to WAL log every
hint bit update on a page. So we would want to keep it as short as possible.


To fix that, pg_rewind could always start the rewinding process from the 
last checkpoint before the point that the histories diverge, instead of 
the exact point of divergence. That would make the rewinding more 
expensive as it needs to read through a lot more WAL, but I think it 
would still be OK.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:



 To fix that, pg_rewind could always start the rewinding process from the
 last checkpoint before the point that the histories diverge, instead of the
 exact point of divergence.


Is that something required even if someone plans to use pg_rewind for a
cluster with checksums enabled? I mean, since only the first update after
a checkpoint is WAL logged, pg_rewind will break if another update happens
after the standby forks. Or would the recovery logic apply the first WAL
without looking at the page lsn? (Sorry, maybe I should read the code
instead of asking you)

If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to backup the block.
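
Roughly like this (a sketch only; the GUC name is illustrative, not an
actual setting):

XLogRecPtr
XLogSaveBufferForHint(Buffer buffer)
{
    if (!DataChecksumsEnabled() && !wal_log_hint_bits)
        return InvalidXLogRecPtr;   /* feature off: emit no WAL */

    /* ... existing logic to back up / WAL-log the hinted block ... */
}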

That would make the rewinding more expensive as it needs to read through a
 lot more WAL, but I think it would still be OK.


Yeah, probably you are right. Though the amount of additional work could be
significantly higher and some testing might be warranted.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 4:45 PM, Pavan Deolasee <pavan.deola...@gmail.com> wrote:

 . Or would the recovery logic apply first WAL without looking at the page
 lsn ? (Sorry, may be I should read the code instead of asking you)


Never mind. I realized it has to. That's the whole purpose of backing it up
in the first place.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 14:15, Pavan Deolasee wrote:

On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas <hlinnakan...@vmware.com> wrote:



To fix that, pg_rewind could always start the rewinding process from the
last checkpoint before the point that the histories diverge, instead of the
exact point of divergence.


Is that something required even if someone plans to use pg_rewind for a
cluster with checksums enabled? I mean, since only the first update after
a checkpoint is WAL logged, pg_rewind will break if another update happens
after the standby forks.


Yes. It's broken as it is, even when checksums are enabled - good catch. 
I'll go change it to read all the WAL in the target starting from the 
last checkpoint before the point of divergence.



Or would the recovery logic apply the first WAL without
looking at the page lsn? (Sorry, maybe I should read the code instead of
asking you)


WAL recovery does apply all the full-page images without looking at the 
page LSN, but that doesn't help in this case. pg_rewind copies over the 
blocks from the source server (= promoted standby) that were changed in 
the target server (= old master), after the standby's history diverged 
from it. In other words, it reverts the blocks that were changed in the 
old master, by copying them over from the promoted standby. After that, 
WAL recovery is performed, using the WAL from the promoted standby, to 
apply all the changes from the promoted standby that were not present in 
the old master. But it never replays any WAL from the old master. It 
reads it through, to construct the list of blocks that were modified, 
but it doesn't apply them.



If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to backup the block.


Yeah, it's trivial to add such a guc. Will just have to figure out what 
we want the user interface to be like; should it be a separate guc, or 
somehow cram it into wal_level?


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Pavan Deolasee
On Thu, Oct 24, 2013 at 5:45 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:



  Will just have to figure out what we want the user interface to be like;
 should it be a separate guc, or somehow cram it into wal_level?


Yeah, I had brought up a similar idea upthread. Right now wal_level is
nicely ordered. But with this additional logic, I am not sure if we would
need multiple new levels and also break that ordering (I don't know if it's
important). For example, one may want to set up streaming replication
with/without this feature or hot standby with/without the feature. I don't
have a good idea about how to capture them in wal_level. Maybe something
like: minimal, archive, archive_with_this_new_feature, hot_standby and
hot_standby_with_this_new_feature.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Alvaro Herrera
Pavan Deolasee wrote:

 Yeah, I had brought up a similar idea upthread. Right now wal_level is
 nicely ordered. But with this additional logic, I am not sure if we would
 need multiple new levels and also break that ordering (I don't know if it's
 important). For example, one may want to set up streaming replication
 with/without this feature or hot standby with/without the feature. I don't
 have a good idea about how to capture them in wal_level. Maybe something
 like: minimal, archive, archive_with_this_new_feature, hot_standby and
 hot_standby_with_this_new_feature.

That's confusing.  A separate GUC sounds better.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
 If we do what you are suggesting, it seems like a single line patch to me.
 In XLogSaveBufferForHint(), we probably need to look at this additional GUC
 to decide whether or not to backup the block.

Wait, what?  Why are we having an additional GUC?

I'm opposed to the idea of having a GUC to enable failback.  When would
anyone using replication ever want to disable that?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 20:39, Josh Berkus wrote:

On 10/24/2013 04:15 AM, Pavan Deolasee wrote:

If we do what you are suggesting, it seems like a single line patch to me.
In XLogSaveBufferForHint(), we probably need to look at this additional GUC
to decide whether or not to backup the block.


Wait, what?  Why are we having an additional GUC?

I'm opposed to the idea of having a GUC to enable failback.  When would
anyone using replication ever want to disable that?


For example, if you're not replicating for high availability purposes, 
but to keep a reporting standby up-to-date.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
 On 24.10.2013 20:39, Josh Berkus wrote:
 On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
 If we do what you are suggesting, it seems like a single line patch
 to me.
 In XLogSaveBufferForHint(), we probably need to look at this
 additional GUC
 to decide whether or not to backup the block.

 Wait, what?  Why are we having an additional GUC?

 I'm opposed to the idea of having a GUC to enable failback.  When would
 anyone using replication ever want to disable that?
 
 For example, if you're not replicating for high availability purposes,
 but to keep a reporting standby up-to-date.

What kind of overhead are we talking about here?  You probably said, but
I've had a mail client meltdown and lost a lot of my -hackers emails.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Heikki Linnakangas

On 24.10.2013 23:07, Josh Berkus wrote:

On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:

On 24.10.2013 20:39, Josh Berkus wrote:

On 10/24/2013 04:15 AM, Pavan Deolasee wrote:

If we do what you are suggesting, it seems like a single line patch
to me.
In XLogSaveBufferForHint(), we probably need to look at this
additional GUC
to decide whether or not to backup the block.


Wait, what?  Why are we having an additional GUC?

I'm opposed to the idea of having a GUC to enable failback.  When would
anyone using replication ever want to disable that?


For example, if you're not replicating for high availability purposes,
but to keep a reporting standby up-to-date.


What kind of overhead are we talking about here?  You probably said, but
I've had a mail client meltdown and lost a lot of my -hackers emails.


One extra WAL record whenever a hint bit is set on a page, for the first 
time after a checkpoint. In other words, a WAL record needs to be 
written in the same circumstances as with page checksums, but the WAL 
records are much smaller as they don't need to contain a full page 
image, just the block number of the changed block.


Or maybe we'll write the full page image after all, like with page 
checksums, just without calculating the checksums. It might be tricky to 
skip the full-page image, because then a subsequent change of the page 
(which isn't just a hint-bit update) needs to somehow know it needs to 
take a full page image even though a WAL record for it was already written.


- Heikki




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Josh Berkus
On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
 One extra WAL record whenever a hint bit is set on a page, for the first
 time after a checkpoint. In other words, a WAL record needs to be
 written in the same circumstances as with page checksums, but the WAL
 records are much smaller as they don't need to contain a full page
 image, just the block number of the changed block.
 
 Or maybe we'll write the full page image after all, like with page
 checksums, just without calculating the checksums. It might be tricky to
 skip the full-page image, because then a subsequent change of the page
 (which isn't just a hint-bit update) needs to somehow know it needs to
 take a full page image even though a WAL record for it was already written.

I think it would be worth estimating what this actually looks like in
terms of log write quantity.  My inclination is to say that if it
increases log writes less than 10%, we don't need to provide an option
to turn it off.

The reasons I don't want to provide a disabling GUC are:
a) more GUCs
b) confusing users
c) causing users to disable rewind *until they need it*, at which point
it's too late to enable it.

So if there's any way we can avoid having a GUC for this, I'm for it.
And if we do have a GUC, failback should be enabled by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Magnus Hagander
On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
 One extra WAL record whenever a hint bit is set on a page, for the first
 time after a checkpoint. In other words, a WAL record needs to be
 written in the same circumstances as with page checksums, but the WAL
 records are much smaller as they don't need to contain a full page
 image, just the block number of the changed block.

 Or maybe we'll write the full page image after all, like with page
 checksums, just without calculating the checksums. It might be tricky to
 skip the full-page image, because then a subsequent change of the page
 (which isn't just a hint-bit update) needs to somehow know it needs to
 take a full page image even though a WAL record for it was already written.

 I think it would be worth estimating what this actually looks like in
 terms of log write quantity.  My inclination is to say that if it
 increases log writes less than 10%, we don't need to provide an option
 to turn it off.

 The reasons I don't want to provide a disabling GUC are:
 a) more GUCs
 b) confusing users
 c) causing users to disable rewind *until they need it*, at which point
 it's too late to enable it.

 So if there's any way we can avoid having a GUC for this, I'm for it.
 And if we do have a GUC, failback should be enabled by default.

+1 on the principle.

In fact I've been considering suggesting we might want to retire the
difference between archive and hot_standby as wal_level, because the
difference is usually so small. And the advantage of hot_standby is in
almost every case worth it. Even in the archive recovery mode, being
able to do pause_at_recovery_target is extremely useful. And as you
say in (c) above, many users don't realize that until it's too late.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-24 Thread Sawada Masahiko
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus j...@agliodbs.com wrote:
 On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:

 I think it would be worth estimating what this actually looks like in
  terms of log write quantity.  My inclination is to say that if it
 increases log writes less than 10%, we don't need to provide an option
 to turn it off.

 The reasons I don't want to provide a disabling GUC are:
 a) more GUCs
 b) confusing users
 c) causing users to disable rewind *until they need it*, at which point
 it's too late to enable it.

 So if there's any way we can avoid having a GUC for this, I'm for it.
 And if we do have a GUC, failback should be enabled by default.

 +1 on the principle.

 In fact I've been considering suggesting we might want to retire the
 difference between archive and hot_standby as wal_level, because the
 difference is usually so small. And the advantage of hot_standby is in
 almost every case worth it. Even in the archive recovery mode, being
 able to do pause_at_recovery_target is extremely useful. And as you
 say in (c) above, many users don't realize that until it's too late.


+1.

Many users would not realize it until it's too late if we provide this
as an additional GUC.
And I agree with writing a WAL record that includes the block number of
the changed block; I don't think writing that record leads to a huge
overhead increase.
Are those WAL records replicated to the standby server synchronously
(when sync replication is configured, of course)?
I am concerned that this leads to performance overhead for things like
executing SELECT or autovacuum, especially when the two servers are far
apart.


Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-21 Thread Sawada Masahiko
On Thu, Oct 10, 2013 at 1:41 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:


 Not that I can find any flaw in the OP's patch, but given the major
 objections and my own nervousness about documenting this new failback safe
 standby mode, I am also inclined to improve pg_rewind or whatever it takes
 to get it working. Clearly at first we need to have an optional mechanism to
 WAL log hint bit updates. There seem to be two ways to do that:

 a. Add a new GUC which can be turned on/off and requires a server restart
 to take effect
 b. Add another option for wal_level setting.

 (b) looks better, but I am not sure if we want to support this new level
 with and without hot standby. If latter, we will need multiple new levels to
 differentiate all those cases. I am OK with supporting it only with hot
 standby which is probably what most people do with streaming replication
 anyway.

 The other issue is to how to optimally WAL log hint bit updates:

 a. Should we have separate WAL records just for the purpose or should we
 piggyback them on heap update/delete/prune etc WAL records ? Of course,
 there will be occasions when a simple SELECT also updates hint bits, so most
 likely we will need a separate WAL record anyhow.
 b. Does it make sense to try to set all hint bits in a page if we are WAL
 logging it anyway? I think we have discussed this idea even before, just to
 minimize the number of writes a heap page receives when hint bits of
 different tuples are set at different times, each update triggering a fresh
 write. I don't remember what the consensus was, but it might be
 worthwhile to reconsider that option if we are WAL logging the hint bit
 updates.

I agree with you.
If writing FPWs is not a large performance degradation, one idea is to
write FPWs with the same timing as when checksums are enabled.
E.g., if we support a new wal_level, the system writes an FPW when a
simple SELECT updates hint bits, but the checksum function stays
disabled. Thoughts?


 We will definitely need some amount of performance benchmarking even if this
 is optional. But are there other things to worry about? Any strong
 objections to this idea, or any other show stopper for pg_rewind itself?



-- 
Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-09 Thread Sawada Masahiko
On Tue, Oct 8, 2013 at 6:37 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:

 On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:

 On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 
  I attached the v12 patch, which I have modified based on the above suggestions.


 There are still some parts of this design/patch which I am concerned about.

 1. The design clubs synchronous standby and failback safe standby rather
 very tightly. IIRC this is based on the feedback you received early, so my
 apologies for raising it again so late.
   a. GUC synchronous_standby_names is used to name synchronous as well as
 failback safe standbys. I don't know if that will confuse users.

With the current patch, the user specifies both the failback safe standby
and the sync replication standby in synchronous_standby_names.
So I was thinking that it will not confuse users.

  b. synchronous_commit's value will also control whether a sync/async
 failback safe standby wait for remote write or flush. Is that reasonable ?
 Or should there be a different way to configure the failback safe standby's
 WAL safety ?

synchronous_commit's value cannot control the wait level of a sync/async
failback safe standby.
On data page flush, the failback safe standby waits only for flush.
Should we also allow waiting for remote write?


 2. With the current design/implementation, user can't configure a
 synchronous and an async failback safe standby at the same time. I think we
 discussed this earlier and there was an agreement on the limitation. Just
 wanted to get that confirmed again.


Yes, the user can't configure a sync standby and an async failback safe
standby at the same time.
The current patch supports the following cases (a hypothetical
configuration sketch follows):
- a sync standby that is also the failback safe standby
- an async standby that is also the failback safe standby
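
For example (a hypothetical postgresql.conf sketch; the exact syntax is
an assumption, not taken from the patch):

# sync standby that is also the failback safe standby:
synchronous_standby_names = 'standby1'
synchronous_transfer = all          # wait for WAL and data page flush

# async standby that is also the failback safe standby:
# synchronous_transfer = data_flush # wait for data page flush only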


 3. SyncRepReleaseWaiters() does not know whether its waking up backends
 waiting for sync rep or failback safe rep. Is that ok ? For example, I found
 that the elog() message announcing next takeover emitted by the function may
 look bad. Since changing synchronous_transfer requires server restart, we
 can teach SyncRepReleaseWaiters() to look at that parameter to figure out
 whether the standby is sync and/or failback safe standby.

I agree with you.
Are you referring to the following message?

if (announce_next_takeover)
{
    announce_next_takeover = false;
    ereport(LOG,
            (errmsg("standby \"%s\" is now the synchronous standby "
                    "with priority %u",
                    application_name,
                    MyWalSnd->sync_standby_priority)));
}


  4. The documentation still needs more work to clearly explain the use case.

Understood. We will work more on clearly explaining the use case.


 5. Have we done any sort of stress testing of the patch ? If there is a bug,
 the data corruption at the master can go unnoticed. So IMHO we need many
 crash recovery tests to ensure that the patch is functionally correct.


I have done several tests of the patch, and I have confirmed (using
pg_filedump) that the data page is not flushed to disk while the master
server has not yet received the reply from the standby server.
To ensure that the patch is functionally correct, what tests should we do?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-09 Thread Sawada Masahiko
On Tue, Oct 8, 2013 at 6:46 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote:
 On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko <sawada.m...@gmail.com> wrote:

  On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
  
   I attached the v12 patch, which I have modified based on the above suggestions.
 

 There are still some parts of this design/patch which I am concerned about.

 1. The design clubs synchronous standby and failback safe standby rather
 very tightly. IIRC this is based on the feedback you received early, so my
 apologies for raising it again so late.

 It is my impression that there still are several people having pretty
 fundamental doubts about this approach in general. From what I remember
 neither Heikki, Simon, Tom nor me were really convinced about this
 approach.


Thank you for the comment.
We think that this approach can solve the real problem, and we have
actually confirmed its effect: the master server flushes the data page
to disk only after it has received the reply from the standby server.
If you have technical concerns or doubts, could you tell me about them?

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-09 Thread Samrat Revagade
On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund <and...@2ndquadrant.com> wrote:

 On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote:
  On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:
 
   On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com
 wrote:
   
    I attached the v12 patch, which I have modified based on the above
  suggestions.
  
 
  There are still some parts of this design/patch which I am concerned
 about.
 
  1. The design clubs synchronous standby and failback safe standby rather
  very tightly. IIRC this is based on the feedback you received early, so
 my
  apologies for raising it again so late.

 It is my impression that there still are several people having pretty
 fundamental doubts about this approach in general. From what I remember
 neither Heikki, Simon, Tom nor me were really convinced about this
 approach.



Listing down all objections and their solutions:

Major objections to the proposal:

*Tom Lane*
# Additional complexity to the code will cause performance overhead - on
average it causes 0.5-1% performance overhead for a fast-transaction
workload, as the wait is mostly in the backend process. The latest
refactored code looks less complex.
# Use of rsync with checksums - but many pages on the two servers may
differ in their binary values because of hint bits.

*Heikki:*
# Use pg_rewind to do the same:
It has the well-known problem of hint bit updates.
If we use it we need to enable checksums or explicitly WAL-log hint
bits, which leads to performance overhead.

*Amit Kapila*
# How to take care of extra WAL on the old master during recovery?
We can solve this by deleting all WAL files on the old master before it
starts as the new standby.

*Simon Riggs*
# Renaming the patch - done
# Remove the extra set of parameters - done
# Performance drop - on average it causes 0.5-1% performance overhead
for a fast-transaction workload, as the wait is mostly in the backend
process.
# The way of configuring standbys - with the synchronous_transfer
parameter we can configure 4 types of standby servers depending on the
need.

*Fujii Masao*
# How the patch interacts with a cascaded standby - the patch works the
same as synchronous replication.
# CHECKPOINT on the standby got stuck infinitely - fixed.
# Complicated conditions in SyncRepWaitForLSN() - the code has been
refactored in v11.
# Improve source code comments - done.

*Pavan Deolasee*
# Interaction of synchronous_commit with synchronous_transfer - now
synchronous_commit only controls whether and how to wait for the
standby when a transaction commits; synchronous_transfer OTOH tells how
to interpret the standbys listed in the synchronous_standbys parameter.
# Further improvements in the documentation - we will do that.
# More stress testing - we will do that.

Any inputs on stress testing would help.


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-09 Thread Robert Haas
On Wed, Oct 9, 2013 at 4:54 AM, Samrat Revagade
revagade.sam...@gmail.com wrote:
 On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com
 wrote:
 On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote:
  On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko
  sawada.m...@gmail.comwrote:
 
   On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com
   wrote:
   
    I attached the v12 patch, which I have modified based on the above
    suggestions.
  
 
  There are still some parts of this design/patch which I am concerned
  about.
 
  1. The design clubs synchronous standby and failback safe standby rather
  very tightly. IIRC this is based on the feedback you received early, so
  my
  apologies for raising it again so late.

 It is my impression that there still are several people having pretty
 fundamental doubts about this approach in general. From what I remember
 neither Heikki, Simon, Tom nor me were really convinced about this
 approach.



 Listing down all objections and their solutions:

 Major objections to the proposal:

 *Tom Lane*
 # Additional complexity to the code will cause performance overhead - on
 average it causes 0.5-1% performance overhead for a fast-transaction
 workload, as the wait is mostly in the backend process. The latest
 refactored code looks less complex.
 # Use of rsync with checksums - but many pages on the two servers may
 differ in their binary values because of hint bits.

 *Heikki:*
 # Use pg_rewind to do the same:
 It has the well-known problem of hint bit updates.
 If we use it we need to enable checksums or explicitly WAL-log hint
 bits, which leads to performance overhead.

 *Amit Kapila*
 # How to take care of extra WAL on the old master during recovery?
 We can solve this by deleting all WAL files on the old master before it
 starts as the new standby.

 *Simon Riggs*
 # Renaming the patch - done
 # Remove the extra set of parameters - done
 # Performance drop - on average it causes 0.5-1% performance overhead
 for a fast-transaction workload, as the wait is mostly in the backend
 process.
 # The way of configuring standbys - with the synchronous_transfer
 parameter we can configure 4 types of standby servers depending on the
 need.

 *Fujii Masao*
 # How the patch interacts with a cascaded standby - the patch works the
 same as synchronous replication.
 # CHECKPOINT on the standby got stuck infinitely - fixed.
 # Complicated conditions in SyncRepWaitForLSN() - the code has been
 refactored in v11.
 # Improve source code comments - done.

 *Pavan Deolasee*
 # Interaction of synchronous_commit with synchronous_transfer - now
 synchronous_commit only controls whether and how to wait for the
 standby when a transaction commits; synchronous_transfer OTOH tells how
 to interpret the standbys listed in the synchronous_standbys parameter.
 # Further improvements in the documentation - we will do that.
 # More stress testing - we will do that.

 Any inputs on stress testing would help.

The point is that when there are at least four senior community
members expressing serious objections to a concept, three of whom are
committers, we shouldn't be considering committing it until at least
some of those people have withdrawn their objections.  Nearly all patch
submitters are in favor of their own patches; that does not entitle
them to have those patches committed, even if there is a committer who
agrees with them.  There needs to be a real consensus on the path
forward.  If that policy ever changes, I have my own list of things
that are on the cutting-room floor that I'll be happy to resurrect.

Personally, I don't have a strong opinion on this patch because I have
not followed it closely enough.  But if Tom, Heikki, Simon, and Andres
are all unconvinced that this is a good direction, then put me down
for a -1 vote as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-09 Thread Pavan Deolasee
On Tue, Oct 8, 2013 at 9:22 PM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:



 Yeah, I definitely think we should work on the pg_rewind approach instead
 of this patch. It's a lot more flexible. The performance hit of WAL-logging
 hint bit updates is the price you have to pay, but a lot of people were OK
 with that to get page checksums, so I think a lot of people would be OK with
 it for this purpose too. As long as it's optional, of course. And anyone
 using page checksums is already paying that price.


Not that I can find any flaw in the OP's patch, but given the major
objections and my own nervousness about documenting this new failback
safe standby mode, I am also inclined to improve pg_rewind or whatever it
takes to get it working. Clearly at first we need to have an optional
mechanism to WAL log hint bit updates. There seem to be two ways to do
that:

a. Add a new GUC which can be turned on/off and requires a server restart
to take effect
b. Add another option for wal_level setting.

(b) looks better, but I am not sure if we want to support this new level
with and without hot standby. If latter, we will need multiple new levels
to differentiate all those cases. I am OK with supporting it only with hot
standby which is probably what most people do with streaming replication
anyway.

The other issue is to how to optimally WAL log hint bit updates:

a. Should we have separate WAL records just for the purpose or should we
piggyback them on heap update/delete/prune etc WAL records ? Of course,
there will be occasions when a simple SELECT also updates hint bits, so
most likely we will need a separate WAL record anyhow.
b. Does it make sense to try to set all hint bits in a page if we are WAL
logging it anyway? I think we have discussed this idea even before, just
to minimize the number of writes a heap page receives when hint bits of
different tuples are set at different times, each update triggering a fresh
write. I don't remember what the consensus was, but it might be
worthwhile to reconsider that option if we are WAL logging the hint bit
updates.

We will definitely need some amount of performance benchmarking even if this
is optional. But are there other things to worry about? Any strong
objections to this idea, or any other show stopper for pg_rewind itself?

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-08 Thread Sawada Masahiko
On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote:

 You added several checks into SyncRepWaitForLSN() so that it can handle both
 synchronous_transfer=data_flush and =commit. This change made the source code
 of the function very complicated, I'm afraid. To simplify the source code,
 what about just adding a new wait-for-lsn function for data_flush instead of
 changing SyncRepWaitForLSN()? Obviously that new function and
 SyncRepWaitForLSN()
 have a common part. I think that it should be extracted as a separate
 function.

Thank you for reviewing and comment!

Yes, I agree with you.
I attached the v12 patch, which I have modified based on the above
suggestions.
- Added new functions SyncRepTransferWaitForLSN() and SyncRepWait().
  SyncRepTransferWaitForLSN() is called on data page flush; OTOH,
SyncRepWaitForLSN() is called on transaction commit.
  Both functions call SyncRepWait() after checking whether sync
commit/transfer is requested, so in practice the server waits in
SyncRepWait().
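
Roughly (a sketch of the shape only; signatures are assumptions, not
the actual patch):

void
SyncRepWaitForLSN(XLogRecPtr lsn)           /* on transaction commit */
{
    if (SyncRepRequested())
        SyncRepWait(lsn);                   /* common wait logic */
}

void
SyncRepTransferWaitForLSN(XLogRecPtr lsn)   /* on data page flush */
{
    if (SyncTransRequested())
        SyncRepWait(lsn);
}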


 + * Note that if sync transfer is requested, we can't regular
 maintenance until
 + * standbys to connect.
   */
 -if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
 +if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH &&
 +    !SyncTransRequested())

 Per discussion with Pavan, ISTM we don't need to avoid setting
 synchronous_commit
 to local even if synchronous_transfer is data_flush. But you did that here. 
 Why?

I made a mistake. I have removed it.


 When synchronous_transfer = data_flush, anti-wraparound vacuum can be blocked.
 Is this safe?


In the new version of the patch, when synchronous_transfer = data_flush/all
AND synchronous_standby_names is set, vacuum is blocked.
This behaviour of synchronous_transfer is similar to synchronous_commit.
Should we allow anti-wraparound vacuum even if
synchronous_transfer = data_flush/all?
If so, should it also be allowed to flush data pages while doing vacuum?

 +#synchronous_transfer = commit        # data page synchronization level
 +                                      # commit, data_flush or all

 This comment seems confusing. I think that this parameter specifies when to
 wait for replication.

 +typedef enum
 +{
 +    SYNCHRONOUS_TRANSFER_COMMIT,        /* no wait for flush data page */
 +    SYNCHRONOUS_TRANSFER_DATA_FLUSH,    /* wait for data page flush only,
 +                                         * no wait for WAL */
 +    SYNCHRONOUS_TRANSFER_ALL            /* wait for data page flush and WAL */
 +} SynchronousTransferLevel;

 These comments also seem confusing. For example, I think that the meaning of
 SYNCHRONOUS_TRANSFER_COMMIT is something like "wait for replication on
 transaction commit".

Those comments have been modified in the new patch.
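
For reference, one possible wording along the lines Fujii-san suggested
(illustrative; the actual v12 comments may differ):

    typedef enum
    {
        SYNCHRONOUS_TRANSFER_COMMIT,        /* wait for replication only on
                                             * transaction commit */
        SYNCHRONOUS_TRANSFER_DATA_FLUSH,    /* wait for replication on data
                                             * page flush, but not on commit */
        SYNCHRONOUS_TRANSFER_ALL            /* wait for replication on both
                                             * data page flush and commit */
    } SynchronousTransferLevel;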


 @@ -521,6 +531,13 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
   */
  XLogFlush(lsn);

 +    /*
 +     * If synchronous transfer is requested, wait for failback safe standby
 +     * to receive WAL up to lsn.
 +     */
 +    if (SyncTransRequested())
 +        SyncRepWaitForLSN(lsn, true, true);

 If smgr_redo() is called only during recovery, SyncRepWaitForLSN() doesn't 
 need
 to be called here.

Thank you for the info.
I have removed it from smgr_redo().

Regards,

---
Sawada Masahiko


synchronous_transfer_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-08 Thread Pavan Deolasee
On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote:

 On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 
 I attached the v12 patch, which I have modified based on the above suggestions.


There are still some parts of this design/patch which I am concerned about.

1. The design clubs synchronous standby and failback safe standby together
rather tightly. IIRC this is based on the feedback you received earlier, so my
apologies for raising it again so late.
 a. The GUC synchronous_standby_names is used to name synchronous as well as
failback safe standbys. I don't know if that will confuse users.
 b. synchronous_commit's value will also control whether a sync/async
failback safe standby waits for remote write or flush. Is that reasonable?
Or should there be a different way to configure the failback safe standby's
WAL safety?

2. With the current design/implementation, the user can't configure a
synchronous and an async failback safe standby at the same time. I think we
discussed this earlier and there was an agreement on the limitation. Just
wanted to get that confirmed again.

3. SyncRepReleaseWaiters() does not know whether it's waking up backends
waiting for sync rep or failback safe rep. Is that OK? For example, I
found that the elog() message announcing the next takeover emitted by the
function may look bad. Since changing synchronous_transfer requires a server
restart, we can teach SyncRepReleaseWaiters() to look at that parameter to
figure out whether the standby is a sync and/or failback safe standby.

4. The documentation still needs more work to clearly explain the use case.

5. Have we done any sort of stress testing of the patch? If there is a
bug, data corruption at the master can go unnoticed. So IMHO we need
many crash recovery tests to ensure that the patch is functionally correct.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-08 Thread Andres Freund
On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote:
 On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 
  On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
  
  I attached the v12 patch, which I have modified based on the above suggestions.
 
 
 There are still some parts of this design/patch which I am concerned about.
 
 1. The design clubs synchronous standby and failback safe standby together
 rather tightly. IIRC this is based on the feedback you received earlier, so my
 apologies for raising it again so late.

It is my impression that there are still several people with pretty
fundamental doubts about this approach in general. From what I remember,
neither Heikki, Simon, Tom nor I was really convinced by this
approach.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-08 Thread Pavan Deolasee
On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote:



 It is my impression that there are still several people with pretty
 fundamental doubts about this approach in general. From what I remember,
 neither Heikki, Simon, Tom nor I was really convinced by this
 approach.


IIRC you and Tom were particularly skeptical about the approach. But do you
see a technical flaw or a show stopper in it? Heikki has
written pg_rewind, which is really very cool. But it fails to handle
hint bit updates, which are not WAL-logged unless, of course, checksums are
turned on. We can have a GUC-controlled option to turn WAL logging on for
hint bit updates and then use pg_rewind for the purpose. But I did not see
any agreement on that either. The performance implications of WAL-logging
every hint bit update could be huge.

Simon has raised usability concerns that Sawada-san and Samrat have tried
to address by following his suggestions. I am not fully convinced, though,
that we have got that right. But then there has hardly been any feedback on
that aspect lately.

In general, from the discussion it seems that the patch is trying to solve
a real problem. Even though Tom and you feel that rsync is probably good
enough and more trustworthy than any other approach, my feeling is that
many, including Fujii-san, still disagree with that argument based on real
user feedback. So where do we go from here? I think it will really help
Sawada-san and Samrat if we can provide them with some solid feedback and
an approach to take.

Lately, I was wondering if we could do something else to track file system
updates without relying on WAL inspection, and then use pg_rewind to solve
this problem. Some sort of preload library mechanism is one such
possibility, but I haven't really thought this through entirely.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-08 Thread Heikki Linnakangas

On 08.10.2013 13:00, Pavan Deolasee wrote:

On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote:


It is my impression that there are still several people with pretty
fundamental doubts about this approach in general. From what I remember,
neither Heikki, Simon, Tom nor I was really convinced by this
approach.


IIRC you and Tom were particularly skeptical about the approach. But do you
see a technical flaw or a show stopper in it? Heikki has
written pg_rewind, which is really very cool. But it fails to handle
hint bit updates, which are not WAL-logged unless, of course, checksums are
turned on. We can have a GUC-controlled option to turn WAL logging on for
hint bit updates and then use pg_rewind for the purpose. But I did not see
any agreement on that either. The performance implications of WAL-logging
every hint bit update could be huge.


Yeah, I definitely think we should work on the pg_rewind approach
instead of this patch. It's a lot more flexible. The performance hit of
WAL-logging hint bit updates is the price you have to pay, but a lot of
people were OK with that to get page checksums, so I think a lot of
people would be OK with it for this purpose too. As long as it's
optional, of course. And anyone using page checksums is already paying
that price.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-04 Thread Fujii Masao
On Fri, Oct 4, 2013 at 1:46 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 6:44 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:


 

 Thank you for the comment. I think it is a good, simple idea.
 In your opinion, if synchronous_transfer is set to 'all' and
 synchronous_commit is set to 'on',
 the master waits for data flush even if the user sets synchronous_commit to
 'local' or 'off'.
 For example, when the user wants a transaction to complete quickly, they
 can't do that. Do we leave such a situation as a constraint?


 No, the user can still override the wait at the transaction commit point. So if

 synchronous_transfer is set to all:
  - If synchronous_commit is ON - wait at all points
  - If synchronous_commit is OFF - wait only at buffer flush (and other
 points related to failback safety)

 synchronous_transfer is set to data_flush:
  - If synchronous_commit is either ON or OFF - do not wait at commit points,
 but wait at all other points

 synchronous_transfer is set to commit:
  - If synchronous_commit is ON - wait at commit point
  - If synchronous_commit is OFF - do not wait at any point


 Thank you for explaining. Understood.
 If synchronous_transfer is set to 'all' and the user changes
 synchronous_commit to 'off' (or 'local') in a transaction,
 the master server waits at buffer flush, but doesn't wait at commit
 points. Right?

 In the current patch, synchronous_transfer works in cooperation with
 synchronous_commit.
 But if the user changes synchronous_commit in a transaction, they are no
 longer in cooperation.
 So, your idea might be better than the current behaviour of
 synchronous_transfer.

 I attached the v11 patch, which fixes the following.

You added several checks into SyncRepWaitForLSN() so that it can handle both
synchronous_transfer=data_flush and =commit. This change made the source code
of the function very complicated, I'm afraid. To simplify the source code,
what about just adding a new wait-for-lsn function for data_flush instead of
changing SyncRepWaitForLSN()? Obviously that new function and
SyncRepWaitForLSN() have a common part. I think that it should be extracted
as a separate function.

+ * Note that if sync transfer is requested, we can't do regular
+ * maintenance until standbys connect.
  */
-if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
+if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH &&
+    !SyncTransRequested())

Per discussion with Pavan, ISTM we don't need to avoid setting
synchronous_commit
to local even if synchronous_transfer is data_flush. But you did that here. Why?

When synchronous_transfer = data_flush, anti-wraparound vacuum can be blocked.
Is this safe?

+#synchronous_transfer = commit        # data page synchronization level
+                                      # commit, data_flush or all

This comment seems confusing. I think that this parameter specifies when to
wait for replication.

+typedef enum
+{
+    SYNCHRONOUS_TRANSFER_COMMIT,        /* no wait for flush data page */
+    SYNCHRONOUS_TRANSFER_DATA_FLUSH,    /* wait for data page flush only,
+                                         * no wait for WAL */
+    SYNCHRONOUS_TRANSFER_ALL            /* wait for data page flush and WAL */
+} SynchronousTransferLevel;

These comments also seem confusing. For example, I think that the meaning of
SYNCHRONOUS_TRANSFER_COMMIT is something like "wait for replication on
transaction commit".

@@ -521,6 +531,13 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
  */
 XLogFlush(lsn);

+    /*
+     * If synchronous transfer is requested, wait for failback safe standby
+     * to receive WAL up to lsn.
+     */
+    if (SyncTransRequested())
+        SyncRepWaitForLSN(lsn, true, true);

If smgr_redo() is called only during recovery, SyncRepWaitForLSN() doesn't need
to be called here.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-10-03 Thread Sawada Masahiko
On Fri, Sep 27, 2013 at 6:44 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee
 pavan.deola...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:


 

 Thank you for the comment. I think it is a good, simple idea.
 In your opinion, if synchronous_transfer is set to 'all' and
 synchronous_commit is set to 'on',
 the master waits for data flush even if the user sets synchronous_commit to
 'local' or 'off'.
 For example, when the user wants a transaction to complete quickly, they
 can't do that. Do we leave such a situation as a constraint?


 No, the user can still override the wait at the transaction commit point. So if

 synchronous_transfer is set to all:
  - If synchronous_commit is ON - wait at all points
  - If synchronous_commit is OFF - wait only at buffer flush (and other
 points related to failback safety)

 synchronous_transfer is set to data_flush:
  - If synchronous_commit is either ON or OFF - do not wait at commit points,
 but wait at all other points

 synchronous_transfer is set to commit:
  - If synchronous_commit is ON - wait at commit point
  - If synchronous_commit is OFF - do not wait at any point


 Thank you for explaining. Understood.
 If synchronous_transfer is set to 'all' and the user changes
 synchronous_commit to 'off' (or 'local') in a transaction,
 the master server waits at buffer flush, but doesn't wait at commit
 points. Right?

 In the current patch, synchronous_transfer works in cooperation with
 synchronous_commit.
 But if the user changes synchronous_commit in a transaction, they are no
 longer in cooperation.
 So, your idea might be better than the current behaviour of
 synchronous_transfer.

I attached the v11 patch, which fixes the following.
- synchronous_transfer controls waiting only at the data flush level;
  synchronous_commit controls waiting at the commit level. (Based on
Pavan's suggestion.)
- If no sync replication standby name is set, neither
synchronous_commit nor synchronous_transfer
  takes effect.
- Fixed a case where we didn't support a failback-safe standby.
  The previous patch could not support a failback-safe standby, because it
did not wait at FlushBuffer()
  when called by autovacuum.

So, if the user temporarily wants transactions to complete quickly, they
need to change the synchronous_transfer value and reload
postgresql.conf.

Regards,

---
Sawada Masahiko


synchronous_transfer_v11.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-27 Thread Sawada Masahiko
On Thu, Sep 26, 2013 at 8:54 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:

 On Thu, Sep 19, 2013 at 4:02 PM, Fujii Masao masao.fu...@gmail.com wrote:



 Hmm... when synchronous_transfer is set to data_flush,
 IMO the intuitive behaviors are

 (1) synchronous_commit = on
 A data flush should wait for the corresponding WAL to be
 flushed in the standby

 (2) synchronous_commit = remote_write
 A data flush should wait for the corresponding WAL to be
 written to OS in the standby.

 (3) synchronous_commit = local
 (4) synchronous_commit = off
 A data flush should wait for the corresponding WAL to be
 written locally in the master.


 I thought synchronous_commit and synchronous_transfer are kind of orthogonal
 to each other. synchronous_commit only controls whether and how to wait for
 the standby when a transaction commits. synchronous_transfer OTOH tells
 how to interpret the standbys listed in the synchronous_standbys parameter. If
 set to "commit" then they are synchronous standbys (like today). If set to
 "data_flush", they are asynchronous failback safe standbys, and if set to
 "all" then they are synchronous failback safe standbys. Well, it's confusing
 :-(

 So IMHO in the current state of things, the synchronous_transfer GUC cannot
 be changed at a session/transaction level since all backends, including
 background workers, must honor the settings to guarantee failback safety.
 synchronous_commit still works the same way, but is ignored if
 synchronous_transfer is set to "data_flush" because that effectively tells
 us that the standbys listed under synchronous_standbys are really *async*
 standbys with failback safety.


Thank you for the comment. I think it is a good, simple idea.
In your opinion, if synchronous_transfer is set to 'all' and
synchronous_commit is set to 'on',
the master waits for data flush even if the user sets synchronous_commit to
'local' or 'off'.
For example, when the user wants a transaction to complete quickly, they
can't do that. Do we leave such a situation as a constraint?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-27 Thread Pavan Deolasee
On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com wrote:


 

 Thank you for the comment. I think it is a good, simple idea.
 In your opinion, if synchronous_transfer is set to 'all' and
 synchronous_commit is set to 'on',
 the master waits for data flush even if the user sets synchronous_commit to
 'local' or 'off'.
 For example, when the user wants a transaction to complete quickly, they
 can't do that. Do we leave such a situation as a constraint?


No, the user can still override the wait at the transaction commit point. So if

synchronous_transfer is set to all:
 - If synchronous_commit is ON - wait at all points
 - If synchronous_commit is OFF - wait only at buffer flush (and other
points related to failback safety)

synchronous_transfer is set to data_flush:
 - If synchronous_commit is either ON or OFF - do not wait at commit points,
but wait at all other points

synchronous_transfer is set to commit:
 - If synchronous_commit is ON - wait at commit point
 - If synchronous_commit is OFF - do not wait at any point
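
In code form the matrix boils down to something like this (an illustrative
sketch; the helpers SyncTransLevel() and SyncCommitRequested() are assumed
names, not from the patch):

    /* Should a backend wait for the standby at transaction commit? */
    static bool
    WaitAtCommitPoint(void)
    {
        return SyncCommitRequested() &&
               SyncTransLevel() != SYNCHRONOUS_TRANSFER_DATA_FLUSH;
    }

    /* Should a process wait for the standby before flushing a data page? */
    static bool
    WaitAtBufferFlushPoint(void)
    {
        return SyncTransLevel() != SYNCHRONOUS_TRANSFER_COMMIT;
    }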

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-27 Thread Sawada Masahiko
On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:
 On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:


 

 Thank you for the comment. I think it is a good, simple idea.
 In your opinion, if synchronous_transfer is set to 'all' and
 synchronous_commit is set to 'on',
 the master waits for data flush even if the user sets synchronous_commit to
 'local' or 'off'.
 For example, when the user wants a transaction to complete quickly, they
 can't do that. Do we leave such a situation as a constraint?


 No, the user can still override the wait at the transaction commit point. So if

 synchronous_transfer is set to all:
  - If synchronous_commit is ON - wait at all points
  - If synchronous_commit is OFF - wait only at buffer flush (and other
 points related to failback safety)

 synchronous_transfer is set to data_flush:
  - If synchronous_commit is either ON or OFF - do not wait at commit points,
 but wait at all other points

 synchronous_transfer is set to commit:
  - If synchronous_commit is ON - wait at commit point
  - If synchronous_commit is OFF - do not wait at any point


Thank you for explaining. Understood.
If synchronous_transfer is set to 'all' and the user changes
synchronous_commit to 'off' (or 'local') in a transaction,
the master server waits at buffer flush, but doesn't wait at commit
points. Right?

In the current patch, synchronous_transfer works in cooperation with
synchronous_commit.
But if the user changes synchronous_commit in a transaction, they are no
longer in cooperation.
So, your idea might be better than the current behaviour of synchronous_transfer.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-26 Thread Pavan Deolasee
On Thu, Sep 19, 2013 at 4:02 PM, Fujii Masao masao.fu...@gmail.com wrote:



 Hmm... when synchronous_transfer is set to data_flush,
 IMO the intuitive behaviors are

 (1) synchronous_commit = on
 A data flush should wait for the corresponding WAL to be
 flushed in the standby

 (2) synchronous_commit = remote_write
 A data flush should wait for the corresponding WAL to be
 written to OS in the standby.

 (3) synchronous_commit = local
 (4) synchronous_commit = off
 A data flush should wait for the corresponding WAL to be
 written locally in the master.


I thought synchronous_commit and synchronous_transfer are kind of
orthogonal to each other. synchronous_commit only controls whether and how
to wait for the standby when a transaction commits.
synchronous_transfer OTOH tells how to interpret the standbys listed in the
synchronous_standbys parameter. If set to "commit" then they are
synchronous standbys (like today). If set to "data_flush", they are
asynchronous failback safe standbys, and if set to "all" then they are
synchronous failback safe standbys. Well, it's confusing :-(

So IMHO in the current state of things, the synchronous_transfer GUC cannot
be changed at a session/transaction level since all backends, including
background workers, must honor the settings to guarantee failback safety.
synchronous_commit still works the same way, but is ignored if
synchronous_transfer is set to "data_flush" because that effectively tells
us that the standbys listed under synchronous_standbys are really *async*
standbys with failback safety.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-22 Thread Sawada Masahiko
On Fri, Sep 20, 2013 at 10:33 PM, Samrat Revagade
revagade.sam...@gmail.com wrote:



 On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com
 wrote:






 Attached patch combines documentation patch and source-code patch.


 I have had a stab at reviewing the documentation. Have a look.


 Thanks.
 The attached patch implements the documentation suggestions.
 But comments from Fujii-san still need to be implemented.
 We will implement them soon.


I have attached the patch, which I modified based on Fujii-san's suggestions.

If synchronous_transfer is set to 'data_flush', the behaviour of
synchronous_transfer with synchronous_commit is:

(1) synchronous_commit = on
A data flush should wait for the corresponding WAL to be
flushed in the standby

(2) synchronous_commit = remote_write
A data flush should wait for the corresponding WAL to be
written to OS in the standby.

(3) synchronous_commit = local
(4) synchronous_commit = off
A data flush should wait for the corresponding WAL to be
written locally in the master.

Even if the user changes the synchronous_commit value in a transaction,
other processes (e.g. the checkpointer process) can't see it.
In the current patch, each process uses its local synchronous_commit value.
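
A sketch of that mapping (illustrative; DataFlushWaitMode() is not a
function from the patch):

    /* Which wait level a data page flush uses when
     * synchronous_transfer = data_flush, per (1)-(4) above. */
    static int
    DataFlushWaitMode(void)
    {
        switch (synchronous_commit)
        {
            case SYNCHRONOUS_COMMIT_REMOTE_FLUSH:
                return SYNC_REP_WAIT_FLUSH;    /* (1) standby WAL flush */
            case SYNCHRONOUS_COMMIT_REMOTE_WRITE:
                return SYNC_REP_WAIT_WRITE;    /* (2) standby OS write */
            default:
                return SYNC_REP_NO_WAIT;       /* (3)(4) local write only */
        }
    }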

Regards,

---
Sawada Masahiko


synchronous_transfer_v10.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-20 Thread Sameer Thakur

 Attached patch combines documentation patch and source-code patch.


I have had a stab at reviewing the documentation. Have a look.

--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml

@@ -1749,6 +1749,50 @@ include 'filename'
       </listitem>
      </varlistentry>

+     <varlistentry id="guc-synchronous-transfer" xreflabel="synchronous_transfer">
+      <term><varname>synchronous_transfer</varname> (<type>enum</type>)</term>
+      <indexterm>
+       <primary><varname>synchronous_transfer</> configuration parameter</primary>
+      </indexterm>
+      <listitem>
+       <para>
+        This parameter controls the synchronous nature of WAL transfer and
+        maintains file-system-level consistency between the master server and
+        the standby server. It specifies whether the master server will wait
+        for a file-system-level change (for example, modifying a data page)
+        before the corresponding WAL records are replicated to the standby
+        server.
+       </para>
+       <para>
+        Valid values are <literal>commit</>, <literal>data_flush</> and
+        <literal>all</>. The default value is <literal>commit</>, meaning
+        that the master will only wait for transaction commits; this is
+        equivalent to turning the <literal>synchronous_transfer</> parameter
+        off, and the standby server will behave as a <quote>synchronous
+        standby</> in streaming replication. For the value
+        <literal>data_flush</>, the master will wait only for data page
+        modifications but not for transaction commits; hence the standby
+        server will act as an <quote>asynchronous failback safe standby</>.
+        For the value <literal>all</>, the master will wait for data page
+        modifications as well as for transaction commits, and the resulting
+        standby server will act as a <quote>synchronous failback safe
+        standby</>. The wait is on background activities and hence will not
+        create performance overhead. To configure a synchronous failback safe
+        standby, <xref linkend="guc-synchronous-standby-names"> should be set.
+       </para>
+      </listitem>
+     </varlistentry>

@@ -2258,14 +2302,25 @@ include 'filename'
       </indexterm>
       <listitem>
        <para>
-        Specifies a comma-separated list of standby names that can support
-        <firstterm>synchronous replication</>, as described in
-        <xref linkend="synchronous-replication">.
-        At any one time there will be at most one active synchronous standby;
-        transactions waiting for commit will be allowed to proceed after
-        this standby server confirms receipt of their data.
-        The synchronous standby will be the first standby named in this list
-        that is both currently connected and streaming data in real-time
+        Specifies a comma-separated list of standby names. If this parameter
+        is set, the standby will behave either as a synchronous standby in
+        replication, as described in <xref linkend="synchronous-replication">,
+        or as a synchronous failback safe standby, as described in
+        <xref linkend="failback-safe">.
+        At any time there will be at most one active standby; when the standby
+        is a synchronous standby in replication, transactions waiting for
+        commit will be allowed to proceed after this standby server confirms
+        receipt of their data. But when the standby is a synchronous failback
+        safe standby, data page modifications as well as transaction commits
+        will be allowed to proceed only after this standby server confirms
+        receipt of their data.
+        If this parameter is set to an empty value and
+        <xref linkend="guc-synchronous-transfer"> is set to
+        <literal>data_flush</>, then the standby is called an asynchronous
+        failback safe standby, and only data page modifications will wait
+        until the corresponding WAL record is replicated to the standby.
+       </para>
+       <para>
+        The synchronous standby in replication will be the first standby
+        named in this list that is both currently connected and streaming
+        data in real-time
         (as shown by a state of <literal>streaming</literal> in the
         <link linkend="monitoring-stats-views-table">
         <literal>pg_stat_replication</literal></link> view).

--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
+
+  <sect2 id="failback-safe">
+   <title>Setting up failback safe standby</title>
+
+   <indexterm zone="high-availability">
+    <primary>Setting up failback safe standby</primary>
+   </indexterm>
+
+   <para>
+    PostgreSQL streaming replication offers durability, but if the master
+    crashes and a particular WAL record is unable to reach the standby
+    server, then that WAL record is present on the master server but not on
+    the standby server. In such a case the master is ahead of the standby
+    server in terms of WAL records and data in the database. This leads to
+    file-system-level inconsistency between the master and the standby
+    server. For example, a heap page update on the master might not have
+    been reflected on the standby when

Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-20 Thread Samrat Revagade
On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com wrote:






 Attached patch combines documentation patch and source-code patch.


 I have had a stab at reviewing the documentation. Have a look.


Thanks.
The attached patch implements the documentation suggestions.
But comments from Fujii-san still need to be implemented.
We will implement them soon.


synchronous_transfer_v9.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Sawada Masahiko
On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

In the current patch, synchronous_transfer doesn't work when
synchronous_commit is set to 'off' or 'local'.
If the user changes the synchronous_commit value in a transaction, the
checkpointer process can't see it.
Because of that, even if synchronous_commit is changed from 'on' to 'off',
synchronous_transfer doesn't work.
I'm planning to modify the patch so that synchronous_transfer is not
affected by synchronous_commit.


 + * accept all the likely variants of off.

 This comment should be removed because synchronous_transfer
 doesn't accept the value off.

 +{"commit", SYNCHRONOUS_TRANSFER_COMMIT, true},

 ISTM the third value true should be false.

 +{"0", SYNCHRONOUS_TRANSFER_COMMIT, true},

 Why is this needed?

 +elog(WARNING, "XLogSend sendTimeLineValidUpto(%X/%X) <= sentPtr(%X/%X) AND sendTImeLine",
 +     (uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
 +     (uint32) (sentPtr >> 32), (uint32) sentPtr);

 Why is this needed?


They are unnecessary. I had forgotten to remove that code.

 +#define SYNC_REP_WAIT_FLUSH         1
 +#define SYNC_REP_WAIT_DATA_FLUSH    2

 Why do we need to separate the wait-queue for wait-data-flush
 from that for wait-flush? ISTM that wait-data-flush can also
 wait for replication on the wait-queue for wait-flush,
 which would simplify the patch.


Yes, it seems unnecessary to add a new queue.
I will delete SYNC_REP_WAIT_DATA_FLUSH and the related code.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Fujii Masao
On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


 Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

 In the current patch, synchronous_transfer doesn't work when
 synchronous_commit is set to 'off' or 'local'.
 If the user changes the synchronous_commit value in a transaction, the
 checkpointer process can't see it.
 Because of that, even if synchronous_commit is changed from 'on' to 'off',
 synchronous_transfer doesn't work.
 I'm planning to modify the patch so that synchronous_transfer is not
 affected by synchronous_commit.

Hmm... when synchronous_transfer is set to data_flush,
IMO the intuitive behaviors are

(1) synchronous_commit = on
A data flush should wait for the corresponding WAL to be
flushed in the standby

(2) synchronous_commit = remote_write
A data flush should wait for the corresponding WAL to be
written to OS in the standby.

(3) synchronous_commit = local
(4) synchronous_commit = off
A data flush should wait for the corresponding WAL to be
written locally in the master.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-19 Thread Sawada Masahiko
On Thu, Sep 19, 2013 at 7:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 I attached the patch which I have modified.

 Thanks for updating the patch!

 Here are the review comments:


 Thank you for reviewing!

 I got the compiler warning:

 syncrep.c:112: warning: unused variable 'i'

 How does synchronous_transfer work with synchronous_commit?

 In the current patch, synchronous_transfer doesn't work when
 synchronous_commit is set to 'off' or 'local'.
 If the user changes the synchronous_commit value in a transaction, the
 checkpointer process can't see it.
 Because of that, even if synchronous_commit is changed from 'on' to 'off',
 synchronous_transfer doesn't work.
 I'm planning to modify the patch so that synchronous_transfer is not
 affected by synchronous_commit.

 Hmm... when synchronous_transfer is set to data_flush,
 IMO the intuitive behaviors are

 (1) synchronous_commit = on
 A data flush should wait for the corresponding WAL to be
 flushed in the standby

 (2) synchronous_commit = remote_write
 A data flush should wait for the corresponding WAL to be
 written to OS in the standby.

 (3) synchronous_commit = local
 (4) synchronous_commit = off
 A data flush should wait for the corresponding WAL to be
 written locally in the master.


It is a good idea.
So the synchronous_commit value needs to be visible to other processes.
To share synchronous_commit with other processes, I will try to put the
synchronous_commit value into shared memory.
Is there already a GUC parameter which is shared with other processes?
I tried to find such a parameter, but there isn't one.
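
A minimal sketch of the usual shared-memory pattern (all names hypothetical,
not from the patch):

    /* Shared state visible to every backend and auxiliary process. */
    typedef struct SyncTransShmemStruct
    {
        int     sync_commit_level;   /* last value published by a backend */
    } SyncTransShmemStruct;

    static SyncTransShmemStruct *SyncTransShmem;

    /* Called from shared-memory initialization at postmaster startup. */
    void
    SyncTransShmemInit(void)
    {
        bool    found;

        SyncTransShmem = (SyncTransShmemStruct *)
            ShmemInitStruct("SyncTrans Shared Data",
                            sizeof(SyncTransShmemStruct), &found);
        if (!found)
            SyncTransShmem->sync_commit_level = SYNCHRONOUS_COMMIT_ON;
    }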

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-18 Thread Sawada Masahiko
On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I set up synchronous replication with synchronous_transfer = all, and then I ran
 pgbench -i and executed CHECKPOINT in the master. After that, when I executed
 CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused
 by the synchronous_transfer feature.

 Did you set synchronous_standby_names in the standby server?

 Yes.

 If so, the master server waits for the standby server which is set to
 synchronous_standby_names.
 Please let me know the details of this case.

 Both master and standby have the same postgresql.conf settings as follows:

 max_wal_senders = 4
 wal_level = hot_standby
 wal_keep_segments = 32
 synchronous_standby_names = '*'
 synchronous_transfer = all

 How does synchronous_transfer work with cascade replication? If it's set to all
 in the sender-side standby, can it resolve the data page inconsistency between
 two standbys?


 The current patch supports the case in which two servers are set up with SYNC
 replication.
 IOW, a failback safe standby is the same as a SYNC replication standby.
 The user can set synchronous_transfer only on the master side.

 So, it's very strange that CHECKPOINT on the standby gets stuck infinitely.


I attached the patch which I have modified.

I have modified it so that if both synchronous replication and synchronous
transfer are requested,
but the server is still in recovery (i.e., the server is in standby mode),
the server doesn't wait for
the corresponding WAL to be replicated.
Specifically, I added a RecoveryInProgress() condition.
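
Roughly, the guard looks like this (a sketch, not the literal v8 diff):

    /* Skip the failback-safe wait while this server is itself a
     * standby, i.e. still in recovery. */
    if (SyncTransRequested() && !RecoveryInProgress())
        SyncRepWaitForLSN(lsn, true, true);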

If both features (synchronous replication and transfer) are set and the
user sets up synchronous replication between two servers,
the user can execute CHECKPOINT on the standby side. It will not wait for
the corresponding WAL to be replicated.
But if both parameters are set and the user doesn't set up synchronous
replication (i.e., the master server works alone),
the master server waits infinitely when the user executes CHECKPOINT. This
behaviour is similar to synchronous replication.


Regards,

---
Sawada Masahiko


synchronous_transfer_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-18 Thread Fujii Masao
On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 I attached the patch which I have modified.

Thanks for updating the patch!

Here are the review comments:

I got the compiler warning:

syncrep.c:112: warning: unused variable 'i'

How does synchronous_transfer work with synchronous_commit?

+ * accept all the likely variants of off.

This comment should be removed because synchronous_transfer
doesn't accept the value off.

+{"commit", SYNCHRONOUS_TRANSFER_COMMIT, true},

ISTM the third value true should be false.

+{"0", SYNCHRONOUS_TRANSFER_COMMIT, true},

Why is this needed?

+elog(WARNING, "XLogSend sendTimeLineValidUpto(%X/%X) <= sentPtr(%X/%X) AND sendTImeLine",
+     (uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto,
+     (uint32) (sentPtr >> 32), (uint32) sentPtr);

Why is this needed?

+#define SYNC_REP_WAIT_FLUSH         1
+#define SYNC_REP_WAIT_DATA_FLUSH    2

Why do we need to separate the wait-queue for wait-data-flush
from that for wait-flush? ISTM that wait-data-flush can also
wait for replication on the wait-queue for wait-flush,
which would simplify the patch.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Samrat Revagade
  syncrep.c: In function ‘SyncRepReleaseWaiters’:
  syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used
  [-Wunused-but-set-variable]
 

 Sorry, I forgot to fix it.

 I have attached the patch which I modified.


Attached patch combines documentation patch and source-code patch.

-- 
Regards,

Samrat Revgade


synchronous_transfer_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Fujii Masao
On Tue, Sep 17, 2013 at 3:45 PM, Samrat Revagade
revagade.sam...@gmail.com wrote:

  syncrep.c: In function ‘SyncRepReleaseWaiters’:
  syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used
  [-Wunused-but-set-variable]
 

 Sorry, I forgot to fix it.

 I have attached the patch which I modified.


 Attached patch combines documentation patch and source-code patch.

I set up synchronous replication with synchronous_transfer = all, and then I ran
pgbench -i and executed CHECKPOINT in the master. After that, when I executed
CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by
the synchronous_transfer feature.

How does synchronous_transfer work with cascade replication? If it's set to all
in the sender-side standby, can it resolve the data page inconsistency between
two standbys?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Sawada Masahiko
On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Sep 17, 2013 at 3:45 PM, Samrat Revagade
 revagade.sam...@gmail.com wrote:

  syncrep.c: In function ‘SyncRepReleaseWaiters’:
  syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used
  [-Wunused-but-set-variable]
 

 Sorry, I forgot to fix it.

 I have attached the patch which I modified.


 Attached patch combines documentation patch and source-code patch.

 I set up synchronous replication with synchronous_transfer = all, and then I 
 ran
 pgbench -i and executed CHECKPOINT in the master. After that, when I executed
 CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by
 the synchronous_transfer feature.

Did you set synchronous_standby_names in the standby server?
If so, the master server waits for the standby server which is set to
synchronous_standby_names.
Please let me know the details of this case.


 How does synchronous_transfer work with cascade replication? If it's set to all
 in the sender-side standby, can it resolve the data page inconsistency between
 two standbys?


The current patch supports the case in which two servers are set up with SYNC replication.
IOW, a failback safe standby is the same as a SYNC replication standby.
The user can set synchronous_transfer only on the master side.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Fujii Masao
On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I set up synchronous replication with synchronous_transfer = all, and then I 
 ran
 pgbench -i and executed CHECKPOINT in the master. After that, when I executed
 CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by
 the synchronous_transfer feature.

 Did you set synchronous_standby_names in the standby server?

Yes.

 If so, the master server waits for the standby server which is set to
 synchronous_standby_names.
 Please let me know the details of this case.

Both master and standby have the same postgresql.conf settings as follows:

max_wal_senders = 4
wal_level = hot_standby
wal_keep_segments = 32
synchronous_standby_names = '*'
synchronous_transfer = all

 How does synchronous_transfer work with cascade replication? If it's set to all
 in the sender-side standby, can it resolve the data page inconsistency between
 two standbys?


 The current patch supports the case in which two servers are set up with SYNC
 replication.
 IOW, a failback safe standby is the same as a SYNC replication standby.
 The user can set synchronous_transfer only on the master side.

So, it's very strange that CHECKPOINT on the standby gets stuck infinitely.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Sawada Masahiko
On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I set up synchronous replication with synchronous_transfer = all, and then I ran
 pgbench -i and executed CHECKPOINT in the master. After that, when I executed
 CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused
 by the synchronous_transfer feature.

 Did you set synchronous_standby_names in the standby server?

 Yes.

 If so, the master server waits for the standby server which is set to
 synchronous_standby_names.
 Please let me know the details of this case.

 Both master and standby have the same postgresql.conf settings as follows:

 max_wal_senders = 4
 wal_level = hot_standby
 wal_keep_segments = 32
 synchronous_standby_names = '*'
 synchronous_transfer = all

 How does synchronous_transfer work with cascade replication? If it's set to all
 in the sender-side standby, can it resolve the data page inconsistency between
 two standbys?


 The current patch supports the case in which two servers are set up with SYNC
 replication.
 IOW, a failback safe standby is the same as a SYNC replication standby.
 The user can set synchronous_transfer only on the master side.

 So, it's very strange that CHECKPOINT on the standby gets stuck infinitely.


Yes, I think so.
I was not considering the case where the user sets synchronous_standby_names
in the standby server.
It will occur.
I will fix it, considering this case.



Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-17 Thread Sawada Masahiko
On Wed, Sep 18, 2013 at 1:05 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I set up synchronous replication with synchronous_transfer = all, and then I ran
 pgbench -i and executed CHECKPOINT in the master. After that, when I executed
 CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused
 by the synchronous_transfer feature.

 Did you set synchronous_standby_names in the standby server?

 Yes.

 If so, the master server waits for the standby server which is set to
 synchronous_standby_names.
 Please let me know the details of this case.

 Both master and standby have the same postgresql.conf settings as follows:

 max_wal_senders = 4
 wal_level = hot_standby
 wal_keep_segments = 32
 synchronous_standby_names = '*'
 synchronous_transfer = all

 How does synchronous_transfer work with cascade replication? If it's set to all
 in the sender-side standby, can it resolve the data page inconsistency between
 two standbys?


 The current patch supports the case in which two servers are set up with SYNC
 replication.
 IOW, a failback safe standby is the same as a SYNC replication standby.
 The user can set synchronous_transfer only on the master side.

 So, it's very strange that CHECKPOINT on the standby gets stuck infinitely.


Sorry, I sent that mail by mistake.

Yes, I think so.

It waits for the corresponding WAL to be replicated.
The behaviour of synchronous_transfer is a little similar to
synchronous_standby_names and synchronous replication.
That is, if those parameters are set but the standby server doesn't
connect to the master server,
the master server waits infinitely for the corresponding WAL to be
replicated to the standby server.

I was not considering the case where the user sets synchronous_standby_names
in the standby server.
I will fix it, considering this case.

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-12 Thread Peter Eisentraut
On 9/12/13 3:00 AM, Samrat Revagade wrote:
 
 We are improving the patch for Commit Fest 2 now.
 We will fix the above compiler warnings as soon as possible and submit
 the patch.
 
 
 Attached *synchronous_transfer_v5.patch* implements review comments from
 commit fest-1 and reduces the performance overhead of synchronous_transfer.

There is still this compiler warning:

syncrep.c: In function ‘SyncRepReleaseWaiters’:
syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used
[-Wunused-but-set-variable]



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-09-12 Thread Sawada Masahiko
On Fri, Sep 13, 2013 at 1:11 AM, Peter Eisentraut pete...@gmx.net wrote:
 On 9/12/13 3:00 AM, Samrat Revagade wrote:

 We are improving the patch for Commit Fest 2 now.
 We will fix the above compiler warnings as soon as possible and submit
 the patch.


 Attached *synchronous_transfer_v5.patch* implements review comments from
 commit fest-1 and reduces the performance overhead of synchronous_transfer.

 There is still this compiler warning:

 syncrep.c: In function ‘SyncRepReleaseWaiters’:
 syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used
 [-Wunused-but-set-variable]


Sorry, I forgot to fix it.

I have attached the patch which I modified.

Regards,

---
Sawada Masahiko


synchronous_transfer_v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-08-25 Thread Sawada Masahiko
On Sat, Aug 24, 2013 at 11:38 PM, Peter Eisentraut pete...@gmx.net wrote:
 On Thu, 2013-07-11 at 23:42 +0900, Sawada Masahiko wrote:
 please find the attached patch.

 Please fix these compiler warnings:

 xlog.c:3117:2: warning: implicit declaration of function ‘SyncRepWaitForLSN’ 
 [-Wimplicit-function-declaration]
 syncrep.c:414:6: warning: variable ‘numdataflush’ set but not used 
 [-Wunused-but-set-variable]
Thank you for your information!

We are improving the patch for Commit Fest 2 now.
We will fix the above compiler warnings as soon as possible and submit the patch.

-- 
Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-08-24 Thread Peter Eisentraut
On Thu, 2013-07-11 at 23:42 +0900, Sawada Masahiko wrote:
 please find the attached patch.

Please fix these compiler warnings:

xlog.c:3117:2: warning: implicit declaration of function ‘SyncRepWaitForLSN’ 
[-Wimplicit-function-declaration]
syncrep.c:414:6: warning: variable ‘numdataflush’ set but not used 
[-Wunused-but-set-variable]




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-15 Thread Samrat Revagade
 ToDo
 1. Currently this patch supports synchronous transfer only globally, so we
 can't set a different synchronous transfer mode for each server.
 We need to improve the patch to support the following cases:
- a SYNC standby plus a separate ASYNC failback safe standby
- an ASYNC standby plus a separate ASYNC failback safe standby

 2. We have not measured performance yet. We need to measure performance.

Here are the test results, showing the performance overhead of the
patch (failback_safe_standby_v4.patch):

Tests are carried out in two different scenarios:
1. Tests with fast transaction workloads.
2. Tests with large loads.

Test Type-1: Tests with pgbench (*tests with fast transaction workloads*)

Notes:
1. These tests measure the performance overhead caused by the
patch for fast transaction workloads.
2. Tests are performed with the pgbench benchmark, and the performance
measurement factor is the TPS value.
3. Values represent the TPS for 4 runs, and the last value represents
the average of all the runs.

Settings for tests:
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: simple
number of clients: 150
number of threads: 1
duration: 1800 s

Analysis of results:
1) Synchronous Replication: (753.06, 748.81, 748.38, 747.21, Avg: 747.2)
2) Synchronous Replication + Failsafe standby (commit):
(729.13, 724.33, 713.59, 710.79, Avg: 719.46)
3) Synchronous Replication + Failsafe standby (all): (692.08, 688.08,
711.23, 711.62, Avg: 700.75)

4) Asynchronous Replication: (1008.42, 993.39, 986.80, 1028.46,
Avg: 1004.26)
5) Asynchronous Replication + Failsafe standby (commit): (974.49,
978.60, 969.11, 957.18, Avg: 969.84)
6) Asynchronous Replication + Failsafe standby (data_flush):
(1011.79, 992.05, 1030.20, 940.50, Avg: 993.63)

In the above test results the performance numbers are very close to each
other, and because of noise they show variation; hence the following is an
approximate conclusion about the overhead of the patch.

1. Streaming replication + synchronous_transfer (all, data_flush):
a) On average, synchronous replication combined with
synchronous_transfer (all) causes a 6.21% performance overhead.
b) On average, asynchronous streaming replication combined with
synchronous_transfer (data_flush) causes a 1.05% performance
overhead.

2. Streaming replication + synchronous_transfer (commit):
a) On average, synchronous replication combined with
synchronous_transfer (commit) causes a 3.71% performance overhead.
b) On average, asynchronous streaming replication combined with
synchronous_transfer (commit) causes a 3.42% performance
overhead.


Test Type-2: Tests with pgbench -i (*tests with large loads*)

Notes:
1. These tests measure the performance overhead caused by the
patch for large loads and index builds.
2. Tests are performed with pgbench -i (initialization of test
data, i.e. the time taken for creating the pgbench tables, inserting
tuples and building primary keys).
3. The performance measurement factor is the wall clock time for
pgbench -i (measured with the time command).
4. Values represent the wall clock time for 4 runs, and the last value
represents the average of all the runs.

pgbench settings:
Scale factor: 300 (database size: 4.3873 GB)

Test results:
1) Synchronous Replication: (126.98, 133.83, 127.77, 129.70,
Avg: 129.57) (seconds)
2) Synchronous Replication + synchronous_transfer (commit): (132.87,
125.85, 133.91, 134.61, Avg: 131.81) (seconds)
3) Synchronous Replication + synchronous_transfer (all): (133.59,
132.82, 134.20, 135.22, Avg: 133.95) (seconds)

4) Asynchronous Replication: (126.75, 136.95, 130.42,
127.77, Avg: 130.47) (seconds)
5) Asynchronous Replication + synchronous_transfer (commit): (128.13,
133.06, 127.62, 130.70, Avg: 129.87) (seconds)
6) Asynchronous Replication + synchronous_transfer (data_flush):
(134.55, 139.90, 144.47, 143.85, Avg: 140.69) (seconds)

In the above test results the performance numbers are very close to each
other, and because of noise they show variation; hence the following is an
approximate conclusion about the overhead of the patch.

1. Streaming replication + synchronous_transfer (all, data_flush):
a) On average, synchronous replication combined with
synchronous_transfer (all) causes a 3.38% performance overhead.
b) On average, asynchronous streaming replication combined with
synchronous_transfer (data_flush) causes a 7.83% performance
overhead.

2. Streaming replication + synchronous_transfer (commit):
a) On average, synchronous replication combined with
synchronous_transfer (commit) causes a 1.72% performance overhead.
b) On average, asynchronous streaming replication combined with
synchronous_transfer (commit) causes a (-0.45)% performance
overhead.

The test results for both cases (large loads and fast
transactions) show variation because of noise, but we can observe
that the patch causes approximately a 3-4% performance overhead.


Regards,

Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-11 Thread Sawada Masahiko
On Tue, Jul 9, 2013 at 11:45 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sun, Jul 7, 2013 at 4:27 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 I found a bug which occurred when we do vacuum, and have fixed it.
 Yesterday (8th July) the 'Improve scalability of WAL insertions' patch was
 committed to HEAD, so the v2 patch does not apply to HEAD now.
 I have also fixed it to be applicable to HEAD.

 please find the attached patch.

 Regards,

 ---
 Sawada Masahiko

I have fixed the issue that the master server did not wait for the
WAL to be flushed to disk on the standby when the master executes
FlushBuffer(), and have attached the v4 patch.
Please find the attached patch.
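
Roughly, the fixed code path looks like the following (a sketch only,
with names simplified from bufmgr.c; the actual patch may differ):

    /*
     * Inside FlushBuffer(): enforce the WAL-before-data rule, now
     * extended to cover the failback-safe standby.
     */
    recptr = BufferGetLSN(buf);  /* LSN of last WAL record touching the page */
    XLogFlush(recptr);           /* flush local WAL; with synchronous_transfer
                                  * this now also waits for the standby flush */
    smgrwrite(reln, buf->tag.forkNum, buf->tag.blockNum, bufToWrite, false);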

Regards,

---
Sawada Masahiko


failback_safe_standby_v4.patch
Description: Binary data



Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-09 Thread Sawada Masahiko
On Sun, Jul 7, 2013 at 4:27 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

 err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where master can wait for synchronous transfers at other than
 commits. May we can solve that by having more granular control to the said
 parameter ?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between synchronous and the new kind of
 standby ?

 That's not the point. The point is Why would we have a new kind of
 standby? and therefore why do we need new parameters?

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

 ...which still doesn't make sense to me. Let's look at that in detail.

 Take 3 servers, A, B, C with A and B being linked by sync rep, and C
 being safety standby at a distance.

 Either A or B is master, except in disaster. So if A is master, then B
 would be the failover target. If A fails, then you want to failover to
 B. Once B is the target, you want to failback to A as the master. C
 needs to follow the new master, whichever it is.

 If you set up sync rep between A and B and this new mode between A and
 C. When B becomes the master, you need to failback from B to A, but
 you can't because the new mode applied between A and C only, so you
 have to failback from C to A. So having the new mode not match with
 sync rep means you are forcing people to failback using the slow link
 in the common case.

 You might observe that having the two modes match causes problems if A
 and B fail, so you are forced to go to C as master and then eventually
 failback to A or B across a slow link. That case is less common and
 could be solved by extending sync transfer to more/multi nodes.

 It definitely doesn't make sense to have sync rep on anything other
 than a subset of sync transfer. So while it may be sensible in the
 future to make sync transfer a superset of sync rep nodes, it makes
 sense to make them the same config for now.
 I have updated the patch.

 we support following 2 cases.
 1. SYNC server and also make same failback safe standby server
 2. ASYNC server and also make same failback safe standby server

 1.  changed name of parameter
   give up 'failback_safe_standby_names' parameter from the first patch.
   and changed name of parameter from 'failback_safe_mode ' to
 'synchronous_transfer'.
   this parameter accepts 'all', 'data_flush' and 'commit'.

   -'commit'
 'commit' means that master waits for corresponding WAL to flushed
 to disk of standby server on commits.
 but master doesn't waits for replicated data pages.

   -'data_flush'
 'data_flush' means that master waits for replicated data page
 (e.g, CLOG, pg_control) before flush to disk of master server.
 but if user set to 'data_flush' to this parameter,
 'synchronous_commit' values is ignored even if user set
 'synchronous_commit'.

   -'all'
 'all' means that master waits for replicated WAL and data page.

 2. put SyncRepWaitForLSN() function into XLogFlush() function
   we have put SyncRepWaitForLSN() function into XLogFlush() function,
 and change argument of XLogFlush().

 they are setup case and need to set parameters.

 - SYNC server and also make same failback safe standby server (case 1)
   synchronous_transfer = all
   synchronous_commit = remote_write/on
   synchronous_standby_names = ServerName

 - ASYNC server and also make same failback safe standby server (case 2)
   synchronous_transfer = data_flush
   (synchronous_commit values is ignored)

 - default SYNC replication
   synchronous_transfer = commit
   synchronous_commit = on
   synchronous_standby_names = ServerName

 - default ASYNC replication
   synchronous_transfer = commit

 ToDo
 1. 

Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-07 Thread Sawada Masahiko
On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

 err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where master can wait for synchronous transfers at other than
 commits. May we can solve that by having more granular control to the said
 parameter ?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between synchronous and the new kind of
 standby ?

 That's not the point. The point is Why would we have a new kind of
 standby? and therefore why do we need new parameters?

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

 ...which still doesn't make sense to me. Let's look at that in detail.

 Take 3 servers, A, B, C with A and B being linked by sync rep, and C
 being safety standby at a distance.

 Either A or B is master, except in disaster. So if A is master, then B
 would be the failover target. If A fails, then you want to failover to
 B. Once B is the target, you want to failback to A as the master. C
 needs to follow the new master, whichever it is.

 If you set up sync rep between A and B and this new mode between A and
 C. When B becomes the master, you need to failback from B to A, but
 you can't because the new mode applied between A and C only, so you
 have to failback from C to A. So having the new mode not match with
 sync rep means you are forcing people to failback using the slow link
 in the common case.

 You might observe that having the two modes match causes problems if A
 and B fail, so you are forced to go to C as master and then eventually
 failback to A or B across a slow link. That case is less common and
 could be solved by extending sync transfer to more/multi nodes.

 It definitely doesn't make sense to have sync rep on anything other
 than a subset of sync transfer. So while it may be sensible in the
 future to make sync transfer a superset of sync rep nodes, it makes
 sense to make them the same config for now.
I have updated the patch.

We support the following 2 cases:
1. a SYNC standby server that also acts as the failback-safe standby server
2. an ASYNC standby server that also acts as the failback-safe standby server

1. Changed the name of the parameter
  We gave up the 'failback_safe_standby_names' parameter from the
first patch, and changed the parameter name from 'failback_safe_mode'
to 'synchronous_transfer'.
  This parameter accepts 'all', 'data_flush' and 'commit'.

  -'commit'
'commit' means that the master waits for the corresponding WAL to be
flushed to disk on the standby server on commits,
but the master doesn't wait for replicated data pages.

  -'data_flush'
'data_flush' means that the master waits for replicated data pages
(e.g., CLOG, pg_control) before flushing them to disk on the master
server. If the user sets this parameter to 'data_flush', the
'synchronous_commit' value is ignored even if the user has set
'synchronous_commit'.

  -'all'
'all' means that the master waits for both replicated WAL and data pages.

2. Put the SyncRepWaitForLSN() function into the XLogFlush() function
  We have put the SyncRepWaitForLSN() function into the XLogFlush()
function, and changed the argument of XLogFlush().
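
A minimal sketch of the change (the argument name below is
hypothetical; see the patch for the real signature):

    void
    XLogFlush(XLogRecPtr record, bool wait_for_standby)
    {
        /* ... unchanged: write and fsync local WAL up to 'record' ... */

        /*
         * New: when synchronous_transfer requires it, reuse the
         * sync-rep wait so that callers (FlushBuffer, CLOG writes,
         * pg_control updates) do not proceed until the standby has
         * flushed WAL up to 'record'.
         */
        if (wait_for_standby)
            SyncRepWaitForLSN(record);
    }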

The following are the setup cases and the parameters that need to be set.

- SYNC server that is also the failback-safe standby server (case 1)
  synchronous_transfer = all
  synchronous_commit = remote_write/on
  synchronous_standby_names = ServerName

- ASYNC server that is also the failback-safe standby server (case 2)
  synchronous_transfer = data_flush
  (the synchronous_commit value is ignored)

- default SYNC replication
  synchronous_transfer = commit
  synchronous_commit = on
  synchronous_standby_names = ServerName

- default ASYNC replication
  synchronous_transfer = commit

ToDo
1. Currently this patch supports a single synchronous transfer
setting, so we can't set a different synchronous transfer mode for
each server. We need to improve the patch to support the following
cases:
   - SYNC standby 

Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-07 Thread Sawada Masahiko
On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

 err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where master can wait for synchronous transfers at other than
 commits. May we can solve that by having more granular control to the said
 parameter ?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between synchronous and the new kind of
 standby ?

 That's not the point. The point is Why would we have a new kind of
 standby? and therefore why do we need new parameters?

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

 ...which still doesn't make sense to me. Let's look at that in detail.

 Take 3 servers, A, B, C with A and B being linked by sync rep, and C
 being safety standby at a distance.

 Either A or B is master, except in disaster. So if A is master, then B
 would be the failover target. If A fails, then you want to failover to
 B. Once B is the target, you want to failback to A as the master. C
 needs to follow the new master, whichever it is.

 If you set up sync rep between A and B and this new mode between A and
 C. When B becomes the master, you need to failback from B to A, but
 you can't because the new mode applied between A and C only, so you
 have to failback from C to A. So having the new mode not match with
 sync rep means you are forcing people to failback using the slow link
 in the common case.

 You might observe that having the two modes match causes problems if A
 and B fail, so you are forced to go to C as master and then eventually
 failback to A or B across a slow link. That case is less common and
 could be solved by extending sync transfer to more/multi nodes.

 It definitely doesn't make sense to have sync rep on anything other
 than a subset of sync transfer. So while it may be sensible in the
 future to make sync transfer a superset of sync rep nodes, it makes
 sense to make them the same config for now.
 I have updated the patch.

 we support following 2 cases.
 1. SYNC server and also make same failback safe standby server
 2. ASYNC server and also make same failback safe standby server

 1.  changed name of parameter
   give up 'failback_safe_standby_names' parameter from the first patch.
   and changed name of parameter from 'failback_safe_mode ' to
 'synchronous_transfer'.
   this parameter accepts 'all', 'data_flush' and 'commit'.

   -'commit'
 'commit' means that master waits for corresponding WAL to flushed
 to disk of standby server on commits.
 but master doesn't waits for replicated data pages.

   -'data_flush'
 'data_flush' means that master waits for replicated data page
 (e.g, CLOG, pg_control) before flush to disk of master server.
 but if user set to 'data_flush' to this parameter,
 'synchronous_commit' values is ignored even if user set
 'synchronous_commit'.

   -'all'
 'all' means that master waits for replicated WAL and data page.

 2. put SyncRepWaitForLSN() function into XLogFlush() function
   we have put SyncRepWaitForLSN() function into XLogFlush() function,
 and change argument of XLogFlush().

 they are setup case and need to set parameters.

 - SYNC server and also make same failback safe standby server (case 1)
   synchronous_transfer = all
   synchronous_commit = remote_write/on
   synchronous_standby_names = ServerName

 - ASYNC server and also make same failback safe standby server (case 2)
   synchronous_transfer = data_flush
   (synchronous_commit values is ignored)

 - default SYNC replication
   synchronous_transfer = commit
   synchronous_commit = on
   synchronous_standby_names = ServerName

 - default ASYNC replication
   synchronous_transfer = commit

 ToDo
 1. currently this patch supports synchronous transfer. so we can't set
 different 

Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-02 Thread Sawada Masahiko
On Tue, Jul 2, 2013 at 2:45 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote:
 On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com
 wrote:
  On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
  Hi,
 
  
   So our proposal on this problem is that we must ensure that
 master
  should
   not make any file system level changes without confirming that the
   corresponding WAL record is replicated to the standby.
  
 How will you take care of extra WAL on old master during
 recovery.
  If it
   plays the WAL which has not reached new-master, it can be a
 problem.
  
 
  I am trying to understand how there would be extra WAL on old master
  that it would replay and cause inconsistency. Consider how I am
  picturing it and correct me if I am wrong.
 
  1) Master crashes. So a failback standby becomes new master forking
 the
  WAL.
  2) Old master is restarted as a standby (now with this patch,
 without
  a new base backup).
  3) It would try to replay all the WAL it has available and later
  connect to the new master also following the timeline switch (the
  switch might happen using archived WAL and timeline history file OR
  the new switch-over-streaming-replication-connection as of 9.3,
  right?)
 
  * in (3), when the new standby/old master is replaying WAL, from
 where
  is it picking the WAL?
 Yes, this is the point which can lead to inconsistency, new
 standby/old master
 will replay WAL after the last successful checkpoint, for which he
 get info from
 control file. It is picking WAL from the location where it was
 logged when it was active (pg_xlog).
 
  Does it first replay all the WAL in pg_xlog
  before archive? Should we make it check for a timeline history file
 in
  archive before it starts replaying any WAL?
 
  I have really not thought what is best solution for problem.
 
  * And, would the new master, before forking the WAL, replay all the
  WAL that is necessary to come to state (of data directory) that the
  old master was just before it crashed?
 
  I don't think new master has any correlation with old master's data
 directory,
  Rather it will replay the WAL it has received/flushed before start
 acting as master.
 when old master fail over, WAL which ahead of new master might be
 broken data. so that when user want to dump from old master, there is
 possible to fail dump.
 it is just idea, we extend parameter which is used in recovery.conf
 like 'follow_master_force'. this parameter accepts 'on' and 'off', is
 effective only when standby_mode is set to on.

 if both parameters 'follow_master_force' and 'standby_mode' is set to
 'on',
 1. when standby server starts and starts to recovery, standby server
 skip to apply WAL which is in  pg_xlog, and request WAL from latest
 checkpoint LSN to master server.
 2. master server receives LSN which is standby server latest
 checkpoint, and compare between LSN of standby and LSN of master
 latest checkpoint. if those LSN match, master will send WAL from
 latest checkpoint LSN. if not, master will inform standby that failed.
 3. standby will fork WAL, and apply WAL which is sent from master
 continuity.

 Please consider if this solution has the same problem as mentioned by Robert 
 Hass in below mail:
 http://www.postgresql.org/message-id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com


 in this approach, user who want to dump from old master will set 'off'
 to follow_master_force and standby_mode, and gets the dump of old
 master after master started. OTOH, user who want to starts replication
 force will set 'on' to both parameter.

 I think before going into solution of this problem, it should be confirmed by 
 others whether such a problem
 needs to be resolved as part of this patch.

 I have seen that Simon Riggs is a reviewer of this Patch and he hasn't 
 mentioned his views about this problem.
 So I think it's not worth inventing a solution.
 Rather I think if all other things are resolved for this patch, then may be 
 in end we can check with Committer,
 if he thinks that this problem needs to be solved as a separate patch.
Thank you for the feedback.
Yes, we can consider that problem separately, and we need to judge
whether it is worth inventing a solution.
I think that solving the fundamentals of this problem is complex; it
might need a big change to the replication architecture.
So I'm thinking that I'd like to deal with it somehow during
recovery; if we handle it at recovery time, I think the performance
impact is negligible.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-02 Thread Amit Kapila
On Tuesday, July 02, 2013 11:16 AM Amit Kapila wrote:
 On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote:
  On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com
  wrote:
   On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
   Hi,
  
   
So our proposal on this problem is that we must ensure that
  master
   should
not make any file system level changes without confirming that
 the
corresponding WAL record is replicated to the standby.
   
  How will you take care of extra WAL on old master during
  recovery.
   If it
plays the WAL which has not reached new-master, it can be a
  problem.
   
  
   I am trying to understand how there would be extra WAL on old
 master
   that it would replay and cause inconsistency. Consider how I am
   picturing it and correct me if I am wrong.
  
   1) Master crashes. So a failback standby becomes new master
 forking
  the
   WAL.
   2) Old master is restarted as a standby (now with this patch,
  without
   a new base backup).
   3) It would try to replay all the WAL it has available and later
   connect to the new master also following the timeline switch (the
   switch might happen using archived WAL and timeline history file
 OR
   the new switch-over-streaming-replication-connection as of 9.3,
   right?)
  
   * in (3), when the new standby/old master is replaying WAL, from
  where
   is it picking the WAL?
  Yes, this is the point which can lead to inconsistency, new
  standby/old master
  will replay WAL after the last successful checkpoint, for which
 he
  get info from
  control file. It is picking WAL from the location where it was
  logged when it was active (pg_xlog).
  
   Does it first replay all the WAL in pg_xlog
   before archive? Should we make it check for a timeline history
 file
  in
   archive before it starts replaying any WAL?
  
   I have really not thought what is best solution for problem.
  
   * And, would the new master, before forking the WAL, replay all
 the
   WAL that is necessary to come to state (of data directory) that
 the
   old master was just before it crashed?
  
   I don't think new master has any correlation with old master's data
  directory,
   Rather it will replay the WAL it has received/flushed before start
  acting as master.
  when old master fail over, WAL which ahead of new master might be
  broken data. so that when user want to dump from old master, there is
  possible to fail dump.
  it is just idea, we extend parameter which is used in recovery.conf
  like 'follow_master_force'. this parameter accepts 'on' and 'off', is
  effective only when standby_mode is set to on.
 
  if both parameters 'follow_master_force' and 'standby_mode' is set to
  'on',
  1. when standby server starts and starts to recovery, standby server
  skip to apply WAL which is in  pg_xlog, and request WAL from latest
  checkpoint LSN to master server.
  2. master server receives LSN which is standby server latest
  checkpoint, and compare between LSN of standby and LSN of master
  latest checkpoint. if those LSN match, master will send WAL from
  latest checkpoint LSN. if not, master will inform standby that
 failed.
  3. standby will fork WAL, and apply WAL which is sent from master
  continuity.
 
 Please consider if this solution has the same problem as mentioned by
 Robert Hass in below mail:

Sorry, typo; it's Robert Haas's mail:

 http://www.postgresql.org/message-
 id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com
 
 
  in this approach, user who want to dump from old master will set
 'off'
  to follow_master_force and standby_mode, and gets the dump of old
  master after master started. OTOH, user who want to starts
 replication
  force will set 'on' to both parameter.
 
 I think before going into solution of this problem, it should be
 confirmed by others whether such a problem
 needs to be resolved as part of this patch.
 
 I have seen that Simon Riggs is a reviewer of this Patch and he hasn't
 mentioned his views about this problem.
 So I think it's not worth inventing a solution.
 
 Rather I think if all other things are resolved for this patch, then
 may be in end we can check with Committer,
 if he thinks that this problem needs to be solved as a separate patch.





Re: [HACKERS] Patch for fail-back without fresh backup

2013-07-01 Thread Amit Kapila
On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote:
 On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com
 wrote:
  On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
  Hi,
 
  
   So our proposal on this problem is that we must ensure that
 master
  should
   not make any file system level changes without confirming that the
   corresponding WAL record is replicated to the standby.
  
 How will you take care of extra WAL on old master during
 recovery.
  If it
   plays the WAL which has not reached new-master, it can be a
 problem.
  
 
  I am trying to understand how there would be extra WAL on old master
  that it would replay and cause inconsistency. Consider how I am
  picturing it and correct me if I am wrong.
 
  1) Master crashes. So a failback standby becomes new master forking
 the
  WAL.
  2) Old master is restarted as a standby (now with this patch,
 without
  a new base backup).
  3) It would try to replay all the WAL it has available and later
  connect to the new master also following the timeline switch (the
  switch might happen using archived WAL and timeline history file OR
  the new switch-over-streaming-replication-connection as of 9.3,
  right?)
 
  * in (3), when the new standby/old master is replaying WAL, from
 where
  is it picking the WAL?
 Yes, this is the point which can lead to inconsistency, new
 standby/old master
 will replay WAL after the last successful checkpoint, for which he
 get info from
 control file. It is picking WAL from the location where it was
 logged when it was active (pg_xlog).
 
  Does it first replay all the WAL in pg_xlog
  before archive? Should we make it check for a timeline history file
 in
  archive before it starts replaying any WAL?
 
  I have really not thought what is best solution for problem.
 
  * And, would the new master, before forking the WAL, replay all the
  WAL that is necessary to come to state (of data directory) that the
  old master was just before it crashed?
 
  I don't think new master has any correlation with old master's data
 directory,
  Rather it will replay the WAL it has received/flushed before start
 acting as master.
 when old master fail over, WAL which ahead of new master might be
 broken data. so that when user want to dump from old master, there is
 possible to fail dump.
 it is just idea, we extend parameter which is used in recovery.conf
 like 'follow_master_force'. this parameter accepts 'on' and 'off', is
 effective only when standby_mode is set to on.
 
 if both parameters 'follow_master_force' and 'standby_mode' is set to
 'on',
 1. when standby server starts and starts to recovery, standby server
 skip to apply WAL which is in  pg_xlog, and request WAL from latest
 checkpoint LSN to master server.
 2. master server receives LSN which is standby server latest
 checkpoint, and compare between LSN of standby and LSN of master
 latest checkpoint. if those LSN match, master will send WAL from
 latest checkpoint LSN. if not, master will inform standby that failed.
 3. standby will fork WAL, and apply WAL which is sent from master
 continuity.

Please consider if this solution has the same problem as mentioned by Robert 
Hass in below mail:
http://www.postgresql.org/message-id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com


 in this approach, user who want to dump from old master will set 'off'
 to follow_master_force and standby_mode, and gets the dump of old
 master after master started. OTOH, user who want to starts replication
 force will set 'on' to both parameter.

I think before going into solution of this problem, it should be confirmed by 
others whether such a problem
needs to be resolved as part of this patch.

I have seen that Simon Riggs is a reviewer of this Patch and he hasn't 
mentioned his views about this problem.
So I think it's not worth inventing a solution. 

Rather I think if all other things are resolved for this patch, then may be in 
end we can check with Committer, 
if he thinks that this problem needs to be solved as a separate patch.

With Regards,
Amit Kapila.





Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-27 Thread Sawada Masahiko
On Mon, Jun 24, 2013 at 10:47 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 1. synchronous standby and make same as failback safe standby
 2. asynchronous standby and make same as failback safe standby

 in above case, adding new parameter might be meaningless. but I think
 that we should handle case not only case 1,2  but also following case
 3, 4 for DR.

To support cases 1 and 2, I'm thinking of the following 2 further ideas.


We add synchronous_transfer (commit / data_flush / all). This GUC will
only affect the standbys mentioned in the list of
synchronous_standby_names.

1. If synchronous_transfer is set to commit, current synchronous
replication behavior is achieved
2. If synchronous_transfer is set to data_flush, the standbys named in
synchronous_standby_names will act as ASYNC failback safe standbys
3. If synchronous_transfer is set to all, the standbys named in
synchronous_standby_names will act as SYNC failback safe standbys

In this approach, 3 is confusing because we are actually setting up an
ASYNC standby by using the GUCs meant for sync standby setup.
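
As a concrete sketch, the first idea would be configured like this
(synchronous_transfer is a proposed GUC, not an existing one):

synchronous_standby_names = 'slave1'
synchronous_transfer = data_flush    # commit | data_flush | all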

-
We extend synchronous_commit so that it also accepts a value like
'all' (this approach doesn't provide a 'synchronous_transfer'
parameter). The 'all' value means that the master waits not only for
replicated WAL but also for replicated data pages (e.g., CLOG,
pg_control), and the master changes its behavior depending on whether
the standby is connected as sync or async.

1. If synchronous_commit is set to 'all' and synchronous_standby_names
is set to the standby name, the standbys named in
synchronous_standby_names will act as SYNC failback-safe standbys.
2. If synchronous_commit is set to 'all' and synchronous_standby_names
is NOT set to the standby name, the standbys connecting to the master
will act as ASYNC failback-safe standbys.

One problem with not naming ASYNC standbys explicitly is that the
master has no clue which standby to wait on.
If it chooses to wait on all async standbys for failback-safety, that
can be quite detrimental, especially because async standbys can easily
become unreachable if they are on a slow link or at a remote location.
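
For comparison, a sketch of the second idea (the 'all' value for
synchronous_commit is proposed here, not existing behavior):

synchronous_commit = all               # proposed extra value
synchronous_standby_names = 'slave1'   # named: SYNC failback-safe standby
# with synchronous_standby_names empty, every connected standby would
# act as an ASYNC failback-safe standby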

please give me feedback.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-27 Thread Robert Haas
On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

 ...which still doesn't make sense to me. Let's look at that in detail.

 Take 3 servers, A, B, C with A and B being linked by sync rep, and C
 being safety standby at a distance.

 Either A or B is master, except in disaster. So if A is master, then B
 would be the failover target. If A fails, then you want to failover to
 B. Once B is the target, you want to failback to A as the master. C
 needs to follow the new master, whichever it is.

 If you set up sync rep between A and B and this new mode between A and
 C. When B becomes the master, you need to failback from B to A, but
 you can't because the new mode applied between A and C only, so you
 have to failback from C to A. So having the new mode not match with
 sync rep means you are forcing people to failback using the slow link
 in the common case.

It's true that in this scenario that doesn't really make sense, but I
still think they are separate properties.  You could certainly want
synchronous replication without this new property, if you like the
data-loss guarantees that sync rep provides but don't care about
failback.  You could also want this new property without synchronous
replication, if you don't need the data-loss guarantees that sync rep
provides but you do care about fast failback.  I admit it seems
unlikely that you would use both features but not target them at the
same machines, although maybe: perhaps you have a sync standby and an
async standby and want this new property with respect to both of them.

In my admittedly limited experience, the use case for a lot of this
technology is in the cloud.  The general strategy seems to be: at the
first sign of trouble, kill the offending instance and fail over.
This can result in failing over pretty frequently, and needing it to
be fast.  There may be no real hardware problem; indeed, the failover
may be precipitated by network conditions or overload of the physical
host backing the virtual machine or any number of other nonphysical
problems.  I can see this being useful in that environment, even for
async standbys.  People can apparently tolerate a brief interruption
while their primary gets killed off and connections are re-established
with the new master, but they need the failover to be fast.  The
problem with the status quo is that, even if the first failover is
fast, the second one isn't, because it has to wait behind rebuilding
the original master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-27 Thread Sawada Masahiko
On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
 Hi,

 
  So our proposal on this problem is that we must ensure that master
 should
  not make any file system level changes without confirming that the
  corresponding WAL record is replicated to the standby.
 
How will you take care of extra WAL on old master during recovery.
 If it
  plays the WAL which has not reached new-master, it can be a problem.
 

 I am trying to understand how there would be extra WAL on old master
 that it would replay and cause inconsistency. Consider how I am
 picturing it and correct me if I am wrong.

 1) Master crashes. So a failback standby becomes new master forking the
 WAL.
 2) Old master is restarted as a standby (now with this patch, without
 a new base backup).
 3) It would try to replay all the WAL it has available and later
 connect to the new master also following the timeline switch (the
 switch might happen using archived WAL and timeline history file OR
 the new switch-over-streaming-replication-connection as of 9.3,
 right?)

 * in (3), when the new standby/old master is replaying WAL, from where
 is it picking the WAL?
Yes, this is the point which can lead to inconsistency, new standby/old 
 master
will replay WAL after the last successful checkpoint, for which he get 
 info from
control file. It is picking WAL from the location where it was logged when 
 it was active (pg_xlog).

 Does it first replay all the WAL in pg_xlog
 before archive? Should we make it check for a timeline history file in
 archive before it starts replaying any WAL?

 I have really not thought what is best solution for problem.

 * And, would the new master, before forking the WAL, replay all the
 WAL that is necessary to come to state (of data directory) that the
 old master was just before it crashed?

 I don't think new master has any correlation with old master's data directory,
 Rather it will replay the WAL it has received/flushed before start acting as 
 master.
When the old master fails over, WAL that is ahead of the new master
might contain broken data, so when a user wants to take a dump from
the old master, the dump may fail.
It is just an idea: we extend the parameters used in recovery.conf
with something like 'follow_master_force'. This parameter accepts 'on'
and 'off' and is effective only when standby_mode is set to on.

If both the 'follow_master_force' and 'standby_mode' parameters are
set to 'on':
1. when the standby server starts and begins recovery, it skips
applying the WAL in pg_xlog and requests WAL from its latest
checkpoint LSN from the master server.
2. the master server receives the standby's latest checkpoint LSN and
compares it with the master's latest checkpoint LSN. If those LSNs
match, the master will send WAL from the latest checkpoint LSN; if
not, the master will inform the standby that it failed.
3. the standby will fork the WAL and apply the WAL sent from the
master continuously.
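
In recovery.conf this would look like the following
(follow_master_force is only a proposed parameter):

standby_mode = 'on'
follow_master_force = 'on'   # skip the WAL in local pg_xlog; request
                             # WAL from the latest checkpoint LSN instead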

In this approach, a user who wants to dump from the old master will
set 'off' for follow_master_force and standby_mode, and take the dump
of the old master after the master has started. OTOH, a user who wants
to force replication to start will set 'on' for both parameters.

please give me feedback.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-25 Thread Sawada Masahiko
On Tue, Jun 25, 2013 at 12:19 PM, Pavan Deolasee
pavan.deola...@gmail.com wrote:



 On Mon, Jun 24, 2013 at 7:17 PM, Sawada Masahiko sawada.m...@gmail.com
 wrote:



 
 [Server]
 standby_name = 'slave1'
 synchronous_transfer = commit
 wal_sender_timeout = 30
 [Server]
 standby_name = 'slave2'
 synchronous_transfer = all
 wal_sender_timeout = 50
 ---


 What different values/modes you are thinking for synchronous_transfer ? IMHO
 only commit and all may not be enough. As I suggested upthread, we may
 need an additional mode, say data, which will ensure synchronous WAL
 transfer before making any file system changes. We need this separate mode
 because the failback safe (or whatever we call it) standby need not wait on
 the commits and it's important to avoid that wait since it comes in a direct
 path of client transactions.

 If we are doing it, I wonder if an additional mode none also makes sense
 so that users can also control asynchronous standbys via the same mechanism.

I made a mistake in how the parameter names synchronous_transfer and
failback_safe_standby_mode are used.
It means that we control file system changes using
failback_safe_standby_mode: if failback_safe_standby_mode is set to
'remote_flush', the master server waits for all data pages (e.g.,
CLOG, pg_control) to be flushed on the standby server.
Right?
For example:
--
[server]
standby_name = 'slave1'
failback_safe_standby_mode = remote_flush
wal_sender_timeout = 50
--

In this case, we should also set synchronous_commit and
synchronous_level for each standby server. That is, do we need to set
the following 3 parameters to support cases 3 and 4 as I said?
-synchronous_commit = on/off/local/remote_write
-failback_safe_standby_mode = off/remote_write/remote_flush
-synchronous_level = sync/async (this parameter indicates whether the
standby server is connected in sync or async mode.)
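
For example, case 3 (a SYNC standby plus a separate ASYNC
failback-safe standby) might then be written as follows (all three
parameters are proposals under discussion):

[server]
standby_name = 'slave1'
synchronous_commit = on
failback_safe_standby_mode = off
synchronous_level = sync
[server]
standby_name = 'slave2'
failback_safe_standby_mode = remote_flush
synchronous_level = async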

please give me your feedback.

Regards,
---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-25 Thread Amit Kapila
On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
 Hi,
 
 
  So our proposal on this problem is that we must ensure that master
 should
  not make any file system level changes without confirming that the
  corresponding WAL record is replicated to the standby.
 
How will you take care of extra WAL on old master during recovery.
 If it
  plays the WAL which has not reached new-master, it can be a problem.
 
 
 I am trying to understand how there would be extra WAL on old master
 that it would replay and cause inconsistency. Consider how I am
 picturing it and correct me if I am wrong.
 
 1) Master crashes. So a failback standby becomes new master forking the
 WAL.
 2) Old master is restarted as a standby (now with this patch, without
 a new base backup).
 3) It would try to replay all the WAL it has available and later
 connect to the new master also following the timeline switch (the
 switch might happen using archived WAL and timeline history file OR
 the new switch-over-streaming-replication-connection as of 9.3,
 right?)
 
 * in (3), when the new standby/old master is replaying WAL, from where
 is it picking the WAL?
   Yes, this is the point which can lead to inconsistency: the new
   standby/old master will replay WAL after the last successful
   checkpoint, for which it gets the info from the control file. It
   picks up the WAL from the location where it was logged when it was
   active (pg_xlog).

 Does it first replay all the WAL in pg_xlog
 before archive? Should we make it check for a timeline history file in
 archive before it starts replaying any WAL?

I have really not thought about what the best solution for this problem is.

 * And, would the new master, before forking the WAL, replay all the
 WAL that is necessary to come to state (of data directory) that the
 old master was just before it crashed?

I don't think the new master has any correlation with the old master's
data directory. Rather, it will replay the WAL it has received/flushed
before it starts acting as master.

With Regards,
Amit Kapila.





Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-24 Thread Sawada Masahiko
On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

 err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where master can wait for synchronous transfers at other than
 commits. May we can solve that by having more granular control to the said
 parameter ?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between synchronous and the new kind of
 standby ?

 That's not the point. The point is Why would we have a new kind of
 standby? and therefore why do we need new parameters?

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

 ...which still doesn't make sense to me. Let's look at that in detail.

 Take 3 servers, A, B, C with A and B being linked by sync rep, and C
 being safety standby at a distance.

 Either A or B is master, except in disaster. So if A is master, then B
 would be the failover target. If A fails, then you want to failover to
 B. Once B is the target, you want to failback to A as the master. C
 needs to follow the new master, whichever it is.

 If you set up sync rep between A and B and this new mode between A and
 C. When B becomes the master, you need to failback from B to A, but
 you can't because the new mode applied between A and C only, so you
 have to failback from C to A. So having the new mode not match with
 sync rep means you are forcing people to failback using the slow link
 in the common case.

 You might observe that having the two modes match causes problems if A
 and B fail, so you are forced to go to C as master and then eventually
 failback to A or B across a slow link. That case is less common and
 could be solved by extending sync transfer to more/multi nodes.

 It definitely doesn't make sense to have sync rep on anything other
 than a subset of sync transfer. So while it may be sensible in the
 future to make sync transfer a superset of sync rep nodes, it makes
 sense to make them the same config for now.

When 2 servers are in synchronous replication, those servers are in
the same location in many cases (e.g., the same server room),
so taking a full backup and sending it to the old master is not an
issue.
This proposal works for the situation where those servers are in
remote locations and the main site is powered down due to, for
example, a power failure or a natural disaster.
As you said, we can control file (e.g., CLOG, pg_control, etc.)
replication by adding the synchronous_transfer option,
but if we add only this parameter, we can handle only the following 2
cases.

1. a synchronous standby that is also made a failback-safe standby
2. an asynchronous standby that is also made a failback-safe standby

In the above cases, adding a new parameter might be meaningless, but I
think that we should handle not only cases 1 and 2 but also the
following cases 3 and 4 for DR:

3. a synchronous standby plus a different asynchronous failback-safe standby
4. an asynchronous standby plus a different asynchronous failback-safe standby

To handle cases 3 and 4, we should be able to set parameters per
standby, so we need to add a new parameter.
If we can structure replication in such situations, replication would
be more useful for users on slow links.

The parameter improvement idea is to extend an ini-style file so that
parameters can be set per standby. For example:


[Server]
standby_name = 'slave1'
synchronous_transfer = commit
wal_sender_timeout = 30
[Server]
standby_name = 'slave2'
synchronous_transfer = all
wal_sender_timeout = 50
---

There have been discussions about such an ini file in the past. With
it, we can set each parameter for each standby.

please give me feedback.

Regards,
---
Sawada Masahiko




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-24 Thread Amit Langote
Hi,

 parameter improvement idea is which we extend ini file for to set
 parameter each standby. For example :

 
 [Server]
 standby_name = 'slave1'
 synchronous_transfer = commit
 wal_sender_timeout = 30
 [Server]
 standby_name = 'slave2'
 synchronous_transfer = all
 wal_sender_timeout = 50
 ---


Just asking to clarify:

Is 'slave2' a failback standby?
What does 'synchronous_transfer = all' mean? Does it mean waiting
during both commit and checkpoint?



--
Amit Langote




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-24 Thread Pavan Deolasee
On Mon, Jun 24, 2013 at 7:17 PM, Sawada Masahiko sawada.m...@gmail.comwrote:



 
 [Server]
 standby_name = 'slave1'
 synchronous_transfer = commit
 wal_sender_timeout = 30
 [Server]
 standby_name = 'slave2'
 synchronous_transfer = all
 wal_sender_timeout = 50
 ---


What different values/modes are you thinking of for synchronous_transfer?
IMHO only commit and all may not be enough. As I suggested upthread, we
may need an additional mode, say data, which will ensure synchronous WAL
transfer before making any file system changes. We need this separate mode
because the failback safe (or whatever we call it) standby need not wait on
the commits and it's important to avoid that wait since it comes in a
direct path of client transactions.

If we are doing it, I wonder if an additional mode none also makes sense
so that users can also control asynchronous standbys via the same mechanism.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-24 Thread Amit Langote
Hi,


 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.

   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.


I am trying to understand how there would be extra WAL on old master
that it would replay and cause inconsistency. Consider how I am
picturing it and correct me if I am wrong.

1) Master crashes. So a failback standby becomes new master forking the WAL.
2) Old master is restarted as a standby (now with this patch, without
a new base backup).
3) It would try to replay all the WAL it has available and later
connect to the new master also following the timeline switch (the
switch might happen using archived WAL and timeline history file OR
the new switch-over-streaming-replication-connection as of 9.3,
right?)

* in (3), when the new standby/old master is replaying WAL, from where
is it picking the WAL? Does it first replay all the WAL in pg_xlog
before archive? Should we make it check for a timeline history file in
archive before it starts replaying any WAL?

* And, would the new master, before forking the WAL, replay all the
WAL that is necessary to come to state (of data directory) that the
old master was just before it crashed?

Am I missing something here?

--
Amit Langote




Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-20 Thread Amit Kapila
On Wednesday, June 19, 2013 10:45 PM Sawada Masahiko wrote:
On Tuesday, June 18, 2013, Amit Kapila wrote:
On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote:
 On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com
 wrote:
  On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
  On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila
 amit.kap...@huawei.com wrote:
 
  On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
  On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila
 amit.kap...@huawei.com wrote:
  On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
  Hello,
 

 I think that we can dumping data before all WAL files deleting.  All
 WAL files deleting is done when old master starts as new standby.

  Can we dump data without starting server?
 Sorry, I made a mistake. We can't do it.

  this proposed patch needs to be able to also handle such a scenario in the future.

I am not sure the proposed patch can handle it so easily, but I think
if others also feel it is important, then a method should be provided
to the user for extracting the last committed data.


With Regards,
Amit Kapila.





Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-19 Thread Sawada Masahiko
On Tuesday, June 18, 2013, Amit Kapila wrote:

 On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote:
  On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila 
  amit.kap...@huawei.com
 
  wrote:
   On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
   On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila
  amit.kap...@huawei.com wrote:
  
   On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
   On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila
  amit.kap...@huawei.com wrote:
   On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
   Hello,
  
   We have already started a discussion on pgsql-hackers for the
  problem of
   taking fresh backup during the failback operation here is the
  link for that:
  
  
   http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-
  6OzRaew5pWhk7yQtb
   jgwrfu513...@mail.gmail.com
  
   Let me again summarize the problem we are trying to address.
  
  
 How will you take care of extra WAL on old master during
  recovery. If it
   plays the WAL which has not reached new-master, it can be a
  problem.
  
   you means that there is possible that old master's data ahead of
  new
   master's data.
  
 I mean to say is that WAL of old master can be ahead of new
  master. I understood that
 data files of old master can't be ahead, but I think WAL can be
  ahead.
  
   so there is inconsistent data between those server when fail back.
  right?
   if so , there is not possible inconsistent. because if you use GUC
  option
   as his propose (i.g., failback_safe_standby_mode = remote_flush),
   when old master is working fine, all file system level changes
  aren't
   done  before WAL replicated.
  
   Would the propose patch will take care that old master's WAL is
  also not ahead in some way?
   If yes, I think i am missing some point.
  
   yes it will happen that old master's WAL ahead of new master's WAL
  as you said.
   but I think that we can solve them by delete all WAL file when old
   master starts as new standby.
  
   I think ideally, it should reset WAL location at the point where new
  master has forrked off.
   In such a scenario it would be difficult for user who wants to get a
  dump of some data in
   old master which hasn't gone to new master. I am not sure if such a
  need is there for real users, but if it
   is there, then providing this solution will have some drawbacks.

  I think that we can dumping data before all WAL files deleting.  All
  WAL files deleting is done when old master starts as new standby.

   Can we dump data without starting server?

 Sorry, I made a mistake. We can't do it.

 this proposed patch needs to be able to also handle such a scenario
in the future.

Regards,

---
Sawada Masahiko


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-18 Thread Sawada Masahiko
On Tuesday, June 18, 2013, Amit Kapila wrote:

 On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote:
  On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila 
  amit.kap...@huawei.com
 
  wrote:
   On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
   On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila
  amit.kap...@huawei.com wrote:
  
   On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
   On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila
  amit.kap...@huawei.com wrote:
   On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
   Hello,
  
   We have already started a discussion on pgsql-hackers for the
  problem of
   taking fresh backup during the failback operation here is the
  link for that:
  
  
   http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-
  6OzRaew5pWhk7yQtb
   jgwrfu513...@mail.gmail.com
  
   Let me again summarize the problem we are trying to address.
  
  
 How will you take care of extra WAL on old master during
  recovery. If it
   plays the WAL which has not reached new-master, it can be a
  problem.
  
   you means that there is possible that old master's data ahead of
  new
   master's data.
  
 I mean to say is that WAL of old master can be ahead of new
  master. I understood that
 data files of old master can't be ahead, but I think WAL can be
  ahead.
  
   so there is inconsistent data between those server when fail back.
  right?
   if so , there is not possible inconsistent. because if you use GUC
  option
   as his propose (i.g., failback_safe_standby_mode = remote_flush),
   when old master is working fine, all file system level changes
  aren't
   done  before WAL replicated.
  
   Would the propose patch will take care that old master's WAL is
  also not ahead in some way?
   If yes, I think i am missing some point.
  
   yes it will happen that old master's WAL ahead of new master's WAL
  as you said.
   but I think that we can solve them by delete all WAL file when old
   master starts as new standby.
  
   I think ideally, it should reset WAL location at the point where new
  master has forrked off.
   In such a scenario it would be difficult for user who wants to get a
  dump of some data in
   old master which hasn't gone to new master. I am not sure if such a
  need is there for real users, but if it
   is there, then providing this solution will have some drawbacks.

  I think that we can dumping data before all WAL files deleting.  All
  WAL files deleting is done when old master starts as new standby.

   Can we dump data without starting server?

 Sorry, I made a mistake. We can't do that.

 The proposed patch needs to be able to handle such scenarios in the
future as well.

Regards,

---
Sawada Masahiko


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Samrat Revagade
On Sun, Jun 16, 2013 at 11:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 16 June 2013 17:25, Samrat Revagade revagade.sam...@gmail.com wrote:
 
 
  On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
 
 
  So I strongly object to calling this patch anything to do with
  failback safe. You simply don't have enough data to make such a bold
  claim. (Which is why we call it synchronous replication and not zero
  data loss, for example).
 
  But that's not the whole story. I can see some utility in a patch that
  makes all WAL transfer synchronous, rather than just commits. Some
  name like synchronous_transfer might be appropriate. e.g.
  synchronous_transfer = all | commit (default).
 
 
  I agree with you about the fact that,
  Now a days the need of fresh backup in crash recovery  seems to be a
 major
  problem.
  we might need to change the name of patch if there other problems too
 with
  crash recovery.

 (Sorry don't understand)

  Sorry for the confusion. I will change the name of the patch.


  The idea of another slew of parameters that are very similar to
  synchronous replication but yet somehow different seems weird. I can't
  see a reason why we'd want a second lot of parameters. Why not just
  use the existing ones for sync rep? (I'm surprised the Parameter
  Police haven't visited you in the night...) Sure, we might want to
  expand the design for how we specify multi-node sync rep, but that is
  a different patch.
 
 
  The different set of parameters are needed to differentiate between
  fail-safe standby and synchronous standby, the fail-safe standby and
 standby
  in synchronous replication can be two different servers.

 Why would they be different? What possible reason would you have for
 that config? There is no *need* for those parameters, the proposal
 could work perfectly well without them.

 Let's make this patch fulfill the stated objectives, not add in
 optional extras, especially ones that don't appear well thought
 through. If you wish to enhance the design for the specification of
 multi-node sync rep, make that a separate patch, later.

 I agree with you. I will remove the extra parameters if they are not
required in the next version of the patch.

-- 
Regards,

Samrat Revgade


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Pavan Deolasee
On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote:



 My perspective is that if the master crashed, assuming that you know
 everything about that and suddenly jumping back on seems like a recipe
 for disaster. Attempting that is currently blocked by the technical
 obstacles you've identified, but that doesn't mean they are the only
 ones - we don't yet understand what all the lurking problems might be.
 Personally, I won't be following you onto that minefield anytime soon.


Would it be fair to say that a user will be willing to trust her crashed
master in all scenarios where she would have done so in a single-instance
setup? IOW, without the replication setup, AFAIU users have traditionally
trusted WAL recovery to recover from failed instances. This would include
some common failures such as power outages and hardware failures, but may
not include others such as on-disk corruption.


 So I strongly object to calling this patch anything to do with
 failback safe. You simply don't have enough data to make such a bold
 claim. (Which is why we call it synchronous replication and not zero
 data loss, for example).


I agree. We should probably find a better name for this. Any suggestions?


 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).


It's an interesting idea, but I think there is some difference here. For
example, the proposed feature allows a backend to wait at points other than
commit. Since commits are more foreground in nature and this feature does
not require us to wait during common foreground activities, we want a
configuration where the master can wait for synchronous transfers at points
other than commits. Maybe we can solve that by having more granular control
over the said parameter?
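
To make the granularity idea concrete, here is a minimal sketch of what a
three-valued GUC could look like, using PostgreSQL's config_enum_entry
machinery. The names synchronous_transfer, SYNCHRONOUS_TRANSFER_* and the
data_flush value are illustrative assumptions, not part of any posted patch:

#include "utils/guc.h"

/* Hypothetical wait levels; only commit-time waits exist today. */
typedef enum
{
	SYNCHRONOUS_TRANSFER_COMMIT,		/* wait for standby only at commit */
	SYNCHRONOUS_TRANSFER_DATA_FLUSH,	/* also wait before data-file writes */
	SYNCHRONOUS_TRANSFER_ALL			/* wait at every WAL flush point */
} SynchronousTransferLevel;

/* Mapping from GUC strings to the enum above. */
static const struct config_enum_entry synchronous_transfer_options[] = {
	{"commit", SYNCHRONOUS_TRANSFER_COMMIT, false},
	{"data_flush", SYNCHRONOUS_TRANSFER_DATA_FLUSH, false},
	{"all", SYNCHRONOUS_TRANSFER_ALL, false},
	{NULL, 0, false}
};

int			synchronous_transfer = SYNCHRONOUS_TRANSFER_COMMIT;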


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


How would we then distinguish between a synchronous standby and the new kind
of standby? I am told one of the very popular setups for DR is to have one
local sync standby and one async standby (possibly cascaded from the local
sync one). Since this new feature is more useful for DR, because taking a
fresh backup over a slower link is even more challenging, IMHO we should
support such setups.



 I'm worried to see that adding this feature and yet turning it off
 causes a measureable drop in performance. I don't think we want that
 at all. That clearly needs more work and thought.


I agree. We need to repeat those tests. I don't trust that turning the
feature off is causing a 1-2% drop. In one of the tests, I see that turning
the feature on shows better numbers compared to when it's turned off. That's
clearly noise, or we need a concrete argument to be convinced that way.


 I also think your performance results are somewhat bogus. Fast
 transaction workloads were already mostly commit waits -


But not in the case of an async standby, right?


 measurements
 of what happens to large loads, index builds etc would likely reveal
 something quite different.


I agree. I also feel we need tests where FlushBuffer gets called more often
by normal backends, to see how much the added wait in that code path hurts
performance. Another important thing to test would be how it works on
slower/high-latency links.


 I'm tempted by the thought that we should put the WaitForLSN inside
 XLogFlush, rather than scatter additional calls everywhere and then
 have us inevitably miss one.


That indeed seems cleaner.
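
For illustration, a minimal sketch of that idea, assuming a WaitForLSN()
helper that blocks until the standby has flushed WAL up to the given LSN
(the helper name and the GUC test are assumptions, not committed code):

/*
 * Hypothetical sketch: have XLogFlush itself wait for replication, so
 * every caller that flushes WAL to local disk also waits for the standby.
 */
void
XLogFlush(XLogRecPtr record)
{
	/* ... existing logic that flushes WAL up to 'record' locally ... */

	/* Assumed: in the new mode, also wait for the standby to flush. */
	if (synchronous_transfer == SYNCHRONOUS_TRANSFER_ALL)
		WaitForLSN(record);
}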

Thanks,
Pavan


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Simon Riggs
On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote:

 I agree. We should probably find a better name for this. Any suggestions ?

err, I already made one...

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).

 Since commits are more foreground in nature and this feature
 does not require us to wait during common foreground activities, we want a
 configuration where master can wait for synchronous transfers at other than
 commits. May we can solve that by having more granular control to the said
 parameter ?


 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 How would we then distinguish between synchronous and the new kind of
 standby ?

That's not the point. The point is: why would we have a new kind of
standby, and therefore why do we need new parameters?

 I am told, one of the very popular setups for DR is to have one
 local sync standby and one async (may be cascaded by the local sync). Since
 this new feature is more useful for DR because taking a fresh backup on a
 slower link is even more challenging, IMHO we should support such setups.

...which still doesn't make sense to me. Let's look at that in detail.

Take 3 servers, A, B, C with A and B being linked by sync rep, and C
being safety standby at a distance.

Either A or B is master, except in disaster. So if A is master, then B
would be the failover target. If A fails, then you want to failover to
B. Once B is the target, you want to failback to A as the master. C
needs to follow the new master, whichever it is.

Suppose you set up sync rep between A and B and this new mode between A and
C. When B becomes the master, you need to fail back from B to A, but you
can't, because the new mode applied between A and C only, so you have to
fail back from C to A. So having the new mode not match sync rep means you
are forcing people to fail back using the slow link in the common case.

You might observe that having the two modes match causes problems if A
and B fail, so you are forced to go to C as master and then eventually
fail back to A or B across a slow link. That case is less common and
could be solved by extending sync transfer to more/multiple nodes.

It definitely doesn't make sense to have sync rep on anything other
than a subset of sync transfer. So while it may be sensible in the
future to make sync transfer a superset of sync rep nodes, it makes
sense to make them the same config for now.
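
Expressed as a hypothetical postgresql.conf fragment on A (reusing the
synchronous_transfer name proposed earlier; it is not an existing
parameter), "the same config" means:

synchronous_standby_names = 'B'   # B is both the sync rep standby and the failover target
synchronous_transfer = all        # data-file writes also wait for that same standby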

Phew

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Sawada Masahiko
On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com wrote:
 On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
 On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
 On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for 
 that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.


   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

 you means that there is possible that old master's data ahead of new
 master's data.

   I mean to say is that WAL of old master can be ahead of new master. I 
 understood that
   data files of old master can't be ahead, but I think WAL can be ahead.

 so there is inconsistent data between those server when fail back. right?
 if so , there is not possible inconsistent. because if you use GUC option
 as his propose (i.g., failback_safe_standby_mode = remote_flush),
 when old master is working fine, all file system level changes aren't
 done  before WAL replicated.

 Would the propose patch will take care that old master's WAL is also not 
 ahead in some way?
 If yes, I think i am missing some point.

 yes it will happen that old master's WAL ahead of new master's WAL as you 
 said.
 but I think that we can solve them by delete all WAL file when old
 master starts as new standby.

 I think ideally, it should reset WAL location at the point where new master 
 has forrked off.
 In such a scenario it would be difficult for user who wants to get a dump of 
 some data in
 old master which hasn't gone to new master. I am not sure if such a need is 
 there for real users, but if it
 is there, then providing this solution will have some drawbacks.
I think that we can dump the data before deleting all the WAL files.  The
deletion of all WAL files is done when the old master starts as a new standby.


Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-17 Thread Amit Kapila
On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote:
 On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com
 wrote:
  On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
  On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila
 amit.kap...@huawei.com wrote:
 
  On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
  On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila
 amit.kap...@huawei.com wrote:
  On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
  Hello,
 
  We have already started a discussion on pgsql-hackers for the
 problem of
  taking fresh backup during the failback operation here is the
 link for that:
 
 
  http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-
 6OzRaew5pWhk7yQtb
  jgwrfu513...@mail.gmail.com
 
  Let me again summarize the problem we are trying to address.
 
 
How will you take care of extra WAL on old master during
 recovery. If it
  plays the WAL which has not reached new-master, it can be a
 problem.
 
  you means that there is possible that old master's data ahead of
 new
  master's data.
 
I mean to say is that WAL of old master can be ahead of new
 master. I understood that
data files of old master can't be ahead, but I think WAL can be
 ahead.
 
  so there is inconsistent data between those server when fail back.
 right?
  if so , there is not possible inconsistent. because if you use GUC
 option
  as his propose (i.g., failback_safe_standby_mode = remote_flush),
  when old master is working fine, all file system level changes
 aren't
  done  before WAL replicated.
 
  Would the propose patch will take care that old master's WAL is
 also not ahead in some way?
  If yes, I think i am missing some point.
 
  yes it will happen that old master's WAL ahead of new master's WAL
 as you said.
  but I think that we can solve them by delete all WAL file when old
  master starts as new standby.
 
  I think ideally, it should reset WAL location at the point where new
 master has forrked off.
  In such a scenario it would be difficult for user who wants to get a
 dump of some data in
  old master which hasn't gone to new master. I am not sure if such a
 need is there for real users, but if it
  is there, then providing this solution will have some drawbacks.

 I think that we can dumping data before all WAL files deleting.  All
 WAL files deleting is done when old master starts as new standby.

  Can we dump data without starting the server?

With Regards,
Amit Kapila.  



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-16 Thread Simon Riggs
On 14 June 2013 17:21, Jeff Davis pg...@j-davis.com wrote:
 On Fri, 2013-06-14 at 16:10 +0200, Andres Freund wrote:
 Jeff Davis has a patch pending
 (1365493015.7580.3240.camel@sussancws0025) that passes the buffer_std
 flag down to MarkBufferDirtyHint() for exactly that reason. I thought we
 were on track committing that, but rereading the thread it doesn't look
 that way.

 Jeff, care to update that patch?

 Rebased and attached. Changed so all callers use buffer_std=true except
 those in freespace.c and fsmpage.c.

 Simon, did you (or anyone else) have an objection to this patch? If not,
 I'll go ahead and commit it tomorrow morning.

I didn't have a specific objection to the patch, I just wanted to
minimise change relating to this so we didn't introduce further bugs.

I've no objection to you committing that.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-16 Thread Simon Riggs
On 14 June 2013 10:11, Samrat Revagade revagade.sam...@gmail.com wrote:

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:

 http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com


 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.

 1. The main objection was raised by Tom and others is that we should not add
 this feature and should go with traditional way of taking fresh backup using
 the rsync, because he was concerned about the additional complexity of the
 patch and the performance overhead during normal operations.

 2. Tom and others were also worried about the inconsistencies in the crashed
 master and suggested that its better to start with a fresh backup. Fujii
 Masao and others correctly countered that suggesting that we trust WAL
 recovery to clear all such inconsistencies and there is no reason why we
 can't do the same here.

 So the patch is showing 1-2% performance overhead.


Let's have a look at this...

The objections you summarise that Tom has made are ones that I agree
with. I also don't think that Fujii correctly countered those
objections.

My perspective is that if the master crashed, assuming that you know
everything about that and suddenly jumping back on seems like a recipe
for disaster. Attempting that is currently blocked by the technical
obstacles you've identified, but that doesn't mean they are the only
ones - we don't yet understand what all the lurking problems might be.
Personally, I won't be following you onto that minefield anytime soon.

So I strongly object to calling this patch anything to do with
failback safe. You simply don't have enough data to make such a bold
claim. (Which is why we call it synchronous replication and not zero
data loss, for example).

But that's not the whole story. I can see some utility in a patch that
makes all WAL transfer synchronous, rather than just commits. Some
name like synchronous_transfer might be appropriate. e.g.
synchronous_transfer = all | commit (default).

The idea of another slew of parameters that are very similar to
synchronous replication but yet somehow different seems weird. I can't
see a reason why we'd want a second lot of parameters. Why not just
use the existing ones for sync rep? (I'm surprised the Parameter
Police haven't visited you in the night...) Sure, we might want to
expand the design for how we specify multi-node sync rep, but that is
a different patch.

I'm worried to see that adding this feature and yet turning it off
causes a measurable drop in performance. I don't think we want that
at all. That clearly needs more work and thought.

I also think your performance results are somewhat bogus. Fast
transaction workloads were already mostly commit waits - measurements
of what happens to large loads, index builds etc would likely reveal
something quite different.

I'm tempted by the thought that we should put the WaitForLSN inside
XLogFlush, rather than scatter additional calls everywhere and then
have us inevitably miss one.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-16 Thread Samrat Revagade
On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote:



 So I strongly object to calling this patch anything to do with
 failback safe. You simply don't have enough data to make such a bold
 claim. (Which is why we call it synchronous replication and not zero
 data loss, for example).

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).


I agree with you about the fact that nowadays the need for a fresh backup
in crash recovery seems to be a major problem.
We might need to change the name of the patch if there are other problems
with crash recovery too.

The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


A different set of parameters is needed to differentiate between the
fail-safe standby and the synchronous standby; the fail-safe standby and
the standby in synchronous replication can be two different servers.



 I'm worried to see that adding this feature and yet turning it off
 causes a measureable drop in performance. I don't think we want that
 at all. That clearly needs more work and thought.

 I also think your performance results are somewhat bogus. Fast
 transaction workloads were already mostly commit waits - measurements
 of what happens to large loads, index builds etc would likely reveal
 something quite different.


I will test the other scenarios and post the results.



 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services




-- 
Regards,

Samrat Revgade


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-16 Thread Simon Riggs
On 16 June 2013 17:25, Samrat Revagade revagade.sam...@gmail.com wrote:


 On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote:



 So I strongly object to calling this patch anything to do with
 failback safe. You simply don't have enough data to make such a bold
 claim. (Which is why we call it synchronous replication and not zero
 data loss, for example).

 But that's not the whole story. I can see some utility in a patch that
 makes all WAL transfer synchronous, rather than just commits. Some
 name like synchronous_transfer might be appropriate. e.g.
 synchronous_transfer = all | commit (default).


 I agree with you about the fact that,
 Now a days the need of fresh backup in crash recovery  seems to be a major
 problem.
 we might need to change the name of patch if there other problems too with
 crash recovery.

(Sorry don't understand)

 The idea of another slew of parameters that are very similar to
 synchronous replication but yet somehow different seems weird. I can't
 see a reason why we'd want a second lot of parameters. Why not just
 use the existing ones for sync rep? (I'm surprised the Parameter
 Police haven't visited you in the night...) Sure, we might want to
 expand the design for how we specify multi-node sync rep, but that is
 a different patch.


 The different set of parameters are needed to differentiate between
 fail-safe standby and synchronous standby, the fail-safe standby and standby
 in synchronous replication can be two different servers.

Why would they be different? What possible reason would you have for
that config? There is no *need* for those parameters, the proposal
could work perfectly well without them.

Let's make this patch fulfill the stated objectives, not add in
optional extras, especially ones that don't appear well thought
through. If you wish to enhance the design for the specification of
multi-node sync rep, make that a separate patch, later.

--
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Sawada Masahiko
On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.

 When the master fails, last few WAL files may not reach the standby. But
 the master may have gone ahead and made changes to its local file system
 after  flushing WAL to the local storage.  So master contains some file
 system level changes that standby does not have.  At this point, the data
 directory of  master is ahead of standby's data directory.
 Subsequently, the standby will be promoted as new master.  Later when the
 old master wants to be a standby of the new master, it can't just join the
 setup since there is inconsistency in between these two servers. We need
 to take the fresh backup from the new master.  This can happen in both the
 synchronous as well as asynchronous replication.

 Fresh backup is also needed in case of clean switch-over because in the
 current HEAD, the master does not wait for the standby to receive all the
 WAL
 up to the shutdown checkpoint record before shutting down the connection.
 Fujii Masao has already submitted a patch to handle clean switch-over case,
 but the problem is still remaining for failback case.

 The process of taking fresh backup is very time consuming when databases
 are of very big sizes, say several TB's, and when the servers are connected
 over a relatively slower link.  This would break the service level
 agreement of disaster recovery system.  So there is need to improve the
 process of
 disaster recovery in PostgreSQL.  One way to achieve this is to maintain
 consistency between master and standby which helps to avoid need of fresh
 backup.

 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.

   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

You mean that it is possible for the old master's data to be ahead of the
new master's data, so there would be inconsistent data between those
servers at failback, right?
If so, no such inconsistency is possible, because if you use the GUC option
as he proposes (e.g., failback_safe_standby_mode = remote_flush), then
while the old master is working fine, no file-system-level changes are
done before the WAL is replicated.
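
As a minimal sketch of that behaviour (reusing the synchronous_transfer
naming from elsewhere in the thread and the existing SyncRepWaitForLSN()
machinery; an illustration under those assumptions, not the posted patch):

/*
 * Hypothetical sketch: before a dirty data page is written out, wait
 * until WAL up to the page's LSN has been replicated, so the data
 * directory can never get ahead of the standby's WAL position.
 */
static void
FlushDataPage(Buffer buffer)
{
	XLogRecPtr	recptr = PageGetLSN(BufferGetPage(buffer));

	XLogFlush(recptr);			/* flush WAL locally first (existing rule) */

	/* Assumed: in failback-safe mode, also wait for the standby. */
	if (synchronous_transfer == SYNCHRONOUS_TRANSFER_ALL)
		SyncRepWaitForLSN(recptr);

	/* ... only now is it safe to write the page with smgrwrite() ... */
}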

--
Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Amit kapila

On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.


   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

 you means that there is possible that old master's data ahead of new
 master's data.

  What I mean to say is that the WAL of the old master can be ahead of the new
  master. I understood that the data files of the old master can't be ahead,
  but I think the WAL can be ahead.

 so there is inconsistent data between those server when fail back. right?
 if so , there is not possible inconsistent. because if you use GUC option
 as his propose (i.g., failback_safe_standby_mode = remote_flush),
 when old master is working fine, all file system level changes aren't
 done  before WAL replicated.

Will the proposed patch take care that the old master's WAL is also not ahead 
in some way?
If yes, I think I am missing some point.

With Regards,
Amit Kapila.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Sawada Masahiko
On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
 On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for 
 that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.


   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

 you means that there is possible that old master's data ahead of new
 master's data.

   I mean to say is that WAL of old master can be ahead of new master. I 
 understood that
   data files of old master can't be ahead, but I think WAL can be ahead.

 so there is inconsistent data between those server when fail back. right?
 if so , there is not possible inconsistent. because if you use GUC option
 as his propose (i.g., failback_safe_standby_mode = remote_flush),
 when old master is working fine, all file system level changes aren't
 done  before WAL replicated.

 Would the propose patch will take care that old master's WAL is also not 
 ahead in some way?
 If yes, I think i am missing some point.

Yes, it can happen that the old master's WAL is ahead of the new master's WAL,
as you said.
But I think that we can solve that by deleting all the WAL files when the old
master starts as a new standby.
Thoughts?

Regards,

---
Sawada Masahiko


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Jeff Davis
On Fri, 2013-06-14 at 18:27 +0200, Andres Freund wrote:
 I'd like to see a comment around the memcpys in XLogSaveBufferForHint()
 that mentions that they are safe in a non std buffer due to
 XLogCheckBuffer setting an appropriate hole/offset. Or make an explicit
 change of the copy algorithm there.

Done.

 Btw, if you touch that code, I'd vote for renaming XLOG_HINT to XLOG_FPI
 or something like that. I find the former name confusing...

Also done.

Patch attached. Also, since we branched, I think this should be
back-patched to 9.3 as well.

Regards,
Jeff Davis

*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
***
*** 287,293  hashgettuple(PG_FUNCTION_ARGS)
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf);
  		}
  
  		/*
--- 287,293 
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf, true);
  		}
  
  		/*
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
***
*** 262,268  heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
  		{
  			((PageHeader) page)-pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer);
  		}
  	}
  
--- 262,268 
  		{
  			((PageHeader) page)-pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer, true);
  		}
  	}
  
*** a/src/backend/access/nbtree/nbtinsert.c
--- b/src/backend/access/nbtree/nbtinsert.c
***
*** 413,421  _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
  	 * crucial. Be sure to mark the proper buffer dirty.
  	 */
  	if (nbuf != InvalidBuffer)
! 		MarkBufferDirtyHint(nbuf);
  	else
! 		MarkBufferDirtyHint(buf);
  }
  			}
  		}
--- 413,421 
  	 * crucial. Be sure to mark the proper buffer dirty.
  	 */
  	if (nbuf != InvalidBuffer)
! 		MarkBufferDirtyHint(nbuf, true);
  	else
! 		MarkBufferDirtyHint(buf, true);
  }
  			}
  		}
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
***
*** 1052,1058  restart:
  opaque-btpo_cycleid == vstate-cycleid)
  			{
  opaque-btpo_cycleid = 0;
! MarkBufferDirtyHint(buf);
  			}
  		}
  
--- 1052,1058 
  opaque-btpo_cycleid == vstate-cycleid)
  			{
  opaque-btpo_cycleid = 0;
! MarkBufferDirtyHint(buf, true);
  			}
  		}
  
*** a/src/backend/access/nbtree/nbtutils.c
--- b/src/backend/access/nbtree/nbtutils.c
***
*** 1789,1795  _bt_killitems(IndexScanDesc scan, bool haveLock)
  	if (killedsomething)
  	{
  		opaque-btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so-currPos.buf);
  	}
  
  	if (!haveLock)
--- 1789,1795 
  	if (killedsomething)
  	{
  		opaque-btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so-currPos.buf, true);
  	}
  
  	if (!haveLock)
*** a/src/backend/access/rmgrdesc/xlogdesc.c
--- b/src/backend/access/rmgrdesc/xlogdesc.c
***
*** 82,92  xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
  		appendStringInfo(buf, restore point: %s, xlrec-rp_name);
  
  	}
! 	else if (info == XLOG_HINT)
  	{
  		BkpBlock   *bkp = (BkpBlock *) rec;
  
! 		appendStringInfo(buf, page hint: %s block %u,
  		 relpathperm(bkp-node, bkp-fork),
  		 bkp-block);
  	}
--- 82,92 
  		appendStringInfo(buf, restore point: %s, xlrec-rp_name);
  
  	}
! 	else if (info == XLOG_FPI)
  	{
  		BkpBlock   *bkp = (BkpBlock *) rec;
  
! 		appendStringInfo(buf, full-page image: %s block %u,
  		 relpathperm(bkp-node, bkp-fork),
  		 bkp-block);
  	}
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***
*** 7681,7692  XLogRestorePoint(const char *rpName)
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
-  *
-  * Note that this only works for buffers that fit the standard page model,
-  * i.e. those for which buffer_std == true
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
--- 7681,7689 
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
***
*** 7708,7714  XLogSaveBufferForHint(Buffer buffer)
  	 * and reset rdata for any actual WAL record insert.
  	 */
  	rdata[0].buffer = buffer;
! 	rdata[0].buffer_std = true;
  
  	/*
  	 * Check buffer while not holding an exclusive lock.
--- 7705,7711 
  	 * and reset rdata for any 
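
A minimal sketch of how the new buffer_std argument presumably flows
through the rest of XLogSaveBufferForHint (an inference from the visible
hunks, not the verbatim patch text):

	rdata[0].buffer = buffer;
	rdata[0].buffer_std = buffer_std;	/* caller says whether page is standard */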

Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Andres Freund
On 2013-06-15 11:36:54 -0700, Jeff Davis wrote:
 On Fri, 2013-06-14 at 18:27 +0200, Andres Freund wrote:
  I'd like to see a comment around the memcpys in XLogSaveBufferForHint()
  that mentions that they are safe in a non std buffer due to
  XLogCheckBuffer setting an appropriate hole/offset. Or make an explicit
  change of the copy algorithm there.
 
 Done.
 Also done.

Thanks! Looks good to me.

 Patch attached. Also, since we branched, I think this should be
 back-patched to 9.3 as well.

Absolutely.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-15 Thread Amit kapila
On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote:
On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote:

 On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote:
 On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote:
 On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote:
 Hello,

 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for 
 that:


 http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb
 jgwrfu513...@mail.gmail.com

 Let me again summarize the problem we are trying to address.


   How will you take care of extra WAL on old master during recovery. If it
 plays the WAL which has not reached new-master, it can be a problem.

 you means that there is possible that old master's data ahead of new
 master's data.

   I mean to say is that WAL of old master can be ahead of new master. I 
 understood that
   data files of old master can't be ahead, but I think WAL can be ahead.

 so there is inconsistent data between those server when fail back. right?
 if so , there is not possible inconsistent. because if you use GUC option
 as his propose (i.g., failback_safe_standby_mode = remote_flush),
 when old master is working fine, all file system level changes aren't
 done  before WAL replicated.

 Would the propose patch will take care that old master's WAL is also not 
 ahead in some way?
 If yes, I think i am missing some point.

 yes it will happen that old master's WAL ahead of new master's WAL as you 
 said.
 but I think that we can solve them by delete all WAL file when old
 master starts as new standby.

I think ideally it should reset the WAL location to the point where the new
master has forked off.
In such a scenario it would be difficult for a user who wants to get a dump
of some data in the old master which hasn't gone to the new master. I am not
sure if such a need is there for real users, but if it is there, then
providing this solution will have some drawbacks.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Benedikt Grundmann
On Fri, Jun 14, 2013 at 10:11 AM, Samrat Revagade revagade.sam...@gmail.com
 wrote:

 Hello,


 We have already started a discussion on pgsql-hackers for the problem of
 taking fresh backup during the failback operation here is the link for that:




 http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com



 Let me again summarize the problem we are trying to address.



 When the master fails, last few WAL files may not reach the standby. But
 the master may have gone ahead and made changes to its local file system
 after flushing WAL to the local storage.  So master contains some file
 system level changes that standby does not have.  At this point, the data
 directory of master is ahead of standby's data directory.

 Subsequently, the standby will be promoted as new master.  Later when the
 old master wants to be a standby of the new master, it can't just join the
 setup since there is inconsistency in between these two servers. We need to
 take the fresh backup from the new master.  This can happen in both the
 synchronous as well as asynchronous replication.



 Fresh backup is also needed in case of clean switch-over because in the
 current HEAD, the master does not wait for the standby to receive all the
 WAL up to the shutdown checkpoint record before shutting down the
 connection. Fujii Masao has already submitted a patch to handle clean
 switch-over case, but the problem is still remaining for failback case.



 The process of taking fresh backup is very time consuming when databases
 are of very big sizes, say several TB's, and when the servers are connected
 over a relatively slower link.  This would break the service level
 agreement of disaster recovery system.  So there is need to improve the
 process of disaster recovery in PostgreSQL.  One way to achieve this is to
 maintain consistency between master and standby which helps to avoid need
 of fresh backup.



 So our proposal on this problem is that we must ensure that master should
 not make any file system level changes without confirming that the
 corresponding WAL record is replicated to the standby.



An alternative proposal (which will probably just reveal my lack of
understanding about what is or isn't possible with WAL): provide a way to
restart the master so that it rolls back the WAL changes that the slave
hasn't seen.

 There are many suggestions and objections on pgsql-hackers about this problem.
 The brief summary is as follows:





Re: [HACKERS] Patch for fail-back without fresh backup

2013-06-14 Thread Samrat Revagade
That will not happen if there is inconsistency between the two servers.

Please refer to the discussions on the link provided in the first post:

http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com

Regards,

Samrat Revgade

