Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
>> But if the transaction would not have otherwise generated WAL (i.e. a
>> select that did not have to do any HOT pruning, or an update with zero
>> rows matching the where condition), doesn't it now have to flush and wait
>> when it would otherwise not?
>
> We short circuit that if there's no xid assigned. Check
> RecordTransactionCommit().

It looks like that only short-circuits the flush if both there is no xid
assigned and !wrote_xlog (line 1054 of xact.c).

I do see stalls on fdatasync at flush from select statements which had no
xid but did generate xlog due to HOT pruning; I don't see why WAL-logging
hint bits would be different.

Cheers,

Jeff
Re: [HACKERS] Patch for fail-back without fresh backup
On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
> On Thu, Nov 21, 2013 at 2:43 PM, Andres Freund and...@2ndquadrant.com wrote:
>> On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
>>> But if the transaction would not have otherwise generated WAL (i.e. a
>>> select that did not have to do any HOT pruning, or an update with zero
>>> rows matching the where condition), doesn't it now have to flush and
>>> wait when it would otherwise not?
>>
>> We short circuit that if there's no xid assigned. Check
>> RecordTransactionCommit().
>
> It looks like that only short-circuits the flush if both there is no xid
> assigned and !wrote_xlog (line 1054 of xact.c).

Hm. Indeed. Why don't we just always use the async commit behaviour for
that? I don't really see any significant dangers from doing so? It's also
rather odd to use the sync rep mechanisms in such scenarios...

The if() really should test markXidCommitted instead of wrote_xlog.

> I do see stalls on fdatasync at flush from select statements which had no
> xid but did generate xlog due to HOT pruning; I don't see why WAL-logging
> hint bits would be different.

Are the stalls at commit or while the select is running? If wal_buffers is
filled too fast, which can easily happen if loads of pages are hinted and
WAL logged, that will happen independently of RecordTransactionCommit().

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Jan 16, 2014 at 9:37 AM, Andres Freund and...@2ndquadrant.com wrote:
> On 2014-01-16 09:25:51 -0800, Jeff Janes wrote:
>> It looks like that only short-circuits the flush if both there is no xid
>> assigned and !wrote_xlog (line 1054 of xact.c).
>
> Hm. Indeed. Why don't we just always use the async commit behaviour for
> that? I don't really see any significant dangers from doing so?

I think the argument is that drawing the next value from a sequence can
generate xlog that needs to be flushed, but doesn't assign an xid. I would
think the sequence should flush that record before it hands out the value,
not before the commit, but...

> It's also rather odd to use the sync rep mechanisms in such scenarios...
>
> The if() really should test markXidCommitted instead of wrote_xlog.
>
>> I do see stalls on fdatasync at flush from select statements which had
>> no xid but did generate xlog due to HOT pruning; I don't see why
>> WAL-logging hint bits would be different.
>
> Are the stalls at commit or while the select is running? If wal_buffers
> is filled too fast, which can easily happen if loads of pages are hinted
> and WAL logged, that will happen independently of
> RecordTransactionCommit().

In the real world, I'm not sure what the distribution is. But in my
present test case, they are coming almost exclusively from
RecordTransactionCommit.

I use pgbench -T10 in a loop to generate dirty data and checkpoints (with
synchronous_commit on, but with a BBU), and then to probe the consequences
I use:

pgbench -T10 -S -n --startup='set synchronous_commit='$f

(where --startup is an extension to pgbench proposed a few months ago)

Running the select-only query with synchronous_commit off almost
completely isolates it from the checkpoint drama that otherwise has a
massive effect on it. With synchronous_commit=on, it goes from 6000 tps
normally to 30 tps during the checkpoint sync; with synchronous_commit=off
it might dip to 4000 or so during the worst of it.

(To be clear, this is about the pruning, not the logging of the hint
bits.)

Cheers,

Jeff
Re: [HACKERS] Patch for fail-back without fresh backup
On 2014-01-16 11:01:29 -0800, Jeff Janes wrote:
> I think the argument is that drawing the next value from a sequence can
> generate xlog that needs to be flushed, but doesn't assign an xid.

Then that should assign an xid. Which would yield correct behaviour with
async commit, where it's currently *not* causing a WAL flush at all unless
a page boundary is crossed. I've tried arguing that way before...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
Jeff Janes jeff.ja...@gmail.com writes:
> I think the argument is that drawing the next value from a sequence can
> generate xlog that needs to be flushed, but doesn't assign an xid. I
> would think the sequence should flush that record before it hands out
> the value, not before the commit, but...

IIRC the argument was that we'd flush WAL before any use of the value
could make it to disk. Which is true if you're just inserting it into a
table; perhaps less so if the client is doing something external to the
database with it. (But it'd be reasonable to say that clients who want a
guaranteed-good serial for such purposes should have to commit the
transaction that created the value.)

regards, tom lane
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Nov 21, 2013 at 11:43:34PM +0100, Andres Freund wrote:
> On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
>> But if the transaction would not have otherwise generated WAL (i.e. a
>> select that did not have to do any HOT pruning, or an update with zero
>> rows matching the where condition), doesn't it now have to flush and
>> wait when it would otherwise not?
>
> We short circuit that if there's no xid assigned. Check
> RecordTransactionCommit().

OK, that was my question, now answered. Thanks.

--
Bruce Momjian  br...@momjian.us        http://momjian.us
EnterpriseDB                           http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Nov 21, 2013 at 12:31 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
> Bruce Momjian escribió:
>> Sorry to be replying late to this, but while I am not worried about the
>> additional WAL volume, does this change require the transaction to now
>> wait for a WAL sync to disk before continuing?
>
> I don't think so. There's extra WAL written, but there's no
> flush-and-wait until end of transaction (as has always been).

But if the transaction would not have otherwise generated WAL (i.e. a
select that did not have to do any HOT pruning, or an update with zero
rows matching the where condition), doesn't it now have to flush and wait
when it would otherwise not?

Cheers,

Jeff
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-11-21 14:40:36 -0800, Jeff Janes wrote:
> But if the transaction would not have otherwise generated WAL (i.e. a
> select that did not have to do any HOT pruning, or an update with zero
> rows matching the where condition), doesn't it now have to flush and
> wait when it would otherwise not?

We short circuit that if there's no xid assigned. Check
RecordTransactionCommit().

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
Bruce Momjian escribió:
> Sorry to be replying late to this, but while I am not worried about the
> additional WAL volume, does this change require the transaction to now
> wait for a WAL sync to disk before continuing?

I don't think so. There's extra WAL written, but there's no
flush-and-wait until end of transaction (as has always been).

> I thought that was the down-side to WAL logging hint bits, not the WAL
> volume itself.

I don't think this is true either.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 11:14:14PM +0300, Heikki Linnakangas wrote:
> On 24.10.2013 23:07, Josh Berkus wrote:
>> What kind of overhead are we talking about here?
>
> One extra WAL record whenever a hint bit is set on a page, for the first
> time after a checkpoint. In other words, a WAL record needs to be
> written in the same circumstances as with page checksums, but the WAL
> records are much smaller as they don't need to contain a full page
> image, just the block number of the changed block.
>
> Or maybe we'll write the full page image after all, like with page
> checksums, just without calculating the checksums. It might be tricky to
> skip the full-page image, because then a subsequent change of the page
> (which isn't just a hint-bit update) needs to somehow know it needs to
> take a full page image even though a WAL record for it was already
> written.

Sorry to be replying late to this, but while I am not worried about the
additional WAL volume, does this change require the transaction to now
wait for a WAL sync to disk before continuing? I thought that was the
down-side to WAL logging hint bits, not the WAL volume itself.

--
Bruce Momjian  br...@momjian.us        http://momjian.us
EnterpriseDB                           http://enterprisedb.com

+ Everyone has their own god. +
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 8:08 PM, Andres Freund and...@2ndquadrant.com wrote:
> On 2013-10-24 13:51:52 -0700, Josh Berkus wrote:
> It entirely depends on your workload. If it happens to be something
> like: INSERT INTO table (lots_of_data); CHECKPOINT; SELECT * FROM TABLE;
> i.e. there's a checkpoint between loading the data and reading it - not
> exactly all that uncommon - we'll need to log something for every page.
> That can be rather noticeable. Especially as I think it will be rather
> hard to log anything but a real FPI.
>
> I really don't think everyone will want this. I am absolutely not
> against providing an option to log enough information to make pg_rewind
> work, but I think providing a command to do *safe* *planned* failover
> will help in many more.

I think it is better to provide an option, such as a new wal_level, to
log enough information. If the user doesn't realize it's needed until
it's too late, could such information be contained in the checkpoint
record? For example, if the checkpoint record contained the wal_level, we
could use that information to inform the user.

BTW, is this information useful only for pg_rewind, or is there anything
else? (Sorry if it has already been discussed.)

Regards,

---
Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander mag...@hagander.net wrote:
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you say
> in (c) above, many users don't realize that until it's too late.

+1 on removing archive from wal_level. Having both archive and
hot_standby for wal_level is confusing, and if I recall correctly
hot_standby and archive have been kept as possible settings only to
protect people from bugs that the newly-introduced hot_standby could
introduce due to the few WAL records it adds. But it has been a couple of
releases since there have been no such bugs, no?

--
Michael
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-10-24 13:51:52 -0700, Josh Berkus wrote:
> On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
>> One extra WAL record whenever a hint bit is set on a page, for the
>> first time after a checkpoint. [...]
>
> I think it would be worth estimating what this actually looks like in
> terms of log write quantity. My inclination is to say that if it
> increases log writes less than 10%, we don't need to provide an option
> to turn it off.

It entirely depends on your workload. If it happens to be something like:

INSERT INTO table (lots_of_data);
CHECKPOINT;
SELECT * FROM TABLE;

i.e. there's a checkpoint between loading the data and reading it - not
exactly all that uncommon - we'll need to log something for every page.
That can be rather noticeable. Especially as I think it will be rather
hard to log anything but a real FPI.

I really don't think everyone will want this. I am absolutely not against
providing an option to log enough information to make pg_rewind work, but
I think providing a command to do *safe* *planned* failover will help in
many more cases.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-10-24 22:57:29 +0200, Magnus Hagander wrote:
> In fact I've been considering suggesting we might want to retire the
> difference between archive and hot_standby as wal_level, because the
> difference is usually so small. And the advantage of hot_standby is in
> almost every case worth it. Even in the archive recovery mode, being
> able to do pause_at_recovery_target is extremely useful. And as you say
> in (c) above, many users don't realize that until it's too late.

+1.

On 2013-10-25 15:16:30 +0900, Michael Paquier wrote:
> But it has been a couple of releases since there have been no such bugs,
> no?

One 'no' too much? Anyway, I think there have been more recent ones, but
it's infrequent enough that we can remove the level anyway.

FWIW, I've wondered if we shouldn't remove most of the EnableHotStandby
checks in xlog.c. There are way too many differences in how StartupXLOG
behaves depending on HS. E.g. I quite dislike that we do stuff like
StartupCLOG at entirely different times during recovery.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Oct 21, 2013 at 7:10 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
> I agree with you. If writing FPW is not a large performance degradation,
> one idea is that we write FPW at the same times as when checksums are
> enabled. I.e., if we support a new wal_level, the system writes FPW when
> a simple SELECT updates hint bits, but the checksum function is
> disabled. Thoughts?

I wonder if that's too much for this purpose. In fact, we just need a way
to know that a block could have been written on the master which the
standby never saw. So even WAL logging just the block id should be good
enough for pg_rewind to be able to detect and later copy that block from
the new master. Having said that, I don't know if there is a general
advantage to WAL logging the exact hint bit update operation for other
reasons.

Another difference AFAICS is that the checksum feature needs the block to
be backed up only the first time a hint bit is updated after a
checkpoint. But for something like pg_rewind to work, we will need to WAL
log every hint bit update on a page. So we would want to keep it as short
as possible.

Thanks,

Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 13:02, Pavan Deolasee wrote:
> Another difference AFAICS is that the checksum feature needs the block
> to be backed up only the first time a hint bit is updated after a
> checkpoint. But for something like pg_rewind to work, we will need to
> WAL log every hint bit update on a page. So we would want to keep it as
> short as possible.

To fix that, pg_rewind could always start the rewinding process from the
last checkpoint before the point that the histories diverge, instead of
the exact point of divergence. That would make the rewinding more
expensive as it needs to read through a lot more WAL, but I think it
would still be OK.

- Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> To fix that, pg_rewind could always start the rewinding process from the
> last checkpoint before the point that the histories diverge, instead of
> the exact point of divergence.

Is that something required even if someone plans to use pg_rewind for a
cluster with checksums enabled? I mean, since only the first update after
a checkpoint is WAL logged, pg_rewind will break if another update
happens after the standby forks. Or would the recovery logic apply the
first WAL without looking at the page lsn? (Sorry, maybe I should read
the code instead of asking you.)

If we do what you are suggesting, it seems like a single line patch to
me. In XLogSaveBufferForHint(), we probably need to look at this
additional GUC to decide whether or not to backup the block.

> That would make the rewinding more expensive as it needs to read through
> a lot more WAL, but I think it would still be OK.

Yeah, probably you are right. Though the amount of additional work could
be significantly higher, and some testing might be warranted.

Thanks,

Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 4:45 PM, Pavan Deolasee pavan.deola...@gmail.com wrote:
> Or would the recovery logic apply the first WAL without looking at the
> page lsn? (Sorry, maybe I should read the code instead of asking you.)

Never mind. I realized it has to. That's the whole purpose of backing it
up in the first place.

Thanks,

Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 14:15, Pavan Deolasee wrote:
> On Thu, Oct 24, 2013 at 4:22 PM, Heikki Linnakangas
> hlinnakan...@vmware.com wrote:
>> To fix that, pg_rewind could always start the rewinding process from
>> the last checkpoint before the point that the histories diverge,
>> instead of the exact point of divergence.
>
> Is that something required even if someone plans to use pg_rewind for a
> cluster with checksums enabled? I mean, since only the first update
> after a checkpoint is WAL logged, pg_rewind will break if another update
> happens after the standby forks.

Yes. It's broken as it is, even when checksums are enabled - good catch.
I'll go change it to read all the WAL in the target starting from the
last checkpoint before the point of divergence.

> Or would the recovery logic apply the first WAL without looking at the
> page lsn? (Sorry, maybe I should read the code instead of asking you.)

WAL recovery does apply all the full-page images without looking at the
page LSN, but that doesn't help in this case.

pg_rewind copies over the blocks from the source server (= promoted
standby) that were changed in the target server (= old master), after the
standby's history diverged from it. In other words, it reverts the blocks
that were changed in the old master, by copying them over from the
promoted standby. After that, WAL recovery is performed, using the WAL
from the promoted standby, to apply all the changes from the promoted
standby that were not present in the old master. But it never replays any
WAL from the old master. It reads it through, to construct the list of
blocks that were modified, but it doesn't apply them.

> If we do what you are suggesting, it seems like a single line patch to
> me. In XLogSaveBufferForHint(), we probably need to look at this
> additional GUC to decide whether or not to backup the block.

Yeah, it's trivial to add such a GUC. Will just have to figure out what
we want the user interface to be like; should it be a separate GUC, or
somehow cram it into wal_level?

- Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 5:45 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote:
> Will just have to figure out what we want the user interface to be like;
> should it be a separate GUC, or somehow cram it into wal_level?

Yeah, I had brought up a similar idea upthread. Right now wal_level is
nicely ordered. But with this additional logic, I am not sure if we would
need multiple new levels and also break that ordering (I don't know if
it's important). For example, one may want to set up streaming
replication with/without this feature, or hot standby with/without the
feature. I don't have a good idea about how to capture them in wal_level.
Maybe something like: minimal, archive, archive_with_this_new_feature,
hot_standby and hot_standby_with_this_new_feature.

Thanks,

Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
Pavan Deolasee escribió:
> Yeah, I had brought up a similar idea upthread. Right now wal_level is
> nicely ordered. But with this additional logic, I am not sure if we
> would need multiple new levels and also break that ordering (I don't
> know if it's important). For example, one may want to set up streaming
> replication with/without this feature, or hot standby with/without the
> feature. I don't have a good idea about how to capture them in
> wal_level. Maybe something like: minimal, archive,
> archive_with_this_new_feature, hot_standby and
> hot_standby_with_this_new_feature.

That's confusing. A separate GUC sounds better.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 04:15 AM, Pavan Deolasee wrote:
> If we do what you are suggesting, it seems like a single line patch to
> me. In XLogSaveBufferForHint(), we probably need to look at this
> additional GUC to decide whether or not to backup the block.

Wait, what? Why are we having an additional GUC? I'm opposed to the idea
of having a GUC to enable failback. When would anyone using replication
ever want to disable that?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 20:39, Josh Berkus wrote:
> Wait, what? Why are we having an additional GUC? I'm opposed to the idea
> of having a GUC to enable failback. When would anyone using replication
> ever want to disable that?

For example, if you're not replicating for high availability purposes,
but to keep a reporting standby up-to-date.

- Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 11:12 AM, Heikki Linnakangas wrote:
> On 24.10.2013 20:39, Josh Berkus wrote:
>> When would anyone using replication ever want to disable that?
>
> For example, if you're not replicating for high availability purposes,
> but to keep a reporting standby up-to-date.

What kind of overhead are we talking about here? You probably said, but
I've had a mail client meltdown and lost a lot of my -hackers emails.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Patch for fail-back without fresh backup
On 24.10.2013 23:07, Josh Berkus wrote:
> What kind of overhead are we talking about here? You probably said, but
> I've had a mail client meltdown and lost a lot of my -hackers emails.

One extra WAL record whenever a hint bit is set on a page, for the first
time after a checkpoint. In other words, a WAL record needs to be written
in the same circumstances as with page checksums, but the WAL records are
much smaller as they don't need to contain a full page image, just the
block number of the changed block.

Or maybe we'll write the full page image after all, like with page
checksums, just without calculating the checksums. It might be tricky to
skip the full-page image, because then a subsequent change of the page
(which isn't just a hint-bit update) needs to somehow know it needs to
take a full page image even though a WAL record for it was already
written.

- Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On 10/24/2013 01:14 PM, Heikki Linnakangas wrote:
> One extra WAL record whenever a hint bit is set on a page, for the first
> time after a checkpoint. [...]

I think it would be worth estimating what this actually looks like in
terms of log write quantity. My inclination is to say that if it
increases log writes less than 10%, we don't need to provide an option to
turn it off.

The reasons I don't want to provide a disabling GUC are:

a) more GUCs
b) confusing users
c) causing users to disable rewind *until they need it*, at which point
   it's too late to enable it.

So if there's any way we can avoid having a GUC for this, I'm for it. And
if we do have a GUC, failback should be enabled by default.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus j...@agliodbs.com wrote: On 10/24/2013 01:14 PM, Heikki Linnakangas wrote: One extra WAL record whenever a hint bit is set on a page, for the first time after a checkpoint. In other words, a WAL record needs to be written in the same circumstances as with page checksums, but the WAL records are much smaller as they don't need to contain a full page image, just the block number of the changed block. Or maybe we'll write the full page image after all, like with page checksums, just without calculating the checksums. It might be tricky to skip the full-page image, because then a subsequent change of the page (which isn't just a hint-bit update) needs to somehow know it needs to take a full page image even though a WAL record for it was already written. I think it would be worth estimating what this actually looks like in terms of log write quantity. My inclination is to say that if it increases log writes less than 10%, we don't need to provide an option to turn it off. The reasons I don't want to provide a disabling GUC are: a) more GUCs b) confusing users c) causing users to disable rewind *until they need it*, at which point it's too late to enable it. So if there's any way we can avoid having a GUC for this, I'm for it. And if we do have a GUC, failback should be enabled by default. +1 on the principle. In fact I've been considering suggesting we might want to retire the difference between archive and hot_standby as wal_level, because the difference is usually so small. And the advantage of hot_standby is in almost every case worth it. Even in the archive recovery mode, being able to do pause_at_recovery_target is extremely useful. And as you say in (c) above, many users don't realize that until it's too late.
-- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 25, 2013 at 5:57 AM, Magnus Hagander mag...@hagander.net wrote: On Thu, Oct 24, 2013 at 10:51 PM, Josh Berkus j...@agliodbs.com wrote: On 10/24/2013 01:14 PM, Heikki Linnakangas wrote: I think it would be worth estimating what this actually looks like in terms of log write quantity. My inclination is to say that if it increases log writes less than 10%, we don't need to provide an option to turn it off. The reasons I don't want to provide a disabling GUC are: a) more GUCs b) confusing users c) causing users to disable rewind *until they need it*, at which point it's too late to enable it. So if there's any way we can avoid having a GUC for this, I'm for it. And if we do have a GUC, failback should be enabled by default. +1 on the principle. In fact I've been considering suggesting we might want to retire the difference between archive and hot_standby as wal_level, because the difference is usually so small. And the advantage of hot_standby is in almost every case worth it. Even in the archive recovery mode, being able to do pause_at_recovery_target is extremely useful. And as you say in (c) above, many users don't realize that until it's too late. +1. Many users would not realize it until it is too late if we provide this as an additional GUC. I also agree with writing a WAL record that contains just the block number of the changed block; I don't think writing such a record leads to a large overhead increase. Are those WAL records replicated to the standby server synchronously (when sync replication is configured)? I am concerned that this could add performance overhead to operations such as SELECT or autovacuum, especially when the two servers are far apart. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Oct 10, 2013 at 1:41 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: Not that I can find any flaw in the OP's patch, but given the major objections and my own nervousness about documenting this new failback safe standby mode, I am also inclining to improve pg_rewind or whatever it takes to get it working. Clearly at first we need to have an optional mechanism to WAL log hint bit updates. There seem to be two ways to do that: a. Add a new GUC which can be turned on/off and requires a server restart to take effect b. Add another option for the wal_level setting. (b) looks better, but I am not sure if we want to support this new level with and without hot standby. If the latter, we will need multiple new levels to differentiate all those cases. I am OK with supporting it only with hot standby, which is probably what most people do with streaming replication anyway. The other issue is how to optimally WAL log hint bit updates: a. Should we have separate WAL records just for the purpose, or should we piggyback them on heap update/delete/prune etc WAL records? Of course, there will be occasions when a simple SELECT also updates hint bits, so most likely we will need a separate WAL record anyhow. b. Does it make sense to try to set all hint bits in a page if we are WAL logging it anyway? I think we have discussed this idea even before, just to minimize the number of writes a heap page receives when hint bits of different tuples are set at different times, each update triggering a fresh write. I don't remember what the consensus was for that, but it might be worthwhile to reconsider that option if we are WAL logging the hint bit updates. I agree with you. If writing FPWs does not cause a large performance degradation, one idea is to write FPWs at the same timing as when checksums are enabled; i.e., if we support the new wal_level, the system writes an FPW when a simple SELECT updates hint bits, but the checksum function stays disabled. Thoughts?
We will definitely need some amount of performance benchmarks even if this is optional. But are there other things to worry about? Any strong objections to this idea or any other show stopper for pg_rewind itself? -- Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 6:37 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which has been modified based on the above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. a. GUC synchronous_standby_names is used to name synchronous as well as failback safe standbys. I don't know if that will confuse users. With the current patch, the user can specify the failback safe standby and the sync replication standby for the same server in synchronous_standby_names, so I was thinking that it would not confuse users. b. synchronous_commit's value will also control whether a sync/async failback safe standby waits for remote write or flush. Is that reasonable? Or should there be a different way to configure the failback safe standby's WAL safety? synchronous_commit's value cannot control the wait level of a sync/async failback safe standby. On data page flush, the failback safe standby waits only for flush. Should we also allow waiting for remote write? 2. With the current design/implementation, the user can't configure a synchronous and an async failback safe standby at the same time. I think we discussed this earlier and there was an agreement on the limitation. Just wanted to get that confirmed again. Yes, the user can't configure a sync standby and an async failback safe standby at the same time. The current patch supports the following cases: - a sync standby that is also the failback safe standby - an async standby that is also the failback safe standby 3. SyncRepReleaseWaiters() does not know whether it's waking up backends waiting for sync rep or failback safe rep. Is that ok?
For example, I found that the elog() message announcing next takeover emitted by the function may look bad. Since changing synchronous_transfer requires a server restart, we can teach SyncRepReleaseWaiters() to look at that parameter to figure out whether the standby is a sync and/or failback safe standby. I agree with you. Are you referring to the following code?

    if (announce_next_takeover)
    {
        announce_next_takeover = false;
        ereport(LOG,
                (errmsg("standby \"%s\" is now the synchronous standby with priority %u",
                        application_name, MyWalSnd->sync_standby_priority)));
    }

4. The documentation still needs more work to clearly explain the use case. Understood. We will do more work to clearly explain the use case. 5. Have we done any sort of stress testing of the patch? If there is a bug, the data corruption at the master can go unnoticed. So IMHO we need many crash recovery tests to ensure that the patch is functionally correct. I have done several tests of the patch, and I have confirmed that a data page is not flushed to disk while the master server has not received the reply from the standby server. I used pg_filedump. What tests should we run to ensure that the patch is functionally correct? Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 6:46 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote: On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which has been modified based on the above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. Thank you for the comment. We are thinking that this approach can solve the real problem, and we have confirmed the effect of this approach: the master server flushes a data page to disk only after it has received the reply from the standby server. If you have technical concerns or doubts, could you tell us about them? Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote: On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which has been modified based on the above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. Listing down all major objections to the proposal and their solutions:

*Tom Lane*
# Additional complexity to the code will cause performance overhead - on average it causes 0.5 - 1% performance overhead for a fast transaction workload, as the wait is mostly on the backend process. The latest refactored code looks less complex.
# Use of rsync with checksums - but many pages on the two servers may differ in their binary values because of hint bits.

*Heikki*
# Use pg_rewind to do the same: it has the well-known problem of hint bit updates. If we use it we need to enable checksums or explicitly WAL-log hint bits, which leads to performance overhead.

*Amit Kapila*
# How do we take care of extra WAL on the old master during recovery? We can solve this by deleting all WAL files on the old master before it starts as the new standby.

*Simon Riggs*
# Renaming the patch - done
# Remove the extra set of parameters - done
# Performance drop - on average it causes 0.5 - 1% performance overhead for a fast transaction workload, as the wait is mostly on the backend process.
# The way of configuring the standby - with the synchronous_transfer parameter we can configure 4 types of standby servers depending on the need.

*Fujii Masao*
# How the patch interacts with cascaded standbys - the patch works the same as synchronous replication.
# CHECKPOINT on the standby got stuck infinitely - fixed.
# Complicated conditions in SyncRepWaitForLSN() - the code has been refactored in v11.
# Improve source code comments - done.

*Pavan Deolasee*
# Interaction of synchronous_commit with synchronous_transfer - now synchronous_commit only controls whether and how to wait for the standby when a transaction commits. synchronous_transfer OTOH tells how to interpret the standbys listed in the synchronous_standbys parameter.
# Further improvements in the documentation - we will do that.
# More stress testing - we will do that. Any inputs on stress testing would help.
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Oct 9, 2013 at 4:54 AM, Samrat Revagade revagade.sam...@gmail.com wrote: On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote: On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.comwrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which have modified based on above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. Listing down all objections and their solutions: Major Objection on the proposal: * Tom Lane* # additional complexity to the code it will cause performance overhead - On an average it causes 0.5 - 1% performance overhead for fast transaction workload, as the wait is mostly on backend process. The latest re-factored code, looks less complex. # Use of rsync with checksum - but many pages on the two servers may differ in their binary values because of hint bits *Heikki :* # Use pg_rewind to do the same: It has well known problem of hint bit updates. If we use this we need enable checksums or explicitly WAL log hint bits which leads to performance overhead *Amit Kapila* # How to take care of extra WAL on old master during recovery.? we can solve this by deleting all WAL file when old master before it starts as new standby. *Simon Riggs* # Renaming patch - done # remove extra set of parameters - done # performance drop - On an average it causes 0.5 - 1% performance overhead for fast transaction workload, as the wait is mostly on backend process. 
# The way of configuring standby - with synchronous_transfer parameter we can configure 4 types of standby servers depending on the need. *Fujii Masao* # how patch interacts with cascaded standby - patch works same as synchronous replication # CHECKPOINT in the standby, it got stuck infinitely. - fixed this # Complicated conditions in SyncRepWaitForLSN() - code has been refactored in v11 # Improve source code comments - done *Pavan Deolasee* # Interaction of synchronous_commit with synchronous_transfer - Now synchronous_commit only controls whether and how to wait for the standby only when a transaction commits. synchronous_transfer OTOH tells how to interpret the standby listed in synchronous_standbys parameter. # Further Improvements in the documentation - we will do that # More stress testing - we will do that Any inputs on stress testing would help. The point is that when there are at least four senior community members expressing serious objections to a concept, three of whom are committers, we shouldn't be considering committing it until at least some of those people have withdrawn their objections. Nearly all patch submitters are in favor of their own patches; that does not entitle them to have those patches committed, even if there is a committer who agrees with them. There needs to be a real consensus on the path forward. If that policy ever changes, I have my own list of things that are on the cutting-room floor that I'll be happy to resurrect. Personally, I don't have a strong opinion on this patch because I have not followed it closely enough. But if Tom, Heikki, Simon, and Andres are all unconvinced that this is a good direction, then put me down for a -1 vote as well. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 9:22 PM, Heikki Linnakangas hlinnakan...@vmware.com wrote: Yeah, I definitely think we should work on the pg_rewind approach instead of this patch. It's a lot more flexible. The performance hit of WAL-logging hint bit updates is the price you have to pay, but a lot of people were OK with that to get page checksums, so I think a lot of people would be OK with it for this purpose too. As long as it's optional, of course. And anyone using page checksums is already paying that price. Not that I can find any flaw in the OP's patch, but given the major objections and my own nervousness about documenting this new failback safe standby mode, I am also inclining to improve pg_rewind or whatever it takes to get it working. Clearly at first we need to have an optional mechanism to WAL log hint bit updates. There seem to be two ways to do that: a. Add a new GUC which can be turned on/off and requires a server restart to take effect b. Add another option for the wal_level setting. (b) looks better, but I am not sure if we want to support this new level with and without hot standby. If the latter, we will need multiple new levels to differentiate all those cases. I am OK with supporting it only with hot standby, which is probably what most people do with streaming replication anyway. The other issue is how to optimally WAL log hint bit updates: a. Should we have separate WAL records just for the purpose, or should we piggyback them on heap update/delete/prune etc WAL records? Of course, there will be occasions when a simple SELECT also updates hint bits, so most likely we will need a separate WAL record anyhow. b. Does it make sense to try to set all hint bits in a page if we are WAL logging it anyway? I think we have discussed this idea even before, just to minimize the number of writes a heap page receives when hint bits of different tuples are set at different times, each update triggering a fresh write. I don't remember what the consensus was for that, but it might be worthwhile to reconsider that option if we are WAL logging the hint bit updates. We will definitely need some amount of performance benchmarks even if this is optional. But are there other things to worry about? Any strong objections to this idea or any other show stopper for pg_rewind itself? Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: You added several checks into SyncRepWaitForLSN() so that it can handle both synchronous_transfer=data_flush and =commit. This change made the source code of the function very complicated, I'm afraid. To simplify the source code, what about just adding a new wait-for-lsn function for data_flush instead of changing SyncRepWaitForLSN()? Obviously that new function and SyncRepWaitForLSN() have a common part. I think that it should be extracted as a separate function. Thank you for reviewing and commenting! Yes, I agree with you. I attached the v12 patch which has been modified based on the above suggestions. - Added new functions SyncRepTransferWaitForLSN() and SyncRepWait(). SyncRepTransferWaitForLSN() is called on data page flush; OTOH, SyncRepWaitForLSN() is called on transaction commit. Both functions call SyncRepWait() after checking whether sync commit/transfer is requested. In practice the server waits in SyncRepWait().

+ * Note that if sync transfer is requested, we can't do regular maintenance until
+ * standbys connect.
  */
-    if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
+    if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH && !SyncTransRequested())

Per discussion with Pavan, ISTM we don't need to avoid setting synchronous_commit to local even if synchronous_transfer is data_flush. But you did that here. Why? I made a mistake. I have removed it. When synchronous_transfer = data_flush, anti-wraparound vacuum can be blocked. Is this safe? In the new version of the patch, when synchronous_transfer = data_flush/all AND synchronous_standby_names is set, vacuum is blocked. This behaviour of synchronous_transfer is similar to synchronous_commit. Should we allow anti-wraparound vacuum even if synchronous_transfer = data_flush/all? If so, should it also be allowed to flush data pages while doing vacuum?

+#synchronous_transfer = commit    # data page synchronization level
+                                  # commit, data_flush or all

This comment seems confusing. I think that this parameter specifies when to wait for replication.

+typedef enum
+{
+    SYNCHRONOUS_TRANSFER_COMMIT,        /* no wait for flush data page */
+    SYNCHRONOUS_TRANSFER_DATA_FLUSH,    /* wait for data page flush only
+                                         * no wait for WAL */
+    SYNCHRONOUS_TRANSFER_ALL            /* wait for data page flush and WAL */
+} SynchronousTransferLevel;

These comments also seem confusing. For example, I think that the meaning of SYNCHRONOUS_TRANSFER_COMMIT is something like wait for replication on transaction commit. Those comments are revised in the new patch.

@@ -521,6 +531,13 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record)
          */
         XLogFlush(lsn);
+        /*
+         * If synchronous transfer is requested, wait for failback safe standby
+         * to receive WAL up to lsn.
+         */
+        if (SyncTransRequested())
+            SyncRepWaitForLSN(lsn, true, true);

If smgr_redo() is called only during recovery, SyncRepWaitForLSN() doesn't need to be called here. Thank you for the info. I have removed it from smgr_redo(). Regards, --- Sawada Masahiko

synchronous_transfer_v12.patch Description: Binary data
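Fujii's point about the confusing enum comments can be illustrated by rewording them around when the master waits for the failback safe standby. This is a sketch of my reading of the discussion, not the patch's actual code:

```c
#include <stdbool.h>

typedef enum
{
    SYNCHRONOUS_TRANSFER_COMMIT,        /* wait for replication only at
                                         * transaction commit */
    SYNCHRONOUS_TRANSFER_DATA_FLUSH,    /* wait for replication before
                                         * flushing a data page, but not
                                         * at commit */
    SYNCHRONOUS_TRANSFER_ALL            /* wait at both points */
} SynchronousTransferLevel;

/* True if the master must wait before flushing a data page to disk. */
static bool
wait_before_data_flush(SynchronousTransferLevel level)
{
    return level == SYNCHRONOUS_TRANSFER_DATA_FLUSH ||
           level == SYNCHRONOUS_TRANSFER_ALL;
}
```

Framed this way, each level names the set of points where the master waits, which is what the parameter actually controls.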
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which has been modified based on the above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. a. GUC synchronous_standby_names is used to name synchronous as well as failback safe standbys. I don't know if that will confuse users. b. synchronous_commit's value will also control whether a sync/async failback safe standby waits for remote write or flush. Is that reasonable? Or should there be a different way to configure the failback safe standby's WAL safety? 2. With the current design/implementation, the user can't configure a synchronous and an async failback safe standby at the same time. I think we discussed this earlier and there was an agreement on the limitation. Just wanted to get that confirmed again. 3. SyncRepReleaseWaiters() does not know whether it's waking up backends waiting for sync rep or failback safe rep. Is that ok? For example, I found that the elog() message announcing next takeover emitted by the function may look bad. Since changing synchronous_transfer requires a server restart, we can teach SyncRepReleaseWaiters() to look at that parameter to figure out whether the standby is a sync and/or failback safe standby. 4. The documentation still needs more work to clearly explain the use case. 5. Have we done any sort of stress testing of the patch? If there is a bug, the data corruption at the master can go unnoticed. So IMHO we need many crash recovery tests to ensure that the patch is functionally correct. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-10-08 15:07:02 +0530, Pavan Deolasee wrote: On Tue, Oct 8, 2013 at 2:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Oct 4, 2013 at 4:32 PM, Fujii Masao masao.fu...@gmail.com wrote: I attached the v12 patch which have modified based on above suggestions. There are still some parts of this design/patch which I am concerned about. 1. The design clubs synchronous standby and failback safe standby rather very tightly. IIRC this is based on the feedback you received early, so my apologies for raising it again so late. It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote: It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. IIRC you and Tom were particularly skeptical about the approach. But do you see a technical flaw or a show stopper with the approach? Heikki has written pg_rewind, which is really very cool. But it fails to handle the hint bit updates which are not WAL logged, unless of course checksums are turned on. We can have a GUC-controlled option to turn WAL logging on for hint bit updates and then use pg_rewind for the purpose. But I did not see any agreement on that either. The performance implication of WAL logging every hint bit update could be huge. Simon has raised usability concerns that Sawada-san and Samrat have tried to address by following his suggestions. I am not fully convinced, though, that we have got that right. But then there is hardly any feedback on that aspect lately. In general, from the discussion it seems that the patch is trying to solve a real problem. Even though Tom and you feel that rsync is probably good enough and more trustworthy than any other approach, my feeling is that many, including Fujii-san, still disagree with that argument based on real user feedback. So where do we go from here? I think it will really help Sawada-san and Samrat if we can provide them some solid feedback and an approach to take. Lately, I was thinking whether we could do something else to track file system updates without relying on WAL inspection and then use pg_rewind to solve this problem. Some sort of preload library mechanism is one such possibility. But I haven't really thought this through entirely. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On 08.10.2013 13:00, Pavan Deolasee wrote: On Tue, Oct 8, 2013 at 3:16 PM, Andres Freund and...@2ndquadrant.com wrote: It is my impression that there still are several people having pretty fundamental doubts about this approach in general. From what I remember neither Heikki, Simon, Tom nor me were really convinced about this approach. IIRC you and Tom were particularly skeptical about the approach. But do you see a technical flaw or a show stopper with the approach? Heikki has written pg_rewind which is really very cool. But it fails to handle the hint bit updates which are not WAL logged unless of course checksums are turned on. We can have a GUC controlled option to turn WAL logging on for hint bit updates and then use pg_rewind for the purpose. But I did not see any agreement on that either. Performance implication of WAL logging every hint bit update could be huge. Yeah, I definitely think we should work on the pg_rewind approach instead of this patch. It's a lot more flexible. The performance hit of WAL-logging hint bit updates is the price you have to pay, but a lot of people were OK with that to get page checksums, so I think a lot of people would be OK with it for this purpose too. As long as it's optional, of course. And anyone using page checksums is already paying that price. - Heikki
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Oct 4, 2013 at 1:46 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Sep 27, 2013 at 6:44 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Thank you for comment. I think it is good simple idea. In your opinion, if synchronous_transfer is set 'all' and synchronous_commit is set 'on', the master wait for data flush eve if user sets synchronous_commit to 'local' or 'off'. For example, when user want to do transaction early, user can't do this. we leave the such situation as constraint? No, user can still override the transaction commit point wait. So if synchronous_transfer is set to all: - If synchronous_commit is ON - wait at all points - If synchronous_commit is OFF - wait only at buffer flush (and other related to failback safety) points synchronous_transfer is set to data_flush: - If synchronous_commit is either ON o OFF - do not wait at commit points, but wait at all other points synchronous_transfer is set to commit: - If synchronous_commit is ON - wait at commit point - If synchronous_commit is OFF - do not wait at any point Thank you for explain. Understood. if synchronous_transfer is set 'all' and user changes synchronous_commit to 'off'( or 'local') at a transaction, the master server wait at buffer flush, but doesn't wait at commit points. Right? In currently patch, synchronous_transfer works in cooperation with synchronous_commit. But if user changes synchronous_commit at a transaction, they are not in cooperation. So, your idea might be better than currently behaviour of synchronous_transfer. I attached the v11 patch which have fixed following contents. You added several checks into SyncRepWaitForLSN() so that it can handle both synchronous_transfer=data_flush and =commit. This change made the source code of the function very complicated, I'm afraid. 
To simplify the source code, what about just adding a new wait-for-LSN function for data_flush instead of changing SyncRepWaitForLSN()? Obviously that new function and SyncRepWaitForLSN() would have a common part. I think that it should be extracted as a separate function. + * Note that if sync transfer is requested, we can't regular maintenance until + * standbys to connect. */ -if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH) +if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH && !SyncTransRequested()) Per discussion with Pavan, ISTM we don't need to avoid setting synchronous_commit to local even if synchronous_transfer is data_flush. But you did that here. Why? When synchronous_transfer = data_flush, anti-wraparound vacuum can be blocked. Is this safe? +#synchronous_transfer = commit # data page synchronization level +# commit, data_flush or all This comment seems confusing. I think that this parameter specifies when to wait for replication. +typedef enum +{ +SYNCHRONOUS_TRANSFER_COMMIT, /* no wait for flush data page */ +SYNCHRONOUS_TRANSFER_DATA_FLUSH, /* wait for data page flush only + * no wait for WAL */ +SYNCHRONOUS_TRANSFER_ALL /* wait for data page flush and WAL */ +} SynchronousTransferLevel; These comments also seem confusing. For example, I think that the meaning of SYNCHRONOUS_TRANSFER_COMMIT is something like wait for replication on transaction commit. @@ -521,6 +531,13 @@ smgr_redo(XLogRecPtr lsn, XLogRecord *record) */ XLogFlush(lsn); +/* + * If synchronous transfer is requested, wait for failback safe standby + * to receive WAL up to lsn. + */ +if (SyncTransRequested()) +SyncRepWaitForLSN(lsn, true, true); Since smgr_redo() is called only during recovery, SyncRepWaitForLSN() doesn't need to be called here. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
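The refactoring suggested above can be sketched as follows. This is a minimal, hypothetical model, not the patch's actual code: the real SyncRepWaitForLSN() lives in syncrep.c and blocks on a wait queue, while the stub here merely records the request so the shape of the extraction is visible.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;            /* stand-in for PostgreSQL's type */

static XLogRecPtr last_waited_lsn = 0;  /* what the stubbed wait last saw */
static int        waits = 0;

/* Common wait logic extracted once: in the real server this would queue
 * the process on the sync-rep wait queue until the standby confirms
 * `lsn`; this stub just records the request. */
static void
SyncRepWaitCommon(XLogRecPtr lsn)
{
    last_waited_lsn = lsn;
    waits++;
}

/* The existing commit-time entry point stays simple... */
static void
SyncRepWaitForLSN(XLogRecPtr lsn)
{
    SyncRepWaitCommon(lsn);
}

/* ...and the data_flush case gets its own thin entry point (an invented
 * name) instead of extra boolean flags threaded through
 * SyncRepWaitForLSN(). */
static void
SyncRepWaitForDataFlush(XLogRecPtr lsn)
{
    SyncRepWaitCommon(lsn);
}
```

Each caller then picks the wrapper matching its context, and the complicated per-mode checks stay out of the shared routine.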
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 27, 2013 at 6:44 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Thank you for the comment. I think it is a good, simple idea. In your opinion, if synchronous_transfer is set to 'all' and synchronous_commit is set to 'on', the master waits for the data flush even if the user sets synchronous_commit to 'local' or 'off'. For example, when a user wants a transaction to complete quickly, the user can't do this. Do we leave such a situation as a constraint? No, the user can still override the wait at the transaction commit point. So if synchronous_transfer is set to all: - If synchronous_commit is ON - wait at all points - If synchronous_commit is OFF - wait only at buffer flush (and other failback-safety related) points. If synchronous_transfer is set to data_flush: - If synchronous_commit is either ON or OFF - do not wait at commit points, but wait at all other points. If synchronous_transfer is set to commit: - If synchronous_commit is ON - wait at commit points - If synchronous_commit is OFF - do not wait at any point. Thank you for the explanation. Understood. If synchronous_transfer is set to 'all' and the user changes synchronous_commit to 'off' (or 'local') in a transaction, the master server waits at buffer flush but doesn't wait at commit points. Right? In the current patch, synchronous_transfer works in cooperation with synchronous_commit. But if the user changes synchronous_commit within a transaction, they no longer cooperate. So your idea might be better than the current behaviour of synchronous_transfer. I attached the v11 patch, which fixes the following. - synchronous_transfer controls waiting only at the data-flush level; synchronous_commit controls waiting at the commit level (based on Pavan's suggestion). - If no sync replication standby names are set, neither synchronous_commit nor synchronous_transfer takes effect. - Fixed that we didn't support a failback-safe standby.
The previous patch could not support a failback-safe standby, because it doesn't wait at FlushBuffer when it is called by autovacuum. So, if the user temporarily wants a transaction to complete quickly, the user needs to change the synchronous_transfer value and reload postgresql.conf. Regards, --- Sawada Masahiko synchronous_transfer_v11.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 26, 2013 at 8:54 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Thu, Sep 19, 2013 at 4:02 PM, Fujii Masao masao.fu...@gmail.com wrote: Hmm... when synchronous_transfer is set to data_flush, IMO the intuitive behaviors are (1) synchronous_commit = on A data flush should wait for the corresponding WAL to be flushed in the standby (2) synchronous_commit = remote_write A data flush should wait for the corresponding WAL to be written to the OS in the standby. (3) synchronous_commit = local (4) synchronous_commit = off A data flush should wait for the corresponding WAL to be written locally in the master. I thought synchronous_commit and synchronous_transfer are kind of orthogonal to each other. synchronous_commit only controls whether and how to wait for the standby when a transaction commits. synchronous_transfer OTOH tells how to interpret the standbys listed in the synchronous_standbys parameter. If set to commit then they are synchronous standbys (like today). If set to data_flush, they are asynchronous failback safe standbys, and if set to all then they are synchronous failback safe standbys. Well, it's confusing :-( So IMHO in the current state of things, the synchronous_transfer GUC cannot be changed at a session/transaction level since all backends, including background workers, must honor the settings to guarantee failback safety. synchronous_commit still works the same way, but is ignored if synchronous_transfer is set to data_flush because that effectively tells us that the standbys listed under synchronous_standbys are really *async* standbys with failback safety. Thank you for the comment. I think it is a good, simple idea. In your opinion, if synchronous_transfer is set to 'all' and synchronous_commit is set to 'on', the master waits for the data flush even if the user sets synchronous_commit to 'local' or 'off'. For example, when a user wants a transaction to complete quickly, the user can't do this. Do we leave such a situation as a constraint?
Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Thank you for the comment. I think it is a good, simple idea. In your opinion, if synchronous_transfer is set to 'all' and synchronous_commit is set to 'on', the master waits for the data flush even if the user sets synchronous_commit to 'local' or 'off'. For example, when a user wants a transaction to complete quickly, the user can't do this. Do we leave such a situation as a constraint? No, the user can still override the wait at the transaction commit point. So if synchronous_transfer is set to all: - If synchronous_commit is ON - wait at all points - If synchronous_commit is OFF - wait only at buffer flush (and other failback-safety related) points. If synchronous_transfer is set to data_flush: - If synchronous_commit is either ON or OFF - do not wait at commit points, but wait at all other points. If synchronous_transfer is set to commit: - If synchronous_commit is ON - wait at commit points - If synchronous_commit is OFF - do not wait at any point. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
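The wait matrix above can be encoded compactly. This is an illustrative sketch only: the enum mirrors the patch's SynchronousTransferLevel, but the two helper functions and their names are invented here, not taken from the patch.

```c
#include <stdbool.h>

/* Mirrors the patch's enum for the synchronous_transfer GUC. */
typedef enum
{
    SYNCHRONOUS_TRANSFER_COMMIT,
    SYNCHRONOUS_TRANSFER_DATA_FLUSH,
    SYNCHRONOUS_TRANSFER_ALL
} SynchronousTransferLevel;

/* Wait for the standby at a transaction commit point?  Only when
 * synchronous_commit is on, and only for the commit/all levels, so a
 * session can still opt out of commit waits by turning
 * synchronous_commit off. */
static bool
wait_at_commit(SynchronousTransferLevel transfer, bool synchronous_commit)
{
    if (!synchronous_commit)
        return false;
    return transfer == SYNCHRONOUS_TRANSFER_COMMIT ||
           transfer == SYNCHRONOUS_TRANSFER_ALL;
}

/* Wait at a buffer (data page) flush?  Failback safety must hold for
 * every backend, so synchronous_commit has no say here. */
static bool
wait_at_data_flush(SynchronousTransferLevel transfer)
{
    return transfer == SYNCHRONOUS_TRANSFER_DATA_FLUSH ||
           transfer == SYNCHRONOUS_TRANSFER_ALL;
}
```

Reading the two functions together reproduces the matrix: data_flush never waits at commit but always waits at page flushes, commit behaves like today's synchronous replication, and all does both.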
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 27, 2013 at 5:18 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Fri, Sep 27, 2013 at 1:28 PM, Sawada Masahiko sawada.m...@gmail.com wrote: Thank you for the comment. I think it is a good, simple idea. In your opinion, if synchronous_transfer is set to 'all' and synchronous_commit is set to 'on', the master waits for the data flush even if the user sets synchronous_commit to 'local' or 'off'. For example, when a user wants a transaction to complete quickly, the user can't do this. Do we leave such a situation as a constraint? No, the user can still override the wait at the transaction commit point. So if synchronous_transfer is set to all: - If synchronous_commit is ON - wait at all points - If synchronous_commit is OFF - wait only at buffer flush (and other failback-safety related) points. If synchronous_transfer is set to data_flush: - If synchronous_commit is either ON or OFF - do not wait at commit points, but wait at all other points. If synchronous_transfer is set to commit: - If synchronous_commit is ON - wait at commit points - If synchronous_commit is OFF - do not wait at any point. Thank you for the explanation. Understood. If synchronous_transfer is set to 'all' and the user changes synchronous_commit to 'off' (or 'local') in a transaction, the master server waits at buffer flush but doesn't wait at commit points. Right? In the current patch, synchronous_transfer works in cooperation with synchronous_commit. But if the user changes synchronous_commit within a transaction, they no longer cooperate. So your idea might be better than the current behaviour of synchronous_transfer. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 19, 2013 at 4:02 PM, Fujii Masao masao.fu...@gmail.com wrote: Hmm... when synchronous_transfer is set to data_flush, IMO the intuitive behaviors are (1) synchronous_commit = on A data flush should wait for the corresponding WAL to be flushed in the standby (2) synchronous_commit = remote_write A data flush should wait for the corresponding WAL to be written to the OS in the standby. (3) synchronous_commit = local (4) synchronous_commit = off A data flush should wait for the corresponding WAL to be written locally in the master. I thought synchronous_commit and synchronous_transfer are kind of orthogonal to each other. synchronous_commit only controls whether and how to wait for the standby when a transaction commits. synchronous_transfer OTOH tells how to interpret the standbys listed in the synchronous_standbys parameter. If set to commit then they are synchronous standbys (like today). If set to data_flush, they are asynchronous failback safe standbys, and if set to all then they are synchronous failback safe standbys. Well, it's confusing :-( So IMHO in the current state of things, the synchronous_transfer GUC cannot be changed at a session/transaction level since all backends, including background workers, must honor the settings to guarantee failback safety. synchronous_commit still works the same way, but is ignored if synchronous_transfer is set to data_flush because that effectively tells us that the standbys listed under synchronous_standbys are really *async* standbys with failback safety. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 20, 2013 at 10:33 PM, Samrat Revagade revagade.sam...@gmail.com wrote: On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Attached patch combines the documentation patch and the source-code patch. I have had a stab at reviewing the documentation. Have a look. Thanks. The attached patch implements the suggestions on documentation. But the comments from Fujii-san still need to be implemented. We will implement them soon. I have attached the patch, which I modified based on Fujii-san's suggestions. If synchronous_transfer is set to 'data_flush', the behaviour of synchronous_transfer with synchronous_commit is: (1) synchronous_commit = on A data flush should wait for the corresponding WAL to be flushed in the standby (2) synchronous_commit = remote_write A data flush should wait for the corresponding WAL to be written to the OS in the standby. (3) synchronous_commit = local (4) synchronous_commit = off A data flush should wait for the corresponding WAL to be written locally in the master. Even if the user changes the synchronous_commit value in a transaction, other processes (e.g. the checkpointer process) can't see it. In the current patch, each process uses its local synchronous_commit value. Regards, --- Sawada Masahiko synchronous_transfer_v10.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
Attached patch combines documentation patch and source-code patch. I have had a stab at reviewing the documentation. Have a look. --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -1749,6 +1749,50 @@ include 'filename' /listitem /varlistentry + varlistentry id=guc-synchronous-transfer xreflabel=synchronous_transfer + termvarnamesynchronous_transfer/varname (typeenum/type)/term + indexterm + primaryvarnamesynchronous_transfer/ configuration parameter/primary + /indexterm + listitem + para +This parameter controls the synchronous nature of WAL transfer and +maintains file system level consistency between the master server and +standby server. It specifies whether the master server will wait for a file +system level change (for example: modifying a data page) before +the corresponding WAL records are replicated to the standby server. + /para + para +Valid values are literalcommit/, literaldata_flush/ and +literalall/. The default value is literalcommit/, meaning +that the master will only wait for transaction commits; this is equivalent +to turning off the literalsynchronous_transfer/ parameter, and the standby +server will behave as a quotesynchronous standby/ in +Streaming Replication. For the value literaldata_flush/, the master will +wait only for data page modifications but not for transaction +commits; hence the standby server will act as an quoteasynchronous +failback safe standby/. For the value literalall/, the master will wait +for data page modifications as well as for transaction commits, and the +resultant standby server will act as a quotesynchronous failback safe +standby/. The wait is on background activities and hence will not create performance overhead. + To configure a synchronous failback safe standby, +xref linkend=guc-synchronous-standby-names should be set. 
+ /para + /listitem + /varlistentry @@ -2258,14 +2302,25 @@ include 'filename'/indexterm listitem para -Specifies a comma-separated list of standby names that can support -firsttermsynchronous replication/, as described in -xref linkend=synchronous-replication. -At any one time there will be at most one active synchronous standby; -transactions waiting for commit will be allowed to proceed after -this standby server confirms receipt of their data. -The synchronous standby will be the first standby named in this list -that is both currently connected and streaming data in real-time +Specifies a comma-separated list of standby names. If this parameter +is set then the standby will behave as a synchronous standby in replication, +as described in xref linkend=synchronous-replication, or a synchronous +failback safe standby, as described in xref linkend=failback-safe. +At any time there will be at most one active standby; when the standby is a +synchronous standby in replication, transactions waiting for commit +will be allowed to proceed after this standby server confirms receipt +of their data. But when the standby is a synchronous failback safe standby, +data page modifications as well as transaction commits will be allowed +to proceed only after this standby server confirms receipt of their data. +If this parameter is set to an empty value and +xref linkend=guc-synchronous-transfer is set to literaldata_flush/, +then the standby is called an asynchronous failback safe standby and only +data page modifications will wait before the corresponding WAL record is +replicated to the standby. + /para + para +The synchronous standby in replication will be the first standby named in +this list that is both currently connected and streaming data in real-time (as shown by a state of literalstreaming/literal in the link linkend=monitoring-stats-views-table literalpg_stat_replication//link view). 
--- a/doc/src/sgml/high-availability.sgml +++ b/doc/src/sgml/high-availability.sgml + + sect2 id=failback-safe + titleSetting up failback safe standby/title + + indexterm zone=high-availability + primarySetting up failback safe standby/primary + /indexterm + + para + PostgreSQL streaming replication offers durability, but if the master crashes and +a particular WAL record is unable to reach the standby server, then that +WAL record is present on the master server but not on the standby server. +In such a case the master is ahead of the standby server in terms of WAL records and data in the database. +This leads to file-system level inconsistency between the master and standby server. +For example a heap page update on the master might not have been reflected on standby when
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 20, 2013 at 3:40 PM, Sameer Thakur samthaku...@gmail.com wrote: Attached patch combines the documentation patch and the source-code patch. I have had a stab at reviewing the documentation. Have a look. Thanks. The attached patch implements the suggestions on documentation. But the comments from Fujii-san still need to be implemented. We will implement them soon. synchronous_transfer_v9.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com wrote: I attached the patch which I have modified. Thanks for updating the patch! Here are the review comments: Thank you for reviewing! I got the compiler warning: syncrep.c:112: warning: unused variable 'i' How does synchronous_transfer work with synchronous_commit? In the current patch, synchronous_transfer doesn't work when synchronous_commit is set to 'off' or 'local'. If the user changes the synchronous_commit value in a transaction, the checkpointer process can't see it. Due to that, even if synchronous_commit is changed from 'on' to 'off', synchronous_transfer doesn't work. I'm planning to modify the patch so that synchronous_transfer is not affected by synchronous_commit. + * accept all the likely variants of off. This comment should be removed because synchronous_transfer doesn't accept the value off. +{commit, SYNCHRONOUS_TRANSFER_COMMIT, true}, ISTM the third value true should be false. +{0, SYNCHRONOUS_TRANSFER_COMMIT, true}, Why is this needed? +elog(WARNING, XLogSend sendTimeLineValidUpto(%X/%X) >= sentPtr(%X/%X) AND sendTImeLine, + (uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto, + (uint32) (sentPtr >> 32), (uint32) sentPtr); Why is this needed? They are unnecessary. I had forgotten to remove the unnecessary code. +#define SYNC_REP_WAIT_FLUSH 1 +#define SYNC_REP_WAIT_DATA_FLUSH 2 Why do we need to separate the wait-queue for wait-data-flush from that for wait-flush? ISTM that wait-data-flush can also wait for the replication on the wait-queue for wait-flush, which would simplify the patch. Yes, it seems unnecessary to add a new queue. I will delete SYNC_REP_WAIT_DATA_FLUSH and the related code. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com wrote: I attached the patch which I have modified. Thanks for updating the patch! Here are the review comments: Thank you for reviewing! I got the compiler warning: syncrep.c:112: warning: unused variable 'i' How does synchronous_transfer work with synchronous_commit? In the current patch, synchronous_transfer doesn't work when synchronous_commit is set to 'off' or 'local'. If the user changes the synchronous_commit value in a transaction, the checkpointer process can't see it. Due to that, even if synchronous_commit is changed from 'on' to 'off', synchronous_transfer doesn't work. I'm planning to modify the patch so that synchronous_transfer is not affected by synchronous_commit. Hmm... when synchronous_transfer is set to data_flush, IMO the intuitive behaviors are (1) synchronous_commit = on A data flush should wait for the corresponding WAL to be flushed in the standby (2) synchronous_commit = remote_write A data flush should wait for the corresponding WAL to be written to the OS in the standby. (3) synchronous_commit = local (4) synchronous_commit = off A data flush should wait for the corresponding WAL to be written locally in the master. Regards, -- Fujii Masao
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 19, 2013 at 7:32 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Sep 19, 2013 at 7:07 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Thu, Sep 19, 2013 at 12:25 PM, Fujii Masao masao.fu...@gmail.com wrote: On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com wrote: I attached the patch which I have modified. Thanks for updating the patch! Here are the review comments: Thank you for reviewing! I got the compiler warning: syncrep.c:112: warning: unused variable 'i' How does synchronous_transfer work with synchronous_commit? In the current patch, synchronous_transfer doesn't work when synchronous_commit is set to 'off' or 'local'. If the user changes the synchronous_commit value in a transaction, the checkpointer process can't see it. Due to that, even if synchronous_commit is changed from 'on' to 'off', synchronous_transfer doesn't work. I'm planning to modify the patch so that synchronous_transfer is not affected by synchronous_commit. Hmm... when synchronous_transfer is set to data_flush, IMO the intuitive behaviors are (1) synchronous_commit = on A data flush should wait for the corresponding WAL to be flushed in the standby (2) synchronous_commit = remote_write A data flush should wait for the corresponding WAL to be written to the OS in the standby. (3) synchronous_commit = local (4) synchronous_commit = off A data flush should wait for the corresponding WAL to be written locally in the master. It is a good idea. So the synchronous_commit value needs to be visible from other processes. To share synchronous_commit with other processes, I will try to put the synchronous_commit value into shared memory. Is there already a GUC parameter which is shared with other processes? I tried to find such a parameter, but there wasn't one. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote: I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. Did you set synchronous_standby_names in the standby server? Yes. If so, the master server waits for the standby server which is set in synchronous_standby_names. Please let me know the details of this case. Both master and standby have the same postgresql.conf settings as follows: max_wal_senders = 4 wal_level = hot_standby wal_keep_segments = 32 synchronous_standby_names = '*' synchronous_transfer = all How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? The current patch supports the case in which two servers are set up with SYNC replication. IOW, a failback safe standby is the same as a SYNC replication standby. The user can set synchronous_transfer only on the master side. So, it's very strange that CHECKPOINT on the standby gets stuck infinitely. I attached the patch which I have modified. I have modified it so that if both synchronous replication and synchronous transfer are requested, but the server is still in recovery (i.e., the server is in standby mode), the server doesn't wait for the corresponding WAL to be replicated. Specifically, I added a RecoveryInProgress() condition. If both functions (synchronous replication and transfer) are set and the user sets up synchronous replication between two servers, the user can execute CHECKPOINT on the standby side. It will not wait for the corresponding WAL to be replicated. 
But if both parameters are set and the user doesn't set up synchronous replication (i.e., the master server works alone), the master server waits infinitely when the user executes CHECKPOINT. This behaviour is similar to synchronous replication. Regards, --- Sawada Masahiko synchronous_transfer_v8.patch Description: Binary data
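The v8 fix described above can be modeled in a few lines. This is a toy sketch under stated assumptions: SyncTransRequested() and RecoveryInProgress() are the names used in the thread, but the wrapper function, the globals, and the counter standing in for the real wait are invented for illustration.

```c
#include <stdbool.h>

/* Stand-ins for the server's state; in PostgreSQL these come from the
 * GUC machinery and xlog.c, not from globals like these. */
static bool sync_transfer_requested;
static bool in_recovery;
static int  waits_issued = 0;

static bool SyncTransRequested(void)  { return sync_transfer_requested; }
static bool RecoveryInProgress(void)  { return in_recovery; }

/* Called where the patch waits for the failback-safe standby, e.g.
 * around a buffer flush during CHECKPOINT.  A server still in recovery
 * (a standby) skips the wait, so CHECKPOINT on the standby no longer
 * blocks on its own synchronous_standby_names setting. */
static void
maybe_wait_for_failback_safe_standby(void)
{
    if (SyncTransRequested() && !RecoveryInProgress())
        waits_issued++;         /* real code would call SyncRepWaitForLSN() */
}
```

With this guard, only the master (not a server in recovery) blocks waiting for the standby, which matches the intended behaviour: a lone master with the parameters set still waits, just as plain synchronous replication does.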
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, Sep 19, 2013 at 11:48 AM, Sawada Masahiko sawada.m...@gmail.com wrote: I attached the patch which I have modified. Thanks for updating the patch! Here are the review comments: I got the compiler warning: syncrep.c:112: warning: unused variable 'i' How does synchronous_transfer work with synchronous_commit? + * accept all the likely variants of off. This comment should be removed because synchronous_transfer doesn't accept the value off. +{commit, SYNCHRONOUS_TRANSFER_COMMIT, true}, ISTM the third value true should be false. +{0, SYNCHRONOUS_TRANSFER_COMMIT, true}, Why is this needed? +elog(WARNING, XLogSend sendTimeLineValidUpto(%X/%X) >= sentPtr(%X/%X) AND sendTImeLine, + (uint32) (sendTimeLineValidUpto >> 32), (uint32) sendTimeLineValidUpto, + (uint32) (sentPtr >> 32), (uint32) sentPtr); Why is this needed? +#define SYNC_REP_WAIT_FLUSH 1 +#define SYNC_REP_WAIT_DATA_FLUSH 2 Why do we need to separate the wait-queue for wait-data-flush from that for wait-flush? ISTM that wait-data-flush can also wait for the replication on the wait-queue for wait-flush, which would simplify the patch. Regards, -- Fujii Masao
Re: [HACKERS] Patch for fail-back without fresh backup
syncrep.c: In function ‘SyncRepReleaseWaiters’: syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable] Sorry, I forgot to fix it. I have attached the patch which I modified. The attached patch combines the documentation patch and the source-code patch. -- Regards, Samrat Revgade synchronous_transfer_v7.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Sep 17, 2013 at 3:45 PM, Samrat Revagade revagade.sam...@gmail.com wrote: syncrep.c: In function ‘SyncRepReleaseWaiters’: syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable] Sorry, I forgot to fix it. I have attached the patch which I modified. The attached patch combines the documentation patch and the source-code patch. I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? Regards, -- Fujii Masao
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote: On Tue, Sep 17, 2013 at 3:45 PM, Samrat Revagade revagade.sam...@gmail.com wrote: syncrep.c: In function ‘SyncRepReleaseWaiters’: syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable] Sorry, I forgot to fix it. I have attached the patch which I modified. The attached patch combines the documentation patch and the source-code patch. I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. Did you set synchronous_standby_names in the standby server? If so, the master server waits for the standby server which is set in synchronous_standby_names. Please let me know the details of this case. How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? The current patch supports the case in which two servers are set up with SYNC replication. IOW, a failback safe standby is the same as a SYNC replication standby. The user can set synchronous_transfer only on the master side. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote: I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. Did you set synchronous_standby_names in the standby server? Yes. If so, the master server waits for the standby server which is set in synchronous_standby_names. Please let me know the details of this case. Both master and standby have the same postgresql.conf settings as follows: max_wal_senders = 4 wal_level = hot_standby wal_keep_segments = 32 synchronous_standby_names = '*' synchronous_transfer = all How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? The current patch supports the case in which two servers are set up with SYNC replication. IOW, a failback safe standby is the same as a SYNC replication standby. The user can set synchronous_transfer only on the master side. So, it's very strange that CHECKPOINT on the standby gets stuck infinitely. Regards, -- Fujii Masao
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote: I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. Did you set synchronous_standby_names in the standby server? Yes. If so, the master server waits for the standby server which is set in synchronous_standby_names. Please let me know the details of this case. Both master and standby have the same postgresql.conf settings as follows: max_wal_senders = 4 wal_level = hot_standby wal_keep_segments = 32 synchronous_standby_names = '*' synchronous_transfer = all How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? The current patch supports the case in which two servers are set up with SYNC replication. IOW, a failback safe standby is the same as a SYNC replication standby. The user can set synchronous_transfer only on the master side. So, it's very strange that CHECKPOINT on the standby gets stuck infinitely. Yes, I think so. I was not considering the case where the user sets synchronous_standby_names in the standby server; this will occur then. I will fix it considering this case. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Sep 18, 2013 at 1:05 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Wed, Sep 18, 2013 at 11:45 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Sep 18, 2013 at 10:35 AM, Sawada Masahiko sawada.m...@gmail.com wrote: On Tue, Sep 17, 2013 at 9:52 PM, Fujii Masao masao.fu...@gmail.com wrote: I set up synchronous replication with synchronous_transfer = all, and then I ran pgbench -i and executed CHECKPOINT in the master. After that, when I executed CHECKPOINT in the standby, it got stuck infinitely. I guess this was caused by the synchronous_transfer feature. Did you set synchronous_standby_names in the standby server? Yes. If so, the master server waits for the standby server which is set in synchronous_standby_names. Please let me know the details of this case. Both master and standby have the same postgresql.conf settings, as follows:

max_wal_senders = 4
wal_level = hot_standby
wal_keep_segments = 32
synchronous_standby_names = '*'
synchronous_transfer = all

How does synchronous_transfer work with cascade replication? If it's set to all in the sender-side standby, can it resolve the data page inconsistency between two standbys? Currently the patch supports the case in which two servers are set up with SYNC replication. IOW, a failback-safe standby is the same as a SYNC replication standby. The user can set synchronous_transfer only on the master side. So, it's very strange that CHECKPOINT on the standby gets stuck infinitely. Sorry, I sent the previous mail by mistake. Yes, I think so. It waits for the corresponding WAL to be replicated. The behaviour of synchronous_transfer is a little similar to that of synchronous_standby_names and synchronous replication. That is, if those parameters are set but the standby server doesn't connect to the master server, the master server waits infinitely for the corresponding WAL to be replicated to the standby server. I was not considering the case where the user sets synchronous_standby_names in the standby server. I will fix it considering this case. 
Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On 9/12/13 3:00 AM, Samrat Revagade wrote: We are improving the patch for Commit Fest 2 now. We will fix the above compiler warnings as soon as possible and submit the patch. Attached *synchronous_transfer_v5.patch* implements review comments from commit fest-1 and reduces the performance overhead of synchronous_transfer. There is still this compiler warning: syncrep.c: In function ‘SyncRepReleaseWaiters’: syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable]
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Sep 13, 2013 at 1:11 AM, Peter Eisentraut pete...@gmx.net wrote: On 9/12/13 3:00 AM, Samrat Revagade wrote: We are improving the patch for Commit Fest 2 now. We will fix the above compiler warnings as soon as possible and submit the patch. Attached *synchronous_transfer_v5.patch* implements review comments from commit fest-1 and reduces the performance overhead of synchronous_transfer. There is still this compiler warning: syncrep.c: In function ‘SyncRepReleaseWaiters’: syncrep.c:421:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable] Sorry, I forgot to fix it. I have attached the modified patch. Regards, --- Sawada Masahiko synchronous_transfer_v6.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On Sat, Aug 24, 2013 at 11:38 PM, Peter Eisentraut pete...@gmx.net wrote: On Thu, 2013-07-11 at 23:42 +0900, Sawada Masahiko wrote: please find the attached patch. Please fix these compiler warnings: xlog.c:3117:2: warning: implicit declaration of function ‘SyncRepWaitForLSN’ [-Wimplicit-function-declaration] syncrep.c:414:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable] Thank you for your information! We are improving the patch for Commit Fest 2 now. We will fix the above compiler warnings as soon as possible and submit the patch. -- Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Thu, 2013-07-11 at 23:42 +0900, Sawada Masahiko wrote: please find the attached patch. Please fix these compiler warnings: xlog.c:3117:2: warning: implicit declaration of function ‘SyncRepWaitForLSN’ [-Wimplicit-function-declaration] syncrep.c:414:6: warning: variable ‘numdataflush’ set but not used [-Wunused-but-set-variable]
Re: [HACKERS] Patch for fail-back without fresh backup
ToDo
1. Currently this patch supports one global synchronous transfer mode, so we can't set a different synchronous transfer mode for each server. We need to improve the patch to support the following cases:
- a SYNC standby plus a separate ASYNC failback-safe standby
- an ASYNC standby plus a separate ASYNC failback-safe standby
2. We have not measured performance yet. We need to measure performance.

Here are the test results, showing the performance overhead of the patch (failback_safe_standby_v4.patch). Tests were carried out in two different scenarios: 1. tests with fast transaction workloads; 2. tests with large loads.

Test Type-1: Tests with pgbench (fast transaction workloads)

Notes:
1. These tests measure the performance overhead caused by the patch for fast transaction workloads.
2. Tests are performed with the pgbench benchmark; the performance measurement factor is the TPS value.
3. The values represent the TPS for 4 runs, and the last value represents the average of all the runs.

Settings for the tests:
transaction type: TPC-B (sort of)
scaling factor: 300
query mode: simple
number of clients: 150
number of threads: 1
duration: 1800 s

Analysis of results:
1) Synchronous Replication: (753.06, 748.81, 748.38, 747.21, Avg 747.2)
2) Synchronous Replication + failback-safe standby (commit): (729.13, 724.33, 713.59, 710.79, Avg 719.46)
3) Synchronous Replication + failback-safe standby (all): (692.08, 688.08, 711.23, 711.62, Avg 700.75)
4) Asynchronous Replication: (1008.42, 993.39, 986.80, 1028.46, Avg 1004.26)
5) Asynchronous Replication + failback-safe standby (commit): (974.49, 978.60, 969.11, 957.18, Avg 969.84)
6) Asynchronous Replication + failback-safe standby (data_flush): (1011.79, 992.05, 1030.20, 940.50, Avg 993.63)

In the above test results the performance numbers are very close to each other, and they show variation because of noise; hence the following is an approximate conclusion about the overhead of the patch.
1. 
Streaming replication + synchronous_transfer (all, data_flush):
a) On average, synchronous replication combined with synchronous_transfer (all) causes 6.21% performance overhead.
b) On average, asynchronous streaming replication combined with synchronous_transfer (data_flush) causes 1.05% performance overhead.
2. Streaming replication + synchronous_transfer (commit):
a) On average, synchronous replication combined with synchronous_transfer (commit) causes 3.71% performance overhead.
b) On average, asynchronous streaming replication combined with synchronous_transfer (commit) causes 3.42% performance overhead.

Test Type-2: Tests with pgbench -i (tests with large loads)

Notes:
1. These tests measure the performance overhead caused by the patch for large loads and index builds.
2. Tests are performed with pgbench -i (initialization of test data, i.e. the time taken to create the pgbench tables, insert tuples, and build primary keys).
3. The performance measurement factor is the wall clock time of pgbench -i (measured with the time command).
4. The values represent the wall clock time for 4 runs, and the last value represents the average of all the runs. 
pgbench settings: scale factor: 300 (database size: 4.3873 GB)

Test results:
1) Synchronous Replication: (126.98, 133.83, 127.77, 129.70, Avg 129.57) seconds
2) Synchronous Replication + synchronous_transfer (commit): (132.87, 125.85, 133.91, 134.61, Avg 131.81) seconds
3) Synchronous Replication + synchronous_transfer (all): (133.59, 132.82, 134.20, 135.22, Avg 133.95) seconds
4) Asynchronous Replication: (126.75, 136.95, 130.42, 127.77, 130.47) seconds
5) Asynchronous Replication + synchronous_transfer (commit): (128.13, 133.06, 127.62, 130.70, Avg 129.87) seconds
6) Asynchronous Replication + synchronous_transfer (data_flush): (134.55, 139.90, 144.47, 143.85, Avg 140.69) seconds

In the above test results the performance numbers are very close to each other, and they show variation because of noise; hence the following is an approximate conclusion about the overhead of the patch.
1. Streaming replication + synchronous_transfer (all, data_flush):
a) On average, synchronous replication combined with synchronous_transfer (all) causes 3.38% performance overhead.
b) On average, asynchronous streaming replication combined with synchronous_transfer (data_flush) causes 7.83% performance overhead.
2. Streaming replication + synchronous_transfer (commit):
a) On average, synchronous replication combined with synchronous_transfer (commit) causes 1.72% performance overhead.
b) On average, asynchronous streaming replication combined with synchronous_transfer (commit) causes (-0.45)% performance overhead.

The test results for both cases (large loads and fast transactions) show variation because of noise, but we can observe that the patch causes approximately 3-4% performance overhead. Regards,
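For reference, the two workloads described above correspond to roughly the following pgbench invocations. This is a reconstruction from the stated settings (scale 300, 150 clients, 1 thread, 1800 s); the exact command lines the testers used are not shown in the mail, and the database name `bench` is an illustrative placeholder:

```shell
# Test Type-1: fast transaction workload (TPC-B-like)
pgbench -i -s 300 bench            # initialize at scale factor 300
pgbench -c 150 -j 1 -T 1800 bench  # 150 clients, 1 thread, 1800 s, reports TPS

# Test Type-2: large load, measured with the shell's time command
time pgbench -i -s 300 bench
```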
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Jul 9, 2013 at 11:45 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Sun, Jul 7, 2013 at 4:27 PM, Sawada Masahiko sawada.m...@gmail.com wrote: I found a bug which occurred when we do vacuum, and have fixed it. Yesterday (8th July) the "Improve scalability of WAL insertions" patch was committed to HEAD, so the v2 patch does not apply to HEAD now. I have also fixed it to be applicable to HEAD; please find the attached patch. Regards, --- Sawada Masahiko I have fixed the issue that the master server doesn't wait for the WAL to be flushed to the standby's disk when the master server executes FlushBuffer(), and have attached the v4 patch. Please find the attached patch. Regards, --- Sawada Masahiko failback_safe_standby_v4.patch Description: Binary data
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jul 7, 2013 at 4:27 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote: I agree. We should probably find a better name for this. Any suggestions? err, I already made one... But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where the master can wait for synchronous transfers at points other than commits. Maybe we can solve that by having more granular control over the said parameter? The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between synchronous and the new kind of standby? That's not the point. The point is: why would we have a new kind of standby, and therefore why do we need new parameters? I am told one of the very popular setups for DR is to have one local sync standby and one async (maybe cascaded by the local sync). Since this new feature is more useful for DR, because taking a fresh backup on a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Let's look at that in detail. 
Take 3 servers, A, B, C, with A and B being linked by sync rep, and C being a safety standby at a distance. Either A or B is master, except in disaster. So if A is master, then B would be the failover target. If A fails, then you want to failover to B; once B is the target, you want to failback to A as the master. C needs to follow the new master, whichever it is. Suppose you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you need to failback from B to A, but you can't, because the new mode applied between A and C only, so you have to failback from C to A. So having the new mode not match with sync rep means you are forcing people to failback using the slow link in the common case. You might observe that having the two modes match causes problems if both A and B fail, so you are forced to go to C as master and then eventually failback to A or B across a slow link. That case is less common and could be solved by extending sync transfer to more/multiple nodes. It definitely doesn't make sense to have sync rep on anything other than a subset of sync transfer. So while it may be sensible in the future to make sync transfer a superset of the sync rep nodes, it makes sense to make them the same config for now.

I have updated the patch. We support the following 2 cases:
1. a SYNC standby server that is also a failback-safe standby server
2. an ASYNC standby server that is also a failback-safe standby server

1. Changed the parameter name. We gave up the 'failback_safe_standby_names' parameter from the first patch, and renamed 'failback_safe_mode' to 'synchronous_transfer'. This parameter accepts 'all', 'data_flush' and 'commit'.
- 'commit' means that the master waits on commit for the corresponding WAL to be flushed to the standby server's disk, but does not wait for replicated data pages.
- 'data_flush' means that the master waits for a data page (e.g. CLOG, pg_control) to be replicated before flushing it to the master server's disk. If this parameter is set to 'data_flush', the 'synchronous_commit' value is ignored even if the user sets it.
- 'all' means that the master waits for both replicated WAL and data pages.

2. Put the SyncRepWaitForLSN() function into the XLogFlush() function. We have put SyncRepWaitForLSN() into XLogFlush(), and changed the argument of XLogFlush().

The setup cases and the parameters that need to be set are:
- SYNC standby that is also a failback-safe standby (case 1):
  synchronous_transfer = all
  synchronous_commit = remote_write/on
  synchronous_standby_names = ServerName
- ASYNC standby that is also a failback-safe standby (case 2):
  synchronous_transfer = data_flush (the 'synchronous_commit' value is ignored)
- default SYNC replication:
  synchronous_transfer = commit
  synchronous_commit = on
  synchronous_standby_names = ServerName
- default ASYNC replication:
  synchronous_transfer = commit

ToDo 1.
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote: I agree. We should probably find a better name for this. Any suggestions? err, I already made one... But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where the master can wait for synchronous transfers at points other than commits. Maybe we can solve that by having more granular control over the said parameter? The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between synchronous and the new kind of standby? That's not the point. The point is: why would we have a new kind of standby, and therefore why do we need new parameters? I am told one of the very popular setups for DR is to have one local sync standby and one async (maybe cascaded by the local sync). Since this new feature is more useful for DR, because taking a fresh backup on a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Let's look at that in detail. Take 3 servers, A, B, C, with A and B being linked by sync rep, and C being a safety standby at a distance. Either A or B is master, except in disaster. So if A is master, then B would be the failover target. 
If A fails, then you want to failover to B; once B is the target, you want to failback to A as the master. C needs to follow the new master, whichever it is. Suppose you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you need to failback from B to A, but you can't, because the new mode applied between A and C only, so you have to failback from C to A. So having the new mode not match with sync rep means you are forcing people to failback using the slow link in the common case. You might observe that having the two modes match causes problems if both A and B fail, so you are forced to go to C as master and then eventually failback to A or B across a slow link. That case is less common and could be solved by extending sync transfer to more/multiple nodes. It definitely doesn't make sense to have sync rep on anything other than a subset of sync transfer. So while it may be sensible in the future to make sync transfer a superset of the sync rep nodes, it makes sense to make them the same config for now.

I have updated the patch. We support the following 2 cases:
1. a SYNC standby server that is also a failback-safe standby server
2. an ASYNC standby server that is also a failback-safe standby server

1. Changed the parameter name. We gave up the 'failback_safe_standby_names' parameter from the first patch, and renamed 'failback_safe_mode' to 'synchronous_transfer'. This parameter accepts 'all', 'data_flush' and 'commit'.
- 'commit' means that the master waits on commit for the corresponding WAL to be flushed to the standby server's disk, but does not wait for replicated data pages.
- 'data_flush' means that the master waits for a data page (e.g. CLOG, pg_control) to be replicated before flushing it to the master server's disk. If this parameter is set to 'data_flush', the 'synchronous_commit' value is ignored even if the user sets it.
- 'all' means that the master waits for both replicated WAL and data pages.

2. Put the SyncRepWaitForLSN() function into the XLogFlush() function. We have put SyncRepWaitForLSN() into XLogFlush(), and changed the argument of XLogFlush().

The setup cases and the parameters that need to be set are:
- SYNC standby that is also a failback-safe standby (case 1):
  synchronous_transfer = all
  synchronous_commit = remote_write/on
  synchronous_standby_names = ServerName
- ASYNC standby that is also a failback-safe standby (case 2):
  synchronous_transfer = data_flush (the 'synchronous_commit' value is ignored)
- default SYNC replication:
  synchronous_transfer = commit
  synchronous_commit = on
  synchronous_standby_names = ServerName
- default ASYNC replication:
  synchronous_transfer = commit

ToDo
1. Currently this patch supports one global synchronous transfer mode, so we can't set a different synchronous transfer mode for each server. We need to improve the patch to support the following cases:
- a SYNC standby
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko sawada.m...@gmail.com wrote: On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote: I agree. We should probably find a better name for this. Any suggestions? err, I already made one... But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where the master can wait for synchronous transfers at points other than commits. Maybe we can solve that by having more granular control over the said parameter? The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between synchronous and the new kind of standby? That's not the point. The point is: why would we have a new kind of standby, and therefore why do we need new parameters? I am told one of the very popular setups for DR is to have one local sync standby and one async (maybe cascaded by the local sync). Since this new feature is more useful for DR, because taking a fresh backup on a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Let's look at that in detail. Take 3 servers, A, B, C, with A and B being linked by sync rep, and C being a safety standby at a distance. Either A or B is master, except in disaster. 
So if A is master, then B would be the failover target. If A fails, then you want to failover to B; once B is the target, you want to failback to A as the master. C needs to follow the new master, whichever it is. Suppose you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you need to failback from B to A, but you can't, because the new mode applied between A and C only, so you have to failback from C to A. So having the new mode not match with sync rep means you are forcing people to failback using the slow link in the common case. You might observe that having the two modes match causes problems if both A and B fail, so you are forced to go to C as master and then eventually failback to A or B across a slow link. That case is less common and could be solved by extending sync transfer to more/multiple nodes. It definitely doesn't make sense to have sync rep on anything other than a subset of sync transfer. So while it may be sensible in the future to make sync transfer a superset of the sync rep nodes, it makes sense to make them the same config for now.

I have updated the patch. We support the following 2 cases:
1. a SYNC standby server that is also a failback-safe standby server
2. an ASYNC standby server that is also a failback-safe standby server

1. Changed the parameter name. We gave up the 'failback_safe_standby_names' parameter from the first patch, and renamed 'failback_safe_mode' to 'synchronous_transfer'. This parameter accepts 'all', 'data_flush' and 'commit'.
- 'commit' means that the master waits on commit for the corresponding WAL to be flushed to the standby server's disk, but does not wait for replicated data pages.
- 'data_flush' means that the master waits for a data page (e.g. CLOG, pg_control) to be replicated before flushing it to the master server's disk. If this parameter is set to 'data_flush', the 'synchronous_commit' value is ignored even if the user sets it.
- 'all' means that the master waits for both replicated WAL and data pages.

2. Put the SyncRepWaitForLSN() function into the XLogFlush() function. We have put SyncRepWaitForLSN() into XLogFlush(), and changed the argument of XLogFlush().

The setup cases and the parameters that need to be set are:
- SYNC standby that is also a failback-safe standby (case 1):
  synchronous_transfer = all
  synchronous_commit = remote_write/on
  synchronous_standby_names = ServerName
- ASYNC standby that is also a failback-safe standby (case 2):
  synchronous_transfer = data_flush (the 'synchronous_commit' value is ignored)
- default SYNC replication:
  synchronous_transfer = commit
  synchronous_commit = on
  synchronous_standby_names = ServerName
- default ASYNC replication:
  synchronous_transfer = commit

ToDo 1. Currently this patch supports synchronous transfer, so we can't set different
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Jul 2, 2013 at 2:45 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote: On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote: Hi, So our proposal on this problem is that we must ensure that the master should not make any file-system-level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. I am trying to understand how there would be extra WAL on the old master that it would replay and cause inconsistency. Consider how I am picturing it, and correct me if I am wrong. 1) Master crashes, so a failback standby becomes the new master, forking the WAL. 2) The old master is restarted as a standby (now, with this patch, without a new base backup). 3) It would try to replay all the WAL it has available, and later connect to the new master, also following the timeline switch (the switch might happen using archived WAL and a timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * In (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Yes, this is the point which can lead to inconsistency: the new standby/old master will replay WAL after the last successful checkpoint, for which it gets info from the control file. It is picking WAL from the location where it was logged when it was active (pg_xlog). Does it first replay all the WAL in pg_xlog before the archive? Should we make it check for a timeline history file in the archive before it starts replaying any WAL? I have really not thought about what the best solution for the problem is. * And would the new master, before forking the WAL, replay all the WAL that is necessary to come to the state (of the data directory) that the old master was in just before it crashed? 
I don't think the new master has any correlation with the old master's data directory; rather, it will replay the WAL it has received/flushed before starting to act as master. When the old master fails over, the WAL that is ahead of the new master might be broken data, so when the user wants to take a dump from the old master, the dump may fail. It is just an idea: we extend the parameters used in recovery.conf with something like 'follow_master_force'. This parameter accepts 'on' and 'off', and is effective only when standby_mode is set to on. If both 'follow_master_force' and 'standby_mode' are set to 'on':
1. When the standby server starts recovery, it skips applying the WAL in its pg_xlog and requests WAL starting from its latest checkpoint LSN from the master server.
2. The master server receives the standby's latest checkpoint LSN and compares it with the LSN of the master's latest checkpoint. If those LSNs match, the master will send WAL from that checkpoint LSN; if not, the master will inform the standby that it failed.
3. The standby will fork its WAL and continuously apply the WAL sent from the master.
Please consider whether this solution has the same problem as mentioned by Robert Haas in the mail below: http://www.postgresql.org/message-id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com In this approach, a user who wants to dump from the old master will set 'off' for follow_master_force and standby_mode, and gets the dump of the old master after the master has started. OTOH, a user who wants to force replication to start will set 'on' for both parameters. I think before going into a solution for this problem, it should be confirmed by others whether such a problem needs to be resolved as part of this patch. I have seen that Simon Riggs is a reviewer of this patch and he hasn't mentioned his views about this problem. So I think it's not worth inventing a solution. 
Rather, I think that if all other things are resolved for this patch, then maybe in the end we can check with the committer whether he thinks this problem needs to be solved as a separate patch. Thank you for the feedback. Yes, we can consider that problem separately, and we need to judge whether it is worth inventing a solution. I think that solving the root cause of this problem is complex; it might need a big change to the replication architecture. So I'm thinking that I'd like to handle it somehow during recovery. If we deal with it at recovery time, I think the impact on performance can be neglected. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Tuesday, July 02, 2013 11:16 AM Amit Kapila wrote: On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote: On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote: Hi, So our proposal on this problem is that we must ensure that the master should not make any file-system-level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. I am trying to understand how there would be extra WAL on the old master that it would replay and cause inconsistency. Consider how I am picturing it, and correct me if I am wrong. 1) Master crashes, so a failback standby becomes the new master, forking the WAL. 2) The old master is restarted as a standby (now, with this patch, without a new base backup). 3) It would try to replay all the WAL it has available, and later connect to the new master, also following the timeline switch (the switch might happen using archived WAL and a timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * In (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Yes, this is the point which can lead to inconsistency: the new standby/old master will replay WAL after the last successful checkpoint, for which it gets info from the control file. It is picking WAL from the location where it was logged when it was active (pg_xlog). Does it first replay all the WAL in pg_xlog before the archive? Should we make it check for a timeline history file in the archive before it starts replaying any WAL? I have really not thought about what the best solution for the problem is. * And would the new master, before forking the WAL, replay all the WAL that is necessary to come to the state (of the data directory) that the old master was in just before it crashed? 
I don't think the new master has any correlation with the old master's data directory; rather, it will replay the WAL it has received/flushed before it starts acting as master. When the old master fails over, the WAL that is ahead of the new master might contain broken data, so when a user wants to take a dump from the old master, the dump may fail. It is just an idea: we extend recovery.conf with a parameter like 'follow_master_force'. This parameter accepts 'on' and 'off' and is effective only when standby_mode is set to on. If both 'follow_master_force' and 'standby_mode' are set to 'on': 1. When the standby server starts recovery, it skips applying the WAL in pg_xlog and requests WAL from its latest checkpoint LSN from the master server. 2. The master server receives the standby's latest checkpoint LSN and compares it with the master's latest checkpoint LSN. If those LSNs match, the master sends WAL from that checkpoint LSN; if not, the master informs the standby of the failure. 3. The standby forks its WAL and applies the WAL sent from the master continuously. Please consider if this solution has the same problem as mentioned by Robert Hass in the below mail: Sorry typo error, it's Robert Haas mail: http://www.postgresql.org/message-id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com In this approach, a user who wants to dump from the old master sets follow_master_force and standby_mode to 'off' and gets the dump of the old master after it starts. OTOH, a user who wants to force replication to start sets both parameters to 'on'. I think before going into a solution for this problem, it should be confirmed by others whether such a problem needs to be resolved as part of this patch. I have seen that Simon Riggs is a reviewer of this patch and he hasn't mentioned his views about this problem. So I think it's not worth inventing a solution.
Rather, I think if all other things are resolved for this patch, then maybe in the end we can check with the committer whether he thinks this problem needs to be solved as a separate patch. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Patch for fail-back without fresh backup
On Friday, June 28, 2013 10:41 AM Sawada Masahiko wrote: On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote: Hi, So our proposal on this problem is that we must ensure that master should not make any file system level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on old master during recovery. If it plays the WAL which has not reached new-master, it can be a problem. I am trying to understand how there would be extra WAL on old master that it would replay and cause inconsistency. Consider how I am picturing it and correct me if I am wrong. 1) Master crashes. So a failback standby becomes new master forking the WAL. 2) Old master is restarted as a standby (now with this patch, without a new base backup). 3) It would try to replay all the WAL it has available and later connect to the new master also following the timeline switch (the switch might happen using archived WAL and timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * in (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Yes, this is the point which can lead to inconsistency, new standby/old master will replay WAL after the last successful checkpoint, for which he get info from control file. It is picking WAL from the location where it was logged when it was active (pg_xlog). Does it first replay all the WAL in pg_xlog before archive? Should we make it check for a timeline history file in archive before it starts replaying any WAL? I have really not thought what is best solution for problem. * And, would the new master, before forking the WAL, replay all the WAL that is necessary to come to state (of data directory) that the old master was just before it crashed? 
I don't think the new master has any correlation with the old master's data directory; rather, it will replay the WAL it has received/flushed before it starts acting as master. When the old master fails over, the WAL that is ahead of the new master might contain broken data, so when a user wants to take a dump from the old master, the dump may fail. It is just an idea: we extend recovery.conf with a parameter like 'follow_master_force'. This parameter accepts 'on' and 'off' and is effective only when standby_mode is set to on. If both 'follow_master_force' and 'standby_mode' are set to 'on': 1. When the standby server starts recovery, it skips applying the WAL in pg_xlog and requests WAL from its latest checkpoint LSN from the master server. 2. The master server receives the standby's latest checkpoint LSN and compares it with the master's latest checkpoint LSN. If those LSNs match, the master sends WAL from that checkpoint LSN; if not, the master informs the standby of the failure. 3. The standby forks its WAL and applies the WAL sent from the master continuously. Please consider if this solution has the same problem as mentioned by Robert Hass in the below mail: http://www.postgresql.org/message-id/ca+tgmoy4j+p7jy69ry8gposmmdznyqu6dtionprcxavg+sp...@mail.gmail.com In this approach, a user who wants to dump from the old master sets follow_master_force and standby_mode to 'off' and gets the dump of the old master after it starts. OTOH, a user who wants to force replication to start sets both parameters to 'on'. I think before going into a solution for this problem, it should be confirmed by others whether such a problem needs to be resolved as part of this patch. I have seen that Simon Riggs is a reviewer of this patch and he hasn't mentioned his views about this problem. So I think it's not worth inventing a solution.
Rather, I think if all other things are resolved for this patch, then maybe in the end we can check with the committer whether he thinks this problem needs to be solved as a separate patch. With Regards, Amit Kapila.
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 24, 2013 at 10:47 PM, Sawada Masahiko sawada.m...@gmail.com wrote: 1. synchronous standby that is also a failback-safe standby 2. asynchronous standby that is also a failback-safe standby In the above cases, adding a new parameter might be meaningless, but I think we should handle not only cases 1 and 2 but also the following cases 3 and 4 for DR. To support cases 1 and 2, I'm thinking of the following two ideas. - We add synchronous_transfer (commit / data_flush / all). This GUC will only affect the standbys mentioned in the list of synchronous_standby_names. 1. If synchronous_transfer is set to commit, the current synchronous replication behavior is achieved. 2. If synchronous_transfer is set to data_flush, the standbys named in synchronous_standby_names will act as ASYNC failback-safe standbys. 3. If synchronous_transfer is set to all, the standbys named in synchronous_standby_names will act as SYNC failback-safe standbys. In this approach, 3 is confusing because we are actually setting up an ASYNC standby by using the GUCs meant for sync standby setup. - We extend synchronous_commit so that it also accepts a value like 'all'. (This approach doesn't provide a 'synchronous_transfer' parameter.) The 'all' value means that the master waits not only for replicated WAL but also for replicated data pages (e.g., CLOG, pg_control), and the master changes its behavior depending on whether the standby is connected as sync or async. 1. If synchronous_commit is set to 'all' and synchronous_standby_names is set to the standby name, the standbys named in synchronous_standby_names will act as SYNC failback-safe standbys. 2. If synchronous_commit is set to 'all' and synchronous_standby_names is NOT set to the standby name, the standbys connecting to the master will act as ASYNC failback-safe standbys. One problem with not naming an ASYNC standby explicitly is that the master has no clue which standby to wait on.
If it chooses to wait on all async standbys for failback-safety, that can be quite detrimental, especially because async standbys can easily become unreachable if they are on a slow link or at a remote location. Please give me feedback. Regards, --- Sawada Masahiko
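A rough postgresql.conf sketch of the two alternatives above (hedged: `synchronous_transfer` and a `synchronous_commit` value of 'all' are only proposals in this thread, not released GUCs):

```ini
# Alternative A: a new synchronous_transfer GUC, scoped to the standbys
# listed in synchronous_standby_names (proposed values: commit / data_flush / all)
synchronous_standby_names = 'slave1'
synchronous_transfer = data_flush    # slave1 acts as an ASYNC failback-safe standby
#synchronous_transfer = all          # slave1 acts as a SYNC failback-safe standby

# Alternative B: extend synchronous_commit with an 'all' value instead
synchronous_commit = all             # wait for WAL *and* data-page replication
synchronous_standby_names = 'slave1' # named standby   => SYNC failback-safe
                                     # unnamed standby => ASYNC failback-safe
```

As the message notes, Alternative B leaves the master with no explicit list of async standbys to wait on, which is the objection raised above.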
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs si...@2ndquadrant.com wrote: I am told, one of the very popular setups for DR is to have one local sync standby and one async (may be cascaded by the local sync). Since this new feature is more useful for DR because taking a fresh backup on a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Lets look at that in detail. Take 3 servers, A, B, C with A and B being linked by sync rep, and C being safety standby at a distance. Either A or B is master, except in disaster. So if A is master, then B would be the failover target. If A fails, then you want to failover to B. Once B is the target, you want to failback to A as the master. C needs to follow the new master, whichever it is. If you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you need to failback from B from A, but you can't because the new mode applied between A and C only, so you have to failback from C to A. So having the new mode not match with sync rep means you are forcing people to failback using the slow link in the common case. It's true that in this scenario that doesn't really make sense, but I still think they are separate properties. You could certainly want synchronous replication without this new property, if you like the data-loss guarantees that sync rep provides but don't care about failback. You could also want this new property without synchronous replication, if you don't need the data-loss guarantees that sync rep provides but you do care about fast failback. I admit it seems unlikely that you would use both features but not target them at the same machines, although maybe: perhaps you have a sync standby and an async standby and want this new property with respect to both of them. In my admittedly limited experience, the use case for a lot of this technology is in the cloud. 
The general strategy seems to be: at the first sign of trouble, kill the offending instance and fail over. This can result in failing over pretty frequently, and needing it to be fast. There may be no real hardware problem; indeed, the failover may be precipitated by network conditions or overload of the physical host backing the virtual machine or any number of other nonphysical problems. I can see this being useful in that environment, even for async standbys. People can apparently tolerate a brief interruption while their primary gets killed off and connections are re-established with the new master, but they need the failover to be fast. The problem with the status quo is that, even if the first failover is fast, the second one isn't, because it has to wait behind rebuilding the original master. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Patch for fail-back without fresh backup
On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila amit.kap...@huawei.com wrote: On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote: Hi, So our proposal on this problem is that we must ensure that master should not make any file system level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on old master during recovery. If it plays the WAL which has not reached new-master, it can be a problem. I am trying to understand how there would be extra WAL on old master that it would replay and cause inconsistency. Consider how I am picturing it and correct me if I am wrong. 1) Master crashes. So a failback standby becomes new master forking the WAL. 2) Old master is restarted as a standby (now with this patch, without a new base backup). 3) It would try to replay all the WAL it has available and later connect to the new master also following the timeline switch (the switch might happen using archived WAL and timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * in (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Yes, this is the point which can lead to inconsistency, new standby/old master will replay WAL after the last successful checkpoint, for which he get info from control file. It is picking WAL from the location where it was logged when it was active (pg_xlog). Does it first replay all the WAL in pg_xlog before archive? Should we make it check for a timeline history file in archive before it starts replaying any WAL? I have really not thought what is best solution for problem. * And, would the new master, before forking the WAL, replay all the WAL that is necessary to come to state (of data directory) that the old master was just before it crashed? I don't think new master has any correlation with old master's data directory, Rather it will replay the WAL it has received/flushed before start acting as master. 
When the old master fails over, the WAL that is ahead of the new master might contain broken data, so when a user wants to take a dump from the old master, the dump may fail. It is just an idea: we extend recovery.conf with a parameter like 'follow_master_force'. This parameter accepts 'on' and 'off' and is effective only when standby_mode is set to on. If both 'follow_master_force' and 'standby_mode' are set to 'on': 1. When the standby server starts recovery, it skips applying the WAL in pg_xlog and requests WAL from its latest checkpoint LSN from the master server. 2. The master server receives the standby's latest checkpoint LSN and compares it with the master's latest checkpoint LSN. If those LSNs match, the master sends WAL from that checkpoint LSN; if not, the master informs the standby of the failure. 3. The standby forks its WAL and applies the WAL sent from the master continuously. In this approach, a user who wants to dump from the old master sets follow_master_force and standby_mode to 'off' and gets the dump of the old master after it starts. OTOH, a user who wants to force replication to start sets both parameters to 'on'. Please give me feedback. Regards, --- Sawada Masahiko
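As a toy model only (plain Python, not PostgreSQL internals; all names are made up for illustration), the handshake in steps 1-3 of the proposal could be sketched like this:

```python
# Toy model of the proposed 'follow_master_force' handshake.
# WAL is modeled as a list of (lsn, record) pairs.

def master_handle_wal_request(master_wal, standby_ckpt_lsn, master_ckpt_lsn):
    """Master side of step 2: compare the standby's latest-checkpoint LSN
    with the master's own; on a match, stream WAL from that LSN onward,
    otherwise report failure (None) back to the standby."""
    if standby_ckpt_lsn != master_ckpt_lsn:
        return None
    return [rec for rec in master_wal if rec[0] >= standby_ckpt_lsn]

def standby_follow_force(local_state, request_wal):
    """Standby side of steps 1 and 3: skip the WAL in local pg_xlog,
    request WAL from the latest checkpoint LSN and, if the master agrees,
    fork the timeline and apply what the master sends."""
    ckpt_lsn = local_state["latest_checkpoint"]
    records = request_wal(ckpt_lsn)
    if records is None:
        return None  # handshake failed: checkpoints diverged
    return [lsn for lsn, _ in records]  # LSNs applied, in order

# Demo: the standby's checkpoint matches the master's, so WAL streams from there.
master_wal = [(100, "rec"), (110, "rec"), (120, "rec")]
applied = standby_follow_force({"latest_checkpoint": 110},
                               lambda lsn: master_handle_wal_request(master_wal, lsn, 110))
assert applied == [110, 120]
```

A mismatch in step 2 (the two checkpoints differ) makes `request_wal` return None, which is the "master will inform standby that failed" branch of the proposal.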
Re: [HACKERS] Patch for fail-back without fresh backup
On Tue, Jun 25, 2013 at 12:19 PM, Pavan Deolasee pavan.deola...@gmail.com wrote: On Mon, Jun 24, 2013 at 7:17 PM, Sawada Masahiko sawada.m...@gmail.com wrote: [Server] standby_name = 'slave1' synchronous_transfer = commit wal_sender_timeout = 30 [Server] standby_name = 'slave2' synchronous_transfer = all wal_sender_timeout = 50 --- What different values/modes are you thinking of for synchronous_transfer? IMHO only commit and all may not be enough. As I suggested upthread, we may need an additional mode, say data, which will ensure synchronous WAL transfer before making any file system changes. We need this separate mode because the failback safe (or whatever we call it) standby need not wait on the commits, and it's important to avoid that wait since it comes in a direct path of client transactions. If we are doing it, I wonder if an additional mode none also makes sense so that users can also control asynchronous standbys via the same mechanism. I made a mistake in how I used the parameter names synchronous_transfer and failback_safe_standby_mode. It means that we control file system changes using failback_safe_standby_mode. If failback_safe_standby_mode is set to 'remote_flush', the master server waits for all data pages (e.g., CLOG, pg_control) to be flushed on the standby server. Right? For example: -- [server] standby_name = 'slave1' failback_safe_standby_mode = remote_flush wal_sender_timeout = 50 -- In this case, we should also set synchronous_commit and synchronous_level for each standby server. That is, do we need to set the following 3 parameters to support cases 3 and 4 as I said? -synchronous_commit = on/off/local/remote_write -failback_safe_standby_mode = off/remote_write/remote_flush -synchronous_level = sync/async (this parameter means which mode, sync or async, the standby server is connected with.) Please give me your feedback.
Regards, --- Sawada Masahiko
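Combining the three parameters proposed above, case 3 (one sync standby plus a separate async failback-safe standby) might be written like this, reusing the thread's ini-style notation (hypothetical sketch; none of these parameters exist in PostgreSQL as such):

```ini
[Server]                                   # ordinary synchronous standby
standby_name = 'slave1'
synchronous_commit = remote_write
failback_safe_standby_mode = off
synchronous_level = sync

[Server]                                   # async standby kept failback-safe
standby_name = 'slave2'
synchronous_commit = off
failback_safe_standby_mode = remote_flush  # master waits for data-page flush
synchronous_level = async
```

Swapping the two failback_safe_standby_mode values would give case 4 (async standby with a different async failback-safe standby).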
Re: [HACKERS] Patch for fail-back without fresh backup
On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote: Hi, So our proposal on this problem is that we must ensure that master should not make any file system level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on old master during recovery. If it plays the WAL which has not reached new-master, it can be a problem. I am trying to understand how there would be extra WAL on old master that it would replay and cause inconsistency. Consider how I am picturing it and correct me if I am wrong. 1) Master crashes. So a failback standby becomes new master forking the WAL. 2) Old master is restarted as a standby (now with this patch, without a new base backup). 3) It would try to replay all the WAL it has available and later connect to the new master also following the timeline switch (the switch might happen using archived WAL and timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * in (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Yes, this is the point which can lead to inconsistency, new standby/old master will replay WAL after the last successful checkpoint, for which he get info from control file. It is picking WAL from the location where it was logged when it was active (pg_xlog). Does it first replay all the WAL in pg_xlog before archive? Should we make it check for a timeline history file in archive before it starts replaying any WAL? I have really not thought what is best solution for problem. * And, would the new master, before forking the WAL, replay all the WAL that is necessary to come to state (of data directory) that the old master was just before it crashed? I don't think new master has any correlation with old master's data directory, Rather it will replay the WAL it has received/flushed before start acting as master. With Regards, Amit Kapila. 
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote: I agree. We should probably find a better name for this. Any suggestions ? err, I already made one... But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate. e.g. synchronous_transfer = all | commit (default). Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where master can wait for synchronous transfers at other than commits. May we can solve that by having more granular control to the said parameter ? The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between synchronous and the new kind of standby ? That's not the point. The point is Why would we have a new kind of standby? and therefore why do we need new parameters? I am told, one of the very popular setups for DR is to have one local sync standby and one async (may be cascaded by the local sync). Since this new feature is more useful for DR because taking a fresh backup on a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Lets look at that in detail. Take 3 servers, A, B, C with A and B being linked by sync rep, and C being safety standby at a distance. Either A or B is master, except in disaster. So if A is master, then B would be the failover target. 
If A fails, then you want to failover to B. Once B is the target, you want to failback to A as the master. C needs to follow the new master, whichever it is. If you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you need to failback from B from A, but you can't because the new mode applied between A and C only, so you have to failback from C to A. So having the new mode not match with sync rep means you are forcing people to failback using the slow link in the common case. You might observe that having the two modes match causes problems if A and B fail, so you are forced to go to C as master and then eventually failback to A or B across a slow link. That case is less common and could be solved by extending sync transfer to more/multi nodes. It definitely doesn't make sense to have sync rep on anything other than a subset of sync transfer. So while it may be sensible in the future to make sync transfer a superset of sync rep nodes, it makes sense to make them the same config for now. when 2 servers being synchronous replication, those servers are in same location in many cases. ( e.g., same server room) so taking a full backup and sending it to old master is not issue. this proposal works for situation which those servers are put in remote location and when main site is powered down due to such as power failure or natural disaster occurs. as you said, we can control file (e.g., CLOG, pg_control, etc) replicating by adding synchronous_transfer option. but if to add only this parameter, we can handle only following 2 cases. 1. synchronous standby and make same as failback safe standby 2. asynchronous standby and make same as failback safe standby in above case, adding new parameter might be meaningless. but I think that we should handle case not only case 1,2 but also following case 3, 4 for DR. 3. synchronous standby and make different asynchronous failback safe standby 4. 
asynchronous standby and a separate asynchronous failback-safe standby. To handle cases 3 and 4, we should be able to set parameters for each standby, so we need to add a new parameter. If we can structure replication in such situations, replication would be more useful for users on a slow link. One parameter improvement idea is to extend an ini file to set parameters per standby. For example: [Server] standby_name = 'slave1' synchronous_transfer = commit wal_sender_timeout = 30 [Server] standby_name = 'slave2' synchronous_transfer = all wal_sender_timeout = 50 --- There have been discussions about such an ini file in the past. With it, we could set each parameter for each standby. Please give me feedback. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
Hi, The parameter improvement idea is to extend an ini file to set parameters per standby. For example: [Server] standby_name = 'slave1' synchronous_transfer = commit wal_sender_timeout = 30 [Server] standby_name = 'slave2' synchronous_transfer = all wal_sender_timeout = 50 --- Just asking to clarify: Is 'slave2' a failback standby? What does 'synchronous_transfer = all' mean? Does that mean wait during both commit and checkpoint? -- Amit Langote
Re: [HACKERS] Patch for fail-back without fresh backup
On Mon, Jun 24, 2013 at 7:17 PM, Sawada Masahiko sawada.m...@gmail.comwrote: [Server] standby_name = 'slave1' synchronous_transfer = commit wal_sender_timeout = 30 [Server] standby_name = 'slave2' synchronous_transfer = all wal_sender_timeout = 50 --- What different values/modes you are thinking for synchronous_transfer ? IMHO only commit and all may not be enough. As I suggested upthread, we may need an additional mode, say data, which will ensure synchronous WAL transfer before making any file system changes. We need this separate mode because the failback safe (or whatever we call it) standby need not wait on the commits and it's important to avoid that wait since it comes in a direct path of client transactions. If we are doing it, I wonder if an additional mode none also makes sense so that users can also control asynchronous standbys via the same mechanism. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] Patch for fail-back without fresh backup
Hi, So our proposal on this problem is that we must ensure that master should not make any file system level changes without confirming that the corresponding WAL record is replicated to the standby. How will you take care of extra WAL on old master during recovery. If it plays the WAL which has not reached new-master, it can be a problem. I am trying to understand how there would be extra WAL on old master that it would replay and cause inconsistency. Consider how I am picturing it and correct me if I am wrong. 1) Master crashes. So a failback standby becomes new master forking the WAL. 2) Old master is restarted as a standby (now with this patch, without a new base backup). 3) It would try to replay all the WAL it has available and later connect to the new master also following the timeline switch (the switch might happen using archived WAL and timeline history file OR the new switch-over-streaming-replication-connection as of 9.3, right?) * in (3), when the new standby/old master is replaying WAL, from where is it picking the WAL? Does it first replay all the WAL in pg_xlog before archive? Should we make it check for a timeline history file in archive before it starts replaying any WAL? * And, would the new master, before forking the WAL, replay all the WAL that is necessary to come to state (of data directory) that the old master was just before it crashed? Am I missing something here? -- Amit Langote
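The "extra WAL on old master" hazard quoted above can be illustrated with a toy model (hypothetical Python, not PostgreSQL code): any old-master WAL record past the LSN where the new master forked its timeline never reached the new master, so replaying it locally diverges from the new timeline.

```python
# WAL is modeled as a list of (lsn, record) pairs.

def extra_wal(old_master_wal, fork_lsn):
    """Return the old master's WAL records that lie beyond the fork point.
    These are exactly the records the new timeline knows nothing about,
    so replaying them on the restarted standby can cause inconsistency."""
    return [rec for rec in old_master_wal if rec[0] > fork_lsn]

# The old master wrote up to LSN 150, but the standby was promoted after
# receiving only up to LSN 120 (the fork point of the new timeline).
old_master_wal = [(100, "commit t1"), (120, "commit t2"), (150, "commit t3")]
assert extra_wal(old_master_wal, 120) == [(150, "commit t3")]
```

This is the record the patch must prevent the restarted standby from replaying (or must discard) before it can safely follow the new master.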
Re: [HACKERS] Patch for fail-back without fresh backup
On Wednesday, June 19, 2013 10:45 PM Sawada Masahiko wrote: On Tuesday, June 18, 2013, Amit Kapila wrote: On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote: On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote: On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, I think that we can dump data before all the WAL files are deleted. All WAL file deletion is done when the old master starts as a new standby. Can we dump data without starting the server? Sorry, I made a mistake. We can't do it. This proposed patch needs to be able to also handle such a scenario in the future. I am not sure the proposed patch can handle it so easily, but I think if others also feel it is important, then a method should be provided to the user for extracting his last committed data. With Regards, Amit Kapila.
Re: [HACKERS] Patch for fail-back without fresh backup
On Tuesday, June 18, 2013, Amit Kapila wrote: On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote: On Sun, Jun 16, 2013 at 2:00 PM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote: On Sat, Jun 15, 2013 at 10:34 PM, Amit kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, We have already started a discussion on pgsql-hackers for the problem of taking a fresh backup during the failback operation; here is the link for that: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of extra WAL on old master during recovery. If it plays the WAL which has not reached new-master, it can be a problem. You mean that it is possible that the old master's data is ahead of the new master's data. What I mean to say is that the WAL of the old master can be ahead of the new master. I understood that the data files of the old master can't be ahead, but I think the WAL can be ahead. So there is inconsistent data between those servers when we fail back, right? If so, there is no possible inconsistency, because if you use the GUC option as he proposes (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file system level changes are done before the WAL is replicated. Would the proposed patch take care that the old master's WAL is also not ahead in some way? If yes, I think I am missing some point. Yes, it will happen that the old master's WAL is ahead of the new master's WAL, as you said, but I think that we can solve that by deleting all WAL files when the old master starts as a new standby. I think ideally, it should reset the WAL location at the point where the new master has forked off.
In such a scenario it would be difficult for a user who wants to get a dump of some data on the old master which hasn't gone to the new master. I am not sure if such a need is there for real users, but if it is there, then providing this solution will have some drawbacks. I think that we can dump the data before all the WAL files are deleted. All WAL file deletion is done when the old master starts as a new standby. Can we dump data without starting the server? Sorry, I made a mistake. We can't do it. This proposed patch needs to be able to also handle such a scenario in the future. Regards, --- Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jun 16, 2013 at 11:08 PM, Simon Riggs si...@2ndquadrant.com wrote: On 16 June 2013 17:25, Samrat Revagade revagade.sam...@gmail.com wrote: On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote: So I strongly object to calling this patch anything to do with failback safe. You simply don't have enough data to make such a bold claim. (Which is why we call it synchronous replication and not zero data loss, for example.) But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). I agree with you that, these days, the need for a fresh backup after a crash seems to be the major problem; we might need to change the name of the patch if there are other problems with crash recovery too. (Sorry, don't understand.) Sorry for the confusion. I will change the name of the patch. The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. The different set of parameters is needed to differentiate between the fail-safe standby and the synchronous standby; the fail-safe standby and the standby in synchronous replication can be two different servers. Why would they be different? What possible reason would you have for that config? There is no *need* for those parameters; the proposal could work perfectly well without them. Let's make this patch fulfill the stated objectives, not add in optional extras, especially ones that don't appear well thought through.
If you wish to enhance the design for the specification of multi-node sync rep, make that a separate patch, later. I agree with you. I will remove the extra parameters if they are not required in the next version of the patch. -- Regards, Samrat Revagade
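To make the naming discussed above concrete, the suggested knob might look roughly like this in postgresql.conf. This is only a sketch of the proposal in this thread; synchronous_transfer is a proposed parameter, not an existing one, and only synchronous_standby_names is real today:

```ini
# Existing sync rep setting
synchronous_standby_names = 'standby1'

# Proposed (hypothetical) setting from this thread:
#   commit - wait for the standby only at commit (today's behaviour)
#   all    - also wait before any file-system-level change, so the
#            old master's data directory never gets ahead of the standby
synchronous_transfer = all
```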
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote: My perspective is that if the master crashed, assuming that you know everything about that and suddenly jumping back on seems like a recipe for disaster. Attempting that is currently blocked by the technical obstacles you've identified, but that doesn't mean they are the only ones - we don't yet understand what all the problems lurking might be. Personally, I won't be following you onto that minefield anytime soon. Would it be fair to say that a user will be willing to trust her crashed master in all scenarios where she would have done so in a single-instance setup? IOW, without the replication setup, AFAIU users have traditionally trusted WAL recovery to recover failed instances. This would include some common failures such as power outages and hardware failures, but may not include others such as on-disk corruption. So I strongly object to calling this patch anything to do with failback safe. You simply don't have enough data to make such a bold claim. (Which is why we call it synchronous replication and not zero data loss, for example.) I agree. We should probably find a better name for this. Any suggestions? But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). It's an interesting idea, but I think there is some difference here. For example, the proposed feature allows a backend to wait at other points, but not at commit. Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where the master can wait for synchronous transfers at points other than commits. Maybe we can solve that by having more granular control over the said parameter?
The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between a synchronous standby and the new kind of standby? I am told one of the very popular setups for DR is to have one local sync standby and one async standby (possibly cascaded from the local sync). Since this new feature is more useful for DR, because taking a fresh backup over a slower link is even more challenging, IMHO we should support such setups. I'm worried to see that adding this feature and yet turning it off causes a measurable drop in performance. I don't think we want that at all. That clearly needs more work and thought. I agree. We need to repeat those tests. I don't trust that merely having the feature turned off causes a 1-2% drop. In one of the tests, turning the feature on shows better numbers than turning it off; that's clearly noise, or it needs a concrete argument to be convincing. I also think your performance results are somewhat bogus. Fast transaction workloads were already mostly commit waits - But not in the case of an async standby, right? measurements of what happens to large loads, index builds etc. would likely reveal something quite different. I agree. I also feel we need tests where FlushBuffer gets called more often by normal backends, to see how much the added wait in that code path hurts performance. Another important thing to test would be how it works on slower / high-latency links. I'm tempted by the thought that we should put the WaitForLSN inside XLogFlush, rather than scatter additional calls everywhere and then have us inevitably miss one. That indeed seems cleaner. Thanks, Pavan
Re: [HACKERS] Patch for fail-back without fresh backup
On 17 June 2013 09:03, Pavan Deolasee pavan.deola...@gmail.com wrote: I agree. We should probably find a better name for this. Any suggestions? err, I already made one... But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). Since commits are more foreground in nature and this feature does not require us to wait during common foreground activities, we want a configuration where the master can wait for synchronous transfers at points other than commits. Maybe we can solve that by having more granular control over the said parameter? The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. How would we then distinguish between a synchronous standby and the new kind of standby? That's not the point. The point is: why would we have a new kind of standby, and therefore why do we need new parameters? I am told one of the very popular setups for DR is to have one local sync standby and one async standby (possibly cascaded from the local sync). Since this new feature is more useful for DR, because taking a fresh backup over a slower link is even more challenging, IMHO we should support such setups. ...which still doesn't make sense to me. Let's look at that in detail. Take three servers, A, B, C, with A and B linked by sync rep, and C being the safety standby at a distance. Either A or B is the master, except in disaster. So if A is the master, then B is the failover target. If A fails, you want to fail over to B.
Once you have failed over to B, you want to fail back to A as the master. C needs to follow the new master, whichever it is. Suppose you set up sync rep between A and B and this new mode between A and C. When B becomes the master, you want to fail back from B to A, but you can't, because the new mode applied only between A and C, so you have to fail back from C to A. So having the new mode not match with sync rep means you are forcing people to fail back over the slow link in the common case. You might observe that having the two modes match causes problems if both A and B fail, so you are forced to go to C as the master and then eventually fail back to A or B across a slow link. That case is less common, and could be solved by extending sync transfer to more/multiple nodes. It definitely doesn't make sense to have sync rep on anything other than a subset of the sync transfer nodes. So while it may be sensible in the future to make sync transfer a superset of the sync rep nodes, it makes sense to make them the same config for now. Phew. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
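The three-server argument above can be summarized in a small diagram. This is only an illustration of the scenario being debated, not a recommended configuration:

```text
   A (master) <---- sync rep ----> B (local failover target)
    \
     '---- proposed sync transfer ----> C (remote DR standby)

 A fails -> promote B.  But A's fail-back guarantee was tied to C,
 not to B, so A can only rejoin cheaply from C, over the slow link.
 Keeping sync rep a subset of the sync transfer nodes avoids this.
```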
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jun 16, 2013 at 2:00 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote: On Sat, Jun 15, 2013 at 10:34 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, we have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? I mean that the WAL of the old master can be ahead of the new master's. I understood that the data files of the old master can't be ahead, but I think the WAL can be. So there is inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as proposed (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. Will the proposed patch also ensure that the old master's WAL is not ahead in some way? If yes, I think I am missing some point. Yes, it can happen that the old master's WAL is ahead of the new master's WAL, as you said. But I think we can solve that by deleting all WAL files when the old master starts as a new standby. I think ideally it should reset the WAL location to the point where the new master forked off. In such a scenario it would be difficult for a user who wants to get a dump of some data on the old master which hasn't gone to the new master.
I am not sure if real users have such a need, but if they do, this solution will have some drawbacks. I think we can dump the data before deleting all the WAL files; the deletion is done when the old master starts as a new standby. Regards, Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Tuesday, June 18, 2013 12:18 AM Sawada Masahiko wrote: On Sun, Jun 16, 2013 at 2:00 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote: On Sat, Jun 15, 2013 at 10:34 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, we have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? I mean that the WAL of the old master can be ahead of the new master's. I understood that the data files of the old master can't be ahead, but I think the WAL can be. So there is inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as proposed (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. Will the proposed patch also ensure that the old master's WAL is not ahead in some way? If yes, I think I am missing some point. Yes, it can happen that the old master's WAL is ahead of the new master's WAL, as you said. But I think we can solve that by deleting all WAL files when the old master starts as a new standby. I think ideally it should reset the WAL location to the point where the new master forked off.
In such a scenario it would be difficult for a user who wants to get a dump of some data on the old master which hasn't gone to the new master. I am not sure if real users have such a need, but if they do, this solution will have some drawbacks. I think we can dump the data before deleting all the WAL files; the deletion is done when the old master starts as a new standby. Can we dump data without starting the server? With Regards, Amit Kapila.
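The divergence being discussed above can be modeled in a few lines of C. This is a toy model with made-up names, not PostgreSQL code: the old master's flushed WAL position can be ahead of the LSN the new master had received at promotion, and any WAL past that divergence point must be discarded (or the timeline rewound) before the old master can rejoin as a standby.

```c
/*
 * Toy model (assumed names, not backend code) of the failback problem:
 * WAL flushed on the old master but never shipped to the new master.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* a WAL position, as in PostgreSQL */

/* Bytes of WAL the old master has beyond the point where the masters diverged */
static inline XLogRecPtr
extra_wal(XLogRecPtr old_master_flush, XLogRecPtr new_master_divergence)
{
    return old_master_flush > new_master_divergence
        ? old_master_flush - new_master_divergence
        : 0;
}

/* The old master may rejoin without discarding WAL only if it has none extra */
static inline bool
can_rejoin_without_discard(XLogRecPtr old_master_flush,
                           XLogRecPtr new_master_divergence)
{
    return extra_wal(old_master_flush, new_master_divergence) == 0;
}
```

The proposal in this thread makes the second function always return true by never letting the old master's durable state run ahead of the standby.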
Re: [HACKERS] Patch for fail-back without fresh backup
On 14 June 2013 17:21, Jeff Davis pg...@j-davis.com wrote: On Fri, 2013-06-14 at 16:10 +0200, Andres Freund wrote: Jeff Davis has a patch pending (1365493015.7580.3240.camel@sussancws0025) that passes the buffer_std flag down to MarkBufferDirtyHint() for exactly that reason. I thought we were on track committing that, but rereading the thread it doesn't look that way. Jeff, care to update that patch? Rebased and attached. Changed so all callers use buffer_std=true except those in freespace.c and fsmpage.c. Simon, did you (or anyone else) have an objection to this patch? If not, I'll go ahead and commit it tomorrow morning. I didn't have a specific objection to the patch, I just wanted to minimise change relating to this so we didn't introduce further bugs. I've no objection to you committing that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
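The buffer_std flag discussed above matters because of how full-page images are compressed. The following is a self-contained sketch of the idea, not the backend's actual code: a standard page layout has a "hole" of unused bytes between pd_lower and pd_upper that a full-page image can skip, while a non-standard page (e.g. a free-space-map page) must be copied whole.

```c
/*
 * Sketch of why MarkBufferDirtyHint()'s buffer_std flag exists.
 * PageHeaderSketch and fpi_hole_length are illustrative stand-ins.
 */
#include <stdint.h>

#define BLCKSZ 8192                 /* PostgreSQL's default block size */

typedef struct
{
    uint16_t pd_lower;              /* end of the line-pointer array */
    uint16_t pd_upper;              /* start of the tuple space */
} PageHeaderSketch;

/* Bytes a full-page image may omit for a standard page; 0 otherwise */
static inline uint16_t
fpi_hole_length(const PageHeaderSketch *hdr, int buffer_std)
{
    if (buffer_std && hdr->pd_lower <= hdr->pd_upper)
        return (uint16_t) (hdr->pd_upper - hdr->pd_lower);
    return 0;                       /* non-standard: log all BLCKSZ bytes */
}
```

Passing buffer_std=false, as the patch does for freespace.c and fsmpage.c callers, simply disables the hole optimization for those pages.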
Re: [HACKERS] Patch for fail-back without fresh backup
On 14 June 2013 10:11, Samrat Revagade revagade.sam...@gmail.com wrote: We have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link: http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com So our proposal on this problem is that we must ensure that the master does not make any file-system-level changes without confirming that the corresponding WAL record has been replicated to the standby. 1. The main objection, raised by Tom and others, is that we should not add this feature and should go the traditional way of taking a fresh backup using rsync, because of the additional complexity of the patch and the performance overhead during normal operations. 2. Tom and others were also worried about inconsistencies in the crashed master and suggested that it is better to start with a fresh backup. Fujii Masao and others countered this, suggesting that we trust WAL recovery to clear all such inconsistencies and that there is no reason why we can't do the same here. So the patch is showing 1-2% performance overhead. Let's have a look at this... The objections you summarise that Tom has made are ones that I agree with. I also don't think that Fujii correctly countered those objections. My perspective is that if the master crashed, assuming that you know everything about that and suddenly jumping back on seems like a recipe for disaster. Attempting that is currently blocked by the technical obstacles you've identified, but that doesn't mean they are the only ones - we don't yet understand what all the problems lurking might be. Personally, I won't be following you onto that minefield anytime soon. So I strongly object to calling this patch anything to do with failback safe. You simply don't have enough data to make such a bold claim. (Which is why we call it synchronous replication and not zero data loss, for example.)
But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. I'm worried to see that adding this feature and yet turning it off causes a measurable drop in performance. I don't think we want that at all. That clearly needs more work and thought. I also think your performance results are somewhat bogus. Fast transaction workloads were already mostly commit waits - measurements of what happens to large loads, index builds etc. would likely reveal something quite different. I'm tempted by the thought that we should put the WaitForLSN inside XLogFlush, rather than scatter additional calls everywhere and then have us inevitably miss one. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
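The WaitForLSN-inside-XLogFlush idea above can be shown with a tiny model. All names here are stand-ins for the backend's real functions: the point is that if the wait lives inside the flush routine itself, every flush call site gets the synchronous-transfer behaviour for free, and none can be forgotten.

```c
/*
 * Toy model of putting the replication wait inside the flush path
 * rather than sprinkling WaitForLSN() calls at every call site.
 */
#include <stdint.h>

typedef uint64_t XLogRecPtr;

static int synchronous_transfer_all = 0;  /* modeled GUC, off by default */
static int flush_calls = 0;               /* instrumentation for the model */
static int wait_calls = 0;

/* Stand-in for waiting until the standby confirms receipt of lsn */
static void
wait_for_lsn(XLogRecPtr lsn)
{
    (void) lsn;
    wait_calls++;
}

/* Stand-in for XLogFlush(): one choke point for both flush and wait */
static void
xlog_flush(XLogRecPtr lsn)
{
    flush_calls++;                  /* local durable flush happens here */
    if (synchronous_transfer_all)
        wait_for_lsn(lsn);          /* no call site can miss the wait */
}
```

Call sites then only ever call xlog_flush(); whether a replication wait happens is decided in one place.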
Re: [HACKERS] Patch for fail-back without fresh backup
On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote: So I strongly object to calling this patch anything to do with failback safe. You simply don't have enough data to make such a bold claim. (Which is why we call it synchronous replication and not zero data loss, for example.) But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). I agree with you that, these days, the need for a fresh backup after a crash seems to be the major problem; we might need to change the name of the patch if there are other problems with crash recovery too. The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. The different set of parameters is needed to differentiate between the fail-safe standby and the synchronous standby; the fail-safe standby and the standby in synchronous replication can be two different servers. I'm worried to see that adding this feature and yet turning it off causes a measurable drop in performance. I don't think we want that at all. That clearly needs more work and thought. I also think your performance results are somewhat bogus. Fast transaction workloads were already mostly commit waits - measurements of what happens to large loads, index builds etc. would likely reveal something quite different. I will test the other scenarios and post the results. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Regards, Samrat Revagade
Re: [HACKERS] Patch for fail-back without fresh backup
On 16 June 2013 17:25, Samrat Revagade revagade.sam...@gmail.com wrote: On Sun, Jun 16, 2013 at 5:10 PM, Simon Riggs si...@2ndquadrant.com wrote: So I strongly object to calling this patch anything to do with failback safe. You simply don't have enough data to make such a bold claim. (Which is why we call it synchronous replication and not zero data loss, for example.) But that's not the whole story. I can see some utility in a patch that makes all WAL transfer synchronous, rather than just commits. Some name like synchronous_transfer might be appropriate, e.g. synchronous_transfer = all | commit (default). I agree with you that, these days, the need for a fresh backup after a crash seems to be the major problem; we might need to change the name of the patch if there are other problems with crash recovery too. (Sorry, don't understand.) The idea of another slew of parameters that are very similar to synchronous replication but yet somehow different seems weird. I can't see a reason why we'd want a second lot of parameters. Why not just use the existing ones for sync rep? (I'm surprised the Parameter Police haven't visited you in the night...) Sure, we might want to expand the design for how we specify multi-node sync rep, but that is a different patch. The different set of parameters is needed to differentiate between the fail-safe standby and the synchronous standby; the fail-safe standby and the standby in synchronous replication can be two different servers. Why would they be different? What possible reason would you have for that config? There is no *need* for those parameters; the proposal could work perfectly well without them. Let's make this patch fulfill the stated objectives, not add in optional extras, especially ones that don't appear well thought through. If you wish to enhance the design for the specification of multi-node sync rep, make that a separate patch, later.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, We have already started a discussion on pgsql-hackers for the problem of taking fresh backup during the failback operation here is the link for that: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb jgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. When the master fails, last few WAL files may not reach the standby. But the master may have gone ahead and made changes to its local file system after flushing WAL to the local storage. So master contains some file system level changes that standby does not have. At this point, the data directory of master is ahead of standby's data directory. Subsequently, the standby will be promoted as new master. Later when the old master wants to be a standby of the new master, it can't just join the setup since there is inconsistency in between these two servers. We need to take the fresh backup from the new master. This can happen in both the synchronous as well as asynchronous replication. Fresh backup is also needed in case of clean switch-over because in the current HEAD, the master does not wait for the standby to receive all the WAL up to the shutdown checkpoint record before shutting down the connection. Fujii Masao has already submitted a patch to handle clean switch-over case, but the problem is still remaining for failback case. The process of taking fresh backup is very time consuming when databases are of very big sizes, say several TB's, and when the servers are connected over a relatively slower link. This would break the service level agreement of disaster recovery system. So there is need to improve the process of disaster recovery in PostgreSQL. One way to achieve this is to maintain consistency between master and standby which helps to avoid need of fresh backup. 
So our proposal on this problem is that we must ensure that the master does not make any file-system-level changes without confirming that the corresponding WAL record has been replicated to the standby. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? So there is inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as proposed (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. -- Regards, Sawada Masahiko
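The core rule of the proposal above extends the classic WAL-before-data ordering with a replication condition. The sketch below uses assumed names (WalState, may_write_page), not the backend's real buffer-manager code: a dirty data page may be written to disk only once its WAL is both flushed locally and, in the failback-safe mode, confirmed flushed on the standby.

```c
/*
 * Toy model of the proposed rule: no file-system-level change before
 * the corresponding WAL has been replicated (remote_flush behaviour).
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct
{
    XLogRecPtr local_flush;     /* WAL durable on the master */
    XLogRecPtr remote_flush;    /* WAL confirmed flushed on the standby */
} WalState;

/* Classic WAL rule, plus the proposed replication condition */
static inline bool
may_write_page(const WalState *wal, XLogRecPtr page_lsn, bool failback_safe)
{
    if (page_lsn > wal->local_flush)
        return false;           /* ordinary WAL-before-data rule */
    if (failback_safe && page_lsn > wal->remote_flush)
        return false;           /* proposed: also wait for the standby */
    return true;
}
```

With failback_safe enabled, the master's data directory can never get ahead of what the standby has durably received, which is exactly what makes a later fail-back possible without a fresh backup.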
Re: [HACKERS] Patch for fail-back without fresh backup
On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, we have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? I mean that the WAL of the old master can be ahead of the new master's. I understood that the data files of the old master can't be ahead, but I think the WAL can be. So there is inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as proposed (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. Will the proposed patch also ensure that the old master's WAL is not ahead in some way? If yes, I think I am missing some point. With Regards, Amit Kapila.
Re: [HACKERS] Patch for fail-back without fresh backup
On Sat, Jun 15, 2013 at 10:34 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, we have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, it can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? I mean that the WAL of the old master can be ahead of the new master's. I understood that the data files of the old master can't be ahead, but I think the WAL can be. So there is inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as proposed (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. Will the proposed patch also ensure that the old master's WAL is not ahead in some way? If yes, I think I am missing some point. Yes, it can happen that the old master's WAL is ahead of the new master's WAL, as you said. But I think we can solve that by deleting all WAL files when the old master starts as a new standby. Thoughts? Regards, Sawada Masahiko
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, 2013-06-14 at 18:27 +0200, Andres Freund wrote: I'd like to see a comment around the memcpys in XLogSaveBufferForHint() that mentions that they are safe in a non std buffer due to XLogCheckBuffer setting an appropriate hole/offset. Or make an explicit change of the copy algorithm there.

Done.

Btw, if you touch that code, I'd vote for renaming XLOG_HINT to XLOG_FPI or something like that. I find the former name confusing...

Also done. Patch attached. Also, since we branched, I think this should be back-patched to 9.3 as well.

Regards,
	Jeff Davis

*** a/src/backend/access/hash/hash.c
--- b/src/backend/access/hash/hash.c
*** 287,293 **** hashgettuple(PG_FUNCTION_ARGS)
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf);
  		}
  
  		/*
--- 287,293 ----
  			/*
  			 * Since this can be redone later if needed, mark as a hint.
  			 */
! 			MarkBufferDirtyHint(buf, true);
  		}
  
  		/*
*** a/src/backend/access/heap/pruneheap.c
--- b/src/backend/access/heap/pruneheap.c
*** 262,268 **** heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer);
  		}
  	}
--- 262,268 ----
  		{
  			((PageHeader) page)->pd_prune_xid = prstate.new_prune_xid;
  			PageClearFull(page);
! 			MarkBufferDirtyHint(buffer, true);
  		}
  	}
*** a/src/backend/access/nbtree/nbtinsert.c
--- b/src/backend/access/nbtree/nbtinsert.c
*** 413,421 **** _bt_check_unique(Relation rel, IndexTuple itup, Relation heapRel,
  				 * crucial. Be sure to mark the proper buffer dirty.
  				 */
  				if (nbuf != InvalidBuffer)
! 					MarkBufferDirtyHint(nbuf);
  				else
! 					MarkBufferDirtyHint(buf);
  			}
  		}
  	}
--- 413,421 ----
  				 * crucial. Be sure to mark the proper buffer dirty.
  				 */
  				if (nbuf != InvalidBuffer)
! 					MarkBufferDirtyHint(nbuf, true);
  				else
! 					MarkBufferDirtyHint(buf, true);
  			}
  		}
  	}
*** a/src/backend/access/nbtree/nbtree.c
--- b/src/backend/access/nbtree/nbtree.c
*** 1052,1058 **** restart:
  			opaque->btpo_cycleid == vstate->cycleid)
  		{
  			opaque->btpo_cycleid = 0;
! 			MarkBufferDirtyHint(buf);
  		}
  	}
--- 1052,1058 ----
  			opaque->btpo_cycleid == vstate->cycleid)
  		{
  			opaque->btpo_cycleid = 0;
! 			MarkBufferDirtyHint(buf, true);
  		}
  	}
*** a/src/backend/access/nbtree/nbtutils.c
--- b/src/backend/access/nbtree/nbtutils.c
*** 1789,1795 **** _bt_killitems(IndexScanDesc scan, bool haveLock)
  	if (killedsomething)
  	{
  		opaque->btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so->currPos.buf);
  	}
  
  	if (!haveLock)
--- 1789,1795 ----
  	if (killedsomething)
  	{
  		opaque->btpo_flags |= BTP_HAS_GARBAGE;
! 		MarkBufferDirtyHint(so->currPos.buf, true);
  	}
  
  	if (!haveLock)
*** a/src/backend/access/rmgrdesc/xlogdesc.c
--- b/src/backend/access/rmgrdesc/xlogdesc.c
*** 82,92 **** xlog_desc(StringInfo buf, uint8 xl_info, char *rec)
  		appendStringInfo(buf, "restore point: %s", xlrec->rp_name);
  	}
! 	else if (info == XLOG_HINT)
  	{
  		BkpBlock   *bkp = (BkpBlock *) rec;
  
! 		appendStringInfo(buf, "page hint: %s block %u",
  						 relpathperm(bkp->node, bkp->fork),
  						 bkp->block);
  	}
--- 82,92 ----
  		appendStringInfo(buf, "restore point: %s", xlrec->rp_name);
  	}
! 	else if (info == XLOG_FPI)
  	{
  		BkpBlock   *bkp = (BkpBlock *) rec;
  
! 		appendStringInfo(buf, "full-page image: %s block %u",
  						 relpathperm(bkp->node, bkp->fork),
  						 bkp->block);
  	}
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** 7681,7692 **** XLogRestorePoint(const char *rpName)
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
-  *
-  * Note that this only works for buffers that fit the standard page model,
-  * i.e. those for which buffer_std == true
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
--- 7681,7689 ----
   * records. In that case, multiple copies of the same block would be recorded
   * in separate WAL records by different backends, though that is still OK from
   * a correctness perspective.
   */
  XLogRecPtr
! XLogSaveBufferForHint(Buffer buffer, bool buffer_std)
  {
  	XLogRecPtr	recptr = InvalidXLogRecPtr;
  	XLogRecPtr	lsn;
*** 7708,7714 **** XLogSaveBufferForHint(Buffer buffer)
  	 * and reset rdata for any actual WAL record insert.
  	 */
  	rdata[0].buffer = buffer;
! 	rdata[0].buffer_std = true;
  
  	/*
  	 * Check buffer while not holding an exclusive lock.
--- 7705,7711 ----
  	 * and reset rdata for any
Re: [HACKERS] Patch for fail-back without fresh backup
On 2013-06-15 11:36:54 -0700, Jeff Davis wrote: On Fri, 2013-06-14 at 18:27 +0200, Andres Freund wrote: I'd like to see a comment around the memcpys in XLogSaveBufferForHint() that mentions that they are safe in a non std buffer due to XLogCheckBuffer setting an appropriate hole/offset. Or make an explicit change of the copy algorithm there. Done. Also done. Thanks! Looks good to me. Patch attached. Also, since we branched, I think this should be back-patched to 9.3 as well. Absolutely. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
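The memcpy-safety point settled above (XLogCheckBuffer computes a nonzero hole only for a standard page, so copying around the hole is safe, while a non-standard page simply gets a zero-length hole and a full copy) can be sketched outside the server. The following is an illustrative, self-contained simulation, not PostgreSQL source; the struct and function names are simplified stand-ins.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Simplified stand-in for the standard page header fields involved. */
typedef struct
{
    uint16_t pd_lower;   /* end of the line-pointer array */
    uint16_t pd_upper;   /* start of the tuple space */
} PageHeaderSketch;

/* Compute the unused "hole"; a non-standard page gets a zero-length hole. */
static void
check_buffer_sketch(const char *page, int buffer_std,
                    uint16_t *hole_offset, uint16_t *hole_length)
{
    const PageHeaderSketch *hdr = (const PageHeaderSketch *) page;

    if (buffer_std && hdr->pd_lower <= hdr->pd_upper)
    {
        *hole_offset = hdr->pd_lower;
        *hole_length = hdr->pd_upper - hdr->pd_lower;
    }
    else
    {
        /* Non-standard page: no hole, so the full page is copied below. */
        *hole_offset = 0;
        *hole_length = 0;
    }
}

/* Copy the page image, omitting the hole (as the WAL record would).
 * Returns the number of bytes actually copied. */
static size_t
copy_page_skipping_hole(char *dst, const char *page,
                        uint16_t hole_offset, uint16_t hole_length)
{
    memcpy(dst, page, hole_offset);
    memcpy(dst + hole_offset,
           page + hole_offset + hole_length,
           PAGE_SIZE - (hole_offset + hole_length));
    return PAGE_SIZE - hole_length;
}
```

With buffer_std false the two memcpys degenerate into one full-page copy, which is why the same code path is safe for both kinds of buffers.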
Re: [HACKERS] Patch for fail-back without fresh backup
On Saturday, June 15, 2013 8:29 PM Sawada Masahiko wrote: On Sat, Jun 15, 2013 at 10:34 PM, Amit Kapila amit.kap...@huawei.com wrote: On Saturday, June 15, 2013 1:19 PM Sawada Masahiko wrote: On Fri, Jun 14, 2013 at 10:15 PM, Amit Kapila amit.kap...@huawei.com wrote: On Friday, June 14, 2013 2:42 PM Samrat Revagade wrote: Hello, We have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link for that: http://www.postgresql.org/message-id/CAF8Q-Gxg3PQTf71NVECe-6OzRaew5pWhk7yQtb jgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. How will you take care of the extra WAL on the old master during recovery? If it replays WAL which has not reached the new master, that can be a problem. You mean that it is possible for the old master's data to be ahead of the new master's data? What I mean to say is that the WAL of the old master can be ahead of the new master's. I understood that the data files of the old master can't be ahead, but I think the WAL can be ahead. So there would be inconsistent data between those servers at failback, right? If so, no inconsistency is possible, because if you use the GUC option as he proposes (i.e., failback_safe_standby_mode = remote_flush), then while the old master is working fine, no file-system-level changes are made before the WAL is replicated. Will the proposed patch also take care, in some way, that the old master's WAL is not ahead? If yes, I think I am missing some point. Yes, it will happen that the old master's WAL is ahead of the new master's WAL, as you said. But I think we can solve that by deleting all WAL files when the old master starts as the new standby. I think that, ideally, it should reset the WAL location to the point where the new master forked off. In such a scenario it would be difficult for a user who wants to get a dump of some data on the old master which hasn't gone to the new master.
I am not sure whether real users have such a need, but if they do, then this solution will have some drawbacks. With Regards, Amit Kapila.
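The rejoin question in this subthread (under the proposed mode the old master's data files are never ahead, but its local WAL can still run past the point where the new master forked off) reduces to a comparison of WAL positions. A minimal sketch under those assumptions, with hypothetical names and LSNs modeled as flat 64-bit byte offsets:

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* WAL position, simplified to a flat byte offset */

typedef enum
{
    REJOIN_OK,              /* old master's WAL ends at or before the fork */
    DISCARD_WAL_PAST_FORK   /* local WAL runs past the new master's history */
} RejoinAction;

/*
 * fork_lsn is where the new master's history diverged (its promotion
 * point).  Under the proposed failback-safe mode, the old master's data
 * files never reflect WAL beyond what was replicated, so WAL past the
 * fork point can be discarded without touching data files.  The cost,
 * as noted in the thread, is that such records can no longer be
 * inspected or dumped from the old master.
 */
static RejoinAction
old_master_rejoin_action(Lsn old_end_of_wal, Lsn fork_lsn)
{
    return (old_end_of_wal > fork_lsn) ? DISCARD_WAL_PAST_FORK : REJOIN_OK;
}
```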
Re: [HACKERS] Patch for fail-back without fresh backup
On Fri, Jun 14, 2013 at 10:11 AM, Samrat Revagade revagade.sam...@gmail.com wrote: Hello, We have already started a discussion on pgsql-hackers about the problem of taking a fresh backup during the failback operation; here is the link for that: http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com Let me again summarize the problem we are trying to address. When the master fails, the last few WAL files may not reach the standby. But the master may have gone ahead and made changes to its local file system after flushing WAL to local storage, so the master contains some file-system-level changes that the standby does not have. At this point, the data directory of the master is ahead of the standby's. Subsequently, the standby will be promoted as the new master. Later, when the old master wants to become a standby of the new master, it can't just join the setup, since there is an inconsistency between the two servers; we need to take a fresh backup from the new master. This can happen with both synchronous and asynchronous replication. A fresh backup is also needed in the case of a clean switch-over, because in the current HEAD the master does not wait for the standby to receive all the WAL up to the shutdown checkpoint record before shutting down the connection. Fujii Masao has already submitted a patch to handle the clean switch-over case, but the problem remains for the failback case. Taking a fresh backup is very time-consuming when databases are very big, say several TB, and when the servers are connected over a relatively slow link; this would break the service level agreement of the disaster recovery system. So there is a need to improve the disaster recovery process in PostgreSQL. One way to achieve this is to maintain consistency between master and standby, which avoids the need for a fresh backup.
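The consistency rule described above (no file-system-level change until the corresponding WAL is replicated) can be sketched as a gate on buffer write-out: a dirty page is eligible for writing only once the standby has confirmed WAL up to that page's LSN. The following is a minimal simulation with hypothetical names; an actual implementation would live in the server's buffer write-out paths.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* WAL position, simplified to a flat byte offset */

typedef struct
{
    Lsn page_lsn;   /* LSN of the WAL record that last dirtied this page */
    int written;    /* has the page been written to the file system? */
} DirtyPage;

/*
 * Write out every dirty page whose WAL is already replicated; return how
 * many pages were written.  Pages whose page_lsn is beyond replicated_lsn
 * must wait: writing them would put the master's data files ahead of the
 * standby, which is exactly the state that forces a fresh backup.
 */
static int
flush_eligible_pages(DirtyPage *pages, int npages, Lsn replicated_lsn)
{
    int written = 0;

    for (int i = 0; i < npages; i++)
    {
        if (!pages[i].written && pages[i].page_lsn <= replicated_lsn)
        {
            pages[i].written = 1;   /* stands in for the actual write() */
            written++;
        }
    }
    return written;
}
```

As the standby acknowledges more WAL, replicated_lsn advances and previously blocked pages become eligible, so the gate delays writes rather than losing them.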
So our proposal for this problem is that we must ensure the master does not make any file-system-level change without confirming that the corresponding WAL record has been replicated to the standby. An alternative proposal (which will probably just reveal my lack of understanding about what is or isn't possible with WAL): provide a way to restart the master so that it rolls back the WAL changes that the slave hasn't seen. There have been many suggestions and objections on pgsql-hackers about this problem. The brief summary is as follows:
Re: [HACKERS] Patch for fail-back without fresh backup
That will not happen if there is an inconsistency between the two servers. Please refer to the discussions at the link provided in the first post: http://www.postgresql.org/message-id/caf8q-gxg3pqtf71nvece-6ozraew5pwhk7yqtbjgwrfu513...@mail.gmail.com Regards, Samrat Revagade