Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-21 Thread Fujii Masao
On Wed, Apr 17, 2013 at 10:11 PM, Amit Kapila amit.kap...@huawei.com wrote: On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote: On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote: Do you mean to say that as an error has occurred, so it would not be able to flush

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-21 Thread Fujii Masao
On Wed, Apr 17, 2013 at 7:49 PM, Florian Pflug f...@phlo.org wrote: On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote: Do you mean to say that as an error has occurred, so it would not be able to flush received WAL, which could result in loss of WAL? I think even if error

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-19 Thread Martijn van Oosterhout
On Wed, Apr 17, 2013 at 12:49:10PM +0200, Florian Pflug wrote: Fixing this on the receive side alone seems quite messy and fragile. So instead, I think we should let the master send a shutdown message after it has sent everything it wants to send, and wait for the client to acknowledge it
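
The shutdown handshake proposed in the quoted text (the master sends a shutdown message after its last WAL data and waits for the client to acknowledge it) can be illustrated with a toy model. This is only a sketch under stated assumptions: the in-memory queue stands in for the replication connection, and the names `run_shutdown_handshake`, `"wal"` and `"shutdown"` are hypothetical, not PostgreSQL protocol messages or functions.

```python
from collections import deque


def run_shutdown_handshake(wal_records):
    """Toy model of the proposed handshake (hypothetical, not PostgreSQL code).

    The sender queues every WAL record, then an explicit shutdown marker.
    The receiver acknowledges only once it has read the marker, so an
    acknowledgement implies that all earlier records were received.
    """
    channel = deque()  # stands in for the replication connection

    # Sender side: everything it wants to send, then the shutdown message.
    for record in wal_records:
        channel.append(("wal", record))
    channel.append(("shutdown", None))

    # Receiver side: consume messages in order until the marker arrives.
    received = []
    acknowledged = False
    while channel:
        kind, payload = channel.popleft()
        if kind == "wal":
            received.append(payload)
        elif kind == "shutdown":
            acknowledged = True  # the sender may now close the connection
            break

    return received, acknowledged
```

In this model the sender would close the connection only after `acknowledged` is true, which is the ordering guarantee the proposal is after.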

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-19 Thread Florian Pflug
On Apr19, 2013, at 14:46 , Martijn van Oosterhout klep...@svana.org wrote: On Wed, Apr 17, 2013 at 12:49:10PM +0200, Florian Pflug wrote: Fixing this on the receive side alone seems quite messy and fragile. So instead, I think we should let the master send a shutdown message after it has sent

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Amit Kapila
On Monday, April 15, 2013 1:02 PM Florian Pflug wrote: On Apr14, 2013, at 17:56 , Fujii Masao masao.fu...@gmail.com wrote: At fast shutdown, after walsender sends the checkpoint record and closes the replication connection, walreceiver can detect the close of connection before receiving all

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Florian Pflug
On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote: Do you mean to say that as an error has occurred, so it would not be able to flush received WAL, which could result in loss of WAL? I think even if error occurs, it will call flush in WalRcvDie(), before terminating

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-17 Thread Amit Kapila
On Wednesday, April 17, 2013 4:19 PM Florian Pflug wrote: On Apr17, 2013, at 12:22 , Amit Kapila amit.kap...@huawei.com wrote: Do you mean to say that as an error has occurred, so it would not be able to flush received WAL, which could result in loss of WAL? I think even if error occurs,

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-15 Thread Florian Pflug
On Apr14, 2013, at 17:56 , Fujii Masao masao.fu...@gmail.com wrote: At fast shutdown, after walsender sends the checkpoint record and closes the replication connection, walreceiver can detect the close of connection before receiving all WAL records. This means that, even if walsender sends all

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-14 Thread Fujii Masao
On Fri, Apr 12, 2013 at 5:53 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 04/11/2013 07:29 PM, Fujii Masao wrote: On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the standby catch up (takes a few microseconds ;) )

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-14 Thread Fujii Masao
On Fri, Apr 12, 2013 at 7:57 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-12 02:29:01 +0900, Fujii Masao wrote: On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the standby catch up (takes a few

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-14 Thread Hannu Krosing
On 04/14/2013 05:56 PM, Fujii Masao wrote: On Fri, Apr 12, 2013 at 7:57 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-12 02:29:01 +0900, Fujii Masao wrote: On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-12 Thread Hannu Krosing
On 04/11/2013 07:29 PM, Fujii Masao wrote: On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the standby catch up (takes a few microseconds ;) ) before you promote it. After this you can start up the former master with

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-12 Thread Andres Freund
On 2013-04-12 02:29:01 +0900, Fujii Masao wrote: On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the standby catch up (takes a few microseconds ;) ) before you promote it. After this you can start up the former

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-12 Thread Andres Freund
On 2013-04-12 11:18:01 +0530, Pavan Deolasee wrote: On Thu, Apr 11, 2013 at 8:39 PM, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 04/11/2013 03:52 PM, Ants Aasma wrote: On Thu, Apr 11, 2013 at 4:25 PM, Hannu

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-12 Thread Pavan Deolasee
On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think that holds true at all. If you look at pg_stat_bgwriter in any remotely busy cluster with a hot data set over shared_buffers you'll notice that a large percentage of writes will have been done by

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-12 Thread Andres Freund
On 2013-04-12 16:58:44 +0530, Pavan Deolasee wrote: On Fri, Apr 12, 2013 at 4:29 PM, Andres Freund and...@2ndquadrant.com wrote: I don't think that holds true at all. If you look at pg_stat_bgwriter in any remotely busy cluster with a hot data set over shared_buffers you'll notice that

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Amit Kapila
On Wednesday, April 10, 2013 10:31 PM Fujii Masao wrote: On Thu, Apr 11, 2013 at 1:44 AM, Shaun Thomas stho...@optionshouse.com wrote: On 04/10/2013 11:40 AM, Fujii Masao wrote: Strange. If this is really true, shared disk failover solution is fundamentally broken because the standby

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Ants Aasma
On Thu, Apr 11, 2013 at 10:09 AM, Amit Kapila amit.kap...@huawei.com wrote: Consider the case old-master crashed during flushing the data page, now you would need full page image from new-master. It might so happen that in new-master Checkpoint would have purged (reused) the log files from

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Sameer Thakur
Hello, The only potential use case for this that I can see, would be for system maintenance and a controlled failover. I agree: that's a major PITA when doing DR testing, but I personally don't think this is the way to fix that particular edge case. This is the use case we are trying to address

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Hannu Krosing
On 04/11/2013 01:26 PM, Sameer Thakur wrote: Hello, The only potential use case for this that I can see, would be for system maintenance and a controlled failover. I agree: that's a major PITA when doing DR testing, but I personally don't think this is the way to fix that particular edge

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Ants Aasma
On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix - halting all writes of data pages to disk and to WAL files while waiting ACK from standby - will tremendously slow down all parallel work on master. This is not what is being proposed. The proposed

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Tom Lane
Ants Aasma a...@cybertec.at writes: On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix - halting all writes of data pages to disk and to WAL files while waiting ACK from standby - will tremendously slow down all parallel work on master. This is not

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Hannu Krosing
On 04/11/2013 03:52 PM, Ants Aasma wrote: On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix - halting all writes of data pages to disk and to WAL files while waiting ACK from standby - will tremendously slow down all parallel work on master. This is

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Ants Aasma
On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 04/11/2013 03:52 PM, Ants Aasma wrote: On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix - halting all writes of data pages to disk and to WAL files while waiting ACK

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Fujii Masao
On Thu, Apr 11, 2013 at 2:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Ants Aasma a...@cybertec.at writes: We already rely on WAL-before-data to ensure correct recovery. What is proposed here is to slightly redefine it to require WAL to be replicated before it is considered to be flushed. This

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Fujii Masao
On Thu, Apr 11, 2013 at 10:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: You just shut down the old master and let the standby catch up (takes a few microseconds ;) ) before you promote it. After this you can start up the former master with recovery.conf and it will follow nicely. No.

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Fujii Masao
On Fri, Apr 12, 2013 at 12:09 AM, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 04/11/2013 03:52 PM, Ants Aasma wrote: On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix - halting

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-11 Thread Pavan Deolasee
On Thu, Apr 11, 2013 at 8:39 PM, Ants Aasma a...@cybertec.at wrote: On Thu, Apr 11, 2013 at 5:33 PM, Hannu Krosing ha...@2ndquadrant.com wrote: On 04/11/2013 03:52 PM, Ants Aasma wrote: On Thu, Apr 11, 2013 at 4:25 PM, Hannu Krosing ha...@2ndquadrant.com wrote: The proposed fix -

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Samrat Revagade
it's one of the reasons why a fresh base backup is required when starting old master as new standby? If yes, I agree with you. I've often heard the complaints about a backup when restarting new standby. That's really big problem. I think Fujii Masao is on the same page. In case of syncrep the

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Samrat Revagade
(5) The master then forces a write of the data page related to this transaction. Sorry, this is incorrect. Whenever the master writes the data page it checks that the WAL record is written in standby till that LSN. While master is waiting to force a write (point 5) for this data page,

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Amit Kapila
On Wednesday, April 10, 2013 3:42 PM Samrat Revagade wrote: (5) The master then forces a write of the data page related to this transaction. Sorry, this is incorrect. Whenever the master writes the data page it checks that the WAL record is written in standby till that LSN. While master is

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Tom Lane
Amit Kapila amit.kap...@huawei.com writes: On Wednesday, April 10, 2013 3:42 PM Samrat Revagade wrote: Sorry, this is incorrect. Streaming replication continuous, master is not waiting, whenever the master writes the data page it checks that the WAL record is written in standby till that LSN.

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Andres Freund
On 2013-04-10 10:10:31 -0400, Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: On Wednesday, April 10, 2013 3:42 PM Samrat Revagade wrote: Sorry, this is incorrect. Streaming replication continuous, master is not waiting, whenever the master writes the data page it checks that the

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Shaun Thomas
On 04/10/2013 09:10 AM, Tom Lane wrote: IOW, I wouldn't consider skipping the rsync even if I had a feature like this. Totally. Out in the field, we consider the old database corrupt the moment we fail over. There is literally no way to verify the safety of any data along the broken chain,

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Fujii Masao
On Wed, Apr 10, 2013 at 11:26 PM, Shaun Thomas stho...@optionshouse.com wrote: On 04/10/2013 09:10 AM, Tom Lane wrote: IOW, I wouldn't consider skipping the rsync even if I had a feature like this. Totally. Out in the field, we consider the old database corrupt the moment we fail over.

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Shaun Thomas
On 04/10/2013 11:40 AM, Fujii Masao wrote: Strange. If this is really true, shared disk failover solution is fundamentally broken because the standby needs to start up with the shared corrupted database at the failover. How so? Shared disk doesn't use replication. The point I was trying to

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Fujii Masao
On Wed, Apr 10, 2013 at 11:16 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-10 10:10:31 -0400, Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: On Wednesday, April 10, 2013 3:42 PM Samrat Revagade wrote: Sorry, this is incorrect. Streaming replication continuous,

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Fujii Masao
On Thu, Apr 11, 2013 at 1:44 AM, Shaun Thomas stho...@optionshouse.com wrote: On 04/10/2013 11:40 AM, Fujii Masao wrote: Strange. If this is really true, shared disk failover solution is fundamentally broken because the standby needs to start up with the shared corrupted database at the

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Ants Aasma
On Wed, Apr 10, 2013 at 7:44 PM, Shaun Thomas stho...@optionshouse.com wrote: On 04/10/2013 11:40 AM, Fujii Masao wrote: Strange. If this is really true, shared disk failover solution is fundamentally broken because the standby needs to start up with the shared corrupted database at the

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Tom Lane
Ants Aasma a...@cybertec.at writes: We already rely on WAL-before-data to ensure correct recovery. What is proposed here is to slightly redefine it to require WAL to be replicated before it is considered to be flushed. This ensures that no data page on disk differs from the WAL that the slave
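
The redefinition quoted above (WAL counts as flushed only once it has also been replicated) can be expressed as a small model. This is a sketch under stated assumptions, not PostgreSQL source: LSNs are plain integers, and `WalState`, `can_write_page` and `require_replication` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class WalState:
    """Hypothetical snapshot of WAL progress, with LSNs as plain integers."""
    local_flush_lsn: int   # WAL durably flushed on the master
    standby_ack_lsn: int   # WAL acknowledged by the synchronous standby


def can_write_page(page_lsn: int, wal: WalState, require_replication: bool) -> bool:
    """Return True if a data page with this LSN may be written to disk.

    Classic WAL-before-data: the page's WAL must be flushed locally.
    Proposed variant: that WAL must also have been acknowledged by the
    standby, so no on-disk page is ahead of the WAL the standby holds.
    """
    safe_lsn = wal.local_flush_lsn
    if require_replication:
        safe_lsn = min(safe_lsn, wal.standby_ack_lsn)
    return page_lsn <= safe_lsn
```

For example, with WAL flushed locally up to LSN 100 but acknowledged only up to LSN 80, a page at LSN 90 may be written under the classic rule but must wait under the proposed one.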

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Boszormenyi Zoltan
On 2013-04-10 18:46, Fujii Masao wrote: On Wed, Apr 10, 2013 at 11:16 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-10 10:10:31 -0400, Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: On Wednesday, April 10, 2013 3:42 PM Samrat Revagade wrote: Sorry, this is

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-10 Thread Andres Freund
On 2013-04-10 20:39:25 +0200, Boszormenyi Zoltan wrote: On 2013-04-10 18:46, Fujii Masao wrote: On Wed, Apr 10, 2013 at 11:16 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-10 10:10:31 -0400, Tom Lane wrote: Amit Kapila amit.kap...@huawei.com writes: On Wednesday, April

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-09 Thread Samrat Revagade
What Samrat is proposing here is that WAL is not flushed to the OS before it is acked by a synchronous replica so recovery won't go past the timeline change made in failover, making it necessary to take a new base backup to resync with the new master. Actually we are proposing that the data

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-09 Thread Ants Aasma
On Tue, Apr 9, 2013 at 9:42 AM, Samrat Revagade revagade.sam...@gmail.com wrote: What Samrat is proposing here is that WAL is not flushed to the OS before it is acked by a synchronous replica so recovery won't go past the timeline change made in failover, making it necessary to take a new base

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-09 Thread Hannu Krosing
On 04/08/2013 12:34 PM, Samrat Revagade wrote: Hello, We have been trying to figure out possible solutions to the following problem in streaming replication Consider following scenario: If master receives commit command, it writes and flushes commit WAL records to the disk, It also writes

[HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Samrat Revagade
Hello, We have been trying to figure out possible solutions to the following problem in streaming replication Consider following scenario: If master receives commit command, it writes and flushes commit WAL records to the disk, It also writes and flushes data page related to this transaction.

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Shaun Thomas
On 04/08/2013 05:34 AM, Samrat Revagade wrote: One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit data page related to this transaction on master. Isn't this basically what

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Tom Lane
Samrat Revagade revagade.sam...@gmail.com writes: We have been trying to figure out possible solutions to the following problem in streaming replication Consider following scenario: If master receives commit command, it writes and flushes commit WAL records to the disk, It also writes and

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Ants Aasma
On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas stho...@optionshouse.com wrote: On 04/08/2013 05:34 AM, Samrat Revagade wrote: One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Andres Freund
On 2013-04-08 19:26:33 +0300, Ants Aasma wrote: On Mon, Apr 8, 2013 at 6:50 PM, Shaun Thomas stho...@optionshouse.com wrote: On 04/08/2013 05:34 AM, Samrat Revagade wrote: One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Ants Aasma
On Mon, Apr 8, 2013 at 7:38 PM, Andres Freund and...@2ndquadrant.com wrote: On 2013-04-08 19:26:33 +0300, Ants Aasma wrote: Not exactly. Sync-rep ensures that commit success is not sent to the client before a synchronous replica acks the commit record. What Samrat is proposing here is that WAL

Re: [HACKERS] Inconsistent DB data in Streaming Replication

2013-04-08 Thread Fujii Masao
On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade revagade.sam...@gmail.com wrote: Hello, We have been trying to figure out possible solutions to the following problem in streaming replication Consider following scenario: If master receives commit command, it writes and flushes commit WAL