On Thu, Jul 8, 2010 at 7:55 AM, Robert Haas wrote:
>> What was the final decision on behavior if fsync=off?
>
> I'm not sure we made any decision, per se, but if you use fsync=off in
> combination with SR and experience an unexpected crash-and-reboot on
> the master, you will be sad.
True. But, w
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue
> (perhaps on a new thread). If we can't resolve this issue, sync rep
> is going to be really slow; but there are no easy solutions to this
> problem in sight,
On Wed, Jul 7, 2010 at 6:44 PM, Josh Berkus wrote:
> On 7/6/10 4:44 PM, Robert Haas wrote:
>> To recap the previous discussion on this thread, we ended up changing
>> the behavior of 9.0 so that it only sends WAL which has been written
>> to the OS *and flushed*, because sending unflushed WAL to t
On 7/6/10 4:44 PM, Robert Haas wrote:
> To recap the previous discussion on this thread, we ended up changing
> the behavior of 9.0 so that it only sends WAL which has been written
> to the OS *and flushed*, because sending unflushed WAL to the standby
> is unsafe. The standby can get ahead of the
Tom Lane writes:
> Dimitri Fontaine writes:
>> Stop me if I'm all wrong already, but I thought we said that we should
>> handle this case by decoupling what we can send to the standby and what
>> it can apply.
>
> What's the point of that? It won't make the standby apply any faster.
True, but it
Dimitri Fontaine writes:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply.
What's the point of that? It won't make the standby apply any faster.
What it will do is make the protocol mo
On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine wrote:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply. We could do this by sending the current WAL fsync'ed
> position on the master in
Robert Haas writes:
> If it's unsafe to send written but unflushed WAL to the standby, then
> for the same reasons we can't send unwritten WAL either.
[...]
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue
S
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the
> disk.
> This degrades the performance of synchronous replication very much since a
> t
On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas wrote:
> One thought that occurred to me is that if the master and standby were
> more tightly coupled, you could recover after a crash by making the
> one with the further-advanced WAL position the master, and the other
> one the standby. That would
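Robert's idea of making the node with the further-advanced WAL position the master reduces to a comparison of positions. A minimal sketch, assuming each node can report the end of the WAL it holds as a single integer; the node names and the tie-breaking rule are hypothetical.

```python
from typing import Dict, Tuple


def choose_roles(positions: Dict[str, int]) -> Tuple[str, str]:
    """Return (new_master, new_standby) for a two-node pair.

    The node whose WAL extends further becomes the master. On a tie the
    alphabetically first name wins (an arbitrary, hypothetical rule).
    """
    if len(positions) != 2:
        raise ValueError("expected exactly two nodes")
    # Sort by descending WAL position, then by name for determinism.
    ordered = sorted(positions.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[0][0], ordered[1][0]


print(choose_roles({"node_a": 4096, "node_b": 8192}))  # ('node_b', 'node_a')
```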
On Wed, Jun 30, 2010 at 5:36 AM, Fujii Masao wrote:
>> Before we get too busy frobnicating this gonkulator, I'd like to see a
>> little more discussion of what kind of performance people are
>> expecting from sync rep. Sounds to me like the best we can expect
>> here is, on every commit: (a) wait
On Wed, Jun 30, 2010 at 11:26 AM, Robert Haas wrote:
> Maybe. As Heikki pointed out upthread, the standby can't even write
> the WAL back to the OS until it's been fsync'd on the master
> without risking the problem under discussion.
If we change the startup process so that it doesn't go ahea
On Tue, Jun 29, 2010 at 10:06 PM, Bruce Momjian wrote:
> Simon Riggs wrote:
>> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
>>
>> > The problem is not that the master streams non-fsync'd WAL, but that the
>> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL
>> >
Simon Riggs wrote:
> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
>
> > The problem is not that the master streams non-fsync'd WAL, but that the
> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> > safely if the standby makes the recovery wait until the master
On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
> The problem is not that the master streams non-fsync'd WAL, but that the
> standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> safely if the standby makes the recovery wait until the master has fsync'd
> WAL. That is,
On Mon, Jun 21, 2010 at 10:40 AM, Heikki Linnakangas
wrote:
> I guess, but you have to be very careful to correctly refrain from applying
> the WAL. For example, a naive implementation might write the WAL to disk in
> walreceiver immediately, but refrain from telling the startup process about
> it
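Heikki's caveat, which Robert summarizes elsewhere in the thread as the standby not being able to write WAL to the OS until the master has fsync'd it, suggests holding not-yet-safe WAL in memory on the standby. A hypothetical sketch: `BufferingReceiver` is an illustrative class, not walreceiver's actual design, and positions are byte offsets. It persists received WAL only up to the flush position the master reports and drops the remainder if the master crashes.

```python
class BufferingReceiver:
    """Hypothetical receiver that never persists WAL the master hasn't fsync'd."""

    def __init__(self, start: int = 0) -> None:
        self.written = start         # end position of WAL persisted locally
        self.pending = bytearray()   # received, but not yet safe to persist
        self.disk = bytearray()      # stand-in for the standby's WAL files

    def receive(self, data: bytes, master_flush: int) -> None:
        self.pending += data
        self.update_flush(master_flush)

    def update_flush(self, master_flush: int) -> None:
        # Only bytes before master_flush are durable on the master.
        safe = max(0, min(len(self.pending), master_flush - self.written))
        if safe:
            self.disk += self.pending[:safe]
            del self.pending[:safe]
            self.written += safe

    def master_crashed(self) -> None:
        # WAL the master never fsync'd may be gone there, so drop it here too.
        self.pending.clear()


r = BufferingReceiver()
r.receive(b"A" * 100, master_flush=40)
print(r.written, len(r.pending))  # 40 60
r.update_flush(100)
print(r.written, len(r.pending))  # 100 0
```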
On 21/06/10 12:08, Fujii Masao wrote:
> On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote:
>> In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But
>> in 9.1, with sync rep and the performance demands that entails,
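Part (2) of that plan, refusing to continue once the standby turns out to be ahead, amounts to a position check when the standby reconnects after a master restart. A minimal sketch under assumed names (`check_reconnect` and `StandbyAheadError` are hypothetical; positions are byte offsets). An exception stands in for the PANIC, and the message points at rebuilding the standby from a backup, the recovery path Fujii mentions in the thread.

```python
class StandbyAheadError(Exception):
    """Hypothetical error: the standby holds WAL the master no longer has."""


def check_reconnect(standby_received: int, master_flush: int) -> None:
    """Stop if the standby already has WAL beyond what survived on the master.

    Raising here plays the role of the PANIC.
    """
    if standby_received > master_flush:
        raise StandbyAheadError(
            f"standby WAL ends at {standby_received}, master's at {master_flush}; "
            "rebuild the standby from a new base backup"
        )


check_reconnect(standby_received=4096, master_flush=8192)  # fine: standby is behind
try:
    check_reconnect(standby_received=8192, master_flush=4096)
except StandbyAheadError as err:
    print("refusing to continue:", err)
```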
On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote:
> On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote:
>>> I wonder if it would be possible to jigger things so that we send the
>>> WAL to the standby as soon as it is generated, but somehow arrange
>>> things so that the standby knows the last
On Tue, Jun 15, 2010 at 8:09 PM, Josh Berkus wrote:
>
>> I have yet to convince myself of how likely this is to occur. I tried
>> to reproduce this issue by crashing the database, but I think in 9.0
>> you need an actual operating system crash to cause this problem, and I
>> haven't yet set up an
On 6/15/10 5:09 PM, Josh Berkus wrote:
>> > In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> > has been fsync'd and
>
> I don't think this is the best solution; it would be a noticeable
> performance penalty on replication.
Actually, there's an even bigger reason not to
> I have yet to convince myself of how likely this is to occur. I tried
> to reproduce this issue by crashing the database, but I think in 9.0
> you need an actual operating system crash to cause this problem, and I
> haven't yet set up an environment in which I can repeatedly crash the
> OS. I
On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote:
>> I wonder if it would be possible to jigger things so that we send the
>> WAL to the standby as soon as it is generated, but somehow arrange
>> things so that the standby knows the last location that the master has
>> fsync'd and never applies
> I wonder if it would be possible to jigger things so that we send the
> WAL to the standby as soon as it is generated, but somehow arrange
> things so that the standby knows the last location that the master has
> fsync'd and never applies beyond that point.
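The rule "never apply beyond the master's last fsync'd location" can be modeled as a clamp on replay. This sketch assumes the master sends its current fsync'd position alongside the WAL, as Dimitri later suggests in the thread; `StandbyState` and its fields are hypothetical. It covers replay only: as Heikki points out, writing that WAL to the standby's disk early needs the same care.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical standby bookkeeping; positions are byte offsets."""
    received: int = 0        # end of WAL streamed to the standby so far
    master_flushed: int = 0  # latest fsync'd position reported by the master
    applied: int = 0         # end of WAL replayed so far

    def on_wal(self, end: int, master_flushed: int) -> None:
        # Both positions only move forward.
        self.received = max(self.received, end)
        self.master_flushed = max(self.master_flushed, master_flushed)

    def apply_limit(self) -> int:
        # Never replay past the master's last fsync'd location.
        return min(self.received, self.master_flushed)

    def replay(self) -> int:
        self.applied = max(self.applied, self.apply_limit())
        return self.applied


s = StandbyState()
s.on_wal(end=8192, master_flushed=4096)
print(s.replay())  # 4096: more was received, but only 4096 is fsync'd on the master
s.on_wal(end=8192, master_flushed=8192)
print(s.replay())  # 8192
```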
I can't think of any way which would
On Jun 15, 2010, at 10:45 , Fujii Masao wrote:
> A transaction commit would need to wait for local fsync and replication
> in a serial manner, in synchronous replication. IOW, walsender cannot
> send the commit record until it's fsync'd in XLogWrite().
Hm, but since 9.0 won't do synchronous replic
On Tue, Jun 15, 2010 at 12:46 AM, Fujii Masao wrote:
> On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote:
>> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
>>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
Maybe. That sounds like a pretty enormous foot-gun to me, considering
On Tue, Jun 15, 2010 at 2:16 PM, Heikki Linnakangas
wrote:
> On 15/06/10 07:47, Fujii Masao wrote:
>>
>> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
>>>
>>> Fujii Masao writes:
Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
xlogctl->LogwrtResult.Write i
On 15/06/10 07:47, Fujii Masao wrote:
> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
>> Fujii Masao writes:
>>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>>> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
Wrong. LogwrtResult.Write tracks how far
On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
> Fujii Masao writes:
>> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
>>> Well, we're already not waiting for fsync, which is the slowest part.
>
>> No, currently walsender waits for fsync.
>
> No, you're mistaken.
>
>> Walsender tries to se
On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote:
> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
>>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>>> that we have no way of recovering from the situation wh
Fujii Masao writes:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
>> Well, we're already not waiting for fsync, which is the slowest part.
> No, currently walsender waits for fsync.
No, you're mistaken.
> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
> xlogctl->Log
On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>> that we have no way of recovering from the situation where the standby
>> gets ahead of the master.
>
> No, we can
On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
> Maybe. That sounds like a pretty enormous foot-gun to me, considering
> that we have no way of recovering from the situation where the standby
> gets ahead of the master.
No, we can do that by reconstructing the standby from the backup.
And,
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> No, currently walsender waits for fsync.
> ...
> But that change would cause the problem that Robert pointed out.
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
Presumably this means that if synchronous_commit = off on p
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
> > Stefan Kaltenbrunner writes:
> >> hmm not sure that is what Fujii tried to say - I think his point was
> >> that in the original case we would have serialized all the operations
> >> (fir
On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote:
>> I think the failover case might be OK. But if the master crashes and
>> restarts, the slave might be left thinking its xlog position is ahead
>> of the xlog position on the master.
>
> R
On Sat, Jun 12, 2010 at 12:15 AM, Stefan Kaltenbrunner
wrote:
> hmm ok - but assuming sync rep we would end up with something like the
> following (hypothetically assuming each operation takes 1 time unit):
>
> originally:
>
> write 1
> sync 1
> network 1
> write 1
> sync 1
>
> total: 5
>
> whereas
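Stefan's arithmetic can be spelled out in a few lines. The serialized total of 5 comes from the quote; the two overlapped figures are illustrative assumptions about when streaming could start, not figures from the thread, since the "whereas" half of the message is cut off here.

```python
# Each step costs 1 time unit, as in the hypothetical quoted above.
WRITE = SYNC = NETWORK = 1


def serialized() -> int:
    # master write, master fsync, network, standby write, standby fsync
    return WRITE + SYNC + NETWORK + WRITE + SYNC


def overlapped_after_write() -> int:
    # Assumption: streaming starts once the master has written (not fsync'd)
    # the WAL, so the master's fsync overlaps the network + standby work.
    return WRITE + max(SYNC, NETWORK + WRITE + SYNC)


def overlapped_from_generation() -> int:
    # Assumption: WAL is streamed as soon as it is generated, so the master's
    # write + fsync run fully in parallel with the standby path.
    return max(WRITE + SYNC, NETWORK + WRITE + SYNC)


print(serialized())                  # 5
print(overlapped_after_write())      # 4
print(overlapped_from_generation())  # 3
```

Either way the commit still waits for the slower of the two paths, so overlapping hides the master's fsync but not the network round trip or the standby's fsync.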
On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> hmm not sure that is what Fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync o
On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote:
> I think the failover case might be OK. But if the master crashes and
> restarts, the slave might be left thinking its xlog position is ahead
> of the xlog position on the master.
Right. Unless we perform a failover in this case, the standby
Florian Pflug wrote:
glibc defines O_DSYNC as an alias for O_SYNC and warrants that with
"Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which
require all metadata updates of a write to be on disk on returning to userspace, but only
the O_DSYNC semantics, which requ
On 12/06/10 01:16, Josh Berkus wrote:
Well, we're already not waiting for fsync, which is the slowest part.
If there's a performance problem, it may be because FADVISE_DONTNEED
disables kernel buffering so that we're forced to actually read the data
back from disk before sending it on down the
On Jun 12, 2010, at 3:10 , Josh Berkus wrote:
>> Hm, but then Robert's failure case is real, and streaming replication might
>> break due to an OS-level crash of the master. Or am I missing something?
>
> 1) Master goes out
> 2) "floating" transaction applied to standby.
> 3) Standby goes out
> 4
> Hm, but then Robert's failure case is real, and streaming replication might
> break due to an OS-level crash of the master. Or am I missing something?
Well, in the failover case this isn't a problem, it's a benefit: the
standby gets a transaction which you would have lost off the master.
Howev
On Jun 11, 2010, at 16:31 , Tom Lane wrote:
> Fujii Masao writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the
>> disk.
>
> I believe the above statement to be incorrect: walsender d
> Well, we're already not waiting for fsync, which is the slowest part.
> If there's a performance problem, it may be because FADVISE_DONTNEED
> disables kernel buffering so that we're forced to actually read the data
> back from disk before sending it on down the wire.
Well, that's fairly direct
On 06/11/2010 04:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> hmm not sure that is what Fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync on
>> the slave) and no
Stefan Kaltenbrunner writes:
> hmm not sure that is what Fujii tried to say - I think his point was
> that in the original case we would have serialized all the operations
> (first write+sync on the master, network afterwards and write+sync on
> the slave) and now we could try parallelizing by
On 06/11/2010 04:31 PM, Tom Lane wrote:
> Fujii Masao writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the disk.
> I believe the above statement to be incorrect: walsender does *not* wait
> f
Fujii Masao writes:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the
> disk.
I believe the above statement to be incorrect: walsender does *not* wait
for an fsync to occur.
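Tom's correction turns on the difference between two of the master's WAL progress pointers: how far WAL has been handed to the OS and how far it has been fsync'd. A toy model, not PostgreSQL code: positions are plain integers rather than 9.0's XLogRecPtr struct, and `LogwrtResult` here is a simplified stand-in with hypothetical `write`/`flush` fields. It contrasts streaming up to the write pointer with the later 9.0 behavior of streaming only WAL that is written *and flushed*.

```python
from dataclasses import dataclass


@dataclass
class LogwrtResult:
    """Toy stand-in for the master's WAL progress pointers (not PostgreSQL code)."""
    write: int  # end of WAL handed to the OS with write(), maybe not yet on disk
    flush: int  # end of WAL known to be durable (fsync'd)


def send_limit(result: LogwrtResult, only_flushed: bool) -> int:
    """Return how far a walsender-like process may stream WAL."""
    if result.flush > result.write:
        raise ValueError("flush pointer cannot be ahead of the write pointer")
    # only_flushed=False: stream whatever the OS has been given.
    # only_flushed=True: stream only WAL that is also fsync'd.
    return result.flush if only_flushed else result.write


state = LogwrtResult(write=8192, flush=4096)
print(send_limit(state, only_flushed=False))  # 8192
print(send_limit(state, only_flushed=True))   # 4096
```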
I agree with t
On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote:
>> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
>>> Thought? Comment? Objection?
>>
>> What happens if the WAL is streamed to the standby and then the master
>> crashes without writi
On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote:
> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
>> Thought? Comment? Objection?
>
> What happens if the WAL is streamed to the standby and then the master
> crashes without writing that WAL to disk?
What are you concerned about?
I think
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
> Thought? Comment? Objection?
What happens if the WAL is streamed to the standby and then the master
crashes without writing that WAL to disk?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Hi,
In 9.0, walsender reads WAL always from the disk and sends it to the standby.
That is, we cannot send WAL until it has been written (and flushed) to the disk.
This degrades the performance of synchronous replication very much since a
transaction commit must wait for the WAL write time *plus* t