Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-08 Thread Fujii Masao
On Thu, Jul 8, 2010 at 7:55 AM, Robert Haas wrote: >> What was the final decision on behavior if fsync=off? > > I'm not sure we made any decision, per se, but if you use fsync=off in > combination with SR and experience an unexpected crash-and-reboot on > the master, you will be sad. True. But, w

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread marcin mank
> Having said that, I do think we urgently need some high-level design > discussion on how sync rep is actually going to handle this issue > (perhaps on a new thread).  If we can't resolve this issue, sync rep > is going to be really slow; but there are no easy solutions to this > problem in sight,

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Robert Haas
On Wed, Jul 7, 2010 at 6:44 PM, Josh Berkus wrote: > On 7/6/10 4:44 PM, Robert Haas wrote: >> To recap the previous discussion on this thread, we ended up changing >> the behavior of 9.0 so that it only sends WAL which has been written >> to the OS *and flushed*, because sending unflushed WAL to t

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Josh Berkus
On 7/6/10 4:44 PM, Robert Haas wrote: > To recap the previous discussion on this thread, we ended up changing > the behavior of 9.0 so that it only sends WAL which has been written > to the OS *and flushed*, because sending unflushed WAL to the standby > is unsafe. The standby can get ahead of the

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Dimitri Fontaine
Tom Lane writes: > Dimitri Fontaine writes: >> Stop me if I'm all wrong already, but I thought we said that we should >> handle this case by decoupling what we can send to the standby and what >> it can apply. > > What's the point of that? It won't make the standby apply any faster. True, but it

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Tom Lane
Dimitri Fontaine writes: > Stop me if I'm all wrong already, but I thought we said that we should > handle this case by decoupling what we can send to the standby and what > it can apply. What's the point of that? It won't make the standby apply any faster. What it will do is make the protocol mo

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Robert Haas
On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine wrote: > Stop me if I'm all wrong already, but I thought we said that we should > handle this case by decoupling what we can send to the standby and what > it can apply. We could do this by sending the current WAL fsync'ed > position on the master in

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-07 Thread Dimitri Fontaine
Robert Haas writes: > If it's unsafe to send written but unflushed WAL to the standby, then > for the same reasons we can't send unwritten WAL either. [...] > Having said that, I do think we urgently need some high-level design > discussion on how sync rep is actually going to handle this issue S

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-06 Thread Robert Haas
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: > In 9.0, walsender reads WAL always from the disk and sends it to the standby. > That is, we cannot send WAL until it has been written (and flushed) to the > disk. > This degrades the performance of synchronous replication very much since a > t

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-01 Thread Greg Stark
On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas wrote: > One thought that occurred to me is that if the master and standby were > more tightly coupled, you could recover after a crash by making the > one with the further-advanced WAL position the master, and the other > one the standby.  That would

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-30 Thread Robert Haas
On Wed, Jun 30, 2010 at 5:36 AM, Fujii Masao wrote: >> Before we get too busy frobnicating this gonkulator, I'd like to see a >> little more discussion of what kind of performance people are >> expecting from sync rep.  Sounds to me like the best we can expect >> here is, on every commit: (a) wait

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-30 Thread Fujii Masao
On Wed, Jun 30, 2010 at 11:26 AM, Robert Haas wrote: > Maybe.  As Heikki pointed out upthread, the standby can't even write > the WAL back to the OS until it's been fsync'd on the master > without risking the problem under discussion. If we change the startup process so that it doesn't go ahea

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-29 Thread Robert Haas
On Tue, Jun 29, 2010 at 10:06 PM, Bruce Momjian wrote: > Simon Riggs wrote: >> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: >> >> > The problem is not that the master streams non-fsync'd WAL, but that the >> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL >> >

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-29 Thread Bruce Momjian
Simon Riggs wrote: > On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: > > > The problem is not that the master streams non-fsync'd WAL, but that the > > standby can replay that. So I'm thinking that we can send non-fsync'd WAL > > safely if the standby makes the recovery wait until the master

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-21 Thread Simon Riggs
On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: > The problem is not that the master streams non-fsync'd WAL, but that the > standby can replay that. So I'm thinking that we can send non-fsync'd WAL > safely if the standby makes the recovery wait until the master has fsync'd > WAL. That is,
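The rule quoted in this message (the standby may receive WAL that the master has not fsync'd, but must not replay it) can be sketched as a simple bound on the replay position. This is only an illustration, not code from the thread: WAL positions are modelled as plain integers, and the names `received_lsn`, `master_flushed_lsn` and `safe_apply_limit` are hypothetical. It also assumes the master reports its flushed position to the standby somehow, which the thread was still debating.

```python
def safe_apply_limit(received_lsn: int, master_flushed_lsn: int) -> int:
    """Return the highest WAL position the standby may replay.

    Hypothetical model: the standby never replays past what the master
    has reported as fsync'd, and cannot replay what it has not received.
    """
    return min(received_lsn, master_flushed_lsn)


# The standby has received more WAL than the master has made durable,
# so replay must stop at the master's flushed position.
received = 5000
master_flushed = 4200
print(safe_apply_limit(received, master_flushed))
```

The bound says nothing about whether the standby may write the unflushed WAL to its own disk, which is the subtlety raised elsewhere in the thread.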

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-21 Thread Greg Stark
On Mon, Jun 21, 2010 at 10:40 AM, Heikki Linnakangas wrote: > I guess, but you have to be very careful to correctly refrain from applying > the WAL. For example, a naive implementation might write the WAL to disk in > walreceiver immediately, but refrain from telling the startup process about > it

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-21 Thread Heikki Linnakangas
On 21/06/10 12:08, Fujii Masao wrote: On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote: In 9.0, I think we can fix this problem by (1) only streaming WAL that has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But in 9.1, with sync rep and the performance demands that entails,

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-21 Thread Fujii Masao
On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote: > On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote: >>> I wonder if it would be possible to jigger things so that we send the >>> WAL to the standby as soon as it is generated, but somehow arrange >>> things so that the standby knows the last

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Robert Haas
On Tue, Jun 15, 2010 at 8:09 PM, Josh Berkus wrote: > >> I have yet to convince myself of how likely this is to occur.  I tried >> to reproduce this issue by crashing the database, but I think in 9.0 >> you need an actual operating system crash to cause this problem, and I >> haven't yet set up an

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Josh Berkus
On 6/15/10 5:09 PM, Josh Berkus wrote: >> > In 9.0, I think we can fix this problem by (1) only streaming WAL that >> > has been fsync'd and > > I don't think this is the best solution; it would be a noticeable > performance penalty on replication. Actually, there's an even bigger reason not to

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Josh Berkus
> I have yet to convince myself of how likely this is to occur. I tried > to reproduce this issue by crashing the database, but I think in 9.0 > you need an actual operating system crash to cause this problem, and I > haven't yet set up an environment in which I can repeatedly crash the > OS. I

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Robert Haas
On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote: >> I wonder if it would be possible to jigger things so that we send the >> WAL to the standby as soon as it is generated, but somehow arrange >> things so that the standby knows the last location that the master has >> fsync'd and never applies

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Josh Berkus
> I wonder if it would be possible to jigger things so that we send the > WAL to the standby as soon as it is generated, but somehow arrange > things so that the standby knows the last location that the master has > fsync'd and never applies beyond that point. I can't think of any way which would

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Florian Pflug
On Jun 15, 2010, at 10:45 , Fujii Masao wrote: > A transaction commit would need to wait for local fsync and replication > in a serial manner, in synchronous replication. IOW, walsender cannot > send the commit record until it's fsync'd in XLogWrite(). Hm, but since 9.0 won't do synchronous replic

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Robert Haas
On Tue, Jun 15, 2010 at 12:46 AM, Fujii Masao wrote: > On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote: >> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: >>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: Maybe.  That sounds like a pretty enormous foot-gun to me, considering

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-15 Thread Fujii Masao
On Tue, Jun 15, 2010 at 2:16 PM, Heikki Linnakangas wrote: > On 15/06/10 07:47, Fujii Masao wrote: >> >> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane  wrote: >>> >>> Fujii Masao  writes: Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, xlogctl->LogwrtResult.Write i

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Heikki Linnakangas
On 15/06/10 07:47, Fujii Masao wrote: On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote: Fujii Masao writes: Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync. Wrong. LogwrtResult.Write tracks how far
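The disagreement in this exchange turns on the difference between a "written" and a "flushed" WAL position. The toy model below shows why the two can differ and what each sending policy would allow. It is a sketch under stated assumptions, not PostgreSQL source: positions are plain integers, and `xlog_write`, `xlog_flush` and `sendable_upto` are hypothetical helpers named only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogwrtResult:
    """Toy stand-in for the shared write/flush progress pointers."""
    write: int = 0  # WAL handed to the OS with write(), maybe not durable
    flush: int = 0  # WAL known to be durable after fsync()


def xlog_write(state: LogwrtResult, upto: int) -> None:
    # Writing advances only the write pointer.
    state.write = max(state.write, upto)


def xlog_flush(state: LogwrtResult) -> None:
    # Flushing makes everything written so far durable.
    state.flush = state.write


def sendable_upto(state: LogwrtResult, require_flush: bool) -> int:
    # Policy switch: send only flushed WAL, or anything already written.
    return state.flush if require_flush else state.write


state = LogwrtResult()
xlog_write(state, 100)
print(sendable_upto(state, require_flush=False))
print(sendable_upto(state, require_flush=True))
```

Between the write and the flush, a sender following the write pointer can ship WAL that an operating system crash on the master would lose, which is the hazard discussed in the rest of the thread.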

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote: > Fujii Masao writes: >> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: >>> Well, we're already not waiting for fsync, which is the slowest part. > >> No, currently walsender waits for fsync. > > No, you're mistaken. > >> Walsender tries to se

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote: > On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: >> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: >>> Maybe.  That sounds like a pretty enormous foot-gun to me, considering >>> that we have no way of recovering from the situation wh

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Tom Lane
Fujii Masao writes: > On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: >> Well, we're already not waiting for fsync, which is the slowest part. > No, currently walsender waits for fsync. No, you're mistaken. > Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, > xlogctl->Log

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: > On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: >> Maybe.  That sounds like a pretty enormous foot-gun to me, considering >> that we have no way of recovering from the situation where the standby >> gets ahead of the master. > > No, we can

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: > Maybe.  That sounds like a pretty enormous foot-gun to me, considering > that we have no way of recovering from the situation where the standby > gets ahead of the master. No, we can do that by reconstructing the standby from the backup. And,

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote: > No, currently walsender waits for fsync. > ... > But that change would cause the problem that Robert pointed out. > http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php Presumably this means that if synchronous_commit = off on p

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Simon Riggs
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: > > Stefan Kaltenbrunner writes: > >> hmm not sure that is what fujii tried to say - I think his point was > >> that in the original case we would have serialized all the operations > >> (fir

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Robert Haas
On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote: >> I think the failover case might be OK.  But if the master crashes and >> restarts, the slave might be left thinking its xlog position is ahead >> of the xlog position on the master. > > R

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Sat, Jun 12, 2010 at 12:15 AM, Stefan Kaltenbrunner wrote: > hmm ok - but assuming sync rep we would end up with something like the > following (hypothetically assuming each operation takes 1 time unit): > > originally: > > write 1 > sync 1 > network 1 > write 1 > sync 1 > > total: 5 > > whereas
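The serialized case quoted above adds five unit-cost steps. The snippet is cut off before the alternative, so the overlapped schedule below is an assumption made purely for illustration: it supposes the master's local write and sync run concurrently with the network transfer and the standby's write and sync. The function names are hypothetical.

```python
# Unit costs taken from the quoted example: every operation takes 1 time unit.
STEPS = [
    ("master write", 1),
    ("master sync", 1),
    ("network", 1),
    ("standby write", 1),
    ("standby sync", 1),
]


def serialized_latency(steps):
    """Commit latency when every step waits for the previous one."""
    return sum(cost for _, cost in steps)


def overlapped_latency(local_costs, remote_costs):
    """Commit latency if the local and remote paths run concurrently.

    Assumption: the commit completes when the slower path finishes.
    """
    return max(sum(local_costs), sum(remote_costs))


print(serialized_latency(STEPS))
print(overlapped_latency([1, 1], [1, 1, 1]))
```

Under these assumed costs the overlapped schedule is bounded by the remote path. Real fsync and network times are far from equal, so the numbers show only the shape of the argument.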

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: > Stefan Kaltenbrunner writes: >> hmm not sure that is what fujii tried to say - I think his point was >> that in the original case we would have serialized all the operations >> (first write+sync on the master, network afterwards and write+sync o

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-14 Thread Fujii Masao
On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote: > I think the failover case might be OK.  But if the master crashes and > restarts, the slave might be left thinking its xlog position is ahead > of the xlog position on the master. Right. Unless we perform a failover in this case, the standby
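The failure mode described here, where a restarted master ends up behind a standby that received WAL the master never made durable, can be expressed as a comparison at reconnect time. This is a minimal sketch with hypothetical names and integer WAL positions. Refusing to continue mirrors the suggestion elsewhere in the thread to fail hard and rebuild the standby from a backup, but it is not the actual implementation.

```python
def standby_is_ahead(master_end_lsn: int, standby_received_lsn: int) -> bool:
    """True if the standby holds WAL beyond the master's recovered end."""
    return standby_received_lsn > master_end_lsn


def check_on_reconnect(master_end_lsn: int, standby_received_lsn: int) -> None:
    # Hypothetical policy: stop rather than replay on top of diverged WAL.
    if standby_is_ahead(master_end_lsn, standby_received_lsn):
        raise RuntimeError(
            "standby WAL position %d is ahead of master %d; "
            "rebuild the standby from a base backup"
            % (standby_received_lsn, master_end_lsn)
        )


# The master crashed after sending WAL up to 5000 but only 4200 survived.
try:
    check_on_reconnect(master_end_lsn=4200, standby_received_lsn=5000)
except RuntimeError as err:
    print(err)
```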

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-13 Thread Greg Smith
Florian Pflug wrote: glibc defines O_DSYNC as an alias for O_SYNC and warrants that with "Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which requ

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-12 Thread Heikki Linnakangas
On 12/06/10 01:16, Josh Berkus wrote: Well, we're already not waiting for fsync, which is the slowest part. If there's a performance problem, it may be because FADVISE_DONTNEED disables kernel buffering so that we're forced to actually read the data back from disk before sending it on down the

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-12 Thread Florian Pflug
On Jun 12, 2010, at 3:10 , Josh Berkus wrote: >> Hm, but then Robert's failure case is real, and streaming replication might >> break due to an OS-level crash of the master. Or am I missing something? > > 1) Master goes out > 2) "floating" transaction applied to standby. > 3) Standby goes out > 4

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Josh Berkus
> Hm, but then Robert's failure case is real, and streaming replication might > break due to an OS-level crash of the master. Or am I missing something? Well, in the failover case this isn't a problem, it's a benefit: the standby gets a transaction which you would have lost off the master. Howev

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Florian Pflug
On Jun 11, 2010, at 16:31 , Tom Lane wrote: > Fujii Masao writes: >> In 9.0, walsender reads WAL always from the disk and sends it to the standby. >> That is, we cannot send WAL until it has been written (and flushed) to the >> disk. > > I believe the above statement to be incorrect: walsender d

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Josh Berkus
> Well, we're already not waiting for fsync, which is the slowest part. > If there's a performance problem, it may be because FADVISE_DONTNEED > disables kernel buffering so that we're forced to actually read the data > back from disk before sending it on down the wire. Well, that's fairly direct

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Stefan Kaltenbrunner
On 06/11/2010 04:47 PM, Tom Lane wrote: Stefan Kaltenbrunner writes: hmm not sure that is what fujii tried to say - I think his point was that in the original case we would have serialized all the operations (first write+sync on the master, network afterwards and write+sync on the slave) and no

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Tom Lane
Stefan Kaltenbrunner writes: > hmm not sure that is what fujii tried to say - I think his point was > that in the original case we would have serialized all the operations > (first write+sync on the master, network afterwards and write+sync on > the slave) and now we could try parallelizing by

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Stefan Kaltenbrunner
On 06/11/2010 04:31 PM, Tom Lane wrote: Fujii Masao writes: In 9.0, walsender reads WAL always from the disk and sends it to the standby. That is, we cannot send WAL until it has been written (and flushed) to the disk. I believe the above statement to be incorrect: walsender does *not* wait f

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Tom Lane
Fujii Masao writes: > In 9.0, walsender reads WAL always from the disk and sends it to the standby. > That is, we cannot send WAL until it has been written (and flushed) to the > disk. I believe the above statement to be incorrect: walsender does *not* wait for an fsync to occur. I agree with t

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Robert Haas
On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote: >> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: >>> Thought? Comment? Objection? >> >> What happens if the WAL is streamed to the standby and then the master >> crashes without writi

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Fujii Masao
On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote: > On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: >> Thought? Comment? Objection? > > What happens if the WAL is streamed to the standby and then the master > crashes without writing that WAL to disk? What are you concerned about? I think

Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Robert Haas
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: > Thought? Comment? Objection? What happens if the WAL is streamed to the standby and then the master crashes without writing that WAL to disk?

[HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-06-11 Thread Fujii Masao
Hi, In 9.0, walsender reads WAL always from the disk and sends it to the standby. That is, we cannot send WAL until it has been written (and flushed) to the disk. This degrades the performance of synchronous replication very much since a transaction commit must wait for the WAL write time *plus* t