Re: [HACKERS] Replication server timeout patch

2011-03-31 Thread Heikki Linnakangas
On 31.03.2011 05:46, Fujii Masao wrote: On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Heikki Linnakangas
On 29.03.2011 07:55, Fujii Masao wrote: On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: pq_flush_if_writable() calls internal_flush() without using PG_TRY block. This seems unsafe because for example pgwin32_waitforsinglesocket() called by

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: +       pq_putmessage_noblock('d', msgbuf, 1 + sizeof(WalDataMessageHeader) + nbytes); Don't we need to check the return value of pq_putmessage_noblock? That can return EOF when trouble happens (for
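
The question raised above, whether the caller should check the result of a non-blocking send that can return EOF, can be sketched in isolation. This is only an illustration under stated assumptions: fake_putmessage_noblock() and send_wal_chunk() are hypothetical stand-ins invented here, not the functions from the patch, and the simulate_failure flag exists purely to exercise the error path.

```c
#include <stdio.h>   /* EOF */
#include <stddef.h>  /* size_t */

/*
 * Hypothetical stand-in for a non-blocking "put message" call.
 * Assumption: like the call discussed in the thread, it reports
 * trouble by returning EOF and returns 0 otherwise.
 */
static int
fake_putmessage_noblock(char msgtype, const char *buf, size_t len,
                        int simulate_failure)
{
    (void) msgtype;
    (void) buf;
    (void) len;
    return simulate_failure ? EOF : 0;
}

/*
 * Hypothetical caller: returns 0 when the message was queued and -1
 * when the connection should be treated as lost, instead of silently
 * ignoring the EOF result.
 */
static int
send_wal_chunk(const char *buf, size_t len, int simulate_failure)
{
    if (fake_putmessage_noblock('d', buf, len, simulate_failure) == EOF)
        return -1;
    return 0;
}
```

The point of the sketch is only that the EOF result is propagated to the caller rather than dropped; how the real walsender should react to it is not shown in this snippet.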

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Heikki Linnakangas
On 30.03.2011 10:58, Fujii Masao wrote: On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: +A value of zero means wait forever. This parameter can only be set in The first sentence sounds misleading. Even if you set the parameter to zero,

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 30.03.2011 10:58, Fujii Masao wrote: On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: + A value of zero means wait forever. This parameter

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Robert Haas
On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 30.03.2011 10:58, Fujii Masao wrote: On Wed, Mar 30, 2011 at 4:24 PM, Heikki Linnakangas

Re: [HACKERS] Replication server timeout patch

2011-03-30 Thread Fujii Masao
On Wed, Mar 30, 2011 at 10:54 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 30, 2011 at 4:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Wed, Mar 30, 2011 at 5:03 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 30.03.2011 10:58, Fujii Masao wrote: On Wed,

Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes: On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas Should we use COMMERROR instead of ERROR if we fail to put the socket in the right mode? Maybe. COMMERROR exists to keep us from trying to send an error report down a failed socket. I would assume

Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Robert Haas
On Tue, Mar 29, 2011 at 9:24 AM, Tom Lane t...@sss.pgh.pa.us wrote: Fujii Masao masao.fu...@gmail.com writes: On Mon, Mar 28, 2011 at 7:49 PM, Heikki Linnakangas Should we use COMMERROR instead of ERROR if we fail to put the socket in the right mode? Maybe. COMMERROR exists to keep us from

Re: [HACKERS] Replication server timeout patch

2011-03-29 Thread Fujii Masao
On Wed, Mar 30, 2011 at 1:04 AM, Robert Haas robertmh...@gmail.com wrote: COMMERROR exists to keep us from trying to send an error report down a failed socket.  I would assume (perhaps wrongly) that walsender/walreceiver don't try to push error reports across the socket anyway, only to the

Re: [HACKERS] Replication server timeout patch

2011-03-28 Thread Heikki Linnakangas
On 24.03.2011 15:24, Fujii Masao wrote: On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't much like the API for this. Walsender shouldn't need to know about the details of the FE/BE protocol, pq_putbytes_if_available() seems too low level to

Re: [HACKERS] Replication server timeout patch

2011-03-25 Thread Robert Haas
On Wed, Mar 23, 2011 at 6:33 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 16.03.2011 11:11, Fujii Masao wrote: On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote: Agreed. I'll change the patch. Done. I attached the updated patch. I don't much

Re: [HACKERS] Replication server timeout patch

2011-03-24 Thread Fujii Masao
On Wed, Mar 23, 2011 at 7:33 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't much like the API for this. Walsender shouldn't need to know about the details of the FE/BE protocol, pq_putbytes_if_available() seems too low level to be useful. I think a better API would

Re: [HACKERS] Replication server timeout patch

2011-03-23 Thread Heikki Linnakangas
On 16.03.2011 11:11, Fujii Masao wrote: On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote: Agreed. I'll change the patch. Done. I attached the updated patch. I don't much like the API for this. Walsender shouldn't need to know about the details of the FE/BE protocol,

Re: [HACKERS] Replication server timeout patch

2011-03-16 Thread Fujii Masao
On Sat, Mar 12, 2011 at 4:34 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao masao.fu...@gmail.com wrote: I think we should consider making this change for 9.1.  This is a real wart, and it's going to become even more of a problem with sync rep, I

Re: [HACKERS] Replication server timeout patch

2011-03-16 Thread Fujii Masao
On Wed, Mar 16, 2011 at 4:49 PM, Fujii Masao masao.fu...@gmail.com wrote: Agreed. I'll change the patch. Done. I attached the updated patch. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center replication_timeout_v6.patch Description: Binary

Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Fujii Masao
On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao masao.fu...@gmail.com wrote: Why does internal_flush_if_writable compute bufptr differently from

Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Robert Haas
On Fri, Mar 11, 2011 at 8:14 AM, Fujii Masao masao.fu...@gmail.com wrote: On Mon, Mar 7, 2011 at 8:47 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Mar 6, 2011 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Mar 6, 2011 at 5:03 PM, Fujii Masao masao.fu...@gmail.com wrote:

Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Fujii Masao
On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote: I added this replication timeout patch into next CF. I explain why this feature is required for the future review; Without this feature, walsender might unexpectedly remain for a while when the standby crashes or the

Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Bruce Momjian
Fujii Masao wrote: On Fri, Mar 11, 2011 at 10:18 PM, Robert Haas robertmh...@gmail.com wrote: I added this replication timeout patch into next CF. I explain why this feature is required for the future review; Without this feature, walsender might unexpectedly remain for a while when

Re: [HACKERS] Replication server timeout patch

2011-03-11 Thread Robert Haas
On Fri, Mar 11, 2011 at 8:29 AM, Fujii Masao masao.fu...@gmail.com wrote: I think we should consider making this change for 9.1.  This is a real wart, and it's going to become even more of a problem with sync rep, I think. Yeah, that's a welcome! Please feel free to review the patch. I

Re: [HACKERS] Replication server timeout patch

2011-03-06 Thread Fujii Masao
On Sun, Mar 6, 2011 at 3:23 AM, Robert Haas robertmh...@gmail.com wrote: On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear.  If the receiver has

Re: [HACKERS] Replication server timeout patch

2011-03-05 Thread Robert Haas
On Mon, Feb 28, 2011 at 8:08 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear.  If the receiver has wal_receiver_status_interval=0, then we should ignore

Re: [HACKERS] Replication server timeout patch

2011-02-28 Thread Fujii Masao
On Sun, Feb 27, 2011 at 11:52 AM, Fujii Masao masao.fu...@gmail.com wrote: There are two things that I think are pretty clear.  If the receiver has wal_receiver_status_interval=0, then we should ignore replication_timeout for that connection. The patch still doesn't check that
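
The rule quoted above (ignore replication_timeout for a connection whose receiver has wal_receiver_status_interval=0) can be written down as a small predicate. This is a minimal sketch, not code from the patch: timeout_applies() and its parameter names are hypothetical, and treating a zero timeout as "disabled" is an assumption based on the "a value of zero means wait forever" wording quoted elsewhere in the thread.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decides whether the sender-side timeout should
 * be enforced for one connection.
 *
 * replication_timeout_ms     - configured timeout; 0 is assumed to mean
 *                              "wait forever" (timeout disabled)
 * receiver_status_interval_s - the standby's status interval; 0 means
 *                              the standby sends no periodic replies
 */
static bool
timeout_applies(int replication_timeout_ms, int receiver_status_interval_s)
{
    if (replication_timeout_ms <= 0)
        return false;   /* timeout disabled on the sender */
    if (receiver_status_interval_s <= 0)
        return false;   /* no replies expected, so silence proves nothing */
    return true;
}
```

With these assumptions, a standby that never reports status is never timed out, which is the behaviour the message says the patch does not yet check for.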

Re: [HACKERS] Replication server timeout patch

2011-02-26 Thread Fujii Masao
On Fri, Feb 18, 2011 at 12:10 PM, Robert Haas robertmh...@gmail.com wrote: IMHO, that's so broken as to be useless. I would really like to have a solution to this problem, though. Relying on TCP keepalives is weak. Agreed. I updated the replication timeout patch which I submitted before.

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Simon Riggs
On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Robert Haas
On Thu, Feb 17, 2011 at 4:21 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2011-02-16 at 11:34 +0900, Fujii Masao wrote: On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Josh Berkus
So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much use as a chocolate teapot. Can someone summarize the cases

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Simon Riggs
On Thu, 2011-02-17 at 16:42 -0500, Robert Haas wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is as much

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Fujii Masao
On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for me to say let's commit. Not committing gives us nothing at all, which is

Re: [HACKERS] Replication server timeout patch

2011-02-17 Thread Robert Haas
On Thu, Feb 17, 2011 at 9:10 PM, Fujii Masao masao.fu...@gmail.com wrote: On Fri, Feb 18, 2011 at 7:55 AM, Josh Berkus j...@agliodbs.com wrote: So, in summary, the position is that we have a timeout, but that timeout doesn't work in all cases. But it does work in some, so that seems enough for

Re: [HACKERS] Replication server timeout patch

2011-02-15 Thread Robert Haas
On Mon, Feb 14, 2011 at 5:13 PM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said

Re: [HACKERS] Replication server timeout patch

2011-02-15 Thread Fujii Masao
On Tue, Feb 15, 2011 at 7:13 AM, Daniel Farina dan...@heroku.com wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said

Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Fujii Masao
On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the walsender gets blocked in sending WAL. At first, we would need to implement a

Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Daniel Farina
On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the timeout which this patch provides doesn't work well when the

Re: [HACKERS] Replication server timeout patch

2011-02-14 Thread Simon Riggs
On Mon, 2011-02-14 at 14:13 -0800, Daniel Farina wrote: On Mon, Feb 14, 2011 at 12:48 AM, Fujii Masao masao.fu...@gmail.com wrote: On Sat, Feb 12, 2011 at 8:58 AM, Daniel Farina dan...@heroku.com wrote: Context diff equivalent attached. Thanks for the patch! As I said before, the

[HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
Hello list, I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long to finish any other details. -- fdr ***

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out there...I'll be around in Not Too Long

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Heikki Linnakangas
On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or anything, but I just wanted to get it out

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
On Fri, Feb 11, 2011 at 12:11 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split this out of the synchronous replication patch for independent review. I'm dashing out the door, so I haven't put it on the CF yet or

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Robert Haas
On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb 11, 2011 at 2:02 PM, Daniel Farina drfar...@acm.org wrote: I split

Re: [HACKERS] Replication server timeout patch

2011-02-11 Thread Daniel Farina
On Feb 11, 2011 8:20 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:38 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Feb 11, 2011 at 4:30 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: On 11.02.2011 22:11, Robert Haas wrote: On Fri, Feb