Re: [HACKERS] Keepalive for max_standby_delay

2010-07-03 Thread Heikki Linnakangas
On 02/07/10 23:36, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: I haven't been able to wrap my head around why the delay should be LESS in the archive case than in the streaming case. Can you attempt to hit me with the clue-by-four? In the archive case, you're presumably trying

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: It would seem logical to use the same logic for archive recovery as we do for streaming replication, and only set XLogReceiptTime when you have to wait for a WAL segment to arrive into the archive, ie. when restore_command

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-03 Thread Heikki Linnakangas
On 03/07/10 18:32, Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: It would seem logical to use the same logic for archive recovery as we do for streaming replication, and only set XLogReceiptTime when you have to wait for a WAL segment to arrive into the archive,

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-03 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: On 03/07/10 18:32, Tom Lane wrote: That would not do what you want at all in the case where you're recovering from archive --- XLogReceiptTime would never advance at all for the duration of the recovery. Do you mean when using

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Tom Lane
[ Apologies for the very slow turnaround on this --- I got hit with another batch of non-postgres security issues this week. ] Attached is a draft patch for revising the max_standby_delay behavior into something I think is a bit saner. There is some unfinished business: * I haven't touched the

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Robert Haas
On Fri, Jul 2, 2010 at 4:11 PM, Tom Lane t...@sss.pgh.pa.us wrote: [ Apologies for the very slow turnaround on this --- I got hit with another batch of non-postgres security issues this week. ] Attached is a draft patch for revising the max_standby_delay behavior into something I think is a

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: I haven't been able to wrap my head around why the delay should be LESS in the archive case than in the streaming case. Can you attempt to hit me with the clue-by-four? In the archive case, you're presumably trying to catch up, and so it makes sense

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Robert Haas
On Fri, Jul 2, 2010 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I haven't been able to wrap my head around why the delay should be LESS in the archive case than in the streaming case.  Can you attempt to hit me with the clue-by-four? In the

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jul 2, 2010 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: In the archive case, you're presumably trying to catch up, and so it makes sense to kill queries faster so you can catch up. On the flip side, the timeout for the WAL segment is for

Re: [HACKERS] Keepalive for max_standby_delay

2010-07-02 Thread Robert Haas
On Fri, Jul 2, 2010 at 4:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jul 2, 2010 at 4:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: In the archive case, you're presumably trying to catch up, and so it makes sense to kill queries faster so you can

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-30 Thread Simon Riggs
On Mon, 2010-06-28 at 10:09 -0700, Josh Berkus wrote: It will get done. It is not the very first thing on my to-do list. ??? What is then? If it's not the first thing on your priority list, with 9.0 getting later by the day, maybe we should leave it to Robert and Simon, who *do* seem

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-30 Thread Bruce Momjian
Simon Riggs wrote: On Mon, 2010-06-28 at 10:09 -0700, Josh Berkus wrote: It will get done. It is not the very first thing on my to-do list. ??? What is then? If it's not the first thing on your priority list, with 9.0 getting later by the day, maybe we should leave it to Robert

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Simon Riggs
On Wed, 2010-06-16 at 21:56 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 9, 2010 at 8:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, I'll get with it ... Any update on this? Sorry, I've been a bit distracted by other responsibilities (libtiff security

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Robert Haas
On Mon, Jun 28, 2010 at 3:17 AM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-06-16 at 21:56 -0400, Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 9, 2010 at 8:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, I'll get with it ... Any update on this? Sorry,

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2010-06-16 at 21:56 -0400, Tom Lane wrote: Sorry, I've been a bit distracted by other responsibilities (libtiff security issues for Red Hat, if you must know). I'll get on it shortly. I don't think the PostgreSQL project should wait any

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Josh Berkus
It will get done. It is not the very first thing on my to-do list. ??? What is then? If it's not the first thing on your priority list, with 9.0 getting later by the day, maybe we should leave it to Robert and Simon, who *do* seem to have it first on *their* list? I swear, when Simon

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Robert Haas
On Mon, Jun 28, 2010 at 10:19 AM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Wed, 2010-06-16 at 21:56 -0400, Tom Lane wrote: Sorry, I've been a bit distracted by other responsibilities (libtiff security issues for Red Hat, if you must know).  I'll get on it

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: ... It is even more unreasonable to commit to providing a timely patch (twice) and then fail to do so. We are trying to finalize a release here, and you've made it clear you think this code needs revision before then. I respect your opinion, but not

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-28 Thread Robert Haas
On Mon, Jun 28, 2010 at 2:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: ... It is even more unreasonable to commit to providing a timely patch (twice) and then fail to do so.  We are trying to finalize a release here, and you've made it clear you think

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-21 Thread Robert Haas
On Mon, Jun 21, 2010 at 12:20 AM, Ron Mayer rm...@cheapcomplexdevices.com wrote: Robert Haas wrote: On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Sorry, I've been a bit distracted by other responsibilities (libtiff security issues for Red Hat, if you must know).  I'll get

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-20 Thread Ron Mayer
Robert Haas wrote: On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Sorry, I've been a bit distracted by other responsibilities (libtiff security issues for Red Hat, if you must know). I'll get on it shortly. What? You have other things to do besides hack on PostgreSQL?

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-17 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Wed, Jun 9, 2010 at 8:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, I'll get with it ... Any update on this? Sorry, I've been a bit distracted by other responsibilities (libtiff security issues for Red Hat, if you must know). I'll get on it

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-17 Thread Robert Haas
On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jun 9, 2010 at 8:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Yes, I'll get with it ... Any update on this? Sorry, I've been a bit distracted by other responsibilities (libtiff

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-16 Thread Robert Haas
On Wed, Jun 9, 2010 at 8:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 19:02 -0400, Tom Lane wrote: I decided there wasn't time to get anything useful done on it before the beta2 deadline (which is, more or less, right now).  I will

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-09 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 19:02 -0400, Tom Lane wrote: I decided there wasn't time to get anything useful done on it before the beta2 deadline (which is, more or less, right now). I will take another look over the next few days. We all really need you

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-09 Thread Simon Riggs
On Thu, 2010-06-03 at 19:02 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 18:18 +0100, Simon Riggs wrote: Are you planning to work on these things now as you said? Are you? Or do you want me to? I decided there wasn't time to get anything useful

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Wed, 2010-06-02 at 16:00 -0400, Tom Lane wrote: the current situation that query grace periods go to zero Possibly a better way to handle this concern is to make the second parameter: min_standby_grace_period - the minimum time a query will be given in which to execute, even if

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Fujii Masao
On Thu, Jun 3, 2010 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't understand why you want to use a different delay when you're restoring from archive vs. when you're streaming (what about existing WAL files found in pg_xlog, BTW?). The source of WAL shouldn't

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Thu, 2010-06-03 at 17:56 +0900, Fujii Masao wrote: On Thu, Jun 3, 2010 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't understand why you want to use a different delay when you're restoring from archive vs. when you're streaming (what about existing WAL

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Fujii Masao
On Thu, Jun 3, 2010 at 6:07 PM, Simon Riggs si...@2ndquadrant.com wrote: On Thu, 2010-06-03 at 17:56 +0900, Fujii Masao wrote: On Thu, Jun 3, 2010 at 4:41 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: I don't understand why you want to use a different delay when you're

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Thu, 2010-06-03 at 18:39 +0900, Fujii Masao wrote: What purpose would that serve? Tom has already explained this and it makes sense for me. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Fujii Masao
On Thu, Jun 3, 2010 at 5:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: I stand by my suggestion from yesterday: Let's define max_standby_delay as the difference between a piece of WAL becoming available in the standby, and applying it. My proposal is essentially the same as yours plus allowing
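The definition quoted in this message (delay measured from a piece of WAL becoming available on the standby to its being applied) can be sketched as below. This is an illustrative sketch only, not code from any patch in this thread: the function name `should_cancel_conflicting_query` and its parameters are hypothetical, and both timestamps are assumed to come from the standby's own clock, in seconds.

```python
def should_cancel_conflicting_query(wal_receipt_time, now, max_standby_delay):
    """Decide whether a query blocking WAL replay has used up its grace period.

    wal_receipt_time  -- standby-clock time (seconds) at which the WAL being
                         replayed became available on the standby (assumed input)
    now               -- current standby-clock time (seconds)
    max_standby_delay -- allowed delay (seconds) between receipt and apply
    """
    # How long this WAL has been waiting on the standby without being applied.
    waited = now - wal_receipt_time
    # The query is cancelled only once the wait exceeds the configured delay.
    return waited > max_standby_delay
```

Because both inputs are taken on the same machine, clock skew between master and standby does not enter the comparison in this sketch.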

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Greg Stark
On Thu, Jun 3, 2010 at 12:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: I was assuming the walreceiver only requests more wal in relatively small chunks and only when replay has caught up and needs more data. I haven't actually read this code so if that

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: Well, if the slave can't keep up, that's a separate problem.  It will not fail to keep up as a result of the transmission mechanism. No, I mean if the slave is paused due to a conflict. Does it stop reading data from the master or does it buffer it up on

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Greg Stark
On Thu, Jun 3, 2010 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: The data keeps coming in and getting dumped into the slave's pg_xlog. walsender/walreceiver are not at all tied to the slave's application of WAL.  In principle we could have the code around max_standby_delay notice just how

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote: This patch seems to me to be going in fundamentally the wrong direction. It's adding complexity and overhead (far more than is needed), and it's failing utterly to resolve the objections that I raised

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Wed, 2010-06-02 at 16:00 -0400, Tom Lane wrote: the current situation that query grace periods go to zero Possibly a better way to handle this concern is to make the second parameter: min_standby_grace_period - the minimum time a query will be

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Thu, 2010-06-03 at 12:47 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote: This patch seems to me to be going in fundamentally the wrong direction. It's adding complexity and overhead (far more than is needed), and it's

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 12:47 -0400, Tom Lane wrote: But in any case the current behavior is still quite broken as regards reading stale timestamps from WAL. Agreed. That wasn't the objective of this patch or a priority. If you're reading from an

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Thu, 2010-06-03 at 13:32 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 12:47 -0400, Tom Lane wrote: But in any case the current behavior is still quite broken as regards reading stale timestamps from WAL. Agreed. That wasn't the objective of

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Greg Stark
On Thu, Jun 3, 2010 at 4:18 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: On Thu, Jun 3, 2010 at 12:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: It is off-base.  The receiver does not request data, the sender is what determines how much WAL is sent when. Hm, so what

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: On Thu, Jun 3, 2010 at 12:11 AM, Tom Lane t...@sss.pgh.pa.us wrote: It is off-base.  The receiver does not request data, the sender is what determines how much WAL is sent when. Hm, so what happens if the slave blocks, doesn't the sender block when the

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Simon Riggs
On Thu, 2010-06-03 at 18:18 +0100, Simon Riggs wrote: Are you planning to work on these things now as you said? Are you? Or do you want me to? -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-03 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: On Thu, 2010-06-03 at 18:18 +0100, Simon Riggs wrote: Are you planning to work on these things now as you said? Are you? Or do you want me to? I decided there wasn't time to get anything useful done on it before the beta2 deadline (which is, more or

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Simon Riggs
On Mon, 2010-05-31 at 14:40 -0400, Bruce Momjian wrote: Uh, we have three days before we package 9.0beta2. It would be good if we could decide on the max_standby_delay issue soon. I've heard something from Heikki, not from anyone else. Those comments amount to let's replace max_standby_delay

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2010-05-31 at 14:40 -0400, Bruce Momjian wrote: Uh, we have three days before we package 9.0beta2. It would be good if we could decide on the max_standby_delay issue soon. I've heard something from Heikki, not from anyone else. Those

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: OK, here's v4. I've been trying to stay out of this thread, but with beta2 approaching and no resolution in sight, I'm afraid I have to get involved. This patch seems to me to be going in fundamentally the wrong direction. It's adding complexity and

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: An important property of this design is that all relevant timestamps are taken on the slave, so clock skew isn't an issue. I agree that this is important, and I do run NTP on all my servers and even monitor it using Nagios. It's still not a cure-all for

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Andrew Dunstan
Tom Lane wrote: I'm still inclined to apply the part of Simon's patch that adds a transmit timestamp to each SR send chunk. That would actually be completely unused by the slave given my proposal above, but I think that it is an important step to take to future-proof the SR protocol against

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Comments? I'm not really a huge fan of adding another GUC, to be honest. I'm more inclined to say we treat 'max_archive_delay' as '0', and turn max_streaming_delay into what you've described. If we fall back so

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Simon Riggs
On Wed, 2010-06-02 at 13:45 -0400, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Comments? I'm not really a huge fan of adding another GUC, to be honest. I'm more inclined to say we treat 'max_archive_delay' as '0', and turn

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Robert Haas
On Wed, Jun 2, 2010 at 2:03 PM, Simon Riggs si...@2ndquadrant.com wrote: On Wed, 2010-06-02 at 13:45 -0400, Tom Lane wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: Comments? I'm not really a huge fan of adding another GUC, to be honest.  I'm more

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Greg Stark
On Wed, Jun 2, 2010 at 6:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe that the motivation for treating archived timestamps as live is, essentially, to force rapid catchup if a slave falls behind so far that it's reading from archive instead of SR. Huh, I think this is the first

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Simon Riggs
On Wed, 2010-06-02 at 13:14 -0400, Tom Lane wrote: This patch seems to me to be going in fundamentally the wrong direction. It's adding complexity and overhead (far more than is needed), and it's failing utterly to resolve the objections that I raised to start with. Having read your proposal,

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Robert Haas
On Wed, Jun 2, 2010 at 2:27 PM, Simon Riggs si...@2ndquadrant.com wrote: Syncing two servers in replication is common practice, as has been explained here; I'm still surprised people think otherwise. Measuring the time between two servers is the very purpose of the patch, so the

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: On Wed, Jun 2, 2010 at 6:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe that the motivation for treating archived timestamps as live is, essentially, to force rapid catchup if a slave falls behind so far that it's reading from archive instead of SR.

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Heikki Linnakangas
On 02/06/10 20:14, Tom Lane wrote: For realistic values of max_standby_delay ... Hang on right there. What do you consider a realistic value for max_standby_delay? Because I'm not sure I have a grip on that myself. 5 seconds? 5 minutes? 5 hours? I can see use cases for all of those...

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: The problem with defining max_archive_delay that way is again that you can fall behind indefinitely. See my response to Greg Stark --- I don't think this is really an issue. It's certainly far less of an issue than the current

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Greg Stark
On Wed, Jun 2, 2010 at 8:14 PM, Tom Lane t...@sss.pgh.pa.us wrote: Indeed, but nothing we do can prevent that, if the slave is just plain slower than the master.  You have to assume that the slave is capable of keeping up in the absence of query-caused delays, or you're hosed. I was assuming

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Tom Lane
Greg Stark gsst...@mit.edu writes: I was assuming the walreceiver only requests more wal in relatively small chunks and only when replay has caught up and needs more data. I haven't actually read this code so if that assumption is wrong then I'm off-base. It is off-base. The receiver does

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-02 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote: Greg Stark gsst...@mit.edu writes: So I think this isn't necessarily such a blue moon event. As I understand it, all it would take is a single long-running report and a vacuum or HOT cleanup occurring on the master. I think this is mostly FUD too.

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-01 Thread Heikki Linnakangas
On 27/05/10 20:26, Simon Riggs wrote: On Wed, 2010-05-26 at 16:22 -0700, Josh Berkus wrote: Just this second posted about that, as it turns out. I have a v3 *almost* ready of the keepalive patch. It still makes sense to me after a few days reflection, so is worth discussion and review. In or

Re: [HACKERS] Keepalive for max_standby_delay

2010-06-01 Thread Simon Riggs
Thanks for the review. On Tue, 2010-06-01 at 13:36 +0300, Heikki Linnakangas wrote: If we really want to try to salvage max_standby_delay with a meaning similar to what it has now, I think we should go with the idea some people bashed around earlier and define the grace period as the

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-31 Thread Bruce Momjian
Uh, we have three days before we package 9.0beta2. It would be good if we could decide on the max_standby_delay issue soon. --- Simon Riggs wrote: On Wed, 2010-05-26 at 16:22 -0700, Josh Berkus wrote: Just this second

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-27 Thread Simon Riggs
On Wed, 2010-05-26 at 16:22 -0700, Josh Berkus wrote: Just this second posted about that, as it turns out. I have a v3 *almost* ready of the keepalive patch. It still makes sense to me after a few days reflection, so is worth discussion and review. In or out, I want this settled within

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Heikki Linnakangas
On 19/05/10 00:37, Simon Riggs wrote: On Tue, 2010-05-18 at 17:25 -0400, Heikki Linnakangas wrote: On 18/05/10 17:17, Simon Riggs wrote: There's no reason that the buffer size we use for XLogRead() should be the same as the send buffer, if you're worried about that. My point is that

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Josh Berkus
Committed with chunk size of 128 kB. I hope that's a reasonable compromise, in the absence of any performance test data either way. So where are we with max_standby_delay? Status check? -- -- Josh Berkus PostgreSQL

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs
On Thu, 2010-05-27 at 01:34 +0300, Heikki Linnakangas wrote: Committed with chunk size of 128 kB. Thanks. I'm sure that's fine. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs
On Sun, 2010-05-16 at 17:11 +0100, Simon Riggs wrote: New version, with some other cleanup of wait processing. New logic is that when Startup asks for next applychunk of WAL it saves the lastChunkTimestamp. That is then the base time used by WaitExceedsMaxStandbyDelay(), except when

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Simon Riggs
On Wed, 2010-05-26 at 15:45 -0700, Josh Berkus wrote: Committed with chunk size of 128 kB. I hope that's a reasonable compromise, in the absence of any performance test data either way. So where are we with max_standby_delay? Status check? Just this second posted about that, as it turns

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-26 Thread Josh Berkus
Just this second posted about that, as it turns out. I have a v3 *almost* ready of the keepalive patch. It still makes sense to me after a few days reflection, so is worth discussion and review. In or out, I want this settled within a week. Definitely need some RR here. Does the keepalive

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Heikki Linnakangas
On 17/05/10 04:40, Simon Riggs wrote: On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote: Attached patch rearranges the walsender loops slightly to fix the above. XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE / 2) in one round and returns to the main loop after that
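The per-round cap described in this message (send at most MAX_SEND_SIZE bytes, then return to the main loop even if unsent WAL remains) can be sketched roughly as below. This is an assumption-laden illustration, not the walsender code: `next_send_range` and `plan_rounds` are hypothetical names, WAL positions are modelled as plain byte offsets, the 16 MB segment size is an assumed default, and segment-boundary handling is ignored.

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024    # assumed default WAL segment size in bytes
MAX_SEND_SIZE = XLOG_SEG_SIZE // 2  # per-round cap as quoted in the message


def next_send_range(sent_upto, write_upto, max_send_size=MAX_SEND_SIZE):
    """Return the (start, end) byte range for one send round, or None if caught up."""
    if sent_upto >= write_upto:
        return None
    # Never send more than max_send_size in a single round.
    return (sent_upto, min(write_upto, sent_upto + max_send_size))


def plan_rounds(sent_upto, write_upto, max_send_size=MAX_SEND_SIZE):
    """List the ranges sent over successive rounds until all WAL is sent."""
    ranges = []
    while True:
        r = next_send_range(sent_upto, write_upto, max_send_size)
        if r is None:
            return ranges
        ranges.append(r)
        # Control would return to the main loop here before the next round.
        sent_upto = r[1]
```

With 20 MB of unsent WAL, `plan_rounds(0, 20 * 1024 * 1024)` yields three rounds of 8 MB, 8 MB and 4 MB, so the main loop regains control between rounds instead of blocking on one large send.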

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Heikki Linnakangas
On 17/05/10 12:36, Jim Nasby wrote: On May 15, 2010, at 12:05 PM, Heikki Linnakangas wrote: What exactly is the user trying to monitor? If it's how far behind is the standby, the difference between pg_current_xlog_insert_location() in the master and pg_last_xlog_replay_location() in the standby

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Simon Riggs
On Tue, 2010-05-18 at 17:06 -0400, Heikki Linnakangas wrote: On 17/05/10 04:40, Simon Riggs wrote: On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote: Attached patch rearranges the walsender loops slightly to fix the above. XLogSend() now only sends up to MAX_SEND_SIZE bytes (==

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Simon Riggs
On Tue, 2010-05-18 at 17:08 -0400, Heikki Linnakangas wrote: On 17/05/10 12:36, Jim Nasby wrote: On May 15, 2010, at 12:05 PM, Heikki Linnakangas wrote: What exactly is the user trying to monitor? If it's how far behind is the standby, the difference between

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Heikki Linnakangas
On 18/05/10 17:17, Simon Riggs wrote: There's no reason that the buffer size we use for XLogRead() should be the same as the send buffer, if you're worried about that. My point is that pq_putmessage contains internal flushes so at the libpq level you gain nothing by big batches. The read() will

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-18 Thread Simon Riggs
On Tue, 2010-05-18 at 17:25 -0400, Heikki Linnakangas wrote: On 18/05/10 17:17, Simon Riggs wrote: There's no reason that the buffer size we use for XLogRead() should be the same as the send buffer, if you're worried about that. My point is that pq_putmessage contains internal flushes so at

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-17 Thread Simon Riggs
On Mon, 2010-05-17 at 11:51 +0900, Fujii Masao wrote: Is it OK that this keepalive message cannot be used by HS in file-based log-shipping? Even in SR, the startup process cannot use the keepalive until walreceiver has been started up. Yes, I see those things. We already have

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-17 Thread Simon Riggs
On Sun, 2010-05-16 at 16:53 +0100, Simon Riggs wrote: Attached patch rearranges the walsender loops slightly to fix the above. XLogSend() now only sends up to MAX_SEND_SIZE bytes (== XLOG_SEG_SIZE / 2) in one round and returns to the main loop after that even if there's unsent WAL, and

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-17 Thread Jim Nasby
On May 15, 2010, at 12:05 PM, Heikki Linnakangas wrote: What exactly is the user trying to monitor? If it's how far behind is the standby, the difference between pg_current_xlog_insert_location() in the master and pg_last_xlog_replay_location() in the standby seems more robust and well-defined

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Simon Riggs
On Sun, 2010-05-16 at 00:05 +0300, Heikki Linnakangas wrote: Heikki Linnakangas wrote: Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send, it doesn't check it just unconditionally sleeps. At least WALReceiver loops until it has no more to receive. I just can't

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Simon Riggs
On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote: On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote: I will recode using that concept. Startup gets new pointer when it runs out of data to replay. That might or might not include an updated keepalive timestamp, since there's no exact

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Fujii Masao
On Sun, May 16, 2010 at 6:05 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Heikki Linnakangas wrote: Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send, it doesn't check it just unconditionally sleeps. At least WALReceiver loops until it has no

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-16 Thread Fujii Masao
On Mon, May 17, 2010 at 1:11 AM, Simon Riggs si...@2ndquadrant.com wrote: On Sat, 2010-05-15 at 19:50 +0100, Simon Riggs wrote: On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote: I will recode using that concept. Startup gets new pointer when it runs out of data to replay. That might or

[HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
Patch adds a keepalive message to ensure max_standby_delay is useful. No WAL format changes, no libpq changes. Just an additional message type for the streaming replication protocol, sent once per main loop in WALsender. Plus docs. Comments? -- Simon Riggs www.2ndQuadrant.com diff

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes: Patch adds a keepalive message to ensure max_standby_delay is useful. The proposed placement of the keepalive-send is about the worst it could possibly be. It needs to be done right before pq_flush to ensure minimum transfer delay. Otherwise any

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote: Simon Riggs si...@2ndquadrant.com writes: Patch adds a keepalive message to ensure max_standby_delay is useful. The proposed placement of the keepalive-send is about the worst it could possibly be. It needs to be done right before pq_flush

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote: On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote: I'm also extremely dubious that it's a good idea to set recoveryLastXTime from this. Using both that and the timestamps from the wal log flies in the face of everything I remember about control theory. We should be doing

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: On Sat, 2010-05-15 at 11:45 -0400, Tom Lane wrote: I'm also extremely dubious that it's a good idea to set recoveryLastXTime from this. Using both that and the timestamps from the wal log flies in the face of

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote: On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote: Doesn't feel right to me either. If you want to expose the keepalive-time to queries, it should be a separate field, something like lastMasterKeepaliveTime and a pg_last_master_keepalive() function to read it.

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 20:05 +0300, Heikki Linnakangas wrote: Simon Riggs wrote: On Sat, 2010-05-15 at 19:30 +0300, Heikki Linnakangas wrote: Doesn't feel right to me either. If you want to expose the keepalive-time to queries, it should be a separate field, something like

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Simon Riggs
On Sat, 2010-05-15 at 18:24 +0100, Simon Riggs wrote: I will recode using that concept. There's some behaviours that aren't helpful here: Startup gets new pointer when it runs out of data to replay. That might or might not include an updated keepalive timestamp, since there's no exact

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send, it doesn't check it just unconditionally sleeps. At least WALReceiver loops until it has no more to receive. I just can't imagine why that's useful behaviour. Good catch. That should be fixed. I also note that

Re: [HACKERS] Keepalive for max_standby_delay

2010-05-15 Thread Heikki Linnakangas
Heikki Linnakangas wrote: Simon Riggs wrote: WALSender sleeps even when it might have more WAL to send, it doesn't check it just unconditionally sleeps. At least WALReceiver loops until it has no more to receive. I just can't imagine why that's useful behaviour. Good catch. That should be