Re: [HACKERS] max_standby_delay considered harmful

2010-05-14 Thread Robert Haas
On Thu, May 13, 2010 at 1:12 PM, Josh Berkus j...@agliodbs.com wrote:
 On 5/12/10 8:07 PM, Robert Haas wrote:
 I think that would be a good thing to check (it'll confirm whether
 this is the same bug), but I'm not convinced we should actually fix it
 that way.  Prior to 8.4, we handled a smart shutdown during recovery
 at the conclusion of recovery, just prior to entering normal running.
 I'm wondering if we shouldn't revert to that behavior in both 8.4 and
 HEAD.

 This would be OK as long as we document it well.  We patched the
 shutdown the way we did specifically because Fujii thought it would be
 an easy fix; if it's complicated, we should revert it and document the
 issue for DBAs.

I don't understand this comment.

 Oh, and to confirm: the same issue exists, and has always existed, with
 Warm Standby.

That's what I was thinking, but I hadn't gotten around to testing it.
Thanks for the confirmation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_standby_delay considered harmful

2010-05-14 Thread Josh Berkus

 This would be OK as long as we document it well.  We patched the
 shutdown the way we did specifically because Fujii thought it would be
 an easy fix; if it's complicated, we should revert it and document the
 issue for DBAs.
 
 I don't understand this comment.

In other words, I'm saying that it's not critical that we troubleshoot
this for 9.0.  Reverting Fujii's patch, if it's not working, is an option.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-14 Thread Robert Haas
On Fri, May 14, 2010 at 5:51 PM, Josh Berkus j...@agliodbs.com wrote:

 This would be OK as long as we document it well.  We patched the
 shutdown the way we did specifically because Fujii thought it would be
 an easy fix; if it's complicated, we should revert it and document the
 issue for DBAs.

 I don't understand this comment.

 In other words, I'm saying that it's not critical that we troubleshoot
 this for 9.0.  Reverting Fujii's patch, if it's not working, is an option.

There is no patch which we could revert to fix this, by Fujii Masao or
anyone else.  The patch he proposed has not been committed.  I am
still studying the problem to try to figure out where to go with it.
We could decide to punt the whole thing for 9.1, but I'd like to
understand what the options are before we make that decision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-13 Thread Josh Berkus
On 5/12/10 8:07 PM, Robert Haas wrote:
 I think that would be a good thing to check (it'll confirm whether
 this is the same bug), but I'm not convinced we should actually fix it
 that way.  Prior to 8.4, we handled a smart shutdown during recovery
 at the conclusion of recovery, just prior to entering normal running.
 I'm wondering if we shouldn't revert to that behavior in both 8.4 and
 HEAD.

This would be OK as long as we document it well.  We patched the
shutdown the way we did specifically because Fujii thought it would be
an easy fix; if it's complicated, we should revert it and document the
issue for DBAs.

Oh, and to confirm: the same issue exists, and has always existed, with
Warm Standby.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Tue, 2010-05-11 at 14:01 +0900, Fujii Masao wrote:
 On Mon, May 10, 2010 at 3:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I already explained that killing the startup process first is a bad idea
  for many reasons when shutdown was discussed. Can't remember who added
  the new standby shutdown code recently, but it sounds like their design
  was pretty poor if it didn't include shutting down properly with HS. I
  hope they fix the bug they have introduced. HS was never designed to
  work that way, so there is no flaw there; it certainly worked when
  committed.
 
 New smart shutdown during recovery doesn't kill the startup process until
 all of the read only backends have gone away. So it works fine with HS.

Yes, I thought some more about what Robert said. HS works identically to
normal running in this regard, so there's no hint of a bug or design
flaw on that for either of us to worry about.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 2:50 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, 2010-05-11 at 14:01 +0900, Fujii Masao wrote:
 On Mon, May 10, 2010 at 3:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
  I already explained that killing the startup process first is a bad idea
  for many reasons when shutdown was discussed. Can't remember who added
  the new standby shutdown code recently, but it sounds like their design
  was pretty poor if it didn't include shutting down properly with HS. I
  hope they fix the bug they have introduced. HS was never designed to
  work that way, so there is no flaw there; it certainly worked when
  committed.

 New smart shutdown during recovery doesn't kill the startup process until
 all of the read only backends have gone away. So it works fine with HS.

 Yes, I thought some more about what Robert said. HS works identically to
 normal running in this regard, so there's no hint of a bug or design
 flaw on that for either of us to worry about.

I'm not sure what to make of this.  Sometimes not shutting down
doesn't sound like a feature to me.

http://archives.postgresql.org/pgsql-hackers/2010-05/msg00098.php
http://archives.postgresql.org/pgsql-hackers/2010-05/msg00103.php

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:

 I'm not sure what to make of this.  Sometimes not shutting down
 doesn't sound like a feature to me.

It acts exactly the same in recovery as in normal running. It is not a
special feature of recovery at all, bug or otherwise.

You may think it's a strange feature generally and I would agree. I would
welcome you changing that in 9.1+, as long as your change works in both
recovery and normal running.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Greg Stark
On Wed, May 12, 2010 at 12:26 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:

 I'm not sure what to make of this.  Sometimes not shutting down
 doesn't sound like a feature to me.

 It acts exactly the same in recovery as in normal running. It is not a
 special feature of recovery at all, bug or otherwise.

I admit I've sometimes been surprised that smart shutdown was waiting
when I didn't expect it to.

It would be good to give the shutdown more feedback. If it explicitly
shows "Waiting for n sessions with active transactions to commit" or
"Waiting for n sessions to disconnect" then the user would at least
understand why it was waiting and what would be necessary to get it to
continue.


-- 
greg



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:

 I'm not sure what to make of this.  Sometimes not shutting down
 doesn't sound like a feature to me.

 It acts exactly the same in recovery as in normal running. It is not a
 special feature of recovery at all, bug or otherwise.

Simon, that doesn't make any sense.  We are talking about a backend
getting stuck forever on an exclusive lock that is held by the startup
process and which will never be released (for example, because the
master has shut down and no more WAL can be obtained for replay).  The
startup process does not hold locks in normal operation.

There are other things we might want to change about the shutdown
behavior (for example, switching from smart to fast automatically
after N seconds) which could apply to both the primary and the standby
and which might also be workarounds for this problem, but this
particular issue is specific to Hot Standby mode and pretending
otherwise is just sticking your head in the sand.
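
The smart-to-fast escalation mentioned above can be approximated from outside the server. What follows is a minimal sketch of an operator-side workaround, not the in-server change under discussion. The function name stop_smart_then_fast and the PG_CTL and SMART_TIMEOUT variables are made up for illustration; the only real interface used is pg_ctl with its -D, -m (shutdown mode) and -t (seconds to wait) options.

```shell
# Hypothetical helper: try a smart shutdown, fall back to fast after a timeout.
# PG_CTL and SMART_TIMEOUT are illustrative names, not PostgreSQL settings.
PG_CTL=${PG_CTL:-pg_ctl}
SMART_TIMEOUT=${SMART_TIMEOUT:-60}

stop_smart_then_fast() {
    datadir=$1
    # pg_ctl stop waits for the shutdown by default and exits non-zero
    # if the server is still running after -t seconds.
    if "$PG_CTL" stop -D "$datadir" -m smart -t "$SMART_TIMEOUT"; then
        echo "smart shutdown completed"
        return 0
    fi
    echo "smart shutdown timed out after ${SMART_TIMEOUT}s, escalating to fast"
    "$PG_CTL" stop -D "$datadir" -m fast
}

# Usage (example path): stop_smart_then_fast /var/lib/postgresql/9.0b1/main
```

This only works around a stuck smart shutdown when a fast shutdown can still complete; it does not address why the smart shutdown stalls.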

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
 On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
 
  I'm not sure what to make of this.  Sometimes not shutting down
  doesn't sound like a feature to me.
 
  It acts exactly the same in recovery as in normal running. It is not a
  special feature of recovery at all, bug or otherwise.
 
 Simon, that doesn't make any sense.  We are talking about a backend
 getting stuck forever on an exclusive lock that is held by the startup
 process and which will never be released (for example, because the
 master has shut down and no more WAL can be obtained for replay).  The
 startup process does not hold locks in normal operation.

When I test it, startup process holding a lock does not prevent shutdown
of a standby. 

I'd be happy to see your test case showing a bug exists and that the
behaviour differs from normal running.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Stefan Kaltenbrunner

Simon Riggs wrote:
 On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
  On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
   On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:

   I'm not sure what to make of this.  Sometimes not shutting down
   doesn't sound like a feature to me.

   It acts exactly the same in recovery as in normal running. It is not a
   special feature of recovery at all, bug or otherwise.

  Simon, that doesn't make any sense.  We are talking about a backend
  getting stuck forever on an exclusive lock that is held by the startup
  process and which will never be released (for example, because the
  master has shut down and no more WAL can be obtained for replay).  The
  startup process does not hold locks in normal operation.

 When I test it, startup process holding a lock does not prevent shutdown
 of a standby.

 I'd be happy to see your test case showing a bug exists and that the
 behaviour differs from normal running.


In my testing the postmaster simply does not shut down even with no
clients connected any more once in a while - most of the time it works
just fine but in like 1 out of 10 cases it gets stuck - my testcase (as
detailed in the related thread) is simply doing an interval load on the
master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat
as needed) and pgbench -S && pg_ctl restart && pgbench -S in a loop on
the standby. Once in a while the standby will simply not shut down
(forever - not only by exceeding the default timeout of pg_ctl, which seems
to get triggered much more often on the standby than on the master -
have not looked into that yet in detail)
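
The standby half of the test case described above (pgbench -S, then pg_ctl restart, repeated) could be scripted roughly as follows. This is a sketch under assumptions, not the script actually used: the function name restart_loop and the PG_CTL and PGBENCH variables are invented for illustration, the pgbench tables are assumed to have been replicated from the master already, and connection settings are assumed to come from the usual PG* environment variables.

```shell
# Hypothetical reproduction loop for the standby; names are illustrative.
PG_CTL=${PG_CTL:-pg_ctl}
PGBENCH=${PGBENCH:-pgbench}

restart_loop() {
    datadir=$1
    iterations=$2
    i=1
    while [ "$i" -le "$iterations" ]; do
        # Read-only load before each restart. Failures are ignored: the
        # point of the loop is the restart, not the benchmark result.
        "$PGBENCH" -S || true
        # pg_ctl restart exits non-zero when the old server does not
        # shut down within its timeout.
        if ! "$PG_CTL" restart -D "$datadir"; then
            echo "restart failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "completed $iterations restarts without a hang"
}

# Usage (example path): restart_loop /var/lib/postgresql/9.0b1/main 100
```

Stopping at the first failed restart leaves the stuck server in place, so a backtrace can still be taken from it.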



Stefan



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 16:03 +0200, Stefan Kaltenbrunner wrote:
 Simon Riggs wrote:
  On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
  On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
  On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
 
  I'm not sure what to make of this.  Sometimes not shutting down
  doesn't sound like a feature to me.
  It acts exactly the same in recovery as in normal running. It is not a
  special feature of recovery at all, bug or otherwise.
  Simon, that doesn't make any sense.  We are talking about a backend
  getting stuck forever on an exclusive lock that is held by the startup
  process and which will never be released (for example, because the
  master has shut down and no more WAL can be obtained for replay).  The
  startup process does not hold locks in normal operation.
  
  When I test it, startup process holding a lock does not prevent shutdown
  of a standby. 
  
  I'd be happy to see your test case showing a bug exists and that the
  behaviour differs from normal running.
 
 In my testing the postmaster simply does not shut down even with no
 clients connected any more once in a while - most of the time it works
 just fine but in like 1 out of 10 cases it gets stuck - my testcase (as
 detailed in the related thread) is simply doing an interval load on the
 master (pgbench -T 120 && sleep 30 && pgbench -T 120 - rinse and repeat
 as needed) and pgbench -S && pg_ctl restart && pgbench -S in a loop on
 the standby. Once in a while the standby will simply not shut down
 (forever - not only by exceeding the default timeout of pg_ctl, which seems
 to get triggered much more often on the standby than on the master -
 have not looked into that yet in detail)

If you could recreate that on a server in debug mode we can see what's
happening. If you can attach to the server and get a back trace that
would help. I've not seen that behaviour at all during testing and if
the issue is sporadic it's not likely to help much if I try to recreate
it myself.

This could be an issue with SR, or an issue with the shutdown code
itself.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
 On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
  On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
   On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
  
   I'm not sure what to make of this.  Sometimes not shutting down
   doesn't sound like a feature to me.
  
   It acts exactly the same in recovery as in normal running. It is not a
   special feature of recovery at all, bug or otherwise.
  
  Simon, that doesn't make any sense.  We are talking about a backend
  getting stuck forever on an exclusive lock that is held by the startup
  process and which will never be released (for example, because the
  master has shut down and no more WAL can be obtained for replay).  The
  startup process does not hold locks in normal operation.
 
 When I test it, startup process holding a lock does not prevent shutdown
 of a standby. 
 
 I'd be happy to see your test case showing a bug exists and that the
 behaviour differs from normal running.

Let me put this differently: I accept that Stefan has reported a
problem. Neither Tom nor myself can reproduce the problem. I've re-run
Stefan's test case and restarted the server more than 400 times now
without any issue.

I re-read your post where you gave what you yourself called uninformed
speculation. There's no real polite way to say it, but yes your
speculation does appear to be uninformed, since it is incorrect. Reasons
would be not least that Stefan's tests don't actually send any locks to
the standby anyway (!), but even if they did your speculation as to the
cause is still all wrong, as explained.

There is no evidence to link this behaviour with HS, as yet, and you
should be considering the possibility the problem lies elsewhere,
especially since it could be code you committed that is at fault.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 11:28 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
 On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
  On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
   On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:
  
   I'm not sure what to make of this.  Sometimes not shutting down
   doesn't sound like a feature to me.
  
   It acts exactly the same in recovery as in normal running. It is not a
   special feature of recovery at all, bug or otherwise.
 
  Simon, that doesn't make any sense.  We are talking about a backend
  getting stuck forever on an exclusive lock that is held by the startup
  process and which will never be released (for example, because the
  master has shut down and no more WAL can be obtained for replay).  The
  startup process does not hold locks in normal operation.

 When I test it, startup process holding a lock does not prevent shutdown
 of a standby.

 I'd be happy to see your test case showing a bug exists and that the
 behaviour differs from normal running.

 Let me put this differently: I accept that Stefan has reported a
 problem. Neither Tom nor myself can reproduce the problem. I've re-run
 Stefan's test case and restarted the server more than 400 times now
 without any issue.

OK, I'm glad to hear you've been testing this.  I wasn't aware of that.

 I re-read your post where you gave what you yourself called uninformed
 speculation. There's no real polite way to say it, but yes your
 speculation does appear to be uninformed, since it is incorrect. Reasons
 would be not least that Stefan's tests don't actually send any locks to
 the standby anyway (!),

Hmm.  Well, assuming you're correct, that does seem to be a, uh,
slight problem with my theory.

 but even if they did your speculation as to the
 cause is still all wrong, as explained.

You lost me.  I don't understand why the problem that I'm referring to
couldn't happen, even if it's not what's happening here.

 There is no evidence to link this behaviour with HS, as yet, and you
 should be considering the possibility the problem lies elsewhere,
 especially since it could be code you committed that is at fault.

Huh?? The evidence that this bug is linked with HS is that it occurs
on a server running in HS mode, and not otherwise.  As for whether the
bug is code I committed, that's certainly possible, but keep in mind
it didn't work at all before IN HOT STANDBY MODE - and that will be
code you committed.

I'm going to go test this and see if I can figure out what's going on.
I hope you will keep at it also - as you point out, your knowledge of
this code far exceeds mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote:

 Huh?? The evidence that this bug is linked with HS is that it occurs
 on a server running in HS mode, and not otherwise.  As for whether the
 bug is code I committed, that's certainly possible, but keep in mind
 it didn't work at all before IN HOT STANDBY MODE - and that will be
 code you committed.

I'll say it now, so it's plain. I'm not going to investigate every bug
that occurs on Postgres, just because someone was in HS when they found
it. Any more than all bugs on Postgres in normal running are MVCC bugs.
There needs to be reasonable evidence or a conjecture by someone that
knows something about the code. If HS were the only thing changed in
recovery in this release, that might not seem reasonable, but since we
have much new code and I am not the only developer, it is.

Normal shutdown didn't work on a standby before HS was committed and it
didn't work afterwards either. Use all the capitals you like but if you
use poor arguments and combine that with no evidence then we'll not get
very far, either in working together or in solving the actual bugs.
Please don't continue to make wild speculations about things related to
HS and recovery, so that issues do not become confused; there is no need
to comment on every thread.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Greg Stark
On Wed, May 12, 2010 at 5:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote:

 Huh?? The evidence that this bug is linked with HS is that it occurs
 on a server running in HS mode, and not otherwise.  As for whether the
 bug is code I committed, that's certainly possible, but keep in mind
 it didn't work at all before IN HOT STANDBY MODE - and that will be
 code you committed.

 I'll say it now, so it's plain. I'm not going to investigate every bug
 that occurs on Postgres, just because someone was in HS when they found
 it.

Fair enough, though your help debugging is always appreciated
regardless of whether a problem is HS related or not. Nobody's
obligated to work on anything in Postgres after all.

I'm not sure who to blame for the shouting match over whose commit
introduced the bug -- it doesn't seem like a relevant or useful thing
to argue about, please both stop.

  there is no need
 to comment on every thread.

This is out of line.

-- 
greg



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 18:05 +0100, Greg Stark wrote:

 I'm not sure who to blame for the shouting match over whose commit
 introduced the bug -- it doesn't seem like a relevant or useful thing
 to argue about, please both stop.

I haven't blamed Robert's code, merely asked him to consider that it is
something other than HS, since we have no evidence either way at present
because the issue is sporadic and has not been replicated as yet, with
no specific detail leading to any section of code.

   there is no need
  to comment on every thread.
 
 This is out of line.

Quoted out of context, it is. My full comment is "Please don't continue
to make wild speculations about things related to HS and recovery, so
that issues do not become confused; there is no need to comment on every
thread." ... by which I mean threads related to HS and recovery. I
respect everybody's right to free speech here, but I would say the same
to anyone if they do it repeatedly. I'm not the first to make such a
comment on hackers either.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Joshua D. Drake
On Wed, 2010-05-12 at 17:49 +0100, Simon Riggs wrote:
 On Wed, 2010-05-12 at 12:04 -0400, Robert Haas wrote:

 Normal shutdown didn't work on a standby before HS was committed and it
 didn't work afterwards either. Use all the capitals you like but if you
 use poor arguments and combine that with no evidence then we'll not get
 very far, either in working together or in solving the actual bugs.
 Please don't continue to make wild speculations about things related to
 HS and recovery, so that issues do not become confused; there is no need
 to comment on every thread.
 

Simon,

People are very passionate about this feature. This feature has the
ability to show us as moving forward in a fashion that will allow us to
directly compete with the big boys in the big installs, although we
are still probably 2-3 releases from that.

It also has the ability to make us look like a bunch of yahoos (no pun
intended) who are better served beating up on that database that Oracle
just bought, versus Oracle itself.

Patience is a virtue for all when it comes to this feature.

Joshua D. Drake


-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering





Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 1:21 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 18:05 +0100, Greg Stark wrote:

 I'm not sure who to blame for the shouting match over whose commit
 introduced the bug -- it doesn't seem like a relevant or useful thing
 to argue about, please both stop.

 I haven't blamed Robert's code, merely asked him to consider that it is
 something other than HS, since we have no evidence either way at present
 because the issue is sporadic and has not been replicated as yet, with
 no specific detail leading to any section of code.

I'm not really sure what we're arguing about here.  I feel like I'm
being accused either of (a) introducing the bug (which is possible) or
(b) saying that Simon introduced the bug (which presumably is also
possible, although it's not really my point).  I ventured an
uninformed guess at what the problem might be; Simon thinks my guess
is wrong, and it may well be: but either way there's a bug buried in
here somewhere and it would be nice to fix it.  I thought that it
would be a good idea for Simon to look at it because, on the surface,
it APPEARS to have something to do with Hot Standby, since that's what
Stefan was testing when he found it.  Sure, the investigation might
lead somewhere else; I completely admit that.

Now, Simon just said he HAS looked at it and can't reproduce the
problem.  So now I'm even less sure what we're arguing about.  I'm
glad he looked at it.  It's interesting that he wasn't able to
reproduce the problem.  I hope that he or someone else will find
something that helps us move forward.  I am having difficulty
reproducing Stefan's test environment and perhaps for that reason I
can't reproduce it either, though I've encountered several other
problems about which, I suppose, I will post separate emails.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Stefan Kaltenbrunner

On 05/12/2010 05:28 PM, Simon Riggs wrote:
 On Wed, 2010-05-12 at 14:18 +0100, Simon Riggs wrote:
  On Wed, 2010-05-12 at 08:52 -0400, Robert Haas wrote:
   On Wed, May 12, 2010 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:
    On Wed, 2010-05-12 at 07:10 -0400, Robert Haas wrote:

    I'm not sure what to make of this.  Sometimes not shutting down
    doesn't sound like a feature to me.

    It acts exactly the same in recovery as in normal running. It is not a
    special feature of recovery at all, bug or otherwise.

   Simon, that doesn't make any sense.  We are talking about a backend
   getting stuck forever on an exclusive lock that is held by the startup
   process and which will never be released (for example, because the
   master has shut down and no more WAL can be obtained for replay).  The
   startup process does not hold locks in normal operation.

  When I test it, startup process holding a lock does not prevent shutdown
  of a standby.

  I'd be happy to see your test case showing a bug exists and that the
  behaviour differs from normal running.

 Let me put this differently: I accept that Stefan has reported a
 problem. Neither Tom nor myself can reproduce the problem. I've re-run
 Stefan's test case and restarted the server more than 400 times now
 without any issue.

 I re-read your post where you gave what you yourself called uninformed
 speculation. There's no real polite way to say it, but yes your
 speculation does appear to be uninformed, since it is incorrect. Reasons
 would be not least that Stefan's tests don't actually send any locks to
 the standby anyway (!), but even if they did your speculation as to the
 cause is still all wrong, as explained.

 There is no evidence to link this behaviour with HS, as yet, and you
 should be considering the possibility the problem lies elsewhere,
 especially since it could be code you committed that is at fault.


Well I'm not sure why people seem to have that hard a time reproducing 
that issue - it seems that I can provoke it really trivially (in this 
case no loops, no pgbench, no tricks). A few minutes ago I logged into 
my test standby (which is idle except for the odd connect to template1 
caused by nagios - the master is idle as well and has been for days):


postg...@soldata005:~$ psql
psql (9.0beta1)
Type "help" for help.

postgres=# select 1;
 ?column?
----------
        1
(1 row)

postgres=# \q
postg...@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart
waiting for server to shut down done
server stopped
server starting
postg...@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart
waiting for server to shut down done
server stopped
server starting
postg...@soldata005:~$ pg_ctl -D /var/lib/postgresql/9.0b1/main/ restart
waiting for server to shut down... failed

pg_ctl: server does not shut down


the server log for that is as follows:

2010-05-12 20:36:18.166 CEST,,, LOG:  received smart shutdown request
2010-05-12 20:36:18.167 CEST,,, FATAL:  terminating walreceiver process due to administrator command
2010-05-12 20:36:18.174 CEST,,, LOG:  shutting down
2010-05-12 20:36:18.251 CEST,,, LOG:  database system is shut down
2010-05-12 20:36:19.706 CEST,,, LOG:  database system was interrupted while in recovery at log time 2010-05-06 17:36:05 CEST
2010-05-12 20:36:19.706 CEST,,, HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2010-05-12 20:36:19.706 CEST,,, LOG:  entering standby mode
2010-05-12 20:36:19.721 CEST,,, LOG:  consistent recovery state reached at 1/1278
2010-05-12 20:36:19.721 CEST,,, LOG:  invalid record length at 1/1278
2010-05-12 20:36:19.723 CEST,,, LOG:  database system is ready to accept read only connections
2010-05-12 20:36:19.737 CEST,,, LOG:  streaming replication successfully connected to primary
2010-05-12 20:36:19.918 CEST,,, LOG:  received smart shutdown request
2010-05-12 20:36:19.919 CEST,,, FATAL:  terminating walreceiver process due to administrator command
2010-05-12 20:36:19.922 CEST,,, LOG:  shutting down
2010-05-12 20:36:19.937 CEST,,, LOG:  database system is shut down
2010-05-12 20:36:21.433 CEST,,, LOG:  database system was interrupted while in recovery at log time 2010-05-06 17:36:05 CEST
2010-05-12 20:36:21.433 CEST,,, HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2010-05-12 20:36:21.433 CEST,,, LOG:  entering standby mode
2010-05-12 20:36:21.482 CEST,,, LOG:  received smart shutdown request
2010-05-12 20:36:21.504 CEST,,, LOG:  consistent recovery state reached at 1/1278
2010-05-12 20:36:21.504 CEST,,, LOG:  invalid record length at 1/1278
2010-05-12 20:36:21.505 CEST,,, LOG:  database system is ready to accept read only connections
2010-05-12 20:36:21.516 CEST,,, LOG:  streaming replication successfully connected to primary


so it restarted two times 

Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 21:10 +0200, Stefan Kaltenbrunner wrote:

  There is no evidence to link this behaviour with HS, as yet, and you
  should be considering the possibility the problem lies elsewhere,
  especially since it could be code you committed that is at fault.
 
 Well I'm not sure why people seem to have that hard a time reproducing 
 that issue - it seems that I can provoke it really trivially(in this 
 case no loops, no pgbench, no tricks). A few minutes ago I logged into 
 my test standby (which is idle except for the odd connect to template1 
 caused by nagios - the master is idle as well and has been for days):

Thanks, good report.

 so it restarted two times successfully - however if one looks at the 
 third time one can see that it received the smart shutdown request 
 BEFORE it reached a consistent recovery state - yet it continued to 
 enable HS and reenabled SR as well.
 
 The database is now sitting there doing nothing and it more or less 
 broken because you cannot connect to it in the current state:
 
 ~$ psql
 psql: FATAL:  the database system is shutting down
 
 the startup process has the following backtrace:
 
 (gdb) bt
 #0  0x7fbe24cb2c83 in select () from /lib/libc.so.6
 #1  0x006e811a in pg_usleep ()
 #2  0x0048c333 in XLogPageRead ()
 #3  0x0048c967 in ReadRecord ()
 #4  0x00493ab6 in StartupXLOG ()
 #5  0x00495a88 in StartupProcessMain ()
 #6  0x004ab25e in AuxiliaryProcessMain ()
 #7  0x005d4a7d in StartChildProcess ()
 #8  0x005d70c2 in PostmasterMain ()
 #9  0x0057d898 in main ()

Well, it's waiting for new info from the primary. Nothing to do with locking,
but that's not an indication that it's an SR issue either. ;-)

I'll put some waits into that part of the code and see if I can induce
the failure. Maybe it's just a simple lack of a CHECK_FOR_INTERRUPTS().

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Alvaro Herrera
Excerpts from Stefan Kaltenbrunner's message of Wed May 12 15:10:28 -0400 2010:

 the startup process has the following backtrace:
 
 (gdb) bt
 #0  0x7fbe24cb2c83 in select () from /lib/libc.so.6
 #1  0x006e811a in pg_usleep ()
 #2  0x0048c333 in XLogPageRead ()
 #3  0x0048c967 in ReadRecord ()
 #4  0x00493ab6 in StartupXLOG ()
 #5  0x00495a88 in StartupProcessMain ()
 #6  0x004ab25e in AuxiliaryProcessMain ()
 #7  0x005d4a7d in StartChildProcess ()
 #8  0x005d70c2 in PostmasterMain ()
 #9  0x0057d898 in main ()

I just noticed that we have some code assigning the return value of
time() to a pg_time_t variable.  Is this supposed to work reliably?
(xlog.c lines 9267ff)


Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 15:36 -0400, Alvaro Herrera wrote:
 I just noticed that we have some code assigning the return value of
 time() to a pg_time_t variable.  Is this supposed to work reliably?
 (xlog.c lines 9267ff)

Code's used that for a while now. Checkpoints and everywhere.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 3:36 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Excerpts from Stefan Kaltenbrunner's message of Wed May 12 15:10:28 -0400 2010:

 the startup process has the following backtrace:

 (gdb) bt
 #0  0x7fbe24cb2c83 in select () from /lib/libc.so.6
 #1  0x006e811a in pg_usleep ()
 #2  0x0048c333 in XLogPageRead ()
 #3  0x0048c967 in ReadRecord ()
 #4  0x00493ab6 in StartupXLOG ()
 #5  0x00495a88 in StartupProcessMain ()
 #6  0x004ab25e in AuxiliaryProcessMain ()
 #7  0x005d4a7d in StartChildProcess ()
 #8  0x005d70c2 in PostmasterMain ()
 #9  0x0057d898 in main ()

 I just noticed that we have some code assigning the return value of
 time() to a pg_time_t variable.  Is this supposed to work reliably?
 (xlog.c lines 9267ff)

I'

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 3:51 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 12, 2010 at 3:36 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Excerpts from Stefan Kaltenbrunner's message of Wed May 12 15:10:28 -0400 2010:

 the startup process has the following backtrace:

 (gdb) bt
 #0  0x7fbe24cb2c83 in select () from /lib/libc.so.6
 #1  0x006e811a in pg_usleep ()
 #2  0x0048c333 in XLogPageRead ()
 #3  0x0048c967 in ReadRecord ()
 #4  0x00493ab6 in StartupXLOG ()
 #5  0x00495a88 in StartupProcessMain ()
 #6  0x004ab25e in AuxiliaryProcessMain ()
 #7  0x005d4a7d in StartChildProcess ()
 #8  0x005d70c2 in PostmasterMain ()
 #9  0x0057d898 in main ()

 I just noticed that we have some code assigning the return value of
 time() to a pg_time_t variable.  Is this supposed to work reliably?
 (xlog.c lines 9267ff)

 I'

I have a love-hate relationship with GMail, sorry.

I am wondering if we are not correctly handling the case where we get
a shutdown request while we are still in the PM_STARTUP state.  It
looks like we might go ahead and switch to PM_RECOVERY and then
PM_RECOVERY_CONSISTENT without noticing the shutdown.  There is some
logic to handle the shutdown when the startup process exits, but if
the startup process never exits it looks like we might get stuck.
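
To make the suspected sequence concrete, here is a toy model of it. This
is only a sketch and not the actual postmaster code: the state names
PM_STARTUP, PM_RECOVERY and PM_RECOVERY_CONSISTENT are the ones mentioned
above, while PM_SHUTDOWN, the ToyPostmaster class, its methods, and the
check_on_transition flag are all hypothetical names made up for the
illustration.

```python
from enum import Enum, auto


class PMState(Enum):
    PM_STARTUP = auto()
    PM_RECOVERY = auto()
    PM_RECOVERY_CONSISTENT = auto()
    PM_SHUTDOWN = auto()  # hypothetical terminal state for this sketch


class ToyPostmaster:
    """Hypothetical model of the suspected gap; not PostgreSQL's real logic."""

    def __init__(self, check_on_transition):
        self.state = PMState.PM_STARTUP
        self.shutdown_requested = False
        # When False, a pending shutdown is acted on only when the
        # startup process exits (the behaviour suspected above).
        self.check_on_transition = check_on_transition

    def request_smart_shutdown(self):
        self.shutdown_requested = True

    def advance(self, new_state):
        # Suspected gap: without the check, the state change goes ahead
        # even though a shutdown request is already pending.
        if self.check_on_transition and self.shutdown_requested:
            self.state = PMState.PM_SHUTDOWN
        else:
            self.state = new_state

    def startup_process_exited(self):
        if self.shutdown_requested:
            self.state = PMState.PM_SHUTDOWN


def run(check_on_transition):
    pm = ToyPostmaster(check_on_transition)
    pm.request_smart_shutdown()  # arrives while still in PM_STARTUP
    pm.advance(PMState.PM_RECOVERY)
    pm.advance(PMState.PM_RECOVERY_CONSISTENT)
    # On a standby the startup process keeps waiting for more WAL, so
    # startup_process_exited() is never called in this scenario.
    return pm.state
```

In this model run(False) ends in PM_RECOVERY_CONSISTENT with the
shutdown request still pending, which matches the "stuck" symptom, while
run(True) notices the request at the first transition.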

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Simon Riggs
On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote:

 I thought that it
 would be a good idea for Simon to look at it because, on the surface,
 it APPEARS to have something to do with Hot Standby, since that's what
 Stefan was testing when he found it.

He was also testing SR, yet you haven't breathed a word about that for
some strange reason. It didn't APPEAR like it was HS at all, not from
basic logic or from technical knowledge. So you'll have to forgive me if
I don't leap into action when you say something is an HS problem in the
future.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Josh Berkus
Simon, Robert,

 He was also testing SR, yet you haven't breathed a word about that for
 some strange reason. It didn't APPEAR like it was HS at all, not from
 basic logic or from technical knowledge. So you'll have to forgive me if
 I don't leap into action when you say something is an HS problem in the
 future.

Can we please chill out on this some?  Especially since we now have an
actual reproducible bug?

Simon, it's natural for people to come to you because you are
knowledgeable and responsive.  You should take it as a compliment.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Joshua D. Drake
On Wed, 2010-05-12 at 22:34 +0100, Simon Riggs wrote:
 On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote:
 
  I thought that it
  would be a good idea for Simon to look at it because, on the surface,
  it APPEARS to have something to do with Hot Standby, since that's what
  Stefan was testing when he found it.
 
 He was also testing SR, yet you haven't breathed a word about that for
 some strange reason. It didn't APPEAR like it was HS at all, not from
 basic logic or from technical knowledge. So you'll have to forgive me if
 I don't leap into action when you say something is an HS problem in the
 future.

Simon, with respect -- knock it off. 

Robert gave a very reasonable response. He is just trying to help. Relax
man.

Joshua Drake



-- 
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering





Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 5:34 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-05-12 at 14:43 -0400, Robert Haas wrote:

 I thought that it
 would be a good idea for Simon to look at it because, on the surface,
 it APPEARS to have something to do with Hot Standby, since that's what
 Stefan was testing when he found it.

 He was also testing SR, yet you haven't breathed a word about that for
 some strange reason. It didn't APPEAR like it was HS at all, not from
 basic logic or from technical knowledge. So you'll have to forgive me if
 I don't leap into action when you say something is an HS problem in the
 future.

Well, the original subject line of the report had mentioned SR only,
but I had a specific theory about what might be happening that was
related to the operation of HS.  You've said that you think my guess
is incorrect, and that's very possible, but until we actually find and
fix the bug we're all just guessing.  I wasn't intending to cast
aspersions on your code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Fujii Masao
On Thu, May 13, 2010 at 4:55 AM, Robert Haas robertmh...@gmail.com wrote:
 I am wondering if we are not correctly handling the case where we get
 a shutdown request while we are still in the PM_STARTUP state.  It
 looks like we might go ahead and switch to PM_RECOVERY and then
 PM_RECOVERY_CONSISTENT without noticing the shutdown.  There is some
 logic to handle the shutdown when the startup process exits, but if
 the startup process never exits it looks like we might get stuck.

Right. I reported this problem and submitted the patch before.
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php

Stefan,
Could you check whether the patch fixes the problem you encountered?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] max_standby_delay considered harmful

2010-05-12 Thread Robert Haas
On Wed, May 12, 2010 at 10:46 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, May 13, 2010 at 4:55 AM, Robert Haas robertmh...@gmail.com wrote:
 I am wondering if we are not correctly handling the case where we get
 a shutdown request while we are still in the PM_STARTUP state.  It
 looks like we might go ahead and switch to PM_RECOVERY and then
 PM_RECOVERY_CONSISTENT without noticing the shutdown.  There is some
 logic to handle the shutdown when the startup process exits, but if
 the startup process never exits it looks like we might get stuck.

 Right. I reported this problem and submitted the patch before.
 http://archives.postgresql.org/pgsql-hackers/2010-04/msg00592.php

Sorry we missed that.

 Stefan,
 Could you check whether the patch fixes the problem you encountered?

I think that would be a good thing to check (it'll confirm whether
this is the same bug), but I'm not convinced we should actually fix it
that way.  Prior to 8.4, we handled a smart shutdown during recovery
at the conclusion of recovery, just prior to entering normal running.
I'm wondering if we shouldn't revert to that behavior in both 8.4 and
HEAD.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Simon Riggs
On Sun, 2010-05-09 at 20:56 -0400, Robert Haas wrote:

   Seems like it could take FOREVER on a busy system.  Surely that's not
   OK.  The fact that Hot Standby has to take exclusive locks that can't
   be released until WAL replay has progressed to a certain point seems
   like a fairly serious wart.
 
  If this is a serious wart then it's not one of hot standby, but one of
  postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as
  opposed to UPDATE/DELETE-blocking locks) are never necessary from a
  correctness POV, they're only there for implementation reasons.
 
  Getting rid of them doesn't seem completely insurmountable either - just as
  multiple row versions remove the need to block SELECTs due to concurrent
  UPDATEs, multiple datafile versions could remove the need to block SELECTs
  due to concurrent ALTERs. But people seem to live with them quite well,
  judged from the amount of work put into getting rid of them (zero). I
  therefore fail to see why they should pose a significant problem in HS
  setups.
  The difference is that in HS you have to wait for a moment where *no
  exclusive lock at all* exists, possibly without contending for any of
  them, while on the master you might not even be blocked by the existence
  of any of those locks.
 
  If you have two sessions which in overlapping transactions lock different
  tables exclusively you have no problem shutting the master down, but you will
  never reach a point where no exclusive lock is taken on the slave.
 
 A possible solution to this in the shutdown case is to kill anyone
 waiting on a lock held by the startup process at the same time we kill
 the startup process, and to kill anyone who subsequently waits for
 such a lock as soon as they attempt to take it.  

I already explained that killing the startup process first is a bad idea
for many reasons when shutdown was discussed. Can't remember who added
the new standby shutdown code recently, but it sounds like their design
was pretty poor if it didn't include shutting down properly with HS. I
hope they fix the bug they have introduced. HS was never designed to
work that way, so there is no flaw there; it certainly worked when
committed.

 I'm not sure if this
 would also make sense in the pause case.

Not sure why pausing replay would make any difference at all. Being
between one WAL record and the next is a valid and normal state that
exists many thousands of times per second. If making that state longer
would cause problems we would already have seen any issues. There are
none, it will work fine.

 Another possible solution would be to try to figure out if there's a
 way to delay application of WAL that requires the taking of AELs to
 the point where we could apply it all at once.  That might not be
 feasible, though, or only in some cases, and it's certainly 9.1
 material (at least) in any case.

Locks usually protect users from accessing a table while it's being
clustered or dropped or something like that. Locks are not bad. They are
also used by some developers to specifically serialize access to an
object. AccessExclusiveLocks are rare in normal running and not to be
avoided when they do exist. HS correctly supports locking, as and when
such locks are made on the master.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
 On Sun, May 9, 2010 at 6:58 PM, Andres Freund and...@anarazel.de wrote:
 The difference is that in HS you have to wait for a moment where *no
 exclusive lock at all* exists, possibly without contending for any of
 them, while on the master you might not even be blocked by the existence
 of any of those locks.

 If you have two sessions which in overlapping transactions lock different
 tables exclusively you have no problem shutting the master down, but you will
 never reach a point where no exclusive lock is taken on the slave.

 A possible solution to this in the shutdown case is to kill anyone
 waiting on a lock held by the startup process at the same time we kill
 the startup process, and to kill anyone who subsequently waits for
 such a lock as soon as they attempt to take it.  I'm not sure if this
 would also make sense in the pause case.

Well, wait, I'm getting lost here. It seems to me that no query on the
slave is allowed to take an AEL, no matter what. The only case is a query
waiting for the replay to release its locks.

The only consequence of pause not waiting for any lock to get released
from the replay is that those backends will be, well, paused. But that
applies the same to any backend started after we pause.

Waiting for replay to release all its locks before pausing would mean
that there's a possibility that the activity on the master is such that
you never reach a pause in the WAL stream. Let's assume we want any new
code we throw in at this stage to be a magic wand making every user happy
at once.
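
To illustrate why such a lock-free moment may never come, here is a
small sketch. It is only a model: lock_free_gaps is a made-up helper,
and the (acquire, release) pairs are invented positions in the replay
stream, not anything measured.

```python
def lock_free_gaps(intervals):
    """Return the (start, end) gaps during which no lock is held.

    intervals is a list of (acquire, release) positions, one per
    exclusive lock held by replay. Only gaps between the first acquire
    and the last release are reported.
    """
    gaps = []
    covered_until = None
    for acquire, release in sorted(intervals):
        if covered_until is not None and acquire > covered_until:
            # Nothing was held between covered_until and this acquire.
            gaps.append((covered_until, acquire))
        if covered_until is None or release > covered_until:
            covered_until = release
    return gaps
```

With overlapping locks such as [(0, 10), (5, 15), (12, 20)] the function
finds no gap at all, whereas [(0, 5), (8, 12)] leaves the gap (5, 8)
where a pause that waits for "no locks held" could happen.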

So we'd need a pause function taking either 1 or 2 arguments: the first
says to pause now even if we know the replay is holding some locks
that might pause the reporting queries too; the other says to wait until
the locks are not held anymore, with a timeout (default 1min?).
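
As a sketch of what such an API could look like (purely hypothetical:
pause_replay, its parameters, and the replay_holds_locks callback are
invented here for illustration, and returning False on timeout without
pausing is an assumption, not something decided in this thread):

```python
import time


def pause_replay(replay_holds_locks, immediate=False, timeout=60.0,
                 poll_interval=0.1, clock=time.monotonic, sleep=time.sleep):
    """Hypothetical pause call; returns True if replay would be paused.

    replay_holds_locks: callable returning True while replay still holds
        locks that could block queries on the standby.
    immediate: pause right away, even if such locks are held.
    timeout: seconds to wait for the locks to go away (default 1 minute).
    """
    if immediate:
        return True
    deadline = clock() + timeout
    while replay_holds_locks():
        if clock() >= deadline:
            # Assumption: give up and do not pause when the wait times out.
            return False
        sleep(poll_interval)
    return True
```

Calling pause_replay(check, immediate=True) models the first form, and
pause_replay(check, timeout=60.0) the second.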

Ok, that's designing the API we're missing, and we should not be in the
process of doing any design at this stage. But we are.

 [good summary of current positions]
 I can't presume to extract a consensus from that; I don't think there
 is one.

All we know for sure is that Tom does not want to release as-is, and he
rightfully insists on several objectives as far as the editing is
concerned:
 - no addition of code we might want to throw away later
 - avoid having to deprecate released behavior, it's too hard
 - minimal change set, possibly with no new features.

One more, pausing the replay is *already* in the code base, it's exactly
what happens under the hood if you favor queries rather than replay, to
the point I don't understand why the pause design needs to happen
now. We're only talking about having an *explicit* version of it.

Regards,
-- 
dim

I too am growing tired of insisting this much. I only continue because I
really can't get to understand why-o-why considering a new API over
existing feature is not possible at this stage. I'm hitting my head on
the wal, so to say…



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Heikki Linnakangas
Robert Haas wrote:
 On Sun, May 9, 2010 at 6:58 PM, Andres Freund and...@anarazel.de wrote:
 On Monday 10 May 2010 00:25:44 Florian Pflug wrote:
 On May 9, 2010, at 22:01 , Robert Haas wrote:
 On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine dfonta...@hi-media.com
 wrote:
 Seems like it could take FOREVER on a busy system.  Surely that's not
 OK.  The fact that Hot Standby has to take exclusive locks that can't
 be released until WAL replay has progressed to a certain point seems
 like a fairly serious wart.
 If this is a serious wart then it's not one of hot standby, but one of
 postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as
 opposed to UPDATE/DELETE-blocking locks) are never necessary from a
 correctness POV, they're only there for implementation reasons.

 Getting rid of them doesn't seem completely insurmountable either - just as
 multiple row versions remove the need to block SELECTs due to concurrent
 UPDATEs, multiple datafile versions could remove the need to block SELECTs
 due to concurrent ALTERs. But people seem to live with them quite well,
 judged from the amount of work put into getting rid of them (zero). I
 therefore fail to see why they should pose a significant problem in HS
 setups.
 The difference is that in HS you have to wait for a moment where *no
 exclusive lock at all* exists, possibly without contending for any of
 them, while on the master you might not even be blocked by the existence
 of any of those locks.

 If you have two sessions which in overlapping transactions lock different
 tables exclusively you have no problem shutting the master down, but you will
 never reach a point where no exclusive lock is taken on the slave.
 
 A possible solution to this in the shutdown case is to kill anyone
 waiting on a lock held by the startup process at the same time we kill
 the startup process, and to kill anyone who subsequently waits for
 such a lock as soon as they attempt to take it.

If you're not going to apply any more WAL records before shutdown, you
could also just release all the AccessExclusiveLocks held by the startup
process. Whatever the transaction was doing with the locked relation, if
we're not going to replay any more WAL records before shutdown, we will
not see the transaction committing or doing anything else with the
relation, so we should be safe. Whatever state the data on disk is in,
it must be valid, or we would have a problem with crash recovery
recovering up to this WAL record and then starting up too.

I'm not 100% clear if that reasoning applies to AccessExclusiveLocks
taken explicitly with LOCK TABLE. It's not clear what the application
would use the lock for.

Nevertheless, maybe killing the transactions that wait for the locks
would be more intuitive anyway.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Heikki Linnakangas
Robert Haas wrote:
 On Thu, May 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that I've realized what the real problem is with max_standby_delay
 (namely, that inactivity on the master can use up the delay), I think
 we should do what Tom originally suggested here.  It's not as good as
 a really working max_standby_delay, but we're not going to have that
 for 9.0, and it's clearly better than a boolean.
 I guess I'm not clear on how what Tom proposed is fundamentally
 different from max_standby_delay = -1.  If there's enough concurrent
 queries, recovery would never catch up.
 
 If your workload is that the standby server is getting pounded with
 queries like crazy, then it's probably not that different: it will
 fall progressively further behind.  But I suspect many people will set
 up standby servers where most of the activity happens on the primary,
 but they run some reporting queries on the standby.  If you expect
 your reporting queries to finish in 10s, you could set the max delay
 to say 60s.  In the event that something gets wedged, recovery will
 eventually kill it and move on rather than just getting stuck forever.
  If the volume of queries is known not to be too high, it's reasonable
 to expect that a few good whacks will be enough to get things back on
 track.

Yeah, I could live with that.

A problem with using the name max_standby_delay for Tom's suggestion
is that it sounds like a hard limit, which it isn't. But if we name it
something like:

# -1 = no timeout
# 0 = kill conflicting queries immediately
# > 0 = wait for N seconds, then kill query
standby_conflict_timeout = -1

it's more clear that the setting is a timeout for each *conflict*, and
it's less surprising that the standby can fall indefinitely behind in
the worst case. If we name the setting along those lines, I could live
with that.
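
A rough illustration of the per-conflict semantics, as a sketch only:
standby_conflict_timeout is the proposed setting, not an existing one,
and conflict_action and total_replay_delay are made-up helper names.

```python
def conflict_action(standby_conflict_timeout, waited_seconds):
    """Decide what replay does about one conflicting query.

    -1 means wait forever, 0 means kill at once, N > 0 means kill once
    the query has been in the way for N seconds.
    """
    if standby_conflict_timeout < 0:
        return "wait"
    if waited_seconds >= standby_conflict_timeout:
        return "kill"
    return "wait"


def total_replay_delay(conflict_durations, standby_conflict_timeout):
    """Sum the delay replay accumulates over a series of conflicts.

    Each entry in conflict_durations is how long one conflicting query
    would run if left alone. The timeout caps each conflict separately,
    so the total still grows with the number of conflicts.
    """
    if standby_conflict_timeout < 0:
        return sum(conflict_durations)
    return sum(min(d, standby_conflict_timeout) for d in conflict_durations)
```

With a 60s timeout and conflicts lasting 10s, 120s and 300s, replay is
held up for 10 + 60 + 60 = 130 seconds in total, which is why the
setting bounds each conflict but not the overall lag.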

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Florian Pflug
On May 10, 2010, at 11:43 , Heikki Linnakangas wrote:
 If you're not going to apply any more WAL records before shutdown, you
 could also just release all the AccessExclusiveLocks held by the startup
 process. Whatever the transaction was doing with the locked relation, if
 we're not going to replay any more WAL records before shutdown, we will
 not see the transaction committing or doing anything else with the
 relation, so we should be safe. Whatever state the data on disk is in,
 it must be valid, or we would have a problem with crash recovery
 recovering up to this WAL record and then starting up too.

Sounds plausible. But wouldn't this imply that HS could *always* postpone the
acquisition of an AccessExclusiveLock until right before the corresponding
commit record is replayed? I fail to see a case where this would fail, yet
recovery in case of an intermediate crash would be correct.

best regards,
Florian Pflug




Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Aidan Van Dyk
* Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100510 06:03]:
 
 A problem with using the name max_standby_delay for Tom's suggestion
 is that it sounds like a hard limit, which it isn't. But if we name it
 something like:

I'd still rather have an "if you're killing something, make sure you kill
enough to get all the way current" behaviour, but that's just me.

I want to run my standbys in an "always current" mode... But if I decide
to play with a lagged HR, I really want to make sure there is some
mechanism to cap the lag, and the cap is something I can understand
and use to make a reasonable estimate as to when data I know is live on
the primary will be seen on the standby...

bonus points if it works similarly for archive recovery ;-)

a.


-- 
Aidan Van Dyk Create like a god,
ai...@highrise.ca   command like a king,
http://www.highrise.ca/   work like a slave.




Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Robert Haas
On Mon, May 10, 2010 at 2:27 AM, Simon Riggs si...@2ndquadrant.com wrote:
 I already explained that killing the startup process first is a bad idea
 for many reasons when shutdown was discussed. Can't remember who added
 the new standby shutdown code recently, but it sounds like their design
 was pretty poor if it didn't include shutting down properly with HS. I
 hope they fix the bug they have introduced. HS was never designed to
 work that way, so there is no flaw there; it certainly worked when
 committed.

The patch was written by Fujii Masao and committed, after review, by
me.  Prior to that patch, smart shutdown never worked; now it works,
or so I believe, unless recovery is stalled holding a lock upon which
a regular back-end is blocking.  Clearly that is both better and not
all that good.  If you have any ideas to improve the situation
further, I'm all ears.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Robert Haas
On Mon, May 10, 2010 at 6:03 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Yeah, I could live with that.

 A problem with using the name max_standby_delay for Tom's suggestion
 is that it sounds like a hard limit, which it isn't. But if we name it
 something like:

 # -1 = no timeout
 # 0 = kill conflicting queries immediately
 # > 0 = wait for N seconds, then kill query
 standby_conflict_timeout = -1

 it's more clear that the setting is a timeout for each *conflict*, and
 it's less surprising that the standby can fall indefinitely behind in
 the worst case. If we name the setting along those lines, I could live
 with that.

Yeah, if we do it that way, +1 for changing the name, and your
suggestion seems good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Robert Haas
On Mon, May 10, 2010 at 6:13 AM, Florian Pflug f...@phlo.org wrote:
 On May 10, 2010, at 11:43 , Heikki Linnakangas wrote:
 If you're not going to apply any more WAL records before shutdown, you
 could also just release all the AccessExclusiveLocks held by the startup
 process. Whatever the transaction was doing with the locked relation, if
 we're not going to replay any more WAL records before shutdown, we will
 not see the transaction committing or doing anything else with the
 relation, so we should be safe. Whatever state the data on disk is in,
 it must be valid, or we would have a problem with crash recovery
 recovering up to this WAL record and then starting up too.

 Sounds plausible. But wouldn't this imply that HS could *always* postpone the
 acquisition of an AccessExclusiveLock until right before the corresponding
 commit record is replayed? I fail to see a case where this would fail, yet
 recovery in case of an intermediate crash would be correct.

Yeah, I'd like to understand this, too.  I don't have a clear
understanding of when HS needs to take locks here in the first place.

[removing Josh Berkus's persistently bouncing email from the CC line]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Heikki Linnakangas
Florian Pflug wrote:
 On May 10, 2010, at 11:43 , Heikki Linnakangas wrote:
 If you're not going to apply any more WAL records before shutdown, you
 could also just release all the AccessExclusiveLocks held by the startup
 process. Whatever the transaction was doing with the locked relation, if
 we're not going to replay any more WAL records before shutdown, we will
 not see the transaction committing or doing anything else with the
 relation, so we should be safe. Whatever state the data on disk is in,
 it must be valid, or we would have a problem with crash recovery
 recovering up to this WAL record and then starting up too.
 
 Sounds plausible. But wouldn't this imply that HS could *always* postpone the
 acquisition of an AccessExclusiveLock until right before the corresponding
 commit record is replayed? I fail to see a case where this would fail, yet
 recovery in case of an intermediate crash would be correct.

I guess it could in some situations, but for example the
AccessExclusiveLock taken at the end of lazy vacuum to truncate the
relation must be held during the truncation, or concurrent readers will
get upset.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Andres Freund
On Monday 10 May 2010 14:00:45 Heikki Linnakangas wrote:
 Florian Pflug wrote:
  On May 10, 2010, at 11:43 , Heikki Linnakangas wrote:
  If you're not going to apply any more WAL records before shutdown, you
  could also just release all the AccessExclusiveLocks held by the startup
  process. Whatever the transaction was doing with the locked relation, if
  we're not going to replay any more WAL records before shutdown, we will
  not see the transaction committing or doing anything else with the
  relation, so we should be safe. Whatever state the data on disk is in,
  it must be valid, or we would have a problem with crash recovery
  recovering up to this WAL record and then starting up too.
  
  Sounds plausible. But wouldn't this imply that HS could *always* postpone
  the acquisition of an AccessExclusiveLock until right before the
  corresponding commit record is replayed? I fail to see a case where
  this would fail, yet recovery in case of an intermediate crash would be
  correct.
 
 I guess it could in some situations, but for example the
 AccessExclusiveLock taken at the end of lazy vacuum to truncate the
 relation must be held during the truncation, or concurrent readers will
 get upset.
Actually, all the locks that do not need to be taken on the slave would not
need to be an ACCESS EXCLUSIVE but an EXCLUSIVE on the master, right? That
should be fixed on the master, not hacked up on the slave, and is by far out
of scope for 9.0.
That's an area where I definitely would like to improve pg in the future...

Andres



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Bruce Momjian
Simon Riggs wrote:
 Bruce has used the word crippleware for the current state. Raising a
 problem and then blocking solutions is the best way I know to cripple a
 release. It should be clear that I've done my best to avoid this

FYI, it was Robert Haas who used the term "crippleware" to describe a
boolean value for max_standby_delay, and I was just repeating his term,
and disputing that it would be crippleware.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Bruce Momjian
Robert Haas wrote:
 Wultsch (who doesn't ever want to kill queries and therefore would be
 happy with a boolean), Yeb Havinga (who never wants to stall recovery
 and therefore would also be happy with a boolean), and Florian Pflug
 (who points out that pause/resume is actually a nontrivial feature).
 Apologies if I've left anyone out or misrepresented their position.
 
 Overall I would say opinion is about evenly split between:
 
 - leave it as-is
 - make it a Boolean
 - change it in some way but to something more expressive than a Boolean
 
 I can't presume to extract a consensus from that; I don't think there
 is one.  You could say the majority of people want to change
 something and that would be true; you could also say the majority of
 people don't want a Boolean and that would also be true.

Yep, this is where we are.  Discussion had stopped, so it seemed like
time for a decision, and with no one agreeing on what to do, feature
removal seemed like the best approach.  Suggesting we will fix it later
in beta is not a solution.

Now, if everyone agrees we should do X, and X is simple, let's do X, but
I am still not seeing that.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Mike Rylander
On Mon, May 10, 2010 at 6:03 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 On Thu, May 6, 2010 at 2:47 PM, Josh Berkus j...@agliodbs.com wrote:
 Now that I've realized what the real problem is with max_standby_delay
 (namely, that inactivity on the master can use up the delay), I think
 we should do what Tom originally suggested here.  It's not as good as
 a really working max_standby_delay, but we're not going to have that
 for 9.0, and it's clearly better than a boolean.
 I guess I'm not clear on how what Tom proposed is fundamentally
 different from max_standby_delay = -1.  If there's enough concurrent
 queries, recovery would never catch up.

 If your workload is that the standby server is getting pounded with
 queries like crazy, then it's probably not that different: it will
 fall progressively further behind.  But I suspect many people will set
 up standby servers where most of the activity happens on the primary,
 but they run some reporting queries on the standby.  If you expect
 your reporting queries to finish in 10s, you could set the max delay
 to say 60s.  In the event that something gets wedged, recovery will
 eventually kill it and move on rather than just getting stuck forever.
  If the volume of queries is known not to be too high, it's reasonable
 to expect that a few good whacks will be enough to get things back on
 track.

 Yeah, I could live with that.

 A problem with using the name max_standby_delay for Tom's suggestion
 is that it sounds like a hard limit, which it isn't. But if we name it
 something like:

 # -1 = no timeout
 # 0 = kill conflicting queries immediately
 # >0 = wait for N seconds, then kill query
 standby_conflict_timeout = -1

 it's more clear that the setting is a timeout for each *conflict*, and
 it's less surprising that the standby can fall indefinitely behind in
 the worst case. If we name the setting along those lines, I could live
 with that.

+1 from the peanut gallery.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone:  1-877-OPEN-ILS (673-6457)
 | email:  mi...@esilibrary.com
 | web:  http://www.esilibrary.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Kevin Grittner
Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 
 Overall I would say opinion is about evenly split between:
 
 - leave it as-is
 - make it a Boolean
 - change it in some way but to something more expressive than a
   Boolean
 
I think a boolean would limit the environments in which HS would be
useful.  Personally, I think how far the replica is behind the
source is a more useful metric, even with anomalies on the
transition from idle to active; but a blocking duration would be
much better than no finer control than the boolean.  So my instant
runoff second choice would be for the block duration knob.
 
 time for a decision, and with no one agreeing on what to do,
 feature removal seemed like the best approach.
 
I keep wondering at the assertion that once a GUC is present
(especially a tuning GUC like this) that we're stuck with it.  I
know that's true of SQL code constructs, but postgresql.conf files? 
How about redirect_stderr, max_fsm_*, sort_mem, etc.?  This argument
seems tenuous.
 
 Suggesting we will fix it later in beta is not a solution.
 
I'm with you there, 100%
 
-Kevin



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Stephen Frost
* Aidan Van Dyk (ai...@highrise.ca) wrote:
 * Heikki Linnakangas heikki.linnakan...@enterprisedb.com [100510 06:03]:
  A problem with using the name max_standby_delay for Tom's suggestion
  is that it sounds like a hard limit, which it isn't. But if we name it
  something like:
 
 I'd still rather have an "if you're killing something, make sure you kill
 enough to get all the way current" behaviour, but that's just me

I agree with that comment, and it's more like what max_standby_delay
was.  That's what I had thought Tom was proposing initially,
since it makes a heck of a lot more sense to me than "just keep
waiting, just keep waiting...".

Now, if it's possible to have things queue up behind the recovery
process, such that the recovery process will only wait up to 
timeout * # of locks held when recovery started, that might be alright,
but that's not the impression I've gotten about how this will work.

Of course, I also want to be able to have a Nagios hook that checks how
far behind the slave has gotten, and a way to tell the slave "oook,
you're too far behind, just forcibly catch up right *now*".  If I could
use reload to change max_standby_delay (or whatever) and I can figure
out how long the delay is (even if I have to update a table on the
master and then see what it says on the slave..), I'd be happy.
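
As a rough illustration of the "update a table on the master and then
see what it says on the slave" idea, the check below computes lag from a
heartbeat timestamp and maps it to the usual Nagios plugin exit codes
(0 = OK, 1 = WARNING, 2 = CRITICAL). This is only a sketch: the function
name and thresholds are invented, fetching the heartbeat row from the
standby is left out, and comparing a master-written timestamp with the
standby's clock assumes the two clocks are reasonably in sync.

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_standby_lag(heartbeat_on_standby, now, warn_s=60, crit_s=300):
    """Return (nagios_status, lag_seconds) for a standby.

    heartbeat_on_standby: the newest heartbeat timestamp (written on the
    master) that is visible when querying the standby.
    now: current time; passed in so the check is easy to test.
    warn_s / crit_s: hypothetical thresholds in seconds.
    """
    lag = (now - heartbeat_on_standby).total_seconds()
    lag = max(lag, 0.0)  # clock skew can make the raw value negative
    if lag >= crit_s:
        return CRITICAL, lag
    if lag >= warn_s:
        return WARNING, lag
    return OK, lag
```

A wrapper script would print a one-line status message and exit with the
returned code; a CRITICAL result is the point at which an operator might
lower the delay setting and reload.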

That being said, I do think it makes more sense to wait until we've got
a conflict to start the timer, and I rather like avoiding the
uncertainty of time sync between master and slave by using WAL arrival
time on the slave.
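
The difference between the two timer choices can be shown with a little
arithmetic. The sketch below is a simplified model, not the actual
implementation: it assumes the existing behaviour measures the delay
from a timestamp attached to the WAL record, while the alternative
starts the clock when the conflict is first seen on the slave. All names
are hypothetical and times are plain seconds.

```python
def grace_from_wal_timestamp(max_delay, record_time, now):
    """Seconds a conflicting query may still run if the clock starts
    at the timestamp of the WAL record being replayed."""
    return max(0.0, max_delay - (now - record_time))


def grace_from_conflict_start(max_delay, conflict_start, now):
    """Seconds a conflicting query may still run if the clock starts
    when the conflict is first detected on the slave."""
    return max(0.0, max_delay - (now - conflict_start))


# Hypothetical scenario: max delay of 30s, a record stamped at t=0,
# then 600s of inactivity before the record is replayed and hits a conflict.
idle_case_old = grace_from_wal_timestamp(30, record_time=0, now=600)
idle_case_new = grace_from_conflict_start(30, conflict_start=600, now=600)
```

Under this model the first timer leaves no grace at all after an idle
period, because the inactivity has already used up the delay, while the
second always gives the query the full 30 seconds.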

Thanks,

Stephen




Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Bruce Momjian
Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  
  Overall I would say opinion is about evenly split between:
  
  - leave it as-is
  - make it a Boolean
  - change it in some way but to something more expressive than a
Boolean
  
 I think a boolean would limit the environments in which HS would be
 useful.  Personally, I think how far the replica is behind the
 source is a more useful metric, even with anomalies on the
 transition from idle to active; but a blocking duration would be
 much better than no finer control than the boolean.  So my instant
 runoff second choice would be for the block duration knob.
  
  time for a decision, and with no one agreeing on what to do,
  feature removal seemed like the best approach.
  
 I keep wondering at the assertion that once a GUC is present
 (especially a tuning GUC like this) that we're stuck with it.  I
 know that's true of SQL code constructs, but postgresql.conf files? 
 How about redirect_stderr, max_fsm_*, sort_mem, etc.?  This argument
 seems tenuous.

You are right that we are much more flexible about changing
administrative configuration parameters (like this one) than SQL. In the
past, we even renamed logging parameters to be more consistent, and I
think that proves the bar is quite low for GUC administrative parameter
change.  :-)

The concern about 'max_standby_delay' is that it controls a lot of new
code and affects the behavior of HS/SR in ways that might cause a poor
user experience, especially for non-expert users.  I admit that expert
users can use the setting, but we are coding for a general user base,
and we might have to field many questions about 'max_standby_delay' from
general users that will make us look bad.  "The setting is totally
useless" is something we have heard about other partial solutions we
have released in the past.  We try to avoid that.  ;-)  Labeling
something "experimental" also makes our code look sloppy.  And if we
decide the problem is unsolvable using this approach, we should remove
it now rather than later.  We don't like to carry around a wart for a
small segment of our userbase.

I realize many of you have not been around to see some of our
less-than-perfect solutions and to see the pain they cause.  Once
something gets in, we have to fight to remove it.  In fact, there is no
way we would add 'max_standby_delay' into our codebase now, knowing its
limitations, but people are having to fight hard for its removal, if
necessary.

Now that discussion has restarted, let's keep going to see if we can
reach some kind of simple solution.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Greg Stark
On Mon, May 10, 2010 at 5:20 PM, Bruce Momjian br...@momjian.us wrote:
 You are right that we are much more flexible about changing
 administrative configuration parameters (like this one) than SQL. In the
 past, we even renamed logging parameters to be more consistent, and I
 think that proves the bar is quite low for GUC administrative parameter
 change.  :-)

 The concern about 'max_standby_delay' is that it controls a lot of new
 code and affects the behavior of HS/SR in ways that might cause a poor
 user experience, especially for non-expert users.

I would like to propose that we do the following:

1) Replace max_standby_delay with a boolean as per heikki's suggestion

2) Add an explicitly experimental option like max_standby_delay or
recovery_conflict_timeout which is only effective if you've chosen
recovery_conflict=pause recovery
option and is explicitly documented as being scheduled to be replaced
with a more complete system in future versions.

My thinking is that when we do replace max_standby_delay we would keep
the recovery_conflict parameter with the same semantics. It's just the
additional experimental option which would change.

-- 
greg



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Josh Berkus

 1) Replace max_standby_delay with a boolean as per heikki's suggestion
 
 2) Add an explicitly experimental option like max_standby_delay or
 recovery_conflict_timeout which is only effective if you've chosen
 recovery_conflict=pause recovery
 option and is explicitly documented as being scheduled to be replaced
 with a more complete system in future versions.

+1

As far as I can tell, the current delay *works*.  It just doesn't
necessarily work the way most people expect it to work.  Kind of
like, hmmm, shared_buffers?  Or effective_cache_size?  Or
effective_io_concurrency?

And I still think that having this kind of a delay option will give us
invaluable user feedback on how the option *should* work in 9.1, which we
won't get if we don't have an option. I think we will be overhauling it
for 9.1, but I don't think that overhaul will benefit from a lack of data.

-- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Florian Pflug
On May 10, 2010, at 17:39 , Kevin Grittner wrote:
 Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 
 Overall I would say opinion is about evenly split between:
 
 - leave it as-is
 - make it a Boolean
 - change it in some way but to something more expressive than a
  Boolean
 
 I think a boolean would limit the environments in which HS would be
 useful.  Personally, I think how far the replica is behind the
 source is a more useful metric, even with anomalies on the
 transition from idle to active; but a blocking duration would be
 much better than no finer control than the boolean.  So my instant
 runoff second choice would be for the block duration knob.

You could always toggle that boolean automatically, based on some measurement 
of the replication lag (Assuming the boolean would be settable at runtime). 
That'd give you much more flexibility than any built-in knob could provide, and
even more so than a built-in knob with known deficiencies.

My preference is hence to make it a boolean, but in a way that allows more 
advanced behavior to be implemented on top of it. In the simplest case by 
allowing the boolean to be flipped at runtime and ensuring that the system 
reacts in a sane way.
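
A minimal sketch of such an external controller, assuming the boolean
can be flipped at runtime and that some lag measurement is available.
The two states are named "queries" and "replay" purely for illustration,
and the thresholds are made-up numbers. Using two thresholds rather than
one gives hysteresis, so the setting does not flap when the lag hovers
around a single cut-off.

```python
def next_conflict_winner(current, lag_seconds, high=300.0, low=30.0):
    """Pick who wins conflicts for the next interval.

    current: "queries" (queries win, replay waits) or
             "replay" (replay wins, conflicting queries are cancelled).
    lag_seconds: measured replication lag on the standby.
    high / low: hypothetical thresholds in seconds, with low < high.
    """
    if lag_seconds >= high:
        return "replay"      # too far behind: let recovery catch up
    if lag_seconds <= low:
        return "queries"     # caught up: protect running queries again
    return current           # in between: keep the current setting
```

A cron job or monitoring daemon could call this periodically and apply
the result only when it differs from the current value.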

best regards,
Florian Pflug




Re: [HACKERS] max_standby_delay considered harmful

2010-05-10 Thread Fujii Masao
On Mon, May 10, 2010 at 3:27 PM, Simon Riggs si...@2ndquadrant.com wrote:
 I already explained that killing the startup process first is a bad idea
 for many reasons when shutdown was discussed. Can't remember who added
 the new standby shutdown code recently, but it sounds like their design
 was pretty poor if it didn't include shutting down properly with HS. I
 hope they fix the bug they have introduced. HS was never designed to
 work that way, so there is no flaw there; it certainly worked when
 committed.

The new smart shutdown during recovery doesn't kill the startup process until
all of the read-only backends have gone away. So it works fine with HS.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Simon Riggs
On Sat, 2010-05-08 at 14:48 -0400, Bruce Momjian wrote:

 I think the consensus is to change this setting to a boolean.  If you
 don't want to do it, I am sure we can find someone who will.

You expect others to act on consensus and follow rules, yet ignore them
yourself when it suits your purpose. Your other points seem designed to
distract people from seeing that.

There is clear agreement that a problem exists. The action to take as a
result of that problem is very clearly in doubt and yet you repeatedly
ignore other people's comments and viable technical resolutions. If you
can find a cat's paw to break consensus for you, more fool them. You
might find someone with a good resolution, if you ask that instead.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Kevin Grittner
Bruce Momjian  wrote:
 
 I think everyone agrees the current code is unusable, per Heikki's
 comment about a WAL file arriving after a period of no WAL
 activity
 
I don't.
 
I am curious to hear how many complaints we've had from alpha and
beta testers of HS regarding this issue.  I know that if we used it
with our software, the issue would probably go unnoticed because of
our usage patterns and automatic query retry.  A positive setting
would work as intended for us.  I can think of pessimal usage
patterns, different software approaches, and/or goals for HS usage
which would conflict badly with a positive setting.  Hopefully we
can document this area better than we've historically done with, for
example, fsync -- which has similar trade-offs, only with more dire
consequences for bad user choices.
 
-Kevin




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 I like the proposal of a boolean because it provides only the minimal
 feature set of two cases that are both clearly needed and easily
 implementable.  Whatever we do later is certain to provide a superset
 of those two cases.  If we do something else (and that includes my own
 proposal of a straight lock timeout), we'll be implementing something
 we might wish to take back later.  Taking out features after they've
 been in a release is very hard, even if we realize they're badly
 designed.

That's where I thought my proposal fitted in. I fail to see us wanting to
take back explicit pause/resume admin functions in any future release.

Now, after having read Greg's arguments, my vote would be the following:
 - hot_standby_conflict_winner = queries|replay, defaults to replay
 - add pause/resume so that people can switch temporarily to queries
 - label max_standby_delay *experimental*, keep current code

By clearly stating the feature is *experimental* it should be easy to
both get feedback on it so that we know what to implement in 9.1, and
should that be completely different, take back the feature. It should
even be possible to continue tweaking its behavior during beta, or do
something better.

Of course it will piss off some users, but they knew they were depending
on some *experimental* feature after all.

Regards,
-- 
dim



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Florian Pflug
On May 9, 2010, at 13:59 , Dimitri Fontaine wrote:
 Tom Lane t...@sss.pgh.pa.us writes:
 I like the proposal of a boolean because it provides only the minimal
 feature set of two cases that are both clearly needed and easily
 implementable.  Whatever we do later is certain to provide a superset
 of those two cases.  If we do something else (and that includes my own
 proposal of a straight lock timeout), we'll be implementing something
 we might wish to take back later.  Taking out features after they've
 been in a release is very hard, even if we realize they're badly
 designed.
 
 That's where I thought my proposal fitted in. I fail to see us wanting to
 take back explicit pause/resume admin functions in any future release.
 
 Now, after having read Greg's arguments, my vote would be the following:
 - hot_standby_conflict_winner = queries|replay, defaults to replay
 - add pause/resume so that people can switch temporarily to queries
 - label max_standby_delay *experimental*, keep current code

Adding pause/resume seems to introduce some non-trivial locking problems, 
though. How would you handle a pause request if the recovery process currently 
held a lock?

Dropping the lock is not an option for correctness reasons. Otherwise you 
wouldn't have needed to take the lock in the first place, no?

Pausing with the lock held leads to priority-inversion like problems. Queries 
now might block until recovery is resumed - quite the opposite of what pause() 
is supposed to achieve.

The only remaining option is to continue applying WAL until you reach a point 
where no locks are held, then pause. But from a user's POV that is nearly 
indistinguishable from simply setting hot_standby_conflict_winner to in the 
first place I think.
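
The third option can be sketched as a small loop. This is a toy model
rather than real recovery code: WAL records are reduced to (kind, xid)
pairs, and it is assumed that an AccessExclusiveLock taken during replay
is released when the commit or abort record of the owning transaction is
replayed. The function name and record kinds are hypothetical.

```python
def replay_until_lock_free(records, held_locks):
    """Keep applying WAL after a pause request until no locks are held.

    records: pending WAL as (kind, xid) tuples, where kind is one of
             "lock", "commit", "abort" or "other".
    held_locks: xids holding AccessExclusiveLocks at the pause request.
    Returns (records_applied, locks_still_held). If the second value is
    non-empty, the pending WAL ran out before a lock-free point.
    """
    held = set(held_locks)
    applied = 0
    for kind, xid in records:
        if not held:
            break                 # lock-free point reached: pause here
        if kind == "lock":
            held.add(xid)         # replay takes another lock
        elif kind in ("commit", "abort"):
            held.discard(xid)     # owner finished: its lock is released
        applied += 1
    return applied, held
```

The number of records applied before the pause takes effect depends
entirely on the incoming WAL, which is why the wait is hard to predict
from the user's side.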

best regards,
Florian Pflug





Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Greg Stark
On Sun, May 9, 2010 at 4:00 AM, Greg Smith g...@2ndquadrant.com wrote:
  The use cases are covered as best they can be without better support from
 expected future SR features like heartbeats and XID loopback.

For what it's worth I think deferring these extra complications is a
very useful exercise. I would like to see a system that doesn't depend
on them for basic functionality. In particular I would like to see a
system that can be useful using purely WAL log shipping without
streaming replication at all.

I'm a bit unclear how the boolean proposal would solve things though.
Surely if you set the boolean to recovery-wins then when using
streaming replication with any non-idle master virtually every query
would be cancelled immediately as every HOT cleanup would cause a
snapshot conflict with even short-lived queries in the slave.



-- 
greg



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Robert Haas
On Sun, May 9, 2010 at 12:47 PM, Greg Stark gsst...@mit.edu wrote:
 On Sun, May 9, 2010 at 4:00 AM, Greg Smith g...@2ndquadrant.com wrote:
  The use cases are covered as best they can be without better support from
 expected future SR features like heartbeats and XID loopback.

 For what it's worth I think deferring these extra complications is a
 very useful exercise. I would like to see a system that doesn't depend
 on them for basic functionality. In particular I would like to see a
 system that can be useful using purely WAL log shipping without
 streaming replication at all.

 I'm a bit unclear how the boolean proposal would solve things though.
 Surely if you set the boolean to recovery-wins then when using
 streaming replication with any non-idle master virtually every query
 would be cancelled immediately as every HOT cleanup would cause a
 snapshot conflict with even short-lived queries in the slave.

It sounds to me like what we need here is some testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Simon Riggs
On Sat, 2010-05-08 at 20:57 -0400, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote:
  I think everyone agrees the current code is unusable, per Heikki's
  comment about a WAL file arriving after a period of no WAL activity, and
  look how long it took our group to even understand why that fails so
  badly.
 
  To be honest it's not *that* hard to simply make sure WAL is generated
  regularly to combat that. While it surely ain't a nice workaround, it's
  not much of a problem either.
 
 Well, that's dumping a kluge onto users; but really that isn't the
 point.  What we have here is a badly designed and badly implemented
 feature, and we need to not ship it like this so as to not
 institutionalize a bad design.

No, you have it backwards. HS was designed to work with SR. SR
unfortunately did not deliver any form of monitoring, and in doing so
the keepalive that it was known HS needed was left out, although it had
been on the todo list for some time. Luckily Greg and I argued to have
some monitoring added and my code was used to provide barest minimum
monitoring for SR, yet not enough to help HS.

Of course, if one team doesn't deliver for whatever reason then others
must take up the slack, if they can: no complaints. Since I personally
didn't know this was going to be the case until after freeze, it is very
late to resolve this situation sensibly and time has been against us.
It's much harder for me to reach into the depths of another person's
work and see how to add necessary mechanisms, especially when I'm
working elsewhere. Even if I had done, it's likely that I would have
been blocked with the "great idea, next release" response as already
used on this thread.

Without doubt the current mechanism suffers from the issues you mention,
though the current state is not the result of bad design, merely
inaction and lack of integration. We could resolve the current state in
many ways, if we chose.

Bruce has used the word crippleware for the current state. Raising a
problem and then blocking solutions is the best way I know to cripple a
release. It should be clear that I've done my best to avoid this
situation and have been active on both SR and HS. Had I not acted as I
have done to date, SR would at this point slurp CPU like a bandit and be
unmonitorable, both fatal flaws in production. I point this out not to
argue, but to set the record straight. IMHO your assignment of blame is
misplaced and your comments about poor design do not reflect how we
arrived at the current state.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Simon Riggs
On Sun, 2010-05-09 at 16:10 +0200, Florian Pflug wrote:

 Adding pause/resume seems to introduce some non-trivial locking
 problems, though. How would you handle a pause request if the recovery
 process currently held a lock?

(We are only talking about AccessExclusiveLocks here. No LWlocks are
held across WAL records during replay)

Just pause. There is no technical problem there.

Perhaps a danger of unforeseen consequences, though doing that might
also be desirable, who can say?

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Dimitri Fontaine
Florian Pflug f...@phlo.org writes:
 The only remaining option is to continue applying WAL until you reach
 a point where no locks are held, then pause. But from a user's POV
 that is nearly indistinguishable from simply setting
 hot_standby_conflict_winner to in the first place I think.

Not really, the use case would be using the slave as a reporting server,
you know you have say 4 hours of reporting queries during which you will
pause the recovery. So it's ok for the pause command to take time.

What I understand the boolean option would do is to force the user into
choosing either high-availability or using the slave for other purposes
too. The problem is in wanting both, and that's what HS was meant to solve.

Having pause/resume allows for a mixed case usage which is simple to
drive and understand, yet fails to provide adaptive behavior where
queries are allowed to pause recovery implicitly for a while.

In my mind, that would be a compromise we could reach for 9.0, but it
seems introducing those admin functions now is too far a stretch. I've
been failing to understand exactly why, only getting a generic answer I
find unsatisfying here, because all the alternative paths being
proposed, apart from improving the documentation, are more involved
code-wise.

Regards,
-- 
dim



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Florian Pflug
On May 9, 2010, at 21:04 , Simon Riggs wrote:
 On Sun, 2010-05-09 at 16:10 +0200, Florian Pflug wrote:
 
 Adding pause/resume seems to introduce some non-trivial locking
 problems, though. How would you handle a pause request if the recovery
 process currently held a lock?
 
 (We are only talking about AccessExclusiveLocks here. No LWlocks are
 held across WAL records during replay)
 
 Just pause. There is no technical problem there.
 
 Perhaps a danger of unforeseen consequences, though doing that might
 also be desirable, who can say?

No technical problems perhaps, but some usability ones, no?

I assume people would pause recovery to prevent it from interfering with 
long-running reporting queries. Now, if those queries might block indefinitely 
if the pause request by chance was issued while the recovery process held an 
AccessExclusiveLock, then the pause *caused* exactly what it was supposed to 
prevent. Setting hot_standby_conflict_winner to queries would at least have 
allowed the reporting queries to finish eventually.

If AccessExclusiveLocks are taken out of the picture (they're supposed to be 
pretty rare on a production system anyway), setting hot_standby_conflict_winner 
to queries seems to act like a conditional pause request - recovery is paused 
as soon as it gets in the way. In this setting, the real advantage of pause 
would be to prevent recovery from using up all available IO bandwidth. This 
seems like a valid concern, but calls more for something like recovery_delay 
(similar to vacuum_delay) instead of pause().

best regards,
Florian Pflug




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Robert Haas
On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine dfonta...@hi-media.com wrote:
 Florian Pflug f...@phlo.org writes:
 The only remaining option is to continue applying WAL until you reach
 a point where no locks are held, then pause. But from a user's POV
 that is nearly indistinguishable from simply setting
 hot_standby_conflict_winner to queries in the first place I think.

 Not really, the use case would be using the slave as a reporting server,
 you know you have say 4 hours of reporting queries during which you will
 pause the recovery. So it's ok for the pause command to take time.

Seems like it could take FOREVER on a busy system.  Surely that's not
OK.  The fact that Hot Standby has to take exclusive locks that can't
be released until WAL replay has progressed to a certain point seems
like a fairly serious wart.  We had a discussion on another thread of
how this can make the database fail to shut down properly, a problem
we're not addressing because we're too busy arguing about
max_standby_delay.  In fact, if we knew how to pause replay without
leaving random locks lying around, we could rearrange the whole smart
shutdown sequence so that we paused replay FIRST and then waited for
all backends to exit, but the consensus on the thread where we
discussed this was that we did not know how to do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Simon Riggs
On Sun, 2010-05-09 at 16:01 -0400, Robert Haas wrote:

 The fact that Hot Standby has to take exclusive locks that can't
 be released until WAL replay has progressed to a certain point seems
 like a fairly serious wart.

LOL

And people lecture me about design.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Florian Pflug
On May 9, 2010, at 22:01 , Robert Haas wrote:
 On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine dfonta...@hi-media.com 
 wrote:
 Florian Pflug f...@phlo.org writes:
 The only remaining option is to continue applying WAL until you reach
 a point where no locks are held, then pause. But from a user's POV
 that is nearly indistinguishable from simply setting
 hot_standby_conflict_winner to queries in the first place I think.
 
 Not really, the use case would be using the slave as a reporting server,
 you know you have say 4 hours of reporting queries during which you will
 pause the recovery. So it's ok for the pause command to take time.
 
 Seems like it could take FOREVER on a busy system.  Surely that's not
 OK.  The fact that Hot Standby has to take exclusive locks that can't
 be released until WAL replay has progressed to a certain point seems
 like a fairly serious wart.

If this is a serious wart then it's not one of hot standby, but one of postgres 
proper. AccessExclusiveLocks (SELECT-blocking locks that is, as opposed to 
UPDATE/DELETE-blocking locks) are never necessary from a correctness POV, 
they're only there for implementation reasons.

Getting rid of them doesn't seem completely insurmountable either - just as 
multiple row versions remove the need to block SELECTs due to concurrent 
UPDATEs, multiple datafile versions could remove the need to block SELECTs due 
to concurrent ALTERs. But people seem to live with them quite well, judged from 
the amount of work put into getting rid of them (zero). I therefore fail to see 
why they should pose a significant problem in HS setups.

 We had a discussion on another thread of
 how this can make the database fail to shut down properly, a problem
 we're not addressing because we're too busy arguing about
 max_standby_delay.  In fact, if we knew how to pause replay without
 leaving random locks lying around, we could rearrange the whole smart
 shutdown sequence so that we paused replay FIRST and then waited for
 all backends to exit, but the consensus on the thread where we
 discussed this was that we did not know how to do that.

Yeah, this was exactly my line of thought too.

best regards,
Florian Pflug




Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Andres Freund
On Monday 10 May 2010 00:25:44 Florian Pflug wrote:
 On May 9, 2010, at 22:01 , Robert Haas wrote:
  On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine dfonta...@hi-media.com 
wrote:
  Florian Pflug f...@phlo.org writes:
  The only remaining option is to continue applying WAL until you reach
  a point where no locks are held, then pause. But from a user's POV
  that is nearly indistinguishable from simply setting
  hot_standby_conflict_winner to queries in the first place I think.
  
  Not really, the use case would be using the slave as a reporting server,
  you know you have say 4 hours of reporting queries during which you will
  pause the recovery. So it's ok for the pause command to take time.
  
  Seems like it could take FOREVER on a busy system.  Surely that's not
  OK.  The fact that Hot Standby has to take exclusive locks that can't
  be released until WAL replay has progressed to a certain point seems
  like a fairly serious wart.
 
 If this is a serious wart then it's not one of hot standby, but one of
 postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as
 opposed to UPDATE/DELETE-blocking locks) are never necessary from a
 correctness POV, they're only there for implementation reasons.
 
 Getting rid of them doesn't seem completely insurmountable either - just as
 multiple row versions remove the need to block SELECTs due to concurrent
 UPDATEs, multiple datafile versions could remove the need to block SELECTs
 due to concurrent ALTERs. But people seem to live with them quite well,
 judged from the amount of work put into getting rid of them (zero). I
 therefore fail to see why they should pose a significant problem in HS
 setups.
The difference is that in HS you have to wait for a moment where *no exclusive 
lock at all* exists, possibly without contending for any of them, while on the 
master you might not even be blocked by the existence of any of those locks.

If you have two sessions which in overlapping transactions lock different 
tables exclusively you have no problem shutting the master down, but you will 
never reach a point where no exclusive lock is taken on the slave.
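
A toy model of that second case, as a sketch only: the lock intervals below
are made up, lock_free_points is a hypothetical helper, and none of this is
PostgreSQL code. Each session on its own leaves gaps, but the standby sees
the union of both and never gets a lock-free moment.

```python
def lock_free_points(intervals, horizon):
    """Return the integer time points in [0, horizon) at which no
    exclusive lock is held, given (start, end) hold intervals."""
    return [t for t in range(horizon)
            if not any(start <= t < end for start, end in intervals)]

# Two sessions in overlapping transactions; each takes its lock
# before the other one has released (hypothetical timings).
session_a = [(0, 6), (8, 14), (16, 22)]    # exclusive locks on table A
session_b = [(4, 10), (12, 18), (20, 26)]  # exclusive locks on table B

# Looking at table A alone there are gaps between transactions.
print(lock_free_points(session_a, 22))

# Replay has to wait for a moment with no exclusive lock at all,
# and with both sessions running that moment never arrives.
print(lock_free_points(session_a + session_b, 22))
```

Anything that waits for the second list to become non-empty waits for as
long as this workload keeps running.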

Andres



Re: [HACKERS] max_standby_delay considered harmful

2010-05-09 Thread Robert Haas
On Sun, May 9, 2010 at 6:58 PM, Andres Freund and...@anarazel.de wrote:
 On Monday 10 May 2010 00:25:44 Florian Pflug wrote:
 On May 9, 2010, at 22:01 , Robert Haas wrote:
  On Sun, May 9, 2010 at 3:09 PM, Dimitri Fontaine dfonta...@hi-media.com
 wrote:
  Florian Pflug f...@phlo.org writes:
  The only remaining option is to continue applying WAL until you reach
  a point where no locks are held, then pause. But from a user's POV
  that is nearly indistinguishable from simply setting
  hot_standby_conflict_winner to queries in the first place I think.
 
  Not really, the use case would be using the slave as a reporting server,
  you know you have say 4 hours of reporting queries during which you will
  pause the recovery. So it's ok for the pause command to take time.
 
  Seems like it could take FOREVER on a busy system.  Surely that's not
  OK.  The fact that Hot Standby has to take exclusive locks that can't
  be released until WAL replay has progressed to a certain point seems
  like a fairly serious wart.

 If this is a serious wart then it's not one of hot standby, but one of
 postgres proper. AccessExclusiveLocks (SELECT-blocking locks that is, as
 opposed to UPDATE/DELETE-blocking locks) are never necessary from a
 correctness POV, they're only there for implementation reasons.

 Getting rid of them doesn't seem completely insurmountable either - just as
 multiple row versions remove the need to block SELECTs due to concurrent
 UPDATEs, multiple datafile versions could remove the need to block SELECTs
 due to concurrent ALTERs. But people seem to live with them quite well,
 judged from the amount of work put into getting rid of them (zero). I
 therefore fail to see why they should pose a significant problem in HS
 setups.
 The difference is that in HS you have to wait for a moment where *no exclusive
 lock at all* exists, possibly without contending for any of them, while on the
 master you might not even be blocked by the existence of any of those locks.

 If you have two sessions which in overlapping transactions lock different
 tables exclusively you have no problem shutting the master down, but you will
 never reach a point where no exclusive lock is taken on the slave.

A possible solution to this in the shutdown case is to kill anyone
waiting on a lock held by the startup process at the same time we kill
the startup process, and to kill anyone who subsequently waits for
such a lock as soon as they attempt to take it.  I'm not sure if this
would also make sense in the pause case.

Another possible solution would be to try to figure out if there's a
way to delay application of WAL that requires the taking of AELs to
the point where we could apply it all at once.  That might not be
feasible, though, or only in some cases, and it's certainly 9.1
material (at least) in any case.

Anyway, this is all a little off-topic.  We need to get back to
arguing about how best to cut the legs out from under a feature that's
been in the tree for six months but Tom didn't get around to looking
at until last week.  I'll restate my position: now that I understand
what the issues are (I think), the feature as currently implemented
seems pretty wonky, but cutting it down to a boolean seems like an
exercise in excessive pessimism about our ability to predict future
development directions, as well as possibly quite inconvenient for
people attempting to use Hot Standby.  Therefore I think we should
adopt Tom's original proposal (with +1 also from Stephen Frost), but
that doesn't seem likely to fly because, on the one hand, we have Tom
himself arguing (along with Bruce and possibly Heikki) that we should
whack it down all the way to a boolean; and on the other hand Simon
and Greg Smith and I think also Andres Freund and Kevin Grittner
arguing that the original feature is OK as-is.

Other people who weighed in include Stefan Kaltenbrunner (who opined
that Tom had a legitimate complaint about the current design but
didn't vote for a specific resolution), Greg Sabino Mullane (who
pointed out that SOME of the issues that Tom raised could be solved
with proper time synchronization), Josh Drake (who thought requiring
NTP to be working was a bad idea, and therefore presumably favors
changing something), Josh Berkus (who changed his vote at least once
and whose priority seems to have more to do with releasing before the
turn of the century than with the actual technical option we select,
apologies if I'm misreading his emails), Greg Stark (who seems to
think that a boolean will be bad news but didn't specifically vote for
another option), Dimitri Fontaine (who wants a boolean plus
pause/resume functions, or maybe a plugin facility of some kind), Rob
Wultsch (who doesn't ever want to kill queries and therefore would be
happy with a boolean), Yeb Havinga (who never wants to stall recovery
and therefore would also be happy with a boolean), and Florian Pflug
(who points out that pause/resume is actually a nontrivial feature).
Apologies if I've 

Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Simon Riggs
On Thu, 2010-05-06 at 12:03 -0700, Josh Berkus wrote:

 So changing to a lock-based mechanism or designing a plugin interface
 are really not at all realistic at this date.

I agree that changing to a lock-based mechanism is too much at this
stage of development.

However, putting in a plugin is trivial. We could do it if we choose,
without instability or risk. It is as easy a change as option (1). It's
not complex to design because it would use the exact same API as the
internal conflict resolution module already does; we can just move the
current conflict code straight into a contrib module. This can be done
bug-free in about 3 hours work. There is no performance issue associated
with that either. Plugins would allow all of the various mechanisms
requested on list over 18 months, and they would not prevent including
some of those options within the core at a later date.

Without meaning to cause further contention, it is very clear that
putting in contrib modules isn't bad after all, so there is no material
argument against the plugin approach. 

I recognise that plugins for some reason ignite unstated fears, by
observation that there is always an argument every time I mention them.
I invite an explanation of that off-list.

 Realistically, we have two options at this point:
 
 1) reduce max_standby_delay to a boolean.
 
 2) have a delay option (based either on WAL glob start time or on system
 time) like the current max_standby_delay, preferably with some bugs fixed.

With a plugin option, I would not object to option 1.

If option 1 was taken, without plugins, it's clear that would be against
consensus.

Having said that, I'll confirm now there will not be an extreme reaction
from me if option (1) was forced, nor do I counsel that from others.

 I said it before and I'll say it again: release early, release often.

None of this needs to delay release.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Simon Riggs wrote:
 With a plugin option, I would not object to option 1.
 
 If option 1 was taken, without plugins, it's clear that would be against
 consensus.
 
 Having said that, I'll confirm now there will not be an extreme reaction
 from me if option (1) was forced, nor do I counsel that from others.

I found this email amusing. You phrase it like the community is supposed
to be worried by an objection from you or an extreme reaction;  I
certainly am not.  You have been in the community long enough to not use
such phrasing.  This is not the first time I have complained about this.
I have no idea why an objection from you should mean more than an
objection from anyone else in the community, and I have no idea what an
extreme reaction means, or why anyone should care.  Do you think the
community is negotiating with you?

I think the consensus is to change this setting to a boolean.  If you
don't want to do it, I am sure we can find someone who will.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Robert Haas
On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
 I think the consensus is to change this setting to a boolean.  If you
 don't want to do it, I am sure we can find someone who will.

I still think we should revert to Tom's original proposal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
  I think the consensus is to change this setting to a boolean.  If you
  don't want to do it, I am sure we can find someone who will.
 
 I still think we should revert to Tom's original proposal.

And Tom's proposal was to do it on WAL slave arrival time?  If we could
get agreement from everyone that that is the proper direction, fine, but
I am hearing things like plugins, and other complexity that makes it
seem we are not getting closer to an agreed solution, and without
agreement, the simplest approach seems to be just to remove the part we
can't agree upon.

I think the big question is whether this issue is significant enough
that we should ignore our policy of no feature design during beta.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Robert Haas
On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
  I think the consensus is to change this setting to a boolean.  If you
  don't want to do it, I am sure we can find someone who will.

 I still think we should revert to Tom's original proposal.

 And Tom's proposal was to do it on WAL slave arrival time?  If we could
 get agreement from everyone that that is the proper direction, fine, but
 I am hearing things like plugins, and other complexity that makes it
 seem we are not getting closer to an agreed solution, and without
 agreement, the simplest approach seems to be just to remove the part we
 can't agree upon.

 I think the big question is whether this issue is significant enough
 that we should ignore our policy of no feature design during beta.

Tom's proposal was basically to define recovery_process_lock_timeout.
The recovery process would wait X seconds for a lock, then kill
whoever held it.  It's not the greatest knob in the world for the
reasons already pointed out, but I think it's still better than a
boolean and will be useful to some users.  And it's pretty simple.
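
Read literally, that description could be modelled as below. This is only a
sketch under assumptions: replay_with_lock_timeout and its inputs are
hypothetical, time is simulated rather than measured, and the real proposal
may differ in detail. It shows that the timeout bounds the wait per
conflicting lock, not the total replay delay.

```python
def replay_with_lock_timeout(conflicts, lock_timeout):
    """Simulate replay hitting a series of lock conflicts.

    conflicts    -- for each conflict, the seconds the lock holder
                    still needs before it would release on its own
    lock_timeout -- seconds recovery is willing to wait per lock

    Returns a list of (outcome, seconds_waited) pairs.
    """
    results = []
    for remaining in conflicts:
        if remaining <= lock_timeout:
            # The holder finishes in time; recovery just waits it out.
            results.append(("finished", remaining))
        else:
            # Recovery waits the full timeout, then kills the holder.
            results.append(("cancelled", lock_timeout))
    return results


def total_delay(results):
    """Total seconds replay was held up across all conflicts."""
    return sum(waited for _, waited in results)


outcome = replay_with_lock_timeout([2, 30, 45], lock_timeout=10)
print(outcome)
print(total_delay(outcome))
```

With three conflicts and a 10 second timeout, replay is already held up for
22 seconds in total, which is one way to see why the knob gives no overall
bound on how far the standby falls behind.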

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Dimitri Fontaine
Bruce Momjian br...@momjian.us writes:
 I have no idea why an objection from you should mean more than an
 objection from anyone else in the community, and I have no idea what an
 extreme reaction means, or why anyone should care.

Maybe I shouldn't say anything here. But clearly while you're spot on
that Simon's objection is worth just as much as any other contributor's,
I disagree that we shouldn't care about the way those people feel about
being a member of our community.

I appreciate your efforts to avoid having anyone here use such a wording
but I can't help to dislike your argument for it. I hope that's simply a
localisation issue (l10n is so much harder than i18n).

Anyway, I so much hate reading such exchanges here that I couldn't help
ranting about it. Back to suitable -hackers content.

 I think the consensus is to change this setting to a boolean.  If you
 don't want to do it, I am sure we can find someone who will.

I don't think so. I understand the current state to be:
 a. this problem is not blocking beta, but a must fix before release
 b. we either have to change the API or the behavior
 c. only one behavior change has been proposed, by Tom
 d. proposed behavior would favor queries rather than availability
 e. API change 1 is boolean + explicit pause/resume command
 f. API change 2 is boolean + plugin facility, with a contrib for
current behavior. 
 g. API change 3 is boolean only

I don't remember reading any mail on this thread bearing consensus on
the choices above, but rather either one of us pushing for his vision or
people defending the current situation, complaining about it or asking
that a reasonable choice is made soon.

If we have to choose between reasonable and soon, soon won't be my
vote. Beta is meant to last more or less 3 months after all.

Each party's standing is clear. Decision remains to be made, and I guess
that the one writing the code will have a much louder voice.

Regards,
-- 
dim



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Robert Haas wrote:
 On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
   I think the consensus is to change this setting to a boolean.  If you
   don't want to do it, I am sure we can find someone who will.
 
  I still think we should revert to Tom's original proposal.
 
  And Tom's proposal was to do it on WAL slave arrival time?  If we could
  get agreement from everyone that that is the proper direction, fine, but
  I am hearing things like plugins, and other complexity that makes it
  seem we are not getting closer to an agreed solution, and without
  agreement, the simplest approach seems to be just to remove the part we
  can't agree upon.
 
  I think the big question is whether this issue is significant enough
  that we should ignore our policy of no feature design during beta.
 
 Tom's proposal was basically to define recovery_process_lock_timeout.
 The recovery process would wait X seconds for a lock, then kill
 whoever held it.  It's not the greatest knob in the world for the
 reasons already pointed out, but I think it's still better than a
 boolean and will be useful to some users.  And it's pretty simple.

I thought there was concern about lock stacking causing
unpredictable/unbounded delays.   I am not sure boolean has a majority
vote, but I am suggesting that because it is the _minimal_ feature set,
and when we can't agree during beta, the minimal feature set seems like
the best choice.  

Clearly, anything is more feature-full than boolean --- the big question
is whether Tom's proposal is sufficiently better than a boolean that we
should spend the time designing and implementing it, with the
possibility it will all be changed in 9.1.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Greg Smith

Bruce Momjian wrote:

I think the big question is whether this issue is significant enough
that we should ignore our policy of no feature design during beta


The idea that you're considering removal of a feature that we already 
have people using in beta and making plans around is a policy violation 
too you know. A freeze should include not cutting things just because 
their UI or implementation is not ideal yet. And you've been using the 
word consensus here when there is no such thing. At best there's 
barely a majority here among people who have stated an opinion, and 
consensus means something much stronger even than that; that means 
something closer to unanimity. I thought the summary of where the 
project is at Josh wrote at 
http://archives.postgresql.org/message-id/4be31279.7040...@agliodbs.com 
was excellent, both from a technical and a process commentary 
standpoint. I'd be completely happy to follow that plan, and then we'd 
be at a consensus--with no one left arguing.


It was very clear back in February that if SR didn't hit the feature set 
to make HS less troublesome out of the box, there would be some 
limitations here, and that set of concerns hasn't changed much since 
then. I thought the backup plan if we didn't get things like xid 
feedback was to keep the capability as written anyway, knowing that it's 
still much better than no control over cancellation timing available at 
all. Keep improving documentation around its issues, and continue to 
hack away at them in user space and in the field. Then we do better for 
9.1. You seem bent on removing the feedback part of that cycle.


The full statement of the ESR bit Josh was quoting is "Release early. 
Release often. And listen to your customers."[1] My customers include 
some who believed in the PostgreSQL community process enough to 
contribute toward the HS development that's been completed and donated 
to the project. They have a pretty clear view on this I'm relaying when 
I talk about what I'd like to see happen. They are saying they cannot 
completely ignore their requirements for HA failover, but would be 
willing to loosen them just a bit (increasing failover time slightly) if 
it reduces the odds of query cancellation, and therefore improves how 
much load they can expect to push toward the standby. max_standby_delay 
is a currently available mechanism that does that. I'm not going to be 
their nanny and say no, that's not perfectly predictable, you might get 
a query canceled sometimes when you don't expect it anyway.


Instead, I was hoping to let them deploy 9.0 with this option available 
(but certainly not the default), informed of the potential risks, see 
how that goes. We can confirm whether the userland workarounds we 
believe will be effective here really are. If so, then we can soldier 
forward directly incorporating them into the server code, knowing that 
works. If not, switch to one of the safer modes, see if there's 
something better to use altogether in 9.1, and perhaps this whole 
approach gets removed. That's healthy development progress either way.


Upthread Bruce expressed some concern that this was going to live 
forever once deployed. There is no way I'm going to let this behavior 
continue to be available in 9.1 if field tests say the workarounds 
aren't good enough. That's going to torture all of us who do customer 
deployments of this technology every day if that turns out to be the 
case, and nobody is going to feel the heat from that worse than 
2ndQuadrant. I did a round once of removing GUCs that didn't do what 
they were expected to in the field before, based on real-world tests 
showing regular misuse, and I'll do it again if this falls into that 
same category. We've already exposed this release to a whole stack of 
risk from work during its development cycle, risk that doesn't really 
drop much just from cutting this one bit. I'd at least like to get all 
the reward possible from that risk, which I expected to include feedback 
in this area.


Circumventing the planned development process by dropping this now will 
ruin how I expected the project to feel out the right thing on the user 
side, and we'll all be left with little more insight for what to do in 
9.1 than we have now. And I'm not looking forward to explaining to 
people why a feature they've been seeing and planning to deploy for 
months has now been cut only after what was supposed to be a freeze for 
beta.


[1] 
http://catb.org/esr/writings/homesteading/cathedral-bazaar/ar01s04.html 
, and this particular bit is quite relevant here: Linus was keeping his 
hacker/users constantly stimulated and rewarded—stimulated by the 
prospect of having an ego-satisfying piece of the action, rewarded by 
the sight of constant (even daily) improvement in their work. Linus was 
directly aiming to maximize the number of person-hours thrown at 
debugging and development, even at the possible cost of instability in 
the code and user-base burnout 

Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Greg Smith wrote:
 Bruce Momjian wrote:
  I think the big question is whether this issue is significant enough
  that we should ignore our policy of no feature design during beta
 
 The idea that you're considering removal of a feature that we already 
 have people using in beta and making plans around is a policy violation 
 too you know. A freeze should include not cutting things just because 
 their UI or implementation is not ideal yet. And you've been using the 
 word consensus here when there is no such thing. At best there's 
 barely a majority here among people who have stated an opinion, and 
 consensus means something much stronger even than that; that means 
 something closer to unanimity. I thought the summary of where the 
 project is at Josh wrote at 
 http://archives.postgresql.org/message-id/4be31279.7040...@agliodbs.com 
 was excellent, both from a technical and a process commentary 
 standpoint. I'd be completely happy to follow that plan, and then we'd 
 be at a consensus--with no one left arguing.

I can't argue with anything you have said in your email.  The big
question is whether designing during beta is worth it in this case, and
whether we can get something that is useful and gives us useful feedback
for 9.1, and is it worth spending the time to figure this out during
beta?  If we can, great, let's do it, but I have not seen that yet, and
I am unclear how long we should keep trying to find it.

I think everyone agrees the current code is unusable, per Heikki's
comment about a WAL file arriving after a period of no WAL activity, and
look how long it took our group to even understand why that fails so
badly.  I thought Tom's idea had problems, and there were ideas of how
to improve it.  It just seems like we are drifting around on something
that has no easy solution, and not something that we are likely to hit
during beta where we should be focusing on the release.  Saying we have
three months to fix this during beta seems like a recipe for delaying
the final release, and this feature is not worth that.

What we could do is to convert max_standby_delay to a boolean, 'ifdef'
out the code that was handling non-boolean cases, and then if someone
wants to work on a patch in a corner and propose something in a month
that improves this, we can judge the patch on its own merits, and apply
it if it is a great benefit, because basically that is what we are doing
now if we fix this --- adding a new patch/feature during beta. 
(Frankly, because we are not requiring an initdb during beta, I am
unclear how we are going to rename max_standby_delay to behave as a
boolean.)

It is great if we can get a working max_standby_delay, but I fear
drifting/distraction at this stage.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Andres Freund
On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote:
 I think everyone agrees the current code is unusable, per Heikki's
 comment about a WAL file arriving after a period of no WAL activity, and
 look how long it took our group to even understand why that fails so
 badly.
To be honest it's not *that* hard to simply make sure WAL is generated regularly
to combat that. While it surely ain't a nice workaround it's not much of a
problem either.
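
A minimal sketch of that workaround, in Python. It assumes a hypothetical
single-row table named standby_heartbeat on the master and a caller-supplied
execute() function that runs one statement and commits it; neither comes from
this thread. The idea is only that a small committed write at a fixed interval
keeps WAL records with fresh timestamps flowing to the standby.

```python
import time
from typing import Callable, Optional

# Hypothetical heartbeat table, assumed to exist on the master:
#   CREATE TABLE standby_heartbeat (id int PRIMARY KEY, beat timestamptz);
HEARTBEAT_SQL = "UPDATE standby_heartbeat SET beat = now() WHERE id = 1"


def run_heartbeat(execute: Callable[[str], None],
                  interval_s: float = 1.0,
                  beats: Optional[int] = None,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Run HEARTBEAT_SQL every interval_s seconds.

    execute is assumed to run and commit a single statement on the master.
    beats limits the number of heartbeats (None means run forever).
    Returns the number of heartbeats sent.
    """
    sent = 0
    while beats is None or sent < beats:
        execute(HEARTBEAT_SQL)
        sent += 1
        # Only sleep when another heartbeat is still due.
        if beats is None or sent < beats:
            sleep(interval_s)
    return sent
```

In practice execute would wrap a real database connection; the one-second
default is an arbitrary choice, not a tuned value.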

Andres



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Sunday 09 May 2010 01:34:18 Bruce Momjian wrote:
 I think everyone agrees the current code is unusable, per Heikki's
 comment about a WAL file arriving after a period of no WAL activity, and
 look how long it took our group to even understand why that fails so
 badly.

 To be honest it's not *that* hard to simply make sure WAL is generated regularly
 to combat that. While it surely ain't a nice workaround it's not much of a
 problem either.

Well, that's dumping a kluge onto users; but really that isn't the
point.  What we have here is a badly designed and badly implemented
feature, and we need to not ship it like this so as to not
institutionalize a bad design.

I like the proposal of a boolean because it provides only the minimal
feature set of two cases that are both clearly needed and easily
implementable.  Whatever we do later is certain to provide a superset
of those two cases.  If we do something else (and that includes my own
proposal of a straight lock timeout), we'll be implementing something
we might wish to take back later.  Taking out features after they've
been in a release is very hard, even if we realize they're badly
designed.

regards, tom lane



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Greg Smith

Tom Lane wrote:


Taking out features after they've been in a release is very hard, even if we 
realize they're badly
designed.
  


It doesn't have to be; that's the problem the "release often" part takes
care of.  If a release has only been out a year, and a new one comes out
saying "oh, that thing we released for the first time in the last
version, it didn't work as well as we'd hoped in the field; you should
try to avoid that and use this new implementation that works better
instead once you can upgrade", that's not only not hard, it's exactly
what people using an X.0 release expect to happen.


I've read the message from you that started off this thread several 
times now.  Your low-level code implementation details shared later 
obviously need to be addressed.  But all of the fundamental and 
fatal issues you mentioned at the start continue to strike me as 
either situations where you don't agree with the use case this was 
designed for, or spots where you feel the userland workarounds required 
to make it work right are too onerous.  Bruce's objections seem to fall 
mainly into the latter category.


I've been wandering around talking to people about that exact 
subject--what do people want and expect from Hot Standby, and what would 
they do to gain its benefits--for over six months now, independently of 
Simon's work which did a lot of that before me too.  The use cases are 
covered as best they can be without better support from expected future 
SR features like heartbeats and XID loopback.  As for the workarounds 
required to make things work, the responses I get match what we just saw 
from Andres.  When the required details are explained, people say
"that's annoying but I can do that", and off we go.  There are
significant documentation issues I know need to be cleaned up here, and 
I've already said I'll take care of that as soon as freeze is really 
here and I have a stable target.  (That this discussion is still going 
on says that's not yet)


What I fail to see are problems significant enough to not ship the parts 
of this feature that are done, so that it can be used by those it is 
appropriate for, allow feedback, and make it easy to test individual 
improvements upon what's already there.  I can't make you prioritize 
based on what people are telling me.  All I can do is suggest you 
reconsider handing control over the decision to use this feature or not 
to the users of the software, so they can make their own choice.


I'm tired of arguing about this instead of doing productive work, and 
I've done all I can here to try and work within the development process 
of the community.  If talk of removing the max_standby_delay feature 
clears up, I'll happily provide my promised round of documentation 
updates, to make its limitations and associated workarounds as clear as 
they can be, within a week of being told "go" on that.  If instead this
capability goes away, making those moot, I'll maintain my own release 
for the 2ndQuadrant customers who have insisted they need this 
capability if I have to.  That would be really unfortunate, because the 
only bucket I can pull time out of for that is the one I currently 
allocate to answering questions on the mailing lists here most days.  
I'd rather spend that helping out the PostgreSQL community, but we do 
need to deliver what our customers want too.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Greg Smith wrote:
 Tom Lane wrote:
 
  Taking out features after they've been in a release is very hard, even if 
  we realize they're badly
  designed.

 
 It doesn't have to be; that's the problem the "release often" part takes
 care of.  If a release has only been out a year, and a new one comes out
 saying "oh, that thing we released for the first time in the last
 version, it didn't work as well as we'd hoped in the field; you should
 try to avoid that and use this new implementation that works better
 instead once you can upgrade", that's not only not hard, it's exactly
 what people using an X.0 release expect to happen.

I think this is the crux of the issue.  Tom and I are saying that
historically we have shipped only complete features, or as complete as
reasonable, and have removed items during beta that we found didn't meet
this criteria, in an attempt to reduce the amount of feature set churn
in Postgres.  A database is complex, so modifying the API between major
releases is something we only do when we find a significant benefit.

In this case, if we keep max_standby_delay as non-boolean, we know it
will have to be redesigned in 9.1, and it is unclear to me what
additional knowledge we will gain by shipping it in 9.0, except to have
to tell people that it doesn't work well or requires complex
work-arounds, and that doesn't thrill any of us.  (I already suggested
that statement_timeout might supply a reasonable and predictable
workaround for non-boolean usage of max_standby_delay.)

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Robert Haas
On Sat, May 8, 2010 at 6:51 PM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
 On Sat, May 8, 2010 at 3:40 PM, Bruce Momjian br...@momjian.us wrote:
  Robert Haas wrote:
  On Sat, May 8, 2010 at 2:48 PM, Bruce Momjian br...@momjian.us wrote:
   I think the consensus is to change this setting to a boolean.  If you
   don't want to do it, I am sure we can find someone who will.
 
  I still think we should revert to Tom's original proposal.
 
  And Tom's proposal was to do it on WAL slave arrival time?  If we could
  get agreement from everyone that that is the proper direction, fine, but
  I am hearing things like plugins, and other complexity that makes it
  seem we are not getting closer to an agreed solution, and without
  agreement, the simplest approach seems to be just to remove the part we
  can't agree upon.
 
  I think the big question is whether this issue is significant enough
  that we should ignore our policy of no feature design during beta.

 Tom's proposal was basically to define recovery_process_lock_timeout.
 The recovery process would wait X seconds for a lock, then kill
 whoever held it.  It's not the greatest knob in the world for the
 reasons already pointed out, but I think it's still better than a
 boolean and will be useful to some users.  And it's pretty simple.

 I thought there was concern about lock stacking causing
 unpredictable/unbounded delays.   I am not sure boolean has a majority
 vote, but I am suggesting that because it is the _minimal_ feature set,
 and when we can't agree during beta, the minimal feature set seems like
 the best choice.

 Clearly, anything is more feature-full than boolean --- the big question
 is whether Tom's proposal is significantly better than boolean that we
 should spend the time designing and implementing it, with the
 possibility it will all be changed in 9.1.

I doubt it's likely to be thrown out completely.  We might decide to
fine-tune it in some way.  My fear is that if we ship this with only a
boolean, we're shipping crippleware.  If that fear turns out to be
unfounded, I will of course be happy, but that's my concern, and I
don't believe that it's entirely unfounded.
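
To make the trade-off concrete, here is a rough Python model of the
lock-timeout idea described above: replay waits at most lock_timeout_s for each
conflicting lock, then cancels the holder. The function and its inputs are
illustrative assumptions, not PostgreSQL code. It also shows the stacking
concern raised in the quoted text: each individual wait is bounded, but the
total grows with the number of conflicts.

```python
from typing import Iterable, Tuple


def replay_delay(hold_times_s: Iterable[float],
                 lock_timeout_s: float) -> Tuple[float, int]:
    """Model recovery meeting a series of conflicting locks, one after another.

    hold_times_s: how long each lock holder would keep its lock if left alone.
    lock_timeout_s: how long recovery waits before cancelling the holder.
    Returns (total seconds recovery spent waiting, number of holders cancelled).
    """
    total_wait = 0.0
    cancelled = 0
    for hold in hold_times_s:
        if hold > lock_timeout_s:
            # Recovery gives up waiting and cancels whoever holds the lock.
            total_wait += lock_timeout_s
            cancelled += 1
        else:
            # The holder finishes on its own before the timeout expires.
            total_wait += hold
    return total_wait, cancelled
```

With a 30 second timeout and holders that would run for 2, 10 and 40 seconds,
recovery waits 42 seconds in total and cancels one query.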

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Bruce Momjian
Robert Haas wrote:
  Clearly, anything is more feature-full than boolean --- the big question
  is whether Tom's proposal is significantly better than boolean that we
  should spend the time designing and implementing it, with the
  possibility it will all be changed in 9.1.
 
 I doubt it's likely to be thrown out completely.  We might decide to
 fine-tune it in some way.  My fear is that if we ship this with only a
 boolean, we're shipping crippleware.  If that fear turns out to be
 unfounded, I will of course be happy, but that's my concern, and I
 don't believe that it's entirely unfounded.

Well, historically, we have been willing to not ship features if we
can't get it right.  No one has ever accused us of crippleware, but our
hesitancy has caused slower user adoption, though long-term, it has
helped us grow a dedicated user base that trusts us.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-08 Thread Robert Haas
On Sun, May 9, 2010 at 12:08 AM, Bruce Momjian br...@momjian.us wrote:
 Robert Haas wrote:
  Clearly, anything is more feature-full than boolean --- the big question
  is whether Tom's proposal is significantly better than boolean that we
  should spend the time designing and implementing it, with the
  possibility it will all be changed in 9.1.

 I doubt it's likely to be thrown out completely.  We might decide to
 fine-tune it in some way.  My fear is that if we ship this with only a
 boolean, we're shipping crippleware.  If that fear turns out to be
 unfounded, I will of course be happy, but that's my concern, and I
 don't believe that it's entirely unfounded.

 Well, historically, we have been willing to not ship features if we
 can't get it right.  No one has ever accused us of crippleware, but our
 hesitancy has caused slower user adoption, though long-term, it has
 helped us grow a dedicated user base that trusts us.

We can make the decision to not ship the feature if the feature is
max_standby_delay.  But I think the feature is Hot Standby, which
I think we've pretty much committed to shipping.  And I am concerned
that if the only mechanism for controlling query cancellation vs.
recovery lag is a boolean, people will feel that we didn't get Hot Standby
right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Rob Wultsch
On Wed, May 5, 2010 at 9:32 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, May 5, 2010 at 11:50 PM, Bruce Momjian br...@momjian.us wrote:
 If someone wants to suggest that HS is useless if max_standby_delay
 supports only boolean values, I am ready to suggest we remove HS as well
 and head to 9.0 because that would suggest that HS itself is going to be
 useless.

 I think HS is going to be a lot less useful than many people think, at
 least in 9.0.  But I think ripping out max_standby_delay will make it
 worse.

 The code will not be thrown away;  we will bring it back for 9.1.

 If that's the case, then taking it out makes no sense.

<mysql dba troll>
I manage a bunch of different environments and I am pretty sure that
in any of them if the db started seemingly randomly killing queries I
would have application teams followed quickly by executives coming
after me with torches and pitchforks.

I can not imagine setting this value to anything other than a bool and
most of the time that bool would be -1. I would only be unleashing a
kill storm in utter desperation and I would probably need to explain
myself in detail after. Utter desperation means I am sure I am going
to have to do an impactful failover at any moment and need a slave
completely up to date NOW.

It is good to have the option to automatically cancel queries, but I
think it is a mistake to assume many people will use it.

What I would really need for instrumentation is the ability to
determine *easily* how much a slave is lagging in clock time.
</mysql dba troll>

-- 
Rob Wultsch
wult...@gmail.com



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Simon Riggs
On Thu, 2010-05-06 at 00:47 -0400, Robert Haas wrote: 

 That just doesn't sound that bad to me, especially since the proposed
 alternative is:
 
 - Queries will get cancelled like crazy, period.
 
 Or else:
 
 - Replication can fall infinitely far behind and you can write a
 tedious and error-prone script to try to prevent it if you like.
 
 I think THAT is going to tarnish our reputation.

Yes, that will.

There is no consensus to remove max_standby_delay.

It could be improved with minor adjustments and it makes more sense to
allow a few of those, treating them as bugs.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Simon Riggs
On Wed, 2010-05-05 at 23:15 -0700, Rob Wultsch wrote:

 I manage a bunch of different environments and I am pretty sure that
 in any of them if the db started seemingly randomly killing queries I
 would have application teams followed quickly by executives coming
 after me with torches and pitchforks.

Fully understood and well argued, thanks for your input.

HS doesn't randomly kill queries and there are documented work-arounds
to control this behaviour.

Removing the parameter won't help the situation at all, it will make the
situation *worse* by removing control from where it's clearly needed and
removing all hope of making the HS feature work in practice. There is no
consensus to remove the parameter.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Dimitri Fontaine
Greg Smith g...@2ndquadrant.com writes:
 If you need a script that involves changing a server setting to do
 something, that translates into you can't do that for a typical DBA.  The
 idea of a program regularly changing a server configuration setting on a
 production system is one you just can't sell.  That makes this idea
 incredibly more difficult to use in the field than any of the workarounds
 that cope with the known max_standby_delay issues.

I still think that the best API we can do in a timely fashion for 9.0
is:

  standby_conflict_winner = replay|queries

  pg_pause_recovery() / pg_resume_recovery()

It seems to me those two functions are only exposing existing facilities
in the code, so that's more an API change than a new feature
inclusion. Of course I'm certainly wrong. But the code has already been
written.

I don't think we'll find anything better to offer our users in the right time
frame. Now I'll try to step back and stop repeating myself in the void :)

Regards,
-- 
dim



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Florian Pflug
On May 6, 2010, at 11:26 , Dimitri Fontaine wrote:
 Greg Smith g...@2ndquadrant.com writes:
 If you need a script that involves changing a server setting to do
 something, that translates into you can't do that for a typical DBA.  The
 idea of a program regularly changing a server configuration setting on a
 production system is one you just can't sell.  That makes this idea
 incredibly more difficult to use in the field than any of the workarounds
 that cope with the known max_standby_delay issues.
 
 I still think that the best API we can do in a timely fashion for 9.0
 is:
 
  standby_conflict_winner = replay|queries
 
  pg_pause_recovery() / pg_resume_recovery()
 
 It seems to me those two functions are only exposing existing facilities
 in the code, so that's more an API change than a new feature
 inclusion. Of course I'm certainly wrong. But the code has already been
 written.

If there was an additional SQL-callable function that returned the backends the 
recovery process is currently waiting for, plus one that reported the last
timestamp seen in the WAL, then all those different cancellation policies could
be implemented as daemons that monitor recovery and kill backends as needed, no?

That would allow people to experiment with different cancellation policies, and 
maybe shed some light on what the useful policies are in practice.
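
A sketch of what one such daemon's policy could look like, in Python. The two
inputs stand in for the proposed SQL-callable functions, which do not exist;
all names here are hypothetical. This particular policy cancels every backend
recovery is waiting on once the last WAL timestamp is older than a chosen
limit, and other policies would only need to swap out this one function.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def backends_to_cancel(now: datetime,
                       last_wal_timestamp: datetime,
                       waiting_on_pids: Sequence[int],
                       max_lag: timedelta) -> List[int]:
    """Decide which backends a monitoring daemon should cancel.

    last_wal_timestamp: hypothetical result of a "last timestamp seen in the
        WAL" function.
    waiting_on_pids: hypothetical result of a "backends recovery is waiting
        for" function.
    Returns the pids to cancel; an empty list means keep waiting.
    """
    lag = now - last_wal_timestamp
    if lag > max_lag:
        return list(waiting_on_pids)
    return []
```

Note that measuring lag as "now minus last WAL timestamp" would overstate the
lag after the master has been idle for a while, so a real daemon would likely
want a better signal.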

best regards,
Florian Pflug





Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Yeb Havinga

Rob Wultsch wrote:

I manage a bunch of different environments and I am pretty sure that
in any of them if the db started seemingly randomly killing queries I
would have application teams followed quickly by executives coming
after me with torches and pitchforks.

I can not imagine setting this value to anything other than a bool and
most of the time that bool would be -1. I would only be unleashing a
kill storm in utter desperation and I would probably need to explain
myself in detail after. Utter desperation means I am sure I am going
to have to do an impactful failover at any moment and need a slave
completely up to date NOW.
  
That's funny because when I was reading this thread, I was thinking the 
exact opposite: having max_standby_delay always set to 0 so I know the 
standby server is as up-to-date as possible. The application that 
accesses the hot standby has to be 'special' anyway because it might 
deliver not-up-to-date data. If that information about specialties 
regarding querying the standby server includes the warning that queries 
might get cancelled, they can opt for a retry themselves (is there a 
special return code to catch that case? like PGRES_RETRY_LATER) or a 
message to the user that their report is currently unavailable and they 
should retry in a few minutes.
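
A sketch of the application-side retry, in Python. QueryCancelled is a
hypothetical stand-in for whatever error the client driver raises when the
standby cancels a query; no PGRES_RETRY_LATER code is assumed to exist.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QueryCancelled(Exception):
    """Hypothetical error raised when the standby cancels a running query."""


def run_with_retry(query_fn: Callable[[], T],
                   attempts: int = 3,
                   backoff_s: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call query_fn, retrying when the standby cancels the query.

    Re-raises QueryCancelled after the final attempt so the caller can tell
    the user the report is currently unavailable.
    """
    for attempt in range(1, attempts + 1):
        try:
            return query_fn()
        except QueryCancelled:
            if attempt == attempts:
                raise
            # Give recovery some time to move past the conflict.
            sleep(backoff_s)
    raise ValueError("attempts must be at least 1")
```

The attempt count and backoff are arbitrary defaults; an application would pick
them based on how stale a report it can tolerate.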


regards,
Yeb Havinga




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Robert Haas
On Thu, May 6, 2010 at 1:35 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Robert Haas wrote:
 On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian br...@momjian.us wrote:
 I am afraid the current setting is tempting for users to enable, but
 will be so unpredictable that it will tarnish the reputation of HS and
 Postgres.  We don't want to be thinking in 9 months, "Wow, we shouldn't
 have shipped that feature.  It is causing all kinds of problems."  We
 have done that before (rarely), and it isn't a good feeling.

 I am not convinced it will be unpredictable.  The only caveats that
 I've seen so far are:

 - You need to run ntpd.
 - Queries will get cancelled like crazy if you're not using streaming
 replication.

 And also in situations where the master is idle for a while and then
 starts doing stuff. That's the most significant source of confusion,
 IMHO, I wouldn't mind the requirement of ntpd so much.

Oh.  Ouch.  OK, sorry, I missed that part.  Wow, that's awful.  OK, I
agree: we can't ship that as-is.

/me feels embarrassed for completely failing to understand the root of
the issue until 84 emails into the thread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Andres Freund
Hi,

On Thursday 06 May 2010 07:35:49 Heikki Linnakangas wrote:
 Robert Haas wrote:
  On Wed, May 5, 2010 at 11:52 PM, Bruce Momjian br...@momjian.us wrote:
  I am afraid the current setting is tempting for users to enable, but
  will be so unpredictable that it will tarnish the reputation of HS and
  Postgres.  We don't want to be thinking in 9 months, "Wow, we shouldn't
  have shipped that feature.  It is causing all kinds of problems."  We
  have done that before (rarely), and it isn't a good feeling.
  
  I am not convinced it will be unpredictable.  The only caveats that
  I've seen so far are:
  
  - You need to run ntpd.
  - Queries will get cancelled like crazy if you're not using streaming
  replication.
 
 And also in situations where the master is idle for a while and then
 starts doing stuff. That's the most significant source of confusion,
 IMHO, I wouldn't mind the requirement of ntpd so much.
Personally I would much rather like to keep that configurability and manually 
generate a record a second. Or possibly do something akin to 
archive_timeout...

That may not be as important once there are fewer sources of conflict
resolution - but that's something *definitely* not going to happen for 9.0...

Andres



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Simon Riggs
On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote:

 If there was an additional SQL-callable function that returned the backends 
 the recovery process is currently waiting for, plus one that reported the
 last timestamp seen in the WAL, then all those different cancellation
 policies could be implemented as daemons that monitor recovery and kill 
 backends as needed, no?
 
 That would allow people to experiment with different cancellation policies, 
 and maybe shed some light on what the useful policies are in practice.

It would be easier to implement a conflict resolution plugin that is
called when a conflict occurs, allowing users to have a customisable
mechanism. Again, I have no objection to that proposal.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Florian Pflug
On May 6, 2010, at 12:48 , Simon Riggs wrote:
 On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote:
 If there was an additional SQL-callable function that returned the backends 
 the recovery process is currently waiting for, plus one that reported the
 last timestamp seen in the WAL, then all those different cancellation
 policies could be implemented as daemons that monitor recovery and kill 
 backends as needed, no?
 
 That would allow people to experiment with different cancellation policies, 
 and maybe shed some light on what the useful policies are in practice.
 
 It would be easier to implement a conflict resolution plugin that is
 called when a conflict occurs, allowing users to have a customisable
 mechanism. Again, I have no objection to that proposal.

True, providing a plugin API would be even better, since no SQL callable API 
would have to be devised, and possible algorithms wouldn't be constrained by 
such an API's limitations.

The existing max_standby_delay logic could be moved to such a plugin, living in 
contrib. Since it was already established (I believe) that the existing 
max_standby_delay logic is sufficiently fragile to require significant 
knowledge on the user's side about potential pitfalls, asking those users to 
install the plugin from contrib shouldn't be too much to ask for.

This way, users who really need something more sophisticated than recovery wins 
always or standby wins always are given the tools they need *if* they're 
willing to put in the extra effort. For those who don't, offering 
max_standby_delay probably does more harm than good anyway, so nothing is lost 
by not offering it in the first place.

best regards,
Florian Pflug





Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Simon Riggs
On Thu, 2010-05-06 at 13:46 +0200, Florian Pflug wrote:
 On May 6, 2010, at 12:48 , Simon Riggs wrote:
  On Thu, 2010-05-06 at 11:36 +0200, Florian Pflug wrote:
  If there was an additional SQL-callable function that returned the 
  backends the recovery process is currently waiting for, plus one that 
  reported the last timestamp seen in the WAL, then all those different
  cancellation policies could be implemented as daemons that monitor 
  recovery and kill backends as needed, no?
  
  That would allow people to experiment with different cancellation 
  policies, and maybe shed some light on what the useful policies are in 
  practice.
  
  It would be easier to implement a conflict resolution plugin that is
  called when a conflict occurs, allowing users to have a customisable
  mechanism. Again, I have no objection to that proposal.
 
 True, providing a plugin API would be even better, since no SQL callable API 
 would have to be devised, and possible algorithms wouldn't be constrained by 
 such an API's limitations.
 
 The existing max_standby_delay logic could be moved to such a plugin, living 
 in contrib. Since it was already established (I believe) that the existing 
 max_standby_delay logic is sufficiently fragile to require significant 
 knowledge on the user's side about potential pitfalls, asking those users to 
 install the plugin from contrib shouldn't be too much to ask for.
 
 This way, users who really need something more sophisticated than recovery 
 wins always or standby wins always are given the tools they need *if* they're 
 willing to put in the extra effort. For those who don't, offering 
 max_standby_delay probably does more harm than good anyway, so nothing is 
 lost by not offering it in the first place.

No problem from me with that approach.

As long as 9.0 ships with the current capability to enforce
max_standby_delay, I have no problem.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Greg Smith

Heikki Linnakangas wrote:

Robert Haas wrote:
  

I am not convinced it will be unpredictable.  The only caveats that
I've seen so far are:

- You need to run ntpd.
- Queries will get cancelled like crazy if you're not using streaming
replication.



And also in situations where the master is idle for a while and then
starts doing stuff. That's the most significant source of confusion,
IMHO, I wouldn't mind the requirement of ntpd so much.
  


I consider it mandatory to include a documentation update here that
says "if you set max_standby_delay > 0, and do not run something that
regularly generates activity to the master like [example], you will get
unnecessary query cancellation on the standby".  As well as something
like what Josh was suggesting, adding warnings that this is "for
advanced users only", to borrow his wording.  This is why my name has
been on the open items list for a while now--to make sure I follow 
through on that.
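
A minimal sketch of why an idle master leads to unnecessary cancellation, 
assuming the standby measures delay as its own clock minus the master-side 
timestamp of the newest WAL record it has seen.  The function name 
grace_left and the numbers are illustrative only, not server code:

```python
def grace_left(max_standby_delay: float, last_wal_timestamp: float, now: float) -> float:
    """Seconds a conflicting standby query may keep running before it is cancelled.

    Assumption: delay is computed as the standby's clock minus the
    master-side timestamp of the newest WAL record the standby has seen.
    """
    apparent_delay = now - last_wal_timestamp
    return max(0.0, max_standby_delay - apparent_delay)


# Master wrote WAL at t=0, sat idle for 10 minutes, then sends a
# conflicting record at t=600: the query gets no grace period at all.
idle_master = grace_left(max_standby_delay=30.0, last_wal_timestamp=0.0, now=600.0)

# Same situation, but something on the master generated WAL at t=590.
busy_master = grace_left(max_standby_delay=30.0, last_wal_timestamp=590.0, now=600.0)

print(idle_master, busy_master)
```

Under that assumption the idle gap itself is counted as delay, which is 
why regular activity on the master keeps the standby's grace period 
meaningful.  The same subtraction is also why the two clocks need to 
agree (the ntpd caveat above).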


I haven't written it yet because there were still changes to the 
underlying code being made up until moments before beta started, then 
this discussion started without a break between.  There is a clear set 
of userland things that can be done to make up for the deficiencies in the 
state of the server code, but we won't even get to see how they work out 
in the field (feedback needed to improve the 9.1 design) if this 
capability goes away altogether.


Is it not clear that there are some people who consider the occasional 
bit of cancellation OK, because they can correct for it at the application 
layer and they're willing to factor it into their design if it allows 
using the otherwise idle HA standby?  I'm fine with expanding that 
section of the documentation too, to make it more obvious that's the 
only situation this aspect of HS is aimed at and suitable for.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Greg Smith

Yeb Havinga wrote:
 Rob Wultsch wrote:
  I can not imagine setting this value to anything other than a bool and
  most of the time that bool would be -1.
 That's funny because when I was reading this thread, I was thinking 
 the exact opposite: having max_standby_delay always set to 0 so I know 
 the standby server is as up-to-date as possible.


If you ask one person about this, you'll discover they only consider one 
behavior here sane, and any other setting is crazy.  Ask five people, 
and you'll likely find someone who believes the complete opposite.  Ask 
ten and carefully work out the trade-offs they're willing to make given 
the fundamental limitations of replication, and you'll arrive at the 
range of behaviors available right now, plus some more that haven't been 
built yet.  There are a lot of different types of database applications 
out there, each with their own reliability and speed requirements to 
balance.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Dimitri Fontaine
Simon Riggs si...@2ndquadrant.com writes:
 It would be easier to implement a conflict resolution plugin that is
 called when a conflict occurs, allowing users to have a customisable
 mechanism. Again, I have no objection to that proposal.

To implement, if you say so, no doubt. To use, that means you need to
install a contrib module after validating that the trade-offs there are
the ones you're interested in, or you have to code it yourself. In C.

I don't see that as an improvement over what we have now. Our main
problem seems to be the documentation of the max_standby_delay, where we
give the impression it's doing things the code can not do. IIUC.

Regards,
-- 
dim



Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Simon Riggs
On Thu, 2010-05-06 at 16:09 +0200, Dimitri Fontaine wrote:
 Simon Riggs si...@2ndquadrant.com writes:
  It would be easier to implement a conflict resolution plugin that is
  called when a conflict occurs, allowing users to have a customisable
  mechanism. Again, I have no objection to that proposal.
 
 To implement, if you say so, no doubt. To use, that means you need to
 install a contrib module after validating that the trade-offs there are
 the ones you're interested in, or you have to code it yourself. In C.
 
 I don't see that as an improvement over what we have now. Our main
 problem seems to be the documentation of the max_standby_delay, where we
 give the impression it's doing things the code can not do. IIUC.

I meant easier to implement than what Florian suggested.

The plugin would also allow you to have the pause/resume capability.

-- 
 Simon Riggs   www.2ndQuadrant.com




Re: [HACKERS] max_standby_delay considered harmful

2010-05-06 Thread Heikki Linnakangas
Simon Riggs wrote:
 On Thu, 2010-05-06 at 16:09 +0200, Dimitri Fontaine wrote:
 Simon Riggs si...@2ndquadrant.com writes:
 It would be easier to implement a conflict resolution plugin that is
 called when a conflict occurs, allowing users to have a customisable
 mechanism. Again, I have no objection to that proposal.
 To implement, if you say so, no doubt. To use, that means you need to
 install a contrib module after validating that the trade-offs there are
 the ones you're interested in, or you have to code it yourself. In C.

 I don't see that as an improvement over what we have now. Our main
 problem seems to be the documentation of the max_standby_delay, where we
 give the impression it's doing things the code can not do. IIUC.
 
 I meant easier to implement than what Florian suggested.
 
 The plugin would also allow you to have the pause/resume capability.

Not the same plugin. A hook for stop/resume would need to be called
before and/or after each record; the one for conflict resolution would
need to be called at each conflict. Designing a good interface for a
plugin is hard: you need at least a couple of sample ideas for plugins
that would use the hook before you know the interface is flexible enough.
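
The difference in call sites can be sketched as follows.  This is a toy 
model in Python, not PostgreSQL code: replay, RecordHook, ConflictHook 
and the record layout are all hypothetical names made up for the 
illustration, and a real hook would be written in C.

```python
from typing import Callable, Optional

# All names here are hypothetical; none of this is PostgreSQL API.
RecordHook = Callable[[dict], None]        # called once per WAL record
ConflictHook = Callable[[dict, int], str]  # called once per conflict; returns "wait" or "cancel"


def replay(records: list,
           record_hook: Optional[RecordHook] = None,
           conflict_hook: Optional[ConflictHook] = None) -> list:
    """Replay records, invoking each hook at its own call site."""
    decisions = []
    for record in records:
        if record_hook is not None:
            record_hook(record)            # pause/resume logic would live here
        for query_id in record.get("conflicts", []):
            if conflict_hook is not None:
                decision = conflict_hook(record, query_id)
            else:
                decision = "cancel"        # default when no plugin is installed
            decisions.append((record["lsn"], query_id, decision))
    return decisions


seen = []
wal = [{"lsn": 1}, {"lsn": 2, "conflicts": [101, 102]}, {"lsn": 3}]
result = replay(wal,
                record_hook=lambda r: seen.append(r["lsn"]),
                conflict_hook=lambda r, q: "wait" if q == 101 else "cancel")
```

In this model the per-record hook fires three times and the conflict hook 
twice, with different arguments each, which is the sense in which the two 
would not be the same plugin interface.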

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


