On Wed, Mar 10, 2010 at 6:29 AM, Josh Berkus j...@agliodbs.com wrote:
Then I increased vacuum_defer_cleanup_age to 10, which represents
about 5 minutes of transactions on the test system. This eliminated all
query cancels for the reporting query, which takes an average of 10s.
Next is a
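[Editor's note: the tuning Josh describes can be sketched as a postgresql.conf fragment. The value below is an illustrative assumption, not Josh's exact setting; the right number depends entirely on the system's transaction rate.]

```
# postgresql.conf on the master -- illustrative sketch only.
# vacuum_defer_cleanup_age is measured in transactions (XIDs), not
# seconds, so the value must be derived from the system's XID rate.
vacuum_defer_cleanup_age = 100000   # assumed to be ~5 minutes of XIDs here
```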
On 3/10/10 3:38 AM, Greg Stark wrote:
I think that means that a
vacuum_defer_cleanup of up to about 100 or so (it depends on the width
of your counter record) might be reasonable as a general suggestion
but anything higher will depend on understanding the specific system.
100 wouldn't be
On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus j...@agliodbs.com wrote:
I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than
Fujii Masao wrote:
On Wed, Mar 10, 2010 at 3:29 PM, Josh Berkus j...@agliodbs.com wrote:
I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
Why isn't vacuum_defer_cleanup_age listed in postgresql.conf.sample?
Though I also tried to test the effect of it, I was unable to find it
in the conf file.
Using it has some bugs we need to clean up, apparently.
--Josh Berkus
--
Sent via pgsql-hackers mailing list
All,
I've been playing with vacuum_defer_cleanup_age in reference to the
query cancel problem. It really seems to me that this is the way
forward in terms of dealing with query cancel for normal operation
rather than wal_standby_delay, or maybe in combination with it.
As a first test, I set up
Bruce Momjian wrote:
'max_standby_delay = -1' is really only a reasonable idea if you are
absolutely certain all queries are going to be short. We can't dismiss
that as an unfounded use case, so it has value. I would expect you'd
also have to combine it with a matching reasonable
Greg Smith wrote:
I assumed they would set max_standby_delay = -1 and be happy.
The admin in this situation might be happy until the first time the
primary fails and a failover is forced, at which point there is an
unbounded amount of recovery data to apply that was stuck waiting
On 3/2/10 10:30 AM, Bruce Momjian wrote:
Right now you can't choose master bloat, but you can choose the other
two. I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.
Actually, if vacuum_defer_cleanup_age can be used, master bloat is an
Bruce Momjian wrote:
Right now you can't choose master bloat, but you can choose the other
two. I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.
I was wrong. You can choose master bloat with
vacuum_defer_cleanup_age, but only crudely
On Mon, 2010-03-01 at 12:04 -0800, Josh Berkus wrote:
does anyone dispute his analysis? Simon?
No dispute. I think I've discussed this before.
--
Simon Riggs www.2ndQuadrant.com
On Mon, 2010-03-01 at 14:43 -0500, Tom Lane wrote:
Speaking of which, does the current HS+SR code have a
provision to force the standby to stop tracking WAL and come up live,
even when there's more WAL available?
Yes, trigger file.
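[Editor's note: the trigger-file mechanism Simon refers to lives in recovery.conf on the standby. A minimal 9.0-era sketch follows; the file path and connection string are arbitrary assumptions.]

```
# recovery.conf on the standby (9.0-era) -- minimal sketch
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432'  # assumed conninfo
# When this file appears, the standby stops tracking WAL and comes up live:
trigger_file = '/var/lib/postgresql/failover.trigger'
```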
--
Simon Riggs www.2ndQuadrant.com
On Sun, 2010-02-28 at 16:56 +0100, Joachim Wieland wrote:
Now let's take a look at both scenarios from the administrators' point
of view:
Well argued, agree with all of your points.
--
Simon Riggs www.2ndQuadrant.com
Greg Stark wrote:
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus j...@agliodbs.com wrote:
I don't think that defer_cleanup_age is a long-term solution. But we
need *a* solution which does not involve delaying 9.0.
So I think the primary solution currently is to raise max_standby_delay.
Greg Smith wrote:
Bruce Momjian wrote:
Right now you can't choose master bloat, but you can choose the other
two. I think that is acceptable for 9.0, assuming the other two don't
have the problems that Tom foresees.
I was wrong. You can choose master bloat with
On 2/28/10 7:00 PM, Greg Smith wrote:
The main problem with setting vacuum_defer_cleanup_age high isn't
showing that it works--that's a pretty simple bit of code. It's when you
recognize that it penalizes all cleanup all the time, whether or not the
standby is actually executing a long-running query
Josh Berkus wrote:
And I think we can measure bloat in a pgbench test, no? When I get a
chance, I'll run one for a couple hours and see the difference that
cleanup_age makes.
The test case I attached at the start of this thread runs just the
UPDATE to the tellers table. Running something
On Mon, Mar 1, 2010 at 5:50 PM, Josh Berkus j...@agliodbs.com wrote:
I don't think that defer_cleanup_age is a long-term solution. But we
need *a* solution which does not involve delaying 9.0.
So I think the primary solution currently is to raise max_standby_delay.
However there is a concern
So I think the primary solution currently is to raise max_standby_delay.
However there is a concern with max_standby_delay. If you set it to,
say, 300s, and then run a 300s query on the slave, the slave falls
299s behind. Now you start a new query on the slave -- it gets
a snapshot
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus j...@agliodbs.com wrote:
Completely aside from that, how many users are going to be happy with a
slave server which is constantly 5 minutes behind?
Uhm, well all the ones who are happy with our current warm standby
setup for one?
And all the ones
Greg Stark wrote:
On Mon, Mar 1, 2010 at 7:21 PM, Josh Berkus j...@agliodbs.com wrote:
Completely aside from that, how many users are going to be happy with a
slave server which is constantly 5 minutes behind?
Uhm, well all the ones who are happy with our current warm standby
setup for one?
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
Greg Stark wrote:
For what it's worth Oracle has an option to have your standby
intentionally hold back n minutes behind and I've seen that set to 5
minutes.
yeah a lot of people are doing that intentionally...
It's the old DBA screwup
On 3/1/10 11:43 AM, Tom Lane wrote:
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
Greg Stark wrote:
For what it's worth Oracle has an option to have your standby
intentionally hold back n minutes behind and I've seen that set to 5
minutes.
yeah a lot of people are doing that
On 2/28/10 7:12 PM, Robert Haas wrote:
However, I'd still like to hear from someone with the requisite
technical knowledge whether capturing and retrying the current query in
a query cancel is even possible.
I'm not sure who you want to hear from here, but I think that's a dead end.
Josh Berkus j...@agliodbs.com wrote:
It's undeniable that auto-retry would be better from a user's
perspective than a user-visible cancel. So if it's *reasonable*
to implement, I think we should be working on it. I'm also very
puzzled as to why nobody else wants to even discuss it; it's
josh, nobody is talking about it because it doesn't make sense. you could
only retry if it was the first query in the transaction and only if you
could prove there were no side-effects outside the database and then you
would have no reason to think the retry would be any more likely to work.
greg
Greg Stark st...@mit.edu writes:
josh, nobody is talking about it because it doesn't make sense. you could
only retry if it was the first query in the transaction and only if you
could prove there were no side-effects outside the database and then you
would have no reason to think the retry
Josh Berkus wrote:
However, this leaves aside Greg's point about snapshot age and
successive queries; does anyone dispute his analysis? Simon?
There's already a note on the Hot Standby TODO about unexpectedly bad
max_standby_delay behavior being possible on an idle system, with no
On Mon, Mar 1, 2010 at 5:32 PM, Josh Berkus j...@agliodbs.com wrote:
On 2/28/10 7:12 PM, Robert Haas wrote:
However, I'd still like to hear from someone with the requisite
technical knowledge whether capturing and retrying the current query in
a query cancel is even possible.
I'm not sure
* Tom Lane t...@sss.pgh.pa.us [100301 20:04]:
Greg Stark st...@mit.edu writes:
josh, nobody is talking about it because it doesn't make sense. you could
only retry if it was the first query in the transaction and only if you
could prove there were no side-effects outside the database and
Joachim Wieland wrote:
1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On the other hand their
Josh Berkus wrote:
HS+SR is still a tremendous improvement over the options available
previously. We never thought it was going to work for everyone
everywhere, and shouldn't let our project's OCD tendencies run away from us.
OCD (Obsessive-Compulsive Disorder) --- good one. :-)
--
Bruce
Bruce Momjian wrote:
Joachim Wieland wrote:
1) With the current implementation they will see better performance on
the master and more aggressive vacuum (!), since they have less
long-running queries now on the master and autovacuum can kick in and
clean up with less delay than before. On
Robert Haas wrote:
I just read through the current documentation and it doesn't really
seem to explain very much about how HS decides which queries to kill.
Can someone try to flesh that out a bit?
I believe it just launches on a mass killing spree once things like
max_standby_delay expire.
On Sun, Feb 28, 2010 at 6:07 AM, Greg Smith g...@2ndquadrant.com wrote:
Not forced to--have the option of. There are obviously workloads where you
wouldn't want this. At the same time, I think there are some pretty common
ones people are going to expect HS+SR to work on transparently where
On Sun, Feb 28, 2010 at 2:54 PM, Greg Stark gsst...@mit.edu wrote:
Really? I think we get lots of surprised wows from the field at the
idea that a long-running read-only query can cause your database to
bloat. I think the only reason that's obvious to us is that we've been
grappling with that
All,
First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub. And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.
I also think, based on the discussion and Greg's test case,
Joachim Wieland wrote:
Instead, I assume that most people who will grab 9.0 and use HS+SR do
already have a database with a certain query profile. Now with HS+SR
they will try to put the most costly and longest read-only queries to
the standby but in the end will run the same number of queries
Josh Berkus wrote:
First, from the nature of the arguments, we need to eventually have both
versions of SR: delay-based and xmin-pub. And it would be fantastic if
Greg Smith and Tom Lane could work on xmin-pub to see if we can get it
ready as well.
As I see it, the main technical obstacle
Josh Berkus j...@agliodbs.com writes:
2) A more usable vacuum_defer_cleanup_age. If it was feasible for a
user to configure the master to not vacuum records less than, say, 5
minutes dead, then that would again offer the choice to the user of
slightly degraded performance on the master
On Sun, Feb 28, 2010 at 8:47 PM, Josh Berkus j...@agliodbs.com wrote:
1) Automated retry of cancelled queries on the slave. I have no idea
how hard this would be to implement, but it makes the difference between
writing lots of exception-handling code for slave connections
(unacceptable) to
Greg, Joachim,
As I see it, the main technical obstacle here is that a subset of a
feature already on the SR roadmap needs to get built earlier than
expected to pull this off. I don't know about Tom, but I have no
expectation it's possible for me to get up to speed on that code fast
enough
Josh Berkus wrote:
Well, we could throw this on the user if we could get them some
information on how to calculate that number. For example, some way for
them to calculate the number of XIDs per minute via a query, and then
set vacuum_defer_cleanup_age appropriately on the master. Sure, it's
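[Editor's note: one rough way to get the number Josh asks for is to sample the XID counter twice. The 60-second window and the arithmetic below are illustrative assumptions, not a recommendation from the thread.]

```sql
-- Sample the current XID counter twice to estimate XIDs per minute.
-- txid_current() itself consumes one XID per call, negligible here.
SELECT txid_current();   -- note the value, call it t1
SELECT pg_sleep(60);
SELECT txid_current();   -- note the value, call it t2
-- XIDs per minute is then (t2 - t1); for a 5-minute deferral, set
-- vacuum_defer_cleanup_age to roughly 5 * (t2 - t1) on the master.
```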
On Sun, Feb 28, 2010 at 5:38 PM, Josh Berkus j...@agliodbs.com wrote:
Greg, Joachim,
As I see it, the main technical obstacle here is that a subset of a
feature already on the SR roadmap needs to get built earlier than
expected to pull this off. I don't know about Tom, but I have no
I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing. It will probably be several
months before everyone is forced to accept that, which is why 9.0 is
not going to
On Fri, Feb 26, 2010 at 1:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark gsst...@mit.edu writes:
In the model you describe any long-lived queries on the slave cause
tables in the master to bloat with dead records.
Yup, same as they would do on the master.
I think this model is on the
On Sun, Feb 28, 2010 at 5:28 AM, Greg Smith g...@2ndquadrant.com wrote:
The idea of the workaround is that if you have a single long-running query
to execute, and you want to make sure it doesn't get canceled because of a
vacuum cleanup, you just have it connect back to the master to keep an
Robert Haas wrote:
It seems to me that if we're forced to pass the xmin from the
slave back to the master, that would be a huge step backward in terms
of both scalability and performance, so I really hope it doesn't come
to that.
Not forced to--have the option of. There are obviously
On Fri, Feb 26, 2010 at 8:33 AM, Greg Smith g...@2ndquadrant.com wrote:
I'm not sure what you might be expecting from the above combination, but
what actually happens is that many of the SELECT statements on the table
*that isn't even being updated* are canceled. You see this in the logs:
On Fri, Feb 26, 2010 at 4:43 PM, Richard Huxton d...@archonet.com wrote:
Let's see if I've got the concepts clear here, and hopefully my thinking it
through will help others reading the archives.
There are two queues:
I don't see two queues. I only see the one queue of operations which
have
Greg Stark gsst...@mit.edu writes:
In the model you describe any long-lived queries on the slave cause
tables in the master to bloat with dead records.
Yup, same as they would do on the master.
I think this model is on the roadmap but it's not appropriate for
everyone and I think one of the
Tom Lane wrote:
I'm going to make an unvarnished assertion here. I believe that the
notion of synchronizing the WAL stream against slave queries is
fundamentally wrong and we will never be able to make it work.
The information needed isn't available in the log stream and can't be
made
On 2/26/10 10:53 AM, Tom Lane wrote:
I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing. It will probably be several
months before everyone is forced to accept that,
Josh Berkus j...@agliodbs.com writes:
On 2/26/10 10:53 AM, Tom Lane wrote:
I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing. It will probably be several
months
I don't see a substantial additional burden there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin in its PGPROC entry on the master.
If the main purpose
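[Editor's note, for readers of the archives: the closed-loop scheme Tom sketches here--the standby reporting its oldest xmin back, published via the walsender's PGPROC entry on the master--is essentially what later shipped as hot_standby_feedback in PostgreSQL 9.1. On a 9.1+ standby it is enabled with:]

```
# postgresql.conf on the standby, PostgreSQL 9.1 and later
hot_standby_feedback = on
```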
Tom Lane wrote:
Josh Berkus j...@agliodbs.com writes:
On 2/26/10 10:53 AM, Tom Lane wrote:
I think that what we are going to have to do before we can ship 9.0
is rip all of that stuff out and replace it with the sort of closed-loop
synchronization Greg Smith is pushing. It will probably be
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I don't see a substantial additional burden there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
I don't actually understand how tight synchronization on its own would
solve the problem. What if the connection to the master is lost? Do you
kill all queries in the standby before reconnecting?
Sure. So what? They'd have been
On Fri, 2010-02-26 at 12:02 -0800, Josh Berkus wrote:
I don't see a substantial additional burden there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin
* Greg Stark gsst...@mit.edu [100226 15:10]:
On Fri, Feb 26, 2010 at 7:16 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I don't see a substantial additional burden there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the
On Fri, Feb 26, 2010 at 8:30 PM, Tom Lane t...@sss.pgh.pa.us wrote:
How's it going to do that, when it has no queries at the instant
of startup?
Why shouldn't it have any queries at walreceiver startup? It has any
xlog segments that were copied from the master and any it can find in
the
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
There's *definitely* not going to be enough information in the WAL
stream coming from a master that doesn't think it has HS slaves.
We can't afford to record all that extra stuff in installations for
which it's just useless
Tom Lane wrote:
I don't see a substantial additional burden there. What I would
imagine is needed is that the slave transmits a single number back
--- its current oldest xmin --- and the walsender process publishes
that number as its transaction xmin in its PGPROC entry on the master.
That
That is exactly the core idea I was trying to suggest in my rambling
message. Just that small additional bit of information transmitted and
published to the master via that route, and it's possible to optimize
this problem in a way not available now. And it's a way that I believe
will feel
On Fri, Feb 26, 2010 at 9:44 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Greg Stark gsst...@mit.edu writes:
What extra entries?
Locks, just for starters. I haven't read enough of the code yet to know
what else Simon added. In the past it's not been necessary to record
any transient information
Greg Stark wrote:
On Fri, Feb 26, 2010 at 9:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
There's *definitely* not going to be enough information in the WAL
stream coming from a master that doesn't think it has HS slaves.
We can't afford to record all that extra stuff in installations for
which
Josh Berkus wrote:
That is exactly the core idea I was trying to suggest in my rambling
message. Just that small additional bit of information transmitted and
published to the master via that route, and it's possible to optimize
this problem in a way not available now. And it's a way that I
68 matches