Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-16 Thread Andrew Dunstan
On Mon, Mar 16, 2015 at 11:46 AM, Robert Haas robertmh...@gmail.com wrote:

 On Sun, Mar 15, 2015 at 8:04 PM, Rui DeSousa r...@crazybean.net wrote:
  Would a parameter to auto terminate long running transactions be a
 better solution? Terminating the long running transaction would allow for
 the normal vacuuming process to clean up the deleted records, thus avoiding
 database bloat without introducing new semantics to Postgres's MVCC. I
 would also recommend that the default be disabled.

 An advantage of Kevin's approach is that you don't have to abort *all*
 long-running transactions, only those that touch data which has been
 modified since then.  If your system is read-mostly, that could
 dramatically reduce the number of aborts.


Indeed. The suggestion of auto-terminating long running transactions misses
the point, ISTM. Most of the use cases I can see for this involve vacuuming
tables that the long running transaction will have no interest in at all
(which is why I suggested being able to set it on a per table basis). I
certainly don't want to be terminating my long running report transaction
so that my queue table which is only ever touched by very short-lived
transactions can be reasonably vacuumed. But that's what we have to do now.

cheers

andrew


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-16 Thread Robert Haas
On Sun, Mar 15, 2015 at 8:04 PM, Rui DeSousa r...@crazybean.net wrote:
 Would a parameter to auto terminate long running transactions be a better 
 solution? Terminating the long running transaction would allow for the normal 
 vacuuming process to clean up the deleted records, thus avoiding database bloat 
 without introducing new semantics to Postgres's MVCC. I would also recommend 
 that the default be disabled.

An advantage of Kevin's approach is that you don't have to abort *all*
long-running transactions, only those that touch data which has been
modified since then.  If your system is read-mostly, that could
dramatically reduce the number of aborts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-15 Thread Rui DeSousa
Would a parameter to auto terminate long running transactions be a better 
solution? Terminating the long running transaction would allow for the normal 
vacuuming process to clean up the deleted records, thus avoiding database bloat 
without introducing new semantics to Postgres's MVCC. I would also recommend 
that the default be disabled.




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-15 Thread Gavin Flower

On 16/03/15 13:04, Rui DeSousa wrote:

Would a parameter to auto terminate long running transactions be a better 
solution? Terminating the long running transaction would allow for the normal 
vacuuming process to clean up the deleted records, thus avoiding database bloat 
without introducing new semantics to Postgres's MVCC. I would also recommend 
that the default be disabled.



Hmm...

Better per transaction, or even per statement.

If I fire off a query that I expect to complete within 30 seconds, 
then I probably don't mind it aborting after 5 minutes.  However, if I 
fire off a query that I expect to take between an hour and 3 hours, then 
I definitely don't want it aborted after 5 minutes!


It could be that 99.99% of transactions will complete within a minute, but 
a small minority might reasonably be expected to take much longer.



Cheers,
Gavin




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-03-14 Thread Simon Riggs
On 17 February 2015 at 06:35, Magnus Hagander mag...@hagander.net wrote:

 On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
  It seems we already have a mechanism in place that allows tuning of
  query cancel on standbys vs. preventing standby queries from seeing old
  data, specifically
  max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
  about how users were going to react to these odd variables, but there
  has been little negative feedback.

 FWIW, I think that's a somewhat skewed perception. I think it was right to
 introduce those, because we didn't really have any alternatives.

 But max_standby_streaming_delay, max_standby_archive_delay and
 hot_standby_feedback are among the most frequent triggers for questions
 and complaints that I/we see.


 Agreed.

 And a really bad one used to be vacuum_defer_cleanup_age, because of
 confusing units amongst other things. Which in turn seems fairly close to
 Kevin's suggestions, unfortunately.

I agree with Bruce that we already have a mechanism similar to this
for Hot Standby, so it should therefore be OK to have it for normal
running when users desire it. The key difference here is that in this
patch we use the LSN to look for changed data blocks, allowing queries
to proceed for longer than they would do on Hot Standby where they
currently just get blown away regardless. I like the proposal in this
patch but it is more extreme than the mechanism Hot Standby provides.
(If we implement this then I would want to see it work for Hot Standby
also, so we can keep the mechanisms in step, though I think that is too
late in the day to add that for 9.5.)

I also agree with Andres and Magnus that, with the current definition
of the tunable parameter, it would be hard to use.

In response to Tom's perspective that it is the single most annoying
feature of Oracle, I agree. It just happens we have a similar problem:
bloat.

Having a parameter to define maximum acceptable bloat would be a
very useful thing in PostgreSQL.  IIRC other DBMSs had to do a lot of
work to make snapshot too old occur in controllable circumstances.  So
I would call having a very crap parameter like old_snapshot_limit a
good start, but clearly not the end point of an initiative to improve
our control of bloat.

+1 to include this patch in 9.5, as long as there is agreement to
improve this in the future

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-24 Thread Kevin Grittner
Amit Kapila amit.kapil...@gmail.com wrote:

 Could you please explain in slightly more detail why it can't work if we use 
 timestamp instead of snapshot->xmin in your patch in
 function TestForOldSnapshot()?

It works fine for the additional visibility checking in scans, but
it doesn't cover the vacuuming -- that needs to use a transaction ID
for its cutoff.
 How exactly will this allow returning the snapshot too old error?  Do we
 need a check for the page (page LSN) as in your current patch?


The change to the part where it actually throws the error is very
small, just checking time instead of xmin.
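
To make the shape of that check concrete, here is a minimal, hypothetical
sketch of what such a time-based check could look like.  The snapshot fields
(whenTaken, lsn), the error code, and the GUC handling are illustrative
assumptions, not a description of the submitted patch:

/*
 * Hypothetical sketch only -- not the actual patch.  Throw the snapshot
 * too old error when a sufficiently old snapshot reads a page that was
 * modified after the snapshot was taken.  Assumes the snapshot records
 * the time it was taken (whenTaken) and the WAL insert position at that
 * moment (lsn), and that old_snapshot_threshold is a GUC in minutes,
 * with -1 meaning the historical behavior (never error).
 */
#include "postgres.h"
#include "access/xlogdefs.h"
#include "utils/snapshot.h"
#include "utils/timestamp.h"

extern int old_snapshot_threshold;      /* assumed GUC, in minutes */

static inline void
TestForOldSnapshotSketch(Snapshot snapshot, XLogRecPtr page_lsn)
{
    if (old_snapshot_threshold >= 0 &&
        page_lsn > snapshot->lsn &&     /* page changed since snapshot taken */
        GetCurrentTimestamp() - snapshot->whenTaken >
            old_snapshot_threshold * USECS_PER_MINUTE)
        ereport(ERROR,
                (errcode(ERRCODE_SNAPSHOT_TOO_OLD),     /* assumed error code */
                 errmsg("snapshot too old")));
}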


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-23 Thread Amit Kapila
On Sun, Feb 22, 2015 at 10:18 PM, Kevin Grittner kgri...@ymail.com wrote:

 Amit Kapila amit.kapil...@gmail.com wrote:

  It seems to me that SQL Server also uses a similar mechanism to
  avoid the bloat in its version store (the place to store previous
  versions of records).

  I think if other leading databases provide a way to control the
  bloat, it indicates that most of the customers having
  write-intensive workloads would like to see such an option.

 Yeah, I think many users are surprised by the damage that can be
 done to a database by an idle connection or long-running read-only
 query, especially when coming from other databases which have
 protections against that.

 The bad news is that changing to specifying the limit in time
 rather than transaction IDs is far more complicated than I thought.
 Basically, we need to associate times in the past with the value of
 latestCompletedXid (or derived values like snapshot xmin) at those
 times.  I was initially thinking that by adding a timestamp to the
 snapshot we could derive this from active snapshots.  For any one
 connection, it's not that hard for it to scan the pairingheap of
 snapshots and pick out the right values; but the problem is that it
 is process-local memory and this needs to work while the connection
 is sitting idle for long periods of time.

Could you please explain in slightly more detail why it can't work
if we use timestamp instead of snapshot->xmin in your patch in
function TestForOldSnapshot()?

 I've got a couple ideas
 on how to approach it, but neither seems entirely satisfying:

 (1) Use a new timer ID to interrupt the process whenever its
 oldest snapshot which hasn't yet crossed the old snapshot
 threshold does so.  The problem is that this seems as though it
 would need to do an awful lot more work (scanning the pairing heap
 for the best match) than anything else is currently doing in a
 timeout handler.  One alternative would be to keep a second pairing
 heap to track snapshots which have not crossed the threshold,
 removing items as they get old -- that would make the work needed
 in the timeout handler closer to other handlers.

 (2) Use a coarse enough granularity on time and a short enough
 maximum for the GUC to just keep a circular buffer of the mappings
 in memory.  We might be able to make this dense enough that one
 minute resolution for up to 60 days could fit in 338kB.  Obviously
 that could be reduced with coarser granularity or a shorter
 maximum.


How exactly will this allow returning the snapshot too old error?  Do we
need a check for the page (page LSN) as in your current patch?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-22 Thread Amit Kapila
On Mon, Feb 16, 2015 at 10:49 AM, Kevin Grittner kgri...@ymail.com wrote:

 What this discussion has made me reconsider is the metric for
 considering a transaction too old.  The number of transaction IDs
 consumed seems inferior as the unit of measure for that to LSN or
 time.

 It looks to me to be pretty trivial (on the order of maybe 30 lines
 of code) to specify this GUC in minutes rather than transaction
 IDs.

It seems to me that SQL Server also uses a similar mechanism to
avoid the bloat in its version store (the place to store previous
versions of records).

Mechanism used by them is that during the shrink process, the
longest running transactions that have not yet generated row
versions are marked as victims.  If a transaction is marked as a
victim, it can no longer read the row versions in the version store.
When it attempts to read row versions, an error message is
generated and the transaction is rolled back.

I am not sure how much weight it adds to the usefulness of this
patch, however I think if other leading databases provide a way to
control the bloat, it indicates that most of the customers having
write-intensive workloads would like to see such an option.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-22 Thread Kevin Grittner
Amit Kapila amit.kapil...@gmail.com wrote:

 It seems to me that SQL Server also uses a similar mechanism to
 avoid the bloat in its version store (the place to store previous
 versions of records).

 I think if other leading databases provide a way to control the
 bloat, it indicates that most of the customers having
 write-intensive workloads would like to see such an option.

Yeah, I think many users are surprised by the damage that can be
done to a database by an idle connection or long-running read-only
query, especially when coming from other databases which have
protections against that.

The bad news is that changing to specifying the limit in time
rather than transaction IDs is far more complicated than I thought.
Basically, we need to associate times in the past with the value of
latestCompletedXid (or derived values like snapshot xmin) at those
times.  I was initially thinking that by adding a timestamp to the
snapshot we could derive this from active snapshots.  For any one
connection, it's not that hard for it to scan the pairingheap of
snapshots and pick out the right values; but the problem is that it
is process-local memory and this needs to work while the connection
is sitting idle for long periods of time.  I've got a couple ideas
on how to approach it, but neither seems entirely satisfying:

(1) Use a new timer ID to interrupt the process whenever its
oldest snapshot which hasn't yet crossed the old snapshot
threshold does so.  The problem is that this seems as though it
would need to do an awful lot more work (scanning the pairing heap
for the best match) than anything else is currently doing in a
timeout handler.  One alternative would be to keep a second pairing
heap to track snapshots which have not crossed the threshold,
removing items as they get old -- that would make the work needed
in the timeout handler closer to other handlers.

(2) Use a coarse enough granularity on time and a short enough
maximum for the GUC to just keep a circular buffer of the mappings
in memory.  We might be able to make this dense enough that one
minute resolution for up to 60 days could fit in 338kB.  Obviously
that could be reduced with coarser granularity or a shorter
maximum.
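
As a purely illustrative sketch of option (2) -- the names and layout here
are assumptions, not the eventual implementation -- the circular buffer could
look roughly like this.  The arithmetic behind the 338kB figure also checks
out: 60 days * 24 hours * 60 minutes = 86,400 one-minute buckets, at 4 bytes
per TransactionId, is about 337.5kB.

/*
 * Hypothetical sketch of a minute-granularity circular buffer mapping
 * "minutes in the past" to the xmin (derived from latestCompletedXid)
 * that was current at that time.  A real version would live in shared
 * memory and be updated under a lock as each minute passes.
 */
#include "postgres.h"
#include "utils/timestamp.h"

#define OLD_SNAPSHOT_TIME_MAP_ENTRIES (60 * 24 * 60)    /* 60 days of minutes */

typedef struct OldSnapshotTimeMapSketch
{
    int           head;         /* index of the bucket for the current minute */
    TimestampTz   head_time;    /* timestamp, truncated to the minute, of head */
    TransactionId xmin[OLD_SNAPSHOT_TIME_MAP_ENTRIES];
} OldSnapshotTimeMapSketch;

/*
 * Return the xmin recorded threshold_minutes before the newest bucket.
 * Vacuum could treat tuples which became dead before that point as
 * removable; a snapshot older than the threshold that later reads a
 * page modified after it was taken would get the snapshot too old error.
 */
static TransactionId
OldSnapshotMapCutoff(const OldSnapshotTimeMapSketch *map, int threshold_minutes)
{
    int offset = threshold_minutes % OLD_SNAPSHOT_TIME_MAP_ENTRIES;
    int idx = (map->head - offset + OLD_SNAPSHOT_TIME_MAP_ENTRIES)
              % OLD_SNAPSHOT_TIME_MAP_ENTRIES;

    return map->xmin[idx];
}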

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-22 Thread Andrew Dunstan


On 02/22/2015 11:48 AM, Kevin Grittner wrote:


(2) Use a coarse enough granularity on time and a short enough
maximum for the GUC to just keep a circular buffer of the mappings
in memory.  We might be able to make this dense enough that one
minute resolution for up to 60 days could fit in 338kB.  Obviously
that could be reduced with coarser granularity or a shorter
maximum.




This doesn't sound too bad to me. Presumably these would be tunable. I 
think one minute granularity would be fine for most purposes,  but I can 
imagine people who would want it finer, like 10 seconds, say.



cheers

andrew




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-20 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 Kevin Grittner (kgri...@ymail.com) wrote:

 With the two patches I submitted, bloat stabilized at less than 5%
 except for some btree indexes which followed a pattern of inserting
 at the end and deleting most (but not all) of the entries over
 time.  I've been working on a separate patch for that, but it's not
 ready to present, so I probably won't post that until the first CF
 of the next release -- unless someone's interested enough to want
 to discuss it before then.

 Understood.  I'd encourage you to post your thoughts on it- I'm
 certainly interested in it, even if it isn't something we'll be able to
 really work on for 9.5.

OK, if you're up for discussing it even while it's in early
development, I'll start a separate thread to outline the idea.  As
a preliminary teaser in very abbreviated form, the idea is to
basically use the page split logic, but splitting all tuples from
multiple adjacent pages onto a new page to the right.  The old tuples
maintain their pointers (to support scans positioned in that range),
but all pointers *to* those pages will be redirected to the new
page.  Let's not bifurcate this thread by discussing it further
here; please wait until I start the new thread, which will be after
I post the next version of this snapshot too old patch.

 Isn't there a concern about the index entry disappearing also though?
 If there is a specific query against a particular value in the index and
 the index no longer has that value then the query would return zero
 results- but this could happen without visiting the heap, right?

Right; that's why I don't see how to avoid touching the index AMs.
The index pages also contain the LSN, and if you are not visiting
the heap it is because some page in the index was changed, and you
will see the too recent LSN there.

 Therefore, how do you know that the value was, or wasn't, visible at the
 time that the original long-running snapshot was taken.

Strictly speaking, you don't know anything that specific.  You know
that changes were made that *might* have removed tuples that would
have been visible to the snapshot.  Once a snapshot passes the
threshold for being considered old, then viewing any page changed
after the snapshot was built has to trigger the error.  That means
some false positives, but as long as you can't get false negatives
there is never a silently incorrect query result.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Kevin Grittner
Ants Aasma a...@cybertec.at wrote:

 If I understood the issue correctly, you have long running snapshots
 accessing one part of the data, while you have high churn on a
 disjoint part of data. We would need to enable vacuum on the high
 churn data while still being able to run those long queries. The
 snapshot too old solution limits the maximum age of global xmin at
 the cost of having to abort transactions that are older than this and
 happen to touch a page that was modified after they were started.

Pretty much.  It's not quite as black and white regarding high
churn and stable tables, though -- at least for the customer whose
feedback drove my current work on containing bloat.  They are
willing to tolerate an occasional snapshot too old on a cursor,
and they can automatically pick up again by restarting the cursor
with a new lower limit and a new snapshot.

 What about having the long running snapshots declare their working
 set, and then only take them into account for global xmin for
 relations that are in the working set? Like a SET TRANSACTION WORKING
 SET command. This way the error is deterministic, vacuum on the high
 churn tables doesn't have to wait for the old transaction delay to
 expire and we avoid a hard to tune GUC (what do I need to set
 old_snapshot_threshold to, to reduce bloat while not having normal
 transactions abort).

Let me make sure I understand your suggestion.  You are suggesting
that within a transaction you can declare a list of tables which
should get the snapshot too old error (or something like it) if a
page is read which was modified after the snapshot was taken?
Snapshots within that transaction would not constrain the effective
global xmin for purposes of vacuuming those particular tables?

If I'm understanding your proposal, that would help some of the
cases the snapshot too old case addresses, but it would not
handle the accidental idle in transaction case (e.g., start a
repeatable read transaction, run one query, then sit idle
indefinitely).  I don't think it would work quite as well for some
of the other cases I've seen, but perhaps it could be made to fit
if we could figure out the accidental idle in transaction case.

I also think it might be hard to develop a correct global xmin for
vacuuming a particular table with that solution.  How do you see
that working?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Andrew Dunstan


On 02/19/2015 03:31 PM, Kevin Grittner wrote:


What about having the long running snapshots declare their working
set, and then only take them into account for global xmin for
relations that are in the working set? Like a SET TRANSACTION WORKING
SET command. This way the error is deterministic, vacuum on the high
churn tables doesn't have to wait for the old transaction delay to
expire and we avoid a hard to tune GUC (what do I need to set
old_snapshot_threshold to, to reduce bloat while not having normal
transactions abort).

Let me make sure I understand your suggestion.  You are suggesting
that within a transaction you can declare a list of tables which
should get the snapshot too old error (or something like it) if a
page is read which was modified after the snapshot was taken?
Snapshots within that transaction would not constrain the effective
global xmin for purposes of vacuuming those particular tables?



I thought it meant that the declared tables would only be vacuumed 
conservatively, so the transaction would expect not to see snapshot too 
old from them.


cheers

andrew







Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Rod Taylor
On Wed, Feb 18, 2015 at 4:57 PM, Kevin Grittner kgri...@ymail.com wrote:

  But max_standby_streaming_delay, max_standby_archive_delay and
  hot_standby_feedback are among the most frequent triggers for
  questions and complaints that I/we see.
 
  Agreed.
  And a really bad one used to be vacuum_defer_cleanup_age, because
   of confusing units amongst other things. Which in turn seems
   fairly close to Kevin's suggestions, unfortunately.

 Particularly my initial suggestion, which was to base snapshot
 age on the number of transaction IDs assigned.  Does this look
 any better to you if it is something that can be set to '20min' or
 '1h'?  Just to restate, that would not automatically cancel the
 snapshots past that age; it would allow vacuum of any tuples which
 became dead that long ago, and would cause a snapshot too old
 message for any read of a page modified more than that long ago
 using a snapshot which was older than that.


I like this thought. One of the first things I do in a new Pg environment
is to set up a cron job that watches pg_stat_activity and terminates most
backends over N minutes in age (about 5x the length of normal work) with an
exception for a handful of accounts doing backups and other maintenance
operations.  This prevents a stuck client from jamming up the database.

Would pg_dump be able to opt-out of such a restriction?

regards,

Rod


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Kevin Grittner
Andrew Dunstan and...@dunslane.net wrote:
 On 02/19/2015 09:44 AM, Kevin Grittner wrote:

 I understand why this makes people nervous. I wonder if it might be more
 palatable if there were a per-table setting that could enable it? If we
 could ensure that this was only applied to high churn queue tables, say,
 while tables touched by the report writer would not have it applied,
 that would calm a lot of my fears.

That's an interesting idea.  I think I should switch the unit of
measure for too old to a time-based value first, since there
seems to be universal agreement that it would be better than number
of transactions.  Once I've cleared that hurdle I'll see what this
would take.

 I'm also interested in handling the case Stephen Frost described, where
 a tuple is effectively dead but we don't currently have the means of
 discovering the fact, because there is an older long running transaction
 which is never in fact going to be able to see the tuple.

Absolutely. That's one of several other issues that I've been
looking at over the last few weeks. It sounds like people are
already working on that one, which is great. My personal priority
list included that, but after the two I submitted here and a patch
to allow combining near-empty btree pages so that btree bloat is
constrained without having to reindex periodically for the cases
where index tuples are added in index order (at one or a few
insertion points) and most-but-not-all are deleted. You can
currently wind up with a worst-case of one index tuple per page
with no action to reduce that bloat by vacuum processes.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Kevin Grittner
Rod Taylor rod.tay...@gmail.com wrote:

 Would pg_dump be able to opt-out of such a restriction?

I don't see how, since vacuum would be removing recently dead
tuples that are still visible; the alternative to getting a
snapshot too old error when reading a page which could be
affected is to return incorrect results, and nobody wants that.
The best you could do if you wanted to run pg_dump (or similar) and
it might take more time than your old_snapshot_threshold would be
to increase the threshold, reload, dump, set it back to the
normal setting, and reload again.


Andrew's suggestion of setting this per table would not help here.


--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Ants Aasma
On Thu, Feb 19, 2015 at 6:01 PM, Kevin Grittner kgri...@ymail.com wrote:
 I can see how they would be, provided we can be confident that we're
 going to actually throw an error when the snapshot is out of date and
 not end up returning incorrect results.  We need to be darn sure of
 that, both now and in a few years from now when many of us may have
 forgotten about this knob.. ;)

 I get that this is not zero-cost to maintain, and that it might not
 be of interest to the community largely for that reason.  If you
 have any ideas about how to improve that, I'm all ears.  But do
 consider the actual scope of what was needed for the btree coverage
 (above).  (Note that the index-only scan issue doesn't look like
 anything AM-specific; if something is needed for that it would be
 in the pruning/vacuum area.)

If I understood the issue correctly, you have long running snapshots
accessing one part of the data, while you have high churn on a
disjoint part of data. We would need to enable vacuum on the high
churn data while still being able to run those long queries. The
snapshot too old solution limits the maximum age of global xmin at
the cost of having to abort transactions that are older than this and
happen to touch a page that was modified after they were started.

What about having the long running snapshots declare their working
set, and then only take them into account for global xmin for
relations that are in the working set? Like a SET TRANSACTION WORKING
SET command. This way the error is deterministic, vacuum on the high
churn tables doesn't have to wait for the old transaction delay to
expire and we avoid a hard to tune GUC (what do I need to set
old_snapshot_threshold to, to reduce bloat while not having normal
transactions abort).

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Claudio Freire
On Thu, Feb 19, 2015 at 3:44 PM, Kevin Grittner kgri...@ymail.com wrote:
 I'm also interested in handling the case Stephen Frost described, where
 a tuple is effectively dead but we don't currently have the means of
 discovering the fact, because there is an older long running transaction
 which is never in fact going to be able to see the tuple.

 Absolutely. That's one of several other issues that I've been
 looking at over the last few weeks. It sounds like people are
 already working on that one, which is great. My personal priority
 list included that, but after the two I submitted here and a patch
 to allow combining near-empty btree pages so that btree bloat is
 constrained without having to reindex periodically for the cases
 where index tuples are added in index order (at one or a few
 insertion points) and most-but-not-all are deleted. You can
 currently wind up with a worst-case of one index tuple per page
 with no action to reduce that bloat by vacuum processes.

I'd be willing to test that patch.

I have a big database that does that, and a script that fills the disk
with said bloat. That's forced me to do periodic reindexing, which
sucks.




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Ants Aasma
On Feb 19, 2015 10:31 PM, Kevin Grittner kgri...@ymail.com wrote:
  What about having the long running snapshots declare their working
  set, and then only take them into account for global xmin for
  relations that are in the working set? Like a SET TRANSACTION WORKING
  SET command. This way the error is deterministic, vacuum on the high
  churn tables doesn't have to wait for the old transaction delay to
  expire and we avoid a hard to tune GUC (what do I need to set
  old_snapshot_threshold to, to reduce bloat while not having normal
  transactions abort).

 Let me make sure I understand your suggestion.  You are suggesting
 that within a transaction you can declare a list of tables which
 should get the snapshot too old error (or something like it) if a
 page is read which was modified after the snapshot was taken?
 Snapshots within that transaction would not constrain the effective
 global xmin for purposes of vacuuming those particular tables?

Sorry, I should have been clearer. I'm proposing that a transaction can
declare what tables it will access. After that the transaction will
constrain xmin for only those tables. Accessing any other table would give
an error immediately.

 If I'm understanding your proposal, that would help some of the
 cases the snapshot too old case addresses, but it would not
 handle the accidental idle in transaction case (e.g., start a
 repeatable read transaction, run one query, then sit idle
 indefinitely).  I don't think it would work quite as well for some
 of the other cases I've seen, but perhaps it could be made to fit
 if we could figure out the accidental idle in transaction case.

Accidental idle in transaction seems better handled by just terminating
transactions older than some timeout. This is already doable externally,
but an integrated solution would be nice if we could figure out how the
permissions for setting such a timeout work.

 I also think it might be hard to develop a correct global xmin for
 vacuuming a particular table with that solution.  How do you see
 that working?

I'm imagining the long running transaction would register its xmin in a
shared map keyed by relation for each referenced relation and remove the
xmin from procarray. Vacuum would access this map by relation, determine
the minimum and use that if it's earlier than the global xmin. I'm being
deliberately vague here about the data structure in shared memory as I don't
have a great idea what to use there. It's somewhat similar to the lock
table in that in theory the size is unbounded, but in practice it's
expected to be relatively tiny.
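
To give a slightly more concrete picture of that shared map, a hedged sketch
of one entry might look like the following; the structure and field names are
illustrative assumptions about the proposal, not an existing PostgreSQL data
structure.

/*
 * Hypothetical entry in a shared-memory hash table keyed by relation OID.
 * A long-running transaction that declared relid in its working set would
 * publish its xmin here instead of holding back the global xmin; vacuum of
 * relid would then use the older of this value and the global xmin, while
 * vacuum of undeclared relations could ignore it entirely.
 */
#include "postgres.h"

typedef struct WorkingSetXminEntry
{
    Oid           relid;        /* hash key: declared relation */
    TransactionId oldest_xmin;  /* oldest xmin among declaring transactions */
    int           refcount;     /* number of transactions referencing relid */
} WorkingSetXminEntry;

As with the lock table, the number of entries is unbounded in theory but
expected to stay small in practice, so an ordinary shared-memory hash table
would presumably be adequate.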

Regards,
Ants Aasma


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:
 Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:

 In the end, with a single long-running transaction, the worst bloat you
 would have is double the size of the system at the time the long-running
 transaction started.

 I agree that limiting bloat to one dead tuple for every live one
 for each old snapshot is a limit that has value, and it was unfair
 of me to characterize that as not being a limit.  Sorry for that.

 This possible solution was discussed with the user whose feedback
 caused me to write this patch, but there were several reasons they
 dismissed that as a viable total solution for them, two of which I
 can share:

 (1)  They have a pool of connections each of which can have several
 long-running cursors, so the limit from that isn't just doubling
 the size of their database, it is limiting it to some two-or-three
 digit multiple of the necessary size.

 This strikes me as a bit off-the-cuff; was an analysis done which
 determined that would be the result?

If it sounds like off-the-cuff it is because I am summarizing the
mountain of data which has been accumulated over weeks of
discussions and many rounds of multi-day tests using their actual
software driven by a software test driver that simulates users,
with think time and all.

To be usable, we need to run on a particular set of hardware with a
simulated load of 3000 users.  In a two-day test without the
patches I'm proposing, their actual, working production software
running against PostgreSQL bloated the database by 37% and was on a
linear rate of increase.  Unpatched, CPU consumption grew at a linear
rate due to the bloat until, on the second day, it saturated the CPUs
and the driver was unable to get acceptable user performance.
Because of that we haven't moved on to test what the bloat rate
would be like with 3000 users, but I assume it would be higher than
with 300 users.

With the two patches I submitted, bloat stabilized at less than 5%
except for some btree indexes which followed a pattern of inserting
at the end and deleting most (but not all) of the entries over
time.  I've been working on a separate patch for that, but it's not
ready to present, so I probably won't post that until the first CF
of the next release -- unless someone's interested enough to want
to discuss it before then.

 (2)  They are already prepared to deal with snapshot too old
 errors on queries that run more than about 20 minutes and which
 access tables which are modified.  They would rather do that than
 suffer the bloat beyond that point.

 That, really, is the crux here- they've already got support for dealing
 with it the way Oracle did and they'd like PG to do that too.
 Unfortunately, that, by itself, isn't a good reason for a particular
 capability (we certainly aren't going to be trying to duplicate PL/SQL
 in PG any time soon).

I understand that, and if the community doesn't want this style of
limitation on bloat we can make it part of PPAS (which *does*
duplicate PL/SQL, among other things).  We are offering it to the
community first.

 That said, there are capabilities in other RDBMS's which are
 valuable and which we *do* want, so the fact that Oracle does this
 also isn't a reason to not include it.

:-)

 I'm not against having a knob like this, which is defaulted to off,

 Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
 sound like a -1.  :-)

 So, at the time I wrote that, I wasn't sure if it was a +1 or not
 myself.  I've been thinking about it since then, however, and I'm
 leaning more towards having the capability than not, so perhaps that's a
 +1, but it doesn't excuse the need to come up with an implementation
 that everyone can be happy with and what you've come up with so far
 doesn't have a lot of appeal, based on the feedback (I've only glanced
 through it myself, but I agree with Andres and Tom that it's a larger
 footprint than we'd want for this and the number of places having to be
 touched is concerning as it could lead to future bugs).

This conversation has gotten a little confusing because some of my
posts seem to have been held up in the email systems for some
reason, and are arriving out-of-order (and at least one I don't see
yet).  But I have been responding to these points.  For one thing I
stated four days ago that I now feel that the metric for
determining that a snapshot is old should be *time*, not
transaction IDs consumed, and offered to modify that patch in that
direction if people agreed.  I now see one person agreeing and
several people arguing that I should do just that (as though they
had not seen me propose it), so I'll try to get a new version out
today like that.  In any event, I am sure it is possible and feel
that it would be better.  (Having posted that now at least 4 times,
hopefully people will pick up on it.)

For another thing, as mentioned earlier, this is the btree footprint:

 

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Andrew Dunstan


On 02/19/2015 09:44 AM, Kevin Grittner wrote:


On the 15th I said this:

| What this discussion has made me reconsider is the metric for
| considering a transaction too old.  The number of transaction IDs
| consumed seems inferior as the unit of measure for that to LSN or
| time.
|
| It looks to me to be pretty trivial (on the order of maybe 30 lines
| of code) to specify this GUC in minutes rather than transaction
| IDs.  At first glance this seems like it would be vulnerable to the
| usual complaints about mismanaged clocks, but those are easily
| answered by using a cached time in shared memory that we populate
| in such a way that it can never move backward.  Done right, this
| could even allow the GUC to be changeable on reload rather than
| only at restart.  A badly mismanaged system clock could not cause a
| query to generate incorrect results; the worst that could happen is
| that this feature would fail to control bloat as much as expected
| or reads of modified data could generate the snapshot too old
| error around the time of the clock adjustment.
|
| As before, this would default to a magic value to mean that you
| want the historical PostgreSQL behavior.
|
| If that makes the idea more palatable to anyone, I can submit a
| patch to that effect within the next 24 hours.

Until yesterday I didn't get any feedback suggesting that such a
change would make the patch more palatable.  Now that I have had,
I'll try to post a patch to that effect today.

Thanks!




I understand why this makes people nervous. I wonder if it might be more 
palatable if there were a per-table setting that could enable it? If we 
could ensure that this was only applied to high churn queue tables, say, 
while tables touched by the report writer would not have it applied, 
that would calm a lot of my fears.


I'm also interested in handling the case Stephen Frost described, where 
a tuple is effectively dead but we don't currently have the means of 
discovering the fact, because there is an older long running transaction 
which is never in fact going to be able to see the tuple.


cheers

andrew





Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Kevin Grittner
Greg Stark st...@mit.edu wrote:
 On Sun, Feb 15, 2015 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 There might be something in that, but again it's not much like
 this patch.  The key point I think we're both making is that
 nondeterministic failures are bad, especially when you're
 talking about long-running, expensive-to-retry transactions.

 But I think Kevin would agree with you there. That's why he's
 interested in having the errors not occur if you don't read from
 the volatile tables. Ie, your reporting query reading from
 read-only tables would be guaranteed to succeed even if some
 other table had had some rows vacuumed away.

 I'm not sure that's really going to work because of things like
 hint bit updates or hot pruning. That is, say your table is
 read-only now but last week some of the records were updated. You
 might reasonably expect those rows to be safe and indeed the rows
 themselves will still be in your report. But the earlier versions
 of the rows could still be sitting on some pages invisible to
 every query but not vacuumable quite yet and then suddenly they
 get vacuumed away and the LSN on the page gets bumped.

That's a very interesting point.  In the case that the reporting
tables are like that (versus being created right before the run,
for the report), it would argue for either very aggressive
autovacuum settings or an explicit vacuum on those tables before
starting the report.

 Fwiw I would strongly suggest that instead of having a number of
 transactions to have a time difference.

On the 15th I said this:

| What this discussion has made me reconsider is the metric for
| considering a transaction too old.  The number of transaction IDs
| consumed seems inferior as the unit of measure for that to LSN or
| time.
|
| It looks to me to be pretty trivial (on the order of maybe 30 lines
| of code) to specify this GUC in minutes rather than transaction
| IDs.  At first glance this seems like it would be vulnerable to the
| usual complaints about mismanaged clocks, but those are easily
| answered by using a cached time in shared memory that we populate
| in such a way that it can never move backward.  Done right, this
| could even allow the GUC to be changeable on reload rather than
| only at restart.  A badly mismanaged system clock could not cause a
| query to generate incorrect results; the worst that could happen is
| that this feature would fail to control bloat as much as expected
| or reads of modified data could generate the snapshot too old
| error around the time of the clock adjustment.
|
| As before, this would default to a magic value to mean that you
| want the historical PostgreSQL behavior.
|
| If that makes the idea more palatable to anyone, I can submit a
| patch to that effect within the next 24 hours.

Until yesterday I didn't get any feedback suggesting that such a
change would make the patch more palatable.  Now that I have had,
I'll try to post a patch to that effect today.
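
A minimal sketch of the cached time in shared memory that can never move
backward, as described in the quote above, might look like this, assuming a
spinlock-protected value (all names here are illustrative):

/*
 * Hypothetical sketch of a timestamp source that never moves backward,
 * even if the system clock does.  In a real implementation the cached
 * value and its spinlock would live in shared memory and be initialized
 * with SpinLockInit() at startup; statics keep the sketch short.
 */
#include "postgres.h"
#include "storage/spin.h"
#include "utils/timestamp.h"

static TimestampTz cached_snapshot_ts = 0;
static slock_t     cached_snapshot_ts_lock;

static TimestampTz
GetSnapshotTimestampSketch(void)
{
    TimestampTz now = GetCurrentTimestamp();

    SpinLockAcquire(&cached_snapshot_ts_lock);
    if (now <= cached_snapshot_ts)
        now = cached_snapshot_ts;   /* clock stepped backward: reuse cached value */
    else
        cached_snapshot_ts = now;   /* normal case: advance the cache */
    SpinLockRelease(&cached_snapshot_ts_lock);

    return now;
}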

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread Stephen Frost
Kevin,

* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  Kevin Grittner (kgri...@ymail.com) wrote:
  (1)  They have a pool of connections each of which can have several
  long-running cursors, so the limit from that isn't just doubling
  the size of their database, it is limiting it to some two-or-three
  digit multiple of the necessary size.
 
  This strikes me as a bit off-the-cuff; was an analysis done which
  determined that would be the result?
 
 If it sounds like off-the-cuff it is because I am summarizing the
 mountain of data which has been accumulated over weeks of
 discussions and many rounds of multi-day tests using their actual
 software driven by a software test driver that simulates users,
 with think time and all.

My apologies for, unintentionally, implying that you hadn't done
sufficient research or testing.  That certainly was *not* my intent and
it's absolutely clear to me that you've spent quite a bit of time
analyzing this problem.

What I was trying to get at is that the approach which I outlined was,
perhaps, not closely analyzed.  Now, analyzing a problem based on a
solution which doesn't actually exist isn't exactly trivial, and while
it might be possible to do in this case, it'd probably be easier to
simply write the code and then review the results than to perform an
independent analysis.  I do feel that the solution might have been
dismissed on the basis of "doubling the database for each long-running
transaction is not acceptable" or even "having bloat because of a
long-running transaction is not acceptable", but really analyzing what
would actually happen is a much more difficult exercise.

 With the two patches I submitted, bloat stabilized at less than 5%
 except for some btree indexes which followed a pattern of inserting
 at the end and deleting most (but not all) of the entries over
 time.  I've been working on a separate patch for that, but it's not
 ready to present, so I probably won't post that until the first CF
 of the next release -- unless someone's interested enough to want
 to discuss it before then.

Understood.  I'd encourage you to post your thoughts on it- I'm
certainly interested in it, even if it isn't something we'll be able to
really work on for 9.5.

  That, really, is the crux here- they've already got support for dealing
  with it the way Oracle did and they'd like PG to do that too.
  Unfortunately, that, by itself, isn't a good reason for a particular
  capability (we certainly aren't going to be trying to duplicate PL/SQL
  in PG any time soon).
 
 I understand that, and if the community doesn't want this style of
 limitation on bloat we can make it part of PPAS (which *does*
 duplicate PL/SQL, among other things).  We are offering it to the
 community first.

That is certainly appreciated and I'd encourage you to continue to do
so.  I do feel that the community, in general, is interested in these
kinds of knobs -- it's just a question of implementation and making sure
that it's a carefully considered solution and not a knee-jerk reaction
to match what another RDBMS does.

  I'm not against having a knob like this, which is defaulted to off,
 
  Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
  sound like a -1.  :-)
 
  So, at the time I wrote that, I wasn't sure if it was a +1 or not
  myself.  I've been thinking about it since then, however, and I'm
  leaning more towards having the capability than not, so perhaps that's a
  +1, but it doesn't excuse the need to come up with an implementation
  that everyone can be happy with and what you've come up with so far
  doesn't have a lot of appeal, based on the feedback (I've only glanced
  through it myself, but I agree with Andres and Tom that it's a larger
  footprint than we'd want for this and the number of places having to be
  touched is concerning as it could lead to future bugs).
 
 This conversation has gotten a little confusing because some of my
 posts seem to have been held up in the email systems for some
 reason, and are arriving out-of-order (and at least one I don't see
 yet).  But I have been responding to these points.  For one thing I
 stated four days ago that I now feel that the metric for
 determining that a snapshot is old should be *time*, not
 transaction IDs consumed, and offered to modify that patch in that
 direction if people agreed.  I now see one person agreeing and
 several people arguing that I should do just that (as though they
 had not seen me propose it), so I'll try to get a new version out
 today like that.  In any event, I am sure it is possible and feel
 that it would be better.  (Having posted that now at least 4 times,
 hopefully people will pick up on it.)

I agree with others that having it be time-based would be better.  I
did see that but it's really an independent consideration from the
footprint of the patch.

 For another thing, as mentioned earlier, this is the btree footprint:
 
  

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-19 Thread David Steele
On 2/19/15 1:54 PM, Kevin Grittner wrote:
 Rod Taylor rod.tay...@gmail.com wrote:
 
 Would pg_dump be able to opt-out of such a restriction?
 
 I don't see how, since vacuum would be removing recently dead
 tuples that are still visible; the alternative to getting a
 snapshot too old error when reading a page which could be
 affected is to return incorrect results, and nobody wants that.
 The best you could do if you wanted to run pg_dump (or similar) and
 it might take more time than your old_snapshot_threshold would be
 to increase the threshold, reload, dump, set it back to the
 normal setting, and reload again.

While I think pg_dump is a great solution for small to medium
installations, there are a number of excellent file-based backup options
available.  Anyone who is seriously worried about bloat (or locking)
should be looking to those solutions.

-- 
- David Steele
da...@pgmasters.net





Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Kevin Grittner
Magnus Hagander mag...@hagander.net wrote:
 On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:

 It seems we already have a mechanism in place that allows
 tuning of query cancel on standbys vs. preventing standby
 queries from seeing old data, specifically
 max_standby_streaming_delay/max_standby_archive_delay.  We
 obsessed about how users were going to react to these odd
 variables, but there has been little negative feedback.

 FWIW, I think that's a somewhat skewed perception. I think it
 was right to introduce those, because we didn't really have any
 alternatives.

As far as I can see, the alternatives suggested so far are all
about causing heap bloat to progress more slowly, but still without
limit.  I suggest, based on a lot of user feedback (from the
customer I've talked about at some length on this thread, as well
as numerous others), that unlimited bloat based on the activity of
one connection is a serious deficiency in the product; and that
there is no real alternative to something like a snapshot too old
error if we want to fix that deficiency.  Enhancements to associate
a snapshot with a database and using a separate vacuum xmin per
database, not limiting vacuum of a particular object by snapshots
that cannot see that object, etc., are all Very Good Things and I
hope those changes are made, but none of that fixes a very
fundamental flaw.

 But max_standby_streaming_delay, max_standby_archive_delay and
 hot_standby_feedback are among the most frequent triggers for
 questions and complaints that I/we see.

 Agreed.
 And a really bad one used to be vacuum_defer_cleanup_age, because
  of confusing units amongst other things. Which in turn seems
  fairly close to Kevin's suggestions, unfortunately.

Particularly my initial suggestion, which was to base snapshot
age on the number of transaction IDs assigned.  Does this look
any better to you if it is something that can be set to '20min' or
'1h'?  Just to restate, that would not automatically cancel the
snapshots past that age; it would allow vacuum of any tuples which
became dead that long ago, and would cause a snapshot too old
message for any read of a page modified more than that long ago
using a snapshot which was older than that.

Unless this idea gets some +1 votes Real Soon Now, I will mark the
patch as Rejected and move on.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Stephen Frost
Kevin,

* Kevin Grittner (kgri...@ymail.com) wrote:
 Magnus Hagander mag...@hagander.net wrote:
  On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:
  On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
 
  It seems we already have a mechanism in place that allows
  tuning of query cancel on standbys vs. preventing standby
  queries from seeing old data, specifically
  max_standby_streaming_delay/max_standby_archive_delay.  We
  obsessed about how users were going to react to these odd
  variables, but there has been little negative feedback.
 
  FWIW, I think that's a somewhat skewed perception. I think it
  was right to introduce those, because we didn't really have any
  alternatives.
 
 As far as I can see, the alternatives suggested so far are all
 about causing heap bloat to progress more slowly, but still without
 limit.  I suggest, based on a lot of user feedback (from the
 customer I've talked about at some length on this thread, as well
 as numerous others), that unlimited bloat based on the activity of
 one connection is a serious deficiency in the product; and that
 there is no real alternative to something like a snapshot too old
 error if we want to fix that deficiency.  Enhancements to associate
 a snapshot with a database and using a separate vacuum xmin per
 database, not limiting vacuum of a particular object by snapshots
 that cannot see that object, etc., are all Very Good Things and I
 hope those changes are made, but none of that fixes a very
 fundamental flaw.

I also agree with the general idea that it makes sense to provide a way
to control bloat, but I think you've missed what Andres was getting at
with his suggestion (if I understand correctly, apologies if I don't).

The problem is that we're only looking at the overall xmin / xmax
horizon when it comes to deciding if a given tuple is dead.  That's
not quite right- the tuple might actually be dead to all *current*
transactions by being newer than the oldest transaction but dead for all
later transactions.  Basically, there exist gaps between our cluster
wide xmin / xmax where we might find actually dead rows.  Marking those
rows dead and reusable *would* stop the bloat, not just slow it down.

In the end, with a single long-running transaction, the worst bloat you
would have is double the size of the system at the time the long-running
transaction started.  Another way of thinking about it is with this
timeline:

x------------------x------------------x------------------x
/\                 /\                 /\                 /\
Long running       row created        row deleted        vacuum
transaction starts

Now, if all transactions currently running started either before the row
was created or after the row was deleted, then the row is dead and
vacuum can reclaim it.  Finding those gaps in the transaction space for
all of the currently running processes might not be cheap, but for this
kind of a use-case, it might be worth the effort.  On a high-churn
system where the actual set of rows being copied over constantly isn't
very high, you could end up with some amount of duplication of rows --
those to satisfy the ancient transaction and the current working set --
but there wouldn't be any further bloat, and it'd almost certainly be a
much better situation than what is being seen now.
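
As a deliberately simplified illustration of that timeline logic -- it treats
each running transaction as a single point in transaction-ID order and
ignores real snapshot semantics -- a check along these lines would decide
whether a deleted tuple has fallen into such a gap:

/*
 * Deliberately simplified sketch: a deleted tuple sits in a reclaimable
 * "gap" if no currently running transaction began between its creation
 * and its deletion.  running[] is assumed to hold the transaction IDs of
 * all active transactions; real visibility rules are snapshot-based and
 * considerably more involved than this.
 */
#include "postgres.h"
#include "access/transam.h"

static bool
TupleDeadToAllRunningSketch(TransactionId created_xid,
                            TransactionId deleted_xid,
                            const TransactionId *running, int nrunning)
{
    int i;

    for (i = 0; i < nrunning; i++)
    {
        /* a transaction that began after creation and before deletion
         * may still need to see the tuple */
        if (TransactionIdFollows(running[i], created_xid) &&
            TransactionIdPrecedes(running[i], deleted_xid))
            return false;
    }
    return true;
}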

 Particularly my initial suggestion, which was to base snapshot
 age on the number of transaction IDs assigned.  Does this look
 any better to you if it is something that can be set to '20min' or
 '1h'?  Just to restate, that would not automatically cancel the
 snapshots past that age; it would allow vacuum of any tuples which
 became dead that long ago, and would cause a snapshot too old
 message for any read of a page modified more than that long ago
 using a snapshot which was older than that.
 
 Unless this idea gets some +1 votes Real Soon Now, I will mark the
 patch as Rejected and move on.

I'm not against having a knob like this, which is defaulted to off,
but I do think we'd be a lot better off with a system that could
realize when rows are not visible to any currently running transaction
and clean them up.  If this knob's default is off then I don't think
we'd be likely to get the complaints which are being discussed (or, if
we did, we could point the individual at the admin who set the knob...).

Thanks!

Stephen




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Magnus Hagander
On Wed, Feb 18, 2015 at 10:57 PM, Kevin Grittner kgri...@ymail.com wrote:

 Magnus Hagander mag...@hagander.net wrote:
  On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
  But max_standby_streaming_delay, max_standby_archive_delay and
  hot_standby_feedback are among the most frequent triggers for
  questions and complaints that I/we see.
 
  Agreed.
  And a really bad one used to be vacuum_defer_cleanup_age, because
  of confusing units amongst other things. Which in turn seems
  fairly close to Kevin's suggestions, unfortunately.

 Particularly my initial suggestion, which was to base snapshot
 age on the number of transaction IDs assigned.  Does this look
 any better to you if it is something that can be set to '20min' or
 '1h'?  Just to restate, that would not automatically cancel the
 snapshots past that age; it would allow vacuum of any tuples which
 became dead that long ago, and would cause a snapshot too old
 message for any read of a page modified more than that long ago
 using a snapshot which was older than that.


Yes, it would definitely look much better. My reference above was to
exactly that - having a setting whose unit is a number of xids confused a
lot of users and made it really hard to tune. Having something in time
units is a lot easier to understand and tune for most people.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Kevin Grittner
Stephen Frost sfr...@snowman.net wrote:

 I also agree with the general idea that it makes sense to provide a way
 to control bloat, but I think you've missed what Andres was getting at
 with his suggestion (if I understand correctly, apologies if I don't).

 The problem is that we're only looking at the overall xmin / xmax
 horizon when it comes to deciding if a given tuple is dead.  That's
 not quite right- the tuple might actually be dead to all *current*
 transactions by being newer than the oldest transaction but dead for all
 later transactions.  Basically, there exist gaps between our cluster
 wide xmin / xmax where we might find actually dead rows.  Marking those
 rows dead and reusable *would* stop the bloat, not just slow it down.

 In the end, with a single long-running transaction, the worst bloat you
 would have is double the size of the system at the time the long-running
 transaction started.

I agree that limiting bloat to one dead tuple for every live one
for each old snapshot is a limit that has value, and it was unfair
of me to characterize that as not being a limit.  Sorry for that.

This possible solution was discussed with the user whose feedback
caused me to write this patch, but there were several reasons they
dismissed that as a viable total solution for them, two of which I
can share:

(1)  They have a pool of connections each of which can have several
long-running cursors, so the limit from that isn't just doubling
the size of their database, it is limiting it to some two-or-three
digit multiple of the necessary size.

(2)  They are already prepared to deal with snapshot too old
errors on queries that run more than about 20 minutes and which
access tables which are modified.  They would rather do that than
suffer the bloat beyond that point.
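
For what it's worth, the client-side handling such applications already
have might look something like the libpq sketch below: run the report,
and simply retry if it fails with a "snapshot too old" style error.  The
SQLSTATE tested here ("72000") is purely an assumption borrowed from
Oracle's ORA-01555 class, and the function name is invented; whatever
error code the patch ends up raising would be substituted.

/*
 * Hypothetical client-side retry loop for a "snapshot too old" failure,
 * using only standard libpq calls.  The SQLSTATE "72000" is an assumed
 * placeholder, not something this patch defines.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

static PGresult *
run_report_with_retry(PGconn *conn, const char *sql, int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        PGresult   *res = PQexec(conn, sql);
        const char *sqlstate;

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            return res;                 /* report succeeded */

        sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
        if (sqlstate != NULL && strcmp(sqlstate, "72000") == 0)
        {
            fprintf(stderr, "snapshot too old; retrying (attempt %d)\n",
                    attempt);
            PQclear(res);
            continue;                   /* start the report over */
        }

        return res;                 /* some other error: let caller handle */
    }
    return NULL;                    /* gave up after max_attempts */
}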

IMO all of these changes people are working are very valuable, and
complement each other.  This particular approach is likely to be
especially appealing to those moving from Oracle because it is a
familiar mechanism, and one which some of them have written their
software to be aware of and deal with gracefully.

 I'm not against having a knob like this, which is defaulted to off,

Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
sound like a -1.  :-)

 but I do think we'd be a lot better off with a system that could
 realize when rows are not visible to any currently running transaction
 and clean them up.

+1

But they are not mutually exclusive; I see them as complementary.

 If this knob's default is off then I don't think
 we'd be likely to get the complaints which are being discussed (or, if
 we did, we could point the individual at the admin who set the knob...).

That's how I see it, too.  I would not suggest making the default
anything other than off, but there are situations where it would
be a nice tool to have in the toolbox.

Thanks for the feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Greg Stark
On Sun, Feb 15, 2015 at 8:27 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 There might be something in that, but again it's not much like this patch.
 The key point I think we're both making is that nondeterministic failures
 are bad, especially when you're talking about long-running, expensive-to-
 retry transactions.


But I think Kevin would agree with you there. That's why he's interested in
having the errors not occur if you don't read from the volatile tables. Ie,
your reporting query reading from read-only tables would be guaranteed to
succeed even if some other table had had some rows vacuumed away.

I'm not sure that's really going to work because of things like hint bit
updates or hot pruning. That is, say your table is read-only now but last
week some of the records were updated. You might reasonably expect those
rows to be safe and indeed the rows themselves will still be in your
report. But the earlier versions of the rows could still be sitting on some
pages, invisible to every query but not quite vacuumable yet, and then
suddenly they get vacuumed away and the LSN on the page gets bumped.

Fwiw I would strongly suggest having a time difference instead of a number
of transactions. I haven't been following the
commit timestamp thread but I'm hoping that patch has the infrastructure
needed to look up an lsn and find out the timestamp and vice versa. So when
you take a snapshot you could look up the effective cut-off LSN based on
the time difference specified in the GUC. As a DBA I would find it
infinitely more comforting specifying old_snapshot_threshold=4h than
specifying some arbitrary large number of transactions and hoping I
calculated the transaction rate reasonably accurately.
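
One rough way such a time-to-LSN lookup might be kept is sketched below
in C with invented names: a ring of per-minute LSNs.  Real code would
live in shared memory under a lock and would fall back to an older slot
when the exact minute is missing; this single-process toy only shows the
shape of the idea.

/*
 * Toy time->LSN map: one slot per minute remembers the newest LSN seen
 * in that minute, so the cutoff for old_snapshot_threshold='4h' is the
 * LSN recorded 240 minutes ago.  All names here are invented for the
 * illustration.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define RING_MINUTES (60 * 24)          /* keep one day of history */

typedef uint64_t lsn_t;

static lsn_t   ring_lsn[RING_MINUTES];
static time_t  ring_minute[RING_MINUTES];

/* call whenever a new LSN is observed, e.g. when building a snapshot */
static void
remember_lsn(time_t now, lsn_t lsn)
{
    time_t  minute = now / 60;
    int     slot = minute % RING_MINUTES;

    if (ring_minute[slot] != minute || lsn > ring_lsn[slot])
    {
        ring_minute[slot] = minute;
        ring_lsn[slot] = lsn;
    }
}

/*
 * LSN that was current 'threshold_minutes' ago; 0 means "don't know"
 * (a real implementation would fall back to an older slot).
 */
static lsn_t
cutoff_lsn(time_t now, int threshold_minutes)
{
    time_t  target = now / 60 - threshold_minutes;
    int     slot = target % RING_MINUTES;

    return (ring_minute[slot] == target) ? ring_lsn[slot] : 0;
}

int
main(void)
{
    time_t  now = time(NULL);

    remember_lsn(now - 300, 1000);      /* an LSN from five minutes ago */
    remember_lsn(now, 2000);            /* the current LSN */

    printf("cutoff for a 5 minute threshold: %llu\n",
           (unsigned long long) cutoff_lsn(now, 5));
    return 0;
}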


-- 
greg


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-18 Thread Stephen Frost
* Kevin Grittner (kgri...@ymail.com) wrote:
 Stephen Frost sfr...@snowman.net wrote:
  I also agree with the general idea that it makes sense to provide a way
  to control bloat, but I think you've missed what Andres was getting at
  with his suggestion (if I understand correctly, apologies if I don't).
 
  The problem is that we're only looking at the overall xmin / xmax
  horizon when it comes to deciding if a given tuple is dead.  That's
  not quite right- the tuple might actually be dead to all *current*
  transactions by being newer than the oldest transaction but dead for all
  later transactions.  Basically, there exist gaps between our cluster
  wide xmin / xmax where we might find actually dead rows.  Marking those
  rows dead and reusable *would* stop the bloat, not just slow it down.
 
  In the end, with a single long-running transaction, the worst bloat you
  would have is double the size of the system at the time the long-running
  transaction started.
 
 I agree that limiting bloat to one dead tuple for every live one
 for each old snapshot is a limit that has value, and it was unfair
 of me to characterize that as not being a limit.  Sorry for that.
 
 This possible solution was discussed with the user whose feedback
 caused me to write this patch, but there were several reasons they
 dismissed that as a viable total solution for them, two of which I
 can share:
 
 (1)  They have a pool of connections each of which can have several
 long-running cursors, so the limit from that isn't just doubling
 the size of their database, it is limiting it to some two-or-three
 digit multiple of the necessary size.

This strikes me as a bit off-the-cuff; was an analysis done which
determined that would be the result?  If there is overlap between the
long-running cursors then there would be less bloat, and most systems
which I'm familiar with don't turn the entire database over in 20
minutes, 20 hours, or even 20 days except in pretty specific cases.
Perhaps this is one of those, and if so then I'm all wet, but the
feeling I get is that this is a way to dismiss this solution because
it's not what's wanted, which is what Oracle did.

 (2)  They are already prepared to deal with snapshot too old
 errors on queries that run more than about 20 minutes and which
 access tables which are modified.  They would rather do that than
 suffer the bloat beyond that point.

That, really, is the crux here- they've already got support for dealing
with it the way Oracle did and they'd like PG to do that too.
Unfortunately, that, by itself, isn't a good reason for a particular
capability (we certainly aren't going to be trying to duplicate PL/SQL
in PG any time soon).  That said, there are capabilities in other
RDBMS's which are valuable and which we *do* want, so the fact that
Oracle does this also isn't a reason to not include it.

 IMO all of these changes people are working are very valuable, and
 complement each other.  This particular approach is likely to be
 especially appealing to those moving from Oracle because it is a
 familiar mechanism, and one which some of them have written their
 software to be aware of and deal with gracefully.

For my 2c, I'd much rather provide them with a system where they don't
have to deal with broken snapshots than give them a way to have them the
way Oracle provided them. :)  That said, even the approach Andres
outlined will cause bloat and it may be beyond what's acceptable in some
environments, and it's certainly more complicated and unlikely to get
done in the short term.

  I'm not against having a knob like this, which is defaulted to off,
 
 Thanks!  I'm not sure that amounts to a +1, but at least it doesn't
 sound like a -1.  :-)

So, at the time I wrote that, I wasn't sure if it was a +1 or not
myself.  I've been thinking about it since then, however, and I'm
leaning more towards having the capability than not, so perhaps that's a
+1.  That doesn't excuse the need to come up with an implementation
that everyone can be happy with, though, and what you've come up with so
far doesn't have a lot of appeal, based on the feedback (I've only
glanced through it myself, but I agree with Andres and Tom that it's a
larger footprint than we'd want for this, and the number of places that
have to be touched is concerning as it could lead to future bugs).

A lot of that would go away if there was a way to avoid having to mess
with the index AMs, I'd think, but I wonder if we'd actually need more
done there - it's not immediately obvious to me how an index-only scan is
safe with this.  Whenever an actual page is visited, we can check the
LSN, and the relation can't be truncated by vacuum since the transaction
will still have a lock on the table which prevents it, but does the
visibility-map update check make sure to never mark pages all-visible
when one of these old transactions is running around?  On what basis?

  but I do think we'd be a lot better off with a system that could
  realize when rows 

Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Bruce Momjian
On Sun, Feb 15, 2015 at 11:36:50AM -0500, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.  That would be far
 easier for users to understand and manage, it would be trivial to allow
 specific high-value transactions to run with a laxer setting, it does not
 create any implementation-detail-dependent behavior that we'd be having to
 explain to users forevermore, and (not incidentally) it would be a lot
 simpler and more trustworthy to implement.  There's no well-defined
 correlation between your setting and the net effect on database bloat,
 but that's true with the snapshot too old approach as well.

While we have prided ourselves on not generating query cancel messages
due to snapshot-too-old, there is the downside of bloat.  I understand
the need to allow users to make the trade-off between bloat and query
cancellation.

It seems we already have a mechanism in place that allows tuning of
query cancel on standbys vs. preventing standby queries from seeing old
data, specifically
max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
about how users were going to react to these odd variables, but there
has been little negative feedback.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Jim Nasby

On 2/16/15 2:41 AM, Andres Freund wrote:

Since you actually don't want cancellations for the long running
reporting queries it very much might be sensible to switch to a more
complicated version of HeapTupleSatisfiesVacuum() if there's longrunning
queries. One that can recognize if rows are actually visible to any
backend, or if there are just snapshots that see older rows. I've
previously wondered how hard this would be, but I don't think it's
*that*  hard. And it'd be a much more general solution.


Josh Berkus mentioned on IRC that they're working on improving vacuum, 
and I think it was something along these lines.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Andres Freund
On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
 It seems we already have a mechanism in place that allows tuning of
 query cancel on standbys vs. preventing standby queries from seeing old
 data, specifically
 max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
 about how users were going to react to these odd variables, but there
 has been little negative feedback.

FWIW, I think that's a somewhat skewed perception. I think it was right to
introduce those, because we didn't really have any alternatives.

But max_standby_streaming_delay, max_standby_archive_delay and
hot_standby_feedback are among the most frequent triggers for questions
and complaints that I/we see.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Magnus Hagander
On Feb 17, 2015 12:26 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2015-02-16 16:35:46 -0500, Bruce Momjian wrote:
  It seems we already have a mechanism in place that allows tuning of
  query cancel on standbys vs. preventing standby queries from seeing old
  data, specifically
  max_standby_streaming_delay/max_standby_archive_delay.  We obsessed
  about how users were going to react to these odd variables, but there
  has been little negative feedback.

 FWIW, I think that's a somewhat skewed perception. I think it was right to
 introduce those, because we didn't really have any alternatives.

 But max_standby_streaming_delay, max_standby_archive_delay and
 hot_standby_feedback are among the most frequent triggers for questions
 and complaints that I/we see.


Agreed.

And a really bad one used to be vacuum_defer_cleanup_age, because of
confusing units amongst other things. Which in turn seems fairly close to
Kevin's suggestions, unfortunately.

/Magnus


Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-16 Thread Andres Freund
On 2015-02-16 05:19:11 +, Kevin Grittner wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
  Jim Nasby jim.na...@bluetreble.com writes:
  On 2/15/15 10:36 AM, Tom Lane wrote:
  I wonder if we couldn't achieve largely the same positive effects through
  adding a simple transaction-level timeout option.
 
 We suggested this to our customer and got out of the meeting with
 it looking like it *might* fly.  In the next meeting, however, they
 said they had run it by others and reviewed the code and it was
 completely unacceptable -- they would not consider pg with this as
 the solution.
 
  A common use-case is long-running reports hitting relatively stable data
  in a database that also has tables with a high churn rate (ie: queue
  tables). In those scenarios your only options right now are to suffer
  huge amounts of bloat in the high-churn tables or not do your reporting. A
  simple transaction timeout only solves this by denying you reporting
  queries.
 
  Agreed, but Kevin's proposal has exactly the same problem only worse,
  because (a) the reporting transactions might or might not fail (per
  Murphy, they'll fail consistently only when you're on deadline), and
  (b) if they do fail, there's nothing you can do short of increasing the
  slack db-wide.
 
 These they were comfortable with, and did *not* consider to be
 unpredictable or something they could not do something about.
 I really don't feel I can say more than that, though, without
 disclosing more than I should.

I agree that we need to do something about the dangers of long
snapshots, I'm not sure this is it. I'm not sure of the contrary either.

What I'm definitely not a fan of though, is this implementation. Having
to add checks to a large number of places is a recipe for silently wrong
query results.


One thing I was wondering about recently was introducing an optimization
for repeatedly updated rows into the vacuum code: A row that has xmin =
xmax where these have committed can be removed, even if the xid is above
the xmin horizon - no other transaction is ever going to see it. While
there's some hairiness in implementing that, it doesn't seem too hard. And
there's a fair number of cases where that'd allow less bloat to
accumulate.  Obviously it'd be better if we had logic to do that for
other patterns as well (where the updates aren't in the same xact), but
it seems like a good start.
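
A minimal sketch of that check, with made-up types standing in for the
real tuple header and clog lookups (the real test would sit inside
HeapTupleSatisfiesVacuum):

/*
 * Toy form of the xmin == xmax shortcut: if the same committed
 * transaction both created and deleted a row, no other transaction's
 * snapshot can ever see it, so it is removable regardless of the
 * global xmin horizon.  Types and field names are invented for the
 * illustration.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int xid_t;

typedef struct ToyTuple
{
    xid_t   xmin;               /* inserting transaction */
    xid_t   xmax;               /* deleting transaction, 0 if live */
    bool    xmin_committed;
    bool    xmax_committed;
} ToyTuple;

static bool
removable_despite_old_snapshots(const ToyTuple *tup)
{
    return tup->xmax != 0 &&
           tup->xmin == tup->xmax &&
           tup->xmin_committed &&
           tup->xmax_committed;
}

int
main(void)
{
    ToyTuple    t = {5000, 5000, true, true};

    printf("removable: %d\n", removable_despite_old_snapshots(&t));
    return 0;
}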



  There might be something in that, but again it's not much like this patch.
  The key point I think we're both making is that nondeterministic failures
  are bad, especially when you're talking about long-running, expensive-to-
  retry transactions.
 
 What the customer most doesn't want to be nondeterministic is
 whether the error is generated only when the snapshot has been used
 to read a page which has been modified since the snapshot was
 taken.  If tables or materialized views are set up before a query
 and then not subsequently modified during the execution of the
 query, that query must not be canceled even if it runs for days,
 but it must not cause unconstrained bloat during the run.  So far I
 don't see any way to avoid including the LSN with the snapshot or
 modifying the index AMs.  Let's be clear on the footprint for that;
 for the btree implementation it is:

IIUC you never would want cancellations when accessing the tables
these longrunning backends actually access. The errors about too old
snapshots are just a stopgap because we can't compute the xmin per table
in a meaningful way atm. Right?

Is the bloat caused by rows these transactions actually can see or are
the not-removed rows newer than the transaction's xmax?

Since you actually don't want cancellations for the long running
reporting queries it very much might be sensible to switch to a more
complicated version of HeapTupleSatisfiesVacuum() if there's longrunning
queries. One that can recognize if rows are actually visible to any
backend, or if there are just snapshots that see older rows. I've
previously wondered how hard this would be, but I don't think it's
*that* hard. And it'd be a much more general solution.

Greetings,

Andres Freund




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 What this patch does is add a GUC called old_snapshot_threshold.  It
 defaults to -1, which leaves behavior matching unpatched code.
 Above that it allows tuples to be vacuumed away after the number of
 transaction IDs specified by the GUC have been consumed.

 TBH, I'm not sure why we'd wish to emulate Oracle's single worst
 operational feature.

 I've run into cases where people have suffered horribly bloated
 databases because of one ill-behaved connection.  Some companies
 don't want to be vulnerable to that and the disruption that
 recovery from that bloat causes.

No doubt, preventing bloat is a good thing, but that doesn't mean this
is the best API we could create for the issue.  The proposition this
patch offers to DBAs is: You can turn this knob to reduce bloat by some
hard-to-quantify factor.  The cost is that some long-running transactions
might fail.  You won't know which ones are at risk, the failures won't be
the same from time to time, and you won't be able to do anything to spare
high-value transactions from that fate except by turning that knob back
again globally (which requires a database restart).  Maybe refugees from
Oracle will think that sounds good, but nobody else will.

I wonder if we couldn't achieve largely the same positive effects through
adding a simple transaction-level timeout option.  That would be far
easier for users to understand and manage, it would be trivial to allow
specific high-value transactions to run with a laxer setting, it does not
create any implementation-detail-dependent behavior that we'd be having to
explain to users forevermore, and (not incidentally) it would be a lot
simpler and more trustworthy to implement.  There's no well-defined
correlation between your setting and the net effect on database bloat,
but that's true with the snapshot too old approach as well.

regards, tom lane




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 2/15/15 10:36 AM, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.

 A common use-case is long-running reports hitting relatively stable data 
 in a database that also has tables with a high churn rate (ie: queue 
 tables). In those scenarios your only options right now are to suffer 
 huge amounts of bloat in the high-churn tables or not do your reporting. A 
 simple transaction timeout only solves this by denying you reporting 
 queries.

Agreed, but Kevin's proposal has exactly the same problem only worse,
because (a) the reporting transactions might or might not fail (per
Murphy, they'll fail consistently only when you're on deadline), and
(b) if they do fail, there's nothing you can do short of increasing the
slack db-wide.

 An idea that I've had on this would be some way to lock down the 
 tables that a long-running transaction could access. That would allow 
 vacuum to ignore any snapshots that transaction had for tables it wasn't 
 accessing. That's something that would be deterministic.

There might be something in that, but again it's not much like this patch.
The key point I think we're both making is that nondeterministic failures
are bad, especially when you're talking about long-running, expensive-to-
retry transactions.

regards, tom lane




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Jim Nasby

On 2/15/15 10:36 AM, Tom Lane wrote:

Kevin Grittner kgri...@ymail.com writes:

Tom Lane t...@sss.pgh.pa.us wrote:

Kevin Grittner kgri...@ymail.com writes:

What this patch does is add a GUC called old_snapshot_threshold.  It
defaults to -1, which leaves behavior matching unpatched code.
Above that it allows tuples to be vacuumed away after the number of
transaction IDs specified by the GUC have been consumed.



TBH, I'm not sure why we'd wish to emulate Oracle's single worst
operational feature.



I've run into cases where people have suffered horribly bloated
databases because of one ill-behaved connection.  Some companies
don't want to be vulnerable to that and the disruption that
recovery from that bloat causes.


No doubt, preventing bloat is a good thing, but that doesn't mean this
is the best API we could create for the issue.  The proposition this
patch offers to DBAs is: You can turn this knob to reduce bloat by some
hard-to-quantify factor.  The cost is that some long-running transactions
might fail.  You won't know which ones are at risk, the failures won't be
the same from time to time, and you won't be able to do anything to spare
high-value transactions from that fate except by turning that knob back
again globally (which requires a database restart).  Maybe refugees from
Oracle will think that sounds good, but nobody else will.

I wonder if we couldn't achieve largely the same positive effects through
adding a simple transaction-level timeout option.  That would be far
easier for users to understand and manage, it would be trivial to allow
specific high-value transactions to run with a laxer setting, it does not
create any implementation-detail-dependent behavior that we'd be having to
explain to users forevermore, and (not incidentally) it would be a lot
simpler and more trustworthy to implement.  There's no well-defined
correlation between your setting and the net effect on database bloat,
but that's true with the snapshot too old approach as well.


A common use-case is long-running reports hitting relatively stable data 
in a database that also has tables with a high churn rate (ie: queue 
tables). In those scenarios your only options right now are to suffer 
huge amounts of bloat in the high-churn tables or not do your reporting. A 
simple transaction timeout only solves this by denying you reporting 
queries.


An idea that I've had on this would be some way to lock down the 
tables that a long-running transaction could access. That would allow 
vacuum to ignore any snapshots that transaction had for tables it wasn't 
accessing. That's something that would be deterministic.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-15 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Jim Nasby jim.na...@bluetreble.com writes:
 On 2/15/15 10:36 AM, Tom Lane wrote:
 I wonder if we couldn't achieve largely the same positive effects through
 adding a simple transaction-level timeout option.

We suggested this to our customer and got out of the meeting with
it looking like it *might* fly.  In the next meeting, however, they
said they had run it by others and reviewed the code and it was
completely unacceptable -- they would not consider pg with this as
the solution.

 A common use-case is long-running reports hitting relatively stable data
 in a database that also has tables with a high churn rate (ie: queue
 tables). In those scenarios your only options right now are to suffer
 huge amounts of bloat in the high-churn tables or not do your reporting. A
 simple transaction timeout only solves this by denying you reporting
 queries.

 Agreed, but Kevin's proposal has exactly the same problem only worse,
 because (a) the reporting transactions might or might not fail (per
 Murphy, they'll fail consistently only when you're on deadline), and
 (b) if they do fail, there's nothing you can do short of increasing the
 slack db-wide.

These they were comfortable with, and did *not* consider to be
unpredictable or something they could not do something about.
I really don't feel I can say more than that, though, without
disclosing more than I should.

 An idea that I've had on this would be some way to lock down the
 tables that a long-running transaction could access. That would allow
 vacuum to ignore any snapshots that transaction had for tables it wasn't
 accessing. That's something that would be deterministic.

While this option was not specifically suggested, based on their
their reasons that numerous other options we suggested were
unacceptable, I feel sure that this would not be acceptable.

I think Tom hit the nail on the head when he said Maybe refugees
from Oracle will think that sounds good...  It is precisely those
with very large code bases which have been modified over long
periods of time to work with Oracle that would find this solution
attractive, or perhaps *necessary* to consider a move to Postgres.
That's a potential market we don't care to write off.

 There might be something in that, but again it's not much like this patch.
 The key point I think we're both making is that nondeterministic failures
 are bad, especially when you're talking about long-running, expensive-to-
 retry transactions.

What the customer most doesn't want to be nondeterministic is
whether the error is generated only when the snapshot has been used
to read a page which has been modified since the snapshot was
taken.  If tables or materialized views are set up before a query
and then not subsequently modified during the execution of the
query, that query must not be canceled even if it runs for days,
but it must not cause unconstrained bloat during the run.  So far I
don't see any way to avoid including the LSN with the snapshot or
modifying the index AMs.  Let's be clear on the footprint for that;
for the btree implementation it is:

 src/backend/access/nbtree/nbtinsert.c |  7 ---
 src/backend/access/nbtree/nbtpage.c   |  2 +-
 src/backend/access/nbtree/nbtsearch.c | 43 ++-
 src/include/access/nbtree.h   |  7 ---
 4 files changed, 43 insertions(+), 16 deletions(-)

What this discussion has made me reconsider is the metric for
considering a transaction too old.  The number of transaction IDs
consumed seems inferior as the unit of measure for that to LSN or
time.

It looks to me to be pretty trivial (on the order of maybe 30 lines
of code) to specify this GUC in minutes rather than transaction
IDs.  At first glance this seems like it would be vulnerable to the
usual complaints about mismanaged clocks, but those are easily
answered by using a cached time in shared memory that we populate
in such a way that it can never move backward.  Done right, this
could even allow the GUC to be changeable on reload rather than
only at restart.  A badly mismanaged system clock could not cause a
query to generate incorrect results; the worst that could happen is
that this feature would fail to control bloat as much as expected
or reads of modified data could generate the snapshot too old
error around the time of the clock adjustment.
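
A minimal sketch of that never-moves-backward clock follows; in real
code the cached value would live in shared memory under a spinlock, but
a plain static is enough to show the idea, and the function name is
invented.

#include <stdio.h>
#include <time.h>

static time_t   cached_now;     /* imagine this in shared memory */

static time_t
monotonic_now(void)
{
    time_t  sys = time(NULL);

    if (sys > cached_now)
        cached_now = sys;       /* only ever advance */

    return cached_now;          /* a clock step backward is ignored */
}

int
main(void)
{
    printf("%ld\n", (long) monotonic_now());
    return 0;
}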

As before, this would default to a magic value to mean that you
want the historical PostgreSQL behavior.

If that makes the idea more palatable to anyone, I can submit a
patch to that effect within the next 24 hours.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-14 Thread Tom Lane
Kevin Grittner kgri...@ymail.com writes:
 What this patch does is add a GUC called old_snapshot_threshold.  It
 defaults to -1, which leaves behavior matching unpatched code.
 Above that it allows tuples to be vacuumed away after the number of
 transaction IDs specified by the GUC have been consumed.

TBH, I'm not sure why we'd wish to emulate Oracle's single worst
operational feature.

 Unlike the other patch, this one is more at the proof of concept
 phase, because it requires support in the heap and each index AM to
 work correctly; so far I have only had time to cover the heap and
 btree indexes.

But, having said that, why would the index AMs care?  Seems like what
you are describing should be strictly a matter for VACUUM's removal
rules.  If we're going to have something as ugly as this, I would much
rather it had a very small code footprint.

regards, tom lane




Re: [HACKERS] Allow snapshot too old error, to prevent bloat

2015-02-14 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 What this patch does is add a GUC called old_snapshot_threshold.  It
 defaults to -1, which leaves behavior matching unpatched code.
 Above that it allows tuples to be vacuumed away after the number of
 transaction IDs specified by the GUC have been consumed.

 TBH, I'm not sure why we'd wish to emulate Oracle's single worst
 operational feature.

I've run into cases where people have suffered horribly bloated
databases because of one ill-behaved connection.  Some companies
don't want to be vulnerable to that and the disruption that
recovery from that bloat causes.

Note that the default is to maintain legacy PostgreSQL behavior; 
the postgresql.conf file must be modified to see this error.

 Unlike the other patch, this one is more at the proof of concept
 phase, because it requires support in the heap and each index AM to
 work correctly; so far I have only had time to cover the heap and
 btree indexes.

 But, having said that, why would the index AMs care?  Seems like what
 you are describing should be strictly a matter for VACUUM's removal
 rules.  If we're going to have something as ugly as this, I would much
 rather it had a very small code footprint.

If a tuple is vacuumed away, you would not notice that if you were
running an index scan because you would not visit the page it used
to be on to see that the LSN had changed.  We tried to sell the
idea that *any* scan using a snapshot past the threshold should
error out (which is the only way I can see to avoid making the
index AMs aware of this), but they insisted that there was no
reason to have scans fail when reading an unmodified table using an
old snapshot, or even an unmodified page of a modified table.
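
In toy form, the check the heap and index scan code would need amounts
to something like this, with made-up types rather than the real Snapshot
and page structures; only the shape of the test matters here.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint64_t lsn_t;

typedef struct ToySnapshot
{
    bool    past_threshold;     /* older than old_snapshot_threshold? */
    lsn_t   cutoff_lsn;         /* newest page LSN it may safely read */
} ToySnapshot;

/* called wherever a page is read on behalf of this snapshot */
static void
check_page_age(const ToySnapshot *snap, lsn_t page_lsn)
{
    if (snap->past_threshold && page_lsn > snap->cutoff_lsn)
    {
        fprintf(stderr, "ERROR: snapshot too old\n");
        exit(EXIT_FAILURE);
    }
    /* unmodified pages (old LSNs) are always fine, however old the snapshot */
}

int
main(void)
{
    ToySnapshot old_snap = {true, 1000};

    check_page_age(&old_snap, 900);     /* page unchanged since cutoff: ok */
    check_page_age(&old_snap, 1500);    /* recently modified page: errors */
    return 0;
}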

Due to the cost to this company of modifying their software to deal
with different behavior, they will not move without the behavior
I'm shooting for with this patch.  Due to the value of this
customer to us, we will put this (or something to accomplish this
behavior) into our fork.  We're happy to share it with the
community.  If the community does not feel they need this behavior,
or wants to generate the error more aggressively to avoid touching
the index AMs, I understand.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

