Re: snapshot too old issues, first around wraparound and then more.

2021-06-22 Thread Greg Stark
On Thu, 17 Jun 2021 at 23:49, Noah Misch wrote: > > On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote: > > I agree that's a great use-case. I don't like this implementation though. > > I think if you want to set things up like that, you should draw a line > > between the tables it's okay

Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Thomas Munro
On Wed, Apr 15, 2020 at 2:21 PM Thomas Munro wrote: > On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro wrote: > > On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote: > > > I'm thinking of a version of "snapshot too old" that amounts to a > > > statement timeout that gets applied for xmin horizon

Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Noah Misch
On Wed, Jun 16, 2021 at 12:00:57PM -0400, Tom Lane wrote: > Greg Stark writes: > > I think Andres's point earlier is the one that stands out the most for me: > > > > > I still think that's the most reasonable course. I actually like the > > > feature, but I don't think a better implementation of

Re: snapshot too old issues, first around wraparound and then more.

2021-06-17 Thread Stephen Frost
Greetings, * Peter Geoghegan (p...@bowt.ie) wrote: > On Wed, Jun 16, 2021 at 12:06 PM Andres Freund wrote: > > > I would think that it wouldn't really matter inside VACUUM -- it would > > > only really need to be either an opportunistic pruning or an > > > opportunistic index deletion thing --

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Peter Geoghegan
On Wed, Jun 16, 2021 at 12:06 PM Andres Freund wrote: > > I would think that it wouldn't really matter inside VACUUM -- it would > > only really need to be either an opportunistic pruning or an > > opportunistic index deletion thing -- probably both. Most of the time > > VACUUM doesn't seem to

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Peter Geoghegan
On Wed, Jun 16, 2021 at 11:27 AM Andres Freund wrote: > 2) Modeling when it is safe to remove row versions. It is easy to remove > a tuple that was inserted and deleted within one "not needed" xid > range, but it's far less obvious when it is safe to remove row > versions where

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Andres Freund
Hi, On 2021-06-16 10:44:49 -0700, Peter Geoghegan wrote: > On Wed, Jun 16, 2021 at 10:04 AM Tom Lane wrote: > > Of course, there's still the question of how VACUUM could cheaply > > apply such info to decide what could be purged. > I would think that it wouldn't really matter inside VACUUM --

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Peter Geoghegan
On Wed, Jun 16, 2021 at 11:06 AM Andres Freund wrote: > > 2) (a) Some hackers want the feature gone so they can implement changes > > without making those changes cooperate with this feature. (b) Bugs in this > > feature make such cooperation materially harder. > > I think the a) part

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Andres Freund
Hi, On 2021-06-16 13:04:07 -0400, Tom Lane wrote: > Yeah, I think this scenario of a few transactions with old snapshots > and the rest with very new ones could be improved greatly if we exposed > more info about backends' snapshot state than just "oldest xmin". But > that might be expensive to

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Andres Freund
Hi, On 2021-06-15 21:59:45 -0700, Noah Misch wrote: > Hackers are rather wise, but the variety of PostgreSQL use is enormous. We > see that, among other ways, when regression fixes spike in each vN.1. The > $SUBJECT feature was born in response to a user experience; a lack of hacker > interest

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Peter Geoghegan
On Wed, Jun 16, 2021 at 10:04 AM Tom Lane wrote: > I remember that Heikki was fooling with a patch that reduced snapshots > to LSNs. If we got that done, it'd be practical to expose complete > info about backends' snapshot state in a lot of cases (i.e., anytime > you had less than N live

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Tom Lane
Stephen Frost writes: > I've long felt that the appropriate approach to addressing that is to > improve on VACUUM and find a way to do better than just having the > conditional of 'xmax < global min' drive if we can mark a given tuple as > no longer visible to anyone. Yeah, I think this scenario

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Stephen Frost
Greetings, * Greg Stark (st...@mit.edu) wrote: > I think Andres's point earlier is the one that stands out the most for me: > > > I still think that's the most reasonable course. I actually like the > > feature, but I don't think a better implementation of it would share > > much if any of the

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Tom Lane
Greg Stark writes: > Fwiw I too think the basic idea of the feature is actually awesome. > There are tons of use cases where you might have one long-lived > transaction working on a dedicated table (or even a schema) that will > never look at the rapidly mutating tables in another schema and

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Greg Stark
I think Andres's point earlier is the one that stands out the most for me: > I still think that's the most reasonable course. I actually like the > feature, but I don't think a better implementation of it would share > much if any of the current infrastructure. That makes me wonder whether

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 11:24 PM Noah Misch wrote: > When I say "some hackers", I don't mean that specific people think such > thoughts right now. I'm saying that the expected cost of future cooperation > with the feature is nonzero, and bugs in the feature raise that cost. I see. > > > A

Re: snapshot too old issues, first around wraparound and then more.

2021-06-16 Thread Noah Misch
On Tue, Jun 15, 2021 at 10:47:45PM -0700, Peter Geoghegan wrote: > On Tue, Jun 15, 2021 at 9:59 PM Noah Misch wrote: > > Hackers are rather wise, but the variety of PostgreSQL use is enormous. We > > see that, among other ways, when regression fixes spike in each vN.1. The > > $SUBJECT feature

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 9:59 PM Noah Misch wrote: > Hackers are rather wise, but the variety of PostgreSQL use is enormous. We > see that, among other ways, when regression fixes spike in each vN.1. The > $SUBJECT feature was born in response to a user experience; a lack of hacker > interest

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Noah Misch
On Tue, Jun 15, 2021 at 02:32:11PM -0700, Peter Geoghegan wrote: > What I had in mind was this: a committer adopting the feature > themselves. The committer would be morally obligated to maintain the > feature on an ongoing basis, just as if they were the original > committer. This seems like the

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Thomas Munro
On Wed, Jun 16, 2021 at 7:17 AM Robert Haas wrote: > Progress has been pretty limited, but not altogether nonexistent. > 55b7e2f4d78d8aa7b4a5eae9a0a810601d03c563 fixed, or at least seemed to > fix, the time->XID mapping, which is one of the main things that > Andres said was broken originally.

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Andres Freund
Hi, On 2021-06-15 15:17:05 -0400, Robert Haas wrote: > But that's not clear to me. I'm not clear exactly how many > problems we know about and need to fix in order to keep the feature, > and I'm also not clear how deep the hole goes. Like, if we need to get > a certain number of specific bugs

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Tom Lane
Peter Geoghegan writes: > What I had in mind was this: a committer adopting the feature > themselves. The committer would be morally obligated to maintain the > feature on an ongoing basis, just as if they were the original > committer. This seems like the only sensible way of resolving this >

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 4:59 PM Andres Freund wrote: > The primary issue here is that there is no TestForOldSnapshot() in > heap_hot_search_buffer(). Therefore index fetches will never even try to > detect that tuples it needs actually have already been pruned away. This is still true, right?

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 12:17 PM Robert Haas wrote: > My general point here is that I would like to know whether we have a > finite number of reasonably localized bugs or a three-ring disaster > that is unrecoverable no matter what we do. Andres seems to think it > is the latter, and I *think*

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 12:49 PM Tom Lane wrote: > Robert Haas writes: > > My general point here is that I would like to know whether we have a > > finite number of reasonably localized bugs or a three-ring disaster > > that is unrecoverable no matter what we do. Andres seems to think it > > is

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Tom Lane
Robert Haas writes: > My general point here is that I would like to know whether we have a > finite number of reasonably localized bugs or a three-ring disaster > that is unrecoverable no matter what we do. Andres seems to think it > is the latter, and I *think* Peter Geoghegan agrees, but I

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Robert Haas
On Tue, Jun 15, 2021 at 12:51 PM Tom Lane wrote: > So, it's well over a year later, and so far as I can see exactly > nothing has been done about snapshot_too_old's problems. Progress has been pretty limited, but not altogether nonexistent. 55b7e2f4d78d8aa7b4a5eae9a0a810601d03c563 fixed, or at

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 11:01 AM Tom Lane wrote: > The goal I have in mind is for snapshot_too_old to be fixed or gone > in v15. I don't feel a need to force the issue sooner than that, so > there's plenty of time for someone to step up, if anyone wishes to. Seems more than reasonable to me. A

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Tom Lane
Peter Geoghegan writes: > On Tue, Jun 15, 2021 at 9:51 AM Tom Lane wrote: >> So, it's well over a year later, and so far as I can see exactly >> nothing has been done about snapshot_too_old's problems. > I propose that the revert question be explicitly timeboxed. If the > issues haven't been

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Peter Geoghegan
On Tue, Jun 15, 2021 at 9:51 AM Tom Lane wrote: > So, it's well over a year later, and so far as I can see exactly > nothing has been done about snapshot_too_old's problems. FWIW I think that the concept itself is basically reasonable. The implementation is very flawed, though, so it hardly

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Andres Freund
Hi, On 2021-06-15 12:51:28 -0400, Tom Lane wrote: > Robert Haas writes: > > Oh, maybe I'm the one who misunderstood... > > So, it's well over a year later, and so far as I can see exactly > nothing has been done about snapshot_too_old's problems. > > I never liked that feature to begin with,

Re: snapshot too old issues, first around wraparound and then more.

2021-06-15 Thread Tom Lane
Robert Haas writes: > Oh, maybe I'm the one who misunderstood... So, it's well over a year later, and so far as I can see exactly nothing has been done about snapshot_too_old's problems. I never liked that feature to begin with, and I would be very glad to undertake the task of ripping it out.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-18 Thread Robert Haas
On Fri, Apr 17, 2020 at 4:40 PM Thomas Munro wrote: > I understood that you'd forked a new thread to discuss one particular > problem among the many that Andres nailed to the door, namely the xid > map's failure to be monotonic, and here I was responding to other > things from his list, namely

Re: snapshot too old issues, first around wraparound and then more.

2020-04-17 Thread Thomas Munro
On Sat, Apr 18, 2020 at 12:19 AM Robert Haas wrote: > On Thu, Apr 16, 2020 at 11:37 PM Thomas Munro wrote: > > Then of course frozenXID can be advanced with eg update pg_database > > set datallowconn = 't' where datname = 'template0', then vacuumdb > > --freeze --all, and checked before and

Re: snapshot too old issues, first around wraparound and then more.

2020-04-17 Thread Robert Haas
On Thu, Apr 16, 2020 at 11:37 PM Thomas Munro wrote: > Then of course frozenXID can be advanced with eg update pg_database > set datallowconn = 't' where datname = 'template0', then vacuumdb > --freeze --all, and checked before and after with Robert's > pg_old_snapshot_time_mapping() SRF to see

Re: snapshot too old issues, first around wraparound and then more.

2020-04-16 Thread Thomas Munro
On Fri, Apr 17, 2020 at 3:37 PM Thomas Munro wrote: > On Mon, Apr 13, 2020 at 5:14 PM Andres Freund wrote: > > FWIW, I think the part that is currently harder to fix is the time->xmin > > mapping and some related pieces. Second comes the test > > infrastructure. Compared to those, adding

Re: snapshot too old issues, first around wraparound and then more.

2020-04-16 Thread Thomas Munro
On Mon, Apr 13, 2020 at 5:14 PM Andres Freund wrote: > FWIW, I think the part that is currently harder to fix is the time->xmin > mapping and some related pieces. Second comes the test > infrastructure. Compared to those, adding additional checks for old > snapshots wouldn't be too hard -

Re: snapshot too old issues, first around wraparound and then more.

2020-04-14 Thread Thomas Munro
On Mon, Apr 13, 2020 at 2:58 PM Thomas Munro wrote: > On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote: > > I think that it's worth considering whether or not there are a > > significant number of "snapshot too old" users that rarely or never > > rely on old snapshots used by new queries.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-12 Thread Andres Freund
Hi, On 2020-04-13 14:58:34 +1200, Thomas Munro wrote: > On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote: > > I think that it's worth considering whether or not there are a > > significant number of "snapshot too old" users that rarely or never > > rely on old snapshots used by new queries.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-12 Thread Thomas Munro
On Fri, Apr 3, 2020 at 2:22 PM Peter Geoghegan wrote: > I think that it's worth considering whether or not there are a > significant number of "snapshot too old" users that rarely or never > rely on old snapshots used by new queries. Kevin said that this > happens "in some cases", but how many

Re: snapshot too old issues, first around wraparound and then more.

2020-04-04 Thread Amit Kapila
On Sat, Apr 4, 2020 at 12:33 AM Andres Freund wrote: > > On 2020-04-03 14:32:09 +0530, Amit Kapila wrote: > > > > Agreed, but OTOH, not giving time to Kevin or others who might be > > interested to support this work is also not fair. I think once > > somebody comes up with patches for problems

Re: snapshot too old issues, first around wraparound and then more.

2020-04-03 Thread Andres Freund
Hi, On 2020-04-03 14:32:09 +0530, Amit Kapila wrote: > On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan wrote: > > > > On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote: > > > Since this is a feature that can result in wrong query results (and > > > quite possibly crashes / data corruption), I

Re: snapshot too old issues, first around wraparound and then more.

2020-04-03 Thread Amit Kapila
On Fri, Apr 3, 2020 at 6:52 AM Peter Geoghegan wrote: > > On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote: > > Since this is a feature that can result in wrong query results (and > > quite possibly crashes / data corruption), I don't think we can just > > leave this unfixed. But given the

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Peter Geoghegan
On Thu, Apr 2, 2020 at 5:17 PM Andres Freund wrote: > Since this is a feature that can result in wrong query results (and > quite possibly crashes / data corruption), I don't think we can just > leave this unfixed. But given the amount of code / infrastructure > changes required to get this into

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Andres Freund
Hi, On 2020-04-01 12:02:18 -0400, Robert Haas wrote: > I have no objection to the idea that *if* the feature is hopelessly > broken, it should be removed. I don't think we have a real choice here at this point, at least for the back branches. Just about nothing around old_snapshot_threshold

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Andres Freund
Hi, I just spent a good bit more time improving my snapshot patch, so it could work well with a fixed version of the old_snapshot_threshold feature. Mostly so there's no unnecessary dependency on the resolution of the issues in that patch. When testing my changes, for quite a while, I could not

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Peter Geoghegan
On Thu, Apr 2, 2020 at 11:28 AM Peter Geoghegan wrote: > In conclusion, I share Andres' concerns here. There are glaring > problems with how we manipulate the data structure that controls the > effective horizon for pruning. Maybe they can be fixed while leaving > the code that manages the

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Peter Geoghegan
On Tue, Mar 31, 2020 at 11:40 PM Andres Freund wrote: > The problem, as far as I can tell, is that > oldSnapshotControl->head_timestamp appears to be intended to be the > oldest value in the ring. But we update it unconditionally in the "need > a new bucket, but it might not be the very next one"

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Andres Freund
Hi, On April 2, 2020 9:36:32 AM PDT, Kevin Grittner wrote: >On Wed, Apr 1, 2020 at 7:17 PM Andres Freund wrote: > >> FWIW, with autovacuum=off the query does not get killed until a manual >> vacuum, nor if fewer rows are deleted and the table has previously been >> vacuumed. >> >> The

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Kevin Grittner
On Wed, Apr 1, 2020 at 7:17 PM Andres Freund wrote: > FWIW, with autovacuum=off the query does not get killed until a manual > vacuum, nor if fewer rows are deleted and the table has previously been > vacuumed. > > The vacuum in the second session isn't required. There just needs to be >

Re: snapshot too old issues, first around wraparound and then more.

2020-04-02 Thread Kevin Grittner
On Wed, Apr 1, 2020 at 6:59 PM Andres Freund wrote: > index fetches will never even try to > detect that tuples it needs actually have already been pruned away. > I looked at this flavor of problem today and from what I saw: (1) This has been a problem all the way back to 9.6.0. (2) The

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 17:54:06 -0700, Andres Freund wrote: > * Check whether the given snapshot is too old to have safely read the given > * page from the given table. If so, throw a "snapshot too old" error. > * > * This test generally needs to be performed after every BufferGetPage() call > *

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 5:54 PM Andres Freund wrote: > As far as I can tell there's not sufficient in-tree explanation of when > code needs to test for an old snapshot. There's just the following > comment above TestForOldSnapshot(): > * Check whether the given snapshot is too old to have safely

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 16:59:51 -0700, Andres Freund wrote: > The primary issue here is that there is no TestForOldSnapshot() in > heap_hot_search_buffer(). Therefore index fetches will never even try to > detect that tuples it needs actually have already been pruned away. bitmap heap scan doesn't

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 4:59 PM Andres Freund wrote: > Thanks, that's super helpful. Glad I could help. > I got a bit confused here - you seemed to have switched session 1 and 2 > around? Doesn't seem to matter much though, I was able to reproduce this. Yeah, I switched the session numbers

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 16:59:51 -0700, Andres Freund wrote: > The primary issue here is that there is no TestForOldSnapshot() in > heap_hot_search_buffer(). Therefore index fetches will never even try to > detect that tuples it needs actually have already been pruned away. FWIW, with autovacuum=off

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 15:30:39 -0700, Peter Geoghegan wrote: > On Wed, Apr 1, 2020 at 3:00 PM Peter Geoghegan wrote: > > I like that idea. I think that I've spotted what may be an independent > > bug, but I have to wait around for a minute or two to reproduce it > > each time. Makes it hard to get

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 14:11:11 -0700, Andres Freund wrote: > As far as I can tell, with a large old_snapshot_threshold, it can take a > very long time to get to a head_timestamp that's old enough for > TransactionIdLimitedForOldSnapshots() to do anything. Look at this > trace of a pgbench run with

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 3:00 PM Peter Geoghegan wrote: > I like that idea. I think that I've spotted what may be an independent > bug, but I have to wait around for a minute or two to reproduce it > each time. Makes it hard to get to a minimal test case. I now have simple steps to reproduce a bug

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 1:25 PM Robert Haas wrote: > Maybe that contrib module could even have some functions to simulate > aging without the passage of any real time. Like, say you have a > function or procedure old_snapshot_pretend_time_has_passed(integer), > and it moves

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 15:11:52 -0500, Kevin Grittner wrote: > On Wed, Apr 1, 2020 at 2:43 PM Andres Freund wrote: > > > The thing that makes me really worried is that the contents of the time > > mapping seem very wrong. I've reproduced query results in a REPEATABLE > > READ transaction changing

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 3:43 PM Andres Freund wrote: > The thing that makes me really worried is that the contents of the time > mapping seem very wrong. I've reproduced query results in a REPEATABLE > READ transaction changing (pruned without triggering an error). And I've > reproduced rows not

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Kevin Grittner
On Wed, Apr 1, 2020 at 2:43 PM Andres Freund wrote: > The thing that makes me really worried is that the contents of the time > mapping seem very wrong. I've reproduced query results in a REPEATABLE > READ transaction changing (pruned without triggering an error). That is a very big problem.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, Nice to have you back for a bit! Even if the circumstances aren't great... It's very understandable that the lists are past your limits, I barely keep up these days. Without any health issues. On 2020-04-01 14:10:09 -0500, Kevin Grittner wrote: > Perhaps the lack of evidence for usage in

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Kevin Grittner
On Wed, Apr 1, 2020 at 10:09 AM Andres Freund wrote: First off, many thanks to Andres for investigating this, and apologies for the bugs. Also thanks to Michael for making sure I saw the thread. I must also apologize for not being able to track the community lists consistently due to

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 2:37 PM Andres Freund wrote: > Just continuing is easier said than done. Especially with the background > of knowing that several users had hit the bug that allowed all of the > above to be hit, and that advancing relfrozenxid further would make it > worse. Fair point, but

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 13:27:56 -0400, Robert Haas wrote: > Perhaps "irresponsible" is the wrong word, but it's certainly caused > problems for multiple EnterpriseDB customers, and in my view, those > problems weren't necessary. Either a WARNING or an ERROR would have > shown up in the log, but an

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 11:04:43 -0700, Peter Geoghegan wrote: > On Wed, Apr 1, 2020 at 10:28 AM Robert Haas wrote: > > Is there any chance that you're planning to look into the details? > > That would certainly be welcome from my perspective. +1 This definitely needs more eyes. I am not even close

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 10:28 AM Robert Haas wrote: > Sure, but not all levels of risk are equal. Jumping out of a plane > carries some risk of death whether or not you have a parachute, but > that does not mean that we shouldn't worry about whether you have one > or not before you jump. > > In

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 12:02:18 -0400, Robert Haas wrote: > On Wed, Apr 1, 2020 at 11:09 AM Andres Freund wrote: > > There's really no reason at all to have bins of one minute. As it's a > > PGC_POSTMASTER GUC, it should just have divided time into bins of > > (old_snapshot_threshold * USEC_PER_SEC)

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 1:03 PM Peter Geoghegan wrote: > I don't think that it's fair to characterize Andres' actions in that > situation as in any way irresponsible. We had an extremely complicated > data corruption bug that he went to great lengths to fix, following > two other incorrect fixes.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Peter Geoghegan
On Wed, Apr 1, 2020 at 9:02 AM Robert Haas wrote: > I complained > when you added those error checks to vacuum in back-branches, and > since that release went out people are regularly tripping those checks > and taking prolonged outages for a problem that wasn't making them > unhappy before. I

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 11:09 AM Andres Freund wrote: > That doesn't exist in all the back branches. Think it'd be easier to add > code to explicitly prune it during MaintainOldSnapshotTimeMapping(). That's reasonable. > There's really no reason at all to have bins of one minute. As it's a >

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 11:15:14 -0400, Robert Haas wrote: > On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote: > > I added some debug output to print the mapping before/after changes by > > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative > > to the server start in

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote: > I added some debug output to print the mapping before/after changes by > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative > to the server start in minutes/seconds to make it easier to interpret). > > And the output turns

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-04-01 10:01:07 -0400, Robert Haas wrote: > On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote: > > The problem is that there's no protection against the xids in the > > ringbuffer getting old enough to wrap around. Given that practical uses > > of old_snapshot_threshold are likely to

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, On 2020-03-31 23:40:08 -0700, Andres Freund wrote: > I added some debug output to print the mapping before/after changes by > MaintainOldSnapshotTimeMapping() (note that I used timestamps relative > to the server start in minutes/seconds to make it easier to interpret). Now attached.

Re: snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Robert Haas
On Wed, Apr 1, 2020 at 2:40 AM Andres Freund wrote: > The problem is that there's no protection against the xids in the > ringbuffer getting old enough to wrap around. Given that practical uses > of old_snapshot_threshold are likely to be several hours to several > days, that's not particularly

snapshot too old issues, first around wraparound and then more.

2020-04-01 Thread Andres Freund
Hi, Sorry, this mail is somewhat long. But I think it's important that at least a few committers read it, since I think we're going to have to make some sort of call about what to do. I am trying to change the snapshot too old infrastructure so it cooperates with my snapshot scalability patch.