Re: [HACKERS] snapshot too old, configured by time

Steve Singer Sun, 08 Nov 2015 15:08:08 -0800

On 10/15/2015 05:47 PM, Kevin Grittner wrote:

All other issues raised by Álvaro and Steve have been addressed,except for this one, which I will argue against:


I've been looking through the updated patch


In snapmgr.c

+ * XXX: If we can trust a read of an int64 value to be atomic, we canskip the

+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)

Was your intent with the XXX for it to be a TODO to only aquire the lockon platforms without the atomic 64bit operations?


On a more general note:

I've tried various manual tests of this feature and it sometimes worksas expected and sometimes doesn't.I'm getting the feeling that how I expect it to work isn't quite in syncwith how it does work.


I'd expect the following to be sufficient to generate the test

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum

T1: Waits 60 seconds, tries to select from the table. The snapshotshould be too old

For example it seems that in test 002 the select_ok on conn1 followingthe vacuum but right before the final sleep is critical to the snapshottoo old error showing up (ie if I remove that select_ok but leave in thesleep I don't get the error)

Is this intended and if so is there a better way we can explain howthings work?

Also is 0 intended to be an acceptable value for old_snapshot_thresholdand if so what should we expect to see then?

So if I understand correctly, every call to BufferGetPage needs to have
a TestForOldSnapshot immediately afterwards?  It seems easy to later
introduce places that fail to test for old snapshots.  What happens if
they do?  Does vacuum remove tuples anyway and then the query returns
wrong results?  That seems pretty dangerous.  Maybe the snapshot could
be an argument of BufferGetPage?

There are 486 occurences of BufferGetPage in the source code, and
this patch follows 36 of them with TestForOldSnapshot.  This only
needs to be done when a page is going to be used for a scan which
produces user-visible results.  That is, it is *not* needed for
positioning within indexes to add or vacuum away entries, for heap
inserts or deletions (assuming the row to be deleted has already
been found).  It seems wrong to modify about 450 BufferGetPage
references to add a NULL parameter; and if we do want to make that
noop change as insurance, it seems like it should be a separate
patch, since the substance of this patch would be buried under the
volume of that.

I will add this to the November CF.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] snapshot too old, configured by time

Reply via email to