Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Andres Freund
On 2016-07-20 13:59:32 -0400, Robert Haas wrote:
> It's hard to believe that it's equally good to use the newest
> registered snapshot (which is, I think, what you will often get from
> GetActiveSnapshot()) and the oldest registered snapshot (which is what
> you will get from pairingheap_first()).  It seems to me that we need
> to analyze what happens if we choose a snapshot that is older than the
> one used to find the datum which contained the toast pointer, and
> conversely what happens if we use a snapshot that is newer than the
> one we used to find the toast pointer.

Yea, the oldest seems better.


> Here's an attempt:
> 
> 1. If we pick a snapshot that is older than the one that found the
> scan tuple, we might get a "snapshot too old" error that is not
> strictly necessary.

Right. Which still seems a lot better than essentially pessimizing
vacuuming for toast tables considerably.


> 2. If we pick a snapshot that is newer than the one that found the
> scan tuple, then haven't we failed to fix the problem?  I'm not so
> sure about this direction, but if it's OK to test an arbitrarily new
> snapshot, then I can't see why we need the test at all.

I think some argument could be construed why it'd possibly be safe, but
I feel a lot better with the other option.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Robert Haas
On Wed, Jul 20, 2016 at 12:30 PM, Andres Freund  wrote:
>> And how do you obtain that?  The functions that reference
>> SnapshotToast are toast_delete_datum, toastrel_value_exists, and
>> toast_fetch_datum, toast_fetch_datum_slice, but none of those take a
>> snapshot as an argument, nor is there any reasonable way to make them
>> do so.  Those are indirectly called by things like bttextcmp, which
>> don't know what snapshot was used to fetch the datum that they are
>> detoasting and can't reasonably be made to know.
>>
>> I mean, you could do something *approximately* correct by calling
>> GetActiveSnapshot() but that doesn't seem likely to be correct in
>> detail.
>
> GetActiveSnapshot() seems like it should work well enough in this case,
> or we could use pairingheap_first() to get the actual oldest registered
> one.

It's hard to believe that it's equally good to use the newest
registered snapshot (which is, I think, what you will often get from
GetActiveSnapshot()) and the oldest registered snapshot (which is what
you will get from pairingheap_first()).  It seems to me that we need
to analyze what happens if we choose a snapshot that is older than the
one used to find the datum which contained the toast pointer, and
conversely what happens if we use a snapshot that is newer than the
one we used to find the toast pointer.

Here's an attempt:

1. If we pick a snapshot that is older than the one that found the
scan tuple, we might get a "snapshot too old" error that is not
strictly necessary.

2. If we pick a snapshot that is newer than the one that found the
scan tuple, then haven't we failed to fix the problem?  I'm not so
sure about this direction, but if it's OK to test an arbitrarily new
snapshot, then I can't see why we need the test at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-07-20 11:26:11 -0400, Robert Haas wrote:
> > On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund  wrote:
> > >>I think Snapshot's members whenTaken and lsn are updated/initialized
> > >>only in GetSnapshotData().  So if GetSnapshotData() is not used, how
> > >>will you expect those fields to be updated.  We need those fields to
> > >>be updated for TestForOldSnapshot().
> > >
> > > That's why I suggested copying them from the current mvcc snapshot.
> > 
> > And how do you obtain that?  The functions that reference
> > SnapshotToast are toast_delete_datum, toastrel_value_exists, and
> > toast_fetch_datum, toast_fetch_datum_slice, but none of those take a
> > snapshot as an argument, nor is there any reasonable way to make them
> > do so.  Those are indirectly called by things like bttextcmp, which
> > don't know what snapshot was used to fetch the datum that they are
> > detoasting and can't reasonably be made to know.
> > 
> > I mean, you could do something *approximately* correct by calling
> > GetActiveSnapshot() but that doesn't seem likely to be correct in
> > detail.
> 
> GetActiveSnapshot() seems like it should work well enough in this case,
> or we could use pairingheap_first() to get the actual oldest registered
> one.

Hmm.  Why is the active snapshot not sufficient?  Perhaps we need some
kind of redesign or minor tweak to snapmgr to keep track of the oldest
snapshot of a resowner or something like that?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Andres Freund
On 2016-07-20 11:26:11 -0400, Robert Haas wrote:
> On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund  wrote:
> >>I think Snapshot's members whenTaken and lsn are updated/initialized
> >>only in GetSnapshotData().  So if GetSnapshotData() is not used, how
> >>will you expect those fields to be updated.  We need those fields to
> >>be updated for TestForOldSnapshot().
> >
> > That's why I suggested copying them from the current mvcc snapshot.
> 
> And how do you obtain that?  The functions that reference
> SnapshotToast are toast_delete_datum, toastrel_value_exists, and
> toast_fetch_datum, toast_fetch_datum_slice, but none of those take a
> snapshot as an argument, nor is there any reasonable way to make them
> do so.  Those are indirectly called by things like bttextcmp, which
> don't know what snapshot was used to fetch the datum that they are
> detoasting and can't reasonably be made to know.
> 
> I mean, you could do something *approximately* correct by calling
> GetActiveSnapshot() but that doesn't seem likely to be correct in
> detail.

GetActiveSnapshot() seems like it should work well enough in this case,
or we could use pairingheap_first() to get the actual oldest registered
one.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Robert Haas
On Wed, Jul 20, 2016 at 3:39 AM, Andres Freund  wrote:
>>I think Snapshot's members whenTaken and lsn are updated/initialized
>>only in GetSnapshotData().  So if GetSnapshotData() is not used, how
>>will you expect those fields to be updated.  We need those fields to
>>be updated for TestForOldSnapshot().
>
> That's why I suggested copying them from the current mvcc snapshot.

And how do you obtain that?  The functions that reference
SnapshotToast are toast_delete_datum, toastrel_value_exists, and
toast_fetch_datum, toast_fetch_datum_slice, but none of those take a
snapshot as an argument, nor is there any reasonable way to make them
do so.  Those are indirectly called by things like bttextcmp, which
don't know what snapshot was used to fetch the datum that they are
detoasting and can't reasonably be made to know.

I mean, you could do something *approximately* correct by calling
GetActiveSnapshot() but that doesn't seem likely to be correct in
detail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Kevin Grittner
On Tue, Jul 19, 2016 at 6:32 PM, Andres Freund  wrote:

> I mean the only difference between toast / plain heap table WRT
> old_snapshot_threshold is that we don't use a mvcc snapshot.

We use different functions and never, ever call BufferGetPage --
except for deep in the bowels of the AMs.  Countless functions
would need to be modified to pass in information about whether any
call is one of those which need to test for snapshot-too-old.
Since "normal" heap and index access is already covered without
that, yet use the AMs, there would be a weird "double coverage" to
look out for.  On top of all that, you would need to not only throw
errors for some cases but (as you pointed out earlier in the
thread) turn others into no-ops.  Also, some of the toast calls are
very far from the calls for the base row, where a function might
decide to de-toast some toast pointer.  With the naive approach of
what you suggest, frequency of checking would go from once per page
(containing multiple tuples) to that *plus* once per toast chunk
per value per heap tuple, although it seems like checking any one
(like the first) toast chunk for a value would suffice.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-20 Thread Andres Freund


On July 19, 2016 7:43:05 PM PDT, Amit Kapila  wrote:
>On Wed, Jul 20, 2016 at 7:57 AM, Andres Freund 
>wrote:
>>
>>
>> On July 19, 2016 7:14:42 PM PDT, Amit Kapila
> wrote:
>>>On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund 
>>>wrote:
 On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote:
> As far as I can see, to do this the way that Andres and Amit
> suggest involves tying in to indexam.c and other code in
>incredibly
> ugly ways.

 Could you explain the problem you're seing?

 Isn't pretty much all all that we need to do:
 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets
>>>SnapshotData->lsn
to the the origin snapshot's lsn
 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC
>>>and
HeapTupleSatisfiesToast?

>>>
>>>I also think so.  However, it is not clear what is the best place to
>>>initialize toast snapshot.  One idea could be to do it in
>>>GetSnapshotData() after capturing the required information for the
>>>valid value of old_snapshot_threshold.  Do you have something else in
>>>mind?
>>
>> There's very few callsites using toast snapshots. I'd just do it
>there. Don't think we ever use GetSnapshotData for them.
>>
>
>I think Snapshot's members whenTaken and lsn are updated/initialized
>only in GetSnapshotData().  So if GetSnapshotData() is not used, how
>will you expect those fields to be updated.  We need those fields to
>be updated for TestForOldSnapshot().

That's why I suggested copying them from the current mvcc snapshot.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-19 Thread Amit Kapila
On Wed, Jul 20, 2016 at 7:57 AM, Andres Freund  wrote:
>
>
> On July 19, 2016 7:14:42 PM PDT, Amit Kapila  wrote:
>>On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund 
>>wrote:
>>> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote:
 As far as I can see, to do this the way that Andres and Amit
 suggest involves tying in to indexam.c and other code in incredibly
 ugly ways.
>>>
>>> Could you explain the problem you're seing?
>>>
>>> Isn't pretty much all all that we need to do:
>>> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets
>>SnapshotData->lsn
>>>to the the origin snapshot's lsn
>>> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC
>>and
>>>HeapTupleSatisfiesToast?
>>>
>>
>>I also think so.  However, it is not clear what is the best place to
>>initialize toast snapshot.  One idea could be to do it in
>>GetSnapshotData() after capturing the required information for the
>>valid value of old_snapshot_threshold.  Do you have something else in
>>mind?
>
> There's very few callsites using toast snapshots. I'd just do it there. Don't 
> think we ever use GetSnapshotData for them.
>

I think Snapshot's members whenTaken and lsn are updated/initialized
only in GetSnapshotData().  So if GetSnapshotData() is not used, how
will you expect those fields to be updated.  We need those fields to
be updated for TestForOldSnapshot().

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-19 Thread Andres Freund


On July 19, 2016 7:14:42 PM PDT, Amit Kapila  wrote:
>On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund 
>wrote:
>> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote:
>>> As far as I can see, to do this the way that Andres and Amit
>>> suggest involves tying in to indexam.c and other code in incredibly
>>> ugly ways.
>>
>> Could you explain the problem you're seing?
>>
>> Isn't pretty much all all that we need to do:
>> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets
>SnapshotData->lsn
>>to the the origin snapshot's lsn
>> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC
>and
>>HeapTupleSatisfiesToast?
>>
>
>I also think so.  However, it is not clear what is the best place to
>initialize toast snapshot.  One idea could be to do it in
>GetSnapshotData() after capturing the required information for the
>valid value of old_snapshot_threshold.  Do you have something else in
>mind?

There's very few callsites using toast snapshots. I'd just do it there. Don't 
think we ever use GetSnapshotData for them.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-19 Thread Amit Kapila
On Wed, Jul 20, 2016 at 5:02 AM, Andres Freund  wrote:
> On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote:
>> As far as I can see, to do this the way that Andres and Amit
>> suggest involves tying in to indexam.c and other code in incredibly
>> ugly ways.
>
> Could you explain the problem you're seing?
>
> Isn't pretty much all all that we need to do:
> 1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets 
> SnapshotData->lsn
>to the the origin snapshot's lsn
> 2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC and
>HeapTupleSatisfiesToast?
>

I also think so.  However, it is not clear what is the best place to
initialize toast snapshot.  One idea could be to do it in
GetSnapshotData() after capturing the required information for the
valid value of old_snapshot_threshold.  Do you have something else in
mind?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-19 Thread Andres Freund
On 2016-07-19 18:09:59 -0500, Kevin Grittner wrote:
> As far as I can see, to do this the way that Andres and Amit
> suggest involves tying in to indexam.c and other code in incredibly
> ugly ways.

Could you explain the problem you're seing?

Isn't pretty much all all that we need to do:
1) add a InitSnapshotToast(Snapshot originMVCCSnap), which sets 
SnapshotData->lsn
   to the the origin snapshot's lsn
2) adapt TestForOldSnapshot() to accept both HeapTupleSatisfiesMVCC and
   HeapTupleSatisfiesToast?

I mean the only difference between toast / plain heap table WRT
old_snapshot_threshold is that we don't use a mvcc snapshot.

> I think it is entirely the wrong way to go, as I can't
> find a way to make it look remotely sane.  The question is whether
> I should do it the way that I think is sane, or whether someone
> else wants to show me what I'm missing by producing at least a
> rough patch along these lines.

I'll, but I'd prefer you explaining the problem first. Maybe it's me
missing the obvious problem.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-13 15:57:02 -0500, Kevin Grittner wrote:
> A short answer is that a normal table's heap doesn't go through
> systable_getnext_ordered().  That function is used both for cases
> where the check should not be made, like toast_delete_datum(), and
> cases where it should, like toast_fetch_datum().

It *has* to be be made for toast_delete_datum(). Otherwise we could end
up deleting a since reused toast id. Or am I missing something?

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Kevin Grittner
On Wed, Jul 13, 2016 at 12:48 PM, Andres Freund  wrote:
> On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote:
>> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
>>> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
 On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:

> I'm a bit confused, why aren't we simply adding LSN interlock
> checks for toast? Doesn't look that hard? Seems like a much more
> natural course of fixing this issue?

 I took some time trying to see what you have in mind, and I'm
 really not "getting it".
>>>
>>> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
>>> when old_snapshot_threshold > 0 and add a check for
>>> HeapTupleSatisfiesToast in TestForOldSnapshot()?
>>
>> With that approach, how will we know *not* to generate an error
>> when reading the chain of tuples for a value we are deleting.  Or
>> positioning to modify an index on toast data.  Etc., etc. etc.
>
> I'm not following. How is that different in the toast case than in the
> heap case?

A short answer is that a normal table's heap doesn't go through
systable_getnext_ordered().  That function is used both for cases
where the check should not be made, like toast_delete_datum(), and
cases where it should, like toast_fetch_datum().

Since this keeps coming up, I'll produce a patch this way.  I'm
skeptical, but maybe it will look better than I think it will.  I
should be able to post that by Friday.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-13 10:06:52 -0500, Kevin Grittner wrote:
> On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
> > On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
> >> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
> >>
> >>> I'm a bit confused, why aren't we simply adding LSN interlock
> >>> checks for toast? Doesn't look that hard? Seems like a much more
> >>> natural course of fixing this issue?
> >>
> >> I took some time trying to see what you have in mind, and I'm
> >> really not "getting it".
> >
> > Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
> > when old_snapshot_threshold > 0 and add a check for
> > HeapTupleSatisfiesToast in TestForOldSnapshot()?
> 
> With that approach, how will we know *not* to generate an error
> when reading the chain of tuples for a value we are deleting.  Or
> positioning to modify an index on toast data.  Etc., etc. etc.

I'm not following. How is that different in the toast case than in the
heap case?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Andres Freund
On 2016-07-12 10:04:45 -0500, Kevin Grittner wrote:
> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
> 
> > I'm a bit confused, why aren't we simply adding LSN interlock
> > checks for toast? Doesn't look that hard? Seems like a much more
> > natural course of fixing this issue?
> 
> I took some time trying to see what you have in mind, and I'm
> really not "getting it".  I definitely applaud you for spotting the
> problem, but this suggestion for solving it doesn't seem to be
> useful.

...

> Basically, after turning this suggestion every way I could, I see
> two alternative ways to implement it.

What I was actually getting at was to perform TestForOldSnapshot() in
the HeapTupleSatisfiesToast case as well. That'd require minor amounts
of work to keep the lsn up2date, but otherwise should be fairly easy to
implement.  It seems much more logical to use the same mechanism we use
for heap for toast as well, rather than implementing something separate.

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Kevin Grittner
On Wed, Jul 13, 2016 at 7:57 AM, Amit Kapila  wrote:
> On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
>> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
>>
>>> I'm a bit confused, why aren't we simply adding LSN interlock
>>> checks for toast? Doesn't look that hard? Seems like a much more
>>> natural course of fixing this issue?
>>
>> I took some time trying to see what you have in mind, and I'm
>> really not "getting it".
>
> Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
> when old_snapshot_threshold > 0 and add a check for
> HeapTupleSatisfiesToast in TestForOldSnapshot()?

With that approach, how will we know *not* to generate an error
when reading the chain of tuples for a value we are deleting.  Or
positioning to modify an index on toast data.  Etc., etc. etc.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-13 Thread Amit Kapila
On Tue, Jul 12, 2016 at 8:34 PM, Kevin Grittner  wrote:
> On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:
>
>> I'm a bit confused, why aren't we simply adding LSN interlock
>> checks for toast? Doesn't look that hard? Seems like a much more
>> natural course of fixing this issue?
>
> I took some time trying to see what you have in mind, and I'm
> really not "getting it".
>

Isn't it possible if we initialize lsn and whenTaken in SnapshotToast
when old_snapshot_threshold > 0 and add a check for
HeapTupleSatisfiesToast in TestForOldSnapshot()?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-12 Thread Kevin Grittner
On Fri, Jul 8, 2016 at 1:52 PM, Andres Freund  wrote:

> I'm a bit confused, why aren't we simply adding LSN interlock
> checks for toast? Doesn't look that hard? Seems like a much more
> natural course of fixing this issue?

I took some time trying to see what you have in mind, and I'm
really not "getting it".  I definitely applaud you for spotting the
problem, but this suggestion for solving it doesn't seem to be
useful.

Basically, after turning this suggestion every way I could, I see
two alternative ways to implement it.

(1)  Whenever TestForOldSnapshot() checks a heap page, check
whether the related toast is OK for all visible tuples on that
page.  It would be enough to check one toast tuple for one value
per heap tuple, but still -- this would be really nasty from a
performance perspective.

(2)  To deal with the fact that only about 7% of the
BufferGetPage() calls need to make this check, all functions and
macros which read toast data from the table would need extra
parameters, and all call sites for the toast API would need to have
such context information passed to them, so they could specify this
correctly.  Ugh.

Compared to those options, the approach I was taking, where the fix
is "automatic" but some workloads where old_snapshot_threshold is
on would sequentially read some toast indexes more often seems
pretty tame.

Do you see some other option that fits what you describe?  I'll
give you a couple days to respond before coding the patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Andres Freund
On 2016-07-08 13:32:35 -0500, Kevin Grittner wrote:
> On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund  wrote:
> > On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
> >> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
> >>
> >> > So I don't think that approach still allows old snapshot related
> >> > cleanups for toast triggered vacuums?  Is that an acceptable
> >> > restriction?
> >>
> >> What I would rather see is that if the heap is vacuumed (whether or
> >> not by autovacuum) then the related TOAST table is also vacuumed
> >> (using the same horizon the heap used), but if the TOAST relation
> >> is chosen for vacuum by itself that it does not attempt to adjust
> >> the horizon based on old_snapshot_threshold.
> >
> > Uh, wouldn't that quote massively regress the autovacuum workload in
> > some cases? There's a reason they're considered separately after
> > all. And in many cases, even if there's lots of updates in the heap
> > table, the toast table doesn't get any updates. And the toast table is
> > often a lot larger than the data.
> 
> Of course, the toast table has only one index, and it is narrow.

But that index and the table are often large...

> With the visibility map, it should visit only the needed pages in
> the toast's heap area, so any regression would be in the case that:
> 
> (1)  old_snapshot_threshold >= 0
> (2)  the "normal" heap met the conditions for vacuum, but the heap
>  didn't
> (3)  when passing the toast heap based on visibility map, *some*
>  cleanup was done (otherwise the TID list would be empty, so no
>  index pass is needed)

Unfortunately btree performs an index scan, even if there's no tids to
clean up. See the unconditional calls to
lazy_cleanup_index()->amvacuumcleanup(). C.f.
/*
 * If btbulkdelete was called, we need not do anything, just return the
 * stats from the latest btbulkdelete call.  If it wasn't called, we 
must
 * still do a pass over the index, to recycle any newly-recyclable pages
 * and to obtain index statistics.
 *
 * Since we aren't going to actually delete any leaf items, there's no
 * need to go through all the vacuum-cycle-ID pushups.


> but I'm also sure that by containing toast size
> when it would otherwise grow for weeks or months, it could be a
> very large performance gain.

That's an argument for changing autovacuum heuristics, not for making
this change as a side-effect of a bugfix.


I'm a bit confused, why aren't we simply adding LSN interlock checks for
toast? Doesn't look that hard? Seems like a much more natural course of
fixing this issue?


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Kevin Grittner
On Fri, Jul 8, 2016 at 12:53 PM, Andres Freund  wrote:
> On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
>> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
>>
>> > So I don't think that approach still allows old snapshot related
>> > cleanups for toast triggered vacuums?  Is that an acceptable
>> > restriction?
>>
>> What I would rather see is that if the heap is vacuumed (whether or
>> not by autovacuum) then the related TOAST table is also vacuumed
>> (using the same horizon the heap used), but if the TOAST relation
>> is chosen for vacuum by itself that it does not attempt to adjust
>> the horizon based on old_snapshot_threshold.
>
> Uh, wouldn't that quote massively regress the autovacuum workload in
> some cases? There's a reason they're considered separately after
> all. And in many cases, even if there's lots of updates in the heap
> table, the toast table doesn't get any updates. And the toast table is
> often a lot larger than the data.

Of course, the toast table has only one index, and it is narrow.
With the visibility map, it should visit only the needed pages in
the toast's heap area, so any regression would be in the case that:

(1)  old_snapshot_threshold >= 0
(2)  the "normal" heap met the conditions for vacuum, but the heap
 didn't
(3)  when passing the toast heap based on visibility map, *some*
 cleanup was done (otherwise the TID list would be empty, so no
 index pass is needed)

Any extra work would be at least partially offset by pushing back
the point where the next vacuum of toast data would be needed and
by removing index entries and keeping both the toast data and index
smaller.  I'm sure you could find cases where there was a net
performance loss, but I'm also sure that by containing toast size
when it would otherwise grow for weeks or months, it could be a
very large performance gain.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Andres Freund
On 2016-07-08 11:00:50 -0500, Kevin Grittner wrote:
> On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:
> 
> > So I don't think that approach still allows old snapshot related
> > cleanups for toast triggered vacuums?  Is that an acceptable
> > restriction?
> 
> What I would rather see is that if the heap is vacuumed (whether or
> not by autovacuum) then the related TOAST table is also vacuumed
> (using the same horizon the heap used), but if the TOAST relation
> is chosen for vacuum by itself that it does not attempt to adjust
> the horizon based on old_snapshot_threshold.

Uh, wouldn't that quote massively regress the autovacuum workload in
some cases? There's a reason they're considered separately after
all. And in many cases, even if there's lots of updates in the heap
table, the toast table doesn't get any updates. And the toast table is
often a lot larger than the data.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-08 Thread Kevin Grittner
On Wed, Jul 6, 2016 at 4:55 PM, Andres Freund  wrote:

> So I don't think that approach still allows old snapshot related
> cleanups for toast triggered vacuums?  Is that an acceptable
> restriction?

What I would rather see is that if the heap is vacuumed (whether or
not by autovacuum) then the related TOAST table is also vacuumed
(using the same horizon the heap used), but if the TOAST relation
is chosen for vacuum by itself that it does not attempt to adjust
the horizon based on old_snapshot_threshold.  I am looking to see
how to make that happen; expect a new patch Monday.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-07-06 Thread Andres Freund
On 2016-07-02 14:20:13 -0500, Kevin Grittner wrote:
> The possible fixes considered were these:
>
> (1)  Always vacuum the heap before its related TOAST table.

I think that's clearly not ok from a cost perspective.

> (2)  Same as (1) but only when old_snapshot_threshold >= 0.
> (3)  Allow the special snapshot used for TOAST access to generate
> the "snapshot too old" error, so that the modified page from the
> pruning/vacuuming (along with other conditions) would cause that
> rather than something suggesting corrupt internal structure.
> (4)  When looking to read a toasted value for a tuple past the
> early pruning horizon, if the value was not found consider it a
> "snapshot too old" error.

Doesn't solve the issue that a toast id might end up being reused.

> (5)  Don't vacuum or prune a TOAST table except as part of the heap
> vacuum when early pruning is enabled.

That's pretty costly.

> (6)  Don't allow early vacuuming/pruning of TOAST values except as
> part of the vacuuming of the related heap.


> It became evident pretty quickly that the HOT pruning of TOAST
> values should not do early cleanup, based on practical concerns of
> coordinating that with the heap cleanup for any of the above
> options.  What's more, since we don't allow updating of tuples
> holding TOAST values, HOT pruning seems to be of dubious value on a
> TOAST table in general -- but removing that would be the matter for
> a separate patch.

I'm not following here. Sure, there'll be no HOT chains, but hot pruning
also releases space (though not item pointers) for dead tuples. And
that's fairly valuable in high-churn tables?


> Anyway, this patch includes a small hunk of code
> (two lines) to avoid early HOT pruning for TOAST tables.

I see it's only prohibiting the old_snapshot_threshold triggered
cleanup, good.


> For the vacuuming, option (6) seems a clear winner, and that is
> what this patch implements.  A TOAST table can still be vacuumed on
> its own, but in that case it will not use old_snapshot_threshold to
> try to do any early cleanup.


> We were already normally vacuuming
> the TOAST table whenever we vacuumed the related heap; in such a
> case it uses the "oldestXmin" used for the heap to vacuum the TOAST
> table.

That's not the case. Autovacuum schedules main and toast tables
independently. Check the two collection loops in do_autovacuum:
/*
 * On the first pass, we collect main tables to vacuum, and also the 
main
 * table relid to TOAST relid mapping.
 */
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
...
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
  
effective_multixact_freeze_max_age,
  , 
, );
...
/* relations that need work are added to table_oids */
if (dovacuum || doanalyze)
table_oids = lappend_oid(table_oids, relid);
}
...
/* second pass: check TOAST tables */
while ((tuple = heap_getnext(relScan, ForwardScanDirection)) != NULL)
{
...
relation_needs_vacanalyze(relid, relopts, classForm, tabentry,
  
effective_multixact_freeze_max_age,
  , 
, );

/* ignore analyze for toast tables */
if (dovacuum)
table_oids = lappend_oid(table_oids, relid);
}

So I don't think that approach still allows old snapshot related
cleanups for toast triggered vacuums?  Is that an acceptable
restriction?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 11:57:48 -0700, Andres Freund wrote:
> See 
> https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anar

For posterity's sake, that was supposed to be
https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anarazel.de


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:53:01 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner  wrote:
> > On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> >> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> 
> >>> Maybe it would help if you lay out the whole sequence of events, like:
> >>>
> >>> S1: Does this.
> >>> S2: Does that.
> >>> S1: Now does something else.
> >>
> >> I presume it'd be something like:
> >>
> >> Assuming a 'toasted' table, which contains one row, with a 1GB field.
> >>
> >> S1: BEGIN REPEATABLE READ;
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> S2: DELETE FROM toasted;
> >> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> >> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> >> 
> >
> > I'll put together a test like that and post in a bit.
> 
> old_snapshot_threshold = '1min'
> autovacuum_vacuum_threshold = 0\
> autovacuum_vacuum_scale_factor = 0.01
> 
> test=# CREATE TABLE gb (rec bytea not null);
> CREATE TABLE
> test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
> ALTER TABLE
> test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x',
> 10)::bytea) x(t);
> INSERT 0 1
> test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> BEGIN
> test=# SELECT SUM(length(rec)) FROM gb;
> sum
> 
>  10
> (1 row)
> 
> [wait for autovacuum to run]
> 
> test=# SELECT SUM(length(rec)) FROM gb;
> ERROR:  snapshot too old

See 
https://www.postgresql.org/message-id/20160616183207.wygoktoplycdz...@alap3.anar
for a recipe that reproduce the issue. I presume your example also
vacuums the main table due to the threshold and scale factor you set
(which will pretty much alwasy vacuum a table, no?).

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:32 PM, Andres Freund  wrote:

> With old_snapshot_threshold=1 I indeed can reproduce the issue. I
> disabled autovacuum, to make the scheduling more predictable. But it
> should "work" just as well with autovacuum.
>
> S1: CREATE TABLE toasted(largecol text);
> INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM 
> generate_series(1, 1000);
> BEGIN;
> DELETE FROM toasted;
> S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: COMMIT;
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
>> ...
> S1: /* wait for snapshot threshold to be passed */
> S1: VACUUM pg_toast.pg_toast_16437;
>> INFO:  0: "pg_toast_16437": found 61942 removable, 0 nonremovable row 
>> versions in 15486 out of 15486 pages
>> DETAIL:  0 dead row versions cannot be removed yet.
> S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
> LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Thanks!  That's something I should be able to work with.
Unfortunately, I am going to be on vacation, so I won't have any
results until sometime after 28 June.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:19 PM, Kevin Grittner  wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
>> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:

>>> Maybe it would help if you lay out the whole sequence of events, like:
>>>
>>> S1: Does this.
>>> S2: Does that.
>>> S1: Now does something else.
>>
>> I presume it'd be something like:
>>
>> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>>
>> S1: BEGIN REPEATABLE READ;
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> S2: DELETE FROM toasted;
>> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
>> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
>> 
>
> I'll put together a test like that and post in a bit.

old_snapshot_threshold = '1min'
autovacuum_vacuum_threshold = 0\
autovacuum_vacuum_scale_factor = 0.01

test=# CREATE TABLE gb (rec bytea not null);
CREATE TABLE
test=# ALTER TABLE gb ALTER COLUMN rec SET STORAGE external;
ALTER TABLE
test=# INSERT INTO gb SELECT t FROM (SELECT repeat('x',
10)::bytea) x(t);
INSERT 0 1
test=# BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN
test=# SELECT SUM(length(rec)) FROM gb;
sum

 10
(1 row)

[wait for autovacuum to run]

test=# SELECT SUM(length(rec)) FROM gb;
ERROR:  snapshot too old

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
Hi,

On 2016-06-16 13:19:13 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> > On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> 
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't peform that check on the toast
> > table/pages. So if we prune toast tuples, which are still referenced by
> > (unvacuumed) main relation, we can get into trouble.
> 
> I thought that we should never be using visibility information from
> the toast table; that the visibility information in the heap should
> control.

We use visibility information for vacuuming, toast vacuum puts toast
chunk tuples through HeapTupleSatisfiesVacuum(), just like for normal
tuples.  Otherwise we'd have to collect dead toast tuples during the
normal vacuum, and then do explicit vacuums for those. That'd end up
being pretty expensive.


> If that's the case, how would we prune toast rows without
> pruning the heap?

I'm not sure what you mean? We prune toast tuples by checking xmin/xmax,
and then comparing with OldestXmin. Without STO that's safe, because we
know nobody could lookup up those toast tuples.


> You pointed out that the *reverse* case has an
> option bit -- if that is ever set there could be toasted values
> which would not have a row.

We vacuum toast tables without the main table, by simply calling
vacuum() on the toast relation.  So you can get the case that only the
normal relation is vacuumed *or* that only the toast relation is vacuumed.


> Do they still have a line pointer in the heap, like "dead" index
> entries?

You can have non-pruned toast tuples, where any evidence of the
referencing main-heap tuple is gone.


> How are they cleaned up in current production versions?

There's simply no interlock except OldestXmin preveting referenced toast
tuples to be vacuumed, as long as any alive snapshot can read them.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:16:35 -0500, Kevin Grittner wrote:
> On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund  wrote:
> 
> > The relevant part is the HeapTupleSatisfiesMVCC check, we're using
> > SatisfiesToast for toast lookups.
> >
> > FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> > but ran into the problem that I couldn't get it to vacuum anything away
> > (neither main nor toast rel).   That appears to be
> > if (old_snapshot_threshold == 0)
> > {
> > if (TransactionIdPrecedes(latest_xmin, 
> > MyPgXact->xmin)
> > && TransactionIdFollows(latest_xmin, 
> > xlimit))
> > xlimit = latest_xmin;
> > because latest_xmin always is equal to MyPgXact->xmin, which is actually
> > kinda unsurprising?
> 
> Sure -- the STO feature *never* advances the point for early
> pruning past the earliest still-active transaction ID.  If it did
> we would have all sorts of weird problems.

Both latest_xmin, MyPgXact->xmin are equivalent to txid_current() here.
Note that a threshold of 1 actually vacuums in this case (after waiting
obviously), but 0 never does.  Afaics that's because before
TransactionIdLimitedForOldSnapshots() is reached,
MaintainOldSnapshotTimeMapping will have updated latest_xmin to the
current value.


With old_snapshot_threshold=1 I indeed can reproduce the issue. I
disabled autovacuum, to make the scheduling more predictable. But it
should "work" just as well with autovacuum.

S1: CREATE TABLE toasted(largecol text);
INSERT INTO toasted SELECT string_agg(random()::text, '-') FROM 
generate_series(1, 1000);
BEGIN;
DELETE FROM toasted;
S2: BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: COMMIT;
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
> ...
S1: /* wait for snapshot threshold to be passed */
S1: VACUUM pg_toast.pg_toast_16437;
> INFO:  0: "pg_toast_16437": found 61942 removable, 0 nonremovable row 
> versions in 15486 out of 15486 pages
> DETAIL:  0 dead row versions cannot be removed yet.
S2: SELECT hashtext(largecol), length(largecol) FROM toasted;
ERROR:  XX000: missing chunk number 0 for toast value 16455 in pg_toast_16437
LOCATION:  toast_fetch_datum, tuptoaster.c:1945

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 11:54 AM, Andres Freund  wrote:
> On 2016-06-16 12:43:34 -0400, Robert Haas wrote:

>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't peform that check on the toast
> table/pages. So if we prune toast tuples, which are still referenced by
> (unvacuumed) main relation, we can get into trouble.

I thought that we should never be using visibility information from
the toast table; that the visibility information in the heap should
control.  If that's the case, how would we prune toast rows without
pruning the heap?  You pointed out that the *reverse* case has an
option bit -- if that is ever set there could be toasted values
which would not have a row.  Do they still have a line pointer in
the heap, like "dead" index entries?  How are they cleaned up in
current production versions?  (Note the question mark -- I'm not
big on using that with assertions and rarely fall back on
rhetorical questions.)

>> It won't even survive long enough to care that the tuple isn't there
>> any more.
>>
>> Maybe it would help if you lay out the whole sequence of events, like:
>>
>> S1: Does this.
>> S2: Does that.
>> S1: Now does something else.
>
> I presume it'd be something like:
>
> Assuming a 'toasted' table, which contains one row, with a 1GB field.
>
> S1: BEGIN REPEATABLE READ;
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> S2: DELETE FROM toasted;
> AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
> S1: SELECT SUM(length(one_gb_record)) FROM toasted;
> 

I'll put together a test like that and post in a bit.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Thu, Jun 16, 2016 at 1:01 PM, Andres Freund  wrote:

> The relevant part is the HeapTupleSatisfiesMVCC check, we're using
> SatisfiesToast for toast lookups.
>
> FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
> but ran into the problem that I couldn't get it to vacuum anything away
> (neither main nor toast rel).   That appears to be
> if (old_snapshot_threshold == 0)
> {
> if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
> && TransactionIdFollows(latest_xmin, xlimit))
> xlimit = latest_xmin;
> because latest_xmin always is equal to MyPgXact->xmin, which is actually
> kinda unsurprising?

Sure -- the STO feature *never* advances the point for early
pruning past the earliest still-active transaction ID.  If it did
we would have all sorts of weird problems.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 13:44:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund  wrote:
> >> The root of my confusion is: if we prune a tuple, we'll bump the page
> >> LSN, so any session that is still referencing that tuple will error
> >> out as soon as it touches the page on which that tuple used to exist.
> >
> > Right. On the main table. But we don't peform that check on the toast
> > table/pages. So if we prune toast tuples, which are still referenced by
> > (unvacuumed) main relation, we can get into trouble.
> 
> OK, if it's true that we don't perform that check on the TOAST table,
> then I agree there's a potential problem there.  I don't immediately
> know where in the code to look to check that.

static inline void
TestForOldSnapshot(Snapshot snapshot, Relation relation, Page page)
{
Assert(relation != NULL);

if (old_snapshot_threshold >= 0
&& (snapshot) != NULL
&& (snapshot)->satisfies == HeapTupleSatisfiesMVCC
&& !XLogRecPtrIsInvalid((snapshot)->lsn)
&& PageGetLSN(page) > (snapshot)->lsn)
TestForOldSnapshot_impl(snapshot, relation);
}

The relevant part is the HeapTupleSatisfiesMVCC check, we're using
SatisfiesToast for toast lookups.


FWIW, I just tried to reproduce this with old_snapshot_threshold = 0 -
but ran into the problem that I couldn't get it to vacuum anything away
(neither main nor toast rel).   That appears to be
if (old_snapshot_threshold == 0)
{
if (TransactionIdPrecedes(latest_xmin, MyPgXact->xmin)
&& TransactionIdFollows(latest_xmin, xlimit))
xlimit = latest_xmin;
because latest_xmin always is equal to MyPgXact->xmin, which is actually
kinda unsurprising?

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:54 PM, Andres Freund  wrote:
>> The root of my confusion is: if we prune a tuple, we'll bump the page
>> LSN, so any session that is still referencing that tuple will error
>> out as soon as it touches the page on which that tuple used to exist.
>
> Right. On the main table. But we don't peform that check on the toast
> table/pages. So if we prune toast tuples, which are still referenced by
> (unvacuumed) main relation, we can get into trouble.

OK, if it's true that we don't perform that check on the TOAST table,
then I agree there's a potential problem there.  I don't immediately
know where in the code to look to check that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 12:43:34 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund  wrote:
> >> > The issue isn't there without the feature, because we (should) never
> >> > access a tuple/detoast a column when it's invisible enough for the
> >> > corresponding toast tuple to be vacuumed away. But with
> >> > old_snapshot_timeout that's obviously (intentionally) not the case
> >> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> >> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
> >>
> >> Is there really an assumption that the heap and the TOAST heap are
> >> only ever vacuumed with the same OldestXmin value?  Because that seems
> >> like it would be massively flaky.
> >
> > There's not. They can be vacuumed days apart. But if we vacuum the toast
> > table with an OldestXmin, and encounter a dead toast tuple, by the
> > definition of OldestXmin (excluding STO), there cannot be a session
> > reading the referencing tuple anymore - so that shouldn't matter.
> 
> I don't understand how STO changes that.  I'm not saying it doesn't
> change it, but I don't understand why it would.

Because we advance OldestXmin more aggressively, while allowing
snapshots that are *older* than OldestXmin to access old tuples on pages
which haven't been touched.


> The root of my confusion is: if we prune a tuple, we'll bump the page
> LSN, so any session that is still referencing that tuple will error
> out as soon as it touches the page on which that tuple used to exist.

Right. On the main table. But we don't peform that check on the toast
table/pages. So if we prune toast tuples, which are still referenced by
(unvacuumed) main relation, we can get into trouble.


> It won't even survive long enough to care that the tuple isn't there
> any more.
> 
> Maybe it would help if you lay out the whole sequence of events, like:
> 
> S1: Does this.
> S2: Does that.
> S1: Now does something else.

I presume it'd be something like:

Assuming a 'toasted' table, which contains one row, with a 1GB field.

S1: BEGIN REPEATABLE READ;
S1: SELECT SUM(length(one_gb_record)) FROM toasted;
S2: DELETE FROM toasted;
AUTOVAC: vacuum toasted's toast table, it's large. skip toasted, it's small
S1: SELECT SUM(length(one_gb_record)) FROM toasted;



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 12:14 PM, Andres Freund  wrote:
>> > The issue isn't there without the feature, because we (should) never
>> > access a tuple/detoast a column when it's invisible enough for the
>> > corresponding toast tuple to be vacuumed away. But with
>> > old_snapshot_timeout that's obviously (intentionally) not the case
>> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
>> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
>>
>> Is there really an assumption that the heap and the TOAST heap are
>> only ever vacuumed with the same OldestXmin value?  Because that seems
>> like it would be massively flaky.
>
> There's not. They can be vacuumed days apart. But if we vacuum the toast
> table with an OldestXmin, and encounter a dead toast tuple, by the
> definition of OldestXmin (excluding STO), there cannot be a session
> reading the referencing tuple anymore - so that shouldn't matter.

I don't understand how STO changes that.  I'm not saying it doesn't
change it, but I don't understand why it would.

The root of my confusion is: if we prune a tuple, we'll bump the page
LSN, so any session that is still referencing that tuple will error
out as soon as it touches the page on which that tuple used to exist.
It won't even survive long enough to care that the tuple isn't there
any more.

Maybe it would help if you lay out the whole sequence of events, like:

S1: Does this.
S2: Does that.
S1: Now does something else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 12:02:39 -0400, Robert Haas wrote:
> On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund  wrote:
> >> If that were really true, why would we not have the problem in
> >> current production versions that the toast table could be vacuumed
> >> before the heap, leading to exactly the issue you are talking
> >> about?
> >
> > The issue isn't there without the feature, because we (should) never
> > access a tuple/detoast a column when it's invisible enough for the
> > corresponding toast tuple to be vacuumed away. But with
> > old_snapshot_timeout that's obviously (intentionally) not the case
> > anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> > without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.
> 
> Is there really an assumption that the heap and the TOAST heap are
> only ever vacuumed with the same OldestXmin value?  Because that seems
> like it would be massively flaky.

There's not. They can be vacuumed days apart. But if we vacuum the toast
table with an OldestXmin, and encounter a dead toast tuple, by the
definition of OldestXmin (excluding STO), there cannot be a session
reading the referencing tuple anymore - so that shouldn't matter.

IIRC we actually reverted a patch that caused significant problems
around this. I think there's a small race condition around
ProcessStandbyHSFeedbackMessage(), and you can restart with a different
vacuum_defer_cleanup_age (we should just remove that), but other than
that we shouldn't run into any issues without STO.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Robert Haas
On Thu, Jun 16, 2016 at 11:37 AM, Andres Freund  wrote:
>> If that were really true, why would we not have the problem in
>> current production versions that the toast table could be vacuumed
>> before the heap, leading to exactly the issue you are talking
>> about?
>
> The issue isn't there without the feature, because we (should) never
> access a tuple/detoast a column when it's invisible enough for the
> corresponding toast tuple to be vacuumed away. But with
> old_snapshot_timeout that's obviously (intentionally) not the case
> anymore.  Due to old_snapshot_threshold we'll prune tuples which,
> without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.

Is there really an assumption that the heap and the TOAST heap are
only ever vacuumed with the same OldestXmin value?  Because that seems
like it would be massively flaky.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Andres Freund
On 2016-06-16 09:50:09 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund  wrote:
> 
> > The simplest version of the scenario I'm concerned about is that a tuple
> > in a tuple is *not* vacuumed, even though it's elegible to be removed
> > due to STO. If that tuple has toasted columns, it could be the that the
> > toast table was independently vacuumed (autovac considers main/toast
> > tables separately,
> 
> If that were really true, why would we not have the problem in
> current production versions that the toast table could be vacuumed
> before the heap, leading to exactly the issue you are talking
> about?

The issue isn't there without the feature, because we (should) never
access a tuple/detoast a column when it's invisible enough for the
corresponding toast tuple to be vacuumed away. But with
old_snapshot_timeout that's obviously (intentionally) not the case
anymore.  Due to old_snapshot_threshold we'll prune tuples which,
without it, would still be considered HEAPTUPLE_RECENTLY_DEAD.


> It seems to me that a simple test shows that it is not the
> case that the heap is vacuumed without considering toast:

That's why I mentioned autovacuum:
/*
 * Scan pg_class to determine which tables to vacuum.
 *
 * We do this in two passes: on the first one we collect the list of 
plain
 * relations and materialized views, and on the second one we collect
 * TOAST tables. The reason for doing the second pass is that during it 
we
 * want to use the main relation's pg_class.reloptions entry if the 
TOAST
 * table does not have any, and we cannot obtain it unless we know
 * beforehand what's the main  table OID.
 *
 * We need to check TOAST tables separately because in cases with short,
 * wide tables there might be proportionally much more activity in the
 * TOAST table than in its parent.
 */
...
tab->at_vacoptions = VACOPT_SKIPTOAST |
(dovacuum ? VACOPT_VACUUM : 0) |
(doanalyze ? VACOPT_ANALYZE : 0) |
(!wraparound ? VACOPT_NOWAIT : 0);
(note the skiptoast)
...
/*
 * Remember the relation's TOAST relation for later, if the caller asked
 * us to process it.  In VACUUM FULL, though, the toast table is
 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 */
if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
toast_relid = onerel->rd_rel->reltoastrelid;
else
toast_relid = InvalidOid;
...
if (toast_relid != InvalidOid)
vacuum_rel(toast_relid, relation, options, params);


> > or the horizon could change between the two heap scans,
> 
> Not a problem in current production why?

Because the horizon will never go to a value which allows "surely dead"
tuples to be read, thus we never detoast columns from a tuple for which
we'd removed toast data. That's why we're performing visibility tests
(hopefully) everywhere, before accessing tuple contents (as opposed to
inspecting the header).


> > or pins could prevent vacuuming of one page, or ...).
> 
> Not a problem in current production why?

Same reason.


> So far I am not seeing any way for TOAST tuples to be pruned in
> advance of the referencing heap tuples, nor any way for the problem
> you describe to happen in the absence of that.

Didn't I just list three different ways, only one of which you doubted
the veracity of? Saying "Not a problem in current production why"
doesn't change it being a problem.


> > It seems the easiest way to fix this would be to make
> > TestForOldSnapshot() "accept" SnapshotToast as well.
> 
> I don't think that would be appropriate without testing the
> characteristics of the underlying table to see whether it should be
> excluded.

You mean checking whether it's a toast table? We could check that, but
since we never use a toast scan outside of toast, it doesn't seem
necessary.


> But is the TOAST data ever updated without an update to
> the referencing heap tuple?

It shouldn't.


> If not, I don't see any benefit.

Huh?


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-16 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 8:57 PM, Andres Freund  wrote:

> The simplest version of the scenario I'm concerned about is that a tuple
> in a tuple is *not* vacuumed, even though it's elegible to be removed
> due to STO. If that tuple has toasted columns, it could be the that the
> toast table was independently vacuumed (autovac considers main/toast
> tables separately,

If that were really true, why would we not have the problem in
current production versions that the toast table could be vacuumed
before the heap, leading to exactly the issue you are talking
about?  It seems to me that a simple test shows that it is not the
case that the heap is vacuumed without considering toast:

test=# create table tt (c1 text not null);
CREATE TABLE
test=# insert into tt select repeat(md5(n::text),10) from (select
generate_series(1,100)) x(n);
INSERT 0 100
test=# delete from tt;
DELETE 100
test=# vacuum verbose tt;
INFO:  vacuuming "public.tt"
INFO:  "tt": removed 100 row versions in 1 pages
INFO:  "tt": found 100 removable, 0 nonremovable row versions in 1 out
of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_16552"
INFO:  scanned index "pg_toast_16552_index" to remove 1900 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": removed 1900 row versions in 467 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "pg_toast_16552_index" now contains 0 row versions in 8 pages
DETAIL:  1900 index row versions were removed.
5 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_16552": found 1900 removable, 0 nonremovable row
versions in 467 out of 467 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

> or the horizon could change between the two heap scans,

Not a problem in current production why?

> or pins could prevent vacuuming of one page, or ...).

Not a problem in current production why?

So far I am not seeing any way for TOAST tuples to be pruned in
advance of the referencing heap tuples, nor any way for the problem
you describe to happen in the absence of that.  If you see such,
could you provide a more detailed description or a reproducible
test case?

> Toast accesses via tuptoaster.c currently don't perform
> TestForOldSnapshot_impl(), because they use SnapshotToastData, not
> SnapshotMVCC.
>
> That seems to means two things:
>
> 1) You might get 'missing chunk number ...' errors on access to toasted
>columns. Misleading error, but ok.
>
> 2) Because the tuple has been pruned from the toast table, it's possible
>that the toast oid/va_valueid is reused, because now there's no
>conflict with GetNewOidWithIndex() anymore. In that case the
>toast_fetch_datum() might return a tuple from another column & type
>(all columns in a table share the same toast table), which could lead
>to almost arbitrary problems.  That's not super likely to happen, but
>could have quite severe consequences once it starts.
>
> It seems the easiest way to fix this would be to make
> TestForOldSnapshot() "accept" SnapshotToast as well.

I don't think that would be appropriate without testing the
characteristics of the underlying table to see whether it should be
excluded.  But is the TOAST data ever updated without an update to
the referencing heap tuple?  If not, I don't see any benefit.  And
we certainly don't want to add some new way to prune TOAST tuples
which might still have referencing heap tuples; that could provide
a route to *create* the problem you describe.

I am on vacation tomorrow (Friday the 17th) through Monday the
27th, so I will be unable to respond to further issues during that
time.  Hopefully I can get this particular issue sorted out before
I go.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera
Kevin Grittner wrote:

> test=# -- connection 1
> analyze verbose t1;  -- when run after threshold, STO error occurs
> INFO:  analyzing "public.t1"
> INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
> 1001 dead rows; 8999 rows in sample, 8999 estimated total rows
> ERROR:  snapshot too old
> CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
> PL/pgSQL function mysq(integer) line 3 at RETURN
> 
> Is there some other behavior which would be preferred?

The fact that the ERROR is being thrown seems okay to me.  I was a bit
surprised that the second INFO line is shown, but there's a simple
explanation: we first acquire the sample rows (using
acquire_sample_rows) and only after that's done we try to compute the
stats from them.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund
On 2016-06-15 20:22:24 -0500, Kevin Grittner wrote:
> I'm not clear where you see this as being in any way different with
> STO.  Above it seemed that you saw this as an issue related to
> ANALYZE.  If there is not early pruning for the table being
> analyzed, nothing is at all different.  If there is early pruning
> the rows are not seen and there could be no detoasting.  If there
> is a function that lies about IMMUTABLE and reads from a table, it
> either functions as before or throws a STO error on page access
> (long before any detoasting).  Am I missing something?

I'm not sure, I might be missing something myself. Given the frequency
of confusion of all senior hackers involved in this discussion...

I previously was thinking of this in the context of ANALYZE, but I now
think it's a bigger problem (and might not really affect ANALYZE
itself).

The simplest version of the scenario I'm concerned about is that a tuple
in a tuple is *not* vacuumed, even though it's elegible to be removed
due to STO. If that tuple has toasted columns, it could be the that the
toast table was independently vacuumed (autovac considers main/toast
tables separately, or the horizon could change between the two heap
scans, or pins could prevent vacuuming of one page, or ...).  Toast
accesses via tuptoaster.c currently don't perform
TestForOldSnapshot_impl(), because they use SnapshotToastData, not
SnapshotMVCC.

That seems to means two things:

1) You might get 'missing chunk number ...' errors on access to toasted
   columns. Misleading error, but ok.

2) Because the tuple has been pruned from the toast table, it's possible
   that the toast oid/va_valueid is reused, because now there's no
   conflict with GetNewOidWithIndex() anymore. In that case the
   toast_fetch_datum() might return a tuple from another column & type
   (all columns in a table share the same toast table), which could lead
   to almost arbitrary problems.  That's not super likely to happen, but
   could have quite severe consequences once it starts.


It seems the easiest way to fix this would be to make
TestForOldSnapshot() "accept" SnapshotToast as well.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 5:34 PM, Andres Freund  wrote:
> On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
>>> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
 On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:

> We might fetch a toast tuple which
> since have been re-purposed for a datum of a different type.

 How would that happen?
>>>
>>> Autovac vacuums toast and heap tables independently. Once a toast datum
>>> isn't used anymore, the oid used can be reused (because it doesn't
>>> conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
>>> datum, which hasn't been removed, the contents of that toast id, might
>>> actually be for something different.
>>
>> What prevents that from happening now, without STO?
>
> Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone"
> tuple in autovacuum (or anywhere else). There's one minor exception to
> that, and that's enum datums in indexes, which is why we currently have
> weird transactional requirements for them.  I'm not entirely sure this
> can be hit, but it's worth checking.

I'm not clear where you see this as being in any way different with
STO.  Above it seemed that you saw this as an issue related to
ANALYZE.  If there is not early pruning for the table being
analyzed, nothing is at all different.  If there is early pruning
the rows are not seen and there could be no detoasting.  If there
is a function that lies about IMMUTABLE and reads from a table, it
either functions as before or throws a STO error on page access
(long before any detoasting).  Am I missing something?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund
On 2016-06-15 16:58:25 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
> > On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
> >> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
> >>
> >> > We might fetch a toast tuple which
> >> > since have been re-purposed for a datum of a different type.
> >>
> >> How would that happen?
> >
> > Autovac vacuums toast and heap tables independently. Once a toast datum
> > isn't used anymore, the oid used can be reused (because it doesn't
> > conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
> > datum, which hasn't been removed, the contents of that toast id, might
> > actually be for something different.
> 
> What prevents that from happening now, without STO?

Afaics we shouldn't ever look (i.e. detoast) at a "dead for everyone"
tuple in autovacuum (or anywhere else). There's one minor exception to
that, and that's enum datums in indexes, which is why we currently have
weird transactional requirements for them.  I'm not entirely sure this
can be hit, but it's worth checking.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 2:40 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera  
>> wrote:
>
>> > We actually go quite some lengths to support this case, even when it's
>> > the opinion of many that we shouldn't.  For example VACUUM doesn't try
>> > to find index entries using the values in each deleted tuple; instead we
>> > remember the TIDs and then scan the indexes (possibly many times) to
>> > find entries that match those TIDs -- which is much slower.  Yet we do
>> > it this way to protect the case that somebody is doing the
>> > not-really-IMMUTABLE function.
>> >
>> > In other words, I don't think we consider the position you argued as
>> > acceptable.
>>
>> What are you saying is unacceptable, and what behavior would be
>> acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
>
> I think having various executor(/heapam) checks that raise errors when
> queries are executed from within ANALYZE is acceptable.

Here is an example of a test case showing that:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 1);
drop function mysq(i int);
create function mysq(i int)
  returns int
  language plpgsql
  immutable
as $mysq$
begin
  return (i * (select c1 from t2));
end
$mysq$;
insert into t2 values (1);
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
update t2 set c1 = 1;

-- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs

The tail end of that, running the analyze once immediately and once
after the threshold is:

test=# -- connection 1
test=# analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ANALYZE
test=# -- connection 1
analyze verbose t1;  -- when run after threshold, STO error occurs
INFO:  analyzing "public.t1"
INFO:  "t1": scanned 45 of 45 pages, containing 8999 live rows and
1001 dead rows; 8999 rows in sample, 8999 estimated total rows
ERROR:  snapshot too old
CONTEXT:  SQL statement "SELECT (i * (select c1 from t2))"
PL/pgSQL function mysq(integer) line 3 at RETURN

Is there some other behavior which would be preferred?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 3:25 PM, Andres Freund  wrote:
> On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
>>
>> > We might fetch a toast tuple which
>> > since have been re-purposed for a datum of a different type.
>>
>> How would that happen?
>
> Autovac vacuums toast and heap tables independently. Once a toast datum
> isn't used anymore, the oid used can be reused (because it doesn't
> conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
> datum, which hasn't been removed, the contents of that toast id, might
> actually be for something different.

What prevents that from happening now, without STO?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund
On 2016-06-15 14:24:58 -0500, Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:
> 
> > We might fetch a toast tuple which
> > since have been re-purposed for a datum of a different type.
> 
> How would that happen?

Autovac vacuums toast and heap tables independently. Once a toast datum
isn't used anymore, the oid used can be reused (because it doesn't
conflict via GetNewOidWithIndex() anymore. If analyze then detoasts a
datum, which hasn't been removed, the contents of that toast id, might
actually be for something different.

That's not super likely to happen (given how rare oid wraparounds
usually are), but it appears to be possible.

Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread David G. Johnston
On Wed, Jun 15, 2016 at 3:40 PM, Alvaro Herrera 
wrote:

> Kevin Grittner wrote:
> > On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
> >  wrote:
>
> > > We actually go quite some lengths to support this case, even when it's
> > > the opinion of many that we shouldn't.  For example VACUUM doesn't try
> > > to find index entries using the values in each deleted tuple; instead
> we
> > > remember the TIDs and then scan the indexes (possibly many times) to
> > > find entries that match those TIDs -- which is much slower.  Yet we do
> > > it this way to protect the case that somebody is doing the
> > > not-really-IMMUTABLE function.
> > >
> > > In other words, I don't think we consider the position you argued as
> > > acceptable.
> >
> > What are you saying is unacceptable, and what behavior would be
> > acceptable instead?
>
> The answer "we don't support the situation where you have an index using
> an IMMUTABLE function that isn't actually immutable" is not acceptable.
> The acceptable solution would be a design that doesn't have that
> property as a requisite.
>

​Yes, a much better solution would be for PostgreSQL to examine the body of
every function and determine on its own the proper volatility - or lacking
that to "sandbox" (for lack of a better term) function execution so it
simply cannot do things that conflict with its user specified marking.  But
the prevailing opinion on this list is that such an effort is not worthy of
resources and that "let the user keep both pieces"​

​is the more expedient policy.  That this patch is being defended using
that argument is consistent to policy and thus quite acceptable.

The technical details here are just beyond my reach ATM but I think
Robert's meta-points are spot on.  Though to be fair we are  changing a
fundamental assumption underlying how the system and transactions operate -
the amount of code whose assumptions are now being stressed is non-trivial;
and for a feature that will generally have less use in production - and
likely in much higher-stakes arenas - having a professionally hostile
approach will help to ensure that what is released has been thoroughly
vetted.

​These edge cases should be thought of, discussed, and ideally documented
somewhere so that future coders can see and understand that said edges have
been considered even if the answer is: "well, we don't blow up and at worse
have some slightly off statistics, that seems fine"​.

David J.


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas
On Wed, Jun 15, 2016 at 2:59 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
>>
>> > The expression index case is the one to worry about; if there is a
>> > problem, that's where it is.  What bothers me is that a function used
>> > in an expression index could do anything at all - it can read any
>> > table in the database.
>>
>> It *can*, but then you are lying to the database when you call it
>> IMMUTABLE.  Such an index can easily become corrupted through
>> normal DML.  Without DML the ANALYZE has no problem.  So you seem
>> to be concerned that if someone is lying to the database engine to
>> force it accept a function as IMMUTABLE when it actually isn't, and
>> then updating the referenced rows (which is very likely to render
>> the index corrupted), that statistics might also become stale.
>
> We actually go quite some lengths to support this case, even when it's
> the opinion of many that we shouldn't.  For example VACUUM doesn't try
> to find index entries using the values in each deleted tuple; instead we
> remember the TIDs and then scan the indexes (possibly many times) to
> find entries that match those TIDs -- which is much slower.  Yet we do
> it this way to protect the case that somebody is doing the
> not-really-IMMUTABLE function.
>
> In other words, I don't think we consider the position you argued as
> acceptable.

Well, I actually don't think there's a giant problem here.  I'm just
trying to follow the chain of the argument to its (illogical)
conclusion.  I mean, if ANALYZE itself fails to see a tuple subjected
to early pruning, that should be fine.  And if some query run by a
supposedly-but-not-actually immutable function errors out because
snapshot_too_old is set, that also seems more or less fine.  The
statistics might not get updated, but oh well: either make your
supposedly-immutable function actually immutable, or else turn off
snapshot_too_old, or else live with the fact that ANALYZE will fail
some percentage of the time.  Supporting people who cheat and do
things that technically aren't allowed is one thing; saying that every
new feature must never have any downsides for such people is something
else.  If we took the latter approach, parallel query would be right
out, because you sure can break things by mislabeling functions as
PARALLEL SAFE.  I *do* think it *must* be possible to get an ANALYZE
to do something funky - either error out, or give wrong answers - if
you set up a strange enough set of circumstances, but I don't think
that's necessarily something we need to do anything about.

I think this whole discussion of snapshot too old has provoked far too
many bitter emails -- on all sides.  I entirely believe that there are
legitimate reasons to have concerns about this feature, and I entirely
suspect that it has problems we haven't found yet, and I also entirely
believe that there will be some future bugs that stem from this
feature that we would not have had otherwise.  I think it is entirely
legitimate to have concerns about those things.  On the other hand, I
*also* believe that Kevin is a pretty careful guy and that he's done
his best to make this patch safe and that he did not just go off and
commit something half-baked without due reflection.  We have to expect
that if people who are committers don't get much review of their
patches, they will eventually commit them anyway.  The "I can't
believe you committed this" reactions seem out of line to me.  This
feature is not perfect.  Nor is it the worst thing anybody's ever
committed.  But it's definitely provoked more ire than most.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
>  wrote:

> > We actually go quite some lengths to support this case, even when it's
> > the opinion of many that we shouldn't.  For example VACUUM doesn't try
> > to find index entries using the values in each deleted tuple; instead we
> > remember the TIDs and then scan the indexes (possibly many times) to
> > find entries that match those TIDs -- which is much slower.  Yet we do
> > it this way to protect the case that somebody is doing the
> > not-really-IMMUTABLE function.
> >
> > In other words, I don't think we consider the position you argued as
> > acceptable.
> 
> What are you saying is unacceptable, and what behavior would be
> acceptable instead?

The answer "we don't support the situation where you have an index using
an IMMUTABLE function that isn't actually immutable" is not acceptable.
The acceptable solution would be a design that doesn't have that
property as a requisite.

I think having various executor(/heapam) checks that raise errors when
queries are executed from within ANALYZE is acceptable.  I don't know
about the TOAST related angle Andres just raised.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 2:20 PM, Andres Freund  wrote:

> We might fetch a toast tuple which
> since have been re-purposed for a datum of a different type.

How would that happen?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Andres Freund
On 2016-06-15 14:50:46 -0400, Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
>  wrote:
> > Robert Haas wrote:
> >> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> >> wrote:
> >> > The test I showed creates a situation which (to ANALYZE) is
> >> > identical to what you describe -- ANALYZE sees a page with an LSN
> >> > recent enough that it could have been (and actually has been)
> >> > pruned.  Why would it be better for the ANALYZE to fail than to
> >> > complete?
> >>
> >> As I understand it, the reason we need to sometimes give "ERROR:
> >> snapshot too old" after early pruning is because we might otherwise
> >> give the wrong answer.
> >
> > So what constitutes "the wrong answer"?  A regular transaction reading a
> > page and not finding a tuple that should have been there but was
> > removed, is a serious problem and should be aborted.  For ANALYZE it may
> > not matter a great deal.  Some very old tuple that might have been
> > chosen for the sample is not there; a different tuple is chosen instead,
> > so the stats might be slightly difference.  No big deal.
> >
> > Maybe it is possible to get into trouble if you're taking a sample for
> > an expression index.
> 
> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

Isn't that also a problem around fetching toast tuples? As we don't
TestForOldSnapshot_impl() for toast, We might fetch a toast tuple which
since have been re-purposed for a datum of a different type. Which can
have arbitrarily bad consequences afaics.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 1:59 PM, Alvaro Herrera
 wrote:
> Kevin Grittner wrote:
>> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
>>
>>> The expression index case is the one to worry about; if there is a
>>> problem, that's where it is.  What bothers me is that a function used
>>> in an expression index could do anything at all - it can read any
>>> table in the database.
>>
>> It *can*, but then you are lying to the database when you call it
>> IMMUTABLE.  Such an index can easily become corrupted through
>> normal DML.  Without DML the ANALYZE has no problem.  So you seem
>> to be concerned that if someone is lying to the database engine to
>> force it accept a function as IMMUTABLE when it actually isn't, and
>> then updating the referenced rows (which is very likely to render
>> the index corrupted), that statistics might also become stale.
>
> We actually go quite some lengths to support this case, even when it's
> the opinion of many that we shouldn't.  For example VACUUM doesn't try
> to find index entries using the values in each deleted tuple; instead we
> remember the TIDs and then scan the indexes (possibly many times) to
> find entries that match those TIDs -- which is much slower.  Yet we do
> it this way to protect the case that somebody is doing the
> not-really-IMMUTABLE function.
>
> In other words, I don't think we consider the position you argued as
> acceptable.

What are you saying is unacceptable, and what behavior would be
acceptable instead?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 1:54 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:

>> The expression index case is the one to worry about; if there is a
>> problem, that's where it is.  What bothers me is that a function used
>> in an expression index could do anything at all - it can read any
>> table in the database.
>
> Hmm, but if it does that, then the code that actually implements that
> query would run the STO checks, right?  The analyze code doesn't need
> to.

Right.  In the described case, you would get a "snapshot too old"
error inside the expression which is trying to generate the index
value.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:
> 
> > The expression index case is the one to worry about; if there is a
> > problem, that's where it is.  What bothers me is that a function used
> > in an expression index could do anything at all - it can read any
> > table in the database.
> 
> It *can*, but then you are lying to the database when you call it
> IMMUTABLE.  Such an index can easily become corrupted through
> normal DML.  Without DML the ANALYZE has no problem.  So you seem
> to be concerned that if someone is lying to the database engine to
> force it accept a function as IMMUTABLE when it actually isn't, and
> then updating the referenced rows (which is very likely to render
> the index corrupted), that statistics might also become stale.

We actually go quite some lengths to support this case, even when it's
the opinion of many that we shouldn't.  For example VACUUM doesn't try
to find index entries using the values in each deleted tuple; instead we
remember the TIDs and then scan the indexes (possibly many times) to
find entries that match those TIDs -- which is much slower.  Yet we do
it this way to protect the case that somebody is doing the
not-really-IMMUTABLE function.

In other words, I don't think we consider the position you argued as
acceptable.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 1:50 PM, Robert Haas  wrote:

> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

It *can*, but then you are lying to the database when you call it
IMMUTABLE.  Such an index can easily become corrupted through
normal DML.  Without DML the ANALYZE has no problem.  So you seem
to be concerned that if someone is lying to the database engine to
force it accept a function as IMMUTABLE when it actually isn't, and
then updating the referenced rows (which is very likely to render
the index corrupted), that statistics might also become stale.

They might.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
>  wrote:

> > Maybe it is possible to get into trouble if you're taking a sample for
> > an expression index.
> 
> The expression index case is the one to worry about; if there is a
> problem, that's where it is.  What bothers me is that a function used
> in an expression index could do anything at all - it can read any
> table in the database.

Hmm, but if it does that, then the code that actually implements that
query would run the STO checks, right?  The analyze code doesn't need
to.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas
On Wed, Jun 15, 2016 at 2:45 PM, Alvaro Herrera
 wrote:
> Robert Haas wrote:
>> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
>> wrote:
>> > The test I showed creates a situation which (to ANALYZE) is
>> > identical to what you describe -- ANALYZE sees a page with an LSN
>> > recent enough that it could have been (and actually has been)
>> > pruned.  Why would it be better for the ANALYZE to fail than to
>> > complete?
>>
>> As I understand it, the reason we need to sometimes give "ERROR:
>> snapshot too old" after early pruning is because we might otherwise
>> give the wrong answer.
>
> So what constitutes "the wrong answer"?  A regular transaction reading a
> page and not finding a tuple that should have been there but was
> removed, is a serious problem and should be aborted.  For ANALYZE it may
> not matter a great deal.  Some very old tuple that might have been
> chosen for the sample is not there; a different tuple is chosen instead,
> so the stats might be slightly difference.  No big deal.
>
> Maybe it is possible to get into trouble if you're taking a sample for
> an expression index.

The expression index case is the one to worry about; if there is a
problem, that's where it is.  What bothers me is that a function used
in an expression index could do anything at all - it can read any
table in the database.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 1:45 PM, Alvaro Herrera
 wrote:

> Maybe it is possible to get into trouble if you're taking a sample for
> an expression index.

Maybe -- if you are using a function for an index expression which
does an explicit SELECT against some database table rather than
only using values from the row itself.  In such a case you would
have had to mark a function as IMMUTABLE which depends on table
contents.  I say you get to keep both pieces.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Alvaro Herrera
Robert Haas wrote:
> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> wrote:

> > The test I showed creates a situation which (to ANALYZE) is
> > identical to what you describe -- ANALYZE sees a page with an LSN
> > recent enough that it could have been (and actually has been)
> > pruned.  Why would it be better for the ANALYZE to fail than to
> > complete?
> 
> As I understand it, the reason we need to sometimes give "ERROR:
> snapshot too old" after early pruning is because we might otherwise
> give the wrong answer.

So what constitutes "the wrong answer"?  A regular transaction reading a
page and not finding a tuple that should have been there but was
removed, is a serious problem and should be aborted.  For ANALYZE it may
not matter a great deal.  Some very old tuple that might have been
chosen for the sample is not there; a different tuple is chosen instead,
so the stats might be slightly difference.  No big deal.

Maybe it is possible to get into trouble if you're taking a sample for
an expression index.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 1:29 PM, Robert Haas  wrote:
> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
> wrote:>> So what happens in this scenario:
>>> 1. ANALYZE runs really slowly - maybe the user-defined function it's
>>> running for the expression index is extremely long-running.
>>> 2. Eventually, the snapshot for ANALYZE is older than the configured
>>> value of snapshot_too_old.
>>> 3. Then, ANALYZE selects a page with an LSN new enough that it might
>>> have been pruned.
>>>
>>> Presumably, the ANALYZE ought to error out in this scenario, just as
>>> it would in any other situation where an old snapshot sees a new page.
>>> No?
>>
>> The test I showed creates a situation which (to ANALYZE) is
>> identical to what you describe -- ANALYZE sees a page with an LSN
>> recent enough that it could have been (and actually has been)
>> pruned.  Why would it be better for the ANALYZE to fail than to
>> complete?
>
> As I understand it, the reason we need to sometimes give "ERROR:
> snapshot too old" after early pruning is because we might otherwise
> give the wrong answer.
>
> Maybe I'm confused.

In the scenario that you describe, ANALYZE is coming up with
statistics to use in query planning, and the numbers are not
expected to always be 100% accurate.  I can think of conditions
which might prevail when seeing the recent LSN.

(1)  The recent LSN is from a cause having nothing to do with the
STO feature, like DML.  As things stand, the behavior is the same
as without the patch -- the rows are counted just the same as
always.  If we did as you suggest, we instead would abort ANALYZE
and have stale statistics.

(2)  The recent LSN is related to STO pruning.  The dead rows are
gone forever, and will not be counted.  This seems more correct
than counting them, even if it were possible, and also superior to
aborting the ANALYZE and leaving stale statistics.

Of course, a subset of (1) is the case that the rows can be
early-pruned at the next opportunity.  In such a case ANALYZE is
still counting them according to the rules that we had before the
snapshot too old feature.  If someone wants to argue that those
rules are wrong, that seems like material for a separate patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas
On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner 
wrote:>> So what happens in this scenario:
>> 1. ANALYZE runs really slowly - maybe the user-defined function it's
>> running for the expression index is extremely long-running.
>> 2. Eventually, the snapshot for ANALYZE is older than the configured
>> value of snapshot_too_old.
>> 3. Then, ANALYZE selects a page with an LSN new enough that it might
>> have been pruned.
>>
>> Presumably, the ANALYZE ought to error out in this scenario, just as
>> it would in any other situation where an old snapshot sees a new page.
>> No?
>
> The test I showed creates a situation which (to ANALYZE) is
> identical to what you describe -- ANALYZE sees a page with an LSN
> recent enough that it could have been (and actually has been)
> pruned.  Why would it be better for the ANALYZE to fail than to
> complete?

As I understand it, the reason we need to sometimes give "ERROR:
snapshot too old" after early pruning is because we might otherwise
give the wrong answer.

Maybe I'm confused.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Kevin Grittner
On Wed, Jun 15, 2016 at 10:46 AM, Robert Haas  wrote:
> On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner  wrote:
>> I have reviewed the code and run tests to try to find something
>> here which could be considered a bug, without finding any problem.
>> When reading pages for the random sample for ANALYZE (or
>> auto-analyze) there is not an age check; so ANALYZE completes
>> without error, keeping statistics up-to-date.
>>
>> There really is no difference in behavior except in the case that:
>>
>> (1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
>>feature, and
>> (2)  there were tuples that were dead as the result of completed
>>transactions, and
>> (3)  those tuples became older than the threshold, and
>> (4)  those tuples were pruned or vacuumed away, and
>> (5)  an ANALYZE process would have read those dead tuples had they
>>not been removed.
>>
>> In such a case the irrevocably dead, permanently removed tuples are
>> not counted in the statistics.  I have trouble seeing a better
>> outcome than that.  Among my tests, I specifically checked for an
>> ANALYZE of a table having an index on an expression, using an old
>> snapshot:
>>
>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> drop table if exists t2;
>> create table t2 (c1 int not null);
>> insert into t1 select generate_series(1, 1);
>> drop function mysq(i int);
>> create function mysq(i int)
>>   returns int
>>   language plpgsql
>>   immutable
>> as $mysq$
>> begin
>>   return (i * i);
>> end
>> $mysq$;
>> create index t1_c1sq on t1 ((mysq(c1)));
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> vacuum analyze verbose t1;
>> delete from t1 where c1 between 1000 and 1999;
>> delete from t1 where c1 = 8000;
>> insert into t2 values (1);
>> select pg_sleep_for('2min');
>> vacuum verbose t1;  -- repeat if necessary to see the dead rows
>> disappear
>>
>> -- connection 1
>> analyze verbose t1;
>>
>> This runs to completion, as I would want and expect.
>>
>> I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
>> anyone feels that I've missed something, please provide a test to
>> show the problem, or a clear description of the problem and how you
>> feel behavior should be different.
>
> So what happens in this scenario:
>
> 1. ANALYZE runs really slowly - maybe the user-defined function it's
> running for the expression index is extremely long-running.
> 2. Eventually, the snapshot for ANALYZE is older than the configured
> value of snapshot_too_old.
> 3. Then, ANALYZE selects a page with an LSN new enough that it might
> have been pruned.
>
> Presumably, the ANALYZE ought to error out in this scenario, just as
> it would in any other situation where an old snapshot sees a new page.
> No?

The test I showed creates a situation which (to ANALYZE) is
identical to what you describe -- ANALYZE sees a page with an LSN
recent enough that it could have been (and actually has been)
pruned.  Why would it be better for the ANALYZE to fail than to
complete?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-15 Thread Robert Haas
On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner  wrote:
> I have reviewed the code and run tests to try to find something
> here which could be considered a bug, without finding any problem.
> When reading pages for the random sample for ANALYZE (or
> auto-analyze) there is not an age check; so ANALYZE completes
> without error, keeping statistics up-to-date.
>
> There really is no difference in behavior except in the case that:
>
> (1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
>feature, and
> (2)  there were tuples that were dead as the result of completed
>transactions, and
> (3)  those tuples became older than the threshold, and
> (4)  those tuples were pruned or vacuumed away, and
> (5)  an ANALYZE process would have read those dead tuples had they
>not been removed.
>
> In such a case the irrevocably dead, permanently removed tuples are
> not counted in the statistics.  I have trouble seeing a better
> outcome than that.  Among my tests, I specifically checked for an
> ANALYZE of a table having an index on an expression, using an old
> snapshot:
>
> -- connection 1
> drop table if exists t1;
> create table t1 (c1 int not null);
> drop table if exists t2;
> create table t2 (c1 int not null);
> insert into t1 select generate_series(1, 1);
> drop function mysq(i int);
> create function mysq(i int)
>   returns int
>   language plpgsql
>   immutable
> as $mysq$
> begin
>   return (i * i);
> end
> $mysq$;
> create index t1_c1sq on t1 ((mysq(c1)));
> begin transaction isolation level repeatable read;
> select 1;
>
> -- connection 2
> vacuum analyze verbose t1;
> delete from t1 where c1 between 1000 and 1999;
> delete from t1 where c1 = 8000;
> insert into t2 values (1);
> select pg_sleep_for('2min');
> vacuum verbose t1;  -- repeat if necessary to see the dead rows
> disappear
>
> -- connection 1
> analyze verbose t1;
>
> This runs to completion, as I would want and expect.
>
> I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
> anyone feels that I've missed something, please provide a test to
> show the problem, or a clear description of the problem and how you
> feel behavior should be different.

So what happens in this scenario:

1. ANALYZE runs really slowly - maybe the user-defined function it's
running for the expression index is extremely long-running.
2. Eventually, the snapshot for ANALYZE is older than the configured
value of snapshot_too_old.
3. Then, ANALYZE selects a page with an LSN new enough that it might
have been pruned.

Presumably, the ANALYZE ought to error out in this scenario, just as
it would in any other situation where an old snapshot sees a new page.
No?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-11 Thread Kevin Grittner
On Tue, May 24, 2016 at 4:10 PM, Robert Haas  wrote:
> On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner  wrote:
>> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
>>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
 On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:

>> That comment reminds me of a question I had: Did you consider the effect
>> of this patch on analyze? It uses a snapshot, and by memory you've not
>> built in a defense against analyze being cancelled.
>>
>> The primary defense is not considering a cancellation except in 30
>> of the 500 places a page is used.  None of those 30 are, as far as
>> I can see (upon review in response to your question), used in the
>> analyze process.
>
> It's not obvious to me how this is supposed to work.  If ANALYZE's
> snapshot is subject to being ignored for xmin purposes because of
> snapshot_too_old, then I would think it would need to consider
> cancelling itself if it reads a page with possibly-removed data, just
> like in any other case.  It seems that we might not actually need a
> snapshot set for ANALYZE in all cases, because the comments say:
>
> /* functions in indexes may want a snapshot set */
> PushActiveSnapshot(GetTransactionSnapshot());
>
> If we can get away with it, it would be a rather large win to only set
> a snapshot when the table has an expression index.  For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

I have reviewed the code and run tests to try to find something
here which could be considered a bug, without finding any problem.
When reading pages for the random sample for ANALYZE (or
auto-analyze) there is not an age check; so ANALYZE completes
without error, keeping statistics up-to-date.

There really is no difference in behavior except in the case that:

(1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
   feature, and
(2)  there were tuples that were dead as the result of completed
   transactions, and
(3)  those tuples became older than the threshold, and
(4)  those tuples were pruned or vacuumed away, and
(5)  an ANALYZE process would have read those dead tuples had they
   not been removed.

In such a case the irrevocably dead, permanently removed tuples are
not counted in the statistics.  I have trouble seeing a better
outcome than that.  Among my tests, I specifically checked for an
ANALYZE of a table having an index on an expression, using an old
snapshot:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 1);
drop function mysq(i int);
create function mysq(i int)
  returns int
  language plpgsql
  immutable
as $mysq$
begin
  return (i * i);
end
$mysq$;
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
insert into t2 values (1);
select pg_sleep_for('2min');
vacuum verbose t1;  -- repeat if necessary to see the dead rows
disappear

-- connection 1
analyze verbose t1;

This runs to completion, as I would want and expect.

I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
anyone feels that I've missed something, please provide a test to
show the problem, or a clear description of the problem and how you
feel behavior should be different.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-10 Thread Kevin Grittner
On Fri, Jun 10, 2016 at 10:26 AM, Robert Haas  wrote:
> On Fri, Jun 10, 2016 at 10:45 AM, Kevin Grittner  wrote:

>> Since vacuum calls the pruning function, and not the other way
>> around, the name you suggest would be technically more correct.
>> Committed using "Pruning" instead of "Vacuum" in both new macros.

> You've still got an early_vacuum_enabled variable name floating around there.

Gah!  Renamed for consistency.

Thanks!

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-10 Thread Robert Haas
On Fri, Jun 10, 2016 at 10:45 AM, Kevin Grittner  wrote:
> On Thu, Jun 9, 2016 at 6:16 PM, Robert Haas  wrote:
>
>> So I like the idea of centralizing checks in
>> RelationAllowsEarlyVacuum, but shouldn't it really be called
>> RelationAllowsEarlyPruning?
>
> Since vacuum calls the pruning function, and not the other way
> around, the name you suggest would be technically more correct.
> Committed using "Pruning" instead of "Vacuum" in both new macros.
>
> I have closed the CREATE INDEX versus "snapshot too old" issue in
> the "PostgreSQL 9.6 Open Items" Wiki page.

You've still got an early_vacuum_enabled variable name floating around there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-10 Thread Kevin Grittner
On Thu, Jun 9, 2016 at 6:16 PM, Robert Haas  wrote:

> So I like the idea of centralizing checks in
> RelationAllowsEarlyVacuum, but shouldn't it really be called
> RelationAllowsEarlyPruning?

Since vacuum calls the pruning function, and not the other way
around, the name you suggest would be technically more correct.
Committed using "Pruning" instead of "Vacuum" in both new macros.

I have closed the CREATE INDEX versus "snapshot too old" issue in
the "PostgreSQL 9.6 Open Items" Wiki page.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-09 Thread Robert Haas
On Thu, Jun 9, 2016 at 10:28 AM, Kevin Grittner  wrote:
> [Thanks to Robert to stepping up to keep this moving while I was
> down yesterday with a minor injury.  I'm back on it today.]
>> Generally, I think I see the hazard you're concerned about: I had
>> failed to realize, as you mentioned upthread, that new index pages
>> would have an LSN of 0. So if a tuple is pruned early and then the
>> index is reindexed, old snapshots won't realize that data is missing.
>> What I'm less certain about is whether you can actually get by with
>> reusing ii_BrokenHotChain to handle this case.
>
> v2 and later does not do that.  v1 did, but that was a more blunt
> instrument.
>
>>  For example, note this comment:
>>
>>  * However, when reindexing an existing index, we should do nothing here.
>>  * Any HOT chains that are broken with respect to the index must predate
>>  * the index's original creation, so there is no need to change the
>>  * index's usability horizon.  Moreover, we *must not* try to change the
>>  * index's pg_index entry while reindexing pg_index itself, and this
>>  * optimization nicely prevents that.
>>
>> This logic doesn't apply to the old snapshot case; there, you'd need
>> to distrust the index whether it was an initial build or a REINDEX,
>> but it doesn't look like that's what the patch does.  I'm worried
>> there could be other places where we rely on ii_BrokenHotChain to
>> detect only one specific condition that isn't quite the same as what
>> you're trying to use it for here.
>
> Well spotted.  I had used a lot of discreet calls to get that
> reindex logic right, but it was verbose and ugly, so I had just
> added the new macros in this patch and started to implement them
> before I knocked off for the day.  At handover I was too distracted
> to remember where I was with it.  :-(  See if it looks right to you
> now.
>
> Attached is v3.  I will commit this patch to resolve this issue
> tomorrow, barring any objections before then.

So I like the idea of centralizing checks in
RelationAllowsEarlyVacuum, but shouldn't it really be called
RelationAllowsEarlyPruning?

Will look at this a bit more if I get time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-09 Thread Kevin Grittner
[Thanks to Robert to stepping up to keep this moving while I was
down yesterday with a minor injury.  I'm back on it today.]

On Wed, Jun 8, 2016 at 3:11 PM, Robert Haas  wrote:
> On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner  wrote:

>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> insert into t1 select generate_series(1, 100);
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> insert into t2 values (1);
>> delete from t1 where c1 between 20 and 29;
>> delete from t1 where c1 = 100;
>> vacuum analyze verbose t1;
>> select pg_sleep_for('2min');
>> vacuum analyze verbose t1;  -- repeat if needed until dead rows vacuumed
>>
>> -- connection 1
>> select c1 from t1 where c1 = 100;
>> select c1 from t1 where c1 = 25;
>>
>> The problem occurs when an index is built while an old snapshot
>> exists which can't see the effects of early pruning/vacuuming.  The
>> fix prevents use of such an index until all snapshots early enough
>> to have a problem have been released.
>
> This example doesn't seem to involve any CREATE INDEX or REINDEX
> operations, so I'm a little confused.

Sorry; pasto -- the index build is supposed to be on connection 2
right after the dead rows are confirmed vacuumed away:

create index t1_c1 on t1 (c1);

> Generally, I think I see the hazard you're concerned about: I had
> failed to realize, as you mentioned upthread, that new index pages
> would have an LSN of 0. So if a tuple is pruned early and then the
> index is reindexed, old snapshots won't realize that data is missing.
> What I'm less certain about is whether you can actually get by with
> reusing ii_BrokenHotChain to handle this case.

v2 and later does not do that.  v1 did, but that was a more blunt
instrument.

>  For example, note this comment:
>
>  * However, when reindexing an existing index, we should do nothing here.
>  * Any HOT chains that are broken with respect to the index must predate
>  * the index's original creation, so there is no need to change the
>  * index's usability horizon.  Moreover, we *must not* try to change the
>  * index's pg_index entry while reindexing pg_index itself, and this
>  * optimization nicely prevents that.
>
> This logic doesn't apply to the old snapshot case; there, you'd need
> to distrust the index whether it was an initial build or a REINDEX,
> but it doesn't look like that's what the patch does.  I'm worried
> there could be other places where we rely on ii_BrokenHotChain to
> detect only one specific condition that isn't quite the same as what
> you're trying to use it for here.

Well spotted.  I had used a lot of discreet calls to get that
reindex logic right, but it was verbose and ugly, so I had just
added the new macros in this patch and started to implement them
before I knocked off for the day.  At handover I was too distracted
to remember where I was with it.  :-(  See if it looks right to you
now.

Attached is v3.  I will commit this patch to resolve this issue
tomorrow, barring any objections before then.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..449a665 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2040,18 +2040,24 @@ index_build(Relation heapRelation,
 	/*
 	 * If we found any potentially broken HOT chains, mark the index as not
 	 * being usable until the current transaction is below the event horizon.
-	 * See src/backend/access/heap/README.HOT for discussion.
+	 * See src/backend/access/heap/README.HOT for discussion.  Also set this
+	 * if early pruning/vacuuming is enabled for the heap relation.  While it
+	 * might become safe to use the index earlier based on actual cleanup
+	 * activity and other active transactions, the test for that would be much
+	 * more complex and would require some form of blocking, so keep it simple
+	 * and fast by just using the current transaction.
 	 *
 	 * However, when reindexing an existing index, we should do nothing here.
 	 * Any HOT chains that are broken with respect to the index must predate
 	 * the index's original creation, so there is no need to change the
 	 * index's usability horizon.  Moreover, we *must not* try to change the
 	 * index's pg_index entry while reindexing pg_index itself, and this
-	 * optimization nicely prevents that.
+	 * optimization nicely prevents that.  The more complex rules needed for a
+	 * reindex are handled separately after this function returns.
 	 *
 	 * We also need not set indcheckxmin during a concurrent index build,
 	 * because we won't set indisvalid true until all transactions that care
-	 * about the broken HOT chains are gone.
+	 * about the broken HOT chains or early pruning/vacuuming are gone.
 	 *
 	 * Therefore, this code path can only be taken during 

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-08 Thread Noah Misch
On Fri, Jun 03, 2016 at 04:29:40PM -0500, Kevin Grittner wrote:
> On Fri, May 27, 2016 at 10:35 AM, Kevin Grittner  wrote:
> > On Tue, May 24, 2016 at 4:10 PM, Robert Haas  wrote:
> 
> >> [ANALYZE of index with expression may fail to update statistics
> >> if ANALYZE runs longer than old_snapshot_threshold]
> 
> >> If we can get away with it, it would be a rather large win to only set
> >> a snapshot when the table has an expression index.  For purposes of
> >> "snapshot too old", though, it will be important that a function in an
> >> index which tries to read data from some other table which has been
> >> pruned cancels itself when necessary.
> >
> > I will make this my top priority after resolving the question of whether
> > there is an issue with CREATE INDEX.  I expect to have a resolution,
> > probably involving some patch, by 3 June.
> 
> Due to 9.5 bug-fixing and the index issue taking a bit longer than
> I expected, this is now looking like a 7 June resolution.

This PostgreSQL 9.6 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
http://www.postgresql.org/message-id/20160527025039.ga447...@tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-08 Thread Robert Haas
On Wed, Jun 8, 2016 at 4:04 PM, Kevin Grittner  wrote:
> On Wed, Jun 8, 2016 at 2:49 PM, Robert Haas  wrote:
>> Do you have a test case that demonstrates a problem, or an explanation
>> of why you think there is one?
>
> With old_snapshot_threshold = '1min'
>
> -- connection 1
> drop table if exists t1;
> create table t1 (c1 int not null);
> insert into t1 select generate_series(1, 100);
> begin transaction isolation level repeatable read;
> select 1;
>
> -- connection 2
> insert into t2 values (1);
> delete from t1 where c1 between 20 and 29;
> delete from t1 where c1 = 100;
> vacuum analyze verbose t1;
> select pg_sleep_for('2min');
> vacuum analyze verbose t1;  -- repeat if needed until dead rows vacuumed
>
> -- connection 1
> select c1 from t1 where c1 = 100;
> select c1 from t1 where c1 = 25;
>
> The problem occurs when an index is built while an old snapshot
> exists which can't see the effects of early pruning/vacuuming.  The
> fix prevents use of such an index until all snapshots early enough
> to have a problem have been released.

This example doesn't seem to involve any CREATE INDEX or REINDEX
operations, so I'm a little confused.

Generally, I think I see the hazard you're concerned about: I had
failed to realize, as you mentioned upthread, that new index pages
would have an LSN of 0. So if a tuple is pruned early and then the
index is reindexed, old snapshots won't realize that data is missing.
What I'm less certain about is whether you can actually get by with
reusing ii_BrokenHotChain to handle this case.  For example, note this
comment:

 * However, when reindexing an existing index, we should do nothing here.
 * Any HOT chains that are broken with respect to the index must predate
 * the index's original creation, so there is no need to change the
 * index's usability horizon.  Moreover, we *must not* try to change the
 * index's pg_index entry while reindexing pg_index itself, and this
 * optimization nicely prevents that.

This logic doesn't apply to the old snapshot case; there, you'd need
to distrust the index whether it was an initial build or a REINDEX,
but it doesn't look like that's what the patch does.  I'm worried
there could be other places where we rely on ii_BrokenHotChain to
detect only one specific condition that isn't quite the same as what
you're trying to use it for here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-08 Thread Kevin Grittner
On Wed, Jun 8, 2016 at 2:49 PM, Robert Haas  wrote:

> Do you have a test case that demonstrates a problem, or an explanation
> of why you think there is one?

With old_snapshot_threshold = '1min'

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
insert into t1 select generate_series(1, 100);
begin transaction isolation level repeatable read;
select 1;

-- connection 2
insert into t2 values (1);
delete from t1 where c1 between 20 and 29;
delete from t1 where c1 = 100;
vacuum analyze verbose t1;
select pg_sleep_for('2min');
vacuum analyze verbose t1;  -- repeat if needed until dead rows vacuumed

-- connection 1
select c1 from t1 where c1 = 100;
select c1 from t1 where c1 = 25;

The problem occurs when an index is built while an old snapshot
exists which can't see the effects of early pruning/vacuuming.  The
fix prevents use of such an index until all snapshots early enough
to have a problem have been released.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-08 Thread Robert Haas
On Wed, Jun 8, 2016 at 9:40 AM, Kevin Grittner  wrote:
>>> Of course, ii_BrokenHotChain should be renamed to something like
>>> ii_UnsafeForOldSnapshots, and some comments need to be updated; but
>>> the above is the substance of it.
>>
>> I don't know why we'd want to rename it like that...
>
> If we made the above change, the old name would be misleading, but
> I've thought better of that and attach a slightly different
> approach (tested but not yet with comment adjustments); attached.

Kevin asked me to look at this patch, and maybe update it, but after
some further study, I am not at all convinced that there's any actual
bug here.  Here's why: in order for the HeapTupleSatisfiesVacuum() in
IndexBuildHeapRangeScan() to return HEAPTUPLE_DEAD instead of
HEAPTUPLE_RECENTLY_DEAD, it would have to be using an OldestXmin value
that doesn't include all of the snapshots in the system.  But that
will never happen, because that xmin comes directly from
GetOldestXmin(heapRelation, true), which knows nothing about
snapshot_too_old and will therefore never exclude any snapshots.  If
we were to pass the output of HeapTupleSatisfiesVacuum() through
TransactionIdLimitedForOldSnapshots() before using it here, we would
have a bug.  But we don't do that.

Do you have a test case that demonstrates a problem, or an explanation
of why you think there is one?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-08 Thread Kevin Grittner
On Tue, Jun 7, 2016 at 10:40 AM, Robert Haas  wrote:
> On Sat, Jun 4, 2016 at 4:21 PM, Kevin Grittner  wrote:

>> the minimal patch to fix behavior in this area would be:
>>
>> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
>> index 31a1438..6c379da 100644
>> --- a/src/backend/catalog/index.c
>> +++ b/src/backend/catalog/index.c
>> @@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
>> Assert(numblocks == InvalidBlockNumber);
>> }
>>
>> +   if (old_snapshot_threshold >= 0)
>> +   indexInfo->ii_BrokenHotChain = true;
>> +
>> reltuples = 0;
>>
>> /*
>>
>> Of course, ii_BrokenHotChain should be renamed to something like
>> ii_UnsafeForOldSnapshots, and some comments need to be updated; but
>> the above is the substance of it.
>
> I don't know why we'd want to rename it like that...

If we made the above change, the old name would be misleading, but
I've thought better of that and attach a slightly different
approach (tested but not yet with comment adjustments); attached.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..945f55c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2060,7 +2060,8 @@ index_build(Relation heapRelation,
 	 * about any concurrent readers of the tuple; no other transaction can see
 	 * it yet.
 	 */
-	if (indexInfo->ii_BrokenHotChain && !isreindex &&
+	if ((indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(indexRelation)) &&
+		!isreindex &&
 		!indexInfo->ii_Concurrent)
 	{
 		Oid			indexId = RelationGetRelid(indexRelation);
@@ -3409,9 +3410,10 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	 !indexForm->indisready ||
 	 !indexForm->indislive);
 		if (index_bad ||
-			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
+			(indexForm->indcheckxmin &&
+			 !(indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(iRel
 		{
-			if (!indexInfo->ii_BrokenHotChain)
+			if (!(indexInfo->ii_BrokenHotChain || EarlyVacuumEnabled(iRel)))
 indexForm->indcheckxmin = false;
 			else if (index_bad)
 indexForm->indcheckxmin = true;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 8a830d4..4e50b19 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -4290,8 +4290,7 @@ IssuePendingWritebacks(WritebackContext *context)
 void
 TestForOldSnapshot_impl(Snapshot snapshot, Relation relation)
 {
-	if (!IsCatalogRelation(relation)
-	 && !RelationIsAccessibleInLogicalDecoding(relation)
+	if (RelationAllowsEarlyVacuum(relation)
 	 && (snapshot)->whenTaken < GetOldSnapshotThresholdTimestamp())
 		ereport(ERROR,
 (errcode(ERRCODE_SNAPSHOT_TOO_OLD),
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index f8a2a83..2eabd1c 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1597,10 +1597,7 @@ TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
 {
 	if (TransactionIdIsNormal(recentXmin)
 		&& old_snapshot_threshold >= 0
-		&& RelationNeedsWAL(relation)
-		&& !IsCatalogRelation(relation)
-		&& !RelationIsAccessibleInLogicalDecoding(relation)
-		&& !RelationHasUnloggedIndex(relation))
+		&& RelationAllowsEarlyVacuum(relation))
 	{
 		int64		ts = GetSnapshotCurrentTimestamp();
 		TransactionId xlimit = recentXmin;
diff --git a/src/include/utils/snapmgr.h b/src/include/utils/snapmgr.h
index 4270696..effdb0c 100644
--- a/src/include/utils/snapmgr.h
+++ b/src/include/utils/snapmgr.h
@@ -31,6 +31,19 @@
 #define OLD_SNAPSHOT_PADDING_ENTRIES 10
 #define OLD_SNAPSHOT_TIME_MAP_ENTRIES (old_snapshot_threshold + OLD_SNAPSHOT_PADDING_ENTRIES)
 
+/*
+ * Common definition of relation properties that allow early pruning/vacuuming
+ * when old_snapshot_threshold >= 0.
+ */
+#define RelationAllowsEarlyVacuum(rel) \
+( \
+	 RelationNeedsWAL(rel) \
+  && !IsCatalogRelation(rel) \
+  && !RelationIsAccessibleInLogicalDecoding(rel) \
+  && !RelationHasUnloggedIndex(rel) \
+)
+
+#define EarlyVacuumEnabled(rel) (old_snapshot_threshold >= 0 && RelationAllowsEarlyVacuum(rel))
 
 /* GUC variables */
 extern PGDLLIMPORT int old_snapshot_threshold;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-07 Thread Robert Haas
On Sat, Jun 4, 2016 at 4:21 PM, Kevin Grittner  wrote:
> On Fri, Jun 3, 2016 at 4:24 PM, Kevin Grittner  wrote:
>> Consequently, causing the index to be ignored in planning when
>> using the old index
>
> That last line should have read "using an old snapshot"
>
>> is not a nice optimization, but necessary for
>> correctness.  We already have logic to do this for other cases
>> (like HOT updates), so it is a matter of tying in to that existing
>> code correctly.  This won't be all that novel.
>
> Just to demonstrate that, the minimal patch to fix behavior in this
> area would be:
>
> diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
> index 31a1438..6c379da 100644
> --- a/src/backend/catalog/index.c
> +++ b/src/backend/catalog/index.c
> @@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
> Assert(numblocks == InvalidBlockNumber);
> }
>
> +   if (old_snapshot_threshold >= 0)
> +   indexInfo->ii_BrokenHotChain = true;
> +
> reltuples = 0;
>
> /*
>
> Of course, ii_BrokenHotChain should be renamed to something like
> ii_UnsafeForOldSnapshots, and some comments need to be updated; but
> the above is the substance of it.

I don't know why we'd want to rename it like that...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-04 Thread Kevin Grittner
On Fri, Jun 3, 2016 at 4:24 PM, Kevin Grittner  wrote:

> Consequently, causing the index to be ignored in planning when
> using the old index

That last line should have read "using an old snapshot"

> is not a nice optimization, but necessary for
> correctness.  We already have logic to do this for other cases
> (like HOT updates), so it is a matter of tying in to that existing
> code correctly.  This won't be all that novel.

Just to demonstrate that, the minimal patch to fix behavior in this
area would be:

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..6c379da 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2268,6 +2268,9 @@ IndexBuildHeapRangeScan(Relation heapRelation,
Assert(numblocks == InvalidBlockNumber);
}

+   if (old_snapshot_threshold >= 0)
+   indexInfo->ii_BrokenHotChain = true;
+
reltuples = 0;

/*

Of course, ii_BrokenHotChain should be renamed to something like
ii_UnsafeForOldSnapshots, and some comments need to be updated; but
the above is the substance of it.

Attached is what I have so far.  I'm still looking at what other
comments might need to be adjusted, but thought I should put this
much out for comment now.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 31a1438..b072985 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1672,7 +1672,7 @@ BuildIndexInfo(Relation index)
 
 	/* initialize index-build state to default */
 	ii->ii_Concurrent = false;
-	ii->ii_BrokenHotChain = false;
+	ii->ii_UnsafeForOldSnapshots = false;
 
 	return ii;
 }
@@ -2060,7 +2060,7 @@ index_build(Relation heapRelation,
 	 * about any concurrent readers of the tuple; no other transaction can see
 	 * it yet.
 	 */
-	if (indexInfo->ii_BrokenHotChain && !isreindex &&
+	if (indexInfo->ii_UnsafeForOldSnapshots && !isreindex &&
 		!indexInfo->ii_Concurrent)
 	{
 		Oid			indexId = RelationGetRelid(indexRelation);
@@ -2137,11 +2137,11 @@ index_build(Relation heapRelation,
  * so here because the AM might reject some of the tuples for its own reasons,
  * such as being unable to store NULLs.
  *
- * A side effect is to set indexInfo->ii_BrokenHotChain to true if we detect
- * any potentially broken HOT chains.  Currently, we set this if there are
- * any RECENTLY_DEAD or DELETE_IN_PROGRESS entries in a HOT chain, without
- * trying very hard to detect whether they're really incompatible with the
- * chain tip.
+ * A side effect is to set indexInfo->ii_UnsafeForOldSnapshots to true if we
+ * detect any potentially broken HOT chains or if old_snapshot_threshold is
+ * set to allow early pruning/vacuuuming.  Currently, we set this based on
+ * fairly simple tests; it is not guaranteed that using the index would cause
+ * a problem when set, but if the flag is clear it is guaranteed to be safe.
  */
 double
 IndexBuildHeapScan(Relation heapRelation,
@@ -2268,6 +2268,14 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		Assert(numblocks == InvalidBlockNumber);
 	}
 
+	/*
+	 * If allowing early pruning/vacuuming, the index cannot be considered
+	 * safe for old snapshots, since entries could be missing and lsn is set
+	 * to InvalidXLogRecPtr in all pages of the new index.
+	 */
+	if (old_snapshot_threshold >= 0)
+		indexInfo->ii_UnsafeForOldSnapshots = true;
+
 	reltuples = 0;
 
 	/*
@@ -2361,7 +2369,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 	{
 		indexIt = false;
 		/* mark the index as unsafe for old snapshots */
-		indexInfo->ii_BrokenHotChain = true;
+		indexInfo->ii_UnsafeForOldSnapshots = true;
 	}
 	else
 		indexIt = true;
@@ -2488,7 +2496,7 @@ IndexBuildHeapRangeScan(Relation heapRelation,
 		 */
 		indexIt = false;
 		/* mark the index as unsafe for old snapshots */
-		indexInfo->ii_BrokenHotChain = true;
+		indexInfo->ii_UnsafeForOldSnapshots = true;
 	}
 	else
 	{
@@ -3409,9 +3417,9 @@ reindex_index(Oid indexId, bool skip_constraint_checks, char persistence,
 	 !indexForm->indisready ||
 	 !indexForm->indislive);
 		if (index_bad ||
-			(indexForm->indcheckxmin && !indexInfo->ii_BrokenHotChain))
+			(indexForm->indcheckxmin && !indexInfo->ii_UnsafeForOldSnapshots))
 		{
-			if (!indexInfo->ii_BrokenHotChain)
+			if (!indexInfo->ii_UnsafeForOldSnapshots)
 indexForm->indcheckxmin = false;
 			else if (index_bad)
 indexForm->indcheckxmin = true;
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 564e10e..8dcac7e 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -314,7 +314,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid,
 	indexInfo->ii_Unique = true;
 	indexInfo->ii_ReadyForInserts = true;
 	indexInfo->ii_Concurrent = false;
-	indexInfo->ii_BrokenHotChain = false;
+	

Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-03 Thread Kevin Grittner
On Fri, May 27, 2016 at 10:35 AM, Kevin Grittner  wrote:
> On Tue, May 24, 2016 at 4:10 PM, Robert Haas  wrote:

>> [ANALYZE of index with expression may fail to update statistics
>> if ANALYZE runs longer than old_snapshot_threshold]

>> If we can get away with it, it would be a rather large win to only set
>> a snapshot when the table has an expression index.  For purposes of
>> "snapshot too old", though, it will be important that a function in an
>> index which tries to read data from some other table which has been
>> pruned cancels itself when necessary.
>
> I will make this my top priority after resolving the question of whether
> there is an issue with CREATE INDEX.  I expect to have a resolution,
> probably involving some patch, by 3 June.

Due to 9.5 bug-fixing and the index issue taking a bit longer than
I expected, this is now looking like a 7 June resolution.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-06-03 Thread Kevin Grittner
On Fri, May 27, 2016 at 10:18 AM, Kevin Grittner  wrote:
> On Fri, May 27, 2016 at 9:58 AM, Kevin Grittner  wrote:
>> On Tue, May 24, 2016 at 4:56 PM, Andres Freund  wrote:
>
>>> If an old session with >= repeatable read accesses a clustered
>>> table (after the cluster committed), they'll now not see any
>>> errors, because all the LSNs look new.
>>
>> Again, it is new LSNs that trigger errors; if the page has not been
>> written recently the LSN is old and there is no error.  I think you
>> may be seeing problems based on getting the basics of this
>> backwards.
>
> I am reviewing the suggestion of a possible bug now, and will make
> it my top priority until resolved.  By the end of 1 June I will
> either have committed a fix or posted an explanation of why the
> concern is mistaken, with test results to demonstrate correct
> behavior.

This got set back by needing to fix a bug in the 9.5 release.  I am
back on this and have figured out that everyone who commented on
this specific issue was wrong about a very important fact -- the
LSNs in index pages after CREATE INDEX (with or without
CONCURRENTLY) and for REINDEX are always == InvalidXLogRecPtr (0).

That means that a snapshot from before an index build does not
always generate errors when it should on the use of the new index.
(Any early pruning/vacuuuming from before the index build is
missed; activity subsequent to the index build is recognized.)
Consequently, causing the index to be ignored in planning when
using the old index is not a nice optimization, but necessary for
correctness.  We already have logic to do this for other cases
(like HOT updates), so it is a matter of tying in to that existing
code correctly.  This won't be all that novel.

I now expect to push a fix along those lines by Tuesday, 6 June.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-31 Thread Kevin Grittner
On Tue, May 31, 2016 at 10:03 AM, Robert Haas  wrote:
> On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner  wrote:
>>> As far as I can see normal index builds will allow concurrent hot
>>> prunes and everything; since those only require page-level
>>> exclusive locks.
>>>
>>> So for !concurrent builds we might end up with a corrupt index.
>>
>> ... by which you mean an index that omits certainly-dead heap
>> tuples which have been the subject of early pruning or vacuum, even
>> if there are still registered snapshots that include those tuples?
>> Or do you see something else?
>
> I think that is the danger.

Well, the *perceived* danger -- since every page in the new index
would be new enough to be recognized as too recently modified to be
safe for a snapshot that could still see the omitted rows, the only
question I'm sorting out on this is how safe it might be to cause
the index to be ignored in planning when using such a snapshot.
That and demonstrating the safe behavior to those not following
closely enough to see what will happen without a demonstration.

>> Again, since both the heap pages involved and all the new index
>> pages would have LSNs recent enough to trigger the old snapshot
>> check, I'm not sure where the problem is,

> This is a good point, though, I think.

The whole perception of risk in this area seems to be based on
getting that wrong; although the review of this area may yield some
benefit in terms of minimizing false positives.

>>> I think many of the places relying on heap scans with !mvcc
>>> snapshots are problematic in that way. Outdated reads will not be
>>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>>> snapshot to be actually working.  E.g. CLUSTER/ VACUUM FULL rely
>>> on accurate HEAPTUPLE_RECENTLY_DEAD
>>
>> Don't the "RECENTLY" values imply that the transaction is still
>> running which cause the tuple to be dead?  Since tuples are not
>> subject to early pruning or vacuum when that is the case, where do
>> you see a problem?  The snapshot itself has the usual xmin et al.,
>> so I'm not sure what you might mean by "the snapshot to be actually
>> working" if not the override at the time of pruning/vacuuming.
>
> Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
> is newer that the oldest registered snapshot in the system (based on
> some snapshots being ignored) might get a return value of
> HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD.

Since the override xmin cannot advance past the earliest
transaction which is still active, HEAPTUPLE_DEAD indicates that
the transaction causing the tuple to be dead has completed and the
tuple is irrevocably dead -- even if there are still snapshots
registered which can see it.  Seeing HEAPTUPLE_DEAD rather than
HEAPTUPLE_RECENTLY_DEAD is strictly limited to tuples which are
certainly and permanently dead and for which the only possible
references are non-MVCC snapshots or existing snapshots subject to
"snapshot too old" monitoring.

> IndexBuildHeapRangeScan(): We might end up with indexIt = false
> instead of indexIt = true.  That should be OK because anyone using the
> old snapshot will see a new page LSN and error out.  We might also
> fail to set indexInfo->ii_BrokenHotChain = true.  I suspect that's a
> problem, but I'm not certain of it.

The latter flag is what I'm currently digging at; but my hope is
that whenever old_snapshot_threshold >= 0 we can set
indexInfo->ii_BrokenHotChain = true to cause the planner to skip
consideration of the index if the snapshot is an "old" one.  That
will avoid some false positives (seeing the error when not strictly
necessary).  If that works out the way I hope, the only down side
is that a scan using a snapshot from an old transaction or cursor
would use some other index or a heap scan; but we already have that
possibility in some cases -- that seems to be the point of the
flag.

> CheckForSerializableConflictOut: Maybe a problem?  If the tuple is
> dead, there's no issue, but if it's recently-dead, there might be.

If the tuple is not visible to the scan, the behavior is unchanged
(a simple return from the function on either HEAPTUPLE_DEAD or
HEAPTUPLE_RECENTLY_DEAD with !visible) and (thus) clearly correct.
If the tuple is visible to us it is currently subject to early
pruning or vacuum (since those operations would get the same
modified xmin) but has not yet had any such treatment since we made
it to this function in the first place.  The processing for SSI
purposes would be unaffected by the possibility that there could
later be early pruning/vacuuming.

Thanks for the review and feedback!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-31 Thread Robert Haas
On Fri, May 27, 2016 at 10:58 AM, Kevin Grittner  wrote:
>> As far as I can see normal index builds will allow concurrent hot
>> prunes and everything; since those only require page-level
>> exclusive locks.
>>
>> So for !concurrent builds we might end up with a corrupt index.
>
> ... by which you mean an index that omits certainly-dead heap
> tuples which have been the subject of early pruning or vacuum, even
> if there are still registered snapshots that include those tuples?
> Or do you see something else?

I think that is the danger.

> Again, since both the heap pages involved and all the new index
> pages would have LSNs recent enough to trigger the old snapshot
> check, I'm not sure where the problem is, but will review closely
> to see what I might have missed.

This is a good point, though, I think.

>> I think many of the places relying on heap scans with !mvcc
>> snapshots are problematic in that way. Outdated reads will not be
>> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
>> == HeapTupleSatisfiesMVCC condition therein), and rely on the
>> snapshot to be actually working.  E.g. CLUSTER/ VACUUM FULL rely
>> on accurate HEAPTUPLE_RECENTLY_DEAD
>
> Don't the "RECENTLY" values imply that the transaction is still
> running which cause the tuple to be dead?  Since tuples are not
> subject to early pruning or vacuum when that is the case, where do
> you see a problem?  The snapshot itself has the usual xmin et al.,
> so I'm not sure what you might mean by "the snapshot to be actually
> working" if not the override at the time of pruning/vacuuming.

Anybody who calls HeapTupleSatisfiesVacuum() with an xmin value that
is newer that the oldest registered snapshot in the system (based on
some snapshots being ignored) might get a return value of
HEAPTUPLE_DEAD rather than HEAPTUPLE_RECENTLY_DEAD.  It seems
necessary to carefully audit all calls to HeapTupleSatisfiesVacuum()
to see whether that difference matters.  I took a quick look and
here's what I see:

statapprox_heap(): Statistical information for the DBA.  The
difference is non-critical.

heap_prune_chain(): Seeing the tuple as dead might cause it to be
removed early.  This should be OK.  Removing the tuple early will
cause the page LSN to be bumped unless RelationNeedsWAL() returns
false, and TransactionIdLimitedForOldSnapshots() includes that as a
condition for disabling early pruning.

IndexBuildHeapRangeScan(): We might end up with indexIt = false
instead of indexIt = true.  That should be OK because anyone using the
old snapshot will see a new page LSN and error out.  We might also
fail to set indexInfo->ii_BrokenHotChain = true.  I suspect that's a
problem, but I'm not certain of it.

acquire_sample_rows: Both return values are treated in the same way.
No problem.

copy_heap_data: We'll end up setting isdead = true instead of
tups_recently_dead += 1.  That means that the new heap won't include
the tuple, which is OK because old snapshots can't read the new heap
without erroring out, assuming that the new heap has LSNs.  The xmin
used here comes from vacuum_set_xid_limits() which goes through
TransactionIdLimitedForOldSnapshots() so this should be OK for the
same reasons as heap_prune_chain().  Another effect of seeing the
tuple as prematurely dead is that we'll call rewrite_heap_dead_tuple()
on it; rewrite_heap_dead_tuple() will presume that if this tuple is
dead, its predecessor in the ctid chain is also dead.  I don't see any
obvious problem with that.

lazy_scan_heap(): Basically, the same thing as heap_prune_chain().

CheckForSerializableConflictOut: Maybe a problem?  If the tuple is
dead, there's no issue, but if it's recently-dead, there might be.

We might want to add comments to some of these places addressing
snapshot_too_old specifically.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-27 Thread Kevin Grittner
On Tue, May 24, 2016 at 4:10 PM, Robert Haas  wrote:
> On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner  wrote:
>> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
>>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
 On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:

>> That comment reminds me of a question I had: Did you consider the effect
>> of this patch on analyze? It uses a snapshot, and by memory you've not
>> built in a defense against analyze being cancelled.
>>
>> The primary defense is not considering a cancellation except in 30
>> of the 500 places a page is used.  None of those 30 are, as far as
>> I can see (upon review in response to your question), used in the
>> analyze process.
>
> It's not obvious to me how this is supposed to work.  If ANALYZE's
> snapshot is subject to being ignored for xmin purposes because of
> snapshot_too_old, then I would think it would need to consider
> cancelling itself if it reads a page with possibly-removed data, just
> like in any other case.  It seems that we might not actually need a
> snapshot set for ANALYZE in all cases, because the comments say:
>
> /* functions in indexes may want a snapshot set */
> PushActiveSnapshot(GetTransactionSnapshot());
>
> If we can get away with it, it would be a rather large win to only set
> a snapshot when the table has an expression index.  For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

I will make this my top priority after resolving the question of whether
there is an issue with CREATE INDEX.  I expect to have a resolution,
probably involving some patch, by 3 June.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-27 Thread Kevin Grittner
On Fri, May 27, 2016 at 9:58 AM, Kevin Grittner  wrote:
> On Tue, May 24, 2016 at 4:56 PM, Andres Freund  wrote:

>> If an old session with >= repeatable read accesses a clustered
>> table (after the cluster committed), they'll now not see any
>> errors, because all the LSNs look new.
>
> Again, it is new LSNs that trigger errors; if the page has not been
> written recently the LSN is old and there is no error.  I think you
> may be seeing problems based on getting the basics of this
> backwards.

I am reviewing the suggestion of a possible bug now, and will make
it my top priority until resolved.  By the end of 1 June I will
either have committed a fix or posted an explanation of why the
concern is mistaken, with test results to demonstrate correct
behavior.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-27 Thread Kevin Grittner
On Tue, May 24, 2016 at 4:56 PM, Andres Freund  wrote:

> The LSNs of the created index pages will be new, and we'll thus
> not necessarily error out when requried.

It is *old* LSNs that are *safe* -- *new* LSNs are what trigger the
"snapshot too old" error.  So your observation may be a reason that
there is not, in fact, any bug here.  That said, even if there is
no chance that using the index could generate incorrect results, it
may be worth looking at whether avoiding index use (as mentioned
below) might avoid false positives, and have enough value as an
optimization to add.  Of course, if that is the case, there is the
question of whether that is appropriate for 9.6 or material for the
first version 10 CF.

> At the very least we'd have to set indexInfo->ii_BrokenHotChain
> in that case.

If there is a bug, this is what I will look at first -- having
queries avoid the index if they are using an old snapshot, rather
than throwing an error.

> As far as I can see normal index builds will allow concurrent hot
> prunes and everything; since those only require page-level
> exclusive locks.
>
> So for !concurrent builds we might end up with a corrupt index.

... by which you mean an index that omits certainly-dead heap
tuples which have been the subject of early pruning or vacuum, even
if there are still registered snapshots that include those tuples?
Or do you see something else?

Again, since both the heap pages involved and all the new index
pages would have LSNs recent enough to trigger the old snapshot
check, I'm not sure where the problem is, but will review closely
to see what I might have missed.

> I think many of the places relying on heap scans with !mvcc
> snapshots are problematic in that way. Outdated reads will not be
> detected by TestForOldSnapshot() (given the (snapshot)->satisfies
> == HeapTupleSatisfiesMVCC condition therein), and rely on the
> snapshot to be actually working.  E.g. CLUSTER/ VACUUM FULL rely
> on accurate HEAPTUPLE_RECENTLY_DEAD

Don't the "RECENTLY" values imply that the transaction is still
running which cause the tuple to be dead?  Since tuples are not
subject to early pruning or vacuum when that is the case, where do
you see a problem?  The snapshot itself has the usual xmin et al.,
so I'm not sure what you might mean by "the snapshot to be actually
working" if not the override at the time of pruning/vacuuming.

> If an old session with >= repeatable read accesses a clustered
> table (after the cluster committed), they'll now not see any
> errors, because all the LSNs look new.

Again, it is new LSNs that trigger errors; if the page has not been
written recently the LSN is old and there is no error.  I think you
may be seeing problems based on getting the basics of this
backwards.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Andres Freund
On 2016-05-24 16:09:27 -0500, Kevin Grittner wrote:
> On Tue, May 24, 2016 at 3:54 PM, Andres Freund  wrote:
>
> > what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() 
> > doesn't
> > seem to contain any checks against outdated blocks
>
> Why would it?  We're talking about blocks where there were dead
> tuples, with the transaction which updated or deleted the tuples
> being complete, where those dead tuples were vacuumed away.  These
> should not appear in the index, should they?

(it appears I had switched around the concerns for concurrent and
!concurrent in my head. But I think both might be negatively affected.)

For concurrent indexes we'll actually use a normal mvcc snapshot, which
means heap_getnext() in IndexBuildHeapRangeScan() and
validate_index_heapscan() will error out when encountering removed
tuples. Which means it'll be hard to ever perform a concurrent reindex
where an individual phase takes more than old_snapshot_threshold -
problematic from my POV, given that old_snapshot_threshold cannot be
changed at runtime.


For normal index scans the following appears to be problematic:
case HEAPTUPLE_RECENTLY_DEAD:
/*
 * If tuple is recently deleted then we 
must index it
 * anyway to preserve MVCC semantics.  
(Pre-existing
 * transactions could try to use the 
index after we finish
 * building it, and may need to see 
such tuples.)
 *
 * However, if it was HOT-updated then 
we must only index
 * the live tuple at the end of the 
HOT-chain.  Since this
 * breaks semantics for pre-existing 
snapshots, mark the
 * index as unusable for them.
 */

afaics that detection is broken if we advance the xmin horizon more
aggressively than the snapshot. The LSNs of the created index pages will
be new, and we'll thus not necessarily error out when requried. At the
very least we'd have to set indexInfo->ii_BrokenHotChain in that case.
As far as I can see normal index builds will allow concurrent hot prunes
and everything; since those only require page-level exclusive locks.

So for !concurrent builds we might end up with a corrupt index.


I think many of the places relying on heap scans with !mvcc snapshots
are problematic in that way. Outdated reads will not be detected by
TestForOldSnapshot() (given the (snapshot)->satisfies ==
HeapTupleSatisfiesMVCC condition therein), and rely on the snapshot to
be actually working.  E.g. CLUSTER/ VACUUM FULL rely on accurate
HEAPTUPLE_RECENTLY_DEAD
switch (HeapTupleSatisfiesVacuum(tuple, OldestXmin, buf))
{
...
case HEAPTUPLE_RECENTLY_DEAD:
tups_recently_dead += 1;
/* fall through */
case HEAPTUPLE_LIVE:
/* Live or recently dead, must copy it */
isdead = false;
break;

If an old session with >= repeatable read accesses a clustered table
(after the cluster committed), they'll now not see any errors, because
all the LSNs look new.



> >>> Is there anything preventing this from becoming a problem?
> >>
> >> The fundamental approach that the error can only appear on
> >> user-facing scans, not internal reads and positioning.
> >
> >> Unless there is some code path that uses a scan where the snapshot
> >> age is checked, this cannot be a problem.  I don't see any such
> >> path, but if you do, please let me know, and I'll fix things.
> >
> > That scares me. Not throwing an error, and not being broken are entirely
> > different things.
>
> Absolutely, but let's not start pointing at random chunks of code
> and asking why an error isn't thrown there without showing that you
> get bad results otherwise, or at least some plausible argument why
> you might.

This attitude is disturbing. You've evidently not at all looked at the
snapshot issues around analyze before; even though snapshot too old at
the very least introduces a behavioural issue there (if not a bug at
least in the case of expression based stuff). I've asked you about
principled defenses, and your answer was "we don't error out".  Now I
point to another place, and you respond with a relatively strong
dismissal.  Independent of me being right or wrong, that seems to be the
completely wrong way round.


Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Tue, May 24, 2016 at 4:09 PM, Kevin Grittner  wrote:
> On Tue, May 24, 2016 at 3:54 PM, Andres Freund  wrote:

>> It appears that concurrent index builds are currently broken
>> from a quick skim?
>
> Either you don't understand this feature very well, or I don't
> understand concurrent index build very well.

And it may be the latter.  On closer review I think some adjustment
may be needed in IndexBuildHeapRangeScan().  I'm not sure that
throwing the error is necessary, since there is a flag to say that
the index is unsafe for existing snapshots -- it may be enough to
set that flag.

Sorry for my earlier email.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Tue, May 24, 2016 at 4:10 PM, Robert Haas  wrote:

> For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

Hm.  I'll try to work up a test case for this.  If you have one,
please send it along to me.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Robert Haas
On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner  wrote:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
 On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
>>>
> That comment reminds me of a question I had: Did you consider the effect
> of this patch on analyze? It uses a snapshot, and by memory you've not
> built in a defense against analyze being cancelled.
>
> The primary defense is not considering a cancellation except in 30
> of the 500 places a page is used.  None of those 30 are, as far as
> I can see (upon review in response to your question), used in the
> analyze process.

It's not obvious to me how this is supposed to work.  If ANALYZE's
snapshot is subject to being ignored for xmin purposes because of
snapshot_too_old, then I would think it would need to consider
cancelling itself if it reads a page with possibly-removed data, just
like in any other case.  It seems that we might not actually need a
snapshot set for ANALYZE in all cases, because the comments say:

/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());

If we can get away with it, it would be a rather large win to only set
a snapshot when the table has an expression index.  For purposes of
"snapshot too old", though, it will be important that a function in an
index which tries to read data from some other table which has been
pruned cancels itself when necessary.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Tue, May 24, 2016 at 3:54 PM, Andres Freund  wrote:

> what about e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() 
> doesn't
> seem to contain any checks against outdated blocks

Why would it?  We're talking about blocks where there were dead
tuples, with the transaction which updated or deleted the tuples
being complete, where those dead tuples were vacuumed away.  These
should not appear in the index, should they?

> and that's certainly not ok?

Why not?

> It appears that concurrent index builds are currently broken
> from a quick skim?

Either you don't understand this feature very well, or I don't
understand concurrent index build very well.  I thought we burned a
lot of time going through the index an extra time just to get rid
of dead tuples -- why would it be a problem not to add them in the
first place?

>>> Is there anything preventing this from becoming a problem?
>>
>> The fundamental approach that the error can only appear on
>> user-facing scans, not internal reads and positioning.
>
>> Unless there is some code path that uses a scan where the snapshot
>> age is checked, this cannot be a problem.  I don't see any such
>> path, but if you do, please let me know, and I'll fix things.
>
> That scares me. Not throwing an error, and not being broken are entirely
> different things.

Absolutely, but let's not start pointing at random chunks of code
and asking why an error isn't thrown there without showing that you
get bad results otherwise, or at least some plausible argument why
you might.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Andres Freund
On 2016-05-24 14:48:35 -0500, Kevin Grittner wrote:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
> > On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
> >> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> >>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
> >>
>  That comment reminds me of a question I had: Did you consider the effect
>  of this patch on analyze? It uses a snapshot, and by memory you've not
>  built in a defense against analyze being cancelled.
> 
> The primary defense is not considering a cancellation except in 30
> of the 500 places a page is used.  None of those 30 are, as far as
> I can see (upon review in response to your question), used in the
> analyze process.

Uh. How's that a defense? That seems like a recipe for corruption, not a
protection. That might be acceptable in the analyze case, but what about
e.g. concurrent index builds? E.g. IndexBuildHeapRangeScan() doesn't
seem to contain any checks against outdated blocks, and that's certainly
not ok? It appears that concurrent index builds are currently broken
from a quick skim?

> > Is there anything preventing this from becoming a problem?
> 
> The fundamental approach that the error can only appear on
> user-facing scans, not internal reads and positioning.

> Unless there is some code path that uses a scan where the snapshot
> age is checked, this cannot be a problem.  I don't see any such
> path, but if you do, please let me know, and I'll fix things.

That scares me. Not throwing an error, and not being broken are entirely
different things.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
> > On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
> >> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> >>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
> >>
>  That comment reminds me of a question I had: Did you consider the effect
>  of this patch on analyze? It uses a snapshot, and by memory you've not
>  built in a defense against analyze being cancelled.
> 
> The primary defense is not considering a cancellation except in 30
> of the 500 places a page is used.  None of those 30 are, as far as
> I can see (upon review in response to your question), used in the
> analyze process.

I think what this means is that vacuum might remove tuples that would
otherwise be visible to analyze's snapshot.  I suppose that's
acceptable.  I wondered if it could cause harm to the size of the
sample, but after looking at acquire_sample_rows briefly I think it'd be
unharmed.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
>>
 That comment reminds me of a question I had: Did you consider the effect
 of this patch on analyze? It uses a snapshot, and by memory you've not
 built in a defense against analyze being cancelled.

The primary defense is not considering a cancellation except in 30
of the 500 places a page is used.  None of those 30 are, as far as
I can see (upon review in response to your question), used in the
analyze process.

>> With a 1min threshold, after loading a table "v" with a million
>> rows, beginning a repeatable read transaction on a different
>> connection and opening a cursor against that table, deleting almost
>> all rows on the original connection, and waiting a few minutes I
>> see this in the open transaction:
>>
>> test=# analyze verbose v;
>> INFO:  analyzing "public.v"
>> INFO:  "v": scanned 4425 of 4425 pages, containing 1999 live rows and
>> 0 dead rows; 1999 rows in sample, 1999 estimated total rows
>> ANALYZE
>> test=# select count(*) from v;
>> ERROR:  snapshot too old
>>
>> Meanwhile, no errors appeared in the log from autovacuum.
>
> I'd guess that that problem could only be reproduced if autoanalyze
> takes longer than the timeout, and there's rows pruned after it has
> started?  Analyze IIRC acquires a new snapshot when getting sample rows,
> so it'll not necessarily trigger in the above scenario, right?

Per Tom's recollection and my review of the code, the transaction
snapshot would be used in the test I show above, and the
combination of the verbose output the subsequent query show clearly
that if one of the page references capable of throwing the error
were encountered with this snapshot, the error would be thrown.  So
at least in this ANALYZE run, there is empirical support for what I
found in reviewing the code -- none of the places where we check
for an old snapshot are exercised during an ANALYZE.

> Is there anything preventing this from becoming a problem?

The fundamental approach that the error can only appear on
user-facing scans, not internal reads and positioning.

Unless there is some code path that uses a scan where the snapshot
age is checked, this cannot be a problem.  I don't see any such
path, but if you do, please let me know, and I'll fix things.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Andres Freund
On 2016-05-24 13:04:09 -0500, Kevin Grittner wrote:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
> 
> > Analyze IIRC acquires a new snapshot when getting sample rows,
> 
> I could not find anything like that, and a case-insensitive search
> of analyze.c finds no occurrences of "snap".  Can you remember
> where you think you saw something that would cause the ANALYZE
> command in my test to use a snapshot other than the one from the
> REPEATABLE READ transaction in which it was run?

It's outside of analyze.c:

autovacuum_do_vac_analyze() -> vacuum() ->
if (options & VACOPT_VACUUM)
use_own_xacts = true;
else
{
Assert(options & VACOPT_ANALYZE);
if (IsAutoVacuumWorkerProcess())
use_own_xacts = true;
...
if (options & VACOPT_ANALYZE)
{
/*
 * If using separate xacts, start one for 
analyze. Otherwise,
 * we can use the outer transaction.
 */
if (use_own_xacts)
{
StartTransactionCommand();
/* functions in indexes may want a 
snapshot set */

PushActiveSnapshot(GetTransactionSnapshot());
}

analyze_rel(relid, relation, options, params,
va_cols, in_outer_xact, 
vac_strategy);

if (use_own_xacts)
{
PopActiveSnapshot();
CommitTransactionCommand();
}
}

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Tom Lane
Kevin Grittner  writes:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
>> Analyze IIRC acquires a new snapshot when getting sample rows,

> I could not find anything like that, and a case-insensitive search
> of analyze.c finds no occurrences of "snap".  Can you remember
> where you think you saw something that would cause the ANALYZE
> command in my test to use a snapshot other than the one from the
> REPEATABLE READ transaction in which it was run?

The control logic concerned with that is in vacuum.c, not analyze.c.
AFAICS a manual ANALYZE that's within a transaction block would use
the prevailing snapshot (see in_outer_xact tests).  There are a lot
of cases that wouldn't, though.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Alvaro Herrera
Kevin Grittner wrote:
> On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:
> 
> > Analyze IIRC acquires a new snapshot when getting sample rows,
> 
> I could not find anything like that, and a case-insensitive search
> of analyze.c finds no occurrences of "snap".  Can you remember
> where you think you saw something that would cause the ANALYZE
> command in my test to use a snapshot other than the one from the
> REPEATABLE READ transaction in which it was run?

For ANALYZE, the snapshot is set in vacuum() prior to calling
analyze_rel(); see vacuum.c.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Tue, May 24, 2016 at 12:00 PM, Andres Freund  wrote:

> Analyze IIRC acquires a new snapshot when getting sample rows,

I could not find anything like that, and a case-insensitive search
of analyze.c finds no occurrences of "snap".  Can you remember
where you think you saw something that would cause the ANALYZE
command in my test to use a snapshot other than the one from the
REPEATABLE READ transaction in which it was run?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Andres Freund
On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> > On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
> 
> >> That comment reminds me of a question I had: Did you consider the effect
> >> of this patch on analyze? It uses a snapshot, and by memory you've not
> >> built in a defense against analyze being cancelled.
> >
> > Will need to check on that.
> 
> With a 1min threshold, after loading a table "v" with a million
> rows, beginning a repeatable read transaction on a different
> connection and opening a cursor against that table, deleting almost
> all rows on the original connection, and waiting a few minutes I
> see this in the open transaction:
> 
> test=# analyze verbose v;
> INFO:  analyzing "public.v"
> INFO:  "v": scanned 4425 of 4425 pages, containing 1999 live rows and
> 0 dead rows; 1999 rows in sample, 1999 estimated total rows
> ANALYZE
> test=# select count(*) from v;
> ERROR:  snapshot too old
> 
> Meanwhile, no errors appeared in the log from autovacuum.

I'd guess that that problem could only be reproduced if autoanalyze
takes longer than the timeout, and there's rows pruned after it has
started?  Analyze IIRC acquires a new snapshot when getting sample rows,
so it'll not necessarily trigger in the above scenario, right?

Is there anything preventing this from becoming a problem?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-24 Thread Kevin Grittner
On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner  wrote:
> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:

>> That comment reminds me of a question I had: Did you consider the effect
>> of this patch on analyze? It uses a snapshot, and by memory you've not
>> built in a defense against analyze being cancelled.
>
> Will need to check on that.

With a 1min threshold, after loading a table "v" with a million
rows, beginning a repeatable read transaction on a different
connection and opening a cursor against that table, deleting almost
all rows on the original connection, and waiting a few minutes I
see this in the open transaction:

test=# analyze verbose v;
INFO:  analyzing "public.v"
INFO:  "v": scanned 4425 of 4425 pages, containing 1999 live rows and
0 dead rows; 1999 rows in sample, 1999 estimated total rows
ANALYZE
test=# select count(*) from v;
ERROR:  snapshot too old

Meanwhile, no errors appeared in the log from autovacuum.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-07 Thread Andres Freund
On 2016-05-06 20:28:27 -0500, Kevin Grittner wrote:
> On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
> > On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote:
> >> It's disappointing that I am not getting more consistent numbers,
> >> but NUMA can be hard to manage that way.
> >
> > FWIW, in my experience, unless you disable autovacuum (or rather
> > auto-analyze), the effects from non-predicable analyze runs with
> > long-running snapshots are worse.  I mean the numa effects suck, but in
> > r/w workload effects of analyze are often much worse.
> 
> Hm.  But the benefits of the patch are not there if autovacuum is
> off.  I'm gonna need to ponder the best way to test given all that.

It's sufficient to set the threshhold for analyze very high, as vacuum
itself doesn't have that problem. I basically just set
autovacuum_analyze_threshold to INT_MAX , that alleviates the problem
for normal runs.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-06 Thread Kevin Grittner
On Fri, May 6, 2016 at 7:48 PM, Andres Freund  wrote:
> On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote:
>> It's disappointing that I am not getting more consistent numbers,
>> but NUMA can be hard to manage that way.
>
> FWIW, in my experience, unless you disable autovacuum (or rather
> auto-analyze), the effects from non-predicable analyze runs with
> long-running snapshots are worse.  I mean the numa effects suck, but in
> r/w workload effects of analyze are often much worse.

Hm.  But the benefits of the patch are not there if autovacuum is
off.  I'm gonna need to ponder the best way to test given all that.

> That comment reminds me of a question I had: Did you consider the effect
> of this patch on analyze? It uses a snapshot, and by memory you've not
> built in a defense against analyze being cancelled.

Will need to check on that.

>> Will push shortly with the nit-pick fixes you requested.
>
> Cool.

Done.

I will be checking in on the buildfarm, but if it starts causing
problems while I'm, say, sleeping -- I won't be offended if someone
else reverts 7e3da1c4737fd6582e12c80983987e4d2cbc1d17.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold

2016-05-06 Thread Andres Freund
On 2016-05-06 19:43:24 -0500, Kevin Grittner wrote:
> It's disappointing that I am not getting more consistent numbers,
> but NUMA can be hard to manage that way.

FWIW, in my experience, unless you disable autovacuum (or rather
auto-analyze), the effects from non-predicable analyze runs with
long-running snapshots are worse.  I mean the numa effects suck, but in
r/w workload effects of analyze are often much worse.


That comment reminds me of a question I had: Did you consider the effect
of this patch on analyze? It uses a snapshot, and by memory you've not
built in a defense against analyze being cancelled.


> Will push shortly with the nit-pick fixes you requested.

Cool.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >