Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-05-04 Thread Simon Riggs
On 2 May 2012 13:41, Robert Haas robertmh...@gmail.com wrote:

 So on further reflection I'm thinking it may be best just to stick
 with a hard conflict for now and see what feedback we get from beta
 testers.

Which is what I was expecting y'all to conclude once you'd looked at
the task in more detail.

And I'm happy with the concept of beta being a period where we listen
to feedback, not just bugs, and decide whether further refinements are
required.

What I think is possible is to alter the conflict as it hits the
backend. If a backend has enable_indexonly = off then it wouldn't be
affected by those conflicts anyhow. So if we simply record whether we
are using an index only plan then we'll know whether to ignore it or
abort. I think we can handle that by marking the snapshot at executor
startup time. Needs a few other pushups but not that hard.
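The idea above — deliver the conflict to every backend, but let each backend decide whether it actually cares — can be modeled in a few lines. This is an illustrative sketch only (Python standing in for the C, all names invented), not actual PostgreSQL code:

```python
# Model of Simon's idea: mark the snapshot at executor startup if the plan
# contains an index-only scan node; a visibility conflict then aborts only
# backends that are actually relying on all-visible information.

class Snapshot:
    def __init__(self):
        self.uses_index_only_scan = False  # set at executor startup

def executor_start(snapshot, plan_has_index_only_scan):
    # Record whether this query's plan depends on the visibility map.
    snapshot.uses_index_only_scan = plan_has_index_only_scan

def handle_visibility_conflict(snapshot):
    # Soft behavior: ignore the conflict unless the backend is using
    # an index-only plan.
    return "abort" if snapshot.uses_index_only_scan else "ignore"
```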

The likelihood is that SQL that uses index only won't run for long
enough to be cancelled anyway.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-05-02 Thread Robert Haas
On Thu, Apr 26, 2012 at 8:03 PM, Robert Haas robertmh...@gmail.com wrote:
 So, as a first step, I've committed a patch that just throws a hard
 conflict.  I think we probably want to optimize this further, and I'm
 going to work on investigating that next.  But it seemed productive to get
 this much out of the way first, so I did.

I've been thinking about this some more.  What's worrying me is that a
visibility conflict, however we implement it, could be *worse* from
the user's point of view than just killing the query.  After all,
there's a reasonable likelihood that a single visibility map page
covers the whole relation (or all the blocks that the user is
interested in), so any sort of conflict is basically going to turn the
index-only scan into an index-scan plus some extra overhead.  And if
the planner had known that the user was going to get an index-only
scan rather than just a plain index scan, it might well have picked
some other plan in the first place.

Another problem is that, if we add a test for visibility conflicts
into visibilitymap_test(), I'm afraid we're going to drive up the cost
of that function very significantly.  Previous testing suggests that
the efficiency, or lack thereof, of that function is already a
performance problem for index-only scans, which kinda makes me not
that excited about adding another branch in there somewhere (and even
less excited about any proposed implementation that would add an
lwlock acquire/release or similar).

So on further reflection I'm thinking it may be best just to stick
with a hard conflict for now and see what feedback we get from beta
testers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-26 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:13 PM, Robert Haas robertmh...@gmail.com wrote:
 But fundamentally we all seem to be converging on some variant of the
 soft conflict idea.

So, as a first step, I've committed a patch that just throws a hard
conflict.  I think we probably want to optimize this further, and I'm
going to work on investigating that next.  But it seemed productive to get
this much out of the way first, so I did.

In studying this, it strikes me that it would be rather nice if
recovery conflicts could somehow arrange to roll back the active
transaction by some means short of a FATAL error.  I think there are
some protocol issues with doing that, but I still wish we could figure
out a way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Noah Misch
On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.

 As to correctness, after further review of lazy_scan_heap(), I think
 there are some residual bugs here that need to be fixed whatever we
 decide to do about the Hot Standby case:
 
 1. I noticed that when we do PageSetAllVisible() today we follow it up
 with SetBufferCommitInfoNeedsSave().  PD_ALL_VISIBLE is not merely a
 hint, so I think these should be changed to MarkBufferDirty(), which
 shouldn't make much difference given current code, but proposals to
 handle hint changes differently than non-hint changes abound, so it's
 probably not good to rely on those meaning the same thing forever.

Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
in its role as a witness of tuple visibility, but it is not a hint in its role
as a witness of the visibility map status?  In any event, I agree that those
call sites should use MarkBufferDirty().

The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
systems with weaker memory ordering, the FlushBuffer() process's clearing of
BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
process until some time after the former retrieved buffer contents from shared
memory.  Memory barriers could make the comment true, but we should probably
just update the comment to explain that a race condition may eat the update
entirely.  Incidentally, is there a good reason for MarkBufferDirty() to
increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
not to do so?  Looks like an oversight.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

Hmm, indeed.  This deserves its own open item.

 I think
 this probably needs to be rewritten so that lazy_scan_heap() acquires
 a pin on the visibility map page before locking the buffer for cleanup
 and holds onto that pin until either we exit the range of heap pages
 covered by that visibility map page or trigger index vac due to
 maintenance_work_mem exhaustion.  With that infrastructure in place,
 we can set the visibility map bit at the same time we set the
 page-level bit without risking holding the buffer lock across a buffer
 I/O (we might have to hold the buffer lock across a WAL flush in the
 worst case, but that hazard exists in a number of other places as well
 and fixing it here is well out of scope).

Looks reasonable at a glance.

 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible
 
 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 

Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch n...@leadboat.com wrote:
 On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:
 In the department of query cancellations, I believe Noah argued
 previously that this wasn't really going to cause a problem.  And,
 indeed, if the master has a mix of inserts, updates, and deletes, then
 it seems likely that any recovery conflicts generated this way would
 be hitting about the same place in the XID space as the ones caused by
 pruning away dead tuples.  What will be different, if we go this
 route, is that an insert-only master, which right now only generates
 conflicts at freezing time, will instead generate conflicts much
 sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
 enough to know whether this is a problem, and I'm not trying to block
 this approach, but I do want to make sure that everyone agrees that
 it's acceptable before we go do it, because inevitably somebody is
 going to get bit.

 Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
 A standby with the GUC off would ignore all-visible indicators and also
 decline to generate conflicts when flipping them on.

I'm against adding a new GUC, especially for that fairly thin reason.


 1. vacuum on master sets the page-level bit and the visibility map bit
 2. the heap page with the bit is written to disk BEFORE the WAL entry
 generated in (1) gets flushed; this is allowable, since it's not an
 error for the page-level bit to be set while the visibility-map bit is
 unset, only the other way around
 3. crash (still before the WAL entry is flushed)
 4. some operation on the master emits an FPW for the page without
 first rendering it not-all-visible

 At present, I'm not sure there's any real problem here, since normally
 an all-visible heap page is only going to get another WAL entry if
 somebody does an insert, update, or delete on it, in which case the
 visibility map bit is going to get cleared anyway.  Freezing the page
 might emit a new FPW, but that's going to generate conflicts anyway,
 so no problem there.  So I think there's no real problem here, but it
 doesn't seem totally future-proof - any future type of WAL record that
 might modify the page without rendering it not-all-visible would
 create a very subtle hazard for Hot Standby.  We could dodge the whole
 issue by leaving the hack in heapgetpage() intact and just be happy
 with making index-only scans work, or maybe somebody has a more clever
 idea.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.
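The end-of-recovery-checkpoint counter sketched above is simple enough to model directly. The following is a toy illustration of the bookkeeping (Python model with invented names, not actual code):

```python
# Model of Noah's suggestion: shared memory counts end-of-recovery
# checkpoints; each snapshot records the count at the time it was taken
# (0 on a master). Once the global count advances past the snapshot's
# value, the all-visible optimization is disabled for that snapshot.

ckpt_state = {"count": 0}  # end-of-recovery checkpoints seen (shared memory)

def take_snapshot(in_recovery):
    # Replaces takenDuringRecovery: store the counter instead of a bool.
    return {"ckpt_count": ckpt_state["count"] if in_recovery else 0}

def all_visible_optimization_allowed(snapshot):
    return ckpt_state["count"] <= snapshot["ckpt_count"]
```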

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

We don't need a future-proof solution, especially not at this stage of
the release cycle. Every time you add a WAL record type, we need to
consider the possible conflict impact.

We just need a simple and clear solution. I'll work on that.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Heikki Linnakangas

On 16.04.2012 10:38, Simon Riggs wrote:

On Mon, Apr 16, 2012 at 8:02 AM, Noah Misch n...@leadboat.com wrote:

On Fri, Apr 13, 2012 at 12:33:06PM -0400, Robert Haas wrote:

In the department of query cancellations, I believe Noah argued
previously that this wasn't really going to cause a problem.  And,
indeed, if the master has a mix of inserts, updates, and deletes, then
it seems likely that any recovery conflicts generated this way would
be hitting about the same place in the XID space as the ones caused by
pruning away dead tuples.  What will be different, if we go this
route, is that an insert-only master, which right now only generates
conflicts at freezing time, will instead generate conflicts much
sooner, as soon as the relation is vacuumed.  I do not use Hot Standby
enough to know whether this is a problem, and I'm not trying to block
this approach, but I do want to make sure that everyone agrees that
it's acceptable before we go do it, because inevitably somebody is
going to get bit.


Given sufficient doubt, we could add a GUC, say hot_standby_use_all_visible.
A standby with the GUC off would ignore all-visible indicators and also
decline to generate conflicts when flipping them on.


I'm against adding a new GUC, especially for that fairly thin reason.


Same here.

Can we have a soft hot standby conflict that doesn't kill the query, 
but disables index-only-scans?


In the long run, perhaps we need to store XIDs in the visibility map 
instead of just a bit. If we only stored one xid per 100 pages or 
something like that, the storage overhead would not be much higher than 
what we have at the moment. But that's obviously not going to happen for 
9.2...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query, 
 but disables index-only-scans?

Well, there wouldn't be any way for the planner to know whether an
index-only scan would be safe or not.  I think this would have to look
like a run-time fallback.  Could it be structured as "return that the
page's all-visible bit is not set", instead of failing?  Or am I
confused about where the conflict is coming from?

regards, tom lane



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Simon Riggs
On Mon, Apr 16, 2012 at 3:19 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Can we have a soft hot standby conflict that doesn't kill the query,
 but disables index-only-scans?

 Well, there wouldn't be any way for the planner to know whether an
 index-only scan would be safe or not.  I think this would have to look
 like a run-time fallback.  Could it be structured as return that the
 page's all-visible bit is not set, instead of failing?  Or am I
 confused about where the conflict is coming from?

The starting point is that HS now offers 4 different mechanisms for
avoiding conflicts, probably the best of which is hot standby
feedback. So we only need to improve things if those techniques don't
work for people. So initially, my attitude is lets throw a conflict
and wait for feedback during beta.

If we do need to do something, then introduce concept of a visibility conflict.

On replay:
If feedback not set, set LSN of visibility conflict on PROCs that
conflict, if not already set.

On query:
If feedback not set, check conflict LSN against page, if page is
later, check visibility.
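The replay-side and query-side halves of that scheme fit in a few lines. A toy model of the logic (Python, all names invented; the real thing would live in the PROC array and the scan code):

```python
# Model of the LSN-based soft conflict: replay stamps a conflict LSN on
# each conflicting backend (if not already set); at query time, a page
# whose LSN is at or after that point cannot be trusted as all-visible,
# so the normal tuple-level visibility check is done instead.

def record_visibility_conflict(proc, conflict_lsn):
    # Replay side: only the first conflict is recorded, per the sketch.
    if proc.get("conflict_lsn") is None:
        proc["conflict_lsn"] = conflict_lsn

def must_check_visibility(proc, page_lsn):
    # Query side: if the page was modified at or after the conflict
    # point, fall back to checking tuple visibility the usual way.
    lsn = proc.get("conflict_lsn")
    return lsn is not None and page_lsn >= lsn
```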

In terms of optimisation, I wouldn't expect to have to adjust costs at
all. The difference would only show for long running queries and
typically they don't touch too much new data as a fraction of total.
The cost model for index only is pretty crude anyhow.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 4:26 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Can we have a soft hot standby conflict that doesn't kill the query, but
 disables index-only-scans?

Yeah, something like that seems possible.

For example, suppose the master includes, in each
mark-heap-page-all-visible record, the newest XID on the page.  On the
standby, we keep track of the most recent such XID ever seen in shared
memory.  After noting that a page is all-visible, the standby
cross-checks its snapshot against this XID and does the heap fetch
anyway if it's too new.  Conceivably this can be done with memory
barriers but not without locks: we must update the XID in shared
memory *before* marking the page all-visible, and we must check the
visibility map or page-level bit *before* fetching the XID from shared
memory - but the actual reads and writes of the XID are atomic.
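The decision logic of that XID cross-check is straightforward; what is subtle is the ordering constraint described above. This single-threaded toy model (Python, invented names) shows only the decision, not the memory-ordering machinery:

```python
# Model of the XID cross-check: the standby tracks the newest XID seen
# in any mark-page-all-visible record; an index-only scan trusts the
# all-visible bit only if that XID is already visible to its snapshot
# (here approximated as "older than the snapshot's xmin").

standby_state = {"newest_all_visible_xid": 0}  # shared memory (model)

def replay_all_visible(record_newest_xid):
    # In the real scheme this update must become visible to readers
    # *before* the page/VM bit does.
    if record_newest_xid > standby_state["newest_all_visible_xid"]:
        standby_state["newest_all_visible_xid"] = record_newest_xid

def can_skip_heap_fetch(page_all_visible, snapshot_xmin):
    # Soft conflict: if the newest all-visible XID is too new for this
    # snapshot, do the heap fetch anyway rather than cancel the query.
    if not page_all_visible:
        return False
    return standby_state["newest_all_visible_xid"] < snapshot_xmin
```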

Now, if an index-only scan suffers a very high number of these soft
conflicts and consequently does a lot more heap fetches than
expected, performance might suck.  But that should be rare, and could
be mitigated by turning on hot_standby_feedback.  Also, there might be
some hit for repeatedly reading that XID from memory.

Alternatively, we could try to forbid index-only scan plans from being
created in the first place *only when* the underlying snapshot is too
old.  But then what happens if a conflict happens later, after the
plan has been created but before it's fully executed?  At that point
it's too late to switch gears, so we'd still need something like the
above.  And the above might be adequate all by itself, since in
practice index-only scans are mostly going to be useful when the data
isn't changing all that fast, so most of the queries that would be
cancelled by hard conflicts wouldn't have actually had a problem
anyway.

 In the long run, perhaps we need to store XIDs in the visibility map instead
 of just a bit. If we only stored one xid per 100 pages or something like
 that, the storage overhead would not be much higher than what we have at the
 moment. But that's obviously not going to happen for 9.2...

Well, that would also have the undesirable side effect of greatly
reducing the granularity of the map.  As it is, updating a tiny
fraction of the tuples in the table could result in the entire table
ceasing to be all-visible if you happen to update exactly one
tuple per page through the entire heap.  Or more generally, updating
X% of the rows in the table can cause Y% of the rows in the table to
no longer be visible for index-only-scan purposes, where Y > X.  What
you're proposing would make that much worse.
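The worst case of that granularity effect is easy to put numbers on. A quick illustration (assuming roughly 100 tuples per heap page, a made-up figure for the example):

```python
# Worst-case granularity: updating one tuple on every page clears the
# all-visible state of every page, so updating 1% of tuples can make
# 100% of the table useless for index-only scans.

TUPLES_PER_PAGE = 100
pages = 1000

tuples_updated = pages * 1            # one tuple per page
total_tuples = pages * TUPLES_PER_PAGE
pages_no_longer_all_visible = pages   # every page loses its bit

x_pct = 100.0 * tuples_updated / total_tuples            # rows updated
y_pct = 100.0 * pages_no_longer_all_visible / pages      # pages affected
```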

I think we're actually going to find that the current system isn't
tight enough; my suspicion is that users are going to complain that
we're not aggressive enough about marking pages all-visible, because
autovac won't kick in until updates+deletes20%, but (1) index-only
scans also care about inserts and (2) by the time we've got 20% dead
tuples index-only scans may well be worthless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 3:02 AM, Noah Misch n...@leadboat.com wrote:
 Do you refer to PD_ALL_VISIBLE as not merely a hint due to the requirement
 to prevent a page from simultaneously having a negative PD_ALL_VISIBLE and a
 positive visibility map bit?  That is to say, PD_ALL_VISIBLE is fully a hint
 in its role as a witness of tuple visibility, but it is not a hint in its role
 as a witness of the visibility map status?

Exactly.

 The large comment in SetBufferCommitInfoNeedsSave() seems incorrect.  On
 systems with weaker memory ordering, the FlushBuffer() process's clearing of
 BM_JUST_DIRTIED may not be visible to the SetBufferCommitInfoNeedsSave()
 process until some time after the former retrieved buffer contents from shared
 memory.

True.

 Memory barriers could make the comment true, but we should probably
 just update the comment to explain that a race condition may eat the update
 entirely.

Agreed.

 Incidentally, is there a good reason for MarkBufferDirty() to
 increment pgBufferUsage.shared_blks_dirtied and SetBufferCommitInfoNeedsSave()
 not to do so?  Looks like an oversight.

Also agreed.

 2. More seriously, lazy_scan_heap() releases the buffer lock before
 setting the all-visible bit on the heap.  This looks just plain wrong
 (and it's my fault, to be clear).  Somebody else can come along after
 we've released the buffer lock and mark the page not-all-visible.
 That concurrent process will check the visibility map bit, find it
 already clear, and go on its merry way.  Then, we set the visibility
 map bit and go on our merry way.  Now we have a not-all-visible page
 with the all-visible bit set in the visibility map.   Oops.

 Hmm, indeed.  This deserves its own open item.

Also agreed.

 Good point.  We could finagle things so the standby notices end-of-recovery
 checkpoints and blocks the optimization for older snapshots.  For example,
 maintain an integer count of end-of-recovery checkpoints seen and store that
 in each Snapshot instead of takenDuringRecovery; use 0 on a master.  When the
 global value is greater than the snapshot's value, disable the optimization
 for that snapshot.  I don't know whether the optimization is powerful enough
 to justify that level of trouble, but it's an option to consider.

I suspect not.  It seems we can make index-only scans work without
doing something like this; it's only the sequential-scan optimization
we might lose.  But nobody's ever really complained about not having
that, to my knowledge.

 For a different approach, targeting the fragility, we could add assertions to
 detect unexpected cases of a page bearing PD_ALL_VISIBLE submitted as a
 full-page image.

The holes are narrow enough that I fear such cases would be detected
only on high-velocity production systems, which is not exactly where
you want to find out about such problems.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-16 Thread Robert Haas
On Mon, Apr 16, 2012 at 1:58 PM, Simon Riggs si...@2ndquadrant.com wrote:
 If we do need to do something, then introduce concept of a visibility 
 conflict.

 On replay:
 If feedback not set, set LSN of visibility conflict on PROCs that
 conflict, if not already set.

 On query:
 If feedback not set, check conflict LSN against page, if page is
 later, check visibility.

Hmm, should have read the whole thread before replying.  This is similar
to what I just proposed in response to Heikki's message, but using LSN
in lieu of (or maybe you mean in addition to) XID.

I don't think we can ignore the need to throw conflicts just because
hot_standby_feedback is set; there are going to be corner cases, for
example, when it's just recently been turned on and the master has
already done cleanup; or if the master and standby have recently
gotten disconnected for even just a few seconds.

But fundamentally we all seem to be converging on some variant of the
soft conflict idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] index-only scans vs. Hot Standby, round two

2012-04-15 Thread Simon Riggs
On Fri, Apr 13, 2012 at 5:33 PM, Robert Haas robertmh...@gmail.com wrote:
 Currently, we have a problem with index-only scans in Hot Standby
 mode: the xmin horizon on the standby might lag the master, and thus
 an index-only scan might mistakenly conclude that no heap fetch is
 needed when in fact it is.  I suggested that we handle this by
 suppressing generation of index-only scan plans in Hot Standby mode,
 but Simon, Noah, and Dimitri were arguing that we should instead do
 the following, which is now on the open items list:

 * Make XLOG_HEAP2_VISIBLE records generate recovery snapshot conflicts
 so that IndexOnlyScans work on Hot Standby
...
[snip very long email]

Luckily it's much simpler than all of that suggests. It'll take a few
hours for me to write a short reply but it's Sunday today, so that will
happen later.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
