Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Mon, Feb  3, 2014 at 11:55:34AM -0500, Robert Haas wrote:
  Robert, where are we on this?  Should I post a patch?
 
 I started working on this at one point but didn't finish the
 implementation, let alone the no-doubt-onerous performance testing
 that will be needed to validate whatever we come up with.  It would be
 really easy to cause serious regressions with ill-considered changes
 in this area, and I don't think many people here have the bandwidth
 for a detailed study of all the different workloads that might be
 affected here right this very minute.  More generally, you're sending
 all these pings three weeks after the deadline for CF4.  I don't think
 that's a good time to encourage people to *start* revising old
 patches, or writing new ones.
 
 I've also had some further thoughts about the right way to drive
 vacuum scheduling.  I think what we need to do is tightly couple the

I understand the problems with vacuum scheduling, but I was trying to
address _just_ the insert-only workload problem for index-only scans.

Right now, as I remember, only vacuum sets the visibility bits.  If we
don't want to make vacuum trigger for insert-only workloads, can we set
pages all-visible more often?  

Is there a reason that a sequential scan, which does do page pruning,
doesn't set the visibility bits too?  Or does it?  Can a non-index-only
index scan that finds the heap tuple all-visible and the page not 
all-visible check the other items on the page to see if the page can be
marked all-visible?  Does analyze set pages all-visible?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Robert Haas
On Tue, Feb 11, 2014 at 10:56 AM, Bruce Momjian br...@momjian.us wrote:
 Right now, as I remember, only vacuum sets the visibility bits.  If we
 don't want to make vacuum trigger for insert-only workloads, can we set
 pages all-visible more often?

 Is there a reason that a sequential scan, which does do page pruning,
 doesn't set the visibility bits too?  Or does it?  Can a non-index-only
 index scan that finds the heap tuple all-visible and the page not
 all-visible check the other items on the page to see if the page can be
 marked all-visible?  Does analyze set pages all-visible?

A sequential scan will set hint bits and will prune the page, but
pruning the page doesn't ever mark it all-visible; that logic is
entirely in vacuum.  If that could be made cheap enough to be
negligible, it might well be worth doing in heap_page_prune().  I
think there might be a way to do that, but it's a bit tricky because
the pruning logic iterates over the page in a somewhat complex way,
not just a straightforward scan of all the item pointers the way the
existing logic does.  It would be pretty cool if we could just use
a bit out of the heap-prune xlog record to indicate whether the
all-visible bit should be set; then we'd gain the benefit of marking
things all-visible much more often without needing vacuum.

That doesn't help insert-only tables much, though, because those won't
require pruning.  We set hint bits (which dirties the page) but
currently don't write WAL.  We'd have to change that to set the
all-visible bit when scanning such a table, and that would be
expensive.  :-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
 A sequential scan will set hint bits and will prune the page, but
 pruning the page doesn't ever mark it all-visible; that logic is
 entirely in vacuum.  If that could be made cheap enough to be
 negligible, it might well be worth doing in heap_page_prune().  I
 think there might be a way to do that, but it's a bit tricky because
 the pruning logic iterates over the page in a somewhat complex way,
 not just a straightforward scan of all the item pointers the way the
  existing logic does.  It would be pretty cool if we could just use
 a bit out of the heap-prune xlog record to indicate whether the
 all-visible bit should be set; then we'd gain the benefit of marking
 things all-visible much more often without needing vacuum.
 
 That doesn't help insert-only tables much, though, because those won't
 require pruning.  We set hint bits (which dirties the page) but
 currently don't write WAL.  We'd have to change that to set the
 all-visible bit when scanning such a table, and that would be
 expensive.  :-(

Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
(2012) but they still seem to be not usable for insert-only workloads
two years later.  Based on current progress, it doesn't look like this
will be corrected until 9.5 (2015).  I am kind of confused why this has
not generated more urgency.

I guess my question is what approach do we want to take to fixing this? 
If we are doing pruning, aren't we emitting WAL?  You are right that for
an insert-only workload, we aren't going to prune, but if pruning WAL
overhead is acceptable for a sequential scan, isn't index-only
page-all-visible WAL overhead acceptable?

Do we want to track the number of inserts in statistics and trigger an
auto-vacuum after a specified number of inserts?  The problem there is
that we really don't need to do any index cleanup, which is what vacuum
typically does --- we just want to scan the table and set the
all-visible bits, so that approach seems non-optimal.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Andres Freund
On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
 Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
 (2012) but they still seem to be not usable for insert-only workloads
 two years later.  Based on current progress, it doesn't look like this
 will be corrected until 9.5 (2015).  I am kind of confused why this has
 not generated more urgency.

I think this is largely FUD.  They are hugely beneficial in some scenarios
and less so in others. Just like lots of other features we have.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
 On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
  Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
  (2012) but they still seem to be not usable for insert-only workloads
  two years later.  Based on current progress, it doesn't look like this
  will be corrected until 9.5 (2015).  I am kind of confused why this has
  not generated more urgency.
 
 I think this is largely FUD.  They are hugely beneficial in some scenarios
 and less so in others. Just like lots of other features we have.

I don't understand.  Index-only scans are known to have benefits --- if
an insert-only workload can't use that, why is that acceptable?  What is
fear-uncertainty-and-doubt about that?  Please explain.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Andres Freund
On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote:
 On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
  On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
   Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
   (2012) but they still seem to be not usable for insert-only workloads
   two years later.  Based on current progress, it doesn't look like this
   will be corrected until 9.5 (2015).  I am kind of confused why this has
   not generated more urgency.
  
  I think this is largely FUD.  They are hugely beneficial in some scenarios
  and less so in others. Just like lots of other features we have.
 
 I don't understand.  Index-only scans are known to have benefits --- if
 an insert-only workload can't use that, why is that acceptable?  What is
 fear-uncertainty-and-doubt about that?  Please explain.

Uh, for one, insert-only workloads certainly aren't the majority of
use cases.  Ergo there are plenty of cases where index-only scans work out
of the box.
Also, they *do* work for insert-only workloads; you just either have to
wait longer or manually trigger VACUUMs.  That's a far cry from not
being usable.

I am not saying it shouldn't be improved; I just don't see the point of
bringing it up while everyone is busy with the last CF, claiming it
is unusable, and stating that it is surprising that nobody really
cares.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
 On 2014-02-11 13:23:19 -0500, Bruce Momjian wrote:
  On Tue, Feb 11, 2014 at 06:54:10PM +0100, Andres Freund wrote:
   On 2014-02-11 12:12:13 -0500, Bruce Momjian wrote:
Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
(2012) but they still seem to be not usable for insert-only workloads
two years later.  Based on current progress, it doesn't look like this
will be corrected until 9.5 (2015).  I am kind of confused why this has
not generated more urgency.
   
   I think this is largely FUD.  They are hugely beneficial in some scenarios
   and less so in others. Just like lots of other features we have.
  
  I don't understand.  Index-only scans are known to have benefits --- if
  an insert-only workload can't use that, why is that acceptable?  What is
  fear-uncertainty-and-doubt about that?  Please explain.
 
 Uh, for one, insert-only workloads certainly aren't the majority of
 use cases.  Ergo there are plenty of cases where index-only scans work out
 of the box.

True.

 Also, they *do* work for insert only workloads, you just either have to
 wait longer, or manually trigger VACUUMs. That's a far cry from not

Wait longer for what?  Anti-xid-wraparound vacuum?

 being usable.

Is using VACUUM for these cases documented?  Should it be?

 I am not saying it shouldn't be improved; I just don't see the point of
 bringing it up while everyone is busy with the last CF, claiming it
 is unusable, and stating that it is surprising that nobody really
 cares.

Well, I brought it up in September too. My point was not that it is a
new issue but that it has been such an ignored issue for two years.  I
am not asking for a fix, but right now we don't even have a plan on how
to improve this.

I still don't see how this is FUD, and you have not explained it to me.
This has been a known limitation for two years, is not documented (?), and
has no TODO item and no plan for improvement.  Do you want to declare
such cases FUD and just ignore them?  I don't see how that moves us
forward.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
 I am not saying it shouldn't be improved; I just don't see the point of
 bringing it up while everyone is busy with the last CF, claiming it
 is unusable, and stating that it is surprising that nobody really
 cares.

 Well, I brought it up in September too. My point was not that it is a
 new issue but that it has been such an ignored issue for two years.  I
 am not asking for a fix, but right now we don't even have a plan on how
 to improve this.

Indeed, and considering that we're all busy with the CF, I think it's
quite unreasonable of you to expect that we'll drop everything else
to think about this problem right now.  The reason it's like it is
is that it's not easy to see how to make it better; so even if we did
drop everything else, it's not clear to me that any plan would emerge
anytime soon.

regards, tom lane




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Andres Freund
On 2014-02-11 13:41:46 -0500, Bruce Momjian wrote:
 Wait longer for what?  Anti-xid-wraparound vacuum?

Yes.

 Is using VACUUM for these cases documented?  Should it be?

No idea; it seems to be at least part of the folkloric
knowledge, from what I see at clients.

  I am not saying it shouldn't be improved; I just don't see the point of
  bringing it up while everyone is busy with the last CF, claiming it
  is unusable, and stating that it is surprising that nobody really
  cares.

 Well, I brought it up in September too. My point was not that it is a
 new issue but that it has been such an ignored issue for two years.  I
 am not asking for a fix, but right now we don't even have a plan on how
 to improve this.

Coming up with a plan for this takes time and discussion, neither of which
we seem to have plenty of atm.  And even if we were to agree on a plan
right now, we wouldn't incorporate it into 9.4, so what's the point of
bringing it up now?

 I still don't see how this is FUD, and you have not explained it to me. 
 This is a known limitation for two years, not documented (?), and with
 no TODO item and no plan on how to improve it.  Do you want to declare
 such cases FUD and just ignore them?  I don't see how that moves us
 forward.

Claiming something doesn't work when it just has manageable usability
issues doesn't strike me as a reasonable starting point.  If it bugs
somebody enough to come up with a rough proposal, it will get fixed...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 01:54:48PM -0500, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  On Tue, Feb 11, 2014 at 07:31:03PM +0100, Andres Freund wrote:
  I am not saying it shouldn't be improved; I just don't see the point of
  bringing it up while everyone is busy with the last CF, claiming it
  is unusable, and stating that it is surprising that nobody really
  cares.
 
  Well, I brought it up in September too. My point was not that it is a
  new issue but that it has been such an ignored issue for two years.  I
  am not asking for a fix, but right now we don't even have a plan on how
  to improve this.
 
 Indeed, and considering that we're all busy with the CF, I think it's
 quite unreasonable of you to expect that we'll drop everything else
 to think about this problem right now.  The reason it's like it is
 is that it's not easy to see how to make it better; so even if we did
 drop everything else, it's not clear to me that any plan would emerge
 anytime soon.

Well, documenting the VACUUM requirement and adding it to the TODO list
are things we should consider for 9.4.  If you think doing that after
the commit-fest is best, I can do that.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Jeff Janes
On Tue, Feb 11, 2014 at 9:12 AM, Bruce Momjian br...@momjian.us wrote:

 On Tue, Feb 11, 2014 at 11:28:36AM -0500, Robert Haas wrote:
  A sequential scan will set hint bits and will prune the page, but
  pruning the page doesn't ever mark it all-visible; that logic is
  entirely in vacuum.  If that could be made cheap enough to be
  negligible, it might well be worth doing in heap_page_prune().  I
  think there might be a way to do that, but it's a bit tricky because
  the pruning logic iterates over the page in a somewhat complex way,
  not just a straightforward scan of all the item pointers the way the
  existing logic does.  It would be pretty cool if we could just use
  a bit out of the heap-prune xlog record to indicate whether the
  all-visible bit should be set; then we'd gain the benefit of marking
  things all-visible much more often without needing vacuum.
 
  That doesn't help insert-only tables much, though, because those won't
  require pruning.  We set hint bits (which dirties the page) but
  currently don't write WAL.  We'd have to change that to set the
  all-visible bit when scanning such a table, and that would be
  expensive.  :-(

 Yes, that pretty much sums it up.  We introduced index-only scans in 9.2
 (2012) but they still seem to be not usable for insert-only workloads
 two years later.  Based on current progress, it doesn't look like this
 will be corrected until 9.5 (2015).  I am kind of confused why this has
 not generated more urgency.



For insert- and select-only workloads, they are usable (if your queries are
of the type that could benefit from them); you just have to do some manual
intervention.  The list of features that sometimes require a DBA to do
something to make maximum use of them under some circumstance would be a
long one.  It would be nice if it were better, but I don't see why this
feature is particularly urgent compared to all the other things that could
be improved.  In particular I think the freezing-without-IO work is much
more important.  Freezing is rather unimportant until suddenly it is the
most important thing in the universe.  If we could stop worrying about
that, I think it would free up other aspects of vacuum scheduling to have
more meddling/optimization done to it.




 I guess my question is what approach do we want to take to fixing this?
 If we are doing pruning, aren't we emitting WAL?  You are right that for
 an insert-only workload, we aren't going to prune, but if pruning WAL
 overhead is acceptable for a sequential scan, isn't index-only
 page-all-visible WAL overhead acceptable?



We often don't find that pruning is particularly acceptable in seq scans,
and there is a patch pending to conditionally turn it off for them.



 Do we want to track the number of inserts in statistics and trigger an
 auto-vacuum after a specified number of inserts?


We track relpages and relallvisible, which seems like a more direct
measure.  Once analyze is done (which is already triggered by inserts) and
sets those, it could fire a vacuum based on the ratio of those values, or
the autovac process could just look at the ratio after naptime.  So just
introduce autovacuum_vacuum_visible_factor. A problem there is that it
would be a lot of work to aggressively keep the ratio high, and pointless
if the types of queries done on that table don't benefit from IOS anyway,
or if pages are dirtied so rapidly that no amount of vacuuming will keep
the ratio high.  Would we try to automatically tell which tables were
which, or rely on the DBA setting per-table
autovacuum_vacuum_visible_factor for tables that differ from the database
norm?


  The problem there is
 that we really don't need to do any index cleanup, which is what vacuum
 typically does --- we just want to scan the table and set the
 all-visible bits, so that approach seems non-optimal.


In the case of no updates or deletes (or aborted inserts?), there would be
nothing to clean up in the indexes and that step would be skipped (already
in the current code). And if the indexes do need cleaning up, we certainly
can't set the page all visible without doing that clean up.

Cheers,

Jeff


Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Claudio Freire
On Tue, Feb 11, 2014 at 4:13 PM, Jeff Janes jeff.ja...@gmail.com wrote:

 Do we want to track the number of inserts in statistics and trigger an
 auto-vacuum after a specified number of inserts?


 We track relpages and relallvisible, which seems like a more direct measure.
 Once analyze is done (which is already triggered by inserts) and sets those,
 it could fire a vacuum based on the ratio of those values, or the autovac
 process could just look at the ratio after naptime.  So just introduce
 autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
 of work to aggressively keep the ratio high, and pointless if the types of
 queries done on that table don't benefit from IOS anyway, or if pages are
 dirtied so rapidly that no amount of vacuuming will keep the ratio high.
 Would we try to automatically tell which tables were which, or rely on the
 DBA setting per-table autovacuum_vacuum_visible_factor for tables that
 differ from the database norm?


Why not track how many times an IOS would be used but wasn't, or how
many heap fetches in IOS have to be performed?

Seems like a more direct measure of whether allvisible needs an update.
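Such a counter-based trigger might look like the sketch below; the counter and threshold names are hypothetical, not existing pg_stat fields:

```python
# Sketch of Claudio's idea: count the heap fetches that index-only scans
# were forced to make (pages whose all-visible bit was unset), and trigger
# a visibility-setting vacuum once the count crosses a threshold.

class TableStats:
    def __init__(self):
        self.ios_heap_fetches = 0   # heap visits during index-only scans

    def record_index_only_fetch(self, page_all_visible):
        # Only fetches that actually had to visit the heap count as waste.
        if not page_all_visible:
            self.ios_heap_fetches += 1

    def needs_visibility_vacuum(self, threshold=10_000):
        return self.ios_heap_fetches >= threshold

stats = TableStats()
for _ in range(10_000):
    stats.record_index_only_fetch(page_all_visible=False)
assert stats.needs_visibility_vacuum() is True
```

Unlike an insert counter, this only fires for tables whose query mix would actually benefit from more all-visible pages.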




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-11 Thread Bruce Momjian
On Tue, Feb 11, 2014 at 05:51:36PM -0200, Claudio Freire wrote:
  We track relpages and relallvisible, which seems like a more direct measure.
  Once analyze is done (which is already triggered by inserts) and sets those,
  it could fire a vacuum based on the ratio of those values, or the autovac
  process could just look at the ratio after naptime.  So just introduce
  autovacuum_vacuum_visible_factor. A problem there is that it would be a lot
  of work to aggressively keep the ratio high, and pointless if the types of
  queries done on that table don't benefit from IOS anyway, or if pages are
  dirtied so rapidly that no amount of vacuuming will keep the ratio high.
  Would we try to automatically tell which tables were which, or rely on the
  DBA setting per-table autovacuum_vacuum_visible_factor for tables that
  differ from the database norm?
 
 
 Why not track how many times an IOS would be used but wasn't, or how
 many heap fetches in IOS have to be performed?
 
 Seems like a more direct measure of whether allvisible needs an update.

Now that is an interesting idea, and more direct.

Do we need to adjust for the insert count?  I.e., would the threshold to
trigger an autovacuum after finding index lookups that had to check the
heap page for visibility be higher if many inserts are happening,
perhaps dirtying pages?  (If we are dirtying via update/delete,
autovacuum will already trigger.)

We are aggressive in clearing the page-all-visible flag (we have to be),
but I think we need a little more aggressiveness for setting it.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-05 Thread Robert Haas
First, thanks for this thoughtful email.

On Tue, Feb 4, 2014 at 7:14 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas robertmh...@gmail.com wrote:
 I've also had some further thoughts about the right way to drive
 vacuum scheduling.  I think what we need to do is tightly couple the
 rate at which we're willing to do vacuuming to the rate at which we're
 incurring vacuum debt.  That is, if we're creating 100kB/s of pages
 needing vacuum, we vacuum at 2-3MB/s (with default settings).

 If we can tolerate 2-3MB/s without adverse impact on other work, then we can
 tolerate it.  Do we gain anything substantial by sand-bagging it?

No.  The problem is the other direction.

 If
 we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
 2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
 everything goes to heck.

 (Your reference to bloat made me think your comments here are about
 vacuuming in general, not specific to IOS.  If that isn't the case, then
 please ignore.)

 If we can only vacuum at 2-3MB/s without adversely impacting other activity,
 but we are creating 10MB/s of future vacuum need, then there are basically
 two possibilities I can think of.  Either the 10MB/s represents a spike, and
 vacuum should tolerate it and hope to catch up on the debt later.  Or it
 represents a new permanent condition, in which case I bought too few hard
 drives for the work load, and no scheduling decision that autovacuum can
 make will save me from my folly. Perhaps there is some middle ground between
 those possibilities, but I don't see room for much middle ground.

 I guess there might be entirely different possibilities not between those
 two; for example, I don't realize I'm doing something that is generating
 10MB/s of vacuum debt, and would like to have this thing I'm doing be
 automatically throttled to the point it doesn't interfere with other
 processes (either directly, or indirectly by bloat)

The underlying issue here is that, in order for there not to be a
problem, a user needs to configure their autovacuum processes to
vacuum at a rate which is greater than or equal to the average rate at
which vacuum debt is being created.  If they don't, they get runaway
bloat.  But to do that, they need to know at what rate they are
creating vacuum debt, which is almost impossible to figure out right
now; and even if they did know it, they'd then need to figure out what
vacuum cost delay settings would allow vacuuming at a rate sufficient
to keep up, which isn't quite as hard to estimate but certainly
involves nontrivial math.  So a lot of people have this set wrong, and
it's not easy to get it right except by frobbing the settings until
you find something that works well in practice.
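The "nontrivial math" might look roughly like this: given a measured debt-creation rate, back out the vacuum_cost_limit needed to keep up (assuming, hypothetically, that every vacuumed page costs page_miss + page_dirty = 30 points and the 20ms default delay):

```python
import math

def cost_limit_for_rate(target_mb_per_s, cost_delay_ms=20,
                        cost_per_page=30, block_size=8192):
    """Rough estimate of the vacuum_cost_limit needed for vacuum to
    process target_mb_per_s, under the assumed per-page cost."""
    # pages/s needed to keep up with the debt-creation rate
    pages_per_s = target_mb_per_s * 1024**2 / block_size
    # pages that must be processed between each cost-delay nap
    pages_per_round = pages_per_s * cost_delay_ms / 1000
    return math.ceil(pages_per_round * cost_per_page)

# The default limit of 200 corresponds to roughly 2.6MB/s; keeping up
# with 10MB/s of debt would need roughly a 4x higher limit:
assert cost_limit_for_rate(2.6) == 200
assert cost_limit_for_rate(10) == 768
```

This is exactly the calculation users are implicitly asked to do today, which is why most end up frobbing the settings instead.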

Also, a whole *lot* of problems in this area are caused by cases where
the rate at which vacuum debt is being created *changes*.  Autovacuum
is keeping up, but then you have either a load spike or just a gradual
increase in activity and it doesn't keep up any more.  You don't
necessarily notice right away, and by the time you do there's no easy
way to recover.  If you've got a table with lots of dead tuples in it,
but it's also got enough internal freespace to satisfy as many inserts
and updates as are happening, then it's possibly reasonable to put off
vacuuming in the hopes that system load will be lower at some time in
the future.  But if you've got a table with lots of dead tuples in it,
and you're extending it to create internal freespace instead of
vacuuming it, it is highly likely that you are not doing what will make
the user most happy.  Even if vacuuming that table slows down
foreground activity quite badly, it is probably better than
accumulating an arbitrary amount of bloat.

 The rate of vacuuming needs to be tied
 somehow to the rate at which we're creating stuff that needs to be
 vacuumed.  Right now we don't even have a way to measure that, let
 alone auto-regulate the aggressiveness of autovacuum on that basis.

 There is the formula used to decide when a table gets vacuumed.  Isn't the
 time delta in this formula a measure of how fast we are creating stuff that
 needs to be vacuumed for bloat reasons?  Is your objection that it doesn't
 include other reasons we might want to vacuum, or that it just doesn't work
very well, or that it is not explicitly exposed?

AFAICT, the problem isn't when the table gets vacuumed so much as *how
fast* it gets vacuumed.  The autovacuum algorithm does a fine job
selecting tables for vacuuming, for the most part.  There are problems
with insert-only tables and sometimes for large tables the default
threshold (0.20) is too high, but it's not terrible.  However, the
limit on the overall rate of vacuuming activity to 2-3MB/s regardless
of how fast we're creating vacuum debt is a big problem.
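The 2-3MB/s figure falls out of the default cost-based delay settings; a rough model, assuming every vacuumed page is both read from disk (page_miss) and dirtied (page_dirty):

```python
# Back-of-the-envelope model of cost-based vacuum delay with the default
# settings of this era: vacuum naps for cost_delay_ms after accumulating
# cost_limit points, and a page that is read and dirtied costs
# page_miss + page_dirty points.

def vacuum_rate_mb_per_s(cost_limit=200, cost_delay_ms=20,
                         page_miss=10, page_dirty=20, block_size=8192):
    cost_per_page = page_miss + page_dirty          # read + dirty each page
    pages_per_round = cost_limit / cost_per_page    # pages between naps
    rounds_per_s = 1000 / cost_delay_ms
    return pages_per_round * rounds_per_s * block_size / 1024**2

rate = vacuum_rate_mb_per_s()
assert 2.0 < rate < 3.0   # roughly 2.6 MB/s with default settings
```

The fixed cap is the problem: the rate stays the same whether the workload generates 100kB/s or 10MB/s of vacuum debt.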

 Similarly, for marking of pages as all-visible, we currently make the
 same decision whether the relation is getting index-scanned 

Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-04 Thread Jeff Janes
On Mon, Feb 3, 2014 at 8:55 AM, Robert Haas robertmh...@gmail.com wrote:


 I've also had some further thoughts about the right way to drive
  vacuum scheduling.  I think what we need to do is tightly couple the
 rate at which we're willing to do vacuuming to the rate at which we're
 incurring vacuum debt.  That is, if we're creating 100kB/s of pages
 needing vacuum, we vacuum at 2-3MB/s (with default settings).


If we can tolerate 2-3MB/s without adverse impact on other work, then we
can tolerate it.  Do we gain anything substantial by sand-bagging it?



 If
 we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
 2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
 everything goes to heck.


(Your reference to bloat made me think your comments here are about
vacuuming in general, not specific to IOS.  If that isn't the case, then
please ignore.)

If we can only vacuum at 2-3MB/s without adversely impacting other
activity, but we are creating 10MB/s of future vacuum need, then there are
basically two possibilities I can think of.  Either the 10MB/s represents a
spike, and vacuum should tolerate it and hope to catch up on the debt
later.  Or it represents a new permanent condition, in which case I bought
too few hard drives for the work load, and no scheduling decision that
autovacuum can make will save me from my folly. Perhaps there is some
middle ground between those possibilities, but I don't see room for much
middle ground.

I guess there might be entirely different possibilities not between those
two; for example, I don't realize I'm doing something that is generating
10MB/s of vacuum debt, and would like to have this thing I'm doing be
automatically throttled to the point it doesn't interfere with other
processes (either directly, or indirectly by bloat)



 The rate of vacuuming needs to be tied
 somehow to the rate at which we're creating stuff that needs to be
 vacuumed.  Right now we don't even have a way to measure that, let
 alone auto-regulate the aggressiveness of autovacuum on that basis.


There is the formula used to decide when a table gets vacuumed.  Isn't the
time delta in this formula a measure of how fast we are creating stuff that
needs to be vacuumed for bloat reasons?  Is your objection that it doesn't
include other reasons we might want to vacuum, or that it just doesn't work
very well, or that it is not explicitly exposed?
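For readers who don't have it in front of them, the existing first-prong
test looks roughly like this (a sketch; the constants are the shipped
defaults for autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor):

```python
AUTOVACUUM_VACUUM_THRESHOLD = 50      # default base threshold
AUTOVACUUM_VACUUM_SCALE_FACTOR = 0.2  # default fraction of table size

def needs_vacuum(n_dead_tuples, reltuples):
    """First-prong test: enough dead tuples relative to table size?"""
    threshold = (AUTOVACUUM_VACUUM_THRESHOLD
                 + AUTOVACUUM_VACUUM_SCALE_FACTOR * reltuples)
    return n_dead_tuples > threshold

# A 1M-row table is not vacuumed until ~200,050 dead tuples accumulate;
# the elapsed time to reach that point is the implicit "time delta":
print(needs_vacuum(150_000, 1_000_000))  # False
print(needs_vacuum(250_000, 1_000_000))  # True
```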




 Similarly, for marking of pages as all-visible, we currently make the
 same decision whether the relation is getting index-scanned (in which
 case the failure to mark those pages all-visible may be suppressing
 the use of index scans or making them less effective) or whether it's
 not being accessed at all (in which case vacuuming it won't help
 anything, and might hurt by pushing other pages out of cache).


If it is not getting accessed at all because the database is not very
active right now, that would be the perfect time to vacuum it.  Between "I
can accurately project current patterns of (in)activity into the future"
and "People don't build large tables just to ignore them forever", I think
the latter is more likely to be true.  If the system is busy but this
particular table is not, then that would be a better reason to
de-prioritise vacuuming that table.  But can this degree of reasoning
really be implemented in a practical way?  In core?


  Again,
 if we had better statistics, we could measure this - counting heap
 fetches for actual index-only scans plus heap fetches for index scans
 that might have been planned index-only scans but for the relation
 having too few all-visible pages doesn't sound like an impossible
 metric to gather.


My experience has been that if too few pages are all visible, it generally
switches to a seq scan, not an index scan of a different index.  But many
things that are semantically possible to be index-only-scans would never be
planned that way even if allvisible were 100%, so I think it would have to
do two planning passes, one with the real allvisible, and a hypothetical
one with allvisible set to 100%.  And then there is the possibility that,
while a high allvisible would be useful, the table is so active that no
amount of vacuuming could ever keep it high.

Cheers,

Jeff


Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-02-03 Thread Robert Haas
On Fri, Jan 31, 2014 at 10:22 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote:
 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.

 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.

 Thoughts?  Let the hurling of rotten tomatoes begin.

 Robert, where are we on this?  Should I post a patch?

I started working on this at one point but didn't finish the
implementation, let alone the no-doubt-onerous performance testing
that will be needed to validate whatever we come up with.  It would be
really easy to cause serious regressions with ill-considered changes
in this area, and I don't think many people here have the bandwidth
for a detailed study of all the different workloads that might be
affected here right this very minute.  More generally, you're sending
all these pings three weeks after the deadline for CF4.  I don't think
that's a good time to encourage people to *start* revising old
patches, or writing new ones.

I've also had some further thoughts about the right way to drive
vacuum scheduling.  I think what we need to do is tightly couple the
rate at which we're willing to do vacuuming to the rate at which we're
incurring vacuum debt.  That is, if we're creating 100kB/s of pages
needing vacuum, we vacuum at 2-3MB/s (with default settings).  If
we're creating 10MB/s of pages needing vacuum, we *still* vacuum at
2-3MB/s.  Not shockingly, vacuum gets behind, the database bloats, and
everything goes to heck.  The rate of vacuuming needs to be tied
somehow to the rate at which we're creating stuff that needs to be
vacuumed.  Right now we don't even have a way to measure that, let
alone auto-regulate the aggressiveness of autovacuum on that basis.

Similarly, for marking of pages as all-visible, we currently make the
same decision whether the relation is getting index-scanned (in which
case the failure to mark those pages all-visible may be suppressing
the use of index scans or making them less effective) or whether it's
not being accessed at all (in which case vacuuming it won't help
anything, and might hurt by pushing other pages out of cache).  Again,
if we had better statistics, we could measure this - counting heap
fetches for actual index-only scans plus heap fetches for index scans
that might have been planned index-only scans but for the relation
having too few all-visible pages doesn't sound like an impossible
metric to gather.  And if we had that, we could use it to trigger
vacuuming, instead of guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2014-01-31 Thread Bruce Momjian
On Thu, Sep 19, 2013 at 02:39:43PM -0400, Robert Haas wrote:
 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
 
 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.
 
 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.
 
 Thoughts?  Let the hurling of rotten tomatoes begin.

Robert, where are we on this?  Should I post a patch?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-20 Thread Robert Haas
On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote:
 The reason I suggested keeping track of the xids of unremovable tuples
 is that the current logic doesn't handle that at all. We just
 unconditionally set n_dead_tuples to zero after a vacuum even if not a
 single row could actually be cleaned out. Which has the effect that we
 will not start a vacuum until enough bloat (or after changing this, new
 inserts) has collected to start vacuum anew. Which then will do twice
 the work.

 Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
 much good either - we would just immediately trigger a new vacuum the
 next time we check, even if the xmin horizon is still the same.

One idea would be to store the xmin we used for the vacuum somewhere.
Could we make that part of the pgstats infrastructure?  Or store it in
a new pg_class column?  Then we could avoid re-triggering until it
advances.  Or, maybe better, we could remember the oldest XID that we
weren't able to remove due to xmin considerations and re-trigger when
the horizon passes that point.
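The simpler variant — remember the oldest XID we could not remove and
re-trigger once the horizon passes it — is only a few lines of logic (a
sketch; real code must use circular TransactionId comparisons, not plain
integers):

```python
def should_retrigger(oldest_unremovable_xid, current_xmin_horizon):
    """Re-vacuum only once the xmin horizon has passed the oldest XID we
    previously failed to remove; before that, a new vacuum would find
    exactly the same unremovable tuples again."""
    return current_xmin_horizon > oldest_unremovable_xid

assert not should_retrigger(200, 200)  # horizon unchanged: skip the vacuum
assert should_retrigger(200, 201)      # horizon has advanced past it: go
```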

 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.

 Yes, that's a pretty valid concern. But we can't really do it that
 easily. a) We can only remove dead line pointers when we know there's no
 index pointing to it anymore, which we only know after the index
 entries have been removed. b) We cannot check the validity of an index pointer if
 there's no heap tuple for it. Sure, we could check whether we're
 pointing to a dead line pointer, but the random io costs of that are
 prohibitive.
 Now, we could just mark line pointers as dead and not mark that page as
 all-visible and pick it up again on the next vacuum cycle. But that
 would suck long-term.

 I think the only real solution here is to store removed tuples' tids
 (i.e. items we've marked as dead) somewhere. Whenever we've found
 sufficient tuples to be removed from indexes, we do phase 2.

I don't really agree with that.  Yes, we could make that change, and
yes, it might be better than what we're doing today, but it would be
complex and have its own costs.  And it doesn't mean that lesser steps
are without merit.  A vacuum pass over the heap buys us a LOT of space
for reuse even without touching the indexes: we don't reclaim the line
pointers, but we do reclaim the space for the tuples themselves, which
is a big deal.  So being able to do that more frequently without
causing problems has a lot of value, I think.  The fact that we get to
set all-visible bits along the way makes future vacuums cheaper, and
makes index scans work better, so that's good too.  And the first
vacuum to find a dead tuple will dirty the page to truncate it to a
dead line pointer, while any subsequent revisits prior to the index
vac cycle will only examine the page without dirtying it.  All in all,
just leaving the page to be caught be a future vacuum doesn't seem
that bad to me, at least for a first cut.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-20 Thread Andres Freund
On 2013-09-20 11:30:26 -0400, Robert Haas wrote:
 On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com wrote:
  The reason I suggested keeping track of the xids of unremovable tuples
  is that the current logic doesn't handle that at all. We just
  unconditionally set n_dead_tuples to zero after a vacuum even if not a
  single row could actually be cleaned out. Which has the effect that we
  will not start a vacuum until enough bloat (or after changing this, new
  inserts) has collected to start vacuum anew. Which then will do twice
  the work.
 
  Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
  much good either - we would just immediately trigger a new vacuum the
  next time we check, even if the xmin horizon is still the same.
 
 One idea would be to store the xmin we used for the vacuum somewhere.
 Could we make that part of the pgstats infrastructure?  Or store it in
 a new pg_class column?  Then we could avoid re-triggering until it
 advances.  Or, maybe better, we could remember the oldest XID that we
 weren't able to remove due to xmin considerations and re-trigger when
 the horizon passes that point.

I suggested a slightly more complex variant of this upthread:
http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-20 Thread Robert Haas
On Fri, Sep 20, 2013 at 11:51 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-09-20 11:30:26 -0400, Robert Haas wrote:
 On Thu, Sep 19, 2013 at 6:59 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  The reason I suggested keeping track of the xids of unremovable tuples
  is that the current logic doesn't handle that at all. We just
  unconditionally set n_dead_tuples to zero after a vacuum even if not a
  single row could actually be cleaned out. Which has the effect that we
  will not start a vacuum until enough bloat (or after changing this, new
  inserts) has collected to start vacuum anew. Which then will do twice
  the work.
 
  Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
  much good either - we would just immediately trigger a new vacuum the
  next time we check, even if the xmin horizon is still the same.

 One idea would be to store the xmin we used for the vacuum somewhere.
 Could we make that part of the pgstats infrastructure?  Or store it in
 a new pg_class column?  Then we could avoid re-triggering until it
 advances.  Or, maybe better, we could remember the oldest XID that we
 weren't able to remove due to xmin considerations and re-trigger when
 the horizon passes that point.

 I suggested a slightly more complex variant of this upthread:
 http://archives.postgresql.org/message-id/20130907053449.GE626072%40alap2.anarazel.de

Ah, yeah.  Sorry, I forgot about that.

Personally, I'd try the simpler version first.  But I think whoever
takes the time to implement this will probably get to pick.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Robert Haas
On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
 I generally think the current logic for triggering VACUUMs via
 autovacuum doesn't really make all that much sense in the days where we
 have the visibility map.

Right now, whether or not to autovacuum is the result of a two-pronged
test.  The first prong is based on number of updates and deletes
relative to table size; that triggers a regular autovacuum.  The
second prong is based on age(relfrozenxid) and triggers a
non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

The typical case in which this doesn't work out well is when the table
has a lot of inserts but few or no updates and deletes.  So I propose
that we change the first prong to count inserts as well as updates and
deletes when deciding whether it needs to vacuum the table.  We
already use that calculation to decide whether to auto-analyze, so it
wouldn't be very novel.   We know that the work of marking pages
all-visible will need to be done at some point, and doing it sooner
will result in doing it in smaller batches, which seems generally
good.

However, I do have one concern: it might lead to excessive
index-vacuuming.  Right now, we skip the index vac step only if
ZERO dead tuples are found during the heap scan.  Even one dead tuple
(or line pointer) will cause an index vac cycle, which may easily be
excessive.  So I further propose that we introduce a threshold for
index-vac, so that we only do an index vac cycle if the number of dead
tuples exceeds, say 0.1% of the table size.

Thoughts?  Let the hurling of rotten tomatoes begin.
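A sketch of the two proposed changes side by side (names and the threshold
knob are hypothetical; the 0.1% figure is taken from the text above):

```python
VACUUM_THRESHOLD = 50            # existing default base threshold
VACUUM_SCALE_FACTOR = 0.2        # existing default scale factor
INDEX_VAC_MIN_FRACTION = 0.001   # proposed: 0.1% of the table size

def needs_vacuum(n_dead, n_inserted, reltuples):
    # Proposed change to the first prong: inserts count too, so an
    # insert-only table eventually gets its pages marked all-visible.
    threshold = VACUUM_THRESHOLD + VACUUM_SCALE_FACTOR * reltuples
    return (n_dead + n_inserted) > threshold

def do_index_vac_cycle(n_dead, reltuples):
    # Proposed change: a single dead tuple no longer forces a full
    # index-vacuum pass; require a minimum fraction of the table.
    return n_dead > INDEX_VAC_MIN_FRACTION * reltuples

assert needs_vacuum(0, 300_000, 1_000_000)     # insert-only workload triggers
assert not do_index_vac_cycle(500, 1_000_000)  # below 0.1%: skip index vac
assert do_index_vac_cycle(1_500, 1_000_000)    # above 0.1%: do the pass
```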

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:

 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).
 
 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.
 
 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.

+1  I've been thinking of suggesting something along the same lines,
for the same reasons.

 
-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-19 Thread Andres Freund
On 2013-09-19 14:39:43 -0400, Robert Haas wrote:
 On Tue, Sep 17, 2013 at 7:10 PM, Andres Freund and...@2ndquadrant.com wrote:
  I generally think the current logic for triggering VACUUMs via
  autovacuum doesn't really make all that much sense in the days where we
  have the visibility map.
 
 Right now, whether or not to autovacuum is the result of a two-pronged
 test.  The first prong is based on number of updates and deletes
 relative to table size; that triggers a regular autovacuum.  The
 second prong is based on age(relfrozenxid) and triggers a
 non-page-skipping vacuum (colloquially, an anti-wraparound vacuum).

And I have some hopes we can get rid of that in 9.4 (that alone would be
worth a bump to 10.0 ;)). I really like Heikki's patch, even if I am
envious that I didn't have the idea :P. Although it needs quite a bit of
work to be ready.

 The typical case in which this doesn't work out well is when the table
 has a lot of inserts but few or no updates and deletes.  So I propose
 that we change the first prong to count inserts as well as updates and
 deletes when deciding whether it needs to vacuum the table.  We
 already use that calculation to decide whether to auto-analyze, so it
 wouldn't be very novel.   We know that the work of marking pages
 all-visible will need to be done at some point, and doing it sooner
 will result in doing it in smaller batches, which seems generally
 good.

Yes, that's a desperately needed change.

The reason I suggested keeping track of the xids of unremovable tuples
is that the current logic doesn't handle that at all. We just
unconditionally set n_dead_tuples to zero after a vacuum even if not a
single row could actually be cleaned out. Which has the effect that we
will not start a vacuum until enough bloat (or after changing this, new
inserts) has collected to start vacuum anew. Which then will do twice
the work.

Resetting n_dead_tuples to the actual remaining dead tuples wouldn't do
much good either - we would just immediately trigger a new vacuum the
next time we check, even if the xmin horizon is still the same.

 However, I do have one concern: it might lead to excessive
 index-vacuuming.  Right now, we skip the index vac step only if
 ZERO dead tuples are found during the heap scan.  Even one dead tuple
 (or line pointer) will cause an index vac cycle, which may easily be
 excessive.  So I further propose that we introduce a threshold for
 index-vac, so that we only do an index vac cycle if the number of dead
 tuples exceeds, say 0.1% of the table size.

Yes, that's a pretty valid concern. But we can't really do it that
easily. a) We can only remove dead line pointers when we know there's no
index pointing to it anymore, which we only know after the index
entries have been removed. b) We cannot check the validity of an index pointer if
there's no heap tuple for it. Sure, we could check whether we're
pointing to a dead line pointer, but the random io costs of that are
prohibitive.
Now, we could just mark line pointers as dead and not mark that page as
all-visible and pick it up again on the next vacuum cycle. But that
would suck long-term.

I think the only real solution here is to store removed tuples' tids
(i.e. items we've marked as dead) somewhere. Whenever we've found
sufficient tuples to be removed from indexes, we do phase 2.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-18 Thread Jim Nasby

On 9/17/13 6:10 PM, Andres Freund wrote:

What if we maintained XID stats for ranges of pages in a separate
fork? Call it the XidStats fork. Presumably the interesting pieces
would be min(xmin) and max(xmax) for pages that aren't all visible. If
we did that at a granularity of, say, 1MB worth of pages[1] we're
talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case
alignment bumps that up to 2 XidStats pages per GB of heap.)



Yes, I have thought about similar ideas as well, but I came to the
conclusion that it's not worth it. If you want to make the boundaries
precise and the xidstats fork small, you're introducing new contention
points because every DML will need to make sure it's correct.


Actually, that's not true... the XidStats only need to be relatively precise. 
IE: within a few hundred or thousand XIDs.

So for example, you'd only need to attempt an update if the XID already stored 
was more than a few hundred/thousand/whatever XIDs away from your XID. If it's 
any closer don't even bother to update.

That still leaves potential for thundering herd on the fork buffer lock if 
you've got a ton of DML on one table across a bunch of backends, but there 
might be other ways around that. For example, if you know you can update the 
XID with a CPU-atomic instruction, you don't need to lock the page.
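The update-only-when-stale rule described above could look like the
following (a sketch; the slack value is arbitrary, the atomic store itself
is elided, and plain integer XIDs stand in for circular XID arithmetic):

```python
XIDSTATS_SLACK = 1000  # don't bother updating if within this many XIDs

def maybe_update_max_xmax(stored_max_xmax, deleting_xid):
    """Return the new max(xmax) to store for this heap range, updating
    only when the stored value is stale by more than the slack; most
    DML then skips the write (and any locking) entirely."""
    if deleting_xid > stored_max_xmax + XIDSTATS_SLACK:
        return deleting_xid
    return stored_max_xmax

assert maybe_update_max_xmax(10_000, 10_500) == 10_000  # close enough: skip
assert maybe_update_max_xmax(10_000, 12_000) == 12_000  # stale: rewrite
```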


Also, the amount of code that would require seems to be bigger than
justified by the increase of precision when to vacuum.


That's very possibly true. I haven't had a chance to see how much VM bits help 
reduce vacuum overhead yet, so I don't have anything to add on this front. 
Perhaps others might.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-17 Thread Jim Nasby

On 9/7/13 12:34 AM, Andres Freund wrote:

What I was thinking of was to keep track of the oldest xids on pages
that cannot be marked all visible. I haven't thought about the
statistics part much, but what if we binned the space between
[RecentGlobalXmin, -nextXid) into 10 bins and counted the number of
pages falling into each bin. Then after the vacuum finished we could
compute how far RecentGlobalXmin would have to progress to make another
vacuum worthwhile by counting the number of pages from the lowest bin
upwards and use the bin's upper limit as the triggering xid.

Now, we'd definitely need to amend that scheme by something that handles
pages that are newly written to, but it seems something like that
wouldn't be too hard to implement and would make autovacuum more useful.


If we're binning by XID though you're still dependent on scanning to build that 
range. Anything that creates dead tuples will also be problematic, because
it's going to unset VM bits on you, and you won't know if it's due to INSERTS 
or dead tuples.

What if we maintained XID stats for ranges of pages in a separate fork? Call it 
the XidStats fork. Presumably the interesting pieces would be min(xmin) and 
max(xmax) for pages that aren't all visible. If we did that at a granularity 
of, say, 1MB worth of pages[1] we're talking 8 bytes per MB, or 1 XidStats page 
per GB of heap. (Worst case alignment bumps that up to 2 XidStats pages per GB 
of heap.)

Having both min(xmin) and max(xmax) for a range of pages would allow for very 
granular operation of vacuum. Instead of hitting every heap page that's not 
all-visible, it would only hit those that are not visible and where min(xmin) 
or max(xmax) were less than RecentGlobalXmin.

One concern is maintaining this data. A key point is that we don't have to 
update it every time it changes; if the min/max are only off by a few hundred 
XIDs there's no point to updating the XidStats page. We'd obviously need the 
XidStats page to be read in, but even a 100GB heap would be either 100 or 200 
XidStats pages.

[1]: There's a trade-off between how much space we 'waste' on XidStats pages 
and how many heap pages we potentially have to scan in the range. We'd want to 
see what this looked like in a real system. The thing that helps here is that 
regardless of what the stats for a particular heap range are, you're not going 
to scan any pages in that range that are already all-visible.
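The size arithmetic in [1] can be checked directly (a sketch using the
granularity and entry size assumed above):

```python
BLOCK_SIZE = 8192           # bytes per page
RANGE_BYTES = 1024 * 1024   # one XidStats entry per 1MB of heap
ENTRY_BYTES = 8             # min(xmin) + max(xmax), 4 bytes each

def xidstats_pages(heap_bytes):
    """Number of XidStats pages needed to cover heap_bytes of heap."""
    entries = heap_bytes // RANGE_BYTES
    stats_bytes = entries * ENTRY_BYTES
    return -(-stats_bytes // BLOCK_SIZE)  # ceiling division

GB = 1024 ** 3
assert xidstats_pages(1 * GB) == 1      # 1 XidStats page per GB of heap
assert xidstats_pages(100 * GB) == 100  # a 100GB heap: 100 XidStats pages
```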
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-17 Thread Andres Freund
On 2013-09-17 11:37:35 -0500, Jim Nasby wrote:
 On 9/7/13 12:34 AM, Andres Freund wrote:
 What I was thinking of was to keep track of the oldest xids on pages
 that cannot be marked all visible. I haven't thought about the
 statistics part much, but what if we binned the space between
 [RecentGlobalXmin, -nextXid) into 10 bins and counted the number of
 pages falling into each bin. Then after the vacuum finished we could
 compute how far RecentGlobalXmin would have to progress to make another
 vacuum worthwhile by counting the number of pages from the lowest bin
 upwards and use the bin's upper limit as the triggering xid.
 
 Now, we'd definitely need to amend that scheme by something that handles
 pages that are newly written to, but it seems something like that
 wouldn't be too hard to implement and would make autovacuum more useful.
 
 If we're binning by XID though you're still dependent on scanning to
 build that range. Anything that creates dead tuples will also be
 problematic, because it's going to unset VM bits on you, and you won't
 know if it's due to INSERTS or dead tuples.

I don't think that's all that much of a problem. In the end, it's a good
idea to look at pages shortly after they have been filled/been
touched. Setting hint bits at that point avoid repetitive IO and in many
cases we will already be able to mark them all-visible.
The binning idea was really about sensibly estimating whether a new scan
already makes sense, which is currently very hard to judge.
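That binning estimate might be sketched as follows (assumptions: 10 bins,
linear XIDs instead of circular ones, and a caller-chosen page count that
makes a new scan worthwhile):

```python
NBINS = 10

def bin_pages(blocking_xids, recent_global_xmin, next_xid):
    """Bucket not-all-visible pages by the oldest XID blocking them."""
    span = (next_xid - recent_global_xmin) / NBINS
    bins = [0] * NBINS
    for xid in blocking_xids:
        i = min(int((xid - recent_global_xmin) / span), NBINS - 1)
        bins[i] += 1
    return bins

def triggering_xid(bins, recent_global_xmin, next_xid, min_pages):
    """How far must the horizon advance before min_pages pages become
    cleanable?  Returns the upper limit of the deciding bin."""
    span = (next_xid - recent_global_xmin) / NBINS
    total = 0
    for i, count in enumerate(bins):
        total += count
        if total >= min_pages:
            return recent_global_xmin + int((i + 1) * span)
    return None  # not enough blocked pages to justify another scan

bins = bin_pages([110, 120, 350, 990], 100, 1100)
print(bins)                                # [2, 0, 1, 0, 0, 0, 0, 0, 1, 0]
print(triggering_xid(bins, 100, 1100, 3))  # 400
```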

I generally think the current logic for triggering VACUUMs via
autovacuum doesn't really make all that much sense in the days where we
have the visibility map.

 What if we maintained XID stats for ranges of pages in a separate
 fork? Call it the XidStats fork. Presumably the interesting pieces
 would be min(xmin) and max(xmax) for pages that aren't all visible. If
 we did that at a granularity of, say, 1MB worth of pages[1] we're
 talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case
 alignment bumps that up to 2 XidStats pages per GB of heap.)

Yes, I have thought about similar ideas as well, but I came to the
conclusion that it's not worth it. If you want to make the boundaries
precise and the xidstats fork small, you're introducing new contention
points because every DML will need to make sure it's correct.
Also, the amount of code that would require seems to be bigger than
justified by the increase of precision when to vacuum.


Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-09 Thread Jeff Janes
On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 I thought it was well known, but maybe I was overly optimistic.  I've
 considered IOS to be mostly useful for data mining work on read-mostly
 tables, which you would probably vacuum manually after a bulk load.

 For transactional tables, I think that trying to keep the vm set-bit
 density high enough would be a losing battle.  If we redefined the
 nature of the vm so that doing a HOT update would not clear the
 visibility bit, perhaps that would change the outcome of this battle.

 Wouldn't it make vacuum a bit inefficient, in the sense that it will
 skip some of the pages in which there are only
 HOT updates when cleaning dead rows?

Maybe.  But anyone is competent to clean up dead rows from HOT
updates; it is not exclusively vacuum that can do it, like it is for
non-HOT tuples.  So I think any inefficiency would be very small.

Cheers,

Jeff




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-09 Thread Bruce Momjian
On Sun, Sep  8, 2013 at 12:47:35AM +0200, Andres Freund wrote:
 Hi,
 
 On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote:
  That seems very complicated.  I think it would be enough to record the
  current xid at the time of the vacuum, and when testing for later
  vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
  there have been no inserts/updates/deletes, we know that all of
  the pages can now be marked as allvisible.
 
 But that would constantly trigger vacuums, or am I missing something? Or
 what are you suggesting this xid to be used for?

OK, let me give some specifics.  Let's suppose we run a vacuum, and at
the time the current xid counter is 200.  If autovacuum later checks
whether it should vacuum, there have been no dead rows generated (no
update/delete/abort), and the current RecentGlobalXmin is 200, then we
know that all the transactions that prevented all-visible marking the
last time we ran vacuum have completed.  That leaves us with just
inserts that could prevent all-visible.

If there have been no inserts, we can assume that we can vacuum just the
non-all-visible pages, and even if there are only 10, it just means we
have to read 10 8k blocks, not the entire table, because the all-visible
bit is set for all the rest of the pages.

Now, if there have been inserts, there are a few cases.  If the inserts
happened in pages that were previously marked all-visible, then we now
have pages that have lost all-visible status, and we probably don't want
to vacuum those.  Of course, we will not have recorded which pages
changed, but any decrease in the table's all-visible page count should
probably keep us from vacuuming just to set the visibility map.  We
should then update our stored vm-bit-set count and current xid value so
we can check again later to see if things have stabilized.

If the vm-set bit count is the same as the last time autovacuum checked
the table, then the inserts happened either in the vm-bit cleared pages,
or in new data pages.  If the table size is the same, the inserts
happened in existing pages, so we probably don't want to vacuum.  If the
table size has increased, some inserts went into new pages, so we might
want to vacuum, but I am unclear how many new pages should force a
vacuum.
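To make that heuristic concrete, here is a rough Python sketch of the decision rule described above. All names (TableStats, should_vm_vacuum) are hypothetical, the xid arithmetic ignores wraparound, and this is a model of the idea, not PostgreSQL code:

```python
from dataclasses import dataclass

@dataclass
class TableStats:
    last_vacuum_xid: int      # xid counter recorded at the previous vacuum
    recent_global_xmin: int   # oldest xid any snapshot can still see
    dead_tuples: int          # updates/deletes/aborts since the last vacuum
    inserts: int              # inserts since the last vacuum
    vm_set_pages: int         # pages currently marked all-visible
    prev_vm_set_pages: int    # all-visible page count at the previous check
    table_pages: int          # current relation size in pages
    prev_table_pages: int     # relation size at the previous check

def should_vm_vacuum(s: TableStats) -> bool:
    """Decide whether a vacuum run just to set VM bits is likely worthwhile."""
    if s.recent_global_xmin < s.last_vacuum_xid:
        # Transactions that blocked all-visible marking last time are
        # still running; rescanning now would accomplish little.
        return False
    if s.dead_tuples > 0:
        return False  # the normal dead-tuple threshold path handles this
    if s.inserts == 0:
        # Only the not-all-visible pages need to be read, however few.
        return True
    if s.vm_set_pages < s.prev_vm_set_pages:
        # Inserts landed on previously all-visible pages; don't churn.
        return False
    # Same vm-set count: inserts went into cleared or brand-new pages.
    # Trigger only when the table actually grew (inserts on new pages).
    return s.table_pages > s.prev_table_pages
```

As the mail notes, the open question is the last line: how much table growth should be required before inserts alone justify a vacuum.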

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-09 Thread Amit Kapila
On Mon, Sep 9, 2013 at 9:33 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Sun, Sep 8, 2013 at 8:49 PM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 I thought it was well known, but maybe I was overly optimistic.  I've
 considered IOS to be mostly useful for data mining work on read-mostly
 tables, which you would probably vacuum manually after a bulk load.

 For transactional tables, I think that trying to keep the vm set-bit
 density high enough would be a losing battle.  If we redefined the
 nature of the vm so that doing a HOT update would not clear the
 visibility bit, perhaps that would change the outcome of this battle.

 Wouldn't it make the vacuum a bit inefficient, in the sense that it will
 skip some of the pages in which there are only
 HOT updates when cleaning dead rows?

 Maybe.  But any backend is able to clean up dead rows from HOT
 updates; it is not exclusively vacuum that can do it, as it is for
 non-HOT tuples.

Yes, that is right, but what about freezing of tuples?  Delaying that
might not be good either.  It also might not be good for every scenario
if foreground operations always take care of cleaning up dead rows,
leaving very little chance for vacuum (only when it has to scan all
pages, i.e. anti-wraparound vacuum) to clean up dead rows.

If we are sure that vacuum skipping those pages has no significant
impact in a database with few non-HOT updates and deletes (i.e. mostly
inserts and HOT updates), then it could be quite useful for IOS.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-08 Thread Jeff Janes
On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
 On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
  Actually, I now realize it is more complex than that, and worse.  There
  are several questions to study to understand when pg_class.relallvisible
  is updated (which is used to determine if index-only scans are a good
  optimization choice), and when VM all-visible bits are set so heap pages
  can be skipped during index-only scans:
 
  1)  When are VM bits set:
  vacuum (non-full)
  analyze (only some random pages)

 Analyze doesn't set visibility-map bits.  It only updates statistics
 about how many are set.

 Sorry, yes you are correct.

  The calculus we should use to determine when we need to run vacuum has
  changed with index-only scans, and I am not sure we ever fully addressed
  this.

 Yeah, we didn't.  I think the hard part is figuring out what behavior
 would be best.  Counting inserts as well as updates and deletes would
 be a simple approach, but I don't have much confidence in it.  My
 experience is that having vacuum or analyze kick in during a bulk-load
 operation is a disaster.  We'd kinda like to come up with a way to
 make vacuum run after the bulk load is complete, maybe, but how would
 we identify that time, and there are probably cases where that's not
 right either.

 I am unsure how we have gone a year with index-only scans and I am just
 now learning that it only works well with update/delete workloads or by
 running vacuum manually.  I only found this out going back over January
 emails.  Did other people know this?  Was it not considered a serious
 problem?

I thought it was well known, but maybe I was overly optimistic.  I've
considered IOS to be mostly useful for data mining work on read-mostly
tables, which you would probably vacuum manually after a bulk load.

For transactional tables, I think that trying to keep the vm set-bit
density high enough would be a losing battle.  If we redefined the
nature of the vm so that doing a HOT update would not clear the
visibility bit, perhaps that would change the outcome of this battle.


Cheers,

Jeff




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-08 Thread Amit Kapila
On Mon, Sep 9, 2013 at 2:35 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Thu, Sep 5, 2013 at 7:00 PM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
 On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
  Actually, I now realize it is more complex than that, and worse.  There
  are several questions to study to understand when pg_class.relallvisible
  is updated (which is used to determine if index-only scans are a good
  optimization choice), and when VM all-visible bits are set so heap pages
  can be skipped during index-only scans:
 
  1)  When are VM bits set:
  vacuum (non-full)
  analyze (only some random pages)

 Analyze doesn't set visibility-map bits.  It only updates statistics
 about how many are set.

 Sorry, yes you are correct.

  The calculus we should use to determine when we need to run vacuum has
  changed with index-only scans, and I am not sure we ever fully addressed
  this.

 Yeah, we didn't.  I think the hard part is figuring out what behavior
 would be best.  Counting inserts as well as updates and deletes would
 be a simple approach, but I don't have much confidence in it.  My
 experience is that having vacuum or analyze kick in during a bulk-load
 operation is a disaster.  We'd kinda like to come up with a way to
 make vacuum run after the bulk load is complete, maybe, but how would
 we identify that time, and there are probably cases where that's not
 right either.

 I am unsure how we have gone a year with index-only scans and I am just
 now learning that it only works well with update/delete workloads or by
 running vacuum manually.  I only found this out going back over January
 emails.  Did other people know this?  Was it not considered a serious
 problem?

 I thought it was well known, but maybe I was overly optimistic.  I've
 considered IOS to be mostly useful for data mining work on read-mostly
 tables, which you would probably vacuum manually after a bulk load.

 For transactional tables, I think that trying to keep the vm set-bit
 density high enough would be a losing battle.  If we redefined the
 nature of the vm so that doing a HOT update would not clear the
 visibility bit, perhaps that would change the outcome of this battle.

Wouldn't it make the vacuum a bit inefficient, in the sense that it will
skip some of the pages in which there are only
HOT updates when cleaning dead rows?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-07 Thread Bruce Momjian
On Sat, Sep  7, 2013 at 07:34:49AM +0200, Andres Freund wrote:
  The idea of using RecentGlobalXmin to see how much _work_ has happened
  since the last vacuum is interesting, but it doesn't handle read-only
  transactions;  I am not sure how they can be tracked.  You make a good
  point that 5 minutes passing is meaningless --- you really want to know
  how many transactions have completed.
 
 So, what I was pondering went slightly into a different direction:
 
 (lets ignore anti wraparound vacuum for now)
 
 Currently we trigger autovacuums by the assumed number of dead
 tuples. In the course of its action it usually will find that it cannot
 remove all dead rows and that it cannot mark everything as all
 visible. That's because the xmin horizon hasn't advanced far enough. We
 won't trigger another vacuum after that unless there are further dead
 tuples in the relation...
 One trick to overcome that problem (and the fact that we do not handle
 setting all-visible nicely for INSERT-only workloads) would be to trigger
 vacuum by the number of pages that are not marked all-visible in the vm.
 
 The problem there is that repeatedly scanning a relation that's only 50%
 all-visible, where the rest cannot be marked all-visible because of a
 long-running pg_dump, obviously isn't a good idea. So we need something to
 notify us when there's work to be done. Using elapsed time seems like a
 bad idea because it doesn't adapt to changing workloads very well and
 doesn't work nicely for different relations.
 
 What I was thinking of was to keep track of the oldest xids on pages
 that cannot be marked all visible. I haven't thought about the
 statistics part much, but what if we binned the space between
 [RecentGlobalXmin, nextXid) into 10 bins and counted the number of
 pages falling into each bin. Then after the vacuum finished we could
 compute how far RecentGlobalXmin would have to progress to make another
 vacuum worthwhile by counting the number of pages from the lowest bin
 upwards and use the bin's upper limit as the triggering xid.
 
 Now, we'd definitely need to amend that scheme by something that handles
 pages that are newly written to, but it seems something like that
 wouldn't be too hard to implement and would make autovacuum more useful.

That seems very complicated.  I think it would be enough to record the
current xid at the time of the vacuum, and when testing for later
vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
there have been no inserts/updates/deletes, we know that all of
the pages can now be marked as allvisible.

What this doesn't handle is the insert case.  What we could do there is
to record the total free space map space, and if the FSM has not changed
since the last vacuum, we can vacuum even if inserts happened in that
period, because we assume the inserts are on new pages.  One problem
there is that the FSM is only updated if an insert will not fit on the
page.  We could record the table size and make sure the table size has
increased before we allow inserts to trigger a vm-set vacuum.

None of this is perfect, but it is better than what we have, and it
would eventually get the VM bits set.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-07 Thread Andres Freund
Hi,

On 2013-09-07 12:50:59 -0400, Bruce Momjian wrote:
 That seems very complicated.  I think it would be enough to record the
 current xid at the time of the vacuum, and when testing for later
 vacuums, if that saved xid is earlier than the RecentGlobalXmin, and
 there have been no inserts/updates/deletes, we know that all of
 the pages can now be marked as allvisible.

But that would constantly trigger vacuums, or am I missing something? Or
what are you suggesting this xid to be used for?

What I was talking about was how to evaluate the benefit of triggering
a VACUUM even if there's not a significant amount of new dead rows. If
we know that for a certain xmin horizon there are N pages that potentially
can be cleaned and marked all-visible, we have a chance of making
sensible decisions.
We could just use one bin (i.e. use one cutoff xid as you propose) and
count the number of pages that would be affected. But that would mean
we'd only trigger vacuums very irregularly if you have a workload with
several longrunning transactions. When the oldest of a set of
longrunning transactions finishes you possibly can already clean up a
good bit reducing the chance of further bloat. Otherwise you have to
wait for all of them to finish.
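The binning scheme described in this subthread could be sketched roughly like this, under simplifying assumptions (plain linear xids with no wraparound handling, hypothetical function names, and page statistics gathered externally):

```python
def bin_pages(page_oldest_xids, recent_global_xmin, next_xid, nbins=10):
    """Count not-all-visible pages per xid bin in [recent_global_xmin, next_xid)."""
    span = next_xid - recent_global_xmin
    counts = [0] * nbins
    for xid in page_oldest_xids:
        i = (xid - recent_global_xmin) * nbins // span
        counts[max(0, min(nbins - 1, i))] += 1  # clamp strays into range
    return counts

def trigger_xid(counts, recent_global_xmin, next_xid, min_pages):
    """Smallest xid horizon by which at least min_pages pages become cleanable."""
    span = next_xid - recent_global_xmin
    nbins = len(counts)
    total = 0
    for i, c in enumerate(counts):
        total += c
        if total >= min_pages:
            # upper limit of bin i: once RecentGlobalXmin passes this xid,
            # another vacuum is worthwhile
            return recent_global_xmin + span * (i + 1) // nbins
    return None  # too few pages to ever justify a vacuum on this basis
```

With several long-running transactions, counting upward from the lowest bin yields an earlier trigger xid than a single cutoff would, which is the point made above.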

 What this doesn't handle is the insert case.  What we could do there is
 to record the total free space map space, and if the FSM has not changed
 since the last vacuum, we can vacuum even if inserts happened in that
 period because we assume the inserts are on new pages.  One problem
 there is that the FSM is only updated if an insert will not fit on the
 page.  We could record the table size and make sure the table size has
 increased before we allow inserts to trigger a vm-set vacuum.

Not sure why that's better than just counting the number of pages that
have unset vm bits?
Note that you cannot rely on the FSM data to be correct all the time; we
can only use such tricks to trigger vacuums, not for the actual operation
of the vacuum.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Dimitri Fontaine
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I'm not sure if we need to expose all these new maintenance actions as
 SQL commands.

I strongly think we should, if only for diagnostic purposes. Also to
adapt to some well defined workloads that the automatic system is not
designed to handle.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Hannu Krosing
On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I'm not sure if we need to expose all these new maintenance actions as
 SQL commands.
 I strongly think we should, if only for diagnostic purposes. 
It would be much easier and more flexible to expose them
as pg_*() function calls, not proper commands.
 Also to
 adapt to some well defined workloads that the automatic system is not
 designed to handle.
+1

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ





Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
 I think it's shortsighted to keep thinking of autovacuum as just a way
 to run VACUUM and ANALYZE.  We have already discussed work items that
 need to be done separately, such as truncating the last few empty pages
 on a relation that was vacuumed recently.  We also need to process a GIN
 index' pending insertion list; and with minmax indexes I will want to
 run summarization of heap page ranges.

Agreed.

 So maybe instead of trying to think of VM bit setting as part of vacuum,
 we could just keep stats about how many pages we might need to scan
 because of possibly needing to set the bit, and then doing that in
 autovacuum, independently from actually vacuuming the relation.

I am not sure I understand this though. What would be the point to go
and set all visible and not do the rest of the vacuuming work?

I think triggering vacuuming by scanning the visibility map for the
number of unset bits and using that as another trigger is a good idea. The
vm should ensure we're not doing superfluous work.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
 On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
  Alvaro Herrera alvhe...@2ndquadrant.com writes:
  I'm not sure if we need to expose all these new maintenance actions as
  SQL commands.
  I strongly think we should, if only for diagnostic purposes. 
 It would be much easier and more flexible to expose them
 as pg_*() function calls, not proper commands.

I don't think that's as easy as you might imagine. For much of what's
done in that context you cannot be in a transaction, you even need to be
in a toplevel statement (since we internally
CommitTransactionCommand/StartTransactionCommand).

So those pg_* commands couldn't be called (except possibly via the
fastpath function call API ...) which might restrict their usefulness a
teensy bit ;)

So, I think extending the options passed to VACUUM - since it can take
pretty generic options these days - is a more realistic path.

  Also to
  adapt to some well defined workloads that the automatic system is not
  designed to handle.
 +1

What would you like to expose individually?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Hannu Krosing
On 09/06/2013 03:12 PM, Andres Freund wrote:
 On 2013-09-06 13:38:56 +0200, Hannu Krosing wrote:
 On 09/06/2013 09:23 AM, Dimitri Fontaine wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
 I'm not sure if we need to expose all these new maintenance actions as
 SQL commands.
 I strongly think we should, if only for diagnostic purposes. 
 It would be much easier and more flexible to expose them
 as pg_*() function calls, not proper commands.
 I don't think that's as easy as you might imagine. For much of what's
 done in that context you cannot be in a transaction, you even need to be
 in a toplevel statement (since we internally
 CommitTransactionCommand/StartTransactionCommand).

 So those pg_* commands couldn't be called (except possibly via the
 fastpath function call API ...) which might restrict their usefulness a
 teensy bit ;)

 So, I think extending the options passed to VACUUM - since it can take
 pretty generic options these days - is a more realistic path.
Might be something convoluted like 

VACUUM indexname WITH (function = pg_cleanup_gin($1));

:)

 Also to
 adapt to some well defined workloads that the automatic system is not
 designed to handle.
 +1
 What would you like to expose individually?

 Greetings,

 Andres Freund






Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
  I am not sure I understand this though. What would be the point to go
  and set all visible and not do the rest of the vacuuming work?
 
  I think triggering vacuuming by scanning the visibility map for the
  number of unset bits and use that as another trigger is a good idea. The
  vm should ensure we're not doing superfluous work.

 Yes, I think it might be hard to justify a separate VM-set-only scan of
 the table.  If you are already reading the table, and already checking
 to see if you can set the VM bit, I am not sure why you would not also
 remove old rows, especially since removing those rows might be necessary
 to allow setting VM bits.

Yep. Although adding the table back into the fsm will lead to it being
used for new writes again...

 Another problem I thought of is that while automatic vacuuming only
 happens with high update/delete load, index-only scans are best on
 mostly non-write tables, so we have bad behavior where the ideal case
 (static data) doesn't get vm-bits set, while update/delete has the
 vm-bits set, but then cleared as more update/deletes occur.

Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
at some point they will get vacuumed and the vm bits will get set.

 The more I look at this the worse it appears.   How has this gone
 unaddressed for over a year?

It's been discussed several times including during the introduction of
the feature. I am a bit surprised about the panicky tone in this
thread.
Yes, we need to overhaul the way vacuum works (to reduce the frequency
of rewriting stuff repeatedly) and the way it's triggered (prioritization,
more trigger conditions) but all these are known things and just need
somebody with time.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Bruce Momjian
On Fri, Sep  6, 2013 at 03:08:54PM +0200, Andres Freund wrote:
 On 2013-09-06 01:22:36 -0400, Alvaro Herrera wrote:
  I think it's shortsighted to keep thinking of autovacuum as just a way
  to run VACUUM and ANALYZE.  We have already discussed work items that
  need to be done separately, such as truncating the last few empty pages
  on a relation that was vacuumed recently.  We also need to process a GIN
  index' pending insertion list; and with minmax indexes I will want to
  run summarization of heap page ranges.
 
 Agreed.
 
  So maybe instead of trying to think of VM bit setting as part of vacuum,
  we could just keep stats about how many pages we might need to scan
  because of possibly needing to set the bit, and then doing that in
  autovacuum, independently from actually vacuuming the relation.
 
 I am not sure I understand this though. What would be the point to go
 and set all visible and not do the rest of the vacuuming work?
 
 I think triggering vacuuming by scanning the visibility map for the
 number of unset bits and use that as another trigger is a good idea. The
 vm should ensure we're not doing superfluous work.

Yes, I think it might be hard to justify a separate VM-set-only scan of
the table.  If you are already reading the table, and already checking
to see if you can set the VM bit, I am not sure why you would not also
remove old rows, especially since removing those rows might be necessary
to allow setting VM bits.

Another problem I thought of is that while automatic vacuuming only
happens with high update/delete load, index-only scans are best on
mostly non-write tables, so we have bad behavior where the ideal case
(static data) doesn't get vm-bits set, while update/delete has the
vm-bits set, but then cleared as more update/deletes occur.

The more I look at this the worse it appears.   How has this gone
unaddressed for over a year?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Jim Nasby

On 9/6/13 2:13 PM, Bruce Momjian wrote:

On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:

This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
manual VACUUM was required for index-only scans.  That thread ended with
us realizing that pg_upgrade's ANALYZE runs will populate
pg_class.relallvisible.

What I didn't see in that thread is an analysis of what cases are going
to require manual vacuum, and I have seen no work in 9.3 to improve
that.  I don't even see it on the TODO list.


OK, let's start the discussion then.  I have added a TODO item:

Improve setting of visibility map bits for read-only and insert-only 
workloads

So, what should trigger an auto-vacuum vacuum for these workloads?
Rather than activity, which is what normally drives autovacuum, it is
lack of activity that should drive it, combined with a high VM cleared
bit percentage.

It seems we can use these statistics values:

 n_tup_ins   | bigint
 n_tup_upd   | bigint
 n_tup_del   | bigint
 n_tup_hot_upd   | bigint
 n_live_tup  | bigint
 n_dead_tup  | bigint
 n_mod_since_analyze | bigint
 last_vacuum | timestamp with time zone
 last_autovacuum | timestamp with time zone

Particularly, last_vacuum and last_autovacuum can tell us the last time
of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the VM set
bit count is low, it might need vacuuming, though inserts into existing
pages would complicate that.


Something else that might be useful to look at is whether there are any FSM 
entries.  A truly insert-only table shouldn't have any FSM entries.

That said, there's definitely another case to think about... tables that see 
update activity on newly inserted rows but not on older rows. A work queue that 
is not pruned would be an example of that:

INSERT new work item
UPDATE work item SET status = 'In process';
UPDATE work item SET completion = '50%';
UPDATE work item SET status = 'Complete', completion = '100%';

In this case I would expect most of the pages in the table (except the very 
end) to be all visible.
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Jim Nasby

On 9/5/13 8:29 PM, Gavin Flower wrote:

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction syntax?) 
that would:

 1. only be valid in a transaction
 2. initiate a vacuum after the current transaction completed
 3. defer any vacuum triggered due to other criteria

If the transaction was rolled back: then if there was a pending vacuum, due to 
other reasons, it would then be actioned.

On normal transaction completion, then if there was a pending vacuum it would 
be combined with the one in the transaction.

Still would need some method of ensuring any pending vacuum was done if the 
transaction hung, or took too long.


I *really* like the idea of BEGIN VACUUM AFTER, but I suspect it would be of 
very limited usefulness if it didn't account for currently running transactions.

I'm thinking we add a vacuum_after_xid field somewhere (pg_class), and instead 
of attempting to vacuum inside the backend at commit time the transaction would 
set that field to its XID unless the field already had a newer XID in it.

autovac would then add all tables where vacuum_after_xid < the oldest running 
transaction to its priority list.
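A toy in-memory model of that bookkeeping; the field name vacuum_after_xid comes from the paragraph above, while the dict and function names here are hypothetical stand-ins for catalog and autovacuum machinery:

```python
vacuum_after_xid = {}   # table name -> xid recorded at commit time

def note_vacuum_after(table, commit_xid):
    # Keep only the newest request; an older pending xid is superseded.
    if vacuum_after_xid.get(table, 0) < commit_xid:
        vacuum_after_xid[table] = commit_xid

def autovacuum_priority_list(oldest_running_xid):
    # Tables whose requesting transaction is older than every running
    # transaction: vacuuming them now can actually set VM bits.
    return sorted(t for t, xid in vacuum_after_xid.items()
                  if xid < oldest_running_xid)
```

The key property is the comparison against the oldest running transaction: a table only becomes a priority once no open snapshot can still see the requesting transaction's work.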
--
Jim C. Nasby, Data Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 15:13:30 -0400, Bruce Momjian wrote:
 On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
  This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
  manual VACUUM was required for index-only scans.  That thread ended with
  us realizing that pg_upgrade's ANALYZE runs will populate
  pg_class.relallvisible.
  
  What I didn't see in that thread is an analysis of what cases are going
  to require manual vacuum, and I have seen no work in 9.3 to improve
  that.  I don't even see it on the TODO list.
 
 OK, let's start the discussion then.  I have added a TODO item:
 
   Improve setting of visibility map bits for read-only and insert-only 
 workloads
 
 So, what should trigger an auto-vacuum vacuum for these workloads? 
 Rather than activity, which is what normally drives autovacuum, it is
 lack of activity that should drive it, combined with a high VM cleared
 bit percentage.
 
 It seems we can use these statistics values:
 
n_tup_ins   | bigint   
n_tup_upd   | bigint   
n_tup_del   | bigint   
n_tup_hot_upd   | bigint   
n_live_tup  | bigint   
n_dead_tup  | bigint   
n_mod_since_analyze | bigint   
last_vacuum | timestamp with time zone 
last_autovacuum | timestamp with time zone 
 
 Particularly, last_vacuum and last_autovacuum can tell us the last time
 of vacuum.  If the n_tup_upd/n_tup_del counts are low, and the VM set
 bit count is low, it might need vacuuming, though inserts into existing
 pages would complicate that.

I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
vm bits. Perhaps combined with keeping track of RecentGlobalXmin to
make sure we're not repeatedly checking for work that cannot yet be
done.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 13:01:59 -0400, Bruce Momjian wrote:
 On Fri, Sep  6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
  On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
I am not sure I understand this though. What would be the point to go
and set all visible and not do the rest of the vacuuming work?
   
I think triggering vacuuming by scanning the visibility map for the
   number of unset bits and using that as another trigger is a good idea. The
   vm should ensure we're not doing superfluous work.
  
   Yes, I think it might be hard to justify a separate VM-set-only scan of
   the table.  If you are already reading the table, and already checking
   to see if you can set the VM bit, I am not sure why you would not also
   remove old rows, especially since removing those rows might be necessary
   to allow setting VM bits.
  
  Yep. Although adding the table back into the fsm will lead to it being
  used for new writes again...
 
 You mean adding _pages_ back into the table's FSM?  Yes, that is going
 to cause those pages to get dirty, but it is better than expanding the
 table size.  I don't see why you would not update the FSM.

You're right, we should add them, I wasn't really questioning that. I
was, quietly so you couldn't hear it, wondering whether we should
prioritize the target buffer selection differently.

   Another problem I thought of is that while automatic vacuuming only
   happens with high update/delete load, index-only scans are best on
   mostly non-write tables, so we have bad behavior where the ideal case
   (static data) doesn't get vm-bits set, while update/delete has the
   vm-bits set, but then cleared as more update/deletes occur.
  
  Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
  at some point they will get vacuumed and the vm bits will get set.
 
 Hmm, good point.  That would help with an insert-only workload, as long
 as you can chew through 200M transactions.  That doesn't help with a
 read-only workload, as we don't consume transaction IDs for SELECTs.

It's actually 150 million.  For the other workloads, it's pretty common
wisdom to VACUUM after bulk data loading. I think we even document that.

   The more I look at this the worse it appears.   How has this gone
   unaddressed for over a year?
  
  It's been discussed several times including during the introduction of
  the feature. I am a bit surprised about the panicky tone in this
  thread.
 
 This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
 manual VACUUM was required for index-only scans.  That thread ended with
 us realizing that pg_upgrade's ANALYZE runs will populate
 pg_class.relallvisible.

 What I didn't see in that thread is an analysis of what cases are going
 to require manual vacuum, and I have seen no work in 9.3 to improve
 that.  I don't even see it on the TODO list.

Yes, the TODO entry may be missing.

 It bothers me that we spent time developing index-only scans, but have
 significant workloads where it doesn't work, no efforts on improving it,
 and no documentation on manual workarounds.  I have not even seen
 discussion on how we are going to improve this.  I would like to have
 that discussion now.

It's not like the feature is useless in this case. You just need to
perform an extra operation to activate it. I am not saying we shouldn't
document it better, but it seriously worries me that a useful feature is
depicted as useless because it requires a manual VACUUM in some
circumstances.

  Yes, we need to overhaul the way vacuum works (to reduce the frequency
  of rewriting stuff repeatedly) and the way it's triggered (prioritization,
  more trigger conditions) but all these are known things and just need
  somebody with time.

 Based on the work needed to improve this, I would have thought someone
 would have taken this on during 9.3 development.

There has been some discussion about it indirectly via the freezing
stuff. That also would require more advanced scheduling.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Bruce Momjian
On Fri, Sep  6, 2013 at 01:01:59PM -0400, Bruce Momjian wrote:
 This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
 manual VACUUM was required for index-only scans.  That thread ended with
 us realizing that pg_upgrade's ANALYZE runs will populate
 pg_class.relallvisible.
 
 What I didn't see in that thread is an analysis of what cases are going
 to require manual vacuum, and I have seen no work in 9.3 to improve
 that.  I don't even see it on the TODO list.

OK, let's start the discussion then.  I have added a TODO item:

Improve setting of visibility map bits for read-only and insert-only 
workloads

So, what should trigger an auto-vacuum vacuum for these workloads? 
Rather than activity, which is what normally drives autovacuum, it is
lack of activity that should drive it, combined with a high VM cleared
bit percentage.

It seems we can use these statistics values:

 n_tup_ins   | bigint   
 n_tup_upd   | bigint   
 n_tup_del   | bigint   
 n_tup_hot_upd   | bigint   
 n_live_tup  | bigint   
 n_dead_tup  | bigint   
 n_mod_since_analyze | bigint   
 last_vacuum | timestamp with time zone 
 last_autovacuum | timestamp with time zone 

Particularly, last_vacuum and last_autovacuum can tell us the last time
the table was vacuumed.  If the n_tup_upd/n_tup_del counts are low and
the VM set-bit count is low, the table might need vacuuming, though
inserts into existing pages would complicate that.
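As a straw-man, the trigger condition might be expressed like this (purely
illustrative C; the struct and the thresholds are invented, not PostgreSQL
code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Straw-man trigger for vacuuming quiet tables just to set VM bits.
 * Illustrative only: the struct and thresholds are invented.
 */
typedef struct TableStats
{
    int64_t     n_tup_upd;          /* updates since last vacuum */
    int64_t     n_tup_del;          /* deletes since last vacuum */
    int64_t     rel_pages;          /* heap pages in the table */
    int64_t     allvisible_pages;   /* pages with the VM bit set */
} TableStats;

static bool
needs_vm_vacuum(const TableStats *s)
{
    double      churn;
    double      vm_frac;

    if (s->rel_pages == 0)
        return false;

    /* little write activity suggests pages may have settled */
    churn = (double) (s->n_tup_upd + s->n_tup_del) / (double) s->rel_pages;
    vm_frac = (double) s->allvisible_pages / (double) s->rel_pages;

    /* quiet table whose VM is mostly unset: worth a VM-setting vacuum */
    return churn < 0.01 && vm_frac < 0.5;
}
```

Real logic would of course also consult last_vacuum/last_autovacuum and
avoid firing in the middle of a bulk load.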

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Bruce Momjian
On Sat, Sep  7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
  So, what should trigger an auto-vacuum vacuum for these workloads? 
  Rather than activity, which is what normally drives autovacuum, it is
  lack of activity that should drive it, combined with a high VM cleared
  bit percentage.
  
  It seems we can use these statistics values:
  
   n_tup_ins   | bigint   
   n_tup_upd   | bigint   
   n_tup_del   | bigint   
   n_tup_hot_upd   | bigint   
   n_live_tup  | bigint   
   n_dead_tup  | bigint   
   n_mod_since_analyze | bigint   
   last_vacuum | timestamp with time zone 
   last_autovacuum | timestamp with time zone 
  
  Particularly, last_vacuum and last_autovacuum can tell us the last time
  the table was vacuumed.  If the n_tup_upd/n_tup_del counts are low and
  the VM set-bit count is low, the table might need vacuuming, though
  inserts into existing pages would complicate that.
 
 I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
 fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to

Fsm bits?  FSM tracks the free space on each page.  How does that help?

 make sure we're not repeatedly checking for work that cannot yet be
 done.

The idea of using RecentGlobalXmin to see how much _work_ has happened
since the last vacuum is interesting, but it doesn't handle read-only
transactions;  I am not sure how they can be tracked.  You make a good
point that 5 minutes passing is meaningless --- you really want to know
how many transactions have completed.  Unfortunately, our virtual
transactions make that hard to compute.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Bruce Momjian
On Fri, Sep  6, 2013 at 06:36:47PM +0200, Andres Freund wrote:
 On 2013-09-06 12:30:56 -0400, Bruce Momjian wrote:
   I am not sure I understand this though. What would be the point to go
   and set all visible and not do the rest of the vacuuming work?
  
   I think triggering vacuuming by scanning the visibility map for the
   number of unset bits and using that as another trigger is a good idea. The
   vm should ensure we're not doing superfluous work.
 
  Yes, I think it might be hard to justify a separate VM-set-only scan of
  the table.  If you are already reading the table, and already checking
  to see if you can set the VM bit, I am not sure why you would not also
  remove old rows, especially since removing those rows might be necessary
  to allow setting VM bits.
 
 Yep. Although adding the table back into the fsm will lead to it being
 used for new writes again...

You mean adding _pages_ back into the table's FSM?  Yes, that is going
to cause those pages to get dirty, but it is better than expanding the
table size.  I don't see why you would not update the FSM.

  Another problem I thought of is that while automatic vacuuming only
  happens with high update/delete load, index-only scans are best on
  mostly non-write tables, so we have bad behavior where the ideal case
  (static data) doesn't get vm-bits set, while update/delete has the
  vm-bits set, but then cleared as more update/deletes occur.
 
 Well, older tables will get vacuumed due to vacuum_freeze_table_age. So
 at some point they will get vacuumed and the vm bits will get set.

Hmm, good point.  That would help with an insert-only workload, as long
as you can chew through 200M transactions.  That doesn't help with a
read-only workload, as we don't consume transaction IDs for SELECTs.

  The more I look at this the worse it appears.   How has this gone
  unaddressed for over a year?
 
 It's been discussed several times including during the introduction of
 the feature. I am a bit surprised about the panicky tone in this
 thread.

This December 2012 thread by Andrew Dunstan shows he wasn't aware that a
manual VACUUM was required for index-only scans.  That thread ended with
us realizing that pg_upgrade's ANALYZE runs will populate
pg_class.relallvisible.

What I didn't see in that thread is an analysis of what cases are going
to require manual vacuum, and I have seen no work in 9.3 to improve
that.  I don't even see it on the TODO list.

It bothers me that we spent time developing index-only scans, but have
significant workloads where it doesn't work, no efforts on improving it,
and no documentation on manual workarounds.  I have not even seen
discussion on how we are going to improve this.  I would like to have
that discussion now.

 Yes, we need to overhaul the way vacuum works (to reduce the frequency
 of rewriting stuff repeatedly) and the way it's triggered (prioritization,
 more trigger conditions) but all these are known things and just need
 somebody with time.

Based on the work needed to improve this, I would have thought someone
would have taken this on during 9.3 development.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-06 Thread Andres Freund
On 2013-09-06 20:29:08 -0400, Bruce Momjian wrote:
 On Sat, Sep  7, 2013 at 12:26:23AM +0200, Andres Freund wrote:
  I wonder if we shouldn't trigger most vacuums (not analyze!) via unset
  fsm bits. Perhaps combined with keeping track of RecentGlobalXmin to
 
 Fsm bits?  FSM tracks the free space on each page.  How does that
 help?

Err. I was way too tired when I wrote that. vm bits.

  make sure we're not repeatedly checking for work that cannot yet be
  done.

 The idea of using RecentGlobalXmin to see how much _work_ has happened
 since the last vacuum is interesting, but it doesn't handle read-only
 transactions;  I am not sure how they can be tracked.  You make a good
 point that 5 minutes passing is meaningless --- you really want to know
 how many transactions have completed.

So, what I was pondering went slightly into a different direction:

(lets ignore anti wraparound vacuum for now)

Currently we trigger autovacuums by the assumed number of dead
tuples. In the course of its action, a vacuum usually finds that it
cannot remove all dead rows and cannot mark everything as all-visible,
because the xmin horizon hasn't advanced far enough. We won't trigger
another vacuum after that unless there are further dead tuples in the
relation...
One trick, if we want to overcome that problem and the fact that we do
not handle setting all-visible nicely for INSERT-only workloads, would
be to trigger vacuum by the number of pages that are not marked
all-visible in the vm.

The problem there is that repeatedly scanning a relation that's only 50%
visible, where the rest cannot be marked all-visible because of a
long-running pg_dump, obviously isn't a good idea. So we need something to
notify us when there's work to be done. Using elapsed time seems like a
bad idea because it doesn't adapt to changing workloads very well and
doesn't work nicely for different relations.

What I was thinking of was to keep track of the oldest xids on pages
that cannot be marked all-visible. I haven't thought about the
statistics part much, but what if we binned the space between
[RecentGlobalXmin, nextXid) into 10 bins and counted the number of
pages falling into each bin?  Then, after the vacuum finished, we could
compute how far RecentGlobalXmin would have to progress to make another
vacuum worthwhile by counting the number of pages from the lowest bin
upwards and using that bin's upper limit as the triggering xid.

Now, we'd definitely need to amend that scheme by something that handles
pages that are newly written to, but it seems something like that
wouldn't be too hard to implement and would make autovacuum more useful.
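To make the binning idea above concrete, here is a rough sketch (invented
names, not PostgreSQL code, and it ignores xid wraparound entirely):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the binning scheme: pages that could not be marked
 * all-visible are counted into NBINS equal slices of
 * [recent_global_xmin, next_xid).  After a vacuum, the xid the horizon
 * must pass before another vacuum becomes worthwhile is the upper edge
 * of the first group of bins holding at least 'threshold' pages.
 */
#define NBINS 10

/* which bin a page's oldest blocking xid falls into */
static int
bin_for_xid(uint32_t xid, uint32_t xmin, uint32_t next_xid)
{
    uint64_t    span = next_xid - xmin;

    return (int) (((uint64_t) (xid - xmin) * NBINS) / span);
}

/* xid RecentGlobalXmin must reach before re-vacuuming looks worthwhile */
static uint32_t
trigger_xid(const int64_t bins[NBINS], int64_t threshold,
            uint32_t xmin, uint32_t next_xid)
{
    uint64_t    span = next_xid - xmin;
    int64_t     acc = 0;

    for (int i = 0; i < NBINS; i++)
    {
        acc += bins[i];
        if (acc >= threshold)
            return xmin + (uint32_t) (span * (uint64_t) (i + 1) / NBINS);
    }
    return next_xid;        /* too few blocked pages; no early trigger */
}
```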

 Unfortunately, our virtual transactions make that hard to compute.

I don't think they pose too much of a complexity. We basically only have
to care about PGXACT->xmin here, and virtual transactions don't change
the handling of that ...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Wed, Sep  4, 2013 at 04:56:55PM -0400, Bruce Momjian wrote:
  Add a column pg_class.relallvisible to remember the number of pages
  that were all-visible according to the visibility map as of the last
  VACUUM
  (or ANALYZE, or some other operations that update pg_class.relpages).
  Use relallvisible/relpages, instead of an arbitrary constant, to
  estimate how many heap page fetches can be avoided during an
  index-only scan.
  
  Have I missed some nuance?
 
 I am looking back at this issue now and I think you are correct.  The
 commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
 and the code matches that:
 
   if (!inh)
   vac_update_relstats(onerel,
   RelationGetNumberOfBlocks(onerel),
   totalrows,
 --   visibilitymap_count(onerel),
   hasindex,
   InvalidTransactionId);
 
 so if an index scan was not being used after an ANALYZE, it isn't a bad
 allvisible estimate but something else.  This code was in PG 9.2.

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

1)  When are VM bits set:
        vacuum (non-full)
        analyze (only some random pages)
2)  When are large numbers of rows added but VM bits not set:
        copy
3)  When are VM bits cleared:
        insert/update/delete
        vacuum (non-full)
4)  When are VM map files cleared:
        vacuum full
        cluster
5)  When is pg_class.relallvisible updated via a VM map file scan:
        vacuum (non-full)
        analyze

Vacuums run by autovacuum are driven by n_dead_tuples, which counts only
updates and deletes.  Therefore, any scenario where vacuum (non-full) is
never run will not have significant VM bits set.  The only bits that
will be set will be those on pages visited randomly by analyze.

The following table activities will not set proper VM bits:

        vacuum full
        cluster
        copy
        insert-only

If updates and deletes happen, there will eventually be sufficient
reason for autovacuum to vacuum the table and set proper VM bits, and
pg_class.relallvisible.

The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Robert Haas
On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
 Actually, I now realize it is more complex than that, and worse.  There
 are several questions to study to understand when pg_class.relallvisible
 is updated (which is used to determine if index-only scans are a good
 optimization choice), and when VM all-visible bits are set so heap pages
 can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.

 The calculus we should use to determine when we need to run vacuum has
 changed with index-only scans, and I am not sure we ever fully addressed
 this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Gavin Flower

On 06/09/13 13:10, Robert Haas wrote:

On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:

Actually, I now realize it is more complex than that, and worse.  There
are several questions to study to understand when pg_class.relallvisible
is updated (which is used to determine if index-only scans are a good
optimization choice), and when VM all-visible bits are set so heap pages
can be skipped during index-only scans:

 1)  When are VM bits set:
 vacuum (non-full)
 analyze (only some random pages)

Analyze doesn't set visibility-map bits.  It only updates statistics
about how many are set.


The calculus we should use to determine when we need to run vacuum has
changed with index-only scans, and I am not sure we ever fully addressed
this.

Yeah, we didn't.  I think the hard part is figuring out what behavior
would be best.  Counting inserts as well as updates and deletes would
be a simple approach, but I don't have much confidence in it.  My
experience is that having vacuum or analyze kick in during a bulk-load
operation is a disaster.  We'd kinda like to come up with a way to
make vacuum run after the bulk load is complete, maybe, but how would
we identify that time, and there are probably cases where that's not
right either.

How about a 'VACUUM AFTER' command (part of the 'BEGIN' transaction 
syntax?) that would:


1. only be valid in a transaction
2. initiate a vacuum after the current transaction completed
3. defer any vacuum triggered due to other criteria

If the transaction was rolled back, then any vacuum already pending for 
other reasons would be actioned.


On normal transaction completion, any pending vacuum would be combined 
with the one requested in the transaction.


Still would need some method of ensuring any pending vacuum was done if 
the transaction hung, or took too long.



Cheers,
Gavin


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Bruce Momjian
On Thu, Sep  5, 2013 at 09:10:06PM -0400, Robert Haas wrote:
 On Thu, Sep 5, 2013 at 8:14 PM, Bruce Momjian br...@momjian.us wrote:
  Actually, I now realize it is more complex than that, and worse.  There
  are several questions to study to understand when pg_class.relallvisible
  is updated (which is used to determine if index-only scans are a good
  optimization choice), and when VM all-visible bits are set so heap pages
  can be skipped during index-only scans:
 
  1)  When are VM bits set:
  vacuum (non-full)
  analyze (only some random pages)
 
 Analyze doesn't set visibility-map bits.  It only updates statistics
 about how many are set.

Sorry, yes you are correct.

  The calculus we should use to determine when we need to run vacuum has
  changed with index-only scans, and I am not sure we ever fully addressed
  this.
 
 Yeah, we didn't.  I think the hard part is figuring out what behavior
 would be best.  Counting inserts as well as updates and deletes would
 be a simple approach, but I don't have much confidence in it.  My
 experience is that having vacuum or analyze kick in during a bulk-load
 operation is a disaster.  We'd kinda like to come up with a way to
 make vacuum run after the bulk load is complete, maybe, but how would
 we identify that time, and there are probably cases where that's not
 right either.

I am unsure how we have gone a year with index-only scans and I am just
now learning that it only works well with update/delete workloads or by
running vacuum manually.  I only found this out going back over January
emails.  Did other people know this?  Was it not considered a serious
problem?

Well, our logic has been that vacuum is only for removing expired rows. 
I think we either need to improve that, or somehow make sequential scans
update the VM map, and then find a way to trigger update of
relallvisible even without inserts.

Ideas
-

I think we need to detect tables that do not have VM bits set and try to
determine if they should be vacuumed.  If a table has most of its VM
bits set, there is no need to vacuum it for VM-bit setting.

Autovacuum knows how many pages are in the table via its file size, and
it can scan the VM map to see how many pages are _not_ marked
all-visible.  If the VM map has many pages that are _not_ marked as
all-visible, and the change count since the last vacuum is low, those pages
might now be all-visible and vacuum might find them.  One problem is
that a long-running transaction is not going to update relallvisible
until commit, so you might be vacuuming a table that is being modified,
e.g. bulk loads.  Do we have any way of detecting if a backend is
modifying a table?
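Scanning the VM map this way should be cheap, since the visibility map
stores a single all-visible bit per heap page, so finding the pages that
are not all-visible is just a bit count.  A minimal sketch (invented
names, not the visibilitymap.c API):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count heap pages whose all-visible bit is clear in a VM-like bitmap
 * (one bit per page, LSB-first within each byte).  Illustrative only.
 */
static int64_t
count_not_all_visible(const uint8_t *vm, int64_t npages)
{
    int64_t     unset = 0;

    for (int64_t blk = 0; blk < npages; blk++)
    {
        if (!((vm[blk / 8] >> (blk % 8)) & 1))
            unset++;    /* this heap page's all-visible bit is clear */
    }
    return unset;
}
```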

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-05 Thread Alvaro Herrera
Bruce Momjian wrote:

 Ideas
 -
 
 I think we need to detect tables that do not have VM bits set and try to
 determine if they should be vacuumed.  If a table has most of its VM
 bits set, there in need to vacuum it for VM bit setting.

I think it's shortsighted to keep thinking of autovacuum as just a way
to run VACUUM and ANALYZE.  We have already discussed work items that
need to be done separately, such as truncating the last few empty pages
on a relation that was vacuumed recently.  We also need to process a GIN
index's pending insertion list; and with minmax indexes I will want to
run summarization of heap page ranges.

So maybe instead of trying to think of VM bit setting as part of vacuum,
we could just keep stats about how many pages we might need to scan
because of possibly needing to set the bit, and then doing that in
autovacuum, independently from actually vacuuming the relation.

I'm not sure if we need to expose all these new maintenance actions as
SQL commands.

-- 
Álvaro Herrera   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2013-09-04 Thread Bruce Momjian
On Thu, Dec 13, 2012 at 03:31:06PM +, Peter Geoghegan wrote:
 On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
  ANALYZE does not set that value, and is not going to start doing so,
  because it doesn't scan enough of the table to derive a trustworthy
  value.
 
 I'm slightly surprised by your remarks here, because the commit
 message where the relallvisible column was added (commit
 a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:
 
 Add a column pg_class.relallvisible to remember the number of pages
 that were all-visible according to the visibility map as of the last
 VACUUM
 (or ANALYZE, or some other operations that update pg_class.relpages).
 Use relallvisible/relpages, instead of an arbitrary constant, to
 estimate how many heap page fetches can be avoided during an
 index-only scan.
 
 Have I missed some nuance?

I am looking back at this issue now and I think you are correct.  The
commit you mention (Oct 7 2011) says ANALYZE updates the visibility map,
and the code matches that:

if (!inh)
vac_update_relstats(onerel,
RelationGetNumberOfBlocks(onerel),
totalrows,
-- visibilitymap_count(onerel),
hasindex,
InvalidTransactionId);

so if an index scan was not being used after an ANALYZE, it isn't a bad
allvisible estimate but something else.  This code was in PG 9.2.
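For reference, the estimate the planner derives from this counter amounts
to something like the following (an illustrative paraphrase, not the
actual costsize.c code):

```c
#include <assert.h>
#include <math.h>

/*
 * Expected heap page fetches during an index-only scan: shrinks with
 * the fraction of pages that were all-visible as of the last VACUUM or
 * ANALYZE.  Illustrative paraphrase of the planner arithmetic only.
 */
static double
expected_heap_fetches(double tuples_fetched,
                      double relallvisible, double relpages)
{
    double      allvisfrac;

    if (relpages <= 0.0)
        return tuples_fetched;  /* no stats; assume every fetch hits the heap */

    allvisfrac = relallvisible / relpages;
    if (allvisfrac > 1.0)
        allvisfrac = 1.0;       /* stats can be stale; clamp */

    return tuples_fetched * (1.0 - allvisfrac);
}
```

This is why a stale or zero relallvisible (as after pg_upgrade) makes
index-only scans look unattractive to the planner.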


-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-13 Thread Simon Riggs
On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote:

 Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
 have pg_dump --binary-upgrade set these, or have ANALYZE set it.  I
 would prefer the latter.

 ANALYZE does not set that value, and is not going to start doing so,
 because it doesn't scan enough of the table to derive a trustworthy
 value.

ISTM that ANALYZE doesn't need to scan the table to do this. The
vismap is now trustworthy and we can scan it separately on ANALYZE.

More to the point, since we run ANALYZE more frequently than we run
VACUUM, the value stored by the last VACUUM could be very stale.

 It's been clear for some time that pg_upgrade ought to do something
 about transferring the statistics columns in pg_class to the new
 cluster.  This is just another example of why.

Agreed, but that could bring other problems as well.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-13 Thread Bruce Momjian
On Thu, Dec 13, 2012 at 09:40:40AM +, Simon Riggs wrote:
 On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
 
  Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
  have pg_dump --binary-upgrade set these, or have ANALYZE set it.  I
  would prefer the latter.
 
  ANALYZE does not set that value, and is not going to start doing so,
  because it doesn't scan enough of the table to derive a trustworthy
  value.
 
 ISTM that ANALYZE doesn't need to scan the table to do this. The
 vismap is now trustworthy and we can scan it separately on ANALYZE.
 
 More to the point, since we run ANALYZE more frequently than we run
 VACUUM, the value stored by the last VACUUM could be very stale.

Wouldn't inserts affect the relallvisible ratio, but not cause a vacuum?
Seems we should be having analyze update this independent of pg_upgrade
needing it.  Also, why is this in pg_class?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-13 Thread Peter Geoghegan
On 13 December 2012 03:51, Tom Lane t...@sss.pgh.pa.us wrote:
 ANALYZE does not set that value, and is not going to start doing so,
 because it doesn't scan enough of the table to derive a trustworthy
 value.

I'm slightly surprised by your remarks here, because the commit
message where the relallvisible column was added (commit
a2822fb9337a21f98ac4ce850bb4145acf47ca27) says:

Add a column pg_class.relallvisible to remember the number of pages
that were all-visible according to the visibility map as of the last
VACUUM
(or ANALYZE, or some other operations that update pg_class.relpages).
Use relallvisible/relpages, instead of an arbitrary constant, to
estimate how many heap page fetches can be avoided during an
index-only scan.

Have I missed some nuance?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Bruce Momjian
On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 
 On 12/12/2012 05:12 PM, Andrew Dunstan wrote:
 
 On 12/12/2012 04:32 PM, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
 A client is testing a migration from 9.1 to 9.2, and has found that a
 large number of queries run much faster if they use index-only scans.
 However, the only way he has found to get such a plan is by increasing
 the seq_page_cost to insanely high levels (3.5). Is there any approved
 way to encourage such scans that's a bit less violent than this?
 Is the pg_class.relallvisible estimate for the table realistic? They
 might need a few more VACUUM and ANALYZE cycles to get it into the
 neighborhood of reality, if not.
 
 That was the problem - I didn't know this hadn't been done.
 
 
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

[ moved to hackers list.]

Yes, this does seem like a problem for upgrades from 9.2 to 9.3?  We can
have pg_dump --binary-upgrade set these, or have ANALYZE set it.  I
would prefer the latter.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

 [ moved to hackers list.]

 Yes, this does seem like a problem for upgrades from 9.2 to 9.3.  We can
 have pg_dump --binary-upgrade set these, or have ANALYZE set it.  I
 would prefer the latter.

ANALYZE does not set that value, and is not going to start doing so,
because it doesn't scan enough of the table to derive a trustworthy
value.

It's been clear for some time that pg_upgrade ought to do something
about transferring the statistics columns in pg_class to the new
cluster.  This is just another example of why.

regards, tom lane




Re: [HACKERS] [PERFORM] encouraging index-only scans

2012-12-12 Thread Pavan Deolasee
On Thu, Dec 13, 2012 at 9:21 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 On Wed, Dec 12, 2012 at 05:27:39PM -0500, Andrew Dunstan wrote:
 Actually, the table had been analysed but not vacuumed, so this
 kinda begs the question what will happen to this value on
 pg_upgrade? Will people's queries suddenly get slower until
 autovacuum kicks in on the table?

 [ moved to hackers list.]

  Yes, this does seem like a problem for upgrades from 9.2 to 9.3.  We can
  have pg_dump --binary-upgrade set these, or have ANALYZE set it.  I
  would prefer the latter.

 ANALYZE does not set that value, and is not going to start doing so,
 because it doesn't scan enough of the table to derive a trustworthy
 value.


Should we do that, though?  That is, scan the entire map and count the
number of set bits at the end of ANALYZE, like we do at the end of
VACUUM?  I recently tried to optimize that code path by not recounting
at the end of the vacuum and instead tracking the number of all-visible
bits while scanning them in the earlier phases of vacuum.  But it
turned out that it's so fast to count even a million bits that it's
probably not worth doing.

 It's been clear for some time that pg_upgrade ought to do something
 about transferring the statistics columns in pg_class to the new
 cluster.  This is just another example of why.


+1.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

