Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-29 Thread Robert Haas
On Mon, Oct 27, 2014 at 5:51 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Jeff Janes wrote:
 It is only a page read if you have to read the page.  It would seem optimal
 to have bgwriter adventitiously set hint bits and vm bits, because that is
 the last point at which the page can be changed without risking that it be
 written out twice. At that point, it has been given the maximum amount of
 time it can be given for the interested transactions to have committed and
 to have aged past the xmin horizon.  I seem to recall that the main problem
 with that, though, is that you must be attached to a database in order to
 determine visibility, and bgwriter is not attached to a database.

 Regarding tuple hint bits, I couldn't find any such limitation in
 SetHintBits, other than in MarkBufferDirtyHint there being some code
 that would cause trouble: it accesses MyPgXact, which bgwriter would
 obviously not have.  Maybe worth some experimentation ...

 I'm not sure about vm bits, though.  That's a whole different topic.

From a theoretical point of view, hint bits and VM bits present
mostly-similar issues; they are basically dependent on the state of
the transaction machinery, which for the most part is cluster-wide, so
it should be fine for a process to do that sort of work without being
bound to a specific database.   I think the biggest problem for either
is that the bgwriter can't look at the system catalogs to find out
whether a particular buffer contains a heap page or an index page (or
something else) and it needs to know that, which probably requires
another buffer flag.

From a practical point of view, there's a major performance difference
between the two operations: setting a VM bit requires emitting a WAL
record, whereas setting hint bits does not.  That might make it too
expensive to be worth doing, at least in some situations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-28 Thread Alvaro Herrera
Jeff Janes wrote:

 It is only a page read if you have to read the page.  It would seem optimal
 to have bgwriter adventitiously set hint bits and vm bits, because that is
 the last point at which the page can be changed without risking that it be
 written out twice. At that point, it has been given the maximum amount of
 time it can be given for the interested transactions to have committed and
 to have aged past the xmin horizon.  I seem to recall that the main problem
 with that, though, is that you must be attached to a database in order to
 determine visibility, and bgwriter is not attached to a database.

Regarding tuple hint bits, I couldn't find any such limitation in
SetHintBits, other than in MarkBufferDirtyHint there being some code
that would cause trouble: it accesses MyPgXact, which bgwriter would
obviously not have.  Maybe worth some experimentation ...

I'm not sure about vm bits, though.  That's a whole different topic.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-21 Thread Jeff Janes
On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
  On 10/20/2014 05:39 PM, Jim Nasby wrote:
   Or maybe vacuum isn't the right way to handle some of these scenarios.
   It's become the catch-all for all of this stuff, but maybe that doesn't
   make sense anymore. Certainly when it comes to dealing with inserts
   there's no reason we *have* to do anything other than set hint bits and
   possibly freeze xmin.
 
  +1

 A page read is a page read. What's the point of heaving another process
 do it?


It is only a page read if you have to read the page.  It would seem optimal
to have bgwriter adventitiously set hint bits and vm bits, because that is
the last point at which the page can be changed without risking that it be
written out twice. At that point, it has been given the maximum amount of
time it can be given for the interested transactions to have committed and
to have aged past the xmin horizon.  I seem to recall that the main problem
with that, though, is that you must be attached to a database in order to
determine visibility, and bgwriter is not attached to a database.

Cheers,

Jeff


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-21 Thread Jim Nasby

On 10/21/14, 4:36 PM, Jeff Janes wrote:

On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund and...@2ndquadrant.com 
mailto:and...@2ndquadrant.com wrote:

On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
 On 10/20/2014 05:39 PM, Jim Nasby wrote:
  Or maybe vacuum isn't the right way to handle some of these scenarios.
  It's become the catch-all for all of this stuff, but maybe that doesn't
  make sense anymore. Certainly when it comes to dealing with inserts
  there's no reason we *have* to do anything other than set hint bits and
  possibly freeze xmin.

 +1

A page read is a page read. What's the point of heaving another process
do it?


It is only a page read if you have to read the page.  It would seem optimal to 
have bgwriter adventitiously set hint bits and vm bits, because that is the 
last point at which the page can be changed without risking that it be written 
out twice. At that point, it has been given the maximum amount of time it can 
be given for the interested transactions to have committed and to have aged 
past the xmin horizon.  I seem to recall that the main problem with that, 
though, is that you must be attached to a database in order to determine 
visibility, and bgwriter is not attached to a database.


It's also a bit more complex than a simple question of is the page still in shared 
buffers. Our *real* last chance is when the page is about to be evicted from the 
filesystem cache; after that reading it back it will be extremely expensive (relatively 
speaking).

I think it's worth considering this, because if you have any moderate length 
transactions on a busy database bgwriter won't be able to help much; you'll be 
burning through shared buffers too quickly.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-20 Thread Andres Freund
On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
 On 10/19/14, 11:41 AM, Andres Freund wrote:
 On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
 The weird part is that if it's not doing a freeze it will just punt
 on a page if it can't get the cleanup lock.
 
 I don't think that's particularly wierd. Otherwise vacuum can get stuck
 behind a single very hot page - leading to much, much more bloat.
 
 I have to believe that could seriously screw up autovacuum scheduling.
 
 Why?
 
 I'm worried there could be some pathological cases where we'd skip a
 large number of pages, perhaps if a vacuum scan and a seqscan ended up
 running alongside each other.

I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.

 Perhaps this is just paranoia, but we have no idea how bad things
 might be, because we don't have any logging for how many pages we
 skipped because we couldn't lock them.

But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.

 Also, if this really is that big a deal for heap pages, how come we
 don't get screwed by it on Btree index pages, where we mandate that we
 acquire a cleanup lock?

Because we never hold pins for btree pages for very long. Whereas we do
that for heap pages. If you e.g. run a cursor forward you can hold a pin
for essentially unbounded time.

 Now that we have forks, I'm wondering if it would be best to come up
 with a per-page system that could be used to determine when a table
 needs background work to be done. The visibility map could serve a lot
 of this purpose, but I'm not sure if it would work for getting hint
 bits set in the background.
 
 It would. Per definition, all tuples that are 'all visible' need to be
 fully hint bitted.
 
 I think it would also be a win if we had a way to advance relfrozenxid
 and relminmxid. Perhaps something that simply remembered the last XID
 that touched each page...
 
 Not sure what you're getting at here?
 
 That ultimately, our current method for determining when and what to
 vacuum is rather crude, and likely results in wasted effort during
 scans as well as not firing autovac often enough. Keep in mind that
 autovac started as a user-space utility and the best it could possibly
 do was to keep a table of stats counters.

I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...

 The visibility map obviously helps cut down on extra work during a
 scan, but it only goes so far in that regard.

Aha.

 Instead of relying on the crude methods, if we reliably tracked
 certain txids on a per-block basis in a fork, we could cheaply scan
 the fork and make an extremely informed decision on how much a vacuum
 would gain us, and exactly what blocks it should hit.

 Let me use freezing as an example. If we had a reliable list of the
 lowest txid for each block of a relation that would allow us to do a
 freeze scan by hitting only blocks with minimum txid within our freeze
 range. The same could be done for multixacts.

It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.

 If we stored 3 txids for each block in a fork, we could fit
 information for ~680 heap blocks in each fork block. So in a database
 with 680G of heap data, we could fully determine every *block* (not
 table) we needed to vacuum by scanning just 1GB of data. That would
 allow for far better autovacuum scheduling than what we do today.

It's not that simple. Wraparounds and locking complicate it
significantly.

 I think the big missing piece lest something like Heikki's xid lsn
 ranges thing gets finished is a freeze map.
 
 The problem with a simple freeze map is when do you actually set the
 bit?

There's precisely one place where you can set it for normal
operation. During vacuum's scan.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-20 Thread Jim Nasby

On 10/20/14, 3:11 PM, Andres Freund wrote:

On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:

On 10/19/14, 11:41 AM, Andres Freund wrote:

On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:

The weird part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.


I don't think that's particularly wierd. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.


I have to believe that could seriously screw up autovacuum scheduling.


Why?


I'm worried there could be some pathological cases where we'd skip a
large number of pages, perhaps if a vacuum scan and a seqscan ended up
running alongside each other.


I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.


How would you actually get evidence of this... we don't log it. :) (See my 
proposal at 
http://www.postgresql.org/message-id/54446c10.2080...@bluetreble.com)


Perhaps this is just paranoia, but we have no idea how bad things
might be, because we don't have any logging for how many pages we
skipped because we couldn't lock them.


But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.


Probably true. Hopefully we can start logging it and then we'll know for sure.



That ultimately, our current method for determining when and what to
vacuum is rather crude, and likely results in wasted effort during
scans as well as not firing autovac often enough. Keep in mind that
autovac started as a user-space utility and the best it could possibly
do was to keep a table of stats counters.


I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...


Or maybe vacuum isn't the right way to handle some of these scenarios. It's 
become the catch-all for all of this stuff, but maybe that doesn't make sense 
anymore. Certainly when it comes to dealing with inserts there's no reason we 
*have* to do anything other than set hint bits and possibly freeze xmin.


Instead of relying on the crude methods, if we reliably tracked
certain txids on a per-block basis in a fork, we could cheaply scan
the fork and make an extremely informed decision on how much a vacuum
would gain us, and exactly what blocks it should hit.



Let me use freezing as an example. If we had a reliable list of the
lowest txid for each block of a relation that would allow us to do a
freeze scan by hitting only blocks with minimum txid within our freeze
range. The same could be done for multixacts.


It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.


Actually, the contention on freeze would very possibly be minimal, because it 
probably doesn't change very often. Even if it did, it's OK if the value isn't 
100% accurate, so long as the recorded XID is guaranteed older than what's 
actually on the page.


If we stored 3 txids for each block in a fork, we could fit
information for ~680 heap blocks in each fork block. So in a database
with 680G of heap data, we could fully determine every *block* (not
table) we needed to vacuum by scanning just 1GB of data. That would
allow for far better autovacuum scheduling than what we do today.


It's not that simple. Wraparounds and locking complicate it
significantly.


I realize what I'm talking about isn't trivial (though, I'm confused by your 
comment about wraparound since presumably TransactionIdPrecedes() and it's ilk 
solve that problem...)

My ultimate point here is that we're using what are (today) very crude methods 
to control what gets vacuumed when, and I think that now that we have resource 
forks would could do *much* better without a tremendous amount of work. But to 
make a big advancement here we'll need to take a step back and rethink some 
things (like vacuum is the only way to handle these problems).

Let me put some thought into this.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-20 Thread Josh Berkus
On 10/20/2014 05:39 PM, Jim Nasby wrote:
 Or maybe vacuum isn't the right way to handle some of these scenarios.
 It's become the catch-all for all of this stuff, but maybe that doesn't
 make sense anymore. Certainly when it comes to dealing with inserts
 there's no reason we *have* to do anything other than set hint bits and
 possibly freeze xmin.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-20 Thread Andres Freund
On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
 On 10/20/2014 05:39 PM, Jim Nasby wrote:
  Or maybe vacuum isn't the right way to handle some of these scenarios.
  It's become the catch-all for all of this stuff, but maybe that doesn't
  make sense anymore. Certainly when it comes to dealing with inserts
  there's no reason we *have* to do anything other than set hint bits and
  possibly freeze xmin.
 
 +1

A page read is a page read. What's the point of heaving another process
do it? Vacuum doesn't dirty pages if they don't have to be
dirtied. Especially stuff like freezing cannot really be dealt with
outside of vacuum unless you make already complex stuff more complex for
a marginal benefit.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-19 Thread Andres Freund
On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
 On 10/9/14, 4:19 PM, Andres Freund wrote:
 On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
   Bruce Momjian wrote:
   
I agree this is a serious problem.  We have discussed various 
options,
but have not decided on anything.  The TODO list has:

  https://wiki.postgresql.org/wiki/Todo

  Improve setting of visibility map bits for read-only and 
 insert-only
  workloads
  
  
 http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
   
   I hate to repeat myself, but I think autovacuum could be modified to 
   run
   actions other than vacuum and analyze.  In this specific case we 
   could
   be running a table scan that checks only pages that don't have the
   all-visible bit set, and see if it can be set.
  
  Isn't that*precisely*  what a plain vacuum run does?
 
 Well, it also scans for dead tuples, removes them, and needs to go
 through indexes to remove their references.

 IIRC it doesn't do most of that if that there's no need. And if it's a
 insert only table without rollbacks. I*do*  think there's some
 optimizations we could make in general.
 
 No, it always attempts dead tuple removal.

I said some steps, not all steps. Check it out:

/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
if (vacrelstats-num_dead_tuples  0)
{
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);

/* Remove index entries */
for (i = 0; i  nindexes; i++)
lazy_vacuum_index(Irel[i],
  indstats[i],
  vacrelstats);
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
vacrelstats-num_index_scans++;
}

There's rub here though. We unconditionally do:
/* Do post-vacuum cleanup and statistics update for each index */
for (i = 0; i  nindexes; i++)
lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

and that's not particularly cheap. Maybe we should make that conditional
when there's been no lazy_vacuum_index/heap calls at all?

 The weird part is that if it's not doing a freeze it will just punt
 on a page if it can't get the cleanup lock.

I don't think that's particularly wierd. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.

 I have to believe that could seriously screw up autovacuum scheduling.

Why?

 Now that we have forks, I'm wondering if it would be best to come up
 with a per-page system that could be used to determine when a table
 needs background work to be done. The visibility map could serve a lot
 of this purpose, but I'm not sure if it would work for getting hint
 bits set in the background.

It would. Per definition, all tuples that are 'all visible' need to be
fully hint bitted.

 I think it would also be a win if we had a way to advance relfrozenxid
 and relminmxid. Perhaps something that simply remembered the last XID
 that touched each page...

Not sure what you're getting at here?

I think the big missing piece lest something like Heikki's xid lsn
ranges thing gets finished is a freeze map.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 There's rub here though. We unconditionally do:
   /* Do post-vacuum cleanup and statistics update for each index */
   for (i = 0; i  nindexes; i++)
   lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

 and that's not particularly cheap. Maybe we should make that conditional
 when there's been no lazy_vacuum_index/heap calls at all?

Absolutely not.  If the cleanup step is skippable, it would be the
province of the index AM to make that decision.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-19 Thread Andres Freund
On 2014-10-19 12:50:30 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  There's rub here though. We unconditionally do:
  /* Do post-vacuum cleanup and statistics update for each index */
  for (i = 0; i  nindexes; i++)
  lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
 
  and that's not particularly cheap. Maybe we should make that conditional
  when there's been no lazy_vacuum_index/heap calls at all?
 
 Absolutely not.  If the cleanup step is skippable, it would be the
 province of the index AM to make that decision.

Fair point. At the moment we're doing a full of nbtree indexes everytime
we do a vacuum. Even when the heap vacuum only scanned a couple hundred
pages of a huge table. That makes partial vacuum noticeably less
useful. So I do think we need to do something to improve upon the
situation.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-19 Thread Jim Nasby



On 10/19/14, 11:41 AM, Andres Freund wrote:

On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:

On 10/9/14, 4:19 PM, Andres Freund wrote:

On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:

Bruce Momjian wrote:


I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us


I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.


Isn't that*precisely*  what a plain vacuum run does?


Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.



IIRC it doesn't do most of that if that there's no need. And if it's a
insert only table without rollbacks. I*do*  think there's some
optimizations we could make in general.


No, it always attempts dead tuple removal.


I said some steps, not all steps. Check it out:

 /* If any tuples need to be deleted, perform final vacuum cycle */
 /* XXX put a threshold on min number of tuples here? */
 if (vacrelstats-num_dead_tuples  0)
 {
 /* Log cleanup info before we touch indexes */
 vacuum_log_cleanup_info(onerel, vacrelstats);

 /* Remove index entries */
 for (i = 0; i  nindexes; i++)
 lazy_vacuum_index(Irel[i],
   indstats[i],
   vacrelstats);
 /* Remove tuples from heap */
 lazy_vacuum_heap(onerel, vacrelstats);
 vacrelstats-num_index_scans++;
 }

There's rub here though. We unconditionally do:
/* Do post-vacuum cleanup and statistics update for each index */
for (i = 0; i  nindexes; i++)
lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

and that's not particularly cheap. Maybe we should make that conditional
when there's been no lazy_vacuum_index/heap calls at all?


We could possibly pass in to lazy_cleanup_index whether we actually removed any 
tuples.


The weird part is that if it's not doing a freeze it will just punt
on a page if it can't get the cleanup lock.


I don't think that's particularly wierd. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.


I have to believe that could seriously screw up autovacuum scheduling.


Why?


I'm worried there could be some pathological cases where we'd skip a large 
number of pages, perhaps if a vacuum scan and a seqscan ended up running 
alongside each other.

Perhaps this is just paranoia, but we have no idea how bad things might be, 
because we don't have any logging for how many pages we skipped because we 
couldn't lock them.

Also, if this really is that big a deal for heap pages, how come we don't get 
screwed by it on Btree index pages, where we mandate that we acquire a cleanup 
lock?


Now that we have forks, I'm wondering if it would be best to come up
with a per-page system that could be used to determine when a table
needs background work to be done. The visibility map could serve a lot
of this purpose, but I'm not sure if it would work for getting hint
bits set in the background.


It would. Per definition, all tuples that are 'all visible' need to be
fully hint bitted.


I think it would also be a win if we had a way to advance relfrozenxid
and relminmxid. Perhaps something that simply remembered the last XID
that touched each page...


Not sure what you're getting at here?


That ultimately, our current method for determining when and what to vacuum is 
rather crude, and likely results in wasted effort during scans as well as not 
firing autovac often enough. Keep in mind that autovac started as a user-space 
utility and the best it could possibly do was to keep a table of stats counters.

The visibility map obviously helps cut down on extra work during a scan, but it 
only goes so far in that regard.

Instead of relying on the crude methods, if we reliably tracked certain txids 
on a per-block basis in a fork, we could cheaply scan the fork and make an 
extremely informed decision on how much a vacuum would gain us, and exactly 
what blocks it should hit.

Let me use freezing as an example. If we had a reliable list of the lowest txid 
for each block of a relation that would allow us to do a freeze scan by hitting 
only blocks with minimum txid within our freeze range. The same could be done 
for multixacts.

That's just one example. We could do something similar for background hinting 
(oldest xmin/xmax of all non-hinted tuples), and whether it's worth trying 

Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-18 Thread Jim Nasby

On 10/9/14, 4:19 PM, Andres Freund wrote:

On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:

Andres Freund wrote:

 On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:

  Bruce Momjian wrote:
  

   I agree this is a serious problem.  We have discussed various options,
   but have not decided on anything.  The TODO list has:
   
https://wiki.postgresql.org/wiki/Todo
   
Improve setting of visibility map bits for read-only and insert-only
workloads


http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

  
  I hate to repeat myself, but I think autovacuum could be modified to run
  actions other than vacuum and analyze.  In this specific case we could
  be running a table scan that checks only pages that don't have the
  all-visible bit set, and see if it can be set.

 
 Isn't that*precisely*  what a plain vacuum run does?


Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.

IIRC it doesn't do most of that if that there's no need. And if it's a
insert only table without rollbacks. I*do*  think there's some
optimizations we could make in general.


No, it always attempts dead tuple removal. The weird part is that if it's not 
doing a freeze it will just punt on a page if it can't get the cleanup lock. I have to 
believe that could seriously screw up autovacuum scheduling.

Now that we have forks, I'm wondering if it would be best to come up with a 
per-page system that could be used to determine when a table needs background 
work to be done. The visibility map could serve a lot of this purpose, but I'm 
not sure if it would work for getting hint bits set in the background. I think 
it would also be a win if we had a way to advance relfrozenxid and relminmxid. 
Perhaps something that simply remembered the last XID that touched each page...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alexey Bashtanov

Hello!

Autovacuum daemon performs vacuum when the number of rows 
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly it performs analyze when the number of rows changed in any way 
(incl. inserted).
When a table is mostly insert-only, its visibility map is not updated as 
vacuum threshold is almost never reached, but analyze does not update 
visibility map.


Why could it be a bad idea to run vacuum after some number of any 
changes including inserts, like analyze?
Or at least make it tunable by user (add a second bunch of paramters to 
control second vacuum threshold, disabled by default)?


Best regards,
  Alexey Bashtanov


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Bruce Momjian
On Thu, Oct  9, 2014 at 02:34:17PM +0400, Alexey Bashtanov wrote:
 Hello!
 
 Autovacuum daemon performs vacuum when the number of rows
 updated/deleted (n_dead_tuples) reaches some threshold.
 Similarly it performs analyze when the number of rows changed in any
 way (incl. inserted).
 When a table is mostly insert-only, its visibility map is not
 updated as vacuum threshold is almost never reached, but analyze
 does not update visibility map.
 
 Why could it be a bad idea to run vacuum after some number of any
 changes including inserts, like analyze?
 Or at least make it tunable by user (add a second bunch of paramters
 to control second vacuum threshold, disabled by default)?

I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

  http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us


-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Bruce Momjian wrote:

 I agree this is a serious problem.  We have discussed various options,
 but have not decided on anything.  The TODO list has:
 
   https://wiki.postgresql.org/wiki/Todo
 
   Improve setting of visibility map bits for read-only and insert-only
   workloads
   
 http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.  (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Andres Freund
On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
 Bruce Momjian wrote:
 
  I agree this is a serious problem.  We have discussed various options,
  but have not decided on anything.  The TODO list has:
  
  https://wiki.postgresql.org/wiki/Todo
  
  Improve setting of visibility map bits for read-only and insert-only
  workloads
  
http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
 
 I hate to repeat myself, but I think autovacuum could be modified to run
 actions other than vacuum and analyze.  In this specific case we could
 be running a table scan that checks only pages that don't have the
 all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Kevin Grittner
Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Bruce Momjian wrote:

 I agree this is a serious problem.  We have discussed various options,
 but have not decided on anything.  The TODO list has:

 https://wiki.postgresql.org/wiki/Todo

 Improve setting of visibility map bits for read-only and insert-only
 workloads

   http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us

 I hate to repeat myself, but I think autovacuum could be modified to run
 actions other than vacuum and analyze.  In this specific case we could
 be running a table scan that checks only pages that don't have the
 all-visible bit set, and see if it can be set.  (Of course, this idea
 needs refinement to avoid running over and over when the bit cannot be
 set on some pages for whatever reason.)

Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold?  I mean, it unless
the table is due for wraparound prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right?  And
how much work would that do beyond what you're describing if none
of the tuples are dead?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
  Bruce Momjian wrote:
  
   I agree this is a serious problem.  We have discussed various options,
   but have not decided on anything.  The TODO list has:
   
 https://wiki.postgresql.org/wiki/Todo
   
 Improve setting of visibility map bits for read-only and insert-only
 workloads
 
   http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
  
  I hate to repeat myself, but I think autovacuum could be modified to run
  actions other than vacuum and analyze.  In this specific case we could
  be running a table scan that checks only pages that don't have the
  all-visible bit set, and see if it can be set.
 
 Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.  I'm thinking in something
very lightweight.  Otherwise, why don't we just reduce the
vacuum_scale_factor default to something very small, so that vacuum is
triggered more often?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Alvaro Herrera
Kevin Grittner wrote:

 Wouldn't we get substantially the same thing just by counting tuple
 inserts toward the autovacuum vacuum threshold?  I mean, it unless
 the table is due for wraparound prevention autovacuum, it will only
 visit pages that don't have the all-visible bit set, right?  And
 how much work would that do beyond what you're describing if none
 of the tuples are dead?

The problem is precisely what happens if there are some dead tuples, but
not enough to reach the 20% threshold: this vacuum now has to scan the
table twice and has to clean up indexes also.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Andres Freund
On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
   Bruce Momjian wrote:
   
I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and 
insert-only
workloads

  
http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us
   
   I hate to repeat myself, but I think autovacuum could be modified to run
   actions other than vacuum and analyze.  In this specific case we could
   be running a table scan that checks only pages that don't have the
   all-visible bit set, and see if it can be set.
  
  Isn't that *precisely* what a plain vacuum run does?
 
 Well, it also scans for dead tuples, removes them, and needs to go
 through indexes to remove their references.

IIRC it doesn't do most of that if that there's no need. And if it's a
insert only table without rollbacks. I *do* think there's some
optimizations we could make in general.

 I'm thinking in something
 very lightweight.  Otherwise, why don't we just reduce the
 vacuum_scale_factor default to something very small, so that vacuum is
 triggered more often?

The problem here is that that doesn't trigger for inserts. Just for
updates/deletes or rollbacks.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

2014-10-09 Thread Jim Nasby

On 10/9/14, 4:03 PM, Alvaro Herrera wrote:

Bruce Momjian wrote:


I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:

https://wiki.postgresql.org/wiki/Todo

Improve setting of visibility map bits for read-only and insert-only
workloads

  http://www.postgresql.org/message-id/20130906001437.ga29...@momjian.us


I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.  (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)


If we go down that road we should also think about having it proactively set 
hint bits...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers