[HACKERS] Visibility map page pinned for too long ?

2012-12-03 Thread Pavan Deolasee
I was looking at the code when the following tiny bit caught my attention.
In vacuumlazy.c, we release the pin on the final VM page at line number 972.

 954 if (vacrelstats->num_dead_tuples > 0)
 955 {
 956 /* Log cleanup info before we touch indexes */
 957 vacuum_log_cleanup_info(onerel, vacrelstats);
 958
 959 /* Remove index entries */
 960 for (i = 0; i < nindexes; i++)
 961 lazy_vacuum_index(Irel[i],
 962   &indstats[i],
 963   vacrelstats);
 964 /* Remove tuples from heap */
 965 lazy_vacuum_heap(onerel, vacrelstats);
 966 vacrelstats->num_index_scans++;
 967 }
 968
 969 /* Release the pin on the visibility map page */
 970 if (BufferIsValid(vmbuffer))
 971 {
 972 ReleaseBuffer(vmbuffer);
 973 vmbuffer = InvalidBuffer;
 974 }

So we are holding the pin right through the index vacuuming and the second
pass over the heap; both can take a very long time. We can and should
really be releasing the pin *before* those steps. In fact, it would be
appropriate to do it right after the preceding big for-loop.

While it may or may not matter from the performance or correctness
perspective, I think we should fix that.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Visibility map page pinned for too long ?

2012-12-03 Thread Simon Riggs
On 3 December 2012 17:37, Pavan Deolasee pavan.deola...@gmail.com wrote:
 I was looking at the code when the following tiny bit caught my attention.
 In vacuumlazy.c, we release the pin on the final VM page at line number 972.

  954 if (vacrelstats->num_dead_tuples > 0)
  955 {
  956 /* Log cleanup info before we touch indexes */
  957 vacuum_log_cleanup_info(onerel, vacrelstats);
  958
  959 /* Remove index entries */
  960 for (i = 0; i < nindexes; i++)
  961 lazy_vacuum_index(Irel[i],
  962   &indstats[i],
  963   vacrelstats);
  964 /* Remove tuples from heap */
  965 lazy_vacuum_heap(onerel, vacrelstats);
  966 vacrelstats->num_index_scans++;
  967 }
  968
  969 /* Release the pin on the visibility map page */
  970 if (BufferIsValid(vmbuffer))
  971 {
  972 ReleaseBuffer(vmbuffer);
  973 vmbuffer = InvalidBuffer;
  974 }

 So we are holding the pin right through the index vacuuming and the second
 pass over the heap; both can take a very long time. We can and should really
 be releasing the pin *before* those steps. In fact, it would be appropriate
 to do it right after the preceding big for-loop.

 While it may or may not matter from the performance or correctness
 perspective, I think we should fix that.

Yes, it's a clear bug. Patched.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Visibility map and hint bits

2011-05-06 Thread Merlin Moncure
On Thu, May 5, 2011 at 2:20 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 Merlin Moncure mmonc...@gmail.com wrote:

 a small cache that remembers the commit/cancel status of recently
 seen transactions.

 How is that different from the head of the clog SLRU?

 several things:
 *) any slru access requires a lock (besides the lock itself, you are
 spending cycles in the critical path)
 *) cache access happens at a different stage of processing in
 HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
 TransactionIdIsInProgress have to be checked first. Logically, it's an
 extension of the hint bit check itself, not an expansion of lower
 levels of caching
 *) in tqual.c you can sneak in some small optimizations like only
 caching the bit if it's known good in the WAL (XlogNeedsFlush).  That
 way you don't need to keep checking it over and over for the same
 transaction
 *) slru level accesses happen too late to give much benefit:

 I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
 workstation VM, each non-inline function call shows up measurably in
 profiling.  I think anything you do here has to be inline, hand
 rolled, and very tight (you can forget anything around dynahash).
 Delegating the cache management to the transam or (even worse) slru
 level penalizes some workloads non-trivially.

An updated patch is attached.  It's still WIP, but I need a little
guidance before going further.

What I did:
*) Added a lot of source level comments that should explain better
what's happening and why
*) Fixed a significant number of goofs in the earlier patch.
*) Reorganized the interaction with HeapTupleSatisfiesMVCC.  In
particular, SetHintBits() now returns whether it actually set the bit,
because I can use that information.

What's not done:
*) Only commit bits are cached, and caching action is only happening
in HeapTupleSatisfiesMVCC.  I'm not sure yet if it's better to store
invalid bits in the same cache or in a separate one.  I'm not sure if
the other satisfies routines should also be engaging the cache.
Translated from nerd speak, that means I haven't yet done the research
to see when they are fired and if they are bottlenecks :-).

*) I'd like to reach some sort of consensus with Tom on whether there
is any point in going further in this direction.  Not so much on the
mechanics of how the cache works, but on keeping it at the tqual.c
level, and the changes to HeapTupleSatisfiesMVCC in particular.  I
think caching at the transam.c level is a dead end on performance
grounds regardless of how you implement the cache.

Some points of note:
*) Is it acceptable to use a static definition of memory like that?  If
not, should there be a more standard allocation under
CacheMemoryContext?

*) Testing for the benefit is simple: just create a bunch of records
and seqscan the table (select count(*)).  Without the patch the first
scan is slower and does a bunch of i/o.  With it, it does not.
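For example (illustrative only; any bulk load followed by a couple of
sequential scans shows the effect):

    create table hint_test as
        select g as id from generate_series(1, 10000000) g;
    select count(*) from hint_test;  -- without the patch, this first scan
                                     -- dirties every page to set hint bits
    select count(*) from hint_test;  -- the second scan is quiet either way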

*) The cache overhead is *almost* not measurable.  As best I can tell
we are looking at maybe 1%-ish overhead in synthetic scan-heavy
workloads (I think this is a fair price to pay for all the i/o
savings).  The degenerate case of repeated 'rollups' is really
difficult to generate, even synthetically -- if the cache is
performing lousily, the regular hint bit action tends to protect it.
Performance testing under real workloads is going to give better info
here.

merlin


hbache.patch
Description: Binary data



[HACKERS] Visibility map and hint bits

2011-05-05 Thread Bruce Momjian
There has been a lot of recent discussion about the visibility map (for
index-only scans) and hint bits (trying to avoid double-writing a
table).

I wonder if we could fix both of these at the same time.  Once the
visibility map is reliable, can we use that to avoid updating the hint
bits on all rows on a page?

For bulk loads, all the pages are going to have the same xid and all be
visible, so instead of writing the entire table, we just write the
visibility map.

I think the problem is that we have the PD_ALL_VISIBLE page flag, which
requires a write of the page as well.  Could we get by with only the
visibility bits and remove PD_ALL_VISIBLE?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Merlin Moncure
On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
 There has been a lot of recent discussion about the visibility map (for
 index-only scans) and hint bits (trying to avoid double-writing a
 table).

I still think a small tqual.c maintained cache of hint bits will
effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
fired a shot across the bow regarding the general worthiness of that
technique though (see:
http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
:(.  I can rig up a cleaned up version of the patch pretty
easily...it's a local change and fairly simple.

I don't think there is any way to remove the hint bits without
suffering some other problem.

merlin



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Bruce Momjian
Merlin Moncure wrote:
 On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
  There has been a lot of recent discussion about the visibility map (for
  index-only scans) and hint bits (trying to avoid double-writing a
  table).
 
 I still think a small tqual.c maintained cache of hint bits will
 effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
 fired a shot across the bow regarding the general worthiness of that
 technique though (see:
 http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
 :(.  I can rig up a cleaned up version of the patch pretty
 easily...it's a local change and fairly simple.
 
 I don't think there is any way to remove the hint bits without
 suffering some other problem.

Was that the idea that the pages had to fit in the cache and be updated
with hint bits before being written to disk?  Restricting that to the
size of the buffer cache seemed very limiting.

One 8k visibility map page can hold bits for 1/2 gig of heap pages so I
thought that would be a better all-visible indicator and avoid many
all-visible page writes in bulk load cases.
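(Checking the arithmetic: an 8k map page holds 8192 * 8 = 65536 bits, and
at one bit per 8k heap page that covers 65536 * 8kB = 512MB of heap.)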

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Robert Haas
On Thu, May 5, 2011 at 12:59 PM, Bruce Momjian br...@momjian.us wrote:
 I wonder if we could fix both of these at the same time.  Once the
 visibility map is reliable, can we use that to avoid updating the hint
 bits on all rows on a page?

I don't think so.  There are two problems:

1. If there is a long-running transaction on the system, it will not
be possible to set PD_ALL_VISIBLE, but hint bits can still be set.  So
there could be a significant performance regression if we don't set
hint bits in that case.

2. Making the visibility map crash-safe will mean making setting hint
bits emit XLOG records, so it can't be done on Hot Standby servers at
all, and it's much more expensive than just setting a hint bit on the
master.

 For bulk loads, all the pages are going to have the same xid and all be
 visible, so instead of writing the entire table, we just write the
 visibility map.

 I think the problem is that we have the PD_ALL_VISIBLE page flag, which
 requires a write of the page as well.  Could we get by with only the
 visibility bits and remove PD_ALL_VISIBLE?

In some ways, that would make things much simpler.  But to make that
work, every insert/update/delete to a page would have to pin the
visibility map page and clear PD_ALL_VISIBLE if appropriate, so it
might not be good from a performance standpoint, especially in
high-concurrency workloads.  Right now, if PD_ALL_VISIBLE isn't set,
we don't bother touching the visibility map page, which seems like a
possibly important optimization.
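For reference, the way a writer interacts with the map today is roughly
this (paraphrasing from memory, not the exact code):

    /* e.g. in heap_update, after locking the heap page: */
    if (PageIsAllVisible(page))
    {
        PageClearAllVisible(page);
        visibilitymap_clear(relation, BufferGetBlockNumber(buffer));
    }
    /* when PD_ALL_VISIBLE is already clear, the map page is never pinned;
     * dropping the flag would force a map access on every modification */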

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Merlin Moncure
On Thu, May 5, 2011 at 1:34 PM, Bruce Momjian br...@momjian.us wrote:
 Merlin Moncure wrote:
 On Thu, May 5, 2011 at 11:59 AM, Bruce Momjian br...@momjian.us wrote:
  There has been a lot of recent discussion about the visibility map (for
  index-only scans) and hint bits (trying to avoid double-writing a
  table).

 I still think a small tqual.c maintained cache of hint bits will
 effectively eliminate hint bit i/o issues surrounding bulk loads.  Tom
 fired a shot across the bow regarding the general worthiness of that
 technique though (see:
 http://postgresql.1045698.n5.nabble.com/Process-local-hint-bit-cache-td4270229.html)
 :(.  I can rig up a cleaned up version of the patch pretty
 easily...it's a local change and fairly simple.

 I don't think there is any way to remove the hint bits without
 suffering some other problem.

 Was that the idea that the pages had to fit in the cache and be updated
 with hint bits before being written to disk?  Restricting that to the
 size of the buffer cache seemed very limiting.

 One 8k visibilty map page can hold bits for 1/2 gig of heap pages so I
 thought that would be a better all-visible indictor and avoid many
 all-visible page writes in bulk load cases.

No, that was my first idea -- check visibility when you evict.  That
helps a different problem but not bulk loads.  One way it could help
is for marking PD_ALL_VISIBLE.  This might also be a winner but there
is some valid skepticism that adding more work for bgwriter is really
a good idea.

The tqual cache idea is such that there is a small cache that
remembers the commit/cancel status of recently seen transactions.  If
you scan a tuple and the status is known via the cache, you set the bit
but don't mark the page dirty.  That way, if you are scanning a lot of
unhinted tuples with similar xids, you don't need to jam out i/o.  I
think the general concept is clean, but it might need some buy-in from
Tom and some performance testing for justification.

The alternate 'cleaner' approach of maintaining a larger transam.c cache
had some downsides I saw no simple workaround for.

merlin



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Kevin Grittner
Merlin Moncure mmonc...@gmail.com wrote:
 
 a small cache that remembers the commit/cancel status of recently
 seen transactions.
 
How is that different from the head of the clog SLRU?
 
-Kevin



Re: [HACKERS] Visibility map and hint bits

2011-05-05 Thread Merlin Moncure
On Thu, May 5, 2011 at 2:00 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Merlin Moncure mmonc...@gmail.com wrote:

 a small cache that remembers the commit/cancel status of recently
 seen transactions.

 How is that different from the head of the clog SLRU?

several things:
*) any slru access requires a lock (besides the lock itself, you are
spending cycles in the critical path)
*) cache access happens at a different stage of processing in
HeapTupleSatisfiesMVCC: both TransactionIdIsCurrentTransactionId and
TransactionIdIsInProgress have to be checked first. Logically, it's an
extension of the hint bit check itself, not an expansion of lower
levels of caching
*) in tqual.c you can sneak in some small optimizations like only
caching the bit if it's known good in the WAL (XlogNeedsFlush).  That
way you don't need to keep checking it over and over for the same
transaction
*) slru level accesses happen too late to give much benefit:

I can't stress enough how tight HeapTupleSatisfiesMVCC is.  On my
workstation VM, each non-inline function call shows up measurably in
profiling.  I think anything you do here has to be inline, hand
rolled, and very tight (you can forget anything around dynahash).
Delegating the cache management to the transam or (even worse) slru
level penalizes some workloads non-trivially.
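
For concreteness, the shape of the thing I have in mind is roughly this
(illustrative only -- the names and sizing are made up, and the actual
patch differs):

    #define XIDCACHE_SLOTS 64       /* small enough to stay cache-hot */

    typedef struct
    {
        TransactionId xid;
        bool          committed;
    } XidCacheEntry;

    static XidCacheEntry xid_cache[XIDCACHE_SLOTS];

    /* returns true and sets *committed if xid's status is cached */
    static inline bool
    xid_cache_lookup(TransactionId xid, bool *committed)
    {
        XidCacheEntry *e = &xid_cache[xid % XIDCACHE_SLOTS];

        if (e->xid == xid)
        {
            *committed = e->committed;
            return true;
        }
        return false;
    }

    /* store only once the status is safe to cache, e.g. after the
     * XLogNeedsFlush() test mentioned above */
    static inline void
    xid_cache_store(TransactionId xid, bool committed)
    {
        XidCacheEntry *e = &xid_cache[xid % XIDCACHE_SLOTS];

        e->xid = xid;
        e->committed = committed;
    }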

merlin



Re: [HACKERS] visibility map

2010-11-23 Thread 高增琦
Can we just log the change of VM in log_heap_clean() for redo?
Thanks

--
GaoZengqi
pgf...@gmail.com
zengqi...@gmail.com


On Tue, Nov 23, 2010 at 3:24 AM, Robert Haas robertmh...@gmail.com wrote:

 On Mon, Jun 14, 2010 at 1:19 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  I *think* that the answer to this parenthesized question is no.
  When we vacuum a page, we set the LSN on both the heap page and the
  visibility map page.  Therefore, neither of them can get written to
  disk until the WAL record is flushed, but they could get flushed in
  either order.  So the visibility map page could get flushed before the
  heap page, as the non-parenthesized portion of the comment indicates.
 
  Right.
 
  However, at least in theory, it seems like we could fix this up during
  redo.
 
  Setting a bit in the visibility map is currently not WAL-logged, but yes
  once we add WAL-logging, that's straightforward to fix.

 Eh, so.  Suppose - for the sake of argument - we do the following:

 1. Allocate an additional infomask(2) bit that means xmin is frozen,
 no need to call XidInMVCCSnapshot().  When we freeze a tuple, we set
 this bit in lieu of overwriting xmin.  Note that freezing pages is
 already WAL-logged, so redo is possible.

 2. Modify VACUUM so that, when the page is observed to be all-visible,
 it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
 visibility map bit, writing a single XLOG record for the whole
 operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
 vacuum already removed tuples; otherwise and/or when no tuples were
 removed writing XLOG_HEAP2_FREEZE or some new record type).  This
 loses no forensic information because of (1).  (If the page is NOT
 observed to be all-visible, we freeze individual tuples only when they
 hit the current age thresholds.)

 Setting the visibility map bit is now crash-safe.

 Please poke holes.

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




Re: [HACKERS] visibility map

2010-11-23 Thread Heikki Linnakangas

On 22.11.2010 21:24, Robert Haas wrote:

Eh, so.  Suppose - for the sake of argument - we do the following:

1. Allocate an additional infomask(2) bit that means xmin is frozen,
no need to call XidInMVCCSnapshot().  When we freeze a tuple, we set
this bit in lieu of overwriting xmin.  Note that freezing pages is
already WAL-logged, so redo is possible.

2. Modify VACUUM so that, when the page is observed to be all-visible,
it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
visibility map bit, writing a single XLOG record for the whole
operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
vacuum already removed tuples; otherwise and/or when no tuples were
removed writing XLOG_HEAP2_FREEZE or some new record type).  This
loses no forensic information because of (1).  (If the page is NOT
observed to be all-visible, we freeze individual tuples only when they
hit the current age thresholds.)

Setting the visibility map bit is now crash-safe.


That's an interesting idea. You piggyback setting the vm bit on the 
freeze WAL record, on the assumption that you have to write the freeze 
record anyway. However, if that assumption doesn't hold, because the 
tuples are deleted before they reach vacuum_freeze_min_age, it's no 
better than the naive approach of WAL-logging the vm bit set separately. 
Whether that's acceptable or not, I don't know.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] visibility map

2010-11-23 Thread Robert Haas
On Tue, Nov 23, 2010 at 3:42 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 That's an interesting idea. You piggyback setting the vm bit on the freeze
 WAL record, on the assumption that you have to write the freeze record
 anyway. However, if that assumption doesn't hold, because the tuples are
 deleted before they reach vacuum_freeze_min_age, it's no better than the
 naive approach of WAL-logging the vm bit set separately. Whether that's
 acceptable or not, I don't know.

I don't know, either.  I was trying to think of the cases where this
would generate a net increase in WAL before I sent the email, but
couldn't fully wrap my brain around it at the time.  Thanks for
summarizing.

Here's another design to poke holes in:

1. Imagine that the visibility map is divided into granules.  For the
sake of argument let's suppose there are 8K bits per granule; thus
each granule covers 64M of the underlying heap and 1K of space in the
visibility map itself.

2. In shared memory, create a new array called the visibility vacuum
array (VVA), each element of which has room for a backend ID, a
relfilenode, a granule number, and an LSN.  Before setting bits in the
visibility map, a backend is required to allocate a slot in this
array, XLOG the slot allocation, and fill in its backend ID,
relfilenode number, and the granule number whose bits it will be
manipulating, plus the LSN of the slot allocation XLOG record.  It
then sets as many bits within that granule as it likes.  When done, it
sets the backend ID of the VVA slot to InvalidBackendId but does not
remove it from the array immediately; such a slot is said to have been
released.

3. When visibility map bits are set, the LSN of the page is set to the
new-VVA-slot XLOG record, so that the visibility map page can't hit
the disk before the new-VVA-slot XLOG record.  Also, the contents of
the VVA, sans backend IDs, are XLOG'd at each checkpoint.  Thus, on
redo, we can compute a list of all VVA slots for which visibility-bit
changes might already be on disk; we go through and clear both the
visibility map bit and the PD_ALL_VISIBLE bits on the underlying
pages.

4. To free a VVA slot that has been released, we must flush XLOG as far
as the record that allocated the slot and sync the visibility map and
heap segments containing that granule.  Thus, all slots released
before a checkpoint starts can be freed after it completes.
Alternatively, an individual backend can free a previously-released
slot by performing the xlog flush and the syncs itself.  (This might
require a few more bookkeeping details to be stored in the VVA, but it
seems manageable.)
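
(To make the bookkeeping concrete, a slot might look something like this;
the field names are made up:

    typedef struct VisibilityVacuumSlot
    {
        BackendId   backendId;  /* owner, or InvalidBackendId once released */
        RelFileNode rnode;      /* relation whose map bits are being set */
        BlockNumber granule;    /* which 64MB granule of the heap */
        XLogRecPtr  lsn;        /* LSN of the slot-allocation XLOG record */
    } VisibilityVacuumSlot;
)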

One problem with this design is that the visibility map bits never get
set on standby servers.  If we don't XLOG setting the bit then I
suppose that doesn't happen now either, but it's more sucky (that's
the technical term) if you're relying on it for index-only scans
(which are also relevant on the standby, either during HS or if
promoted) versus if you're only relying on it for vacuum (which
doesn't happen on the standby anyway unless and until it's promoted).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] visibility map

2010-11-22 Thread Robert Haas
On Mon, Jun 14, 2010 at 1:19 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 I *think* that the answer to this parenthesized question is no.
 When we vacuum a page, we set the LSN on both the heap page and the
 visibility map page.  Therefore, neither of them can get written to
 disk until the WAL record is flushed, but they could get flushed in
 either order.  So the visibility map page could get flushed before the
 heap page, as the non-parenthesized portion of the comment indicates.

 Right.

 However, at least in theory, it seems like we could fix this up during
 redo.

 Setting a bit in the visibility map is currently not WAL-logged, but yes
 once we add WAL-logging, that's straightforward to fix.

Eh, so.  Suppose - for the sake of argument - we do the following:

1. Allocate an additional infomask(2) bit that means xmin is frozen,
no need to call XidInMVCCSnapshot().  When we freeze a tuple, we set
this bit in lieu of overwriting xmin.  Note that freezing pages is
already WAL-logged, so redo is possible.

2. Modify VACUUM so that, when the page is observed to be all-visible,
it will freeze all tuples on the page, set PD_ALL_VISIBLE, and set the
visibility map bit, writing a single XLOG record for the whole
operation (possibly piggybacking on XLOG_HEAP2_CLEAN if the same
vacuum already removed tuples; otherwise and/or when no tuples were
removed writing XLOG_HEAP2_FREEZE or some new record type).  This
loses no forensic information because of (1).  (If the page is NOT
observed to be all-visible, we freeze individual tuples only when they
hit the current age thresholds.)

Setting the visibility map bit is now crash-safe.
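
To make step 1 concrete, the fast path in the visibility checks would look
something like this (the bit name and value are invented; no such bit
exists today):

    /* in htup.h: claim a currently-unused t_infomask2 bit */
    #define HEAP_XMIN_FROZEN    0x2000

    /* in the tqual.c routines, before any clog or snapshot lookups: */
    if (tuple->t_infomask2 & HEAP_XMIN_FROZEN)
        return true;    /* xmin known committed and visible to everyone,
                         * without having overwritten xmin itself */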

Please poke holes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] visibility map

2010-06-13 Thread Robert Haas
visibilitymap.c begins with a long and useful comment - but this part
seems to have a bit of split personality disorder.

 * Currently, the visibility map is not 100% correct all the time.
 * During updates, the bit in the visibility map is cleared after releasing
 * the lock on the heap page. During the window between releasing the lock
 * and clearing the bit in the visibility map, the bit in the visibility map
 * is set, but the new insertion or deletion is not yet visible to other
 * backends.
 *
 * That might actually be OK for the index scans, though. The newly inserted
 * tuple wouldn't have an index pointer yet, so all tuples reachable from an
 * index would still be visible to all other backends, and deletions wouldn't
 * be visible to other backends yet.  (But HOT breaks that argument, no?)

I believe that the answer to the parenthesized question here is yes
(in which case we might want to just delete this paragraph).

 * There's another hole in the way the PD_ALL_VISIBLE flag is set. When
 * vacuum observes that all tuples are visible to all, it sets the flag on
 * the heap page, and also sets the bit in the visibility map. If we then
 * crash, and only the visibility map page was flushed to disk, we'll have
 * a bit set in the visibility map, but the corresponding flag on the heap
 * page is not set. If the heap page is then updated, the updater won't
 * know to clear the bit in the visibility map.  (Isn't that prevented by
 * the LSN interlock?)

I *think* that the answer to this parenthesized question is no.
When we vacuum a page, we set the LSN on both the heap page and the
visibility map page.  Therefore, neither of them can get written to
disk until the WAL record is flushed, but they could get flushed in
either order.  So the visibility map page could get flushed before the
heap page, as the non-parenthesized portion of the comment indicates.
However, at least in theory, it seems like we could fix this up during
redo.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [HACKERS] visibility map

2010-06-13 Thread Heikki Linnakangas

On 14/06/10 06:08, Robert Haas wrote:

visibilitymap.c begins with a long and useful comment - but this part
seems to have a bit of split personality disorder.

  * Currently, the visibility map is not 100% correct all the time.
  * During updates, the bit in the visibility map is cleared after releasing
  * the lock on the heap page. During the window between releasing the lock
  * and clearing the bit in the visibility map, the bit in the visibility map
  * is set, but the new insertion or deletion is not yet visible to other
  * backends.
  *
  * That might actually be OK for the index scans, though. The newly inserted
  * tuple wouldn't have an index pointer yet, so all tuples reachable from an
  * index would still be visible to all other backends, and deletions wouldn't
  * be visible to other backends yet.  (But HOT breaks that argument, no?)

I believe that the answer to the parenthesized question here is yes
(in which case we might want to just delete this paragraph).


A HOT update can only update non-indexed columns, so I think we're still
OK with HOT. To an index-only scan, it doesn't matter which tuple in a
HOT update chain you consider as live, because they must all have the
same value in the indexed columns. Subtle...



  * There's another hole in the way the PD_ALL_VISIBLE flag is set. When
  * vacuum observes that all tuples are visible to all, it sets the flag on
  * the heap page, and also sets the bit in the visibility map. If we then
  * crash, and only the visibility map page was flushed to disk, we'll have
  * a bit set in the visibility map, but the corresponding flag on the heap
  * page is not set. If the heap page is then updated, the updater won't
  * know to clear the bit in the visibility map.  (Isn't that prevented by
  * the LSN interlock?)

I *think* that the answer to this parenthesized question is no.
When we vacuum a page, we set the LSN on both the heap page and the
visibility map page.  Therefore, neither of them can get written to
disk until the WAL record is flushed, but they could get flushed in
either order.  So the visibility map page could get flushed before the
heap page, as the non-parenthesized portion of the comment indicates.


Right.


However, at least in theory, it seems like we could fix this up during
redo.


Setting a bit in the visibility map is currently not WAL-logged, but yes 
once we add WAL-logging, that's straightforward to fix.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map and freezing

2009-01-22 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

ITAGAKI Takahiro wrote:

- What relation is there between autovacuum_freeze_max_age,
  vacuum_freeze_min_age and vacuum_freeze_table_age? If we increase
  one of them, should we also increase the others?


Yeah, that's a fair question. I'll try to work a doc patch to explain 
that better.


Ok, how does this sound:

+   <para>
+    <command>VACUUM</> normally skips pages that don't have any dead row
+    versions, but those pages might still have tuples with old XID values.
+    To replace them too, a scan of the whole table is needed every once
+    in a while. <varname>vacuum_freeze_table_age</> controls when
+    <command>VACUUM</> does that: a whole table sweep is forced if
+    <structfield>relfrozenxid</> is more than
+    <varname>vacuum_freeze_table_age</> transactions old. Setting it to 0
+    makes <command>VACUUM</> ignore the visibility map and always scan all
+    pages.  The effective maximum is 0.95 *
+    <varname>autovacuum_freeze_max_age</>;
+    a setting higher than that will be capped to that maximum. A value
+    higher than <varname>autovacuum_freeze_max_age</> wouldn't make sense
+    because an anti-wraparound autovacuum would be triggered at that point
+    anyway, and the 0.95 multiplier leaves some breathing room to run a
+    manual <command>VACUUM</> before that happens.  As a rule of thumb,
+    <varname>vacuum_freeze_table_age</> should be set to a value somewhat
+    below <varname>autovacuum_freeze_max_age</>. Setting it too close could
+    lead to anti-wraparound autovacuums, even though the table was recently
+    manually vacuumed, whereas lower values lead to more frequent
+    whole-table scans.
+   </para>
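
(With the default autovacuum_freeze_max_age of 200 million, for example,
that cap works out to 0.95 * 200M = 190 million transactions.)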

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map and freezing

2009-01-22 Thread Heikki Linnakangas

Euler Taveira de Oliveira wrote:

Simon Riggs wrote:

On Tue, 2009-01-20 at 21:51 +0200, Heikki Linnakangas wrote:


Attached is a simple patch to only start skipping pages after 20
consecutive pages marked as visible in the visibility map. This doesn't 
do any look-ahead, so it will always scan the first 20 pages of a 
table before it starts to skip pages, and whenever there's even one page 
that needs vacuuming, the next 19 pages will also be vacuumed.


We could adjust that figure 20 according to table size. Or by 
seq_page_cost/random_page_cost. But I'm leaning towards a simple 
hard-coded value for now.

Yes, sounds good. Can we stick to multiples of 2 as the OS readahead
does IIRC? So either 16 or 32. I'd go 32.


Agreed. And make it a constant (ALL_VISIBLE_VM_THRESHOLD?).


Okay-dokay. I committed this with the constant as a #define, at value 32.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map and freezing

2009-01-20 Thread Heikki Linnakangas

ITAGAKI Takahiro wrote:

Gregory Stark st...@enterprisedb.com wrote:


I don't think we can perfectly capture the meaning of these GUCs in the
name. I think our goal should be to avoid confusion between them.

I was thinking it would be clearer if the options which control *when*
autovacuum fires off a worker consistently had some action word in them like
trigger or start or launch.


I think we need more explanations about those variables,
 not only how they work but also how to tune them.
I feel they are un-tunable parameters.

Our documentation says:
| Larger values of these settings
| preserve transactional information longer, while smaller values increase
| the number of transactions that can elapse before the table must be
| vacuumed again.
 i.e., we are explaining the variables only as "Larger is better",
but is it really true?


Yes, that is explicitly explained in the docs:


The sole disadvantage of increasing <varname>vacuum_freeze_table_age</>
and <varname>autovacuum_freeze_max_age</>
is that the <filename>pg_clog</> subdirectory of the database cluster
will take more space, because it must store the commit status for all
transactions back to the <varname>autovacuum_freeze_max_age</> horizon.

 ...



- What relation is there between autovacuum_freeze_max_age,
  vacuum_freeze_min_age and vacuum_freeze_table_age? If we increase
  one of them, should we also increase the others?


Yeah, that's a fair question. I'll try to work a doc patch to explain 
that better.



- Is it ok to increase the variables to maximum values?
  Are there any trade-offs?


Disk space.


- Are there some conditions where whole-table-scanning vacuum is more
  effective than vacuums using visibility map? If so, we should switch
  to full-scan *automatically*, without relying on user configurations.


Hmm, the only downside I can see is that skipping a page here and there 
could defeat the OS read-ahead. Perhaps we should call 
posix_fadvise(SEQUENTIAL) to compensate. Or, we could modify the logic 
to only skip pages when there's at least N consecutive pages that can be 
skipped.
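
(That would presumably look something like this on the heap segment's file
descriptor -- nothing of the sort is in the tree today:

    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

i.e. just a hint to the kernel to keep reading ahead aggressively despite
the occasional skipped page.)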


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map and freezing

2009-01-20 Thread Simon Riggs

On Tue, 2009-01-20 at 11:03 +0200, Heikki Linnakangas wrote:
  - Are there some conditions where whole-table-scanning vacuum is more
    effective than vacuums using visibility map? If so, we should switch
    to full-scan *automatically*, without relying on user configurations.
 
 Hmm, the only downside I can see is that skipping a page here and
 there could defeat the OS read-ahead. Perhaps we should call 
 posix_fadvise(SEQUENTIAL) to compensate. Or, we could modify the logic
 to only skip pages when there's at least N consecutive pages that can
 be skipped.

I would rather we didn't skip any pages at all unless the gains are
significant. Skipping the odd page makes no difference from a
performance perspective but may have a robustness impact.

Significant gains should take into account the size of both heap and
indexes, and recognise that we still scan whole indexes in either case.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Visibility map and freezing

2009-01-20 Thread Heikki Linnakangas

Simon Riggs wrote:

On Tue, 2009-01-20 at 11:03 +0200, Heikki Linnakangas wrote:

- Are there some conditions where whole-table-scanning vacuum is more
  effective than vacuums using visibility map? If so, we should switch
  to full-scan *automatically*, without relying on user configurations.

Hmm, the only downside I can see is that skipping a page here and
there could defeat the OS read-ahead. Perhaps we should call
posix_fadvise(SEQUENTIAL) to compensate. Or, we could modify the logic
to only skip pages when there's at least N consecutive pages that can
be skipped.


I would rather we didn't skip any pages at all unless the gains are
significant. Skipping the odd page makes no difference from a
performance perspective but may have a robustness impact.

Significant gains should take into account the size of both heap and
indexes, and recognise that we still scan whole indexes in either case.


That sounds pretty complex, approaching what the planner does. I'd 
rather keep it simple.


Attached is a simple patch to only start skipping pages after 20 
consecutive pages marked as visible in the visibility map. This doesn't 
do any look-ahead, so it will always scan the first 20 pages of a 
table before it starts to skip pages, and whenever there's even one page 
that needs vacuuming, the next 19 pages will also be vacuumed.


We could adjust that figure 20 according to table size. Or by 
seq_page_cost/random_page_cost. But I'm leaning towards a simple 
hard-coded value for now.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/commands/vacuumlazy.c
--- src/backend/commands/vacuumlazy.c
***
*** 271,276  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 271,277 
  	int			i;
  	PGRUsage	ru0;
  	Buffer		vmbuffer = InvalidBuffer;
+ 	BlockNumber	all_visible_streak;
  
  	pg_rusage_init(ru0);
  
***
*** 292,297  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
--- 293,299 
  
  	lazy_space_alloc(vacrelstats, nblocks);
  
+ 	all_visible_streak = 0;
  	for (blkno = 0; blkno < nblocks; blkno++)
  	{
  		Buffer		buf;
***
*** 309,315  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  
  		/*
  		 * Skip pages that don't require vacuuming according to the
! 		 * visibility map.
  		 */
  		if (!scan_all)
  		{
--- 311,324 
  
  		/*
  		 * Skip pages that don't require vacuuming according to the
! 		 * visibility map. But only if we've seen a streak of at least
! 		 * 20 pages marked as clean. Since we're reading sequentially,
! 		 * the OS should be doing readahead for us and there's no gain
! 		 * in skipping a page now and then. You need a longer run of
! 		 * consecutive skipped pages before it's worthwhile. Also,
! 		 * skipping even a single page means that we can't update
! 		 * relfrozenxid or reltuples, so we only want to do it if
! 		 * there's a good chance to skip a goodly number of pages.
  		 */
  		if (!scan_all)
  		{
***
*** 317,325  lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
  visibilitymap_test(onerel, blkno, &vmbuffer);
  			if (all_visible_according_to_vm)
  			{
! vacrelstats->scanned_all = false;
! continue;
  			}
  		}
  
  		vacuum_delay_point();
--- 326,340 
  visibilitymap_test(onerel, blkno, &vmbuffer);
  			if (all_visible_according_to_vm)
  			{
! all_visible_streak++;
! if (all_visible_streak >= 20)
! {
! 	vacrelstats->scanned_all = false;
! 	continue;
! }
  			}
+ 			else
+ 				all_visible_streak = 0;
  		}
  
  		vacuum_delay_point();



Re: [HACKERS] Visibility map and freezing

2009-01-20 Thread Simon Riggs

On Tue, 2009-01-20 at 21:51 +0200, Heikki Linnakangas wrote:

 Attached is a simple patch to only start skipping pages after 20 
 consecutive pages marked as visible in the visibility map. This doesn't 
 do any look-ahead, so it will always scan the first 20 pages of a 
 table before it starts to skip pages, and whenever there's even one page 
 that needs vacuuming, the next 19 pages will also be vacuumed.
 
 We could adjust that figure 20 according to table size. Or by 
 seq_page_cost/random_page_cost. But I'm leaning towards a simple 
 hard-coded value for now.

Yes, sounds good. Can we stick to multiples of 2 as the OS readahead
does IIRC? So either 16 or 32. I'd go 32.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support




Re: [HACKERS] Visibility map and freezing

2009-01-20 Thread Euler Taveira de Oliveira
Simon Riggs wrote:
 On Tue, 2009-01-20 at 21:51 +0200, Heikki Linnakangas wrote:
 
 Attached is a simple patch to only start skipping pages after 20 
 consecutive pages marked as visible in the visibility map. This doesn't 
 do any look-ahead, so it will always scan the first 20 pages of a 
 table before it starts to skip pages, and whenever there's even one page 
 that needs vacuuming, the next 19 pages will also be vacuumed.

 We could adjust that figure 20 according to table size. Or by 
 seq_page_cost/random_page_cost. But I'm leaning towards a simple 
 hard-coded value for now.
 
 Yes, sounds good. Can we stick to multiples of 2 as the OS readahead
 does IIRC? So either 16 or 32. I'd go 32.
 
Agreed. And make it a constant (ALL_VISIBLE_VM_THRESHOLD?).


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/



Re: [HACKERS] Visibility map and freezing

2009-01-18 Thread ITAGAKI Takahiro

Gregory Stark st...@enterprisedb.com wrote:

  I don't think we can perfectly capture the meaning of these GUCs in the
  name. I think our goal should be to avoid confusion between them.
 
 I was thinking it would be clearer if the options which control *when*
 autovacuum fires off a worker consistently had some action word in them like
 trigger or start or launch.

I think we need more explanations about those variables,
not only how they work but also how to tune them.
I feel they are un-tunable parameters.

Our documentation says:
| Larger values of these settings
| preserve transactional information longer, while smaller values increase
| the number of transactions that can elapse before the table must be
| vacuumed again.
i.e., we are explaining the variables only as "Larger is better",
but is it really true?

I think we should have answers to the following questions:

- What relation are there between autovacuum_freeze_max_age,
  vacuum_freeze_min_age and vacuum_freeze_table_age? If we increase
  one of them, should we also increase the others?

- Is it ok to increase the variables to maximum values?
  Are there any trade-offs?

- Are there some conditions where whole-table-scanning vacuum is more
  effective than vacuums using visibility map? If so, we should switch
  to full-scan *automatically*, without relying on user configurations.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Visibility map and freezing

2009-01-16 Thread Gregory Stark
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:

 Jeff Davis wrote:
 On Fri, 2009-01-09 at 13:49 +0200, Heikki Linnakangas wrote:

 I'm now leaning towards:

 autovacuum_freeze_max_age
 vacuum_freeze_table_age
 vacuum_freeze_min_age

 where autovacuum_freeze_max_age and vacuum_freeze_min_age are unchanged, and
 vacuum_freeze_table_age is the new setting that controls when VACUUM or
 autovacuum should perform a full scan of the table to advance relfrozenxid.

 I'm still bothered by the fact that max and min really mean the same
 thing here.

 Yeah. Those are existing names, though, and I don't recall any complaints from
 users.

 I don't think we can perfectly capture the meaning of these GUCs in the
 name. I think our goal should be to avoid confusion between them.

 Agreed.

I was thinking it would be clearer if the options which control *when*
autovacuum fires off a worker consistently had some action word in them like
trigger or start or launch.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!



Re: [HACKERS] Visibility map and freezing

2009-01-16 Thread Heikki Linnakangas

Alvaro Herrera wrote:

Heikki Linnakangas wrote:

This patch adds a new column to pg_autovacuum, reflecting the  new
vacuum_freeze_table_age GUC just like freeze_min_age column reflects
vacuum_freeze_min_age. If I commit this tomorrow, will I cause you a
lot  of trouble with the reloptions patch?


No -- go ahead, I'll merge it.


Ok, committed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map, partial vacuums

2009-01-15 Thread Bruce Momjian
Heikki Linnakangas wrote:
  Also, is anything being done about the concern about 'vacuum storm'
  explained below?
  
  I'm interested too.
 
 The additional vacuum_freeze_table_age (as I'm now calling it) setting 
 I discussed in a later thread should alleviate that somewhat. When a 
 table is autovacuumed, the whole table is scanned to freeze tuples if 
 it's older than vacuum_freeze_table_age, and relfrozenxid is advanced. 
 When different tables reach the autovacuum threshold at different times, 
 they will also have their relfrozenxids set to different values. And in 
 fact no anti-wraparound vacuum is needed.
 
 That doesn't help with read-only or insert-only tables, but that's not a 
 new problem.

OK, is this targeted for 8.4?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Visibility map, partial vacuums

2009-01-15 Thread Bruce Momjian
Gregory Stark wrote:
 Bruce Momjian br...@momjian.us writes:
 
  Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
  when our wraparound limit is around 2B?
 
 I suggested raising it dramatically in the post you quote and Heikki pointed
 out that it controls the maximum amount of space the clog will take. Raising it to,
 say, 800M will mean up to 200MB of space which might be kind of annoying for a
 small database.
 
 It would be nice if we could ensure the clog got trimmed frequently enough on
 small databases that we could raise the max_age. It's really annoying to see
 all these vacuums running 10x more often than necessary.

I always assumed that it was our 4-byte xid that was requiring our
vacuum freeze, but I now see our limiting factor is the size of clog;
interesting.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Visibility map, partial vacuums

2009-01-15 Thread Heikki Linnakangas

Bruce Momjian wrote:

Heikki Linnakangas wrote:

Also, is anything being done about the concern about 'vacuum storm'
explained below?

I'm interested too.
The additional vacuum_freeze_table_age (as I'm now calling it) setting 
I discussed in a later thread should alleviate that somewhat. When a 
table is autovacuumed, the whole table is scanned to freeze tuples if 
it's older than vacuum_freeze_table_age, and relfrozenxid is advanced. 
When different tables reach the autovacuum threshold at different times, 
they will also have their relfrozenxids set to different values. And in 
fact no anti-wraparound vacuum is needed.


That doesn't help with read-only or insert-only tables, but that's not a 
new problem.


OK, is this targeted for 8.4?


Yes. It's been on my todo list for a long time, and I've also added it 
to the Open Items list so that we don't lose track of it.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Visibility map and freezing

2009-01-15 Thread Heikki Linnakangas

Jeff Davis wrote:

On Fri, 2009-01-09 at 13:49 +0200, Heikki Linnakangas wrote:
If the distinction you're making is that autovacuum_freeze_max_age
affects the launching of a vacuum rather than the behavior of a vacuum,
maybe we could incorporate the word launch like:

autovacuum_launch_freeze_threshold


Hmm, I think I'd like it in the form autovacuum_freeze_launch_threshold 
more.



I'm now leaning towards:

autovacuum_freeze_max_age
vacuum_freeze_table_age
vacuum_freeze_min_age

where autovacuum_freeze_max_age and vacuum_freeze_min_age are unchanged, 
and vacuum_freeze_table_age is the new setting that controls when VACUUM 
or autovacuum should perform a full scan of the table to advance 
relfrozenxid.


I'm still bothered by the fact that max and min really mean the same
thing here.


Yeah. Those are existing names, though, and I don't recall any 
complaints from users.



I don't think we can perfectly capture the meaning of these GUCs in the
name. I think our goal should be to avoid confusion between them.


Agreed.

Well, for better or worse here's a patch leaving the existing setting 
names alone, and the new GUC is called vacuum_freeze_table_age. I'm 
not opposed to changing the names of the existing GUCs. If we do change 
them, the columns in pg_autovacuum need to be changed too.


There are documentation changes included to describe the new GUC, and the
new behavior of VACUUM with the visibility map in general. Is it readable,
and is it enough?


Alvaro, are you getting rid of pg_autovacuum in favor of the reloptions 
for 8.4? This patch adds a new column to pg_autovacuum, reflecting the 
new vacuum_freeze_table_age GUC just like freeze_min_age column reflects 
vacuum_freeze_min_age. If I commit this tomorrow, will I cause you a lot 
of trouble with the reloptions patch?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** doc/src/sgml/catalogs.sgml
--- doc/src/sgml/catalogs.sgml
***
*** 1361,1366 
--- 1361,1373 
     <entry></entry>
     <entry>Custom <varname>autovacuum_freeze_max_age</> parameter</entry>
    </row>
+ 
+  <row>
+   <entry><structfield>freeze_table_age</structfield></entry>
+   <entry><type>integer</type></entry>
+   <entry></entry>
+   <entry>Custom <varname>vacuum_freeze_table_age</> parameter</entry>
+  </row>
  </tbody>
 </tgroup>
</table>
*** doc/src/sgml/config.sgml
--- doc/src/sgml/config.sgml
***
*** 3950,3955  COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
--- 3950,3975 
    </listitem>
   </varlistentry>
  
+  <varlistentry id="guc-vacuum-freeze-max-age" xreflabel="vacuum_freeze_max_age">
+   <term><varname>vacuum_freeze_max_age</varname> (<type>integer</type>)</term>
+   <indexterm>
+    <primary><varname>vacuum_freeze_max_age</> configuration parameter</primary>
+   </indexterm>
+   <listitem>
+    <para>
+     <command>VACUUM</> performs a whole-table scan if the table's
+     <structname>pg_class</>.<structfield>relfrozenxid</> field reaches the
+     age specified by this setting.  The default is 150 million
+     transactions.  Although users can set this value anywhere from zero to
+     one billion, <command>VACUUM</> will silently limit the effective value
+     to the value of <xref linkend="guc-autovacuum-freeze-max-age"> minus
+     1 million transactions, so that regular manual <command>VACUUM</> has a
+     chance to run before autovacuum is launched to prevent XID wraparound.
+     For more information see <xref linkend="vacuum-for-wraparound">.
+    </para>
+   </listitem>
+  </varlistentry>
+ 
   <varlistentry id="guc-vacuum-freeze-min-age" xreflabel="vacuum_freeze_min_age">
    <term><varname>vacuum_freeze_min_age</varname> (<type>integer</type>)</term>
    <indexterm>
***
*** 3960,3966  COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
      Specifies the cutoff age (in transactions) that <command>VACUUM</>
      should use to decide whether to replace transaction IDs with
      <literal>FrozenXID</> while scanning a table.
!     The default is 100 million transactions.  Although
      users can set this value anywhere from zero to one billion,
      <command>VACUUM</> will silently limit the effective value to half
      the value of <xref linkend="guc-autovacuum-freeze-max-age">, so
--- 3980,3986 
      Specifies the cutoff age (in transactions) that <command>VACUUM</>
      should use to decide whether to replace transaction IDs with
      <literal>FrozenXID</> while scanning a table.
!     The default is 50 million transactions.  Although
      users can set this value anywhere from zero to one billion,
      <command>VACUUM</> will silently limit the effective value to half
      the value of <xref linkend="guc-autovacuum-freeze-max-age">, so
***
*** 3971,3976 
--- 3991,4017 
    </listitem>
   </varlistentry>
  
+  <varlistentry 

Re: [HACKERS] Visibility map and freezing

2009-01-15 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Alvaro, are you getting rid of pg_autovacuum in favor of the reloptions  
 for 8.4?

That's the intention, yes.

 This patch adds a new column to pg_autovacuum, reflecting the  new
 vacuum_freeze_table_age GUC just like freeze_min_age column reflects
 vacuum_freeze_min_age. If I commit this tomorrow, will I cause you a
 lot  of trouble with the reloptions patch?

No -- go ahead, I'll merge it.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Visibility map, partial vacuums

2009-01-14 Thread Bruce Momjian

Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
when our wraparound limit is around 2B?

Also, is anything being done about the concern about 'vacuum storm'
explained below?

---

Gregory Stark wrote:
 Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 
  Hmm. It just occurred to me that I think this circumvented the
  anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid
  anymore. We'll need to disable the skipping when autovacuum is triggered
  to prevent wraparound. VACUUM FREEZE does that already, but it's
  unnecessarily aggressive in freezing.
 
 Having seen how the anti-wraparound vacuums work in the field I think merely
 replacing it with a regular vacuum which covers the whole table will not
 actually work well.
 
 What will happen is that, because nothing else is advancing the relfrozenxid,
 the age of the relfrozenxid for all tables will advance until they all hit
 autovacuum_max_freeze_age. Quite often all the tables were created around the
 same time so they will all hit autovacuum_max_freeze_age at the same time.
 
 So a database which was operating fine and receiving regular vacuums at a
 reasonable pace will suddenly be hit by vacuums for every table all at the
 same time, 3 at a time. If you don't have vacuum_cost_delay set that will
 cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
 the small busy tables from getting vacuumed regularly due to the backlog in
 anti-wraparound vacuums.
 
 Worse, vacuum will set the freeze_xid to nearly the same value for all of the
 tables. So it will all happen again in another 100M transactions. And again in
 another 100M transactions, and again...
 
 I think there are several things which need to happen here.
 
 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.
 
 2) Include a factor which spreads out the anti-wraparound freezes in the
autovacuum launcher. Some ideas:
 
 . we could implicitly add random(vacuum_freeze_min_age) to the
   autovacuum_max_freeze_age. That would spread them out evenly over 100M
   transactions.
 
 . we could check if another anti-wraparound vacuum is still running and
   implicitly add a vacuum_freeze_min_age penalty to the
   autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
   would spread them out without introducing non-determinism, which
   seems better.
 
 . we could leave autovacuum_max_freeze_age and instead pick a semi-random
   vacuum_freeze_min_age. This would mean the first set of anti-wraparound
   vacuums would still be synchronized but subsequent ones might be spread
   out somewhat. There's not as much room to randomize this though and it
   would affect how much i/o vacuum did which makes it seem less palatable
   to me.
 
 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
people are setting it to unreasonably high values which results in their
vacuums never completing. Actually I think what we should do is junk all
the existing parameters and replace it with a vacuum_nice_level or
vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
the other parameters as internal parameters.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
   Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
 training!
 
 -- 
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2009-01-14 Thread Andrew Dunstan



Bruce Momjian wrote:

Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
when our wraparound limit is around 2B?
  


Presumably because of this (from the docs):

The commit status uses two bits per transaction, so if 
autovacuum_freeze_max_age has its maximum allowed value of a little less 
than two billion, pg_clog can be expected to grow to about half a gigabyte.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2009-01-14 Thread Bruce Momjian
Andrew Dunstan wrote:
 
 
 Bruce Momjian wrote:
  Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
  when our wraparound limit is around 2B?

 
 Presumably because of this (from the docs):
 
 The commit status uses two bits per transaction, so if 
 autovacuum_freeze_max_age has its maximum allowed value of a little less 
 than two billion, pg_clog can be expected to grow to about half a gigabyte.

Oh, that's interesting;  thanks.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2009-01-14 Thread Gregory Stark
Bruce Momjian br...@momjian.us writes:

 Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
 when our wraparound limit is around 2B?

I suggested raising it dramatically in the post you quote, and Heikki pointed
out that it controls the maximum amount of space the clog will take. Raising it
to, say, 800M will mean up to 200MB of space, which might be kind of annoying
for a small database.

It would be nice if we could ensure the clog got trimmed frequently enough on
small databases that we could raise the max_age. It's really annoying to see
all these vacuums running 10x more often than necessary.

The rest of the thread is visible at the bottom of:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/107525

 Also, is anything being done about the concern about 'vacuum storm'
 explained below?

I'm interested too.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2009-01-14 Thread Heikki Linnakangas

Gregory Stark wrote:

Bruce Momjian br...@momjian.us writes:


Would someone tell me why 'autovacuum_freeze_max_age' defaults to 200M
when our wraparound limit is around 2B?


I suggested raising it dramatically in the post you quote, and Heikki pointed
out that it controls the maximum amount of space the clog will take. Raising it
to, say, 800M will mean up to 200MB of space, which might be kind of annoying
for a small database.

It would be nice if we could ensure the clog got trimmed frequently enough on
small databases that we could raise the max_age. It's really annoying to see
all these vacuums running 10x more often than necessary.


Well, if it's a small database, you might as well just vacuum it.


The rest of the thread is visible at the bottom of:

http://article.gmane.org/gmane.comp.db.postgresql.devel.general/107525


Also, is anything being done about the concern about 'vacuum storm'
explained below?


I'm interested too.


The additional vacuum_freeze_table_age (as I'm now calling it) setting 
I discussed in a later thread should alleviate that somewhat. When a 
table is autovacuumed, the whole table is scanned to freeze tuples if 
it's older than vacuum_freeze_table_age, and relfrozenxid is advanced. 
When different tables reach the autovacuum threshold at different times, 
they will also have their relfrozenxids set to different values. And in 
fact no anti-wraparound vacuum is needed.
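
As a rough sketch, the decision boils down to something like this (assumed
helper name, not the actual patch code; wraparound-safe XID arithmetic is
glossed over):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

static bool
scan_whole_table(TransactionId relfrozenxid,
                 TransactionId next_xid,
                 uint32_t vacuum_freeze_table_age)
{
    /* how many transactions ago the table's relfrozenxid was set */
    uint32_t age = next_xid - relfrozenxid;

    return age > vacuum_freeze_table_age;
}

int main(void)
{
    /* relfrozenxid set 160M xacts ago, threshold 150M: scan everything */
    printf("%s\n", scan_whole_table(1000u, 160001000u, 150000000u)
                   ? "whole-table scan" : "partial scan");
    return 0;
}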


That doesn't help with read-only or insert-only tables, but that's not a 
new problem.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2009-01-09 Thread Heikki Linnakangas

Jeff Davis wrote:

On Wed, 2009-01-07 at 09:34 +0200, Heikki Linnakangas wrote:

autovacuum_freeze_max_age -> autovacuum_freeze_scan_age
vacuum_freeze_max_age     -> vacuum_freeze_scan_age
vacuum_freeze_min_age     -> vacuum_freeze_tuple_age

*_scan_age settings control when the table is fully scanned to freeze 
tuples and advance relfrozenxid, and vacuum_freeze_tuple_age controls 
how old a tuple needs to be to be frozen. One objection is that you can 
read freeze_scan to mean that a scan is frozen, like a tuple is 
frozen. Any better ideas?


I see what you mean about the possible misinterpretation, but I think
it's a big improvement, and I don't have a better suggestion.


Thinking about this some more, I'm not too happy with those names 
either. vacuum_freeze_scan_age and autovacuum_freeze_scan_age don't mean 
quite the same thing, the way vacuum_cost_delay and 
autovacuum_vacuum_cost_delay do, for example.


I'm now leaning towards:

autovacuum_freeze_max_age
vacuum_freeze_table_age
vacuum_freeze_min_age

where autovacuum_freeze_max_age and vacuum_freeze_min_age are unchanged, 
and vacuum_freeze_table_age is the new setting that controls when VACUUM 
or autovacuum should perform a full scan of the table to advance 
relfrozenxid.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2009-01-09 Thread Jeff Davis
On Fri, 2009-01-09 at 13:49 +0200, Heikki Linnakangas wrote:
 Thinking about this some more, I'm not too happy with those names 
 either. vacuum_freeze_scan_age and autovacuum_freeze_scan_age don't mean 
 quite the same thing, the way vacuum_cost_delay and 
 autovacuum_vacuum_cost_delay do, for example.

If the distinction you're making is that autovacuum_freeze_max_age
affects the launching of a vacuum rather than the behavior of a vacuum,
maybe we could incorporate the word launch like:

autovacuum_launch_freeze_threshold

 I'm now leaning towards:
 
 autovacuum_freeze_max_age
 vacuum_freeze_table_age
 vacuum_freeze_min_age
 
 where autovacuum_freeze_max_age and vacuum_freeze_min_age are unchanged, 
 and vacuum_freeze_table_age is the new setting that controls when VACUUM 
 or autovacuum should perform a full scan of the table to advance 
 relfrozenxid.

I'm still bothered by the fact that max and min really mean the same
thing here.

I don't think we can perfectly capture the meaning of these GUCs in the
name. I think our goal should be to avoid confusion between them.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2009-01-07 Thread Jeff Davis
On Wed, 2009-01-07 at 09:34 +0200, Heikki Linnakangas wrote:
 autovacuum_freeze_max_age -> autovacuum_freeze_scan_age
 vacuum_freeze_max_age     -> vacuum_freeze_scan_age
 vacuum_freeze_min_age     -> vacuum_freeze_tuple_age
 
 *_scan_age settings control when the table is fully scanned to freeze 
 tuples and advance relfrozenxid, and vacuum_freeze_tuple_age controls 
 how old a tuple needs to be to be frozen. One objection is that you can 
 read freeze_scan to mean that a scan is frozen, like a tuple is 
 frozen. Any better ideas?

I see what you mean about the possible misinterpretation, but I think
it's a big improvement, and I don't have a better suggestion.

Thanks,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2009-01-06 Thread Heikki Linnakangas

Jeff Davis wrote:

On Mon, 2008-12-22 at 21:24 +0200, Heikki Linnakangas wrote:
Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the 
whole table and advance relfrozenxid, if relfrozenxid is older than 
vacuum_freeze_max_age.




It's confusing to have two GUCs named vacuum_freeze_min_age and
vacuum_freeze_max_age with incompatible definitions. The former is the
minimum transaction age of a tuple found during the scan of a table,
while the latter is the maximum transaction age of the relfrozenxid of
the table.

If you set vacuum_freeze_max_age to 0, the visibility map is not used to 
skip pages, so you'll get the pre-8.4 old behavior.


Seems like a strange way to turn off visibility maps, and the meaning
doesn't seem to fit with either vacuum_freeze_min_age or
autovacuum_freeze_max_age.

The proposal itself makes sense, but I think we need to decide on some
better names. Right now the meanings of autovacuum_freeze_max_age and
vacuum_freeze_min_age are incompatible, so we're not in a good position,
but there has to be something less confusing.

For one thing, there isn't even a common definition of min or max
between them. They both trigger an event (freezing or vacuuming) when
something (tuple xmin or relfrozenxid) exceeds some number. Why is one
called a min and the other a max?


Yeah, you're right. Fujii's point that it's confusing that you can have a 
min greater than max is also valid.


How about

autovacuum_freeze_max_age -> autovacuum_freeze_scan_age
vacuum_freeze_max_age     -> vacuum_freeze_scan_age
vacuum_freeze_min_age     -> vacuum_freeze_tuple_age

*_scan_age settings control when the table is fully scanned to freeze 
tuples and advance relfrozenxid, and vacuum_freeze_tuple_age controls 
how old a tuple needs to be to be frozen. One objection is that you can 
read freeze_scan to mean that a scan is frozen, like a tuple is 
frozen. Any better ideas?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-23 Thread Jeff Davis
On Mon, 2008-12-22 at 21:24 +0200, Heikki Linnakangas wrote:
 Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the 
 whole table and advance relfrozenxid, if relfrozenxid is older than 
 vacuum_freeze_max_age.
 

It's confusing to have two GUCs named vacuum_freeze_min_age and
vacuum_freeze_max_age with incompatible definitions. The former is the
minimum transaction age of a tuple found during the scan of a table,
while the latter is the maximum transaction age of the relfrozenxid of
the table.

 If you set vacuum_freeze_max_age to 0, the visibility map is not used to 
 skip pages, so you'll get the pre-8.4 old behavior.

Seems like a strange way to turn off visibility maps, and the meaning
doesn't seem to fit with either vacuum_freeze_min_age or
autovacuum_freeze_max_age.

The proposal itself makes sense, but I think we need to decide on some
better names. Right now the meanings of autovacuum_freeze_max_age and
vacuum_freeze_min_age are incompatible, so we're not in a good position,
but there has to be something less confusing.

For one thing, there isn't even a common definition of min or max
between them. They both trigger an event (freezing or vacuuming) when
something (tuple xmin or relfrozenxid) exceeds some number. Why is one
called a min and the other a max?

Regards,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-22 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Peter Eisentraut wrote:

Heikki Linnakangas wrote:
I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map 
is ignored and all pages are scanned.


Would one parameter to control both suffice?  (i.e., rename 
autovacuum_freeze_max_age to vacuum_freeze_max_age)


Imagine that you run a nightly VACUUM from cron, and have autovacuum 
disabled. If autovacuum_freeze_max_age is the same as 
vacuum_freeze_max_age, as soon as that age is reached, an 
anti-wraparound autovacuum is launched. What we'd want to happen is for the 
next nightly VACUUM to do the work instead. So they need to be separate 
settings, with some space between them by default.


Attached is a proposed patch to handle freezing. In a nutshell:

Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the 
whole table and advance relfrozenxid, if relfrozenxid is older than 
vacuum_freeze_max_age.


If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1 
million transactions, it's effectively capped at that value. It doesn't 
make sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, 
because the manual VACUUM wouldn't have a chance to do the full sweep 
before the anti-wraparound autovacuum is launched. The minus one 
million transactions is to give some headroom.


I'm lowering vacuum_freeze_min_age from 100 million transactions to 50 
million, so that the whole-table vacuum doesn't need to run as often. 
Note that since VACUUM normally only scans pages that need vacuuming 
according to the visibility map, tuples on skipped pages are not frozen 
any earlier even though vacuum_freeze_min_age is lower.


To recap, here's the new defaults:
autovacuum_freeze_max_age   200000000
vacuum_freeze_max_age       150000000
vacuum_freeze_min_age        50000000

This means that with defaults, autovacuum will launch a whole-table 
vacuum every 150 million transactions (autovacuum_freeze_max_age - 
vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a 
whole-table vacuum every 100 million transactions.
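
For concreteness, the cap and the resulting whole-table vacuum intervals can
be sketched like this (assumed helper name, not the patch code):

#include <stdio.h>

static long
effective_vacuum_freeze_max_age(long vacuum_max, long autovac_max)
{
    long cap = autovac_max - 1000000;       /* 1M transactions of headroom */
    return vacuum_max > cap ? cap : vacuum_max;
}

int main(void)
{
    long autovac_max = 200000000;   /* autovacuum_freeze_max_age */
    long vacuum_max  = 150000000;   /* vacuum_freeze_max_age */
    long freeze_min  =  50000000;   /* vacuum_freeze_min_age */

    vacuum_max = effective_vacuum_freeze_max_age(vacuum_max, autovac_max);

    /*
     * After a whole-table vacuum, relfrozenxid is roughly the current
     * XID minus freeze_min, so the next whole-table scan comes
     * threshold - freeze_min transactions later.
     */
    printf("autovacuum whole-table scan every ~%ld xacts\n",
           autovac_max - freeze_min);       /* 150 million */
    printf("nightly manual VACUUM whole-table scan every ~%ld xacts\n",
           vacuum_max - freeze_min);        /* 100 million */
    return 0;
}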


vacuum_freeze_max_age also affects autovacuums. If an autovacuum is 
launched on a table to remove dead tuples, and vacuum_freeze_max_age has 
been reached (but not yet autovacuum_freeze_max_age), the autovacuum 
will scan the whole table. I'm not sure if this is desirable, to avoid 
having to launch separate anti-wraparound autovacuums even when there 
aren't many dead tuples, or just confusing.


If you set vacuum_freeze_max_age to 0, the visibility map is not used to 
skip pages, so you'll get the pre-8.4 old behavior.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7493ca9..9848ce0 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3925,6 +3925,26 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   /listitem
  /varlistentry
 
+ varlistentry id=guc-vacuum-freeze-max-age xreflabel=vacuum_freeze_max_age
+  termvarnamevacuum_freeze_max_age/varname (typeinteger/type)/term
+  indexterm
+   primaryvarnamevacuum_freeze_max_age/ configuration parameter/primary
+  /indexterm
+  listitem
+   para
+commandVACUUM/ performs a whole-table scan if the table's
+structnamepg_class/.structfieldrelfrozenxid/ field reaches the
+age specified by this setting.  The default is 150 million
+transactions.  Although users can set this value anywhere from zero to
+one billion, commandVACUUM/ will silently limit the effective value
+to the value of xref linkend=guc-autovacuum-freeze-max-age minus
+1 million transactions, so that regular manual commandVACUUM/ has a
+chance to run before autovacuum is launched to prevent XID wraparound.
+For more information see xref linkend=vacuum-for-wraparound.
+   /para
+  /listitem
+ /varlistentry
+
  varlistentry id=guc-vacuum-freeze-min-age xreflabel=vacuum_freeze_min_age
   termvarnamevacuum_freeze_min_age/varname (typeinteger/type)/term
   indexterm
@@ -3935,7 +3955,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 Specifies the cutoff age (in transactions) that commandVACUUM/
 should use to decide whether to replace transaction IDs with
 literalFrozenXID/ while scanning a table.
-The default is 100 million transactions.  Although
+The default is 50 million transactions.  Although
 users can set this value anywhere from zero to one billion,
 commandVACUUM/ will silently limit the effective value to half
 the value of xref linkend=guc-autovacuum-freeze-max-age, so
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 03ce2e9..c41d464 100644

Re: [HACKERS] Visibility map and freezing

2008-12-22 Thread Fujii Masao
On Tue, Dec 23, 2008 at 4:24 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Heikki Linnakangas wrote:

 Peter Eisentraut wrote:

 Heikki Linnakangas wrote:

 I think we need a threshold similar to autovacuum_freeze_max_age for
 manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and
 relfrozenxid is older than vacuum_freeze_max_age, the visibility map is
 ignored and all pages are scanned.

 Would one parameter to control both suffice?  (i.e., rename
 autovacuum_freeze_max_age to vacuum_freeze_max_age)

 Imagine that you run a nightly VACUUM from cron, and have autovacuum
 disabled. If autovacuum_freeze_max_age is the same as vacuum_freeze_max_age,
 as soon as that age is reached, an anti-wraparound autovacuum is launched. What
 we'd want to happen is for the next nightly VACUUM to do the work instead.
 So they need to be separate settings, with some space between them by
 default.

 Attached is a proposed patch to handle freezing. In a nutshell:

 Introduce new vacuum_freeze_max_age setting. Manual VACUUM will scan the
 whole table and advance relfrozenxid, if relfrozenxid is older than
 vacuum_freeze_max_age.

 If vacuum_freeze_max_age is greater than autovacuum_freeze_max_age - 1
 million transactions, it's effectively capped at that value. It doesn't make
 sense to have vacuum_freeze_max_age > autovacuum_freeze_max_age, because the
 manual VACUUM wouldn't have a chance to do the full sweep before the
 anti-wraparound autovacuum is launched. The minus one million transactions
 is to give some headroom.

 I'm lowering vacuum_freeze_min_age from 100 million transactions to 50
 million, so that the whole-table vacuum doesn't need to run as often. Note
 that since VACUUM normally only scans pages that need vacuuming according to
 the visibility map, tuples on skipped pages are not frozen any earlier even
 though vacuum_freeze_min_age is lower.

 To recap, here's the new defaults:
 autovacuum_freeze_max_age   200000000
 vacuum_freeze_max_age       150000000
 vacuum_freeze_min_age        50000000

 This means that with defaults, autovacuum will launch a whole-table vacuum
 every 150 million transactions (autovacuum_freeze_max_age -
 vacuum_freeze_min_age). If you run a nightly manual VACUUM, you'll get a
 whole-table vacuum every 100 million transactions.

 vacuum_freeze_max_age also affects autovacuums. If an autovacuum is launched
 on a table to remove dead tuples, and vacuum_freeze_max_age has been reached
 (but not yet autovacuum_freeze_max_age), the autovacuum will scan the whole
 table. I'm not sure if this is desirable, to avoid having to launch separate
 anti-wraparound autovacuums even when there aren't many dead tuples, or just
 confusing.

 If you set vacuum_freeze_max_age to 0, the visibility map is not used to
 skip pages, so you'll get the pre-8.4 old behavior.

It seems strange that max can be less than min. Is it worth dividing
the parameter into two (min/max)?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-18 Thread Peter Eisentraut

Heikki Linnakangas wrote:
I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map is 
ignored and all pages are scanned.


Would one parameter to control both suffice?  (i.e., rename 
autovacuum_freeze_max_age to vacuum_freeze_max_age)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-18 Thread Heikki Linnakangas

Peter Eisentraut wrote:

Heikki Linnakangas wrote:
I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map 
is ignored and all pages are scanned.


Would one parameter to control both suffice?  (i.e., rename 
autovacuum_freeze_max_age to vacuum_freeze_max_age)


Imagine that you run a nightly VACUUM from cron, and have autovacuum 
disabled. If autovacuum_freeze_max_age is the same as 
vacuum_freeze_max_age, as soon as that age is reached, an 
anti-wraparound autovacuum is launched. What we'd want to happen is for the 
next nightly VACUUM to do the work instead. So they need to be separate 
settings, with some space between them by default.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-17 Thread Heikki Linnakangas

Tom Lane wrote:

I think your previous sketch is right: suppress update of reltuples (and
relpages) from a partial vacuum scan, and ensure that the analyze phase
is allowed to do it instead if it happens during VACUUM ANALYZE.


We also mustn't reset n_live_tuples in pgstat in partial vacuum. 
Committed a patch to do that.


Thanks for the report!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Visibility map and freezing

2008-12-17 Thread Heikki Linnakangas
The way VACUUM works with the visibility map is that if any pages are 
skipped, relfrozenxid can't be updated. That means that plain VACUUM 
won't advance relfrozenxid, and doesn't protect from XID wraparound.


We discussed this in the context of autovacuum before, and we have that 
covered now. Autovacuum will launch a full-scanning vacuum that advances 
relfrozenxid, when autovacuum_freeze_max_age is reached, and partial 
vacuums otherwise.


Autovacuum will launch anti-wraparound vacuum even if it's otherwise 
disabled. Which is good, but it'll be an unpleasant surprise for someone 
who performs a simple manual database-wide VACUUM, for example, every 
night from a cron job. You could run VACUUM FREEZE, say monthly, to 
force a full-scanning vacuum, but that's unnecessarily aggressive, and 
you need to know about the issue to set that up in the first place.


I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map is 
ignored and all pages are scanned.


This ensures that you don't run into forced anti-wraparound autovacuums 
if you do your VACUUMs manually.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-17 Thread Gregory Stark
Zdenek Kotala zdenek.kot...@sun.com writes:

 What about adding a second bit which marks a frozen page (all tuples have a
 frozen XID)? It should avoid the full scan, but it extends the size of the map.

That would only really work if you have a very static table where entire pages
get frozen and stay frozen long before the freeze_max_age is reached. Even
that wouldn't really work because the partial vacuums would never see those
pages.

One option is to keep a frozenxid per page which would allow us to visit only
pages that need freezing.

A more complex scheme would be to have a bit which indicates that all
non-frozen xids are > relfrozenxid+100M. When we find all the bits set we can
clear them all and bump relfrozenxid by 100M. This would allow regular partial
vacuums to gradually move the frozenxid forward.
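
A rough illustration of that bit scheme (hypothetical names; one bit per heap
page, and wraparound-safe XID arithmetic omitted for brevity):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
#define FREEZE_STEP 100000000u   /* 100M transactions */

/* A page's bit is set once it holds no xids older than
 * relfrozenxid + FREEZE_STEP. */
static TransactionId
maybe_advance_relfrozenxid(TransactionId relfrozenxid,
                           bool *page_bits, int npages)
{
    for (int i = 0; i < npages; i++)
        if (!page_bits[i])       /* some page may still hold older xids */
            return relfrozenxid;

    for (int i = 0; i < npages; i++)
        page_bits[i] = false;    /* start accumulating the next step */

    return relfrozenxid + FREEZE_STEP;
}

int main(void)
{
    bool bits[4] = {true, true, true, true};

    printf("relfrozenxid: %u\n", maybe_advance_relfrozenxid(3u, bits, 4));
    return 0;
}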

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-17 Thread Heikki Linnakangas

Zdenek Kotala wrote:
What about adding a second bit which marks a frozen page (all tuples have a 
frozen XID)? It should avoid the full scan, but it extends the size of the map.


First of all, we'd still have to make the decision of when to scan pages 
that need freezing, and when to only scan pages that have dead tuples.


Secondly, if it's just one bit, we'd have to freeze all tuples on the 
page to set the bit, which is a lot more aggressive than we do now.


Thirdly, those frozen bits would be critical, not just hints like the 
visibility map is right now. Corrupt frozen bits would lead to data 
loss. That means we'd have to solve the problem of how to make sure the 
bits are always cleared when tuples are updated on a page. That's not 
completely crash-proof at the moment, see comments in visibilitymap.c. 
Even if we did solve that, I'd rather live with a hints-only visibility 
map for one release, before we take the plunge and make it a critical 
part. Just in case...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map and freezing

2008-12-17 Thread Zdenek Kotala
What about adding a second bit which marks a frozen page (all tuples have a frozen XID)? 
It should avoid the full scan, but it extends the size of the map.


Zdenek



Heikki Linnakangas napsal(a):
The way VACUUM works with the visibility map is that if any pages are 
skipped, relfrozenxid can't be updated. That means that plain VACUUM 
won't advance relfrozenxid, and doesn't protect from XID wraparound.


We discussed this in the context of autovacuum before, and we have that 
covered now. Autovacuum will launch a full-scanning vacuum that advances 
relfrozenxid, when autovacuum_freeze_max_age is reached, and partial 
vacuums otherwise.


Autovacuum will launch anti-wraparound vacuum even if it's otherwise 
disabled. Which is good, but it'll be an unpleasant surprise for someone 
who performs a simple manual database-wide VACUUM, for example, every 
night from a cron job. You could run VACUUM FREEZE, say monthly, to 
force a full-scanning vacuum, but that's unnecessarily aggressive, and 
you need to know about the issue to set that up in the first place.


I think we need a threshold similar to autovacuum_freeze_max_age for 
manual vacuums as well: vacuum_freeze_max_age. If you run VACUUM, and 
relfrozenxid is older than vacuum_freeze_max_age, the visibility map is 
ignored and all pages are scanned.


This ensures that you don't run into forced anti-wraparound autovacuums 
if you do your VACUUMs manually.





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-15 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Ned T. Crigler wrote:
It appears that the visibility map patch is causing pg_class.reltuples 
to be

set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned 
during

the vacuum?


Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.


Because we use reltuples divided by relpages in the planner, we probably 
shouldn't update relpages either if we don't update reltuples. 
Otherwise, if the table has grown a lot since we last updated reltuples, 
the reltuples / relpages ratio would be less, not more, accurate, if 
relpages is updated to a new higher value but reltuples is not.
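
A small numeric illustration of that point (toy numbers only):

#include <stdio.h>

int main(void)
{
    double reltuples = 1000000.0;  /* stale, from the last full-scan vacuum */
    double old_pages =   10000.0;  /* relpages back then: 100 tuples/page */
    double new_pages =   20000.0;  /* the table has since doubled in size */

    printf("true density (unchanged): %.0f tuples/page\n",
           reltuples / old_pages);
    printf("estimate if only relpages is updated: %.0f tuples/page\n",
           reltuples / new_pages);          /* halved, misleadingly */
    return 0;
}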


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-15 Thread Greg Stark
I wonder if we should switch to keeping reltuplesperpage instead. Then  
a partial vacuum could update it by taking the average number of  
tuples per page for the pages it saw. Perhaps adjusting it to the  
weighted average between the old value and the new value based on how  
many pages were seen.


I suppose there's no reason we can't update reltuples using that same  
logic, though it would be a bit opaque.


--
Greg


On 15 Dec 2008, at 04:01, Heikki Linnakangas heikki.linnakan...@enterprisedb.com 
 wrote:



Heikki Linnakangas wrote:

Ned T. Crigler wrote:
It appears that the visibility map patch is causing  
pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the  
map

indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was  
scanned during

the vacuum?
Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all  
pages.


Because we use reltuples divided by relpages in the planner, we  
probably shouldn't update relpages either if we don't update  
reltuples. Otherwise, if the table has grown a lot since we last  
updated reltuples, the reltuples / relpages ratio would be less, not  
more, accurate, if relpages is updated to a new higher value but  
reltuples is not.


--
 Heikki Linnakangas
 EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-15 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 Greg Stark wrote:
 I wonder if we should switch to keeping reltuplesperpage instead. Then a 
 partial vacuum could update it by taking the average number of tuples 
 per page for the pages it saw. Perhaps adjusting it to the weighted 
 average between the old value and the new value based on how many pages 
 were seen.

 The pages scanned by a partial vacuum aren't a random sample of pages in 
 the table. That would bias the reltuplesperpage value towards those 
 pages that are updated more.

Yeah ... and it's highly likely that repeatedly-updated pages would have
more dead space than never-updated ones, so there'd be a systematic
creep towards underestimation of the total tuple count.

I think your previous sketch is right: suppress update of reltuples (and
relpages) from a partial vacuum scan, and ensure that the analyze phase
is allowed to do it instead if it happens during VACUUM ANALYZE.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-15 Thread Heikki Linnakangas

Greg Stark wrote:
I wonder if we should switch to keeping reltuplesperpage instead. Then a 
partial vacuum could update it by taking the average number of tuples 
per page for the pages it saw. Perhaps adjusting it to the weighted 
average between the old value and the new value based on how many pages 
were seen.


The pages scanned by a partial vacuum aren't a random sample of pages in 
the table. That would bias the reltuplesperpage value towards those 
pages that are updated more.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-15 Thread Greg Smith

On Mon, 15 Dec 2008, Greg Stark wrote:


I wonder if we should switch to keeping reltuplesperpage instead.


It would be preferable not to touch the user side of reltuples if 
possible, since it's the only instant way to get a good estimate of the 
number of rows in a table right now.  That's been a regular application 
technique for at least two years now, since 
http://www.varlena.com/GeneralBits/120.php popularized it.


--
* Greg Smith gsm...@gregsmith.com http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map and reltuples

2008-12-14 Thread Heikki Linnakangas

Ned T. Crigler wrote:

It appears that the visibility map patch is causing pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned during
the vacuum?


Yeah, vacuum shouldn't overwrite reltuples if it hasn't scanned all pages.

The interplay of vacuum and analyze in VACUUM ANALYZE needs to be 
changed too. Currently, the analyze after vacuum doesn't overwrite 
reltuples, because the one calculated by vacuum is based on scanning all 
pages, and is thus more accurate than the one estimated from the sample 
(which is not true anymore, as you pointed out). I think the vacuum 
needs to somehow tell analyze whether it updated reltuples or not, so 
that analyze can update reltuples if the vacuum didn't scan all pages.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] visibility map and reltuples

2008-12-13 Thread Ned T. Crigler
It appears that the visibility map patch is causing pg_class.reltuples to be
set improperly after a vacuum. For example, it is set to 0 if the map
indicated that no pages in the heap needed to be scanned.

Perhaps reltuples should not be updated unless every page was scanned during
the vacuum?

-- 
Ned T. Crigler

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] visibility map - what do i miss?

2008-12-06 Thread hubert depesz lubaczewski
--- repost to hackers as suggested by RhodiumToad ---

hi,
i tried to test new visibility map feature.

to do so i:
1. fetched postgresql sources from cvs
2. compiled
3. turned autovacuum off
4. started pg
5. ran this queries:
  - CREATE TABLE test_1 (i INT4);
  - CREATE TABLE test_2 (i INT4);
  - CREATE TABLE test_3 (i INT4);
  - CREATE TABLE test_4 (i INT4);
  - INSERT INTO test_1 SELECT generate_series(1, 100000000);
  - INSERT INTO test_2 SELECT generate_series(1, 100000000);
  - INSERT INTO test_3 SELECT generate_series(1, 100000000);
  - INSERT INTO test_4 SELECT generate_series(1, 100000000);
  - UPDATE test_2 SET i = i + 1 WHERE i < 10000000;
  - UPDATE test_3 SET i = i + 1 WHERE i < 50000000;
  - UPDATE test_4 SET i = i + 1 WHERE i < 90000000;
  - VACUUM test_1;
  - VACUUM test_2;
  - VACUUM test_3;
  - VACUUM test_4;

I did it 2 times, first with sources of pg from 1st of november, and
second - with head from yesterday evening (warsaw, poland time).

results puzzled me.

First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 ms

Second run - on head:

Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 ms

Now - as I understand the change - visibility maps should make the second run much 
faster?

Tests were performed on laptop. During first test I used it to browse the web,
read mail. During second test - nobody used the laptop.

Relation forms seem to exist:
# select oid from pg_database where datname = 'depesz';
  oid
---
 16389
(1 row)
# select relfilenode from pg_class where relname ~ 'test_';
 relfilenode
-
   26756
   26759
   26762
   26765
(4 rows)

= ls -l {26756,26759,26762,26765}*
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:31 26756
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:33 26756.1
-rw--- 1 pgdba pgdba 1065066496 2008-12-06 01:34 26756.2
-rw--- 1 pgdba pgdba 811008 2008-12-06 01:34 26756_fsm
-rw--- 1 pgdba pgdba  57344 2008-12-06 01:34 26756_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:35 26759
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:36 26759.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:37 26759.2
-rw--- 1 pgdba pgdba  312582144 2008-12-06 01:39 26759.3
-rw--- 1 pgdba pgdba 892928 2008-12-06 01:39 26759_fsm
-rw--- 1 pgdba pgdba  57344 2008-12-06 01:39 26759_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:39 26762
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:49 26762.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:41 26762.2
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:42 26762.3
-rw--- 1 pgdba pgdba  523862016 2008-12-06 01:43 26762.4
-rw--- 1 pgdba pgdba1204224 2008-12-06 01:43 26762_fsm
-rw--- 1 pgdba pgdba  81920 2008-12-06 01:53 26762_vm
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:01 26765
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:08 26765.1
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 02:18 26765.2
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:50 26765.3
-rw--- 1 pgdba pgdba 1073741824 2008-12-06 01:51 26765.4
-rw--- 1 pgdba pgdba  735141888 2008-12-06 02:00 26765.5
-rw--- 1 pgdba pgdba1523712 2008-12-06 02:00 26765_fsm
-rw--- 1 pgdba pgdba  98304 2008-12-06 02:18 26765_vm

What do I miss?

Best regards,

depesz

-- 
Linkedin: http://www.linkedin.com/in/depesz  /  blog: http://www.depesz.com/
jid/gtalk: [EMAIL PROTECTED] / aim:depeszhdl / skype:depesz_hdl / gg:6749007

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map - what do i miss?

2008-12-06 Thread Guillaume Smet
On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
[EMAIL PROTECTED] wrote:
 First run - without visibility maps, timing of vacuums:
 Time: 267844.822 ms
 Time: 138854.592 ms
 Time: 305467.950 ms
 Time: 487133.179 ms

 Second run - on head:

 Time: 252218.609 ms
 Time: 234388.763 ms
 Time: 334016.413 ms
 Time: 575698.750 ms

 Now - as I understand the change - visibility maps should make the second run 
 much faster?

If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed to me on IRC that you ran it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.

The second VACUUM should then be faster.

-- 
Guillaume

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] visibility map - what do i miss?

2008-12-06 Thread Heikki Linnakangas

Guillaume Smet wrote:

On Sat, Dec 6, 2008 at 12:53 PM, hubert depesz lubaczewski
[EMAIL PROTECTED] wrote:

First run - without visibility maps, timing of vacuums:
Time: 267844.822 ms
Time: 138854.592 ms
Time: 305467.950 ms
Time: 487133.179 ms

Second run - on head:

Time: 252218.609 ms
Time: 234388.763 ms
Time: 334016.413 ms
Time: 575698.750 ms

Now - as I understand the change - visibility maps should make the second run much 
faster?


If I understand correctly what Heikki explained, not if you run VACUUM
only once (and you confirmed to me on IRC that you ran it only once). It's the
VACUUM which sets the PD_ALL_VISIBLE flag on the pages so the first
VACUUM should be slower with HEAD than with 8.3 as it has far more
work to do.

The second VACUUM should then be faster.


That diagnosis is not quite right, but the prognosis is correct. The 
first VACUUM after the UPDATEs doesn't set the PD_ALL_VISIBLE flags, 
because there are still dead tuples on the pages. The dead tuples are 
removed in the 2nd pass of the first vacuum, but it doesn't try to set 
the PD_ALL_VISIBLE flags; that's only done in the first phase.


The second vacuum is just as slow as the first one, because the 
visibility map doesn't have any bits set yet. The second vacuum will set 
the bits, though, so the *third* vacuum should go faster.
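
To make the sequence concrete, here is a toy model of a single heap page
across the three vacuums (purely illustrative C, not server code):

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    bool has_dead = true;       /* page state right after the UPDATEs */
    bool all_visible = false;   /* PD_ALL_VISIBLE flag / VM bit */

    for (int run = 1; run <= 3; run++)
    {
        if (all_visible)
            printf("vacuum %d: page skipped\n", run);
        else
        {
            printf("vacuum %d: page scanned%s\n", run,
                   has_dead ? ", dead tuples removed" : ", marked all-visible");
            if (has_dead)
                has_dead = false;       /* removed in the 2nd pass... */
            else
                all_visible = true;     /* ...flag set only on a clean scan */
        }
    }
    return 0;
}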


So setting the PD_ALL_VISIBLE flags doesn't slow things down. That 
should be just a tiny bit of extra CPU work per vacuumed page, not 
something that would show up in performance tests.


This is the 1st issue I mentioned in this mail:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

There was some suggestions in that thread, but none has been implemented.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
Here's an updated version, with a lot of smaller cleanups, and using 
relcache invalidation to notify other backends when the visibility map 
fork is extended. I already committed the change to FSM to do the same. 
I'm feeling quite satisfied to commit this patch early next week.


Committed.

I haven't done any doc changes for this yet. I think a short section in 
the database internal storage chapter is probably in order, and the 
fact that plain VACUUM skips pages should be mentioned somewhere. I'll 
skim through references to vacuum and see what needs to be changed.


Hmm. It just occurred to me that I think this circumvented the 
anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid 
anymore. We'll need to disable the skipping when autovacuum is triggered 
to prevent wraparound. VACUUM FREEZE does that already, but it's 
unnecessarily aggressive in freezing.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Hmm. It just occurred to me that I think this circumvented the anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.

Having seen how the anti-wraparound vacuums work in the field I think merely
replacing it with a regular vacuum which covers the whole table will not
actually work well.

What will happen is that, because nothing else is advancing the relfrozenxid,
the age of the relfrozenxid for all tables will advance until they all hit
autovacuum_max_freeze_age. Quite often all the tables were created around the
same time so they will all hit autovacuum_max_freeze_age at the same time.

So a database which was operating fine and receiving regular vacuums at a
reasonable pace will suddenly be hit by vacuums for every table all at the
same time, 3 at a time. If you don't have vacuum_cost_delay set that will
cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
the small busy tables from getting vacuumed regularly due to the backlog in
anti-wraparound vacuums.

Worse, vacuum will set the freeze_xid to nearly the same value for all of the
tables. So it will all happen again in another 100M transactions. And again in
another 100M transactions, and again...

I think there are several things which need to happen here.

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
   means unnecessary full table vacuums long before they accomplish anything.

2) Include a factor which spreads out the anti-wraparound freezes in the
   autovacuum launcher. Some ideas:

. we could implicitly add random(vacuum_freeze_min_age) to the
  autovacuum_max_freeze_age. That would spread them out evenly over 100M
  transactions (see the sketch just after this list).

. we could check if another anti-wraparound vacuum is still running and
  implicitly add a vacuum_freeze_min_age penalty to the
  autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
  would spread them out without introducing non-determinism, which
  seems better.

. we could leave autovacuum_max_freeze_age and instead pick a semi-random
  vacuum_freeze_min_age. This would mean the first set of anti-wraparound
  vacuums would still be synchronized but subsequent ones might be spread
  out somewhat. There's not as much room to randomize this though and it
  would affect how much i/o vacuum did which makes it seem less palatable
  to me.

3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
   people are setting it to unreasonably high values which results in their
   vacuums never completing. Actually I think what we should do is junk all
   the existing parameters and replace it with a vacuum_nice_level or
   vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
   the other parameters as internal parameters.
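
For illustration, idea 1 might look something like this (hypothetical sketch,
names assumed; random() merely stands in for whatever per-table jitter source
is used):

#include <stdio.h>
#include <stdlib.h>

static long
effective_freeze_max_age(long autovacuum_max_freeze_age,
                         long vacuum_freeze_min_age)
{
    /* per-table jitter in [0, vacuum_freeze_min_age) */
    return autovacuum_max_freeze_age + random() % vacuum_freeze_min_age;
}

int main(void)
{
    for (int i = 0; i < 3; i++)
        printf("table %d: whole-table vacuum at age %ld\n", i,
               effective_freeze_max_age(200000000L, 100000000L));
    return 0;
}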

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Alvaro Herrera
Heikki Linnakangas wrote:

 Hmm. It just occurred to me that I think this circumvented the  
 anti-wraparound vacuuming: a normal vacuum doesn't advance relfrozenxid  
 anymore. We'll need to disable the skipping when autovacuum is triggered  
 to prevent wraparound. VACUUM FREEZE does that already, but it's  
 unnecessarily aggressive in freezing.

Heh :-)  Yes, this should be handled sanely, without having to invoke
FREEZE.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Magnus Hagander
Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 
 Hmm. It just occurred to me that I think this circumvented the 
 anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need 
 to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.
 
 Having seen how the anti-wraparound vacuums work in the field I think merely
 replacing it with a regular vacuum which covers the whole table will not
 actually work well.
 
 What will happen is that, because nothing else is advancing the relfrozenxid,
 the age of the relfrozenxid for all tables will advance until they all hit
 autovacuum_max_freeze_age. Quite often all the tables were created around the
 same time so they will all hit autovacuum_max_freeze_age at the same time.
 
 So a database which was operating fine and receiving regular vacuums at a
 reasonable pace will suddenly be hit by vacuums for every table all at the
 same time, 3 at a time. If you don't have vacuum_cost_delay set that will
 cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
 the small busy tables from getting vacuumed regularly due to the backlog in
 anti-wraparound vacuums.
 
 Worse, vacuum will set the freeze_xid to nearly the same value for all of the
 tables. So it will all happen again in another 100M transactions. And again in
 another 100M transactions, and again...
 
 I think there are several things which need to happen here.
 
 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.
 
 2) Include a factor which spreads out the anti-wraparound freezes in the
autovacuum launcher. Some ideas:
 
 . we could implicitly add random(vacuum_freeze_min_age) to the
   autovacuum_max_freeze_age. That would spread them out evenly over 100M
   transactions.
 
 . we could check if another anti-wraparound vacuum is still running and
   implicitly add a vacuum_freeze_min_age penalty to the
   autovacuum_max_freeze_age for each running anti-wraparound vacuum. That
   would spread them out without introducing non-determinism, which
   seems better.
 
 . we could leave autovacuum_max_freeze_age and instead pick a semi-random
   vacuum_freeze_min_age. This would mean the first set of anti-wraparound
   vacuums would still be synchronized but subsequent ones might be spread
   out somewhat. There's not as much room to randomize this though and it
   would affect how much i/o vacuum did which makes it seem less palatable
   to me.

How about a way to say that only one (or a config parameter for n) of
the autovac workers can be used for anti-wraparound vacuum? Then the
other slots would still be available for the
small-but-frequently-updated tables.



 3) I also think we need to put a clamp on the vacuum_cost_delay. Too many
people are setting it to unreasonably high values which results in their
vacuums never completing. Actually I think what we should do is junk all
the existing parameters and replace it with a vacuum_nice_level or
vacuum_bandwidth_cap from which we calculate the cost_limit and hide all
the other parameters as internal parameters.

It would certainly be helpful if it were just a single parameter - the
arbitrariness of the parameters there now makes them pretty hard to set
properly - or at least easy to set wrong.


//Magnus

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


Hmm. It just occurred to me that I think this circumvented the anti-wraparound
vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
FREEZE does that already, but it's unnecessarily aggressive in freezing.


FWIW, it seems the omission is actually the other way 'round. Autovacuum 
always forces a full-scanning vacuum, making the visibility map useless 
for autovacuum. This obviously needs to be fixed.



What will happen is that, because nothing else is advancing the relfrozenxid,
the age of the relfrozenxid for all tables will advance until they all hit
autovacuum_max_freeze_age. Quite often all the tables were created around the
same time so they will all hit autovacuum_max_freeze_age at the same time.

So a database which was operating fine and receiving regular vacuums at a
reasonable pace will suddenly be hit by vacuums for every table all at the
same time, 3 at a time. If you don't have vacuum_cost_delay set that will
cause a major issue. Even if you do have vacuum_cost_delay set it will prevent
the small busy tables from getting vacuumed regularly due to the backlog in
anti-wraparound vacuums.

Worse, vacuum will set the freeze_xid to nearly the same value for all of the
tables. So it will all happen again in another 100M transactions. And again in
another 100M transactions, and again...


But we already have that problem, don't we? When you initially load your 
database, all tuples will have the same xmin, and all tables will have 
more or less the same relfrozenxid. I guess you can argue that it 
becomes more obvious if vacuums are otherwise cheaper, but I don't think 
the visibility map makes that much difference to suddenly make this 
issue urgent.


Agreed that it would be nice to do something about it, though.


I think there are several things which need to happen here.

1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
   means unnecessary full table vacuums long before they accomplish anything.


It allows you to truncate clog. If I did my math right, 200M 
transactions amounts to ~50MB of clog. Perhaps we should still raise it, 
disk space is cheap after all.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:

 Hmm. It just occurred to me that I think this circumvented the 
 anti-wraparound
 vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need 
 to
 disable the skipping when autovacuum is triggered to prevent wraparound. 
 VACUUM
 FREEZE does that already, but it's unnecessarily aggressive in freezing.

 FWIW, it seems the omission is actually the other way 'round. Autovacuum 
 always
 forces a full-scanning vacuum, making the visibility map useless for
 autovacuum. This obviously needs to be fixed.

How does it do that? Is there some option in the VacStmt to control this? Do
we just need a syntax to set that option?


How easy is it to tell what percentage of the table needs to be vacuumed? If
it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
really if it's a contiguous 50% though...)

Another idea: Perhaps each page of the visibility map should have a frozenxid
(or multiple frozenxids?). Then if an individual page of the visibility map is
old we could force scanning all the heap pages covered by that map page and
update it. I'm not sure we can do that safely though without locking issues --
or is it ok because it's vacuum doing the updating?

 Worse, vacuum will set the freeze_xid to nearly the same value for all of the
 tables. So it will all happen again in another 100M transactions. And again 
 in
 another 100M transactions, and again...

 But we already have that problem, don't we? When you initially load your
 database, all tuples will have the same xmin, and all tables will have more or
 less the same relfrozenxid. I guess you can argue that it becomes more obvious
 if vacuums are otherwise cheaper, but I don't think the visibility map makes
 that much difference to suddenly make this issue urgent.

We already have that problem but it only bites in a specific case: if you have
no other vacuums being triggered by the regular dead tuple scale factor. The
normal case is intended to be that autovacuum triggers much more frequently
than every 100M transactions to reduce bloat.

However in practice this specific case does seem to arise rather alarmingly
easy. Most databases do have some large tables which are never deleted from or
updated. Also, the default scale factor of 20% is actually quite easy to never
reach if your tables are also growing quickly -- effectively moving the
goalposts further out as fast as the updates and deletes bloat the table.

The visibility map essentially widens this specific use case to cover *all*
tables. Since the relfrozenxid would never get advanced by regular vacuums the
only time it would get advanced is when they all hit the 200M wall
simultaneously.

 Agreed that it would be nice to do something about it, though.

 I think there are several things which need to happen here.

 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish anything.

 It allows you to truncate clog. If I did my math right, 200M transactions
 amounts to ~50MB of clog. Perhaps we should still raise it, disk space is 
 cheap
 after all.

Ah. Hm. Then perhaps this belongs in the realm of the config generator people
are working on. They'll need a dial to say how much disk space you expect your
database to take in addition to how much memory your machine has available.
50MB is nothing for a 1TB database but it's kind of silly to have to keep
hundreds of megs of clogs on a 1MB database.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Hmm. It just occurred to me that I think this circumvented the anti-wraparound
vacuuming: a normal vacuum doesn't advance relfrozenxid anymore. We'll need to
disable the skipping when autovacuum is triggered to prevent wraparound. VACUUM
FREEZE does that already, but it's unnecessarily aggressive in freezing.

FWIW, it seems the omission is actually the other way 'round. Autovacuum always
forces a full-scanning vacuum, making the visibility map useless for
autovacuum. This obviously needs to be fixed.


How does it do that? Is there some option in the VacStmt to control this? Do
we just need a syntax to set that option?


The way it works now is that if VacuumStmt->freeze_min_age is not -1 
(which means use the default), the visibility map is not used and the 
whole table is scanned. Autovacuum always sets freeze_min_age, so it's 
never using the visibility map. Attached is a patch I'm considering to 
fix that.



How easy is it to tell what percentage of the table needs to be vacuumed? If
it's > 50% perhaps it would make sense to scan the whole table? (Hm. Not
really if it's a contiguous 50% though...)


Hmm. You could scan the visibility map to see how much you could skip by 
using it. You could account for contiguity.
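
Something like this, perhaps (only a sketch: count_skippable_pages is a
made-up helper, and visibilitymap_test() stands in for whatever
single-page probe the final API ends up exposing):

	/* Count the heap pages the visibility map says vacuum could skip. */
	static BlockNumber
	count_skippable_pages(Relation rel, BlockNumber nblocks)
	{
		BlockNumber blkno;
		BlockNumber skippable = 0;

		for (blkno = 0; blkno < nblocks; blkno++)
		{
			if (visibilitymap_test(rel, blkno))
				skippable++;
		}
		return skippable;
	}

	/* Fall back to scanning everything if the map wouldn't save much. */
	scan_all = (count_skippable_pages(onerel, nblocks) < nblocks / 2);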



Another idea: Perhaps each page of the visibility map should have a frozenxid
(or multiple frozenxids?). Then if an individual page of the visibility map is
old we could force scanning all the heap pages covered by that map page and
update it. I'm not sure we can do that safely though without locking issues --
or is it ok because it's vacuum doing the updating?


We discussed that a while ago:

http://archives.postgresql.org/message-id/[EMAIL PROTECTED]

Tom was concerned about making the visibility map not just a hint but 
critical data. Rightly so. This is certainly 8.5 stuff; perhaps it would 
be more palatable after we get the index-only-scans working using the 
visibility map, since the map would be critical data anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fd2429a..3e3cb9d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -171,10 +171,7 @@ lazy_vacuum_rel(Relation onerel, VacuumStmt *vacstmt,
 	vacrelstats->hasindex = (nindexes > 0);
 
 	/* Should we use the visibility map or scan all pages? */
-	if (vacstmt->freeze_min_age != -1)
-		scan_all = true;
-	else
-		scan_all = false;
+	scan_all = vacstmt->scan_all;
  
 	/* Do the vacuuming */
 	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, scan_all);
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index eb7ab4d..2781f6e 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2771,6 +2771,7 @@ _copyVacuumStmt(VacuumStmt *from)
 	COPY_SCALAR_FIELD(analyze);
 	COPY_SCALAR_FIELD(verbose);
 	COPY_SCALAR_FIELD(freeze_min_age);
+	COPY_SCALAR_FIELD(scan_all);
 	COPY_NODE_FIELD(relation);
 	COPY_NODE_FIELD(va_cols);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index d4c57bb..86a032f 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1436,6 +1436,7 @@ _equalVacuumStmt(VacuumStmt *a, VacuumStmt *b)
 	COMPARE_SCALAR_FIELD(analyze);
 	COMPARE_SCALAR_FIELD(verbose);
 	COMPARE_SCALAR_FIELD(freeze_min_age);
+	COMPARE_SCALAR_FIELD(scan_all);
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 85f4616..1aab75c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -5837,6 +5837,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->analyze = false;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose = $4;
 	n->relation = NULL;
 	n->va_cols = NIL;
@@ -5849,6 +5850,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->analyze = false;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose = $4;
 	n->relation = $5;
 	n->va_cols = NIL;
@@ -5860,6 +5862,7 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 	n->vacuum = true;
 	n->full = $2;
 	n->freeze_min_age = $3 ? 0 : -1;
+	n->scan_all = $3;
 	n->verbose |= $4;
 	$$ = (Node *)n;
 }
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 8d8947f..2c68779 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2649,6 +2649,7 @@ autovacuum_do_vac_analyze(autovac_table *tab,
 	vacstmt.full = false;
 	vacstmt.analyze = tab->at_doanalyze;
 	vacstmt.freeze_min_age = tab->at_freeze_min_age;
+	vacstmt.scan_all = tab->at_wraparound;
 	vacstmt.verbose = false;
 	vacstmt.relation = NULL;	

Re: [HACKERS] Visibility map, partial vacuums

2008-12-03 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 1) Raise autovacuum_max_freeze_age to 400M or 800M. Having it at 200M just
means unnecessary full table vacuums long before they accomplish 
 anything.

 It allows you to truncate clog. If I did my math right, 200M transactions
 amounts to ~50MB of clog. Perhaps we should still raise it, disk space is 
 cheap
 after all.

Hm, the more I think about it the more this bothers me. It's another subtle
change from the current behaviour. 

Currently *every* vacuum tries to truncate the clog. So you're constantly
trimming off a little bit.

With the visibility map (assuming you fix it not to do full scans all the
time) you can never truncate the clog just as you can never advance the
relfrozenxid unless you do a special full-table vacuum.

I think in practice most people had a read-only table somewhere in their
database which prevented the clog from ever being truncated anyway, so
perhaps this isn't such a big deal.

But the bottom line is that the anti-wraparound vacuums are going to be a lot
more important and much more visible now than they were in the past.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-27 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
There is another problem, though, if the map is frequently probed for 
pages that don't exist in the map, or the map doesn't exist at all. 
Currently, the size of the map file is kept in relcache, in the 
rd_vm_nblocks_cache variable. Whenever a page is accessed that's > 
rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists, 
and rd_vm_nblocks_cache is updated. That means that every probe to a 
non-existing page causes an lseek(), which isn't free.


Well, considering how seldom new pages will be added to the visibility
map, it seems to me we could afford to send out a relcache inval event
when that happens.  Then rd_vm_nblocks_cache could be treated as
trustworthy.


Here's an updated version, with a lot of smaller cleanups, and using 
relcache invalidation to notify other backends when the visibility map 
fork is extended. I already committed the change to FSM to do the same. 
I'm feeling quite satisfied with it, and plan to commit this patch early 
next week.


I modified the VACUUM VERBOSE output slightly, to print the number of 
pages scanned. The added part emphasized below:


postgres=# vacuum verbose foo;
INFO:  vacuuming "public.foo"
INFO:  "foo": removed 230 row versions in 10 pages
INFO:  "foo": found 230 removable, 10 nonremovable row versions in *10 
out of* 43 pages

DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

That seems OK to me, but maybe others have an opinion on that?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-27 Thread Heikki Linnakangas

Heikki Linnakangas wrote:

Here's an updated version, ...


And here it is, for real...

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/access/heap/Makefile
--- src/backend/access/heap/Makefile
***************
*** 12,17 ****
  subdir = src/backend/access/heap
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
  
  include $(top_srcdir)/src/backend/common.mk
*** src/backend/access/heap/heapam.c
--- src/backend/access/heap/heapam.c
***************
*** 47,52 ****
--- 47,53 ----
  #include "access/transam.h"
  #include "access/tuptoaster.h"
  #include "access/valid.h"
+ #include "access/visibilitymap.h"
  #include "access/xact.h"
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
*************** heapgetpage(HeapScanDesc scan, BlockNumber page)
*** 195,200 ****
--- 196,202 ----
  	int			ntup;
  	OffsetNumber lineoff;
  	ItemId		lpp;
+ 	bool		all_visible;
  
  	Assert(page < scan->rs_nblocks);
  
*************** heapgetpage(HeapScanDesc scan, BlockNumber page)
*** 233,252 ****
  	lines = PageGetMaxOffsetNumber(dp);
  	ntup = 0;
  
  	for (lineoff = FirstOffsetNumber, lpp = PageGetItemId(dp, lineoff);
  		 lineoff <= lines;
  		 lineoff++, lpp++)
  	{
  		if (ItemIdIsNormal(lpp))
  		{
- 			HeapTupleData loctup;
  			bool		valid;
  
! 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
! 			loctup.t_len = ItemIdGetLength(lpp);
! 			ItemPointerSet(&(loctup.t_self), page, lineoff);
  
! 			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
  			if (valid)
  				scan->rs_vistuples[ntup++] = lineoff;
  		}
--- 235,266 ----
  	lines = PageGetMaxOffsetNumber(dp);
  	ntup = 0;
  
+ 	/*
+ 	 * If the all-visible flag indicates that all tuples on the page are
+ 	 * visible to everyone, we can skip the per-tuple visibility tests.
+ 	 */
+ 	all_visible = PageIsAllVisible(dp);
+ 
  	for (lineoff = FirstOffsetNumber, lpp = PageGetItemId(dp, lineoff);
  		 lineoff <= lines;
  		 lineoff++, lpp++)
  	{
  		if (ItemIdIsNormal(lpp))
  		{
  			bool		valid;
  
! 			if (all_visible)
! 				valid = true;
! 			else
! 			{
! 				HeapTupleData loctup;
! 
! 				loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
! 				loctup.t_len = ItemIdGetLength(lpp);
! 				ItemPointerSet(&(loctup.t_self), page, lineoff);
  
! 				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
! 			}
  			if (valid)
  				scan->rs_vistuples[ntup++] = lineoff;
  		}
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1860,1865 ****
--- 1874,1880 ----
  	TransactionId xid = GetCurrentTransactionId();
  	HeapTuple	heaptup;
  	Buffer		buffer;
+ 	bool		all_visible_cleared = false;
  
  	if (relation->rd_rel->relhasoids)
  	{
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1920,1925 ****
--- 1935,1946 ----
  
  	RelationPutHeapTuple(relation, buffer, heaptup);
  
+ 	if (PageIsAllVisible(BufferGetPage(buffer)))
+ 	{
+ 		all_visible_cleared = true;
+ 		PageClearAllVisible(BufferGetPage(buffer));
+ 	}
+ 
  	/*
  	 * XXX Should we set PageSetPrunable on this page ?
  	 *
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1943,1948 ****
--- 1964,1970 ----
  		Page		page = BufferGetPage(buffer);
  		uint8		info = XLOG_HEAP_INSERT;
  
+ 		xlrec.all_visible_cleared = all_visible_cleared;
  		xlrec.target.node = relation->rd_node;
  		xlrec.target.tid = heaptup->t_self;
  		rdata[0].data = (char *) &xlrec;
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1994,1999 ****
--- 2016,2026 ----
  
  	UnlockReleaseBuffer(buffer);
  
+ 	/* Clear the bit in the visibility map if necessary */
+ 	if (all_visible_cleared)
+ 		visibilitymap_clear(relation, 
+ 			ItemPointerGetBlockNumber(&(heaptup->t_self)));
+ 
  	/*
  	 * If tuple is cachable, mark it for invalidation from the caches in case
  	 * we abort.  Note it is OK to do this after releasing the buffer, because
*************** heap_delete(Relation relation, ItemPointer tid,
*** 2070,2075 ****
--- 2097,2103 ----
  	Buffer		buffer;
  	bool		have_tuple_lock = false;
  	bool		iscombo;
+ 	bool		all_visible_cleared = false;
  
  	Assert(ItemPointerIsValid(tid));
  
*************** l1:
*** 2216,2221 ****
--- 2244,2255 ----
  	 */
  	PageSetPrunable(page, xid);
  
+ 	if (PageIsAllVisible(page))
+ 	{
+ 		all_visible_cleared = true;
+ 		PageClearAllVisible(page);
+ 	}
+ 
  	/* store transaction information of xact deleting the tuple */
  	tp.t_data->t_infomask &= ~(HEAP_XMAX_COMMITTED |
  			   HEAP_XMAX_INVALID |
***
*** 2237,2242  l1:
--- 2271,2277 
  		XLogRecPtr	recptr;
  		XLogRecData rdata[2];
 

Re: [HACKERS] Visibility map, partial vacuums

2008-11-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
The visibility map won't be inquired unless you vacuum. This is a bit 
tricky. In vacuum, we only know whether we can set a bit or not, after 
we've acquired a cleanup lock on the page, and scanned all the tuples. 
While we're holding a cleanup lock, we don't want to do I/O, which could 
potentially block out other processes for a long time. So it's too late 
to extend the visibility map at that point.


This is no good; I think you've made the wrong tradeoffs.  In
particular, even though only vacuum *currently* uses the map, you want
to extend it to be used by indexscans.  So it's going to uselessly
spring into being even without vacuums.

I'm not convinced that I/O while holding cleanup lock is so bad that we
should break other aspects of the system to avoid it.  However, if you
want to stick to that, how about
* vacuum page, possibly set its header bit
* release page lock (but not pin)
* if we need to set the bit, fetch the corresponding map page
  (I/O might happen here)
* get share lock on heap page, then recheck its header bit;
  if still set, set the map bit


Yeah, could do that.

There is another problem, though, if the map is frequently probed for 
pages that don't exist in the map, or the map doesn't exist at all. 
Currently, the size of the map file is kept in relcache, in the 
rd_vm_nblocks_cache variable. Whenever a page is accessed that's > 
rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists, 
and rd_vm_nblocks_cache is updated. That means that every probe to a 
non-existing page causes an lseek(), which isn't free.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
There is another problem, though, if the map is frequently probed for 
pages that don't exist in the map, or the map doesn't exist at all. 
Currently, the size of the map file is kept in relcache, in the 
rd_vm_nblocks_cache variable. Whenever a page is accessed that's > 
rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists, 
and rd_vm_nblocks_cache is updated. That means that every probe to a 
non-existing page causes an lseek(), which isn't free.


Well, considering how seldom new pages will be added to the visibility
map, it seems to me we could afford to send out a relcache inval event
when that happens.  Then rd_vm_nblocks_cache could be treated as
trustworthy.

Maybe it'd be worth doing that for the FSM too.  The frequency of
invals would be higher, but then again the reference frequency is
probably higher too?


A relcache invalidation sounds awfully heavy-weight. Perhaps a 
light-weight invalidation event that doesn't flush the entry altogether, 
but just resets the cached sizes?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, considering how seldom new pages will be added to the visibility
 map, it seems to me we could afford to send out a relcache inval event
 when that happens.  Then rd_vm_nblocks_cache could be treated as
 trustworthy.

 A relcache invalidation sounds awfully heavy-weight.

It really isn't.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-26 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 There is another problem, though, if the map is frequently probed for 
 pages that don't exist in the map, or the map doesn't exist at all. 
 Currently, the size of the map file is kept in relcache, in the 
 rd_vm_nblocks_cache variable. Whenever a page is accessed that's > 
 rd_vm_nblocks_cache, smgrnblocks is called to see if the page exists, 
 and rd_vm_nblocks_cache is updated. That means that every probe to a 
 non-existing page causes an lseek(), which isn't free.

Well, considering how seldom new pages will be added to the visibility
map, it seems to me we could afford to send out a relcache inval event
when that happens.  Then rd_vm_nblocks_cache could be treated as
trustworthy.
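
The probe path could then be as simple as this (a sketch, assuming the
cached size really is kept current by the invalidations):

	/* Map pages beyond the cached size are known to be all zeroes. */
	if (mapBlock >= rel->rd_vm_nblocks_cache)
		return false;			/* bit can't be set; no lseek() needed */
	buf = vm_readbuf(rel, mapBlock, false);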

Maybe it'd be worth doing that for the FSM too.  The frequency of
invals would be higher, but then again the reference frequency is
probably higher too?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-26 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

Well, considering how seldom new pages will be added to the visibility
map, it seems to me we could afford to send out a relcache inval event
when that happens.  Then rd_vm_nblocks_cache could be treated as
trustworthy.



A relcache invalidation sounds awfully heavy-weight.


It really isn't.


Okay, then. I'll use relcache invalidation for both the FSM and 
visibility map.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-25 Thread Decibel!

On Nov 23, 2008, at 3:18 PM, Tom Lane wrote:

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.



That makes the idea of not writing out hint bit updates unless the  
page is already dirty a lot easier to swallow, because now we'd have  
a mechanism in place to ensure that they were set in a reasonable  
timeframe by autovacuum. That actually wouldn't incur much extra  
overhead at all, except in the case of a table that's effectively  
write-only. Actually, that's not even true; you still have to  
eventually freeze a write-mostly table.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Heikki Linnakangas

Tom Lane wrote:

* ISTM that the patch is designed on the plan that the PD_ALL_VISIBLE
page header flag *must* be correct, but it's really okay if the backing
map bit *isn't* correct --- in particular we don't trust the map bit
when performing antiwraparound vacuums.  This isn't well documented.


Right. Will add comments.

We can't use the map bit for antiwraparound vacuums, because the bit 
doesn't tell you when the tuples have been frozen. And we can't advance 
relfrozenxid if we've skipped any pages.


I've been thinking that we could add one frozenxid field to each 
visibility map page, for the oldest xid on the heap pages covered by the 
visibility map page. That would allow more fine-grained anti-wraparound 
vacuums as well.
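
Roughly like this (a sketch of the idea only; the struct and the
range-scan call are made up):

	/* One freeze watermark per visibility map page. */
	typedef struct
	{
		TransactionId frozenxid;	/* oldest unfrozen xid on the heap
									 * pages this map page covers */
	} VMPageOpaqueData;

	/* An anti-wraparound vacuum could then limit itself to the heap
	 * ranges whose watermark has gotten old enough: */
	if (TransactionIdPrecedes(opaque->frozenxid, freezeLimit))
		scan_heap_range(onerel, first_covered_blk, last_covered_blk);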



* Also, I see that vacuum has a provision for clearing an incorrectly
set PD_ALL_VISIBLE flag, but shouldn't it fix the map too?


Yes, will fix. Although, as long as we don't trust the visibility map, 
no real damage would be done.



* It would be good if the visibility map fork were never created until
there is occasion to set a bit in it; this would for instance typically
mean that temp tables would never have one.  I think that
visibilitymap.c doesn't get this quite right --- in particular
vm_readbuf seems willing to create/extend the fork whether its extend
argument is true or not, so it looks like an inquiry operation would
cause the map fork to be created.  It should be possible to act as
though a nonexistent fork just means all zeroes.


The visibility map won't be inquired unless you vacuum. This is a bit 
tricky. In vacuum, we only know whether we can set a bit or not, after 
we've acquired a cleanup lock on the page, and scanned all the tuples. 
While we're holding a cleanup lock, we don't want to do I/O, which could 
potentially block out other processes for a long time. So it's too late 
to extend the visibility map at that point.


I agree that vm_readbuf should not create the fork if 'extend' is false, 
that's an oversight, but it won't change the actual behavior because 
visibilitymap_test calls it with 'extend' true. Because of the above.


I will add comments about that, though, there's nothing describing that 
currently.



* heap_insert's all_visible_cleared variable doesn't seem to get
initialized --- didn't your compiler complain?


Hmph, I must've been compiling with -O0.


* You missed updating SizeOfHeapDelete and SizeOfHeapUpdate


Thanks.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I've been thinking that we could add one frozenxid field to each 
 visibility map page, for the oldest xid on the heap pages covered by the 
 visibility map page. That would allow more fine-grained anti-wraparound 
 vacuums as well.

This doesn't strike me as a particularly good idea.  Right now the map
is only hints as far as vacuum is concerned --- if you do the above then
the map becomes critical data.  And I don't really think you'll buy
much.

 The visibility map won't be inquired unless you vacuum. This is a bit 
 tricky. In vacuum, we only know whether we can set a bit or not, after 
 we've acquired a cleanup lock on the page, and scanned all the tuples. 
 While we're holding a cleanup lock, we don't want to do I/O, which could 
 potentially block out other processes for a long time. So it's too late 
 to extend the visibility map at that point.

This is no good; I think you've made the wrong tradeoffs.  In
particular, even though only vacuum *currently* uses the map, you want
to extend it to be used by indexscans.  So it's going to uselessly
spring into being even without vacuums.

I'm not convinced that I/O while holding cleanup lock is so bad that we
should break other aspects of the system to avoid it.  However, if you
want to stick to that, how about
* vacuum page, possibly set its header bit
* release page lock (but not pin)
* if we need to set the bit, fetch the corresponding map page
  (I/O might happen here)
* get share lock on heap page, then recheck its header bit;
  if still set, set the map bit
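
Or in code form, roughly (the vm calls are stand-ins for whatever the
patch ends up exposing; the recheck after relocking is the important
part):

	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);		/* drop the lock, keep the pin */
	vmbuffer = vm_readbuf(rel, mapBlock, true);	/* I/O might happen here */
	LockBuffer(buffer, BUFFER_LOCK_SHARE);
	if (PageIsAllVisible(BufferGetPage(buffer)))	/* recheck the header bit */
		visibilitymap_set(rel, heapBlk, vmbuffer);
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);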

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 So it seems like we do indeed want to rejigger autovac's rules a bit
 to account for the possibility of wanting to apply vacuum to get
 visibility bits set.

 I'm not too excited about triggering an extra vacuum. As Matthew pointed 
 out, the point of this patch is to reduce the number of vacuums 
 required, not increase it. If you're not going to vacuum a table, you 
 don't care if the bits in the visibility map are set or not.

But it's already the case that the bits provide a performance increase
to other things besides vacuum.

 We could set the PD_ALL_VISIBLE flag more aggressively, outside VACUUMs, 
 if we want to make the seqscan optimization more effective. For example, 
 a seqscan could set the flag too, if it sees that all the tuples were 
 visible, and had the XMIN_COMMITTED and XMAX_INVALID hint bits set.

I was wondering whether we could teach heap_page_prune to set the flag
without adding any extra tuple visibility checks.  A seqscan per se
shouldn't be doing this because it doesn't normally call
HeapTupleSatifiesVacuum.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Heikki Linnakangas [EMAIL PROTECTED] writes:
 I've been thinking that we could add one frozenxid field to each 
 visibility map page, for the oldest xid on the heap pages covered by the 
 visibility map page. That would allow more fine-grained anti-wraparound 
 vacuums as well.

 This doesn't strike me as a particularly good idea.  Right now the map
 is only hints as far as vacuum is concerned --- if you do the above then
 the map becomes critical data.  And I don't really think you'll buy
 much.

Hm, that depends on how critical the critical data is. It's critical that the
frozenxid that autovacuum sees is no more recent than the actual frozenxid,
but not critical that it be entirely up-to-date otherwise.

So if it's possible for the frozenxid in the visibility map to go backwards
then it's no good, since if that update is lost we might skip a necessary
vacuum freeze. But if we guarantee that we never update the frozenxid in the
visibility map forward ahead of recentglobalxmin then it can't ever go
backwards. (Well, not in a way that matters)

However I'm a bit puzzled how you could possibly maintain this frozenxid. As
soon as you freeze an xid you'll have to visit all the other pages covered by
that visibility map page to see what the new value should be.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Heikki Linnakangas

Gregory Stark wrote:

However I'm a bit puzzled how you could possibly maintain this frozenxid. As
soon as you freeze an xid you'll have to visit all the other pages covered by
that visibility map page to see what the new value should be.


Right, you could only advance it when you scan all the pages covered by 
the visibility map page. But that's better than having to scan the whole 
relation.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 However I'm a bit puzzled how you could possibly maintain this frozenxid. As
 soon as you freeze an xid you'll have to visit all the other pages covered by
 that visibility map page to see what the new value should be.

 Right, you could only advance it when you scan all the pages covered by the
 visibility map page. But that's better than having to scan the whole relation.

Is it? It seems like that would just move around the work. You'll still have
to visit every page once every 2B transactions or so. You'll just do it 64MB at
a time. 

It's nice to smooth the work but it would be much nicer to detect that a
normal vacuum has already processed all of those pages since the last
insert/update/delete on those pages and so avoid the work entirely.

To avoid the work entirely you need some information about the oldest xid on
those pages seen by regular vacuums (and/or prunes). 

We would want to skip any page which:

a) Has been visited by vacuum freeze and not been updated since 

b) Has been visited by a regular vacuum and the oldest xid found was more
   recent than freeze_threshold.

c) Has been updated frequently such that no old tuples remain

Ideally (b) should completely obviate the need for anti-wraparound freezes
entirely.
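
For (b), the per-page test would be something like this (a sketch;
where the oldest-xid-seen value would be stored is exactly the open
question):

	/* Skip pages where the last regular vacuum saw nothing old
	 * enough to need freezing. */
	if (TransactionIdFollowsOrEquals(oldest_xid_seen, freeze_threshold))
		continue;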

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's 24x7 Postgres support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-24 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 So if it's possible for the frozenxid in the visibility map to go backwards
 then it's no good, since if that update is lost we might skip a necessary
 vacuum freeze.

Seems like a lost disk write would be enough to make that happen.

Now you might argue that the odds of that are no worse than the odds of
losing an update to one particular heap page, but in this case the
single hiccup could lead to losing half a gigabyte of data (assuming 8K
page size).  The leverage you get for saving vacuum freeze work is
exactly equal to the magnification factor for data loss.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I committed the changes to FSM truncation yesterday, that helps with the 
 truncation of the visibility map as well. Attached is an updated 
 visibility map patch.

I looked over this patch a bit ...

 1. The bits in the visibility map are set in the 1st phase of lazy 
 vacuum. That works, but it means that after a delete or update, it takes 
 two vacuums until the bit in the visibility map is set. The first vacuum 
 removes the dead tuple, and only the second sees that there's no dead 
 tuples and sets the bit.

I think this is probably not a big issue really.  The point of this change
is to optimize things for pages that are static over the long term; one
extra vacuum cycle before the page is deemed static doesn't seem like a
problem.  You could even argue that this saves I/O because we don't set
the bit (and perhaps later have to clear it) until we know that the page
has stayed static across a vacuum cycle and thus has a reasonable
probability of continuing to do so.

A possible problem is that if a relation is filled all in one shot,
autovacuum would trigger a single vacuum cycle on it and then never have
a reason to trigger another; leading to the bits never getting set (or
at least not till an antiwraparound vacuum occurs).  We might want to
tweak autovac so that an extra vacuum cycle occurs in this case.  But
I'm not quite sure what a reasonable heuristic would be.

Some other points:

* ISTM that the patch is designed on the plan that the PD_ALL_VISIBLE
page header flag *must* be correct, but it's really okay if the backing
map bit *isn't* correct --- in particular we don't trust the map bit
when performing antiwraparound vacuums.  This isn't well documented.

* Also, I see that vacuum has a provision for clearing an incorrectly
set PD_ALL_VISIBLE flag, but shouldn't it fix the map too?

* It would be good if the visibility map fork were never created until
there is occasion to set a bit in it; this would for instance typically
mean that temp tables would never have one.  I think that
visibilitymap.c doesn't get this quite right --- in particular
vm_readbuf seems willing to create/extend the fork whether its extend
argument is true or not, so it looks like an inquiry operation would
cause the map fork to be created.  It should be possible to act as
though a nonexistent fork just means all zeroes.

* heap_insert's all_visible_cleared variable doesn't seem to get
initialized --- didn't your compiler complain?

* You missed updating SizeOfHeapDelete and SizeOfHeapUpdate

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Jeff Davis
On Sun, 2008-11-23 at 14:05 -0500, Tom Lane wrote:
 A possible problem is that if a relation is filled all in one shot,
 autovacuum would trigger a single vacuum cycle on it and then never have
 a reason to trigger another; leading to the bits never getting set (or
 at least not till an antiwraparound vacuum occurs).  We might want to
 tweak autovac so that an extra vacuum cycle occurs in this case.  But
 I'm not quite sure what a reasonable heuristic would be.
 

This would only be an issue if using the visibility map for things other
than partial vacuum (e.g. index-only scan), right? If we never do
another VACUUM, we don't need partial vacuum.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 On Sun, 2008-11-23 at 14:05 -0500, Tom Lane wrote:
 A possible problem is that if a relation is filled all in one shot,
 autovacuum would trigger a single vacuum cycle on it and then never have
 a reason to trigger another; leading to the bits never getting set (or
 at least not till an antiwraparound vacuum occurs).

 This would only be an issue if using the visibility map for things other
 than partial vacuum (e.g. index-only scan), right? If we never do
 another VACUUM, we don't need partial vacuum.

Well, the patch already uses the page header bits for optimization of
seqscans, and could probably make good use of them for bitmap scans too.
It'd be nice if the page header bits got set even if the map bits
didn't.

Reflecting on it though, maybe Heikki described the behavior too
pessimistically anyway.  If a page contains no dead tuples, it should
get its bits set on first visit anyhow, no?  So for the ordinary bulk
load scenario where there are no failed insertions, the first vacuum
pass should set all the bits ... at least, if enough time has passed
for RecentXmin to be past the inserting transaction.

However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Matthew T. O'Connor

Tom Lane wrote:

However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.


I'm sure I'm missing something, but I thought the point of this was to 
lessen the impact of VACUUM, and now you are suggesting that we have to 
add vacuums to tables that have never needed one before.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-23 Thread Heikki Linnakangas

Tom Lane wrote:

Reflecting on it though, maybe Heikki described the behavior too
pessimistically anyway.  If a page contains no dead tuples, it should
get its bits set on first visit anyhow, no?  So for the ordinary bulk
load scenario where there are no failed insertions, the first vacuum
pass should set all the bits ... at least, if enough time has passed
for RecentXmin to be past the inserting transaction.


Right. I did say ... after a delete or update, it takes two vacuums 
until ... in my mail.



However, my comment above was too optimistic, because in an insert-only
scenario autovac would in fact not trigger VACUUM at all, only ANALYZE.

So it seems like we do indeed want to rejigger autovac's rules a bit
to account for the possibility of wanting to apply vacuum to get
visibility bits set.


I'm not too excited about triggering an extra vacuum. As Matthew pointed 
out, the point of this patch is to reduce the number of vacuums 
required, not increase it. If you're not going to vacuum a table, you 
don't care if the bits in the visibility map are set or not.


We could set the PD_ALL_VISIBLE flag more aggressively, outside VACUUMs, 
if we want to make the seqscan optimization more effective. For example, 
a seqscan could set the flag too, if it sees that all the tuples were 
visible, and had the XMIN_COMMITTED and XMAX_INVALID hint bits set.
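
In heapgetpage() that could look something like this (a sketch only;
all_hinted is a made-up local set by the visibility loop, and whether a
share lock is enough to set the flag is a separate question):

	/* Every tuple was visible to everyone and fully hinted. */
	if (all_hinted && !PageIsAllVisible(dp))
	{
		PageSetAllVisible(dp);
		SetBufferCommitInfoNeedsSave(buffer);	/* dirty it like a hint-bit update */
	}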


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-20 Thread Heikki Linnakangas
I committed the changes to FSM truncation yesterday; that helps with the 
truncation of the visibility map as well. Attached is an updated 
visibility map patch.


There are two open issues:

1. The bits in the visibility map are set in the 1st phase of lazy 
vacuum. That works, but it means that after a delete or update, it takes 
two vacuums until the bit in the visibility map is set. The first vacuum 
removes the dead tuple, and only the second sees that there's no dead 
tuples and sets the bit.


2. Should modify the output of VACUUM VERBOSE to say how many pages were 
actually scanned. What other information is relevant, or is no longer 
relevant, with partial vacuums?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/access/heap/Makefile
--- src/backend/access/heap/Makefile
***************
*** 12,17 ****
  subdir = src/backend/access/heap
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o
  
  include $(top_srcdir)/src/backend/common.mk
--- 12,17 ----
  top_builddir = ../../../..
  include $(top_builddir)/src/Makefile.global
  
! OBJS = heapam.o hio.o pruneheap.o rewriteheap.o syncscan.o tuptoaster.o visibilitymap.o
  
  include $(top_srcdir)/src/backend/common.mk
*** src/backend/access/heap/heapam.c
--- src/backend/access/heap/heapam.c
***************
*** 47,52 ****
--- 47,53 ----
  #include "access/transam.h"
  #include "access/tuptoaster.h"
  #include "access/valid.h"
+ #include "access/visibilitymap.h"
  #include "access/xact.h"
  #include "access/xlogutils.h"
  #include "catalog/catalog.h"
*************** heapgetpage(HeapScanDesc scan, BlockNumber page)
*** 195,200 ****
--- 196,202 ----
  	int			ntup;
  	OffsetNumber lineoff;
  	ItemId		lpp;
+ 	bool		all_visible;
  
  	Assert(page < scan->rs_nblocks);
  
*************** heapgetpage(HeapScanDesc scan, BlockNumber page)
*** 233,252 ****
  	lines = PageGetMaxOffsetNumber(dp);
  	ntup = 0;
  
  	for (lineoff = FirstOffsetNumber, lpp = PageGetItemId(dp, lineoff);
  		 lineoff <= lines;
  		 lineoff++, lpp++)
  	{
  		if (ItemIdIsNormal(lpp))
  		{
- 			HeapTupleData loctup;
  			bool		valid;
  
! 			loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
! 			loctup.t_len = ItemIdGetLength(lpp);
! 			ItemPointerSet(&(loctup.t_self), page, lineoff);
  
! 			valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
  			if (valid)
  				scan->rs_vistuples[ntup++] = lineoff;
  		}
--- 235,266 ----
  	lines = PageGetMaxOffsetNumber(dp);
  	ntup = 0;
  
+ 	/*
+ 	 * If the all-visible flag indicates that all tuples on the page are
+ 	 * visible to everyone, we can skip the per-tuple visibility tests.
+ 	 */
+ 	all_visible = PageIsAllVisible(dp);
+ 
  	for (lineoff = FirstOffsetNumber, lpp = PageGetItemId(dp, lineoff);
  		 lineoff <= lines;
  		 lineoff++, lpp++)
  	{
  		if (ItemIdIsNormal(lpp))
  		{
  			bool		valid;
  
! 			if (all_visible)
! 				valid = true;
! 			else
! 			{
! 				HeapTupleData loctup;
! 
! 				loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
! 				loctup.t_len = ItemIdGetLength(lpp);
! 				ItemPointerSet(&(loctup.t_self), page, lineoff);
  
! 				valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
! 			}
  			if (valid)
  				scan->rs_vistuples[ntup++] = lineoff;
  		}
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1860,1865 ****
--- 1874,1880 ----
  	TransactionId xid = GetCurrentTransactionId();
  	HeapTuple	heaptup;
  	Buffer		buffer;
+ 	bool		all_visible_cleared;
  
  	if (relation->rd_rel->relhasoids)
  	{
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1920,1925 ****
--- 1935,1946 ----
  
  	RelationPutHeapTuple(relation, buffer, heaptup);
  
+ 	if (PageIsAllVisible(BufferGetPage(buffer)))
+ 	{
+ 		all_visible_cleared = true;
+ 		PageClearAllVisible(BufferGetPage(buffer));
+ 	}
+ 
  	/*
  	 * XXX Should we set PageSetPrunable on this page ?
  	 *
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1943,1948 ****
--- 1964,1970 ----
  		Page		page = BufferGetPage(buffer);
  		uint8		info = XLOG_HEAP_INSERT;
  
+ 		xlrec.all_visible_cleared = all_visible_cleared;
  		xlrec.target.node = relation->rd_node;
  		xlrec.target.tid = heaptup->t_self;
  		rdata[0].data = (char *) &xlrec;
*************** heap_insert(Relation relation, HeapTuple tup, CommandId cid,
*** 1994,1999 ****
--- 2016,2026 ----
  
  	UnlockReleaseBuffer(buffer);
  
+ 	/* Clear the bit in the visibility map if necessary */
+ 	if (all_visible_cleared)
+ 		visibilitymap_clear(relation, 
+ 			ItemPointerGetBlockNumber(&(heaptup->t_self)));
+ 
  	/*
  	 * If tuple is cachable, mark it for invalidation from the caches in case
  	 * we abort.  Note it is OK to do this after releasing the buffer, because
*************** heap_delete(Relation relation, ItemPointer tid,
*** 2070,2075 ****
--- 2097,2103 ----
  	Buffer		buffer;
  	bool		

Re: [HACKERS] Visibility map, partial vacuums

2008-11-14 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
Another thing that does need to be fixed, is the way that the extension 
and truncation of the visibility map is handled; that's broken in the 
current patch. I started working on the patch a long time ago, before 
the FSM rewrite was finished, and haven't gotten around to fixing that part 
yet. We already solved it for the FSM, so we could just follow that 
pattern. The way we solved truncation in the FSM was to write a separate 
WAL record with the new heap size, but perhaps we want to revisit that 
decision, instead of adding again new code to write a third WAL record, 
for truncation of the visibility map. smgrtruncate() writes a WAL record 
of its own, if any full blocks are truncated away from the FSM, but we 
needed a WAL record even if no full blocks are truncated from the FSM 
file, because the tail of the last remaining FSM page, representing 
the truncated away heap pages, still needs to be cleared. Visibility map 
has the same problem.


One proposal was to piggyback on the smgrtruncate() WAL-record, and call 
FreeSpaceMapTruncateRel from smgr_redo(). I considered that ugly from a 
modularity point of view; smgr.c shouldn't be calling higher-level 
functions. But maybe it wouldn't be that bad, after all. Or, we could 
remove WAL-logging from smgrtruncate() altogether, and move it to 
RelationTruncate() or another higher-level function, and handle the 
WAL-logging and replay there.


In preparation for the visibility map patch, I revisited the truncation 
issue, and hacked together a patch to piggyback the FSM truncation to 
the main fork smgr truncation WAL record. I moved the WAL-logging from 
smgrtruncate() to RelationTruncate(). There's a new flag to 
RelationTruncate indicating whether the FSM should be truncated too, and 
only one truncation WAL record is written for the operation.


That does seem cleaner than the current approach where the FSM writes a 
separate WAL record just to clear the bits of the last remaining FSM 
page. I had to move RelationTruncate() to smgr.c, because I don't think 
a function in bufmgr.c should be doing WAL-logging. However, 
RelationTruncate really doesn't belong in smgr.c either. Also, now that 
smgrtruncate doesn't write its own WAL record, it doesn't seem right for 
smgrcreate to be doing that either.


So, I think I'll take this one step forward, and move RelationTruncate() 
to a new higher level file, e.g. src/backend/catalog/storage.c, and also 
create a new RelationCreateStorage() function that calls smgrcreate(), 
and move the WAL-logging from smgrcreate() to RelationCreateStorage().


So, we'll have two functions in a new file:

/* Create physical storage for a relation. If 'fsm' is true, an FSM fork 
is also created */

RelationCreateStorage(Relation rel, bool fsm)
/* Truncate the relation to 'nblocks' blocks. If 'fsm' is true, the FSM 
is also truncated */

RelationTruncate(Relation rel, BlockNumber nblocks, bool fsm)

The next question is whether the pending rel deletion stuff in smgr.c 
should be moved to the new file too. It seems like it would belong there 
better. That would leave smgr.c as a very thin wrapper around md.c


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-11-14 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 The next question is whether the pending rel deletion stuff in smgr.c should
 be moved to the new file too. It seems like it would belong there better. That
 would leave smgr.c as a very thin wrapper around md.c

Well it's just a switch, albeit with only one case, so I wouldn't expect it to
be much more than a thin wrapper.

If we had more storage systems it might be clearer what features were common
to all of them and could be hoisted up from md.c. I'm not clear there are any
though.

Actually I wonder if an entirely in-memory storage system would help with the
temporary table problem on systems where the kernel is too aggressive about
flushing file buffers or metadata.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Simon Riggs

On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
 One option would be to just ignore that problem for now, and not 
 WAL-log.

Probably worth skipping for now, since it will cause patch conflicts if
you do. Are there any other interactions with Hot Standby? 

But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
to say page is now all visible, without too much work.

Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
to set the block flag (unlogged), but just not update VM. Separating the
two concepts should allow the visibility check speed gain to be more
generally available. 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Heikki Linnakangas

Simon Riggs wrote:

On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
One option would be to just ignore that problem for now, and not 
WAL-log.


Probably worth skipping for now, since it will cause patch conflicts if
you do. Are there any other interactions with Hot Standby? 


But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
to say page is now all visible, without too much work.


Hmm. Even if a tuple is visible to everyone on the master, it's not 
necessarily yet visible to all the read-only transactions in the slave.



Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
to set the block flag (unlogged), but just not update VM. Separating the
two concepts should allow the visibility check speed gain to be more
generally available. 


Yes, that should be possible in theory. There's no version of 
ConditionalLockBuffer() for conditionally upgrading a shared lock to 
exclusive, but it should be possible to add one. I'm not sure if it would 
be safe to set the PD_ALL_VISIBLE_FLAG while holding just a shared lock, 
though. If it is, then we could do just that.
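
An upgrade variant would presumably have to drop and retake the lock
and make the caller recheck the page, along these lines (hypothetical;
no such function exists today):

	/* Try to trade our share lock for an exclusive one without blocking. */
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	if (ConditionalLockBuffer(buffer))
		return true;			/* got exclusive; page may have changed, recheck */
	LockBuffer(buffer, BUFFER_LOCK_SHARE);
	return false;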


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 ... I'm not sure if it would 
 be safe to set the PD_ALL_VISIBLE_FLAG while holding just a shared lock, 
 though. If it is, then we could do just that.

Seems like it must be safe.  If you have shared lock on a page then no
one else could be modifying the page in a way that would falsify
PD_ALL_VISIBLE.  You might have several processes concurrently try to
set the bit but that is safe (same situation as for hint bits).

The harder part is propagating the bit to the visibility map, but I
gather you intend to only allow VACUUM to do that?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Heikki Linnakangas

Tom Lane wrote:

The harder part is propagating the bit to the visibility map, but I
gather you intend to only allow VACUUM to do that?


Yep.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Heikki Linnakangas

Tom Lane wrote:

> Heikki Linnakangas [EMAIL PROTECTED] writes:
>
>> To modify a page:
>> If PD_ALL_VISIBLE flag is set, the bit in the visibility map is cleared
>> first. The heap page is kept pinned, but not locked, while the
>> visibility map is updated. We want to avoid holding a lock across I/O,
>> even though the visibility map is likely to stay in cache. After the
>> visibility map has been updated, the page is exclusively locked and
>> modified as usual, and the PD_ALL_VISIBLE flag is cleared before
>> releasing the lock.
>
> So after having determined that you will modify a page, you release the
> ex lock on the buffer and then try to regain it later?  Seems like a
> really bad idea from here.  What if it's no longer possible to do the
> modification you intended?

In case of insert/update, you have to find a new target page. I put the
logic in RelationGetBufferForTuple(). In case of delete and update (old
page), the flag is checked and the bit cleared just after pinning the
buffer, before doing anything else. (I note that that's not actually
what the patch is doing for heap_update; will fix.)

If we give up on the strict requirement that the bit in the visibility
map has to be cleared whenever the PD_ALL_VISIBLE flag on the page is
not set, then we could just update the visibility map after releasing
the locks on the heap pages. I think I'll do that for now, for
simplicity.
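
For the delete path, the ordering described above might look roughly
like this (a sketch using the function names from the patch under
discussion; visibility checks, WAL-logging and error handling omitted):

	Buffer		buffer = ReadBuffer(relation, blkno);	/* pin, no lock yet */

	/*
	 * Clear the map bit while holding only the pin, so that possible
	 * I/O on the visibility map page happens with no content lock held.
	 */
	if (PageIsAllVisible(BufferGetPage(buffer)))
		visibilitymap_clear(relation, blkno);

	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
	/* ... mark the tuple deleted and write WAL here ... */
	PageClearAllVisible(BufferGetPage(buffer));
	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
	ReleaseBuffer(buffer);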


>> To set the PD_ALL_VISIBLE flag, you must hold an exclusive lock on the
>> page, while you observe that all tuples on the page are visible to
>> everyone.
>
> That doesn't sound too good from a concurrency standpoint...

Well, no, but it's only done in VACUUM. And pruning. I implemented it as
a new loop that calls HeapTupleSatisfiesVacuum on each tuple and checks
that xmin is old enough for live tuples, but come to think of it, we're
already calling HeapTupleSatisfiesVacuum for every tuple on the page
during VACUUM, so it should be possible to piggyback on that by
restructuring the code.
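
Piggybacked on the existing per-page loop, the all-visible computation
might look roughly like this (a sketch with 8.4-era names; the exact
placement inside lazy_scan_heap is an assumption):

	bool		all_visible = true;
	OffsetNumber offnum,
				maxoff = PageGetMaxOffsetNumber(page);

	for (offnum = FirstOffsetNumber; offnum <= maxoff;
		 offnum = OffsetNumberNext(offnum))
	{
		ItemId		itemid = PageGetItemId(page, offnum);
		HeapTupleHeader htup;

		if (!ItemIdIsUsed(itemid))
			continue;			/* free slot, doesn't affect visibility */
		if (!ItemIdIsNormal(itemid))
		{
			all_visible = false;	/* be conservative about the rest */
			break;
		}
		htup = (HeapTupleHeader) PageGetItem(page, itemid);

		switch (HeapTupleSatisfiesVacuum(htup, OldestXmin, buf))
		{
			case HEAPTUPLE_LIVE:
				/* live, but all-visible only if xmin is old enough */
				if (!TransactionIdPrecedes(HeapTupleHeaderGetXmin(htup),
										   OldestXmin))
					all_visible = false;
				break;
			default:
				/* dead, recently dead, or insert/delete in progress */
				all_visible = false;
				break;
		}
	}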


>> That's how the patch works right now. However, there's a small
>> performance problem with the current approach: setting the
>> PD_ALL_VISIBLE flag must be WAL-logged. Otherwise, this could happen:
>
> I'm more concerned about *clearing* the bit being WAL-logged.  That's
> necessary for correctness.

Yes, clearing the PD_ALL_VISIBLE flag always needs to be WAL-logged.
There's a new boolean field in the xl_heap_insert/update/delete records
indicating whether the operation cleared the flag. On replay, if the
flag was cleared, the bit in the visibility map is also cleared.
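
For the delete case, a plausible shape for that record change is shown
below; the field name all_visible_cleared is an assumption here, as is
the replay fragment (on the standby side the relation has to be
resolved from the RelFileNode in the record, which is elided):

	typedef struct xl_heap_delete
	{
		xl_heaptid	target;					/* deleted tuple id */
		bool		all_visible_cleared;	/* PD_ALL_VISIBLE was cleared */
	} xl_heap_delete;

	/* ... and in the corresponding redo routine: */
	if (xlrec->all_visible_cleared)
		visibilitymap_clear(reln, blkno);	/* keep the map bit consistent */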



--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Simon Riggs

On Tue, 2008-10-28 at 14:57 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
>> On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
>>> One option would be to just ignore that problem for now, and not
>>> WAL-log.
>>
>> Probably worth skipping for now, since it will cause patch conflicts if
>> you do. Are there any other interactions with Hot Standby?
>>
>> But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
>> to say page is now all visible, without too much work.
>
> Hmm. Even if a tuple is visible to everyone on the master, it's not
> necessarily yet visible to all the read-only transactions in the slave.

Never a problem. No query can ever see the rows removed by a cleanup
record; that is enforced by the recovery system.

>> Does the PD_ALL_VISIBLE flag need to be set at the same time as updating
>> the VM? Surely heapgetpage() could do a ConditionalLockBuffer exclusive
>> to set the block flag (unlogged), but just not update VM. Separating the
>> two concepts should allow the visibility check speed gain to be more
>> generally available.
>
> Yes, that should be possible in theory. There's no version of
> ConditionalLockBuffer() for conditionally upgrading a shared lock to
> exclusive, but one could be added. I'm not sure if it would be safe to
> set the PD_ALL_VISIBLE flag while holding just a shared lock, though.
> If it is, then we could do just that.

To be honest, I'm more excited about your perf results for that than I
am about speeding up some VACUUMs.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Heikki Linnakangas

Simon Riggs wrote:
> On Tue, 2008-10-28 at 14:57 +0200, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:
>>>> One option would be to just ignore that problem for now, and not
>>>> WAL-log.
>>>
>>> Probably worth skipping for now, since it will cause patch conflicts if
>>> you do. Are there any other interactions with Hot Standby?
>>>
>>> But it seems like we can sneak in an extra flag on a HEAP2_CLEAN record
>>> to say page is now all visible, without too much work.
>>
>> Hmm. Even if a tuple is visible to everyone on the master, it's not
>> necessarily yet visible to all the read-only transactions in the slave.
>
> Never a problem. No query can ever see the rows removed by a cleanup
> record; that is enforced by the recovery system.

Yes, but there's a problem with recently inserted tuples:

1. A query begins in the slave, taking a snapshot with xmax = 100. So
the effects of anything more recent should not be seen.

2. Transaction 100 inserts a tuple in the master, and commits.

3. A vacuum comes along. There are no other transactions running in the
master. Vacuum sees that all tuples on the page, including the one just
inserted, are visible to everyone, and sets the PD_ALL_VISIBLE flag.

4. The change is replicated to the slave.

5. The query in the slave that began at step 1 looks at the page, sees
that the PD_ALL_VISIBLE flag is set. Therefore it skips the visibility
checks, and erroneously returns the inserted tuple.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Simon Riggs

On Mon, 2008-10-27 at 14:03 +0200, Heikki Linnakangas wrote:

> Lazy VACUUM only needs to visit pages that are '0' in the visibility
> map. This allows partial vacuums, where we only need to scan those parts
> of the table that need vacuuming, plus all indexes.

Just realised that this means we still have to visit each block of a
btree index with a cleanup lock.

That means the earlier idea of saying "I don't need a cleanup lock if
the page is not in memory" makes a lot more sense with a partial vacuum:

1. Scan all blocks in memory for the index (and so, don't do this unless
the index is larger than a certain % of shared buffers).

2. Start reading in new blocks until you've removed the correct number
of tuples.

3. Work through the rest of the blocks, checking that they are either in
shared buffers and we can get a cleanup lock, or they aren't in shared
buffers and so nobody has them pinned (sketched below).

If you step (2) intelligently with regard to index correlation you might
not need to do much I/O at all, if any.

(1) has a good hit ratio because mostly only active tables will be
vacuumed, and so are fairly likely to be in memory.
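
A sketch of step 3. BufferProbe(), vacuum_index_page() and
remember_for_retry() are hypothetical here; there is no public API for
asking whether a block is resident without reading it in, and the race
between the probe and the read is ignored. ConditionalLockBufferForCleanup()
is the existing non-blocking cleanup-lock primitive:

	BlockNumber blkno;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		Buffer		buf;

		if (!BufferProbe(rel, blkno))
			continue;			/* not resident: no backend can hold a pin */

		buf = ReadBuffer(rel, blkno);
		if (ConditionalLockBufferForCleanup(buf))
		{
			/* got pin + cleanup lock without blocking; do the work */
			vacuum_index_page(rel, buf);
			UnlockReleaseBuffer(buf);
		}
		else
		{
			/* someone else holds a pin; remember it and come back later */
			remember_for_retry(blkno);
			ReleaseBuffer(buf);
		}
	}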

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Visibility map, partial vacuums

2008-10-28 Thread Simon Riggs

On Tue, 2008-10-28 at 19:02 +0200, Heikki Linnakangas wrote:

> Yes, but there's a problem with recently inserted tuples:
>
> 1. A query begins in the slave, taking a snapshot with xmax = 100. So
> the effects of anything more recent should not be seen.
> 2. Transaction 100 inserts a tuple in the master, and commits.
> 3. A vacuum comes along. There are no other transactions running in the
> master. Vacuum sees that all tuples on the page, including the one just
> inserted, are visible to everyone, and sets the PD_ALL_VISIBLE flag.
> 4. The change is replicated to the slave.
> 5. The query in the slave that began at step 1 looks at the page, sees
> that the PD_ALL_VISIBLE flag is set. Therefore it skips the visibility
> checks, and erroneously returns the inserted tuple.

Yep. I was thinking about FSM and row removal. So PD_ALL_VISIBLE must be
separately settable on the standby. That's another reason why it should
be settable without a VACUUM, since there will never be one on the
standby.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >