Re: [HACKERS] Dead Space Map for vacuum

2007-01-16 Thread ITAGAKI Takahiro
I can see two issues in the design of the Dead Space Map
in the recent discussions:
  1. accuracy of the dead-space information
  2. memory management

I'll summarize the discussion of the first issue for now.


We need richer page-tracking status for effective vacuum; 1 bit per
block is not enough.

Simon Riggs [EMAIL PROTECTED] wrote:
 I would suggest that we tracked whether a block has had 0, 1 or 1+
 updates/deletes against it. When a block has 1+ it can then be
 worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
 really the minimum space worth reclaiming on any block.

The suggestion is to classify pages by vacuum priority. There are three
tracking statuses in this model:
  [A1] Clean (all tuples in the page are frozen)
  [A2] Low priority to vacuum
  [A3] High priority to vacuum

In another discussion, there is an idea to avoid aggressive freezing.
Normal VACUUM scans only pages marked in the B3 bitmap.
  [B1] Clean
  [B2] Unfrozen (some tuples need to be frozen)
  [B3] Unvacuumed (some tuples need to be vacuumed)


Both of the above have only three statuses, so either of them can be described
in 2 bits. I would suggest a four-status DSM model:
  [C1] Clean
  [C2] Unfrozen (all tuples are possible to be frozen, but not yet)
  [C3] Low priority to vacuum
  [C4] High priority to vacuum

INSERT and after-UPDATE tuples are marked with C3 status -- they only need
to be frozen once the transaction commits. On the other hand, DELETE and
before-UPDATE tuples are marked with C4 status -- they need to be vacuumed
on commit. If the transaction rolls back, the need for freezing and vacuuming
is inverted, but we can assume COMMIT is more common than ROLLBACK.

We can lower the priority from C4 to C3 for pages whose free space is too
small to reuse, as in Simon's original idea. The C3 status then tells us
that a page has had 0 or 1 dead tuples. Whether to mark C3 or C4
is an optimization issue.
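
A minimal sketch of how this four-status, 2-bits-per-block bookkeeping could
look is below (plain C with hypothetical names; this is not the actual patch,
just an illustration of the encoding and of the rule that backends only ever
raise a page's priority):

    /* Hypothetical 2-bit-per-block Dead Space Map encoding (illustration only). */
    #include <stdint.h>

    typedef enum
    {
        DSM_CLEAN = 0,          /* C1: all tuples frozen */
        DSM_UNFROZEN = 1,       /* C2: tuples could be frozen, but are not yet */
        DSM_VACUUM_LOW = 2,     /* C3: low priority to vacuum */
        DSM_VACUUM_HIGH = 3     /* C4: high priority to vacuum */
    } DSMStatus;

    #define DSM_BYTE(blk)   ((blk) / 4)             /* 4 block states per byte */
    #define DSM_SHIFT(blk)  (((blk) % 4) * 2)

    static DSMStatus
    dsm_get_status(const uint8_t *map, uint32_t blk)
    {
        return (DSMStatus) ((map[DSM_BYTE(blk)] >> DSM_SHIFT(blk)) & 0x3);
    }

    static void
    dsm_set_status(uint8_t *map, uint32_t blk, DSMStatus status)
    {
        map[DSM_BYTE(blk)] &= ~(0x3 << DSM_SHIFT(blk));          /* clear old bits */
        map[DSM_BYTE(blk)] |= (uint8_t) status << DSM_SHIFT(blk);
    }

    /*
     * Backends never lower a status: INSERT/new versions mark C3, DELETE/old
     * versions mark C4; only VACUUM downgrades, as described below.
     */
    static void
    dsm_mark(uint8_t *map, uint32_t blk, DSMStatus at_least)
    {
        if (dsm_get_status(map, blk) < at_least)
            dsm_set_status(map, blk, at_least);
    }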


We need to add two new VACUUM modes that use the Dead Space Map.
Most users and autovacuum would use only mode 5.

1. VACUUM FULL        (scan all pages)
2. VACUUM FREEZE ALL  (scan all pages)
3. VACUUM ALL         (scan all pages)
4. VACUUM FREEZE      (scan C2, C3, C4)
5. VACUUM             (scan only C4, basically)

VACUUM downgrades the status of scanned pages from C4 to a lower status.
If a page has any dead tuples, VACUUM tries to freeze all tuples in the page
and change its status to C1 (Clean); since the page becomes dirty anyway,
freezing it is almost free (no additional I/Os). When unfrozen or
unvacuumed tuples remain, the status becomes C2 or C3 instead.

Normal VACUUM (with DSM) basically scans only pages marked with C4 status,
but it may be worthwhile to vacuum other pages in some cases: in maintenance
windows, or when we can retrieve several pages in one disk read, etc.
This is also an optimization issue.
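
To make mode 5 concrete, a DSM-driven vacuum pass could look roughly like the
following (again only a sketch, reusing the hypothetical helpers from the
earlier snippet; vacuum_page(), freeze_page() and page_has_dead_tuples() are
placeholders standing in for the real heap-vacuuming code):

    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholders for the real heap-vacuuming primitives. */
    extern void vacuum_page(uint32_t blk);
    extern bool freeze_page(uint32_t blk);            /* true if all tuples could be frozen */
    extern bool page_has_dead_tuples(uint32_t blk);

    /* Sketch of a DSM-driven VACUUM (mode 5): visit only C4 pages. */
    static void
    dsm_vacuum_relation(uint8_t *map, uint32_t nblocks)
    {
        for (uint32_t blk = 0; blk < nblocks; blk++)
        {
            if (dsm_get_status(map, blk) != DSM_VACUUM_HIGH)
                continue;                               /* skip C1/C2/C3 pages */

            vacuum_page(blk);                           /* remove dead tuples; dirties the page */

            /* The page is dirty anyway, so freezing it now costs no extra I/O. */
            if (freeze_page(blk))
                dsm_set_status(map, blk, DSM_CLEAN);          /* C1 */
            else if (page_has_dead_tuples(blk))
                dsm_set_status(map, blk, DSM_VACUUM_LOW);     /* C3 */
            else
                dsm_set_status(map, blk, DSM_UNFROZEN);       /* C2 */
        }
    }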


Any ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Dead Space Map for vacuum

2007-01-05 Thread Jim Nasby

On Jan 3, 2007, at 11:42 PM, ITAGAKI Takahiro wrote:
 BTW, if we want to achieve the index-only scan, we might have to do more
 aggressive VACUUM FREEZE. There were many comments that we should avoid
 vacuuming pages that contain only unfrozen tuples or a few dead tuples.
 I think it is true for efficient VACUUM, but the index-only scan does not
 work for unfrozen pages. Which should we give priority?


Unless I'm mistaken, as soon as vacuum decides to dirty a page, the  
only cost involved in freezing the page is CPU - and vacuum isn't  
exactly a CPU-intensive process.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Dead Space Map for vacuum

2007-01-03 Thread ITAGAKI Takahiro

Jochem van Dieten [EMAIL PROTECTED] wrote:

 On 12/28/06, ITAGAKI Takahiro wrote:
 
  | [TODO item] Allow data to be pulled directly from indexes
  | Another idea is to maintain a bitmap of heap pages where all rows are
  | visible to all backends, and allow index lookups to reference that bitmap
  | to avoid heap lookups
 
  It is not done yet, but we can use DSM for this purpose. If the corresponding
  bit in DSM is '0', all tuples in the page are frozen and visible to all
  backends. We don't have to visit frozen pages just for visibility checking.
 
 Does that really work in the absence of a retail index vacuum method?
 What if the heap is already vacuumed, frozen and the bit for that page
 in the DSM is set to '0', but the index still contains entries that
 haven't been removed by a vacuum yet?

No, if the DSM bit is 0, there are no unfrozen or dead tuples in the page
and no dead index entries pointing to tuples in it. In other words, we can
reset the DSM bit to 0 only under that condition.
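
As a rough illustration of how an index scan could exploit that invariant
(hypothetical helper name dsm_bit_is_clean(); this is not how the executor is
actually structured, just a sketch of the check being discussed):

    /*
     * Sketch: decide whether an index entry needs a heap visit at all,
     * based on the DSM bit of the heap page it points to.
     */
    static bool
    index_entry_needs_heap_fetch(Relation rel, IndexTuple itup)
    {
        BlockNumber blk = ItemPointerGetBlockNumber(&itup->t_tid);

        /*
         * DSM bit 0 means: every tuple on that heap page is frozen and
         * visible to all backends, and no dead index entries point into it.
         * The index tuple alone is then a valid result and the heap fetch
         * can be skipped.
         */
        return !dsm_bit_is_clean(rel, blk);
    }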

BTW, if we want to achieve the index-only scan, we might have to do more
aggressive VACUUM FREEZE. There were many comments that we should avoid
vacuuming pages that contain only unfrozen tuples or a few dead tuples.
I think that is true for efficient VACUUM, but the index-only scan does not
work for unfrozen pages. Which should we give priority?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Simon Riggs
On Thu, 2006-12-28 at 21:35 +, Heikki Linnakangas wrote:

 I only used 1 bit, just like in Itagaki's approach. 

1 bit may not be enough.

In many cases, a block will receive only 1 UPDATE or DELETE. If we then
mark this in the DSM, when we VACUUM that block it will not have
sufficient space free to make it worth VACUUM adding the block to the
FSM (using current thinking). That thinking is good, since tuples
frequently vary in length, so we could easily be in a position where a
block in the FSM doesn't even have space for a single new tuple version.

I would suggest that we tracked whether a block has had 0, 1 or 1+
updates/deletes against it. When a block has 1+ it can then be
worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
really the minimum space worth reclaiming on any block.

Otherwise we will end up with a huge, not very useful FSM and fairly
inefficient VACUUMing of single dead tuples.

So I'd suggest that we use at least 2 bits/block, so that we can avoid
fiddling with small amounts of reclaimed space. The extra space taken up
by the DSM is easily outweighed by the space we would have wasted in the
FSM to make it effective when tracking smaller chunks of freespace
(which needs 6 bytes/block).
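
As a rough illustration of that trade-off (numbers only indicative, assuming
the default 8 KB block size): a 1 GB table has 131072 blocks, so a 2-bit DSM
costs 32 KB in total, i.e. the second bit adds just 16 KB. Recording even a
quarter of those blocks in the FSM at 6 bytes/block would cost about 192 KB,
so filtering out the single-dead-tuple blocks before they ever reach the FSM
pays for the extra DSM bit many times over.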

DSM is a good thing; many use cases will benefit from it.

I'd want to be able to turn off DSM completely when the whole database
fits in RAM and would probably like to enable/disable it for particular
tables, until we get some good heuristics for when it is worthwhile.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 I would suggest that we tracked whether a block has had 0, 1 or 1+
 updates/deletes against it. When a block has 1+ it can then be
 worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
 really the minimum space worth reclaiming on any block.

How do you arrive at that conclusion?

Counterexample: table in which all tuples exceed half a page.

regards, tom lane



Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Simon Riggs
On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  I would suggest that we tracked whether a block has had 0, 1 or 1+
  updates/deletes against it. When a block has 1+ it can then be
  worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
  really the minimum space worth reclaiming on any block.
 
 How do you arrive at that conclusion?

FSM code ignores any block with less space than 1 average tuple, which
is a pretty reasonable rule.

If you only track whether a block has been updated, not whether it has
been updated twice, then you will be VACUUMing lots of blocks that have
only a 50% chance of being usefully stored by the FSM. As I explained,
the extra bit per block is easily regained from storing less FSM data.

My understanding was that DSM was meant to increase VACUUM efficiency,
so having a way to focus in on blocks most worth vacuuming makes sense
using the 80/20 rule.

 Counterexample: table in which all tuples exceed half a page.

Current FSM code will ignore those too, if they are less than the
average size of the tuples requested so far. That's a pretty weird
counterexample, even if it is a case that needs handling.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote:
 Counterexample: table in which all tuples exceed half a page.

 Current FSM code will ignore those too, if they are less than the
 average size of the tuples requested so far. That's a pretty weird
 counterexample, even if it is a case that needs handling.

Better read it again.  The number that's passed to the FSM is the
free space *after* vacuuming, which in this scenario will be
BLCKSZ-less-page-header.  This case is not broken now, but it will
be if we adopt your proposal.
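
To spell the scenario out (rough numbers, default 8 KB BLCKSZ): if every tuple
is, say, 4500 bytes, a page holds exactly one of them, so a single dead tuple
means the whole page is reclaimable. After vacuuming, the free space reported
to the FSM is roughly 8192 bytes minus the small page header, far above any
average request size, so today the FSM happily records it. A rule that skips
single-dead-tuple pages would leave exactly these pages unreclaimed.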

regards, tom lane



Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Simon Riggs
On Fri, 2006-12-29 at 16:41 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote:
  Counterexample: table in which all tuples exceed half a page.
 
  Current FSM code will ignore those too, if they are less than the
  average size of the tuples requested so far. That's a pretty weird
  counterexample, even if it is a case that needs handling.
 
 Better read it again.  The number that's passed to the FSM is the
 free space *after* vacuuming, which in this scenario will be
 BLCKSZ-less-page-header.  This case is not broken now, but it will
 be if we adopt your proposal.

That case is extremely rare, since

#define TOAST_TUPLE_THRESHOLD (MaxTupleSize / 4)

Even so, I'm fairly certain that an if () statement is enough to handle
that case. So I don't really see it as a limit to the proposal,
which is a small change in the scheme of things.

DSM has my support; I would like it to be as efficient as possible.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Russell Smith

Simon Riggs wrote:
 On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   I would suggest that we tracked whether a block has had 0, 1 or 1+
   updates/deletes against it. When a block has 1+ it can then be
   worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is
   really the minimum space worth reclaiming on any block.

  How do you arrive at that conclusion?

 FSM code ignores any block with less space than 1 average tuple, which
 is a pretty reasonable rule.

FSM serves a different purpose than DSM and therefore has an entirely
different set of rules governing what it should and shouldn't be doing.
This is a reasonable rule for FSM, but not for DSM.

 If you only track whether a block has been updated, not whether it has
 been updated twice, then you will be VACUUMing lots of blocks that have
 only a 50% chance of being usefully stored by the FSM. As I explained,
 the extra bit per block is easily regained from storing less FSM data.

Well, it seems that when implementing the DSM, it'd be a great time to
move the FSM from its current location in shared memory to somewhere else,
possibly the same place as the DSM. A couple of special blocks per file
segment would be a good place. Also, I'm not sure that the point of
VACUUMing is always to be able to immediately reuse the space.
There are cases where large DELETEs are done, and you just want to
decrease the index size. In Tom's counterexample of large tuples, you
certainly want to vacuum the index when only a single update/delete occurs.

 My understanding was that DSM was meant to increase VACUUM efficiency,
 so having a way to focus in on blocks most worth vacuuming makes sense
 using the 80/20 rule.

Possibly true. I don't have anything to indicate what usage patterns
produce what requirements in vacuum patterns. If there are significant
numbers of blocks with only one update, is it a loss to actually vacuum
those? I know it could be faster if we didn't, but would it still be
faster than what we do now?
  

  Counterexample: table in which all tuples exceed half a page.

 Current FSM code will ignore those too, if they are less than the
 average size of the tuples requested so far. That's a pretty weird
 counterexample, even if it is a case that needs handling.
Again I'd be careful saying that FSM = DSM for handling of what should 
be done for a particular block.


Russell Smith.



Re: [HACKERS] Dead Space Map for vacuum

2006-12-29 Thread Simon Riggs
On Sat, 2006-12-30 at 09:22 +1100, Russell Smith wrote:
 Simon Riggs wrote: 
  FSM code ignores any block with less space than 1 average tuple, which
  is a pretty reasonable rule.

 FSM serves a different purpose than DSM and therefore has an entirely
 different set of rules governing what it should and shouldn't be
 doing.  This is a reasonable rule for FSM, but not for DSM.

VACUUM and space reuse are intimately connected. You cannot consider one
without considering the other.

  If you only track whether a block has been updated, not whether it has
  been updated twice, then you will be VACUUMing lots of blocks that have
  only a 50% chance of being usefully stored by the FSM. As I explained,
  the extra bit per block is easily regained from storing less FSM data.

 Well, it seems that when implementing the DSM, it'd be a great time to
 move the FSM from its current location in shared memory to somewhere
 else, possibly the same place as the DSM. A couple of special blocks per
 file segment would be a good place. Also, I'm not sure that the point of
 VACUUMing is always to be able to immediately reuse the space.
 There are cases where large DELETEs are done, and you just want to
 decrease the index size.

I can see the argument in favour of reducing index size, but do we want
to perform a read *and* a write IO to remove *one* heap tuple, just so
we can remove a single tuple from index(es)?

We might want to eventually, but I'm proposing keeping track of that so
that we can tell the difference between single dead tuples and more
efficient targets for our VACUUM. When there are no better targets,
sure, we'll be forced to reach for the high fruit.

The idea of DSM is to improve the efficiency of VACUUM by removing
wasted I/Os (and again I say, I back DSM). The zero-dead-tuples blocks
are obvious targets to avoid, but avoiding them only avoids one read
I/O. Avoiding the single-dead-tuple blocks avoids two I/Os, at small
loss. They are poor targets for a selective VACUUM.

When we are picking just some of the blocks in a table, we will quickly
move from sequential to random I/O, so we must be careful and
conservative about the blocks we pick, otherwise DSM VACUUM will not be
any better than VACUUM as it is now.
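
Putting rough numbers on it: reclaiming a lone ~150-byte dead tuple still costs
one random 8 KB read plus one 8 KB write (plus the index work), while a block
with ten dead tuples amortizes the same two I/Os over ten times the reclaimed
space and ten index entries. The figures are only illustrative, but they show
why single-dead-tuple blocks are the least attractive targets.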

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Dead Space Map for vacuum

2006-12-28 Thread Gavin Sherry
On Thu, 28 Dec 2006, Heikki Linnakangas wrote:

 ITAGAKI Takahiro wrote:
  Hello,
 
  NTT staff are working on the TODO item:
  | Create a bitmap of pages that need vacuuming

  We call the bitmap the Dead Space Map (DSM); it allows VACUUM to scan
  only pages that need vacuuming or freezing. We'd like to discuss the
  design on hackers and reach agreement with the community.

 Great!

I agree!


  We implemented the basic parts of it and measured the performance.
  As expected, VACUUM took less time when the fraction of updates is low.
  DSM is useful for large but not so heavily-updated databases. The overhead
  of managing the DSM seemed to be negligibly small in our CPU-loaded tests.

  There are a lot of choices in the implementation. We followed the descriptions
  in the TODO list and past discussions in some parts, but not in others,
  for various reasons. I would appreciate your comments and suggestions.

 I experimented with a different DSM design last winter. I got busy with
 other things and never posted it AFAIR, but the idea was to store a
 bitmap in the special area on every 32k heap page. That had some advantages:

 * doesn't require a new dedicated shared memory area that needs to be
 allocated and tuned.
 * doesn't introduce a (single) new global lwlock that might become hotspot.
 * WAL logging is quite simple, since the bitmaps are on normal pages
 handled by buffer manager.

I too have experimented with this idea. 4 DSM pages means 2 bits per
block. What were you using the other bit for? When I discussed this some
time ago with Jan Wieck, we recommended we track blocks in the FSM with
this extra bit.

One problem this approach may have is contention for the DSM pages and the
granularity of the lock associated with it. Locking a DSM page to update
one bit will affect writes to 32K pages. This might be less of a problem
than I think because of the very low cost of setting the bit.

  | In the event of a system crash, the bitmap would probably be invalidated.
 
  We chose this for simplicity. Currently, the whole DSM is lost after a crash.
  All tables revert to untracked status, and a full-scan vacuum is needed
  after the loss of the DSM.

 If you flush the DSM to disk at checkpoint, it should be easy to bring
 it up-to-date on WAL replay. Having to do full-scanning vacuum after
 crash can be a nasty surprise.

I agree. One interesting insight is that we have 4 bits left over (those not
used for the 4 DSM blocks). We might use those status bits for some
purpose. For example, when a page is dirtied and we update the DSM page,
we set a DSM dirtied bit. Checkpoint would then only flush DSM pages
dirtied since the last checkpoint. This is important in the presence of
lots of read-only tables.

  | One complexity is that index entries still have to be vacuumed, and doing
  | this without an index scan (by using the heap values to find the index entry)
  | might be slow and unreliable, especially for user-defined index functions.
 
  Indexes still need to be scanned in full at the moment. We are
  also researching a retail index vacuum method, but it requires more work.

 Yeah, that's an old story :(.

 BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed
 safely. You'll still need to visit all bitmaps to find the dead bit, but
 you only need to check the bitmap page that corresponds to the tid of the
 dead tuple.

Cool. The problem is solving it for the other AMs, particularly btree.


  | http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php
  | Maintain 2 bits per block that tell if the block has been vacuumed of all
  | dead tuples since the last time it was dirtied, and if all its tuples are
  | completely frozen.
 
  We use 1 bit per block, so we cannot separate pages that need vacuum from
  those that need freezing only. The reason is that we cannot determine where
  to record the before/after versions of updated tuples. If the transaction is
  committed, the before-version should be recorded in the needs-vacuum bitmap
  and the after-version in the needs-freeze bitmap, but on rollback it is
  inverted. We cannot judge which to do until the end of the transaction.

 Yeah, that makes the DSM less useful. Are you thinking of freezing more
 aggressively because of that? Or doing a full-scanning vacuum every now
 and then to freeze?

I don't see any problem with moving to two status bits per block in the
DSM-in-heap model, so we can track this better. What do you think?

Thanks,

Gavin



Re: [HACKERS] Dead Space Map for vacuum

2006-12-28 Thread Jochem van Dieten

On 12/28/06, ITAGAKI Takahiro wrote:


 | [TODO item] Allow data to be pulled directly from indexes
 | Another idea is to maintain a bitmap of heap pages where all rows are
 | visible to all backends, and allow index lookups to reference that bitmap
 | to avoid heap lookups

 It is not done yet, but we can use DSM for this purpose. If the corresponding
 bit in DSM is '0', all tuples in the page are frozen and visible to all
 backends. We don't have to visit frozen pages just for visibility checking.


Does that really work in the absence of a retail index vacuum method?
What if the heap is already vacuumed, frozen and the bit for that page
in the DSM is set to '0', but the index still contains entries that
haven't been removed by a vacuum yet?

Jochem



Re: [HACKERS] Dead Space Map for vacuum

2006-12-28 Thread ITAGAKI Takahiro

Heikki Linnakangas [EMAIL PROTECTED] wrote:

  We use 1 bit per block, so we cannot separate pages that need vacuum from
  those that need freezing only. The reason is that we cannot determine where
  to record the before/after versions of updated tuples. If the transaction is
  committed, the before-version should be recorded in the needs-vacuum bitmap
  and the after-version in the needs-freeze bitmap, but on rollback it is
  inverted. We cannot judge which to do until the end of the transaction.
 
 Yeah, that makes the DSM less useful. Are you thinking of freezing more 
 aggressively because of that? Or doing a full-scanning vacuum every now 
 and then to freeze?

I recommend using VACUUM FREEZE with the patch, but what you say is
certainly reasonable. I think backends would record all of the changes into
the 1st bitmap, and then VACUUM would move some bits from the 1st bitmap to the 2nd.

Assuming we had two bitmaps: (1) an invisible-from-someone bitmap and
(2) a freeze-needed bitmap. (Actually, the two bits would be packed into one
compound bitmap, but they are kept separate here for illustrative purposes.)

We have to record changes into (1) first, because we can only tell whether a
page needs VACUUM or just FREEZE after the transaction commits or rolls back.

VACUUM scans pages using (1). Newly inserted pages are also scanned on the
first vacuum, even if they require only freezing. If the vacuum finds that
all tuples in a page are completely vacuumed and old enough to be visible
to all open transactions, it deletes the page from (1). If all tuples are also
frozen, the page ends up in neither bitmap; otherwise, it is put into (2).

We can use (1) for index-only scanning, because all tuples on pages not
recorded in (1) are visible to all backends, regardless of whether those
pages are also recorded in (2).
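
A sketch of the per-page decision in VACUUM under this two-bitmap scheme might
look like the following (hypothetical helper names, not the actual patch;
bitmap (1) is invisible_map, bitmap (2) is freeze_map):

    #include <stdbool.h>
    #include <stdint.h>

    /* Placeholders for the real per-page work and bitmap accessors. */
    extern void vacuum_page(uint32_t blk);
    extern bool all_tuples_visible_to_everyone(uint32_t blk);
    extern bool all_tuples_frozen(uint32_t blk);
    extern void set_bit(uint8_t *map, uint32_t blk);
    extern void clear_bit(uint8_t *map, uint32_t blk);

    static void
    dsm_vacuum_page(uint8_t *invisible_map, uint8_t *freeze_map, uint32_t blk)
    {
        vacuum_page(blk);                            /* remove dead tuples */

        if (all_tuples_visible_to_everyone(blk))
        {
            clear_bit(invisible_map, blk);           /* (1): no vacuum needed any more */

            if (all_tuples_frozen(blk))
                clear_bit(freeze_map, blk);          /* in neither bitmap: fully clean */
            else
                set_bit(freeze_map, blk);            /* (2): only freezing remains */
        }
        /* else: leave the page in (1) and revisit it on a later vacuum */
    }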

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] Dead Space Map for vacuum

2006-12-28 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 I experimented with a different DSM design last winter. I got busy with 
 other things and never posted it AFAIR, but the idea was to store a 
 bitmap in the special area on every 32k heap page. That had some advantages:

 * doesn't require a new dedicated shared memory area that needs to be 
 allocated and tuned.
 * doesn't introduce a (single) new global lwlock that might become hotspot.

I agree with doing something along that line (not necessarily with
putting the bits into existing heap pages, but anyway with keeping the
information on disk not in shared memory).  Our experience with the
existing FSM design has been uniformly negative; so I don't wish to add
still another shared memory area that requires tricky manual size
tuning.  I think sooner or later we'll need to redesign FSM so it's not
kept in a fixed-size shared memory area.  Let's not make that mistake
twice.

regards, tom lane



Re: [HACKERS] Dead Space Map for vacuum

2006-12-28 Thread Heikki Linnakangas

Gavin Sherry wrote:
 On Thu, 28 Dec 2006, Heikki Linnakangas wrote:
  I experimented with a different DSM design last winter. I got busy with
  other things and never posted it AFAIR, but the idea was to store a
  bitmap in the special area on every 32k heap page. That had some advantages:

  * doesn't require a new dedicated shared memory area that needs to be
  allocated and tuned.
  * doesn't introduce a (single) new global lwlock that might become hotspot.
  * WAL logging is quite simple, since the bitmaps are on normal pages
  handled by buffer manager.

 I too have experimented with this idea. 4 DSM pages means 2 bits per
 block. What were you using the other bit for? When I discussed this some
 time ago with Jan Wieck, we recommended we track blocks in the FSM with
 this extra bit.


I only used 1 bit, just like in Itagaki's approach. The bitmap in the 
special area only occupied half a page, the rest was available for 
normal heap tuples. That might seem strange, but it avoids having to 
allocate 2 pages for small tables with just a handful of rows. It also 
spreads out the lock contention a bit.


I think I wasn't clear in my explanation. If a table had less than 32k 
pages, it had just one bitmap with each bit representing a heap page. 
The bitmap was stored in the special area of the first page. If a table 
was larger, the bitmap on the 1st page represented pages 1-32k. On the 
32768th page there's another bitmap that represents the next 32k pages, 
and so on. In other words, the DSM bitmap of page number X was always on 
page number (X / 32768). Vacuum had to check all the bitmaps.
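
Under one reading of that description (page 0 carries the bitmap for pages
0-32767, page 32768 for the next 32k, and so on), the mapping is simply the
following (hypothetical names, illustration only):

    #include <stdint.h>

    #define HEAPBLOCKS_PER_DSM_PAGE 32768

    /* Heap page whose special area holds the DSM bit for block 'blk'. */
    static uint32_t
    dsm_bitmap_page(uint32_t blk)
    {
        return (blk / HEAPBLOCKS_PER_DSM_PAGE) * HEAPBLOCKS_PER_DSM_PAGE;
    }

    /* Bit offset of 'blk' within that page's bitmap; 32768 bits = 4 KB,
     * which is the half page mentioned above. */
    static uint32_t
    dsm_bitmap_offset(uint32_t blk)
    {
        return blk % HEAPBLOCKS_PER_DSM_PAGE;
    }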



  BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed
  safely. You'll still need to visit all bitmaps to find the dead bit, but
  you only need to check the bitmap page that corresponds to the tid of the
  dead tuple.

 Cool. The problem is solving it for the other AMs, particularly btree.

Yeah..


   | http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php
   | Maintain 2 bits per block that tell if the block has been vacuumed of all
   | dead tuples since the last time it was dirtied, and if all its tuples are
   | completely frozen.

   We use 1 bit per block, so we cannot separate pages that need vacuum from
   those that need freezing only. The reason is that we cannot determine where
   to record the before/after versions of updated tuples. If the transaction is
   committed, the before-version should be recorded in the needs-vacuum bitmap
   and the after-version in the needs-freeze bitmap, but on rollback it is
   inverted. We cannot judge which to do until the end of the transaction.

  Yeah, that makes the DSM less useful. Are you thinking of freezing more
  aggressively because of that? Or doing a full-scanning vacuum every now
  and then to freeze?

 I don't see any problem with moving to two status bits per block in the
 DSM-in-heap model, so we can track this better. What do you think?


Well, the obvious drawback is that two bits take more space than one 
bit. But it feels ok to me as well.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Dead Space Map for vacuum

2006-12-27 Thread Heikki Linnakangas

ITAGAKI Takahiro wrote:

 Hello,

 NTT staff are working on the TODO item:
 | Create a bitmap of pages that need vacuuming

 We call the bitmap the Dead Space Map (DSM); it allows VACUUM to scan
 only pages that need vacuuming or freezing. We'd like to discuss the
 design on hackers and reach agreement with the community.


Great!


 We implemented the basic parts of it and measured the performance.
 As expected, VACUUM took less time when the fraction of updates is low.
 DSM is useful for large but not so heavily-updated databases. The overhead
 of managing the DSM seemed to be negligibly small in our CPU-loaded tests.

 There are a lot of choices in the implementation. We followed the descriptions
 in the TODO list and past discussions in some parts, but not in others,
 for various reasons. I would appreciate your comments and suggestions.


I experimented with a different DSM design last winter. I got busy with 
other things and never posted it AFAIR, but the idea was to store a 
bitmap in the special area on every 32k heap page. That had some advantages:


* doesn't require a new dedicated shared memory area that needs to be 
allocated and tuned.

* doesn't introduce a (single) new global lwlock that might become hotspot.
* WAL logging is quite simple, since the bitmaps are on normal pages 
handled by buffer manager.


I had it working enough to see that vacuum time was shortened, but I 
didn't perform any further performance testing.



 | In the event of a system crash, the bitmap would probably be invalidated.

 We chose this for simplicity. Currently, the whole DSM is lost after a crash.
 All tables revert to untracked status, and a full-scan vacuum is needed
 after the loss of the DSM.


If you flush the DSM to disk at checkpoint, it should be easy to bring 
it up-to-date on WAL replay. Having to do full-scanning vacuum after 
crash can be a nasty surprise.



 | One complexity is that index entries still have to be vacuumed, and doing
 | this without an index scan (by using the heap values to find the index entry)
 | might be slow and unreliable, especially for user-defined index functions.

 Indexes still need to be scanned in full at the moment. We are
 also researching a retail index vacuum method, but it requires more work.


Yeah, that's an old story :(.

BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed
safely. You'll still need to visit all bitmaps to find the dead bit, but
you only need to check the bitmap page that corresponds to the tid of the
dead tuple.



 | http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php
 | Maintain 2 bits per block that tell if the block has been vacuumed of all
 | dead tuples since the last time it was dirtied, and if all its tuples are
 | completely frozen.

 We use 1 bit per block, so we cannot separate pages that need vacuum from
 those that need freezing only. The reason is that we cannot determine where
 to record the before/after versions of updated tuples. If the transaction is
 committed, the before-version should be recorded in the needs-vacuum bitmap
 and the after-version in the needs-freeze bitmap, but on rollback it is
 inverted. We cannot judge which to do until the end of the transaction.


Yeah, that makes the DSM less useful. Are you thinking of freezing more 
aggressively because of that? Or doing a full-scanning vacuum every now 
and then to freeze?



 | [TODO item] Allow data to be pulled directly from indexes
 | Another idea is to maintain a bitmap of heap pages where all rows are
 | visible to all backends, and allow index lookups to reference that bitmap
 | to avoid heap lookups

 It is not done yet, but we can use DSM for this purpose. If the corresponding
 bit in DSM is '0', all tuples in the page are frozen and visible to all
 backends. We don't have to visit frozen pages just for visibility checking.


Cool.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
