Re: [HACKERS] Dead Space Map for vacuum
I can see two issues in the recent discussions about the design of the Dead Space Map: 1. accuracy of the dead-space information, and 2. memory management. I'll write up the discussion of the 1st for now.

We need a richer page-tracking status for effective vacuum; 1 bit per block is not enough.

Simon Riggs [EMAIL PROTECTED] wrote: I would suggest that we tracked whether a block has had 0, 1 or 1+ updates/deletes against it. When a block has 1+ it can then be worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is really the minimum space worth reclaiming on any block.

The suggestion is to classify pages by vacuum priority. There are 3 tracking statuses in this model:
[A1] Clean (all tuples in the page are frozen)
[A2] Low priority to vacuum
[A3] High priority to vacuum

In another discussion, there was an idea to avoid aggressive freezing; normal VACUUM scans only pages marked in the B3 bitmap:
[B1] Clean
[B2] Unfrozen (some tuples need to be frozen)
[B3] Unvacuumed (some tuples need to be vacuumed)

Both of the above have only 3 statuses, so either can be described in 2 bits. I would suggest a 4-status DSM model:
[C1] Clean
[C2] Unfrozen (all tuples could be frozen, but are not yet)
[C3] Low priority to vacuum
[C4] High priority to vacuum

Pages touched by INSERT or containing after-UPDATE tuples are marked C3 -- on commit they only need to be frozen. On the other hand, pages containing DELETEd or before-UPDATE tuples are marked C4 -- on commit they need to be vacuumed. If the transaction rolls back, the freeze/vacuum requirements are inverted, but we can assume COMMIT is more common than ROLLBACK. As in Simon's original idea, we can also lower a page from C4 to C3 when it has too little free space to be worth reusing; we can then treat C3 as meaning the page has had 0 or 1 dead tuples. Whether to mark C3 or C4 is an optimization issue.

We need to add two new VACUUM modes that use the Dead Space Map. Most users, and autovacuum, would use only mode 5.
1. VACUUM FULL (scan all pages)
2. VACUUM FREEZE ALL (scan all pages)
3. VACUUM ALL (scan all pages)
4. VACUUM FREEZE (scan C2, C3, C4)
5. VACUUM (scan only C4, basically)

VACUUM downgrades the status of scanned pages from C4. If there are any dead tuples, VACUUM tries to freeze all tuples in the page and change its status to C1 (Clean); since the page is dirtied anyway, freezing is almost free (no additional I/Os). When unfrozen or unvacuumed tuples remain, the status becomes C2 or C3. Normal VACUUM (with DSM) basically scans only pages marked C4, but it may be good to vacuum other pages in some cases: in maintenance windows, when we can retrieve several pages in one disk read, etc. This is also an optimization issue.

Any ideas?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
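The four-status model and the five scan modes above can be sketched in C. This is an illustrative sketch, not PostgreSQL code; the names (DsmStatus, dsm_set, dsm_get, vacuum_should_scan) are invented for the example, which assumes statuses are packed two bits per heap block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum
{
    DSM_CLEAN = 0,       /* C1: all tuples in the page are frozen */
    DSM_UNFROZEN = 1,    /* C2: tuples could be frozen, but are not yet */
    DSM_VACUUM_LOW = 2,  /* C3: low priority to vacuum */
    DSM_VACUUM_HIGH = 3  /* C4: high priority to vacuum */
} DsmStatus;

typedef enum
{
    VACUUM_FULL,         /* mode 1: scan all pages */
    VACUUM_FREEZE_ALL,   /* mode 2: scan all pages */
    VACUUM_ALL,          /* mode 3: scan all pages */
    VACUUM_FREEZE,       /* mode 4: scan C2, C3, C4 */
    VACUUM_PLAIN         /* mode 5: scan only C4, basically */
} VacuumMode;

/* Each byte of the map packs the 2-bit status of four heap blocks. */
static void
dsm_set(uint8_t *map, uint32_t blkno, DsmStatus st)
{
    uint32_t shift = (blkno % 4) * 2;

    map[blkno / 4] = (uint8_t) ((map[blkno / 4] & ~(3u << shift)) |
                                ((uint32_t) st << shift));
}

static DsmStatus
dsm_get(const uint8_t *map, uint32_t blkno)
{
    return (DsmStatus) ((map[blkno / 4] >> ((blkno % 4) * 2)) & 3u);
}

/* Does this vacuum mode visit a page with the given DSM status? */
static bool
vacuum_should_scan(VacuumMode mode, DsmStatus st)
{
    switch (mode)
    {
        case VACUUM_FULL:
        case VACUUM_FREEZE_ALL:
        case VACUUM_ALL:
            return true;                    /* modes 1-3 scan every page */
        case VACUUM_FREEZE:
            return st != DSM_CLEAN;         /* mode 4: C2, C3, C4 */
        case VACUUM_PLAIN:
            return st == DSM_VACUUM_HIGH;   /* mode 5: only C4 */
    }
    return false;
}
```

The 2-bit packing keeps the whole map at 1/32768th of the heap size, so the extra bit over a plain dirty-bitmap costs very little.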
Re: [HACKERS] Dead Space Map for vacuum
On Jan 3, 2007, at 11:42 PM, ITAGAKI Takahiro wrote: BTW, if we want to achieve the index-only scan, we might have to do more aggressive VACUUM FREEZE. There were many comments that we should avoid vacuuming pages that contain only unfrozen tuples or a few dead tuples. I think that is true for efficient VACUUM, but the index-only scan does not work for unfrozen pages. Which should we give priority?

Unless I'm mistaken, as soon as vacuum decides to dirty a page, the only cost involved in freezing the page is CPU -- and vacuum isn't exactly a CPU-intensive process.

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Dead Space Map for vacuum
Jochem van Dieten [EMAIL PROTECTED] wrote: On 12/28/06, ITAGAKI Takahiro wrote: | [TODO item] Allow data to be pulled directly from indexes | Another idea is to maintain a bitmap of heap pages where all rows are | visible to all backends, and allow index lookups to reference that bitmap | to avoid heap lookups It is not done yet, but we can use the DSM for this purpose. If the corresponding bit in the DSM is '0', all tuples in the page are frozen and visible to all backends. We don't have to look up frozen pages when only checking visibility. Does that really work in the absence of a retail index vacuum method? What if the heap is already vacuumed and frozen and the bit for that page in the DSM is set to '0', but the index still contains entries that haven't been removed by a vacuum yet?

No. If the DSM bit is 0, there are no unfrozen or dead tuples in the page, and no dead index entries linking to tuples in it. In other words, we can reset a DSM bit to 0 only under that condition.

BTW, if we want to achieve the index-only scan, we might have to do more aggressive VACUUM FREEZE. There were many comments that we should avoid vacuuming pages that contain only unfrozen tuples or a few dead tuples. I think that is true for efficient VACUUM, but the index-only scan does not work for unfrozen pages. Which should we give priority?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] Dead Space Map for vacuum
On Thu, 2006-12-28 at 21:35 +, Heikki Linnakangas wrote: I only used 1 bit, just like in Itagaki's approach.

1 bit may not be enough. In many cases, a block will receive only 1 UPDATE or DELETE. If we then mark this in the DSM, when we VACUUM that block it will not have sufficient free space to make it worth VACUUM adding the block to the FSM (using current thinking). That thinking is good, since tuples frequently vary in length, so we could easily be in a position where a block in the FSM doesn't even have space for a single new tuple version.

I would suggest that we track whether a block has had 0, 1 or 1+ updates/deletes against it. When a block has 1+ it can then be worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is really the minimum space worth reclaiming on any block. Otherwise we will end up with a huge, not very useful FSM and fairly inefficient VACUUMing of single dead tuples.

So I'd suggest that we use at least 2 bits/block, so that we can avoid fiddling with small amounts of reclaimed space. The extra space taken up by the DSM is easily outweighed by the space we would have wasted in the FSM to make it effective when tracking smaller chunks of free space (which needs 6 bytes/block).

DSM is a good thing; many use cases will benefit from it. I'd want to be able to turn off DSM completely when the whole database fits in RAM, and would probably like to enable/disable it for particular tables, until we get some good heuristics for when it is worthwhile.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
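The 0 / 1 / 1+ tracking proposed here can be sketched as a saturating two-bit counter per block. A minimal illustration; the names (DeadCount, dsm_note_dead_tuple, dsm_vacuum_worthwhile) are assumptions for the example, not an actual PostgreSQL API.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-block dead-tuple count, saturating at "1+". */
typedef enum
{
    DSM_NO_DEAD = 0,
    DSM_ONE_DEAD = 1,
    DSM_MANY_DEAD = 2   /* "1+": two or more dead tuples */
} DeadCount;

/* Called whenever an UPDATE/DELETE leaves a dead tuple in the block. */
static DeadCount
dsm_note_dead_tuple(DeadCount c)
{
    return (c == DSM_MANY_DEAD) ? DSM_MANY_DEAD : (DeadCount) (c + 1);
}

/* Per the rule above: two dead tuples is the minimum worth reclaiming. */
static bool
dsm_vacuum_worthwhile(DeadCount c)
{
    return c == DSM_MANY_DEAD;
}
```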
Re: [HACKERS] Dead Space Map for vacuum
Simon Riggs [EMAIL PROTECTED] writes: I would suggest that we tracked whether a block has had 0, 1 or 1+ updates/deletes against it. When a block has 1+ it can then be worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is really the minimum space worth reclaiming on any block. How do you arrive at that conclusion? Counterexample: table in which all tuples exceed half a page. regards, tom lane
Re: [HACKERS] Dead Space Map for vacuum
On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I would suggest that we tracked whether a block has had 0, 1 or 1+ updates/deletes against it. When a block has 1+ it can then be worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is really the minimum space worth reclaiming on any block. How do you arrive at that conclusion?

The FSM code ignores any block with less space than 1 average tuple, which is a pretty reasonable rule. If you only track whether a block has been updated, not whether it has been updated twice, then you will be VACUUMing lots of blocks that have only a 50% chance of being usefully stored by the FSM. As I explained, the extra bit per block is easily regained from storing less FSM data. My understanding was that DSM was meant to increase VACUUM efficiency, so having a way to focus in on the blocks most worth vacuuming makes sense, using the 80/20 rule.

Counterexample: table in which all tuples exceed half a page.

Current FSM code will ignore those too, if they are less than the average size of the tuples requested so far. That's a pretty weird counterexample, even if it is a case that needs handling.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
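The FSM rule Simon refers to can be sketched like this. The struct and the running-average bookkeeping are illustrative assumptions, not how the real freespace code stores its statistics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct FsmStats
{
    size_t total_requested;  /* sum of tuple sizes requested from the FSM */
    size_t num_requests;
} FsmStats;

/* Record one space request so the average tracks actual tuple sizes. */
static void
fsm_note_request(FsmStats *s, size_t tuple_size)
{
    s->total_requested += tuple_size;
    s->num_requests++;
}

/* Ignore blocks offering less free space than one average tuple. */
static bool
fsm_worth_recording(const FsmStats *s, size_t free_space)
{
    if (s->num_requests == 0)
        return true;            /* no history yet: record everything */
    return free_space >= s->total_requested / s->num_requests;
}
```

Under this rule a block freed by a single small dead tuple usually fails the test, which is the basis for preferring 1+ blocks as vacuum targets.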
Re: [HACKERS] Dead Space Map for vacuum
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote: Counterexample: table in which all tuples exceed half a page. Current FSM code will ignore those too, if they are less than the average size of the tuples requested so far. That's a pretty weird counterexample, even if it is a case that needs handling.

Better read it again. The number that's passed to the FSM is the free space *after* vacuuming, which in this scenario will be BLCKSZ-less-page-header. This case is not broken now, but it will be if we adopt your proposal.

regards, tom lane
Re: [HACKERS] Dead Space Map for vacuum
On Fri, 2006-12-29 at 16:41 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote: Counterexample: table in which all tuples exceed half a page. Current FSM code will ignore those too, if they are less than the average size of the tuples requested so far. That's a pretty weird counterexample, even if it is a case that needs handling. Better read it again. The number that's passed to the FSM is the free space *after* vacuuming, which in this scenario will be BLCKSZ-less-page-header. This case is not broken now, but it will be if we adopt your proposal.

That case is extremely rare, since #define TOAST_TUPLE_THRESHOLD (MaxTupleSize / 4). Even so, I'm fairly certain that an if () statement is enough to handle it. So I don't really see that as a limit to the proposal, which is a small change in the scheme of things. DSM has my support; I would like it to be as efficient as possible.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Dead Space Map for vacuum
Simon Riggs wrote: On Fri, 2006-12-29 at 10:49 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I would suggest that we tracked whether a block has had 0, 1 or 1+ updates/deletes against it. When a block has 1+ it can then be worthwhile to VACUUM it and to place it onto the FSM. Two dead tuples is really the minimum space worth reclaiming on any block. How do you arrive at that conclusion? FSM code ignores any block with less space than 1 average tuple, which is a pretty reasonable rule.

FSM serves a different purpose than DSM, and therefore has an entirely different set of rules governing what it should and shouldn't be doing. This is a reasonable rule for the FSM, but not for the DSM.

If you only track whether a block has been updated, not whether it has been updated twice, then you will be VACUUMing lots of blocks that have only a 50% chance of being usefully stored by the FSM. As I explained, the extra bit per block is easily regained from storing less FSM data.

Well, it seems that when implementing the DSM, it'd be a great time to move the FSM from its current location in shared memory to somewhere else, possibly the same place as the DSM. A couple of special blocks per file segment would be a good place. Also, I'm not sure that the point of VACUUMing is always to be able to immediately reuse the space. There are cases where large DELETEs are done and you just want to decrease the index size. In Tom's counterexample of large tuples, you certainly want to vacuum the index when only a single update/delete occurs.

My understanding was that DSM was meant to increase VACUUM efficiency, so having a way to focus in on blocks most worth vacuuming makes sense using the 80/20 rule.

Possibly true. I don't have anything to indicate what usage patterns produce what requirements in vacuum patterns. If there are significant numbers of blocks with one update, is it a loss to actually vacuum those?
I know it could be faster if we didn't, but would it still be faster than what we do now?

Counterexample: table in which all tuples exceed half a page. Current FSM code will ignore those too, if they are less than the average size of the tuples requested so far. That's a pretty weird counterexample, even if it is a case that needs handling.

Again, I'd be careful about saying that FSM = DSM when deciding what should be done for a particular block.

Russell Smith.
Re: [HACKERS] Dead Space Map for vacuum
On Sat, 2006-12-30 at 09:22 +1100, Russell Smith wrote: Simon Riggs wrote: FSM code ignores any block with less space than 1 average tuple, which is a pretty reasonable rule. FSM serves a different purpose than DSM and therefore has an entirely different set of rules governing what it should and shouldn't be doing. This is a reasonable rule for FSM, but not for DSM.

VACUUM and space reuse are intimately connected. You cannot consider one without considering the other.

If you only track whether a block has been updated, not whether it has been updated twice, then you will be VACUUMing lots of blocks that have only a 50% chance of being usefully stored by the FSM. As I explained, the extra bit per block is easily regained from storing less FSM data. Well, it seems that when implementing the DSM, it'd be a great time to move the FSM from its current location in shared memory to somewhere else, possibly the same place as the DSM. A couple of special blocks per file segment would be a good place. Also I'm not sure that the point of VACUUMing is always to be able to immediately reuse the space. There are cases where large DELETEs are done, and you just want to decrease the index size.

I can see the argument in favour of reducing index size, but do we want to perform a read *and* a write I/O to remove *one* heap tuple, just so we can remove a single tuple from the index(es)? We might want to eventually, but I'm proposing keeping track of that so that we can tell the difference between single dead tuples and more efficient targets for our VACUUM. When there are no better targets, sure, we'll be forced to reach for the high fruit.

The idea of DSM is to improve the efficiency of VACUUM by removing wasted I/Os (and again I say, I back DSM). The zero-dead-tuples blocks are obvious targets to avoid, but avoiding them only avoids one read I/O. Avoiding the single-dead-tuple blocks avoids two I/Os, at small loss. They are poor targets for a selective VACUUM.
When we are picking just some of the blocks in a table, we will quickly move from sequential to random I/O, so we must be careful and conservative about the blocks we pick, otherwise DSM VACUUM will not be any better than VACUUM as it is now.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Dead Space Map for vacuum
On Thu, 28 Dec 2006, Heikki Linnakangas wrote: ITAGAKI Takahiro wrote: Hello, NTT staff are working on the TODO item: | Create a bitmap of pages that need vacuuming We call the bitmap the Dead Space Map (DSM); it allows VACUUM to scan only pages that need vacuuming or freezing. We'd like to discuss the design on -hackers and reach agreement with the community.

Great! I agree!

We implemented the basic parts of it and measured the performance. As expected, VACUUM took a shorter time when the fraction of updates is low. DSM is useful for large but not so heavily-updated databases. The overhead of managing the DSM seemed to be negligibly small in our CPU-loaded tests. There are a lot of choices in implementation. We followed the descriptions in the TODO list and past discussions in some parts, but not in others, for various reasons. I would appreciate your comments and suggestions.

I experimented with a different DSM design last winter. I got busy with other things and never posted it AFAIR, but the idea was to store a bitmap in the special area of every 32768th heap page. That had some advantages: * doesn't require a new dedicated shared memory area that needs to be allocated and tuned. * doesn't introduce a (single) new global lwlock that might become a hotspot. * WAL logging is quite simple, since the bitmaps are on normal pages handled by the buffer manager.

I too have experimented with this idea. 4 DSM pages means 2 bits per block. What were you using the other bit for? When I discussed this some time ago with Jan Wieck, we recommended we track blocks in the FSM with this extra bit.

One problem this approach may have is contention for the DSM pages and the granularity of the lock associated with them. Locking a DSM page to update one bit will affect writes to 32K pages. This might be less of a problem than I think because of the very low cost of setting the bit.

| In the event of a system crash, the bitmap would probably be invalidated. We bought it for simplicity.
Currently, the whole DSM is lost after a crash, and all tables revert to untracked status. A full-scanning vacuum is needed after the loss of the DSM.

If you flush the DSM to disk at checkpoint, it should be easy to bring it up to date on WAL replay. Having to do a full-scanning vacuum after a crash can be a nasty surprise.

I agree. One interesting insight is that we have 4 bits left over (those not used for the 4 DSM blocks). We might use those status bits for some purpose. For example, when a page is dirtied and we update the DSM page, we set a DSM-dirtied bit. Checkpoint would then only flush DSM pages dirtied since the last checkpoint. This is important in the presence of lots of read-only tables.

| One complexity is that index entries still have to be vacuumed, and doing | this without an index scan (by using the heap values to find the index entry) | might be slow and unreliable, especially for user-defined index functions.

Indexes still need to be full-scanned at the present moment. We are also researching a retail index vacuum method, but it requires more work.

Yeah, that's an old story :(. BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed safely. You'll still need to visit all bitmaps to find the dead bit, but you only need to check the bitmap page that corresponds to the tid of the dead tuple.

Cool. The problem is solving it for the other AMs, particularly btree.

| http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php | Maintain 2 bits per block that tell if the block has been vacuumed of all | dead tuples since the last time it was dirtied, and if all its tuples are | completely frozen.

We use 1 bit per block, so we cannot separate pages that need only vacuum from those that need only freeze. The reason is that we cannot determine where to record before/after-update tuples. If the transaction is committed, the before-version should be recorded into the needs-vacuum bitmap and the after-version into the needs-freeze bitmap; but on rollback, it is inverted.
We cannot judge which we should do until the end of the transaction.

Yeah, that makes the DSM less useful. Are you thinking of freezing more aggressively because of that? Or doing a full-scanning vacuum every now and then to freeze?

I don't see any problem with moving to two status bits per block in the DSM-in-heap model, so we can track this better. What do you think?

Thanks, Gavin
Re: [HACKERS] Dead Space Map for vacuum
On 12/28/06, ITAGAKI Takahiro wrote: | [TODO item] Allow data to be pulled directly from indexes | Another idea is to maintain a bitmap of heap pages where all rows are | visible to all backends, and allow index lookups to reference that bitmap | to avoid heap lookups It is not done yet, but we can use the DSM for this purpose. If the corresponding bit in the DSM is '0', all tuples in the page are frozen and visible to all backends. We don't have to look up frozen pages when only checking visibility.

Does that really work in the absence of a retail index vacuum method? What if the heap is already vacuumed and frozen and the bit for that page in the DSM is set to '0', but the index still contains entries that haven't been removed by a vacuum yet?

Jochem
Re: [HACKERS] Dead Space Map for vacuum
Heikki Linnakangas [EMAIL PROTECTED] wrote: We use 1 bit per block, so we cannot separate pages that need only vacuum from those that need only freeze. The reason is that we cannot determine where to record before/after-update tuples. If the transaction is committed, the before-version should be recorded into the needs-vacuum bitmap and the after-version into the needs-freeze bitmap; but on rollback, it is inverted. We cannot judge which we should do until the end of the transaction. Yeah, that makes the DSM less useful. Are you thinking of freezing more aggressively because of that? Or doing a full-scanning vacuum every now and then to freeze?

I recommend using VACUUM FREEZE with the patch, but what you say is surely reasonable. I think that backends record all of the changes into the 1st bitmap, and then VACUUM moves some bits from the 1st bitmap to the 2nd one. Assuming we had two bitmaps: (1) an invisible-from-someone bitmap and (2) a freeze-needed bitmap. (Actually, the two bits would be packed into one compound bitmap, but I separate them for illustrative purposes.)

We have to record changes into (1) at first, because we can only tell whether a page needs VACUUM or just FREEZE after the commit or rollback. VACUUM scans pages using (1). Newly inserted pages are also scanned at the first vacuum, even if they require only freezing. If the vacuum finds that all tuples in a page are vacuumed completely and old enough to be visible to all open transactions, it deletes the page from (1). If all tuples are also frozen, the page appears in neither bitmap; otherwise, it is put into (2).

We can use (1) for the index-only scanning, because all tuples in pages not recorded in (1) are visible to all backends, regardless of whether the pages are also recorded in (2).

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
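Itagaki's two-bitmap bookkeeping could look roughly like this. The struct and function names are invented for illustration, and the two bits are kept as separate booleans for clarity (the real design would pack them into a compound bitmap, as he notes).

```c
#include <assert.h>
#include <stdbool.h>

/* Per-page bits: (1) invisible-from-someone, (2) freeze-needed. */
typedef struct DsmBits
{
    bool invisible;      /* bitmap (1): page may need vacuum or freeze */
    bool needs_freeze;   /* bitmap (2): only freezing remains to be done */
} DsmBits;

/* Backends always record changes into bitmap (1) first, since the
 * vacuum-vs-freeze distinction is only known after commit/rollback. */
static void
dsm_record_change(DsmBits *p)
{
    p->invisible = true;
}

/* After vacuum scans the page: clear (1) if everything is visible,
 * and move the page into (2) if unfrozen tuples remain. */
static void
dsm_after_vacuum(DsmBits *p, bool all_visible, bool all_frozen)
{
    if (!all_visible)
        return;                     /* page stays in bitmap (1) */
    p->invisible = false;
    p->needs_freeze = !all_frozen;  /* neither bitmap if fully frozen */
}

/* Index-only scans may skip the heap when bit (1) is clear,
 * regardless of bit (2). */
static bool
dsm_index_only_ok(const DsmBits *p)
{
    return !p->invisible;
}
```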
Re: [HACKERS] Dead Space Map for vacuum
Heikki Linnakangas [EMAIL PROTECTED] writes: I experimented with a different DSM design last winter. I got busy with other things and never posted it AFAIR, but the idea was to store a bitmap in the special area of every 32768th heap page. That had some advantages: * doesn't require a new dedicated shared memory area that needs to be allocated and tuned. * doesn't introduce a (single) new global lwlock that might become a hotspot.

I agree with doing something along that line (not necessarily with putting the bits into existing heap pages, but anyway with keeping the information on disk, not in shared memory). Our experience with the existing FSM design has been uniformly negative; so I don't wish to add still another shared memory area that requires tricky manual size tuning. I think sooner or later we'll need to redesign FSM so it's not kept in a fixed-size shared memory area. Let's not make that mistake twice.

regards, tom lane
Re: [HACKERS] Dead Space Map for vacuum
Gavin Sherry wrote: On Thu, 28 Dec 2006, Heikki Linnakangas wrote: ITAGAKI Takahiro wrote: I experimented with a different DSM design last winter. I got busy with other things and never posted it AFAIR, but the idea was to store a bitmap in the special area of every 32768th heap page. That had some advantages: * doesn't require a new dedicated shared memory area that needs to be allocated and tuned. * doesn't introduce a (single) new global lwlock that might become a hotspot. * WAL logging is quite simple, since the bitmaps are on normal pages handled by the buffer manager. I too have experimented with this idea. 4 DSM pages means 2 bits per block. What were you using the other bit for? When I discussed this some time ago with Jan Wieck, we recommended we track blocks in the FSM with this extra bit.

I only used 1 bit, just like in Itagaki's approach. The bitmap in the special area only occupied half a page; the rest was available for normal heap tuples. That might seem strange, but it avoids having to allocate 2 pages for small tables with just a handful of rows. It also spreads out the lock contention a bit.

I think I wasn't clear in my explanation. If a table had fewer than 32k pages, it had just one bitmap, with each bit representing a heap page; the bitmap was stored in the special area of the first page. If a table was larger, the bitmap on the 1st page represented the first 32k pages. On the 32768th page there's another bitmap that represents the next 32k pages, and so on. In other words, the DSM bitmap covering page number X was always the (X / 32768)-th bitmap. Vacuum had to check all the bitmaps.

BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed safely. You'll still need to visit all bitmaps to find the dead bit, but you only need to check the bitmap page that corresponds to the tid of the dead tuple. Cool. The problem is solving it for the other AMs, particularly btree.

Yeah..
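Under the in-heap layout Heikki describes, locating the DSM bit for a heap page is simple arithmetic. A sketch with illustrative names; it assumes the bitmaps sit on heap pages 0, 32768, 65536, and so on, each covering the following 32k pages.

```c
#include <assert.h>
#include <stdint.h>

#define DSM_PAGES_PER_BITMAP 32768u

/* Index of the bitmap that covers heap page blkno (the X / 32768 above). */
static uint32_t
dsm_bitmap_index(uint32_t blkno)
{
    return blkno / DSM_PAGES_PER_BITMAP;
}

/* The n-th bitmap lives in the special area of heap page n * 32768. */
static uint32_t
dsm_bitmap_page(uint32_t blkno)
{
    return dsm_bitmap_index(blkno) * DSM_PAGES_PER_BITMAP;
}

/* Bit position of blkno within its bitmap. */
static uint32_t
dsm_bit_offset(uint32_t blkno)
{
    return blkno % DSM_PAGES_PER_BITMAP;
}
```

With an 8 KB block size, one bit per page over 32768 pages is 4 KB of bitmap, which matches the half-a-page special area Heikki mentions.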
| http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php | Maintain 2 bits per block that tell if the block has been vacuumed of all | dead tuples since the last time it was dirtied, and if all its tuples are | completely frozen.

We use 1 bit per block, so we cannot separate pages that need only vacuum from those that need only freeze. The reason is that we cannot determine where to record before/after-update tuples. If the transaction is committed, the before-version should be recorded into the needs-vacuum bitmap and the after-version into the needs-freeze bitmap; but on rollback, it is inverted. We cannot judge which we should do until the end of the transaction.

Yeah, that makes the DSM less useful. Are you thinking of freezing more aggressively because of that? Or doing a full-scanning vacuum every now and then to freeze?

I don't see any problem with moving to two status bits per block in the DSM-in-heap model, so we can track this better. What do you think?

Well, the obvious drawback is that two bits take more space than one bit. But it feels OK to me as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Dead Space Map for vacuum
ITAGAKI Takahiro wrote: Hello, NTT staff are working on the TODO item: | Create a bitmap of pages that need vacuuming We call the bitmap the Dead Space Map (DSM); it allows VACUUM to scan only pages that need vacuuming or freezing. We'd like to discuss the design on -hackers and reach agreement with the community.

Great!

We implemented the basic parts of it and measured the performance. As expected, VACUUM took a shorter time when the fraction of updates is low. DSM is useful for large but not so heavily-updated databases. The overhead of managing the DSM seemed to be negligibly small in our CPU-loaded tests. There are a lot of choices in implementation. We followed the descriptions in the TODO list and past discussions in some parts, but not in others, for various reasons. I would appreciate your comments and suggestions.

I experimented with a different DSM design last winter. I got busy with other things and never posted it AFAIR, but the idea was to store a bitmap in the special area of every 32768th heap page. That had some advantages: * doesn't require a new dedicated shared memory area that needs to be allocated and tuned. * doesn't introduce a (single) new global lwlock that might become a hotspot. * WAL logging is quite simple, since the bitmaps are on normal pages handled by the buffer manager. I had it working enough to see that vacuum time was shortened, but I didn't perform any further performance testing.

| In the event of a system crash, the bitmap would probably be invalidated. We bought it for simplicity. Currently, the whole DSM is lost after a crash, and all tables revert to untracked status. A full-scanning vacuum is needed after the loss of the DSM.

If you flush the DSM to disk at checkpoint, it should be easy to bring it up to date on WAL replay. Having to do a full-scanning vacuum after a crash can be a nasty surprise.
| One complexity is that index entries still have to be vacuumed, and doing | this without an index scan (by using the heap values to find the index entry) | might be slow and unreliable, especially for user-defined index functions.

Indexes still need to be full-scanned at the present moment. We are also researching a retail index vacuum method, but it requires more work.

Yeah, that's an old story :(. BTW: Yesterday I realized that bitmap indexes could be retail-vacuumed safely. You'll still need to visit all bitmaps to find the dead bit, but you only need to check the bitmap page that corresponds to the tid of the dead tuple.

| http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php | Maintain 2 bits per block that tell if the block has been vacuumed of all | dead tuples since the last time it was dirtied, and if all its tuples are | completely frozen.

We use 1 bit per block, so we cannot separate pages that need only vacuum from those that need only freeze. The reason is that we cannot determine where to record before/after-update tuples. If the transaction is committed, the before-version should be recorded into the needs-vacuum bitmap and the after-version into the needs-freeze bitmap; but on rollback, it is inverted. We cannot judge which we should do until the end of the transaction.

Yeah, that makes the DSM less useful. Are you thinking of freezing more aggressively because of that? Or doing a full-scanning vacuum every now and then to freeze?

| [TODO item] Allow data to be pulled directly from indexes | Another idea is to maintain a bitmap of heap pages where all rows are | visible to all backends, and allow index lookups to reference that bitmap | to avoid heap lookups

It is not done yet, but we can use the DSM for this purpose. If the corresponding bit in the DSM is '0', all tuples in the page are frozen and visible to all backends. We don't have to look up frozen pages when only checking visibility.

Cool.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com