Hannu Krosing wrote:
> I would suggest that you use just an additional heap with decoupled
> visibility fields as DSM.

Yeah, I remember you've suggested that before, and I haven't responded
so far. The problems I see with that approach are:

1) How do you know which visibility info corresponds to which heap
tuple? You'd need to have a pointer from the visibility info to the
heap tuple, and from the heap tuple to the visibility info. That
increases the total (uncompressed) storage size.

2) If the visibility info / heap ordering isn't the same, seqscans need
to do random I/O.

3) If you need to do regular index scans, you're going to have to
access the index, the heap and the visibility info separately, and in
that order. That sounds expensive.

4) It's a big and complex change.

The significance of 2 and 3 depends a lot on how much of the visibility
information is in cache.

> For a large number of usage scenarios this will be highly compressible
> and will mostly stay in processor caches.

This seems to be where the potential gains are coming from in this
scheme. It boils down to how much compression you can do, and how
expensive it is to access the information in compressed form.

> 1) it is usually highly compressible, at least you can throw away
> cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.

If you RLE compress the data, you'll need to figure out what to do when
you need to update a field and it doesn't compress as well anymore. You
might have to move things around pages, so you'll have to update any
pointers to that information atomically.

> 2) faster access, more tightly packed data pages.

But you do need to access the visibility information as well, at least
on tuples that match the query.

> 5) makes VACUUM faster even for worst cases (interleaving live and dead
> tuples)

Does it? You still need to go to the heap pages to actually remove the
dead tuples. I suppose you could skip that and do it the first time you
access the page, like we do pruning with HOT.

> 6) any index scan will be faster due to fetching only visible rows from
> main heap.

Assuming the visibility information is already in cache, and that
there are enough non-visible tuples for that to matter.

>> BTW, another issue you'll have to tackle, that a DSM-based patch will
>> have to solve as well, is how to return tuples from an index. In b-tree,
>> we scan pages page at a time, keeping a list of all tids that match the
>> scanquals in BTScanOpaque. If we need to return not only the tids of the
>> matching tuples, but the tuples as well, where do we store them? You
>> could make a palloc'd copy of them all, but that seems quite expensive.
>
> Have you considered returning them as "already visibility-checked pages"
> similar to what views or set-returning functions return ?

Sorry, I don't understand what you mean by that.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
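
As a rough illustration of the RLE concern raised in the reply to
point 1 above, here is a minimal Python sketch. It is not PostgreSQL
code: the flag values ("frozen", "dead") and the helper names
rle_encode, rle_decode and rle_update are hypothetical, chosen only for
illustration, and the update is done naively by decoding and
re-encoding. It shows that changing one entry inside a long run splits
that run into three, so the encoded form grows and might no longer fit
where it was stored.

```python
from itertools import groupby


def rle_encode(flags):
    """Collapse a sequence of per-tuple flags into [value, run_length] pairs."""
    return [[value, len(list(group))] for value, group in groupby(flags)]


def rle_decode(runs):
    """Expand [value, run_length] pairs back into a flat list of flags."""
    flags = []
    for value, length in runs:
        flags.extend([value] * length)
    return flags


def rle_update(runs, index, new_value):
    """Set one entry. Naive approach: decode, modify, re-encode."""
    flags = rle_decode(runs)
    flags[index] = new_value
    return rle_encode(flags)


# Hypothetical case: 1000 tuples that all carry the same visibility state.
runs = rle_encode(["frozen"] * 1000)

# Updating a single tuple in the middle splits the one run into three.
updated = rle_update(runs, 500, "dead")

# Worst case from point 5: strictly interleaved live and dead tuples.
interleaved = rle_encode(["frozen", "dead"] * 500)

print(len(runs), len(updated), len(interleaved))
```

With 1000 identical entries the encoding is a single run; after one
update it is three runs. With live and dead tuples strictly
interleaved, the same 1000 entries need 1000 runs, so RLE gains
nothing there.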