Ühel kenal päeval, T, 2007-10-23 kell 14:16, kirjutas Heikki
> Hannu Krosing wrote:
> > I would suggest that you use just an additional heap with decoupled
> > visibility fields as DSM.
> Yeah, I remember you've suggested that before, and I haven't responded
> this far. The problems I see with that approach are:
> 1) How do you know which visibility info corresponds which heap tuple?
> You'd need to have a pointer from the visibility info to the heap tuple,
> and from the heap tuple to the visibility info. Which increases the
> total (uncompressed) storage size.

only the first , and not a pointer mostly , but just a fast lookup
function IsVisible(ctid, snapshot)

> 2) If the visibility info / heap ordering isn't the same, seqscans need
> to do random I/O.

Visibility should be a tree-like structure, with
wholly-(visible/invisible) inner pages removed, stored in seqscan order.

seqscans will get a few (or few hundred) pages worth of sequential info
and then fetch only visible tuples from heap, then repeat

> 3) If you need to do regular index scans, you're going to have to access
> the index, the heap and the visibility info separately, and in that
> order. That sounds expensive.

I'd do it in order 1) index 2) visibility 3) heap

The main assumption is, that visibility info is highly compressed -->
stays in processor cache most of the time --> virtually free to access

> 4) It's a big and complex change.

We can go in smaller steps, first adding it as a DSM replacement and as
a complement to visibility info in heap and only when we are sure that
we have gotten it right (by checking that there is always valid info in
visibility heap) will we move other internal stuff to using it, one by

> The significance of 2 and 3 depends a lot on how much of the visibility
> information is in cache.
> > For a large number of usage scenarios this will be highly compressible
> > and will mostly stay in processor caches .
> This seems to be where the potential gains are coming from in this
> scheme. It boils down to how much compression you can do, and how
> expensive it is to access the information in compressed form.

Hopefully much cheaper than in uncompressed form, the best case being
when the table is static and you do an in-L1-cache lookup on VizHeap
root page

> > 1) it is usually higly compressible, at least you can throw away
> > cmin/cmax quite soon, usually also FREEZE and RLE encode the rest.
> If you RLE compress the data, you'll need to figure out what to do when
> you need update a field and it doesn't compress as well anymore. You
> might have to move things around pages, so you'll have to update any
> pointers to that information atomically.

I envision each RLE range to have an exception list, which can be
appended to atomically with no locks. only when the list is long enough
will there be locking and rearranging.

> > 2) faster access, more tightly packed data pages.
> But you do need to access the visibility information as well, at least
> on tuples that match the query.

usually you even access visbility first, then data (if visible).

> > 5) makes VACUUM faster even for worst cases (interleaving live and dead
> > tuples)
> Does it? You still need to go to the heap pages to actually remove the
> dead tuples. I suppose you could skip that and do it the first time you
> access the page, like we do pruning with HOT.

Possibly for heap. But not for indexes which have to be cleaned up
during vacuum. At least I can't see how we could do it later.

OTOH, we can have very fast VACUUM FREEZE ONLY command, which will run
on just visibility heap and FREEZE all visible-to-all tuples and
compress them. It should also mark all aborted or
deleted-and-invisible-to-all to a separate value so they too compress

> > 6) any index scan will be faster due to fetching only visible rows from
> > main heap.
> Assuming the visibility information is already in cache, and that
> there's enough non-visible tuples for that to matter.

If there is not enough, then the visibility info is likely very highly
compressible and thus cheap to access.


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at


Reply via email to