Martijn van Oosterhout wrote:
On Fri, Sep 29, 2006 at 10:51:32AM +0100, Heikki Linnakangas wrote:
After some thought:

Imagine a normal B-tree just like what we have now. But when there is
more than one tuple on the same heap page with consecutive index keys,
we represent all of them in a single index tuple that contains the key
of the first one of them, and a (run-length encoded) bitmap of the
OffsetNumbers. We should get most of the space and I/O savings as with
the original proposal, but we can vacuum it without re-evaluating index
expressions.

I think something like this has been discussed before. Where an index
tuple currently has the key values followed by a ctid, simply change
that so that it can be a list of ctid's, in order.

Actually it's t_tid followed by t_info (which is size + flags) followed by key values.

This works on having the same key, but doesn't care if the tuples are
on the same page. It used to be not possible because of how to handle
forward and backward scanning within an index entry during updates. I
think with the new "page at a time" index scanning, this got a lot
easier.

I'm not very interested in the case where you have a lot of equal keys, I think the bitmap index am is more suitable for that. The Block B-tree could be used whenever you have a clustered table (including unique indexes). Some DBMSs have index-organized-tables for the same use case.

When I tested a prototype of the original idea with TPC-C (DBT-2) data, a block index on the order_line table was under 2% of the size of a normal B-tree. That's very close to a best-case scenario; narrow rows and a completely clustered table. I'm aiming at that order of magnitude reduction in storage size.

One issue is that a single index page could hold more than 1000 index
entries, which might cause problems for callers.

I don't think that's a problem.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to