Re: [HACKERS] Index Tuple Compression Approach?

Simon Riggs Wed, 15 Aug 2007 03:37:22 -0700

On Wed, 2007-08-15 at 06:51 +0100, Heikki Linnakangas wrote:
> Jeff Davis wrote:
> > On Tue, 2007-08-14 at 16:27 -0500, Decibel! wrote:
> >> Isn't this what Grouped Index Tuples is?
> > 
> > http://community.enterprisedb.com/git/git-readme.txt
> > 
> > It looks like GIT is a little different. 
> > 
> > GIT actually stores a lower-bound key of a contiguous* range of keys
> > that all point to the same page, and for each of those ranges stores a
> > bitmap of page offsets. A search searches first for an exact match in
> > the index, and failing that, looks to see if the previous index tuple
> > happens to be one of these ranges.
> > 
> > The algorithm Chris is talking about stores a set of tuple ids (which
> > include page and offset) for each distinct key.
> 
> Right.
> 
> > Both could be helpful, although I don't think they can work together
> > very well.
> 
> What Chris is suggesting is basically a special case of GIT, where all
> the heap tuples represented by an index tuple have the same key. I was
> actually thinking of adding a flag to index tuples to indicate that
> special case in GIT. We could effectively do both.


A few additional thoughts...

The approach suggested by Chris is also used by Teradata Non-Unique
Secondary Indexes, known as NUSIs (but not named by me!). The following
link is a public domain description that is detailed enough:
http://teradata.uark.edu/research/wang/indexes.html

Replicating this approach directly isn't that useful because it would
interact badly with the way we handle updates. Thinking about the basic
design pattern however, we can envisage a type of index that changes the
1:1 mapping between index and heap tuple into the concept of a "tuple
set index" where we have a 1:Many mapping between index and heap. 

In general, the tuple set index approach can significantly reduce index
size. This won't be of interest to anyone unless all of your data
overflows RAM and you need to do I/O to constantly page back in pieces
of your data. If your database does fit in RAM, reducing the number of
index blocks might just increase contention. This means that the tuple
set approach is only useful when you have very large databases, but when
you do it is very, very useful.

GIT is a tuple set index with two important extensions:

1. GIT allows a range of values to be indexed, not just a single value,
so this allows it to be useful for both Unique and Non-Unique cases.

2. GIT restricts the set of tuples to a small range of blocks within the
table. Making the range of blocks = 1 means that GIT is designed to work
well with HOT, which also stays on the same block. Keeping the range of
blocks small means GIT degrades as slowly as possible in the face of
"cold" UPDATEs. If the values are inserted in roughly ordered/clustered
sequence then this doesn't increase index size at all, so the most
common/highest volume use cases are covered.

So from my perspective, GIT is very close to the optimal design for a
tuple set index that addresses the need for high concurrency and
unique/near-uniqueness with PostgreSQL. There are certainly many other
options for a tuple set index, and bitmap indexes are simply another
version of a tuple set index but with different behaviours tuned to a
different use case. There maybe other use cases that require more than
two kinds of tuple set index...and we have discussed implementing the
tuple set behaviour as part of the other main index types.

As an aside, it turns out, after further research that GIT is similar to
a clustered index in SQLServer, as described by Louis Davidson, "Pro SQL
Server 2005 Database Design and Optimization", p.405. SQLServer creates
a clustered index by default on each PK, so it says.

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

Re: [HACKERS] Index Tuple Compression Approach?

Reply via email to