On Mon, 16 Feb 2003, Ryan Bradetich wrote:

> I am not sure why all the data is duplicated in the index ...
Well, you have to have the full key in the index, or how would you know,
when you look at a particular index item, whether it actually matches what
you're searching for?

MS SQL Server does have an interesting option that would help you a lot in
this case: clustered indexes. A table may have a single clustered index,
and each leaf node of the index stores not just the key but the entire
row. Thus, in a case like yours, you'd store the row only once, not twice.

Without thinking too hard about it (my usual mode of operation on this
list :-)), this could probably be implemented in PostgreSQL. But I don't
think it would be entirely trivial, and your case is unusual enough that I
very much doubt it would be worth implementing to fix that alone. It would
also offer the advantage that any lookup using the clustered index would
save fetching the heap page afterward, but it's hard to say whether the
savings would be worth the work.

> Since my only requirement is that the rows be unique, I have developed a
> custom MD5 function in C, and created an index on the MD5 hash of the
> concatenation of all the fields.

Well, that won't guarantee uniqueness, since it's perfectly possible for
two different rows to hash to the same value. (If that weren't possible,
your hash would have to contain as much information as the row itself, and
your space savings wouldn't be nearly so dramatic.)

cjs
--
Curt Sampson <[EMAIL PROTECTED]>  +81 90 7737 2974  http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
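[Editor's note: the non-uniqueness point above is easy to demonstrate concretely. The sketch below is a hypothetical Python stand-in for the poster's C MD5 function, assuming the hash is taken over a plain, delimiter-free concatenation of the fields; with that scheme, two distinct rows whose fields merely split the same text differently produce identical digests, so a unique index on the hash would wrongly reject one of them.]

```python
import hashlib

def row_hash(fields):
    # Naive scheme from the post: MD5 over the bare concatenation of all
    # fields, with no delimiter between them. (Hypothetical reconstruction.)
    return hashlib.md5("".join(fields).encode("utf-8")).hexdigest()

# Two *different* rows whose concatenations are byte-identical:
row_a = ("ab", "c")
row_b = ("a", "bc")

print(row_a != row_b)                    # the rows really do differ
print(row_hash(row_a) == row_hash(row_b))  # yet their hashes collide
```

Even with an unambiguous encoding (e.g. length-prefixed fields), genuine MD5 collisions between different inputs remain possible in principle, for the pigeonhole reason given above: a 128-bit digest cannot carry as much information as an arbitrary row.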