On Mon, 16 Feb 2003, Ryan Bradetich wrote:

> I am not sure why all the data is duplicated in the index ...
Well, you have to have the full key in the index, or how would you know,
when you look at a particular index item, whether it actually matches what
you're searching for?

MS SQL Server does have an interesting option that would help you a lot in
this case: clustered indexes. A table may have a single clustered index,
and each leaf node of the index stores not just the key but the entire
row. Thus, in a case like yours, you'd store the row only once, not twice.

Without thinking too hard about it (my usual mode of operation on this
list :-)), this could probably be implemented in PostgreSQL. But I don't
think it would be entirely trivial, and your case is unusual enough that I
very much doubt it would be worth implementing to fix that alone. It would
also offer the advantage that any lookup using the clustered index would
save fetching the heap page afterward, but it's hard to say whether the
savings would be worth the work.

> Since my only requirement is that the rows be unique, I have developed a
> custom MD5 function in C, and created an index on the MD5 hash of the
> concatenation of all the fields.

Well, that won't guarantee uniqueness, since it's perfectly possible for
two different rows to hash to the same value. (If that weren't possible,
your hash would have to contain as much information as the row itself, and
your space savings wouldn't be nearly so dramatic.)

cjs
--
Curt Sampson <[EMAIL PROTECTED]>  +81 90 7737 2974  http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org
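[Editor's note: the non-uniqueness point above is easy to demonstrate concretely. The sketch below is a hypothetical Python stand-in for the poster's C MD5 function, assuming the hash is taken over a plain, delimiter-free concatenation of the fields; with that scheme, two distinct rows whose fields merely split the same text differently produce identical digests, so a unique index on the hash would wrongly reject one of them.]

```python
import hashlib

def row_hash(fields):
    # Naive scheme from the post: MD5 over the bare concatenation of all
    # fields, with no delimiter between them. (Hypothetical reconstruction.)
    return hashlib.md5("".join(fields).encode("utf-8")).hexdigest()

# Two *different* rows whose concatenations are byte-identical:
row_a = ("ab", "c")
row_b = ("a", "bc")

print(row_a != row_b)                    # the rows really do differ
print(row_hash(row_a) == row_hash(row_b))  # yet their hashes collide
```

Even with an unambiguous encoding (e.g. length-prefixed fields), genuine MD5 collisions between different inputs remain possible in principle, for the pigeonhole reason given above: a 128-bit digest cannot carry as much information as an arbitrary row.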