>ISTM we would be able to do this fairly well if we implemented
>Index-only columns. i.e. columns that don't exist in the heap, only in
>Taken to the extreme, all columns could be removed from the heap and
>placed in an index(es). Only the visibility information would remain on
So, let me understand this correctly - you want to index the columns and
use the index to reconstruct the data? Some kind of "implicit" reconstruction?
>Doing this per column would be a big win over vertical databases
>since AFAIK they *have* to do this to *every* column, even if it is not
>beneficial to do so.
I was thinking about something a little more crude - each column being a free-standing
table, but being "viewed" by the client as a single entity, a kind of "data federation".
The idea was that the existing storage mechanism wouldn't be altered, and
with a little slight-of-hand, we could extend the mechanism without hampering things
like clustering, future extensions, optimizations, etc. No changes to MVCC would be
needed, because it would above and through it.
The idea was that for a "wide" table you target only a few columns, so you could
sequential read without the penalty of having to seek to a new offset for each record.
Instead of processing all the columns, you process a single column, which means less
data to read for the same number of records. That gain starts to slope off
when you specify more and more columns from the table.
Throw in selective indexing (similar to what you were talking about) and suddenly we can reclaim
some of that lost speed. We'll make a kind of "compressed index", where the key turns into
a hash that points to (is attached to?) a bucket, and the bucket contains the offset of all the
records that relate to that key. Other tricks can be employed, such as run-length encoding
entire ranges of offsets, etc. to keep this really really small. Really small = faster and faster
to read, using more CPU than I/O. And I/O is still more of an issue than CPU at this point.
Then again, if you ditch the column altogether and use a "compressed index" to reconstruct
data implicitly, now we're close to what you were talking about (assuming I understand you
correctly and also assuming that PostgreSQL doesn't already do this with indexes). So, if the
column is indexed, then maybe split it off into a "compressed index", and if not, keep it in the main
I guess I really need to think about this a bit more before I start delving into code.
<Vertical DB market discussion - snipped>
>I thought maybe we can call it COAST, Column-oriented attribute storage technique, :-)
I like it. :-) I just wish I would have read this before applying for a project name
at pgfoundry, the current proposal is given as "pg-cstore".