Be forewarned - this is probably a very long post, and I'm just a mere
mortal (ie. admin) who doesn't write copious amounts of C code. Take
the following posts and suggestions with a grain of salt.
So I've been seeing/hearing all of the hoopla over vertical databases
(column stores), and how they'll not only slice bread but also make
toast, etc. I've done some quick searches for past articles on
"C-Store", "Vertica", "Column Store", and "Vertical Database", and have
seen little discussion on this. And then a funny thought occurs to me -
when I look at the directory structure and file layout of a PostgreSQL
database, I see that each OID corresponds to a table, which corresponds
to (generally) a single file. Then I have a second funny thought - what
if there was a low-friction, low-cost-of-implementation way to bring
similar advantages to PostgreSQL without major alterations, recoding,
etc? Finally it occurs to me that PostgreSQL already does something
similar but it could do it so much better, with only one language change
and minor changes to the storage layout. So here's my plum-crazy
proposal (and I've made some before - see
http://archives.postgresql.org/pgsql-odbc/2006-10/msg00040.php - and
they not only made it into production, but they are in active use by me
on a weekly basis - Thanks Hiroshi!!!), bear with me...
Make one small, very tiny syntactic change to "CREATE TABLE" that
includes a new keyword, "COLUMN-STORE" or something similar. I don't
care where it appears as long as it's after the "CREATE TABLE". You
would not have to break any existing SQL conventions, PostgreSQL would
continue to be SQL compliant, and given the odd wording, I highly doubt
that the folks who work on SQL keywords will end up using it at any
point in time. If adding COLUMN-STORE is objectionable because it will
"cloud the compliance of the language" then simply move the
functionality into the table space functionality. In hindsight, it
might even belong there instead. So, instead of specifying it by
table, create a table space that has an attribute "Column Storage" set
as active. When inactive, it uses the traditional "one-file-per-table"
Make each table column capable of receiving an OID. This will be
critical for the following steps...
If a table is created with "COLUMN-STORE" as an option, then it will
continue to behave in the same way it always has, but the storage will
be different. Each column in the table will be represented by a single
file, with the file name being (naturally) the OID.
INSERT/UPDATE/DELETE would function as it always has, etc. Nothing would
change. Except how the data is stored. The existing TOAST mechanisms
continue to work - because the engine would treat each file as a
One additional "column" would be added to the store, an invisible one
that not only tracks the OID for the "rows" in this type of setup, but
also the state of the row. Let's call this the "Control Column". Given
that the metadata size for the row would be fixed/constant, we won't
have to worry about what is in the other "columns" and "rows", they can
be any size. BTW, the "Control Column" would be just another column
from the storage engine's point of view. It just happens to be one that
no-one can see, other than the database (and maybe the administrator).
When you go to VACUUM a table, you would treat each column as a
single-row table, so if a row is a candidate for a VACUUM reclamation,
then it will adjust each "column" an equal amount. Under no
circumstances would you have columns "out of sync", so if a record goes,
it means each adjacent column goes with it. This sounds disk-intensive
at first, until you realize that the admin will have made a contentious
decision to use this format, and understands the advantages/drawbacks to
this method. So it takes a little longer to VACUUM, I don't care,
because as an admin I will have specified this layout for a reason - to
do OLAP, not OLTP. Which means, I rarely VACUUM it. Add to this the
high efficiency you would gain by packing more records into buffers per
read, and most of the losses you take in re-reading data would really
not amount to as big a loss as you might think.
DELETE would simply mark a row off as deleted in the "Control Column".
If the storage engine needed to reclaim a row, it would not have to look
any further than the "control column" to find an empty spot where it
could overwrite data.
INSERT/UPDATE continue to work as they always have. The storage engine
would perceive each "column" as a single-column table, meaning that the
existing TOAST mechanisms continue to work! Nothing needs to change
there. The real change would be that the table's columns would be
"split up" into individual updates, and the "Control Column" would be
used to keep all of the records in sync.
Why bother with this? Because, when you are said and done, you will
find yourself with a rough equivelent of a column-store database, with
all of the OLAP goodness that people are looking for. You have little
if any impact on the admin/users perception, other than a flag was
checked somewhere and forgotten about in the database. From the storage
engine's perspective, you have many many many small 1-column tables to
take care of, and they all update at the same "place" at the same "time"
to keep the records in sync when you recompose a row. TOAST and large
object storage works the same as before, nothing changes, and that's as
it should be.
All with what would be (hopefully) a minor change to the storage
backend. We're not talking about brain surgery on existing, tested
code, but rather, a new feature that uses existing features in-place.
As TOAST improves so does this feature. As caching improves, so does
the feature again. And so on.
I've been in a bit of a hurry to blurt all of this out, and I'm sure
that I've forgotten something along the way, so if you find something
missing, please be patient- I had to write all of this in about 20
minutes or less and I didn't have alot of time.
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at