[HACKERS] Follow-Up to A Silly Idea for Vertically-Oriented Databases

2007-09-07 Thread Avery Payne
In hindsight, I did miss quite a bit in my last post.  Here's a summary 
that might clear it up:


Add a single keyword that specifies that the storage format changes 
slightly.  The keyword should not affect SQL compliancy while still 
extending functionality.  It can be specified as either part of the 
CREATE TABLE statement or part of the tablespace mechanism.


When a table is created with this setting, all columns in a record are 
split vertically into individual, 1-column-wide tables, and each column 
in the table is assigned an OIDs.  Each OID corresponds to one of our 
1-wide tables.  An additional control column will be created that is 
only visible to the database and the administrator.  This column stores 
a single logical indicating if the record is allocated or not.  You 
might even be able to create a special bitmap index that is hidden, and 
just use existing bitmap functions in the index code.  In essence, this 
column helps keep all of the other columns in sync when dealing with rows.


When writing data to the table, each individual column will update, but 
the engine invisibly wraps together all of the columns into a single 
transaction.  That is, each row insert is still atomic and behaves like 
it normally would - either the insert succeeds or it doesn't.  Because 
the updates are handled by the engine as many separate tables, no 
special changes are required, and existing storage mechanisms (TOAST) 
continue to function as they always did.  This could be written as a 
super-function of sorts, one that would combine all of the smaller steps 
together and use the existing mechanisms.


Updates are performed in the same manner, with each column being 
rolled up into a single invisible mini-transaction for the given record.


Deletes are performed by marking not only the columns as deleted but 
also the control column as having that row available for overwrite.  I'm 
simplifying quite a bit but I think the general idea is understood.  
Yes, a delete will have significant overhead compared to an insert or 
update but this is a known tradeoff that the administrator is willing to 
make, so they can gain faster read speeds - ie. they want an 
OLAP-oriented store, not an OLTP-oriented store.


The control column would be used to locate records that can be 
overwritten quickly.  When a record is deleted, the control column's 
bitmap was adjusted to indicate that a free space was available.  The 
engine would then co-ordinate as it did above, but it can cheat - 
instead of trying to figure things out for each table, the offset to 
write to is already known, so the update proceeds as listed above, other 
than each part of the little mini-transaction writes to the same 
offset (ie. each column in the record will have the same hole, so 
when you go to write the record out, write it to the same record 
spot).  This is where the control column not only coordinates deletes 
but also inserts that re-use space from deleted records.


Hopefully that makes it a little clearer.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Follow-Up to A Silly Idea for Vertically-Oriented Databases

2007-09-07 Thread Decibel!
On Fri, Sep 07, 2007 at 02:58:03PM -0700, Avery Payne wrote:
 In hindsight, I did miss quite a bit in my last post.  Here's a summary 
 that might clear it up:
 
 Add a single keyword that specifies that the storage format changes 
 slightly.  The keyword should not affect SQL compliancy while still 
 extending functionality.  It can be specified as either part of the 
 CREATE TABLE statement or part of the tablespace mechanism.
 
 When a table is created with this setting, all columns in a record are 
 split vertically into individual, 1-column-wide tables, and each column 
 in the table is assigned an OIDs.  Each OID corresponds to one of our 
 1-wide tables.  An additional control column will be created that is 
 only visible to the database and the administrator.  This column stores 
 a single logical indicating if the record is allocated or not.  You 
 might even be able to create a special bitmap index that is hidden, and 
 just use existing bitmap functions in the index code.  In essence, this 
 column helps keep all of the other columns in sync when dealing with rows.

OID's aren't the way to link this stuff together. It would make more
sense for there to be one file that stores all the MVCC and other row
overhead, and for that table to store ctids, because that will be the
fastest way to look up the columns.

 When writing data to the table, each individual column will update, but 
 the engine invisibly wraps together all of the columns into a single 
 transaction.  That is, each row insert is still atomic and behaves like 
 it normally would - either the insert succeeds or it doesn't.  Because 
 the updates are handled by the engine as many separate tables, no 
 special changes are required, and existing storage mechanisms (TOAST) 
 continue to function as they always did.  This could be written as a 
 super-function of sorts, one that would combine all of the smaller steps 
 together and use the existing mechanisms.
 
 Updates are performed in the same manner, with each column being 
 rolled up into a single invisible mini-transaction for the given record.

The problem is that the idea of rows being a string of bytes within a
page is spread pretty widely throughout the code; I'm pretty sure it
extends far beyond just smgr. At some point stuff becomes just datums,
but I don't know that there's a nice, clean line where that happens. I
think this is probably the biggest obstacle that you're facing.

 Deletes are performed by marking not only the columns as deleted but 
 also the control column as having that row available for overwrite.  I'm 
 simplifying quite a bit but I think the general idea is understood.  
 Yes, a delete will have significant overhead compared to an insert or 
 update but this is a known tradeoff that the administrator is willing to 
 make, so they can gain faster read speeds - ie. they want an 
 OLAP-oriented store, not an OLTP-oriented store.

You do *not* want to try and change how MVCC works at the same time
you're doing this. There *may* be some possibility of changing things
afterwards, but trying to tackle that off the bat is suicide. On top of
that, HOT might well may this kind of optimization pointless.

 The control column would be used to locate records that can be 
 overwritten quickly.  When a record is deleted, the control column's 
 bitmap was adjusted to indicate that a free space was available.  The 
 engine would then co-ordinate as it did above, but it can cheat - 
 instead of trying to figure things out for each table, the offset to 
 write to is already known, so the update proceeds as listed above, other 
 than each part of the little mini-transaction writes to the same 
 offset (ie. each column in the record will have the same hole, so 
 when you go to write the record out, write it to the same record 
 spot).  This is where the control column not only coordinates deletes 
 but also inserts that re-use space from deleted records.
-- 
Decibel!, aka Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)


pgp0bviTEQ2r6.pgp
Description: PGP signature