On 12/28/15 1:15 PM, Alvaro Herrera wrote:
> Currently within the executor
> a tuple is a TupleTableSlot which contains one Datum array, which has
> all the values coming out of the HeapTuple; but for split storage
> tuples, we will need to have a TupleTableSlot that has multiple "Datum
> arrays" (in a way --- because, actually, once we get to vectorise as in
> the preceding paragraph, we no longer have a Datum array, but some more
>
> I think that trying to make the FDW API address all these concerns,
> while at the same time *also* serving the needs of external data
> sources, insanity will ensue.
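A rough Python illustration of the layout difference described in the quoted text. This is not executor code: `RowSlot`, `ColumnBatch` and the sample data are hypothetical stand-ins. A row slot holds one array with every attribute of a single tuple, while a vectorised batch holds one array per attribute covering many tuples.

```python
from array import array
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class RowSlot:
    """Hypothetical analogue of a TupleTableSlot: one array holding every
    attribute value of a single tuple."""
    values: List[int]


@dataclass
class ColumnBatch:
    """Hypothetical vectorised alternative: one array per attribute,
    each covering many tuples at once."""
    columns: List[Sequence[int]]


def sum_attr_rows(slots: List[RowSlot], attno: int) -> int:
    # Row-at-a-time: visit every slot and pick one value out of it.
    total = 0
    for slot in slots:
        total += slot.values[attno]
    return total


def sum_attr_batch(batch: ColumnBatch, attno: int) -> int:
    # Column-at-a-time: a single pass over one contiguous array.
    return sum(batch.columns[attno])


rows = [RowSlot([1, 10]), RowSlot([2, 20]), RowSlot([3, 30])]
batch = ColumnBatch([array("q", [1, 2, 3]), array("q", [10, 20, 30])])

print(sum_attr_rows(rows, 1), sum_attr_batch(batch, 1))  # 60 60
```

Both functions compute the same answer; the difference is only which data each one has to touch.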
Are you familiar with DataFrames in pandas? They're a collection of
Series, which are essentially vectors. (Technically, they're more
complex than that, because you can assign arbitrary indexes.) So instead
of the usual collection of rows, a DataFrame is a collection of
columns. Series can also be sparse (like our tuples), but the sparse fill
value can be anything, not just NULL (or NaN in pandas-speak). There are
also data frames in R; I'm not sure how equivalent they are.
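A small sketch of the pandas concepts mentioned above, assuming a recent pandas (1.x or later) is installed. The column names and values are made up for illustration. It shows a DataFrame as a set of named Series columns with an arbitrary label index, and a sparse Series whose fill value is 0 instead of NaN.

```python
import pandas as pd

# A DataFrame is a collection of columns; each column is a Series.
df = pd.DataFrame(
    {"qty": [5, 0, 0, 2], "price": [9.5, 3.0, 4.25, 1.0]},
    index=["a", "b", "c", "d"],  # arbitrary (non-integer) index labels
)
qty = df["qty"]
print(type(qty).__name__)  # Series
print(qty["d"])            # 2, looked up by label rather than position

# A sparse Series whose "missing" value is 0 rather than NaN.
sparse = pd.Series(pd.arrays.SparseArray([0, 0, 3, 0], fill_value=0))
print(sparse.sparse.fill_value)  # 0
print(sparse.sparse.density)     # 0.25: only one of the four values is stored
```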
I mention this because a lot of work is being done with DataFrames, and they
might be a good basis for a columnstore API, killing two birds with one stone.
BTW, the underlying Python type for a Series is the NumPy ndarray, which is
specifically designed to interface with things like C arrays. So a
columnstore could potentially be accessed directly.
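To illustrate the "accessed directly" point, here is a minimal sketch assuming NumPy is available. A standard-library `array.array` stands in for a column stored as a plain C array of ints; `np.frombuffer` wraps that existing memory as an ndarray without copying it.

```python
from array import array

import numpy as np

# Stand-in for a column stored as a plain C array of ints.
column = array("i", [7, 8, 9, 10])

# Wrap the existing buffer without copying; np.intc is NumPy's C int type.
view = np.frombuffer(column, dtype=np.intc)

column[0] = 70                     # change the underlying storage...
print(view[0])                     # 70: ...and the ndarray sees it
print(view.flags["C_CONTIGUOUS"])  # True
print(view.ctypes.data == column.buffer_info()[0])  # True: same address
```

While the view exists, the underlying `array` cannot be resized, which is the usual trade-off of sharing a buffer instead of copying it.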
Aside from potential API inspiration, it might be useful to prototype a
columnstore using Series (or maybe ndarrays).
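One possible shape for such a prototype, sketched with one ndarray per column. `ToyColumnStore`, `append_rows` and `scan` are hypothetical names invented for this illustration. A scan returns only the requested columns, which is the main property a columnstore is after.

```python
from typing import Dict, Iterable, List, Sequence

import numpy as np


class ToyColumnStore:
    """Hypothetical prototype: each column lives in its own ndarray."""

    def __init__(self, schema: Dict[str, str]) -> None:
        # schema maps column name -> NumPy dtype string, e.g. "int64".
        self.columns = {name: np.empty(0, dtype=dt) for name, dt in schema.items()}

    def append_rows(self, rows: Iterable[Sequence]) -> None:
        # Transpose incoming rows into per-column tuples, then extend each column.
        names = list(self.columns)
        for name, values in zip(names, zip(*rows)):
            col = self.columns[name]
            self.columns[name] = np.concatenate(
                [col, np.asarray(values, dtype=col.dtype)]
            )

    def scan(self, wanted: List[str]) -> Dict[str, np.ndarray]:
        # A projection reads only the requested columns; the others are untouched.
        return {name: self.columns[name] for name in wanted}


store = ToyColumnStore({"id": "int64", "amount": "float64", "flag": "bool"})
store.append_rows([(1, 9.5, True), (2, 3.0, False), (3, 4.25, True)])
result = store.scan(["amount"])
print(result["amount"].sum())  # 16.75
```

Appending with `np.concatenate` copies each column on every call; a less toy-like prototype would append in larger batches or preallocate.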
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)