On Mon, Apr 15, 2019 at 11:57:49AM -0700, Ashwin Agrawal wrote:
On Mon, Apr 15, 2019 at 11:18 AM Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:
Maybe. I'm not going to pretend I fully understand the internals. Does
that mean the container contains ZSUncompressedBtreeItem as elements? Or
just the plain Datum values?
First, your reading of the code and all the comments/questions so far have
been highly encouraging. Thanks a lot for that.
;-)
The container holds ZSUncompressedBtreeItem elements, since each item has
to store metadata such as size, undo pointer and similar info. We don't want
to restrict compression to items from the same insertion session. Hence,
yes, it doesn't store just Datum values. We want to treat these as
tuple-level operations, keep metadata per tuple, and be able to work at
tuple-level granularity rather than block level.
OK, thanks for the clarification, that somewhat explains my confusion.
So if I understand it correctly, ZSCompressedBtreeItem is essentially a
sequence of ZSUncompressedBtreeItem(s) stored one after another, along
with some additional top-level metadata.
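
To make that reading concrete, here is a minimal sketch of "items stored one
after another". It is only an illustration under my own assumptions: the
struct, its fields (size, tid, undo_ptr) and both function names are
hypothetical and are not taken from the zedstore patch, and the real item
layout may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-item header; NOT the actual ZSUncompressedBtreeItem. */
typedef struct SketchItemHeader
{
    uint16_t size;      /* total item size in bytes, header included */
    uint64_t tid;       /* tuple identifier (assumed field) */
    uint64_t undo_ptr;  /* undo pointer (assumed field) */
} SketchItemHeader;

/*
 * Append one item (header followed by payload) at offset "off".
 * Returns the offset just past the new item, or 0 if it does not fit.
 */
size_t
sketch_append_item(uint8_t *buf, size_t cap, size_t off,
                   uint64_t tid, uint64_t undo_ptr,
                   const void *payload, uint16_t payload_len)
{
    SketchItemHeader hdr;
    size_t      total = sizeof(hdr) + payload_len;

    assert(buf != NULL);
    if (total > UINT16_MAX || off > cap || total > cap - off)
        return 0;

    memset(&hdr, 0, sizeof(hdr));   /* also zeroes struct padding */
    hdr.size = (uint16_t) total;
    hdr.tid = tid;
    hdr.undo_ptr = undo_ptr;

    /* memcpy avoids alignment assumptions about the buffer position */
    memcpy(buf + off, &hdr, sizeof(hdr));
    memcpy(buf + off + sizeof(hdr), payload, payload_len);
    return off + total;
}

/*
 * Walk the items back to back using each header's size field.
 * Returns the number of items, or -1 if the buffer is malformed.
 */
int
sketch_count_items(const uint8_t *buf, size_t len)
{
    size_t      off = 0;
    int         nitems = 0;

    while (off < len)
    {
        SketchItemHeader hdr;

        if (len - off < sizeof(hdr))
            return -1;
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;
        off += hdr.size;
        nitems++;
    }
    return nitems;
}
```

Note that every item carries its own header here, so metadata and values
are interleaved throughout the buffer.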
Definitely many more tricks can be, and need to be, applied to optimize the
storage format; for example, fixed-width columns have no need to store the
size in every item. Keeping it simple is the theme we have been trying to
maintain. Compression should ideally handle duplicate data pretty easily
and efficiently as well, but we will try to optimize as much as we can
without relying on it.
I think there's plenty of room for improvement. The main problem I see
is that it mixes different types of data, which is bad for compression
and vectorized execution. I think we'll end up with a very different
representation of the container, essentially decomposing the items into
arrays of values of the same type - array of TIDs, array of undo
pointers, buffer of serialized values, etc.
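
A rough sketch of what I mean by decomposing, again purely illustrative:
the struct, the fixed capacities and the function names below are all
hypothetical, and none of this exists in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DECOMP_MAX_ITEMS 64     /* arbitrary capacity for the sketch */
#define DECOMP_MAX_BYTES 1024   /* arbitrary capacity for the sketch */

/* Hypothetical container: one array per kind of data. */
typedef struct SketchDecomposed
{
    int         nitems;
    uint64_t    tids[DECOMP_MAX_ITEMS];         /* array of TIDs */
    uint64_t    undo_ptrs[DECOMP_MAX_ITEMS];    /* array of undo pointers */
    /* value i occupies values[offsets[i] .. offsets[i + 1]) */
    uint32_t    offsets[DECOMP_MAX_ITEMS + 1];
    uint8_t     values[DECOMP_MAX_BYTES];       /* serialized values */
} SketchDecomposed;

void
decomposed_init(SketchDecomposed *c)
{
    c->nitems = 0;
    c->offsets[0] = 0;
}

/* Add one tuple's data to the per-kind arrays; returns 0, or -1 if full. */
int
decomposed_add(SketchDecomposed *c, uint64_t tid, uint64_t undo_ptr,
               const void *val, uint32_t len)
{
    uint32_t    start;

    assert(c != NULL);
    if (c->nitems >= DECOMP_MAX_ITEMS)
        return -1;
    start = c->offsets[c->nitems];
    if (len > DECOMP_MAX_BYTES - start)
        return -1;

    c->tids[c->nitems] = tid;
    c->undo_ptrs[c->nitems] = undo_ptr;
    memcpy(c->values + start, val, len);
    c->offsets[c->nitems + 1] = start + len;
    c->nitems++;
    return 0;
}

/* Return a pointer to value i and set *len, or NULL if i is out of range. */
const uint8_t *
decomposed_value(const SketchDecomposed *c, int i, uint32_t *len)
{
    if (i < 0 || i >= c->nitems)
        return NULL;
    *len = c->offsets[i + 1] - c->offsets[i];
    return c->values + c->offsets[i];
}
```

With this shape each array holds a single kind of data, so it could be
compressed or scanned on its own, e.g. walking tids[] without touching the
values buffer at all.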
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services