On Mon, Apr 15, 2019 at 11:57:49AM -0700, Ashwin Agrawal wrote:
  On Mon, Apr 15, 2019 at 11:18 AM Tomas Vondra
  <tomas.von...@2ndquadrant.com> wrote:

    Maybe. I'm not going to pretend I fully understand the internals. Does
    that mean the container contains ZSUncompressedBtreeItem as elements? Or
    just the plain Datum values?

  First, your reading of the code and all the comments/questions so far
  have been highly encouraging. Thanks a lot for that.

;-)

  The container contains ZSUncompressedBtreeItem as elements, since each
  item has to store metadata like size, undo and similar info. We don't
  wish to restrict compression to items from the same insertion session
  only. Hence, yes, it doesn't just store Datum values. We wish to treat
  these as tuple-level operations, keep metadata per tuple, and be able to
  work at tuple-level granularity rather than block level.

OK, thanks for the clarification, that somewhat explains my confusion.
So if I understand it correctly, ZSCompressedBtreeItem is essentially a
sequence of ZSUncompressedBtreeItem(s) stored one after another, along
with some additional top-level metadata.
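
To make that reading concrete, here is a minimal sketch of such a layout:
a byte buffer holding variable-length items back to back, each with its
own header. The struct and function names (DemoItemHeader,
demo_append_item, demo_count_items) and the header fields are made up for
illustration; they are not the actual zedstore definitions, and the
compression step is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-item header; the fields are illustrative only. */
typedef struct DemoItemHeader
{
    uint16_t size;      /* total item size in bytes, header included */
    uint64_t tid;       /* logical tuple identifier */
    uint64_t undo_ptr;  /* reference to an undo record */
} DemoItemHeader;

/*
 * Append one item (header followed by payload) at offset "used".
 * The caller must make sure the buffer has enough room.
 * Returns the new used length.
 */
static size_t
demo_append_item(char *buf, size_t used, uint64_t tid, uint64_t undo_ptr,
                 const void *payload, uint16_t payload_len)
{
    DemoItemHeader hdr;

    memset(&hdr, 0, sizeof(hdr));   /* keep padding bytes deterministic */
    hdr.size = (uint16_t) (sizeof(hdr) + payload_len);
    hdr.tid = tid;
    hdr.undo_ptr = undo_ptr;

    memcpy(buf + used, &hdr, sizeof(hdr));
    memcpy(buf + used + sizeof(hdr), payload, payload_len);
    return used + hdr.size;
}

/*
 * Walk the items one after another, using each header's size to find
 * the next item. Returns the item count and sums the payload bytes.
 */
static int
demo_count_items(const char *buf, size_t used, size_t *payload_bytes)
{
    size_t off = 0;
    int    nitems = 0;

    *payload_bytes = 0;
    while (off < used)
    {
        DemoItemHeader hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned reads */
        assert(hdr.size >= sizeof(hdr));
        *payload_bytes += hdr.size - sizeof(hdr);
        off += hdr.size;
        nitems++;
    }
    return nitems;
}
```

Note how every item repeats the full header, so metadata and values of
different types are interleaved in the buffer.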

  Definitely many more tricks can be, and need to be, applied to optimize
  the storage format; for fixed-width columns, for example, there is no
  need to store the size in every item. Keeping it simple is the theme we
  have been trying to maintain. Compression should ideally handle duplicate
  data pretty easily and efficiently as well, but we will try to optimize
  as much as we can without relying on it.

I think there's plenty of room for improvement. The main problem I see
is that it mixes different types of data, which is bad for compression
and vectorized execution. I think we'll end up with a very different
representation of the container, essentially decomposing the items into
arrays of values of the same type - array of TIDs, array of undo
pointers, buffer of serialized values, etc.
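
A rough sketch of what I mean by decomposing the items, with fixed-size
arrays to keep it short. Everything here (DemoColumnarContainer, the
demo_columnar_* functions, the size limits) is hypothetical and only
illustrates the idea; it is not a proposal for the actual on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_ITEMS 64
#define DEMO_MAX_DATA  1024

/* Hypothetical container: one array per kind of data. */
typedef struct DemoColumnarContainer
{
    int      nitems;
    uint64_t tids[DEMO_MAX_ITEMS];          /* array of TIDs */
    uint64_t undo_ptrs[DEMO_MAX_ITEMS];     /* array of undo pointers */
    uint32_t offsets[DEMO_MAX_ITEMS + 1];   /* value i spans offsets[i]..offsets[i+1] */
    char     data[DEMO_MAX_DATA];           /* buffer of serialized values */
} DemoColumnarContainer;

static void
demo_columnar_init(DemoColumnarContainer *c)
{
    c->nitems = 0;
    c->offsets[0] = 0;
}

/* Add one item; returns 1 on success, 0 if the container is full. */
static int
demo_columnar_add(DemoColumnarContainer *c, uint64_t tid, uint64_t undo_ptr,
                  const void *value, uint32_t len)
{
    uint32_t start = c->offsets[c->nitems];

    if (c->nitems >= DEMO_MAX_ITEMS || len > DEMO_MAX_DATA - start)
        return 0;

    c->tids[c->nitems] = tid;
    c->undo_ptrs[c->nitems] = undo_ptr;
    memcpy(c->data + start, value, len);
    c->offsets[c->nitems + 1] = start + len;
    c->nitems++;
    return 1;
}

/* Return a pointer to the serialized value of item i and its length. */
static const void *
demo_columnar_get(const DemoColumnarContainer *c, int i, uint32_t *len)
{
    assert(i >= 0 && i < c->nitems);
    *len = c->offsets[i + 1] - c->offsets[i];
    return c->data + c->offsets[i];
}
```

With this shape the TIDs and undo pointers sit in homogeneous arrays,
which is the property that should help both compression and vectorized
execution, and fixed-width columns could drop the offsets array.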


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
