On 29/08/2019 14:30, Ashutosh Sharma wrote:

On Wed, Aug 28, 2019 at 5:30 AM Alexandra Wang <lew...@pivotal.io> wrote:

    You are correct that we currently go through each item in the leaf page
    that contains the given tid. Specifically, the logic to retrieve all the
    attribute items inside a ZSAttStream is now moved to decode_attstream()
    in the latest code, and then in zsbt_attr_fetch() we again loop through
    each item we previously retrieved from decode_attstream() and look for
    the given tid.

Okay. Any idea why this new way of storing attribute data as streams (lowerstream and upperstream) has been chosen just for the attributes but not for tids? Are only attribute blocks compressed, and not the tid blocks?

Right, only attribute blocks are currently compressed. Tid blocks need to be modified when there are UPDATEs or DELETEs, so I think having to decompress and recompress them would be more costly. Also, there is no user data on the TID tree, and the Simple-8b encoded codewords used to represent the TIDs are already pretty compact. I'm not sure how much gain you would get from passing them through a general-purpose compressor.

I could be wrong though. We could certainly try it out, and see how it performs.
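
To make the compactness argument concrete, here is a minimal sketch of Simple-8b-style packing applied to TID deltas. It is a simplified textbook variant (no run-length selectors, selector kept in the low 4 bits), not zedstore's actual codeword layout; the names MODES, encode() and decode() are made up for illustration.

```python
# (count, bits) for each 4-bit selector: pack `count` integers of `bits` bits
# into the 60 payload bits of a 64-bit word. Simplified table, an assumption
# for this sketch rather than zedstore's real one.
MODES = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
         (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def encode(values):
    """Greedily pack non-negative integers (< 2**60) into 64-bit words."""
    words = []
    i = 0
    while i < len(values):
        for selector, (count, bits) in enumerate(MODES):
            chunk = values[i:i + count]
            # Use the densest mode that has enough values left and fits them all.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector
                for k, v in enumerate(chunk):
                    word |= v << (4 + k * bits)
                words.append(word)
                i += count
                break
        else:
            raise ValueError("value does not fit in 60 bits")
    return words


def decode(words):
    """Inverse of encode(): unpack every word according to its selector."""
    out = []
    for word in words:
        count, bits = MODES[word & 0xF]
        mask = (1 << bits) - 1
        for k in range(count):
            out.append((word >> (4 + k * bits)) & mask)
    return out


# Ten nearby TIDs, stored as deltas from the previous TID (starting from 0).
tids = [1, 2, 3, 5, 8, 9, 10, 11, 12, 40]
deltas = [b - a for a, b in zip([0] + tids, tids)]
words = encode(deltas)
assert decode(words) == deltas
```

In this example the ten deltas end up in a single 64-bit word, which illustrates why a general-purpose compressor may have little left to gain on such data.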

    One optimization we can do is to tell decode_attstream() to stop decoding
    at the tid we are interested in. We can also apply other tricks to speed
    up the lookups in the page: for fixed-length attributes, it is easy to do
    binary search instead of linear search, and for variable-length
    attributes, we can probably try something that we didn't think of yet.

I think we can probably ask decode_attstream() to stop once it has found the tid that we are searching for, but then we only need to do that for Index Scans.
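
The two lookup ideas discussed here can be sketched roughly as follows. This is only an illustration on plain Python lists, assuming tids are stored as deltas from a starting tid; decode_items(), fetch_with_early_stop() and fetch_fixed_width() are hypothetical names and do not mirror the real decode_attstream() or zsbt_attr_fetch() signatures.

```python
from bisect import bisect_left


def decode_items(deltas, values, first_tid):
    """Lazily yield (tid, value) pairs from a delta-encoded stream."""
    tid = first_tid
    for delta, value in zip(deltas, values):
        tid += delta
        yield tid, value


def fetch_with_early_stop(deltas, values, first_tid, target):
    """Linear scan that stops decoding as soon as the target is reached or passed."""
    for tid, value in decode_items(deltas, values, first_tid):
        if tid == target:
            return value
        if tid > target:
            return None  # tids are ascending, so the target is not in the stream
    return None


def fetch_fixed_width(tids, values, target):
    """Binary search, usable when items can be addressed directly by position."""
    i = bisect_left(tids, target)
    if i < len(tids) and tids[i] == target:
        return values[i]
    return None


# Stream holding tids 10, 11, 12, 15.
deltas = [0, 1, 1, 3]
values = ["a", "b", "c", "d"]
print(fetch_with_early_stop(deltas, values, 10, 12))
print(fetch_fixed_width([10, 11, 12, 15], values, 15))
```

The early-stop variant still decodes everything before the target, so it helps single-tid lookups most; the binary search only works once the tids (or fixed-width items) can be indexed without decoding the whole stream.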

I've been thinking that we should add a few "bookmarks" on long streams, so that you could skip e.g. to the midpoint in a stream. It's a tradeoff though; when you add more information for random access, it makes the representation less compact.
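
As a rough sketch of the bookmark idea, assuming a delta-encoded tid stream and a separate sorted list of (tid, item index) pairs recorded every few items. The names build_bookmarks() and fetch_with_bookmarks() and the `every` parameter are hypothetical, not part of zedstore.

```python
from bisect import bisect_right


def build_bookmarks(deltas, first_tid, every):
    """Record (absolute tid, item index) for every `every`-th item."""
    bookmarks = []
    tid = first_tid
    for i, delta in enumerate(deltas):
        tid += delta
        if i % every == 0:
            bookmarks.append((tid, i))
    return bookmarks


def fetch_with_bookmarks(deltas, values, bookmarks, target):
    """Jump to the last bookmark at or before `target`, then decode forward."""
    pos = bisect_right(bookmarks, (target, len(deltas))) - 1
    if pos < 0:
        return None  # target sorts before the first item (or stream is empty)
    tid, start = bookmarks[pos]
    if tid == target:
        return values[start]
    for i in range(start + 1, len(deltas)):
        tid += deltas[i]
        if tid == target:
            return values[i]
        if tid > target:
            return None  # passed the target; tids are strictly ascending
    return None


# Tids 100, 102, ..., 118 with a bookmark every 4 items.
deltas = [0] + [2] * 9
values = list(range(10))
bookmarks = build_bookmarks(deltas, 100, every=4)
print(bookmarks)
print(fetch_with_bookmarks(deltas, values, bookmarks, 110))
```

A smaller `every` means fewer deltas to decode per lookup but more bookmark entries to store, which is the compactness tradeoff mentioned above.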

    Zedstore currently implements update as delete+insert, hence the old tid
    is not reused. We don't store the tuple in our UNDO log, and we only
    store the transaction information in the UNDO log. Reusing the tid of the
    old tuple means putting the old tuple in the UNDO log, which we have not
    implemented yet.

Okay, so that means performing an update on a non-key attribute would also require changes in the index table. In short, HOT update is currently not possible with a zedstore table. Am I right?

That's right. There's a lot of potential gain from doing HOT updates. For example, if you UPDATE one column in every row of a table, ideally you would only modify the attribute tree containing that column. But that hasn't been implemented.

- Heikki

