On 29/08/2019 14:30, Ashutosh Sharma wrote:
On Wed, Aug 28, 2019 at 5:30 AM Alexandra Wang <lew...@pivotal.io> wrote:
You are correct that we currently go through each item in the leaf page
that contains the given tid. Specifically, the logic to retrieve all the
attribute items inside a ZSAttStream is now moved to decode_attstream()
in the latest code, and then in zsbt_attr_fetch() we again loop through
each item we previously retrieved from decode_attstream() and look for
the given tid.
Okay. Any idea why this new way of storing attribute data as streams
(lowerstream and upperstream) has been chosen just for the attributes
but not for tids? Are only attribute blocks compressed but not the tid
blocks?
Right, only attribute blocks are currently compressed. Tid blocks need
to be modified when there are UPDATEs or DELETEs, so I think having to
decompress and recompress them would be more costly. Also, there is no
user data on the TID tree, and the Simple-8b encoded codewords used to
represent the TIDs are already pretty compact. I'm not sure how much
gain you would get from passing them through a general-purpose compressor.
I could be wrong though. We could certainly try it out, and see how it
performs.
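
To illustrate why the TID tree is already compact, here is a minimal
sketch of Simple-8b-style packing applied to TID deltas. It is a
simplified variant for illustration only: the selector table follows the
commonly described Simple-8b widths but omits the run-length selectors,
and the names pack(), unpack(), tid_deltas() and tids_from_deltas() are
hypothetical, not taken from the Zedstore code.

```python
# Each 64-bit codeword = 4-bit selector + 60 payload bits.
# MODES[selector] = (count, bits): `count` integers of `bits` bits each.
MODES = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
         (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def tid_deltas(tids):
    """Turn sorted TIDs into gaps; small gaps need few bits each."""
    deltas, prev = [], 0
    for tid in tids:
        deltas.append(tid - prev)
        prev = tid
    return deltas


def tids_from_deltas(deltas):
    """Inverse of tid_deltas(): running sum of the gaps."""
    tids, tid = [], 0
    for delta in deltas:
        tid += delta
        tids.append(tid)
    return tids


def pack(values):
    """Greedily pack non-negative integers into 64-bit codewords."""
    words, pos = [], 0
    while pos < len(values):
        for selector, (count, bits) in enumerate(MODES):
            chunk = values[pos:pos + count]
            # Use the densest mode whose slot count and width both fit.
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                payload = 0
                for j, v in enumerate(chunk):
                    payload |= v << (j * bits)
                words.append((selector << 60) | payload)
                pos += count
                break
        else:
            raise ValueError("value does not fit in 60 bits")
    return words


def unpack(words):
    """Decode codewords produced by pack()."""
    out = []
    for word in words:
        count, bits = MODES[word >> 60]
        mask = (1 << bits) - 1
        for j in range(count):
            out.append((word >> (j * bits)) & mask)
    return out
```

In this toy model, 120 consecutive TIDs become 120 deltas of 1 and fit
in two codewords (16 bytes), which leaves little for a general-purpose
compressor to remove.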
One optimization we can do is to tell decode_attstream() to stop
decoding at the tid we are interested in. We can also apply other tricks
to speed up the lookups in the page: for fixed-length attributes, it is
easy to do binary search instead of linear search, and for
variable-length attributes, we can probably try something that we
didn't think of yet.
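
The two ideas above can be sketched roughly as follows. This is a toy
model, not the Zedstore code: a "stream" here is just a Python list of
(tid_delta, value) pairs, and decode_items(), fetch_linear() and
fetch_binary() are hypothetical names chosen for the example.

```python
from bisect import bisect_left


def decode_items(stream, stop_tid=None):
    """Yield (tid, value) pairs from a delta-encoded toy stream.

    If stop_tid is given, decoding ends right after the first item
    whose tid is >= stop_tid, so the tail is never decoded.
    """
    tid = 0
    for delta, value in stream:
        tid += delta
        yield tid, value
        if stop_tid is not None and tid >= stop_tid:
            return


def fetch_linear(stream, target_tid):
    """Linear lookup with early stop; returns (value, items_decoded)."""
    decoded = 0
    for tid, value in decode_items(stream, stop_tid=target_tid):
        decoded += 1
        if tid == target_tid:
            return value, decoded
    return None, decoded


def fetch_binary(tids, values, target_tid):
    """Binary search; assumes fixed-length items already laid out as
    two parallel arrays sorted by tid."""
    i = bisect_left(tids, target_tid)
    if i < len(tids) and tids[i] == target_tid:
        return values[i]
    return None


stream = [(1, "a"), (1, "b"), (3, "c"), (1, "d")]   # tids 1, 2, 5, 6
print(fetch_linear(stream, 5))                      # ('c', 3)
print(fetch_binary([1, 2, 5, 6], ["a", "b", "c", "d"], 5))  # c
```

The early stop only saves the work after the target, so on average it
still decodes about half the stream; binary search needs random access,
which is why it is easy only for the fixed-length case.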
I think we can probably ask decode_attstream() to stop once it has found
the tid that we are searching for but then we only need to do that for
Index Scans.
I've been thinking that we should add a few "bookmarks" on long streams,
so that you could skip e.g. to the midpoint in a stream. It's a tradeoff
though; when you add more information for random access, it makes the
representation less compact.
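
A rough sketch of the bookmark idea, under the same toy assumptions (a
stream is a list of (tid_delta, value) pairs with deltas >= 1;
build_bookmarks() and fetch_with_bookmarks() are hypothetical names). In
a real byte stream a bookmark would hold a byte offset rather than a
list index.

```python
from bisect import bisect_right


def build_bookmarks(stream, every):
    """Record (tid, index) for every `every`-th item of the stream."""
    marks, tid = [], 0
    for i, (delta, _value) in enumerate(stream):
        tid += delta
        if i % every == 0:
            marks.append((tid, i))
    return marks


def fetch_with_bookmarks(stream, marks, target_tid):
    """Jump to the last bookmark at or before target_tid, then decode
    forward from there. Returns (value, items_decoded)."""
    k = bisect_right(marks, (target_tid, len(stream))) - 1
    if k < 0:
        return None, 0          # target is before the first item
    tid, i = marks[k]
    decoded = 0
    while i < len(stream):
        decoded += 1
        if tid == target_tid:
            return stream[i][1], decoded
        if tid > target_tid:
            break
        i += 1
        if i < len(stream):
            tid += stream[i][0]
    return None, decoded


stream = [(1, f"v{n}") for n in range(1, 101)]      # tids 1..100
marks = build_bookmarks(stream, every=25)           # 4 bookmarks
print(fetch_with_bookmarks(stream, marks, 60))      # ('v60', 10)
```

Here four bookmarks cut the lookup of tid 60 from 60 decoded items to
10, at the cost of storing four extra (tid, position) pairs, which is
the compactness tradeoff mentioned above.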
Zedstore currently implements update as delete+insert, hence the old
tid is not reused. We don't store the tuple in our UNDO log; we only
store the transaction information there. Reusing the tid of the old
tuple means putting the old tuple in the UNDO log, which we have not
implemented yet.
Okay, so that means performing an update on a non-key attribute would
also require changes in the index table. In short, HOT update is
currently not possible with a zedstore table. Am I right?
That's right. There's a lot of potential gain from doing HOT updates. For
example, if you UPDATE one column on every row in a table, ideally you
would only modify the attribute tree containing that column. But that
hasn't been implemented.
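
To make the difference concrete, here is a toy model of the two update
strategies. ToyTable and its methods are hypothetical and ignore MVCC,
UNDO and indexes entirely; the in-place variant in particular glosses
over the fact that the old value would have to be kept somewhere (e.g.
in the UNDO log) for concurrent readers.

```python
class ToyTable:
    """One dict per column (tid -> value) plus a liveness map that
    stands in for the TID tree."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.live = {}
        self.attrs = {c: {} for c in self.columns}
        self.next_tid = 1

    def insert(self, row):
        tid = self.next_tid
        self.next_tid += 1
        self.live[tid] = True
        for c in self.columns:
            self.attrs[c][tid] = row[c]
        return tid

    def update_delete_insert(self, tid, changes):
        """Current behaviour as described above: mark the old tid dead
        and write every column again under a new tid, so any index
        would need a new entry as well."""
        row = {c: self.attrs[c][tid] for c in self.columns}
        row.update(changes)
        self.live[tid] = False
        return self.insert(row)

    def update_in_place(self, tid, changes):
        """The ideal case: keep the tid and touch only the attribute
        trees of the changed columns."""
        for c, v in changes.items():
            self.attrs[c][tid] = v
        return tid


t = ToyTable(["id", "name", "balance"])
old = t.insert({"id": 1, "name": "x", "balance": 10})
new = t.update_delete_insert(old, {"balance": 20})
print(old, new, len(t.attrs["name"]))   # 1 2 2
```

With delete+insert the untouched "name" column gains a second entry
even though only "balance" changed; the in-place variant would leave it
alone.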
- Heikki