Re: Zedstore - compressed in-core columnar storage

Alexandra Wang Tue, 27 Aug 2019 17:01:15 -0700

On Tue, Aug 27, 2019 at 12:03 AM Ashutosh Sharma <[email protected]>
wrote:


> My point is, once we find the leaf page containing the given tid, we go
> through each item in the page until we find the data corresponding to the
> given tid which means we kind of perform a sequential scan at the page
> level. I'm referring to the below loop in zsbt_attr_scan_fetch_array().
>
>         for (off = FirstOffsetNumber; off <= maxoff; off++)
>         {
>             ItemId      iid = PageGetItemId(page, off);
>             ZSAttributeArrayItem *item = (ZSAttributeArrayItem *)
> PageGetItem(page, iid);
>
>             if (item->t_endtid <= nexttid)
>                 continue;
>
>             if (item->t_firsttid > nexttid)
>                 break;
>
> But that's not true for IndexScan in case of heap table because there the
> index tuple contains the exact physical location of tuple in the heap. So,
> there is no need to scan the entire page.
>

You are correct that we currently go through each item in the leaf page that
contains the given tid, specifically, the logic to retrieve all the
attribute
items inside a ZSAttStream is now moved to decode_attstream() in the latest
code, and then in zsbt_attr_fetch() we again loop through each item we
previously retrieved from decode_attstream() and look for the given tid. One
optimization we can to is to tell decode_attstream() to stop decoding at the
tid we are interested in. We can also apply other tricks to speed up the
lookups in the page, for fixed length attribute, it is easy to do binary
search
instead of linear search, and for variable length attribute, we can probably
try something that we didn't think of yet.


1) In zsundo_insert_finish(), there is a double call to
> BufferGetPage(undobuf); Is that required ?
>

Fixed, thanks!


2) In zedstoream_fetch_row(), why is zsbt_tid_begin_scan() being called
> twice? I'm referring to the below code.
>
>     if (fetch_proj->num_proj_atts == 0)
>     {
>         ....
>         ....
>         zsbt_tid_begin_scan(rel, tid, tid + 1,
>                             snapshot,
>                             &fetch_proj->tid_scan);
>         fetch_proj->tid_scan.serializable = true;
>
>         for (int i = 1; i < fetch_proj->num_proj_atts; i++)
>         {
>             int         attno = fetch_proj->proj_atts[i];
>
>             zsbt_attr_begin_scan(rel,  reldesc, attno,
>                                  &fetch_proj->attr_scans[i - 1]);
>         }
>         MemoryContextSwitchTo(oldcontext);
>
>         zsbt_tid_begin_scan(rel, tid, tid + 1, snapshot,
> &fetch_proj->tid_scan);
>      }
>

I removed the second call, thanks!



> Also, for all types of update operation (be it key or non-key update) we
> create a new tid for the new version of tuple. Can't we use the tid
> associated with the old tuple for the cases where there is no concurrent
> transactions to whom the old tuple is still visible.
>

Zedstore currently implement update as delete+insert, hence the old tid is
not
reused. We don't store the tuple in our UNDO log, and we only store the
transaction information in the UNDO log. Reusing the tid of the old tuple
means
putting the old tuple in the UNDO log, which we have not implemented yet.


Thanks for reporting, this is very helpful! Patches are welcome as well!

Re: Zedstore - compressed in-core columnar storage

Reply via email to