On Thu, Apr 11, 2019 at 6:06 AM Rafia Sabih <rafia.pghack...@gmail.com> wrote:
> Reading about it reminds me of this work -- TAG column storage(
> http://www09.sigmod.org/sigmod/record/issues/0703/03.article-graefe.pdf ).
> Isn't this storage system inspired from there, with TID as the TAG?
>
> It is not referenced here so made me wonder.
I don't think they're particularly similar, because that paper describes an architecture based on purely logical row identifiers, which is not what a TID is. A TID is a hybrid physical/logical identifier, sometimes called a "physiological" identifier, which will have significant overhead.

Ashwin said that ZedStore TIDs are logical identifiers, but I don't see how that's compatible with a hybrid row/column design (unless you map heap TIDs to logical row identifiers using a separate B-Tree).

The big idea with Graefe's TAG design is that there is practically no storage overhead for these logical identifiers, because each entry's identifier is calculated by adding its slot number to the page's tag/low key. The ZedStore design, in contrast, explicitly stores a TID for every entry. ZedStore seems more flexible for that reason, but at the same time the per-datum overhead seems very high to me. Maybe prefix compression could help here, something that a page's low key and high key lend themselves to rather well.

--
Peter Geoghegan
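The difference between implicit (TAG-style) identifiers and explicitly stored TIDs, and how a page's low key and high key bound the cost of prefix compression, can be sketched as below. This is a hypothetical illustration, not ZedStore or Graefe code: the function names, the fixed "full width" for an uncompressed TID, and the example key ranges are all assumptions chosen for clarity, and the byte counts ignore any real on-disk format details.

```python
from typing import List


def tag_row_ids(low_key: int, n_slots: int) -> List[int]:
    """TAG-style (hypothetical): each identifier is derived as
    low_key + slot number, so nothing per entry has to be stored."""
    return [low_key + slot for slot in range(n_slots)]


def bytes_needed(max_value: int) -> int:
    """Smallest whole number of bytes that can represent 0..max_value."""
    return max(1, (max_value.bit_length() + 7) // 8)


def explicit_tid_overhead(tids: List[int], full_width: int) -> int:
    """Bytes spent storing every TID at a fixed full width (no compression).
    full_width is an assumed parameter, not a ZedStore constant."""
    return len(tids) * full_width


def prefix_compressed_overhead(tids: List[int], low_key: int, high_key: int) -> int:
    """Bytes spent if each TID is stored as an offset from the page's low key.
    Since low_key <= tid < high_key, every offset fits in the width needed
    for (high_key - low_key - 1); the shared prefix is kept once per page."""
    if not all(low_key <= t < high_key for t in tids):
        raise ValueError("TID outside the page's key range")
    width = bytes_needed(high_key - low_key - 1)
    return len(tids) * width
```

For a page covering 200 consecutive identifiers starting at 1,000,000, storing each TID at an assumed 8-byte width costs 1600 bytes, storing offsets from the low key costs 200 bytes (one byte each), and the TAG scheme stores no per-entry identifier at all. The narrower the key range a page covers, the fewer bytes each offset needs.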