On Mon, Aug 8, 2016 at 11:08 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Sun, Aug 7, 2016 at 12:55:01PM -0400, Bruce Momjian wrote:
> > On Sun, Aug 7, 2016 at 10:49:45AM -0400, Bruce Momjian wrote:
> > > OK, crazy idea time --- what if we only do WARM chain additions when
> > > indexed values are increasing (with NULLs higher than all values)? (If
> > > a key is always-increasing, it can't match a previous value in the
> > > chain.) That avoids the problem of having to check the WARM chain,
> > > except for the previous tuple, and the problem of pruning removing
> > > changed rows. It avoids having to check the index for matching
> > > values, and it prevents CREATE INDEX from having to index WARM chain
> > > values.
> > >
> > > Any decreasing value would cause a normal tuple to be created.
> > Actually, when we add the first WARM tuple, we can mark the HOT/WARM
> > chain as either all-incrementing or all-decrementing. We would need a
> > bit to indicate that.
> FYI, I see at least two available tuple header bits here, 0x0800 and
> /*
>  * information stored in t_infomask2:
>  */
> #define HEAP_NATTS_MASK         0x07FF  /* 11 bits for number of attributes */
> /* bits 0x1800 are available */
> #define HEAP_KEYS_UPDATED       0x2000  /* tuple was updated and key cols
>                                          * modified, or tuple deleted */
> #define HEAP_HOT_UPDATED        0x4000  /* tuple was HOT-updated */
> #define HEAP_ONLY_TUPLE         0x8000  /* this is heap-only tuple */
> #define HEAP2_XACT_MASK         0xE000  /* visibility-related bits */
What I am currently trying to do is to reuse at least the BlockNumber field
in t_ctid. For HOT/WARM chains, that field is effectively unused (except in
the last tuple, where a regular update needs to store the block number of the
new block). My idea is to use one free bit in t_infomask2 to tell us that
t_ctid is really not a CTID, but contains new information (for pg_upgrade's
sake). For example, one bit in bi_hi can tell us that this is the last
tuple in the chain (information today conveyed by t_ctid pointing to self).
Another bit can tell us that this tuple was WARM updated. We will still
have plenty of bits to store additional information about WARM chains.
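To make the t_ctid idea concrete, here is a rough sketch in C. It is not taken from any patch. The structs are simplified stand-ins for BlockIdData and ItemPointerData, and the names HEAP_CTID_HAS_CHAIN_INFO, CHAIN_LAST_TUPLE, CHAIN_WARM_UPDATED and the chain_info_* helpers are made up for illustration, as are the particular bit values chosen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for PostgreSQL's BlockIdData / ItemPointerData. */
typedef struct { uint16_t bi_hi; uint16_t bi_lo; } BlockIdData;
typedef struct { BlockIdData ip_blkid; uint16_t ip_posid; } ItemPointerData;

/* Hypothetical: one of the free t_infomask2 bits (0x0800 or 0x1000). */
#define HEAP_CTID_HAS_CHAIN_INFO 0x0800

/* Hypothetical flags kept in bi_hi once the block number is not needed. */
#define CHAIN_LAST_TUPLE   0x8000   /* replaces "t_ctid points to self" */
#define CHAIN_WARM_UPDATED 0x4000   /* this tuple was WARM updated */

/* Only the two header fields this sketch touches. */
typedef struct
{
    uint16_t        t_infomask2;
    ItemPointerData t_ctid;
} TupleHeaderSketch;

/* Switch t_ctid to "chain info" mode; the offset still links the chain. */
static void
chain_info_init(TupleHeaderSketch *tup, uint16_t next_offset)
{
    tup->t_infomask2 |= HEAP_CTID_HAS_CHAIN_INFO;
    tup->t_ctid.ip_blkid.bi_hi = 0;
    tup->t_ctid.ip_blkid.bi_lo = 0;
    tup->t_ctid.ip_posid = next_offset;
}

static void
chain_info_set(TupleHeaderSketch *tup, uint16_t flag)
{
    tup->t_ctid.ip_blkid.bi_hi |= flag;
}

/* A flag only counts if the infomask bit says t_ctid is not a real CTID. */
static bool
chain_info_test(const TupleHeaderSketch *tup, uint16_t flag)
{
    return (tup->t_infomask2 & HEAP_CTID_HAS_CHAIN_INFO) != 0 &&
           (tup->t_ctid.ip_blkid.bi_hi & flag) != 0;
}
```

Tuples written by an older version never have the infomask bit set, so their t_ctid keeps being read as a plain CTID, which is the pg_upgrade point above.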
> My guess is we would need one bit to mark a WARM chain, and perhaps
> reuse obsolete pre-9.0 HEAP_MOVED_OFF to indicate increment-only or
I am not convinced that checking for increment/decrement adds a lot of
value. Sure, we might be able to address some typical workloads, but is
that really a common use case? Instead, what I am looking at is storing a
bitmap which shows us which table columns have changed so far in the WARM
chain. We only have limited bits, so we can track only limited columns.
This will help the cases where different columns are updated, but not so
much if the same column is updated repeatedly.
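A minimal sketch of how such a bitmap might behave, again in C and again only an illustration. The 16-bit width, the 0-based column numbering, the name warm_try_update, and the policy of refusing a WARM update when a column repeats or falls outside the tracked range are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: how many columns fit in the bits we can spare. */
#define WARM_TRACKED_COLS 16

/*
 * Try to record the columns changed by one update in the chain's bitmap.
 * Columns are numbered from 0 here for simplicity.
 *
 * Returns true if every changed column is tracked and none was already
 * changed earlier in the chain; the bitmap is then updated.  Returns false
 * (bitmap untouched) otherwise, meaning this sketch would fall back to a
 * regular update.
 */
static bool
warm_try_update(uint16_t *changed, const int *cols, int ncols)
{
    uint16_t added = 0;

    for (int i = 0; i < ncols; i++)
    {
        uint16_t bit;

        if (cols[i] < 0 || cols[i] >= WARM_TRACKED_COLS)
            return false;           /* column is beyond what we can track */

        bit = (uint16_t) (1u << cols[i]);
        if (*changed & bit)
            return false;           /* same column updated again */

        added |= bit;
    }

    *changed |= added;
    return true;
}
```

Updating columns 2, then 5 and 7, succeeds and leaves the bitmap at 0x00A4; a later update of column 5 is rejected, which is the repeated-column case the bitmap does not help with.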
What will help, and something I haven't yet given any thought to, is
turning WARM chains back into HOT chains by removing stale index entries.
Some heuristics and limits on the amount of work done to detect duplicate
index entries will help too.
> We can't use the bits in LP_REDIRECT's lp_len because we need to create WARM
> chains before pruning, and I don't think walking the pre-pruned chain is
> worth it. (As I understand HOT, LP_REDIRECT is only created during
That's correct. But lp_len provides us some place to stash information from
heap tuples when they are pruned.
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services