Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-29 Thread Peter Geoghegan

Claudio Freire  wrote:

README.HOT says that that cost is not worth the benefit of
preventing a new index write, but I think that it ought to take into
account that not all index writes are equal. There is an appreciable
difference between inserting a new tuple, and updating one in-place. We
can remove the cost (hurting new snapshots by making them go through old
heap pages) while preserving most of the benefits (no logically
unnecessary index bloat).


It's a neat idea.


Thanks.

I think it's important to both prevent index bloat, and to make sure
that only the latest version is pointed to within indexes. There are
only so many ways that that can be done. I've tried to come up with a
way of doing those two things that breaks as little of heapam.c as
possible. As a bonus, some kind of super-pruning of many linked HOT
chains may be enabled, which is something that an asynchronous process
can do when triggered by a regular prune within a user backend.

This is a kind of micro-vacuum that is actually much closer to VACUUM
than the kill_prior_tuple stuff, or traditional pruning, in that it
potentially kills index entries (just those that were not subsequently
updated in place, because the new values for the index differed), and
then kills heap tuples, all together, without even keeping around a stub
itemId in the heap. And, chaining together HOT chains also lets us chain
together pruning. Retail index tuple deletion from pruning needs to be
crash safe, unlike LP_DEAD setting.


And, well, now that you mention, you don't need to touch indexes at all.

You can create the new chain, and "update" the index to point to it,
without ever touching the index itself, since you can repoint the old
HOT chain's start line pointer to point to the new HOT chain, create
a new pointer for the old one and point to it in the new HOT chain's
t_tid.

Existing index tuples thus now point to the right HOT chain without
having to go into the index and make any changes.

You do need the new HOT chain to live in the same page for this,
however.


That seems complicated. The idea that I'm trying to preserve here is the
idea that the beginning of a HOT-chain (a definition that includes a
"potential HOT chain" -- a single heap tuple that could later receive a
HOT UPDATE) unambiguously signals a need for physical changes to indexes
in all cases. The idea that I'm trying to move away from is that those
physical changes need to be new index insertions (new insertions should
only happen when it is logically necessary, because indexed values
changed).

Note that this can preserve the kill_prior_tuple stuff, I think, because
if everything is dead within a single HOT chain (a HOT chain by our
current definition -- not a chain of HOT chains) then nobody can need
the index tuple. This does require adding complexity around aborted
transactions, whose new (potential) HOT chain t_tid "backpointer" is
still needed; we must revise the definition of a HOT chain being
all_dead to accommodate that. But for the most part, we preserve HOT
chains as a thing that garbage collection can independently reason
about, process with single page atomic operations while still being
crash safe, etc.

As far as microvacuum style garbage collection goes, at a high level,
HOT chains seem like a good choke point to do clean-up of both heap
tuples (pruning) and index tuples. The complexity of doing that seems
manageable. And by chaining together HOT chains, you can really
aggressively microvacuum many HOT chains on many pages within an
asynchronous process as soon as the long running transaction goes away.
We lean on temporal locality for garbage collection.

There are numerous complications that I haven't really acknowledged but
am at least aware of. For one, when I say "update in place", I don't
necessarily mean it literally. It's probably possible to literally
update in place with unique indexes. For secondary indexes, which should
still have heap TID as part of their keyspace (once you go implement
that, Claudio), we probably need an index insertion immediately followed
by an index deletion, often within the same leaf page.

I hope that this design, such as it is, will be reviewed as a thought
experiment. What would be good or bad about a design like this in the
real world, particularly as compared to alternatives that we know about?
Is *some* "third way" design desirable and achievable, if not this one?
By "third way" design, I mean a design that is much less invasive than
adopting UNDO for MVCC, that still addresses the issues that we
currently have with certain types of UPDATE-heavy workloads, especially
when there are long running transactions, etc. I doubt that WARM meets
this standard, unfortunately, because it doesn't do anything for cases
that suffer only due to a long running xact.

I don't accept that there is a rigid dichotomy between Postgres style
MVCC, and using UNDO for MVCC, and I most certainly don't accept that
garbage 

Re: In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-28 Thread Claudio Freire
On Fri, Jul 28, 2017 at 8:32 PM, Peter Geoghegan  wrote:
> README.HOT says that that cost is not worth the benefit of
> preventing a new index write, but I think that it ought to take into
> account that not all index writes are equal. There is an appreciable
> difference between inserting a new tuple, and updating one in-place. We
> can remove the cost (hurting new snapshots by making them go through old
> heap pages) while preserving most of the benefits (no logically
> unnecessary index bloat).

It's a neat idea.

And, well, now that you mention, you don't need to touch indexes at all.

You can create the new chain, and "update" the index to point to it,
without ever touching the index itself, since you can repoint the old
HOT chain's start line pointer to point to the new HOT chain, create
a new pointer for the old one and point to it in the new HOT chain's
t_tid.

Existing index tuples thus now point to the right HOT chain without
having to go into the index and make any changes.

You do need the new HOT chain to live in the same page for this,
however.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


In-place index updates and HOT (Was: [HACKERS] Patch: Write Amplification Reduction Method (WARM))

2017-07-28 Thread Peter Geoghegan

Pavan Deolasee  wrote:

One good thing is that the patch is ready and fully functional. So that
allows those who are keen to run real performance tests and see the actual
impact of the patch.


Very true.


I see your point. But I would like to think this way: does the technology
significantly help many common use cases, that are currently not addressed
by HOT? It probably won't help all workloads, that's given. Also, we don't
have any credible alternative while this patch has progressed quite a lot.
May be Robert will soon present the pluggable storage/UNDO patch and that
will cover everything and more that is currently covered by HOT/WARM. That
will probably make many other things redundant.


Well, I don't assume that it will; again, I just don't know. I agree
with your general assessment of things, which is that WARM, EDB's
Z-Heap/UNDO project, and things like IOTs have significant overlap in
terms of the high-level problems that they fix. While it's hard to say
just how much overlap exists, it's clearly more than a little. And, you
are right that we don't have a credible alternative in this general
category right now. The WARM patch is available today.

As you may have noticed, in recent weeks I've been very vocal about the
role of index bloat in cases where bloat has a big impact on production
workloads. I think that it has an under-appreciated role in workloads
that deteriorate over time, as bloat accumulates. Perhaps HOT made such
a big difference to workloads 10 years ago not just because it prevented
creating new index entries. It also reduced fragmentation of the
keyspace in indexes, by never inserting duplicates in the first place.

I have some rough ideas related to this, and to the general questions
you're addressing. I'd like to run these by you.

In-place index updates + HOT


Maybe we could improve things markedly in this general area by "chaining
together HOT chains", and updating index heap pointers in place, to
point to the start of the latest HOT chain in that chain of chains
(provided the index tuple was "logically unchanged" -- otherwise, you'd
need to have both sets of indexed values at once, of course). Index
tuples therefore always point to the latest HOT chain, favoring recent
MVCC snapshots over older ones.

Pruning
---

HOT pruning is great because you can remove heap bloat without worrying
about there being index entries with heap item pointers pointing to what
is removed. But isn't that limitation as much about what is in the index
as it is about what is in the heap?

Under this scheme, you don't even have to keep around the old ItemId
stub when pruning, if it's a sufficiently old HOT chain that no index
points to the corresponding TID. That may not seem like a lot of bloat
to have to keep around, but it accumulates within a page until VACUUM
runs, ultimately limiting the effectiveness of pruning for certain
workloads.

Old snapshots/row versions
--

Superseding HOT chains have their last heap tuple's t_tid point to the
start of the preceding/superseded HOT chain (not their own TID, as
today, which is redundant), which may or may not be on the same heap
page. That's how old snapshots go backwards to get old versions, without
needing their own "logically redundant" index entries. So with UPDATE
heavy workloads that are essentially HOT-safe today, performance doesn't
tank due to a long running transaction that obstructs pruning within a
heap page, and thus necessitates the insertion of new index tuples.
That's the main justification for this entire design.

It's also possible that pruning can be taught that since only one index
update was logically necessary when the to-be-pruned HOT chain was
created, it's worth doing a "retail index tuple deletion" against the
index tuple that was logically necessary, then completely obliterating
the HOT chain, stub item pointer and all.

Bloat and locality
--

README.HOT argues against HOT chains that span pages, which this is a
bit like, on the grounds that it's bad news that every recent snapshot
has to go through the old heap page. That makes sense, but only because
the temporal locality there is horrible, which would not be the case
here. README.HOT says that that cost is not worth the benefit of
preventing a new index write, but I think that it ought to take into
account that not all index writes are equal. There is an appreciable
difference between inserting a new tuple, and updating one in-place. We
can remove the cost (hurting new snapshots by making them go through old
heap pages) while preserving most of the benefits (no logically
unnecessary index bloat).

The benefit of HOT is clearly more bloat prevention than not having to
visit indexes at all. InnoDB secondary index updates update the index
twice: The first time, during the update itself, and the second time, by
the purge thread, once the xact commits. Clearly they care about