[HACKERS] HOT WIP Patch - version 1

Pavan Deolasee Wed, 14 Feb 2007 02:07:38 -0800

This is a WIP patch based on the recent posting by Simon and the
discussions thereafter. We are trying to do one piece at a time, and the
intention is to post the work ASAP so that we can get early and continuous
feedback from the community. We can then incorporate those suggestions in
the next WIP patch.


To start with, this patch implements HOT-update for the simple case
where there is enough free space in the same block to accommodate the
new version of the tuple. A necessary condition for doing a HOT-update
is that none of the index columns are changed. The old version is marked
as HEAP_UPDATE_ROOT and the new version is marked as HEAP_ONLY_TUPLE.
If a tuple is HOT-updated, no new index entry is added.
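
The two conditions above can be sketched as a toy model. This is Python
for illustration only, not code from the patch: the flag names come from
the description above, but the bit values, the HeapTuple class and the
helper names can_hot_update/hot_update are assumptions made up here.

```python
from dataclasses import dataclass

# Flag names follow the description; the bit values are hypothetical.
HEAP_UPDATE_ROOT = 0x1   # set on the old version once it is HOT-updated
HEAP_ONLY_TUPLE = 0x2    # set on the new version, which gets no index entry


@dataclass
class HeapTuple:
    """Hypothetical stand-in for a heap tuple."""
    values: dict
    size: int
    flags: int = 0


def can_hot_update(old, new_values, indexed_columns, page_free_space, new_size):
    """Return True only if both conditions from the text hold."""
    # Condition 1: no indexed column changes its value.
    index_cols_unchanged = all(
        old.values.get(col) == new_values.get(col) for col in indexed_columns
    )
    # Condition 2: the new version fits in the free space of the same block.
    fits_in_same_block = new_size <= page_free_space
    return index_cols_unchanged and fits_in_same_block


def hot_update(old, new_values, new_size):
    """Mark both versions; deliberately creates no index entry."""
    old.flags |= HEAP_UPDATE_ROOT
    return HeapTuple(values=new_values, size=new_size, flags=HEAP_ONLY_TUPLE)
```

If either condition fails, the update would have to go through the
regular path and add new index entries as before.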

When fetching a tuple using an index, if the root tuple is not visible to
the given snapshot, the ctid chain is followed until a visible tuple is
found or the end of the HOT-update chain is reached. The
prior_xmax/next_xmin chain is validated while following the ctid chain.
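
The fetch described above can be sketched roughly as follows. Again this
is a toy Python model under stated assumptions, not the patch itself: the
page is modelled as a dict from slot number to tuple, the snapshot test
is passed in as a plain callable, a ctid of None stands for "no next
version", and the name hot_index_fetch is made up for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

HEAP_UPDATE_ROOT = 0x1   # hypothetical bit value: tuple has been HOT-updated


@dataclass
class HeapTuple:
    """Hypothetical stand-in for a heap tuple header."""
    xmin: int
    xmax: Optional[int] = None   # None: not updated or deleted
    ctid: Optional[int] = None   # slot of the next version in the same block
    flags: int = 0


def hot_index_fetch(
    page: Dict[int, HeapTuple],
    root_slot: int,
    is_visible: Callable[[HeapTuple], bool],
) -> Optional[HeapTuple]:
    """Follow the ctid chain from the root until a visible tuple is found."""
    slot = root_slot
    prior_xmax = None
    while slot is not None:
        tup = page.get(slot)
        if tup is None:
            return None
        # Validate the link: xmin of this version must equal xmax of the
        # previous one, otherwise the chain is broken.
        if prior_xmax is not None and tup.xmin != prior_xmax:
            return None
        if is_visible(tup):
            return tup
        if not (tup.flags & HEAP_UPDATE_ROOT):
            return None   # end of the HOT-update chain, nothing visible
        prior_xmax = tup.xmax
        slot = tup.ctid
    return None
```

The index still points only at the root slot; every later version is
reached through the chain inside the same block.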

This patch is generated against the current CVS head. It passes all the
regression tests, but I haven't measured any performance impact since
that's not the goal of posting this early version. Several things are not
yet implemented, and there are a few unresolved issues for which I am
looking for community help and feedback.

Open Issues:
------------------

- CREATE INDEX needs more work in the HOT context. The existing HOT
tuples may require chilling for CREATE INDEX to work correctly. There
are concerns about the crash safety of the chilling operation. A few
suggestions were posted in this regard. We need to conclude that
discussion and post a working design/patch.

- We need to find a way to handle DEAD root tuples: either convert them
into stubs or overwrite them with a new version. We can also perform
pointer swinging from the index. Again, there are concerns about crash
safety and about concurrent index scans working properly. We don't have
community consensus on any of the suggestions in this regard, but
hopefully we will converge on a design soon.

- Retail VACUUM. We need to implement block-level vacuum for UPDATEs to
find enough free space in the block to do a HOT-update. Though we are
still discussing how to handle the dead root tuples, we should be able
to remove any intermediate dead tuples in the HOT-update chain safely.
However, if we do so without fixing the root tuple, the
prior_xmax/next_xmin chain would be broken. A similar problem exists
with freezing HOT tuples.

What's Next:
-----------------

In the current implementation, a HOT-updated tuple cannot be vacuumed
because it might be in the middle of the access path to the visible
heap-only tuple. This can cause the table to grow rapidly even if
autovacuum is turned on. The HOT-update chain also keeps growing as long
as there is enough free space in the block. I am thinking of implementing
some sort of HOT-update chain squeezing logic so that intermediate dead
tuples can be retired and vacuumed away. This would also help us keep the
HOT-update chain short enough that following it does not become unduly
costly.

I am thinking of squeezing the HOT-update chain while following it in the
index fetch. If the root tuple is dead, we follow the chain until the
first LIVE or RECENTLY_DEAD tuple is found. The ctid pointer in the root
tuple is made to point to that first LIVE or RECENTLY_DEAD tuple. All the
intermediate DEAD tuples are marked ~HEAP_UPDATE_ROOT so that they can be
vacuumed in the next cycle. We hold an exclusive lock on the page while
doing so, which should avoid any race conditions. This infrastructure
should also help us retail vacuum the block later.
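
The squeezing step might look something like the sketch below. It is a
toy Python model, not the planned implementation: the Status enum, the
dict-based page, the flag bit values and the name squeeze_chain are all
assumptions for illustration. It reads "marked ~HEAP_UPDATE_ROOT" as
clearing that flag, assumes the caller already holds the exclusive page
lock, and leaves the chain untouched when no LIVE or RECENTLY_DEAD tuple
is found, a case the description above does not cover.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional

HEAP_UPDATE_ROOT = 0x1   # hypothetical bit values
HEAP_ONLY_TUPLE = 0x2


class Status(Enum):
    LIVE = 1
    RECENTLY_DEAD = 2
    DEAD = 3


@dataclass
class HeapTuple:
    """Hypothetical stand-in for a heap tuple in one block."""
    status: Status
    ctid: Optional[int] = None   # slot of the next version, None at the end
    flags: int = 0


def squeeze_chain(page: Dict[int, HeapTuple], root_slot: int) -> List[int]:
    """Bypass intermediate DEAD tuples; return the slots that were retired.

    Assumes the caller holds an exclusive lock on the page.
    """
    root = page[root_slot]
    if root.status is not Status.DEAD:
        return []
    # Walk past DEAD tuples to the first LIVE or RECENTLY_DEAD one.
    dead_slots = []
    slot = root.ctid
    while slot is not None and page[slot].status is Status.DEAD:
        dead_slots.append(slot)
        slot = page[slot].ctid
    if slot is None:
        return []   # nothing to point at; not covered by the description
    # Clear HEAP_UPDATE_ROOT on the bypassed tuples so that a later
    # vacuum cycle can remove them.
    for dead in dead_slots:
        page[dead].flags &= ~HEAP_UPDATE_ROOT
    # Swing the root's ctid to the first tuple that is still needed.
    root.ctid = slot
    return dead_slots
```

After the call, the root points directly at the first tuple that some
snapshot may still need, so the next index fetch follows a shorter chain.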

Please let me know your comments.

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com

NewHOT-v1.1-pgsql-head.patch.gz
Description: GNU Zip compressed data
