Re: [HACKERS] vacuum, performance, and MVCC

Jan Wieck Sat, 24 Jun 2006 12:11:19 -0700

On 6/24/2006 9:23 AM, Mark Woodward wrote:

On Sat, 24 Jun 2006, Mark Woodward wrote:

I'm probably mistaken, but aren't there already forward references in
tuples to later versions? If so, I'm only sugesting reversing the order
and referencing the latest version.


I thought I understood your idea, but now you lost me again. I thought
what you want is that the older heap tuple has a pointer to the
newer one. Which it already has, it's called t_ctid.


Perfect!


Can you try to explain more carefully how the whole thing would work?
What would an index tuple point to? What pointers would a heap tuple
have? What would an index scan do to find the row version it's interested
in? What exactly would an update do?



Since we already allocate space for some notion of linked list, then all
I'm suggesting is we reverse the order, sort of. Currently it looks like
this:

ver001->ver002->ver003->...-verN

That's what t_ctid does now, right? Well, that's sort of stupid. Why not
have it do this:

ver001->verN->...->ver003->ver002->|
 ^---------------------------------/

This will speed up almost *all* queries when there are more than two
version of rows.

OK, here is the behavior of an update:
(1) Find the latest version of the row
(2) Duplicate row and modify as per plan
(3) Set the t_ctid of the new row to the last "latest"
(4) Set the t_ctid of the first row to that of the new row
(5) Attempt to index the row
(6) If the first version of the row is in the index already (ver001) Don't
modify the index, otherwise, add the new version (just as before)

When you vacuum, simply make the latest version (verN) the key row (ver001).

This isn't done "simply". Currently, vacuum collects a trivial array ofctid's it is removing and every now and then does a bulk remove of theindex tuples pointing to them. Now lets consider a table with twoindexed columns with the following row versions resulting from an insertand 3 updates to that same row:


  v1:  a,b
  v2:  a,c
  v3:  a,d
  v4:  b,d

In your new scheme, there would be two index tuples for column 1 (onepointing to v1, one pointing to v4) and 3 index tuples for column 2 (onefor each different value pointing to v1, v2 and v3). Did I get thatright so far?

If vacuum now can remove v1, it has to update index 1 to point to v2 andremove the pointer to v1 from index 2. If it can remove v1 and v2, ithas to update index 1 to point to v3 and remove v1 and v2 from index 2.If it can remove v1, v2 and v3 it must delete the index 1 tuple pointingto v1, delete the index 2 entries pointing to v1 and v2 and update theindex 2 entry for v3 to point to v4. Figuring out which index tuples toremove and which ones to update can only be done by comparing each andevery indexed columns old and new values. To do so, vacuum will have tofetch all the row versions, which can be scattered all over the place,with all possible locking issues including but not limited to deadlocks.



Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Re: [HACKERS] vacuum, performance, and MVCC

Reply via email to