Re: [PATCHES] HOT latest patch - version 8

2007-07-31 Thread Gregory Stark


Heikki Linnakangas [EMAIL PROTECTED] writes:

 CREATE INDEX
 CREATE INDEX CONCURRENTLY
 -
 I'm not very familiar with how these work, so I'll just shut up..

Here is an addendum for how HOT handles CREATE INDEX and CREATE INDEX
CONCURRENTLY.

CREATE INDEX

There are two strategies for CREATE INDEX depending on whether we're doing it
concurrently or not. For a regular, locking CREATE INDEX we take two
precautions:

1) When indexing a table we must take care that the index contains pointers
   only to Root tuples. These are the non-HOT tuples which everyone else
   expects to have index pointers.

   However the key value we want to index is actually the key at the *end* of
   the chain, ie, the most recent tuple on the index chain. Note that this
   means we could create broken HOT chains because the index key of the new
   index could have changed in the HOT chain.

   If we find any HOT-updated tuple which is RECENTLY_DEAD or
   DELETE_IN_PROGRESS, we ignore it, assuming that we'll also come across the
   *end* of the update chain and index that instead.

2) If we ignore any HOT-updated tuples above then we mark the index as
   unusable by older transactions for which those tuples might be visible.
   This ensures that nobody for whom the older tuples in a broken HOT chain
   are visible will use the index.

   We do this by putting our transaction-id in pg_index.indcreatexid after
   building the index. Queries check whether that indcreatexid is in their
   serializable snapshot; if it isn't, the index is not usable for that
   query.

   This means that transactions started before the create index commits will
   not get the benefit of the new index even if they first scan the table
   only after the index exists. However new transactions get the benefit of
   the new index immediately but they will always follow the HOT update
   chain since the old tuples with the possibly different keys will never be
   visible to them.

   The tricky case arises with queries executed in the same transaction as
   the CREATE INDEX. In the case of a new table created within the same
   transaction such as with pg_dump -1 the index will always be usable
   because there will never be any HOT update chains so the indcreatexid
   will never be set. Also in the case of a read-committed transaction new
   queries will always be able to use the index. Only in the case of a
   serializable transaction building an index on an existing table with HOT
   updates in it will the index go unused.
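
The usability rule above can be sketched in miniature. This is an illustrative
sketch only (hypothetical function and parameter names, not the real planner
code, which uses the full snapshot machinery and wraparound-aware XID
comparisons):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/*
 * Simplified sketch: an index whose pg_index.indcreatexid is not yet
 * visible as committed in our serializable snapshot must be skipped,
 * because we could see older tuples of a broken HOT chain that the
 * index never indexed. Plain '<' stands in for the real wraparound-aware
 * XID comparison.
 */
static bool
index_usable_for_query(TransactionId indcreatexid, TransactionId snapshot_xmin)
{
    if (indcreatexid == InvalidTransactionId)
        return true;                /* never set: no broken chains possible */
    return indcreatexid < snapshot_xmin;  /* creator committed before us? */
}
```

In other words, a transaction whose snapshot predates the index build simply
falls back to not using the index at all.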


CREATE INDEX CONCURRENTLY

In the concurrent case we take a different approach.

1) We create the pg_index entry immediately, before we scan the table. The
   pg_index entry is marked as not ready for inserts. Then we commit and wait
   for any transactions which have the table open to finish. This ensures
   that no *new* HOT updates will change the key value for our new index.

2) We wait out any transactions which had the table open. Then we build the
   index with a snapshot. Because we waited out anyone who might have been
   around before the index was created any tuples seen in the snapshot will
   have only valid HOT chains following them. They may still have older HOT
   updates behind them which are broken. As above we point the index pointers
   at the Root of the HOT-update chain but we use the key from the end of the
   chain.

3) We mark the index open for inserts (but still not ready for reads), then we
   again wait out any transactions which had the table open. Then we take a
   second reference snapshot and validate the index. This searches for tuples
   missing from the index -- but it again has to look up the root of the HOT
   chains and search for those tuples in the index.

4) Then we wait until *every* transaction in progress in the validate_index
   reference snapshot is finished. This ensures that nobody is alive any
   longer who could possibly see any of the tuples in a broken HOT chain.
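
The validation scan described above can be modelled as a toy lookup. This is
a deliberately simplified sketch (the real validate_index merges sorted TID
streams rather than using nested loops; all names here are illustrative):

```c
#include <stdbool.h>

/*
 * Toy model of the validation step: 'heap_roots' holds the root line
 * pointer number for every tuple visible in the reference snapshot,
 * 'index_roots' holds the roots the index currently points at. Any
 * root missing from the index must be inserted.
 */
static int
count_missing_from_index(const int *heap_roots, int nheap,
                         const int *index_roots, int nindex)
{
    int missing = 0;

    for (int i = 0; i < nheap; i++)
    {
        bool found = false;

        for (int j = 0; j < nindex; j++)
        {
            if (index_roots[j] == heap_roots[i])
            {
                found = true;
                break;
            }
        }
        if (!found)
            missing++;      /* this root still needs an index entry */
    }
    return missing;
}
```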

Glossary:

Broken HOT Chain
 A HOT chain in which the key value for an index has changed. This is
 not allowed to occur normally, but it can happen when a new index is created.
 In that case various strategies are used to ensure that no transaction for
 which the older tuples are visible can use the index.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] HOT latest patch - version 8

2007-07-23 Thread Pavan Deolasee

On 7/15/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:



 Actually storing InvalidOffsetNumber in lp_off is a bit bogus in the first
 place since lp_off is unsigned, and InvalidOffsetNumber is -1, so I fixed
 that as well.




I see InvalidOffsetNumber as 0 in off.h:26

#define InvalidOffsetNumber ((OffsetNumber) 0)

So I think we should be OK to use that to indicate redirect-dead
pointers.
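
The distinction matters because OffsetNumber is unsigned. A quick sketch to
convince oneself (the typedef mirrors PostgreSQL's; the helper function is
hypothetical):

```c
#include <stdint.h>

typedef uint16_t OffsetNumber;                 /* mirrors PostgreSQL's typedef */
#define InvalidOffsetNumber ((OffsetNumber) 0) /* as defined in off.h:26 */

/*
 * Had InvalidOffsetNumber been defined as -1, the cast to the unsigned
 * OffsetNumber would silently wrap to 0xFFFF; it could never compare
 * equal to 0, and storing it in lp_off would be meaningless.
 */
static OffsetNumber
minus_one_as_offset(void)
{
    return (OffsetNumber) -1;
}
```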

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [PATCHES] HOT latest patch - version 8

2007-07-18 Thread Heikki Linnakangas
[EMAIL PROTECTED] wrote:
 Any objections to changing the name of RedirectDead?  A RedirectDead
 ItemId is not really redirected, it's just a stub representing a dead
 tuple (the space for that tuple has been reused but an index entry may
 still point to the ItemId).  
 
 How about stub or tombstone?

Sounds good to me. Stub is a bit generic; I'd go for tombstone.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] HOT latest patch - version 8

2007-07-16 Thread korryd
 I'm looking for ways to make the patch simpler and less invasive. 


Any objections to changing the name of RedirectDead?  A RedirectDead
ItemId is not really redirected, it's just a stub representing a dead
tuple (the space for that tuple has been reused but an index entry may
still point to the ItemId).  

How about stub or tombstone?

-- Korry


--
  Korry Douglas[EMAIL PROTECTED]
  EnterpriseDB  http://www.enterprisedb.com


Re: [PATCHES] HOT latest patch - version 8

2007-07-15 Thread Stefan Kaltenbrunner
Heikki Linnakangas wrote:
 Stefan Kaltenbrunner wrote:
 tried to test a bit on my Solaris 10 install (Sun Studio, 64-bit build)
 but I'm hitting the following while trying to initdb a new cluster:
 
 I can't reproduce this error, but I found a bug that's likely causing
 it. The patch uses InvalidOffsetNumber in lp_off to mark so-called
 redirect dead line pointers, but that special case is not checked in
 the PageGetRedirectingOffset function, which writes to a caller-supplied
 array with a -1 index instead, clobbering whatever is there. Actually
 storing InvalidOffsetNumber in lp_off is a bit bogus in the first place
 since lp_off is unsigned, and InvalidOffsetNumber is -1, so I fixed that
 as well.

this seems to fix the problem for me - a least I can now successfully
initdb a new cluster with the HOT patch applied.


Stefan



Re: [PATCHES] HOT latest patch - version 8

2007-07-15 Thread Simon Riggs
On Fri, 2007-07-13 at 16:22 +0100, Heikki Linnakangas wrote:
 Heikki Linnakangas wrote:
  I have some suggestions which I'll post separately, 

 I'm looking for ways to make the patch simpler and less invasive. We may 
 want to put back some of this stuff, or come up with a more clever 
 solution, in future releases, but right now let's keep it simple.

I believe we're all trying to do that, but I would like to see some
analysis of which techniques are truly effective and which are not.
Simpler code may not have desirable behaviour, and then the whole lot of
code is pointless. Let's make it effective by making it just complex enough.
I'm not clear where the optimum lies (cf. flying buttresses).

 A significant chunk of the complexity and new code in the patch comes 
 from pruning hot chains and reusing the space for new updates. Because 
 we can't reclaim dead space in the page like a VACUUM does, without 
 holding the vacuum lock, we have to deal with pages that contain deleted 
 tuples, and be able to reuse them, and keep track of the changes in 
 tuple length etc.
 
 A much simpler approach would be to try to acquire the vacuum lock, and 
 compact the page the usual way, and fall back to a cold update if we 
 can't get the lock immediately.
 
 The obvious downside of that is that if a page is continuously pinned, 
 we can't HOT update tuples on it. Keeping in mind that the primary use 
 case for HOT is largeish tables, and that small tables are handled pretty 
 well by autovacuum, chances are pretty good that you can get a vacuum lock 
 when you need it.

The main problem HOT seeks to avoid is wasted inserts into indexes, and
the subsequent VACUUMing that requires. Small tables have smaller
indexes, so the VACUUMing is less of a problem. If we have hot
spots in larger tables, DSM would allow us to avoid the I/O on the main
table, but we would still need to scan the indexes. So HOT *can* be
better than DSM. I'm worried that requiring the vacuum lock in all cases
will mean that HOT will be ineffective where it is needed most - in the
hot spots - i.e. the blocks that contain frequently updated rows. [As an
aside, in OLTP it is frequently the index blocks that become hot spots,
so reducing index inserts because of UPDATEs will also reduce block
contention]

Our main test case for OLTP is DBT-2 which follows TPC-C in being
perfectly scalable with no hot spots in the heap and limited hot spots
in the indexes. As such it's a poor test of real world applications,
where Benford's Law holds true. Requiring the vacuum lock in all cases
would allow good benchmark performance but would probably fail in the
real world at providing good long term performance.

I'm interested in some numbers that show which we need. I'm thinking of
some pg_stats output that shows how many vac locks were taken and how
many prunes were made. Something general that allows some beta testers
to provide feedback on the efficacy of the patch.

That leads to the suggestion that we should make the HOT pruning logic
into an add-on patch, commit it, but evaluate its performance during
beta. If we have no clear evidence of additional benefit, we remove it
again.

I'm not in favour of background retail vacuuming by the bgwriter. The
timeliness of that is (similarly) in question and I think bgwriter has
enough work to do already.

[Just as a note to all performance testers: HOT is designed to show
long-term steady performance. Short performance tests frequently show no
benefit if sufficient RAM is available to avoid the table bloat and we
avoid hitting the point where autovacuums kick in. I know Heikki knows
this, just not sure we actually said it.]

-- 
  Simon Riggs
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] HOT latest patch - version 8

2007-07-14 Thread Stefan Kaltenbrunner
Pavan Deolasee wrote:
 
 Please see updated version of the patch. This includes further code
 refactoring and bug fixes.
 
 heapam code is now refactored and I have also added several comments
 in that code as well as vacuum and create index related code.
 
 In the previous version, we were not collecting aborted heap-only
 tuples since they are not part of any chain. This version deals with
 such tuples.
 
 There is also some cleanup in the VACUUM FULL area. Especially
 we have dealt with the redirected line pointers by pointing them to
 the first non-dead tuple in the chain. Since we hold exclusive lock
 on the relation, we can easily do that.
 
 Anyways, we still need a comprehensive writeup, but if someone
 wants to test things further, this patch should be used.

tried to test a bit on my Solaris 10 install (Sun Studio, 64-bit build)
but I'm hitting the following while trying to initdb a new cluster:

program terminated by signal SEGV (no mapping at the fault address)
Current function is PageGetRedirectingOffset
 1186   offsets[ItemIdGetRedirect(lp) - 1] = offnum;
(dbx) where
=[1] PageGetRedirectingOffset(page = 0xfd7ffdf2bea0 , offsets =
0xfd7fffdfd2fc, size = 582), line 1186 in bufpage.c
  [2] lazy_scan_heap(onerel = 0xc7fcd8, vacrelstats = 0xcb9940, Irel =
0xcd0008, nindexes = 2), line 434 in vacuumlazy.c
  [3] lazy_vacuum_rel(onerel = 0xc7fcd8, vacstmt = 0xc34e70, bstrategy =
0xcb76d8), line 187 in vacuumlazy.c
  [4] vacuum_rel(relid = 2608U, vacstmt = 0xc34e70, expected_relkind =
'r'), line 1109 in vacuum.c
  [5] vacuum(vacstmt = 0xc34e70, relids = (nil), bstrategy = 0xcb76d8,
isTopLevel = '\001'), line 424 in vacuum.c
  [6] ProcessUtility(parsetree = 0xc34e70, queryString = 0xbcd978
VACUUM pg_depend;\n, params = (nil), isTopLevel = '\001', dest =
0xb0f440, completionTag = 0xfd7fffdff5d0 ), line 997 in utility.c
  [7] PortalRunUtility(portal = 0xcc5328, utilityStmt = 0xc34e70,
isTopLevel = '\001', dest = 0xb0f440, completionTag = 0xfd7fffdff5d0
), line 1179 in pquery.c
  [8] PortalRunMulti(portal = 0xcc5328, isTopLevel = '\001', dest =
0xb0f440, altdest = 0xb0f440, completionTag = 0xfd7fffdff5d0 ),
line 1267 in pquery.c
  [9] PortalRun(portal = 0xcc5328, count = 9223372036854775807,
isTopLevel = '\001', dest = 0xb0f440, altdest = 0xb0f440, completionTag
= 0xfd7fffdff5d0 ), line 814 in pquery.c
  [10] exec_simple_query(query_string = 0xc34c08 VACUUM pg_depend;\n),
line 967 in postgres.c
  [11] PostgresMain(argc = 8, argv = 0xb94c68, username = 0xb8dd10
pgbuild), line 3527 in postgres.c
  [12] main(argc = 9, argv = 0xb94c60), line 186 in main.c


Stefan



Re: [PATCHES] HOT latest patch - version 8

2007-07-14 Thread Heikki Linnakangas
Stefan Kaltenbrunner wrote:
 tried to test a bit on my Solaris 10 install (Sun Studio, 64-bit build)
 but I'm hitting the following while trying to initdb a new cluster:

I can't reproduce this error, but I found a bug that's likely causing
it. The patch uses InvalidOffsetNumber in lp_off to mark so-called
redirect dead line pointers, but that special case is not checked in
the PageGetRedirectingOffset function, which writes to a caller-supplied
array with a -1 index instead, clobbering whatever is there. Actually
storing InvalidOffsetNumber in lp_off is a bit bogus in the first place
since lp_off is unsigned, and InvalidOffsetNumber is -1, so I fixed that
as well.

Patch attached.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
*** src/backend/storage/page/bufpage.c	2007-07-14 20:54:04.0 +0100
--- src/backend/storage/page/bufpage.c	2007-07-14 20:37:36.0 +0100
***
*** 1180,1186 
  			continue;
  
  		/* interested in only redirected lps */
! 		if (!ItemIdIsRedirected(lp))
  			continue;
  
  		offsets[ItemIdGetRedirect(lp) - 1] = offnum;
--- 1180,1186 
  			continue;
  
  		/* interested in only redirected lps */
! 		if (!ItemIdIsRedirected(lp) || ItemIdIsRedirectDead(lp))
  			continue;
  
  		offsets[ItemIdGetRedirect(lp) - 1] = offnum;
*** src/include/storage/itemid.h	2007-07-14 20:54:04.0 +0100
--- src/include/storage/itemid.h	2007-07-14 20:46:09.0 +0100
***
*** 27,32 
--- 27,37 
  typedef ItemIdData *ItemId;
  
  /*
+  * Magic lp_off value to mark redirect dead line pointers 
+  */
+ #define DeadOffsetNumber		0x7FFF
+ 
+ /*
   * lp_flags contains these flags:
   */
  #define LP_USED			0x01	/* this line pointer is being used */
***
*** 127,140 
  ( \
  	AssertMacro(ItemIdIsValid(itemId)), \
  	(bool) ((ItemIdIsRedirected(itemId)) && \
! 			(ItemIdGetRedirect(itemId) == InvalidOffsetNumber)) \
  )
  
  /* Set itemId to redirect-dead i.e. redirected to itself */
  #define ItemIdSetRedirectDead(itemId) \
  ( \
  	AssertMacro(ItemIdIsValid(itemId)), \
! 	ItemIdSetRedirect((itemId), InvalidOffsetNumber) \
  )
  
  #endif   /* ITEMID_H */
--- 132,145 
  ( \
  	AssertMacro(ItemIdIsValid(itemId)), \
  	(bool) ((ItemIdIsRedirected(itemId)) && \
! 			(ItemIdGetRedirect(itemId) == DeadOffsetNumber)) \
  )
  
  /* Set itemId to redirect-dead i.e. redirected to itself */
  #define ItemIdSetRedirectDead(itemId) \
  ( \
  	AssertMacro(ItemIdIsValid(itemId)), \
! 	ItemIdSetRedirect((itemId), DeadOffsetNumber) \
  )
  
  #endif   /* ITEMID_H */



Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Heikki Linnakangas

Pavan Deolasee wrote:

Please see updated version of the patch. This includes further code
refactoring and bug fixes.


Thanks for the update, Pavan!

I've been looking at this patch in the last couple of weeks in detail. I 
wrote a short summary of how it works (attached) to help reviewing it. 
Especially the glossary is helpful, since the patch introduces a lot of 
new concepts.


I have some suggestions which I'll post separately, this just describes 
the status quo of the patch.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Use case

The best use case for HOT is a table that's frequently UPDATEd, and is large
enough that VACUUM is painful. On small tables that fit in cache, running 
VACUUM every few minutes isn't a problem.

Heap-only tuples

When a HOT update is performed, the new tuple is placed on the same page as the 
old one, marked with the HEAP_ONLY_TUPLE flag. HEAP_ONLY_TUPLE means that 
there are no index pointers to the tuple, which allows pruning the chain in the 
future. The old tuple is marked with the HEAP_HOT_UPDATE flag, which means that 
the tuple pointed to by t_ctid is a heap-only tuple. That needs to be taken 
into account when vacuuming, so that we don't remove the root tuple in the 
update chain when there are no index pointers to the later tuples.

When doing an index scan, whenever we reach a non-visible tuple, we need to 
check if the tuple has been HOT-updated (i.e., the HEAP_HOT_UPDATE flag is 
set). If so, we need to follow the ctid pointer until we reach a visible 
tuple, or one that hasn't been HOT-updated.
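
Under stated assumptions (a toy tuple struct; the real code works with
HeapTupleHeader flag bits and buffer pages), the chain-following rule might
be sketched as:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a heap tuple on one page. */
typedef struct HeapTupleStub
{
    bool visible;                    /* visible to our snapshot? */
    bool hot_updated;                /* HEAP_HOT_UPDATE flag set? */
    struct HeapTupleStub *ctid_next; /* tuple that t_ctid points at */
} HeapTupleStub;

/*
 * Starting from the tuple the index pointer leads to, keep walking
 * t_ctid while the current tuple is both invisible and HOT-updated;
 * stop at the first visible tuple, or give up at the chain's end.
 */
static HeapTupleStub *
follow_hot_chain(HeapTupleStub *tup)
{
    while (tup != NULL)
    {
        if (tup->visible)
            return tup;          /* found the visible version */
        if (!tup->hot_updated)
            return NULL;         /* dead end: no visible version */
        tup = tup->ctid_next;    /* follow the HOT update chain */
    }
    return NULL;
}
```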

Sequential scans (and bitmap heap scans with a lossy bitmap) don't need to pay 
attention to the flags.

Pre-requirements for HOT updates:
1. None of the indexed columns are changed
2. There are no functional indexes on the table
3. There are no partial indexes on the table

2. and 3. could be relaxed; the code just hasn't been written yet.

These requirements are checked at execution time, comparing the binary 
representation of the old and new values. That means that dummy updates, like 
UPDATE foo SET col1 = ?, where ? is the same as the old value, can be HOT.
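
As a sketch of that binary comparison (illustrative helper and parameter
names; the real check compares Datums per indexed attribute inside the
executor):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare the binary representation of each indexed column's old and
 * new value. If all are identical, the update qualifies as HOT; a dummy
 * update that writes the same bytes back therefore still qualifies.
 */
static bool
hot_update_possible(const void *const *old_vals, const void *const *new_vals,
                    const size_t *lens, int nindexed_cols)
{
    for (int i = 0; i < nindexed_cols; i++)
    {
        if (memcmp(old_vals[i], new_vals[i], lens[i]) != 0)
            return false;    /* an indexed column changed: cold update */
    }
    return true;             /* all indexed columns binary-identical */
}
```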

In addition to the above, there needs to be room on the page for the new tuple. 
If the page is full, we try to make room by pruning the page.

Pruning
---
When we're doing a HOT update and there isn't enough space on the page, and 
there are no suitably sized LP_DELETEd tuples to reuse, all HOT update chains 
on the page are pruned to make room. Pruning can be thought of as a 
lightweight retail vacuum that marks all dead heap-only tuples with the 
LP_DELETE flag, allowing them to be reused. We can't just outright remove the 
tuples like we do in vacuum, because we'd need a vacuum-strength lock for that.

To reclaim the index-visible (i.e. first) tuple in a HOT chain, the line 
pointer is turned into a redirecting line pointer that points to the line 
pointer of the next tuple in the chain. To keep track of the space occupied by 
the dead tuple, so that we can reuse the space, a new line pointer is allocated 
and marked with LP_DELETE to point to the dead tuple. That means its tid 
changes, but that's ok since it's dead.
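
A toy model of that line-pointer shuffle (stub types with separate fields;
the real ItemIdData packs these bits into lp_off and lp_flags):

```c
#include <stdbool.h>

typedef unsigned short OffsetNumber;

/* Hypothetical unpacked line pointer, for illustration only. */
typedef struct
{
    OffsetNumber lp_off; /* tuple data offset, or target lp if redirecting */
    bool redirect;       /* points at another line pointer, not at data */
    bool lp_delete;      /* marks dead, reusable tuple storage */
} ItemIdStub;

/*
 * Prune the index-visible root of a HOT chain: the root becomes a
 * redirect to the next tuple's line pointer, and a freshly allocated
 * line pointer is marked LP_DELETE so the dead tuple's storage can be
 * found and reused later.
 */
static void
prune_root(ItemIdStub *root, OffsetNumber next_lp,
           ItemIdStub *fresh, OffsetNumber dead_tuple_off)
{
    fresh->lp_off = dead_tuple_off; /* remembers the reusable space */
    fresh->redirect = false;
    fresh->lp_delete = true;

    root->lp_off = next_lp;         /* now names a line pointer, not data */
    root->redirect = true;
    root->lp_delete = false;
}
```

The dead tuple's tid changes as a result, which is harmless precisely because
no scan will ever return a dead tuple.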

When the last live tuple in an update chain becomes dead (after a DELETE or a 
cold update), all tuples in the chain can be marked with LP_DELETE, and the 
redirecting line pointer is marked as redirected dead. 

We've effectively resurrected the "truncate dead tuples to just line pointers" 
idea that has been proposed and rejected before because of fear of line 
pointer bloat. To limit the damage in the worst case, and to keep numerous 
arrays as well as the bitmaps in bitmap scans reasonably sized, the maximum 
number of line pointers (MaxHeapTuplesPerPage) is somewhat arbitrarily capped 
at 2 * what it was before.

In addition to pruning when a page gets full, pruning of a single HOT chain is 
done when doing an index fetch. That avoids doing the same chain-following work 
on future fetches of the same row.

VACUUM FULL
---
To make VACUUM FULL work, any DEAD tuples in the middle of an update chain 
need to be removed (see comments at the top of heap_prune_hotchain_hard for 
details). Vacuum full performs a more aggressive pruning that not only removes 
dead tuples at the beginning of an update chain, but scans the whole chain and 
removes any intermediate dead tuples as well.

Reusing LP_DELETEd heap tuples
--
When doing an update, HOT or not, we check if there's a tuple on the page 
marked with LP_DELETE that's big enough to accommodate the new tuple. If there 
is, that slot is reused, overwriting the deleted tuple. 

We could reuse the slots for inserts as well, but as the patch stands, we don't.

Row-level fragmentation
---
If the new tuple is smaller than the old LP_DELETEd tuple that's reused, the 
new tuple is marked as fragmented, which means that there is some unused space 
between the end of this tuple and the start of the next one.

Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Heikki Linnakangas

Heikki Linnakangas wrote:
I have some suggestions which I'll post separately, 


A significant chunk of the complexity and new code in the patch comes 
from pruning hot chains and reusing the space for new updates. Because 
we can't reclaim dead space in the page like a VACUUM does, without 
holding the vacuum lock, we have to deal with pages that contain deleted 
tuples, and be able to reuse them, and keep track of the changes in 
tuple length etc.


A much simpler approach would be to try to acquire the vacuum lock, and 
compact the page the usual way, and fall back to a cold update if we 
can't get the lock immediately.


The obvious downside of that is that if a page is continuously pinned, 
we can't HOT update tuples on it. Keeping in mind that the primary use 
case for HOT is largeish tables, and that small tables are handled pretty 
well by autovacuum, chances are pretty good that you can get a vacuum lock 
when you need it.


Thoughts?

I'm looking for ways to make the patch simpler and less invasive. We may 
want to put back some of this stuff, or come up with a more clever 
solution, in future releases, but right now let's keep it simple.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
 A much simpler approach would be to try to acquire the vacuum lock, and 
 compact the page the usual way, and fall back to a cold update if we 
 can't get the lock immediately.
 
 Seems like that could work.

 I just realized that there's a big problem with that. The process doing 
 the update surely holds a pin on the buffer itself. Needs more thought..

So does VACUUM when it's trying to lock a page, no?  In any case we
could surely make an extra parameter to LockBufferForCleanup if it
really needs to distinguish the cases.

regards, tom lane



Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Heikki Linnakangas

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:

Tom Lane wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:
A much simpler approach would be to try to acquire the vacuum lock, and 
compact the page the usual way, and fall back to a cold update if we 
can't get the lock immediately.

Seems like that could work.


I just realized that there's a big problem with that. The process doing 
the update surely holds a pin on the buffer itself. Needs more thought..


So does VACUUM when it's trying to lock a page, no?  In any case we
could surely make an extra parameter to LockBufferForCleanup if it
really needs to distinguish the cases.


The problem is that if we trigger the page cleanup from heap_update, as 
I was planning to do, the executor has already pinned the page and holds 
a reference to the old tuple on the page. We can't shuffle the data on 
the page, because the pointer to the old tuple would become invalid.


We could still do it from somewhere else, though. For example, in 
heap_fetch, the first time a page is touched. Or in bgwriter.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Michael Glaesemann

Heikki,

Thanks for providing this summary. As someone unfamiliar with the  
domain (both HOT development specifically and tuple management in  
general), I found it easy to follow.


On Jul 13, 2007, at 8:31 , Heikki Linnakangas wrote:


Pruning
---


To reclaim the index-visible (i.e. first) tuple in a HOT chain, the  
line pointer is turned into a redirecting line pointer that points  
to the line pointer of the next tuple in the chain. To keep track  
of the space occupied by the dead tuple, so that we can reuse the  
space, a new line pointer is allocated and marked with LP_DELETE to  
point to the dead tuple. That means its tid changes, but that's ok  
since it's dead.



Row-level fragmentation
---


If there's no LP_DELETEd tuples large enough to fit the new tuple  
in, the row-level fragmentation is repaired in the hope that some  
of the slots were actually big enough, but were just fragmented.  
That's done by mapping the offsets in the page, and enlarging all  
LP_DELETEd line pointers up to the beginning of the next tuple.


Would it make sense to enlarge the LP_DELETEd line pointers up to the  
beginning of the next tuple at the time the tuple is marked LP_DELETE?



Vacuum
--
Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples,  
making the space available for any use.


In the case of a fragmented row, am I right to assume vacuum reclaims  
all space up to the next (live) tuple?


Michael Glaesemann
grzm seespotcode net





Re: [PATCHES] HOT latest patch - version 8

2007-07-13 Thread Heikki Linnakangas

Michael Glaesemann wrote:

On Jul 13, 2007, at 8:31 , Heikki Linnakangas wrote:

Row-level fragmentation
---


If there's no LP_DELETEd tuples large enough to fit the new tuple in, 
the row-level fragmentation is repaired in the hope that some of the 
slots were actually big enough, but were just fragmented. That's done 
by mapping the offsets in the page, and enlarging all LP_DELETEd line 
pointers up to the beginning of the next tuple.


Would it make sense to enlarge the LP_DELETEd line pointers up to the 
beginning of the next tuple at the time the tuple is marked LP_DELETE?


The thing is, it's relatively expensive to figure out where the next 
tuple starts. We need to scan all line pointers to do that. Though given 
that pruning all tuples on a page is a relatively expensive operation 
anyway, maybe it wouldn't be so bad.



Vacuum
--
Vacuum prunes all HOT chains, and removes any LP_DELETEd tuples, 
making the space available for any use.


In the case of a fragmented row, am I right to assume vacuum reclaims 
all space up to the next (live) tuple?


Yes, Vacuum will 'defrag' the page.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
