> Date: Mon, 14 Mar 2005 09:41:30 +0800
> From: "Qingqing Zhou" <[EMAIL PROTECTED]>
> To: firstname.lastname@example.org
> Subject: Re: One tuple per transaction
> Message-ID: <[EMAIL PROTECTED]>
> ""Tambet Matiisen"" <[EMAIL PROTECTED]> writes
> > If I'm correct, the dead tuples must be scanned also during
> table and
> > index scan, so a lot of dead tuples slows down queries
> > especially when the table doesn't fit into shared buffers any more.
> > And as I'm in transaction, I can't VACUUM to get rid of
> those tuples.
> > In one occasion the page count for a table went from 400 to
> 22000 at
> > the end.
> Not exactly. The dead tuple in the index will be scanned the
> first time (and its pointed heap tuple as well), then we will
> mark it dead, then next time we came here, we will know that
> the index tuple actually points to a uesless tuple, so we
> will not scan its pointed heap tuple.
But the dead index tuple will still be read from disk next time? Maybe
really the performance loss will be neglible, but if most of tuples in
your table/index are dead, then it might be significant.
Consider the often suggested solution for speeding up "select count(*)
from table" query: make another table rowcounts and for each of the
original tables add insert and delete triggers to update row count in
rowcounts table. Actually this is standard denormalization technique,
which I use often. For example to ensure that order.total =
Now, if typical inserts into your most active table occur in batches of
3 rows, in one transaction, then row count for this table is updated 3
times during transaction. 3 updates generate 3 tuples, while 2 of them
are dead from the very start. You effectively commit 2 useless tuples.
After millions of inserts you end up with rowcounts table having 2/3 of
dead tuples and queries start to slow down.
Current solution is to vacuum often. My proposal was to create new tuple
only with first update. The next updates in the same transaction would
update the existing tuple, not create a new.
But as I'm writing this, I'm starting to get some of the associated
implementation problems. The updated tuple might not be the same size as
previous tuple. Tuple updates are probably not implemented anyway. And
for a reason, as disk write takes the same time, regardless if you
update or write new data. And tons of other problems, which developers
are probably more aware of.
But one thing still bothers me. Why is new index tuple generated when I
update non-indexed column? OK, I get it again. Index tuple points to
heap tuple, thus after update it would point to dead tuple. And as it
takes the same time to update pointer or to write a new tuple, it's
easier to write a new.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?