Re: [PERFORM] One tuple per transaction
On Sat, 2005-03-12 at 14:05 -0800, Josh Berkus wrote:
> Tambet,
>
> > In one of our applications we have a database function, which
> > recalculates COGS (cost of goods sold) for a certain period. This
> > involves deleting a bunch of rows from one table, inserting them again
> > in correct order and updating them one-by-one (sometimes one row twice)
> > to reflect current state. The problem is that this generates an
> > enormous amount of tuples in that table.
>
> Sounds like you have an application design problem ... how about
> re-writing your function so it's a little more sensible?

Also, you could at least use a temp table for intermediate steps. This
will at least save WAL traffic.

--
Hannu Krosing <[EMAIL PROTECTED]>
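For illustration, a rough sketch of the temp-table idea under assumed
names (cogs_lines, cogs_work and period_id are invented): the per-row
churn happens in a backend-local table, which is not WAL-logged, and the
real table is rewritten once at the end.

    -- Hypothetical sketch only: stage the recalculation in a temp table so
    -- the delete/insert/update churn stays out of the main table, then swap
    -- the period's rows back in a single pass.
    CREATE TEMP TABLE cogs_work (LIKE cogs_lines) ON COMMIT DROP;

    INSERT INTO cogs_work
    SELECT * FROM cogs_lines
    WHERE period_id = 42;                 -- invented period key

    -- ... reorder and update rows one-by-one against cogs_work here ...

    DELETE FROM cogs_lines WHERE period_id = 42;
    INSERT INTO cogs_lines SELECT * FROM cogs_work;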
Re: [PERFORM] One tuple per transaction
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> On Tue, Mar 15, 2005 at 06:51:19PM -0500, Tom Lane wrote:
>> I believe VACUUM already knows that xmin = xmax implies the tuple
>> is dead to everyone.

> Huh, that is too simplistic in a subtransactions' world, isn't it?

Well, it's still correct as a fast-path check. There are extensions you
could imagine making ... but offhand I agree that it's not worth the
trouble. Maybe in a few years when everyone and his sister is using
subtransactions constantly, we'll feel a need to optimize these cases.

regards, tom lane
Re: [PERFORM] One tuple per transaction
On Tue, Mar 15, 2005 at 06:51:19PM -0500, Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
> > On a similar note I was just wondering if it would be possible to
> > mark any of these dead tuples as ready to be reused at transaction
> > commit time, since we know that they are dead to any and all other
> > transactions currently going on.
>
> I believe VACUUM already knows that xmin = xmax implies the tuple
> is dead to everyone.

Huh, that is too simplistic in a subtransactions' world, isn't it?

One way to solve this would be for a transaction that kills a tuple to
check whether the tuple was created by itself (not necessarily the same
Xid), and somehow report it to the FSM right away. That'd mean physically
moving a lot of tuples in the page, so ISTM it's too expensive an
"optimization." Oh, and you'd also have to delete the tuple from the
indexes.

--
Alvaro Herrera (<[EMAIL PROTECTED]>)
"Vivir y dejar de vivir son soluciones imaginarias. La existencia está
en otra parte" (Andre Breton)
Re: [PERFORM] One tuple per transaction
Robert Treat <[EMAIL PROTECTED]> writes:
> On a similar note I was just wondering if it would be possible to
> mark any of these dead tuples as ready to be reused at transaction
> commit time, since we know that they are dead to any and all other
> transactions currently going on.

I believe VACUUM already knows that xmin = xmax implies the tuple
is dead to everyone.

> This would save you from having to vacuum to get the tuples marked
> ready for reuse.

No; you forgot about reclaiming associated index entries.

regards, tom lane
Re: [PERFORM] One tuple per transaction
On Tuesday 15 March 2005 04:37, Richard Huxton wrote:
> Tambet Matiisen wrote:
> > Now, if typical inserts into your most active table occur in batches of
> > 3 rows, in one transaction, then the row count for this table is updated
> > 3 times during the transaction. 3 updates generate 3 tuples, while 2 of
> > them are dead from the very start. You effectively commit 2 useless
> > tuples. After millions of inserts you end up with the rowcounts table
> > having 2/3 dead tuples and queries start to slow down.
> >
> > Current solution is to vacuum often. My proposal was to create a new
> > tuple only with the first update. The next updates in the same
> > transaction would update the existing tuple, not create a new one.
>
> How do you roll back to a savepoint with this model?

You can't, but you could add the caveat to just do this auto-reuse within
any given nested transaction. Then as long as you aren't using savepoints
you get to reclaim all the space.

On a similar note I was just wondering if it would be possible to mark
any of these dead tuples as ready to be reused at transaction commit
time, since we know that they are dead to any and all other transactions
currently going on. This would save you from having to vacuum to get the
tuples marked ready for reuse. In the above scenario this could be a win;
whether it would be overall is hard to say.

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [PERFORM] One tuple per transaction
> -----Original Message-----
> From: Richard Huxton [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 15, 2005 11:38 AM
> To: Tambet Matiisen
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] One tuple per transaction
> ...
>
> > Consider the often suggested solution for speeding up "select count(*)
> > from table" queries: make another table, rowcounts, and for each of the
> > original tables add insert and delete triggers to update the row count
> > in the rowcounts table. Actually this is a standard denormalization
> > technique, which I use often. For example to ensure that order.total =
> > sum(order_line.total).
>
> This does of course completely destroy concurrency. Since you need to
> lock the summary table, other clients have to wait until you are done.

Yes, it does for the rowcounts table. But consider the orders example - it
only locks the order to which I am adding lines. As there is mostly one
client dealing with one order, but possibly thousands dealing with
different orders, it should not pose any concurrency restrictions.

> > Now, if typical inserts into your most active table occur in batches of
> > 3 rows, in one transaction, then the row count for this table is updated
> > 3 times during the transaction. 3 updates generate 3 tuples, while 2 of
> > them are dead from the very start. You effectively commit 2 useless
> > tuples. After millions of inserts you end up with the rowcounts table
> > having 2/3 dead tuples and queries start to slow down.
> >
> > Current solution is to vacuum often. My proposal was to create a new
> > tuple only with the first update. The next updates in the same
> > transaction would update the existing tuple, not create a new one.
>
> How do you roll back to a savepoint with this model?

Every savepoint initiates a new (sub)transaction.

Tambet
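To make the subtransaction point concrete, a minimal illustration (the
rowcounts table name is carried over from the example above): each
SAVEPOINT opens a subtransaction, and ROLLBACK TO has to restore the row
as it was at that point, which is exactly what the intermediate tuple
version provides.

    -- Rough illustration: why in-place tuple reuse conflicts with savepoints.
    BEGIN;
    UPDATE rowcounts SET n = n + 1 WHERE tablename = 'orders';
    SAVEPOINT s1;                      -- starts a subtransaction
    UPDATE rowcounts SET n = n + 1 WHERE tablename = 'orders';
    ROLLBACK TO SAVEPOINT s1;          -- must see the pre-savepoint version again
    COMMIT;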
Re: [PERFORM] One tuple per transaction
Tambet Matiisen wrote:
>> Not exactly. The dead tuple in the index will be scanned the first time
>> (and its pointed heap tuple as well), then we will mark it dead, then
>> next time we come here, we will know that the index tuple actually
>> points to a useless tuple, so we will not scan its pointed heap tuple.
>
> But the dead index tuple will still be read from disk next time? Maybe
> the performance loss will really be negligible, but if most of the
> tuples in your table/index are dead, then it might be significant.

When a block is read from disk, any dead tuples in that block will be
read in. Vacuum recovers these.

> Consider the often suggested solution for speeding up "select count(*)
> from table" queries: make another table, rowcounts, and for each of the
> original tables add insert and delete triggers to update the row count
> in the rowcounts table. Actually this is a standard denormalization
> technique, which I use often. For example to ensure that order.total =
> sum(order_line.total).

This does of course completely destroy concurrency. Since you need to
lock the summary table, other clients have to wait until you are done.

> Now, if typical inserts into your most active table occur in batches of
> 3 rows, in one transaction, then the row count for this table is updated
> 3 times during the transaction. 3 updates generate 3 tuples, while 2 of
> them are dead from the very start. You effectively commit 2 useless
> tuples. After millions of inserts you end up with the rowcounts table
> having 2/3 dead tuples and queries start to slow down.
>
> Current solution is to vacuum often. My proposal was to create a new
> tuple only with the first update. The next updates in the same
> transaction would update the existing tuple, not create a new one.

How do you roll back to a savepoint with this model?

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] One tuple per transaction
> ------------------------------
>
> Date: Mon, 14 Mar 2005 09:41:30 +0800
> From: "Qingqing Zhou" <[EMAIL PROTECTED]>
> To: pgsql-performance@postgresql.org
> Subject: Re: One tuple per transaction
> Message-ID: <[EMAIL PROTECTED]>
>
> ""Tambet Matiisen"" <[EMAIL PROTECTED]> writes ...
>
> > If I'm correct, the dead tuples must be scanned also during table and
> > index scans, so a lot of dead tuples slows down queries considerably,
> > especially when the table doesn't fit into shared buffers any more.
> > And as I'm in a transaction, I can't VACUUM to get rid of those tuples.
> > On one occasion the page count for a table went from 400 to 22000 at
> > the end.
>
> Not exactly. The dead tuple in the index will be scanned the first time
> (and its pointed heap tuple as well), then we will mark it dead, then
> next time we come here, we will know that the index tuple actually
> points to a useless tuple, so we will not scan its pointed heap tuple.

But the dead index tuple will still be read from disk next time? Maybe
the performance loss will really be negligible, but if most of the tuples
in your table/index are dead, then it might be significant.

Consider the often suggested solution for speeding up "select count(*)
from table" queries: make another table, rowcounts, and for each of the
original tables add insert and delete triggers to update the row count in
the rowcounts table (a rough sketch of such a trigger is at the end of
this message). Actually this is a standard denormalization technique,
which I use often. For example to ensure that order.total =
sum(order_line.total).

Now, if typical inserts into your most active table occur in batches of
3 rows, in one transaction, then the row count for this table is updated
3 times during the transaction. 3 updates generate 3 tuples, while 2 of
them are dead from the very start. You effectively commit 2 useless
tuples. After millions of inserts you end up with the rowcounts table
having 2/3 dead tuples and queries start to slow down.

Current solution is to vacuum often. My proposal was to create a new
tuple only with the first update. The next updates in the same
transaction would update the existing tuple, not create a new one.

But as I'm writing this, I'm starting to see some of the associated
implementation problems. The updated tuple might not be the same size as
the previous tuple. In-place tuple updates are probably not implemented
anyway. And for a reason, as a disk write takes the same time regardless
of whether you update or write new data. And tons of other problems,
which the developers are probably more aware of.

But one thing still bothers me. Why is a new index tuple generated when I
update a non-indexed column? OK, I get it again. The index tuple points
to the heap tuple, so after the update it would point to a dead tuple.
And as it takes the same time to update the pointer or to write a new
tuple, it's easier to write a new one. Case closed.

Tambet
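A minimal sketch of the rowcounts trigger pair described above, with
invented table, function and trigger names (it assumes an existing
"orders" table); this is an illustration of the denormalization, not
production code. Every statement that fires it updates the single summary
row, which is where both the dead-tuple accumulation and the locking
objection in the replies come from.

    -- Hypothetical illustration of the rowcounts denormalization.  Each
    -- insert or delete on "orders" updates the one summary row, so a batch
    -- of N rows in one transaction leaves N-1 dead versions of that row
    -- behind until VACUUM, and concurrent writers serialize on its lock.
    CREATE TABLE rowcounts (tablename text PRIMARY KEY, n bigint NOT NULL);
    INSERT INTO rowcounts VALUES ('orders', 0);

    CREATE OR REPLACE FUNCTION orders_rowcount() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE rowcounts SET n = n + 1 WHERE tablename = 'orders';
        ELSE  -- DELETE
            UPDATE rowcounts SET n = n - 1 WHERE tablename = 'orders';
        END IF;
        RETURN NULL;  -- return value of an AFTER trigger is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_rowcount
        AFTER INSERT OR DELETE ON orders
        FOR EACH ROW EXECUTE PROCEDURE orders_rowcount();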
Re: [PERFORM] One tuple per transaction
""Tambet Matiisen"" <[EMAIL PROTECTED]> writes > Hi! > > In one of our applications we have a database function, which > recalculates COGS (cost of good sold) for certain period. This involves > deleting bunch of rows from one table, inserting them again in correct > order and updating them one-by-one (sometimes one row twice) to reflect > current state. The problem is, that this generates an enormous amount of > tuples in that table. > > If I'm correct, the dead tuples must be scanned also during table and > index scan, so a lot of dead tuples slows down queries considerably, > especially when the table doesn't fit into shared buffers any more. And > as I'm in transaction, I can't VACUUM to get rid of those tuples. In one > occasion the page count for a table went from 400 to 22000 at the end. Not exactly. The dead tuple in the index will be scanned the first time (and its pointed heap tuple as well), then we will mark it dead, then next time we came here, we will know that the index tuple actually points to a uesless tuple, so we will not scan its pointed heap tuple. > > All this made me wonder, why is new tuple created after every update? > One tuple per transaction should be enough, because you always commit or > rollback transaction as whole. And my observations seem to indicate, > that new index tuple is created after column update even if this column > is not indexed. This is one cost of MVCC. A good thing of MVCC is there is no conflict between read and write - maybe some applications need this. A reference could be found here: http://www.postgresql.org/docs/8.0/static/storage-page-layout.html#HEAPTUPLEHEADERDATA-TABLE > > One tuple per transaction would save a loads of I/O bandwidth, so I > believe there must be a reason why it isn't implemented as such. Or were > my assumptions wrong, that dead tuples must be read from disk? > > Tambet > > ---(end of broadcast)--- > TIP 8: explain analyze is your friend > ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] One tuple per transaction
> -----Original Message-----
> From: Josh Berkus [mailto:[EMAIL PROTECTED]
> Sent: Sunday, March 13, 2005 12:05 AM
> To: Tambet Matiisen
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] One tuple per transaction
>
> Tambet,
>
> > In one of our applications we have a database function, which
> > recalculates COGS (cost of goods sold) for a certain period. This
> > involves deleting a bunch of rows from one table, inserting them again
> > in correct order and updating them one-by-one (sometimes one row
> > twice) to reflect current state. The problem is that this generates an
> > enormous amount of tuples in that table.
>
> Sounds like you have an application design problem ... how about
> re-writing your function so it's a little more sensible?

I agree that I have violated rule number one of transactions - don't make
the transaction last too long. But imagine a situation where a table is
updated twice in a transaction. Why? Perhaps the programmer felt that the
code is more modular that way. Now if you have tons of those transactions,
the I/O throughput is twice as big as it could be, because every
transaction creates two tuples instead of one. One tuple per transaction
would let the programmer keep his modular code and benefit from the
increased performance.

Tambet
Re: [PERFORM] One tuple per transaction
Tambet,

> In one of our applications we have a database function, which
> recalculates COGS (cost of goods sold) for a certain period. This
> involves deleting a bunch of rows from one table, inserting them again
> in correct order and updating them one-by-one (sometimes one row twice)
> to reflect current state. The problem is that this generates an
> enormous amount of tuples in that table.

Sounds like you have an application design problem ... how about
re-writing your function so it's a little more sensible?

--
Josh Berkus
Aglio Database Solutions
San Francisco
[PERFORM] One tuple per transaction
Hi!

In one of our applications we have a database function, which
recalculates COGS (cost of goods sold) for a certain period. This
involves deleting a bunch of rows from one table, inserting them again in
correct order and updating them one-by-one (sometimes one row twice) to
reflect current state. The problem is that this generates an enormous
amount of tuples in that table.

If I'm correct, the dead tuples must be scanned also during table and
index scans, so a lot of dead tuples slows down queries considerably,
especially when the table doesn't fit into shared buffers any more. And
as I'm in a transaction, I can't VACUUM to get rid of those tuples. On
one occasion the page count for a table went from 400 to 22000 at the
end.

All this made me wonder, why is a new tuple created after every update?
One tuple per transaction should be enough, because you always commit or
rollback the transaction as a whole. And my observations seem to indicate
that a new index tuple is created after a column update even if this
column is not indexed.

One tuple per transaction would save loads of I/O bandwidth, so I believe
there must be a reason why it isn't implemented as such. Or were my
assumptions wrong, that dead tuples must be read from disk?

Tambet
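One way to watch the kind of page-count growth described above is to
check pg_class before and after the recalculation; the table and index
names below are invented, and relpages/reltuples are estimates refreshed
only by VACUUM or ANALYZE.

    -- Hypothetical bloat check; run VACUUM or ANALYZE first for fresh numbers.
    SELECT relname, relpages, reltuples
    FROM pg_class
    WHERE relname IN ('cogs_lines', 'cogs_lines_pkey');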