Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-15 Thread Jim Nasby
On Aug 4, 2011, at 8:40 AM, Kevin Grittner wrote:
 Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 3, 2011 at 6:05 PM, Jim Nasby j...@nasby.net wrote:
 Not sure how much this relates to this discussion, but I have
 often wished we had AFTER FOR EACH STATEMENT triggers that
 provided OLD and NEW recordsets you could make use of. Sometimes
 it's very valuable to be able to look at *all* the rows that
 changed in a transaction in one shot.
 
 Yeah, that would be awesome.  I think some of our competitors
 provide exactly that feature...
 
 If I remember correctly, MS SQL Server and Sybase ASE provide
 INSERTED and DELETED relations in triggers instead of NEW and OLD
 records.  In a FOR EACH ROW trigger the relation contains only one
 row.
 
 This is related to the thread on BEFORE triggers, in that these
 products require that you UPDATE the row in the base table to modify
 it (normally by joining to the INSERTED relation), making the latest
 values available to other trigger code, and providing a clear
 distinction between the values coming in to the trigger and the
 latest values in the database.

Yeah, that sounds like how DB2 does it (though the relations were by default 
named NEW and OLD).

If there is agreement on having a NEW recordset that contains all the new 
records (or the new values on updates) and an OLD recordset that contains all 
the old records, are there any other API issues that need to be resolved? How 
would this actually be implemented?
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-04 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 On Wed, Aug 3, 2011 at 6:05 PM, Jim Nasby j...@nasby.net wrote:
 Not sure how much this relates to this discussion, but I have
 often wished we had AFTER FOR EACH STATEMENT triggers that
 provided OLD and NEW recordsets you could make use of. Sometimes
 it's very valuable to be able to look at *all* the rows that
 changed in a transaction in one shot.
 
 Yeah, that would be awesome.  I think some of our competitors
 provide exactly that feature...
 
If I remember correctly, MS SQL Server and Sybase ASE provide
INSERTED and DELETED relations in triggers instead of NEW and OLD
records.  In a FOR EACH ROW trigger the relation contains only one
row.
 
This is related to the thread on BEFORE triggers, in that these
products require that you UPDATE the row in the base table to modify
it (normally by joining to the INSERTED relation), making the latest
values available to other trigger code, and providing a clear
distinction between the values coming in to the trigger and the
latest values in the database.
 
-Kevin



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Jim Nasby
On Aug 2, 2011, at 7:09 AM, Simon Riggs wrote:
  The best compression and flexibility in
 that case is to store a bitmap since that will average out at about 1
 bit per row, with variable length bitmaps. Which is about 8 times
 better compression ratio than originally suggested, without any loss
 of metadata.
 
 Yeah that's probably possible in specific cases, but I'm still not
 sure how to make it meet the full requirements of the after trigger
 queue.
 
 I think you'd better explain what use case you are trying to optimise
 for. It seems unlikely that you will come up with a compression scheme
 that will fit all cases.
 
 The only cases that seem a problem to me are
 * bulk RI checks
 * large writes on tables using trigger based replication
 maybe you have others?

Not sure how much this relates to this discussion, but I have often wished we 
had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets you 
could make use of. Sometimes it's very valuable to be able to look at *all* the 
rows that changed in a transaction in one shot.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Robert Haas
On Wed, Aug 3, 2011 at 6:05 PM, Jim Nasby j...@nasby.net wrote:
 Not sure how much this relates to this discussion, but I have often wished we 
 had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets 
 you could make use of. Sometimes it's very valuable to be able to look at 
 *all* the rows that changed in a transaction in one shot.

Yeah, that would be awesome.  I think some of our competitors provide
exactly that feature...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-03 Thread Alvaro Herrera
Excerpts from Jim Nasby's message of Wed Aug 03 18:05:21 -0400 2011:

 Not sure how much this relates to this discussion, but I have often wished we 
 had AFTER FOR EACH STATEMENT triggers that provided OLD and NEW recordsets 
 you could make use of. Sometimes it's very valuable to be able to look at 
 *all* the rows that changed in a transaction in one shot.

I have a mind to implement this sometime.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-02 Thread Dean Rasheed
On 1 August 2011 20:53, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 OK, so I should split this into 2 patches?
 Even without the compression, it's probably worth the 16 -> 10 byte
 reduction that would result from removing the 2nd CTID in the UPDATE
 case, and that part would be a pretty small patch.

 Yeah, my point exactly.  The rest of it might or might not be worth the
 extra complication.


OK, here's a patch for the first bit - just removing the second CTID
in the UPDATE case, and including a sanity check of the new tuple's
xmin and cmin.

It passes all the regression tests. I also tested it by doing a 10M
row UPDATE x=x+1 on a deferrable PK, and it gave about the expected
reduction in memory usage, with no difference in run time.

I'll test out the additional compression separately.

Regards,
Dean


after-triggers-1.patch
Description: Binary data



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-02 Thread Dean Rasheed
On 1 August 2011 21:02, Simon Riggs si...@2ndquadrant.com wrote:
 I would prefer an approach where we store the events in a buffer,
 which gets added to the main event queue when it fills/block number
 changes/etc. That way we can apply intelligence to the actual
 compression format used, yet retain all required info in the queued
 event. We might also be able to ensure the event queue is ordered more
 naturally by block number, to ensure we have reasonable readahead
 while re-traversing the event queue - it's not just the size of the
 event queue that is the problem here.


I tried a similar approach before
(http://archives.postgresql.org/pgsql-hackers/2009-10/msg00464.php)
but one of the problems was that, for user-defined triggers at least,
it was thought that the order of firing for the triggers needed to
match the row update order, which is difficult to achieve with a
bitmap.

I also worry about the runtime cost of anything too complicated. What
I was aiming for with this patch was a more modest compression that
wouldn't add too many additional cycles (or complexity) to the
existing code (I need to do more testing to confirm that that is
indeed the case).

The biggest recent complaint I found in this area was this thread:
http://archives.postgresql.org/pgsql-bugs/2010-12/msg2.php, which
was a one billion row insert. Personally, if I ever insert that much
data I always drop and re-create all the constraints and triggers, but
it would be nice if we could achieve that sort of size on reasonably
modern hardware.

 I'm also nervous of compression formats that rely too heavily on
 earlier data, so any corner case bug in one of the formats will screw
 the whole data structure. I'd prefer a more modular approach that is
 easier to test in regression tests.


Nearly all compression formats rely on earlier data. In this case each
event only depends on the previous event, so it just needs testing
with various classes of difference between successive rows, perhaps
with some C test code that makes up CTIDs to exercise the corner cases.

 The cases we have problems with are the bulk cases, so we should be
 specifically optimising for that. For example, if we are running a
 COPY/INSERT SELECT or a seqscan DELETE then we will queue a trigger
 for every tuple in a block.

Agreed. That's what I tried to target with my code too - so it
achieves its best compression with sequential data.

 The best compression and flexibility in
 that case is to store a bitmap since that will average out at about 1
 bit per row, with variable length bitmaps. Which is about 8 times
 better compression ratio than originally suggested, without any loss
 of metadata.

Yeah that's probably possible in specific cases, but I'm still not
sure how to make it meet the full requirements of the after trigger
queue.

Regards,
Dean



 We should also optimise the HOT UPDATE case.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-02 Thread Simon Riggs
On Tue, Aug 2, 2011 at 12:28 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 On 1 August 2011 21:02, Simon Riggs si...@2ndquadrant.com wrote:
 I would prefer an approach where we store the events in a buffer,
 which gets added to the main event queue when it fills/block number
 changes/etc. That way we can apply intelligence to the actual
 compression format used, yet retain all required info in the queued
 event. We might also be able to ensure the event queue is ordered more
 naturally by block number, to ensure we have reasonable readahead
 while re-traversing the event queue - it's not just the size of the
 event queue that is the problem here.


 I tried a similar approach before
 (http://archives.postgresql.org/pgsql-hackers/2009-10/msg00464.php)
 but one of the problems was that, for user-defined triggers at least,
 it was thought that the order of firing for the triggers needed to
 match the row update order, which is difficult to achieve with a
 bitmap.

Luckily, I think what I said 2 years ago was correct.
http://archives.postgresql.org/pgsql-hackers/2009-10/msg01528.php

We just need a way to say that trigger execution order is not
important, which we know to be true for FK checks - or any STABLE
function.

I think that label is STABLE so we don't even need to invent new
function labels.

 The cases we have problems with are the bulk cases, so we should be
 specifically optimising for that. For example, if we are running a
 COPY/INSERT SELECT or a seqscan DELETE then we will queue a trigger
 for every tuple in a block.

 Agreed. That's what I tried to target with my code too - so it
 achieves its best compression with sequential data.

Maybe, but you're still storing 1 byte per row at least, often more.
Whereas 1 bit would be much better compression.

  The best compression and flexibility in
 that case is to store a bitmap since that will average out at about 1
 bit per row, with variable length bitmaps. Which is about 8 times
 better compression ratio than originally suggested, without any loss
 of metadata.

 Yeah that's probably possible in specific cases, but I'm still not
 sure how to make it meet the full requirements of the after trigger
 queue.

I think you'd better explain what use case you are trying to optimise
for. It seems unlikely that you will come up with a compression scheme
that will fit all cases.

The only cases that seem a problem to me are
* bulk RI checks
* large writes on tables using trigger based replication
maybe you have others?

If you are trying to optimise trigger based replication, I would be
inclined to look at immediate execution triggers. If we insert into a
log table then that will only take effect if the transaction commits.
That way we would just bypass the after trigger queue entirely. I see
nothing in the SQL Standard that prevents that.

Bitmaps work better than the current suggestion for optimising bulk RI
checks. If you used immediate execution triggers you could optimise
bulk RI checks against small tables even further by just storing a
local hash table of already acquired row locks, again avoiding the
need to build up a huge after trigger event queue in the first place.
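The local hash table Simon suggests can be sketched in standalone C. This is an illustration of the deduplication idea only, not PostgreSQL code: the set size, the hash function, and all names here are invented, and the toy fixed-size table assumes its capacity is never exceeded. The point is that a repeat reference to an already-checked key can skip queueing another RI-check event entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SET_SIZE 1024   /* power of two; toy fixed size for illustration */

/* A minimal open-addressing set of referenced-key values.  Hypothetical
 * sketch: assumes fewer than SET_SIZE distinct keys, so probing always
 * finds an empty slot. */
typedef struct
{
    uint64_t keys[SET_SIZE];
    int      used[SET_SIZE];
} KeySet;

static void keyset_init(KeySet *s)
{
    memset(s, 0, sizeof(*s));
}

/* Returns 1 if the key was newly added (the RI check must still run),
 * 0 if it was already present (the check can be skipped). */
static int keyset_add(KeySet *s, uint64_t key)
{
    /* multiplicative hash, keep the top 10 bits for a 1024-slot table */
    uint64_t h = (key * 0x9E3779B97F4A7C15ULL) >> 54;

    for (;;)
    {
        if (!s->used[h])
        {
            s->used[h] = 1;
            s->keys[h] = key;
            return 1;
        }
        if (s->keys[h] == key)
            return 0;
        h = (h + 1) & (SET_SIZE - 1);   /* linear probe */
    }
}
```

With this, a bulk insert referencing a small parent table would queue one check per distinct key rather than one per row.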

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Dean Rasheed
I've been thinking some more about the long-standing problem of the
AFTER TRIGGER queue using too much memory, and I think that the
situation can be improved by using some basic compression.

Currently each event added to the AFTER TRIGGER queue uses 10 bytes
per trigger per row for INSERTs and DELETEs, and 16 for UPDATEs. The
attached patch reduces this down to just 1 byte in a number of common
cases.

It works by optimising for the case where an event's CTID is the same
as, or close to, the previous event's CTID, and similarly for the
event's pointer to the shared trigger data (AfterTriggerSharedData).

If there is only 1 trigger on the table, it will use 1 byte for most
INSERTed rows. For UPDATEs and DELETEs, the space needed will vary
according to the table's layout and traversal order. Given a not too
fragmented table, and a sequential or bitmap heap scan, it should use
1-2 bytes per row. This will increase for more random traversal orders
- the theoretical maximum is 7 bytes per row, but that would take a
pretty pathological case.

If there are multiple triggers firing for each row, then each
additional trigger event will take 1 byte, provided the triggers fire
unconditionally (no WHERE clauses). If the triggers fire more
randomly, due to WHERE clauses, then some of the trigger events may
take 2 bytes each (for up to 256 triggers on the table) or 3 bytes
each if there are more triggers than that.
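To make the delta idea concrete, here is a minimal standalone sketch of this kind of encoding. This is not the format the patch actually uses - the struct, escape byte, and layout are invented for illustration - but it shows how the sequential case collapses to 1 byte while a random jump costs the 7-byte worst case mentioned above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical compact encoding of an event's CTID as a delta against
 * the previous event's CTID:
 *   0x00..0x7F : same block, offset delta of 0..127  (1 byte)
 *   0xFF       : escape, followed by the full 4-byte block number
 *                and 2-byte offset                    (7 bytes) */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} Ctid;

static size_t ctid_encode(const Ctid *prev, const Ctid *cur, uint8_t *out)
{
    if (cur->block == prev->block &&
        cur->offset >= prev->offset &&
        cur->offset - prev->offset <= 0x7F)
    {
        out[0] = (uint8_t) (cur->offset - prev->offset);
        return 1;                        /* common sequential case */
    }
    out[0] = 0xFF;                       /* escape: store CTID verbatim */
    memcpy(out + 1, &cur->block, 4);
    memcpy(out + 5, &cur->offset, 2);
    return 7;                            /* pathological case */
}

static size_t ctid_decode(const Ctid *prev, const uint8_t *in, Ctid *cur)
{
    if (in[0] != 0xFF)
    {
        cur->block = prev->block;
        cur->offset = (uint16_t) (prev->offset + in[0]);
        return 1;
    }
    memcpy(&cur->block, in + 1, 4);
    memcpy(&cur->offset, in + 5, 2);
    return 7;
}
```

A sequential scan queueing events for consecutive line pointers hits the 1-byte path almost every time; only block boundaries and random traversal orders pay more.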

The compression code is pretty simple, so should only use a few
additional cycles. In my testing so far, it seems to make no
detectable difference to performance (actually it seems to be 2-3%
faster for a 50M row INSERT, presumably because the reduced memory
usage leads to fewer cache misses).

For comparison, saving a 1MB AfterTriggerEventChunk produced by the
current code to a file and compressing it with gzip gives a
compression ratio of around 4:1, and bzip2 gives around 8:1. I've not
tried the pg_lzcompress code, but at first glance it is hard to
imagine that it would be as good as gzip, and it is not so suited to
on-the-fly compression/decompression.

It still needs more testing, but the initial results are encouraging.

Thoughts?

Regards,
Dean


after-triggers.patch
Description: Binary data



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 I've been thinking some more about the long-standing problem of the
 AFTER TRIGGER queue using too much memory, and I think that the
 situation can be improved by using some basic compression.

 Currently each event added to the AFTER TRIGGER queue uses 10 bytes
 per trigger per row for INSERTs and DELETEs, and 16 for UPDATEs. The
 attached patch reduces this down to just 1 byte in a number of common
 cases.

Ummm ... I only read the data structure comments, not the code, but I
don't see where you store the second CTID for an update event?

regards, tom lane



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Dean Rasheed
On 1 August 2011 17:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 I've been thinking some more about the long-standing problem of the
 AFTER TRIGGER queue using too much memory, and I think that the
 situation can be improved by using some basic compression.

 Currently each event added to the AFTER TRIGGER queue uses 10 bytes
 per trigger per row for INSERTs and DELETEs, and 16 for UPDATEs. The
 attached patch reduces this down to just 1 byte in a number of common
 cases.

 Ummm ... I only read the data structure comments, not the code, but I
 don't see where you store the second CTID for an update event?


Ah yes, I forgot to mention that bit. I'm using
(tuple1.t_data->t_ctid) to get the second CTID from the old tuple. Is
that safe? As far as I could see, this would match the new tuple after
a heap update.

Regards,
Dean



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 On 1 August 2011 17:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Ummm ... I only read the data structure comments, not the code, but I
 don't see where you store the second CTID for an update event?

 Ah yes, I forgot to mention that bit. I'm using
 (tuple1.t_data->t_ctid) to get the second CTID from the old tuple. Is
 that safe?

Hmm ... not sure.  It seems a bit scary, but on the other hand we
should be able to assume that the updating subtransaction hasn't been
rolled back (else surely we shouldn't be firing the trigger).  So in
principle it seems like the t_ctid link can't have been replaced.
This will foreclose any ideas about collapsing t_ctid link chains,
if anyone had it in mind to do that.

regards, tom lane



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Robert Haas
On Mon, Aug 1, 2011 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 On 1 August 2011 17:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Ummm ... I only read the data structure comments, not the code, but I
 don't see where you store the second CTID for an update event?

 Ah yes, I forgot to mention that bit. I'm using
 (tuple1.t_data->t_ctid) to get the second CTID from the old tuple. Is
 that safe?

 Hmm ... not sure.  It seems a bit scary, but on the other hand we
 should be able to assume that the updating subtransaction hasn't been
 rolled back (else surely we shouldn't be firing the trigger).  So in
 principle it seems like the t_ctid link can't have been replaced.
 This will foreclose any ideas about collapsing t_ctid link chains,
 if anyone had it in mind to do that.

Don't we already do that when pruning HOT chains?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Dean Rasheed
On 1 August 2011 18:36, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Aug 1, 2011 at 1:31 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 On 1 August 2011 17:49, Tom Lane t...@sss.pgh.pa.us wrote:
 Ummm ... I only read the data structure comments, not the code, but I
 don't see where you store the second CTID for an update event?

 Ah yes, I forgot to mention that bit. I'm using
 (tuple1.t_data->t_ctid) to get the second CTID from the old tuple. Is
 that safe?

 Hmm ... not sure.  It seems a bit scary, but on the other hand we
 should be able to assume that the updating subtransaction hasn't been
 rolled back (else surely we shouldn't be firing the trigger).  So in
 principle it seems like the t_ctid link can't have been replaced.
 This will foreclose any ideas about collapsing t_ctid link chains,
 if anyone had it in mind to do that.

 Don't we already do that when pruning HOT chains?


I thought that only happens after the transaction is committed, and
old enough, whereas the trigger code only needs to follow the chain in
the updating transaction.

Regards,
Dean



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Robert Haas
On Mon, Aug 1, 2011 at 1:42 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 Hmm ... not sure.  It seems a bit scary, but on the other hand we
 should be able to assume that the updating subtransaction hasn't been
 rolled back (else surely we shouldn't be firing the trigger).  So in
 principle it seems like the t_ctid link can't have been replaced.
 This will foreclose any ideas about collapsing t_ctid link chains,
 if anyone had it in mind to do that.

 Don't we already do that when pruning HOT chains?

 I thought that only happens after the transaction is committed, and
 old enough, whereas the trigger code only needs to follow the chain in
 the updating transaction.

Hmm, true.

I worry a bit that this might foreclose possible future optimization
of the self update case, which is a known pain point.  Am I wrong to
worry?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 1, 2011 at 1:42 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 Don't we already do that when pruning HOT chains?

 I thought that only happens after the transaction is committed, and
 old enough, whereas the trigger code only needs to follow the chain in
 the updating transaction.

 Hmm, true.

 I worry a bit that this might foreclose possible future optimization
 of the self update case, which is a known pain point.  Am I wrong to
 worry?

I think it might be OK if you explicitly verify that xmin/cmin of the
linked-to tuple matches the (sub)transaction/command that queued the
trigger event.  I don't recall whether the trigger code does that
already; I think there is some related test but it might not be that
strict.

There's also a definitional issue involved: if a transaction updates the
same tuple twice, in the presence of a deferred update trigger for the
table, is it supposed to (eventually) fire the trigger for both update
actions or only the last one?  I have a feeling we might already be
locked into the second choice, but if not, this would probably force it.

regards, tom lane



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Dean Rasheed
On 1 August 2011 18:55, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 1, 2011 at 1:42 PM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
 Don't we already do that when pruning HOT chains?

 I thought that only happens after the transaction is committed, and
 old enough, whereas the trigger code only needs to follow the chain in
 the updating transaction.

 Hmm, true.

 I worry a bit that this might foreclose possible future optimization
 of the self update case, which is a known pain point.  Am I wrong to
 worry?

 I think it might be OK if you explicitly verify that xmin/cmin of the
 linked-to tuple matches the (sub)transaction/command that queued the
 trigger event.  I don't recall whether the trigger code does that
 already; I think there is some related test but it might not be that
 strict.

 There's also a definitional issue involved: if a transaction updates the
 same tuple twice, in the presence of a deferred update trigger for the
 table, is it supposed to (eventually) fire the trigger for both update
 actions or only the last one?  I have a feeling we might already be
 locked into the second choice, but if not, this would probably force it.


Do you mean this sort of case:

DROP TABLE IF EXISTS foo;
CREATE TABLE foo(a int);
INSERT INTO foo VALUES(1);

CREATE OR REPLACE FUNCTION foo_trig_fn() RETURNS trigger AS
$$
BEGIN
  RAISE NOTICE 'In foo_trig_fn(): old.a=%, new.a=%', old.a, new.a;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER foo_trig AFTER UPDATE ON foo
  DEFERRABLE INITIALLY DEFERRED
  FOR EACH ROW EXECUTE PROCEDURE foo_trig_fn();

BEGIN;
  UPDATE foo SET a=a+1;
  UPDATE foo SET a=a+1;
COMMIT;

In this case we currently fire the trigger twice (once for each
update) when the transaction commits, and the new code behaves the
same. So at commit time you get:

NOTICE:  In foo_trig_fn(): old.a=1, new.a=2
NOTICE:  In foo_trig_fn(): old.a=2, new.a=3

Thinking back to the deferred PK checking trigger, I thought that in
this sort of case it is the trigger's responsibility to check that the
tuple(s) it is given are not dead.

Regards,
Dean



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 On 1 August 2011 18:55, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 1, 2011 at 1:42 PM, Dean Rasheed dean.a.rash...@gmail.com wrote:
 Don't we already do that when pruning HOT chains?

 I thought that only happens after the transaction is committed, and
 old enough, whereas the trigger code only needs to follow the chain in
 the updating transaction.

 Hmm, true.

On reflection I think we're probably worrying over nothing.  The only
way that the t_ctid link might have gotten changed (given that the
original updating subtransaction hasn't been aborted) is if we were
willing to prune an inserted-and-then-deleted-in-same-xact tuple out of
the t_ctid link chain while its parent xact is still running.  However,
it is unsafe to do that unless you know for certain that the xact in
question has no snapshots that could see the tuple as live.  VACUUM
certainly doesn't know that, and neither does pruneheap.c, and I don't
really foresee us introducing enough bookkeeping so that they would know
it.  (If we did try to do that, we could handle the problem by
considering that queueing of trigger events introduces a quasi-snapshot
that requires the relevant tuples to remain available.)

So I think this is probably OK as long as it's adequately documented in
the code.  Just for luck, though, we should add an xmin/cmin match check
to the code if there's not one already.

However, this means that Dean is not simply adding some self-contained
compression logic; he's eliminating information from the data structure
on the grounds that he can get it from elsewhere.  I think that that
ought to be treated as a separate patch, and that the costs/benefits
of the compressed data structure ought to be evaluated relative to the
situation with the second CTID removed but the data structure not
otherwise changed.

regards, tom lane



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Dean Rasheed
On 1 August 2011 19:56, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 On 1 August 2011 18:55, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, Aug 1, 2011 at 1:42 PM, Dean Rasheed dean.a.rash...@gmail.com 
 wrote:
 Don't we already do that when pruning HOT chains?

 I thought that only happens after the transaction is committed, and
 old enough, whereas the trigger code only needs to follow the chain in
 the updating transaction.

 Hmm, true.

 On reflection I think we're probably worrying over nothing.  The only
 way that the t_ctid link might have gotten changed (given that the
 original updating subtransaction hasn't been aborted) is if we were
 willing to prune an inserted-and-then-deleted-in-same-xact tuple out of
 the t_ctid link chain while its parent xact is still running.  However,
 it is unsafe to do that unless you know for certain that the xact in
 question has no snapshots that could see the tuple as live.  VACUUM
 certainly doesn't know that, and neither does pruneheap.c, and I don't
 really foresee us introducing enough bookkeeping so that they would know
 it.  (If we did try to do that, we could handle the problem by
 considering that queueing of trigger events introduces a quasi-snapshot
 that requires the relevant tuples to remain available.)

 So I think this is probably OK as long as it's adequately documented in
 the code.  Just for luck, though, we should add an xmin/cmin match check
 to the code if there's not one already.


I don't see any such existing check.
Is this the same as checking that xmax/cmax on the old tuple matches
xmin/cmin on the linked-to tuple? If so, that's pretty
straightforward. Otherwise I'm not sure where the xmin/cmin to check
against would come from, without adding more information to the queues
somewhere.


 However, this means that Dean is not simply adding some self-contained
 compression logic; he's eliminating information from the data structure
 on the grounds that he can get it from elsewhere.  I think that that
 ought to be treated as a separate patch, and that the costs/benefits
 of the compressed data structure ought to be evaluated relative to the
 situation with the second CTID removed but the data structure not
 otherwise changed.


OK, so I should split this into 2 patches?
Even without the compression, it's probably worth the 16 -> 10 byte
reduction that would result from removing the 2nd CTID in the UPDATE
case, and that part would be a pretty small patch.

Regards,
Dean



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 OK, so I should split this into 2 patches?
 Even without the compression, it's probably worth the 16 -> 10 byte
 reduction that would result from removing the 2nd CTID in the UPDATE
 case, and that part would be a pretty small patch.

Yeah, my point exactly.  The rest of it might or might not be worth the
extra complication.

regards, tom lane



Re: [HACKERS] Compressing the AFTER TRIGGER queue

2011-08-01 Thread Simon Riggs
On Mon, Aug 1, 2011 at 7:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 However, this means that Dean is not simply adding some self-contained
 compression logic; he's eliminating information from the data structure
 on the grounds that he can get it from elsewhere.  I think that that
 ought to be treated as a separate patch, and that the costs/benefits
 of the compressed data structure ought to be evaluated relative to the
 situation with the second CTID removed but the data structure not
 otherwise changed.

I would prefer an approach where we store the events in a buffer,
which gets added to the main event queue when it fills/block number
changes/etc. That way we can apply intelligence to the actual
compression format used, yet retain all required info in the queued
event. We might also be able to ensure the event queue is ordered more
naturally by block number, to ensure we have reasonable readahead
while re-traversing the event queue - it's not just the size of the
event queue that is the problem here.

I'm also nervous of compression formats that rely too heavily on
earlier data, so any corner case bug in one of the formats will screw
the whole data structure. I'd prefer a more modular approach that is
easier to test in regression tests.

The cases we have problems with are the bulk cases, so we should be
specifically optimising for that. For example, if we are running a
COPY/INSERT SELECT or a seqscan DELETE then we will queue a trigger
for every tuple in a block. The best compression and flexibility in
that case is to store a bitmap since that will average out at about 1
bit per row, with variable length bitmaps. Which is about 8 times
better compression ratio than originally suggested, without any loss
of metadata.
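The bitmap representation can be sketched in standalone C along these lines. This is a hypothetical illustration of the idea, not PostgreSQL code: the struct, the names, and MAX_OFFSETS (standing in for the maximum line pointers per page) are all invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_OFFSETS 291   /* roughly the max heap tuples per 8K page */

/* One variable-length bitmap per heap block, with one bit per line
 * pointer, instead of one multi-byte event per affected row. */
typedef struct
{
    uint32_t block;                        /* heap block number */
    uint16_t nbits;                        /* highest marked offset + 1 */
    uint8_t  bits[(MAX_OFFSETS + 7) / 8];  /* 1 bit per row */
} BlockEvents;

static void block_events_init(BlockEvents *be, uint32_t block)
{
    memset(be, 0, sizeof(*be));
    be->block = block;
}

static void block_events_mark(BlockEvents *be, uint16_t offset)
{
    be->bits[offset / 8] |= (uint8_t) (1 << (offset % 8));
    if (offset + 1 > be->nbits)
        be->nbits = (uint16_t) (offset + 1);
}

static int block_events_ismarked(const BlockEvents *be, uint16_t offset)
{
    return (be->bits[offset / 8] >> (offset % 8)) & 1;
}

/* bytes the bitmap would occupy if stored variable-length */
static size_t block_events_size(const BlockEvents *be)
{
    return (size_t) (be->nbits + 7) / 8;
}
```

For a bulk load that touches every row in a block, the bitmap averages out to about 1 bit per row, versus 10 bytes per event today.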

We should also optimise the HOT UPDATE case.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
