Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-26 Thread Vik Reykja
On Wed, Jun 20, 2012 at 2:19 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 I've marked this patch committed, although in the end there was nothing
 left of it ;-)


Thank you, Dean and Tom!

I'm sorry for not participating in this thread, I've been away for the past
five weeks and have much catching up to do.


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-19 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 On 12 February 2012 02:06, Vik Reykja vikrey...@gmail.com wrote:
 I decided to take a crack at the todo item created from the following post:
 http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php

 Here's my review of this patch.

I've marked this patch committed, although in the end there was nothing
left of it ;-).  After teaching the trigger skip logic that old PK nulls
or new FK nulls mean the constraint needn't be checked, there is no case
where ri_KeysEqual is called on data that is not known to be
all-non-null on one side or the other.  So it doesn't matter whether
we use plain equality or is-not-distinct logic.  We could have applied
these changes anyway but I didn't see much value, since as previously
noted there would be some cases where the comparison got microscopically
slower.  I did add a comment about the point in case anybody revisits
the code again in future.

 There's also a separate question about whether the RI trigger
 functions need to be doing these key comparisons, given that they are
 done earlier when the triggers are queued
 (http://archives.postgresql.org/pgsql-performance/2005-10/msg00459.php),
 but the savings to be made there are likely to be smaller than the
 savings this patch makes by not queuing the triggers at all.

I haven't looked into this question.  It would only matter for PK
updates (since the FK-side triggers make no such comparisons), so it's
probably not going to make much difference in typical workloads.
Still, if anybody wants to investigate ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Dean Rasheed
On 17 June 2012 18:30, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Gurjeet Singh  wrote:
 Dean Rasheed wrote:

 in HEAD:
 ... (actual time=1390.037..1390.037 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms

 With this patch:
 ... (actual time=1489.640..1489.640 rows=0 loops=1)
 [no triggers fired]
 Total runtime: 1489.679 ms

 for every row:
 ... (actual time=1565.148..1565.148 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms

 with this patch
 ... (actual time=1962.755..1962.755 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
 Total runtime: 2221.912 ms

 I find it interesting that 'actual time' for top level 'Update on
 fk_table' is always higher in patched versions, and yet the 'Total
 runtime' is lower for the patched versions. I would've expected
 'Total runtime' to be proportional to the increase in top-level
 row-source's 'actual time'.

 I figured that the trigger time was counted separately.  It seems to
 add up pretty well that way.  I guess the question is whether there
 is a case where the increase in seqscan time is *not* compensated by
 less time in the triggers.

 -Kevin

I wouldn't read too much into the individual timings I posted above,
since I get massive variations between runs. If I repeat it enough
times, I can convince myself that the update times excluding trigger
execution are unchanged on average, but the trigger execution times
(which are indeed counted separately) are a real savings.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Dean Rasheed
On 17 June 2012 18:48, Tom Lane t...@sss.pgh.pa.us wrote:
 Gurjeet Singh singh.gurj...@gmail.com writes:
 On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed 
 dean.a.rash...@gmail.comwrote:
 I find it interesting that 'actual time' for top level 'Update on fk_table'
 is always higher in patched versions, and yet the 'Total runtime' is lower
 for the patched versions. I would've expected 'Total runtime' to be
 proportional to the increase in top-level row-source's 'actual time'.
 Even the time consumed by Seq scans is higher in patched version, so I
 think the patch's affect on performance needs to be evaluated.

 AFAICS, the only way that the given patch could possibly make anything
 slower is that if the old value of some tested attribute is NULL, the
 comparison routines used to fall out immediately; now, they will do an
 additional SPI_getbinval call to extract the new value before making
 any decision.  So that would account for some small increase in the
 ModifyTable runtime in cases where there are a lot of null keys in FK
 rows being updated, which accurately describes Dean's test case, if not
 so much the real world.  I don't have a big problem with it, since the
 point of the patch is to possibly save a great deal more work in exactly
 these cases.

 It strikes me though that we are still leaving some money on the table.
 The SQL spec says clearly that no RI action need be taken when a null
 PK key value is updated to non-null, and I think this is right because
 there cannot possibly be any FK rows that are considered to match the
 old value.  (Note that both the spec and our FK code treat the RI
 equality operators as strict, even if the underlying functions aren't
 really.)  So we ought to have asymmetric logic in there when making
 checks on PK rows, such that null-non-null is not considered an
 interesting change.  If done properly this would remove the above-
 described slowdown in the PK case.


Yeah, that makes sense.

 Conversely, if an FK value is changed from non-null to null, that is
 either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK
 columns went to null) or a certain failure (if MATCH FULL and we
 have a mix of nulls and non-nulls).  There's no need to queue a
 trigger event in the always OK cases, so I think we need some
 asymmetric logic in the FK case as well.


Makes sense too.
I think that the patch already covers the most common use case (in my
experience) but we may as well get as much out of it as we can while
we're here.

Are you planning to tackle this, or should I move the patch back to
waiting on author to give Vik Reykja a chance to update it?

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-18 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 I think that the patch already covers the most common use case (in my
 experience) but we may as well get as much out of it as we can while
 we're here.

Yeah.  The cases involving nulls are probably really rather unlikely
altogether, but it seems a tad silly to fix only some of them when
we can fix all of them for marginally more effort.

 Are you planning to tackle this, or should I move the patch back to
 waiting on author to give Vik Reykja a chance to update it?

It doesn't look like much else is ready for committer yet, so I think
I'll keep hacking on this one.  The whole of ri_triggers is looking a
bit old and creaky to me; for instance the SET DEFAULT triggers believe
that they can't cache plans involving DEFAULT, which was fixed years
ago (I'm pretty sure that the plancache level should take care of that
automatically, since an ALTER TABLE ... DEFAULT command would force a
relcache flush).

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Gurjeet Singh
On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed dean.a.rash...@gmail.comwrote:

Then in HEAD:
 EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN

 ---
  Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
 time=1390.037..1390.037 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
 (actual time=0.010..60.841 rows=10 loops=1)
  Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
  Total runtime: 1607.626 ms
 (4 rows)

 So the RI trigger is fired 9 times, for the unchanged NULL FK rows.

 With this patch, the RI trigger is not fired at all:
 EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN

 ---
  Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
 time=1489.640..1489.640 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
 (actual time=0.010..66.328 rows=10 loops=1)
  Total runtime: 1489.679 ms
 (3 rows)


 Similarly, if I update the FK column in HEAD the RI trigger is fired
 for every row:
 EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;

  QUERY PLAN

 ---
  Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual
 time=1565.148..1565.148 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26)
 (actual time=0.010..42.725 rows=10 loops=1)
  Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
  Total runtime: 2279.408 ms
 (4 rows)

 whereas with this patch it is only fired for the non-NULL FK rows that
 are changing:
 EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;

  QUERY PLAN

 ---
  Update on fk_table  (cost=0.00..5393.45 rows=299636 width=26) (actual
 time=1962.755..1962.755 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..5393.45 rows=299636 width=26)
 (actual time=0.023..52.850 rows=10 loops=1)
  Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
  Total runtime: 2221.912 ms
 (4 rows)


I find it interesting that 'actual time' for top level 'Update on fk_table'
is always higher in patched versions, and yet the 'Total runtime' is lower
for the patched versions. I would've expected 'Total runtime' to be
proportional to the increase in top-level row-source's 'actual time'.

Even the time consumed by Seq scans is higher in patched version, so I
think the patch's affect on performance needs to be evaluated.

Best regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Kevin Grittner
Gurjeet Singh  wrote:
 Dean Rasheed wrote:
 
 in HEAD:
 ... (actual time=1390.037..1390.037 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms
 
 With this patch:
 ... (actual time=1489.640..1489.640 rows=0 loops=1)
 [no triggers fired]
 Total runtime: 1489.679 ms
 
 for every row:
 ... (actual time=1565.148..1565.148 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms
 
 with this patch
 ... (actual time=1962.755..1962.755 rows=0 loops=1)
 Trigger for constraint fk_table_e_fkey: time=257.845 calls=1
 Total runtime: 2221.912 ms
 
 I find it interesting that 'actual time' for top level 'Update on
 fk_table' is always higher in patched versions, and yet the 'Total
 runtime' is lower for the patched versions. I would've expected
 'Total runtime' to be proportional to the increase in top-level
 row-source's 'actual time'.
 
I figured that the trigger time was counted separately.  It seems to
add up pretty well that way.  I guess the question is whether there
is a case where the increase in seqscan time is *not* compensated by
less time in the triggers.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Tom Lane
Gurjeet Singh singh.gurj...@gmail.com writes:
 On Sat, Jun 16, 2012 at 9:50 AM, Dean Rasheed dean.a.rash...@gmail.comwrote:
 I find it interesting that 'actual time' for top level 'Update on fk_table'
 is always higher in patched versions, and yet the 'Total runtime' is lower
 for the patched versions. I would've expected 'Total runtime' to be
 proportional to the increase in top-level row-source's 'actual time'.
 Even the time consumed by Seq scans is higher in patched version, so I
 think the patch's affect on performance needs to be evaluated.

AFAICS, the only way that the given patch could possibly make anything
slower is that if the old value of some tested attribute is NULL, the
comparison routines used to fall out immediately; now, they will do an
additional SPI_getbinval call to extract the new value before making
any decision.  So that would account for some small increase in the
ModifyTable runtime in cases where there are a lot of null keys in FK
rows being updated, which accurately describes Dean's test case, if not
so much the real world.  I don't have a big problem with it, since the
point of the patch is to possibly save a great deal more work in exactly
these cases.

It strikes me though that we are still leaving some money on the table.
The SQL spec says clearly that no RI action need be taken when a null
PK key value is updated to non-null, and I think this is right because
there cannot possibly be any FK rows that are considered to match the
old value.  (Note that both the spec and our FK code treat the RI
equality operators as strict, even if the underlying functions aren't
really.)  So we ought to have asymmetric logic in there when making
checks on PK rows, such that null-non-null is not considered an
interesting change.  If done properly this would remove the above-
described slowdown in the PK case.

Conversely, if an FK value is changed from non-null to null, that is
either always OK (if MATCH SIMPLE, or if MATCH FULL and all the FK
columns went to null) or a certain failure (if MATCH FULL and we
have a mix of nulls and non-nulls).  There's no need to queue a
trigger event in the always OK cases, so I think we need some
asymmetric logic in the FK case as well.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-17 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 I figured that the trigger time was counted separately.

Yeah, it is.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed
On 12 February 2012 02:06, Vik Reykja vikrey...@gmail.com wrote:
 I decided to take a crack at the todo item created from the following post:
 http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php

 The attached patch makes the desired changes in both code and function
 naming.

 It seemed quite easy to do but wasn't marked as easy on the todo, so I'm
 wondering if I've missed something.  All regression tests pass.


Here's my review of this patch.


Basic stuff:


* Patch applies OK (some offsets).

BTW, I had no problems applying both the original patch and Chetan
Suttraway's version. The only difference between the patches seems to
be that the original is in context format, and Chetan Suttraway's is
in unified format.

Which format do hackers actually prefer? The wiki page
http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
suggests context format, but then the linked example
http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
format. Do people care, or are both formats OK?

* Compiles cleanly with no warnings.

* Regression tests pass.

The regression tests have not been updated. I think that's fair enough
- I don't see a way to test this in a regression test - but I did some
testing (see below) to confirm the expected behaviour.

* No doc changes needed.


What it does:
-

The primary benefit this patch offers is to prevent unnecessary
queuing of RI triggers in the case of updates to a FK table where the
old and new FK values are both NULL. It does this by effectively
replacing the existing key equality checks with IS [NOT] DISTINCT FROM
checks.

This seems like a worthwhile optimisation, because I think that it is
fairly common to have NULLs in FKs columns, and then update some other
column.

The patch also prevents unnecessary queuing of RI triggers when the PK
table is updated, and the old and new values are both NULL, but that
seems like a much less common case.

I've looked over the code changes fairly closely, and I believe that
this is a safe change.

Technically, I think the changes to ri_OneKeyEqual() and
ri_AllKeysUnequal() are unnecessary, since they can only be called
from the trigger functions in the case where all the old values are
non-NULL, hence the new versions end up behaving the same. However, I
think it makes sense for all these functions to be consistent.

Talking of consistency, I wonder if RI_FKey_keyequal_upd_pk() and
RI_FKey_keyequal_upd_fk() ought to be renamed to
RI_FKey_keyunchanged_upd_pk() and RI_FKey_keyunchanged_upd_pk()?


Testing:


I tested using the following tables:

CREATE TABLE pk_table
(
  a int PRIMARY KEY
);

INSERT INTO pk_table
  SELECT * FROM generate_series(1,10);

CREATE TABLE fk_table
(
  a int PRIMARY KEY,
  b int,
  c int,
  d int,
  e int REFERENCES pk_table(a)
);

INSERT INTO fk_table
  SELECT i, i*2, i*3, i*4, CASE WHEN i%10 = 0 THEN i END
FROM generate_series(1,10) g(i);

(i.e., FK populated in 10% of rows)


Then in HEAD:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1390.037..1390.037 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..60.841 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=210.184 calls=9
 Total runtime: 1607.626 ms
(4 rows)

So the RI trigger is fired 9 times, for the unchanged NULL FK rows.

With this patch, the RI trigger is not fired at all:
EXPLAIN ANALYSE UPDATE fk_table SET b=b+1, c=c+1, d=d+1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..2300.00 rows=10 width=26) (actual
time=1489.640..1489.640 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..2300.00 rows=10 width=26)
(actual time=0.010..66.328 rows=10 loops=1)
 Total runtime: 1489.679 ms
(3 rows)


Similarly, if I update the FK column in HEAD the RI trigger is fired
for every row:
EXPLAIN ANALYSE UPDATE fk_table SET e=e-1;

  QUERY PLAN
---
 Update on fk_table  (cost=0.00..1800.00 rows=10 width=26) (actual
time=1565.148..1565.148 rows=0 loops=1)
   -  Seq Scan on fk_table  (cost=0.00..1800.00 rows=10 width=26)
(actual time=0.010..42.725 rows=10 loops=1)
 Trigger for constraint fk_table_e_fkey: time=705.962 calls=10
 Total runtime: 2279.408 ms
(4 rows)

whereas with this patch it is only fired for the non-NULL FK rows that
are changing:
EXPLAIN ANALYSE 

Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Tom Lane
Dean Rasheed dean.a.rash...@gmail.com writes:
 BTW, I had no problems applying both the original patch and Chetan
 Suttraway's version. The only difference between the patches seems to
 be that the original is in context format, and Chetan Suttraway's is
 in unified format.

 Which format do hackers actually prefer? The wiki page
 http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
 suggests context format, but then the linked example
 http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
 format. Do people care, or are both formats OK?

Some people find one or the other more readable.  (I'm in the camp that
says unified format is great for isolated single-line changes and
utterly unreadable for anything more complex, but apparently there are
people who prefer it.)

For detailed review/commit purposes, it doesn't matter that much as long
as the patch applies cleanly, since it's easy to apply it and then get
a diff in the other format if you prefer reading the other.  However,
if you're just hoping people will eyeball the patch in email and comment
on it, readability matters.  If the patch requires manual fixup in order
to get it to apply anymore, readability is also a concern, since you're
dependent on the committer not misinterpreting the hunks he has to patch
in by hand.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] REVIEW: Optimize referential integrity checks (todo item)

2012-06-16 Thread Dean Rasheed
On 16 June 2012 18:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Dean Rasheed dean.a.rash...@gmail.com writes:
 BTW, I had no problems applying both the original patch and Chetan
 Suttraway's version. The only difference between the patches seems to
 be that the original is in context format, and Chetan Suttraway's is
 in unified format.

 Which format do hackers actually prefer? The wiki page
 http://wiki.postgresql.org/wiki/Working_with_Git#Context_diffs_with_Git
 suggests context format, but then the linked example
 http://wiki.postgresql.org/wiki/Creating_Clean_Patches is in unified
 format. Do people care, or are both formats OK?

 Some people find one or the other more readable.  (I'm in the camp that
 says unified format is great for isolated single-line changes and
 utterly unreadable for anything more complex, but apparently there are
 people who prefer it.)

 For detailed review/commit purposes, it doesn't matter that much as long
 as the patch applies cleanly, since it's easy to apply it and then get
 a diff in the other format if you prefer reading the other.  However,
 if you're just hoping people will eyeball the patch in email and comment
 on it, readability matters.  If the patch requires manual fixup in order
 to get it to apply anymore, readability is also a concern, since you're
 dependent on the committer not misinterpreting the hunks he has to patch
 in by hand.


OK thanks, that's good to know.
I tend to find context format easier to read for large patches, but
that's a highly subjective thing.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers