Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-27 Thread Jeff Davis
On Mon, 2009-10-26 at 17:23 +, Dean Rasheed wrote:
 If it's of any relevance, I'm currently using an optimised build, with
 assert checking off.
 [Linux x86_64, 2 core Intel Core2]

Ok, I'm able to reproduce it now. Thanks for looking into it!

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Dean Rasheed
2009/10/25 Simon Riggs si...@2ndquadrant.com:
 On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote:

 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

 This is an interesting patch. The justification is fine, the idea is
 good, though I'd like to see more analysis of the technique, what other
 options exist and some thought about when we should use the technique.

 We have a bitmap for each UPDATE statement, I think, but there's no docs
 or readme. Why just UPDATE? Is the cost of starting up the bitmap higher
 than the existing mechanism? Do we need to look at starting with an
 existing mechanism and then switching over to new mechanism? Is the TID
 bitmap always a win for large numbers of rows?


Thanks for looking at this. It works for all kinds of trigger events,
and is intended as a complete drop-in replacement for the after
triggers queue. I admit that I haven't yet done very much performance
testing. As it stands, there does appear to be a small performance
penalty associated with the bitmaps, but I need to do more testing to
be more specific about that.

I had thought that, for relatively small numbers of rows, I could use
something like a small list of CTID arrays of increasing size, and
then switch over to the new mechanism when this becomes too large.

But first, I wanted to get feedback on whether this TID bitmap
approach is actually valid for general trigger operation.


 The technique relies on these assumptions
 * Trigger functions are idempotent

I don't understand what you're saying here. It should execute the
triggers in exactly the same way as the current code (but possibly in
a different order). Idempotentence isn't required.


 * Trigger execution order is not important (in terms of rows)

It is true that the order in which the rows are processed will change.
As far as I can tell from the spec, there is nothing to say that the
rows for a given statement should be processed in any particular
order. I guess that I'm looking for feedback from people on this list
as to whether that will be a problem for existing apps.


 * Multiple trigger execution order is not important

This patch does not change the order of execution in the case where
there are multiple triggers (at least not for regular non-constraint
triggers). They should still be fired in name order, for each row. All
such triggers share a single TID bitmap, and are processed together.
This is in line with the spec.

Deferrable constraint triggers are a different matter, and these will
be fired in a different order (each set of triggers for a given
constraint will be fired together, rather than being interleaved).
This is not covered by the spec, but if they are genuinely being used
to enforce constraints, the order shouldn't matter.



 All of those seem false in the general case. What will you do?

At this point I'm looking for more feedback as to whether any of this
is a show-stopper, before I expend more effort on this patch.

 - Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Dean Rasheed
2009/10/25 Jeff Davis pg...@j-davis.com:
 On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote:
 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

 Can you please take a look at my patch here:
 http://archives.postgresql.org/message-id/1256499249.12775.20.ca...@jdavis

 to make sure that we're not interfering with eachother? I implemented
 deferred constraint checking in my operator exclusion constraints patch
 (formerly generalized index constraints).


Yes, I've been following this, and I'm looking forward to this new
functionality.


 After looking very briefly at your approach, I think that it's entirely
 orthogonal, so I don't expect a problem.


I agree. I think that the 2 are orthogonal.

Possibly they could both share some common bulk checking code, but I've
not thought much about how to do that yet.


 I have a git repo here:
 http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints

 which may be helpful if you just want to look at the commit for deferred
 constraint checking. Any comments welcome.


I did a quick bit of testing, and I think that there is a
locking/concurrency problem :-(

Attached is a (rather crappy) python script (using PyGreSQL) that I
used to test consistency while I was working on the deferrable
uniqueness constraints patch. Basically it just spawns a bunch of
threads, each of which does random CRUD, with heavy contention and
lots of constraint violations and deadlocks, which are rolled back.

I modified the script to enforce uniqueness with an exclusion constraint,
and the script is able to break the constraint, forcing invalid data into
the table.

I haven't looked at your code in depth, but I hope that this is not a
difficult problem to fix. It seems like it ought to be similar to the btree
code.

 - Dean
#!/usr/bin/python

import sys, pg, threading, random, time, datetime

num_threads = 10
num_loops = 1000

lock = threading.RLock()
total_nontrans = 0
total_commits = 0
total_rollbacks = 0
total_inserts = 0
total_updates = 0
total_deletes = 0
total_violations = 0
total_deadlocks = 0
total_unknown_errors = 0
total_errors = 0
total_duplicates = 0

def open():
return pg.DB(pgdevel, localhost, -1,
 -c client_min_messages=WARNING, None, pgdevel, )

def find_duplicates(db):
result = db.query(SELECT max(c)-1 AS dups FROM +\
  (SELECT count(*) AS c FROM foo GROUP BY a) AS foo)
result = result.dictresult()[0][dups]

if result == None: return 0
return result

def setup():
db = open()
db.query(DROP TABLE IF EXISTS foo)
#db.query(CREATE TABLE foo(a int UNIQUE))
#db.query(CREATE TABLE foo(a int UNIQUE DEFERRABLE INITIALLY DEFERRED))
db.query(CREATE TABLE foo(a int))
db.query(ALTER TABLE foo ADD CONSTRAINT foo_u EXCLUSION+\
  USING btree (a CHECK WITH =) DEFERRABLE INITIALLY DEFERRED)
db.close()

def do_crud(db):
global total_nontrans, total_commits, total_rollbacks
global total_inserts, total_updates, total_deletes
global total_violations, total_deadlocks, total_unknown_errors
global total_errors, total_duplicates

inserts = 0
updates = 0
deletes = 0

do_trans = random.random()  0.2
do_commit = random.random()  0.2
do_loop = True

duplicates = find_duplicates(db)
lock.acquire()
total_duplicates += duplicates
if duplicates  0: print 1 FOUND DUPLICATES
lock.release()
if total_duplicates  0: sys.exit(1)

try:
if do_trans:
db.query(BEGIN)

while do_loop:
if random.random()  0.5:
val = int(random.random()*100)
db.query(INSERT INTO foo VALUES(+str(val)+))
inserts += 1
if random.random()  0.5:
val1 = int(random.random()*100)
val2 = int(random.random()*100)
db.query(UPDATE foo SET a=+str(val2)+ WHERE a=+str(val1))
updates += 1
if random.random()  0.5:
val = int(random.random()*100)
db.query(DELETE FROM foo WHERE a=+str(val))
deletes += 1
if random.random()  0.5:
do_loop = False

if do_trans:
if do_commit:
db.query(COMMIT)
else:
db.query(ROLLBACK)
inserts = 0
updates = 0
deletes = 0

duplicates = find_duplicates(db)

lock.acquire()

if do_trans:
if do_commit: total_commits += 1
else: total_rollbacks += 1
else:
total_nontrans += 1

total_inserts += inserts
total_updates += updates
total_deletes += deletes

total_duplicates += duplicates
if duplicates  0: print 2 

Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Simon Riggs
On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote:

 It works for all kinds of trigger events,
 and is intended as a complete drop-in replacement for the after
 triggers queue. 

  All of those seem false in the general case. What will you do?
 
 At this point I'm looking for more feedback as to whether any of this
 is a show-stopper, before I expend more effort on this patch.

I see no show stoppers, only for you to look at ways of specifying that
this optimization is possible for particular cases. I think we might be
able to make the general statement that it will work for all after
triggers that execute STABLE or IMMUTABLE functions. I don't think we
can assume that firing order is irrelevant for some cases, e.g. message
queues.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Jeff Davis
On Mon, 2009-10-26 at 13:41 +, Dean Rasheed wrote:
 I did a quick bit of testing, and I think that there is a
 locking/concurrency problem :-(

Unfortunately I can't reproduce the problem on my machine; it always
passes.

If you have a minute, can you try to determine if the problem can happen
with a non-deferrable constraint?

I'll keep looking into it.

Thanks,
Jeff Davis



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Dean Rasheed
2009/10/26 Simon Riggs si...@2ndquadrant.com:
 On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote:

 It works for all kinds of trigger events,
 and is intended as a complete drop-in replacement for the after
 triggers queue.

  All of those seem false in the general case. What will you do?

 At this point I'm looking for more feedback as to whether any of this
 is a show-stopper, before I expend more effort on this patch.

 I see no show stoppers, only for you to look at ways of specifying that
 this optimization is possible for particular cases. I think we might be
 able to make the general statement that it will work for all after
 triggers that execute STABLE or IMMUTABLE functions. I don't think we
 can assume that firing order is irrelevant for some cases, e.g. message
 queues.


Hmm, thinking about this some more... one thing this patch does is to
separate out the queues for regular triggers from those for RI
triggers and deferrable constraint checks. ITSM that row-order only
really matters for the former. It's also the case that for these
triggers there will never be any other choice but to execute them one
at a time, so they may as well just spool to a file rather than using
a TID bitmap.

The bitmaps are probably only useful for constraint triggers, where a
bulk check can be used instead of executing individual triggers for
each row, if enough rows are modified.

 - Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Dean Rasheed
2009/10/26 Jeff Davis pg...@j-davis.com:
 On Mon, 2009-10-26 at 13:41 +, Dean Rasheed wrote:
 I did a quick bit of testing, and I think that there is a
 locking/concurrency problem :-(

 Unfortunately I can't reproduce the problem on my machine; it always
 passes.


That's odd. It happens every time on my machine (10 threads, 1000 loops).

 If you have a minute, can you try to determine if the problem can happen
 with a non-deferrable constraint?


If anything, that seems to make it fail more quickly.

If it's of any relevance, I'm currently using an optimised build, with
assert checking off.
[Linux x86_64, 2 core Intel Core2]

 - Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-26 Thread Robert Haas
On Mon, Oct 26, 2009 at 9:46 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote:

 It works for all kinds of trigger events,
 and is intended as a complete drop-in replacement for the after
 triggers queue.

  All of those seem false in the general case. What will you do?

 At this point I'm looking for more feedback as to whether any of this
 is a show-stopper, before I expend more effort on this patch.

 I see no show stoppers, only for you to look at ways of specifying that
 this optimization is possible for particular cases. I think we might be
 able to make the general statement that it will work for all after
 triggers that execute STABLE or IMMUTABLE functions. I don't think we
 can assume that firing order is irrelevant for some cases, e.g. message
 queues.

Hmm.  After-trigger functions are very unlikely to really be STABLE or
IMMUTABLE, though.  Almost by definition, they'd better be modifying
some data somewhere, or there's no point.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-25 Thread Simon Riggs
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote:

 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

This is an interesting patch. The justification is fine, the idea is
good, though I'd like to see more analysis of the technique, what other
options exist and some thought about when we should use the technique.

We have a bitmap for each UPDATE statement, I think, but there's no docs
or readme. Why just UPDATE? Is the cost of starting up the bitmap higher
than the existing mechanism? Do we need to look at starting with an
existing mechanism and then switching over to new mechanism? Is the TID
bitmap always a win for large numbers of rows?

The technique relies on these assumptions
* Trigger functions are idempotent
* Trigger execution order is not important (in terms of rows)
* Multiple trigger execution order is not important

All of those seem false in the general case. What will you do?

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-25 Thread Jeff Davis
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote:
 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

Can you please take a look at my patch here:
http://archives.postgresql.org/message-id/1256499249.12775.20.ca...@jdavis

to make sure that we're not interfering with eachother? I implemented
deferred constraint checking in my operator exclusion constraints patch
(formerly generalized index constraints).

After looking very briefly at your approach, I think that it's entirely
orthogonal, so I don't expect a problem.

I have a git repo here:
http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints

which may be helpful if you just want to look at the commit for deferred
constraint checking. Any comments welcome.

I'll also take a look at your patch in the next few days.

Regards,
Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-19 Thread Robert Haas
On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed
dean.a.rash...@googlemail.com wrote:
 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

If the bitmap becomes lossy, how do you preserve the correct semantics?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue

2009-10-19 Thread Dean Rasheed
2009/10/19 Robert Haas robertmh...@gmail.com:
 On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed
 dean.a.rash...@googlemail.com wrote:
 This is a WIP patch to replace the after-trigger queues with TID bitmaps
 to prevent them from using excessive amounts of memory. Each round of
 trigger executions is a modified bitmap heap scan.

 If the bitmap becomes lossy, how do you preserve the correct semantics?

 ...Robert


The idea is that it filters by the transaction ID and command ID of
modified rows to see what's been updated in the command(s) the trigger
is for...

 - Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers