Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, 2009-10-26 at 17:23 +, Dean Rasheed wrote: If it's of any relevance, I'm currently using an optimised build, with assert checking off. [Linux x86_64, 2 core Intel Core2] Ok, I'm able to reproduce it now. Thanks for looking into it! Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
2009/10/25 Simon Riggs si...@2ndquadrant.com: On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. This is an interesting patch. The justification is fine, the idea is good, though I'd like to see more analysis of the technique, what other options exist and some thought about when we should use the technique. We have a bitmap for each UPDATE statement, I think, but there's no docs or readme. Why just UPDATE? Is the cost of starting up the bitmap higher than the existing mechanism? Do we need to look at starting with an existing mechanism and then switching over to new mechanism? Is the TID bitmap always a win for large numbers of rows? Thanks for looking at this. It works for all kinds of trigger events, and is intended as a complete drop-in replacement for the after triggers queue. I admit that I haven't yet done very much performance testing. As it stands, there does appear to be a small performance penalty associated with the bitmaps, but I need to do more testing to be more specific about that. I had thought that, for relatively small numbers of rows, I could use something like a small list of CTID arrays of increasing size, and then switch over to the new mechanism when this becomes too large. But first, I wanted to get feedback on whether this TID bitmap approach is actually valid for general trigger operation. The technique relies on these assumptions * Trigger functions are idempotent I don't understand what you're saying here. It should execute the triggers in exactly the same way as the current code (but possibly in a different order). Idempotentence isn't required. * Trigger execution order is not important (in terms of rows) It is true that the order in which the rows are processed will change. As far as I can tell from the spec, there is nothing to say that the rows for a given statement should be processed in any particular order. I guess that I'm looking for feedback from people on this list as to whether that will be a problem for existing apps. * Multiple trigger execution order is not important This patch does not change the order of execution in the case where there are multiple triggers (at least not for regular non-constraint triggers). They should still be fired in name order, for each row. All such triggers share a single TID bitmap, and are processed together. This is in line with the spec. Deferrable constraint triggers are a different matter, and these will be fired in a different order (each set of triggers for a given constraint will be fired together, rather than being interleaved). This is not covered by the spec, but if they are genuinely being used to enforce constraints, the order shouldn't matter. All of those seem false in the general case. What will you do? At this point I'm looking for more feedback as to whether any of this is a show-stopper, before I expend more effort on this patch. - Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
2009/10/25 Jeff Davis pg...@j-davis.com: On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. Can you please take a look at my patch here: http://archives.postgresql.org/message-id/1256499249.12775.20.ca...@jdavis to make sure that we're not interfering with eachother? I implemented deferred constraint checking in my operator exclusion constraints patch (formerly generalized index constraints). Yes, I've been following this, and I'm looking forward to this new functionality. After looking very briefly at your approach, I think that it's entirely orthogonal, so I don't expect a problem. I agree. I think that the 2 are orthogonal. Possibly they could both share some common bulk checking code, but I've not thought much about how to do that yet. I have a git repo here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints which may be helpful if you just want to look at the commit for deferred constraint checking. Any comments welcome. I did a quick bit of testing, and I think that there is a locking/concurrency problem :-( Attached is a (rather crappy) python script (using PyGreSQL) that I used to test consistency while I was working on the deferrable uniqueness constraints patch. Basically it just spawns a bunch of threads, each of which does random CRUD, with heavy contention and lots of constraint violations and deadlocks, which are rolled back. I modified the script to enforce uniqueness with an exclusion constraint, and the script is able to break the constraint, forcing invalid data into the table. I haven't looked at your code in depth, but I hope that this is not a difficult problem to fix. It seems like it ought to be similar to the btree code. - Dean #!/usr/bin/python import sys, pg, threading, random, time, datetime num_threads = 10 num_loops = 1000 lock = threading.RLock() total_nontrans = 0 total_commits = 0 total_rollbacks = 0 total_inserts = 0 total_updates = 0 total_deletes = 0 total_violations = 0 total_deadlocks = 0 total_unknown_errors = 0 total_errors = 0 total_duplicates = 0 def open(): return pg.DB(pgdevel, localhost, -1, -c client_min_messages=WARNING, None, pgdevel, ) def find_duplicates(db): result = db.query(SELECT max(c)-1 AS dups FROM +\ (SELECT count(*) AS c FROM foo GROUP BY a) AS foo) result = result.dictresult()[0][dups] if result == None: return 0 return result def setup(): db = open() db.query(DROP TABLE IF EXISTS foo) #db.query(CREATE TABLE foo(a int UNIQUE)) #db.query(CREATE TABLE foo(a int UNIQUE DEFERRABLE INITIALLY DEFERRED)) db.query(CREATE TABLE foo(a int)) db.query(ALTER TABLE foo ADD CONSTRAINT foo_u EXCLUSION+\ USING btree (a CHECK WITH =) DEFERRABLE INITIALLY DEFERRED) db.close() def do_crud(db): global total_nontrans, total_commits, total_rollbacks global total_inserts, total_updates, total_deletes global total_violations, total_deadlocks, total_unknown_errors global total_errors, total_duplicates inserts = 0 updates = 0 deletes = 0 do_trans = random.random() 0.2 do_commit = random.random() 0.2 do_loop = True duplicates = find_duplicates(db) lock.acquire() total_duplicates += duplicates if duplicates 0: print 1 FOUND DUPLICATES lock.release() if total_duplicates 0: sys.exit(1) try: if do_trans: db.query(BEGIN) while do_loop: if random.random() 0.5: val = int(random.random()*100) db.query(INSERT INTO foo VALUES(+str(val)+)) inserts += 1 if random.random() 0.5: val1 = int(random.random()*100) val2 = int(random.random()*100) db.query(UPDATE foo SET a=+str(val2)+ WHERE a=+str(val1)) updates += 1 if random.random() 0.5: val = int(random.random()*100) db.query(DELETE FROM foo WHERE a=+str(val)) deletes += 1 if random.random() 0.5: do_loop = False if do_trans: if do_commit: db.query(COMMIT) else: db.query(ROLLBACK) inserts = 0 updates = 0 deletes = 0 duplicates = find_duplicates(db) lock.acquire() if do_trans: if do_commit: total_commits += 1 else: total_rollbacks += 1 else: total_nontrans += 1 total_inserts += inserts total_updates += updates total_deletes += deletes total_duplicates += duplicates if duplicates 0: print 2
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote: It works for all kinds of trigger events, and is intended as a complete drop-in replacement for the after triggers queue. All of those seem false in the general case. What will you do? At this point I'm looking for more feedback as to whether any of this is a show-stopper, before I expend more effort on this patch. I see no show stoppers, only for you to look at ways of specifying that this optimization is possible for particular cases. I think we might be able to make the general statement that it will work for all after triggers that execute STABLE or IMMUTABLE functions. I don't think we can assume that firing order is irrelevant for some cases, e.g. message queues. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, 2009-10-26 at 13:41 +, Dean Rasheed wrote: I did a quick bit of testing, and I think that there is a locking/concurrency problem :-( Unfortunately I can't reproduce the problem on my machine; it always passes. If you have a minute, can you try to determine if the problem can happen with a non-deferrable constraint? I'll keep looking into it. Thanks, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
2009/10/26 Simon Riggs si...@2ndquadrant.com: On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote: It works for all kinds of trigger events, and is intended as a complete drop-in replacement for the after triggers queue. All of those seem false in the general case. What will you do? At this point I'm looking for more feedback as to whether any of this is a show-stopper, before I expend more effort on this patch. I see no show stoppers, only for you to look at ways of specifying that this optimization is possible for particular cases. I think we might be able to make the general statement that it will work for all after triggers that execute STABLE or IMMUTABLE functions. I don't think we can assume that firing order is irrelevant for some cases, e.g. message queues. Hmm, thinking about this some more... one thing this patch does is to separate out the queues for regular triggers from those for RI triggers and deferrable constraint checks. ITSM that row-order only really matters for the former. It's also the case that for these triggers there will never be any other choice but to execute them one at a time, so they may as well just spool to a file rather than using a TID bitmap. The bitmaps are probably only useful for constraint triggers, where a bulk check can be used instead of executing individual triggers for each row, if enough rows are modified. - Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
2009/10/26 Jeff Davis pg...@j-davis.com: On Mon, 2009-10-26 at 13:41 +, Dean Rasheed wrote: I did a quick bit of testing, and I think that there is a locking/concurrency problem :-( Unfortunately I can't reproduce the problem on my machine; it always passes. That's odd. It happens every time on my machine (10 threads, 1000 loops). If you have a minute, can you try to determine if the problem can happen with a non-deferrable constraint? If anything, that seems to make it fail more quickly. If it's of any relevance, I'm currently using an optimised build, with assert checking off. [Linux x86_64, 2 core Intel Core2] - Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, Oct 26, 2009 at 9:46 AM, Simon Riggs si...@2ndquadrant.com wrote: On Mon, 2009-10-26 at 13:28 +, Dean Rasheed wrote: It works for all kinds of trigger events, and is intended as a complete drop-in replacement for the after triggers queue. All of those seem false in the general case. What will you do? At this point I'm looking for more feedback as to whether any of this is a show-stopper, before I expend more effort on this patch. I see no show stoppers, only for you to look at ways of specifying that this optimization is possible for particular cases. I think we might be able to make the general statement that it will work for all after triggers that execute STABLE or IMMUTABLE functions. I don't think we can assume that firing order is irrelevant for some cases, e.g. message queues. Hmm. After-trigger functions are very unlikely to really be STABLE or IMMUTABLE, though. Almost by definition, they'd better be modifying some data somewhere, or there's no point. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. This is an interesting patch. The justification is fine, the idea is good, though I'd like to see more analysis of the technique, what other options exist and some thought about when we should use the technique. We have a bitmap for each UPDATE statement, I think, but there's no docs or readme. Why just UPDATE? Is the cost of starting up the bitmap higher than the existing mechanism? Do we need to look at starting with an existing mechanism and then switching over to new mechanism? Is the TID bitmap always a win for large numbers of rows? The technique relies on these assumptions * Trigger functions are idempotent * Trigger execution order is not important (in terms of rows) * Multiple trigger execution order is not important All of those seem false in the general case. What will you do? -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, 2009-10-19 at 17:48 +0100, Dean Rasheed wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. Can you please take a look at my patch here: http://archives.postgresql.org/message-id/1256499249.12775.20.ca...@jdavis to make sure that we're not interfering with eachother? I implemented deferred constraint checking in my operator exclusion constraints patch (formerly generalized index constraints). After looking very briefly at your approach, I think that it's entirely orthogonal, so I don't expect a problem. I have a git repo here: http://git.postgresql.org/gitweb?p=users/jdavis/postgres.git;a=shortlog;h=refs/heads/operator-exclusion-constraints which may be helpful if you just want to look at the commit for deferred constraint checking. Any comments welcome. I'll also take a look at your patch in the next few days. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed dean.a.rash...@googlemail.com wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. If the bitmap becomes lossy, how do you preserve the correct semantics? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Scaling up deferred unique checks and the after trigger queue
2009/10/19 Robert Haas robertmh...@gmail.com: On Mon, Oct 19, 2009 at 12:48 PM, Dean Rasheed dean.a.rash...@googlemail.com wrote: This is a WIP patch to replace the after-trigger queues with TID bitmaps to prevent them from using excessive amounts of memory. Each round of trigger executions is a modified bitmap heap scan. If the bitmap becomes lossy, how do you preserve the correct semantics? ...Robert The idea is that it filters by the transaction ID and command ID of modified rows to see what's been updated in the command(s) the trigger is for... - Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers