Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Tue, Jul 3, 2012 at 10:22 AM, Stanislaw Pankevich s.pankev...@gmail.com wrote: Hello, My question below is almost exact copy of the on on SO: http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way The post on SO caused a few answers, all as one stating DO ONLY TRUNCATION - this is the fast. Also I think I've met some amount of misunderstanding of what exactly do I want. I would appreciate it great, if you try, as people whom I may trust in performance question. Here goes the SO subject, formulating exact task I want to accomplish, this procedure is intended to be run beetween after or before each test, ensure database is cleaned enough and has reset unique identifiers column (User.id of the first User should be nor the number left from previous test in a test suite but 1). Here goes the message: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126 I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like: DO $$DECLARE r record; BEGIN somehow_captured = SELECT last_value from #{table}_id_seq IF (somehow_captured == 1) THEN == restore initial unique identifier column value here == END IF (somehow_captured 1) THEN TRUNCATE TABLE #{table}; END IF; END$$; This didn't work because you can't use variables for table names in non-dynamic (that is, executed as a string) statements. You'd probably want: EXECUTE 'TRUNCATE TABLE ' || #{table}; As to performance, TRUNCATE in postgres (just like mysql) has the nice property that the speed of truncation is mostly not dependent on table size: truncating a table with 100 records is not very much faster than truncating a table with millions of records. For very small tables, it might be faster to simply fire off a delete. merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
If someone is interested with the current strategy, I am using for this, see this Ruby-based repo https://github.com/stanislaw/truncate-vs-count for both MySQL and PostgreSQL. MySQL: the fastest strategy for cleaning databases is truncation with following modifications: 1) We check is table is not empty and then truncate. 2) If table is empty, we check if AUTO_INCREMENT was changed. If it was, we do a truncate. For MySQL just truncation is much faster than just deletion. The only case where DELETE wins TRUNCATE is doing it on empty table. For MySQL truncation with empty checks is much faster than just multiple truncation. For MySQL deletion with empty checks is much faster than just DELETE on each tables. PostgreSQL: The fastest strategy for cleaning databases is deletion with the same modifications. For PostgreSQL just deletion is much faster than just TRUNCATION(even multiple). For PostgreSQL multiple TRUNCATE doing empty checks before is slightly faster than just multiple TRUNCATE For PostgreSQL deletion with empty checks is slightly faster than just PostgreSQL deletion. This is from where it began: https://github.com/bmabey/database_cleaner/issues/126 This is the result code and long discussion: https://github.com/bmabey/database_cleaner/issues/126 We began collecting users feedback proving my idea with first checking empty tables is right. Thanks to all participants, especially those who've suggested trying DELETE as well as optimizing TRUNCATE. Stanislaw On Fri, Jul 6, 2012 at 7:06 PM, Steve Crawford scrawf...@pinpointresearch.com wrote: On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. It would help if we really understood your use-case. If you want to fully reset your database to a known starting state for test runs, why not just have a base database initialized exactly as you wish, say test_base, then just drop your test database and create the new database from your template: drop database test; create database test template test_base; This should be very fast but it won't allow you to exclude individual tables. Are you interested in absolute fastest as a mind-game or is there a specific use requirement, i.e. how fast is fast enough? This is the basic starting point for tuning, hardware selection, etc. Truncate should be extremely fast but on tables that are as tiny as yours the difference may not be visible to an end-user. I just tried a delete from to empty a 10,000 record table and it took 14 milliseconds so you could do your maximum of 100 tables each containing 10-times your max number of records in less than two seconds. Regardless of the method you choose, you need to be sure that nobody is accessing the database when you reset it. The drop/create database method will, of course, require and enforce that. Truncate requires an exclusive lock so it may appear to be very slow if it is waiting to get that lock. And even if you don't have locking issues, your reluctance to wrap your reset code in transactions means that a client could be updating some table or tables whenever the reset script isn't actively working on that same table leading to unexplained weird test results. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/13/2012 03:50 PM, Stanislaw Pankevich wrote: MySQL: the fastest strategy for cleaning databases is truncation with following modifications: 1) We check is table is not empty and then truncate. 2) If table is empty, we check if AUTO_INCREMENT was changed. If it was, we do a truncate. For MySQL just truncation is much faster than just deletion. You're talking about MySQL like it's only one database. Is this with MyISAM tables? InnoDB? Something else? I don't see any mention of table formats in a very quick skim of the discussion you linked to. PostgreSQL will /never/ be able to compete with MyISAM on raw speed of small, simple operations. There might things that can be made faster than they are right now, but I really doubt it'll ever surpass MyISAM. My mental analogy is asking an abseiler, who is busy clipping in and testing their gear at the top of a bridge, why they aren't at the bottom of the canyon with the BASE jumper yet. The BASE jumper will always get there faster, but the abseiler will always get there alive. If you're talking about InnoDB or another durable, reliable table structure then I'd be interested in the mechanics of what MySQL's truncates are doing. -- Craig Ringer
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Thanks for the answer. Please, see my answers below: On Fri, Jul 6, 2012 at 2:35 PM, Craig Ringer ring...@ringerc.id.au wrote: On 07/06/2012 07:29 PM, Craig Ringer wrote: On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: I cannot! use transactions. Everything in PostgreSQL uses transactions, they are not optional. I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT. Yes, right! need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is. Yes, right! I've meant queries as much universal across different versions as possible by saying this. 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; YES, I know it ;) and I use this option! 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. You can examine the value of SELECT last_value FROM the_sequence ; I tried using last_value, but somehow, it was equal 1, for table with 0 rows, and for table with 1 rows, and began to increment only after rows 1! This seemed very strange to me, but I ensured it working this way by many times running my test script. Because of this, I am using SELECT currval. that's the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'. http://www.postgresql.org/docs/9.1/static/functions-sequence.html I use Ruby code to iterate through all tables If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries. I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs). Oh, also, you can setval(...) a bunch of sequences at once: SELECT setval('first_seq', 0), setval('second_seq', 0), setval('third_seq', 0), setval('fouth_seq', 0); ... etc. You should only need two statements, fast ones, to reset your DB to the default state. Good idea! Could please look at my latest results at https://github.com/stanislaw/truncate-vs-count? I think they are awesome for test oriented context. In slower way, resetting ids I do SELECT currval('#{table}_id_seq'); then check whether it raises an error or 0. In a faster way, just checking for a number of rows, for each table I do: at_least_one_row = execute(-TR SELECT true FROM #{table} LIMIT 1; TR ) If there is at least one row, I add this table to the list of tables_to_truncate. Finally I run multiple truncate: TRUNCATE tables_to_truncate; Thanks, Stanislaw. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:38 PM, Craig Ringer ring...@ringerc.id.au wrote: On 07/06/2012 07:38 PM, Daniel Farina wrote: On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. That makes some sense, actually. DELETEing from a table that has no foreign keys, triggers, etc while nothing else is accessing the table is fairly cheap and doesn't take much (any?) cleanup work afterwards. For tiny deletes I can easily see it being better than forcing the OS to journal a metadata change or two and a couple of fsync()s for a truncate. I suspect truncating many tables at once will prove a win over iteratively DELETEing from many tables at once. I'd benchmark it except that it's optimizing something I don't care about at all, and the results would be massively dependent on the file system (ext3, ext4, xfs) and its journal configuration. Question: Is there a possibility in PostgreSQL to do DELETE on many tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:39 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote: Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. Why? That would definitely speed up everything. It is because of specifics of Ruby or the Rails testing environment, when running tests again webdriver, which uses its own connection separate from one, which test suite itself uses. Transactions are great, but not for all cases. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? Actually, 10-100 tables. 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. That seems fragile an won't work everywhere. What if the table has no primary key with a DEFAULT that uses a sequence? What if it has such a key, but the DEFAULT was not used for an INSERT? What if somebody manually reset the sequence? I'm using currval in my latest code. Besides, how do you find out what the sequence for a table's primary key is? With a SELECT, I guess. That SELECT is probably not faster than a simple TRUNCATE. Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the check procedure units, cleaning procedure should consist of, but haven't accomplished it too. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. Exactly this query is much faster, believe me. You can see my latest results on https://github.com/stanislaw/truncate-vs-count. I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about Obviously, you need DatabaseCleaner for PG and so on and so on. I completely fail to understand what you talk about here. Yes, I know it is very Ruby and Ruby on Rails specific. But I tried to make my question clear and abstract enough, to be understandable without the context it was originally drawn from. Thanks. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 5:22 PM, Craig Ringer ring...@ringerc.id.au wrote: On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote: Question: Is there a possibility in PostgreSQL to do DELETE on many tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? Yes, you can do it with a writable common table expression, but you wanted version portability. WITH discard1 AS (DELETE FROM test1), discard2 AS (DELETE FROM test2 AS b) SELECT 1; Not only will this not work in older versions (IIRC it only works with 9.1, maybe 9.0 too but I don't see it in the documentation for SELECT for 9.0) but I find it hard to imagine any performance benefit over simply sending DELETE FROM test1; DELETE FROM test2; This all smells like premature optimisation of cases that don't matter. What problem are you solving with this? I will write tests for both massive TRUNCATE and DELETE (DELETE each_table) for my case with Ruby testing environment, and let you know about the results. For now, I think, I should go for massive TRUNCATE. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Interesting catch, I will try to test the behavior of 'DELETE vs multiple TRUNCATE'. I'll post it here, If I discover any amazing results. On Fri, Jul 6, 2012 at 2:38 PM, Daniel Farina dan...@heroku.com wrote: On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. I haven't tried a multi-truncate though. Still, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Marc, thanks for the answer. Na, these seem not to be enough universal and easy to hook into existing truncation strategies used in Ruby world. On Fri, Jul 6, 2012 at 6:24 PM, Marc Mamin m.ma...@intershop.de wrote: Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones Hello, 2 'exotic' ideas: - use dblink_send_query to do the job in multiple threads (I doubt this really could be faster) - have prepared empty tables in a separate schema, and a garbage schema: ALTER TABLE x set schema garbage; ALTER TABLE prepared.x set schema current; you should be ready for the next test, but still have to clean garbage nad moved to prepared for the next but one in the background best regards, Marc Mamin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Hello, My question below is almost exact copy of the on on SO: http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way The post on SO caused a few answers, all as one stating DO ONLY TRUNCATION - this is the fast. Also I think I've met some amount of misunderstanding of what exactly do I want. I would appreciate it great, if you try, as people whom I may trust in performance question. Here goes the SO subject, formulating exact task I want to accomplish, this procedure is intended to be run beetween after or before each test, ensure database is cleaned enough and has reset unique identifiers column (User.id of the first User should be nor the number left from previous test in a test suite but 1). Here goes the message: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126 I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like: DO $$DECLARE r record; BEGIN somehow_captured = SELECT last_value from #{table}_id_seq IF (somehow_captured == 1) THEN == restore initial unique identifier column value here == END IF (somehow_captured 1) THEN TRUNCATE TABLE #{table}; END IF; END$$; Manipulating this code in various aspects, I couldn't make it work, because of I am unfamiliar with PostgreSQL functions and blocks (and variables). Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the check procedure units, cleaning procedure should consist of, but haven't accomplished it too. I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way. Thanks! UPDATE: I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about Obviously, you need DatabaseCleaner for PG and so on and so on. post ends Thanks, Stanislaw.
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: I cannot! use transactions. Everything in PostgreSQL uses transactions, they are not optional. I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT. need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is. 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. You can examine the value of SELECT last_value FROM the_sequence ; that's the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'. http://www.postgresql.org/docs/9.1/static/functions-sequence.html I use Ruby code to iterate through all tables If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries. I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs). -- Craig Ringer
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 07:29 PM, Craig Ringer wrote: On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: I cannot! use transactions. Everything in PostgreSQL uses transactions, they are not optional. I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT. need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is. 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. You can examine the value of SELECT last_value FROM the_sequence ; that's the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'. http://www.postgresql.org/docs/9.1/static/functions-sequence.html I use Ruby code to iterate through all tables If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries. I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs). Oh, also, you can setval(...) a bunch of sequences at once: SELECT setval('first_seq', 0), setval('second_seq', 0), setval('third_seq', 0), setval('fouth_seq', 0); ... etc. You should only need two statements, fast ones, to reset your DB to the default state. -- Craig Ringer
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. I haven't tried a multi-truncate though. Still, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 07:38 PM, Daniel Farina wrote: On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. That makes some sense, actually. DELETEing from a table that has no foreign keys, triggers, etc while nothing else is accessing the table is fairly cheap and doesn't take much (any?) cleanup work afterwards. For tiny deletes I can easily see it being better than forcing the OS to journal a metadata change or two and a couple of fsync()s for a truncate. I suspect truncating many tables at once will prove a win over iteratively DELETEing from many tables at once. I'd benchmark it except that it's optimizing something I don't care about at all, and the results would be massively dependent on the file system (ext3, ext4, xfs) and its journal configuration. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. Why? That would definitely speed up everything. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. That seems fragile an won't work everywhere. What if the table has no primary key with a DEFAULT that uses a sequence? What if it has such a key, but the DEFAULT was not used for an INSERT? What if somebody manually reset the sequence? Besides, how do you find out what the sequence for a table's primary key is? With a SELECT, I guess. That SELECT is probably not faster than a simple TRUNCATE. Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the check procedure units, cleaning procedure should consist of, but haven't accomplished it too. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way. Thanks! UPDATE: I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about Obviously, you need DatabaseCleaner for PG and so on and so on. I completely fail to understand what you talk about here. Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote: ll, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. There is no filesystem truncation happening. The heap and the indexes get mapped into a new file. Otherwise rollback would be pretty hard to implement. I guess the biggest cost in a bigger cluster is the dropping the buffers that were formerly mapped to that relation (DropRelFileNodeBuffers). Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote: Question: Is there a possibility in PostgreSQL to do DELETE on many tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? Yes, you can do it with a writable common table expression, but you wanted version portability. WITH discard1 AS (DELETE FROM test1), discard2 AS (DELETE FROM test2 AS b) SELECT 1; Not only will this not work in older versions (IIRC it only works with 9.1, maybe 9.0 too but I don't see it in the documentation for SELECT for 9.0) but I find it hard to imagine any performance benefit over simply sending DELETE FROM test1; DELETE FROM test2; This all smells like premature optimisation of cases that don't matter. What problem are you solving with this? -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? Actually, 10-100 tables. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. Exactly this query is much faster, believe me. You can see my latest results on https://github.com/stanislaw/truncate-vs-count. Ok, I believe you. My quick tests showed that a sible truncate (including transaction and client-server roundtrip via UNIX sockets takes some 10 to 30 milliseconds. Multiply that with 100, and you end up with just a few seconds at most. Or what did you measure? I guess you run that deletion very often so that it is painful. Still I think that the biggest performance gain is to be had by using PostgreSQL's features (truncate several tables in one statement, ...). Try to bend your Ruby framework! Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones Hello, 2 'exotic' ideas: - use dblink_send_query to do the job in multiple threads (I doubt this really could be faster) - have prepared empty tables in a separate schema, and a garbage schema: ALTER TABLE x set schema garbage; ALTER TABLE prepared.x set schema current; you should be ready for the next test, but still have to clean garbage nad moved to prepared for the next but one in the background best regards, Marc Mamin I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? Actually, 10-100 tables. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. Exactly this query is much faster, believe me. You can see my latest results on https://github.com/stanislaw/truncate-vs-count. Ok, I believe you. My quick tests showed that a sible truncate (including transaction and client-server roundtrip via UNIX sockets takes some 10 to 30 milliseconds. Multiply that with 100, and you end up with just a few seconds at most. Or what did you measure? I guess you run that deletion very often so that it is painful. Still I think that the biggest performance gain is to be had by using PostgreSQL's features (truncate several tables in one statement, ...). Try to bend your Ruby framework! Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; This still calls DropRelFileNodeAllBuffers once for each table (and each index), even if the table is empty. With large shared_buffers, this can be relatively slow. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. It would help if we really understood your use-case. If you want to fully reset your database to a known starting state for test runs, why not just have a base database initialized exactly as you wish, say test_base, then just drop your test database and create the new database from your template: drop database test; create database test template test_base; This should be very fast but it won't allow you to exclude individual tables. Are you interested in absolute fastest as a mind-game or is there a specific use requirement, i.e. how fast is fast enough? This is the basic starting point for tuning, hardware selection, etc. Truncate should be extremely fast but on tables that are as tiny as yours the difference may not be visible to an end-user. I just tried a delete from to empty a 10,000 record table and it took 14 milliseconds so you could do your maximum of 100 tables each containing 10-times your max number of records in less than two seconds. Regardless of the method you choose, you need to be sure that nobody is accessing the database when you reset it. The drop/create database method will, of course, require and enforce that. Truncate requires an exclusive lock so it may appear to be very slow if it is waiting to get that lock. And even if you don't have locking issues, your reluctance to wrap your reset code in transactions means that a client could be updating some table or tables whenever the reset script isn't actively working on that same table leading to unexplained weird test results. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance