Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-18 Thread Merlin Moncure
On Tue, Jul 3, 2012 at 10:22 AM, Stanislaw Pankevich
s.pankev...@gmail.com wrote:
 Hello,

 My question below is an almost exact copy of the one on SO:
 http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way

 The post on SO drew a few answers, all saying the same thing: do only
 truncation - it is the fastest.

 I also think my question was partly misunderstood. I would greatly
 appreciate it if you could look into it, as people I can trust on
 performance questions.

 Here is the SO subject, formulating the exact task I want to accomplish:
 this procedure is intended to run before or after each test, to ensure the
 database is clean and its unique identifier columns are reset (the id of
 the first User should be 1, not a number left over from the previous test
 in the suite). Here is the message:

  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
 fastest way to clean each non-empty table and reset unique identifier column
 of empty ones 

 I wonder, what is the fastest way to accomplish this kind of task in
 PostgreSQL. I am interested in the fastest solutions ever possible.

 I found this kind of solution for MySQL; it performs much faster than
 just truncating tables one by one. But anyway, I am interested in the
 fastest solutions for MySQL too. See my result here; of course it is for
 MySQL only: https://github.com/bmabey/database_cleaner/issues/126

 I have the following assumptions:

 I have 30-100 tables. Let them be 30.

 Half of the tables are empty.

 Each non-empty table has, say, no more than 100 rows. By this I mean,
 tables are NOT large.

 I need the option to exclude 2 or 5 or N tables from this
 procedure.

 I cannot! use transactions.

 I need the fastest cleaning strategy for such case working on PostgreSQL
 both 8 and 9.

 I see the following approaches:

 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 2) Check each table for emptiness by a faster method, and if it is
 empty, reset its unique identifier column (the analog of AUTO_INCREMENT in
 MySQL) to its initial state (1), i.e. restore the sequence's last_value (the
 AUTO_INCREMENT analog) back to 1; otherwise run truncate on it.

 I use Ruby code to iterate through all tables, calling code below on each of
 them, I tried to setup SQL code running against each table like:

 DO $$DECLARE r record;
 BEGIN
   somehow_captured = SELECT last_value from #{table}_id_seq
   IF (somehow_captured == 1) THEN
 == restore initial unique identifier column value here ==
   END

  IF (somehow_captured > 1) THEN
 TRUNCATE TABLE #{table};
   END IF;
 END$$;

This didn't work because you can't use variables for table names in
non-dynamic (that is, not executed as a string) statements. You'd probably
want:

EXECUTE 'TRUNCATE TABLE ' || #{table};
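Alternatively, since the tables are already iterated from Ruby, the statement can be assembled client-side and sent as plain SQL, avoiding PL/pgSQL entirely. A minimal sketch under that assumption (the helper name is illustrative, and the double-quote wrapping is a naive stand-in for proper identifier quoting such as PG::Connection#quote_ident):

```ruby
# Build a single TRUNCATE statement covering a list of tables.
# NOTE: the %("...") wrapping is naive illustrative quoting; real code
# should use the driver's identifier quoting to stay injection-safe.
def truncate_sql(tables)
  quoted = tables.map { |t| %("#{t}") }
  "TRUNCATE TABLE #{quoted.join(', ')};"
end

puts truncate_sql(%w[users posts comments])
```

The resulting string is one round-trip regardless of how many tables are being cleaned.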

As to performance, TRUNCATE in postgres (just like mysql) has the nice
property that the speed of truncation is mostly not dependent on table
size: truncating a table with 100 records is not very much faster than
truncating a table with millions of records.  For very small tables,
it might be faster to simply fire off a delete.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-13 Thread Stanislaw Pankevich
If anyone is interested in the strategy I am currently using for
this, see this Ruby-based repo for both MySQL and PostgreSQL:
https://github.com/stanislaw/truncate-vs-count

MySQL: the fastest strategy for cleaning databases is truncation, with the
following modifications:
1) We check whether the table is non-empty, and only then truncate it.
2) If the table is empty, we check whether its AUTO_INCREMENT was changed.
If it was, we do a truncate.

For MySQL, plain truncation is much faster than plain deletion. The only
case where DELETE beats TRUNCATE is on an empty table.
For MySQL, truncation with emptiness checks is much faster than plain
multiple truncation.
For MySQL, deletion with emptiness checks is much faster than a plain
DELETE on each table.

PostgreSQL: the fastest strategy for cleaning databases is deletion,
with the same modifications.

For PostgreSQL, plain deletion is much faster than plain truncation (even multiple).
For PostgreSQL, multiple TRUNCATE with emptiness checks beforehand is slightly
faster than plain multiple TRUNCATE.
For PostgreSQL, deletion with emptiness checks is slightly faster than
plain deletion.
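As a hedged sketch of the "deletion with empty checks" idea described above, with the database interaction stubbed out as a lambda so the selection logic is visible (in real code `has_rows` would issue a `SELECT 1 FROM table LIMIT 1` probe over the connection; all names are illustrative):

```ruby
# Emit a DELETE only for tables that actually contain rows; empty
# tables are skipped entirely, which is where the speedup comes from.
def delete_statements(tables, has_rows)
  tables.select { |t| has_rows.call(t) }.map { |t| "DELETE FROM #{t};" }
end

stmts = delete_statements(%w[users posts logs], ->(t) { t != "logs" })
```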

This is from where it began:
https://github.com/bmabey/database_cleaner/issues/126
This is the result code and long discussion:
https://github.com/bmabey/database_cleaner/issues/126

We have begun collecting user feedback, which confirms that my idea of
checking for empty tables first is right.

Thanks to all participants, especially those who've suggested trying
DELETE as well as optimizing TRUNCATE.

Stanislaw

On Fri, Jul 6, 2012 at 7:06 PM, Steve Crawford
scrawf...@pinpointresearch.com wrote:
 On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:


  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
 fastest way to clean each non-empty table and reset unique identifier column
 of empty ones 

 I wonder, what is the fastest way to accomplish this kind of task in
 PostgreSQL. I am interested in the fastest solutions ever possible.

 It would help if we really understood your use-case. If you want to fully
 reset your database to a known starting state for test runs, why not just
 have a base database initialized exactly as you wish, say test_base, then
 just drop your test database and create the new database from your template:
 drop database test;
 create database test template test_base;

 This should be very fast but it won't allow you to exclude individual
 tables.

 Are you interested in absolute fastest as a mind-game or is there a specific
 use requirement, i.e. how fast is fast enough? This is the basic starting
 point for tuning, hardware selection, etc.

 Truncate should be extremely fast, but on tables as tiny as yours
 the difference may not be visible to an end-user. I just tried a DELETE
 FROM to empty a 10,000-record table and it took 14 milliseconds, so you
 could do your maximum of 100 tables, each containing 10 times your max number
 of records, in less than two seconds.

 Regardless of the method you choose, you need to be sure that nobody is
 accessing the database when you reset it. The drop/create database method
 will, of course, require and enforce that. Truncate requires an exclusive
 lock so it may appear to be very slow if it is waiting to get that lock. And
 even if you don't have locking issues, your reluctance to wrap your reset
 code in transactions means that a client could be updating some table or
 tables whenever the reset script isn't actively working on that same table,
 leading to unexplained, weird test results.

 Cheers,
 Steve




Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-13 Thread Craig Ringer

On 07/13/2012 03:50 PM, Stanislaw Pankevich wrote:

MySQL: the fastest strategy for cleaning databases is truncation, with the
following modifications:
1) We check whether the table is non-empty, and only then truncate it.
2) If the table is empty, we check whether its AUTO_INCREMENT was changed.
If it was, we do a truncate.

For MySQL, plain truncation is much faster than plain deletion.
You're talking about MySQL like it's only one database. Is this with 
MyISAM tables? InnoDB? Something else? I don't see any mention of table 
formats in a very quick skim of the discussion you linked to.


PostgreSQL will /never/ be able to compete with MyISAM on the raw speed of 
small, simple operations. There might be things that can be made faster 
than they are right now, but I really doubt it'll ever surpass MyISAM.


My mental analogy is asking an abseiler, who is busy clipping in and 
testing their gear at the top of a bridge, why they aren't at the bottom 
of the canyon with the BASE jumper yet.


The BASE jumper will always get there faster, but the abseiler will 
always get there alive.


If you're talking about InnoDB or another durable, reliable table 
structure then I'd be interested in the mechanics of what MySQL's 
truncates are doing.


--
Craig Ringer


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
Thanks for the answer.

Please, see my answers below:

On Fri, Jul 6, 2012 at 2:35 PM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/06/2012 07:29 PM, Craig Ringer wrote:

 On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

 I cannot! use transactions.

 Everything in PostgreSQL uses transactions, they are not optional.

 I'm assuming you mean you can't use explicit transaction demarcation, ie
 BEGIN and COMMIT.

Yes, right!

  I need the fastest cleaning strategy for such a case working on PostgreSQL both
 8 and 9.

 Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9.
 Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major
 versions. This is different to most software and IMO pretty damn annoying,
 but that's how it is.

Yes, right! By saying this, I meant queries that are as universal as
possible across different versions.


 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

YES, I know it ;) and I use this option!

 2) Check each table for emptiness by a faster method, and if it is
 empty, reset its unique identifier column (the analog of AUTO_INCREMENT in
 MySQL) to its initial state (1), i.e. restore the sequence's last_value (the
 AUTO_INCREMENT analog) back to 1; otherwise run truncate on it.

 You can examine the value of SELECT last_value FROM the_sequence ;

I tried using last_value, but somehow it was equal to 1 both for a table
with 0 rows and for a table with 1 row, and began to increment only for
rows > 1! This seemed very strange to me, but I confirmed it works this
way by running my test script many times. Because of this, I am
using SELECT currval.
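The behaviour observed here follows from how sequences track state: a freshly created sequence reports last_value = 1 with is_called = false, and after the first nextval() it is still 1 but with is_called = true, so last_value alone cannot distinguish the two. A hedged Ruby sketch of approach 2 with that state made explicit (database access is stubbed; RESTART IDENTITY requires PostgreSQL 8.4+; table and sequence names follow the `#{table}_id_seq` convention used in this thread and are illustrative):

```ruby
# Decide, per table, which statement (if any) to run:
#   row_count > 0        -> table needs a TRUNCATE (which can also reset ids)
#   seq_used (is_called) -> empty table whose sequence advanced: reset it
#   otherwise            -> pristine table: nothing to do
def action_for(table, row_count, seq_used)
  if row_count > 0
    "TRUNCATE TABLE #{table} RESTART IDENTITY;"
  elsif seq_used
    "SELECT setval('#{table}_id_seq', 1, false);"
  end
end
```

The third argument would come from reading both last_value and is_called off the sequence, not last_value alone.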

 that's
 the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'.

 http://www.postgresql.org/docs/9.1/static/functions-sequence.html

 I use Ruby code to iterate through all tables


 If you want to be fast, get rid of iteration. Do it all in one query or a
 couple of simple queries. Minimize the number of round-trips and queries.

 I'll be truly stunned if the fastest way isn't to just TRUNCATE all the
 target tables in a single statement (not iteratively one by one with
 separate TRUNCATEs).


 Oh, also, you can setval(...) a bunch of sequences at once:

 SELECT
   setval('first_seq', 0),
   setval('second_seq', 0),
   setval('third_seq', 0),
   setval('fouth_seq', 0);
 ... etc. You should only need two statements, fast ones, to reset your DB to
 the default state.

Good idea!

Could you please look at my latest results at
https://github.com/stanislaw/truncate-vs-count? I think they are
awesome for a test-oriented context.

In the slower way (resetting ids), I run SELECT currval('#{table}_id_seq');
then check whether it raises an error or returns > 0.

In the faster way, just checking for the number of rows, for each table I do:
at_least_one_row = execute(<<-TR
  SELECT true FROM #{table} LIMIT 1;
TR
)

If there is at least one row, I add this table to the list of
tables_to_truncate.
Finally I run a single multi-table truncate: TRUNCATE tables_to_truncate;
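The whole fast path just described can be sketched end to end, with the LIMIT 1 probe stubbed as a lambda (a real implementation would issue it over the driver connection; names are illustrative):

```ruby
# Collect non-empty tables, then clean them all with one multi-table
# TRUNCATE rather than one statement per table; nil means nothing to do.
def cleanup_sql(tables, has_rows)
  dirty = tables.select { |t| has_rows.call(t) }
  dirty.empty? ? nil : "TRUNCATE TABLE #{dirty.join(', ')};"
end
```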

Thanks,
Stanislaw.



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
On Fri, Jul 6, 2012 at 4:38 PM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/06/2012 07:38 PM, Daniel Farina wrote:

 On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au
 wrote:

 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

 I have seen trivial cases -- in terms of data size -- where
 TRUNCATE is much slower than a full-table DELETE.  The most common use
 case for that is rapid setup/teardown of tests, where it can add up
 quite quickly and in a very big way. This is probably an artifact of the
 speed of one's file system at truncating and/or unlinking everything.

 That makes some sense, actually. DELETEing from a table that has no foreign
 keys, triggers, etc while nothing else is accessing the table is fairly
 cheap and doesn't take much (any?) cleanup work afterwards. For tiny deletes
 I can easily see it being better than forcing the OS to journal a metadata
 change or two and a couple of fsync()s for a truncate.

 I suspect truncating many tables at once will prove a win over iteratively
 DELETEing from many tables at once. I'd benchmark it except that it's
 optimizing something I don't care about at all, and the results would be
 massively dependent on the file system (ext3, ext4, xfs) and its journal
 configuration.

Question:
Is there a way in PostgreSQL to do DELETE on many tables
massively, like TRUNCATE allows? Something like DELETE table1, table2, ...?



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
On Fri, Jul 6, 2012 at 4:39 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
 Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
  fastest way to clean each
  non-empty table and reset unique identifier column of empty ones 

 I wonder, what is the fastest way to accomplish this kind of task in 
 PostgreSQL. I am interested in
 the fastest solutions ever possible.

 I have following assumptions:

 I have 30-100 tables. Let them be 30.

 Half of the tables are empty.

 Each non-empty table has, say, no more than 100 rows. By this I mean, 
 tables are NOT large.

 I need an optional possibility to exclude 2 or 5 or N tables from this 
 procedure.

 I cannot! use transactions.

 Why? That would definitely speed up everything.
It is because of the specifics of the Ruby/Rails testing environment:
tests run against a webdriver, which uses its own connection, separate
from the one the test suite itself uses. Transactions are
great, but not for all cases.

 I need the fastest cleaning strategy for such case working on PostgreSQL 
 both 8 and 9.

 I see the following approaches:

 1) Truncate each table. It is too slow, I think, especially for empty tables.

 Did you actually try it? That's the king's way to performance questions!
 Truncating a single table is done in a matter of microseconds, particularly
 if it is not big.
 Do you have tens of thousands of tables?

Actually, 10-100 tables.

  2) Check each table for emptiness by a faster method, and if it is 
  empty, reset its unique identifier column (the analog of AUTO_INCREMENT 
  in MySQL) to its initial state (1), i.e. restore the sequence's 
  last_value (the AUTO_INCREMENT analog) back to 1; otherwise run truncate on it.

 That seems fragile and won't work everywhere.

 What if the table has no primary key with a DEFAULT that uses a sequence?
 What if it has such a key, but the DEFAULT was not used for an INSERT?
 What if somebody manually reset the sequence?

I'm using currval in my latest code.

 Besides, how do you find out what the sequence for a table's primary key
 is? With a SELECT, I guess. That SELECT is probably not faster than
 a simple TRUNCATE.

  Also my guess was that EXISTS(SELECT something FROM table) could somehow 
  be used as one of the check units the cleaning procedure should consist 
  of, but I haven't accomplished that either.

 You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
 think that this will be considerably faster than just truncating the table.

Exactly this query is much faster, believe me. You can see my latest
results on https://github.com/stanislaw/truncate-vs-count.

  I need all this to run unit and integration tests for Ruby or Ruby on Rails 
  projects. Each test should have a clean DB before it runs, or do a cleanup 
  after itself (a so-called teardown). Transactions are very good, but they 
  become unusable when running tests against particular webdrivers; in my 
  case the switch to the truncation strategy is needed. Now that I have 
  mentioned RoR, please do not post answers like "Obviously, you need 
  DatabaseCleaner for PG" and so on and so on.

 I completely fail to understand what you talk about here.
Yes, I know it is very Ruby and Ruby on Rails specific. But I tried to
make my question clear and abstract enough to be understandable
without the context it was originally drawn from.

Thanks.



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
On Fri, Jul 6, 2012 at 5:22 PM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote:

 Question: Is there a possibility in PostgreSQL to do DELETE on many tables
 massively, like TRUNCATE allows. Like DELETE table1, table2, ...?


 Yes, you can do it with a writable common table expression, but you wanted
 version portability.

 WITH
   discard1 AS (DELETE FROM test1),
   discard2 AS (DELETE FROM test2 AS b)
 SELECT 1;

 Not only will this not work in older versions (IIRC it only works with 9.1,
 maybe 9.0 too but I don't see it in the documentation for SELECT for 9.0)
 but I find it hard to imagine any performance benefit over simply sending

   DELETE FROM test1; DELETE FROM test2;

 This all smells like premature optimisation of cases that don't matter. What
 problem are you solving with this?
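For completeness, the writable-CTE form Craig shows can be generated for an arbitrary table list as below (PostgreSQL 9.1+ only; as he notes, plain sequential DELETEs are likely just as fast, so this is a sketch of the syntax rather than a recommendation; CTE and table names are illustrative):

```ruby
# Wrap one DELETE per table in named CTEs so the whole batch is a
# single statement, and therefore a single round-trip.
def batch_delete_sql(tables)
  ctes = tables.each_with_index.map { |t, i| "d#{i} AS (DELETE FROM #{t})" }
  "WITH #{ctes.join(', ')} SELECT 1;"
end
```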

I will write tests for both massive TRUNCATE and DELETE (a DELETE for
each table) for my case in the Ruby testing environment, and let you
know about the results. For now, I think I should go with massive
TRUNCATE.



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
Interesting catch; I will try to test the behavior of DELETE vs.
multiple TRUNCATE.

I'll post here if I discover any amazing results.

On Fri, Jul 6, 2012 at 2:38 PM, Daniel Farina dan...@heroku.com wrote:
 On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

 I have seen trivial cases -- in terms of data size -- where
 TRUNCATE is much slower than a full-table DELETE.  The most common use
 case for that is rapid setup/teardown of tests, where it can add up
 quite quickly and in a very big way. This is probably an artifact of the
 speed of one's file system at truncating and/or unlinking everything.

 I haven't tried a multi-truncate though.  Still, I don't know a
 mechanism besides slow file system truncation time that would explain
 why DELETE would be significantly faster.

 --
 fdr



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-09 Thread Stanislaw Pankevich
Marc, thanks for the answer.

No, these don't seem universal enough, or easy enough to hook into the
existing truncation strategies used in the Ruby world.

On Fri, Jul 6, 2012 at 6:24 PM, Marc Mamin m.ma...@intershop.de wrote:



 Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 

 Hello,

 2 'exotic' ideas:

 - use dblink_send_query to do the job in multiple threads (I doubt this
 really could be faster)
 - have prepared empty tables in a separate schema, and a garbage schema:

    ALTER TABLE x SET SCHEMA garbage;
    ALTER TABLE prepared.x SET SCHEMA current;

 you should then be ready for the next test,

 but you still have to clean the garbage and move tables back to prepared
 for the next-but-one test, in the background.

 best regards,

 Marc Mamin



[PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Stanislaw Pankevich
Hello,

My question below is an almost exact copy of the one on SO:
http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way

The post on SO drew a few answers, all saying the same thing: do only
truncation - it is the fastest.

I also think my question was partly misunderstood. I would greatly
appreciate it if you could look into it, as people I can trust on
performance questions.

Here is the SO subject, formulating the exact task I want to accomplish:
this procedure is intended to run before or after each test, to ensure the
database is clean and its unique identifier columns are reset (the id of
the first User should be 1, not a number left over from the previous test
in the suite). Here is the message:

 PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
fastest way to clean each non-empty table and reset unique identifier
column of empty ones 

I wonder, what is the fastest way to accomplish this kind of task in
PostgreSQL. I am interested in the fastest solutions ever possible.

I found this kind of solution for MySQL; it performs much faster
than just truncating tables one by one. But anyway, I am interested in
the fastest solutions for MySQL too. See my result here; of course it is
for MySQL only: https://github.com/bmabey/database_cleaner/issues/126

I have the following assumptions:

I have 30-100 tables. Let them be 30.

Half of the tables are empty.

Each non-empty table has, say, no more than 100 rows. By this I mean,
tables are NOT large.

I need the option to exclude 2 or 5 or N tables from this
procedure.

I cannot! use transactions.

I need the fastest cleaning strategy for such case working on PostgreSQL
both 8 and 9.

I see the following approaches:

1) Truncate each table. It is too slow, I think, especially for empty
tables.

2) Check each table for emptiness by a faster method, and if it is
empty, reset its unique identifier column (the analog of AUTO_INCREMENT in
MySQL) to its initial state (1), i.e. restore the sequence's last_value
(the AUTO_INCREMENT analog) back to 1; otherwise run truncate on it.

I use Ruby code to iterate through all tables, calling code below on each
of them, I tried to setup SQL code running against each table like:

DO $$DECLARE r record;
BEGIN
  somehow_captured = SELECT last_value from #{table}_id_seq
  IF (somehow_captured == 1) THEN
== restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
TRUNCATE TABLE #{table};
  END IF;
END$$;

Manipulating this code in various ways, I couldn't make it work, because
I am unfamiliar with PostgreSQL functions and blocks (and variables).

Also my guess was that EXISTS(SELECT something FROM table) could somehow
be used as one of the check units the cleaning procedure should consist
of, but I haven't accomplished that either.

I would appreciate any hints on how this procedure could be accomplished in
PostgreSQL native way.

Thanks!

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails
projects. Each test should have a clean DB before it runs, or do a
cleanup after itself (a so-called teardown). Transactions are very good, but
they become unusable when running tests against particular webdrivers; in
my case the switch to the truncation strategy is needed. Now that I have
mentioned RoR, please do not post answers like "Obviously, you need
DatabaseCleaner for PG" and so on and so on.

 post ends 

Thanks,

Stanislaw.


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

I cannot! use transactions.

Everything in PostgreSQL uses transactions, they are not optional.

I'm assuming you mean you can't use explicit transaction demarcation, ie 
BEGIN and COMMIT.


I need the fastest cleaning strategy for such a case working on 
PostgreSQL both 8 and 9.
Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 
9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct 
major versions. This is different to most software and IMO pretty damn 
annoying, but that's how it is.




1) Truncate each table. It is too slow, I think, especially for empty 
tables.

Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;



2) Check each table for emptiness by a faster method, and if 
it is empty, reset its unique identifier column (the analog of 
AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the 
sequence's last_value (the AUTO_INCREMENT analog) back to 1; 
otherwise run truncate on it.
You can examine the value of SELECT last_value FROM the_sequence ; 
that's the equivalent of the MySQL hack you're using. To set it, use 
'setval(...)'.


http://www.postgresql.org/docs/9.1/static/functions-sequence.html


I use Ruby code to iterate through all tables


If you want to be fast, get rid of iteration. Do it all in one query or 
a couple of simple queries. Minimize the number of round-trips and queries.


I'll be truly stunned if the fastest way isn't to just TRUNCATE all the 
target tables in a single statement (not iteratively one by one with 
separate TRUNCATEs).


--
Craig Ringer


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/06/2012 07:29 PM, Craig Ringer wrote:

On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

I cannot! use transactions.

Everything in PostgreSQL uses transactions, they are not optional.

I'm assuming you mean you can't use explicit transaction demarcation, 
ie BEGIN and COMMIT.


I need the fastest cleaning strategy for such a case working on 
PostgreSQL both 8 and 9.
Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 
9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct 
major versions. This is different to most software and IMO pretty damn 
annoying, but that's how it is.




1) Truncate each table. It is too slow, I think, especially for empty 
tables.

Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;



2) Check each table for emptiness by a faster method, and if 
it is empty, reset its unique identifier column (the analog of 
AUTO_INCREMENT in MySQL) to its initial state (1), i.e. restore the 
sequence's last_value (the AUTO_INCREMENT analog) back to 1, 
otherwise run truncate on it.
You can examine the value of SELECT last_value FROM the_sequence ; 
that's the equivalent of the MySQL hack you're using. To set it, use 
'setval(...)'.


http://www.postgresql.org/docs/9.1/static/functions-sequence.html


I use Ruby code to iterate through all tables


If you want to be fast, get rid of iteration. Do it all in one query 
or a couple of simple queries. Minimize the number of round-trips and 
queries.


I'll be truly stunned if the fastest way isn't to just TRUNCATE all 
the target tables in a single statement (not iteratively one by one 
with separate TRUNCATEs).


Oh, also, you can setval(...) a bunch of sequences at once:

SELECT
  setval('first_seq', 0),
  setval('second_seq', 0),
  setval('third_seq', 0),
  setval('fouth_seq', 0);

... etc. You should only need two statements, fast ones, to reset your 
DB to the default state.
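One caveat worth hedging on here: with a sequence at its default MINVALUE of 1, setval('seq', 0) raises an error; the reliable reset is setval('seq', 1, false), which makes the next nextval() return 1. A small Ruby sketch that generates the one-statement batch reset in that form (sequence names are illustrative):

```ruby
# Generate a single SELECT that resets every listed sequence, so all
# resets happen in one round-trip. setval(name, 1, false) leaves the
# sequence as if it had never been used.
def reset_sequences_sql(seqs)
  calls = seqs.map { |s| "setval('#{s}', 1, false)" }
  "SELECT #{calls.join(', ')};"
end
```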


--
Craig Ringer


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Daniel Farina
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

I have seen trivial cases -- in terms of data size -- where
TRUNCATE is much slower than a full-table DELETE.  The most common use
case for that is rapid setup/teardown of tests, where it can add up
quite quickly and in a very big way. This is probably an artifact of the
speed of one's file system at truncating and/or unlinking everything.

I haven't tried a multi-truncate though.  Still, I don't know a
mechanism besides slow file system truncation time that would explain
why DELETE would be significantly faster.

-- 
fdr



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/06/2012 07:38 PM, Daniel Farina wrote:

On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:

1) Truncate each table. It is too slow, I think, especially for empty
tables.

Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;

I have seen in trivial cases -- in terms of data size -- where
TRUNCATE is much slower than a full-table DELETE.  The most common use
case for that is rapid setup/teardown of tests, where it can add up
quite quickly and in a very big way. This is probably an artifact of how
fast one's file system can truncate and/or unlink everything.
That makes some sense, actually. DELETEing from a table that has no 
foreign keys, triggers, etc while nothing else is accessing the table is 
fairly cheap and doesn't take much (any?) cleanup work afterwards. For 
tiny deletes I can easily see it being better than forcing the OS to 
journal a metadata change or two and a couple of fsync()s for a truncate.


I suspect truncating many tables at once will prove a win over 
iteratively DELETEing from many tables at once. I'd benchmark it except 
that it's optimizing something I don't care about at all, and the 
results would be massively dependent on the file system (ext3, ext4, 
xfs) and its journal configuration.


--
Craig Ringer



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Albe Laurenz
Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 
 
 I wonder, what is the fastest way to accomplish this kind of task in 
 PostgreSQL. I am interested in
 the fastest solutions ever possible.

 I have following assumptions:
 
 I have 30-100 tables. Let them be 30.
 
 Half of the tables are empty.
 
 Each non-empty table has, say, no more than 100 rows. By this I mean, 
 tables are NOT large.
 
 I need an optional possibility to exclude 2 or 5 or N tables from this 
 procedure.
 
 I cannot! use transactions.

Why? That would definitely speed up everything.

 I need the fastest cleaning strategy for such case working on PostgreSQL both 
 8 and 9.
 
 I see the following approaches:
 
 1) Truncate each table. It is too slow, I think, especially for empty tables.

Did you actually try it? That's the king's way to performance questions!
Truncating a single table is done in a matter of microseconds, particularly
if it is not big.
Do you have tens of thousands of tables?

 2) Check each table for emptiness by more faster method, and then if it is 
 empty reset its unique
 identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), 
 i.e to restore its
 last_value from sequence (the same AUTO_INCREMENT analog) back to 1, 
 otherwise run truncate on it.

That seems fragile and won't work everywhere.

What if the table has no primary key with a DEFAULT that uses a sequence?
What if it has such a key, but the DEFAULT was not used for an INSERT?
What if somebody manually reset the sequence?

Besides, how do you find out what the sequence for a table's primary key
is? With a SELECT, I guess. That SELECT is probably not faster than
a simple TRUNCATE.
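
For reference, one way to do that lookup, assuming a serial-style column
named id (the table name is illustrative):

-- returns e.g. 'public.users_id_seq', or NULL if the column
-- has no owned sequence
SELECT pg_get_serial_sequence('users', 'id');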

 Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be 
 used to work good as one
 of the check procedure units, cleaning procedure should consist of, but 
 haven't accomplished it too.

You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
think that this will be considerably faster than just truncating the table.
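
The check being discussed would look something like this (table name
illustrative); it stops at the first live row it finds, so it is cheap
even on large tables:

SELECT EXISTS (SELECT 1 FROM users LIMIT 1);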

 I would appreciate any hints on how this procedure could be accomplished in 
 PostgreSQL native way.
 
 Thanks!
 
 UPDATE:
 
 I need all this to run unit and integration tests for Ruby or Ruby on Rails 
 projects. Each test should
 have a clean DB before it runs, or to do a cleanup after itself (so called 
 teardown). Transactions are
 very good, but they become unusable when running tests against particular 
 webdrivers, in my case the
 switch to truncation strategy is needed. Once I updated that with reference 
 to RoR, please do not post
 here the answers about Obviously, you need DatabaseCleaner for PG and so on 
 and so on.

I completely fail to understand what you talk about here.

Yours,
Laurenz Albe



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Andres Freund
On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote:
 ll, I don't know a
 mechanism besides slow file system truncation time that would explain
 why DELETE would be significantly faster.
There is no filesystem truncation happening. The heap and the indexes get 
mapped into a new file. Otherwise rollback would be pretty hard to implement.

I guess the biggest cost in a bigger cluster is dropping the buffers that 
were formerly mapped to that relation (DropRelFileNodeBuffers).

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote:

Question: Is there a possibility in PostgreSQL to do DELETE on many 
tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? 


Yes, you can do it with a writable common table expression, but you 
wanted version portability.


WITH
  discard1 AS (DELETE FROM test1),
  discard2 AS (DELETE FROM test2 AS b)
SELECT 1;

Not only will this not work in older versions (IIRC it only works with 
9.1, maybe 9.0 too but I don't see it in the documentation for SELECT 
for 9.0) but I find it hard to imagine any performance benefit over 
simply sending


  DELETE FROM test1; DELETE FROM test2;

This all smells like premature optimisation of cases that don't matter. 
What problem are you solving with this?


--
Craig Ringer



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Albe Laurenz
Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 

 I wonder, what is the fastest way to accomplish this kind of task in 
 PostgreSQL. I am interested in
 the fastest solutions ever possible.

 I need the fastest cleaning strategy for such case working on PostgreSQL 
 both 8 and 9.

 I see the following approaches:

 1) Truncate each table. It is too slow, I think, especially for empty 
 tables.

 Did you actually try it? That's the king's way to performance questions!
 Truncating a single table is done in a matter of microseconds, particularly
 if it is not big.
 Do you have tens of thousands of tables?

 Actually, 10-100 tables.

 You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
 think that this will be considerably faster than just truncating the table.
 
 Exactly this query is much faster, believe me. You can see my latest
 results on https://github.com/stanislaw/truncate-vs-count.

Ok, I believe you.

My quick tests showed that a single truncate (including transaction and
client-server roundtrip via UNIX sockets) takes some 10 to 30 milliseconds.

Multiply that with 100, and you end up with just a few seconds at most.
Or what did you measure?

I guess you run that deletion very often so that it is painful.

Still I think that the biggest performance gain is to be had by using
PostgreSQL's features (truncate several tables in one statement, ...).

Try to bend your Ruby framework!

Yours,
Laurenz Albe



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Marc Mamin



Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 

Hello, 

2 'exotic' ideas:

- use dblink_send_query to do the job in multiple threads (I doubt this really 
could be faster)
- have prepared empty tables in a separate schema, and a garbage schema:

   ALTER TABLE x set schema garbage;
   ALTER TABLE prepared.x set schema current;

you should be ready for the next test, 

but you still have to truncate the garbage tables and move them back to 
prepared for the next-but-one test, in the background
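
A hedged sketch of the full swap cycle (schema and table names are
illustrative):

-- swap the used table out and a pre-built empty one in
ALTER TABLE public.x SET SCHEMA garbage;
ALTER TABLE prepared.x SET SCHEMA public;

-- later, in the background, recycle the discarded table
TRUNCATE TABLE garbage.x;
ALTER TABLE garbage.x SET SCHEMA prepared;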

best regards,

Marc Mamin










Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Jeff Janes
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

  1) Truncate each table. It is too slow, I think, especially for empty
  tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

This still calls DropRelFileNodeAllBuffers once for each table (and
each index), even if the table is empty.

With large shared_buffers, this can be relatively slow.

Cheers,

Jeff



Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Steve Crawford

On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:


 PostgreSQL db, 30 tables with number of rows < 100 (not huge) - 
the fastest way to clean each non-empty table and reset unique 
identifier column of empty ones 


I wonder, what is the fastest way to accomplish this kind of task in 
PostgreSQL. I am interested in the fastest solutions ever possible.


It would help if we really understood your use-case. If you want to 
fully reset your database to a known starting state for test runs, why 
not just have a base database initialized exactly as you wish, say 
test_base, then just drop your test database and create the new 
database from your template:

drop database test;
create database test template test_base;

This should be very fast but it won't allow you to exclude individual 
tables.


Are you interested in absolute fastest as a mind-game or is there a 
specific use requirement, i.e. how fast is fast enough? This is the 
basic starting point for tuning, hardware selection, etc.


Truncate should be extremely fast but on tables that are as tiny as 
yours the difference may not be visible to an end-user. I just tried a 
DELETE FROM to empty a 10,000-record table and it took 14 milliseconds, 
so you could clear your maximum of 100 tables, each containing 10 times 
your maximum number of records, in less than two seconds.


Regardless of the method you choose, you need to be sure that nobody is 
accessing the database when you reset it. The drop/create database 
method will, of course, require and enforce that. Truncate requires an 
exclusive lock so it may appear to be very slow if it is waiting to get 
that lock. And even if you don't have locking issues, your reluctance to 
wrap your reset code in transactions means that a client could be 
updating some table or tables whenever the reset script isn't actively 
working on that same table, leading to unexplained weird test results.


Cheers,
Steve

