[PERFORM] select operations that generate disk writes

2012-07-06 Thread CSS
Hello,

Time for a broad question.  I'm aware of some specific select queries that will 
generate disk writes - for example, a sort operation when there's not enough 
work_mem can cause PG to write out some temp tables (not the correct 
terminology?).  That scenario is easily remedied by enabling log_temp_files 
and specifying the threshold in temp file size at which you want logging to 
happen.
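
As a concrete sketch of those two knobs (values purely illustrative; SET
log_temp_files needs superuser and normally lives in postgresql.conf):

  SET work_mem = '64MB';        -- per-session; lets more sorts/hashes fit in memory
  SET log_temp_files = '10MB';  -- log any temp file larger than this threshold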

I've recently been trying to put some of my recent reading of Greg's book and 
other performance-related documentation to use by seeking out queries that take 
an inordinate amount of time to run.  Given that we're usually disk-bound, I've 
gotten in the habit of running an iostat in a terminal while running and 
tweaking some of the problem queries.  I find this gives me some nice instant 
feedback on how hard the query is causing PG to hit the disks.  What's 
currently puzzling me are some selects with complex joins and sorts that 
generate some fairly large bursts of write activity while they run.  I was able 
to reduce this by increasing work_mem (client-side) to give the sorts an 
opportunity to happen in memory.  I now see no temp file writes being logged, 
and indeed the query sped up.

So my question is, what else can generate writes when doing read-only 
operations?  I know it sounds like a simple question, but I'm just not finding 
a concise answer anywhere.

Thanks,

Charles
-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select operations that generate disk writes

2012-07-06 Thread Pavel Stehule
Hello

2012/7/6 CSS c...@morefoo.com:
 Hello,

 Time for a broad question.  I'm aware of some specific select queries that 
 will generate disk writes - for example, a sort operation when there's not 
 enough work_mem can cause PG to write out some temp tables (not the correct 
 terminology?).  That scenario is easily remedied by enabling log_temp_files 
 and specifying the threshold in temp file size at which you want logging to 
 happen.

 I've recently been trying to put some of my recent reading of Greg's book and 
 other performance-related documentation to use by seeking out queries that 
 take an inordinate amount of time to run.  Given that we're usually 
 disk-bound, I've gotten in the habit of running an iostat in a terminal while 
 running and tweaking some of the problem queries.  I find this gives me some 
 nice instant feedback on how hard the query is causing PG to hit the disks.  
 What's currently puzzling me are some selects with complex joins and sorts 
 that generate some fairly large bursts of write activity while they run.  I 
 was able to reduce this by increasing work_mem (client-side) to give the 
 sorts an opportunity to happen in memory.  I now see no temp file writes 
 being logged, and indeed the query sped up.

 So my question is, what else can generate writes when doing read-only 
 operations?  I know it sounds like a simple question, but I'm just not 
 finding a concise answer anywhere.

statistics 
http://www.postgresql.org/docs/9.1/interactive/runtime-config-statistics.html
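
One of the steadier write sources there is the statistics collector's temp
stats file; a quick sketch for checking it (stats_temp_directory exists in
8.4+, and the tmpfs path is only a hypothetical example):

  SHOW stats_temp_directory;
  -- Pointing it at a tmpfs mount in postgresql.conf keeps those writes
  -- off the data disks, e.g.:
  --   stats_temp_directory = '/var/run/pg_stat_tmp'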

Regards

Pavel


 Thanks,

 Charles

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select operations that generate disk writes

2012-07-06 Thread Craig Ringer

On 07/06/2012 02:20 PM, Pavel Stehule wrote:

Hello

2012/7/6 CSS c...@morefoo.com:

So my question is, what else can generate writes when doing read-only 
operations?  I know it sounds like a simple question, but I'm just not finding 
a concise answer anywhere.

statistics 
http://www.postgresql.org/docs/9.1/interactive/runtime-config-statistics.html



 Hint bits, too:

http://wiki.postgresql.org/wiki/Hint_Bits
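
A small illustration of the effect (table name and sizes are hypothetical):
the first reader of freshly committed rows may dirty pages while setting
hint bits, so even a plain SELECT can generate writes; vacuuming after a
bulk load does that work up front.

  CREATE TABLE hint_demo AS
      SELECT g AS id, repeat('x', 100) AS payload
      FROM generate_series(1, 100000) g;

  VACUUM hint_demo;   -- sets hint bits now, instead of the first SELECT doing it

  SELECT count(*) FROM hint_demo;   -- now reads without the write bursts seen in iostat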

--
Craig Ringer


[PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Stanislaw Pankevich
Hello,

My question below is an almost exact copy of the one on SO:
http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way

The post on SO drew a few answers, all of them saying the same thing: just use
TRUNCATE, it is the fastest.

I also think there has been some misunderstanding of what exactly I want. I
would greatly appreciate it if you, as people I trust on performance
questions, would take a look.

Here is the SO subject line, formulating the exact task I want to accomplish:
this procedure is intended to be run before or after each test, ensuring the
database is cleaned and the unique identifier columns are reset (the first
User's User.id should be 1, not whatever number is left over from the previous
test in the suite). Here is the message:

 PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the
fastest way to clean each non-empty table and reset unique identifier
column of empty ones 

I wonder, what is the fastest way to accomplish this kind of task in
PostgreSQL. I am interested in the fastest solutions ever possible.

I came up with such a solution for MySQL; it performs much faster than
truncating the tables one by one (and I remain interested in the fastest
MySQL solutions too). See my result here, of course it is for MySQL only:
https://github.com/bmabey/database_cleaner/issues/126

I have following assumptions:

I have 30-100 tables. Let them be 30.

Half of the tables are empty.

Each non-empty table has, say, no more than 100 rows. By this I mean,
tables are NOT large.

I need an optional possibility to exclude 2 or 5 or N tables from this
procedure.

I cannot! use transactions.

I need the fastest cleaning strategy for such case working on PostgreSQL
both 8 and 9.

I see the following approaches:

1) Truncate each table. It is too slow, I think, especially for empty
tables.

2) Check each table for emptiness by a faster method, and if it is empty,
reset its unique identifier column (the analog of AUTO_INCREMENT in MySQL) to
its initial state (1), i.e. restore the sequence's last_value (the same
AUTO_INCREMENT analog) back to 1; otherwise run TRUNCATE on it.

I use Ruby code to iterate through all the tables, calling the code below on
each of them. I tried to set up SQL code to run against each table like:

DO $$DECLARE r record;
BEGIN
  somehow_captured = SELECT last_value FROM #{table}_id_seq
  IF (somehow_captured == 1) THEN
    == restore initial unique identifier column value here ==
  END

  IF (somehow_captured > 1) THEN
    TRUNCATE TABLE #{table};
  END IF;
END$$;

However I manipulated this code, I couldn't make it work, because I am
unfamiliar with PostgreSQL functions and blocks (and variables).

My guess was also that EXISTS(SELECT something FROM table) could somehow be
used as one of the checks the cleaning procedure should consist of, but I
haven't managed to get that working either.

I would appreciate any hints on how this procedure could be accomplished in a
PostgreSQL-native way.
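
For reference, a minimal sketch of what such a block could look like
(assumptions: 9.0+ for DO -- on 8.x the same body would need a temporary
plpgsql function -- RESTART IDENTITY needs 8.4+, every table has a
<table>_id_seq sequence, and 'schema_migrations' stands in for whatever
exclusion list is needed):

DO $$
DECLARE
    r        record;
    has_rows boolean;
BEGIN
    FOR r IN
        SELECT tablename
        FROM pg_tables
        WHERE schemaname = 'public'
          AND tablename NOT IN ('schema_migrations')  -- excluded tables
    LOOP
        -- cheap emptiness check instead of count(*)
        EXECUTE 'SELECT EXISTS (SELECT 1 FROM ' || quote_ident(r.tablename) || ')'
            INTO has_rows;
        IF has_rows THEN
            EXECUTE 'TRUNCATE TABLE ' || quote_ident(r.tablename)
                    || ' RESTART IDENTITY CASCADE';
        ELSE
            -- empty table: just reset its id sequence so the next id is 1
            EXECUTE 'SELECT setval(' || quote_literal(r.tablename || '_id_seq')
                    || ', 1, false)';
        END IF;
    END LOOP;
END$$;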

Thanks!

UPDATE:

I need all this to run unit and integration tests for Ruby or Ruby on Rails
projects. Each test should have a clean DB before it runs, or should clean up
after itself (the so-called teardown). Transactions are very good, but they
become unusable when running tests against particular webdrivers; in my case
the switch to a truncation strategy is needed. And since I have now mentioned
RoR, please do not post answers along the lines of Obviously, you need
DatabaseCleaner for PG and so on and so on.

 post ends 

Thanks,

Stanislaw.


[PERFORM] Paged Query

2012-07-06 Thread Hermann Matthes
I want to implement a paged query feature, where the user can enter in 
a dialog how many rows he wants to see. After displaying the first page 
of rows, he can push a button to display the next/previous page.
At the database level I could use LIMIT to implement this feature. My 
problem now is that the user is not permitted to view all rows. For 
every row a permission check is performed, and if permission is granted, 
the row is added to the list of rows sent to the client.
If for example the user has entered a page size of 50 and I use LIMIT 
50 to fetch only 50 records, what should I do if he is only permitted 
to see 20 of those 50 records? There may be more records he can view.
But if I don't use LIMIT, what happens if the query returns 
5,000,000 rows? Would my result set contain 5,000,000 rows, or would the 
performance of the database go down?


Thanks in advance
Hermann

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] how could select id=xx so slow?

2012-07-06 Thread Yan Chunlu
I have grabbed one day's slow query log and analyzed it with pgFouine; to my
surprise, the slowest query is just a simple select statement:

select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
diggcontent_data_message.value, diggcontent_data_message.kind FROM
diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;


where thing_id is the primary key,  guess how long it takes?

754.61 seconds!!

I tried EXPLAIN ANALYZE on it and below is the result, which is very fast:

explain analyze select diggcontent_data_message.thing_id,
diggcontent_data_message.KEY, diggcontent_data_message.value,
diggcontent_data_message.kind FROM diggcontent_data_message WHERE
diggcontent_data_message.thing_id = 3570882;

QUERY PLAN

---
 Index Scan using idx_thing_id_diggcontent_data_message on
diggcontent_data_message  (cost=0.00..15.34 rows=32 width=51) (actual
time=0.080..0.096 rows=8 loops=1)
   Index Cond: (thing_id = 3570882)
 Total runtime: 0.115 ms
(3 rows)


So I wonder: could this simple select be innocent and just badly affected by
other queries? How could I find the queries that really slow down the
database?
Thanks!


Re: [PERFORM] how could select id=xx so slow?

2012-07-06 Thread Daniel Farina
On Thu, Jul 5, 2012 at 11:17 PM, Yan Chunlu springri...@gmail.com wrote:
 I have grabbed one day slow query log and analyzed it by pgfouine, to my
 surprise, the slowest query is just a simple select statement:

 select diggcontent_data_message.thing_id, diggcontent_data_message.KEY,
 diggcontent_data_message.value, diggcontent_data_message.kind FROM
 diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;


 where thing_id is the primary key,  guess how long it takes?

 754.61 seconds!!

Is it possible that the size of the tuple is enormous?  Because one
area where I've noticed EXPLAIN ANALYZE blows away normal performance
is when a lot of the work would be in reassembling, decompressing
(collectively: de-TOASTING) and sending the data.

Even then, that time seems excessive...but something to think about.
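
A couple of rough checks along those lines (column names taken from the
query above; the size functions are available in any recent release):

  -- How big are the values actually being returned for that key?
  SELECT thing_id,
         pg_column_size(value) AS value_bytes,
         pg_column_size(diggcontent_data_message.*) AS row_bytes
  FROM diggcontent_data_message
  WHERE thing_id = 3570882;

  -- How much of the table lives outside the main heap (TOAST + indexes)?
  SELECT pg_size_pretty(pg_relation_size('diggcontent_data_message')) AS main_heap,
         pg_size_pretty(pg_total_relation_size('diggcontent_data_message')
                        - pg_relation_size('diggcontent_data_message')) AS toast_plus_indexes;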

-- 
fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how could select id=xx so slow?

2012-07-06 Thread Craig Ringer

On 07/06/2012 02:17 PM, Yan Chunlu wrote:

so I wonder could this simple select is innocent and affected badly by 
other queries? how could I find those queries that really slow down 
the database?


It might not be other queries. Your query could be taking that long 
because it was blocked by a lock during maintenance work (say, an ALTER 
TABLE). It's also quite possible that it was held up by a slow 
checkpoint; check your logs to see if there are warnings about 
checkpoint activity.
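
A simplified sketch of a lock-wait check for such cases (pg_stat_activity
column names as of 9.1; 9.2 renames procpid to pid and current_query to
query -- setting log_lock_waits = on will also log waits longer than
deadlock_timeout):

  SELECT blocked.pid                AS blocked_pid,
         blocked_act.current_query  AS blocked_query,
         blocking.pid               AS blocking_pid,
         blocking_act.current_query AS blocking_query
  FROM pg_locks blocked
  JOIN pg_locks blocking
    ON  blocking.granted
    AND blocking.pid <> blocked.pid
    AND blocking.locktype = blocked.locktype
    AND blocking.database      IS NOT DISTINCT FROM blocked.database
    AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
    AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
  JOIN pg_stat_activity blocked_act  ON blocked_act.procpid  = blocked.pid
  JOIN pg_stat_activity blocking_act ON blocking_act.procpid = blocking.pid
  WHERE NOT blocked.granted;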


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

I cannot! use transactions.

Everything in PostgreSQL uses transactions, they are not optional.

I'm assuming you mean you can't use explicit transaction demarcation, ie 
BEGIN and COMMIT.


 need the fastest cleaning strategy for such case working on 
PostgreSQL both 8 and 9.
Just so you know, there isn't really any "PostgreSQL 8" or "PostgreSQL 
9". Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct 
major versions. This is different to most software and IMO pretty damn 
annoying, but that's how it is.




1) Truncate each table. It is too slow, I think, especially for empty 
tables.

Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;



2) Check each table for emptiness by more faster method, and then if 
it is empty reset its unique identifier column (analog of 
AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its 
last_value from sequence (the same AUTO_INCREMENT analog) back to 1, 
otherwise run truncate on it.
You can examine the value of SELECT last_value FROM the_sequence ; 
that's the equivalent of the MySQL hack you're using. To set it, use 
'setval(...)'.


http://www.postgresql.org/docs/9.1/static/functions-sequence.html


I use Ruby code to iterate through all tables


If you want to be fast, get rid of iteration. Do it all in one query or 
a couple of simple queries. Minimize the number of round-trips and queries.


I'll be truly stunned if the fastest way isn't to just TRUNCATE all the 
target tables in a single statement (not iteratively one by one with 
separate TRUNCATEs).


--
Craig Ringer


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

Oh, also, you can setval(...) a bunch of sequences at once:

SELECT
  setval('first_seq', 1, false),
  setval('second_seq', 1, false),
  setval('third_seq', 1, false),
  setval('fourth_seq', 1, false);  -- (1, false) so the next nextval() returns 1; 0 is below the default minvalue

... etc. You should only need two statements, fast ones, to reset your 
DB to the default state.


--
Craig Ringer


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Daniel Farina
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

I have seen in trivial cases -- in terms of data size -- where
TRUNCATE is much slower than a full-table DELETE.  The most common use
case for that is rapid setup/teardown of tests, where it can add up
quite quickly and in a very big way. This is probably an artifact of the
speed of one's file system at truncating and/or unlinking everything.

I haven't tried a multi-truncate though.  Still, I don't know a
mechanism besides slow file system truncation time that would explain
why DELETE would be significantly faster.

-- 
fdr

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] how could select id=xx so slow?

2012-07-06 Thread Albe Laurenz
Yan Chunlu wrote:
 I have grabbed one day slow query log and analyzed it by pgfouine, to
my surprise, the slowest query
 is just a simple select statement:
 
 select diggcontent_data_message.thing_id,
diggcontent_data_message.KEY,
 diggcontent_data_message.value, diggcontent_data_message.kind FROM
diggcontent_data_message WHERE
 diggcontent_data_message.thing_id = 3570882;
 
 where thing_id is the primary key,  guess how long it takes?
 
 754.61 seconds!!
 
 I tried explain analyze it and below is the result, which is very
fast:
 
 explain analyze select diggcontent_data_message.thing_id,
diggcontent_data_message.KEY,
 diggcontent_data_message.value, diggcontent_data_message.kind FROM
diggcontent_data_message WHERE
 diggcontent_data_message.thing_id = 3570882;

QUERY PLAN
-------------------------------------------------------------------------
 Index Scan using idx_thing_id_diggcontent_data_message on
     diggcontent_data_message  (cost=0.00..15.34 rows=32 width=51)
     (actual time=0.080..0.096 rows=8 loops=1)
   Index Cond: (thing_id = 3570882)
 Total runtime: 0.115 ms
(3 rows)
 
 so I wonder could this simple select is innocent and affected badly by
other queries? how could I find
 those queries that really slow down the database?

Are these by any chance the aggregated costs in pgFouine?
Could it be that the statement just ran very often and used that time in
total?

Other than that, it could have been blocked by something that takes an
exclusive lock on the table.

There are no ON SELECT DO INSTEAD rules or similar things on the table,
right?

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Paged Query

2012-07-06 Thread Віталій Тимчишин
What language are you using? Usually there is an iterator with a chunked fetch
option (like setFetchSize in Java JDBC). So you pass the query without a limit
and then read as many results as you need. Note that the query plan in this
case won't be optimized for your limit, and I don't remember whether Postgres
has an optimize for N rows statement option.
Also, if your statement is ordered by some key, you can use the general paging
technique of rerunning the query with a key > max_prev_value filter to get the
next chunk.

On Wednesday, 4 July 2012, Hermann Matthes hermann.matt...@web.de
wrote:
 I want to implement a paged Query feature, where the user can enter in
a dialog, how much rows he want to see. After displaying the first page of
rows, he can can push a button to display the next/previous page.
 On database level I could user limit to implement this feature. My
problem now is, that the user is not permitted to view all rows. For every
row a permission check is performed and if permission is granted, the row
is added to the list of rows sent to the client.
 If for example the user has entered a page size of 50 and I use limit
50 to only fetch 50 records, what should I do if he is only permitted to
see 20 of these 50 records? There may be more records he can view.
 But if I don't use limit, what happens if the query would return
5,000,000 rows? Would my result set contain 5,000,000 rows or would the
performance of the database go down?

 Thanks in advance
 Hermann



-- 
Best regards,
 Vitalii Tymchyshyn


Re: [PERFORM] Paged Query

2012-07-06 Thread Albe Laurenz
Hermann Matthes wrote:
 I want to implement a paged Query feature, where the user can enter
in
 a dialog, how much rows he want to see. After displaying the first
page
 of rows, he can can push a button to display the next/previous page.
 On database level I could user limit to implement this feature. My
 problem now is, that the user is not permitted to view all rows. For
 every row a permission check is performed and if permission is
granted,
 the row is added to the list of rows sent to the client.
 If for example the user has entered a page size of 50 and I use limit
 50 to only fetch 50 records, what should I do if he is only permitted
 to see 20 of these 50 records? There may be more records he can view.
 But if I don't use limit, what happens if the query would return
 5,000,000 rows? Would my result set contain 5,000,000 rows or would
the
 performance of the database go down?

Selecting all 5,000,000 rows would consume a lot of memory wherever
they are cached. Also, it might lead to bad response times (with
an appropriate LIMIT clause, the server can choose a plan that
returns the first few rows quickly).

I assume that there is some kind of ORDER BY involved, so that
the order of rows displayed is not random.

I have two ideas:
- Try to integrate the permission check in the query.
  It might be more efficient, and you could just use LIMIT
  and OFFSET like you intended.
- Select some more rows than you want to display on one page,
  perform the permission checks. Stop when you reach the end
  or have enough rows. Remember the sort key of the last row
  processed.
  When the next page is to be displayed, use the remembered
  sort key value to SELECT the next rows.
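
A minimal sketch combining both ideas, keyset paging plus the check pushed
into the query (table, columns and the permission function are all
hypothetical; the row-value comparison keeps the paging in one
index-friendly predicate):

  SELECT d.*
  FROM documents d
  WHERE has_permission(d.id, 'some_app_user')            -- permission check in the query
    AND (d.created_at, d.id) > ('2012-07-01 00:00', 42)  -- sort key of the last row shown
  ORDER BY d.created_at, d.id
  LIMIT 50;                                              -- page size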

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Paged Query

2012-07-06 Thread Greg Spiegelberg
On Wed, Jul 4, 2012 at 6:25 AM, Hermann Matthes hermann.matt...@web.de wrote:

 I want to implement a paged Query feature, where the user can enter in a
 dialog, how much rows he want to see. After displaying the first page of
 rows, he can can push a button to display the next/previous page.
 On database level I could user limit to implement this feature. My
 problem now is, that the user is not permitted to view all rows. For every
 row a permission check is performed and if permission is granted, the row
 is added to the list of rows sent to the client.
 If for example the user has entered a page size of 50 and I use limit 50
 to only fetch 50 records, what should I do if he is only permitted to see
 20 of these 50 records? There may be more records he can view.
 But if I don't use limit, what happens if the query would return
 5,000,000 rows? Would my result set contain 5,000,000 rows or would the
 performance of the database go down?


Sounds like your permission check is not implemented in the database.  If
it were, those records would be excluded and the OFFSET-LIMIT combo would
be your solution.  Also appears that you have access to the application.
 If so, I would recommend implementing the permission check in the
database.  Much cleaner from a query & pagination standpoint.

An alternative is to have the application complicate the query with the
appropriate permission logic excluding the unviewable records from the
final ORDER BY-OFFSET-LIMIT.  This will give you an accurate page count.
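
A sketch of that variant (hypothetical table, columns and permission
function; count(*) OVER () needs 8.4+ and gives the pager an accurate
total of visible rows):

  SELECT d.*,
         count(*) OVER () AS total_visible_rows
  FROM documents d
  WHERE has_permission(d.id, 'some_app_user')
  ORDER BY d.created_at, d.id
  LIMIT 50 OFFSET 0;    -- page size 50, first page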

IMHO, the worst alternative is to select your max page size, exclude rows
the user cannot see, rinse and repeat until you have your records per page
limit.  Whatever you're ordering on will serve as the page number.  Issue
with this solution is you may not have an accurate page count.

Luck.

-Greg


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/06/2012 07:38 PM, Daniel Farina wrote:

On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:

1) Truncate each table. It is too slow, I think, especially for empty
tables.

Really?!? TRUNCATE should be extremely fast, especially on empty tables.

You're aware that you can TRUNCATE many tables in one run, right?

TRUNCATE TABLE a, b, c, d, e, f, g;

I have seen in trivial cases -- in terms of data size -- where
TRUNCATE is much slower than a full-table DELETE.  The most common use
case for that is rapid setup/teardown of tests, where it can add up
quite quickly and in a very big way. This is probably an artifact the
speed of one's file system to truncate and/or unlink everything.
That makes some sense, actually. DELETEing from a table that has no 
foreign keys, triggers, etc while nothing else is accessing the table is 
fairly cheap and doesn't take much (any?) cleanup work afterwards. For 
tiny deletes I can easily see it being better than forcing the OS to 
journal a metadata change or two and a couple of fsync()s for a truncate.


I suspect truncating many tables at once will prove a win over 
iteratively DELETEing from many tables at once. I'd benchmark it except 
that it's optimizing something I don't care about at all, and the 
results would be massively dependent on the file system (ext3, ext4, 
xfs) and its journal configuration.


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Albe Laurenz
Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 
 
 I wonder, what is the fastest way to accomplish this kind of task in 
 PostgreSQL. I am interested in
 the fastest solutions ever possible.

 I have following assumptions:
 
 I have 30-100 tables. Let them be 30.
 
 Half of the tables are empty.
 
 Each non-empty table has, say, no more than 100 rows. By this I mean, 
 tables are NOT large.
 
 I need an optional possibility to exclude 2 or 5 or N tables from this 
 procedure.
 
 I cannot! use transactions.

Why? That would definitely speed up everything.

 I need the fastest cleaning strategy for such case working on PostgreSQL both 
 8 and 9.
 
 I see the following approaches:
 
 1) Truncate each table. It is too slow, I think, especially for empty tables.

Did you actually try it? That is the royal road to answering performance questions!
Truncating a single table is done in a matter of microseconds, particularly
if it is not big.
Do you have tens of thousands of tables?

 2) Check each table for emptiness by more faster method, and then if it is 
 empty reset its unique
 identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), 
 i.e to restore its
 last_value from sequence (the same AUTO_INCREMENT analog) back to 1, 
 otherwise run truncate on it.

That seems fragile and won't work everywhere.

What if the table has no primary key with a DEFAULT that uses a sequence?
What if it has such a key, but the DEFAULT was not used for an INSERT?
What if somebody manually reset the sequence?

Besides, how do you find out what the sequence for a table's primary key
is? With a SELECT, I guess. That SELECT is probably not faster than
a simple TRUNCATE.

 Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be 
 used to work good as one
 of the check procedure units, cleaning procedure should consist of, but 
 haven't accomplished it too.

You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
think that this will be considerably faster than just truncating the table.

 I would appreciate any hints on how this procedure could be accomplished in 
 PostgreSQL native way.
 
 Thanks!
 
 UPDATE:
 
 I need all this to run unit and integration tests for Ruby or Ruby on Rails 
 projects. Each test should
 have a clean DB before it runs, or to do a cleanup after itself (so called 
 teardown). Transactions are
 very good, but they become unusable when running tests against particular 
 webdrivers, in my case the
 switch to truncation strategy is needed. Once I updated that with reference 
 to RoR, please do not post
 here the answers about Obviously, you need DatabaseCleaner for PG and so on 
 and so on.

I completely fail to understand what you are talking about here.

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Paged Query

2012-07-06 Thread Misa Simic
Hi Hermann,

Well,

It's not clear how you get the rows for the user without paging.

If it is some query:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid)

Paging would be:

SELECT columns FROM table WHERE UserHasPermission(rowPK, userid) LIMIT
NoOfRecords OFFSET page*NoOfRecords

Kind Regards,

Misa

2012/7/4 Hermann Matthes hermann.matt...@web.de

 I want to implement a paged Query feature, where the user can enter in a
 dialog, how much rows he want to see. After displaying the first page of
 rows, he can can push a button to display the next/previous page.
 On database level I could user limit to implement this feature. My
 problem now is, that the user is not permitted to view all rows. For every
 row a permission check is performed and if permission is granted, the row
 is added to the list of rows sent to the client.
 If for example the user has entered a page size of 50 and I use limit 50
 to only fetch 50 records, what should I do if he is only permitted to see
 20 of these 50 records? There may be more records he can view.
 But if I don't use limit, what happens if the query would return
 5,000,000 rows? Would my result set contain 5,000,000 rows or would the
 performance of the database go down?

 Thanks in advance
 Hermann




Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Andres Freund
On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote:
 Still, I don't know a
 mechanism besides slow file system truncation time that would explain
 why DELETE would be significantly faster.
There is no filesystem truncation happening. The heap and the indexes get 
mapped into a new file. Otherwise rollback would be pretty hard to implement.

I guess the biggest cost in a bigger cluster is dropping the buffers that 
were formerly mapped to that relation (DropRelFileNodeBuffers).

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Craig Ringer

On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote:

Question: Is there a possibility in PostgreSQL to do DELETE on many 
tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? 


Yes, you can do it with a writable common table expression, but you 
wanted version portability.


WITH
  discard1 AS (DELETE FROM test1),
  discard2 AS (DELETE FROM test2 AS b)
SELECT 1;

Not only will this not work in older versions (IIRC it only works with 
9.1, maybe 9.0 too but I don't see it in the documentation for SELECT 
for 9.0) but I find it hard to imagine any performance benefit over 
simply sending


  DELETE FROM test1; DELETE FROM test2;

This all smells like premature optimisation of cases that don't matter. 
What problem are you solving with this?


--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Albe Laurenz
Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 

 I wonder, what is the fastest way to accomplish this kind of task in 
 PostgreSQL. I am interested in
 the fastest solutions ever possible.

 I need the fastest cleaning strategy for such case working on PostgreSQL 
 both 8 and 9.

 I see the following approaches:

 1) Truncate each table. It is too slow, I think, especially for empty 
 tables.

 Did you actually try it? That's the king's way to performance questions!
 Truncating a single table is done in a matter of microseconds, particularly
 if it is not big.
 Do you have tens of thousands of tables?

 Actually, 10-100 tables.

 You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't
 think that this will be considerably faster than just truncating the table.
 
 Exactly this query is much faster, believe me. You can see my latest
 results on https://github.com/stanislaw/truncate-vs-count.

Ok, I believe you.

My quick tests showed that a single truncate (including the transaction and
the client-server round trip via UNIX sockets) takes some 10 to 30 milliseconds.

Multiply that by 100, and you end up with just a few seconds at most.
Or what did you measure?

I guess you run that deletion very often so that it is painful.

Still I think that the biggest performance gain is to be had by using
PostgreSQL's features (truncate several tables in one statement, ...).

Try to bend your Ruby framework!

Yours,
Laurenz Albe

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Create tables performance

2012-07-06 Thread Sylvain CAILLET

Hi to all, 


I run PostgreSQL 8.3.9 on a dedicated server running Debian 5.0.4, a 
strong dual quad-core machine with 16 GB of RAM. My biggest db contains at least 100 000 
tables. Last time I started a Java process I use to make some changes on it, it 
created 170 new tables and it took one full minute. That is a very long time 
for such a process on such a server! 
Do you think there could be some configuration tuning to do to improve the 
performance of creating tables? 
Or do I have to use tablespaces because that many files in a single folder is 
too many for the OS? 
It is possible to migrate the DB to version 9.1. Do you think that could solve the 
trouble? 


Thank you all for your advices, 


Best regards 


Sylvain 

Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Marc Mamin



Stanislaw Pankevich wrote:
  PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the 
 fastest way to clean each
 non-empty table and reset unique identifier column of empty ones 

Hello, 

2 'exotic' ideas:

- use dblink_send_query to do the job in multiple threads (I doubt this really 
could be faster)
- have prepared empty tables in a separate schema, and a garbage schema:

   ALTER TABLE x set schema garbage;
   ALTER TABLE prepared.x set schema current;

you should be ready for the next test, 

but you still have to clean up the garbage schema and move tables back to prepared, 
for the next-but-one test, in the background

best regards,

Marc Mamin







-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Jeff Janes
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote:

  1) Truncate each table. It is too slow, I think, especially for empty
  tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;

This still calls DropRelFileNodeAllBuffers once for each table (and
each index), even if the table is empty.

With large shared_buffers, this can be relatively slow.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Steve Crawford

On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote:


 PostgreSQL db, 30 tables with number of rows < 100 (not huge) - 
the fastest way to clean each non-empty table and reset unique 
identifier column of empty ones 


I wonder, what is the fastest way to accomplish this kind of task in 
PostgreSQL. I am interested in the fastest solutions ever possible.


It would help if we really understood your use-case. If you want to 
fully reset your database to a known starting state for test runs, why 
not just have a base database initialized exactly as you wish, say 
test_base, then just drop your test database and create the new 
database from your template:

drop database test;
create database test template test_base;

This should be very fast but it won't allow you to exclude individual 
tables.


Are you interested in absolute fastest as a mind-game or is there a 
specific use requirement, i.e. how fast is fast enough? This is the 
basic starting point for tuning, hardware selection, etc.


Truncate should be extremely fast but on tables that are as tiny as 
yours the difference may not be visible to an end-user. I just tried a 
DELETE FROM to empty a 10,000 record table and it took 14 milliseconds, 
so you could do your maximum of 100 tables, each containing 10 times your 
max number of records, in less than two seconds.


Regardless of the method you choose, you need to be sure that nobody is 
accessing the database when you reset it. The drop/create database 
method will, of course, require and enforce that. Truncate requires an 
exclusive lock so it may appear to be very slow if it is waiting to get 
that lock. And even if you don't have locking issues, your reluctance to 
wrap your reset code in transactions means that a client could be 
updating some table or tables whenever the reset script isn't actively 
working on that same table leading to unexplained weird test results.


Cheers,
Steve


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Create tables performance

2012-07-06 Thread Jeff Janes
On Fri, Jul 6, 2012 at 8:15 AM, Sylvain CAILLET scail...@alaloop.com wrote:
 Hi to all,

 I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4, a
 strong bi quad-proc with RAM 16Go. My biggest db contains at least 100 000
 tables. Last time, I started a Java process I use to make some change on it,
 it created 170 new tables and it took one full minute. That is a very long
 time for  such a process on such a server !

What if you create those 170 tables in a database without 100,000
pre-existing tables?

What else does your script do?

I can create 170 tables each with 10 rows in a database containing
100,000 other tables in less than a second on 8.3.9, either all in one
transaction or in ~340 separate transactions.

So whatever problem you are having is probably specific to your
details, not a generic issue.  It is hard to say if an upgrade would
help if the root cause is not known.

What do the standard monitoring tools show?  Are you IO bound, or CPU
bound?  If CPU, is it in postgres or in java?

 Do you think there could be some configuration tuning to do to improve the
 performance for create tables ?
 Or do I have to use tablespaces because that many files in a single folder is
 too many for the OS?

I doubt that that is a problem on any reasonably modern Linux.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Create tables performance

2012-07-06 Thread Richard Huxton

On 06/07/12 16:15, Sylvain CAILLET wrote:

Hi to all,

I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4,
a strong bi quad-proc with RAM 16Go. My biggest db contains at least 100
000 tables.


That is a *lot* of tables and it's probably going to be slow whatever 
you do.



Last time, I started a Java process I use to make some
change on it, it created 170 new tables and it took one full minute.


What are you using all these tables for? I'm assuming most of them have 
identical structure.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: PostgreSQL db, 30 tables with number of rows < 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.

2012-07-06 Thread Chris Hanks

Daniel Farina-4 wrote
 
 On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote:
 1) Truncate each table. It is too slow, I think, especially for empty
 tables.

 Really?!? TRUNCATE should be extremely fast, especially on empty tables.

 You're aware that you can TRUNCATE many tables in one run, right?

 TRUNCATE TABLE a, b, c, d, e, f, g;
 
 I have seen in trivial cases -- in terms of data size -- where
 TRUNCATE is much slower than a full-table DELETE.  The most common use
 case for that is rapid setup/teardown of tests, where it can add up
 quite quickly and in a very big way. This is probably an artifact the
 speed of one's file system to truncate and/or unlink everything.
 
 I haven't tried a multi-truncate though.  Still, I don't know a
 mechanism besides slow file system truncation time that would explain
 why DELETE would be significantly faster.
 
 -- 
 fdr
 
 

That's my experience - I have a set of regression tests that clean the
database (deletes everything from a single parent table and lets the
referential integrity checks cascade to delete five other tables) at the end
of each test run, and it can complete 90 tests (including 90 mass deletes)
in a little over five seconds. If I replace that simple delete with a
truncation of all six tables at once, my test run balloons to 42 seconds.

I run my development database with synchronous_commit = off, though, so I
guess TRUNCATE has to hit the disk while the mass delete doesn't.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Create tables performance

2012-07-06 Thread Craig Ringer

On 07/06/2012 11:15 PM, Sylvain CAILLET wrote:

Hi to all,

I run Postgresql 8.3.9 on a dedicated server running with Debian 
5.0.4, a strong bi quad-proc with RAM 16Go. My biggest db contains at 
least 100 000 tables. Last time, I started a Java process I use to 
make some change on it, it created 170 new tables and it took one full 
minute. That is a very long time for  such a process on such a server !
If you create and drop a lot of tables, you need to make sure you're 
vacuuming the pg_catalog tables frequently. Newer versions mostly take 
care of this for you, but on 8.3 you'll at minimum have to turn 
autovacuum right up.


See what happens if you run in psql, as a Pg superuser (usually the 
postgres account):


  CLUSTER pg_class_oid_index ON pg_catalog.pg_class;
  CLUSTER pg_type_oid_index ON pg_catalog.pg_type;
  CLUSTER pg_attribute_relid_attnam_index ON pg_catalog.pg_attribute;
  CLUSTER pg_index_indexrelid_index ON pg_catalog.pg_index;

I'm guessing you have severe table bloat in your catalogs, in which case 
this may help. I use CLUSTER instead of VACUUM FULL because on old 
versions like 8.3 it'll run faster and sort the indexes for you too.
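
A quick way to gauge how big those catalogs have grown before and after
(a sketch; the list of catalogs is just the usual suspects):

  SELECT c.relname,
         pg_size_pretty(pg_relation_size(c.oid)) AS table_size,
         c.reltuples::bigint AS approx_rows
  FROM pg_class c
  WHERE c.relname IN ('pg_class', 'pg_attribute', 'pg_type', 'pg_index', 'pg_depend')
  ORDER BY pg_relation_size(c.oid) DESC;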


Do you think there could be some configuration tuning to do to improve 
the performance for create tables ?
Or do I have to use tablespaces because that many files in a single 
folder is too many for the OS?


That won't be a problem unless your OS and file system are truly crap.

--
Craig Ringer