[PERFORM] select operations that generate disk writes
Hello, Time for a broad question. I'm aware of some specific select queries that will generate disk writes - for example, a sort operation when there's not enough work_mem can cause PG to write out some temp tables (not the correct terminology?). That scenario is easily remedied by enabling log_temp_files and specifying the threshold in temp file size at which you want logging to happen. I've recently been trying to put some of my recent reading of Greg's book and other performance-related documentation to use by seeking out queries that take an inordinate amount of time to run. Given that we're usually disk-bound, I've gotten in the habit of running an iostat in a terminal while running and tweaking some of the problem queries. I find this gives me some nice instant feedback on how hard the query is causing PG to hit the disks. What's currently puzzling me are some selects with complex joins and sorts that generate some fairly large bursts of write activity while they run. I was able to reduce this by increasing work_mem (client-side) to give the sorts an opportunity to happen in memory. I now see no temp file writes being logged, and indeed the query sped up. So my question is, what else can generate writes when doing read-only operations? I know it sounds like a simple question, but I'm just not finding a concise answer anywhere. Thanks, Charles -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select operations that generate disk writes
Hello 2012/7/6 CSS c...@morefoo.com: Hello, Time for a broad question. I'm aware of some specific select queries that will generate disk writes - for example, a sort operation when there's not enough work_mem can cause PG to write out some temp tables (not the correct terminology?). That scenario is easily remedied by enabling log_temp_files and specifying the threshold in temp file size at which you want logging to happen. I've recently been trying to put some of my recent reading of Greg's book and other performance-related documentation to use by seeking out queries that take an inordinate amount of time to run. Given that we're usually disk-bound, I've gotten in the habit of running an iostat in a terminal while running and tweaking some of the problem queries. I find this gives me some nice instant feedback on how hard the query is causing PG to hit the disks. What's currently puzzling me are some selects with complex joins and sorts that generate some fairly large bursts of write activity while they run. I was able to reduce this by increasing work_mem (client-side) to give the sorts an opportunity to happen in memory. I now see no temp file writes being logged, and indeed the query sped up. So my question is, what else can generate writes when doing read-only operations? I know it sounds like a simple question, but I'm just not finding a concise answer anywhere. statistics http://www.postgresql.org/docs/9.1/interactive/runtime-config-statistics.html Regards Pavel Thanks, Charles -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select operations that generate disk writes
On 07/06/2012 02:20 PM, Pavel Stehule wrote: Hello 2012/7/6 CSS c...@morefoo.com: So my question is, what else can generate writes when doing read-only operations? I know it sounds like a simple question, but I'm just not finding a concise answer anywhere. statistics http://www.postgresql.org/docs/9.1/interactive/runtime-config-statistics.html Hint bits, too: http://wiki.postgresql.org/wiki/Hint_Bits -- Craig Ringer
[PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Hello, My question below is almost exact copy of the on on SO: http://stackoverflow.com/questions/11311079/postgresql-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way The post on SO caused a few answers, all as one stating DO ONLY TRUNCATION - this is the fast. Also I think I've met some amount of misunderstanding of what exactly do I want. I would appreciate it great, if you try, as people whom I may trust in performance question. Here goes the SO subject, formulating exact task I want to accomplish, this procedure is intended to be run beetween after or before each test, ensure database is cleaned enough and has reset unique identifiers column (User.id of the first User should be nor the number left from previous test in a test suite but 1). Here goes the message: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I found myself such kind of solution for MySQL, it performs much faster than just truncation of tables one by one. But anyway, I am interested in the fastest solutions for MySQL too. See my result here, of course it it for MySQL only: https://github.com/bmabey/database_cleaner/issues/126 I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. I use Ruby code to iterate through all tables, calling code below on each of them, I tried to setup SQL code running against each table like: DO $$DECLARE r record; BEGIN somehow_captured = SELECT last_value from #{table}_id_seq IF (somehow_captured == 1) THEN == restore initial unique identifier column value here == END IF (somehow_captured 1) THEN TRUNCATE TABLE #{table}; END IF; END$$; Manipulating this code in various aspects, I couldn't make it work, because of I am unfamiliar with PostgreSQL functions and blocks (and variables). Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the check procedure units, cleaning procedure should consist of, but haven't accomplished it too. I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way. Thanks! UPDATE: I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about Obviously, you need DatabaseCleaner for PG and so on and so on. post ends Thanks, Stanislaw.
[PERFORM] Paged Query
I want to implement a paged Query feature, where the user can enter in a dialog, how much rows he want to see. After displaying the first page of rows, he can can push a button to display the next/previous page. On database level I could user limit to implement this feature. My problem now is, that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client. If for example the user has entered a page size of 50 and I use limit 50 to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use limit, what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down? Thanks in advance Hermann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] how could select id=xx so slow?
I have grabbed one day slow query log and analyzed it by pgfouine, to my surprise, the slowest query is just a simple select statement: *select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882;* where thing_id is the primary key, guess how long it takes? 754.61 seconds!! I tried explain analyze it and below is the result, which is very fast: * * * explain analyze select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882; QUERY PLAN --- Index Scan using idx_thing_id_diggcontent_data_message on diggcontent_data_message (cost=0.00..15.34 rows=32 width=51) (actual time=0.080..0.096 rows=8 loops=1) Index Cond: (thing_id = 3570882) Total runtime: 0.115 ms (3 rows) * so I wonder could this simple select is innocent and affected badly by other queries? how could I find those queries that really slow down the database? thanks!
Re: [PERFORM] how could select id=xx so slow?
On Thu, Jul 5, 2012 at 11:17 PM, Yan Chunlu springri...@gmail.com wrote: I have grabbed one day slow query log and analyzed it by pgfouine, to my surprise, the slowest query is just a simple select statement: select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882; where thing_id is the primary key, guess how long it takes? 754.61 seconds!! Is it possible that the size of the tuple is enormous? Because one area where I've noticed EXPLAIN ANALYZE blows away normal performance is when a lot of the work would be in reassembling, decompressing (collectively: de-TOASTING) and sending the data. Even then, that time seems excessive...but something to think about. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how could select id=xx so slow?
On 07/06/2012 02:17 PM, Yan Chunlu wrote: so I wonder could this simple select is innocent and affected badly by other queries? how could I find those queries that really slow down the database? It might not be other queries. Your query could be taking that long because it was blocked by a lock during maintenance work (say, an ALTER TABLE). It's also quite possible that it was held up by a slow checkpoint; check your logs to see if there are warnings about checkpoint activity. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: I cannot! use transactions. Everything in PostgreSQL uses transactions, they are not optional. I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT. need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is. 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. You can examine the value of SELECT last_value FROM the_sequence ; that's the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'. http://www.postgresql.org/docs/9.1/static/functions-sequence.html I use Ruby code to iterate through all tables If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries. I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs). -- Craig Ringer
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 07:29 PM, Craig Ringer wrote: On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: I cannot! use transactions. Everything in PostgreSQL uses transactions, they are not optional. I'm assuming you mean you can't use explicit transaction demarcation, ie BEGIN and COMMIT. need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. Just so you know, there isn't really any PostgreSQL 8 or PostgreSQL 9. Major versions are x.y, eg 8.4, 9.0, 9.1 and 9.2 are all distinct major versions. This is different to most software and IMO pretty damn annoying, but that's how it is. 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. You can examine the value of SELECT last_value FROM the_sequence ; that's the equivalent of the MySQL hack you're using. To set it, use 'setval(...)'. http://www.postgresql.org/docs/9.1/static/functions-sequence.html I use Ruby code to iterate through all tables If you want to be fast, get rid of iteration. Do it all in one query or a couple of simple queries. Minimize the number of round-trips and queries. I'll be truly stunned if the fastest way isn't to just TRUNCATE all the target tables in a single statement (not iteratively one by one with separate TRUNCATEs). Oh, also, you can setval(...) a bunch of sequences at once: SELECT setval('first_seq', 0), setval('second_seq', 0), setval('third_seq', 0), setval('fouth_seq', 0); ... etc. You should only need two statements, fast ones, to reset your DB to the default state. -- Craig Ringer
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. I haven't tried a multi-truncate though. Still, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how could select id=xx so slow?
Yan Chunlu wrote: I have grabbed one day slow query log and analyzed it by pgfouine, to my surprise, the slowest query is just a simple select statement: select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882; where thing_id is the primary key, guess how long it takes? 754.61 seconds!! I tried explain analyze it and below is the result, which is very fast: explain analyze select diggcontent_data_message.thing_id, diggcontent_data_message.KEY, diggcontent_data_message.value, diggcontent_data_message.kind FROM diggcontent_data_message WHERE diggcontent_data_message.thing_id = 3570882; QUERY PLAN -- - Index Scan using idx_thing_id_diggcontent_data_message on diggcontent_data_message (cost=0.00..15.34 rows=32 width=51) (actual time=0.080..0.096 rows=8 loops=1) Index Cond: (thing_id = 3570882) Total runtime: 0.115 ms (3 rows) so I wonder could this simple select is innocent and affected badly by other queries? how could I find those queries that really slow down the database? Are these by any chance the aggregated costs in pgFouine? Could it be that the statement just ran very often and used that time in total? Other than that, it could have been blocked by something that takes an exclusive lock on the table. There are no ON SELECT DO INSTEAD rules or similar things on the table, right? Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Paged Query
What language are you using? Usually there is iterator with chunked fetch option (like setFetchSize in java jdbc). So you are passing query without limit and then read as many results as you need. Note that query plan in this case won't be optimized for your limit and I don't remember if postgres has optimize for N rows statement option. Also, if your statement is ordered by some key, you can use general paging technique when you rerun query with keymax_prev_value filter to get next chunk. Середа, 4 липня 2012 р. користувач Hermann Matthes hermann.matt...@web.de написав: I want to implement a paged Query feature, where the user can enter in a dialog, how much rows he want to see. After displaying the first page of rows, he can can push a button to display the next/previous page. On database level I could user limit to implement this feature. My problem now is, that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client. If for example the user has entered a page size of 50 and I use limit 50 to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use limit, what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down? Thanks in advance Hermann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Best regards, Vitalii Tymchyshyn
Re: [PERFORM] Paged Query
Hermann Matthes wrote: I want to implement a paged Query feature, where the user can enter in a dialog, how much rows he want to see. After displaying the first page of rows, he can can push a button to display the next/previous page. On database level I could user limit to implement this feature. My problem now is, that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client. If for example the user has entered a page size of 50 and I use limit 50 to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use limit, what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down? Selecting all 500 rows would consume a lot of memory wherever they are cached. Also, it might lead to bad response times (with an appropriate LIMIT clause, the server can choose a plan that returns the first few rows quickly). I assume that there is some kind of ORDER BY involved, so that the order of rows displayed is not random. I have two ideas: - Try to integrate the permission check in the query. It might be more efficient, and you could just use LIMIT and OFFSET like you intended. - Select some more rows than you want to display on one page, perform the permission checks. Stop when you reach the end or have enough rows. Remember the sort key of the last row processed. When the next page is to be displayed, use the remembered sort key value to SELECT the next rows. Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Paged Query
On Wed, Jul 4, 2012 at 6:25 AM, Hermann Matthes hermann.matt...@web.dewrote: I want to implement a paged Query feature, where the user can enter in a dialog, how much rows he want to see. After displaying the first page of rows, he can can push a button to display the next/previous page. On database level I could user limit to implement this feature. My problem now is, that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client. If for example the user has entered a page size of 50 and I use limit 50 to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use limit, what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down? Sounds like your permission check is not implemented in the database. If it were, those records would be excluded and the OFFSET-LIMIT combo would be your solution. Also appears that you have access to the application. If so, I would recommend implementing the permission check in the database. Much cleaner from a query pagination standpoint. An alternative is to have the application complicate the query with the appropriate permission logic excluding the unviewable records from the final ORDER BY-OFFSET-LIMIT. This will give you an accurate page count. IMHO, the worst alternative is to select your max page size, exclude rows the user cannot see, rinse and repeat until you have your records per page limit. Whatever you're ordering on will serve as the page number. Issue with this solution is you may not have an accurate page count. Luck. -Greg
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 07:38 PM, Daniel Farina wrote: On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. That makes some sense, actually. DELETEing from a table that has no foreign keys, triggers, etc while nothing else is accessing the table is fairly cheap and doesn't take much (any?) cleanup work afterwards. For tiny deletes I can easily see it being better than forcing the OS to journal a metadata change or two and a couple of fsync()s for a truncate. I suspect truncating many tables at once will prove a win over iteratively DELETEing from many tables at once. I'd benchmark it except that it's optimizing something I don't care about at all, and the results would be massively dependent on the file system (ext3, ext4, xfs) and its journal configuration. -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I have following assumptions: I have 30-100 tables. Let them be 30. Half of the tables are empty. Each non-empty table has, say, no more than 100 rows. By this I mean, tables are NOT large. I need an optional possibility to exclude 2 or 5 or N tables from this procedure. I cannot! use transactions. Why? That would definitely speed up everything. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? 2) Check each table for emptiness by more faster method, and then if it is empty reset its unique identifier column (analog of AUTO_INCREMENT in MySQL) to initial state (1), i.e to restore its last_value from sequence (the same AUTO_INCREMENT analog) back to 1, otherwise run truncate on it. That seems fragile an won't work everywhere. What if the table has no primary key with a DEFAULT that uses a sequence? What if it has such a key, but the DEFAULT was not used for an INSERT? What if somebody manually reset the sequence? Besides, how do you find out what the sequence for a table's primary key is? With a SELECT, I guess. That SELECT is probably not faster than a simple TRUNCATE. Also my guess was that EXISTS(SELECT something FROM TABLE) could somehow be used to work good as one of the check procedure units, cleaning procedure should consist of, but haven't accomplished it too. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. I would appreciate any hints on how this procedure could be accomplished in PostgreSQL native way. Thanks! UPDATE: I need all this to run unit and integration tests for Ruby or Ruby on Rails projects. Each test should have a clean DB before it runs, or to do a cleanup after itself (so called teardown). Transactions are very good, but they become unusable when running tests against particular webdrivers, in my case the switch to truncation strategy is needed. Once I updated that with reference to RoR, please do not post here the answers about Obviously, you need DatabaseCleaner for PG and so on and so on. I completely fail to understand what you talk about here. Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Paged Query
Hi Hermann, Well, Not clear how you get rows for user without paging? If it is some query: SELECT columns FROM table WHERE UserHasPerimision(rowPK, userid) Paging would be: SELECT columns FROM table WHERE UserHasPerimision(rowPK, userid) LIMIT NoOfRecords OFFSET page*NoOfRecords Kind Regards, Misa 2012/7/4 Hermann Matthes hermann.matt...@web.de I want to implement a paged Query feature, where the user can enter in a dialog, how much rows he want to see. After displaying the first page of rows, he can can push a button to display the next/previous page. On database level I could user limit to implement this feature. My problem now is, that the user is not permitted to view all rows. For every row a permission check is performed and if permission is granted, the row is added to the list of rows sent to the client. If for example the user has entered a page size of 50 and I use limit 50 to only fetch 50 records, what should I do if he is only permitted to see 20 of these 50 records? There may be more records he can view. But if I don't use limit, what happens if the query would return 5,000,000 rows? Would my result set contain 5,000,000 rows or would the performance of the database go down? Thanks in advance Hermann -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.** org pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/**mailpref/pgsql-performancehttp://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Friday, July 06, 2012 01:38:56 PM Daniel Farina wrote: ll, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. There is no filesystem truncation happening. The heap and the indexes get mapped into a new file. Otherwise rollback would be pretty hard to implement. I guess the biggest cost in a bigger cluster is the dropping the buffers that were formerly mapped to that relation (DropRelFileNodeBuffers). Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/06/2012 09:45 PM, Stanislaw Pankevich wrote: Question: Is there a possibility in PostgreSQL to do DELETE on many tables massively, like TRUNCATE allows. Like DELETE table1, table2, ...? Yes, you can do it with a writable common table expression, but you wanted version portability. WITH discard1 AS (DELETE FROM test1), discard2 AS (DELETE FROM test2 AS b) SELECT 1; Not only will this not work in older versions (IIRC it only works with 9.1, maybe 9.0 too but I don't see it in the documentation for SELECT for 9.0) but I find it hard to imagine any performance benefit over simply sending DELETE FROM test1; DELETE FROM test2; This all smells like premature optimisation of cases that don't matter. What problem are you solving with this? -- Craig Ringer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? Actually, 10-100 tables. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. Exactly this query is much faster, believe me. You can see my latest results on https://github.com/stanislaw/truncate-vs-count. Ok, I believe you. My quick tests showed that a sible truncate (including transaction and client-server roundtrip via UNIX sockets takes some 10 to 30 milliseconds. Multiply that with 100, and you end up with just a few seconds at most. Or what did you measure? I guess you run that deletion very often so that it is painful. Still I think that the biggest performance gain is to be had by using PostgreSQL's features (truncate several tables in one statement, ...). Try to bend your Ruby framework! Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Create tables performance
Hi to all, I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4, a strong bi quad-proc with RAM 16Go. My biggest db contains at least 100 000 tables. Last time, I started a Java process I use to make some change on it, it created 170 new tables and it took one full minute. That is a very long time for such a process on such a server ! Do you think there could be some configuration tuning to do to improve the performance for create tables ? Or do I have to use tablespaces because 10 files in a single folder is a too many for OS ? It's possible to migrate the DB in 9.1 version. Do you think it could solve the trouble ? Thank you all for your advices, Best regards Sylvain
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones Hello, 2 'exotic' ideas: - use dblink_send_query to do the job in multiple threads (I doubt this really could be faster) - have prepared empty tables in a separate schema, and a garbage schema: ALTER TABLE x set schema garbage; ALTER TABLE prepared.x set schema current; you should be ready for the next test, but still have to clean garbage nad moved to prepared for the next but one in the background best regards, Marc Mamin I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. I need the fastest cleaning strategy for such case working on PostgreSQL both 8 and 9. I see the following approaches: 1) Truncate each table. It is too slow, I think, especially for empty tables. Did you actually try it? That's the king's way to performance questions! Truncating a single table is done in a matter of microseconds, particularly if it is not big. Do you have tens of thousands of tables? Actually, 10-100 tables. You could of course run a SELECT 1 FROM table LIMIT 1, but again I don't think that this will be considerably faster than just truncating the table. Exactly this query is much faster, believe me. You can see my latest results on https://github.com/stanislaw/truncate-vs-count. Ok, I believe you. My quick tests showed that a sible truncate (including transaction and client-server roundtrip via UNIX sockets takes some 10 to 30 milliseconds. Multiply that with 100, and you end up with just a few seconds at most. Or what did you measure? I guess you run that deletion very often so that it is painful. Still I think that the biggest performance gain is to be had by using PostgreSQL's features (truncate several tables in one statement, ...). Try to bend your Ruby framework! Yours, Laurenz Albe -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer ring...@ringerc.id.au wrote: On 07/03/2012 11:22 PM, Stanislaw Pankevich wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; This still calls DropRelFileNodeAllBuffers once for each table (and each index), even if the table is empty. With large shared_buffers, this can be relatively slow. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
On 07/03/2012 08:22 AM, Stanislaw Pankevich wrote: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones I wonder, what is the fastest way to accomplish this kind of task in PostgreSQL. I am interested in the fastest solutions ever possible. It would help if we really understood your use-case. If you want to fully reset your database to a known starting state for test runs, why not just have a base database initialized exactly as you wish, say test_base, then just drop your test database and create the new database from your template: drop database test; create database test template test_base; This should be very fast but it won't allow you to exclude individual tables. Are you interested in absolute fastest as a mind-game or is there a specific use requirement, i.e. how fast is fast enough? This is the basic starting point for tuning, hardware selection, etc. Truncate should be extremely fast but on tables that are as tiny as yours the difference may not be visible to an end-user. I just tried a delete from to empty a 10,000 record table and it took 14 milliseconds so you could do your maximum of 100 tables each containing 10-times your max number of records in less than two seconds. Regardless of the method you choose, you need to be sure that nobody is accessing the database when you reset it. The drop/create database method will, of course, require and enforce that. Truncate requires an exclusive lock so it may appear to be very slow if it is waiting to get that lock. And even if you don't have locking issues, your reluctance to wrap your reset code in transactions means that a client could be updating some table or tables whenever the reset script isn't actively working on that same table leading to unexplained weird test results. Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Create tables performance
On Fri, Jul 6, 2012 at 8:15 AM, Sylvain CAILLET scail...@alaloop.com wrote: Hi to all, I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4, a strong bi quad-proc with RAM 16Go. My biggest db contains at least 100 000 tables. Last time, I started a Java process I use to make some change on it, it created 170 new tables and it took one full minute. That is a very long time for such a process on such a server ! What if you create those 170 tables in a database without 100,000 pre-existing tables? What else does your script do? I can create 170 tables each with 10 rows in a database containing 100,000 other tables in less than a second on 8.3.9, either all in one transaction or in ~340 separate transactions. So whatever problem you are having is probably specific to your details, not a generic issue. It is hard to say if an upgrade would help if the root cause is not known. What do the standard monitoring tools show? Are you IO bound, or CPU bound? If CPU, is it in postgres or in java? Do you think there could be some configuration tuning to do to improve the performance for create tables ? Or do I have to use tablespaces because 10 files in a single folder is a too many for OS ? I doubt that that is a problem on any reasonably modern Linux. Cheers, Jeff -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Create tables performance
On 06/07/12 16:15, Sylvain CAILLET wrote: Hi to all, I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4, a strong bi quad-proc with RAM 16Go. My biggest db contains at least 100 000 tables. That is a *lot* of tables and it's probably going to be slow whatever you do. Last time, I started a Java process I use to make some change on it, it created 170 new tables and it took one full minute. What are you using all these tables for? I'm assuming most of them have identical structure. -- Richard Huxton Archonet Ltd -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Re: PostgreSQL db, 30 tables with number of rows 100 (not huge) - the fastest way to clean each non-empty table and reset unique identifier column of empty ones.
Daniel Farina-4 wrote On Fri, Jul 6, 2012 at 4:29 AM, Craig Ringer lt;ringerc@.idgt; wrote: 1) Truncate each table. It is too slow, I think, especially for empty tables. Really?!? TRUNCATE should be extremely fast, especially on empty tables. You're aware that you can TRUNCATE many tables in one run, right? TRUNCATE TABLE a, b, c, d, e, f, g; I have seen in trivial cases -- in terms of data size -- where TRUNCATE is much slower than a full-table DELETE. The most common use case for that is rapid setup/teardown of tests, where it can add up quite quickly and in a very big way. This is probably an artifact the speed of one's file system to truncate and/or unlink everything. I haven't tried a multi-truncate though. Still, I don't know a mechanism besides slow file system truncation time that would explain why DELETE would be significantly faster. -- fdr -- Sent via pgsql-performance mailing list (pgsql-performance@) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance That's my experience - I have a set of regression tests that clean the database (deletes everything from a single parent table and lets the referential integrity checks cascade to delete five other tables) at the end of each test run, and it can complete 90 tests (including 90 mass deletes) in a little over five seconds. If I replace that simple delete with a truncation of all six tables at once, my test run balloons to 42 seconds. I run my development database with synchronous_commit = off, though, so I guess TRUNCATE has to hit the disk while the mass delete doesn't. -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-db-30-tables-with-number-of-rows-100-not-huge-the-fastest-way-to-clean-each-non-empty-tab-tp5715643p5715734.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Create tables performance
On 07/06/2012 11:15 PM, Sylvain CAILLET wrote: Hi to all, I run Postgresql 8.3.9 on a dedicated server running with Debian 5.0.4, a strong bi quad-proc with RAM 16Go. My biggest db contains at least 100 000 tables. Last time, I started a Java process I use to make some change on it, it created 170 new tables and it took one full minute. That is a very long time for such a process on such a server ! If you create and drop a lot of tables, you need to make sure you're vacuuming the pg_catalog tables frequently. Newer versions mostly take care of this for you, but on 8.3 you'll at minimum have to turn autovaccum right up. See what happens if you run in psql, as a Pg superuser (usually the postgres account): CLUSTER pg_class_oid_index ON pg_catalog.pg_class; CLUSTER pg_type_oid_index ON pg_catalog.pg_type; CLUSTER pg_attribute_relid_attnam_index ON pg_catalog.pg_attribute; CLUSTER pg_index_indexrelid_index ON pg_catalog.pg_index; I'm guessing you have severe table bloat in your catalogs, in which case this may help. I use CLUSTER instead of VACCUUM FULL because on old versions like 8.3 it'll run faster and sort the indexes for you too. Do you think there could be some configuration tuning to do to improve the performance for create tables ? Or do I have to use tablespaces because 10 files in a single folder is a too many for OS ? That won't be a problem unless your OS and file system are truly crap. -- Craig Ringer