[PERFORM] High CPU Usage
Hello, I have around 1000 schemas in a database, each with a similar structure but different data. Each schema has a few tables that never change (read-only tables); the other tables are rewritten almost every day, so I prefer to TRUNCATE those tables and restore them with new data. I am now facing high CPU and I/O on the database, primarily from the stats collector and vacuuming. The stats file is almost 28MB, and a manual VACUUM ANALYZE of the complete database takes almost 90 minutes even though autovacuum is configured. Restoring a dump into each schema introduces only minor data variations. Few SQL statements are executed per schema, affecting fewer than 50 tuples a day.

My questions:
1. Does increasing maintenance_work_mem improve auto/manual vacuum performance? If it does, will it require more I/O / CPU resources?
2. If I stop the stats collector process and autovacuum, and run a manual VACUUM whenever a schema restoration makes major changes, what performance parameters do I need to consider? (The restored data has very few changes.)
3. Is vacuuming statistics/metadata required here to improve performance? (Table structures remain the same.)
4. Anything more that can help reduce I/O without a major performance impact?

regards,
Siddharth

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
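For reference, a per-session maintenance_work_mem bump combined with per-schema vacuuming (the approach question 2 describes) would look roughly like this; the schema/table names and the memory value are placeholders, not recommendations:

```sql
-- Sketch only: schema_0001 and the table names are hypothetical.
SET maintenance_work_mem = '512MB';   -- larger vacuum work area for this session only

-- Vacuum just the rewritten tables of one restored schema
-- instead of VACUUM ANALYZE on the whole 1000-schema database:
VACUUM ANALYZE schema_0001.daily_table1;
VACUUM ANALYZE schema_0001.daily_table2;
```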
Re: [PERFORM] scale up (postgresql vs mssql)
Hi, all. This is an obligation from the past: http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php The same test that did ~230 results is now doing ~700 results; that is, BTW, even better than mssql. The ultimate solution for that problem was NOT to use ON COMMIT DELETE ROWS for the temporary tables. Instead, we just do DELETE FROM temp_table1. Doing TRUNCATE temp_table1 is definitely the worst case (~100 results in the same test). This is something we knew for a long time, which is why we did ON COMMIT DELETE ROWS, but eventually it turned out to be far from the best. Another minor issue is that when configuring temp_tablespaces='other_tablespace', the sequences of the temporary tables remain on the 'main_tablespace'. I hope that will help making postgres even better :)
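The three variants being compared can be sketched as follows (the table definition is hypothetical; the result counts are the ones reported above):

```sql
-- Hypothetical temp table; the message compares three ways of clearing it per cycle.
CREATE TEMP TABLE temp_table1 (id int, payload text);

-- Variant 1 (~230 results): declare it with ON COMMIT DELETE ROWS instead:
--   CREATE TEMP TABLE temp_table1 (id int, payload text) ON COMMIT DELETE ROWS;
-- Variant 2 (~100 results, worst case per the test):
--   TRUNCATE temp_table1;
-- Variant 3 (~700 results, the solution adopted): plain row deletion
DELETE FROM temp_table1;
```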
[PERFORM] index-only scan is missing the INCLUDE feature
Hi all, as far as I have looked around about the new feature, index-only scans, I guess this feature will not include an option such as ms-sql's INCLUDE. Well, I have a table with columns a, b, c. I query the table like this: select a,c from table where a=x and b=y. As of now, I have a unique index on (a,b). In the future (when upgrading to 9.2), I would like to have a unique index on (a,b, INCLUDE c), but that won't be possible (right?). So... in order to get an index-only scan, I will have to create an index like (a,b,c), but this has problems: 1. I lose the uniqueness enforcement of (a,b), unless I create 2 indexes: (a,b) and (a,b,c). 2. Every update to column c would result in an unnecessary index-key update (or whatever you call that operation), which is not just updating a tuple, but also an attempt to re-order it(!). 3. I just wonder: practically there is uniqueness of (a,b). Now, if I create an index like (a,b,c), the optimizer does not know about the uniqueness of (a,b), so I am afraid it may not pick the best query plan. Thanks for any comment.
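The two-index workaround described in point 1 would look roughly like this (table and column names taken from the message; the table definition itself is assumed):

```sql
-- Hypothetical table matching the description above.
CREATE TABLE t (a int, b int, c int);

-- Enforces uniqueness of (a,b), but cannot serve "select a,c" by itself:
CREATE UNIQUE INDEX t_a_b_key ON t (a, b);

-- Wide index that can make "select a,c ... where a=? and b=?" an
-- index-only scan on 9.2, at the cost of duplicating the (a,b) prefix:
CREATE INDEX t_a_b_c_idx ON t (a, b, c);
```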
[PERFORM] Why is a hash join being used?
I am running the following query:

SELECT res1.x, res1.y, res1.z
FROM test t
JOIN residue_atom_coords res1
  ON t.struct_id_1 = res1.struct_id
 AND res1.atomno IN (1,2,3,4)
 AND (res1.seqpos BETWEEN t.pair_1_helix_1_begin AND t.pair_1_helix_1_end)
WHERE t.compare_id BETWEEN 1 AND 1;

The 'test' table is very large (~270 million rows), as is the residue_atom_coords table (~540 million rows). The number of compare_ids I select in the 'WHERE' clause determines the join type in the following way:

t.compare_id BETWEEN 1 AND 5000;

Nested Loop  (cost=766.52..15996963.12 rows=3316307 width=24)
  ->  Index Scan using test_pkey on test t  (cost=0.00..317.20 rows=5372 width=24)
        Index Cond: ((compare_id >= 1) AND (compare_id <= 5000))
  ->  Bitmap Heap Scan on residue_atom_coords res1  (cost=766.52..2966.84 rows=625 width=44)
        Recheck Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))
        ->  Bitmap Index Scan on residue_atom_coords_pkey  (cost=0.00..766.36 rows=625 width=0)
              Index Cond: ((struct_id = t.struct_id_1) AND (seqpos >= t.pair_1_helix_1_begin) AND (seqpos <= t.pair_1_helix_1_end) AND (atomno = ANY ('{1,2,3,4}'::integer[])))

t.compare_id BETWEEN 1 AND 1;

Hash Join  (cost=16024139.91..20940899.94 rows=6633849 width=24)
  Hash Cond: (t.struct_id_1 = res1.struct_id)
  Join Filter: ((res1.seqpos >= t.pair_1_helix_1_begin) AND (res1.seqpos <= t.pair_1_helix_1_end))
  ->  Index Scan using test_pkey on test t  (cost=0.00..603.68 rows=10746 width=24)
        Index Cond: ((compare_id >= 1) AND (compare_id <= 1))
  ->  Hash  (cost=13357564.16..13357564.16 rows=125255660 width=44)
        ->  Seq Scan on residue_atom_coords res1  (cost=0.00..13357564.16 rows=125255660 width=44)
              Filter: (atomno = ANY ('{1,2,3,4}'::integer[]))

The nested loop join performs very quickly, whereas the hash join is incredibly slow. If I disable the hash join temporarily then a nested loop join is used in the second case and the query runs much more quickly.
How can I change my configuration to favor the nested loop join in this case? Is this a bad idea? Alternatively, since I will be doing selections like this many times, what indexes can be put in place to expedite the query without mucking with the query optimizer? I've already created an index on the struct_id field of residue_atom_coords (each unique struct_id should only have a small number of rows in the residue_atom_coords table).

Thanks in advance,
Tim
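For what it's worth, the "disable the hash join temporarily" experiment mentioned above is usually scoped to a single session or transaction rather than set globally, e.g.:

```sql
-- Planner-method settings like enable_hashjoin are meant for experimentation,
-- not as a permanent fix; SET LOCAL limits the change to this transaction.
BEGIN;
SET LOCAL enable_hashjoin = off;   -- planner avoids hash joins if any alternative exists
EXPLAIN SELECT res1.x, res1.y, res1.z
FROM test t
JOIN residue_atom_coords res1
  ON t.struct_id_1 = res1.struct_id
 AND res1.atomno IN (1,2,3,4)
 AND res1.seqpos BETWEEN t.pair_1_helix_1_begin AND t.pair_1_helix_1_end
WHERE t.compare_id BETWEEN 1 AND 1;
COMMIT;
```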
[PERFORM] pgbouncer - massive overhead?
Given a baseline postgresql.conf config and a couple of DL580 40 core/256GB memory machines, I noticed a large overhead for pgbouncer, has anyone seen this before?

$ pgbench -h `hostname -i` -j 32 -p 4320 -U asgprod -s 500 -c 32 -S -T 60 pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 1743073
tps = 29049.88 (including connections establishing)
tps = 29050.308194 (excluding connections establishing)

$ pgbench -h `hostname -i` -j 32 -p 4310 -U asgprod -s 500 -c 32 -S -T 60 pgbench_500
Scale option ignored, using pgbench_branches table count = 500
starting vacuum...end.
transaction type: SELECT only
scaling factor: 500
query mode: simple
number of clients: 32
number of threads: 32
duration: 60 s
number of transactions actually processed: 8692204
tps = 144857.505107 (including connections establishing)
tps = 144880.181341 (excluding connections establishing)

processor  : 39
vendor_id  : GenuineIntel
cpu family : 6
model      : 47
model name : Intel(R) Xeon(R) CPU E7- 4860 @ 2.27GHz
Re: [PERFORM] Why is a hash join being used?
On Wed, Jun 20, 2012 at 1:34 AM, Tim Jacobs tjaco...@email.unc.edu wrote:

> The nested loop join performs very quickly, whereas the hash join is incredibly slow. If I disable the hash join temporarily then a nested loop join is used in the second case and the query runs much more quickly. How can I change my configuration to favor the nested loop join in this case? Is this a bad idea?

First, ANALYZE the tables and try the tests again. If that helped, check your autovacuum configuration. Look at http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#AUTOVACUUM and the pg_stat_user_tables table (the last_* and *_count fields). If it still produces the wrong plan, then try to increase the number of statistics entries with ALTER TABLE ... SET STATISTICS (do not forget to ANALYZE after doing it) or via the default_statistics_target configuration parameter. Read more about it here: http://www.postgresql.org/docs/9.1/static/planner-stats.html

> Alternatively, since I will be doing selections like this many times, what indexes can be put in place to expedite the query without mucking with the query optimizer? I've already created an index on the struct_id field of residue_atom_coords (each unique struct_id should only have a small number of rows in the residue_atom_coords table).

As far as I can see, everything is okay with the indexes.

-- Sergey Konoplev a database architect, software developer at PostgreSQL-Consulting.com http://www.postgresql-consulting.com Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204
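The statistics-target suggestion translates to something like this (the column chosen and the target value are placeholders for illustration):

```sql
-- Raise the per-column statistics target on a join/filter column,
-- then re-ANALYZE so the larger sample takes effect in the planner.
ALTER TABLE residue_atom_coords ALTER COLUMN seqpos SET STATISTICS 500;
ANALYZE residue_atom_coords;

-- Or raise the default for all columns (postgresql.conf, or per session):
-- SET default_statistics_target = 500;
```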
Re: [PERFORM] scale up (postgresql vs mssql)
On 6/20/2012 1:01 AM, Eyal Wilde wrote:

> Hi, all. This is an obligation from the past: http://archives.postgresql.org/pgsql-performance/2012-05/msg00017.php The same test that did ~230 results is now doing ~700 results; that is, BTW, even better than mssql. The ultimate solution for that problem was NOT to use ON COMMIT DELETE ROWS for the temporary tables. Instead, we just do DELETE FROM temp_table1. Doing TRUNCATE temp_table1 is definitely the worst case (~100 results in the same test). This is something we knew for a long time, which is why we did ON COMMIT DELETE ROWS, but eventually it turned out to be far from the best. Another minor issue is that when configuring temp_tablespaces='other_tablespace', the sequences of the temporary tables remain on the 'main_tablespace'. I hope that will help making postgres even better :)

Did you ever try re-writing some of the temp table usage to use subselects/views/CTEs/etc?

-Andy
Re: [PERFORM] index-only scan is missing the INCLUDE feature
On 06/20/2012 12:46 PM, Eyal Wilde wrote: Hi all, as far as i looked around about the new feature: index-only scan, i guess this feature will not include the option such as ms-sql INCLUDE. For those of us who don't know MS-SQL, can you give a quick explanation of what the INCLUDE keyword in an index definition is expected to do, or some documentation references? It's possible to guess it somewhat from your description, but it's helpful to be specific when asking a question about features from another DBMS. -- Craig Ringer
Re: [PERFORM] scale up (postgresql vs mssql)
On Wed, Jun 20, 2012 at 8:43 AM, Andy Colson a...@squeakycode.net wrote:

>> The same test that did ~230 results is now doing ~700 results; that is, BTW, even better than mssql. The ultimate solution for that problem was NOT to use ON COMMIT DELETE ROWS for the temporary tables. Instead, we just do DELETE FROM temp_table1. Doing TRUNCATE temp_table1 is definitely the worst case (~100 results in the same test).
> Did you ever try re-writing some of the temp table usage to use subselects/views/CTEs/etc?

Yeah -- especially CTEs. But, assuming you really do need to keep a temp table organized and you want absolutely minimum latency in the temp-table-manipulating function, you can use a nifty trick to organize a table around txid_current():

CREATE UNLOGGED TABLE Cache (txid BIGINT DEFAULT txid_current(), a TEXT, b TEXT);
CREATE INDEX ON Cache(txid);
-- or --
CREATE INDEX ON Cache(txid, a);  -- if a is lookup key etc.

When you insert into the table, let the default catch the current txid, and make sure that all queries properly filter the table on txid and that all indexes are left-prefixed on txid. Why do this? Now the record delete operations can be delegated to an external process. At any time, a scheduled process can do: DELETE FROM Cache; This is not guaranteed to be faster, but it probably will be.

merlin
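A minimal sketch of how a session would use that Cache table (the exact filter predicate is my assumption about the intended usage, not spelled out in the message):

```sql
-- Writer: the txid column defaults to the inserting transaction's id,
-- so no explicit bookkeeping is needed on INSERT.
INSERT INTO Cache (a, b) VALUES ('key1', 'value1');

-- Reader in the same transaction: always left-prefix the filter with txid
-- so the (txid, ...) indexes are usable and other sessions' rows are skipped.
SELECT a, b FROM Cache WHERE txid = txid_current() AND a = 'key1';
```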
Re: [PERFORM] pgbouncer - massive overhead?
On 06/19/2012 09:00 AM, Strange, John W wrote:

> Given a baseline postgresql.conf config and a couple of DL580 40 core/256GB memory machines, I noticed a large overhead for pgbouncer, has anyone seen this before?
> [pgbench output snipped: ~29,050 tps on port 4320 vs. ~144,858 tps on port 4310, 32 clients/threads, SELECT-only, 60 s]

I'm very dubious that the stats are meaningful as run. Were the above stats generated on consecutive runs on the same machine, or was the test database fully returned to baseline between runs and the machine restarted to clear cache? I doubt anyone here would trust the results of a 60-second pgbench run - especially a select-only test on a server that will likely end up with virtually everything ultimately in cache. Make sure each run is started from the same state and run for 30-60 minutes.

Still, you *are* adding a layer between the client and the server. Running the simplest of read-only queries against a fully-cached database on a fast many-core machine is likely to emphasize any latency introduced by pgbouncer.
But it's also not a use-case for which pgbouncer is intended. If you were to add -C so each query required a new client connection, a different picture would emerge. Same thing if you had 2000 client connections of which only a handful were running queries at any moment.

Cheers,
Steve
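The -C comparison Steve suggests would look something like this (ports, user, and database name copied from the original runs; which of the two ports is pgbouncer is my assumption and these commands obviously require the servers from the original post):

```shell
# -C reconnects for every transaction: this is the case connection poolers are
# designed for, since connection setup cost now lands inside the measurement.
# Longer -T per Steve's advice about 60-second runs being untrustworthy.
pgbench -h "$(hostname -i)" -j 32 -p 4310 -U asgprod -c 32 -S -T 1800 -C pgbench_500   # direct to postgres
pgbench -h "$(hostname -i)" -j 32 -p 4320 -U asgprod -c 32 -S -T 1800 -C pgbench_500   # via pgbouncer (assumed)
```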
Re: [PERFORM] index-only scan is missing the INCLUDE feature
On 06/20/2012 09:11 AM, Craig Ringer wrote:

> For those of us who don't know MS-SQL, can you give a quick explanation of what the INCLUDE keyword in an index definition is expected to do, or some documentation references?

He's talking about what MS SQL Server commonly calls a covering index. In these cases, you can specify columns to be included in the index but not actually part of the index key. This prevents a trip to the table data, so selects can be serviced entirely by an index scan. PostgreSQL is about half way there by allowing index-only scans, though I've no idea if they intend on adding further functionality like this. Effectively you can trade index bloat for query speed. But considering the differences between the engines, it might not be necessary. I couldn't say.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604 312-444-8534 stho...@optionshouse.com
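For concreteness, the SQL Server syntax under discussion looks roughly like this (table and column names match Eyal's earlier example; note this is T-SQL, not something PostgreSQL 9.2 accepts):

```sql
-- T-SQL: (a,b) is the searchable, uniqueness-enforced key; c is carried in the
-- leaf pages only, so "SELECT a, c ... WHERE a = @x AND b = @y" can be
-- answered entirely from the index without touching the table.
CREATE UNIQUE INDEX ix_t_a_b
    ON t (a, b)
    INCLUDE (c);
```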
[PERFORM] moving tables
I need to move a postgres 9.0 database -- with tables, indexes, and wals associated with 16 tablespaces on 12 logical drives -- to an existing raid 10 drive in another volume on the same server. Once I get the data off the initial 12 drives they will be reconfigured, at which point I'll need to move everything from the 2nd volume to the aforementioned 12 logical drives on the first volume. This is being done both to free up the 2nd volume and to better utilize raid 10. I checked around and found a way to create sql statements to alter the public tablespaces and indexes, but I haven't found anything that provides information about moving the numerous associated config files, log files, etc. ANY comments, suggestions, or direction to existing documentation would be greatly appreciated. Current server info: - 4 dual-core AMD Opteron 2212 processors, 2010.485 MHz - 64GB RAM - 16 67GB RAID 1 drives and 1 464GB RAID 10 drive (all ext3) on 2 volumes. - Linux 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux Thank you, Midge
Re: [PERFORM] moving tables
On 6/20/12 3:27 PM, Midge Brown wrote:

> I need to move a postgres 9.0 database -- with tables, indexes, and wals associated with 16 tablespaces on 12 logical drives -- to an existing raid 10 drive in another volume on the same server. [...] ANY comments, suggestions, or direction to existing documentation would be greatly appreciated.

1. Back everything up.
2. Create a bunch of directories on the RAID10 to match the existing tablespaces (they won't be mounts, but Postgres doesn't care about that).
3. Shut down postgres.
4. Copy all your files to the new directories.
5. Change your mount points which were in use by the old tablespaces to symlinks which point at the new directories.
6. Start postgres back up from the new location.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
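Steps 4-5 can be sketched with throwaway paths (the real tablespace mount points and file names will differ; this only illustrates the copy-then-symlink shuffle, performed while postgres is stopped):

```shell
set -e
OLD=$(mktemp -d)   # stands in for an old tablespace mount point
NEW=$(mktemp -d)   # stands in for the matching directory on the RAID 10 volume
echo 'relation data' > "$OLD/16384"   # pretend relation file

cp -a "$OLD/." "$NEW/"   # step 4: copy the tablespace contents
rm -rf "$OLD"            # retire the old mount point...
ln -s "$NEW" "$OLD"      # step 5: ...and replace it with a symlink

cat "$OLD/16384"         # postgres would now read the same path through the symlink
```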
Re: [PERFORM] index-only scan is missing the INCLUDE feature
On 06/20/2012 11:32 PM, Shaun Thomas wrote:

> He's talking about what MS SQL Server commonly calls a covering index. In these cases, you can specify columns to be included in the index but not actually part of the index key. This prevents a trip to the table data, so selects can be serviced entirely by an index scan.

Oh, OK, so it's a covering index with added fields that don't form part of the searchable index structure, to make the index a little less expensive than a fully covering index on all the columns of interest. Fair enough. Thanks for the explanation.

Eyal, you'll get a better response to questions about other DBMSs if you explain what you need/want to do with the desired feature and what that feature does in the other DBMS.

> PostgreSQL is about half way there by allowing index-only scans, though I've no idea if they intend on adding further functionality like this.

There's certainly lots of interest in adding more, but not that many people with the expertise to be able to do it - and fewer still who're paid to work on Pg so they have time to focus on it. Covering indexes with Pg's MVCC model seem to be particularly challenging, too.

-- Craig Ringer