Re: [PERFORM] CPU bound
Thanks guys - interesting. On 14/12/2010, at 5:59 AM, Josh Berkus wrote: On 12/12/10 6:43 PM, Royce Ausburn wrote: Hi all, I notice that when restoring a DB on a laptop with an SDD, typically postgres is maxing out a CPU - even during a COPY. I wonder, what is postgres usually doing with the CPU? I would have thought the disk would usually be the bottleneck in the DB, but occasionally it's not. We're embarking on a new DB server project and it'd be helpful to understand where the CPU is likely to be the bottleneck. That's pretty normal; as soon as you get decent disk, especially something like an SSD with a RAM cache, you become CPU-bound. COPY does a LOT of parsing and data manipulation. Index building, of course, is almost pure CPU if you have a decent amount of RAM available. If you're restoring from a pg_dump file, and have several cores available, I suggest using parallel pg_restore. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Index Bloat - how to tell?
How can you tell when your indexes are starting to get bloated and when you need to rebuild them. I haven't seen a quick way to tell and not sure if it's being tracked. ___ | John W. Strange | Investment Bank | Global Commodities Technology | J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236- | john.w.stra...@jpmchase.com | jpmorgan.com This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index Bloat - how to tell?
I have used this in the past ... run this against the database that you want to inspect. SELECT current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/ ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, iname, /*ituples::bigint, ipages::bigint, iotta,*/ ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname 'information_schema' LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml ORDER BY wastedbytes DESC -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange Sent: Tuesday, December 14, 2010 8:48 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Index Bloat - how to tell? How can you tell when your indexes are starting to get bloated and when you need to rebuild them. I haven't seen a quick way to tell and not sure if it's being tracked. ___ | John W. Strange | Investment Bank | Global Commodities Technology | J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236- | john.w.stra...@jpmchase.com | jpmorgan.com This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with bulk read performance
On 11/1/2010 9:15 AM, Dan Schaffer wrote: Hello We have an application that needs to do bulk reads of ENTIRE Postgres tables very quickly (i.e. select * from table). We have observed that such sequential scans run two orders of magnitude slower than observed raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the storage overhead we have observed in Postgres. In the example below, it takes 1 GB to store 350 MB of nominal data. However that suggests we would expect to get 35 MB/s bulk read rates. Observations using iostat and top during these bulk reads suggest that the queries are CPU bound, not I/O bound. In fact, repeating the queries yields similar response times. Presumably if it were an I/O issue the response times would be much shorter the second time through with the benefit of caching. We have tried these simple queries using psql, JDBC, pl/java stored procedures, and libpq. In all cases the client code ran on the same box as the server. We have experimented with Postgres 8.1, 8.3 and 9.0. We also tried playing around with some of the server tuning parameters such as shared_buffers to no avail. Here is uname -a for a machine we have tested on: Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20 07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux A sample dataset that reproduces these results looks like the following (there are no indexes): Table bulk_performance.counts Column | Type | Modifiers +-+--- i1 | integer | i2 | integer | i3 | integer | i4 | integer | There are 22 million rows in this case. We HAVE observed that summation queries run considerably faster. In this case, select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts runs at 35 MB/s. Our business logic does operations on the resulting data such that the output is several orders of magnitude smaller than the input. So we had hoped that by putting our business logic into stored procedures (and thus drastically reducing the amount of data flowing to the client) our throughput would go way up. This did not happen. So our questions are as follows: Is there any way using stored procedures (maybe C code that calls SPI directly) or some other approach to get close to the expected 35 MB/s doing these bulk reads? Or is this the price we have to pay for using SQL instead of some NoSQL solution. (We actually tried Tokyo Cabinet and found it to perform quite well. However it does not measure up to Postgres in terms of replication, data interrogation, community support, acceptance, etc). Thanks Dan Schaffer Paul Hamer Nick Matheson Whoa... Deja Vu Is this the same thing Nick is working on? How'd he get along? http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with bulk read performance
On Dec 14, 2010, at 9:27 AM, Andy Colson wrote: Is this the same thing Nick is working on? How'd he get along? http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with bulk read performance
On 12/14/2010 9:41 AM, Jim Nasby wrote: On Dec 14, 2010, at 9:27 AM, Andy Colson wrote: Is this the same thing Nick is working on? How'd he get along? http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net Oh.. I didn't even notice the date... I thought it was a new post. But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out. Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-) -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index Bloat - how to tell?
Can you explain this query a bit? It isn't at all clear to me. Plugge, Joe R. wrote: I have used this in the past ... run this against the database that you want to inspect. SELECT current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/ ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, iname, /*ituples::bigint, ipages::bigint, iotta,*/ ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname 'information_schema' LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml ORDER BY wastedbytes DESC -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange Sent: Tuesday, December 14, 2010 8:48 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Index Bloat - how to tell? How can you tell when your indexes are starting to get bloated and when you need to rebuild them. I haven't seen a quick way to tell and not sure if it's being tracked. ___ | John W. Strange | Investment Bank | Global Commodities Technology | J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236- | john.w.stra...@jpmchase.com | jpmorgan.com This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you. Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription:
Re: [PERFORM] Help with bulk read performance
BTW, have you tried prepared statements? bytea is most likely faster (in part) due to less parsing in the backend. Prepared statements would eliminate that parsing step. On Dec 14, 2010, at 10:07 AM, Nick Matheson wrote: Hey all- Glad to know you are still interested... ;) Didn't mean to leave you hanging, the holiday and all have put some bumps in the road. Dan my co-worker might be able to post some more detailed information here, but here is a brief summary of what I am aware of: 1. We have not tested any stored procedure/SPI based solutions to date. 2. The COPY API has been the best of the possible solutions explored to date. 3. We were able to get rates on the order of 35 MB/s with the original problem this way. 4. Another variant of the problem we were working on included some metadata fields and 300 float values (for this we tried three variants) a. 300 float values as columns b. 300 float in a float array column c. 300 floats packed into a bytea column Long story short on these three variants a and b largely performed the same. C was the winner and seems to have improved the throughput on multiple counts. 1. it reduces the data transmitted over the wire by a factor of two (float columns and float arrays have a 2x overhead over the raw data requirement.) 2. this reduction seems to have reduced the cpu burdens on the server side thus producing a better than the expected 2x speed. I think the final numbers left us somewhere in the 80-90 MB/s. Thanks again for all the input. If you have any other questions let us know. Also if we get results for the stored procedure/SPI route we will try and post, but the improvements via standard JDBC are such that we aren't really pressed at this point in time to get more throughput so it may not happen. Cheers, Nick On 12/14/2010 9:41 AM, Jim Nasby wrote: On Dec 14, 2010, at 9:27 AM, Andy Colson wrote: Is this the same thing Nick is working on? How'd he get along? http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net Oh.. I didn't even notice the date... I thought it was a new post. But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out. Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-) -Andy -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Help with bulk read performance
Hey all- Glad to know you are still interested... ;) Didn't mean to leave you hanging, the holiday and all have put some bumps in the road. Dan my co-worker might be able to post some more detailed information here, but here is a brief summary of what I am aware of: 1. We have not tested any stored procedure/SPI based solutions to date. 2. The COPY API has been the best of the possible solutions explored to date. 3. We were able to get rates on the order of 35 MB/s with the original problem this way. 4. Another variant of the problem we were working on included some metadata fields and 300 float values (for this we tried three variants) a. 300 float values as columns b. 300 float in a float array column c. 300 floats packed into a bytea column Long story short on these three variants a and b largely performed the same. C was the winner and seems to have improved the throughput on multiple counts. 1. it reduces the data transmitted over the wire by a factor of two (float columns and float arrays have a 2x overhead over the raw data requirement.) 2. this reduction seems to have reduced the cpu burdens on the server side thus producing a better than the expected 2x speed. I think the final numbers left us somewhere in the 80-90 MB/s. Thanks again for all the input. If you have any other questions let us know. Also if we get results for the stored procedure/SPI route we will try and post, but the improvements via standard JDBC are such that we aren't really pressed at this point in time to get more throughput so it may not happen. Cheers, Nick On 12/14/2010 9:41 AM, Jim Nasby wrote: On Dec 14, 2010, at 9:27 AM, Andy Colson wrote: Is this the same thing Nick is working on? How'd he get along? http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov So it is. The one I replied to stood out because no one had replied to it; I didn't see the earlier email. -- Jim C. Nasby, Database Architect j...@nasby.net 512.569.9461 (cell) http://jim.nasby.net Oh.. I didn't even notice the date... I thought it was a new post. But still... (and I'll cc Nick on this) I'd love to hear an update on how this worked out. Did you get it to go fast? What'd you use? Did the project go over budget and did you all get fired? COME ON MAN! We need to know! :-) -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Index Bloat - how to tell?
There is a plugin called pgstattuple which can be quite informative however, it actually does a full scan of the table / index files, which may be a bit invasive depending on your environment and load. http://www.postgresql.org/docs/current/static/pgstattuple.html It's in the contrib (at least for 8.4), and so you have to import its functions into your schema using the script in the contrib directory. Cheers Dave On Tue, Dec 14, 2010 at 8:54 AM, Plugge, Joe R. jrplu...@west.com wrote: I have used this in the past ... run this against the database that you want to inspect. SELECT current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/ ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, iname, /*ituples::bigint, ipages::bigint, iotta,*/ ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname 'information_schema' LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml ORDER BY wastedbytes DESC -Original Message- From: pgsql-performance-ow...@postgresql.org [mailto: pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange Sent: Tuesday, December 14, 2010 8:48 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] Index Bloat - how to tell? How can you tell when your indexes are starting to get bloated and when you need to rebuild them. I haven't seen a quick way to tell and not sure if it's being tracked. ___ | John W. Strange | Investment Bank | Global Commodities Technology | J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236- | john.w.stra...@jpmchase.com | jpmorgan.com This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase Co., its subsidiaries and affiliates. This transmission may contain information that is privileged, confidential, legally privileged, and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMorgan Chase Co., its subsidiaries and affiliates, as applicable, for any loss or damage arising in any way from its use. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank
Re: [PERFORM] Index Bloat - how to tell?
On 15/12/10 09:12, Dave Crooke wrote: There is a plugin called pgstattuple which can be quite informative however, it actually does a full scan of the table / index files, which may be a bit invasive depending on your environment and load. http://www.postgresql.org/docs/current/static/pgstattuple.html It's in the contrib (at least for 8.4), and so you have to import its functions into your schema using the script in the contrib directory. If you are using 8.4 or later, try the Freespacemap module: http://www.postgresql.org/docs/current/static/pgfreespacemap.html I tend to run this query: SELECT oid::regclass, pg_relation_size(oid)/(1024*1024) AS mb, sum(free)/(1024*1024) AS free_mb FROM (SELECT oid, (pg_freespace(oid)).avail AS free FROM pg_class) AS a GROUP BY a.oid ORDER BY free_mb DESC; to show up potentially troublesome amounts of bloat. regards Mark
[PERFORM] only one index is using, why?
I have a table in Postgresql 9.0.1 as folllows: Table public.crmentity Column|Type | Modifiers --+-+ crmid| integer | not null smcreatorid | integer | not null default 0 smownerid| integer | not null default 0 modifiedby | integer | not null default 0 setype | character varying(30) | not null description | text| createdtime | timestamp without time zone | not null modifiedtime | timestamp without time zone | not null viewedtime | timestamp without time zone | status | character varying(50) | version | integer | not null default 0 presence | integer | default 1 deleted | integer | not null default 0 Indexes: crmentity_pkey PRIMARY KEY, btree (crmid) crmentity_createdtime_idx btree (createdtime) crmentity_modifiedby_idx btree (modifiedby) crmentity_modifiedtime_idx btree (modifiedtime) crmentity_smcreatorid_idx btree (smcreatorid) crmentity_smownerid_idx btree (smownerid) ftx_crmentity_descr gin (to_tsvector('english'::regconfig, replace(description, '!--'::text, '!-'::text))) crmentity_deleted_idx btree (deleted) crmentity_setype_idx btree (setype) Referenced by: TABLE service CONSTRAINT fk_1_service FOREIGN KEY (serviceid) REFERENCES crmentity(crmid) ON DELETE CASCADE TABLE _cc2crmentity CONSTRAINT fk__cc2crmentity_crmentity FOREIGN KEY (crm_id) REFERENCES crmentity(crmid) ON UPDATE CASCADE ON DELETE CASCADE EXPLAIN ANALYZE on this table: explain analyze select * FROM crmentity where crmentity.deleted=0 and crmentity.setype='Emails' Index Scan using crmentity_setype_idx on crmentity (cost=0.00..1882.76 rows=55469 width=301) (actual time=0.058..158.564 rows=79193 loops=1) Index Cond: ((setype)::text = 'Emails'::text) Filter: (deleted = 0) Total runtime: 231.256 ms (4 rows) My question is why crmentity_setype_idx index is being used only. crmentity_deleted_idx index is not using. Any idea please.