Re: [PERFORM] CPU bound

2010-12-14 Thread Royce Ausburn
Thanks guys - interesting.  


On 14/12/2010, at 5:59 AM, Josh Berkus wrote:

 On 12/12/10 6:43 PM, Royce Ausburn wrote:
 Hi all,
 
 I notice that when restoring a DB on a laptop with an SSD, typically 
 postgres is maxing out a CPU - even during a COPY.  I wonder, what is 
 postgres usually doing with the CPU?  I would have thought the disk would 
 usually be the bottleneck in the DB, but occasionally it's not.  We're 
 embarking on a new DB server project and it'd be helpful to understand where 
 the CPU is likely to be the bottleneck.
 
 That's pretty normal; as soon as you get decent disk, especially
 something like an SSD with a RAM cache, you become CPU-bound.  COPY does
 a LOT of parsing and data manipulation.  Index building, of course, is
 almost pure CPU if you have a decent amount of RAM available.
 
 If you're restoring from a pg_dump file, and have several cores
 available, I suggest using parallel pg_restore.
 
 
 -- 
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




[PERFORM] Index Bloat - how to tell?

2010-12-14 Thread John W Strange
How can you tell when your indexes are starting to get bloated and when you 
need to rebuild them?  I haven't seen a quick way to tell, and I'm not sure if 
it's being tracked.

___
| John W. Strange | Investment Bank | Global Commodities Technology 
| J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | 
F: 713 236-
| john.w.stra...@jpmchase.com | jpmorgan.com

This communication is for informational purposes only. It is not
intended as an offer or solicitation for the purchase or sale of
any financial instrument or as an official confirmation of any
transaction. All market prices, data and other information are not
warranted as to completeness or accuracy and are subject to change
without notice. Any comments or statements made herein do not
necessarily reflect those of JPMorgan Chase & Co., its subsidiaries
and affiliates.

This transmission may contain information that is privileged,
confidential, legally privileged, and/or exempt from disclosure
under applicable law. If you are not the intended recipient, you
are hereby notified that any disclosure, copying, distribution, or
use of the information contained herein (including any reliance
thereon) is STRICTLY PROHIBITED. Although this transmission and any
attachments are believed to be free of any virus or other defect
that might affect any computer system into which it is received and
opened, it is the responsibility of the recipient to ensure that it
is virus free and no responsibility is accepted by JPMorgan Chase &
Co., its subsidiaries and affiliates, as applicable, for any loss
or damage arising in any way from its use. If you received this
transmission in error, please immediately contact the sender and
destroy the material in its entirety, whether in electronic or hard
copy format. Thank you.

Please refer to http://www.jpmorgan.com/pages/disclosures for
disclosures relating to European legal entities.



Re: [PERFORM] Index Bloat - how to tell?

2010-12-14 Thread Plugge, Joe R.
I have used this in the past ... run this against the database that you want to 
inspect.


SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint, 
relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS 
tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric 
END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma 
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, 
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- 
very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma 
END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma 
END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
          WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 
END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange
Sent: Tuesday, December 14, 2010 8:48 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Index Bloat - how to tell?

How can you tell when your indexes are starting to get bloated and when you 
need to rebuild them?  I haven't seen a quick way to tell, and I'm not sure if 
it's being tracked.





Re: [PERFORM] Help with bulk read performance

2010-12-14 Thread Andy Colson

On 11/1/2010 9:15 AM, Dan Schaffer wrote:

Hello

We have an application that needs to do bulk reads of ENTIRE Postgres
tables very quickly (i.e. select * from table). We have observed that
such sequential scans run two orders of magnitude slower than observed
raw disk reads (5 MB/s versus 100 MB/s). Part of this is due to the
storage overhead we have observed in Postgres. In the example below, it
takes 1 GB to store 350 MB of nominal data. However, that suggests we
would expect to get 35 MB/s bulk read rates.

Observations using iostat and top during these bulk reads suggest that
the queries are CPU bound, not I/O bound. In fact, repeating the queries
yields similar response times. Presumably if it were an I/O issue the
response times would be much shorter the second time through with the
benefit of caching.

We have tried these simple queries using psql, JDBC, pl/java stored
procedures, and libpq. In all cases the client code ran on the same box
as the server.
We have experimented with Postgres 8.1, 8.3 and 9.0.

We also tried playing around with some of the server tuning parameters
such as shared_buffers to no avail.

Here is uname -a for a machine we have tested on:

Linux nevs-bdb1.fsl.noaa.gov 2.6.18-194.17.1.el5 #1 SMP Mon Sep 20
07:12:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

A sample dataset that reproduces these results looks like the following
(there are no indexes):

    Table "bulk_performance.counts"
 Column |  Type   | Modifiers
--------+---------+-----------
 i1     | integer |
 i2     | integer |
 i3     | integer |
 i4     | integer |

There are 22 million rows in this case.

We HAVE observed that summation queries run considerably faster. In this
case,

select sum(i1), sum(i2), sum(i3), sum(i4) from bulk_performance.counts

runs at 35 MB/s.


Our business logic does operations on the resulting data such that the
output is several orders of magnitude smaller than the input. So we had
hoped that by putting our business logic into stored procedures (and
thus drastically reducing the amount of data flowing to the client) our
throughput would go way up. This did not happen.

So our questions are as follows:

Is there any way using stored procedures (maybe C code that calls SPI
directly) or some other approach to get close to the expected 35 MB/s
doing these bulk reads? Or is this the price we have to pay for using
SQL instead of some NoSQL solution? (We actually tried Tokyo Cabinet and
found it to perform quite well. However it does not measure up to
Postgres in terms of replication, data interrogation, community support,
acceptance, etc).

Thanks

Dan Schaffer
Paul Hamer
Nick Matheson






Whoa... Deja Vu

Is this the same thing Nick is working on?  How'd he get along?

http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov


-Andy



Re: [PERFORM] Help with bulk read performance

2010-12-14 Thread Jim Nasby
On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
 Is this the same thing Nick is working on?  How'd he get along?
 
 http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov

So it is. The one I replied to stood out because no one had replied to it; I 
didn't see the earlier email.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [PERFORM] Help with bulk read performance

2010-12-14 Thread Andy Colson

On 12/14/2010 9:41 AM, Jim Nasby wrote:

On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:

Is this the same thing Nick is working on?  How'd he get along?

http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov


So it is. The one I replied to stood out because no one had replied to it; I 
didn't see the earlier email.
--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Oh.. I didn't even notice the date... I thought it was a new post.

But still... (and I'll cc Nick on this)  I'd love to hear an update on 
how this worked out.


Did you get it to go fast?  What'd you use?  Did the project go over 
budget and did you all get fired?  COME ON MAN!  We need to know! :-)


-Andy



Re: [PERFORM] Index Bloat - how to tell?

2010-12-14 Thread Mladen Gogala

Can you explain this query a bit? It isn't at all clear to me.


Plugge, Joe R. wrote:

I have used this in the past ... run this against the database that you want to 
inspect.


SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint, 
relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS 
tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric 
END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma 
END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, 
COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- 
very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma 
END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma 
END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
          WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 
END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC

-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange
Sent: Tuesday, December 14, 2010 8:48 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Index Bloat - how to tell?

How can you tell when your indexes are starting to get bloated and when you 
need to rebuild them?  I haven't seen a quick way to tell, and I'm not sure if 
it's being tracked.




  



--

Mladen Gogala 
Sr. Oracle DBA

1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com 
The Leader in Integrated Media Intelligence Solutions






Re: [PERFORM] Help with bulk read performance

2010-12-14 Thread Jim Nasby
BTW, have you tried prepared statements? bytea is most likely faster (in part) 
due to less parsing in the backend. Prepared statements would eliminate that 
parsing step.
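
(For readers following along: a prepared statement can be exercised straight 
from psql. A minimal sketch below; the statement name and parameter are 
invented for illustration, and the table is the bulk_performance.counts 
example from earlier in the thread.)

PREPARE bulk_fetch (int) AS
    SELECT i1, i2, i3, i4
    FROM bulk_performance.counts
    WHERE i1 > $1;

EXECUTE bulk_fetch(0);    -- reuses the already-parsed, already-planned statement
EXECUTE bulk_fetch(100);

DEALLOCATE bulk_fetch;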

On Dec 14, 2010, at 10:07 AM, Nick Matheson wrote:

 Hey all-
 
 Glad to know you are still interested... ;)
 
 Didn't mean to leave you hanging, the holiday and all have put some bumps in 
 the road.
 
 Dan, my co-worker, might be able to post some more detailed information here, 
 but here is a brief summary of what I am aware of:
 
 1. We have not tested any stored procedure/SPI based solutions to date.
 2. The COPY API has been the best of the possible solutions explored to date.
 3. We were able to get rates on the order of 35 MB/s with the original 
 problem this way.
 4. Another variant of the problem we were working on included some metadata 
 fields and 300 float values (for this we tried three variants)
   a. 300 float values as columns
   b. 300 float in a float array column
   c. 300 floats packed into a bytea column
 Long story short: of these three variants, (a) and (b) largely performed the 
 same. (c) was the winner and seems to have improved the throughput on multiple 
 counts: 1. it reduces the data transmitted over the wire by a factor of two 
 (float columns and float arrays have a 2x overhead over the raw data 
 requirement); 2. this reduction seems to have reduced the CPU burden on the 
 server side, thus producing better than the expected 2x speedup. I think the 
 final numbers left us somewhere in the 80-90 MB/s range.
 
 Thanks again for all the input. If you have any other questions let us know. 
 Also if we get results for the stored procedure/SPI route we will try and 
 post, but the improvements via standard JDBC are such that we aren't really 
 pressed at this point in time to get more throughput so it may not happen.
 
 Cheers,
 
 Nick
 On 12/14/2010 9:41 AM, Jim Nasby wrote:
 On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:
 Is this the same thing Nick is working on?  How'd he get along?
 
 http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov
 
 So it is. The one I replied to stood out because no one had replied to it; 
 I didn't see the earlier email.
 -- 
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net
 
 
 
 
 Oh.. I didn't even notice the date... I thought it was a new post.
 
 But still... (and I'll cc Nick on this)  I'd love to hear an update on how 
 this worked out.
 
 Did you get it to go fast?  What'd you use?  Did the project go over budget 
 and did you all get fired?  COME ON MAN!  We need to know! :-)
 
 -Andy
 

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Re: [PERFORM] Help with bulk read performance

2010-12-14 Thread Nick Matheson

Hey all-

Glad to know you are still interested... ;)

Didn't mean to leave you hanging, the holiday and all have put some 
bumps in the road.


Dan, my co-worker, might be able to post some more detailed information 
here, but here is a brief summary of what I am aware of:


1. We have not tested any stored procedure/SPI based solutions to date.
2. The COPY API has been the best of the possible solutions explored to 
date.
3. We were able to get rates on the order of 35 MB/s with the original 
problem this way.
4. Another variant of the problem we were working on included some 
metadata fields and 300 float values (for this we tried three variants)

   a. 300 float values as columns
   b. 300 float in a float array column
   c. 300 floats packed into a bytea column
Long story short: of these three variants, (a) and (b) largely performed the 
same. (c) was the winner and seems to have improved the throughput on 
multiple counts: 1. it reduces the data transmitted over the wire by a 
factor of two (float columns and float arrays have a 2x overhead over 
the raw data requirement); 2. this reduction seems to have reduced the 
CPU burden on the server side, thus producing better than the expected 
2x speedup. I think the final numbers left us somewhere in the 80-90 MB/s range.
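
(The three storage variants discussed above can be sketched as table 
definitions. These names are invented for illustration, and the float8 column 
list in variant (a) is abbreviated:)

-- (a) one column per value (abbreviated; the real table had 300 float columns)
CREATE TABLE obs_a (meta text, f1 float8, f2 float8, f3 float8 /* ... f300 */);

-- (b) all values in a single float8 array column
CREATE TABLE obs_b (meta text, vals float8[]);

-- (c) all values packed into one bytea column; packing/unpacking is done by
-- the client, trading server-side per-value overhead for client-side work
CREATE TABLE obs_c (meta text, vals bytea);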


Thanks again for all the input. If you have any other questions let us 
know. Also if we get results for the stored procedure/SPI route we will 
try and post, but the improvements via standard JDBC are such that we 
aren't really pressed at this point in time to get more throughput so it 
may not happen.


Cheers,

Nick

On 12/14/2010 9:41 AM, Jim Nasby wrote:

On Dec 14, 2010, at 9:27 AM, Andy Colson wrote:

Is this the same thing Nick is working on?  How'd he get along?

http://archives.postgresql.org/message-id/4cd1853f.2010...@noaa.gov


So it is. The one I replied to stood out because no one had replied 
to it; I didn't see the earlier email.

--
Jim C. Nasby, Database Architect   j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net





Oh.. I didn't even notice the date... I thought it was a new post.

But still... (and I'll cc Nick on this)  I'd love to hear an update on 
how this worked out.


Did you get it to go fast?  What'd you use?  Did the project go over 
budget and did you all get fired?  COME ON MAN!  We need to know! :-)


-Andy





Re: [PERFORM] Index Bloat - how to tell?

2010-12-14 Thread Dave Crooke
There is a plugin called pgstattuple which can be quite informative; 
however, it actually does a full scan of the table / index files, which may
be a bit invasive depending on your environment and load.

http://www.postgresql.org/docs/current/static/pgstattuple.html

It's in the contrib (at least for 8.4), and so you have to import its
functions into your schema using the script in the contrib directory.
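
(Once the functions are loaded, a quick check might look like the following; 
the table and index names here are invented for illustration:)

-- pgstattuple() reports dead-tuple percentages for a table; pgstatindex()
-- reports leaf density for a btree index (a low avg_leaf_density suggests bloat)
SELECT * FROM pgstattuple('public.some_table');

SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('public.some_btree_index');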

Cheers
Dave

On Tue, Dec 14, 2010 at 8:54 AM, Plugge, Joe R. jrplu...@west.com wrote:

 I have used this in the past ... run this against the database that you
 want to inspect.


 SELECT
  current_database(), schemaname, tablename, /*reltuples::bigint,
 relpages::bigint, otta,*/
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS
 tbloat,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  iname, /*ituples::bigint, ipages::bigint, iotta,*/
  ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric
 END,1) AS ibloat,
  CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
 FROM (
  SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
  (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma
 END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples,
 COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta --
 very rough approximation, assumes all cols
  FROM (
SELECT
  ma,bs,schemaname,tablename,
  (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma
 END)))::numeric AS datahdr,
  (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE
 nullhdr%ma END))) AS nullhdr2
FROM (
  SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
  SELECT 1+count(*)/8
  FROM pg_stats s2
   WHERE null_frac <> 0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
  FROM pg_stats s, (
SELECT
  (SELECT current_setting('block_size')::numeric) AS bs,
  CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23
 END AS hdr,
  CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
  ) AS constants
  GROUP BY 1,2,3,4,5
) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename
   JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
  LEFT JOIN pg_index i ON indrelid = cc.oid
  LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
 ) AS sml
 ORDER BY wastedbytes DESC

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org [mailto:
 pgsql-performance-ow...@postgresql.org] On Behalf Of John W Strange
 Sent: Tuesday, December 14, 2010 8:48 AM
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] Index Bloat - how to tell?

 How can you tell when your indexes are starting to get bloated and when you
 need to rebuild them?  I haven't seen a quick way to tell, and I'm not sure if
 it's being tracked.





Re: [PERFORM] Index Bloat - how to tell?

2010-12-14 Thread Mark Kirkwood

On 15/12/10 09:12, Dave Crooke wrote:
There is a plugin called pgstattuple which can be quite informative; 
 however, it actually does a full scan of the table / index files, 
which may be a bit invasive depending on your environment and load.


http://www.postgresql.org/docs/current/static/pgstattuple.html

It's in the contrib (at least for 8.4), and so you have to import its 
functions into your schema using the script in the contrib directory.




If you are using 8.4 or later, try the pg_freespacemap module:

http://www.postgresql.org/docs/current/static/pgfreespacemap.html

I tend to run this query:

SELECT oid::regclass,
   pg_relation_size(oid)/(1024*1024) AS mb,
   sum(free)/(1024*1024) AS free_mb
FROM
(SELECT oid, (pg_freespace(oid)).avail AS free
 FROM pg_class) AS a
GROUP BY a.oid ORDER BY free_mb DESC;


to show up potentially troublesome amounts of bloat.

regards

Mark


[PERFORM] only one index is using, why?

2010-12-14 Thread AI Rumman
I have a table in PostgreSQL 9.0.1 as follows:

             Table "public.crmentity"
    Column    |            Type             |     Modifiers
--------------+-----------------------------+--------------------
 crmid        | integer                     | not null
 smcreatorid  | integer                     | not null default 0
 smownerid    | integer                     | not null default 0
 modifiedby   | integer                     | not null default 0
 setype       | character varying(30)       | not null
 description  | text                        |
 createdtime  | timestamp without time zone | not null
 modifiedtime | timestamp without time zone | not null
 viewedtime   | timestamp without time zone |
 status       | character varying(50)       |
 version      | integer                     | not null default 0
 presence     | integer                     | default 1
 deleted      | integer                     | not null default 0
Indexes:
crmentity_pkey PRIMARY KEY, btree (crmid)
crmentity_createdtime_idx btree (createdtime)
crmentity_modifiedby_idx btree (modifiedby)
crmentity_modifiedtime_idx btree (modifiedtime)
crmentity_smcreatorid_idx btree (smcreatorid)
crmentity_smownerid_idx btree (smownerid)
    ftx_crmentity_descr gin (to_tsvector('english'::regconfig,
replace(description, '<!--'::text, '<!-'::text)))
crmentity_deleted_idx btree (deleted)
crmentity_setype_idx btree (setype)
Referenced by:
TABLE service CONSTRAINT fk_1_service FOREIGN KEY (serviceid)
REFERENCES crmentity(crmid) ON DELETE CASCADE
TABLE _cc2crmentity CONSTRAINT fk__cc2crmentity_crmentity FOREIGN
KEY (crm_id) REFERENCES crmentity(crmid) ON UPDATE CASCADE ON DELETE CASCADE


EXPLAIN ANALYZE on this table:

explain analyze
select *
FROM crmentity
where   crmentity.deleted=0 and crmentity.setype='Emails'

 Index Scan using crmentity_setype_idx on crmentity  (cost=0.00..1882.76
rows=55469 width=301) (actual time=0.058..158.564 rows=79193 loops=1)
   Index Cond: ((setype)::text = 'Emails'::text)
   Filter: (deleted = 0)
 Total runtime: 231.256 ms
(4 rows)

My question is: why is only the crmentity_setype_idx index being used?
The crmentity_deleted_idx index is not used.

Any ideas, please?
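
(A hedged sketch for readers hitting the same question: one common way to let 
a single index scan satisfy both predicates, instead of filtering one of them, 
is a multicolumn index. The index name below is invented:)

-- Hypothetical: a composite index covering both WHERE conditions,
-- so the planner no longer needs a separate Filter step on "deleted"
CREATE INDEX crmentity_setype_deleted_idx ON crmentity (setype, deleted);

EXPLAIN ANALYZE
SELECT *
FROM crmentity
WHERE crmentity.deleted = 0 AND crmentity.setype = 'Emails';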