[PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Jim Vanns
(First, apologies if this post now appears twice - it appears our mail server 
rewrites my address!)

Hello all. I'm a pgsql performance virgin so hope I cross all the 't's
and dot the lower-case 'j's when posting this query...

On our production database server we're experiencing behaviour that
several engineers are unable to explain - hence this email. First, our
specs:

Scientific Linux 6.2, kernel 2.6.32
PG version 9.1.3, release 1PGDG.rhel6
24GB RAM
8 cores
2x software SSD-based RAIDs:
  a) ~660GB, RAID 5, 4 SSDs (data)
  b) ~160GB, RAID 1, 2 SSDs (xlogs + tmp tables)

We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
statements hang indefinitely until *something* (we don't know what)
releases some kind of resource or stops being a massive bottleneck.
These are the symptoms.

However, the system seems healthy - no table ('heavyweight') locks are
held by any session (this happens with only a few connected sessions),
all indexes are used correctly, other transactions are writing data (we
generally only have a few sessions running at a time - perhaps 10) etc.
etc. In fact, we are writing (or bgwriter is), 2-3 hundred MB/s
sometimes.

We regularly run vacuum analyze at quiet periods - generally once or twice daily.

These sessions (that only read data) can block for anything from five
minutes to tens of hours, then miraculously complete successfully all
at once.

Any suggestions for my next avenue of investigation? I'll try and
capture more data by observation next time it happens (it is relatively
intermittent).
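
When it next happens, one diagnostic worth capturing is a snapshot of ungranted locks joined to the sessions waiting on them. A sketch, assuming the 9.1 catalog column names (procpid, waiting, current_query; these become pid and query in 9.2):

```sql
-- Sessions stuck waiting on an ungranted lock, with what they're running.
-- (PG 9.1 column names: procpid / current_query; renamed in 9.2.)
SELECT a.procpid, a.waiting, l.locktype, l.mode, a.current_query
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.procpid
WHERE NOT l.granted;
```

An empty result while a session is visibly hung would point away from heavyweight locks and toward lightweight locks or I/O, which this view cannot see.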

Regards,

Jim

PS. These are the settings that differ from the default:

checkpoint_segments = 128
maintenance_work_mem = 256MB
synchronous_commit = off
random_page_cost = 3.0
wal_buffers = 16MB
shared_buffers = 8192MB
checkpoint_completion_target = 0.9
effective_cache_size = 18432MB
work_mem = 32MB
effective_io_concurrency = 12
max_stack_depth = 8MB
log_autovacuum_min_duration = 0
log_lock_waits = on
autovacuum_vacuum_scale_factor = 0.1
autovacuum_naptime = 8
autovacuum_max_workers = 4

PPS. I've just noticed that our memory configuration is oversubscribed!
 shared_buffers + effective_cache_size > total available RAM! Could
 this be the root cause somehow?

-- 
Jim Vanns
Systems Programmer
Framestore


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Shards + hash = forever running queries

2012-07-23 Thread Daniele Varrazzo
Hello,

We are using Postgres 9.1.4. We are struggling with a class of queries
that got impossible to run after sharding a large table. Everything
like:

select small.something, big.anything
from small join big on small.big_id = big.id;

and variations such as: select * from big where id in (select big_id from small)

Since big was sharded, the query plan results in something like:

 Hash Join  (cost=101.23..30038997974.72 rows=10 width=753)
   Hash Cond: (b.id = i.big_id)
   ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
         ->  Index Scan using big_201207_pkey on big_201207 b
               (cost=0.00..2224100.46 rows=1609634 width=12)
         ->  Index Scan using big_201101_pkey on big_201101 b
               (cost=0.00..404899.71 rows=5437497 width=12)
         ->  Index Scan using big_201104_pkey on big_201104 b
               (cost=0.00..349657.58 rows=4625181 width=12)
         ->  [...all the shards]
   ->  Hash  (cost=101.10..101.10 rows=10 width=742)
         ->  Seq Scan on small i  (cost=100.00..101.10 rows=10 width=742)

Postgres ends up doing never-ending reads: even though small has only
three rows, I've never seen such a query finish; the elapsed time ends
up even longer than a full scan on big would take.

The plan looks sub-optimal: it seems to first do a huge index scan
of all the partitions, then join the result against a small hash.

1. Can we fix the queries to work around this problem?

2. Could the planner be fixed for this scenario for PG 9.2 (or 9.3)?
Creating the hash beforehand, performing a hash join for each
partition, and merging the results looks like it would bring it back
into the realm of runnable queries. Am I wrong?

Thank you very much.

-- Daniele



Re: [PERFORM] Shards + hash = forever running queries

2012-07-23 Thread Daniele Varrazzo
On Mon, Jul 23, 2012 at 11:03 AM, Daniele Varrazzo
daniele.varra...@gmail.com wrote:

 1. Can we fix the queries to work around this problem?

As a stop-gap measure I've defined a get_big(id) function and am using
it to pull the details we're interested in from the big table:

create function get_big (id int) returns big as $$
select * from big where id = $1;
$$ language sql stable strict;

I'm not completely satisfied by it though: if there's any better
solution I'd be happy to know.
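
For reference, the function can stand in for the join, e.g. (column names hypothetical):

```sql
-- Pull a column of big through the composite return value instead of joining.
select s.something, (get_big(s.big_id)).anything
from small s;
```

Note that referencing several columns this way may evaluate the function once per column on older servers, so it is best kept to a few fields.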

Thank you,

-- Daniele



Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Martin French
Hi

> We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
> statements hang indefinitely until *something* (we don't know what)
> releases some kind of resource or stops being a massive bottleneck.
> These are the symptoms.

Is this in pgAdmin? Or psql on the console?

> However, the system seems healthy - no table ('heavyweight') locks are
> held by any session (this happens with only a few connected sessions),
> all indexes are used correctly, other transactions are writing data (we
> generally only have a few sessions running at a time - perhaps 10) etc.
> etc. In fact, we are writing (or bgwriter is), 2-3 hundred MB/s
> sometimes.

What is shown in top and iostat whilst the queries are running?

> We regularly run vacuum analyze at quiet periods - generally 1-2 times daily.
>
> These sessions (that only read data) that are blocked can block for
> anything from five minutes to tens of hours then miraculously
> complete successfully at once.

Are any blockers shown in pg_stat_activity?

> checkpoint_segments = 128
> maintenance_work_mem = 256MB
> synchronous_commit = off
> random_page_cost = 3.0
> wal_buffers = 16MB
> shared_buffers = 8192MB
> checkpoint_completion_target = 0.9
> effective_cache_size = 18432MB
> work_mem = 32MB
> effective_io_concurrency = 12
> max_stack_depth = 8MB
> log_autovacuum_min_duration = 0
> log_lock_waits = on
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_naptime = 8
> autovacuum_max_workers = 4

Memory looks reasonably configured to me. effective_cache_size is only
an indication to the planner and is not actually allocated.

Is anything being written to the logfiles?

Cheers

Romax Technology Limited
Rutherford House
Nottingham Science & Technology Park
Nottingham, 
NG7 2PZ
England

Telephone numbers:
+44 (0)115 951 88 00 (main)

For other office locations see:
http://www.romaxtech.com/Contact
=
===
E-mail: i...@romaxtech.com
Website: www.romaxtech.com
=


Confidentiality Statement
This transmission is for the addressee only and contains information that is confidential and privileged.
Unless you are the named addressee, or authorised to receive it on behalf of the addressee 
you may not copy or use it, or disclose it to anyone else. 
If you have received this transmission in error please delete from your system and contact the sender. Thank you for your cooperation.
=





Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Andrew Dunstan


On 07/23/2012 04:41 AM, Jim Vanns wrote:

We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
statements hang indefinitely until *something* (we don't know what)
releases some kind of resource or stops being a massive bottleneck.
These are the symptoms.



I have seen this sort of behaviour on systems with massive catalogs 
(millions of tables and indexes). Could that be your problem?



cheers

andrew



Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Jim Vanns
Thank you all for your replies, I shall try and qualify and confirm...

On Mon, 2012-07-23 at 14:46 +0100, Martin French wrote:
 Hi
 
  
  We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
  statements hang indefinitely until *something* (we don't know what)
  releases some kind of resource or stops being a massive bottleneck.
  These are the symptoms.
 
 Is this in pgAdmin? Or psql on the console?
 
psql

  However, the system seems healthy - no table ('heavyweight') locks
 are
  held by any session (this happens with only a few connected
 sessions),
  all indexes are used correctly, other transactions are writing data
 (we
  generally only have a few sessions running at a time - perhaps 10)
 etc.
  etc. In fact, we are writing (or bgwriter is), 2-3 hundred MB/s
  sometimes.
 
 What is shown in top and iostat whilst the queries are running?

Generally, lots of CPU churn (90-100%) and a fair bit of I/O wait.
iostat reports massive reads (up to 300MB/s).

  
  We regularly run vacuum analyze at quiet periods - generally 1-2 times
 daily.

(this is to answer to someone who didn't reply to the list)

We run full scans using vacuumdb, so we don't just rely on autovacuum.
The small table is so small (50 tuples) that a sequential scan is always
performed.

  These sessions (that only read data) that are blocked can block from
  anything from between only 5 minutes to 10s of hours then
 miraculously
  complete successfully at once.
  
 
 Are any blockers shown in pg_stat_activity?

None. Ever. Nothing in pg_locks either.

  
  checkpoint_segments = 128
  maintenance_work_mem = 256MB
  synchronous_commit = off
  random_page_cost = 3.0
  wal_buffers = 16MB
  shared_buffers = 8192MB
  checkpoint_completion_target = 0.9
  effective_cache_size = 18432MB
  work_mem = 32MB
  effective_io_concurrency = 12
  max_stack_depth = 8MB
  log_autovacuum_min_duration = 0
  log_lock_waits = on
  autovacuum_vacuum_scale_factor = 0.1
  autovacuum_naptime = 8
  autovacuum_max_workers = 4
 
 Memory looks reasonably configured to me. effective_cache_size is only
 an indication to the planner and is not actually allocated. 

I realise that.

 Is anything being written to the logfiles?

Nothing obvious - and we log a fair amount. No tmp table creations,
no locks held. 

To add to this, EXPLAIN reports it took only 0.23 ms to run (for
example), whereas the wall-clock time is more like 20-30 minutes (or up
to n hours, as I said, where everything appears to click back into place
at the same time).

Thanks.

Jim

 

-- 
Jim Vanns
Systems Programmer
Framestore




Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Jim Vanns
On Mon, 2012-07-23 at 09:53 -0400, Andrew Dunstan wrote:
 On 07/23/2012 04:41 AM, Jim Vanns wrote:
  We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
  statements hang indefinitely until *something* (we don't know what)
  releases some kind of resource or stops being a massive bottleneck.
  These are the symptoms.
 
 I have seen this sort of behaviour on systems with massive catalogs 
 (millions of tables and indexes). Could that be your problem?

Possibly. I'm not familiar with the catalogs. I'll look into that.

Thanks,

Jim

 
 cheers
 
 andrew
 

-- 
Jim Vanns
Systems Programmer
Framestore




Re: [PERFORM] Shards + hash = forever running queries

2012-07-23 Thread Tom Lane
Daniele Varrazzo daniele.varra...@gmail.com writes:
 Since big was sharded, the query plan results in something like:

  Hash Join  (cost=101.23..30038997974.72 rows=10 width=753)
    Hash Cond: (b.id = i.big_id)
    ->  Append  (cost=0.00..20038552251.23 rows=118859245 width=11)
          ->  Index Scan using big_201207_pkey on big_201207 b
                (cost=0.00..2224100.46 rows=1609634 width=12)
          ->  Index Scan using big_201101_pkey on big_201101 b
                (cost=0.00..404899.71 rows=5437497 width=12)
          ->  Index Scan using big_201104_pkey on big_201104 b
                (cost=0.00..349657.58 rows=4625181 width=12)
          ->  [...all the shards]
    ->  Hash  (cost=101.10..101.10 rows=10 width=742)
          ->  Seq Scan on small i  (cost=100.00..101.10 rows=10 width=742)

[ squint... ]  9.1 certainly ought to be able to find a smarter plan for
such a case.  For instance, if I try this on 9.1 branch tip:

regression=# create table p (id int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "p_pkey" for table "p"
CREATE TABLE
regression=# create table c1 (primary key (id)) inherits(p);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "c1_pkey" for table "c1"
CREATE TABLE
regression=# create table c2 (primary key (id)) inherits(p);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "c2_pkey" for table "c2"
CREATE TABLE
regression=# explain select * from p,int4_tbl where id=f1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..53.25 rows=120 width=8)
   Join Filter: (public.p.id = int4_tbl.f1)
   ->  Seq Scan on int4_tbl  (cost=0.00..1.05 rows=5 width=4)
   ->  Append  (cost=0.00..10.40 rows=3 width=4)
         ->  Index Scan using p_pkey on p  (cost=0.00..1.87 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
         ->  Index Scan using c1_pkey on c1 p  (cost=0.00..4.27 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
         ->  Index Scan using c2_pkey on c2 p  (cost=0.00..4.27 rows=1 width=4)
               Index Cond: (id = int4_tbl.f1)
(10 rows)

You have evidently got enable_seqscan turned off, so I wonder whether
the cost penalties applied by that are swamping the estimates.  Do you
get any better results if you re-enable that?
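
A minimal way to test this, re-using the query from the original post: enable_seqscan = off does not forbid seqscans, it adds a very large cost penalty to each one, which can distort every estimate in a big Append plan.

```sql
-- Re-enable seqscans for this session only, then compare the new plan.
SET enable_seqscan = on;
EXPLAIN
select small.something, big.anything
from small join big on small.big_id = big.id;
```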

regards, tom lane



Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Tom Lane
Jim Vanns james.va...@framestore.com writes:
 We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
 statements hang indefinitely until *something* (we don't know what)
 releases some kind of resource or stops being a massive bottleneck.
 These are the symptoms.

Does anything show up as blocked in the pg_locks view?

Could you attach to the stuck process with gdb and get a stack trace?

regards, tom lane



Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Jim Vanns
On Mon, 2012-07-23 at 11:09 -0400, Tom Lane wrote:
 Jim Vanns james.va...@framestore.com writes:
  We're seeing SELECT statements and even EXPLAIN (no ANALYZE)
  statements hang indefinitely until *something* (we don't know what)
  releases some kind of resource or stops being a massive bottleneck.
  These are the symptoms.
 
 Does anything show up as blocked in the pg_locks view?

Nope.

 Could you attach to the stuck process with gdb and get a stack trace?

Haven't been quite brave enough to do that yet - this is a production
server. I did manage to strace a process though - it (the server side
process of a psql EXPLAIN) appeared to spin on an awful lot of semop()
calls with the occasional read(). Of course, in the context of a shared
memory system such as postgres I'd expect to see quite a lot of semop()
calls but I've no idea how much is normal and how much is excessive.
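
For the record, a backtrace can be grabbed without pausing a production backend for more than an instant; a sketch (the PID is a placeholder for the stuck backend's process ID):

```shell
# Print a one-shot backtrace of the stuck backend, then detach immediately.
gdb --batch -ex bt -p 12345

# Alternatively, summarise which syscalls dominate over ~30 seconds;
# strace -c prints a per-syscall count/time table on exit.
timeout 30 strace -c -p 12345
```

Heavy semop() traffic in the summary would be consistent with contention on PostgreSQL's lightweight locks, which never show up in pg_locks.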

Jim

   regards, tom lane
 

-- 
Jim Vanns
Systems Programmer
Framestore




Re: [PERFORM] postgres clustering interactions with pg_dump

2012-07-23 Thread Robert Haas
On Mon, Jun 11, 2012 at 9:55 AM, Fitch, Britt bri...@telenav.com wrote:
 Hi, I have a table that I am clustering on an index.

 I am then dumping that table via pg_dump -Fc and loading it into another
 database via pg_restore.

 It is unclear to me though if the clustering I did in the original database
 is preserved during the dump & restore or if I would still need to perform a
 CLUSTER again once the data was loaded into the new database.

 Can anyone confirm this?

The rows will end up in the new table in the same physical order that
they were stored in the dump file.

You might want to look at pg_stats.correlation for the clustered
column - that's often a good way to know whether things are ordered
the way you expect, and it's updated every time the table is analyzed.
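
For example (table and column names are placeholders):

```sql
-- correlation close to +1 or -1 means the heap is still physically
-- ordered by the column; values near 0 mean the CLUSTER order has decayed.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'my_table' AND attname = 'my_clustered_column';
```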

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PERFORM] High CPU Usage

2012-07-23 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:15 AM, Siddharth Shah
siddharth.s...@elitecore.com wrote:
 I have around 1000 schema in database, Each schema having similar data
 structure with different data
 Each schema has few tables which never updates (Read only table) and other
 tables rewrites almost everyday so I prefer to TRUNCATE those tables and
 restores with new data

 Now facing an issue of high CPU & IO on the database, primarily from the Stats
 Collector & vacuuming; the size of the stats file is almost 28MB

How many tables do you have across all the schemas?

 and when I manually vacuum analyze the complete database it takes almost 90
 minutes, though autovacuum is configured

There's no real reason to run vacuum analyze manually if you have
autovacuum configured.
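
One way to confirm autovacuum is actually keeping up before dropping the manual runs (a sketch against the standard statistics view):

```sql
-- Tables autovacuum has never visited, or visited longest ago, float to the top.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_autovacuum NULLS FIRST
LIMIT 20;
```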

 Restoring a dump on each schema causes only minor data variations.
 SQL statements executed on the schemas are few, affecting fewer than 50
 tuples/day.

 My Questions :

 Does increasing maintenance_work_mem improve auto / manual vacuum performance?

It can, but mostly if there are a lot of updates or deletes.  If the
tables aren't changing much it isn't going to do anything.

 If it improves, will it require more IO / CPU resources?
 If I stop the stats collector process & autovacuuming, and execute manual
 vacuums based on schema restoration with major changes, what performance
 parameters do I need to consider? (Restored data has very few changes.)
 Is vacuuming & stats collection required here for metadata, to improve
 performance? (Table structures remain the same.)

 Anything more on this which can help reduce IO without majorly affecting
 performance?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[PERFORM] Efficiency of EXISTS?

2012-07-23 Thread Kenneth Tilton
My mental model of the EXISTS clause must be off. This snippet appears at
the end of a series of WITH clauses I suspect are irrelevant:

with etc etc ... , cids as
   (select distinct c.id from ddr2 c
 join claim_entries ce on ce.claim_id = c.id
 where (c.assigned_ddr = 879
 or exists (select 1 from ddr_cdt dc
 where
 dc.sys_user_id = 879
 and dc.document_type = c.document_type
 -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
 )))

 select count(*) from cids


If I uncomment the bit where it says make it faster I get decent response
and the graphical analyze display shows the expected user+doctype+cdtcode
index is being used (and nice thin lines suggesting efficient lookup).

As it is, the analyze display shows the expected user+doctype index* being
used but the lines are fat, and performance is an exponential disaster.

* I created the (to me ) redundant user+doctype index trying to get
Postgres to Do the Right Thing(tm), but I can see that was not the issue.

I presume the reason performance drops off a cliff is because there can be
9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just
look to see if there was at least one row matching user+doctype and return
its decision. I have tried select *, select 1, and limit 1 on the nested
select to no avail.

Am I just doing something wrong? I am a relative noob. Is there some other
hint I can give the planner?

Thx, ken


Re: [PERFORM] Efficiency of EXISTS?

2012-07-23 Thread Merlin Moncure
On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton ktil...@mcna.net wrote:
 My mental model of the EXISTS clause must be off. This snippet appears at
 the end of a series of WITH clauses I suspect are irrelevant:

 with etc etc ... , cids as
   (select distinct c.id from ddr2 c
 join claim_entries ce on ce.claim_id = c.id
 where (c.assigned_ddr = 879
 or exists (select 1 from ddr_cdt dc
 where
 dc.sys_user_id = 879
 and dc.document_type = c.document_type
 -- makes it faster: and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
 )))

 select count(*) from cids


 If I uncomment the bit where it says make it faster I get decent response
 and the graphical analyze display shows the expected user+doctype+cdtcode
 index is being used (and nice thin lines suggesting efficient lookup).

 As it is, the analyze display shows the expected user+doctype index* being
 used but the lines are fat, and performance is an exponential disaster.

 * I created the (to me ) redundant user+doctype index trying to get Postgres
 to Do the Right Thing(tm), but I can see that was not the issue.

 I presume the reason performance drops off a cliff is because there can be
 9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just look
 to see if there was at least one row matching user+doctype and return its
 decision. I have tried select *, select 1, and limit 1 on the nested select
 to no avail.

 Am I just doing something wrong? I am a relative noob. Is there some other
 hint I can give the planner?

hard to say without having the explain analyze output.  also it's not
clear why you need to use WITH, at least for the terminating query.
I'd just do:

select count(*) from
(
  inner_query
) q;

merlin



Re: [PERFORM] Efficiency of EXISTS?

2012-07-23 Thread Kenneth Tilton
On Mon, Jul 23, 2012 at 2:52 PM, Merlin Moncure mmonc...@gmail.com wrote:

 On Mon, Jul 23, 2012 at 4:12 PM, Kenneth Tilton ktil...@mcna.net wrote:
  My mental model of the EXISTS clause must be off. This snippet appears at
  the end of a series of WITH clauses I suspect are irrelevant:
 
  with etc etc ... , cids as
(select distinct c.id from ddr2 c
  join claim_entries ce on ce.claim_id = c.id
  where (c.assigned_ddr = 879
  or exists (select 1 from ddr_cdt dc
  where
  dc.sys_user_id = 879
  and dc.document_type = c.document_type
  -- makes it faster: and (dc.cdt_code is null or dc.cdt_code =
 ce.cpt_code)
  )))
 
  select count(*) from cids
 
 
  If I uncomment the bit where it says make it faster I get decent
 response
  and the graphical analyze display shows the expected user+doctype+cdtcode
  index is being used (and nice thin lines suggesting efficient lookup).
 
  As it is, the analyze display shows the expected user+doctype index*
 being
  used but the lines are fat, and performance is an exponential disaster.
 
  * I created the (to me ) redundant user+doctype index trying to get
 Postgres
  to Do the Right Thing(tm), but I can see that was not the issue.
 
  I presume the reason performance drops off a cliff is because there can
 be
  9000 cdt_codes for one user+doctype, but I was hoping EXISTS would just
 look
  to see if there was at least one row matching user+doctype and return its
  decision. I have tried select *, select 1, and limit 1 on the nested
 select
  to no avail.
 
  Am I just doing something wrong? I am a relative noob. Is there some
 other
  hint I can give the planner?

 hard to say without having the explain analyze output.  also it's not
 clear why you need to use WITH, at least for the terminating query.
 I'd just do:

 select count(*) from
 (
   inner_query
 )


OK. Here is the full query:

with ddr as (
select c.id
 ,case
   when c.portal_user_id is null then u.provider_facility_id
   else pu.provider_facility_id
  end provider_facility_id
 from claims c
 left join sys_users u on u.id = c.created_by
 left join portal_users pu on pu.id = c.portal_user_id
 WHERE c.deleted = 0
 AND c.status = 0
 AND (c.created_by is not null or c.portal_user_id is not null)
 AND true not in ( select ineligible_code_id in (46,65)
from claim_carcs cc
where c.id = cc.claim_id
and cc.deleted = 0 )
 AND (false OR c.document_type = 0)
 AND (false OR c.group_plan_id = 44)

 limit 1500
)

,ddr2 as (
select c.id
 , c.document_type
 , c.radiographs
 , c.nea_number
 , c.assigned_ddr
 , d.provider_facility_id as submitting_facility_id
 , count(ca.id) as claim_attachments_count
 , cast(exists (select 1 from triples where s = c.id and sda='claims' and p
= 'ddr-review-passed-on-by') as boolean) as passedon
 from ddr d
inner join
claims c on d.id = c.id
join claim_attachments ca on c.id = ca.claim_id
group by
 c.id
 , submitting_facility_id
having ((nullif(trim(c.nea_number, ' '),'') is not null)
or case transmission_method
when 'P' then count(distinct ca.id) > 1
else count(distinct ca.id) > 0
   end
or c.radiographs > 0))

, cids as
  (select distinct c.id from ddr2 c
join claim_entries ce on ce.claim_id = c.id
where (c.assigned_ddr = 879
or exists (select 1 from ddr_cdt dc
where
dc.sys_user_id = 879
and dc.document_type = c.document_type
--and (dc.cdt_code is null or dc.cdt_code = ce.cpt_code)
)))
select count(*) from cids

And the explain output:

Aggregate  (cost=56060.60..56060.61 rows=1 width=0)
  CTE ddr
    ->  Limit  (cost=306.29..16203.83 rows=1500 width=16)
          ->  Nested Loop Left Join  (cost=306.29..7442626.75 rows=702214 width=16)
                ->  Hash Left Join  (cost=306.29..7244556.97 rows=702214 width=12)
                      Hash Cond: (c.created_by = u.id)
                      ->  Index Scan using claims_lca1 on claims c  (cost=0.00..7230212.96 rows=702214 width=12)
                            Index Cond: ((deleted = 0) AND (status = 0) AND (group_plan_id = 44) AND (document_type = 0))
                            Filter: (((created_by IS NOT NULL) OR (portal_user_id IS NOT NULL)) AND (NOT (SubPlan 1)))
                            SubPlan 1
                              ->  Index Scan using claim_carcs_claim_id on claim_carcs cc  (cost=0.00..9.23 rows=1 width=4)
                                    Index Cond: (c.id = claim_id)
                                    Filter: (deleted = 0)
                      ->  Hash  (cost=224.46..224.46 rows=6546 width=8)
                            ->  Seq Scan on sys_users u  (cost=0.00..224.46 rows=6546 width=8)
                ->  Index Scan using portal_users_pkey on portal_users pu  (cost=0.00..0.27 rows=1 width=8)
                      Index Cond: (id = c.portal_user_id)
  CTE ddr2
    ->  GroupAggregate  (cost=25714.40..28093.98 rows=286 width=27)
          Filter: ((NULLIF(btrim((c.nea_number)::text, ' '::text), ''::text) IS NOT NULL) OR CASE c.transmission_method WHEN 'P'::bpchar THEN (count(DISTINCT ca.id) > 1) ELSE (count(DISTINCT ca.id) >

Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Craig Ringer

On 07/23/2012 10:46 PM, Jim Vanns wrote:

Nothing obvious - and we log a fair amount. No tmp table creations,
no locks held.

To add to this EXPLAIN reports it took only 0.23ms to run (for example)
whereas the wall clock time is more like 20-30 minutes (or up to n hours
as I said where everything appears to click back into place at the same
time).

How many concurrent connections do you have?

--
Craig Ringer
