[PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.
I tested different indexes - no same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
  audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp = '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp = '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  -  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 
rows=203856829 width=62)
Filter: ((timestamp = '2008-01-01 00:00:00'::timestamp 
without time zone) AND (timestamp = '2012-10-05 00:00:00'::timestamp without 
time zone))

which takes forever.

How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
  audittrailid text,
  audittraileventcode text,
  clinicalactivity text,
  eventsuccessful text,
  externalunique text,
  recordstamp timestamp without time zone,
  recorddescription text,
  encountersid integer,
  eventuserlogon text,
  computername text,
  applicationcode text,
  practiceid integer,
  mmusersid integer,
  entitysid integer,
  entityname text,
  timestamp timestamp without time zone,
  lastuser integer,
  createstamp timestamp without time zone,
  pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE audittrailclinical
  OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
  ON audittrailclinical
  USING btree
  (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
  ON audittrailclinical
  USING btree
  (audittrailid COLLATE pg_catalog.default );






-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] have: seq scan - want: index scan

2012-10-17 Thread Samuel Gendler
On Tue, Oct 16, 2012 at 4:45 PM, Chris Ruprecht ch...@cdrbill.com wrote:

 Hi guys,

 PG = 9.1.5
 OS = winDOS 2008R8

 I have a table that currently has 207 million rows.
 there is a timestamp field that contains data.
 more data gets copied from another database into this database.
 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 I tested different indexes - no same behavior.


 The query plan says:

   -  Seq Scan on audittrailclinical  (cost=0.00..8637598.76
 rows=203856829 width=62)
 Filter: ((timestamp = '2008-01-01
 00:00:00'::timestamp without time zone) AND (timestamp = '2012-10-05
 00:00:00'::timestamp without time zone))

 which takes forever.

 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.


analyze says 203 million out of 207 million rows are matched by your
timestamp filter, so it is definitely going to favour a sequential scan,
since an index scan that matches that many rows will inevitably be slower
than simply scanning the table, since it will have to both do the lookups
and load the actual records from the table (all of them, basically) in
order to determine their visibility to you, so your index scan will just
turn sequential access of the table pages into random access and require
index lookups as well.   You can possibly verify this by setting
enable_seqscan to false and running your analyze again and see how the plan
changes, though I don't believe that will necessarily remove all sequential
scans, it just reduces their likelihood, so you may see that nothing
changes. If the estimate for the number of matching rows is incorrect,
you'll want to increase the statistics gathering for that table or just
that column.

ALTER TABLE table ALTER COLUMN column SET STATISTICS number

where number is between 10 and 1000 and I think the default is 100.  Then
re-analyze the table and see if the query plan shows better estimates.  I
think 9.2 also supports index only scans which eliminate the need to load
the matched records in certain circumstances. However, all of the columns
used by the query would need to be in the index, and you are using an awful
lot of columns between the select clause and the table joins.

Are you lacking indexes on the columns used for joins that would allow more
selective index scans on those columns which could then just filter by
timestamp?  I'm not much of an expert on the query planner, so I'm not sure
what exactly will cause that behaviour, but I'd think that good statistics
and useful indexes should allow the rest of the where clause to be more
selective of the rows from audittrailclinical unless
patientaccount.defaultaccount
= 'Y' and patient.dnsortpersonnumber = '347450'  are similarly
non-selective, though patient.dnsortpersonnumber would seem like it is
probably the strong filter, so make sure you've got indexes and accurate
stats on all of the foreign keys that connect patient table and
audittrailclinical table.  It'd be useful to see the rest of the explain
analyze output so we could see how it is handling the joins and why.  Note
that because you have multiple composite foreign keys joining tables in
your query, you almost certainly won't those composite keys in a single
index.  If you have indexes on those columns but they are single-column
indexes, that may be what is causing the planner to try to filter the atc
table on the timestamp rather than via the joins.  I'm sure someone more
knowledgable than I will be along eventually to correct any misinformation
I may have passed along.  Without knowing anything about your schema or the
rest of the explain analyze output, I'm mostly just guessing.  There is an
entire page devoted to formulating useful mailing list questions,
incidentally.  Yours really isn't.  Or if the atc table definition is
complete, you are definitely missing potentially useful indexes, since you
are joining to that table via encountersid and you don't show an index on
that column - yet that is the column that eventually joins out to the
patient and patientaccount tables, which have the stronger filters on them.

Incidentally, why the join to the entity table via entitysid?  No columns
from that table appear to be used anywhere else in the query.

--sam


[PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.
I tested different indexes - no same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
 audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp = '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp = '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  -  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 
rows=203856829 width=62)
Filter: ((timestamp = '2008-01-01 00:00:00'::timestamp 
without time zone) AND (timestamp = '2012-10-05 00:00:00'::timestamp without 
time zone))

which takes forever.

How do I make this do an index scan instead?
I did an analyze audittrailclinical to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 timestamp timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
 OIDS=FALSE
);
ALTER TABLE audittrailclinical
 OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog.default );


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Evgeny Shishkin

On Oct 17, 2012, at 3:52 AM, Chris Ruprecht ch...@cdrbill.com wrote:

 Hi guys,
 
 PG = 9.1.5
 OS = winDOS 2008R8
 
 I have a table that currently has 207 million rows.
 there is a timestamp field that contains data.
 more data gets copied from another database into this database.
 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 I tested different indexes - no same behavior.
 
 The query does this:
 
 SELECT   
 audittrailclinical.pgid,   
 audittrailclinical.timestamp,   
 mmuser.logon,  
 audittrailclinical.entityname,   
 audittrailclinical.clinicalactivity,   
 audittrailclinical.audittraileventcode,   
 account.accountnumber,   
 patient.dnsortpersonnumber  
 FROM   
 public.account,   
 public.audittrailclinical,   
 public.encounter,   
 public.entity,   
 public.mmuser,   
 public.patient,   
 public.patientaccount 
 WHERE   
 audittrailclinical.encountersid = encounter.encountersid 
 and   audittrailclinical.timestamp = '2008-01-01'::timestamp without time 
 zone 
 and   audittrailclinical.timestamp = '2012-10-05'::timestamp without time 
 zone
 AND  encounter.practiceid = patient.practiceid 
 AND  encounter.patientid = patient.patientid 
 AND  encounter.staffid = patient.staffid 
 AND  entity.entitysid = audittrailclinical.entitysid 
 AND  mmuser.mmusersid = audittrailclinical.mmusersid 
 AND  patient.practiceid = patientaccount.practiceid 
 AND  patient.patientid = patientaccount.patientid 
 AND  patientaccount.accountsid = account.accountsid 
 AND  patientaccount.defaultaccount = 'Y' 
 AND  patient.dnsortpersonnumber = '347450' ;
 
 The query plan says:
 
   -  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 
 rows=203856829 width=62)
 Filter: ((timestamp = '2008-01-01 
 00:00:00'::timestamp without time zone) AND (timestamp = '2012-10-05 
 00:00:00'::timestamp without time zone))
 
 which takes forever.
 

Selecting 5 yours of data is not selective at all, so postgres decides it is 
cheaper to do seqscan. 

Do you have an index on patient.dnsortpersonnumber? Can you post a result from 
select count(*) from patient where dnsortpersonnumber = '347450'; ?


 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 
 the table definitions are (the createstamp field is empty - I know, bad data):
 
 CREATE TABLE audittrailclinical
 (
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 timestamp timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
 )
 WITH (
 OIDS=FALSE
 );
 ALTER TABLE audittrailclinical
 OWNER TO intergy;
 GRANT ALL ON TABLE audittrailclinical TO intergy;
 GRANT SELECT ON TABLE audittrailclinical TO rb;
 
 -- Index: atc_en_time
 
 CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , timestamp );
 
 -- Index: atc_id
 
 -- DROP INDEX atc_id;
 
 CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog.default );
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht

On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:

 Selecting 5 yours of data is not selective at all, so postgres decides it is 
 cheaper to do seqscan. 
 
 Do you have an index on patient.dnsortpersonnumber? Can you post a result 
 from 
 select count(*) from patient where dnsortpersonnumber = '347450'; ?
 

Yes, there is an index:

Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
  -  Index Scan using patient_pracsortpatientnumber on patient  
(cost=0.00..6427.06 rows=1 width=0)
Index Cond: (dnsortpersonnumber = '347450'::text)


In fact, all the other criteria is picked using an index. I fear that the = 
and = on the timestamp is causing the issue. If I do a = of just one of 
them, I get an index scan. But I need to scan the entire range. I get queries 
like give me everything that was entered into the system for this patient 
between these two dates. A single date wouldn't work.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Bruce Momjian
On Tue, Oct 16, 2012 at 08:19:43PM -0400, Chris Ruprecht wrote:
 
 On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:
 
  Selecting 5 yours of data is not selective at all, so postgres decides it 
  is cheaper to do seqscan. 
  
  Do you have an index on patient.dnsortpersonnumber? Can you post a result 
  from 
  select count(*) from patient where dnsortpersonnumber = '347450'; ?
  
 
 Yes, there is an index:
 
 Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
   -  Index Scan using patient_pracsortpatientnumber on patient  
 (cost=0.00..6427.06 rows=1 width=0)
 Index Cond: (dnsortpersonnumber = '347450'::text)
 
 
 In fact, all the other criteria is picked using an index. I fear that the = 
 and = on the timestamp is causing the issue. If I do a = of just one of 
 them, I get an index scan. But I need to scan the entire range. I get queries 
 like give me everything that was entered into the system for this patient 
 between these two dates. A single date wouldn't work.

Have you read our FAQ on this matter?


http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Thanks Bruce, 

I have, and I even thought, I understood it :). 

I just ran an explain analyze on another table - and ever since the query plan 
changed. It's now using the index as expected. I guess, I have some more 
reading to do.

On Oct 16, 2012, at 20:31 , Bruce Momjian br...@momjian.us wrote:

 
 Have you read our FAQ on this matter?
 



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance