Re: [PERFORM] Slow Delete : Seq scan instead of index scan
Hi Sylvain,

Might sound like a nasty question, and gurus will correct me if I'm wrong, but the first thing to investigate is why the index is not used:
- You have 2-3 million rows per table, so the planner should use the index. A seqscan is preferred for small tables.
- Maybe the WHERE clause of your DELETE statement doesn't make use of your start and end date columns? If so, in which order?

Please provide your Pg version and the table setup with the indexes.

Regards, Sekine

2012/10/16 Sylvain CAILLET scail...@alaloop.com
> Hi to all,
>
> I've got a trouble with some delete statements. My db contains a little more than 1 tables and runs on a dedicated server (Debian 6 - bi quad - 16Gb - SAS disks raid 0). Most of the tables contain between 2 and 3 million rows and no foreign keys exist between them. Each is indexed (btree) on start_date / end_date fields (bigint). The PostgreSQL server has been tuned (I can give modified values if needed).
>
> I perform recurrent DELETEs upon a table subset (~1900 tables) and each time, I delete a few lines (between 0 and 1200). Usually it takes between 10s and more than 2 min. It seems to me to be a huge amount of time! An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a Seq Scan instead of an Index Scan. Autovacuum is on and I expect the db stats to be updated in real time (the pg_stats file is stored in a /dev/shm RAM disk for quick access).
>
> Do you have any idea about this trouble?
>
> Sylvain Caillet
> Bureau : + 33 5 59 41 51 10
> scail...@alaloop.com
> ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
> www.alaloop.com
Re: [PERFORM] Slow Delete : Seq scan instead of index scan
Hi Sékine,

You're right: my question is why the planner doesn't use the index! My DELETE statements have a WHERE clause like: start_date < 134648610. They are executed to delete rows that are too old. My PostgreSQL version is 8.4. Below is an example of a table (they all have the same structure):

CREATE TABLE agg_t100_outgoing_a39_src_net_f5
(
  total_pkts bigint,
  end_date bigint,
  src_network inet,
  start_date bigint,
  total_flows bigint,
  total_bytes bigint
)
WITH (
  OIDS=FALSE
);
CREATE INDEX agg_t100_outgoing_a39_src_net_f5_end_date
  ON agg_t100_outgoing_a39_src_net_f5
  USING btree (end_date);
CREATE INDEX agg_t100_outgoing_a39_src_net_f5_start_date
  ON agg_t100_outgoing_a39_src_net_f5
  USING btree (start_date);

I have investigated the pg_stat_all_tables view and it seems the autovacuum / autoanalyze don't do their job. Many tables have no last_autovacuum / last_autoanalyze dates! So the planner doesn't have fresh stats to decide with. Don't you think that could be a good reason for the slow DELETEs? In this case, the trouble could come from the autovacuum configuration.

Regards,
Sylvain
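To check the stale-statistics theory, the per-table vacuum/analyze timestamps can be read straight out of the statistics views. A minimal sketch (using one of the example tables from this thread; available on 8.4):

```sql
-- When did autovacuum / autoanalyze last touch each table?
-- Tables with NULL last_autoanalyze have never been analyzed automatically.
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_all_tables
WHERE schemaname = 'public'
ORDER BY last_autoanalyze NULLS FIRST;
```

With ~1900 small, frequently-deleted tables, the default autovacuum thresholds may simply never be reached on any single table, which would explain the missing dates.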
Re: [PERFORM] Slow Delete : Seq scan instead of index scan
On 10/16/2012 03:50 PM, Sylvain CAILLET wrote:
> An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a Seq Scan instead of an Index Scan.

Can you post that (or paste to explain.depesz.com and link to it here) along with a \d tablename from psql?

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Delete : Seq scan instead of index scan
Hi Craig,

Here are the outputs:

flows=# explain analyze delete from agg_t377_incoming_a40_dst_net_f5 where start_date < 1346487911000;
                                   QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on agg_t377_incoming_a40_dst_net_f5  (cost=0.00..34448.96 rows=657622 width=6) (actual time=3429.058..7135.901 rows=143 loops=1)
   Filter: (start_date < 1346487911000::bigint)
 Total runtime: 7136.191 ms
(3 rows)

flows=# \d agg_t377_incoming_a40_dst_net_f5
Table "public.agg_t377_incoming_a40_dst_net_f5"
   Column    |  Type  | Modifiers
-------------+--------+-----------
 end_date    | bigint |
 dst_network | inet   |
 total_pkts  | bigint |
 total_bytes | bigint |
 start_date  | bigint |
 total_flows | bigint |
Indexes:
    "agg_t377_incoming_a40_dst_net_f5_end_date" btree (end_date)
    "agg_t377_incoming_a40_dst_net_f5_start_date" btree (start_date)

Thanks for your help,

Sylvain
Re: [PERFORM] Slow Delete : Seq scan instead of index scan
The first thing you should probably do is run ANALYZE on one of these tables and then run the DELETE statement again. If there are no stats for these tables, it's normal not to get very good plans.

On Tue, Oct 16, 2012 at 11:24 AM, Sylvain CAILLET scail...@alaloop.com wrote:
> Seq Scan on agg_t377_incoming_a40_dst_net_f5  (cost=0.00..34448.96 rows=657622 width=6) (actual time=3429.058..7135.901 rows=143 loops=1)
>   Filter: (start_date < 1346487911000::bigint)
> Total runtime: 7136.191 ms
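The suggested check is a two-liner in psql; note the 657622-row estimate against 143 actual rows above, which is the signature of missing statistics (a sketch against the table from this thread):

```sql
-- Refresh the planner statistics by hand, then re-test the DELETE plan.
-- With fresh stats the row estimate should drop from ~657k toward the
-- real ~143, making the start_date index the cheaper choice.
ANALYZE agg_t377_incoming_a40_dst_net_f5;

EXPLAIN ANALYZE
DELETE FROM agg_t377_incoming_a40_dst_net_f5
WHERE start_date < 1346487911000;
```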
[PERFORM] LIKE op with B-Tree Index?
Hi communities,

I am investigating a performance issue involving LIKE '...%' on an index in a complex query with joins. The problem boils down to this simple scenario:

Scenario

My database locale is C, using UTF-8 encoding. I tested this on 9.1.6 and 9.2.1.

Q1. SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
Q2. SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id < '12345679'

shipments is a table with million rows and 20 columns. shipment_id is the primary key, a non-null text field.

CREATE TABLE cod.shipments
(
  shipment_id text NOT NULL,
  -- other columns omitted
  CONSTRAINT shipments_pkey PRIMARY KEY (shipment_id)
)

EXPLAIN ANALYZE of Q1 gives this:

Index Scan using shipments_pkey on shipments  (cost=0.00..39.84 rows=1450 width=294) (actual time=0.018..0.018 rows=1 loops=1)
  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))
  Filter: (shipment_id ~~ '12345678%'::text)
  Buffers: shared hit=4

EXPLAIN ANALYZE of Q2 gives this:

Index Scan using shipments_pkey on shipments  (cost=0.00..39.83 rows=1 width=294) (actual time=0.027..0.027 rows=1 loops=1)
  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id < '12345679'::text))
  Buffers: shared hit=4

Problem Description

In Q1, the planner thought there would be 1450 rows, while Q2 gave a much better estimate of 1. The problem is that when I combine such a condition with a join to another table, postgres will prefer a merge join (or hash) rather than a nested loop.

Question

Are Q1 and Q2 equivalent? From what I see and the results, they seem to be the same, or did I miss something? (Charset: C, Encoding: UTF-8) If they are equivalent, is that a bug of the planner?

Many Thanks,
Sam

(The email didn't seem to go through without subscription. Resending.)
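The claimed set-equivalence of the two predicates under byte-wise (C locale) ordering can be sanity-checked outside the database. A small illustration in Python, whose string comparison is code-point-wise like the C collation (a sketch of the ordering argument only, not of anything the planner does):

```python
# Under byte-wise ordering, a prefix test is equivalent to a half-open
# range whose upper bound increments the prefix's last character:
#   s LIKE '12345678%'   <=>   '12345678' <= s < '12345679'
prefix = "12345678"
upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # -> "12345679"

# Strings inside and outside the range, including the contested '12345678Z'.
samples = ["12345678", "12345678Z", "123456789", "12345679", "1234", "99"]
for s in samples:
    assert s.startswith(prefix) == (prefix <= s < upper)
```

(The construction of `upper` only works this simply when the prefix's last character is not the maximum code point; PostgreSQL's own `make_greater_string` logic handles the general case.)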
[PERFORM] Support Create package
Hi,

Why does EnterpriseDB PostgreSQL support create/alter/drop package while the open-source version doesn't? Is it planned, or will it never have support?

Thanks
Re: [PERFORM] Support Create package
2012/10/16 Alejandro Carrillo faster...@yahoo.es:
> Why does EnterpriseDB PostgreSQL support create/alter/drop package while the open-source version doesn't? Is it planned, or will it never have support?

Packages are part of the EnterpriseDB Oracle compatibility layer. PostgreSQL doesn't support this functionality. Packages are on our ToDo list, but probably nobody is working on it and I don't expect it in the next few years.

Regards

Pavel Stehule
Re: [PERFORM] Support Create package
On Tue, Oct 16, 2012 at 01:26:37PM +0100, Alejandro Carrillo wrote:
> Why does EnterpriseDB PostgreSQL support create/alter/drop package while the open-source version doesn't?

Hi Alejandro,

Isn't that part of their Oracle compatibility secret sauce? For the open-source version, it has never been important enough for anyone to invest in the development effort.

Cheers,
Ken
Re: [PERFORM] Guide to Posting Slow Query Questions
On Sun, Oct 7, 2012 at 7:43 AM, Ants Aasma a...@cybertec.at wrote:
> On Wed, Sep 26, 2012 at 11:11 PM, Jeff Janes jeff.ja...@gmail.com wrote:
>> On Wed, Sep 12, 2012 at 11:40 PM, Ants Aasma a...@cybertec.at wrote:
>>> I don't have any links for OS level monitoring, but with version 9.2 track_io_timing would do the job.
>> I don't know how to advise people on how to use this to obtain information on a specific query. Would someone else like to take a stab at explaining that?
> I added a line suggesting that 9.2 users turn it on via SET track_io_timing TO on;

That was easy. I thought there was more to it, because I didn't get any I/O timing output when I tried it. But that was just because there was nothing to output, as all data was in shared_buffers by the time I turned the timing on.

Thanks,

Jeff
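Putting the wiki advice together in one place, the per-session recipe looks like this (a sketch; the queried table is a placeholder, and the feature requires 9.2):

```sql
-- Enable I/O timing for this session only, then inspect a query.
SET track_io_timing TO on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM some_table;
-- Plan nodes now carry "I/O Timings: read=... write=..." lines, but only
-- when blocks were actually read from or written to disk. A fully cached
-- table (everything already in shared_buffers) shows no I/O timing at all,
-- which is exactly the surprise described above.
```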
[PERFORM] limit order by performance issue
Hello,

I'm trying to do a simple SQL query on PostgreSQL 9.0 running on Ubuntu. I have a large table (over 100 million records) with three fields: id_signal (bigint), time_stamp (timestamp) and var_value (float). My query looks like this:

select var_value from ism_floatvalues where id_signal = 29660 order by time_stamp desc limit 1;

So I want to select the last value for a given ID (id_signal). This query runs FOREVER, while if I delete "limit 1" it runs instantly.

Any help?

Regards.
Re: [PERFORM] limit order by performance issue
2012/10/15 Pedro Jiménez p.jime...@ismsolar.com:
> select var_value from ism_floatvalues where id_signal = 29660 order by time_stamp desc limit 1;
>
> This query runs FOREVER, while if I delete "limit 1" it runs instantly.

Did you ANALYZE your tables? Can you send the EXPLAIN ANALYZE results of both queries?

Regards

Pavel Stehule
Re: [PERFORM] limit order by performance issue
On 10/15/2012 12:44 PM, Pedro Jiménez wrote:
> select var_value from ism_floatvalues where id_signal = 29660 order by time_stamp desc limit 1;

Well, we'd have to see an EXPLAIN plan to really know what's going on here, but it often boils down to the planner being overly optimistic when low limits are specified. I bet you have an index on time_stamp, don't you?

In that case, the planner would reverse index-scan that index, estimating that scanning backwards until it hits ID 29660 is cheaper than fetching all of the rows that match the ID directly and throwing away all but 1 row. Remember, it would have to read all of those values to know which is the most recent.

You can fix this a couple of ways:

1. Put a two-column index on these values:

CREATE INDEX idx_ordered_signal
    ON ism_floatvalues (id_signal, time_stamp DESC);

which turns any request for that particular combo into a single index fetch.

2. You can trick the planner by introducing an optimization fence:

SELECT var_value
FROM (
    SELECT var_value, time_stamp
    FROM ism_floatvalues
    WHERE id_signal = 29660
    OFFSET 0
) sub
ORDER BY time_stamp DESC
LIMIT 1;

Quite a few people will probably grouse at me for giving you that as an option, but it does work better than LIMIT 1 more often than it probably should.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com
Re: [PERFORM] limit order by performance issue
Put an index on time_stamp (I assume there is one on id_signal already).

On 10/15/2012 12:44 PM, Pedro Jiménez wrote:
> select var_value from ism_floatvalues where id_signal = 29660 order by time_stamp desc limit 1;
>
> This query runs FOREVER, while if I delete "limit 1" it runs instantly.

--
Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC
Re: [PERFORM] LIKE op with B-Tree Index?
On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
> Q1. SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
> Q2. SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id < '12345679'
>
> Are Q1 and Q2 equivalent? From what I see and the results, they seem to be the same, or did I miss something? (Charset: C, Encoding: UTF-8) If they are equivalent, is that a bug of the planner?

They are most certainly not equivalent. What if the shipment_id is 12345678Z?

merlin
Re: [PERFORM] limit order by performance issue
On Tue, Oct 16, 2012 at 10:47 PM, Karl Denninger k...@denninger.net wrote:
> Put an index on time_stamp (I assume there is one on id_signal already)

Well, the optimal index for this particular query would include both columns: (id_signal, time_stamp) -- in this order.

Additionally, if you want to take advantage of the index-only scans feature, add the SELECTed column too: (id_signal, time_stamp, var_value)

Regards,
Marti
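Spelled out as DDL, the two variants suggested in this thread look like this (a sketch; the index names are made up, and the DESC modifier matches the query's ORDER BY so the newest entry per id_signal sits first in the index):

```sql
-- Optimal for: WHERE id_signal = ? ORDER BY time_stamp DESC LIMIT 1
CREATE INDEX ism_floatvalues_signal_ts
    ON ism_floatvalues (id_signal, time_stamp DESC);

-- 9.2+ variant: also carry the selected column so the query can be
-- answered by an index-only scan, never touching the heap (when the
-- visibility map allows it).
CREATE INDEX ism_floatvalues_signal_ts_val
    ON ism_floatvalues (id_signal, time_stamp DESC, var_value);
```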
[PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows. There is a timestamp field that contains data. More data gets copied from another database into this database. The query does this:

SELECT
  audittrailclinical.pgid,
  audittrailclinical.timestamp,
  mmuser.logon,
  audittrailclinical.entityname,
  audittrailclinical.clinicalactivity,
  audittrailclinical.audittraileventcode,
  account.accountnumber,
  patient.dnsortpersonnumber
FROM
  public.account,
  public.audittrailclinical,
  public.encounter,
  public.entity,
  public.mmuser,
  public.patient,
  public.patientaccount
WHERE
  audittrailclinical.encountersid = encounter.encountersid
  AND audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone
  AND audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
  AND encounter.practiceid = patient.practiceid
  AND encounter.patientid = patient.patientid
  AND encounter.staffid = patient.staffid
  AND entity.entitysid = audittrailclinical.entitysid
  AND mmuser.mmusersid = audittrailclinical.mmusersid
  AND patient.practiceid = patientaccount.practiceid
  AND patient.patientid = patientaccount.patientid
  AND patientaccount.accountsid = account.accountsid
  AND patientaccount.defaultaccount = 'Y'
  AND patient.dnsortpersonnumber = '347450';

The query plan says:

->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
      Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

which takes forever. How do I make this do an index scan instead? I did an ANALYZE on audittrailclinical to no avail. I tested different indexes - same behavior.
The table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
  audittrailid text,
  audittraileventcode text,
  clinicalactivity text,
  eventsuccessful text,
  externalunique text,
  recordstamp timestamp without time zone,
  recorddescription text,
  encountersid integer,
  eventuserlogon text,
  computername text,
  applicationcode text,
  practiceid integer,
  mmusersid integer,
  entitysid integer,
  entityname text,
  timestamp timestamp without time zone,
  lastuser integer,
  createstamp timestamp without time zone,
  pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE audittrailclinical OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
  ON audittrailclinical
  USING btree
  (entitysid, timestamp);

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
  ON audittrailclinical
  USING btree
  (audittrailid COLLATE pg_catalog."default");
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
On Oct 17, 2012, at 3:52 AM, Chris Ruprecht ch...@cdrbill.com wrote:
> The query plan says:
>
> ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
>
> which takes forever.

Selecting 5 years of data is not selective at all, so postgres decides it is cheaper to do a seqscan.

Do you have an index on patient.dnsortpersonnumber?
Can you post the result of:

select count(*) from patient where dnsortpersonnumber = '347450';

?
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:
> Selecting 5 years of data is not selective at all, so postgres decides it is cheaper to do a seqscan. Do you have an index on patient.dnsortpersonnumber? Can you post the result of: select count(*) from patient where dnsortpersonnumber = '347450'; ?

Yes, there is an index:

 Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
   ->  Index Scan using patient_pracsortpatientnumber on patient  (cost=0.00..6427.06 rows=1 width=0)
         Index Cond: (dnsortpersonnumber = '347450'::text)

In fact, all the other criteria are picked using an index. I fear that the >= and <= on the timestamp are causing the issue. If I do an = on just one of them, I get an index scan. But I need to scan the entire range. I get queries like "give me everything that was entered into the system for this patient between these two dates". A single date wouldn't work.
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
On Tue, Oct 16, 2012 at 08:19:43PM -0400, Chris Ruprecht wrote:
> I fear that the >= and <= on the timestamp are causing the issue. If I do an = on just one of them, I get an index scan. But I need to scan the entire range.

Have you read our FAQ on this matter?

http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F

--
Bruce Momjian  br...@momjian.us  http://momjian.us
EnterpriseDB   http://enterprisedb.com

+ It's impossible for everything to be true. +
Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?
Thanks Bruce,

I have, and I even thought I understood it :). I just ran an EXPLAIN ANALYZE on another table - and ever since, the query plan changed. It's now using the index as expected. I guess I have some more reading to do.

On Oct 16, 2012, at 20:31 , Bruce Momjian br...@momjian.us wrote:
> Have you read our FAQ on this matter?
Re: [PERFORM] LIKE op with B-Tree Index?
On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote:
> On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
>> Q1. SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
>> Q2. SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id < '12345679'
>>
>> Are Q1 and Q2 equivalent?
>
> They are most certainly not equivalent. What if the shipment_id is 12345678Z?
>
> merlin

But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like 'apple' < 'apples' < 'apply' in a dictionary.

A quick test:

vitalink=# select * from ss;
    id
-----------
 12345678
 12345678Z
 12345679
(3 rows)

vitalink=# select * from ss WHERE id >= '12345678' AND id < '12345679';
    id
-----------
 12345678
 12345678Z
(2 rows)

Sam