Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-16 Thread Sékine Coulibaly
Hi Sylvain,

Might sound like a nasty question, and gurus will correct me if I'm wrong,
but the first thing to investigate is why the index is not used:
- You have 2-3 million rows per table, so the planner should use the index.
Seqscan is preferred for small tables.
- Maybe the WHERE clause of your DELETE statement doesn't make use of your
start and end date columns? If so, in which order?

Please provide your Pg version and the table setup with the index.

Regards,

Sekine

2012/10/16 Sylvain CAILLET scail...@alaloop.com

 Hi to all,

 I've got trouble with some DELETE statements. My db contains a little
 more than 1 tables and runs on a dedicated server (Debian 6 - bi quad
 - 16Gb - SAS disks RAID 0). Most of the tables contain between 2 and 3
 million rows and no foreign keys exist between them. Each is indexed
 (btree) on the start_date / end_date fields (bigint). The PostgreSQL server has
 been tuned (I can give the modified values if needed).

 I perform recurrent DELETEs on a table subset (~1900 tables) and each
 time I delete a few rows (between 0 and 1200). It usually takes between
 10 s and more than 2 min, which seems to me a huge amount of time! An
 EXPLAIN ANALYZE on a DELETE shows me that the planner uses a Seq Scan
 instead of an Index Scan. Autovacuum is on and I expect the db stats to be
 updated in real time (the pg_stats file is stored on a /dev/shm RAM disk for
 quick access).

 Do you have any idea about this trouble?

 Sylvain Caillet
 Bureau : + 33 5 59 41 51 10
 scail...@alaloop.com

 ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
 www.alaloop.com




Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-16 Thread Sylvain CAILLET
Hi Sékine, 

You're right: my question is why the planner doesn't use the index! My DELETE 
statements have a WHERE clause like: start_date < 134648610. They are executed 
to delete rows that are too old. 
My PostgreSQL version is 8.4. Below is an example of a table (they all have the 
same structure): 

CREATE TABLE agg_t100_outgoing_a39_src_net_f5 
( 
total_pkts bigint, 
end_date bigint, 
src_network inet, 
start_date bigint, 
total_flows bigint, 
total_bytes bigint 
) 
WITH ( 
OIDS=FALSE 
); 

CREATE INDEX agg_t100_outgoing_a39_src_net_f5_end_date 
ON agg_t100_outgoing_a39_src_net_f5 
USING btree 
(end_date); 

CREATE INDEX agg_t100_outgoing_a39_src_net_f5_start_date 
ON agg_t100_outgoing_a39_src_net_f5 
USING btree 
(start_date); 

I have investigated in the pg_stat_all_tables view and it seems the autovacuum 
/ autoanalyze don't do their job. Many tables have no last_autovacuum / 
last_autoanalyze dates! So the planner doesn't have fresh stats to decide with. 
Don't you think that could be a good reason for the slow DELETEs? In this case, the 
trouble could come from the autovacuum configuration. 
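For reference, a query along these lines shows which tables autovacuum has never touched (a sketch; the schema filter is an assumption):

```sql
-- Tables autovacuum/autoanalyze has never processed, worst offenders first
SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
FROM pg_stat_all_tables
WHERE schemaname = 'public'
  AND (last_autovacuum IS NULL OR last_autoanalyze IS NULL)
ORDER BY n_dead_tup DESC;
```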

Regards, 

Sylvain 
- Mail original -

 Hi Sylvain,

 Might sound like a nasty question, and gurus will correct me if I'm
 wrong, but first thing to investigate is why the index is not used :
 - You have 2/3 million rows per table so the planner should use the
 index. Seqscan is prefered for small tables.
 - Maybe the WHERE clause of your DELETE statement doesn't make use of
 your start and end date columns ? If so, in which order ?

 Please, provide with your Pg version and the table setup with the
 index.

 Regards,

 Sekine

 2012/10/16 Sylvain CAILLET  scail...@alaloop.com 

  Hi to all,
 

  I've got a trouble with some delete statements. My db contains a
  little more than 1 tables and runs on a dedicated server
  (Debian
  6 - bi quad - 16Gb - SAS disks raid 0). Most of the tables contains
  between 2 and 3 million rows and no foreign keys exist between
  them.
  Each is indexed (btree) on start_date / end_date fields (bigint).
  The Postgresql server has been tuned (I can give modified values if
  needed).
 

  I perform recurrent DELETE upon a table subset (~1900 tables) and
  each time, I delete a few lines (between 0 and 1200). Usually it
  takes between 10s and more than 2mn. It seems to me to be a huge
  amount of time ! An EXPLAIN ANALYZE on a DELETE shows me that the
  planner uses a Seq Scan instead of an Index Scan. Autovaccum is on
  and I expect the db stats to be updated in real time (pg_stats file
  is stored in /dev/shm RAM disk for quick access).
 

  Do you have any idea about this trouble ?
 

  Sylvain Caillet
 
  Bureau : + 33 5 59 41 51 10
 
  scail...@alaloop.com
 

  ALALOOP S.A.S. - Technopole Izarbel - 64210 Bidart
 
  www.alaloop.com
 


Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-16 Thread Craig Ringer

On 10/16/2012 03:50 PM, Sylvain CAILLET wrote:

Hi to all,

I've got a trouble with some delete statements. My db contains a little
more than 1 tables and runs on a dedicated server (Debian 6 - bi
quad - 16Gb - SAS disks raid 0). Most of the tables contains between 2
and 3 million rows and no foreign keys exist between them. Each is
indexed (btree) on start_date / end_date fields (bigint). The Postgresql
server has been tuned (I can give modified values if needed).

I perform recurrent DELETE upon a table subset (~1900 tables) and each
time, I delete a few lines (between 0 and 1200). Usually it takes
between 10s and more than 2mn. It seems to me to be a huge amount of
  time ! An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a
Seq Scan instead of an Index Scan.


Can you post that (or paste to explain.depesz.com and link to it here) 
along with a \d tablename from psql?


--
Craig Ringer


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-16 Thread Sylvain CAILLET
Hi Craig, 

Here are the outputs : 

flows=# explain analyze delete from agg_t377_incoming_a40_dst_net_f5 where start_date < 1346487911000;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Seq Scan on agg_t377_incoming_a40_dst_net_f5  (cost=0.00..34448.96 rows=657622 width=6) (actual time=3429.058..7135.901 rows=143 loops=1)
   Filter: (start_date < 1346487911000::bigint)
 Total runtime: 7136.191 ms
(3 rows)

flows=# \d agg_t377_incoming_a40_dst_net_f5
Table "public.agg_t377_incoming_a40_dst_net_f5"
   Column    |  Type  | Modifiers
-------------+--------+-----------
 end_date    | bigint |
 dst_network | inet   |
 total_pkts  | bigint |
 total_bytes | bigint |
 start_date  | bigint |
 total_flows | bigint |
Indexes:
    "agg_t377_incoming_a40_dst_net_f5_end_date" btree (end_date)
    "agg_t377_incoming_a40_dst_net_f5_start_date" btree (start_date)

Thanks for your help, 

Sylvain 
- Mail original -

 On 10/16/2012 03:50 PM, Sylvain CAILLET wrote:
  Hi to all,
 
  I've got a trouble with some delete statements. My db contains a
  little
  more than 1 tables and runs on a dedicated server (Debian 6 -
  bi
  quad - 16Gb - SAS disks raid 0). Most of the tables contains
  between 2
  and 3 million rows and no foreign keys exist between them. Each is
  indexed (btree) on start_date / end_date fields (bigint). The
  Postgresql
  server has been tuned (I can give modified values if needed).
 
  I perform recurrent DELETE upon a table subset (~1900 tables) and
  each
  time, I delete a few lines (between 0 and 1200). Usually it takes
  between 10s and more than 2mn. It seems to me to be a huge amount
  of
  time ! An EXPLAIN ANALYZE on a DELETE shows me that the planner
  uses a
  Seq Scan instead of an Index Scan.

 Can you post that (or paste to explain.depesz.com and link to it
 here)
 along with a \d tablename from psql?

 --
 Craig Ringer


Re: [PERFORM] Slow Delete : Seq scan instead of index scan

2012-10-16 Thread Filippos Kalamidas
The first thing you should probably do is run ANALYZE on one of these
tables and then run the DELETE statement again. If there are no stats for
these tables, it's normal not to get very good plans.
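As a sketch, using the table name from the thread, that would be:

```sql
-- Refresh the statistics, then re-check the plan
ANALYZE agg_t377_incoming_a40_dst_net_f5;

EXPLAIN ANALYZE
DELETE FROM agg_t377_incoming_a40_dst_net_f5
WHERE start_date < 1346487911000;
```

With fresh stats the planner can see that the predicate matches only a handful of rows and should prefer the index on start_date.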



On Tue, Oct 16, 2012 at 11:24 AM, Sylvain CAILLET scail...@alaloop.comwrote:

 Hi Craig,

 Here are the outputs :

 flows=# explain analyze delete from agg_t377_incoming_a40_dst_net_f5 where
 start_date < 1346487911000;
 QUERY PLAN


 ---
  Seq Scan on agg_t377_incoming_a40_dst_net_f5  (cost=0.00..34448.96
 rows=657622 width=6) (actual time=3429.058..7135.901 rows=143 loops=1)
   Filter: (start_date < 1346487911000::bigint)
  Total runtime: 7136.191 ms
 (3 rows)
 flows=# \d agg_t377_incoming_a40_dst_net_f5
 Table public.agg_t377_incoming_a40_dst_net_f5
Column|  Type  | Modifiers
 -++---
  end_date| bigint |
  dst_network | inet   |
  total_pkts  | bigint |
  total_bytes | bigint |
  start_date  | bigint |
  total_flows | bigint |
 Indexes:
 agg_t377_incoming_a40_dst_net_f5_end_date btree (end_date)
 agg_t377_incoming_a40_dst_net_f5_start_date btree (start_date)

 Thanks for your help,

 Sylvain

 --

 On 10/16/2012 03:50 PM, Sylvain CAILLET wrote:
  Hi to all,
 
  I've got a trouble with some delete statements. My db contains a little
  more than 1 tables and runs on a dedicated server (Debian 6 - bi
  quad - 16Gb - SAS disks raid 0). Most of the tables contains between 2
  and 3 million rows and no foreign keys exist between them. Each is
  indexed (btree) on start_date / end_date fields (bigint). The Postgresql
  server has been tuned (I can give modified values if needed).
 
  I perform recurrent DELETE upon a table subset (~1900 tables) and each
  time, I delete a few lines (between 0 and 1200). Usually it takes
  between 10s and more than 2mn. It seems to me to be a huge amount of
time ! An EXPLAIN ANALYZE on a DELETE shows me that the planner uses a
  Seq Scan instead of an Index Scan.

 Can you post that (or paste to explain.depesz.com and link to it here)
 along with a \d tablename from psql?

 --
 Craig Ringer





[PERFORM] LIKE op with B-Tree Index?

2012-10-16 Thread Sam Wong
Hi communities,

 

I am investigating a performance issue involving a prefix LIKE pattern ('...%') on an
index in a complex query with joins. 

 

The problem boils down into this simple scenario---:

Scenario

My database locale is C, using UTF-8 encoding. I tested this on 9.1.6 and 9.2.1.

 

Q1.

SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'

  

Q2.

SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id <
'12345679'

 

shipments is a table with a million rows and 20 columns. shipment_id is the
primary key, a non-null text field.

 

CREATE TABLE cod.shipments

(

  shipment_id text NOT NULL,

  -- other columns omitted

  CONSTRAINT shipments_pkey PRIMARY KEY (shipment_id)

)

 

Analyze Q1 gives this:

Index Scan using shipments_pkey on shipments  (cost=0.00..39.84 rows=1450
width=294) (actual time=0.018..0.018 rows=1 loops=1)

  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))

  Filter: (shipment_id ~~ '12345678%'::text)

  Buffers: shared hit=4

 

Analyze Q2 gives this:

Index Scan using shipments_pkey on shipments  (cost=0.00..39.83 rows=1
width=294) (actual time=0.027..0.027 rows=1 loops=1)

  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))

  Buffers: shared hit=4

 

Problem Description

In Q1, the planner thought there would be 1450 rows, while Q2 gave a much
better estimate of 1.

The problem is that when I combine such a condition with a join to another table,
postgres will prefer a merge join (or hash) rather than a nested loop.

 

Question

Are Q1 and Q2 equivalent? From what I see and the results they seem to be the
same, or did I miss something? (Charset: C, Encoding: UTF-8) If they are
equivalent, is that a bug in the planner?

 

Many Thanks,

Sam

 

(The email didn't seem to go through without a subscription. Resending.)



[PERFORM] Support Create package

2012-10-16 Thread Alejandro Carrillo
Hi,

Why does EnterpriseDB support CREATE/ALTER/DROP PACKAGE while open-source
PostgreSQL doesn't?
Is it a planned project, or will it never be supported?


Thanks


Re: [PERFORM] Support Create package

2012-10-16 Thread Pavel Stehule
2012/10/16 Alejandro Carrillo faster...@yahoo.es:
 Hi,

 Why PostgreSQL, the EnterpriseBD supports create/alter/drop package and the
 opensource doesn't?
 Is a project or never will have support?

Packages are part of the EnterpriseDB Oracle compatibility layer.
PostgreSQL doesn't support this functionality. Packages are on our
ToDo list, but probably nobody is working on it and I don't expect it in the
next few years.

Regards

Pavel Stehule


 Thanks




Re: [PERFORM] Support Create package

2012-10-16 Thread k...@rice.edu
On Tue, Oct 16, 2012 at 01:26:37PM +0100, Alejandro Carrillo wrote:
 Hi,
 
 Why PostgreSQL, the EnterpriseBD supports create/alter/drop package and the 
 opensource doesn't?
 Is a project or never will have support?
 
Hi Alejandro,

Isn't that part of their Oracle compatibility secret sauce? For the open-source
version, it has never been important enough for anyone to invest in the development
effort.

Cheers,
Ken




Re: [PERFORM] Guide to Posting Slow Query Questions

2012-10-16 Thread Jeff Janes
On Sun, Oct 7, 2012 at 7:43 AM, Ants Aasma a...@cybertec.at wrote:
 On Wed, Sep 26, 2012 at 11:11 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Sep 12, 2012 at 11:40 PM, Ants Aasma a...@cybertec.at wrote:
 I don't have any links for OS level monitoring, but with version 9.2
 track_io_timing would do the job.

 I don't know how to advice people on how to use this to obtain
 information on a specific query.  Would someone else like to take a
 stab at explaining that?

 I added a line suggesting that 9.2 users turn it on via SET
 track_io_timing TO on;

That was easy.  I thought there was more to it because I didn't get
any IO timing output when I tried it. But that was just because there
was nothing to output, as all data was in shared_buffers by the time I
turned the timing on.
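For anyone following along, the 9.2 recipe looks like this (a sketch; the table name is made up):

```sql
SET track_io_timing TO on;

-- BUFFERS shows hits/reads; with track_io_timing on, an "I/O Timings"
-- line appears whenever the query actually performed reads or writes
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM some_table WHERE id = 42;
```

As Jeff notes, if everything is already in shared_buffers there is no I/O to time and the line simply doesn't appear.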

Thanks,

Jeff




[PERFORM] LIKE op with B-Tree Index?

2012-10-16 Thread Sam Wong
Hi communities,

I am investigating a performance issue involving a prefix LIKE pattern ('...%') on an
index in a complex query with joins. 

The problem boils down into this simple scenario---:
Scenario
My database locale is C, using UTF-8 encoding. I tested this on 9.1.6 and 9.2.1.

Q1.
SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
  
Q2.
SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id <
'12345679'

shipments is a table with a million rows and 20 columns. shipment_id is the
primary key, a non-null text field.

CREATE TABLE cod.shipments
(
  shipment_id text NOT NULL,
  -- other columns omitted
  CONSTRAINT shipments_pkey PRIMARY KEY (shipment_id)
)

Analyze Q1 gives this:
Index Scan using shipments_pkey on shipments  (cost=0.00..39.84 rows=1450
width=294) (actual time=0.018..0.018 rows=1 loops=1)
  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))
  Filter: (shipment_id ~~ '12345678%'::text)
  Buffers: shared hit=4

Analyze Q2 gives this:
Index Scan using shipments_pkey on shipments  (cost=0.00..39.83 rows=1
width=294) (actual time=0.027..0.027 rows=1 loops=1)
  Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
'12345679'::text))
  Buffers: shared hit=4

Problem Description
In Q1, the planner thought there would be 1450 rows, while Q2 gave a much
better estimate of 1.
The problem is that when I combine such a condition with a join to another table,
postgres will prefer a merge join (or hash) rather than a nested loop.

Question
Are Q1 and Q2 equivalent? From what I see and the results they seem to be the
same, or did I miss something? (Charset: C, Encoding: UTF-8)
If they are equivalent, is that a bug in the planner?
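For readers who want to reproduce the estimate gap, a minimal sketch (the table and data here are made up, not from the thread):

```sql
-- In locale C the planner rewrites a prefix LIKE into a range scan,
-- but it estimates the LIKE operator separately from the range bounds,
-- which is where the row-count discrepancy comes from.
CREATE TABLE shipments_demo (shipment_id text PRIMARY KEY);
INSERT INTO shipments_demo
SELECT lpad(i::text, 10, '0') FROM generate_series(1, 100000) AS i;
ANALYZE shipments_demo;

EXPLAIN SELECT * FROM shipments_demo WHERE shipment_id LIKE '00000123%';
EXPLAIN SELECT * FROM shipments_demo
 WHERE shipment_id >= '00000123' AND shipment_id < '00000124';
```

Comparing the two estimated row counts shows the same pattern as Q1 vs Q2 above.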

Many Thanks,
Sam





[PERFORM] limit order by performance issue

2012-10-16 Thread Pedro Jiménez

Hello,
  I'm trying to do a simple SQL query on PostgreSQL 9.0 running on Ubuntu.

I have a large table (over 100 million records) with three fields, 
id_signal (bigint), time_stamp (timestamp) and var_value (float).


My query looks like this:

select var_value from ism_floatvalues where id_signal = 29660 order by 
time_stamp desc limit 1;


So I want to select the last value for a given ID (id_signal).

This query runs FOREVER, while if I remove the "limit 1" it runs instantly

Any help?

Regards.






Re: [PERFORM] limit order by performance issue

2012-10-16 Thread Pavel Stehule
2012/10/15 Pedro Jiménez p.jime...@ismsolar.com:
 Hello,
   I'm trying to do a simple SQL query over Postgresl 9.0 running on Ubuntu.

 I have a large table (over 100 million records) with three fields, id_signal
 (bigint), time_stamp (timestamp) and var_value (float).

 My query looks like this:

 select var_value from ism_floatvalues where id_signal = 29660 order by
 time_stamp desc limit 1;

 So I want to select the last value from a determinated ID (is_signal).

 This query runs FOREVER, while if I delete limit 1 it runs instantly

did you ANALYZE your tables?

Can you send EXPLAIN ANALYZE result of both queries?

Regards

Pavel Stehule



 Any help?

 Regards.








Re: [PERFORM] limit order by performance issue

2012-10-16 Thread Shaun Thomas

On 10/15/2012 12:44 PM, Pedro Jiménez wrote:


select var_value from ism_floatvalues where id_signal = 29660 order by
time_stamp desc limit 1;


Well, we'd have to see an EXPLAIN plan to really know what's going on
here, but it often boils down to the planner being overly optimistic
when low limits are specified. I bet you have an index on time_stamp,
don't you?

In that case, the planner would reverse index-scan that index,
estimating that finding ID 29660 that way is less expensive
than fetching all of the rows that match the ID directly and throwing
away all but 1 row. Remember, it would have to read all of those values
to know which is the most recent.

You can fix this a couple of ways:

1. Put a two-column index on these values:

CREATE INDEX idx_ordered_signal
ON ism_floatvalues (id_signal, time_stamp DESC);

Which turns any request for that particular combo into a single index fetch.

2. You can trick the planner by introducing an optimization fence:

SELECT var_value
  FROM (
 SELECT var_value, time_stamp
   FROM ism_floatvalues
  WHERE id_signal = 29660
 OFFSET 0
   )
 ORDER BY time_stamp DESC
 LIMIT 1;

Quite a few people will probably grouse at me for giving you that as an
option, but it does work better than LIMIT 1 more often than it probably
should.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
stho...@optionshouse.com

__

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to 
this email




Re: [PERFORM] limit order by performance issue

2012-10-16 Thread Karl Denninger
Put an index on time_stamp (I assume there is one on id_signal already)

On 10/15/2012 12:44 PM, Pedro Jiménez wrote:
 Hello,
   I'm trying to do a simple SQL query over Postgresl 9.0 running on
 Ubuntu.

 I have a large table (over 100 million records) with three fields,
 id_signal (bigint), time_stamp (timestamp) and var_value (float).

 My query looks like this:

 select var_value from ism_floatvalues where id_signal = 29660 order by
 time_stamp desc limit 1;

 So I want to select the last value from a determinated ID (is_signal).

 This query runs FOREVER, while if I delete limit 1 it runs
 instantly

 Any help?

 Regards.





-- 
-- Karl Denninger
/The Market Ticker ®/ http://market-ticker.org
Cuda Systems LLC


Re: [PERFORM] LIKE op with B-Tree Index?

2012-10-16 Thread Merlin Moncure
On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
 Hi communities,

 I am investigating a performance issue involved with LIKE '%' on an
 index in a complex query with joins.

 The problem boils down into this simple scenario---:
 Scenario
 My database locale is C, using UTF-8 encoding. I tested this on 9.1.6 and 9.
 2.1.

 Q1.
 SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'

 Q2.
 SELECT * FROM shipments WHERE shipment_id >= '12345678' AND shipment_id <
 '12345679'

 shipments is a table with million rows and 20 columns. Shipment_id is the
 primary key with text and non-null field.

 CREATE TABLE cod.shipments
 (
   shipment_id text NOT NULL,
   -- other columns omitted
   CONSTRAINT shipments_pkey PRIMARY KEY (shipment_id)
 )

 Analyze Q1 gives this:
 Index Scan using shipments_pkey on shipments  (cost=0.00..39.84 rows=1450
 width=294) (actual time=0.018..0.018 rows=1 loops=1)
   Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
  '12345679'::text))
   Filter: (shipment_id ~~ '12345678%'::text)
   Buffers: shared hit=4

 Analyze Q2 gives this:
 Index Scan using shipments_pkey on shipments  (cost=0.00..39.83 rows=1
 width=294) (actual time=0.027..0.027 rows=1 loops=1)
   Index Cond: ((shipment_id >= '12345678'::text) AND (shipment_id <
  '12345679'::text))
   Buffers: shared hit=4

 Problem Description
 In Q1, the planner thought there will be 1450 rows, and Q2 gave a much
 better estimate of 1.
 The problem is when I combine such condition with a join to other table,
 postgres will prefer a merge join (or hash) rather than a nested loop.

 Question
 Is Q1 and Q2 equivalent? From what I see and the result they seems to be the
 same, or did I miss something? (Charset: C, Encoding: UTF-8)
 If they are equivalent, is that a bug of the planner?

They are most certainly not equivalent.  What if the shipment_id is 12345678Z?

merlin




Re: [PERFORM] limit order by performance issue

2012-10-16 Thread Marti Raudsepp
On Tue, Oct 16, 2012 at 10:47 PM, Karl Denninger k...@denninger.net wrote:
 Put an index on time_stamp (I assume there is one on id_signal already)

Well the optimal index for this particular query would include both columns:
(id_signal, time_stamp) -- in this order.

Additionally, if you want to take advantage of the index-only scans
feature, add the SELECTed column too:
(id_signal, time_stamp, var_value)
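As SQL that would look like this (a sketch; the index name is made up):

```sql
-- Equality column first, then the ORDER BY column, then the selected
-- column, so 9.2 can answer the query with an index-only scan
CREATE INDEX ism_floatvalues_signal_ts_value
    ON ism_floatvalues (id_signal, time_stamp, var_value);
```

The planner can walk this index backwards to satisfy ORDER BY time_stamp DESC LIMIT 1 without touching the heap (assuming the visibility map is current).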

Regards,
Marti




[PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Hi guys,

PG = 9.1.5
OS = winDOS 2008R8

I have a table that currently has 207 million rows.
there is a timestamp field that contains data.
more data gets copied from another database into this database.
How do I make this do an index scan instead?
I ran ANALYZE audittrailclinical, to no avail.
I tested different indexes; same behavior.

The query does this:

SELECT   
audittrailclinical.pgid,   
audittrailclinical.timestamp,   
mmuser.logon,  
audittrailclinical.entityname,   
audittrailclinical.clinicalactivity,   
audittrailclinical.audittraileventcode,   
account.accountnumber,   
patient.dnsortpersonnumber  
FROM   
public.account,   
public.audittrailclinical,   
public.encounter,   
public.entity,   
public.mmuser,   
public.patient,   
public.patientaccount 
WHERE   
 audittrailclinical.encountersid = encounter.encountersid 
and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time zone 
and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time zone
AND  encounter.practiceid = patient.practiceid 
AND  encounter.patientid = patient.patientid 
AND  encounter.staffid = patient.staffid 
AND  entity.entitysid = audittrailclinical.entitysid 
AND  mmuser.mmusersid = audittrailclinical.mmusersid 
AND  patient.practiceid = patientaccount.practiceid 
AND  patient.patientid = patientaccount.patientid 
AND  patientaccount.accountsid = account.accountsid 
AND  patientaccount.defaultaccount = 'Y' 
AND  patient.dnsortpersonnumber = '347450' ;

The query plan says:

  ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 rows=203856829 width=62)
        Filter: ((timestamp >= '2008-01-01 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 00:00:00'::timestamp without time zone))

which takes forever.

How do I make this do an index scan instead?
I ran ANALYZE audittrailclinical, to no avail.

the table definitions are (the createstamp field is empty - I know, bad data):

CREATE TABLE audittrailclinical
(
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 timestamp timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
)
WITH (
 OIDS=FALSE
);
ALTER TABLE audittrailclinical
 OWNER TO intergy;
GRANT ALL ON TABLE audittrailclinical TO intergy;
GRANT SELECT ON TABLE audittrailclinical TO rb;

-- Index: atc_en_time

CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , timestamp );

-- Index: atc_id

-- DROP INDEX atc_id;

CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog."default");
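The only existing index that mentions "timestamp" leads with entitysid, so a range predicate on "timestamp" alone cannot use it efficiently. A dedicated index leading with the range column might help; a sketch (the index name is made up):

```sql
-- Hypothetical: index led by the range-filtered column
CREATE INDEX atc_timestamp
 ON audittrailclinical
 USING btree
 ("timestamp");
```

Whether the planner actually picks it still depends on how selective the 2008-2012 range is, as discussed in the replies below.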




Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Evgeny Shishkin

On Oct 17, 2012, at 3:52 AM, Chris Ruprecht ch...@cdrbill.com wrote:

 Hi guys,
 
 PG = 9.1.5
 OS = winDOS 2008R8
 
 I have a table that currently has 207 million rows.
 there is a timestamp field that contains data.
 more data gets copied from another database into this database.
 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 I tested different indexes - no same behavior.
 
 The query does this:
 
 SELECT   
 audittrailclinical.pgid,   
 audittrailclinical.timestamp,   
 mmuser.logon,  
 audittrailclinical.entityname,   
 audittrailclinical.clinicalactivity,   
 audittrailclinical.audittraileventcode,   
 account.accountnumber,   
 patient.dnsortpersonnumber  
 FROM   
 public.account,   
 public.audittrailclinical,   
 public.encounter,   
 public.entity,   
 public.mmuser,   
 public.patient,   
 public.patientaccount 
 WHERE   
 audittrailclinical.encountersid = encounter.encountersid 
 and   audittrailclinical.timestamp >= '2008-01-01'::timestamp without time 
 zone 
 and   audittrailclinical.timestamp <= '2012-10-05'::timestamp without time 
 zone
 AND  encounter.practiceid = patient.practiceid 
 AND  encounter.patientid = patient.patientid 
 AND  encounter.staffid = patient.staffid 
 AND  entity.entitysid = audittrailclinical.entitysid 
 AND  mmuser.mmusersid = audittrailclinical.mmusersid 
 AND  patient.practiceid = patientaccount.practiceid 
 AND  patient.patientid = patientaccount.patientid 
 AND  patientaccount.accountsid = account.accountsid 
 AND  patientaccount.defaultaccount = 'Y' 
 AND  patient.dnsortpersonnumber = '347450' ;
 
 The query plan says:
 
    ->  Seq Scan on audittrailclinical  (cost=0.00..8637598.76 
 rows=203856829 width=62)
  Filter: ((timestamp >= '2008-01-01 
 00:00:00'::timestamp without time zone) AND (timestamp <= '2012-10-05 
 00:00:00'::timestamp without time zone))
 
 which takes forever.
 

Selecting 5 years of data is not selective at all, so postgres decides it is 
cheaper to do a seqscan. 

Do you have an index on patient.dnsortpersonnumber? Can you post a result from 
select count(*) from patient where dnsortpersonnumber = '347450'; ?


 How do I make this do an index scan instead?
 I did an analyze audittrailclinical to no avail.
 
 the table definitions are (the createstamp field is empty - I know, bad data):
 
 CREATE TABLE audittrailclinical
 (
 audittrailid text,
 audittraileventcode text,
 clinicalactivity text,
 eventsuccessful text,
 externalunique text,
 recordstamp timestamp without time zone,
 recorddescription text,
 encountersid integer,
 eventuserlogon text,
 computername text,
 applicationcode text,
 practiceid integer,
 mmusersid integer,
 entitysid integer,
 entityname text,
 timestamp timestamp without time zone,
 lastuser integer,
 createstamp timestamp without time zone,
 pgid bigint DEFAULT nextval(('bravepoint_seq'::text)::regclass)
 )
 WITH (
 OIDS=FALSE
 );
 ALTER TABLE audittrailclinical
 OWNER TO intergy;
 GRANT ALL ON TABLE audittrailclinical TO intergy;
 GRANT SELECT ON TABLE audittrailclinical TO rb;
 
 -- Index: atc_en_time
 
 CREATE INDEX atc_en_time
 ON audittrailclinical
 USING btree
 (entitysid , timestamp );
 
 -- Index: atc_id
 
 -- DROP INDEX atc_id;
 
 CREATE INDEX atc_id
 ON audittrailclinical
 USING btree
 (audittrailid COLLATE pg_catalog.default );
 
 





Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht

On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:

 Selecting 5 yours of data is not selective at all, so postgres decides it is 
 cheaper to do seqscan. 
 
 Do you have an index on patient.dnsortpersonnumber? Can you post a result 
 from 
 select count(*) from patient where dnsortpersonnumber = '347450'; ?
 

Yes, there is an index:

Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
  ->  Index Scan using patient_pracsortpatientnumber on patient  (cost=0.00..6427.06 rows=1 width=0)
        Index Cond: (dnsortpersonnumber = '347450'::text)


In fact, all the other criteria are picked using an index. I fear that the >= 
and <= on the timestamp are causing the issue. If I use = on just one of 
them, I get an index scan. But I need to scan the entire range. I get queries 
like "give me everything that was entered into the system for this patient 
between these two dates". A single date wouldn't work.



Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Bruce Momjian
On Tue, Oct 16, 2012 at 08:19:43PM -0400, Chris Ruprecht wrote:
 
 On Oct 16, 2012, at 20:01 , Evgeny Shishkin itparan...@gmail.com wrote:
 
  Selecting 5 years of data is not selective at all, so postgres decides it 
  is cheaper to do a seqscan. 
  
  Do you have an index on patient.dnsortpersonnumber? Can you post a result 
  from 
  select count(*) from patient where dnsortpersonnumber = '347450'; ?
  
 
 Yes, there is an index:
 
 Aggregate  (cost=6427.06..6427.07 rows=1 width=0)
   -  Index Scan using patient_pracsortpatientnumber on patient  
 (cost=0.00..6427.06 rows=1 width=0)
 Index Cond: (dnsortpersonnumber = '347450'::text)
 
 
 In fact, all the other criteria is picked using an index. I fear that the >= 
 and <= on the timestamp is causing the issue. If I do a = of just one of 
 them, I get an index scan. But I need to scan the entire range. I get queries 
 like give me everything that was entered into the system for this patient 
 between these two dates. A single date wouldn't work.

Have you read our FAQ on this matter?


http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +




Re: [PERFORM] Have: Seq Scan - Want: Index Scan - what am I doing wrong?

2012-10-16 Thread Chris Ruprecht
Thanks Bruce, 

I have, and I even thought, I understood it :). 

I just ran an EXPLAIN ANALYZE on another table, and since then the query plan 
has changed. It's now using the index as expected. I guess I have some more 
reading to do.

On Oct 16, 2012, at 20:31 , Bruce Momjian br...@momjian.us wrote:

 
 Have you read our FAQ on this matter?
 





Re: [PERFORM] LIKE op with B-Tree Index?

2012-10-16 Thread Sam Wong
 On Wednesday, October 17, 2012 4:30, Merlin Moncure wrote,
 
 On Tue, Oct 16, 2012 at 3:15 AM, Sam Wong s...@hellosam.net wrote:
  Hi communities,
 
  I am investigating a performance issue involved with LIKE '%' on
  an index in a complex query with joins.
 
  The problem boils down into this simple scenario---:
  Scenario
  My database locale is C, using UTF-8 encoding. I tested this on 9.1.6
and 9.
  2.1.
 
  Q1.
  SELECT * FROM shipments WHERE shipment_id LIKE '12345678%'
 
  Q2.
  SELECT * FROM shipments WHERE shipment_id >= '12345678' AND
  shipment_id < '12345679'
 
  ...snip...
 
  Question
  Is Q1 and Q2 equivalent? From what I see and the result they seems to
  be the same, or did I miss something? (Charset: C, Encoding: UTF-8) If
  they are equivalent, is that a bug of the planner?
 
 They are most certainly not equivalent.  What if the shipping_id is
 12345678Z?
 
 merlin

But '12345678Z' is indeed >= '12345678' AND < '12345679'. Just like 'apple'
< 'apples' < 'apply' in a dictionary.

A quick test:
vitalink=# select * from ss;
id
---
 12345678
 12345678Z
 12345679
(3 rows)

vitalink=# select * from ss WHERE id >= '12345678' AND id < '12345679';
id
---
 12345678
 12345678Z
(2 rows)

Sam


