Re: [PERFORM] Query tuning: partitioning, DISTINCT ON, and indexing

2013-06-21 Thread bricklen
On Thu, Jun 20, 2013 at 10:14 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

 On Thu, Jun 20, 2013 at 9:13 PM, bricklen brick...@gmail.com wrote:


 On Thu, Jun 20, 2013 at 6:24 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

 SELECT
   DISTINCT ON (type) ts, type, details
 FROM
   observations
 WHERE
   subject = '...'
 ORDER BY
   type, ts DESC;


 First thing: what is your work_mem set to, and how much RAM is in the
 machine? If you look at the plan, you'll immediately notice the "external
 merge Disk" line where the sort spills to disk. Try setting your
 work_mem to 120MB or so (depending on how much RAM you have, the number of
 concurrent sessions, the complexity of your queries, etc.)


 Good call, thanks, although the in-mem quicksort is not much faster:


 QUERY PLAN


 ---
  Unique  (cost=471248.30..489392.67 rows=3 width=47) (actual time=32002.133..32817.474 rows=3 loops=1)
    Buffers: shared read=30264
    ->  Sort  (cost=471248.30..480320.48 rows=3628873 width=47) (actual time=32002.128..32455.950 rows=3628803 loops=1)
          Sort Key: public.observations.type, public.observations.ts
          Sort Method: quicksort  Memory: 381805kB
          Buffers: shared read=30264
          ->  Result  (cost=0.00..75862.81 rows=3628873 width=47) (actual time=0.026..1323.317 rows=3628803 loops=1)
                Buffers: shared read=30264
                ->  Append  (cost=0.00..75862.81 rows=3628873 width=47) (actual time=0.026..978.477 rows=3628803 loops=1)
                      Buffers: shared read=30264
 ...

 the machine is not nailed down, but I think I'd need to find a way to
 drastically improve the plan to keep this in Postgres. The alternative is
 probably caching the results somewhere else: for any given subject, I only
 need the latest observation of each type 99.9+% of the time.



 Here are some pages that might help for what details to provide:
https://wiki.postgresql.org/wiki/Server_Configuration
https://wiki.postgresql.org/wiki/Slow_Query_Questions


Did you try an index on (type, ts desc)? I don't have much else to add at
this point, but maybe after you post some more server and table (parent and
child) details someone will have an answer for you.


Re: [PERFORM] Query tuning: partitioning, DISTINCT ON, and indexing

2013-06-21 Thread Maciek Sakrejda
On Fri, Jun 21, 2013 at 9:08 AM, bricklen brick...@gmail.com wrote:

 Did you try an index on (type, ts desc) ? I don't have much else to add at
 this point, but maybe after posting some more server and table (parent and
 child) details someone will have an answer for you.


No, this is exactly what I was missing. I had forgotten that the default
index order is useless for a descending lookup like this: I made the change
and the performance is 3000x better (the plan is using the index now). Thanks
for all your help.


Re: [PERFORM] Query tuning: partitioning, DISTINCT ON, and indexing

2013-06-20 Thread bricklen
On Thu, Jun 20, 2013 at 6:24 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

 SELECT
   DISTINCT ON (type) ts, type, details
 FROM
   observations
 WHERE
   subject = '...'
 ORDER BY
   type, ts DESC;


First thing: what is your work_mem set to, and how much RAM is in the
machine? If you look at the plan, you'll immediately notice the "external
merge Disk" line where the sort spills to disk. Try setting your
work_mem to 120MB or so (depending on how much RAM you have, the number of
concurrent sessions, the complexity of your queries, etc.)
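A minimal way to try this without touching postgresql.conf (the value is illustrative; the BUFFERS option assumes PostgreSQL 9.0 or later):

```sql
-- Raise work_mem for the current session only, then re-check the plan.
SET work_mem = '120MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (type) ts, type, details
FROM observations
WHERE subject = '...'
ORDER BY type, ts DESC;
RESET work_mem;
```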


Re: [PERFORM] Query tuning: partitioning, DISTINCT ON, and indexing

2013-06-20 Thread Maciek Sakrejda
On Thu, Jun 20, 2013 at 9:13 PM, bricklen brick...@gmail.com wrote:


  On Thu, Jun 20, 2013 at 6:24 PM, Maciek Sakrejda m.sakre...@gmail.com wrote:

 SELECT
   DISTINCT ON (type) ts, type, details
 FROM
   observations
 WHERE
   subject = '...'
 ORDER BY
   type, ts DESC;


  First thing: what is your work_mem set to, and how much RAM is in the
  machine? If you look at the plan, you'll immediately notice the "external
  merge Disk" line where the sort spills to disk. Try setting your
  work_mem to 120MB or so (depending on how much RAM you have, the number of
  concurrent sessions, the complexity of your queries, etc.)


Good call, thanks, although the in-mem quicksort is not much faster:


  QUERY PLAN

---
 Unique  (cost=471248.30..489392.67 rows=3 width=47) (actual time=32002.133..32817.474 rows=3 loops=1)
   Buffers: shared read=30264
   ->  Sort  (cost=471248.30..480320.48 rows=3628873 width=47) (actual time=32002.128..32455.950 rows=3628803 loops=1)
         Sort Key: public.observations.type, public.observations.ts
         Sort Method: quicksort  Memory: 381805kB
         Buffers: shared read=30264
         ->  Result  (cost=0.00..75862.81 rows=3628873 width=47) (actual time=0.026..1323.317 rows=3628803 loops=1)
               Buffers: shared read=30264
               ->  Append  (cost=0.00..75862.81 rows=3628873 width=47) (actual time=0.026..978.477 rows=3628803 loops=1)
                     Buffers: shared read=30264
...

the machine is not nailed down, but I think I'd need to find a way to
drastically improve the plan to keep this in Postgres. The alternative is
probably caching the results somewhere else: for any given subject, I only
need the latest observation of each type 99.9+% of the time.


Re: [PERFORM] Query tuning help

2011-10-11 Thread Pavel Stehule
Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBA cs_...@consistentstate.com:
 Hi all ;

 I'm trying to tune a difficult query.

 I have 2 tables:
 cust_acct (9million rows)
 cust_orders (200,000 rows)

 Here's the query:

 SELECT
     a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
     a.status, a.customer_location_id, a.added_date,
     o.agent_id, o.order_location_id,
     COALESCE(a.customer_location_id, o.order_location_id) AS
 order_location_id
 FROM
     cust_acct a JOIN
     cust_orders o
     ON a.order_id = o.order_id;

 I can't get it to run much faster than about 13 seconds; in most cases it's
 more like 30 seconds.
 We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.


 I've tried separating the queries as filtering queries & joining the
 results, disabling seq scans, upping work_mem and half a dozen other
 approaches.  Here's the explain plan:

  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
    Hash Cond: (a.order_id = o.order_id)
    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

 Thanks in advance for any help, tips, etc...

 --
 -
 Kevin Kempter   -   Consistent State
 A PostgreSQL Professional Services Company
   www.consistentstate.com
 -

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query tuning help

2011-10-11 Thread Szymon Guz
On 11 October 2011 19:52, CS DBA cs_...@consistentstate.com wrote:

  Hi all ;

 I'm trying to tune a difficult query.

 I have 2 tables:
 cust_acct (9million rows)
 cust_orders (200,000 rows)

 Here's the query:

 SELECT
 a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
 a.status, a.customer_location_id, a.added_date,
 o.agent_id, o.order_location_id,
 COALESCE(a.customer_location_id, o.order_location_id) AS
 order_location_id
 FROM
 cust_acct a JOIN
 cust_orders o
 ON a.order_id = o.order_id;

 I can't get it to run much faster than about 13 seconds; in most cases it's
 more like 30 seconds.
 We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.


 I've tried separating the queries as filtering queries & joining the
 results, disabling seq scans, upping work_mem and half a dozen other
 approaches.  Here's the explain plan:

  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
    Hash Cond: (a.order_id = o.order_id)
    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

 Thanks in advance for any help, tips, etc...





Hi,
two simple questions:

- do you really need to fetch all 9M rows?
- show us the table structure, together with index definitions

regards
Szymon


Re: [PERFORM] Query tuning help

2011-10-11 Thread CS DBA

On 10/11/2011 12:02 PM, Pavel Stehule wrote:

Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBAcs_...@consistentstate.com:

Hi all ;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
 a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
 a.status, a.customer_location_id, a.added_date,
 o.agent_id, o.order_location_id,
 COALESCE(a.customer_location_id, o.order_location_id) AS
order_location_id
FROM
 cust_acct a JOIN
 cust_orders o
 ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's
more like 30 seconds.
We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.


I've tried separating the queries as filtering queries & joining the
results, disabling seq scans, upping work_mem and half a dozen other
approaches.  Here's the explain plan:

  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
    Hash Cond: (a.order_id = o.order_id)
    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...







Explain Analyze:


 Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
   ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
         ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
 Total runtime: 43639.105 ms
(6 rows)

--
-
Kevin Kempter   -   Consistent State
A PostgreSQL Professional Services Company
  www.consistentstate.com
-



Re: [PERFORM] Query tuning help

2011-10-11 Thread CS DBA

On 10/11/2011 12:03 PM, Szymon Guz wrote:



On 11 October 2011 19:52, CS DBA cs_...@consistentstate.com wrote:


Hi all ;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
a.status, a.customer_location_id, a.added_date,
o.agent_id, o.order_location_id,
COALESCE(a.customer_location_id, o.order_location_id) AS
order_location_id
FROM
cust_acct a JOIN
cust_orders o
ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most
cases it's more like 30 seconds.
We have a box with 8 2.5GHz cores and 28GB of RAM; shared_buffers
is at 8GB.


I've tried separating the queries as filtering queries & joining
the results, disabling seq scans, upping work_mem and half a dozen
other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...





Hi,
two simple questions:

- do you really need to fetch all 9M rows?

unfortunately yes



- show us the table structure, together with index definitions




cust_acct table

Column |Type 
|   Modifiers

---+-+---
 account_id| bigint  | not null default 
nextval('cust_account_id_seq'::regclass)

 customer_id   | character varying(10)   |
 order_id  | integer | not null
 primary_contact_id| bigint  |
 status| accounts_status_type| not null
 customer_location_id  | integer |
 added_date| timestamp with time zone| not null
Indexes:
cust_acct_pkey PRIMARY KEY, btree (account_id)
cust_acct_cust_id_indx btree (customer_id)
cust_acct_order_id_id_indx btree (order_id)
cust_acct_pri_contact_id_indx btree (primary_contact_id)





cust_orders table


   Column|Type 
|   Modifiers
-+-+--- 

 order_id| integer | not null 
default nextval('order_id_seq'::regclass)

 backorder_tag_id| character varying(18)   |
 order_location_id   | integer | not null
 work_order_name | character varying(75)   | not null
 status  | programs_status_type| not null
 additional_info_tag_shipper | character(16)   | not null
 additional_info_tag_cust| character(16)   | not null
 additional_info_tag_manuf   | character(16)   | not null
 additional_info_tag_supply  | character(16)   | not null
 acct_active_dt  | timestamp without time zone |
 acct_last_activity_date | timestamp without time zone |
 acct_backorder_items| boolean | not null 
default false

 custom_info1| text|
 custom_info2| text|
 custom_info3| text|
 custom_info4| text|
 custom_info5| text|
 custom_info6| text|
 custom_info7| text|
Indexes:
cust_orders_pkey PRIMARY KEY, btree (order_id)
cust_orders_order_id_loc_id_key UNIQUE, btree (order_id, 
order_location_id)

cust_orders_loc_id_idx btree (order_location_id)


regards
Szymon


--
-
Kevin Kempter   -   Consistent State
A PostgreSQL Professional Services Company
  www.consistentstate.com
-



Re: [PERFORM] Query tuning help

2011-10-11 Thread Pavel Stehule


  Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
    Hash Cond: (a.order_id = o.order_id)
    ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
    ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
          ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
  Total runtime: 43639.105 ms
 (6 rows)


I think this time is adequate: processing a 10 million row result
is bound to be slow.

A few tips:

* recheck the sequential read speed, to confirm it is about the expected value

* play with work_mem - it is probably not enough for one hash bucket; you can
shave off perhaps 10-20 sec, but watch out for going into swap -
EXPLAIN ANALYZE VERBOSE shows the number of batches - ideal is one.

* use some filter if it's possible
* use a LIMIT if it's possible

If you really have to process all rows and you need better reaction
time, try using a cursor. It is optimized for returning the first rows quickly.
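A cursor-based sketch of that last tip (assuming the corrected join on cust_orders; cursor and column names are illustrative):

```sql
BEGIN;
DECLARE acct_cur CURSOR FOR
    SELECT a.account_id, a.customer_id, a.order_id, o.agent_id,
           COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
    FROM cust_acct a
    JOIN cust_orders o ON a.order_id = o.order_id;
-- The first rows come back almost immediately; repeat the FETCH to
-- stream the result in batches instead of materializing 10M rows.
FETCH 1000 FROM acct_cur;
CLOSE acct_cur;
COMMIT;
```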

Regards

Pavel Stehule


--
-
Kevin Kempter   -   Consistent State
A PostgreSQL Professional Services Company
  www.consistentstate.com
-



Re: [PERFORM] query tuning help

2010-06-14 Thread Andy Colson

On 06/14/2010 05:41 AM, AI Rumman wrote:

Can any one please help me in tuning the query?

explain
select *
from (select * from crmentity where deleted = 0 and createdtime between
(now() - interval '6 month') and now() ) as crmentity
inner join (select * from activity where  activitytype = 'Emails' and
date_start between (now() - interval '6 month')  and now()) as activity
on crmentity.crmid=activity.activityid
inner join emaildetails on emaildetails.emailid = crmentity.crmid
inner join vantage_email_track on
vantage_email_track.mailid=emaildetails.emailid
left join seactivityrel on seactivityrel.activityid = emaildetails.emailid



Can you send us 'explain analyze' too?


->  Seq Scan on emaildetails  (cost=0.00..1686.95 rows=44595 width=186)
->  Seq Scan on vantage_email_track  (cost=0.00..1324.52 rows=88852 width=12)


do you have indexes on emaildetails(emailid) and  vantage_email_track(mailid)?
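If those indexes are missing, they might look like this (a sketch; index names are illustrative):

```sql
CREATE INDEX emaildetails_emailid_idx ON emaildetails (emailid);
CREATE INDEX vantage_email_track_mailid_idx ON vantage_email_track (mailid);
-- Refresh statistics so the planner notices the new indexes' selectivity.
ANALYZE emaildetails;
ANALYZE vantage_email_track;
```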

-Andy



Re: [PERFORM] Query tuning

2009-08-20 Thread Віталій Тимчишин
2009/8/19 Kevin Kempter kev...@consistentstate.com


 We do have an index on url_hits.time

 not sure why timestamps were not used, I was not here for the design phase.


What's the type of the time column? I don't like that it is cast to double
precision in the explain. If it is integer, maybe you need to change

and time >= extract('epoch' from timestamp '2009-08-12')
and time < extract('epoch' from timestamp '2009-08-13')

to

and time >= extract('epoch' from timestamp '2009-08-12')::int4
and time < extract('epoch' from timestamp '2009-08-13')::int4

for the index to be used?


Re: [PERFORM] Query tuning

2009-08-19 Thread Grzegorz Jaśkiewicz
that seems to be the killer:

and time >= extract('epoch' from timestamp '2009-08-12')
and time < extract('epoch' from timestamp '2009-08-13')

You probably need an index on time/epoch:

CREATE INDEX foo ON table (extract('epoch' from time));

or something like that; vacuum analyze and retry.



Re: [PERFORM] Query tuning

2009-08-19 Thread Nikolas Everett
2009/8/19 Grzegorz Jaśkiewicz gryz...@gmail.com

 that seems to be the killer:

 and time >= extract('epoch' from timestamp '2009-08-12')
 and time < extract('epoch' from timestamp '2009-08-13')

 You probably need an index on time/epoch:

 CREATE INDEX foo ON table (extract('epoch' from time));


It looks like those extracts just produce constant integer times. You can
probably just create an index on the time column.

Also, why not store times as timestamps?
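A sketch of that suggestion (assuming time is the integer epoch column; the index name is illustrative):

```sql
-- A plain index on the integer column; the constant-folded extract()
-- comparisons can then use it, provided the types match (hence ::int4).
CREATE INDEX url_hits_time_idx ON url_hits ("time");
```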




 or something like that, vacuum analyze and retry.




Re: [PERFORM] Query tuning

2009-08-19 Thread Kevin Kempter
On Wednesday 19 August 2009 11:17:26 Scott Carey wrote:
 On 8/19/09 9:28 AM, Kevin Kempter kev...@consistentstate.com wrote:
   Hi all;
  
   we've been fighting this query for a few days now. we bumped up the
   statistics target for the a.id, c.url_hits_id and the b.id columns below
   to 250 and ran an analyze on the relevant tables.  we killed it after
   8hrs.
  
   Note the url_hits table has > 1.4 billion rows
  
   Any suggestions?

 Have you tried setting work_mem higher for just this query?

Yes, we upped it to 500Meg



 The big estimated cost is the sequential scan on url_hits.  But in reality,
 if the estimates are off the sort and index scan at the end might be your
 bottleneck.  Larger work_mem might make it choose another plan there.

 But if the true cost is the sequential scan on url_hits, then only an index
 there will help.

  $ psql -ef expl.sql pwreport
  explain
  select
  a.id,
  ident_id,
  time,
  customer_name,
  extract('day' from timezone(e.name, to_timestamp(a.time))) as day,
  category_id
  from
  pwreport.url_hits a left outer join
  pwreport.url_hits_category_jt c on (a.id = c.url_hits_id),
  pwreport.ident b,
  pwreport.timezone e
  where
  a.ident_id = b.id
  and b.timezone_id = e.id
  and time >= extract('epoch' from timestamp '2009-08-12')
  and time < extract('epoch' from timestamp '2009-08-13')
  and direction = 'REQUEST'
  ;
 
  QUERY PLAN
  -----------------------------------------------------------------------------
   Merge Right Join  (cost=47528508.61..180424544.59 rows=10409251 width=53)
     Merge Cond: (c.url_hits_id = a.id)
     ->  Index Scan using mt_url_hits_category_jt_url_hits_id_index on url_hits_category_jt c  (cost=0.00..122162596.63 rows=4189283233 width=8)
     ->  Sort  (cost=47528508.61..47536931.63 rows=3369210 width=49)
           Sort Key: a.id
           ->  Hash Join  (cost=2565.00..47163219.21 rows=3369210 width=49)
                 Hash Cond: (b.timezone_id = e.id)
                 ->  Hash Join  (cost=2553.49..47116881.07 rows=3369210 width=37)
                       Hash Cond: (a.ident_id = b.id)
                       ->  Seq Scan on url_hits a  (cost=0.00..47051154.89 rows=3369210 width=12)
                             Filter: ((direction = 'REQUEST'::proxy_direction_enum) AND ((time)::double precision >= 1250035200::double precision) AND ((time)::double precision < 1250121600::double precision))
                       ->  Hash  (cost=2020.44..2020.44 rows=42644 width=29)
                             ->  Seq Scan on ident b  (cost=0.00..2020.44 rows=42644 width=29)
                 ->  Hash  (cost=6.78..6.78 rows=378 width=20)
                       ->  Seq Scan on timezone e  (cost=0.00..6.78 rows=378 width=20)
  (15 rows)
 
 




Re: [PERFORM] Query tuning

2009-08-19 Thread Kevin Kempter
On Wednesday 19 August 2009 11:31:30 Nikolas Everett wrote:
 2009/8/19 Grzegorz Jaśkiewicz gryz...@gmail.com

  that seems to be the killer:
 
  and time >= extract('epoch' from timestamp '2009-08-12')
  and time < extract('epoch' from timestamp '2009-08-13')
 
  You probably need an index on time/epoch:
 
  CREATE INDEX foo ON table (extract('epoch' from time));

 It looks like those extracts just produce constant integer times. You can
 probably just create an index on the time column.

 Also, why not store times as timestamps?

  or something like that, vacuum analyze and retry.
 


We do have an index on url_hits.time

not sure why timestamps were not used, I was not here for the design phase.


Thx






Re: [PERFORM] Query tuning

2006-08-23 Thread Subbiah, Stalin
I get the same plan after running vacuum analyze. Nope, I don't have an
index on objdomainid, objid or userdomainid. Only eventtime has one.

-Original Message-
From: Chris [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 22, 2006 8:06 PM
To: Subbiah, Stalin
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query tuning

Subbiah, Stalin wrote:
 Actually these servers will be upgraded to 8.1.4 in a couple of months.

even so, you could get some bad data in there.
http://www.postgresql.org/docs/8.0/static/release.html . Go through the
old release notes and you'll find various race conditions, crashes etc.

 Here you go with explain analyze.
 
 # explain analyze SELECT *
 FROM EVENTLOG
 WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00'
 AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA'
 OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA'
 OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA')
 ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;
  
 QUERY PLAN
 ----------------------------------------------------------------------------
  Limit  (cost=15583110.14..15583111.39 rows=500 width=327) (actual time=427771.568..427772.904 rows=500 loops=1)
    ->  Sort  (cost=15583108.89..15618188.88 rows=14031998 width=327) (actual time=427770.504..427771.894 rows=1000 loops=1)
          Sort Key: eventtime, sequencenum
          ->  Seq Scan on eventlog  (cost=0.00..2334535.17 rows=14031998 width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
                Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp without time zone) AND (((objdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
  Total runtime: 437884.134 ms
 (6 rows)

If you analyze the table then run this again what plan does it come back
with?

I can't read explain output properly but I suspect (and I'm sure I'll be
corrected if need be) that the sort step is way out of whack and so is
the seq scan because the stats aren't up to date enough.

Do you have an index on objdomainid, objid and userdomainid (one index
per field) ? I wonder if that will help much.
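A hedged sketch of those per-column indexes (names are illustrative). On 8.1 and later the planner can combine them with a BitmapOr for the three OR'ed equality conditions; 8.0 cannot, so this mainly pays off after the planned upgrade:

```sql
CREATE INDEX eventlog_objdomainid_idx ON eventlog (objdomainid);
CREATE INDEX eventlog_objid_idx ON eventlog (objid);
CREATE INDEX eventlog_userdomainid_idx ON eventlog (userdomainid);
-- Refresh statistics afterwards.
ANALYZE eventlog;
```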

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Query tuning

2006-08-23 Thread Dave Dutcher
It seems to me that what would work best is an index scan backward on the
eventtime index.  I don't see why that wouldn't work for you; maybe the
planner is just estimating that the seq scan and sort is faster for some reason.
What does EXPLAIN say if you use a small limit and offset like 10?  Or what
does EXPLAIN say if you first run set enable_seqscan=false;  (If you get
the same plan, then I wouldn't bother running EXPLAIN ANALYZE, but if you
get a different plan I would run EXPLAIN ANALYZE to see if the new plan is
any faster.)
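The suggested experiment, spelled out (session-local settings, so it is safe to try; the small LIMIT/OFFSET values are illustrative):

```sql
SET enable_seqscan = false;
EXPLAIN SELECT *
FROM EVENTLOG
WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00'
AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA'
     OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA'
     OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA')
ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 10 OFFSET 10;
SET enable_seqscan = true;
```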



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Subbiah, Stalin
 Sent: Wednesday, August 23, 2006 1:03 PM
 To: Chris
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Query tuning
 
 
 I get the same plan after running vacuum analyze. Nope, I don't have
 index on objdomainid, objid and userdomainid. Only eventime has it.
 
 -Original Message-
 From: Chris [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, August 22, 2006 8:06 PM
 To: Subbiah, Stalin
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Query tuning
 
 Subbiah, Stalin wrote:
  Actually these servers will be upgraded to 8.1.4 in couple 
 of months.
 
 even so, you could get some bad data in there.
 http://www.postgresql.org/docs/8.0/static/release.html . Go 
 through the
 old release notes and you'll find various race conditions, 
 crashes etc.
 
  Here you go with explain analyze.
  
  # explain analyze SELECT *
  FROM EVENTLOG
  WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00'
  AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA'
  OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA'
  OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA')
  ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;
   
  QUERY PLAN
  ---------------------------------------------------------------------------
   Limit  (cost=15583110.14..15583111.39 rows=500 width=327) (actual time=427771.568..427772.904 rows=500 loops=1)
     ->  Sort  (cost=15583108.89..15618188.88 rows=14031998 width=327) (actual time=427770.504..427771.894 rows=1000 loops=1)
           Sort Key: eventtime, sequencenum
           ->  Seq Scan on eventlog  (cost=0.00..2334535.17 rows=14031998 width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
                 Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp without time zone) AND (((objdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
   Total runtime: 437884.134 ms
  (6 rows)
 
 If you analyze the table then run this again what plan does 
 it come back
 with?
 
 I can't read explain output properly but I suspect (and I'm 
 sure I'll be
 corrected if need be) that the sort step is way out of whack and so is
 the seq scan because the stats aren't up to date enough.
 
 Do you have an index on objdomainid, objid and userdomainid (one index
 per field) ? I wonder if that will help much.
 
 --
 Postgresql & php tutorials
 http://www.designmagick.com/
 
 




Re: [PERFORM] Query tuning

2006-08-23 Thread Subbiah, Stalin
Changing the limit or offset to a small number doesn't change the plan.
Likewise enable_seqscan set to false. The query still takes 8-10 mins to
run.

-Original Message-
From: Dave Dutcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 23, 2006 4:20 PM
To: Subbiah, Stalin
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] Query tuning

It seems to me that what would work best is an index scan backward on
the eventtime index.  I don't see why that wouldn't work for you; maybe
the planner is just estimating that the seq scan and sort is faster for some
reason.
What does EXPLAIN say if you use a small limit and offset like 10?  Or
what does EXPLAIN say if you first run set enable_seqscan=false;  (If
you get the same plan, then I wouldn't bother running EXPLAIN ANALYZE,
but if you get a different plan I would run EXPLAIN ANALYZE to see if
the new plan is any faster.)



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Subbiah, 
 Stalin
 Sent: Wednesday, August 23, 2006 1:03 PM
 To: Chris
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Query tuning
 
 
 I get the same plan after running vacuum analyze. Nope, I don't have 
 index on objdomainid, objid and userdomainid. Only eventime has it.
 
 -Original Message-
 From: Chris [mailto:[EMAIL PROTECTED]
 Sent: Tuesday, August 22, 2006 8:06 PM
 To: Subbiah, Stalin
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Query tuning
 
 Subbiah, Stalin wrote:
  Actually these servers will be upgraded to 8.1.4 in a couple
 of months.
 
 even so, you could get some bad data in there.
 http://www.postgresql.org/docs/8.0/static/release.html . Go through 
 the old release notes and you'll find various race conditions, crashes

 etc.
 
  Here you go with explain analyze.
  
   # explain analyze SELECT *
   FROM EVENTLOG
   WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00'
   AND (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA'
   OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA'
   OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA')
   ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;
   
   QUERY PLAN
   --------------------------------------------------------------------------
    Limit  (cost=15583110.14..15583111.39 rows=500 width=327) (actual time=427771.568..427772.904 rows=500 loops=1)
      ->  Sort  (cost=15583108.89..15618188.88 rows=14031998 width=327) (actual time=427770.504..427771.894 rows=1000 loops=1)
            Sort Key: eventtime, sequencenum
            ->  Seq Scan on eventlog  (cost=0.00..2334535.17 rows=14031998 width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
                  Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp without time zone) AND (((objdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text = 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
    Total runtime: 437884.134 ms
   (6 rows)
 
 If you analyze the table then run this again what plan does it come 
 back with?
 
 I can't read explain output properly but I suspect (and I'm sure I'll
 be corrected if need be) that the sort step is way out of whack and so
 is the seq scan because the stats aren't up to date enough.
 
 Do you have an index on objdomainid, objid and userdomainid (one index
 per field)? I wonder if that will help much.
 
 --
 Postgresql & php tutorials
 http://www.designmagick.com/
 
 




Re: [PERFORM] Query tuning

2006-08-22 Thread Chris

Subbiah, Stalin wrote:

Hello All,

This query runs forever and ever. Nature of this table being lots of
inserts/deletes/query, I vacuum it every half hour to keep the holes
reusable and nightly once vacuum analyze to update the optimizer. We've
got index on eventtime only. Running it for current day uses index range
scan and it runs within acceptable time. Below is the explain of the
query. Is the order by sequencenum desc prevents from applying limit
optimization?

explain SELECT *
FROM EVENTLOG 
WHERE EVENTTIME > '07/23/06 16:00:00' 
AND  EVENTTIME < '08/22/06 16:00:00' 
AND  (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA') 
ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 0;
 
QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15546930.29..15546931.54 rows=500 width=327)
   ->  Sort  (cost=15546930.29..15581924.84 rows=13997819 width=327)
         Sort Key: eventtime, sequencenum
         ->  Seq Scan on eventlog  (cost=0.00..2332700.25 rows=13997819
width=327)
               Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp
without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp
without time zone) AND (((objdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
(5 rows)

Thanks,
Stalin
Pg version 8.0.1, suse 64bit.


Firstly you should update to 8.0.8 - because it's in the same stream you 
won't need to do a dump/initdb/reload like a major version change, it 
should be a simple upgrade.


Can you send explain analyze instead of just explain?

It sounds like you're not analyzing enough - if you're doing lots of 
updates/deletes/inserts then the statistics postgresql uses to choose 
whether to do an index scan or something else will quickly be outdated 
and so it'll have to go back to a full table scan every time.


Can you set up autovacuum to handle that for you more regularly?
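
As a rough sketch of the suggestions in this thread (column names are taken
from the query above; index names are hypothetical), the per-column indexes
plus a statistics refresh might look like:

```sql
-- Sketch only: one btree per filtered column, as suggested above.
CREATE INDEX eventlog_objdomainid_idx  ON eventlog (objdomainid);
CREATE INDEX eventlog_objid_idx        ON eventlog (objid);
CREATE INDEX eventlog_userdomainid_idx ON eventlog (userdomainid);

-- Refresh planner statistics so the new indexes can be considered:
ANALYZE eventlog;
```

Whether the planner actually chooses them still depends on how selective the
three OR'd values are.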

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query tuning

2006-08-22 Thread Subbiah, Stalin
Actually these servers will be upgraded to 8.1.4 in a couple of months.

Here you go with explain analyze.

# explain analyze SELECT *
FROM EVENTLOG 
WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00' 
AND  (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA') 
ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;
 
QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15583110.14..15583111.39 rows=500 width=327) (actual
time=427771.568..427772.904 rows=500 loops=1)
   ->  Sort  (cost=15583108.89..15618188.88 rows=14031998 width=327)
(actual time=427770.504..427771.894 rows=1000 loops=1)
         Sort Key: eventtime, sequencenum
         ->  Seq Scan on eventlog  (cost=0.00..2334535.17 rows=14031998
width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
               Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp
without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp
without time zone) AND (((objdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
 Total runtime: 437884.134 ms
(6 rows)

-Original Message-
From: Chris [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 22, 2006 6:37 PM
To: Subbiah, Stalin
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query tuning

Subbiah, Stalin wrote:
 Hello All,
 
 This query runs forever and ever. Nature of this table being lots of 
 inserts/deletes/query, I vacuum it every half hour to keep the holes 
 reusable and nightly once vacuum analyze to update the optimizer. 
 We've got index on eventtime only. Running it for current day uses 
 index range scan and it runs within acceptable time. Below is the 
 explain of the query. Is the order by sequencenum desc prevents from 
 applying limit optimization?
 
 explain SELECT *
 FROM EVENTLOG
 WHERE EVENTTIME > '07/23/06 16:00:00' 
 AND  EVENTTIME < '08/22/06 16:00:00' 
 AND  (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' 
 OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' 
 OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA')
 ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 0;
  
 QUERY PLAN
 ----------------------------------------------------------------------
  Limit  (cost=15546930.29..15546931.54 rows=500 width=327)
    ->  Sort  (cost=15546930.29..15581924.84 rows=13997819 width=327)
          Sort Key: eventtime, sequencenum
          ->  Seq Scan on eventlog  (cost=0.00..2332700.25 rows=13997819
 width=327)
                Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp
 without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp
 without time zone) AND (((objdomainid)::text =
 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text =
 'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text =
 'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
 (5 rows)
 
 Thanks,
 Stalin
 Pg version 8.0.1, suse 64bit.

Firstly you should update to 8.0.8 - because it's in the same stream you
won't need to do a dump/initdb/reload like a major version change, it
should be a simple upgrade.

Can you send explain analyze instead of just explain?

It sounds like you're not analyzing enough - if you're doing lots of
updates/deletes/inserts then the statistics postgresql uses to choose
whether to do an index scan or something else will quickly be outdated
and so it'll have to go back to a full table scan every time.

Can you set up autovacuum to handle that for you more regularly?

--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Query tuning

2006-08-22 Thread Chris

Subbiah, Stalin wrote:

Actually these servers will be upgraded to 8.1.4 in a couple of months.


even so, you could get some bad data in there.
http://www.postgresql.org/docs/8.0/static/release.html . Go through the 
old release notes and you'll find various race conditions, crashes etc.



Here you go with explain analyze.

# explain analyze SELECT *
FROM EVENTLOG 
WHERE EVENTTIME > '07/23/06 16:00:00' AND EVENTTIME < '08/22/06 16:00:00' 
AND  (OBJDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR OBJID='tzRh39d0d91luNGT1weIUjLvFIcA' 
OR USERDOMAINID='tzRh39d0d91luNGT1weIUjLvFIcA') 
ORDER BY EVENTTIME DESC, SEQUENCENUM DESC LIMIT 500 OFFSET 500;
 
QUERY PLAN
----------------------------------------------------------------------
 Limit  (cost=15583110.14..15583111.39 rows=500 width=327) (actual
time=427771.568..427772.904 rows=500 loops=1)
   ->  Sort  (cost=15583108.89..15618188.88 rows=14031998 width=327)
(actual time=427770.504..427771.894 rows=1000 loops=1)
         Sort Key: eventtime, sequencenum
         ->  Seq Scan on eventlog  (cost=0.00..2334535.17 rows=14031998
width=327) (actual time=10.370..190038.764 rows=7699388 loops=1)
               Filter: ((eventtime > '2006-07-23 16:00:00'::timestamp
without time zone) AND (eventtime < '2006-08-22 16:00:00'::timestamp
without time zone) AND (((objdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((objid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text) OR ((userdomainid)::text =
'tzRh39d0d91luNGT1weIUjLvFIcA'::text)))
 Total runtime: 437884.134 ms
(6 rows)


If you analyze the table and then run this again, what plan does it come back 
with?


I can't read explain output properly but I suspect (and I'm sure I'll be 
corrected if need be) that the sort step is way out of whack and so is 
the seq scan because the stats aren't up to date enough.


Do you have an index on objdomainid, objid and userdomainid (one index 
per field) ? I wonder if that will help much.


--
Postgresql & php tutorials
http://www.designmagick.com/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Query tuning help

2005-05-11 Thread Ulrich Wisser
Hi Dan,
I tried to understand your query, but I couldn't get my understanding of 
the query and your description in sync.

Why do you use sub selects? Wouldn't a simple "recordtext like '%RED%'" 
do the trick too?

You combine all your WHERE conditions with AND. To me this looks like 
you get only rows with both RED and CORVETTE.

From your description I would rewrite the query as
explain analyze
select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea join em using(incidentid) join eg using(incidentid)
where
em.entrydate >= '2005-1-1 00:00'::date
and em.entrydate <= '2005-5-9 00:00'::date
and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate
That should give you all rows containing one of the words.
Does it work?
Is it faster? Is it fast enough?
Ulrich
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Query tuning help

2005-05-09 Thread Harald Fuchs
In article [EMAIL PROTECTED],
Dan Harris [EMAIL PROTECTED] writes:

 On May 8, 2005, at 8:06 PM, Josh Berkus wrote:
 
 If I were to use tsearch2 for full-text indexing, would I need to
 create another table that merges all of my recordtext rows into a
 single 'text' field type?
 
 No.   Read the OpenFTS docs, they are fairly clear on how to set up
 a simple
 FTS index. (TSearch2 ~~ OpenFTS)
 
 If so, this is where I run into problems, as
 my logic also needs to match multiple words in their original order.

 I have been reading the Tsearch2 docs and either I don't understand
 something or I'm not communicating my situation clearly enough.  It
 seems that Tsearch2 has a concept of document.  And, in everything I
 am reading, they expect your document to be all contained in a
 single row.  Since my words can be spread across multiple rows, I
 don't see that Tsearch2 will combine all 'recordtext' row values with
 the same incidentid into a single vector.  Am I overlooking
 something in the docs?

AFAICS no, but you could create a separate table containing just the
distinct incidentids and the tsearch2 vectors of all recordtexts
matching that incidentid.  This table would get updated solely by
triggers on the original table and would provide a fast way to get all
incidentids for RED and CORVETTE.  The question is: would this reduce
the number of rows to check more than filtering on date?
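
A sketch of that trigger-maintained side table (all names are hypothetical,
and the tsearch2-era to_tsvector('default', ...) call is assumed):

```sql
-- Sketch only: rebuilds one incident's combined vector on every insert/update.
CREATE TABLE ea_fti (
    incidentid varchar PRIMARY KEY,
    fti        tsvector
);

CREATE OR REPLACE FUNCTION ea_fti_refresh() RETURNS trigger AS $$
DECLARE
    txt text := '';
    r   record;
BEGIN
    -- Concatenate every recordtext row belonging to this incident.
    FOR r IN SELECT recordtext FROM ea WHERE incidentid = NEW.incidentid LOOP
        txt := txt || ' ' || r.recordtext;
    END LOOP;
    DELETE FROM ea_fti WHERE incidentid = NEW.incidentid;
    INSERT INTO ea_fti VALUES (NEW.incidentid, to_tsvector('default', txt));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ea_fti_sync AFTER INSERT OR UPDATE ON ea
    FOR EACH ROW EXECUTE PROCEDURE ea_fti_refresh();

CREATE INDEX ea_fti_idx ON ea_fti USING gist (fti);
```

A production version would also need a DELETE trigger, but this shows the idea:
the per-incident vector lives in one row, so multi-row keyword matches become
single-row index lookups.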


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query tuning help

2005-05-09 Thread Mischa Sandberg
Quoting Russell Smith [EMAIL PROTECTED]:

 On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
  On May 8, 2005, at 6:51 PM, Russell Smith wrote:
 [snip]
  select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
  FROM em
  JOIN ea ON em.incidentid = ea.incidentid  --- slight paraphrase /Mischa.
  AND em.entrydate between '2005-1-1' and '2005-5-9'
  AND ea.recordtext like '%RED%'  AND ea.recordtext like
'%CORVETTE%'

  Here's the situation:
  Due to the format of the systems with which I integrate ( I have no
  control over these formats ), we will get these 'recordtext' values one
  line at a time, accumulating over time.  The only way I can find to
  make this work is to insert a new record for each line.  The problem
  is, that when someone wants to search multiple keywords, they expect
  these words to be matched across multiple records with a given incident
  number.
 
For a very simple example:
 
  IncidentID  Date              Recordtext
  ----------  ----------------  ----------------------------
  1           2005-05-01 14:21  blah blah blah RED blah blah
              2005-05-01 14:23  not what we are looking for
  1           2005-05-02 02:05  blah CORVETTE blah blah
 

select em.incidentid, ea.recordtext as retdata
from em
join ( -- equivalent to where incidentid in (...), sometimes faster.
  select incidentid
  from  em join  ea using (incidentid)
  where em.entrydate between '2005-1-1' and '2005-5-9'
  group by incidentid
  having 1 = min(case when recordtext like '%RED%' then 1 end)
     and 1 = min(case when recordtext like '%CORVETTE%' then 1 end)
 ) as X using (incidentid);



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
 Sorry to bother everyone with yet another "my query isn't using an 
 index" problem, but I am over my head on this one.  I am open to ways 
 of restructuring this query to perform better.
 
 I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has 
 been just run on the table.
 
 This is the result of:
 
 explain analyze
 select distinct
  em.incidentid,
  ea.recordtext as retdata,
  eg.long,
  eg.lat
 from
  ea, em, eg
 where
  em.incidentid = ea.incidentid and
  em.incidentid = eg.incidentid and
 em.entrydate >= '2005-1-1 00:00' and
 em.entrydate <= '2005-5-9 00:00'
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%RED%'
  )
 
  and ea.incidentid in (
   select
incidentid
   from
ea
   where
recordtext like '%CORVETTE%'
  )
  and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
 order by em.entrydate
 
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning the whole
index.  So it's much cheaper to do a sequence scan.

One possible way to make the query faster is to limit based on date, as you 
will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans.  I think 
this query does what you do 
above, and I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,  eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 
00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND 
ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or 
recordtext like '%CORVETTE%'  );

 
 -
 ANALYZE RESULTS
 -
 
   Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual 
 time=446787.056..446787.342 rows=72 loops=1)
     ->  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual 
 time=446787.053..446787.075 rows=72 loops=1)
           Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
           ->  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) 
 (actual time=444378.655..446786.746 rows=72 loops=1)
                 ->  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) 
 (actual time=444378.532..446768.381 rows=72 loops=1)
                       ->  Nested Loop IN Join  (cost=771835.10..774678.88 
 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
                             ->  Nested Loop  (cost=771835.10..774572.05 
 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
                                   ->  HashAggregate  
 (cost=771835.10..771835.10 rows=1 width=17) (actual 
 time=444366.702..444368.583 rows=473 loops=1)
                                         ->  Seq Scan on ea  
 (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 
 rows=592 loops=1)
                                               Filter: ((recordtext)::text ~~ 
 '%CORVETTE%'::text)
                                   ->  Index Scan using ea1 on ea  
 (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 
 loops=473)
                                         Index Cond: ((ea.incidentid)::text = 
 ("outer".incidentid)::text)
                                         Filter: (((recordtext)::text ~~ 
 '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
                             ->  Index Scan using ea1 on ea  
 (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 
 loops=1011)
                                   Index Cond: (("outer".incidentid)::text = 
 (ea.incidentid)::text)
                                   Filter: ((recordtext)::text ~~ 
 '%RED%'::text)
                       ->  Index Scan using em_incidentid_idx on em  
 (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
                             Index Cond: (("outer".incidentid)::text = 
 (em.incidentid)::text)
                             Filter: ((entrydate >= '2005-01-01 
 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 
 00:00:00'::timestamp without time zone))
                 ->  Index Scan using eg_incidentid_idx on eg  
 (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
                       Index Cond: (("outer".incidentid)::text = 
 (eg.incidentid)::text)
   Total runtime: 446871.880 ms
 (22 rows)
 
 
 -
 EXPLANATION
 -
 The reason for the redundant LIKE clause is that first, I only want 
 those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT, 
 those two words may exist across multiple records with the same 
 incidentid.  Then, I only want to actually work with the rows that 
 contain one of the words.  This query will repeat the same logic for 
 however many keywords are entered by the user.

Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

 and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  

It is simply not possible to use B-tree indexes on this kind of text query.  
B-trees require you to start at the left side of the field, because B-trees 
locate records via < and > comparisons.  "Anywhere in the field" text search 
requires a Full Text Index.
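
To illustrate the distinction (index name hypothetical), a left-anchored
pattern can walk a btree while an infix pattern cannot:

```sql
-- text_pattern_ops lets LIKE use a btree index even in a non-C locale.
CREATE INDEX ea_recordtext_idx ON ea (recordtext text_pattern_ops);

SELECT * FROM ea WHERE recordtext LIKE 'RED%';   -- left-anchored: index range scan possible
SELECT * FROM ea WHERE recordtext LIKE '%RED%';  -- infix: every row must be checked
```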

 The reason for the redundant LIKE clause is that first, I only want
 those incidentids that contain the words 'RED' and 'CORVETTE'.  BUT,
 those two words may exist across multiple records with the same
 incidentid.  Then, I only want to actually work with the rows that
 contain one of the words.  This query will repeat the same logic for
 however many keywords are entered by the user.  I have investigated
 text searching options and have not found them to be congruous with my
 application.

Sounds like you either need to restructure your application, restructure your 
database (so that you're not doing "anywhere in the field" searches), or buy 32GB 
of ram so that you can cache the whole table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Russell Smith [EMAIL PROTECTED] writes:
 On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
 and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
 
 You cannot use an index for %CORVETTE%, or %RED%.

Not a btree index anyway.  Dan might have some success here with a
full-text-indexing package (eg, contrib/tsearch2).

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 6:51 PM, Russell Smith wrote:
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning  
the whole
index.  So it's much cheaper to do a sequence scan.

While I believe you, I'm confused by this line in my original EXPLAIN  
ANALYZE:

->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)  
(actual time=2.085..2.309 rows=2 loops=473)
      Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
      Filter: (((recordtext)::text ~~ '%RED%'::text) OR 
((recordtext)::text ~~ '%CORVETTE%'::text))
Doesn't that mean it was using an index to filter?  Along those lines,  
before I created index 'ea1', the query was much much slower.  So, it  
seemed like creating this index made a difference.

One possible way to make the query faster is to limit based on date,  
as you will only get about 700 rows.
And then don't use subselects, as they are doing full sequence scans.   
I think this query does what you do
above, and I think it will be faster, but I don't know.

I REALLY like this idea! If I could just filter by date first and then  
sequential scan through those, it should be very manageable.  Hopefully  
I can keep this goal while still accommodating the requirement listed  
in my next paragraph.

select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
AND ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
'%RED%' or recordtext like '%CORVETTE%'  );

I have run this, and while it is very fast, I'm concerned it's not  
doing what I need.  Here's the situation:

Due to the format of the systems with which I integrate ( I have no  
control over these formats ), we will get these 'recordtext' values one  
line at a time, accumulating over time.  The only way I can find to  
make this work is to insert a new record for each line.  The problem  
is, that when someone wants to search multiple keywords, they expect  
these words to be matched across multiple records with a given incident  
number.

 For a very simple example:
IncidentID  Date              Recordtext
----------  ----------------  ----------------------------
1           2005-05-01 14:21  blah blah blah RED blah blah
            2005-05-01 14:23  not what we are looking for
1           2005-05-02 02:05  blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
'%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
condition will only be applied to a single row of recordtext at a time,  
not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to  
create another table that merges all of my recordtext rows into a  
single 'text' field type?  If so, this is where I run into problems, as  
my logic also needs to match multiple words in their original order.  I  
may also receive additional updates to the previous data.  In that  
case, I need to replace the original record with the latest version of  
it.  If I have already concatenated these rows into a single field, the  
logic to in-line replace only the old text that has changed is very  
very difficult at best.  So, that's the reason I had to do two  
subqueries in my example.  Please tell me if I misunderstood your logic  
and it really will match given my condition above, but it didn't seem  
like it would.

Thanks again for the quick responses!  This list has been a great  
resource for me.

-Dan
---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

 While I believe you, I'm confused by this line in my original EXPLAIN

 ANALYZE:
  ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47)
  (actual time=2.085..2.309 rows=2 loops=473)
        Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
        Filter: (((recordtext)::text ~~ '%RED%'::text) OR 
  ((recordtext)::text ~~ '%CORVETTE%'::text))

The index named is matching based on incidentid -- the join condition.  The 
filter is applied against the table rows, i.e. a scan.

 If I were to use tsearch2 for full-text indexing, would I need to
 create another table that merges all of my recordtext rows into a
 single 'text' field type?  

No.   Read the OpenFTS docs, they are fairly clear on how to set up a simple 
FTS index. (TSearch2 ~~ OpenFTS)

 If so, this is where I run into problems, as 
 my logic also needs to match multiple words in their original order.  

You do that by doubling up ... that is, use the FTS index to pick all rows 
that contain RED and CORVETTE, and then check the order.  I'll also note 
that your current query is not checking word order. 

Example:
WHERE recordtext_fti @@ to_tsquery ('default', 'RED & CORVETTE')
AND recordtext LIKE '%RED%CORVETTE%'

I'm doing something fairly similar on one of my projects and it works very 
well.

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticably slower.  
2) they are only fast when cached in RAM (and when cached, are *very* fast).  
So if you have a variety of other processes that tend to fill up RAM between 
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, which 
will increase the size of the table.
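
A minimal sketch of that materialized column, assuming contrib/tsearch2 is
installed (column, index, and trigger names are hypothetical):

```sql
-- Sketch only: tsearch2's bundled trigger keeps the vector current.
ALTER TABLE ea ADD COLUMN recordtext_fti tsvector;
UPDATE ea SET recordtext_fti = to_tsvector('default', recordtext);
CREATE INDEX ea_recordtext_fti_idx ON ea USING gist (recordtext_fti);

CREATE TRIGGER ea_fti_update BEFORE INSERT OR UPDATE ON ea
    FOR EACH ROW EXECUTE PROCEDURE tsearch2(recordtext_fti, recordtext);
```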

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Dan Harris [EMAIL PROTECTED] writes:
 ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) 
 (actual time=2.085..2.309 rows=2 loops=473)
       Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
       Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text 
 ~~ '%CORVETTE%'::text))

 Doesn't that mean it was using an index to filter?

No.  The Index Cond shows it is using the index only for the join
condition.  A Filter is an additional filter condition that happens to
get applied at this plan node --- but it'll be applied to every row the
index finds for the index condition.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:06 PM, Josh Berkus wrote:

If I were to use tsearch2 for full-text indexing, would I need to
create another table that merges all of my recordtext rows into a
single 'text' field type?
No.   Read the OpenFTS docs, they are fairly clear on how to set up a 
simple
FTS index. (TSearch2 ~~ OpenFTS)

If so, this is where I run into problems, as
my logic also needs to match multiple words in their original order.
I have been reading the Tsearch2 docs and either I don't understand 
something or I'm not communicating my situation clearly enough.  It 
seems that Tsearch2 has a concept of a "document".  And, in everything I 
am reading, they expect your document to be all contained in a single 
row.  Since my words can be spread across multiple rows, I don't see 
that Tsearch2 will combine all 'recordtext' row values with the same 
incidentid into a single vector.  Am I overlooking something in the 
docs?

I'm doing something fairly similar on one of my projects and it works 
very
well.

I'd be curious what similarities they have?  Is it the searching across 
multiple rows or the order of words?

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticably 
slower.
2) they are only fast when cached in RAM (and when cached, are *very* 
fast).
So if you have a variety of other processes that tend to fill up RAM 
between
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, 
which
will increase the size of the table.
Duly noted.  If this method can search across rows, I'm willing to 
accept this overhead for the speed it would add.

In the meantime, is there any way I can reach my goal without Tsearch2 
by just restructuring my query to narrow down the results by date 
first, then seq scan for the 'likes'?

-Dan
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
 
 On May 8, 2005, at 6:51 PM, Russell Smith wrote:
 
[snip]
  select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
  eg.lat
  FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
  '2005-1-1 00:00'
  AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
  AND ea.recordtext like '%CORVETTE%')
  JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
  '%RED%' or recordtext like '%CORVETTE%'  );
 
 
 I have run this, and while it is very fast, I'm concerned it's not  
 doing what I need.
How fast is very fast?


 Here's the situation: 
 
 Due to the format of the systems with which I integrate ( I have no  
 control over these formats ), we will get these 'recordtext' values one  
 line at a time, accumulating over time.  The only way I can find to  
 make this work is to insert a new record for each line.  The problem  
 is, that when someone wants to search multiple keywords, they expect  
 these words to be matched across multiple records with a given incident  
 number.
 
   For a very simple example:
 
 IncidentID  Date              Recordtext
 ----------  ----------------  ----------------------------
 1           2005-05-01 14:21  blah blah blah RED blah blah
             2005-05-01 14:23  not what we are looking for
 1           2005-05-02 02:05  blah CORVETTE blah blah
 
 So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
 '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
 condition will only be applied to a single row of recordtext at a time,  
 not a whole group with the same incident number.
 
 If I were to use tsearch2 for full-text indexing, would I need to  
 create another table that merges all of my recordtext rows into a  
 single 'text' field type?  If so, this is where I run into problems, as  
 my logic also needs to match multiple words in their original order.  I  
 may also receive additional updates to the previous data.  In that  
 case, I need to replace the original record with the latest version of  
 it.  If I have already concatenated these rows into a single field, the  
 logic to in-line replace only the old text that has changed is very  
 very difficult at best.  So, that's the reason I had to do two  
 subqueries in my example.  Please tell me if I misunderstood your logic  
 and it really will match given my condition above, but it didn't seem  
 like it would.
 
 Thanks again for the quick responses!  This list has been a great  
 resource for me.
 
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%' OR 
ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE 
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
AND 
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'))

This may be more accurate.  However I would call it VERY NASTY. Josh's 
solutions may be better.
However much of the data should be in memory once the subplans are done, so it 
may be quite fast.

 -Dan
 
 
 ---(end of broadcast)---
 TIP 9: the planner will ignore your desire to choose an index scan if your
   joining column's datatypes do not match
 
 

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:32 PM, Russell Smith wrote:
I have run this, and while it is very fast, I'm concerned it's not
doing what I need.
How fast is very fast?
It took 35 seconds to complete versus ~450 my old way.
select distinct em.incidentid, ea.recordtext as retdata, eg.long, 
eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%'  
OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid WHERE
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like 
'%CORVETTE%'))
AND
em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
'2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'))

Yes, it is nasty, but so was my previous query :)  So long as this is
faster, I'm OK with that. I'll see if I can make this work. Thank you
very much.

-Dan


Re: [PERFORM] Query tuning help

2005-05-08 Thread Klint Gore
On Sun, 8 May 2005 20:31:38 -0600, Dan Harris [EMAIL PROTECTED] wrote:
 Duly noted.  If this method can search across rows, I'm willing to 
 accept this overhead for the speed it would add.

You could use INTERSECT to search across rows. tsearch2 will look
up RED and CORVETTE using the index, and INTERSECT will pull out the
common rows.
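The across-rows behaviour of INTERSECT is easy to demonstrate. Here is a
minimal, self-contained sketch using SQLite from Python — illustrative only;
in the real setup the two per-term lookups would hit the tsearch2 index
rather than LIKE, and the table here is a simplified stand-in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ea (incidentid INTEGER, recordtext TEXT)")
conn.executemany("INSERT INTO ea VALUES (?, ?)",
                 [(1, "RED truck"),      # incident 1 mentions RED ...
                  (1, "CORVETTE seen"),  # ... and CORVETTE, in separate rows
                  (2, "RED car")])       # incident 2 mentions only RED

# INTERSECT keeps only the ids common to both per-term result sets, so a
# match may span multiple rows of the same incident.
ids = conn.execute("""
    SELECT incidentid FROM ea WHERE recordtext LIKE '%RED%'
    INTERSECT
    SELECT incidentid FROM ea WHERE recordtext LIKE '%CORVETTE%'
""").fetchall()
print(ids)  # only incident 1 has both terms, across two rows
```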

 In the meantime, is there any way I can reach my goal without Tsearch2 
 by just restructuring my query to narrow down the results by date 
 first, then seq scan for the 'likes'?


select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea, em, eg, 
(
select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00' and
recordtext like '%RED%'

intersect

select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00' and
recordtext like '%CORVETTE%'
) as iid
where
em.incidentid = ea.incidentid and
em.incidentid = eg.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00' and
and ea.incidentid = iid.incidentid 
and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate

klint.




Re: [PERFORM] Query Tuning

2004-10-08 Thread Pallav Kalva
Tom Lane wrote:
Pallav Kalva [EMAIL PROTECTED] writes:
 

 I have a problem with the below query: when I do EXPLAIN on it on my
 live database, it doesn't use any of the indexes specified on the
 tables and does a seq scan on the table, which is 400k records. But if I
 copy the same table onto a different database on a different machine, it
 uses all the indexes specified and the query runs much quicker.
   

It looks to me like you've never vacuumed/analyzed the copy, and so you
get a different plan there.  The fact that that plan is better than the
one made with statistics is unhappy making :-( ... but when you only
show us EXPLAIN output rather than EXPLAIN ANALYZE, it's impossible to
speculate about why.  Also, what PG version is this?
regards, tom lane
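Tom's point is that the planner's choice of seq scan vs. index scan depends
on what it knows about the data. As a rough, runnable illustration of the
same idea — in SQLite from Python, not Postgres, so EXPLAIN QUERY PLAN
stands in for EXPLAIN and the mechanics differ (here the change comes from
creating the index; in Postgres, ANALYZE's statistics also steer the choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, "x") for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of Postgres EXPLAIN:
    # the last column of each row describes the scan strategy.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM t WHERE id = 42"
before = plan(query)                             # no index yet: full table scan
conn.execute("CREATE INDEX t_id_idx ON t (id)")
conn.execute("ANALYZE")                          # refresh optimizer statistics
after = plan(query)                              # now an index search
print(before)
print(after)
```

The same query text produces two different plans purely because the
optimizer's view of the table changed — which is why comparing EXPLAIN output
between an analyzed and an unanalyzed copy of a table is misleading.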
 

Thanks for the quick reply! I can't run EXPLAIN ANALYZE on the live
database because it takes a lot of time and holds up a lot of other queries
on the table. The Postgres version I am using is 7.4. When you say I
never vacuumed/analyzed the copy, do you mean the live database, or the copy
of the live database? I run a vacuum on the database daily as part of
regular maintenance.

