Re: [PERFORM] Query tuning help

2011-10-11 Thread Pavel Stehule
>  Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
>    ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
>          ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
>  Total runtime: 43639.105 ms
> (6 rows)

I think this time is adequate - processing a 10 million row result is
bound to be slow.

A few tips:

* recheck the sequential read speed of the storage - make sure it is in
the expected range

* play with work_mem - it is probably too small for the hash to fit in a
single batch. You can cut maybe 10-20 seconds there, but be careful not
to push the machine into swap. EXPLAIN ANALYZE shows the number of hash
buckets and batches - a single batch is the ideal. See the sketch below.

* use a filter if possible
* use a LIMIT if possible

If you really have to process all rows and you need a better reaction
time, try a cursor. It is optimized for returning the first rows fast.
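
A minimal sketch of both ideas (the work_mem value, cursor name, and
fetch size are only illustrative):

BEGIN;
SET LOCAL work_mem = '256MB';  -- illustrative; big enough for a one-batch hash

DECLARE acct_cur CURSOR FOR
    SELECT a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
           a.status, a.customer_location_id, a.added_date,
           o.agent_id,
           COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
    FROM cust_acct a
    JOIN cust_orders o ON a.order_id = o.order_id;

FETCH 1000 FROM acct_cur;  -- the first rows come back almost immediately
-- ... keep fetching until no rows are returned ...
CLOSE acct_cur;
COMMIT;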

Regards

Pavel Stehule

> ---------------------------------------------
> Kevin Kempter   -   Consistent State
> A PostgreSQL Professional Services Company
>   www.consistentstate.com
> ---------------------------------------------



Re: [PERFORM] Query tuning help

2011-10-11 Thread CS DBA

On 10/11/2011 12:03 PM, Szymon Guz wrote:



On 11 October 2011 19:52, CS DBA wrote:


Hi all ;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a
    JOIN cust_orders o ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most
cases it's more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers
is at 8GB.


I've tried splitting it into filtering queries and joining the
results, disabling seq scans, upping work_mem, and half a dozen
other approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...





Hi,
two simple questions:

- do you really need to fetch all 9M rows?

unfortunately yes



- show us the table structure, together with index definitions




cust_acct table

        Column         |           Type           |                         Modifiers
-----------------------+--------------------------+------------------------------------------------------------
 account_id            | bigint                   | not null default nextval('cust_account_id_seq'::regclass)
 customer_id           | character varying(10)    |
 order_id              | integer                  | not null
 primary_contact_id    | bigint                   |
 status                | accounts_status_type     | not null
 customer_location_id  | integer                  |
 added_date            | timestamp with time zone | not null
Indexes:
    "cust_acct_pkey" PRIMARY KEY, btree (account_id)
    "cust_acct_cust_id_indx" btree (customer_id)
    "cust_acct_order_id_id_indx" btree (order_id)
    "cust_acct_pri_contact_id_indx" btree (primary_contact_id)





cust_orders table

           Column            |            Type             |                     Modifiers
-----------------------------+-----------------------------+----------------------------------------------------
 order_id                    | integer                     | not null default nextval('order_id_seq'::regclass)
 backorder_tag_id            | character varying(18)       |
 order_location_id           | integer                     | not null
 work_order_name             | character varying(75)       | not null
 status                      | programs_status_type        | not null
 additional_info_tag_shipper | character(16)               | not null
 additional_info_tag_cust    | character(16)               | not null
 additional_info_tag_manuf   | character(16)               | not null
 additional_info_tag_supply  | character(16)               | not null
 acct_active_dt              | timestamp without time zone |
 acct_last_activity_date     | timestamp without time zone |
 acct_backorder_items        | boolean                     | not null default false
 custom_info1                | text                        |
 custom_info2                | text                        |
 custom_info3                | text                        |
 custom_info4                | text                        |
 custom_info5                | text                        |
 custom_info6                | text                        |
 custom_info7                | text                        |
Indexes:
    "cust_orders_pkey" PRIMARY KEY, btree (order_id)
    "cust_orders_order_id_loc_id_key" UNIQUE, btree (order_id, order_location_id)
    "cust_orders_loc_id_idx" btree (order_location_id)


regards
Szymon





---------------------------------------------
Kevin Kempter   -   Consistent State
A PostgreSQL Professional Services Company
  www.consistentstate.com
---------------------------------------------



Re: [PERFORM] Query tuning help

2011-10-11 Thread CS DBA

On 10/11/2011 12:02 PM, Pavel Stehule wrote:

Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBA:

Hi all ;

I'm trying to tune a difficult query.

I have 2 tables:
cust_acct (9 million rows)
cust_orders (200,000 rows)

Here's the query:

SELECT
    a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
    a.status, a.customer_location_id, a.added_date,
    o.agent_id, o.order_location_id,
    COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
FROM
    cust_acct a
    JOIN cust_orders o ON a.order_id = o.order_id;

I can't get it to run much faster than about 13 seconds; in most cases it's
more like 30 seconds.
We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.


I've tried splitting it into filtering queries and joining the
results, disabling seq scans, upping work_mem, and half a dozen other
approaches.  Here's the explain plan:

 Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
   ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
         ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)

Thanks in advance for any help, tips, etc...


Explain Analyze:


 Hash Join  (cost=154.46..691776.11 rows=10059626 width=100) (actual time=5.191..37551.360 rows=10063432 loops=1)
   Hash Cond: (a.order_id = o.order_id)
   ->  Seq Scan on cust_acct a  (cost=0.00..540727.26 rows=10059626 width=92) (actual time=0.022..18987.095 rows=10063432 loops=1)
   ->  Hash  (cost=124.76..124.76 rows=2376 width=12) (actual time=5.135..5.135 rows=2534 loops=1)
         ->  Seq Scan on cust_orders o  (cost=0.00..124.76 rows=2376 width=12) (actual time=0.011..2.843 rows=2534 loops=1)
 Total runtime: 43639.105 ms
(6 rows)



---------------------------------------------
Kevin Kempter   -   Consistent State
A PostgreSQL Professional Services Company
  www.consistentstate.com
---------------------------------------------



Re: [PERFORM] Query tuning help

2011-10-11 Thread Szymon Guz
On 11 October 2011 19:52, CS DBA wrote:

>  Hi all ;
>
> I'm trying to tune a difficult query.
>
> I have 2 tables:
> cust_acct (9 million rows)
> cust_orders (200,000 rows)
>
> Here's the query:
>
> SELECT
>     a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
>     a.status, a.customer_location_id, a.added_date,
>     o.agent_id, o.order_location_id,
>     COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
> FROM
>     cust_acct a
>     JOIN cust_orders o ON a.order_id = o.order_id;
>
> I can't get it to run much faster than about 13 seconds; in most cases it's
> more like 30 seconds.
> We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.
>
>
> I've tried splitting it into filtering queries and joining the
> results, disabling seq scans, upping work_mem, and half a dozen other
> approaches.  Here's the explain plan:
>
>  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
>    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
>          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
>
> Thanks in advance for any help, tips, etc...

Hi,
two simple questions:

- do you really need to fetch all 9M rows?
- show us the table structure, together with index definitions

regards
Szymon


Re: [PERFORM] Query tuning help

2011-10-11 Thread Pavel Stehule
Hello

please, send EXPLAIN ANALYZE output instead.

Regards

Pavel Stehule

2011/10/11 CS DBA:
> Hi all ;
>
> I'm trying to tune a difficult query.
>
> I have 2 tables:
> cust_acct (9 million rows)
> cust_orders (200,000 rows)
>
> Here's the query:
>
> SELECT
>     a.account_id, a.customer_id, a.order_id, a.primary_contact_id,
>     a.status, a.customer_location_id, a.added_date,
>     o.agent_id, o.order_location_id,
>     COALESCE(a.customer_location_id, o.order_location_id) AS order_location_id
> FROM
>     cust_acct a
>     JOIN cust_orders o ON a.order_id = o.order_id;
>
> I can't get it to run much faster than about 13 seconds; in most cases it's
> more like 30 seconds.
> We have a box with eight 2.5GHz cores and 28GB of RAM; shared_buffers is at 8GB.
>
>
> I've tried splitting it into filtering queries and joining the
> results, disabling seq scans, upping work_mem, and half a dozen other
> approaches.  Here's the explain plan:
>
>  Hash Join  (cost=151.05..684860.30 rows=9783130 width=100)
>    Hash Cond: (a.order_id = o.order_id)
>    ->  Seq Scan on cust_acct a  (cost=0.00..537962.30 rows=9783130 width=92)
>    ->  Hash  (cost=122.69..122.69 rows=2269 width=12)
>          ->  Seq Scan on cust_orders o  (cost=0.00..122.69 rows=2269 width=12)
>
> Thanks in advance for any help, tips, etc...
> ---------------------------------------------
> Kevin Kempter   -   Consistent State
> A PostgreSQL Professional Services Company
>   www.consistentstate.com
> ---------------------------------------------



Re: [PERFORM] query tuning help

2010-06-14 Thread Andy Colson

On 06/14/2010 05:41 AM, AI Rumman wrote:

Can anyone please help me tune this query?

explain
select *
from (select * from crmentity where deleted = 0 and createdtime between
(now() - interval '6 month') and now() ) as crmentity
inner join (select * from activity where  activitytype = 'Emails' and
date_start between (now() - interval '6 month')  and now()) as activity
on crmentity.crmid=activity.activityid
inner join emaildetails on emaildetails.emailid = crmentity.crmid
inner join vantage_email_track on
vantage_email_track.mailid=emaildetails.emailid
left join seactivityrel on seactivityrel.activityid = emaildetails.emailid



Can you send us 'explain analyze' too?


->  Seq Scan on emaildetails  (cost=0.00..1686.95 rows=44595 width=186)
->  Seq Scan on vantage_email_track  (cost=0.00..1324.52 rows=88852 width=12)


Do you have indexes on emaildetails(emailid) and vantage_email_track(mailid)?
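
If not, a sketch of what to add (the index names here are just illustrative):

CREATE INDEX emaildetails_emailid_idx ON emaildetails (emailid);
CREATE INDEX vantage_email_track_mailid_idx ON vantage_email_track (mailid);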

-Andy



Re: [PERFORM] Query tuning help

2005-05-11 Thread Ulrich Wisser
Hi Dan,
I tried to understand your query, but I couldn't get my understanding of 
the query and your description in sync.

Why do you use sub selects? Wouldn't a simple "recordtext like '%RED%'" 
do the trick too?

You combine all your WHERE conditions with AND, so as written you only
get rows containing both RED and CORVETTE.

From your description I would rewrite the query as:

explain analyze
select distinct
    em.incidentid,
    ea.recordtext as retdata,
    eg.long,
    eg.lat
from
    ea join em using (incidentid)
    join eg using (incidentid)
where
    em.entrydate >= '2005-1-1 00:00'::date
    and em.entrydate <= '2005-5-9 00:00'::date
    and ( recordtext like '%RED%' or recordtext like '%CORVETTE%' )
order by em.entrydate;
That should give you all rows containing at least one of the words.
Does it work? Is it faster? Is it fast enough?
Ulrich


Re: [PERFORM] Query tuning help

2005-05-09 Thread Mischa Sandberg
Quoting Russell Smith <[EMAIL PROTECTED]>:

> On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
> > On May 8, 2005, at 6:51 PM, Russell Smith wrote:
> [snip]
> > select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
> > FROM em
> > JOIN ea ON em.incidentid = ea.incidentid  --- slight paraphrase /Mischa.
> > AND em.entrydate between '2005-1-1' and '2005-5-9'
> > AND ea.recordtext like '%RED%'  AND ea.recordtext like
'%CORVETTE%'

> > Here's the situation:
> > Due to the format of the systems with which I integrate ( I have no
> > control over these formats ), we will get these 'recordtext' values one
> > line at a time, accumulating over time.  The only way I can find to
> > make this work is to insert a new record for each line.  The problem
> > is, that when someone wants to search multiple keywords, they expect
> > these words to be matched across multiple records with a given incident
> > number.
> >
> >   For a very simple example:
> >
> > IncidentID   Date               Recordtext
> > ----------   ----------------   -----------------------------
> > 1            2005-05-01 14:21   blah blah blah RED blah blah
> > 2            2005-05-01 14:23   not what we are looking for
> > 1            2005-05-02 02:05   blah CORVETTE blah blah
> >

select em.incidentid, ea.recordtext as retdata
from em
join ea using (incidentid)
join ( -- equivalent to "where incidentid in (...)", sometimes faster.
      select incidentid
      from em join ea using (incidentid)
      where em.entrydate between '2005-1-1' and '2005-5-9'
      group by incidentid
      having 1 = min(case when recordtext like '%RED%' then 1 end)
         and 1 = min(case when recordtext like '%CORVETTE%' then 1 end)
     ) as X using (incidentid);





Re: [PERFORM] Query tuning help

2005-05-09 Thread Harald Fuchs
In article <[EMAIL PROTECTED]>,
Dan Harris <[EMAIL PROTECTED]> writes:

> On May 8, 2005, at 8:06 PM, Josh Berkus wrote:
>> 
>>> If I were to use tsearch2 for full-text indexing, would I need to
>>> create another table that merges all of my recordtext rows into a
>>> single 'text' field type?
>> 
>> No.   Read the OpenFTS docs, they are fairly clear on how to set up
>> a simple
>> FTS index. (TSearch2 ~~ OpenFTS)
>> 
>>> If so, this is where I run into problems, as
>>> my logic also needs to match multiple words in their original order.

> I have been reading the Tsearch2 docs and either I don't understand
> something or I'm not communicating my situation clearly enough.  It
> seems that Tsearch2 has a concept of "document".  And, in everything I
> am reading, they expect your "document" to be all contained in a
> single row.  Since my words can be spread across multiple rows, I
> don't see that Tsearch2 will combine all 'recordtext' row values with
> the same "incidentid" into a single vector.  Am I overlooking
> something in the docs?

AFAICS no, but you could create a separate table containing just the
distinct incidentids and the tsearch2 vectors of all recordtexts
matching that incidentid.  This table would get updated solely by
triggers on the original table and would provide a fast way to get all
incidentids for RED and CORVETTE.  The question is: would this reduce
the number of rows to check more than filtering on date?
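
A rough sketch of that idea, using the built-in full-text search of
current PostgreSQL releases in place of the tsearch2 calls (all object
names here are illustrative, and the trigger only covers INSERT and
UPDATE):

CREATE TABLE incident_fts (
    incidentid text PRIMARY KEY,
    doc        tsvector
);
CREATE INDEX incident_fts_doc_idx ON incident_fts USING gin (doc);

CREATE FUNCTION incident_fts_sync() RETURNS trigger AS $$
BEGIN
    -- rebuild the concatenated document for this incident
    INSERT INTO incident_fts (incidentid, doc)
    SELECT NEW.incidentid, to_tsvector(string_agg(recordtext, ' '))
    FROM ea
    WHERE incidentid = NEW.incidentid
    ON CONFLICT (incidentid) DO UPDATE SET doc = EXCLUDED.doc;
    RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER ea_fts_sync AFTER INSERT OR UPDATE ON ea
    FOR EACH ROW EXECUTE FUNCTION incident_fts_sync();

-- all incidents whose combined recordtext mentions both words:
SELECT incidentid FROM incident_fts WHERE doc @@ to_tsquery('RED & CORVETTE');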




Re: [PERFORM] Query tuning help

2005-05-08 Thread Klint Gore
On Sun, 8 May 2005 20:31:38 -0600, Dan Harris <[EMAIL PROTECTED]> wrote:
> Duly noted.  If this method can search across rows, I'm willing to 
> accept this overhead for the speed it would add.

You could use INTERSECT to search across rows.  Using tsearch2 will look
up the RED and CORVETTE rows via the index, and INTERSECT will pull out
the common rows.

> In the meantime, is there any way I can reach my goal without Tsearch2 
> by just restructuring my query to narrow down the results by date 
> first, then seq scan for the 'likes'?


select distinct
em.incidentid,
ea.recordtext as retdata,
eg.long,
eg.lat
from
ea, em, eg, 
(
select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00' and
recordtext like '%RED%'

intersect

select
ea.incidentid
from
ea, em
where
em.incidentid = ea.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00' and
recordtext like '%CORVETTE%'
) as iid
where
em.incidentid = ea.incidentid and
em.incidentid = eg.incidentid and
em.entrydate >= '2005-1-1 00:00' and
em.entrydate <= '2005-5-9 00:00'
and ea.incidentid = iid.incidentid 
and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )
order by em.entrydate

klint.

Klint Gore
EMail      : [EMAIL PROTECTED]
Snail Mail : A.B.R.I., University of New England, Armidale NSW 2351 Australia
Fax        : +61 2 6772 5376
"Non rhyming slang - the possibilities are useless" L.J.J.



Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:32 PM, Russell Smith wrote:
>> I have run this, and while it is very fast, I'm concerned it's not
>> doing what I need.
> How fast is very fast?

It took 35 seconds to complete versus ~450 seconds the old way.

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%'
OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid
WHERE em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
AND em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'))

Yes, it is nasty, but so was my previous query :)  So long as this is
faster, I'm OK with that.  I'll see if I can make this work.  Thank you
very much.

-Dan


Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 11:49 am, Dan Harris wrote:
> 
> On May 8, 2005, at 6:51 PM, Russell Smith wrote:
> 
[snip]
> > select distinct em.incidentid, ea.recordtext as retdata, eg.long,   
> > eg.lat
> > FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >=  
> > '2005-1-1 00:00'
> > AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'  
> > AND ea.recordtext like '%CORVETTE%')
> > JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like  
> > '%RED%' or recordtext like '%CORVETTE%'  );
> >
> 
> I have run this, and while it is very fast, I'm concerned it's not  
> doing what I need.
How fast is very fast?


> Here's the situation: 
> 
> Due to the format of the systems with which I integrate ( I have no  
> control over these formats ), we will get these 'recordtext' values one  
> line at a time, accumulating over time.  The only way I can find to  
> make this work is to insert a new record for each line.  The problem  
> is, that when someone wants to search multiple keywords, they expect  
> these words to be matched across multiple records with a given incident  
> number.
> 
>   For a very simple example:
> 
> IncidentID   Date               Recordtext
> ----------   ----------------   -----------------------------
> 1            2005-05-01 14:21   blah blah blah RED blah blah
> 2            2005-05-01 14:23   not what we are looking for
> 1            2005-05-02 02:05   blah CORVETTE blah blah
> 
> So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
> '%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
> condition will only be applied to a single row of recordtext at a time,  
> not a whole group with the same incident number.
> 
> If I were to use tsearch2 for full-text indexing, would I need to  
> create another table that merges all of my recordtext rows into a  
> single 'text' field type?  If so, this is where I run into problems, as  
> my logic also needs to match multiple words in their original order.  I  
> may also receive additional updates to the previous data.  In that  
> case, I need to replace the original record with the latest version of  
> it.  If I have already concatenated these rows into a single field, the  
> logic to in-line replace only the old text that has changed is very  
> very difficult at best.  So, that's the reason I had to do two  
> subqueries in my example.  Please tell me if I misunderstood your logic  
> and it really will match given my condition above, but it didn't seem  
> like it would.
> 
> Thanks again for the quick responses!  This list has been a great  
> resource for me.
> 
select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND (ea.recordtext like '%RED%'
OR ea.recordtext like '%CORVETTE%'))
JOIN eg ON em.incidentid = eg.incidentid
WHERE em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%CORVETTE%'))
AND em.incidentid IN
(select em.incidentid
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'))

This may be more accurate.  However, I would call it VERY NASTY; Josh's
solutions may be better.  Much of the data should be in memory once the
subplans are done, though, so it may be quite fast.

> -Dan


Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 8:06 PM, Josh Berkus wrote:

>> If I were to use tsearch2 for full-text indexing, would I need to
>> create another table that merges all of my recordtext rows into a
>> single 'text' field type?
>
> No.   Read the OpenFTS docs, they are fairly clear on how to set up a simple
> FTS index. (TSearch2 ~~ OpenFTS)

>> If so, this is where I run into problems, as
>> my logic also needs to match multiple words in their original order.

I have been reading the Tsearch2 docs and either I don't understand 
something or I'm not communicating my situation clearly enough.  It 
seems that Tsearch2 has a concept of "document".  And, in everything I 
am reading, they expect your "document" to be all contained in a single 
row.  Since my words can be spread across multiple rows, I don't see 
that Tsearch2 will combine all 'recordtext' row values with the same 
"incidentid" into a single vector.  Am I overlooking something in the 
docs?

> I'm doing something fairly similar on one of my projects and it works very
> well.

I'd be curious what similarities they have.  Is it the searching across
multiple rows, or the order of words?

> The limitations on TSearch2 indexes are:
> 1) they are expensive to update, so your data loads would be noticeably slower.
> 2) they are only fast when cached in RAM (and when cached, are *very* fast).
> So if you have a variety of other processes that tend to fill up RAM between
> searches, you may find them less useful.
> 3) You have to create a materialized index column next to recordtext, which
> will increase the size of the table.
Duly noted.  If this method can search across rows, I'm willing to 
accept this overhead for the speed it would add.

In the meantime, is there any way I can reach my goal without Tsearch2 
by just restructuring my query to narrow down the results by date 
first, then seq scan for the 'likes'?

-Dan


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Dan Harris <[EMAIL PROTECTED]> writes:
>> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) 
>> (actual time=2.085..2.309 rows=2 loops=473)
>>   Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>>   Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text 
>> ~~ '%CORVETTE%'::text))

> Doesn't that mean it was using an index to filter?

No.  The "Index Cond" shows it is using the index only for the join
condition.  A "Filter" is an additional filter condition that happens to
get applied at this plan node --- but it'll be applied to every row the
index finds for the index condition.

regards, tom lane



Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

> While I believe you, I'm confused by this line in my original EXPLAIN
> ANALYZE:
>
>>> ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
>>>       Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>>>       Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))

The index named is matching based on incidentid -- the join condition.  The 
"filter" is applied against the table rows, i.e. a scan.

> If I were to use tsearch2 for full-text indexing, would I need to
> create another table that merges all of my recordtext rows into a
> single 'text' field type?  

No.   Read the OpenFTS docs, they are fairly clear on how to set up a simple 
FTS index. (TSearch2 ~~ OpenFTS)

> If so, this is where I run into problems, as 
> my logic also needs to match multiple words in their original order.  

You do that by doubling up ... that is, use the FTS index to pick all rows 
that contain "RED" and "CORVETTE", and then check the order.  I'll also note 
that your current query is not checking word order. 

Example:
WHERE recordtext_fti @@ to_tsquery ('default', 'RED && CORVETTE')
AND recordtext LIKE '%RED%CORVETTE%'

I'm doing something fairly similar on one of my projects and it works very 
well.

The limitations on TSearch2 indexes are:
1) they are expensive to update, so your data loads would be noticeably slower.
2) they are only fast when cached in RAM (and when cached, are *very* fast).  
So if you have a variety of other processes that tend to fill up RAM between 
searches, you may find them less useful.
3) You have to create a materialized index column next to recordtext, which 
will increase the size of the table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Query tuning help

2005-05-08 Thread Dan Harris
On May 8, 2005, at 6:51 PM, Russell Smith wrote:
> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> You cannot use an index for %CORVETTE%, or %RED%.  There is no way
> for the index to know if a row had that in the middle without scanning
> the whole index.  So it's much cheaper to do a sequence scan.

While I believe you, I'm confused by this line in my original EXPLAIN  
ANALYZE:

->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
      Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
      Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
Doesn't that mean it was using an index to filter?  Along those lines,  
before I created index 'ea1', the query was much much slower.  So, it  
seemed like creating this index made a difference.

> One possible way to make the query faster is to limit based on date,
> as you will only get about 700 rows.  And then don't use subselects,
> as they are doing full sequence scans.  I think this query does what
> you do above, and I think it will be faster, but I don't know.

I REALLY like this idea! If I could just filter by date first and then  
sequential scan through those, it should be very manageable.  Hopefully  
I can keep this goal while still accommodating the requirement listed  
in my next paragraph.

> select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
> FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
> AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%'
> AND ea.recordtext like '%CORVETTE%')
> JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%'
> or recordtext like '%CORVETTE%');

I have run this, and while it is very fast, I'm concerned it's not  
doing what I need.  Here's the situation:

Due to the format of the systems with which I integrate ( I have no  
control over these formats ), we will get these 'recordtext' values one  
line at a time, accumulating over time.  The only way I can find to  
make this work is to insert a new record for each line.  The problem  
is, that when someone wants to search multiple keywords, they expect  
these words to be matched across multiple records with a given incident  
number.

 For a very simple example:
IncidentID   Date               Recordtext
----------   ----------------   -----------------------------
1            2005-05-01 14:21   blah blah blah RED blah blah
2            2005-05-01 14:23   not what we are looking for
1            2005-05-02 02:05   blah CORVETTE blah blah

So, doing a search with an 'and' condition, e.g. WHERE RECORDTEXT LIKE  
'%RED%' AND RECORDTEXT LIKE '%CORVETTE%' , will not match because the  
condition will only be applied to a single row of recordtext at a time,  
not a whole group with the same incident number.

If I were to use tsearch2 for full-text indexing, would I need to  
create another table that merges all of my recordtext rows into a  
single 'text' field type?  If so, this is where I run into problems, as  
my logic also needs to match multiple words in their original order.  I  
may also receive additional updates to the previous data.  In that  
case, I need to replace the original record with the latest version of  
it.  If I have already concatenated these rows into a single field, the  
logic to in-line replace only the old text that has changed is very  
very difficult at best.  So, that's the reason I had to do two  
subqueries in my example.  Please tell me if I misunderstood your logic  
and it really will match given my condition above, but it didn't seem  
like it would.

Thanks again for the quick responses!  This list has been a great  
resource for me.

-Dan


Re: [PERFORM] Query tuning help

2005-05-08 Thread Tom Lane
Russell Smith <[EMAIL PROTECTED]> writes:
> On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
>> and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
>> 
> You cannot use an index for %CORVETTE%, or %RED%.

Not a btree index anyway.  Dan might have some success here with a
full-text-indexing package (eg, contrib/tsearch2)
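
For instance, a sketch with the full-text machinery that was later folded
into core PostgreSQL (the column and index names are illustrative; Josh's
example elsewhere in the thread queries the same kind of recordtext_fti
column):

ALTER TABLE ea ADD COLUMN recordtext_fti tsvector;
UPDATE ea SET recordtext_fti = to_tsvector(recordtext);
CREATE INDEX ea_fti_idx ON ea USING gin (recordtext_fti);
-- then: WHERE recordtext_fti @@ to_tsquery('RED & CORVETTE')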

regards, tom lane



Re: [PERFORM] Query tuning help

2005-05-08 Thread Josh Berkus
Dan,

> and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  

It is simply not possible to use B-tree indexes on this kind of text query.
B-trees require you to start at the "left" side of the field, because B-trees
locate records via <, =, > comparisons.  "Anywhere in the field" text search
requires a Full Text Index.
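
To illustrate (a sketch; the index name is made up, and text_pattern_ops
is only needed under a non-C locale):

CREATE INDEX ea_recordtext_prefix_idx ON ea (recordtext text_pattern_ops);
-- WHERE recordtext LIKE 'RED%'   -- left-anchored: the btree can be used
-- WHERE recordtext LIKE '%RED%'  -- unanchored: it cannot; use a full-text index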

> The reason for the redundant LIKE clause is that first, I only want
> those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT,
> those two words may exist across multiple records with the same
> incidentid.  Then, I only want to actually work with the rows that
> contain one of the words.  This query will repeat the same logic for
> however many keywords are entered by the user.  I have investigated
> text searching options and have not found them to be congruous with my
> application.

Sounds like you either need to restructure your application, restructure your
database (so that you're not doing "anywhere in field" searches), or buy 32GB
of RAM so that you can cache the whole table.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Query tuning help

2005-05-08 Thread Russell Smith
On Mon, 9 May 2005 09:20 am, Dan Harris wrote:
> Sorry to bother everyone with yet another "my query isn't using an 
> index" problem but I am over my head on this one..  I am open to ways 
> of restructuring this query to perform better.
> 
> I have a table, 'ea', with 22 million rows in it.  VACUUM ANALYZE has
> just been run on the table.
> 
> This is the result of:
> 
> explain analyze
> select distinct
>  em.incidentid,
>  ea.recordtext as retdata,
>  eg.long,
>  eg.lat
> from
>  ea, em, eg
> where
>  em.incidentid = ea.incidentid and
>  em.incidentid = eg.incidentid and
>  em.entrydate >= '2005-1-1 00:00' and
>  em.entrydate <= '2005-5-9 00:00'
>  and ea.incidentid in (
>   select
>incidentid
>   from
>ea
>   where
>recordtext like '%RED%'
>  )
> 
>  and ea.incidentid in (
>   select
>incidentid
>   from
>ea
>   where
>recordtext like '%CORVETTE%'
>  )
>  and (  recordtext like '%RED%' or recordtext like '%CORVETTE%'  )  
> order by em.entrydate
> 
You cannot use an index for %CORVETTE%, or %RED%.  There is no way
for the index to know if a row had that in the middle without scanning the whole
index.  So it's much cheaper to do a sequence scan.

One possible way to make the query faster is to limit based on date, as you
will only get about 700 rows.  And then don't use subselects, as they are
doing full sequence scans.  I think this query does what you do above, and
I think it will be faster, but I don't know.

select distinct em.incidentid, ea.recordtext as retdata, eg.long, eg.lat
FROM em JOIN ea ON (em.incidentid = ea.incidentid AND em.entrydate >= '2005-1-1 00:00'
AND em.entrydate <= '2005-5-9 00:00' AND ea.recordtext like '%RED%' AND
ea.recordtext like '%CORVETTE%')
JOIN eg ON em.incidentid = eg.incidentid WHERE (recordtext like '%RED%' or
recordtext like '%CORVETTE%');

> 
> -
> ANALYZE RESULTS
> -
> 
>  Unique  (cost=774693.72..774693.76 rows=1 width=159) (actual time=446787.056..446787.342 rows=72 loops=1)
>    ->  Sort  (cost=774693.72..774693.72 rows=1 width=159) (actual time=446787.053..446787.075 rows=72 loops=1)
>          Sort Key: em.incidentid, public.ea.recordtext, eg.long, eg.lat
>          ->  Nested Loop  (cost=771835.10..774693.71 rows=1 width=159) (actual time=444378.655..446786.746 rows=72 loops=1)
>                ->  Nested Loop  (cost=771835.10..774688.81 rows=1 width=148) (actual time=444378.532..446768.381 rows=72 loops=1)
>                      ->  Nested Loop IN Join  (cost=771835.10..774678.88 rows=2 width=81) (actual time=444367.080..446191.864 rows=701 loops=1)
>                            ->  Nested Loop  (cost=771835.10..774572.05 rows=42 width=64) (actual time=444366.859..445463.232 rows=1011 loops=1)
>                                  ->  HashAggregate  (cost=771835.10..771835.10 rows=1 width=17) (actual time=444366.702..444368.583 rows=473 loops=1)
>                                        ->  Seq Scan on ea  (cost=0.00..771834.26 rows=335 width=17) (actual time=259.746..444358.837 rows=592 loops=1)
>                                              Filter: ((recordtext)::text ~~ '%CORVETTE%'::text)
>                                  ->  Index Scan using ea1 on ea  (cost=0.00..2736.43 rows=42 width=47) (actual time=2.085..2.309 rows=2 loops=473)
>                                        Index Cond: ((ea.incidentid)::text = ("outer".incidentid)::text)
>                                        Filter: (((recordtext)::text ~~ '%RED%'::text) OR ((recordtext)::text ~~ '%CORVETTE%'::text))
>                            ->  Index Scan using ea1 on ea  (cost=0.00..2733.81 rows=42 width=17) (actual time=0.703..0.703 rows=1 loops=1011)
>                                  Index Cond: (("outer".incidentid)::text = (ea.incidentid)::text)
>                                  Filter: ((recordtext)::text ~~ '%RED%'::text)
>                      ->  Index Scan using em_incidentid_idx on em  (cost=0.00..4.95 rows=1 width=67) (actual time=0.820..0.821 rows=0 loops=701)
>                            Index Cond: (("outer".incidentid)::text = (em.incidentid)::text)
>                            Filter: ((entrydate >= '2005-01-01 00:00:00'::timestamp without time zone) AND (entrydate <= '2005-05-09 00:00:00'::timestamp without time zone))
>                ->  Index Scan using eg_incidentid_idx on eg  (cost=0.00..4.89 rows=1 width=79) (actual time=0.245..0.246 rows=1 loops=72)
>                      Index Cond: (("outer".incidentid)::text = (eg.incidentid)::text)
>  Total runtime: 446871.880 ms
> (22 rows)
> 
> 
> -
> EXPLANATION
> -
> The reason for the redundant LIKE clause is that first, I only want 
> those "incidentid"s that contain the words 'RED' and 'CORVETTE'.  BUT, 
> those two words may exist across multiple records with the same 
> incidentid.  Then, I only want to actually