Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread sarlav kumar
Hi Tom,

Thanks for the help, Tom.

The major issue seems to be in the sub-selects:
  -  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
        Filter: (merchant_id = $0)
where the estimated row count is a factor of 7 too high. If the estimated row count were even a little lower, it'd probably have gone for an indexscan.

I understand that the sub-selects are taking up most of the time as they do a sequential scan on the tables. 

You might get some results from increasing the statistics target for merchant_purchase.merchant_id.

Do I have to use vacuum analyze to update the statistics? If so, I have already tried that and it doesn't seem to help.

If that doesn't help, I'd think about reducing random_page_cost a little bit.

I am sorry, I am not aware of what random_page_cost is, as I am new to Postgres. What does it signify and how do I reduce random_page_cost? Thanks,
Saranya

Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread gnari
From: sarlav kumar [EMAIL PROTECTED]

  [Tom:]
  You might get some results from increasing the
 statistics target for merchant_purchase.merchant_id.

 Do I have to use vacuum analyze to update the statistics? If so, I have
already tried that and it doesn't seem to help.

alter table merchant_purchase alter column merchant_id set statistics 500;
analyze merchant_purchase;
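
(A sketch, not part of the original reply: once the ANALYZE has run, you can check whether the estimate on the problem scan actually moved. The literal 42 is just an arbitrary example merchant id standing in for the $0 parameter shown in the plan.)

-- re-check the planner's row estimate for the correlated scan after ANALYZE
explain analyze
select count(*) from merchant_purchase mp where mp.merchant_id = 42;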


 If that doesn't help, I'd think about reducing random_page_cost a little
bit.

 I am sorry, I am not aware of what random_page_cost is, as I am new to
Postgres. What does it signify and how do I reduce random_page_cost?

set random_page_cost = 3;
explain analyse query

if it is an improvement, consider setting the value in your postgresql.conf,
but remember that this may affect other queries too.

gnari






Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread Josh Berkus
Sarlav,

 I am sorry, I am not aware of what random_page_cost is, as I am new to
 Postgres. What does it signify and how do I reduce random_page_cost?

It's a parameter in your postgresql.conf file. After you test it, you will
want to change it there and reload the server (pg_ctl reload).

However, you can test it on an individual connection:
SET random_page_cost=2.5
(the default is 4.0)

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread sarlav kumar
Hi Josh,

Can you tell me in what way it affects performance? And how do I decide what value to set random_page_cost to? Does it depend on any other factors?

Thanks,
Saranya

[PERFORM] help needed -- sequential scan problem

2004-11-19 Thread sarlav kumar
Hi All,

I am new to Postgres.

I have a query which does not use an index scan unless I force Postgres to use one. I don't want to force Postgres unless there is no way of optimizing this query.

The query:

select m.company_name,m.approved,cu.account_no,mbt.business_name,cda.country, 
(select count(*) from merchant_purchase mp left join data d on mp.data_id=d.id where mp.merchant_id=m.id and d.status=5) as Trans_count,
(select sum(total * 0.01) from merchant_purchase mp left join data d on mp.data_id=d.id where mp.merchant_id=m.id and d.status=5) as Trans_amount,
(select count(*) from merchant_purchase mp left join data d on mp.data_id=d.id where d.what=15 and d.status=5 and d.flags=7 and mp.merchant_id=m.id) as Reversal_count
from merchant m 
left join customer cu on cu.id=m.uid 
left join customerdata cda on cda.uid=cu.id 
left join merchant_business_types mbt on mbt.id=m.businesstype and
exists (select distinct(merchant_id) from merchant_purchase where m.id=merchant_id);

First Question: I know the way I have written the first two sub-selects is really bad, as they have the same conditions in the where clause. But I am not sure if there is a way to select two columns in a single sub-select query. When I tried to combine the two sub-select queries, I got an error saying that the sub-select can have only one column. Does anyone know any other efficient way of doing it?
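
(A sketch only, not something suggested in the thread: one common way around the one-column limit is to compute both aggregates once per merchant in a grouped sub-query in FROM and join that in, instead of running two correlated sub-selects. Note that the d.status=5 condition already turns the original left join on data into an effective inner join.)

-- both aggregates from one grouped sub-query instead of two sub-selects
select m.id, m.company_name,
       coalesce(t.trans_count, 0)  as Trans_count,
       coalesce(t.trans_amount, 0) as Trans_amount
from merchant m
left join (
    select mp.merchant_id,
           count(*)           as trans_count,
           sum(total * 0.01)  as trans_amount
    from merchant_purchase mp
    join data d on mp.data_id = d.id
    where d.status = 5
    group by mp.merchant_id
) t on t.merchant_id = m.id;

Whether this is actually faster than the correlated form would need an explain analyze comparison on the real data.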

Second Question: The query plan is as follows:

QUERY PLAN
Hash Join  (cost=901.98..17063.67 rows=619 width=88) (actual time=52.01..5168.09 rows=619 loops=1)
  Hash Cond: ("outer".businesstype = "inner".id)
  Join Filter: (subplan)
  ->  Merge Join  (cost=900.34..1276.04 rows=619 width=62) (actual time=37.00..97.58 rows=619 loops=1)
        Merge Cond: ("outer".id = "inner".uid)
        ->  Merge Join  (cost=900.34..940.61 rows=619 width=52) (actual time=36.91..54.66 rows=619 loops=1)
              Merge Cond: ("outer".id = "inner".uid)
              ->  Sort  (cost=795.45..810.32 rows=5949 width=17) (actual time=32.59..36.59 rows=5964 loops=1)
                    Sort Key: cu.id
                    ->  Seq Scan on customer cu  (cost=0.00..422.49 rows=5949 width=17) (actual time=0.02..15.69 rows=5964 loops=1)
              ->  Sort  (cost=104.89..106.44 rows=619 width=35) (actual time=4.27..5.10 rows=619 loops=1)
                    Sort Key: m.uid
                    ->  Seq Scan on merchant m  (cost=0.00..76.19 rows=619 width=35) (actual time=0.04..2.65 rows=619 loops=1)
        ->  Index Scan using customerdata_uid_idx on customerdata cda  (cost=0.00..311.85 rows=5914 width=10) (actual time=0.09..27.70 rows=5919 loops=1)
  ->  Hash  (cost=1.51..1.51 rows=51 width=26) (actual time=0.19..0.19 rows=0 loops=1)
        ->  Seq Scan on merchant_business_types mbt  (cost=0.00..1.51 rows=51 width=26) (actual time=0.04..0.12 rows=51 loops=1)
  SubPlan
    ->  Aggregate  (cost=269.89..269.89 rows=1 width=12) (actual time=2.70..2.70 rows=1 loops=619)
          ->  Nested Loop  (cost=0.00..269.78 rows=44 width=12) (actual time=2.40..2.69 rows=4 loops=619)
                Filter: ("inner".status = 5)
                ->  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
                      Filter: (merchant_id = $0)
                ->  Index Scan using data_pkey on data d  (cost=0.00..3.91 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=3951)
                      Index Cond: ("outer".data_id = d.id)
    ->  Aggregate  (cost=269.89..269.89 rows=1 width=16) (actual time=2.73..2.73 rows=1 loops=619)
          ->  Nested Loop  (cost=0.00..269.78 rows=44 width=16) (actual time=2.42..2.70 rows=4 loops=619)
                Filter: ("inner".status = 5)
                ->  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 rows=44 width=8) (actual time=2.39..2.60 rows=6 loops=619)
                      Filter: (merchant_id = $0)
                ->  Index Scan using data_pkey on data d  (cost=0.00..3.91 rows=1 width=8) (actual time=0.01..0.01 rows=1 loops=3951)
                      Index Cond: ("outer".data_id = d.id)
    ->  Aggregate  (cost=270.12..270.12 rows=1 width=20) (actual time=2.72..2.72 rows=1 loops=619)
          ->  Nested Loop  (cost=0.00..270.00 rows=44 width=20) (actual time=2.63..2.72 rows=0 loops=619)
                Filter: (("inner".what = 15) AND ("inner".status = 5) AND ("inner".flags = 7))
                ->  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 rows=44 width=4) (actual time=2.40..2.62 rows=6 loops=619)
                      Filter: (merchant_id = $0)
                ->  Index Scan using data_pkey on data d  (cost=0.00..3.91 rows=1 width=16) (actual time=0.01..0.01 rows=1 loops=3951)
                      Index Cond: ("outer".data_id = d.id)
    ->  Unique  (cost=0.00..113.14 rows=4 width=4) (actual time=0.02..0.02 rows=0 loops=598)
          ->  Index Scan using merchant_purchase_merchant_id_idx on merchant_purchase  (cost=0.00..113.02 rows=44 width=4) (actual time=0.01..0.01 rows=0 loops=598)
                Index Cond: ($0 = merchant_id)
Total runtime: 5170.37 msec (5.170 sec)
(42 rows)

As you can see, there are many sequential scans in the query plan. Postgres is not using the index I defined, even though the index scan leads to much better performance (0.2 sec when I force an index scan).
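
(A sketch, not from the original mail, which doesn't say how the index scan was forced; the usual diagnostic toggle is to disable sequential scans for the session, for testing only.)

set enable_seqscan = off;   -- diagnostic only, not a production setting
explain analyze query       -- i.e. the full query above
set enable_seqscan = on;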

Is there something wrong in my query that makes Postgres use a seq scan as opposed to an index scan? Any help would be appreciated.

Re: [PERFORM] help needed -- sequential scan problem

2004-11-19 Thread Tom Lane
sarlav kumar [EMAIL PROTECTED] writes:
 I have a query which does not use index scan unless I force postgres to use 
 index scan. I dont want to force postgres, unless there is no way of 
 optimizing this query.

The major issue seems to be in the sub-selects:

  -  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 
 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
Filter: (merchant_id = $0)

where the estimated row count is a factor of 7 too high.  If the
estimated row count were even a little lower, it'd probably have gone
for an indexscan.  You might get some results from increasing the
statistics target for merchant_purchase.merchant_id.  If that doesn't
help, I'd think about reducing random_page_cost a little bit.
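
(A sketch, not part of Tom's mail; gnari's reply earlier on this page gives essentially the same commands. The values 500 and 3 are just starting points to experiment with.)

alter table merchant_purchase alter column merchant_id set statistics 500;
analyze merchant_purchase;
set random_page_cost = 3;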

regards, tom lane
