Re: [PERFORM] Query optimization help

2011-08-30 Thread Ondrej Ivanič
Hi,

2011/8/30 Szymon Kosok :
> 2011/8/30 Ondrej Ivanič :
>> Could you please re-post your explain using this web site:
>> http://explain.depesz.com/ and post links to Stackoverflow question?
>
> Here it is: http://explain.depesz.com/s/Iaa
>
>> - try to disable nested loop join (set enable_nestloop=off)

Thanks, I would try to "materialise" spoleczniak_tablica table. Your
query looks like this:
select ...
from spoleczniak_tablica
inner join ...
where ...
order by spoleczniak_tablica.id desc
limit 21

So I would rewrite your query like this:
select ...
from (
 select ...
 from spoleczniak_tablica
 where 
 order by spoleczniak_tablica.id desc
 limit 21
) as x
inner join ...


-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query optimization help

2011-08-30 Thread Szymon Kosok
2011/8/30 Ondrej Ivanič :
> Could you please re-post your explain using this web site:
> http://explain.depesz.com/ and post links to Stackoverflow question?

Here it is: http://explain.depesz.com/s/Iaa

> - try to disable nested loop join (set enable_nestloop=off)

Even worse performance (http://explain.depesz.com/s/mMi).

My configuration:http://pastie.org/2453148 (copied and pasted only
uncommented important variables). It's decent hardware. i7, 16 GB of
RAM, 3x2 RAID 10 (7200rpm) for OS + data, RAID 1 (2 disks, 7200rpm)
for WAL, RAID controller with BBU and 512 MB memory cache (cache is
set to write only).

PS. Sorry Ondrej, accidentally I've sent reply to you, not to list.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query optimization help

2011-08-29 Thread Ondrej Ivanič
Hi,

On 30 August 2011 15:36, Szymon Kosok  wrote:
> Hello,
>
> I asked that question on StackOverflow, but didn't get any valuable
> response, so I'll ask it here. :)
>
> I have such query:

Could you please re-post your explain using this web site:
http://explain.depesz.com/ and post links to Stackoverflow question?
What is your Postgres version? Database settings?
I see huge discrepancy between predicted and actual row numbers (like
1264420 vs 485). I would try the following:

- check column statistics (pg_stasts) and focus on the following
columns: n_distinct, null_frac, most_common_vals. If they are way-off
from the actual values then you should tweak (auto)analyze process:
run manual/auto analyse more often (check pg_stat_user_tables),
increase default_statistics_target (per column or global)

- try to disable nested loop join (set enable_nestloop=off)

-- 
Ondrej Ivanic
(ondrej.iva...@gmail.com)

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query optimization help

2011-08-29 Thread Szymon Kosok
Hello,

I asked that question on StackOverflow, but didn't get any valuable
response, so I'll ask it here. :)

I have such query:

SELECT "spoleczniak_tablica"."id", "spoleczniak_tablica"."postac_id",
"spoleczniak_tablica"."hash", "spoleczniak_tablica"."typ",
"spoleczniak_tablica"."ikona", "spoleczniak_tablica"."opis",
"spoleczniak_tablica"."cel", "spoleczniak_tablica"."data",
"postac_postacie"."id",
"postac_postacie"."user_id", "postac_postacie"."avatar",
"postac_postacie"."ikonka",
"postac_postacie"."imie", "postac_postacie"."nazwisko",
"postac_postacie"."pseudonim",
"postac_postacie"."plec", "postac_postacie"."wzrost", "postac_postacie"."waga",
"postac_postacie"."ur_tydz", "postac_postacie"."ur_rok",
"postac_postacie"."ur_miasto_id",
"postac_postacie"."akt_miasto_id", "postac_postacie"."kasa",
"postac_postacie"."punkty",
"postac_postacie"."zmeczenie", "postac_postacie"."zdrowie",
"postac_postacie"."kariera"
FROM "spoleczniak_tablica" INNER JOIN "postac_postacie" ON
("spoleczniak_tablica"."postac_id" = "postac_postacie"."id") WHERE
spoleczniak_tablica.postac_id = 1 or spoleczniak_tablica.id in(select
wpis_id from
spoleczniak_oznaczone where etykieta_id in(select tag_id from
spoleczniak_subskrypcje where
postac_id = 1)) or (spoleczniak_tablica.postac_id in(select obserwowany_id from
spoleczniak_obserwatorium where obserwujacy_id = 1) and hash not
in('dyskusja', 'kochanie',
'szturniecie')) or (spoleczniak_tablica.cel = 1 and
spoleczniak_tablica.hash in('dyskusja',
'kochanie', 'obserwatorium', 'szturchniecie')) or spoleczniak_tablica.hash =
'administracja-info' or exists(select 1 from spoleczniak_komentarze
where kredka_id =
spoleczniak_tablica.id and postac_id = 1) ORDER BY
"spoleczniak_tablica"."id" DESC LIMIT
21;

and it's real performance bottleneck for us. It's one of the most
often executed query on our site.

Here is EXPLAIN ANALYZE:

 Limit  (cost=52.69..185979.44 rows=21 width=283) (actual
time=5.981..149.110 rows=21 loops=1)
   ->  Nested Loop  (cost=52.69..27867127142.57 rows=3147528
width=283) (actual time=5.981..149.103 rows=21 loops=1)
 ->  Index Scan Backward using spoleczniak_tablica_pkey on
spoleczniak_tablica  (cost=52.69..27866103743.37 rows=3147528
width=194) (actual time=5.971..148.963 rows=21 loops=1)
   Filter: ((postac_id = 1) OR (SubPlan 1) OR ((hashed
SubPlan 2) AND ((hash)::text <> ALL
('{dyskusja,kochanie,szturniecie}'::text[]))) OR ((cel = 1) AND
((hash)::text = ANY
('{dyskusja,kochanie,obserwatorium,szturchniecie}'::text[]))) OR
((hash)::text = 'administracja-info'::text) OR (alternatives: SubPlan
3 or hashed SubPlan 4))
   SubPlan 1
 ->  Materialize  (cost=13.28..11947.85 rows=1264420
width=4) (actual time=0.000..0.024 rows=485 loops=2137)
   ->  Nested Loop  (cost=13.28..685.75
rows=1264420 width=4) (actual time=0.119..0.664 rows=485 loops=1)
 ->  HashAggregate  (cost=5.89..5.90
rows=1 width=4) (actual time=0.015..0.017 rows=7 loops=1)
   ->  Index Scan using
spoleczniak_subskrypcje_postac_id on spoleczniak_subskrypcje
(cost=0.00..5.89 rows=2 width=4) (actual time=0.005..0.009 rows=7
loops=1)
 Index Cond: (postac_id = 1)
 ->  Bitmap Heap Scan on
spoleczniak_oznaczone  (cost=7.38..674.96 rows=391 width=8) (actual
time=0.019..0.082 rows=69 loops=7)
   Recheck Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
   ->  Bitmap Index Scan on
spoleczniak_oznaczone_etykieta_id  (cost=0.00..7.29 rows=391 width=0)
(actual time=0.013..0.013 rows=69 loops=7)
 Index Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
   SubPlan 2
 ->  Index Scan using
spoleczniak_obserwatorium_obserwujacy_id on spoleczniak_obserwatorium
(cost=0.00..39.36 rows=21 width=4) (actual time=0.006..0.030 rows=26
loops=1)
   Index Cond: (obserwujacy_id = 1)
   SubPlan 3
 ->  Bitmap Heap Scan on spoleczniak_komentarze
(cost=18.67..20.68 rows=1 width=0) (never executed)
   Recheck Cond: ((kredka_id =
spoleczniak_tablica.id) AND (postac_id = 1))
   ->  BitmapAnd  (cost=18.67..18.67 rows=1
width=0) (never executed)
 ->  Bitmap Index Scan on
spoleczniak_komentarze_kredka_id  (cost=0.00..2.98 rows=24 width=0)
(never executed)
   Index Cond: (kredka_id =
spoleczniak_tablica.id)
 ->  Bitmap Index Scan on
spoleczniak_komentarze_postac_id  (cost=0.00..15.44 rows=890 width=0)
(never executed)
   Index Cond: (postac_id = 1)
   SubPlan 4
 ->  Index Scan using spoleczniak_komentarze_postac_id
on spoleczniak_komentarze  (cost=0.00..1610.46 rows=890 width=4)
(actual time=0.0

[PERFORM] Query optimization help

2011-08-29 Thread Szymon Kosok
Hello,

I asked that question on StackOverflow, but didn't get any valuable
response, so I'll ask it here. :)

I have such query:

SELECT "spoleczniak_tablica"."id", "spoleczniak_tablica"."postac_id",
"spoleczniak_tablica"."hash", "spoleczniak_tablica"."typ",
"spoleczniak_tablica"."ikona", "spoleczniak_tablica"."opis",
"spoleczniak_tablica"."cel", "spoleczniak_tablica"."data",
"postac_postacie"."id",
"postac_postacie"."user_id", "postac_postacie"."avatar",
"postac_postacie"."ikonka",
"postac_postacie"."imie", "postac_postacie"."nazwisko",
"postac_postacie"."pseudonim",
"postac_postacie"."plec", "postac_postacie"."wzrost", "postac_postacie"."waga",
"postac_postacie"."ur_tydz", "postac_postacie"."ur_rok",
"postac_postacie"."ur_miasto_id",
"postac_postacie"."akt_miasto_id", "postac_postacie"."kasa",
"postac_postacie"."punkty",
"postac_postacie"."zmeczenie", "postac_postacie"."zdrowie",
"postac_postacie"."kariera"
FROM "spoleczniak_tablica" INNER JOIN "postac_postacie" ON
("spoleczniak_tablica"."postac_id" = "postac_postacie"."id") WHERE
spoleczniak_tablica.postac_id = 1 or spoleczniak_tablica.id in(select
wpis_id from
spoleczniak_oznaczone where etykieta_id in(select tag_id from
spoleczniak_subskrypcje where
postac_id = 1)) or (spoleczniak_tablica.postac_id in(select obserwowany_id from
spoleczniak_obserwatorium where obserwujacy_id = 1) and hash not
in('dyskusja', 'kochanie',
'szturniecie')) or (spoleczniak_tablica.cel = 1 and
spoleczniak_tablica.hash in('dyskusja',
'kochanie', 'obserwatorium', 'szturchniecie')) or spoleczniak_tablica.hash =
'administracja-info' or exists(select 1 from spoleczniak_komentarze
where kredka_id =
spoleczniak_tablica.id and postac_id = 1) ORDER BY
"spoleczniak_tablica"."id" DESC LIMIT
21;

and it's real performance bottleneck for us. It's one of the most
often executed query on our site.

Here is EXPLAIN ANALYZE:

 Limit  (cost=52.69..185979.44 rows=21 width=283) (actual
time=5.981..149.110 rows=21 loops=1)
   ->  Nested Loop  (cost=52.69..27867127142.57 rows=3147528
width=283) (actual time=5.981..149.103 rows=21 loops=1)
 ->  Index Scan Backward using spoleczniak_tablica_pkey on
spoleczniak_tablica  (cost=52.69..27866103743.37 rows=3147528
width=194) (actual time=5.971..148.963 rows=21 loops=1)
   Filter: ((postac_id = 1) OR (SubPlan 1) OR ((hashed
SubPlan 2) AND ((hash)::text <> ALL
('{dyskusja,kochanie,szturniecie}'::text[]))) OR ((cel = 1) AND
((hash)::text = ANY
('{dyskusja,kochanie,obserwatorium,szturchniecie}'::text[]))) OR
((hash)::text = 'administracja-info'::text) OR (alternatives: SubPlan
3 or hashed SubPlan 4))
   SubPlan 1
 ->  Materialize  (cost=13.28..11947.85 rows=1264420
width=4) (actual time=0.000..0.024 rows=485 loops=2137)
   ->  Nested Loop  (cost=13.28..685.75
rows=1264420 width=4) (actual time=0.119..0.664 rows=485 loops=1)
 ->  HashAggregate  (cost=5.89..5.90
rows=1 width=4) (actual time=0.015..0.017 rows=7 loops=1)
   ->  Index Scan using
spoleczniak_subskrypcje_postac_id on spoleczniak_subskrypcje
(cost=0.00..5.89 rows=2 width=4) (actual time=0.005..0.009 rows=7
loops=1)
 Index Cond: (postac_id = 1)
 ->  Bitmap Heap Scan on
spoleczniak_oznaczone  (cost=7.38..674.96 rows=391 width=8) (actual
time=0.019..0.082 rows=69 loops=7)
   Recheck Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
   ->  Bitmap Index Scan on
spoleczniak_oznaczone_etykieta_id  (cost=0.00..7.29 rows=391 width=0)
(actual time=0.013..0.013 rows=69 loops=7)
 Index Cond: (etykieta_id =
spoleczniak_subskrypcje.tag_id)
   SubPlan 2
 ->  Index Scan using
spoleczniak_obserwatorium_obserwujacy_id on spoleczniak_obserwatorium
(cost=0.00..39.36 rows=21 width=4) (actual time=0.006..0.030 rows=26
loops=1)
   Index Cond: (obserwujacy_id = 1)
   SubPlan 3
 ->  Bitmap Heap Scan on spoleczniak_komentarze
(cost=18.67..20.68 rows=1 width=0) (never executed)
   Recheck Cond: ((kredka_id =
spoleczniak_tablica.id) AND (postac_id = 1))
   ->  BitmapAnd  (cost=18.67..18.67 rows=1
width=0) (never executed)
 ->  Bitmap Index Scan on
spoleczniak_komentarze_kredka_id  (cost=0.00..2.98 rows=24 width=0)
(never executed)
   Index Cond: (kredka_id =
spoleczniak_tablica.id)
 ->  Bitmap Index Scan on
spoleczniak_komentarze_postac_id  (cost=0.00..15.44 rows=890 width=0)
(never executed)
   Index Cond: (postac_id = 1)
   SubPlan 4
 ->  Index Scan using spoleczniak_komentarze_postac_id
on spoleczniak_komentarze  (cost=0.00..1610.46 rows=890 width=4)
(actual time=0.0

Re: [PERFORM] query optimization help

2005-01-14 Thread sarlav kumar
Hi,
 
Thanks for the help. I actually got around with it by doing the following.
I created a temporary table:
 
create table statesnew as select distinct state,state_code from postalcode where lower(country)='us';
 
And then changed the query to :
 
SELECT (SELECT sn.state FROM statesnew sn WHERE UPPER(cd.state) IN (sn.state, sn.state_code)) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year', d.time) = 2004 GROUP BY mystate ORDER BY mystate;
 
This worked well, as it reduced the number of entries it had to search from.
 
I am not sure how to use the function you have written. Can you give me pointers on that?
 
Thanks,
Saranya
 
Merlin Moncure <[EMAIL PROTECTED]> wrote:

Please post in plaintext, not html where possible.Your group by clause was 'myst'...was that supposed to be mystate?
Yes, It is mystate. It continues on the next line:)
Her is something you might try...use the original query form and create a function which resolves the state code from the input data...you are already doing that with upper.So,create function get_state_code(text) returns char(2) as $$select case when len($1) = 2 then upper($1)else lookup_state_code($1)end;$$language sql stable;lookup_state_code is a similar function which is boils down to a select from a lookup table. Or, you could make a giant cast statement (when GEORGIA then GA, etc). and now your function becomes IMMUTABLE and should execute very fast. Just make sure all the states are spelled correct in the original table via domain constraint.Merlin__Do You Yahoo!?Tired of spam?  Yahoo! Mail has the best spam protection around http://mail.yahoo.com 

Re: [PERFORM] query optimization help

2005-01-14 Thread Merlin Moncure
Please post in plaintext, not html where possible.
Your group by clause was 'myst'...was that supposed to be mystate?

Her is something you might try...use the original query form and create a 
function which resolves the state code from the input data...you are already 
doing that with upper.

So,

create function get_state_code(text) returns char(2) as 
$$
select case when len($1) = 2 
then upper($1)
else lookup_state_code($1)
end;
$$
language sql stable;

lookup_state_code is a similar function which is boils down to a select from a 
lookup table.  Or, you could make a giant cast statement (when GEORGIA then GA, 
etc).  and now your function becomes IMMUTABLE and should execute very fast.  
Just make sure all the states are spelled correct in the original table via 
domain constraint.

Merlin



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of sarlav kumar
Sent: Friday, January 14, 2005 9:40 AM
To: pgsqlnovice; pgsqlperform
Subject: [PERFORM] query optimization help

Hi All,
 
I have the following query to generate a report grouped by "states".
 
SELECT distinct upper(cd.state) as mystate, SUM(d.amount) as total_amount, 
SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) 
as total_fee FROM data d left JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN 
merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND 
d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND 
date_part('year',d.time)= 2004 GROUP BY myst
ate ORDER BY mystate;


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


[PERFORM] query optimization help

2005-01-14 Thread sarlav kumar
Hi All,
 
I have the following query to generate a report grouped by "states".
 
SELECT distinct upper(cd.state) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d left JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year',d.time)= 2004 GROUP BY mystate ORDER BY mystate;
 mystate | total_amount | total_fee -+--+--- | 3695 | 0 AR  | 3000 | 0 AZ  | 1399 | 0 CA  |   113100 |  6242 FL  |   121191 |  9796 GA  | 34826876 |    47 GEORGIA |    57990 |   &nbs
 p; 
 3500 IEIE    |   114000 |  4849 MD  |    2 |  1158 MI  |   906447 | 0 NY  | 8000 |   600 PA  | 6200 |   375 SC  |    25000 |   600 TN  |  1443681 |  1124
 |    13300 | 0(15 rows)
If you notice, my problem in this query is that the records for GA, GEORGIA appear separately. But what I want to do is  to have them combined to a single entry with their values summed up . Initially we had accepted both formats as input for the state field. Also, there are some invalid entries for the state field (like the "IEIE" and null values), which appear because the input for state was not validated initially. These entries have to be eliminated from the report.This query did not take a long time to complete, but did not meet the needs for the report. 
 
So, the query was rewritten to the following query which takes nearly 7-8 mins to complete on our test database:
 
SELECT (SELECT DISTINCT pc.state FROM postalcode pc WHERE UPPER(cd.state) IN (pc.state, pc.state_code)) as mystate, SUM(d.amount) as total_amount, SUM(COALESCE(d.fee,0) + COALESCE(mp.seller_fee, 0) + COALESCE(mp.buyer_fee,0)) as total_fee FROM data d JOIN customerdata cd ON d.uid = cd.uid LEFT JOIN merchant_purchase mp ON d.id = mp.data_id WHERE d.what IN (26,0, 15) AND d.flags IN (1,9,10,12 ) AND lower(cd.country) = 'us' AND date_part('year', d.time) = 2004 GROUP BY mystate ORDER BY mystate;    mystate | total_amount | total_fee +--+--- ARIZONA    | 1399 | 0 ARKANSAS   | 3000
 | 0 CALIFORNIA |   113100 |  6242 FLORIDA    |   121191 |  9796 GEORGIA    | 34884866 |    482388 MARYLAND   |    2 |  1158 MICHIGAN   |   906447 | 0 NEW YORK   | 8000 |   600 PENNSYLVANIA   | 6200
 |   375 SOUTH CAROLINA |    25000 |   600 TENNESSEE  |  1443681 |  1124    |   130995 |  4849
 
Here is the explain analyze of this query:
  QUERY PLAN   Aggregate  (cost=1226.57..1226.58 rows=1 width=38) (actual time=362355.58..362372.09 rows=12 loops=1)   ->  Group  (cost=1226.57..1226.57 rows=1 width=38) (actual
 time=362355.54..362367.73 rows=2197 loops=1) ->  Sort  (cost=1226.57..1226.57 rows=1 width=38) (actual time=362355.53..362356.96 rows=2197 loops=1)   Sort Key: (subplan)   ->  Nested Loop  (cost=0.00..1226.56 rows=1 width=38) (actual time=166.11..362321.46 rows=2197 loops=1) ->  Nested Loop  (cost=0.00..1220.53 rows=1 width=26) (actual time=1.68..361.32 rows=2115 loops=1)   ->  Seq Scan on customerdata cd  (cost=0.00..274.32 ro
 ws=31
 width=10) (actual time=0.04..29.87 rows=3303 loops=1) Filter: (lower((country)::text) = 'us'::text)   ->  Index Scan using data_uid_idx on data d  (cost=0.00..30.08 rows=1 width=16) (actual time=0.04..0.09 rows=1 loops=3303) Index Cond: (d.uid =
 "outer".uid) Filter: (((what = 26) OR (what = 0) OR (what = 15)) AND ((flags = 1) OR (flags = 9) OR (flags = 10) OR (flags = 12)) AND (date_part('year'::text, "time") = 2004::double precision)) ->  Index Scan using merchant_purchase_data_idx on merchant_purchase mp  (cost=0.00..6.01 rows=1 width=12) (actual time=0.05..0.05 rows=1 loops=2115)   Index Cond: ("outer".id =
 mp.data_id) SubPlan   ->  Unique  (cost=2237.12..2243.22 rows=122 width=13) (actual time=161.25..164.68 rows=1 loops=2197) ->  Sort  (cost=2237.12..2240.17 rows=1220 width=13) (actual time=161.21..161.88 rows=1033 loops=2197)   Sort Key: