[PERFORM] Nested Loop vs Hash Join based on predicate?

2016-03-19 Thread Doiron, Daniel
I have the following queries:

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select […]
from  f_calc_service   a11,
      d_patient_type   a12
where   a11.d_patient_pop_id in (336)
 and a11.d_patient_type_id = a12.id
 and a12.short_name = 'I'
group by  a11.d_rate_schedule_id,
a11.d_payer_id,
a11.d_patient_pop_id,
a11.d_patient_type_id
;

And

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select […]
from  f_calc_service   a11,
      d_patient_type   a12
where   a11.d_patient_pop_id in (336)
 and a11.d_patient_type_id = a12.id
 and a12.short_name = 'O'
group by  a11.d_rate_schedule_id,
a11.d_payer_id,
a11.d_patient_pop_id,
a11.d_patient_type_id
;

Making this one change from short_name = 'I' to short_name = 'O' changes the
query execution time from roughly 200,000 ms to 280 ms. The first query chooses a
nested loop, the second chooses a hash join. How do I get them both to choose the
same plan? There are no rows with d_patient_pop_id in (336) and short_name = 'I'.

Thanks!

Dan




Re: [PERFORM] Nested Loop vs Hash Join based on predicate?

2016-03-19 Thread Pavel Stehule
2016-03-16 21:23 GMT+01:00 Doiron, Daniel :

> [...]
> Making this one change from short_name = 'I' to short_name = 'O' changes
> the query execution time from roughly 200,000 ms to 280 ms. The first query
> chooses a nested loop, the second a hash join. How do I get them both to
> choose the same plan? There are no rows with d_patient_pop_id in (336) and
> short_name = 'I'.
>

We don't see the plans, so this is a blind shot.

Probably the estimate for the 'I' value is badly underestimated, so the
planner chooses a nested loop. The reasons can vary; correlation inside the
data is one possibility.

You can try:

0) ensure your statistics are current: run the statement ANALYZE

a) increase the statistics target: ALTER TABLE xx ALTER COLUMN yyy SET
STATISTICS <some number>

b) penalize nested loops: SET enable_nestloop TO off;
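A sketch of those three steps, using the table and column names from the
query above (the statistics value of 1000 is just an example):

```sql
-- 0) refresh planner statistics
ANALYZE f_calc_service;
ANALYZE d_patient_type;

-- a) collect a larger sample for the columns driving the bad estimate
ALTER TABLE f_calc_service ALTER COLUMN d_patient_type_id SET STATISTICS 1000;
ALTER TABLE d_patient_type ALTER COLUMN short_name SET STATISTICS 1000;
ANALYZE f_calc_service;
ANALYZE d_patient_type;

-- b) as a last resort, discourage nested loops for this session only
SET enable_nestloop TO off;
-- ... run the problem query ...
RESET enable_nestloop;
```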

Regards

Pavel


>
> Thanks!
>
> Dan
>
>
>


[PERFORM] Nested loop issue

2014-04-08 Thread Manoj Gadi
Hi All,

I have been looking for a solution to a problem where my query executes for
a long time because it runs into a nested loop problem.

I have run EXPLAIN ANALYZE and it shows the query taking a very long time due
to nested loops.

On the DB side, there are indexes in place for all the required columns. By
setting enable_nestloop off there is a drastic improvement (from 40,000 ms
to 600 ms), but I know this is not the right practice.

My Postgres version is 9.3.2 on Linux.

Please find the link for the query plan below :

http://explain.depesz.com/s/l9o


Also, find below the query that is being executed.

SELECT DISTINCT
   Sektion/Fachbereich.parent,
   Studienfach.ltxt,
   SUM(CASE
  WHEN Studiengang.faktor IS NOT NULL
  AND Studiengang.faktor = 0 THEN Studiengang.faktor * 
Studierende.summe
  ELSE Studierende.summe
   END)

FROM (
   SELECT sos_stg_aggr.tid_stg, sos_stg_aggr.ca12_staat, 
sos_stg_aggr.geschlecht, sos_stg_aggr.alter, sos_stg_aggr.hzbart, 
sos_stg_aggr.hmkfzkz, sos_stg_aggr.hmkfz, sos_stg_aggr.semkfzkz, 
sos_stg_aggr.semkfz, sos_stg_aggr.hzbkfzkz, sos_stg_aggr.hzbkfz, 
sos_stg_aggr.hrst, sos_stg_aggr.studiengang_nr, sos_stg_aggr.fach_nr, 
sos_stg_aggr.fach_sem_zahl, sos_stg_aggr.sem_rueck_beur_ein, 
sos_stg_aggr.kz_rueck_beur_ein, sos_stg_aggr.klinsem, sos_stg_aggr.hssem, 
sos_stg_aggr.stuart, sos_stg_aggr.stutyp, sos_stg_aggr.stufrm, 
sos_stg_aggr.stichtag, sos_stg_aggr.summe, sos_stg_aggr.hzbart_int, 
sos_stg_aggr.matrikel_nr, sos_stg_aggr.ch27_grund_beurl, 
sos_stg_aggr.ch62_grund_exmatr, sos_stg_aggr.hzbnote, 
textcat(sos_stg_aggr.studiengang_nr::text, sos_stg_aggr.fach_nr::text) AS 
koepfe_faelle
   FROM sos_stg_aggr

   union all

   SELECT sos_stg_aggr.tid_stg, sos_stg_aggr.ca12_staat, 
sos_stg_aggr.geschlecht, sos_stg_aggr.alter, sos_stg_aggr.hzbart, 
sos_stg_aggr.hmkfzkz, sos_stg_aggr.hmkfz, sos_stg_aggr.semkfzkz, 
sos_stg_aggr.semkfz, sos_stg_aggr.hzbkfzkz, sos_stg_aggr.hzbkfz, 
sos_stg_aggr.hrst, sos_stg_aggr.studiengang_nr, sos_stg_aggr.fach_nr, 
sos_stg_aggr.fach_sem_zahl, sos_stg_aggr.sem_rueck_beur_ein, 
sos_stg_aggr.kz_rueck_beur_ein, sos_stg_aggr.klinsem, sos_stg_aggr.hssem, 
sos_stg_aggr.stuart, sos_stg_aggr.stutyp, sos_stg_aggr.stufrm, 
sos_stg_aggr.stichtag, sos_stg_aggr.summe, sos_stg_aggr.hzbart_int, 
sos_stg_aggr.matrikel_nr, sos_stg_aggr.ch27_grund_beurl, 
sos_stg_aggr.ch62_grund_exmatr, sos_stg_aggr.hzbnote, '21' AS koepfe_faelle
   FROM sos_stg_aggr
   where sos_stg_aggr.tid_stg in (select distinct lehr_stg_ab_tid from 
lehr_stg_ab2fb)

) AS Studierende
INNER JOIN (
   select astat::integer, trim(druck) as druck from sos_k_status

) AS Rückmeldestatus
ON (
   Studierende.kz_rueck_beur_ein = Rückmeldestatus.astat
)
INNER JOIN (
   select tid, trim(name) as name from sos_stichtag

) AS Stichtag
ON (
   Studierende.stichtag = Stichtag.tid
)
INNER JOIN (
   select abschluss, kz_fach, stg, pversion, regel, trim(text) as text, fb, 
lehr, anteil, tid,null as faktor from lehr_stg_ab
where lehr_stg_ab.tid not in (select lehr_stg_ab_tid from lehr_stg_ab2fb)

union
select abschluss, kz_fach, stg, pversion, regel, trim(text) as text, 
lehr_stg_ab2fb.fb, lehr, anteil, tid,faktor from lehr_stg_ab
inner join lehr_stg_ab2fb
on lehr_stg_ab2fb.lehr_stg_ab_tid = lehr_stg_ab.tid

) AS Studiengang
ON (
   Studierende.tid_stg = Studiengang.tid
)
INNER JOIN (
   select astat, astfr, astgrp, fb, trim(ltxt) as ltxt, stg from k_stg

) AS Studienfach
ON (
   Studiengang.stg = Studienfach.stg
)
AND (
   Studienfach.ltxt IS NOT NULL
)
INNER JOIN (
   select instnr, ch110_institut, btrim(druck) as druck, btrim(parent) as 
parent from unikn_k_fb

) AS Sektion/Fachbereich
ON (
   Studiengang.fb = Sektion/Fachbereich.instnr
)
INNER JOIN (
   select apnr, trim(druck) as druck from cifx where key=613

) AS Hörerstatus
ON (
   Studierende.hrst = Hörerstatus.apnr
)
WHERE
(
   Sektion/Fachbereich.druck = 'FB Biologie'
)
AND
 (
   (
  Hörerstatus.druck = 'Haupthörer/in'
  AND Stichtag.name = 'Amtl. Statistik Land'
  AND Rückmeldestatus.druck IN ('Beurlaubung', 'Ersteinschreibung', 
'Neueinschreibung', 'Rückmeldung')
  AND Studierende.sem_rueck_beur_ein = 20132
   )
)
GROUP BY
   Sektion/Fachbereich.parent,
   Studienfach.ltxt


According to my analysis, the WHERE clause after the UNION ALL accounts for
most of the execution time.

Any help with an alternative way to write the query, or with identifying the
cause of the issue, would be much appreciated.


Thanks in advance,
Manoj


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Nested loop issue

2014-04-08 Thread Dhananjay Singh
REPLACE:

  where sos_stg_aggr.tid_stg in (select distinct lehr_stg_ab_tid from
  lehr_stg_ab2fb)

WITH an EXISTS test:

  where exists (select 1 from lehr_stg_ab2fb
                where lehr_stg_ab2fb.lehr_stg_ab_tid = sos_stg_aggr.tid_stg)

Similarly for the others, e.g. replace "lehr_stg_ab.tid not in (select
lehr_stg_ab_tid from lehr_stg_ab2fb)" with a NOT EXISTS test.

This should improve performance, depending on the results of the inner
query.
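A sketch of the NOT IN half of that advice, using the tables from the query.
One caveat worth knowing: the two forms differ if the subquery can return
NULLs (NOT IN then yields no rows at all, while NOT EXISTS is unaffected),
so they are only interchangeable when lehr_stg_ab_tid is NOT NULL:

```sql
-- Original form: NULL-sensitive, and the planner must evaluate the
-- whole subquery result.
select tid from lehr_stg_ab
where lehr_stg_ab.tid not in (select lehr_stg_ab_tid from lehr_stg_ab2fb);

-- Anti-join form: typically planned as a (hashed) anti-join.
select tid from lehr_stg_ab
where not exists (
    select 1 from lehr_stg_ab2fb
    where lehr_stg_ab2fb.lehr_stg_ab_tid = lehr_stg_ab.tid
);
```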

Regards
Dhananjay
OpenSCG


On Tuesday, 8 April 2014 3:06 PM, Manoj Gadi manoj.g...@uni-konstanz.de wrote:
[...]

Re: [PERFORM] Nested loop and simple join query - slow after upgrade to 9.2

2013-01-25 Thread Jeff Janes
On Fri, Jan 25, 2013 at 7:34 AM, alexandre - aldeia digital
adald...@gmail.com wrote:
 Hi,

 Last weekend, we upgraded a PG from 8.4 to 9.2 (full
 pg_dump/restore/vacuum/analyze). After this, some simple join queries
 became very slow, maybe because of the use of nested loops. Below, an example
 with nestloop on and off:

What happens if you bump up default_statistics_target by a factor of
10 or 100 and redo the analyze?
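That suggestion, sketched (pc13t3 is the table named in the plan; a
session-level SET only affects ANALYZE runs issued in that session):

```sql
SET default_statistics_target = 1000;  -- 10x the default of 100
ANALYZE pc13t3;                        -- re-sample with the larger target
SHOW default_statistics_target;        -- confirm the session value
```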

Here it is finding 39 times more rows than expected:

Index Scan using ad_pc13t3_modpadrao on pc13t3 t1 (cost=0.00..6.21
rows=1 width=65) (actual time=0.090..0.252 rows=39 loops=1)

It would be interesting to know why that is.

Cheers,

Jeff




Re: [PERFORM] Nested loop and simple join query - slow after upgrade to 9.2

2013-01-25 Thread alexandre - aldeia digital

On 25-01-2013 16:29, Jeff Janes wrote:

On Fri, Jan 25, 2013 at 7:34 AM, alexandre - aldeia digital
adald...@gmail.com wrote:

[...]


What happens if you bump up default_statistics_target by a factor of
10 or 100 and redo the analyze?


Before sending the e-mail, the default_statistics_target was 500 and I
returned it to 100 (the default). I will try setting it to 1000.



Here it is finding 39 times more rows than expected:

Index Scan using ad_pc13t3_modpadrao on pc13t3 t1 (cost=0.00..6.21
rows=1 width=65) (actual time=0.090..0.252 rows=39 loops=1)

It would be interesting to know why that is.


This is a partial index:

ad_pc13t3_modpadrao btree (pc13emp08p, pc13anoped, pc13codped, 
pc13codigo, pc13cor, pc13empins, pc13emp08, pc13tipin2, pc13insest) 
WHERE pc13emp08p = 0 AND pc13anoped = 0 AND pc13codped = 0 AND pc13item = 0


I can't connect to the database now. I will retry the tests on Sunday.

Best regards,

Alexandre




Re: [PERFORM] Nested loop Query performance on PK

2009-07-26 Thread nha
Hello,

On 26/07/09 7:09, Greg Caulton wrote:
 On Sun, Jul 26, 2009 at 1:02 AM, Greg Caulton caulton...@gmail.com wrote:
 
 Hello,
 
 It seems to me that the following query should be a lot faster. 
 This runs in 17 seconds (regardless how many times I run it)
 
 select ac.* from application_controls_view ac, refs r where
 ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%';
 [...]
 Does this seem right to you?  Anything I can tune ? 
 [...]
 
 Oh it seems to be the join that is throwing it off, because this runs in
 600 ms
 
 select ac.* from application_controls_view ac
 where ac.application_control_id in (
 5745,
 5760, 
 [...]
 50021066,
 50020808
 )
 
 never mind, makes sense now  - its fixed
 [...]

The following rewritten query should handle the generic case of an
arbitrary LIKE pattern on refs.ref_key while still performing in an
acceptable time:

SELECT ac.*
FROM application_controls_view AS ac
INNER JOIN (
SELECT ref_id
FROM refs
WHERE ref_key LIKE '%XYZ%'
) AS r
ON ac.custom_controller_ref_id = r.ref_id;

The idea is to build a subquery over the refs table and move into it the
WHERE condition that refers only to refs columns (ref_key here). The
subquery produces fewer rows than the original table, reducing the number
of join probes against ac (whether ac is the view or the underlying table).

Regards.

--
nha / Lyon / France.



[PERFORM] Nested loop Query performance on PK

2009-07-25 Thread Greg Caulton
Hello,

It seems to me that the following query should be a lot faster.  This runs
in 17 seconds (regardless of how many times I run it):

select ac.* from application_controls_view ac, refs r where
ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%';

if I do not use the view the query runs in under 100 ms

select ac.* from application_controls ac, refs r where
ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%';


The view is

 SELECT t.action_form_type_ref_id, r1.display AS action_form_type_display,
t.action_order_type_ref_id, r2.display AS action_order_type_display,
t.action_view_ref_id, r3.display AS action_view_display, t.active_ind,
t.application_control_id, t.application_control_name, t.application_view_id,
t.background_color, t.background_processing_ind, t.base_model_type_ref_id,
r4.display AS base_model_type_display, t.base_value_script,
t.class_instantiate_script, t.class_name, t.combo_add_to_list_ind,
t.combo_category_ref_id, r5.display AS combo_category_display,
t.combo_dynamic_search_ind, t.combo_filter_ref_id, r6.display AS
combo_filter_display, t.combo_group_ref_id, r7.display AS
combo_group_display, t.combo_short_display_ind, t.combo_values_term_id,
t.comparison_operator_ref_id, r8.display AS comparison_operator_display,
t.context_ref_id, r9.display AS context_display, t.control_description,
t.control_format, t.control_format_ref_id, r10.display AS
 snip for brevity
t.parameter_ref_id = r30.ref_id AND t.parameter_source_ref_id = r31.ref_id
AND t.record_item_ref_id = r32.ref_id AND t.repeating_section_view_ref_id =
r33.ref_id AND t.report_print_ref_id = r34.ref_id AND
t.right_arrow_action_ref_id = r35.ref_id AND t.right_click_action_ref_id =
r36.ref_id AND t.section_view_ref_id = r37.ref_id AND t.select_action_ref_id
= r38.ref_id AND t.source_ref_id = r39.ref_id AND t.state_field_type_ref_id
= r40.ref_id AND t.table_access_ref_id = r41.ref_id AND t.update_user_ref_id
= r42.ref_id AND t.value_data_type_ref_id = r43.ref_id;

so basically it joins 43 times to the refs table on the primary key

the explain confirms the nested loops


   {NESTLOOP 
   :startup_cost 2660771.70 
   :total_cost 3317979.85 
   :plan_rows 27 
   :plan_width 4708 
   :targetlist (
  {TARGETENTRY 
  :expr 
 {VAR 
 :varno 65001 
 :varattno 29 
 :vartype 20 
 :vartypmod -1 
 :varlevelsup 0 
 :varnoold 5 
snip for brevity
        ->  Index Scan using refs_pk on refs r17  (cost=0.00..5.45 rows=1 width=50)
              Index Cond: (r17.ref_id = t.detail_record_item_ref_id)
      ->  Index Scan using refs_pk on refs r1  (cost=0.00..5.45 rows=1 width=50)
            Index Cond: (r1.ref_id = t.action_form_type_ref_id)
  ->  Index Scan using refs_pk on refs r  (cost=0.00..5.45 rows=1 width=8)
        Index Cond: (r.ref_id = t.custom_controller_ref_id)
        Filter: ((r.ref_key)::text ~~ '%ERNEST%'::text)


I did a vacuum analyze and so the primary key (indexes of course) is being
used.  But the above query is still 17s.  If I don't return so many columns
it comes down to around 10 seconds.

select ac.application_control_id from application_controls_view ac, refs r
where ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%ERNEST%';

But in either case this is only 37 rows.  So 1554 lookups on a unique index
on a table of 34000 rows means 6ms per internal join - note that many of
those values are the same.

Does this seem right to you?  Anything I can tune ?



-- 
Gregory Caulton
Principal at PatientOS Inc.
personal email: caulton...@gmail.com
http://www.patientos.com
corporate: (888)-NBR-1EMR || fax  857.241.3022


Re: [PERFORM] Nested loop Query performance on PK

2009-07-25 Thread Greg Caulton
On Sun, Jul 26, 2009 at 1:02 AM, Greg Caulton caulton...@gmail.com wrote:

[...]

Oh it seems to be the join that is throwing it off, because this runs in 600
ms

select ac.* from application_controls_view ac
where ac.application_control_id in (
5745,
5760,
5759,
5758,
5757,
5756,
5753,
5751,
5750,
5749,
5748,
5746,
5744,
50001328,
5752,
5754,
5755,
50002757,
50002756,
50002755,
50002754,
50001168,
50020825,
50021077,
50020821,
50020822,
50020824,
50020823,
50020820,
50020819,
50020809,
50020810,
50020806,
50020807,
50020817,
50021066,
50020808
)



never mind, makes sense now  - its fixed


-- 
Gregory Caulton
Principal at PatientOS Inc.
personal email: caulton...@gmail.com
http://www.patientos.com
corporate: (888)-NBR-1EMR || fax  857.241.3022


Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-26 Thread Tom Lane
Dave North dno...@signiant.com writes:
 The outstanding question here is why does the explain analyze take
 (quite a bit) longer than just executing the query?

EXPLAIN ANALYZE has nontrivial measurement overhead, especially on
platforms with slow gettimeofday().  Old/cheap PC hardware, in particular,
tends to suck in this respect.
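A quick way to see that overhead is to compare plain wall-clock timing with
EXPLAIN ANALYZE on the same statement (web_user_property is one of the
tables from this thread's query; psql's \timing is one way to measure):

```sql
\timing on
SELECT count(*) FROM web_user_property;                  -- plain execution time
EXPLAIN ANALYZE SELECT count(*) FROM web_user_property;  -- adds per-node timing calls
```

The gap between the two grows with the number of plan-node executions, since
each one is timed individually.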

regards, tom lane



Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-25 Thread Josh Berkus

Dave,


Is there further optimizations we can do to change the plan?
Is this perhaps addressed in a later release?


Given the left joins, a later release might help; I know we did a lot to 
improve left join plans in 8.3.  It would be worth testing if you can 
test an upgrade easily.


--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com



Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-25 Thread Greg Stark
On Wed, Jun 24, 2009 at 1:43 PM, Dave Northdno...@signiant.com wrote:

 Essentially, we're seeing a query plan that is taking 95 secs with a nested
 loop execution plan and 1 sec with a merge join plan.  We've tried
 increasing the default_statistics_target to 1000 and re-analyzed but the
 same query plan is returned.  If we then force nested loops off (set
 enable_nestloop=false), the optimizer chooses the better plan and execution
 is under 1 second.

 Default explain plan: http://explain.depesz.com/s/a3  (execution time
 95secs)

 Nested loops off plan: http://explain.depesz.com/s/JV (execution time ~
 1sec)

The planner is coming up with a bad estimate for the number of rows
matching this filter:

Filter: ((prop_key)::text ~~ 'location_node_directory_outbox'::text)

Which is coming from this condition:

 AND
    web_user_property_directory_outbox.prop_key like
 'location_node_directory_outbox'

Why use like for a constant string with no % or _ characters? If you
used = the planner might be able to come up with a better estimate.

That said I suspect Dave's right that your best course of action would
be to update to 8.3 or wait a couple weeks and update to 8.4 when it
comes out.

Regardless you *really* want to update your 8.1.8 install to the
latest bug-fix release (currently 8.1.17). That's not an upgrade and
won't need a dump/reload.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-25 Thread Mark Mielke

On 06/25/2009 04:36 PM, Greg Stark wrote:

> AND
>     web_user_property_directory_outbox.prop_key like
>     'location_node_directory_outbox'
>
> Why use like for a constant string with no % or _ characters? If you
> used = the planner might be able to come up with a better estimate.

Any reason why like with a constant string without % or _ is not
optimized to = today?

Cheers,
mark

--
Mark Mielke m...@mielke.cc



Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-25 Thread Tom Lane
Greg Stark gsst...@mit.edu writes:
 On Wed, Jun 24, 2009 at 1:43 PM, Dave Northdno...@signiant.com wrote:
 Why use like for a constant string with no % or _ characters? If you
 used = the planner might be able to come up with a better estimate.

Uh, it appears to me the string *does* contain _ characters; perhaps the
OP has neglected to escape those?

The planner does know enough to estimate LIKE with a fixed pattern as
being equivalent to =.  I think it knew that even back in 8.1, but am
too lazy to look right now.
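Putting Greg's and Tom's points together: `_` is a single-character wildcard
in LIKE, so the pattern as written is not actually a fixed string. A sketch
of the two fixes (column name taken from the original query):

```sql
-- As written, each '_' matches any single character, so the planner
-- cannot treat the pattern as plain equality:
WHERE prop_key LIKE 'location_node_directory_outbox'

-- Escape the underscores to make the pattern literal (with
-- standard_conforming_strings off, as on old releases, the backslashes
-- must be doubled inside the string literal):
WHERE prop_key LIKE 'location\_node\_directory\_outbox'

-- Or, simplest, use equality, which also estimates best:
WHERE prop_key = 'location_node_directory_outbox'
```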

regards, tom lane



Re: [PERFORM] Nested Loop Killer on 8.1

2009-06-25 Thread Greg Stark
On Thu, Jun 25, 2009 at 10:05 PM, Tom Lanet...@sss.pgh.pa.us wrote:

 Uh, it appears to me the string *does* contain _ characters; perhaps the
 OP has neglected to escape those?

Sigh. Indeed.

-- 
greg
http://mit.edu/~gsstark/resume.pdf



[PERFORM] Nested Loop Killer on 8.1

2009-06-24 Thread Dave North
Morning all,
A colleague here tried to post this yesterday but it was stalled for
some reason.  Anyway, here's what we're seeing which hopefully someone
has some pointers for.
 
Essentially, we're seeing a query plan that is taking 95 secs with a
nested loop execution plan and 1 sec with a merge join plan.  We've
tried increasing the default_statistics_target to 1000 and re-analyzed
but the same query plan is returned.  If we then force nested loops off
(set enable_nestloop=false), the optimizer chooses the better plan and
execution is under 1 second.
 
Default explain plan: http://explain.depesz.com/s/a3
http://explain.depesz.com/s/a3   (execution time 95secs)
 
Nested loops off plan: http://explain.depesz.com/s/JV
http://explain.depesz.com/s/JV  (execution time ~ 1sec)
 
We're currently running 8.1.8 (yeah, we know it's old skool but it's
embedded as part of an application) so the real questions are:
 
Is there further optimizations we can do to change the plan?
Is this perhaps addressed in a later release?
 
Some postgresql.conf settings that might be useful:
 
effective_cache_size  511082
shared_buffers        3
work_mem              4096
random_page_cost      4
join_collapse_limit   8
 
 
and of course, the query in question that generates the plan:
 
SELECT web_user_type,
  web_user.web_user_id as id,
  cast(web_user_property_node.prop_val as numeric) as node_id ,
  node_name,
  last_name || ', ' || first_name  as name,
  web_user_property_directory_inbox.prop_val as
directory_location_inbox,
  web_user_property_directory_outbox.prop_val as
directory_location_outbox,
  username,
  first_name,
  last_name,
  email
FROM
 web_user LEFT JOIN web_user_property as
web_user_property_directory_outbox ON web_user.web_user_id =
web_user_property_directory_outbox.web_user_id AND 
   web_user_property_directory_outbox.prop_key like
'location_node_directory_outbox',  web_user_property,  web_user_property
as web_user_property_directory_inbox,
 web_user_property as web_user_property_node,  node WHERE
web_user.web_user_id = web_user_property_directory_inbox.web_user_id AND
web_user.web_user_id = web_user_property.web_user_id  AND
web_user_property.prop_key = 'location_node_enabled' AND
web_user_property.prop_val = 'true' AND
web_user_property_directory_inbox.prop_key like
'location_node_directory_inbox' AND  web_user.web_user_id =
web_user_property_node.web_user_id AND  web_user_property_node.prop_key
like 'location_node_id' AND  web_user_property_node.prop_val =
node.node_id AND  (first_name ilike '%' OR last_name ilike '%' OR 
   last_name || ',' || first_name ilike '%') AND  node.node_id  IN (
SELECT node_id FROM node_execute 
  WHERE acl_web_user_id = 249) AND
web_user.web_user_id IN ( SELECT web_user_id FROM web_user_read 
 WHERE acl_web_user_id  = 249  OR 
   web_user_id IN ( SELECT member_id FROM web_user_grp_member 
 WHERE web_user_id IN( SELECT acl_web_user_id 
  FROM web_user_read 
 WHERE web_user_id IN
(SELECT web_user_id FROM web_user_grp_member  
WHERE member_id =
249 )))) ORDER BY name;
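
A cheap first experiment (a sketch, not from the original report; it assumes
psql access to the same 8.1 database) is to check whether the nested-loop
choice itself is the problem, by disabling it for a single transaction and
re-running the plan:

```sql
-- Session-local planner experiment; SET LOCAL confines the change to this
-- transaction, so the running application is unaffected.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE SELECT ... ;  -- the query above
ROLLBACK;
```

If the plan and runtime improve dramatically, the row estimates feeding the
nested loop are the thing to chase.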

 
Thanks in advance
 
Dave
 
Dave North
dno...@signiant.com


Re: [PERFORM] Nested Loop join being improperly chosen

2008-08-28 Thread David Rowley
I had a similar problem here:
http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php

Is the nested loop performing a LEFT join with yours? It's a little
difficult to tell just from the query plan you showed.

A work around for mine was to use a full outer join and eliminate the extra
rows in the where clause. A bit of a hack but it changed a 2 min query into
one that ran in under a second.

Of course this is not helping with your problem but at least may trigger
some more feedback.

David.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brad Ediger
Sent: 22 August 2008 16:26
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Nested Loop join being improperly chosen

Hello,
I'm having trouble with a Nested Loop being selected for a rather  
complex query; it turns out this is a pretty bad plan as the nested  
loop's row estimates are quite off (1 estimated / 1207881 actual). If  
I disable enable_nestloop, the query executes much faster (42 seconds  
instead of 605). The tables in the query have all been ANALYZEd just  
before generating these plans.

Here are the plans with and without enable_nestloop:

http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples.  
The items table has around 10,000 tuples, and the other tables in the  
query are tiny.

Any ideas or suggestions would be greatly appreciated. Thanks!
--
Brad Ediger



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Nested Loop join being improperly chosen

2008-08-22 Thread Brad Ediger

Hello,
I'm having trouble with a Nested Loop being selected for a rather  
complex query; it turns out this is a pretty bad plan as the nested  
loop's row estimates are quite off (1 estimated / 1207881 actual). If  
I disable enable_nestloop, the query executes much faster (42 seconds  
instead of 605). The tables in the query have all been ANALYZEd just  
before generating these plans.


Here are the plans with and without enable_nestloop:

http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples.  
The items table has around 10,000 tuples, and the other tables in the  
query are tiny.


Any ideas or suggestions would be greatly appreciated. Thanks!
--
Brad Ediger





[PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time

2008-03-06 Thread Vlad Arkhipov

I've come across this issue while writing a report-like query for 2 not
very large tables. I've tried several methods to resolve it (see
below). But now I'm really stuck...
PostgreSQL 8.3, default configuration

There are 2 tables (structure was simplified to show only problematic
place):
create table c
(
 id bigint primary key,
 cdate date
);

create index c_cdate_idx on c (cdate);

create table i
(
 id bigint primary key,
 idc bigint references c(id)
);

select count(*) from c

count

636 565

select count(*) from i

count

4 646 145

analyze i;
analyze c;

explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

                                  QUERY PLAN
------------------------------------------------------------------------------
Merge Join  (cost=738.95..57864.63 rows=14479 width=8) (actual time=13954.681..14358.731 rows=14583 loops=1)
  Merge Cond: (i.idc = c.id)
  ->  Index Scan using fki_i_c_fk on i  (cost=0.00..194324.34 rows=4646145 width=8) (actual time=17.254..12061.414 rows=1042599 loops=1)
  ->  Sort  (cost=738.94..756.88 rows=7178 width=8) (actual time=53.942..75.013 rows=14583 loops=1)
        Sort Key: c.id
        Sort Method:  quicksort  Memory: 404kB
        ->  Index Scan using c_cdate_idx on c  (cost=0.00..279.21 rows=7178 width=8) (actual time=23.595..41.470 rows=7064 loops=1)
              Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
Total runtime: 14379.461 ms

set enable_mergejoin to off;
set enable_hashjoin to off;

                                  QUERY PLAN
------------------------------------------------------------------------------
Nested Loop  (cost=0.00..59833.70 rows=14479 width=8) (actual time=0.129..153.038 rows=14583 loops=1)
  ->  Index Scan using c_cdate_idx on c  (cost=0.00..279.21 rows=7178 width=8) (actual time=0.091..14.468 rows=7064 loops=1)
        Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
  ->  Index Scan using fki_i_c_fk on i  (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064)
        Index Cond: (i.idc = c.id)
Total runtime: 172.599 ms

Ok, the first problem is here:
  ->  Index Scan using fki_i_c_fk on i  (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064)

I collected statistics for these tables at level 1000 for all columns.

select attname, null_frac, avg_width, n_distinct, correlation
from pg_stats
where tablename = 'i'

 attname | null_frac          | avg_width | n_distinct | correlation
---------+--------------------+-----------+------------+-------------------
 id      | 0                  | 8         | -1         | 0.849796295166
 idc     | 0.7236369848251343 | 8         | 95583      | 0.999763011932373

Nice stats, except for n_distinct on the idc column.

select count(distinct idc)
from i

count

633 864

Of course this is not a correct solution, but...

update pg_statistic
set stadistinct = 633864
where starelid = ... and staattnum = ...

Reconnect and execute:

explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

                                  QUERY PLAN
------------------------------------------------------------------------------
Nested Loop  (cost=0.00..57342.39 rows=14479 width=8) (actual time=0.133..151.426 rows=14583 loops=1)
  ->  Index Scan using c_cdate_idx on c  (cost=0.00..279.21 rows=7178 width=8) (actual time=0.094..14.242 rows=7064 loops=1)
        Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
  ->  Index Scan using fki_i_c_fk on i  (cost=0.00..7.92 rows=2 width=8) (actual time=0.007..0.011 rows=2 loops=7064)
        Index Cond: (i.idc = c.id)
Total runtime: 170.911 ms

But the root cause of this issue is not the incorrect value of n_distinct.
Let's expand the date interval in the WHERE clause.


explain analyze
select id
from c
 join i on i.idc = c.id
where c.cdate between '2007-02-01' and '2007-02-19'

                                  QUERY PLAN
------------------------------------------------------------------------------
Merge Join  (cost=831.16..57981.98 rows=16155 width=8) (actual time=11691.156..12155.201 rows=16357 loops=1)
  Merge Cond: (i.idc = c.id)
  ->  Index Scan using fki_i_c_fk on i  (cost=0.00..194324.34 rows=4646145 width=8) (actual time=22.236..9928.489 rows=1044373 loops=1)
  ->  Sort  (cost=831.15..851.17 rows=8009 width=8) (actual time=31.660..55.277 rows=16357 loops=1)
        Sort Key: c.id
        Sort Method:  quicksort  Memory: 438kB
        ->  Index Scan using c_cdate_idx on c  (cost=0.00..311.87 rows=8009 width=8) (actual time=0.116..17.050 rows=7918 loops=1)
              Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-19'::date))
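
A note for later readers: the direct pg_statistic update above is overwritten
by the next ANALYZE. From PostgreSQL 9.0 on there is a supported per-column
override that survives re-analysis (a sketch using this thread's column):

```sql
-- Supported alternative (PostgreSQL 9.0+): pin n_distinct as a per-column
-- option; the value takes effect at the next ANALYZE instead of being
-- replaced by it.
ALTER TABLE i ALTER COLUMN idc SET (n_distinct = 633864);
ANALYZE i;
```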

Re: [PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time

2008-03-06 Thread Tom Lane
Vlad Arkhipov [EMAIL PROTECTED] writes:
 I've come across this issue while writing a report-like query for 2 not
 very large tables. I've tried several methods to resolve it (see
 below). But now I'm really stuck...

It looks like you are wishing to optimize for all-in-memory situations,
in which case the traditional advice is to reduce random_page_cost to
something close to 1.  AFAICS all the rowcount estimates you're seeing
are spot on, or as close to spot on as you could realistically hope for,
and so the problem lies with the cost parameters.  Fooling with the
statistics is not going to help if the rowcount estimates are already
good.

(Note: the apparent undercounts you're seeing on indexscans on the outer
side of a mergejoin seem to be because the mergejoin terminates early
due to limited range of the other input join key.  The planner is
expecting this, as we can see because the predicted cost of the join is
actually much less than the predicted cost of running the input
indexscan to completion.  The cost ratio is about consistent with the
rowcount ratio, which makes me think it got these right too.)
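
The cost-parameter change can be tried per session before touching
postgresql.conf (a sketch; the value 1.5 is only an illustration of
"close to 1", not a recommendation from this thread):

```sql
-- Try a lower random-I/O cost for this session only and compare plans.
SET random_page_cost = 1.5;
EXPLAIN ANALYZE
SELECT id
FROM c
  JOIN i ON i.idc = c.id
WHERE c.cdate BETWEEN '2007-02-01' AND '2007-02-16';
```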

regards, tom lane



Re: [PERFORM] Nested Loop

2007-03-27 Thread Gauri Kanekar

Hi,

here is the query

SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS campname,
rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS advid, ra.name AS
advname, rpt_chn.id AS chanid, rpt_chn.name AS channame, rpt_cre.dn AS dn,
SUM(rm.imdel) AS impression, SUM(rm.cdel) AS click, rd.sqldate AS date FROM
rm, rn CROSS JOIN rd, ra, rs, rc, rpt_chn, rpt_cre WHERE rm.date_key =
rd.key AND rm.net_key = rn.key AND rm.adv_key = ra.key AND rm.camp_key =
rc.key AND rm.s_key = rs.key AND rm.chn_key = rpt_chn.key AND rm.cre_key =
rpt_cre.key AND ra.name != 'SYSTEM' AND rd.sqldate BETWEEN '12/1/2006' AND
'12/30/2006' AND ( rn.id IN ( 607 ) ) GROUP BY rd.sqldate, rs.id, rs.name,
ra.id, ra.name, rc.id, rc.name, rc.rev_type, rc.act_type, rpt_chn.id,
rpt_chn.name, rpt_cre.dn;



On 3/26/07, Ragnar [EMAIL PROTECTED] wrote:


On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote:

you did not show your query, nor did you answer whether you had vacuumed
and analyzed.

 enable_seqscan = off

why this? this is unlikely to help



 QUERY PLAN
 ...
   ->  Nested Loop
  (cost=0.00..1104714.83 rows=6801 width=44) (actual
  time=1820.153..229779.814 rows=10945938 loops=1)

the estimates are way off here. you sure you have analyzed?

gnari







--
Regards
Gauri


Re: [PERFORM] Nested Loop

2007-03-27 Thread Ragnar
On þri, 2007-03-27 at 16:13 +0530, Gauri Kanekar wrote:
  
 SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS
 campname, rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS
 advid, ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS
 channame, rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel)
 AS click, rd.sqldate AS date FROM rm, rn CROSS JOIN rd, ra, rs, rc,
 rpt_chn, rpt_cre WHERE rm.date_key = rd.key AND rm.net_key = rn.key
 AND rm.adv_key = ra.key AND rm.camp_key = rc.key AND rm.s_key = rs.key
 AND rm.chn_key = rpt_chn.key AND rm.cre_key = rpt_cre.key AND
 ra.name != 'SYSTEM' AND rd.sqldate BETWEEN '12/1/2006' AND
 '12/30/2006' AND ( rn.id IN ( 607 ) ) GROUP BY rd.sqldate , rs.id,
 rs.name, ra.id, ra.name, rc.id, rc.name, rc.rev_type , rc.act_type,
 rpt_chn.id, rpt_chn.name, rpt_cre.dn;

you did not answer other questions, so do this:
1) VACUUM ANALYZE your database
2) set these in your postgresql.conf:
enable_seqscan = true
join_collapse_limit = 8
3) restart postgresql
4) do the EXPLAIN ANALYZE again, and send us it's output

gnari




---(end of broadcast)---
TIP 6: explain analyze is your friend


[PERFORM] Nested Loop

2007-03-26 Thread Gauri Kanekar

Hi List,

how to speed up nested loop queries, and with which parameters?
--
Regards
Gauri


Re: [PERFORM] Nested Loop

2007-03-26 Thread Michael Fuhr
On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote:
 how to speed up nested loop queries, and with which parameters?

Please post a query you're trying to tune and the EXPLAIN ANALYZE
output, as well as any changes you've already made in postgresql.conf
or configuration variables you've set in a particular session.
Without more information we can't give much advice other than to
make sure you're vacuuming and analyzing the tables often enough
to keep them from becoming bloated with dead rows and to keep the
statistics current, and to review a configuration checklist such
as this one:

http://www.powerpostgresql.com/PerfList

-- 
Michael Fuhr

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Nested Loop

2007-03-26 Thread Gauri Kanekar

Sorry,

these are the config settings:
max_connections = 100   # (change requires restart)
shared_buffers = 300MB
work_mem = 256MB
max_fsm_pages = 40
max_fsm_relations = 500
wal_buffers = 512
checkpoint_segments = 20
checkpoint_timeout = 900
enable_bitmapscan = on
enable_seqscan = off
enable_tidscan = on
random_page_cost = 2
cpu_index_tuple_cost = 0.001
effective_cache_size = 800MB
join_collapse_limit = 1 # JOINs
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error
message
lc_monetary = 'en_US.UTF-8' # locale for monetary
formatting
lc_numeric = 'en_US.UTF-8'  # locale for number
formatting
lc_time = 'en_US.UTF-8' # locale for time formatting

all other are the default values.



QUERY PLAN
---------------------------------------------------------------------------------
HashAggregate  (cost=1116330.73..1116432.34 rows=6774 width=128) (actual time=438565.297..440455.386 rows=646881 loops=1)
  ->  Hash Join  (cost=10802.93..1116093.64 rows=6774 width=128) (actual time=1904.797..377717.036 rows=10438694 loops=1)
        Hash Cond: (rm.ck = rc.k)
        ->  Hash Join  (cost=10651.73..1115840.83 rows=6774 width=105) (actual time=1890.765..347169.113 rows=10438694 loops=1)
              Hash Cond: (rm.chk = rc.ky)
              ->  Hash Join  (cost=9835.35..1114905.90 rows=6774 width=83) (actual time=1873.463..317623.437 rows=10438694 loops=1)
                    Hash Cond: (rm.ckey = rc.k)
                    ->  Hash Join  (cost=615.77..1105533.91 rows=6774 width=85) (actual time=1842.309..288198.666 rows=10438694 loops=1)
                          Hash Cond: (rm.sk = rs.k)
                          ->  Hash Join  (cost=77.32..1104885.39 rows=6774 width=58) (actual time=1831.908..259147.154 rows=10438694 loops=1)
                                Hash Cond: (rm.advk = ra.k)
                                ->  Nested Loop  (cost=0.00..1104714.83 rows=6801 width=44) (actual time=1820.153..229779.814 rows=10945938 loops=1)
                                      Join Filter: (rm.nk = rn.k)
                                      ->  Index Scan using r_idx on rn  (cost=0.00..4.27 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
                                            Index Cond: (id = 607)
                                      ->  Nested Loop  (cost=0.00..1104370.50 rows=27205 width=48) (actual time=7.920..202878.054 rows=10945998 loops=1)
                                            ->  Index Scan using rpts_ldt_idx on rd  (cost=0.00..4.27 rows=1 width=12) (actual time=0.097..0.352 rows=30 loops=1)
                                                  Index Cond: ((sdt >= '2006-12-01 00:00:00'::timestamp without time zone) AND (sd <= '2006-12-30 00:00:00'::timestamp without time zone))
                                            ->  Index Scan using rmidx on rm  (cost=0.00..1100192.24 rows=333919 width=44) (actual time=3.109..5835.861 rows=364867 loops=30)
                                                  Index Cond: (rmdkey = rd.k)
                                ->  Hash  (cost=68.15..68.15 rows=734 width=22) (actual time=11.692..11.692 rows=734 loops=1)
                                      ->  Index Scan using radvki on radvt  (cost=0.00..68.15 rows=734 width=22) (actual time=9.112..10.517 rows=734 loops=1)
                                            Filter: ((name)::text <> 'SYSTEM'::text)
                          ->  Hash  (cost=500.35..500.35 rows=3048 width=35) (actual time=10.377..10.377 rows=3048 loops=1)
                                ->  Index Scan using rskidx on rs  (cost=0.00..500.35 rows=3048 width=35) (actual time=0.082..5.589 rows=3048 loops=1)
                    ->  Hash  (cost=9118.63..9118.63 rows=8076 width=6) (actual time=31.124..31.124 rows=8076 loops=1)
                          ->  Index Scan using rcridx on rcr  (cost=0.00..9118.63 rows=8076 width=6) (actual time=2.036..19.218 rows=8076 loops=1)
              ->  Hash  (cost=769.94..769.94 rows=3715 width=30) (actual time=17.275..17.275 rows=3715 loops=1)
                    ->  Index Scan using ridx on rcl  (cost=0.00..769.94 rows=3715 width=30) (actual time=4.238..11.432 rows=3715 loops=1)
        ->  Hash  (cost=120.38..120.38 rows=2466 width=31) (actual time=14.010..14.010 rows=2466 loops=1)
              ->  Index Scan using rckdx on rcpn  (cost=0.00..120.38 rows=2466 width=31) (actual time=4.564..9.926 rows=2466 loops=1)
Total runtime: 441153.878 ms
(32 rows)


We are using version 8.2.


On 3/26/07, Michael Fuhr [EMAIL PROTECTED] wrote:


On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote:
 how to speed up nested loop queries, and with which parameters?

Please post a query you're trying to tune and the EXPLAIN ANALYZE
output, as well as any changes you've already made in 

Re: [PERFORM] Nested Loop

2007-03-26 Thread Dave Dutcher
-Original Message-
From: [EMAIL PROTECTED] On Behalf Of Gauri Kanekar
Subject: Re: [PERFORM] Nested Loop

join_collapse_limit = 1 # JOINs 

Is there a reason you have this set to 1?  Postgres can't consider multiple
join orders when you do that.  I would try setting that back to the default
and seeing if this query is any faster.

Other than that it looked like the problems with the query might be bad
estimates of rows.  One is that postgres expects there to be 1 matching row
from rd when there are actually 30.  You might try increasing the statistics
targets on rd.sd and rd.sdt, reanalyzing, and seeing if that helps.  Also
postgres expects the join of rd and rm to return about 27205 rows when it
actually returns 10 million.  I'm not sure what you can do about that.
Maybe if Postgres gets a better estimate for rd it would then estimate the
join better.
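
The two suggestions above, written out as statements (a sketch; the target
1000 is illustrative, and the column names sd and sdt are taken from the plan
earlier in this thread):

```sql
-- Allow the planner to consider join reorderings again (8 is the default).
SET join_collapse_limit = 8;

-- Collect finer-grained statistics on the date columns, then re-analyze.
ALTER TABLE rd ALTER COLUMN sd  SET STATISTICS 1000;
ALTER TABLE rd ALTER COLUMN sdt SET STATISTICS 1000;
ANALYZE rd;
```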

Dave




---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Nested Loop

2007-03-26 Thread Ragnar
On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote:

you did not show your query, nor did you answer whether you had vacuumed
and analyzed.

 enable_seqscan = off

why this? this is unlikely to help


 
 QUERY PLAN
 ...
   ->  Nested Loop
  (cost=0.00..1104714.83 rows=6801 width=44) (actual
  time=1820.153..229779.814 rows=10945938 loops=1)

the estimates are way off here. you sure you have analyzed?

gnari

 




Re: [PERFORM] Nested loop join and date range query

2006-05-03 Thread Ian Burrell

On 5/2/06, Tom Lane [EMAIL PROTECTED] wrote:

Ian Burrell [EMAIL PROTECTED] writes:
 We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
 having performance problems and running for very long times.  The
 commonality seems to be PostgreSQL 8.1 is choosing to use a nested
 loop join because it estimates there will be only be a single row.

We've already noted that there's a problem with estimating zero-width
ranges (too lazy to search the archives, but this has come up at least
twice recently).  Can you modify your app to generate something like

week >= x and week < x+1

instead of

week >= x and week <= x



I am working on modifying the SQL generation code to replace the
zero-width range with an equals.

Does BETWEEN have the same bug?


?  My recollection is that the fix will probably be complicated
enough to not get back-patched into 8.1.

BTW, AFAIK the same problem exists in 7.4.  What kind of estimates/plans
were you getting for this case in 7.4?



We get similar rows=1 estimates on 7.4.  7.4 doesn't choose to use the
nested loop joins so it performs fine.

We have been getting similar rows=1 estimates and nested loop joins
with some other queries. But I think those are caused by not
frequently analyzing log type tables and then searching for recent
days which it doesn't think exist.

- Ian

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Nested loop join and date range query

2006-05-02 Thread Ian Burrell

We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
having performance problems and running for very long times.  The
commonality seems to be PostgreSQL 8.1 is choosing to use a nested
loop join because it estimates there will be only be a single row.
There are really thousands of rows and the nested loop version takes
much longer.

Even with the bad plan, the test query runs quickly.  The real query
is much more complicated and we have had to kill it after running for
24 hours.

SELECT
   MAX(titles.name) AS title_name,
   MAX(providers.short_name) AS provider_short_name,
   SUM(x.xtns) AS xtns,
   SUM(x.rev) AS rev
FROM xtns_by_mso_title_wk x
INNER JOIN providers providers ON x.provider_no = providers.provider_no
INNER JOIN titles titles ON x.title_no = titles.title_no
WHERE x.mso_no = 50
   AND x.week BETWEEN '20060423 00:00:00' AND  '20060423 00:00:00'
GROUP BY x.title_no, x.provider_no

The EXPLAIN ANALYZE looks like:

GroupAggregate  (cost=11.63..11.67 rows=1 width=61) (actual time=1440.550..1467.602 rows=3459 loops=1)
  ->  Sort  (cost=11.63..11.64 rows=1 width=61) (actual time=1440.515..1446.634 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Nested Loop  (cost=0.00..11.62 rows=1 width=61) (actual time=7.900..1422.686 rows=3934 loops=1)
              ->  Nested Loop  (cost=0.00..7.38 rows=1 width=49) (actual time=7.877..1373.392 rows=3934 loops=1)
                    ->  Index Scan using unq_xtns_by_mso_title_wk on xtns_by_mso_title_wk x  (cost=0.00..4.12 rows=1 width=26) (actual time=7.827..1297.681 rows=3934 loops=1)
                          Index Cond: ((week >= '2006-04-23 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.010..0.012 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Index Scan using pk_providers on providers  (cost=0.00..4.23 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=3934)
                    Index Cond: ("outer".provider_no = providers.provider_no)

If it is searching over multiple weeks (week BETWEEN '20060417
00:00:00' AND  '20060423 00:00:00'), it estimates better and uses a
hash join.

GroupAggregate  (cost=7848.20..7878.48 rows=156 width=61) (actual time=117.761..145.910 rows=3459 loops=1)
  ->  Sort  (cost=7848.20..7852.08 rows=1552 width=61) (actual time=117.735..123.823 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Hash Join  (cost=5.95..7765.94 rows=1552 width=61) (actual time=6.539..102.825 rows=3934 loops=1)
              Hash Cond: ("outer".provider_no = "inner".provider_no)
              ->  Nested Loop  (cost=0.00..7736.71 rows=1552 width=49) (actual time=5.117..86.980 rows=3934 loops=1)
                    ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x  (cost=0.00..2677.04 rows=1552 width=26) (actual time=5.085..18.065 rows=3934 loops=1)
                          Index Cond: ((week >= '2006-04-17 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.010 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Hash  (cost=5.16..5.16 rows=316 width=16) (actual time=1.356..1.356 rows=325 loops=1)
                    ->  Seq Scan on providers  (cost=0.00..5.16 rows=316 width=16) (actual time=0.008..0.691 rows=325 loops=1)

If the week range is replaced by an equals (week = '20060423
00:00:00'), it also uses a hash join.  Unfortunately, the queries are
automatically generated and changing them to use an equals could be
problematic.

GroupAggregate  (cost=7828.75..7859.32 rows=157 width=61) (actual time=98.330..125.370 rows=3459 loops=1)
  ->  Sort  (cost=7828.75..7832.67 rows=1567 width=61) (actual time=98.303..104.055 rows=3934 loops=1)
        Sort Key: x.title_no, x.provider_no
        ->  Hash Join  (cost=5.95..7745.60 rows=1567 width=61) (actual time=1.785..83.830 rows=3934 loops=1)
              Hash Cond: ("outer".provider_no = "inner".provider_no)
              ->  Nested Loop  (cost=0.00..7716.14 rows=1567 width=49) (actual time=0.170..68.338 rows=3934 loops=1)
                    ->  Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x  (cost=0.00..2607.56 rows=1567 width=26) (actual time=0.138..11.993 rows=3934 loops=1)
                          Index Cond: ((week = '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                    ->  Index Scan using pk_titles on titles  (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.008 rows=1 loops=3934)
                          Index Cond: ("outer".title_no = titles.title_no)
              ->  Hash
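
If the SQL generator can be taught to emit a half-open range instead of the
zero-width BETWEEN, the estimate problem goes away while the result set stays
the same (a sketch using this thread's predicate):

```sql
-- Half-open range: for a column that stores only week-start timestamps this
-- selects the same rows as week = '2006-04-23 00:00:00', but the planner
-- estimates it like an ordinary range rather than a zero-width one.
WHERE x.week >= '2006-04-23 00:00:00'
  AND x.week <  '2006-04-24 00:00:00'
```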

Re: [PERFORM] Nested loop join and date range query

2006-05-02 Thread Tom Lane
Ian Burrell [EMAIL PROTECTED] writes:
 We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are
 having performance problems and running for very long times.  The
 commonality seems to be PostgreSQL 8.1 is choosing to use a nested
 loop join because it estimates there will be only be a single row.

   ->  Index Scan using unq_xtns_by_mso_title_wk on xtns_by_mso_title_wk x  (cost=0.00..4.12 rows=1 width=26) (actual time=7.827..1297.681 rows=3934 loops=1)
         Index Cond: ((week >= '2006-04-23 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))

We've already noted that there's a problem with estimating zero-width
ranges (too lazy to search the archives, but this has come up at least
twice recently).  Can you modify your app to generate something like

week >= x and week < x+1

instead of

week >= x and week <= x

?  My recollection is that the fix will probably be complicated
enough to not get back-patched into 8.1.

BTW, AFAIK the same problem exists in 7.4.  What kind of estimates/plans
were you getting for this case in 7.4?

regards, tom lane



Re: [PERFORM] Nested Loop trouble : Execution time increases more

2005-09-22 Thread Simon Riggs
On Sat, 2005-09-17 at 17:47 +0200, Antoine Bajolet wrote:

 There are more tables around, but the heart of  the search engine is 
 made of three tables :
 
 fiches (f_id int4, f_title varchar)  52445 rows
 engine (f_id int4, k_id int4, weight )11761700 rows
 keywords(k_id, keyword)1072600 rows
 
 A fiche is linked to any kind of document.
 The engine table counts how many times a keyword appears in a document.
 
 A query to search on one or two keywords is quick to execute (the 
 front-end creates those queries):
 

 Is there a specific reason the planner chooses this way ?

Yes, you have an additional join for each new keyword, so there is more
work to do.

Recode your SQL with an IN subselect that retrieves all possible
keywords before it accesses the larger table.

That way you should have only one join for each new keyword.

 Can we do something in the postgresql configuration to avoid this?
 Can we force the planner to use a hash join as it does for the first
 joins?

Not required, IMHO.

Best Regards, Simon Riggs



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Nested Loop trouble : Execution time increases more 1000 time (long)

2005-09-22 Thread Tom Lane
Antoine Bajolet [EMAIL PROTECTED] writes:
 We are using postgresql in a search engine on an intranet handling
 thousands of documents.
 But we have a big problem when users use more than two search keys.

I think you need to increase the statistics targets for your keywords
table --- the estimates of numbers of matching rows are much too small:

   ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
         Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
         Filter: ((keyword)::text ~~ 'exploitation%'::text)

A factor-of-75 error is quite likely to mislead the planner into
choosing a bad join plan.

BTW, have you looked into using a real full-text-search engine (eg,
tsearch2) instead of rolling your own like this?

regards, tom lane



Re: [PERFORM] Nested Loop trouble : Execution time increases more

2005-09-22 Thread Antoine Bajolet

Hello,

Tom Lane wrote:


Antoine Bajolet [EMAIL PROTECTED] writes:
 

We are using postgresql in a search engine on an intranet handling
thousands of documents.

But we have a big problem when users use more than two search keys.
   



I think you need to increase the statistics targets for your keywords
table --- the estimates of numbers of matching rows are much too small:
 

What value do you think I could put into an ALTER TABLE ... SET STATISTICS
statement?


Also, the solution given by Simon Riggs works well.
> Recode your SQL with an IN subselect that retrieves all possible
> keywords before it accesses the larger table.

But I will also try the old queries with the increased statistics parameter
and compare performance.


 

   ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
         Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
         Filter: ((keyword)::text ~~ 'exploitation%'::text)



A factor-of-75 error is quite likely to mislead the planner into
choosing a bad join plan.

BTW, have you looked into using a real full-text-search engine (eg,
tsearch2) instead of rolling your own like this?
 


It seems quite a good contrib, but...
The first version of this search engine was developed in 2000...
neither tsearch2 nor tsearch existed at that time.
Also, there has been some development work around this search engine
(pertinence algorithm, filtering with user rights, weighting keywords
with rules specific to each type of document, etc.), and adapting all of
it to work the same way with tsearch2 seems a bit heavy.
Finally, each indexed document is quite big and the chosen method
reduces disk storage: 1 GB of text content translates to ~100 MB of
table space.


Best Regards,
Antoine Bajolet





Re: [PERFORM] Nested Loop trouble : Execution time increases more

2005-09-22 Thread Antoine Bajolet

Hello again,

After modifying the parameters like this:

ALTER TABLE keywords ALTER keyword SET STATISTICS 100;
ALTER TABLE keywords ALTER k_id SET STATISTICS 100;
ALTER TABLE engine ALTER k_id SET STATISTICS 100;
ALTER TABLE engine ALTER f_id SET STATISTICS 100;

vacuuming both tables
and rewriting the queries using sub-selects :

select count (distinct f.f_id) as results
FROM
fiches f
INNER JOIN (SELECT distinct f_id FROM keywords,engine WHERE engine.k_id 
= keywords.k_id AND keyword like 'exploitation%') as e1 USING(f_id)
INNER JOIN (SELECT distinct f_id FROM keywords,engine WHERE engine.k_id 
= keywords.k_id AND keyword like 'maintenance%') as e2 USING(f_id)
INNER JOIN (SELECT distinct f_id FROM keywords,engine WHERE engine.k_id 
= keywords.k_id AND keyword like 'numerique%') as e3 USING(f_id)


The query time is less than 600 ms, and it increases only a little when
adding more keywords.


Thanks to Tom Lane and Simon Riggs.

Best regards,
Antoine Bajolet

Antoine Bajolet wrote:


Hello,

Tom Lane wrote:


Antoine Bajolet [EMAIL PROTECTED] writes:
 

We are using postgresql in a search engine on an intranet handling
thousands of documents.

But we have a big problem when users use more than two search keys.



I think you need to increase the statistics targets for your keywords
table --- the estimates of numbers of matching rows are much too small:
 

What value do you think I could put into an ALTER TABLE ... SET STATISTICS
statement?


Also, the solution given by Simon Riggs works well.
> Recode your SQL with an IN subselect that retrieves all possible
> keywords before it accesses the larger table.

But I will also try the old queries with the increased statistics
parameter and compare performance.





---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Nested Loop trouble : Execution time increases more 1000 time (long)

2005-09-17 Thread Antoine Bajolet

Hello,

We are using postgresql in a search engine on an intranet handling
thousands of documents.

But we have a big problem when users use more than two search keys.

There are more tables around, but the heart of  the search engine is 
made of three tables :


fiches   (f_id int4, f_title varchar)       52445 rows
engine   (f_id int4, k_id int4, weight)  11761700 rows
keywords (k_id int4, keyword)             1072600 rows

A fiche is linked to any kind of document.
The engine table counts how many times a keyword appears in a document.
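For reference, a minimal schema matching the index names that appear in the plans below might look like this. The DDL is an assumed reconstruction from the thread, not the poster's actual schema; `varchar_pattern_ops` is assumed for `keyword_pattern_key` because the plans show the `~>=~`/`~<~` pattern operators, which is what that operator class provides so that `LIKE 'prefix%'` can use a btree index under a non-C locale.

```sql
-- Assumed reconstruction of the schema described above.
CREATE TABLE fiches   (f_id int4 PRIMARY KEY, f_title varchar);
CREATE TABLE keywords (k_id int4 PRIMARY KEY, keyword varchar);
CREATE TABLE engine   (f_id int4 REFERENCES fiches,
                       k_id int4 REFERENCES keywords,
                       weight int4);

-- Index names taken from the EXPLAIN output in this thread:
CREATE INDEX k_id_key ON engine (k_id);
CREATE INDEX keyword_pattern_key ON keywords (keyword varchar_pattern_ops);
```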

A query to search on one or two keywords is quick to execute (the
front-end creates those queries):


-
select count (distinct f.f_id) as results
FROM
fiches f
INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)

WHERE TRUE
AND k1.keyword like 'maintenance%'
AND k2.keyword like 'exploitation%'
;

QUERY PLAN
Aggregate  (cost=3953.00..3953.00 rows=1 width=4) (actual time=525.243..525.243 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..3952.99 rows=1 width=4) (actual time=211.570..513.758 rows=6879 loops=1)
        ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) (actual time=211.483..389.340 rows=6879 loops=1)
              Hash Cond: ("outer".f_id = "inner".f_id)
              ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.132..155.499 rows=9520 loops=1)
                    ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
                          Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
                          Filter: ((keyword)::text ~~ 'exploitation%'::text)
                    ->  Index Scan using k_id_key on engine e2  (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.049..1.842 rows=127 loops=75)
                          Index Cond: (e2.k_id = "outer".k_id)
              ->  Hash  (cost=1974.76..1974.76 rows=11 width=4) (actual time=211.203..211.203 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.296..197.590 rows=11183 loops=1)
                          ->  Index Scan using keyword_pattern_key on keywords k1  (cost=0.00..3.51 rows=1 width=4) (actual time=0.189..1.351 rows=73 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 'maintenance'::character varying) AND ((keyword)::text ~<~ 'maintenancf'::character varying))
                                Filter: ((keyword)::text ~~ 'maintenance%'::text)
                          ->  Index Scan using k_id_key on engine e1  (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.406 rows=153 loops=73)
                                Index Cond: (e1.k_id = "outer".k_id)
        ->  Index Scan using fiches_pkey on fiches f  (cost=0.00..3.36 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=6879)
              Index Cond: (f.f_id = "outer".f_id)
Total runtime: 525.511 ms
--


But when there are three keywords or more, the planner chooses to 
perform a very costly nested loop :


--
select count (distinct f.f_id) as results
FROM
fiches f
INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)
INNER JOIN engine e3 INNER JOIN keywords k3 USING (k_id) USING (f_id)

WHERE TRUE
AND k1.keyword like 'maintenance%'
AND k2.keyword like 'exploitation%'
AND k3.keyword like 'numerique%'
;

QUERY PLAN
Aggregate  (cost=5927.90..5927.90 rows=1 width=4) (actual time=673048.168..673048.169 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..5927.90 rows=1 width=4) (actual time=1853.789..673038.065 rows=2929 loops=1)
        ->  Nested Loop  (cost=1974.79..5924.52 rows=1 width=12) (actual time=1853.719..672881.725 rows=2929 loops=1)
              Join Filter: ("inner".f_id = "outer".f_id)
              ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) (actual time=198.845..441.947 rows=6879 loops=1)
                    Hash Cond: ("outer".f_id = "inner".f_id)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.129..199.895 rows=9520 loops=1)
                          ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.077..1.918 rows=75 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
                                Filter: ((keyword)::text ~~ 'exploitation%'::text)
                          ->  Index Scan using k_id_key on engine e2  (cost=0.00..1954.93 rows=1306 width=8) (actual

Re: [PERFORM] Nested loop question

2003-12-17 Thread Richard Huxton
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
 Hi-

 I'm trying to optimize a query that I *think* should run very fast.
 Essentially, I'm joining two tables that have very selective indexes and
 constraining the query on an indexed field. (There's a third small lookup
 table in the mix, but it doesn't really affect the bottom line.)

 I'm unsure what is happening next. I notice that an index scan is occurring
 on actor_summary_pk, with an actual time of 9.15, but then it looks like
 a nested loop occurs at the next level to join these tables. Does this mean
 that each probe of the actor_summary index will take 9.15 msec, but the
 nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops. In
your case this would seem to be about 33 seconds (9.15 ms × 3639 loops ≈ 33.3 s).

   ->  Index Scan using actor_summary_pk on
 actor_summary  (cost=0.00..8.11 rows=1 width=72) (actual time=9.14..9.15
 rows=1 loops=3639)
         Index Cond: ("outer".actor_id =
 actor_summary.actor_id)

 The nested loop appears to be where most of my time is going, so I'm
 focusing on this area, but don't know if there is a better approach to this
 join.

 Is there a more efficient means than a nested loop to handle such a join?
 Would a different method be chosen if there was exactly one row in
 actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor then look 
up the corresponding values in actor_summary. Given that you have 3 million 
rows in both tables that seems a reasonable approach. You could always try 
forcing different plans by switching the various ENABLE_HASHJOIN etc options 
(see the runtime configuration section of the manuals). I'm not sure that 
will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that 
it's going to disk each time. Does that sound plausible, or do you think you 
have enough RAM to cache your large indexes?

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Nested loop performance

2003-12-17 Thread Nick Fankhauser
 It seems that your basic problem is that you're fetching lots of rows
 from two big ol' tables.

 It doesn't seem to me that there would be a substantially better plan
 for this query with your tables as they stand.

That's more or less the conclusion I had come to. I was just hoping someone
else could point out an approach I've been missing. (sigh!)



 If your data were more
 normalised, then your big scans might be quicker (because their rows
 would be smaller so they would hit fewer disk pages),

This started off as a 5-table join on well-normalized data. Unfortunately,
the actor table doesn't get any smaller, and the work involved in
calculating the case_count information on the fly was clearly becoming a
problem- particularly with actors that had a heavy caseload. (Busy attorneys
and judges.) The actor_summary approach makes these previous problem cases
go away, but the payback is that (as you correctly pointed out) queries on
average citizens who only have one case suffer from the de-normalized
approach.

We're currently considering the approach of just returning all of the rows
to our application, and doing the aggregation and limit work in the app. The
inconsistency of the data makes it very tough for the query planner to come
up with a strategy that is always a winner.

Thanks for your thoughts!

-Nick





[PERFORM] Nested loop performance

2003-12-16 Thread Nick Fankhauser
Hi-

I'm trying to optimize a query that I *think* should run very fast.
Essentially, I'm joining two tables that have very selective indexes and
constraining the query on an indexed field. (There's a third small lookup
table in the mix, but it doesn't really affect the bottom line.)

actor is a table containing roughly 3 million rows with an index on
actor_full_name_uppercase and a unique index on actor_id.

actor_summary also contains roughly 3 million rows. Its PK is a unique
combined index on (actor_id, county_id, case_disp_global_code).

The vast majority of the rows in actor correspond to a single row in
actor_summary; I'd estimate this at 95% or more. The remaining actors with
multiple records generally have two corresponding rows in actor_summary.
Actor summary was created as a performance enhancer, where we can store some
pre-calculated values such as the number of court cases an actor is involved
in.

The constraint is applied first, with reasonable speed. In the example
below, it takes about 15 seconds to gather the matches in actor.

I'm unsure what is happening next. I notice that an index scan is occurring
on actor_summary_pk, with an actual time of 9.15, but then it looks like a
nested loop occurs at the next level to join these tables. Does this mean
that each probe of the actor_summary index will take 9.15 msec, but the
nested loop is going to do this once for each actor_id?

The nested loop appears to be where most of my time is going, so I'm
focusing on this area, but don't know if there is a better approach to this
join.

Is there a more efficient means than a nested loop to handle such a join?
Would a different method be chosen if there was exactly one row in
actor_summary for every row in actor?

-Nick

The query and explain analyze output:


alpha=#
alpha=#
alpha=# explain analyze
alpha-# select
alpha-# min(actor.actor_id) as actor_id,
alpha-# min(actor.actor_entity_type) as actor_entity_type,
alpha-# min(actor.role_class_code) as role_class_code,
alpha-# min(actor.actor_full_name) as actor_full_name,
alpha-# min(actor.actor_person_date_of_birth) as
actor_person_date_of_birth,
alpha-# min(actor.actor_entity_acronym) as actor_entity_acronym,
alpha-# min(actor.actor_person_last_name) as actor_person_last_name,
alpha-# min(actor.actor_person_first_name) as actor_person_first_name,
alpha-# min(actor.actor_person_middle_name) as actor_person_middle_name,
alpha-# min(actor.actor_person_name_suffix) as actor_person_name_suffix,
alpha-# min(actor.actor_person_place_of_birth) as
actor_person_place_of_birth,
alpha-# min(actor.actor_person_height) as actor_person_height,
alpha-# min(actor.actor_person_height_unit) as actor_person_height_unit,
alpha-# min(actor.actor_person_weight) as actor_person_weight,
alpha-# min(actor.actor_person_weight_unit) as actor_person_weight_unit,
alpha-# min(actor.actor_person_ethnicity) as actor_person_ethnicity,
alpha-# min(actor.actor_person_citizenship_count) as
actor_person_citizenship_count,
alpha-# min(actor.actor_person_hair_color) as actor_person_hair_color,
alpha-# min(actor.actor_person_scars_marks_tatto) as
actor_person_scars_marks_tatto,
alpha-# min(actor.actor_person_marital_status) as
actor_person_marital_status,
alpha-# min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id,
alpha-# min(to_char(data_source.source_last_update, 'MM/DD/YYYY HH12:MI
AM TZ')) as last_update,
alpha-# min(actor_summary.single_case_public_id) as case_public_id,
alpha-# min(actor_summary.single_case_id) as case_id,
alpha-# sum(actor_summary.case_count) as case_count
alpha-#   from
alpha-# actor,
alpha-# actor_summary,
alpha-# data_source
alpha-#   where
alpha-# actor.actor_id = actor_summary.actor_id
alpha-# and data_source.source_id = actor.source_id
alpha-# and actor_full_name_uppercase like upper('sanders%')
alpha-#   group by
alpha-# actor.actor_id
alpha-#   order by
alpha-# min(actor.actor_full_name_uppercase),
alpha-# case_count desc,
alpha-# min(actor_summary.case_disp_global_code)
alpha-#   limit
alpha-# 1000
alpha-# ;



QUERY PLAN


---
 Limit  (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.90 rows=1000 loops=1)
   ->  Sort  (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.18 rows=1001 loops=1)
         Sort Key: min((actor.actor_full_name_uppercase)::text), sum(actor_summary.case_count), min((actor_summary.case_disp_global_code)::text)
         ->  Aggregate  (cost=2555.50..2555.57 rows=1 width=547) (actual time=48604.17..48755.28 rows=3590 loops=1)
               ->  Group  (cost=2555.50..2555.50 rows=1 width=547) (actual time=48604.04..48647.91 rows=3594 loops=1)
                     ->  Sort  

Re: [PERFORM] Nested loop performance

2003-12-16 Thread Stephan Szabo

On Tue, 16 Dec 2003, Nick Fankhauser wrote:

 Is there a more efficient means than a nested loop to handle such a join?
 Would a different method be chosen if there was exactly one row in
 actor_summary for every row in actor?

As a question, what does explain analyze give you if you
set enable_nestloop=false; before trying the query?
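The suggested experiment can be sketched as follows. The query is a simplified illustration based on the one posted earlier in the thread, not the exact original, and the setting is session-local only — it is a diagnostic, not a production tuning knob.

```sql
-- Disable nested loops for this session only, then compare the plan and
-- timing EXPLAIN ANALYZE reports against the original run.
SET enable_nestloop = false;

EXPLAIN ANALYZE
SELECT actor.actor_id,
       sum(actor_summary.case_count) AS case_count
FROM actor
JOIN actor_summary ON actor_summary.actor_id = actor.actor_id
WHERE actor.actor_full_name_uppercase LIKE upper('sanders%')
GROUP BY actor.actor_id;

-- Restore the planner default afterwards.
SET enable_nestloop = true;
```

With nested loops disabled the planner typically falls back to a merge or hash join here, which makes the cost of the repeated index probes visible by comparison.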
