[PERFORM] Nested Loop vs Hash Join based on predicate?
I have the following queries:

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select [...]
from f_calc_service a11,
     d_patient_type a12
where a11.d_patient_pop_id in (336)
  and a11.d_patient_type_id = a12.id
  and a12.short_name = 'I'
group by a11.d_rate_schedule_id,
         a11.d_payer_id,
         a11.d_patient_pop_id,
         a11.d_patient_type_id;

And

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
select [...]
from f_calc_service a11,
     d_patient_type a12
where a11.d_patient_pop_id in (336)
  and a11.d_patient_type_id = a12.id
  and a12.short_name = 'O'
group by a11.d_rate_schedule_id,
         a11.d_payer_id,
         a11.d_patient_pop_id,
         a11.d_patient_type_id;

Making this one change from short_name = 'I' to short_name = 'O' changes the query execution time from 200k ms to 280 ms. The first one chooses a nested loop, the second chooses a hash join. How do I get them both to choose the same plan? There are no rows with d_patient_pop_id in (336) and short_name = 'I'.

Thanks!

Dan
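One way to see why the planner treats the two literals so differently is to inspect the statistics it holds for the predicate column, and compare its estimates for each value directly. A sketch, using the table and column names from the query above (the pg_stats columns shown are standard):

```sql
-- Which values does the planner think are common, and how many distinct values exist?
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'd_patient_type' AND attname = 'short_name';

-- Compare the planner's row estimate for each literal:
EXPLAIN SELECT * FROM d_patient_type WHERE short_name = 'I';
EXPLAIN SELECT * FROM d_patient_type WHERE short_name = 'O';
```

If 'I' is estimated at very few rows while the join actually fans out to many (or vice versa), the nested-loop choice follows directly from that misestimate.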
Re: [PERFORM] Nested Loop vs Hash Join based on predicate?
2016-03-16 21:23 GMT+01:00 Doiron, Daniel:
> I have the following queries:
> [snip - queries quoted in full above]
> Making this one change from short_name = 'I' to short_name = 'O' changes
> the query execution from 200k ms to 280ms. The first one chooses a Nested
> Loop, the second chooses a hash join. How do I get them both to choose the
> same? There are no values for d_patient_pop_id in (336) and short_name = 'I'.

We don't see the plans, so this is a blind shot. Probably the estimate for the 'I' value is badly underestimated, so the planner chooses a nested loop. The reasons can differ - a correlation inside the data, for example. You can try:

0) ensure your statistics are current - run the statement ANALYZE
a) increase the statistics target with ALTER TABLE xx ALTER COLUMN yyy SET STATISTICS some number
b) penalize nested loops with the statement SET enable_nestloop TO off;

Regards
Pavel
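Pavel's suggestions (0), (a), and (b) look like this in practice. The table and column names below are placeholders standing in for whichever column is misestimated, and the enable_nestloop change is best kept session-local rather than set globally:

```sql
-- (0) refresh statistics first
ANALYZE f_calc_service;

-- (a) collect a larger sample for the skewed column, then re-analyze
ALTER TABLE f_calc_service ALTER COLUMN d_patient_type_id SET STATISTICS 1000;
ANALYZE f_calc_service;

-- (b) as a last resort, discourage nested loops for this session only
SET enable_nestloop TO off;
-- ... run the problem query ...
RESET enable_nestloop;
```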
[PERFORM] Nested loop issue
Hi All, I have been looking for a solution to a problem where my query is executing for a long time because it is running into a nested loop problem. I have done explain analyze and it shows the query taking a very long time due to nested loops. On the DB side, there are indices in place for all the required columns. By setting nested loop off there is a drastic increase in performance (from 40,000 ms to 600 ms) but I know this is not a right practice. My postgres version is 9.3.2 on linux. Please find the link for the query plan below : http://explain.depesz.com/s/l9o Also, find below the query that is being executed. SELECT DISTINCT Sektion/Fachbereich.parent, Studienfach.ltxt, SUM(CASE WHEN Studiengang.faktor IS NOT NULL AND Studiengang.faktor = 0 THEN Studiengang.faktor * Studierende.summe ELSE Studierende.summe END) FROM ( SELECT sos_stg_aggr.tid_stg, sos_stg_aggr.ca12_staat, sos_stg_aggr.geschlecht, sos_stg_aggr.alter, sos_stg_aggr.hzbart, sos_stg_aggr.hmkfzkz, sos_stg_aggr.hmkfz, sos_stg_aggr.semkfzkz, sos_stg_aggr.semkfz, sos_stg_aggr.hzbkfzkz, sos_stg_aggr.hzbkfz, sos_stg_aggr.hrst, sos_stg_aggr.studiengang_nr, sos_stg_aggr.fach_nr, sos_stg_aggr.fach_sem_zahl, sos_stg_aggr.sem_rueck_beur_ein, sos_stg_aggr.kz_rueck_beur_ein, sos_stg_aggr.klinsem, sos_stg_aggr.hssem, sos_stg_aggr.stuart, sos_stg_aggr.stutyp, sos_stg_aggr.stufrm, sos_stg_aggr.stichtag, sos_stg_aggr.summe, sos_stg_aggr.hzbart_int, sos_stg_aggr.matrikel_nr, sos_stg_aggr.ch27_grund_beurl, sos_stg_aggr.ch62_grund_exmatr, sos_stg_aggr.hzbnote, textcat(sos_stg_aggr.studiengang_nr::text, sos_stg_aggr.fach_nr::text) AS koepfe_faelle FROM sos_stg_aggr union all SELECT sos_stg_aggr.tid_stg, sos_stg_aggr.ca12_staat, sos_stg_aggr.geschlecht, sos_stg_aggr.alter, sos_stg_aggr.hzbart, sos_stg_aggr.hmkfzkz, sos_stg_aggr.hmkfz, sos_stg_aggr.semkfzkz, sos_stg_aggr.semkfz, sos_stg_aggr.hzbkfzkz, sos_stg_aggr.hzbkfz, sos_stg_aggr.hrst, sos_stg_aggr.studiengang_nr, sos_stg_aggr.fach_nr, 
sos_stg_aggr.fach_sem_zahl, sos_stg_aggr.sem_rueck_beur_ein, sos_stg_aggr.kz_rueck_beur_ein, sos_stg_aggr.klinsem, sos_stg_aggr.hssem, sos_stg_aggr.stuart, sos_stg_aggr.stutyp, sos_stg_aggr.stufrm, sos_stg_aggr.stichtag, sos_stg_aggr.summe, sos_stg_aggr.hzbart_int, sos_stg_aggr.matrikel_nr, sos_stg_aggr.ch27_grund_beurl, sos_stg_aggr.ch62_grund_exmatr, sos_stg_aggr.hzbnote, '21' AS koepfe_faelle FROM sos_stg_aggr where sos_stg_aggr.tid_stg in (select distinct lehr_stg_ab_tid from lehr_stg_ab2fb) ) AS Studierende INNER JOIN ( select astat::integer, trim(druck) as druck from sos_k_status ) AS Rückmeldestatus ON ( Studierende.kz_rueck_beur_ein = Rückmeldestatus.astat ) INNER JOIN ( select tid, trim(name) as name from sos_stichtag ) AS Stichtag ON ( Studierende.stichtag = Stichtag.tid ) INNER JOIN ( select abschluss, kz_fach, stg, pversion, regel, trim(text) as text, fb, lehr, anteil, tid,null as faktor from lehr_stg_ab where lehr_stg_ab.tid not in (select lehr_stg_ab_tid from lehr_stg_ab2fb) union select abschluss, kz_fach, stg, pversion, regel, trim(text) as text, lehr_stg_ab2fb.fb, lehr, anteil, tid,faktor from lehr_stg_ab inner join lehr_stg_ab2fb on lehr_stg_ab2fb.lehr_stg_ab_tid = lehr_stg_ab.tid ) AS Studiengang ON ( Studierende.tid_stg = Studiengang.tid ) INNER JOIN ( select astat, astfr, astgrp, fb, trim(ltxt) as ltxt, stg from k_stg ) AS Studienfach ON ( Studiengang.stg = Studienfach.stg ) AND ( Studienfach.ltxt IS NOT NULL ) INNER JOIN ( select instnr, ch110_institut, btrim(druck) as druck, btrim(parent) as parent from unikn_k_fb ) AS Sektion/Fachbereich ON ( Studiengang.fb = Sektion/Fachbereich.instnr ) INNER JOIN ( select apnr, trim(druck) as druck from cifx where key=613 ) AS Hörerstatus ON ( Studierende.hrst = Hörerstatus.apnr ) WHERE ( Sektion/Fachbereich.druck = 'FB Biologie' ) AND ( ( Hörerstatus.druck = 'Haupthörer/in' AND Stichtag.name = 'Amtl. 
Statistik Land' AND Rückmeldestatus.druck IN ('Beurlaubung', 'Ersteinschreibung', 'Neueinschreibung', 'Rückmeldung') AND Studierende.sem_rueck_beur_ein = 20132 ) ) GROUP BY Sektion/Fachbereich.parent, Studienfach.ltxt

According to my analysis, the WHERE clause after the UNION ALL is taking a lot of the execution time. Any help with an alternative way to write the query, or with identifying the cause of the issue, would be very much appreciated.

Thanks in advance,
Manoj

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Nested loop issue
REPLACE

    where sos_stg_aggr.tid_stg in (select distinct lehr_stg_ab_tid from lehr_stg_ab2fb)

WITH

    where exists (select 1 from lehr_stg_ab2fb
                  where lehr_stg_ab2fb.lehr_stg_ab_tid = sos_stg_aggr.tid_stg)

Similarly for the others, e.g.

    lehr_stg_ab.tid not in (select lehr_stg_ab_tid from lehr_stg_ab2fb)

with NOT EXISTS. This should improve performance, depending on the results of the inner query.

Regards
Dhananjay
OpenSCG

On Tuesday, 8 April 2014 3:06 PM, Manoj Gadi manoj.g...@uni-konstanz.de wrote:
> [snip - original message quoted in full above]
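One caveat with the NOT IN to NOT EXISTS change suggested here: the two are only equivalent when the subquery column cannot be NULL. A small self-contained illustration with hypothetical tables:

```sql
CREATE TEMP TABLE a (tid int);
CREATE TEMP TABLE b (tid int);
INSERT INTO a VALUES (1), (2);
INSERT INTO b VALUES (1), (NULL);

-- NOT IN returns no rows at all: 2 <> NULL evaluates to unknown,
-- so the predicate can never become true once a NULL appears in b
SELECT * FROM a WHERE tid NOT IN (SELECT tid FROM b);

-- NOT EXISTS returns the row with tid = 2, which is usually what was meant
SELECT * FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.tid = a.tid);
```

Beyond the semantics, NOT EXISTS lets the planner use an anti-join, which generally scales much better than the repeated-subplan execution that NOT IN can fall back to.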
Re: [PERFORM] Nested loop and simple join query - slow after upgrade to 9.2
On Fri, Jan 25, 2013 at 7:34 AM, alexandre - aldeia digital adald...@gmail.com wrote:
> Hi, Last weekend we upgraded a PG from 8.4 to 9.2 (full pg_dump/restore/vacuum/analyze). After this, some simple join queries became very slow, maybe because of the use of nested loops. Below, an example with nestloop on and off:

What happens if you bump up default_statistics_target by a factor of 10 or 100 and redo the ANALYZE?

Here it is finding 39 times more rows than expected:

Index Scan using ad_pc13t3_modpadrao on pc13t3 t1 (cost=0.00..6.21 rows=1 width=65) (actual time=0.090..0.252 rows=39 loops=1)

It would be interesting to know why that is.

Cheers,
Jeff
Re: [PERFORM] Nested loop and simple join query - slow after upgrade to 9.2
On 25-01-2013 16:29, Jeff Janes wrote:
> What happens if you bump up default_statistics_target by a factor of 10 or 100 and redo the analyze?

Before sending the e-mail, default_statistics_target was 500 and I returned it to 100 (the default). I will try setting it to 1000.

> Here it is finding 39 times more rows than expected:
> Index Scan using ad_pc13t3_modpadrao on pc13t3 t1 (cost=0.00..6.21 rows=1 width=65) (actual time=0.090..0.252 rows=39 loops=1)
> It would be interesting to know why that is.

This is a partial index:

ad_pc13t3_modpadrao btree (pc13emp08p, pc13anoped, pc13codped, pc13codigo, pc13cor, pc13empins, pc13emp08, pc13tipin2, pc13insest) WHERE pc13emp08p = 0 AND pc13anoped = 0 AND pc13codped = 0 AND pc13item = 0

I can't connect to the database now. I will retry the tests on Sunday.

Best regards,
Alexandre
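The statistics-target experiment Jeff suggests doesn't have to be a global postgresql.conf change; it can be scoped to one column or one session while testing. A sketch, borrowing the table name from the index definition above (the column chosen is illustrative):

```sql
-- raise the sample size for a single suspect column, then re-analyze that table
ALTER TABLE pc13t3 ALTER COLUMN pc13codigo SET STATISTICS 1000;
ANALYZE pc13t3;

-- or, for a quick session-local global test before editing postgresql.conf:
SET default_statistics_target = 1000;
ANALYZE pc13t3;
```

Note that SET default_statistics_target only affects ANALYZE runs issued in that same session, so the manual ANALYZE must follow the SET.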
Re: [PERFORM] Nested loop Query performance on PK
Hello,

On 26/07/09 7:09, Greg Caulton wrote:
> On Sun, Jul 26, 2009 at 1:02 AM, Greg Caulton caulton...@gmail.com wrote:
> Hello, It seems to me that the following query should be a lot faster. This runs in 17 seconds (regardless how many times I run it)
> select ac.* from application_controls_view ac, refs r where ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%';
> [...]
> Does this seem right to you? Anything I can tune?
> [...]
> Oh it seems to be the join that is throwing it off, because this runs in 600 ms
> select ac.* from application_controls_view ac where ac.application_control_id in ( 5745, 5760, [...] 50021066, 50020808 )
> never mind, makes sense now - it's fixed
> [...]

The following rewritten query may handle the generic case of an arbitrary LIKE pattern on refs.ref_key while still completing in an acceptable time:

SELECT ac.*
FROM application_controls_view AS ac
INNER JOIN (
    SELECT ref_id
    FROM refs
    WHERE ref_key LIKE '%XYZ%'
) AS r ON ac.custom_controller_ref_id = r.ref_id;

The idea is to build a subquery over the refs table and move into it the WHERE condition that refers only to refs columns (ref_key here). This subquery yields a shorter relation than the original refs table, reducing the number of join probes against ac (whether you work with the view or the base table).

Regards.
--
nha / Lyon / France.
[PERFORM] Nested loop Query performance on PK
Hello, It seems to me that the following query should be a lot faster. This runs in 17 seconds (regardless how many times I run it) select ac.* from application_controls_view ac, refs r where ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%'; if I do not use the view the query runs in under 100 ms select ac.* from application_controls ac, refs r where ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%XYZ%'; The view is SELECT t.action_form_type_ref_id, r1.display AS action_form_type_display, t.action_order_type_ref_id, r2.display AS action_order_type_display, t.action_view_ref_id, r3.display AS action_view_display, t.active_ind, t.application_control_id, t.application_control_name, t.application_view_id, t.background_color, t.background_processing_ind, t.base_model_type_ref_id, r4.display AS base_model_type_display, t.base_value_script, t.class_instantiate_script, t.class_name, t.combo_add_to_list_ind, t.combo_category_ref_id, r5.display AS combo_category_display, t.combo_dynamic_search_ind, t.combo_filter_ref_id, r6.display AS combo_filter_display, t.combo_group_ref_id, r7.display AS combo_group_display, t.combo_short_display_ind, t.combo_values_term_id, t.comparison_operator_ref_id, r8.display AS comparison_operator_display, t.context_ref_id, r9.display AS context_display, t.control_description, t.control_format, t.control_format_ref_id, r10.display AS snip for brevity t.parameter_ref_id = r30.ref_id AND t.parameter_source_ref_id = r31.ref_id AND t.record_item_ref_id = r32.ref_id AND t.repeating_section_view_ref_id = r33.ref_id AND t.report_print_ref_id = r34.ref_id AND t.right_arrow_action_ref_id = r35.ref_id AND t.right_click_action_ref_id = r36.ref_id AND t.section_view_ref_id = r37.ref_id AND t.select_action_ref_id = r38.ref_id AND t.source_ref_id = r39.ref_id AND t.state_field_type_ref_id = r40.ref_id AND t.table_access_ref_id = r41.ref_id AND t.update_user_ref_id = r42.ref_id AND t.value_data_type_ref_id = r43.ref_id; so basically 
it joins 43 times to the refs table on the primary key.

The explain confirms the nested loops:

{NESTLOOP :startup_cost 2660771.70 :total_cost 3317979.85 :plan_rows 27 :plan_width 4708 :targetlist ( {TARGETENTRY :expr {VAR :varno 65001 :varattno 29 :vartype 20 :vartypmod -1 :varlevelsup 0 :varnoold 5 snip for brevity

-> Index Scan using refs_pk on refs r17 (cost=0.00..5.45 rows=1 width=50)
   Index Cond: (r17.ref_id = t.detail_record_item_ref_id)
-> Index Scan using refs_pk on refs r1 (cost=0.00..5.45 rows=1 width=50)
   Index Cond: (r1.ref_id = t.action_form_type_ref_id)
-> Index Scan using refs_pk on refs r (cost=0.00..5.45 rows=1 width=8)
   Index Cond: (r.ref_id = t.custom_controller_ref_id)
   Filter: ((r.ref_key)::text ~~ '%ERNEST%'::text)

I did a VACUUM ANALYZE, so the primary key indexes are of course being used. But the above query still takes 17 s. If I don't return so many columns it comes down to around 10 seconds:

select ac.application_control_id from application_controls_view ac, refs r where ac.custom_controller_ref_id = r.ref_id and r.ref_key like '%ERNEST%';

But in either case this is only 37 rows. So 1554 lookups on a unique index on a table of 34000 rows means 6 ms per internal join - note that many of those values are the same. Does this seem right to you? Anything I can tune?

--
Gregory Caulton
Principal at PatientOS Inc.
personal email: caulton...@gmail.com http://www.patientos.com
corporate: (888)-NBR-1EMR || fax 857.241.3022
Re: [PERFORM] Nested loop Query performance on PK
On Sun, Jul 26, 2009 at 1:02 AM, Greg Caulton caulton...@gmail.com wrote:
> [snip for brevity - original message quoted in full above]

Oh, it seems to be the join that is throwing it off, because this runs in 600 ms:

select ac.* from application_controls_view ac where ac.application_control_id in ( 5745, 5760, 5759, 5758, 5757, 5756, 5753, 5751, 5750, 5749, 5748, 5746, 5744, 50001328, 5752, 5754, 5755, 50002757, 50002756, 50002755, 50002754, 50001168, 50020825, 50021077, 50020821, 50020822, 50020824, 50020823, 50020820, 50020819, 50020809, 50020810, 50020806, 50020807, 50020817, 50021066, 50020808 )

Never mind, makes sense now - it's fixed.

--
Gregory Caulton
Principal at PatientOS Inc.
personal email: caulton...@gmail.com http://www.patientos.com
corporate: (888)-NBR-1EMR || fax 857.241.3022
Re: [PERFORM] Nested Loop Killer on 8.1
Dave North dno...@signiant.com writes:
> The outstanding question here is why does the explain analyze take (quite a bit) longer than just executing the query?

EXPLAIN ANALYZE has nontrivial measurement overhead, especially on platforms with slow gettimeofday(). Old/cheap PC hardware, in particular, tends to suck in this respect.

regards, tom lane
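One way to separate plan execution cost from instrumentation cost is the TIMING option of EXPLAIN, available from PostgreSQL 9.2 onward (so a much later release than the 8.1 discussed in this thread; the table name below is a placeholder):

```sql
-- full instrumentation: per-node timings, maximum clock-read overhead
EXPLAIN (ANALYZE, TIMING ON)  SELECT count(*) FROM some_table;

-- row counts only: still executes the query, but skips per-node clock reads
EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM some_table;
```

If the TIMING OFF run comes in close to the bare query's wall-clock time, the gap Dave saw was measurement overhead rather than a different plan being executed.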
Re: [PERFORM] Nested Loop Killer on 8.1
Dave,

> Are there further optimizations we can do to change the plan? Is this perhaps addressed in a later release?

Given the left joins, a later release might help; I know we did a lot to improve left join plans in 8.3. It would be worth testing if you can try an upgrade easily.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Re: [PERFORM] Nested Loop Killer on 8.1
On Wed, Jun 24, 2009 at 1:43 PM, Dave North dno...@signiant.com wrote:
> Essentially, we're seeing a query plan that is taking 95 secs with a nested loop execution plan and 1 sec with a merge join plan. We've tried increasing the default_statistics_target to 1000 and re-analyzed but the same query plan is returned. If we then force nested loops off (set enable_nestloop=false), the optimizer chooses the better plan and execution is under 1 second.
> Default explain plan: http://explain.depesz.com/s/a3 (execution time 95secs)
> Nested loops off plan: http://explain.depesz.com/s/JV (execution time ~ 1sec)

The planner is coming up with a bad estimate for the number of rows matching this filter:

Filter: ((prop_key)::text ~~ 'location_node_directory_outbox'::text)

which is coming from this condition:

AND web_user_property_directory_outbox.prop_key like 'location_node_directory_outbox'

Why use LIKE for a constant string with no % or _ characters? If you used = the planner might be able to come up with a better estimate.

That said, I suspect Dave's right that your best course of action would be to update to 8.3, or wait a couple of weeks and update to 8.4 when it comes out. Regardless, you *really* want to update your 8.1.8 install to the latest bug-fix release (currently 8.1.17). That's not an upgrade and won't need a dump/reload.

--
greg
http://mit.edu/~gsstark/resume.pdf
Re: [PERFORM] Nested Loop Killer on 8.1
On 06/25/2009 04:36 PM, Greg Stark wrote:
> AND web_user_property_directory_outbox.prop_key like 'location_node_directory_outbox'
> Why use like for a constant string with no % or _ characters? If you used = the planner might be able to come up with a better estimate

Any reason why LIKE with a constant string without % or _ is not optimized to = today?

Cheers,
mark

--
Mark Mielke m...@mielke.cc
Re: [PERFORM] Nested Loop Killer on 8.1
Greg Stark gsst...@mit.edu writes:
> On Wed, Jun 24, 2009 at 1:43 PM, Dave North dno...@signiant.com wrote:
> Why use like for a constant string with no % or _ characters? If you used = the planner might be able to come up with a better estimate.

Uh, it appears to me the string *does* contain _ characters; perhaps the OP has neglected to escape those?

The planner does know enough to estimate LIKE with a fixed pattern as being equivalent to =. I think it knew that even back in 8.1, but am too lazy to look right now.

regards, tom lane
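Tom's point in concrete terms: _ is a single-character wildcard in LIKE, so a pattern full of unescaped underscores is never a "fixed pattern" to the planner. A sketch of the three forms, using the predicate from this thread (the fragments are illustrative, not a full query):

```sql
-- '_' matches any single character, so this is a wildcard pattern, not a constant:
... WHERE prop_key LIKE 'location_node_directory_outbox'

-- with the underscores escaped, the pattern is fixed and the planner
-- can estimate it like an equality comparison:
... WHERE prop_key LIKE 'location\_node\_directory\_outbox'

-- simplest of all, when no pattern matching is intended:
... WHERE prop_key = 'location_node_directory_outbox'
```

(Depending on the standard_conforming_strings setting, especially in releases of that era where it defaulted to off, the backslashes may need to be doubled inside the string literal.)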
Re: [PERFORM] Nested Loop Killer on 8.1
On Thu, Jun 25, 2009 at 10:05 PM, Tom Lane t...@sss.pgh.pa.us wrote:
> Uh, it appears to me the string *does* contain _ characters; perhaps the OP has neglected to escape those?

Sigh. Indeed.

--
greg
http://mit.edu/~gsstark/resume.pdf
[PERFORM] Nested Loop Killer on 8.1
Morning all,

A colleague here tried to post this yesterday but it was stalled for some reason. Anyway, here's what we're seeing, which hopefully someone has some pointers for. Essentially, we're seeing a query plan that takes 95 secs with a nested loop execution plan and 1 sec with a merge join plan. We've tried increasing the default_statistics_target to 1000 and re-analyzed, but the same query plan is returned. If we then force nested loops off (set enable_nestloop=false), the optimizer chooses the better plan and execution is under 1 second.

Default explain plan: http://explain.depesz.com/s/a3 (execution time 95 secs)
Nested loops off plan: http://explain.depesz.com/s/JV (execution time ~1 sec)

We're currently running 8.1.8 (yeah, we know it's old skool but it's embedded as part of an application), so the real questions are:

Are there further optimizations we can do to change the plan? Is this perhaps addressed in a later release?
Some postgresql.conf settings that might be useful:

effective_cache_size 511082
shared_buffers 3
work_mem 4096
random_page_cost 4
join_collapse_limit 8

and of course, the query in question that generates the plan:

SELECT web_user_type, web_user.web_user_id as id, cast(web_user_property_node.prop_val as numeric) as node_id , node_name, last_name || ', ' || first_name as name, web_user_property_directory_inbox.prop_val as directory_location_inbox, web_user_property_directory_outbox.prop_val as directory_location_outbox, username, first_name, last_name, email FROM web_user LEFT JOIN web_user_property as web_user_property_directory_outbox ON web_user.web_user_id = web_user_property_directory_outbox.web_user_id AND web_user_property_directory_outbox.prop_key like 'location_node_directory_outbox', web_user_property, web_user_property as web_user_property_directory_inbox, web_user_property as web_user_property_node, node WHERE web_user.web_user_id = web_user_property_directory_inbox.web_user_id AND web_user.web_user_id = web_user_property.web_user_id AND web_user_property.prop_key = 'location_node_enabled' AND web_user_property.prop_val = 'true' AND web_user_property_directory_inbox.prop_key like 'location_node_directory_inbox' AND web_user.web_user_id = web_user_property_node.web_user_id AND web_user_property_node.prop_key like 'location_node_id' AND web_user_property_node.prop_val = node.node_id AND (first_name ilike '%' OR last_name ilike '%' OR last_name || ',' || first_name ilike '%') AND node.node_id IN ( SELECT node_id FROM node_execute WHERE acl_web_user_id = 249) AND web_user.web_user_id IN ( SELECT web_user_id FROM web_user_read WHERE acl_web_user_id = 249 OR web_user_id IN ( SELECT member_id FROM web_user_grp_member WHERE web_user_id IN( SELECT acl_web_user_id FROM web_user_read WHERE web_user_id IN (SELECT web_user_id FROM web_user_grp_member WHERE member_id = 249 ORDER BY name;

Thanks in advance

Dave

Dave North dno...@signiant.com
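If disabling nested loops ends up being the only workable fix on 8.1, the setting can at least be scoped to the one problem query rather than the whole session. A sketch:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;  -- reverts automatically at COMMIT or ROLLBACK
SELECT ... ;                      -- the slow report query from above
COMMIT;
```

SET LOCAL lasts only until the end of the enclosing transaction, so other queries on the same connection keep the default planner behavior instead of inheriting the pessimized setting.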
Re: [PERFORM] Nested Loop join being improperly chosen
I had a similar problem here: http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php

Is the nested loop performing a LEFT join with yours? It's a little difficult to tell just from the query plan you showed. A workaround for mine was to use a full outer join and eliminate the extra rows in the WHERE clause. A bit of a hack, but it changed a 2 min query into one that ran in under a second. Of course this is not helping with your problem, but at least it may trigger some more feedback.

David.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brad Ediger
Sent: 22 August 2008 16:26
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Nested Loop join being improperly chosen

> [snip - original message quoted in full]
[PERFORM] Nested Loop join being improperly chosen
Hello,

I'm having trouble with a Nested Loop being selected for a rather complex query; it turns out this is a pretty bad plan as the nested loop's row estimates are quite off (1 estimated / 1207881 actual). If I disable enable_nestloop, the query executes much faster (42 seconds instead of 605). The tables in the query have all been ANALYZEd just before generating these plans.

Here are the plans with and without enable_nestloop: http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples. The items table has around 10,000 tuples, and the other tables in the query are tiny.

Any ideas or suggestions would be greatly appreciated.

Thanks!
-- Brad Ediger
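A toy illustration of why a rows=1 misestimate flips the plan this way. The cost constants below are made up for the sketch (not PostgreSQL's real cost model); only the row counts (1 estimated vs 1,207,881 actual) come from the post above. The point is that a nested loop's cost scales linearly with the outer row count, so it looks nearly free when the planner believes there is one row, and becomes ruinous when there are a million:

```python
# Simplified, hypothetical cost model -- not the real PostgreSQL one.

def nested_loop_cost(outer_rows, inner_probe_cost=4.0):
    # one index probe into the inner relation per outer row
    return outer_rows * inner_probe_cost

def hash_join_cost(outer_rows, inner_rows, per_row_cost=0.01, build_cost=100.0):
    # build a hash table once, then stream both inputs through it
    return build_cost + (outer_rows + inner_rows) * per_row_cost

inner_rows = 10_000  # roughly the items table from the post

# What the planner believed: 1 outer row, so the nested loop looks cheapest
assert nested_loop_cost(1) < hash_join_cost(1, inner_rows)

# What actually happened: 1,207,881 outer rows, so the nested loop is far worse
actual_rows = 1_207_881
assert nested_loop_cost(actual_rows) > hash_join_cost(actual_rows, inner_rows)
```

This is why disabling enable_nestloop helps here: it does not fix the estimate, it just takes the plan that only looks cheap off the table.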
[PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time
I've come across this issue while writing a report-like query for 2 not very large tables. I've tried several methods to resolve it (see below), but now I'm really stuck...

PostgreSQL 8.3, default configuration.

There are 2 tables (structure simplified to show only the problematic place):

create table c (
  id bigint primary key,
  cdate date
);
create index c_cdate_idx on c (cdate);

create table i (
  id bigint primary key,
  id_c bigint references c(id)
);

select count(*) from i;   -- 4646145
select count(*) from c;   -- 636565

analyze i;
analyze c;

explain analyze
select id from c join i on i.id_c = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

QUERY PLAN
Merge Join (cost=738.95..57864.63 rows=14479 width=8) (actual time=13954.681..14358.731 rows=14583 loops=1)
  Merge Cond: (i.id_c = c.id)
  -> Index Scan using fki_i_c_fk on i (cost=0.00..194324.34 rows=4646145 width=8) (actual time=17.254..12061.414 rows=1042599 loops=1)
  -> Sort (cost=738.94..756.88 rows=7178 width=8) (actual time=53.942..75.013 rows=14583 loops=1)
       Sort Key: c.id
       Sort Method: quicksort  Memory: 404kB
       -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=23.595..41.470 rows=7064 loops=1)
            Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
Total runtime: 14379.461 ms

set enable_mergejoin to off;
set enable_hashjoin to off;

QUERY PLAN
Nested Loop (cost=0.00..59833.70 rows=14479 width=8) (actual time=0.129..153.038 rows=14583 loops=1)
  -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=0.091..14.468 rows=7064 loops=1)
       Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
  -> Index Scan using fki_i_c_fk on i (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064)
       Index Cond: (i.id_c = c.id)
Total runtime: 172.599 ms

Ok, the first problem is here:

  -> Index Scan using fki_i_c_fk on i (cost=0.00..8.13 rows=13 width=8) (actual time=0.007..0.011 rows=2 loops=7064)

I collected statistics for these tables at level 1000 for all columns.

select attname, null_frac, avg_width, n_distinct, correlation
from pg_stats where tablename = 'i'

 attname | null_frac          | avg_width | n_distinct | correlation
---------+--------------------+-----------+------------+-------------------
 id      | 0                  | 8         | -1         | 0.849796295166
 id_c    | 0.7236369848251343 | 8         | 95583      | 0.999763011932373

Nice stats, except for n_distinct on the id_c column:

select count(distinct id_c) from i;   -- 633864

Of course this is not a correct solution, but...

update pg_statistic set stadistinct = 633864 where starelid = ... and staattnum = ...

Reconnect and execute:

explain analyze
select id from c join i on i.id_c = c.id
where c.cdate between '2007-02-01' and '2007-02-16'

QUERY PLAN
Nested Loop (cost=0.00..57342.39 rows=14479 width=8) (actual time=0.133..151.426 rows=14583 loops=1)
  -> Index Scan using c_cdate_idx on c (cost=0.00..279.21 rows=7178 width=8) (actual time=0.094..14.242 rows=7064 loops=1)
       Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <= '2007-02-16'::date))
  -> Index Scan using fki_i_c_fk on i (cost=0.00..7.92 rows=2 width=8) (actual time=0.007..0.011 rows=2 loops=7064)
       Index Cond: (i.id_c = c.id)
Total runtime: 170.911 ms

But the reason for this issue is not just the incorrect value of n_distinct. Let's expand the date interval in the WHERE clause.

explain analyze
select id from c join i on i.id_c = c.id
where c.cdate between '2007-02-01' and '2007-02-19'

QUERY PLAN
Merge Join (cost=831.16..57981.98 rows=16155 width=8) (actual time=11691.156..12155.201 rows=16357 loops=1)
  Merge Cond: (i.id_c = c.id)
  -> Index Scan using fki_i_c_fk on i (cost=0.00..194324.34 rows=4646145 width=8) (actual time=22.236..9928.489 rows=1044373 loops=1)
  -> Sort (cost=831.15..851.17 rows=8009 width=8) (actual time=31.660..55.277 rows=16357 loops=1)
       Sort Key: c.id
       Sort Method: quicksort  Memory: 438kB
       -> Index Scan using c_cdate_idx on c (cost=0.00..311.87 rows=8009 width=8) (actual time=0.116..17.050 rows=7918 loops=1)
            Index Cond: ((cdate >= '2007-02-01'::date) AND (cdate <=
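The "rows=13" vs "rows=2" estimates in the inner index scans above can be roughly reproduced from the quoted pg_stats numbers. This is a simplified sketch of the planner's per-probe estimate (matches assumed spread evenly over the distinct non-null values), not the exact 8.3 selectivity code:

```python
# Numbers taken from the post: count(*) from i and pg_stats for id_c.
total_rows = 4_646_145
null_frac  = 0.7236369848251343
nonnull_rows = total_rows * (1 - null_frac)

def rows_per_probe(n_distinct):
    # simplified model: non-null rows spread evenly across distinct values
    return nonnull_rows / n_distinct

# With the bad statistic (n_distinct = 95583) -> matches "rows=13" in the plan
assert round(rows_per_probe(95_583)) == 13

# With the true distinct count (633864) -> matches "rows=2" after the
# pg_statistic hack
assert round(rows_per_probe(633_864)) == 2
```

So the stadistinct update really does explain the change in the inner-scan estimate, even though, as the poster notes, it is not the whole story for the join-order choice.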
Re: [PERFORM] Nested loop vs merge join: inconsistencies between estimated and actual time
Vlad Arkhipov [EMAIL PROTECTED] writes: I've came across this issue while writing report-like query for 2 not very large tables. I've tried several methods to resolve this one (see below). But now I'm really stuck... It looks like you are wishing to optimize for all-in-memory situations, in which case the traditional advice is to reduce random_page_cost to something close to 1. AFAICS all the rowcount estimates you're seeing are spot on, or as close to spot on as you could realistically hope for, and so the problem lies with the cost parameters. Fooling with the statistics is not going to help if the rowcount estimates are already good. (Note: the apparent undercounts you're seeing on indexscans on the outer side of a mergejoin seem to be because the mergejoin terminates early due to limited range of the other input join key. The planner is expecting this, as we can see because the predicted cost of the join is actually much less than the predicted cost of running the input indexscan to completion. The cost ratio is about consistent with the rowcount ratio, which makes me think it got these right too.) regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
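Tom's point about cost parameters can be sketched numerically. With the default random_page_cost = 4, an index scan touching many pages is costed as if every page were a random disk read; if the data is actually all cached, a value near 1 is closer to reality and index-driven plans stop looking artificially expensive. The page counts and CPU constant below are illustrative, not taken from the posted plans:

```python
# Hypothetical page counts; seq_page_cost is PostgreSQL's default of 1.0.

def scan_cost(pages, page_cost, cpu_per_page=0.1):
    # simplified: I/O cost per page plus a small CPU term
    return pages * (page_cost + cpu_per_page)

pages = 10_000
seq = scan_cost(pages, 1.0)          # sequential read costing

# Default costing: random I/O makes the same pages look ~4x as expensive
assert scan_cost(pages, 4.0) > seq

# All-in-memory costing (random_page_cost ~ 1): random reads cost the
# same as sequential ones, so the index plan competes on equal terms
assert scan_cost(pages, 1.0) == seq
```

This is why, when the row estimates are already accurate (as Tom observes they are here), tuning random_page_cost rather than the statistics is the right lever.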
Re: [PERFORM] Nested Loop
Hi, here is the query SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS campname, rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS advid, ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS channame, rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel) AS click, rd.sqldate AS date FROM rm, rn CROSS JOIN rd, ra, rs, rc, rpt_chn, rpt_cre WHERE rm.date_key = rd.key AND rm.net_key = rn.key AND rm.adv_key = ra.key AND rm.camp_key = rc.key AND rm.s_key = rs.key AND rm.chn_key = rpt_chn.key AND rm.cre_key = rpt_cre.key AND ra.name != 'SYSTEM' AND rd.sqldate BETWEEN '12/1/2006' AND '12/30/2006' AND ( rn.id IN ( 607 ) ) GROUP BY rd.sqldate, rs.id, rs.name, ra.id, ra.name, rc.id, rc.name, rc.rev_type, rc.act_type, rpt_chn.id, rpt_chn.name, rpt_cre.dn; On 3/26/07, Ragnar [EMAIL PROTECTED] wrote: On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote: you did not show your query, nor did you answer whather you had vacuumed and analyzed. enable_seqscan = off why this? this is unlikely to help QUERY PLAN ... - Nested Loop (cost=0.00..1104714.83 rows=6801 width=44) (actual time=1820.153..229779.814 rows=10945938 loops=1) the estimates are way off here. you sure you have analyzed? gnari -- Regards Gauri
Re: [PERFORM] Nested Loop
On þri, 2007-03-27 at 16:13 +0530, Gauri Kanekar wrote: SELECT rs.id AS sid, rs.name AS sname, rc.id AS campid, rc.name AS campname, rc.rev_type AS revtype, rc.act_type AS actntype, ra.id AS advid, ra.name AS advname, rpt_chn.id AS chanid, rpt_chn.name AS channame, rpt_cre.dn AS dn, SUM(rm.imdel) AS impression, SUM(rm.cdel) AS click, rd.sqldate AS date FROM rm, rn CROSS JOIN rd, ra, rs, rc, rpt_chn, rpt_cre WHERE rm.date_key = rd.key AND rm.net_key = rn.key AND rm.adv_key = ra.key AND rm.camp_key = rc.key AND rm.s_key = rs.key AND rm.chn_key = rpt_chn.key AND rm.cre_key = rpt_cre.key AND ra.name != 'SYSTEM' AND rd.sqldate BETWEEN '12/1/2006' AND '12/30/2006' AND ( rn.id IN ( 607 ) ) GROUP BY rd.sqldate , rs.id, rs.name, ra.id, ra.name, rc.id, rc.name, rc.rev_type , rc.act_type, rpt_chn.id, rpt_chn.name, rpt_cre.dn; you did not answer other questions, so do this: 1) VACUUM ANALYZE your database 2) set these in your postgresql.conf: enable_seqscan = true join_collapse_limit = 8 3) restart postgresql 4) do the EXPLAIN ANALYZE again, and send us it's output gnari ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] Nested Loop
Hi List, how to speedup nested loop queries and by which parameters. -- Regards Gauri
Re: [PERFORM] Nested Loop
On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote: how to speedup nested loop queries and by which parameters. Please post a query you're trying to tune and the EXPLAIN ANALYZE output, as well as any changes you've already made in postgresql.conf or configuration variables you've set in a particular session. Without more information we can't give much advice other than to make sure you're vacuuming and analyzing the tables often enough to keep them from becoming bloated with dead rows and to keep the statistics current, and to review a configuration checklist such as this one: http://www.powerpostgresql.com/PerfList -- Michael Fuhr ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Nested Loop
Sorry, these are the config settings:

max_connections = 100           # (change requires restart)
shared_buffers = 300MB
work_mem = 256MB
max_fsm_pages = 40
max_fsm_relations = 500
wal_buffers = 512
checkpoint_segments = 20
checkpoint_timeout = 900
enable_bitmapscan = on
enable_seqscan = off
enable_tidscan = on
random_page_cost = 2
cpu_index_tuple_cost = 0.001
effective_cache_size = 800MB
join_collapse_limit = 1         # JOINs
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8'     # locale for system error message
lc_monetary = 'en_US.UTF-8'     # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'      # locale for number formatting
lc_time = 'en_US.UTF-8'         # locale for time formatting

All others are the default values.

QUERY PLAN
HashAggregate (cost=1116330.73..1116432.34 rows=6774 width=128) (actual time=438565.297..440455.386 rows=646881 loops=1)
  -> Hash Join (cost=10802.93..1116093.64 rows=6774 width=128) (actual time=1904.797..377717.036 rows=10438694 loops=1)
       Hash Cond: (rm.ck = rc.k)
       -> Hash Join (cost=10651.73..1115840.83 rows=6774 width=105) (actual time=1890.765..347169.113 rows=10438694 loops=1)
            Hash Cond: (rm.chk = rc.ky)
            -> Hash Join (cost=9835.35..1114905.90 rows=6774 width=83) (actual time=1873.463..317623.437 rows=10438694 loops=1)
                 Hash Cond: (rm.ckey = rc.k)
                 -> Hash Join (cost=615.77..1105533.91 rows=6774 width=85) (actual time=1842.309..288198.666 rows=10438694 loops=1)
                      Hash Cond: (rm.sk = rs.k)
                      -> Hash Join (cost=77.32..1104885.39 rows=6774 width=58) (actual time=1831.908..259147.154 rows=10438694 loops=1)
                           Hash Cond: (rm.advk = ra.k)
                           -> Nested Loop (cost=0.00..1104714.83 rows=6801 width=44) (actual time=1820.153..229779.814 rows=10945938 loops=1)
                                Join Filter: (rm.nk = rn.k)
                                -> Index Scan using r_idx on rn (cost=0.00..4.27 rows=1 width=4) (actual time=0.093..0.095 rows=1 loops=1)
                                     Index Cond: (id = 607)
                                -> Nested Loop (cost=0.00..1104370.50 rows=27205 width=48) (actual time=7.920..202878.054 rows=10945998 loops=1)
                                     -> Index Scan using rpts_ldt_idx on rd (cost=0.00..4.27 rows=1 width=12) (actual time=0.097..0.352 rows=30 loops=1)
                                          Index Cond: ((sdt >= '2006-12-01 00:00:00'::timestamp without time zone) AND (sd <= '2006-12-30 00:00:00'::timestamp without time zone))
                                     -> Index Scan using rmidx on rm (cost=0.00..1100192.24 rows=333919 width=44) (actual time=3.109..5835.861 rows=364867 loops=30)
                                          Index Cond: (rmdkey = rd.k)
                           -> Hash (cost=68.15..68.15 rows=734 width=22) (actual time=11.692..11.692 rows=734 loops=1)
                                -> Index Scan using radvki on radvt (cost=0.00..68.15 rows=734 width=22) (actual time=9.112..10.517 rows=734 loops=1)
                                     Filter: ((name)::text <> 'SYSTEM'::text)
                      -> Hash (cost=500.35..500.35 rows=3048 width=35) (actual time=10.377..10.377 rows=3048 loops=1)
                           -> Index Scan using rskidx on rs (cost=0.00..500.35 rows=3048 width=35) (actual time=0.082..5.589 rows=3048 loops=1)
                 -> Hash (cost=9118.63..9118.63 rows=8076 width=6) (actual time=31.124..31.124 rows=8076 loops=1)
                      -> Index Scan using rcridx on rcr (cost=0.00..9118.63 rows=8076 width=6) (actual time=2.036..19.218 rows=8076 loops=1)
            -> Hash (cost=769.94..769.94 rows=3715 width=30) (actual time=17.275..17.275 rows=3715 loops=1)
                 -> Index Scan using ridx on rcl (cost=0.00..769.94 rows=3715 width=30) (actual time=4.238..11.432 rows=3715 loops=1)
       -> Hash (cost=120.38..120.38 rows=2466 width=31) (actual time=14.010..14.010 rows=2466 loops=1)
            -> Index Scan using rckdx on rcpn (cost=0.00..120.38 rows=2466 width=31) (actual time=4.564..9.926 rows=2466 loops=1)
Total runtime: 441153.878 ms
(32 rows)

We are using version 8.2.

On 3/26/07, Michael Fuhr [EMAIL PROTECTED] wrote: On Mon, Mar 26, 2007 at 05:34:39PM +0530, Gauri Kanekar wrote: how to speedup nested loop queries and by which parameters. Please post a query you're trying to tune and the EXPLAIN ANALYZE output, as well as any changes you've already made in
Re: [PERFORM] Nested Loop
-Original Message- From: [EMAIL PROTECTED] On Behalf Of Gauri Kanekar Subject: Re: [PERFORM] Nested Loop join_collapse_limit = 1 # JOINs Is there a reason you have this set to 1? Postgres can't consider multiple join orders when you do that. I would try setting that back to the default and seeing if this query is any faster. Other than that it looked like the problems with the query might be bad estimates of rows. One is that postgres expects there to be 1 matching row from rd when there are actually 30. You might try increasing the statistics targets on rd.sd and rd.sdt, reanalyzing, and seeing if that helps. Also postgres expects the join of rd and rm to return about 27205 rows when it actually returns 10 million. I'm not sure what you can do about that. Maybe if Postgres gets a better estimate for rd it would then estimate the join better. Dave ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Nested Loop
On mán, 2007-03-26 at 20:33 +0530, Gauri Kanekar wrote: you did not show your query, nor did you answer whather you had vacuumed and analyzed. enable_seqscan = off why this? this is unlikely to help QUERY PLAN ... - Nested Loop (cost=0.00..1104714.83 rows=6801 width=44) (actual time=1820.153..229779.814 rows=10945938 loops=1) the estimates are way off here. you sure you have analyzed? gnari ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Nested loop join and date range query
On 5/2/06, Tom Lane [EMAIL PROTECTED] wrote:

 Ian Burrell [EMAIL PROTECTED] writes: We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are having performance problems and running for very long times. The commonality seems to be PostgreSQL 8.1 is choosing to use a nested loop join because it estimates there will be only a single row.

 We've already noted that there's a problem with estimating zero-width ranges (too lazy to search the archives, but this has come up at least twice recently). Can you modify your app to generate something like "week >= x and week < x+1" instead of "week >= x and week <= x"?

I am working on modifying the SQL generation code to replace the zero-width range with an equals. Does BETWEEN have the same bug?

 My recollection is that the fix will probably be complicated enough to not get back-patched into 8.1. BTW, AFAIK the same problem exists in 7.4. What kind of estimates/plans were you getting for this case in 7.4?

We get similar rows=1 estimates on 7.4. 7.4 doesn't choose to use the nested loop joins, so it performs fine. We have been getting similar rows=1 estimates and nested loop joins with some other queries. But I think those are caused by not frequently analyzing log-type tables and then searching for recent days which it doesn't think exist.

- Ian
[PERFORM] Nested loop join and date range query
We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are having performance problems and running for very long times. The commonality seems to be that PostgreSQL 8.1 is choosing to use a nested loop join because it estimates there will be only a single row. There are really thousands of rows and the nested loop version takes much longer. Even with the bad plan, the test query runs quickly. The real query is much more complicated and we have had to kill it after running for 24 hours.

SELECT MAX(titles.name) AS title_name,
       MAX(providers.short_name) AS provider_short_name,
       SUM(x.xtns) AS xtns,
       SUM(x.rev) AS rev
FROM xtns_by_mso_title_wk x
INNER JOIN providers providers ON x.provider_no = providers.provider_no
INNER JOIN titles titles ON x.title_no = titles.title_no
WHERE x.mso_no = 50
  AND x.week BETWEEN '20060423 00:00:00' AND '20060423 00:00:00'
GROUP BY x.title_no, x.provider_no

The EXPLAIN ANALYZE looks like:

GroupAggregate (cost=11.63..11.67 rows=1 width=61) (actual time=1440.550..1467.602 rows=3459 loops=1)
  -> Sort (cost=11.63..11.64 rows=1 width=61) (actual time=1440.515..1446.634 rows=3934 loops=1)
       Sort Key: x.title_no, x.provider_no
       -> Nested Loop (cost=0.00..11.62 rows=1 width=61) (actual time=7.900..1422.686 rows=3934 loops=1)
            -> Nested Loop (cost=0.00..7.38 rows=1 width=49) (actual time=7.877..1373.392 rows=3934 loops=1)
                 -> Index Scan using unq_xtns_by_mso_title_wk on xtns_by_mso_title_wk x (cost=0.00..4.12 rows=1 width=26) (actual time=7.827..1297.681 rows=3934 loops=1)
                      Index Cond: ((week >= '2006-04-23 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                 -> Index Scan using pk_titles on titles (cost=0.00..3.25 rows=1 width=27) (actual time=0.010..0.012 rows=1 loops=3934)
                      Index Cond: ("outer".title_no = titles.title_no)
            -> Index Scan using pk_providers on providers (cost=0.00..4.23 rows=1 width=16) (actual time=0.004..0.005 rows=1 loops=3934)
                 Index Cond: ("outer".provider_no = providers.provider_no)

If it is searching over multiple weeks (week BETWEEN '20060417 00:00:00' AND '20060423 00:00:00'), it estimates better and uses a hash join.

GroupAggregate (cost=7848.20..7878.48 rows=156 width=61) (actual time=117.761..145.910 rows=3459 loops=1)
  -> Sort (cost=7848.20..7852.08 rows=1552 width=61) (actual time=117.735..123.823 rows=3934 loops=1)
       Sort Key: x.title_no, x.provider_no
       -> Hash Join (cost=5.95..7765.94 rows=1552 width=61) (actual time=6.539..102.825 rows=3934 loops=1)
            Hash Cond: ("outer".provider_no = "inner".provider_no)
            -> Nested Loop (cost=0.00..7736.71 rows=1552 width=49) (actual time=5.117..86.980 rows=3934 loops=1)
                 -> Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x (cost=0.00..2677.04 rows=1552 width=26) (actual time=5.085..18.065 rows=3934 loops=1)
                      Index Cond: ((week >= '2006-04-17 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                 -> Index Scan using pk_titles on titles (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.010 rows=1 loops=3934)
                      Index Cond: ("outer".title_no = titles.title_no)
            -> Hash (cost=5.16..5.16 rows=316 width=16) (actual time=1.356..1.356 rows=325 loops=1)
                 -> Seq Scan on providers (cost=0.00..5.16 rows=316 width=16) (actual time=0.008..0.691 rows=325 loops=1)

If the week range is replaced by an equals (week = '20060423 00:00:00'), it also uses a hash join. Unfortunately, the queries are automatically generated and changing them to use an equals could be problematic.
GroupAggregate (cost=7828.75..7859.32 rows=157 width=61) (actual time=98.330..125.370 rows=3459 loops=1)
  -> Sort (cost=7828.75..7832.67 rows=1567 width=61) (actual time=98.303..104.055 rows=3934 loops=1)
       Sort Key: x.title_no, x.provider_no
       -> Hash Join (cost=5.95..7745.60 rows=1567 width=61) (actual time=1.785..83.830 rows=3934 loops=1)
            Hash Cond: ("outer".provider_no = "inner".provider_no)
            -> Nested Loop (cost=0.00..7716.14 rows=1567 width=49) (actual time=0.170..68.338 rows=3934 loops=1)
                 -> Index Scan using idx_xtns_by_mso_ti_wk_wk_mso_t on xtns_by_mso_title_wk x (cost=0.00..2607.56 rows=1567 width=26) (actual time=0.138..11.993 rows=3934 loops=1)
                      Index Cond: ((week = '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))
                 -> Index Scan using pk_titles on titles (cost=0.00..3.25 rows=1 width=27) (actual time=0.006..0.008 rows=1 loops=3934)
                      Index Cond: ("outer".title_no = titles.title_no)
            -> Hash
Re: [PERFORM] Nested loop join and date range query
Ian Burrell [EMAIL PROTECTED] writes: We recently upgraded to PostgreSQL 8.1 from 7.4 and a few queries are having performance problems and running for very long times. The commonality seems to be PostgreSQL 8.1 is choosing to use a nested loop join because it estimates there will be only a single row.

  -> Index Scan using unq_xtns_by_mso_title_wk on xtns_by_mso_title_wk x (cost=0.00..4.12 rows=1 width=26) (actual time=7.827..1297.681 rows=3934 loops=1)
       Index Cond: ((week >= '2006-04-23 00:00:00'::timestamp without time zone) AND (week <= '2006-04-23 00:00:00'::timestamp without time zone) AND (mso_no = 50))

We've already noted that there's a problem with estimating zero-width ranges (too lazy to search the archives, but this has come up at least twice recently). Can you modify your app to generate something like "week >= x and week < x+1" instead of "week >= x and week <= x"? My recollection is that the fix will probably be complicated enough to not get back-patched into 8.1.

BTW, AFAIK the same problem exists in 7.4. What kind of estimates/plans were you getting for this case in 7.4?

regards, tom lane
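Tom's suggested rewrite can be sanity-checked on sample data: for a column whose values land exactly on week boundaries (as the week column here apparently does), the zero-width closed range "week >= x AND week <= x" selects exactly the same rows as the half-open form "week >= x AND week < x + 1 week", which the 8.1 planner can estimate sanely. A small sketch with hypothetical data:

```python
from datetime import datetime, timedelta

# Hypothetical data: week-grain timestamps, many rows per week value.
weeks = [datetime(2006, 4, d) for d in (9, 16, 23, 30)] * 1000
x = datetime(2006, 4, 23)

# The problematic zero-width closed range (what BETWEEN x AND x produces)
closed = [w for w in weeks if x <= w <= x]

# Tom's half-open rewrite; equivalent because the data sits on week boundaries
half_open = [w for w in weeks if x <= w < x + timedelta(weeks=1)]

assert closed == half_open
assert len(closed) == 1000
```

Note the equivalence relies on the assumption that no timestamps fall strictly between week boundaries; with finer-grained data the half-open form would pick up extra rows.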
Re: [PERFORM] Nested Loop trouble : Execution time increases more
On Sat, 2005-09-17 at 17:47 +0200, Antoine Bajolet wrote: There are more tables around, but the heart of the search engine is made of three tables : fiches (f_id int4, f_title varchar) 52445 rows engine (f_id int4, k_id int4, weight )11761700 rows keywords(k_id, keyword)1072600 rows A fiche is linked to any kind of document. The engine table counts how many times a keyword appears in a document. A query to search on one or two keywords is quick to execute (the front-end creates thoses queries): Is there a specific reason the planner chooses this way ? Yes, you have an additional join for each new keyword, so there is more work to do. Recode your SQL with an IN subselect that retrieves all possible keywords before it accesses the larger table. That way you should have only one join for each new keyword. Can whe do something on the postgresql configuration to avoid this ? Can whe force the planner to use a hash join as it does for the first joins ? Not required, IMHO. Best Regards, Simon Riggs ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Nested Loop trouble : Execution time increases more 1000 time (long)
Antoine Bajolet [EMAIL PROTECTED] writes: We are using postgresql in a search engine on an intranet handling thousands of documents. But we have a big problem when users use more than two search keys.

I think you need to increase the statistics targets for your keywords table --- the estimates of numbers of matching rows are much too small:

  -> Index Scan using keyword_pattern_key on keywords k2 (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
       Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
       Filter: ((keyword)::text ~~ 'exploitation%'::text)

A factor-of-75 error is quite likely to mislead the planner into choosing a bad join plan.

BTW, have you looked into using a real full-text-search engine (eg, tsearch2) instead of rolling your own like this?

regards, tom lane
Re: [PERFORM] Nested Loop trouble : Execution time increases more
Hello,

Tom Lane wrote: Antoine Bajolet [EMAIL PROTECTED] writes: We are using postgresql in a search engine on an intranet handling thousands of documents. But we have a big problem when users use more than two search keys. I think you need to increase the statistics targets for your keywords table --- the estimates of numbers of matching rows are much too small:

What value do you think I could put into an ALTER TABLE ... SET STATISTICS statement?

Also, the solution given by Simon Riggs works well:

 Recode your SQL with an IN subselect that retrieves all possible keywords before it accesses the larger table.

But I will try the old queries after increasing the statistics parameter and compare performance.

  -> Index Scan using keyword_pattern_key on keywords k2 (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
       Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
       Filter: ((keyword)::text ~~ 'exploitation%'::text)
 A factor-of-75 error is quite likely to mislead the planner into choosing a bad join plan.
 BTW, have you looked into using a real full-text-search engine (eg, tsearch2) instead of rolling your own like this?

It seems a quite good contrib, but... The first version of this search engine was developed in 2000; neither tsearch2 nor tsearch existed at that time. Also, there is ongoing development work around this search engine (relevance algorithm, filtering by user rights, weighting keywords with specific rules per document type, etc.), and adapting all of it to work the same way with tsearch2 seems a bit heavy.

In the end, each indexed document is quite big, and the chosen method reduces disk storage: 1 GB of text content reduces to ~100 MB of table space.

Best Regards,
Antoine Bajolet
Re: [PERFORM] Nested Loop trouble : Execution time increases more
Re,

With the parameters modified like this:

ALTER TABLE keywords ALTER keyword SET STATISTICS 100;
ALTER TABLE keywords ALTER k_id SET STATISTICS 100;
ALTER TABLE engine ALTER k_id SET STATISTICS 100;
ALTER TABLE engine ALTER f_id SET STATISTICS 100;

vacuuming both tables, and rewriting the queries using sub-selects:

select count (distinct f.f_id) as results
FROM fiches f
INNER JOIN (SELECT distinct f_id FROM keywords, engine
            WHERE engine.k_id = keywords.k_id AND keyword like 'exploitation%') as e1 USING(f_id)
INNER JOIN (SELECT distinct f_id FROM keywords, engine
            WHERE engine.k_id = keywords.k_id AND keyword like 'maintenance%') as e2 USING(f_id)
INNER JOIN (SELECT distinct f_id FROM keywords, engine
            WHERE engine.k_id = keywords.k_id AND keyword like 'numerique%') as e3 USING(f_id)

the query time is less than 600 ms, and increases only a little when adding more keywords.

Thanks to Tom Lane and Simon Riggs.

Best regards,
Antoine Bajolet

Antoine Bajolet wrote: Hello, Tom Lane wrote: Antoine Bajolet [EMAIL PROTECTED] writes: We are using postgresql in a search engine on an intranet handling thousands of documents. But we have a big problem when users use more than two search keys. I think you need to increase the statistics targets for your keywords table --- the estimates of numbers of matching rows are much too small: What value do you think I could put into an ALTER TABLE ... SET STATISTICS statement? Also, the solution given by Simon Riggs works well: Recode your SQL with an IN subselect that retrieves all possible keywords before it accesses the larger table. But I will try the old queries after increasing the statistics parameter and compare performance.
[PERFORM] Nested Loop trouble : Execution time increases more 1000 time (long)
Hello,

We are using PostgreSQL in a search engine on an intranet handling thousands of documents. But we have a big problem when users use more than two search keys.

There are more tables around, but the heart of the search engine is made of three tables:

  fiches   (f_id int4, f_title varchar)         52445 rows
  engine   (f_id int4, k_id int4, weight)    11761700 rows
  keywords (k_id, keyword)                    1072600 rows

A fiche is linked to any kind of document. The engine table counts how many times a keyword appears in a document.

A query to search on one or two keywords is quick to execute (the front-end creates those queries):

select count (distinct f.f_id) as results
FROM fiches f
  INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
  INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)
WHERE TRUE
  AND k1.keyword like 'maintenance%'
  AND k2.keyword like 'exploitation%'
;

QUERY PLAN
Aggregate  (cost=3953.00..3953.00 rows=1 width=4) (actual time=525.243..525.243 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..3952.99 rows=1 width=4) (actual time=211.570..513.758 rows=6879 loops=1)
        ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) (actual time=211.483..389.340 rows=6879 loops=1)
              Hash Cond: ("outer".f_id = "inner".f_id)
              ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.132..155.499 rows=9520 loops=1)
                    ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.078..1.887 rows=75 loops=1)
                          Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
                          Filter: ((keyword)::text ~~ 'exploitation%'::text)
                    ->  Index Scan using k_id_key on engine e2  (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.049..1.842 rows=127 loops=75)
                          Index Cond: (e2.k_id = "outer".k_id)
              ->  Hash  (cost=1974.76..1974.76 rows=11 width=4) (actual time=211.203..211.203 rows=0 loops=1)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.296..197.590 rows=11183 loops=1)
                          ->  Index Scan using keyword_pattern_key on keywords k1  (cost=0.00..3.51 rows=1 width=4) (actual time=0.189..1.351 rows=73 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 'maintenance'::character varying) AND ((keyword)::text ~<~ 'maintenancf'::character varying))
                                Filter: ((keyword)::text ~~ 'maintenance%'::text)
                          ->  Index Scan using k_id_key on engine e1  (cost=0.00..1954.93 rows=1306 width=8) (actual time=0.029..2.406 rows=153 loops=73)
                                Index Cond: (e1.k_id = "outer".k_id)
        ->  Index Scan using fiches_pkey on fiches f  (cost=0.00..3.36 rows=1 width=4) (actual time=0.013..0.014 rows=1 loops=6879)
              Index Cond: (f.f_id = "outer".f_id)
Total runtime: 525.511 ms

But when there are three keywords or more, the planner chooses to perform a very costly nested loop:

select count (distinct f.f_id) as results
FROM fiches f
  INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
  INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)
  INNER JOIN engine e3 INNER JOIN keywords k3 USING (k_id) USING (f_id)
WHERE TRUE
  AND k1.keyword like 'maintenance%'
  AND k2.keyword like 'exploitation%'
  AND k3.keyword like 'numerique%'
;

QUERY PLAN
Aggregate  (cost=5927.90..5927.90 rows=1 width=4) (actual time=673048.168..673048.169 rows=1 loops=1)
  ->  Nested Loop  (cost=1974.79..5927.90 rows=1 width=4) (actual time=1853.789..673038.065 rows=2929 loops=1)
        ->  Nested Loop  (cost=1974.79..5924.52 rows=1 width=12) (actual time=1853.719..672881.725 rows=2929 loops=1)
              Join Filter: ("inner".f_id = "outer".f_id)
              ->  Hash Join  (cost=1974.79..3949.62 rows=1 width=8) (actual time=198.845..441.947 rows=6879 loops=1)
                    Hash Cond: ("outer".f_id = "inner".f_id)
                    ->  Nested Loop  (cost=0.00..1974.76 rows=11 width=4) (actual time=0.129..199.895 rows=9520 loops=1)
                          ->  Index Scan using keyword_pattern_key on keywords k2  (cost=0.00..3.51 rows=1 width=4) (actual time=0.077..1.918 rows=75 loops=1)
                                Index Cond: (((keyword)::text ~>=~ 'exploitation'::character varying) AND ((keyword)::text ~<~ 'exploitatioo'::character varying))
                                Filter: ((keyword)::text ~~ 'exploitation%'::text)
                          ->  Index Scan using k_id_key on engine e2  (cost=0.00..1954.93 rows=1306 width=8) (actual
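[Editor's sketch, not part of the original mail: the plans above estimate rows=1 where thousands of rows actually come back, which is the usual reason the planner gambles on a nested loop. Two hedged experiments, using only the table and column names from the post: disable nested loops for one session to compare timings, and raise the statistics target on keywords.keyword so the estimates improve without forcing the plan.]

```sql
-- Per-session experiment: penalize nested loops, then compare the plan.
SET enable_nestloop = off;

EXPLAIN ANALYZE
select count (distinct f.f_id) as results
FROM fiches f
  INNER JOIN engine e1 INNER JOIN keywords k1 USING (k_id) USING (f_id)
  INNER JOIN engine e2 INNER JOIN keywords k2 USING (k_id) USING (f_id)
  INNER JOIN engine e3 INNER JOIN keywords k3 USING (k_id) USING (f_id)
WHERE k1.keyword like 'maintenance%'
  AND k2.keyword like 'exploitation%'
  AND k3.keyword like 'numerique%';

SET enable_nestloop = on;

-- If the misestimate is the real problem, better statistics may fix the
-- plan without forcing it (statistics target value is an illustration):
ALTER TABLE keywords ALTER COLUMN keyword SET STATISTICS 1000;
ANALYZE keywords;
```

If the forced hash-join plan is much faster, the row estimates (not the executor) are the thing to fix.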
Re: [PERFORM] Nested loop question
On Tuesday 16 December 2003 17:06, Nick Fankhauser - Doxpop wrote:
> Hi- I'm trying to optimize a query that I *think* should run very fast.
> Essentially, I'm joining two tables that have very selective indexes and
> constraining the query on an indexed field. (There's a third small lookup
> table in the mix, but it doesn't really affect the bottom line.)
>
> I'm unsure what is happening next. I notice that an index scan is
> occurring on actor_summary_pk, with an actual time of 9.15, but then it
> looks like a nested loop occurs at the next level to join these tables.
> Does this mean that each probe of the actor_summary index will take 9.15
> msec, but the nested loop is going to do this once for each actor_id?

That's right - you need to multiply the actual time by the number of loops. In your case this would seem to be about 33 seconds.

>   ->  Index Scan using actor_summary_pk on actor_summary
>         (cost=0.00..8.11 rows=1 width=72)
>         (actual time=9.14..9.15 rows=1 loops=3639)
>         Index Cond: ("outer".actor_id = actor_summary.actor_id)
>
> The nested loop appears to be where most of my time is going, so I'm
> focusing on this area, but don't know if there is a better approach to
> this join.
>
> Is there a more efficient means than a nested loop to handle such a
> join? Would a different method be chosen if there was exactly one row in
> actor_summary for every row in actor?

Hmm - tricky to say in your case. PG has decided to filter on actor then look up the corresponding values in actor_summary. Given that you have 3 million rows in both tables that seems a reasonable approach. You could always try forcing different plans by switching the various ENABLE_HASHJOIN etc options (see the runtime configuration section of the manuals). I'm not sure that will help you here though.

The fact that it's taking you 9ms to do each index lookup suggests to me that it's going to disk each time. Does that sound plausible, or do you think you have enough RAM to cache your large indexes?
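[Editor's note, not part of the original mail: the "about 33 seconds" figure follows directly from the plan node above — in EXPLAIN ANALYZE output the actual time is per iteration, so it must be multiplied by loops to get the node's total cost in wall-clock time.]

```sql
-- 9.15 ms per index probe, repeated once per outer row (loops=3639):
SELECT 9.15 * 3639 AS total_ms;  -- 33296.85 ms, i.e. roughly 33 seconds
```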
-- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Nested loop performance
> It seems that your basic problem is that you're fetching lots of rows
> from two big ol' tables. It doesn't seem to me that there would be a
> substantially better plan for this query with your tables as they stand.

That's more or less the conclusion I had come to. I was just hoping someone else could point out an approach I've been missing. (sigh!)

> If your data were more normalised, then your big scans might be quicker
> (because their rows would be smaller so they would hit fewer disk pages),

This started off as a 5-table join on well-normalized data. Unfortunately, the actor table doesn't get any smaller, and the work involved in calculating the case_count information on the fly was clearly becoming a problem - particularly with actors that had a heavy caseload. (Busy attorneys and judges.)

The actor_summary approach makes these previous problem cases go away, but the payback is that (as you correctly pointed out) queries on average citizens who only have one case suffer from the de-normalized approach.

We're currently considering the approach of just returning all of the rows to our application, and doing the aggregation and limit work in the app. The inconsistency of the data makes it very tough for the query planner to come up with a strategy that is always a winner.

Thanks for your thoughts!

-Nick

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] Nested loop performance
Hi- I'm trying to optimize a query that I *think* should run very fast. Essentially, I'm joining two tables that have very selective indexes and constraining the query on an indexed field. (There's a third small lookup table in the mix, but it doesn't really affect the bottom line.)

actor is a table containing roughly 3 million rows with an index on actor_full_name_uppercase and a unique index on actor_id.

actor_summary also contains roughly 3 million rows. Its PK is a unique combined index on (actor_id, county_id, case_disp_global_code). The vast majority of the rows in actor correspond to a single row in actor_summary - I'd estimate this at 95% or more. The remaining actors with multiple records generally have two corresponding rows in actor_summary. actor_summary was created as a performance enhancer, where we can store some pre-calculated values such as the number of court cases an actor is involved in.

The constraint is applied first, with reasonable speed. In the example below, it takes about 15 seconds to gather the matches in actor.

I'm unsure what is happening next. I notice that an index scan is occurring on actor_summary_pk, with an actual time of 9.15, but then it looks like a nested loop occurs at the next level to join these tables. Does this mean that each probe of the actor_summary index will take 9.15 msec, but the nested loop is going to do this once for each actor_id?

The nested loop appears to be where most of my time is going, so I'm focusing on this area, but don't know if there is a better approach to this join.

Is there a more efficient means than a nested loop to handle such a join? Would a different method be chosen if there was exactly one row in actor_summary for every row in actor?
-Nick

The query explain analyze (psql prompts stripped):

explain analyze
select
  min(actor.actor_id) as actor_id,
  min(actor.actor_entity_type) as actor_entity_type,
  min(actor.role_class_code) as role_class_code,
  min(actor.actor_full_name) as actor_full_name,
  min(actor.actor_person_date_of_birth) as actor_person_date_of_birth,
  min(actor.actor_entity_acronym) as actor_entity_acronym,
  min(actor.actor_person_last_name) as actor_person_last_name,
  min(actor.actor_person_first_name) as actor_person_first_name,
  min(actor.actor_person_middle_name) as actor_person_middle_name,
  min(actor.actor_person_name_suffix) as actor_person_name_suffix,
  min(actor.actor_person_place_of_birth) as actor_person_place_of_birth,
  min(actor.actor_person_height) as actor_person_height,
  min(actor.actor_person_height_unit) as actor_person_height_unit,
  min(actor.actor_person_weight) as actor_person_weight,
  min(actor.actor_person_weight_unit) as actor_person_weight_unit,
  min(actor.actor_person_ethnicity) as actor_person_ethnicity,
  min(actor.actor_person_citizenship_count) as actor_person_citizenship_count,
  min(actor.actor_person_hair_color) as actor_person_hair_color,
  min(actor.actor_person_scars_marks_tatto) as actor_person_scars_marks_tatto,
  min(actor.actor_person_marital_status) as actor_person_marital_status,
  min(actor.actor_alias_for_actor_id) as actor_alias_for_actor_id,
  min(to_char(data_source.source_last_update, 'MM/DD/ HH12:MI AM TZ')) as last_update,
  min(actor_summary.single_case_public_id) as case_public_id,
  min(actor_summary.single_case_id) as case_id,
  sum(actor_summary.case_count) as case_count
from
  actor,
  actor_summary,
  data_source
where
  actor.actor_id = actor_summary.actor_id
  and data_source.source_id = actor.source_id
  and actor_full_name_uppercase like upper('sanders%')
group by
  actor.actor_id
order by
  min(actor.actor_full_name_uppercase),
  case_count desc,
  min(actor_summary.case_disp_global_code)
limit
  1000
;

QUERY PLAN
---------------------------------------------------------------
Limit  (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.90 rows=1000 loops=1)
  ->  Sort  (cost=2555.58..2555.59 rows=1 width=547) (actual time=48841.76..48842.18 rows=1001 loops=1)
        Sort Key: min((actor.actor_full_name_uppercase)::text), sum(actor_summary.case_count), min((actor_summary.case_disp_global_code)::text)
        ->  Aggregate  (cost=2555.50..2555.57 rows=1 width=547) (actual time=48604.17..48755.28 rows=3590 loops=1)
              ->  Group  (cost=2555.50..2555.50 rows=1 width=547) (actual time=48604.04..48647.91 rows=3594 loops=1)
                    ->  Sort
Re: [PERFORM] Nested loop performance
On Tue, 16 Dec 2003, Nick Fankhauser wrote:

> Is there a more efficient means than a nested loop to handle such a
> join? Would a different method be chosen if there was exactly one row in
> actor_summary for every row in actor?

As a question, what does explain analyze give you if you set enable_nestloop=false; before trying the query?
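[Editor's sketch, not part of the original exchange: the suggested experiment, spelled out against a cut-down version of the query from the earlier post (the long min() column list is shortened to the columns that drive the join; names are taken from the post).]

```sql
-- Per-session setting; it does not affect other backends.
SET enable_nestloop = false;

EXPLAIN ANALYZE
select actor.actor_id,
       sum(actor_summary.case_count) as case_count
from actor, actor_summary
where actor.actor_id = actor_summary.actor_id
  and actor_full_name_uppercase like upper('sanders%')
group by actor.actor_id;

SET enable_nestloop = true;
```

Comparing this plan's total time against the original nested-loop plan shows whether a merge or hash join would actually win here.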