Re: [PERFORM] How to get explain plan to prefer Hash Join
On Wed, Mar 11, 2015 at 5:35 PM, atxcanadian <matthew.bo...@gmail.com> wrote:
> I'm a little perplexed why the autovacuum wasn't keeping up. Any recommendations for those settings to push it to do a bit more analyzing of the tables?

What does pg_stat_user_tables show for that table?
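Something along these lines should show it (a sketch - substitute your actual table name; `dp_hist_gen_actual` is taken from the plans earlier in the thread):

```sql
-- Check when the table was last analyzed, manually or by autovacuum,
-- and how its row counts look to the statistics collector.
SELECT relname,
       last_analyze,       -- last manual ANALYZE
       last_autoanalyze,   -- last analyze run by the autovacuum daemon
       n_live_tup,
       n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'dp_hist_gen_actual';
```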
Re: [PERFORM] How to get explain plan to prefer Hash Join
On Thu, Mar 12, 2015 at 8:59 AM, atxcanadian <matthew.bo...@gmail.com> wrote:
> Here is the output: http://postgresql.nabble.com/file/n5841610/pg_stat_user_table.jpg
> This is after I've manually run analyze.

The last_* columns are only showing times, not full timestamps. Does your reporting tool drop the date part of a timestamp when it is equal to today? Or does it drop the date part altogether, regardless of what it is?

Cheers, Jeff
Re: [PERFORM] How to get explain plan to prefer Hash Join
> Isn't a random_page_cost of 1 a little aggressive?

We are currently set up on Amazon SSD with software RAID 5.

--
View this message in context: http://postgresql.nabble.com/How-to-get-explain-plan-to-prefer-Hash-Join-tp5841450p5841605.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to get explain plan to prefer Hash Join
Here is the output: http://postgresql.nabble.com/file/n5841610/pg_stat_user_table.jpg

This is after I've manually run analyze.
Re: [PERFORM] How to get explain plan to prefer Hash Join
Sorry about that, Excel clipped off the dates.

http://postgresql.nabble.com/file/n5841633/pg_stat_user_table.jpg
[PERFORM] How to get explain plan to prefer Hash Join
Currently seeing a massive increase in performance when the optimizer chooses Hash Join over Nested Loops. I achieve this by temporarily setting nested loops off. I'd like to set up some database variables so that the optimizer prefers hash joins. Any suggestions?

*Query in question:*

explain analyze
select dp.market_day, dp.hour_ending, dp.repeated_hour_flag,
       dp.settlement_point, sum(dp.mw) dp_mw
from dp_hist_gen_actual dp
inner join api_settlement_points sp
    on sp.settlement_point = dp.settlement_point
   and sp.settlement_point_rdfid = '#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'
   and sp.start_date <= '2015-01-01'::date
   and sp.end_date > '2015-01-01'::date
   and sp.rt_model = (select case when c.rt_model_loaded = 2 then true else false end
                      from cim_calendar c
                      where c.nodal_load = '2015-01-01'::date
                      order by c.cim desc
                      limit 1)
where dp.market_day BETWEEN '2015-01-01'::date and '2015-01-01'::date
  and dp.expiry_date is null
group by dp.market_day, dp.hour_ending, dp.repeated_hour_flag, dp.settlement_point;

*Nested Loop Explain Analyze Output:*

HashAggregate  (cost=58369.29..58369.30 rows=1 width=24) (actual time=496287.249..496287.257 rows=24 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.145..0.145 rows=1 loops=1)
          ->  Sort  (cost=8.30..8.78 rows=193 width=9) (actual time=0.145..0.145 rows=1 loops=1)
                Sort Key: c.cim
                Sort Method: top-N heapsort  Memory: 25kB
                ->  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193 width=9) (actual time=0.007..0.075 rows=192 loops=1)
                      Filter: (nodal_load = '2015-01-01'::date)
                      Rows Removed by Filter: 36
  ->  *Nested Loop  (cost=0.99..58360.98 rows=1 width=24) (actual time=883.718..496287.058 rows=24 loops=1)*
        Join Filter: ((dp.settlement_point)::text = (sp.settlement_point)::text)
        Rows Removed by Join Filter: 12312
        ->  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual dp  (cost=0.56..2.78 rows=1 width=24) (actual time=0.020..20.012 rows=12336 loops=1)
              Index Cond: ((market_day >= '2015-01-01'::date) AND (market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
        ->  Index Scan using api_settlement_points_idx on api_settlement_points sp  (cost=0.43..58358.05 rows=12 width=9) (actual time=39.066..40.223 rows=1 loops=12336)
              Index Cond: ((rt_model = $0) AND (start_date <= '2015-01-01'::date) AND (end_date > '2015-01-01'::date))
              Filter: ((settlement_point_rdfid)::text = '#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
              Rows Removed by Filter: 5298
*Total runtime: 496287.325 ms*

*Hash Join Explain Analyze Output:*

HashAggregate  (cost=58369.21..58369.22 rows=1 width=24) (actual time=50.835..50.843 rows=24 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.149..0.149 rows=1 loops=1)
          ->  Sort  (cost=8.30..8.78 rows=193 width=9) (actual time=0.148..0.148 rows=1 loops=1)
                Sort Key: c.cim
                Sort Method: top-N heapsort  Memory: 25kB
                ->  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193 width=9) (actual time=0.009..0.082 rows=192 loops=1)
                      Filter: (nodal_load = '2015-01-01'::date)
                      Rows Removed by Filter: 36
  ->  *Hash Join  (cost=3.23..58360.90 rows=1 width=24) (actual time=49.644..50.811 rows=24 loops=1)*
        Hash Cond: ((sp.settlement_point)::text = (dp.settlement_point)::text)
        ->  Index Scan using api_settlement_points_idx on api_settlement_points sp  (cost=0.43..58358.05 rows=12 width=9) (actual time=39.662..40.822 rows=1 loops=1)
              Index Cond: ((rt_model = $0) AND (start_date <= '2015-01-01'::date) AND (end_date > '2015-01-01'::date))
              Filter: ((settlement_point_rdfid)::text = '#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
              Rows Removed by Filter: 5298
        ->  Hash  (cost=2.78..2.78 rows=1 width=24) (actual time=9.962..9.962 rows=12336 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 684kB
              ->  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual dp  (cost=0.56..2.78 rows=1 width=24) (actual time=0.023..5.962 rows=12336 loops=1)
                    Index Cond: ((market_day >= '2015-01-01'::date) AND (market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))
*Total runtime: 50.906 ms*
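For reference, the "setting nested loops off" workaround I'm using is session-local, roughly like this (a sketch; SET LOCAL keeps the change scoped to one transaction):

```sql
-- Discourage nested loops for this transaction only; the planner can
-- still fall back to them if no other join strategy is possible.
BEGIN;
SET LOCAL enable_nestloop = off;
EXPLAIN ANALYZE
SELECT ...;   -- the problem query
COMMIT;
```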
Re: [PERFORM] How to get explain plan to prefer Hash Join
-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of atxcanadian
Sent: Wednesday, March 11, 2015 1:01 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] How to get explain plan to prefer Hash Join

> Currently seeing massive increase in performance when optimizer chooses Hash Join over Nested Loops. I achieve this by temporarily setting nested loops off. I'd like to setup some database variables where the optimizer prefers hash joins. Any suggestions?

Try making small adjustments to either random_page_cost or cpu_tuple_cost. They can influence the planner's choice here. I have solved similar issues in the past by adjusting one or the other. Be aware, though, that those changes can have negative effects in other places, so be sure to test.

Brad.
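For example, at session level first, so nothing is committed server-wide until you've verified the plans (the values here are only illustrative):

```sql
-- Nudge the cost model, then re-check the plan; repeat per adjustment.
SET random_page_cost = 2.0;    -- default is 4.0; lower favors index scans
SET cpu_tuple_cost  = 0.02;    -- default is 0.01; higher penalizes row-heavy plans
EXPLAIN ANALYZE SELECT ...;    -- the problem query

-- Return to the server defaults when done experimenting.
RESET random_page_cost;
RESET cpu_tuple_cost;
```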
Re: [PERFORM] How to get explain plan to prefer Hash Join
On Wed, Mar 11, 2015 at 10:01 AM, atxcanadian <matthew.bo...@gmail.com> wrote:
> Currently seeing massive increase in performance when optimizer chooses Hash Join over Nested Loops. I achieve this by temporarily setting nested loops off. I'd like to setup some database variables where the optimizer prefers hash joins. Any suggestions?

>         ->  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual dp  (cost=0.56..2.78 rows=1 width=24) (actual time=0.020..20.012 rows=12336 loops=1)

Here it thinks it will find 1 row, but actually finds 12336. That is not conducive to good plans. Has the table been analyzed recently?

>               Index Cond: ((market_day >= '2015-01-01'::date) AND (market_day <= '2015-01-01'::date) AND (expiry_date IS NULL))

If you query just this one table with just these criteria, what do you get for the row estimates and actual rows, with and without the IS NULL condition?

Cheers, Jeff
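I.e., something like this (a sketch, reusing the dates from your plan):

```sql
-- Compare the planner's row estimate against the actual row count for
-- just this table, first with the full criteria...
EXPLAIN ANALYZE
SELECT * FROM dp_hist_gen_actual
WHERE market_day BETWEEN '2015-01-01'::date AND '2015-01-01'::date
  AND expiry_date IS NULL;

-- ...then without the IS NULL condition, to see which clause throws
-- the estimate off.
EXPLAIN ANALYZE
SELECT * FROM dp_hist_gen_actual
WHERE market_day BETWEEN '2015-01-01'::date AND '2015-01-01'::date;
```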
Re: [PERFORM] How to get explain plan to prefer Hash Join
So I implemented two changes:

- Moved random_page_cost from 1.1 to 2.0
- Manually ran analyze on all the tables

*Here is the new explain analyze:*

QUERY PLAN
HashAggregate  (cost=74122.97..74125.53 rows=256 width=24) (actual time=45.205..45.211 rows=24 loops=1)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=8.30..8.30 rows=1 width=9) (actual time=0.152..0.152 rows=1 loops=1)
          ->  Sort  (cost=8.30..8.78 rows=193 width=9) (actual time=0.150..0.150 rows=1 loops=1)
                Sort Key: c.cim
                Sort Method: top-N heapsort  Memory: 25kB
                ->  Seq Scan on cim_calendar c  (cost=0.00..7.33 rows=193 width=9) (actual time=0.008..0.085 rows=192 loops=1)
                      Filter: (nodal_load = '2015-01-01'::date)
                      Rows Removed by Filter: 36
  ->  Nested Loop  (cost=22623.47..74111.47 rows=256 width=24) (actual time=43.798..45.181 rows=24 loops=1)
        ->  Bitmap Heap Scan on api_settlement_points sp  (cost=22622.91..67425.92 rows=12 width=9) (actual time=43.756..43.823 rows=1 loops=1)
              Recheck Cond: ((rt_model = $0) AND (start_date <= '2015-01-01'::date) AND (end_date > '2015-01-01'::date))
              Filter: ((settlement_point_rdfid)::text = '#_{09F3A628-3B9D-481A-AC90-72AF8EAB64CA}'::text)
              Rows Removed by Filter: 5298
              ->  Bitmap Index Scan on api_settlement_points_idx  (cost=0.00..22622.90 rows=72134 width=0) (actual time=42.998..42.998 rows=5299 loops=1)
                    Index Cond: ((rt_model = $0) AND (start_date <= '2015-01-01'::date) AND (end_date > '2015-01-01'::date))
        ->  Index Scan using dp_hist_gen_actual_idx2 on dp_hist_gen_actual dp  (cost=0.56..556.88 rows=25 width=24) (actual time=0.033..1.333 rows=24 loops=1)
              Index Cond: ((market_day >= '2015-01-01'::date) AND (market_day <= '2015-01-01'::date) AND (expiry_date IS NULL) AND ((settlement_point)::text = (sp.settlement_point)::text))
Total runtime: 45.278 ms

I'm a little perplexed why the autovacuum wasn't keeping up. Any recommendations for those settings to push it to do a bit more analyzing of the tables?
Re: [PERFORM] How to get explain plan to prefer Hash Join
2015-03-12 1:35 GMT+01:00 atxcanadian <matthew.bo...@gmail.com>:

> So I implemented two changes.
> - Moved random_page_cost from 1.1 to 2.0

A random_page_cost near 1 can effectively force nested loops - with it, random index access looks extremely cheap to the planner.

> - Manually ran analyze on all the tables
> *Here is the new explain analyze:* [...]
> I'm a little perplexed why the autovacuum wasn't keeping up. Any recommendations for those settings to push it to do a bit more analyzing of the tables?
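One option is to tune autoanalyze per table rather than lowering the global thresholds (a sketch; the numbers are only illustrative and should be tested against your churn rate):

```sql
-- Re-analyze this table after roughly 2% of its rows change, instead
-- of the global default of 10% (autovacuum_analyze_scale_factor = 0.1).
ALTER TABLE dp_hist_gen_actual
  SET (autovacuum_analyze_scale_factor = 0.02,
       autovacuum_analyze_threshold = 500);
```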