Re: [PERFORM] Performance issue with NestedLoop query
On Tue, Aug 4, 2015 at 8:40 PM, Ram N yrami...@gmail.com wrote: Thanks much for responding guys. I have tried both, building multi column indexes and GIST, with no improvement. I have reduced the window from 180 days to 30 days and below are the numbers Composite index - takes 30 secs With Btree indexing - takes 9 secs With GIST - takes 30 secs with kind of materialize plan in explain Any other ideas I can do for window based joins. From this query: select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts b.start_date and a.ts b.end_date and a.ts '2015-01-01 20:50:44.00 +00:00:00' and a.ts '2015-07-01 19:50:44.00 +00:00:00' group by a.ts, st order by a.ts We can actually derive that b.start_date '2015-07-01 19:50:44.00 +00:00:00' and b.end_date '2015-01-01 20:50:44.00 +00:00:00'. If we add these two predicates to the original query, does it help? Thanks, Qingqing -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issue with NestedLoop query
Thanks much for responding guys. I have tried both, building multi column indexes and GIST, with no improvement. I have reduced the window from 180 days to 30 days and below are the numbers Composite index - takes 30 secs With Btree indexing - takes 9 secs With GIST - takes 30 secs with kind of materialize plan in explain Any other ideas I can do for window based joins. --yr On Fri, Jul 31, 2015 at 11:37 AM, Qingqing Zhou zhouqq.postg...@gmail.com wrote: On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote: Thanks Qingqing for responding. That didn't help. It in fact increased the scan time. Looks like a lot of time is being spent on the NestedLoop Join than index lookups though I am not sure how to optimize the join. Good news is that optimizer is right this time :-). The NLJ here does almost nothing but schedule each outer row to probing the inner index. So the index seek is the major cost. Have you tried build a two column index on (b.start_date, b.end_date)? Regards, Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 4:51 AM, Ram N yrami...@gmail.com wrote: select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts b.start_date and a.ts b.end_date and a.ts '2015-01-01 20:50:44.00 +00:00:00' and a.ts '2015-07-01 19:50:44.00 +00:00:00' group by a.ts, st order by a.ts You could try to use a range type: CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()')); Then: select sum(a), count(id), a.ts, st from table1 a, table2 b where tstzrange(b.start_date, b.end_date, '()') @ a.ts and a.ts '2015-07-01 19:50:44.00 +00:00:00' group by a.ts, st order by a.ts Regards, -- Matheus de Oliveira
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote: Thanks Qingqing for responding. That didn't help. It in fact increased the scan time. Looks like a lot of time is being spent on the NestedLoop Join than index lookups though I am not sure how to optimize the join. Good news is that optimizer is right this time :-). The NLJ here does almost nothing but schedule each outer row to probing the inner index. So the index seek is the major cost. Have you tried build a two column index on (b.start_date, b.end_date)? Regards, Qingqing -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Performance issue with NestedLoop query
On Fri, Jul 31, 2015 at 3:06 PM, Matheus de Oliveira matioli.math...@gmail.com wrote: CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date, '()')); The index should be USING GIST, not GIN. Sorry. -- Matheus de Oliveira
Re: [PERFORM] Performance issue with NestedLoop query
Thanks Qingqing for responding. That didn't help. It in fact increased the scan time. Looks like a lot of time is being spent on the NestedLoop Join than index lookups though I am not sure how to optimize the join. I am assuming its in memory join, so I am not sure why it should take such a lot of time. Increase work_mem has helped in reducing the processing time but it's still 1 min. --yr On Thu, Jul 30, 2015 at 1:24 PM, Qingqing Zhou zhouqq.postg...@gmail.com wrote: On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote: - Index Scan using end_date_idx on public.table2 b (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274 rows=403936 loops=181) Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date Index Cond: (a.ts b.end_date) Filter: (a.ts b.start_date) Rows Removed by Filter: 392642 In your case, do you have index built for both b.end_date and b.start_date? If so, can you try set enable_index=off to see if bitmap heap scan helps? Regards, Qingqing
Re: [PERFORM] Performance issue with NestedLoop query
On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote: - Index Scan using end_date_idx on public.table2 b (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274 rows=403936 loops=181) Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date Index Cond: (a.ts b.end_date) Filter: (a.ts b.start_date) Rows Removed by Filter: 392642 In your case, do you have index built for both b.end_date and b.start_date? If so, can you try set enable_index=off to see if bitmap heap scan helps? Regards, Qingqing -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Performance issue with NestedLoop query
Hi, I am trying to see if I can do anything to optimize the following plan. I have two tables and I am doing a join between them. After joining it calculates aggregates (Sum and Count) Table 1 : timestamp (one per day) for 2 years (730 records) Table 2 : Window based validity records. Window here means start and end timestamp indicating a period of validity for a record. Hash some 10 odd columns including start_time and end_time. (1 million records) Machine has 244 GB RAM. Queries are taking more than a min and in some case 2-3 mins. Below is the plan I am getting. The Nested loop blows up the number of records and we expect that. I have tried playing around work_mem and cache configs which hasn't helped. Query select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts b.start_date and a.ts b.end_date and a.ts '2015-01-01 20:50:44.00 +00:00:00' and a.ts '2015-07-01 19:50:44.00 +00:00:00' group by a.ts, st order by a.ts Plan (EXPLAIN ANALYZE) Sort (cost=10005447874.54..10005447879.07 rows=1810 width=44) (actual time=178883.936..178884.159 rows=1355 loops=1) Output: (sum(b.a)), (count(b.id)), a.ts, b.st Sort Key: a.ts Sort Method: quicksort Memory: 154kB Buffers: shared hit=47068722 read=102781 I/O Timings: read=579.946 - HashAggregate (cost=10005447758.51..10005447776.61 rows=1810 width=44) (actual time=178882.874..178883.320 rows=1355 loops=1) Output: sum(b.a), count(b.id), a.ts, b.st Group Key: a.ts, b.st Buffers: shared hit=47068719 read=102781 I/O Timings: read=579.946 - Nested Loop (cost=100.43..10004821800.38 rows=62595813 width=44) (actual time=0.167..139484.854 rows=73112419 loops=1) Output: a.ts, b.st, b.a, b.id Buffers: shared hit=47068719 read=102781 I/O Timings: read=579.946 - Seq Scan on public.table1 a (cost=0.00..14.81 rows=181 width=8) (actual time=0.058..0.563 rows=181 loops=1) Output: a.ts Filter: ((a.ts '2015-01-01 20:50:44+00'::timestamp with time zone) AND (a.ts '2015-07-01 19:50:44+00'::timestamp with time zone)) Rows Removed by Filter: 540 Buffers: shared read=4 I/O Timings: read=0.061 - Index Scan using end_date_idx on public.table2 b (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274 rows=403936 loops=181) Output: b.serial_no, b.name, b.st, b.end_date, b.a, b.start_date Index Cond: (a.ts b.end_date) Filter: (a.ts b.start_date) Rows Removed by Filter: 392642 Buffers: shared hit=47068719 read=102777 I/O Timings: read=579.885 Planning time: 0.198 ms Execution time: 178884.467 ms Any pointers on how to go about optimizing this? --yr