Re: [PERFORM] Performance issue with NestedLoop query

2015-08-05 Thread Qingqing Zhou
On Tue, Aug 4, 2015 at 8:40 PM, Ram N yrami...@gmail.com wrote:

 Thanks much for responding, guys. I have tried both building multi-column
 indexes and GiST, with no improvement. I have reduced the window from 180
 days to 30 days, and below are the numbers:

 Composite index - takes 30 secs

 With B-tree indexing - takes 9 secs

 With GiST - takes 30 secs, with a kind of Materialize plan in EXPLAIN

 Any other ideas I can try for window-based joins?


From this query:

select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
b.start_date and a.ts < b.end_date and a.ts > '2015-01-01
20:50:44.00 +00:00:00' and a.ts < '2015-07-01 19:50:44.00
+00:00:00' group by a.ts, st order by a.ts

We can actually derive that b.start_date < '2015-07-01 19:50:44.00
+00:00:00' and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'. If
we add these two predicates to the original query, does it help?
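
Spelled out, the rewritten query would look something like this (a sketch,
keeping the original table and column names):

select sum(a), count(id), a.ts, st
from table1 a, table2 b
where a.ts > b.start_date
  and a.ts < b.end_date
  and a.ts > '2015-01-01 20:50:44.00 +00:00:00'
  and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
  and b.start_date < '2015-07-01 19:50:44.00 +00:00:00'  -- derived predicate
  and b.end_date > '2015-01-01 20:50:44.00 +00:00:00'    -- derived predicate
group by a.ts, st
order by a.ts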

Thanks,
Qingqing




Re: [PERFORM] Performance issue with NestedLoop query

2015-08-04 Thread Ram N
Thanks much for responding, guys. I have tried both building multi-column
indexes and GiST, with no improvement. I have reduced the window from 180
days to 30 days, and below are the numbers:

Composite index - takes 30 secs

With B-tree indexing - takes 9 secs

With GiST - takes 30 secs, with a kind of Materialize plan in EXPLAIN

Any other ideas I can try for window-based joins?

--yr


On Fri, Jul 31, 2015 at 11:37 AM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:
 
 Thanks Qingqing for responding. That didn't help. It in fact increased the
 scan time. It looks like a lot of time is being spent on the Nested Loop
 join rather than on index lookups, though I am not sure how to optimize
 the join.
 

 Good news is that the optimizer is right this time :-). The NLJ here does
 almost nothing but schedule each outer row for probing the inner index,
 so the index seek is the major cost.

 Have you tried building a two-column index on (b.start_date, b.end_date)?

 Regards,
 Qingqing



Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Matheus de Oliveira
On Thu, Jul 30, 2015 at 4:51 AM, Ram N yrami...@gmail.com wrote:

 select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
 b.start_date and a.ts < b.end_date and a.ts > '2015-01-01 20:50:44.00
 +00:00:00' and a.ts < '2015-07-01 19:50:44.00 +00:00:00' group by a.ts,
 st order by a.ts


You could try to use a range type:

CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date,
'()'));

Then:

select sum(a), count(id), a.ts, st
from table1 a, table2 b
where tstzrange(b.start_date, b.end_date, '()') @> a.ts
and a.ts < '2015-07-01 19:50:44.00 +00:00:00'
group by a.ts, st
order by a.ts

Regards,
-- 
Matheus de Oliveira


Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Qingqing Zhou
On Fri, Jul 31, 2015 at 10:55 AM, Ram N yrami...@gmail.com wrote:

 Thanks Qingqing for responding. That didn't help. It in fact increased the
 scan time. It looks like a lot of time is being spent on the Nested Loop
 join rather than on index lookups, though I am not sure how to optimize
 the join.


Good news is that the optimizer is right this time :-). The NLJ here does
almost nothing but schedule each outer row for probing the inner index,
so the index seek is the major cost.

Have you tried building a two-column index on (b.start_date, b.end_date)?
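
As a minimal sketch (the index name here is just an example):

CREATE INDEX table2_start_end_idx ON table2 (start_date, end_date);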

Regards,
Qingqing




Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Matheus de Oliveira
On Fri, Jul 31, 2015 at 3:06 PM, Matheus de Oliveira 
matioli.math...@gmail.com wrote:

 CREATE INDEX ON table2 USING gin (tstzrange(start_date, end_date,
 '()'));


The index should be USING GIST, not GIN. Sorry.
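
That is, the corrected version of the index from the earlier message would be:

CREATE INDEX ON table2 USING gist (tstzrange(start_date, end_date, '()'));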


-- 
Matheus de Oliveira


Re: [PERFORM] Performance issue with NestedLoop query

2015-07-31 Thread Ram N
Thanks Qingqing for responding. That didn't help. It in fact increased the
scan time. It looks like a lot of time is being spent on the Nested Loop
join rather than on index lookups, though I am not sure how to optimize the
join. I am assuming it's an in-memory join, so I am not sure why it should
take so long. Increasing work_mem has helped in reducing the processing
time, but it's still > 1 min.
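
For reference, work_mem can be raised just for the session running the
query; the value below is only an example and would need tuning:

SET work_mem = '256MB';  -- example value, applies to the current session only
-- re-run the query / EXPLAIN (ANALYZE, BUFFERS) here to compare timings
RESET work_mem;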

--yr

On Thu, Jul 30, 2015 at 1:24 PM, Qingqing Zhou zhouqq.postg...@gmail.com
wrote:

 On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
->  Index Scan using end_date_idx on public.table2 b
  (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274
  rows=403936 loops=181)
  Output: b.serial_no, b.name, b.st, b.end_date, b.a,
  b.start_date
  Index Cond: (a.ts < b.end_date)
  Filter: (a.ts > b.start_date)
  Rows Removed by Filter: 392642

 In your case, do you have index built for both b.end_date and
 b.start_date? If so, can you try

 set enable_indexscan=off

 to see if bitmap heap scan helps?

 Regards,
 Qingqing



Re: [PERFORM] Performance issue with NestedLoop query

2015-07-30 Thread Qingqing Zhou
On Thu, Jul 30, 2015 at 12:51 AM, Ram N yrami...@gmail.com wrote:
   ->  Index Scan using end_date_idx on public.table2 b
 (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274
 rows=403936 loops=181)
 Output: b.serial_no, b.name, b.st, b.end_date, b.a,
 b.start_date
 Index Cond: (a.ts < b.end_date)
 Filter: (a.ts > b.start_date)
 Rows Removed by Filter: 392642

In your case, do you have index built for both b.end_date and
b.start_date? If so, can you try

set enable_indexscan=off

to see if bitmap heap scan helps?
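
A quick way to try this without changing anything globally is to disable
plain index scans for the current session only, run EXPLAIN ANALYZE again,
and compare plans and timings (a sketch):

SET enable_indexscan = off;        -- session-local planner setting
-- EXPLAIN (ANALYZE, BUFFERS) <original query> here
RESET enable_indexscan;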

Regards,
Qingqing




[PERFORM] Performance issue with NestedLoop query

2015-07-30 Thread Ram N
Hi,

I am trying to see if I can do anything to optimize the following plan.

I have two tables and I am doing a join between them. After joining, the
query calculates aggregates (sum and count).
Table 1 : a timestamp (one per day) for 2 years (730 records)
Table 2 : window-based validity records, where a window means a start and
end timestamp indicating the period of validity for a record. It has some
10-odd columns including start_time and end_time (1 million records).
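
For context, here is a rough schema sketch consistent with the query and plan
below (column names are taken from the EXPLAIN output; the data types are
only assumptions):

-- hypothetical reconstruction, for illustration only
CREATE TABLE table1 (
    ts          timestamptz     -- one row per day
    -- ... other columns omitted
);

CREATE TABLE table2 (
    id          bigint,
    serial_no   text,
    name        text,
    st          text,
    a           numeric,        -- summed by the query
    start_date  timestamptz,    -- validity window start
    end_date    timestamptz     -- validity window end
    -- ... about 10 columns in total
);

CREATE INDEX end_date_idx ON table2 (end_date);   -- the index used in the plan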

The machine has 244 GB RAM. Queries are taking more than a minute, and in
some cases 2-3 minutes.

Below is the plan I am getting. The Nested Loop blows up the number of
records, which we expect. I have tried playing around with work_mem and
cache configs, which hasn't helped.

Query
select sum(a), count(id), a.ts, st from table1 a, table2 b where a.ts >
b.start_date and a.ts < b.end_date and a.ts > '2015-01-01 20:50:44.00
+00:00:00' and a.ts < '2015-07-01 19:50:44.00 +00:00:00' group by a.ts,
st order by a.ts

Plan (EXPLAIN ANALYZE)
Sort  (cost=10005447874.54..10005447879.07 rows=1810 width=44) (actual
time=178883.936..178884.159 rows=1355 loops=1)
  Output: (sum(b.a)), (count(b.id)), a.ts, b.st
  Sort Key: a.ts
  Sort Method: quicksort  Memory: 154kB
  Buffers: shared hit=47068722 read=102781
  I/O Timings: read=579.946
  ->  HashAggregate  (cost=10005447758.51..10005447776.61 rows=1810
width=44) (actual time=178882.874..178883.320 rows=1355 loops=1)
Output: sum(b.a), count(b.id), a.ts, b.st
Group Key: a.ts, b.st
Buffers: shared hit=47068719 read=102781
I/O Timings: read=579.946
->  Nested Loop  (cost=100.43..10004821800.38
rows=62595813 width=44) (actual time=0.167..139484.854 rows=73112419
loops=1)
  Output: a.ts, b.st, b.a, b.id
  Buffers: shared hit=47068719 read=102781
  I/O Timings: read=579.946
  ->  Seq Scan on public.table1 a  (cost=0.00..14.81 rows=181
width=8) (actual time=0.058..0.563 rows=181 loops=1)
Output: a.ts
Filter: ((a.ts > '2015-01-01 20:50:44+00'::timestamp
with time zone) AND (a.ts < '2015-07-01 19:50:44+00'::timestamp with time
zone))
Rows Removed by Filter: 540
Buffers: shared read=4
I/O Timings: read=0.061
  ->  Index Scan using end_date_idx on public.table2 b
 (cost=0.43..23181.37 rows=345833 width=52) (actual time=0.063..622.274
rows=403936 loops=181)
Output: b.serial_no, b.name, b.st, b.end_date, b.a,
b.start_date
Index Cond: (a.ts < b.end_date)
Filter: (a.ts > b.start_date)
Rows Removed by Filter: 392642
Buffers: shared hit=47068719 read=102777
I/O Timings: read=579.885
Planning time: 0.198 ms
Execution time: 178884.467 ms

Any pointers on how to go about optimizing this?

--yr