Re: [PERFORM] Any advice tuning this query ?

2016-11-12 Thread Jeff Janes
On Fri, Nov 11, 2016 at 7:19 AM, Henrik Ekenberg  wrote:

> Hi,
>
> I have a select that moves around a lot of data and takes a long time.
> Any advice on tuning this query?
>
> EXPLAIN (ANALYZE ON, BUFFERS ON)
>

When a query touches a lot of data, the act of collecting timing for every
node can make it run more than twice as long, and it distorts the very
timings it collects.

Try running the same query like this:

EXPLAIN (ANALYZE ON, BUFFERS ON, timing off)

If the Execution times are very similar either way, then you don't have
this problem.  But if they differ, then you can't trust the per-node
timings reported when timing is turned on.  Large sorts are particularly
prone to this problem.
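
For example (a sketch, with the full query from your original post in place
of the "..."), run both forms and compare only the final Execution time lines:

    EXPLAIN (ANALYZE ON, BUFFERS ON, TIMING ON)   -- per-node timings, higher overhead
    select d.books, d.date publish_date, ... group by 1,2,3,4;

    EXPLAIN (ANALYZE ON, BUFFERS ON, TIMING OFF)  -- node timings suppressed, totals still reported
    select d.books, d.date publish_date, ... group by 1,2,3,4;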

More than half the time (if the times are believable) goes to scanning the
index activations_books_date.  You might be better off with a sort rather
than an index scan.  You can test this by doing:

begin;
drop index activations_books_date;
<the EXPLAIN ANALYZE of your query goes here>;
rollback;

Don't do that on a production server, as it will block other access to the
table for the duration of the transaction.
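
Spelled out with the query from your original post, the whole test is a
sketch like this (everything is undone by the rollback, but the exclusive
lock on the table is held until then):

    begin;
    drop index activations_books_date;
    explain (analyze on, buffers on, timing off)
    select
        d.books,
        d.date publish_date,
        extract(dow from d.date) publish_dow,
        week_num_fixed,
        coalesce(sum(case when i.invno is not null then 1 else 0 end),0) as daily_cnt,
        coalesce(sum(i.activation_amount_sek),0) as daily_amt_sek
    from dates_per_books d
    left join publishing_data i on (d.books = i.books and d.date = i.publish_date)
    group by 1,2,3,4;
    rollback;   -- restores the index; nothing is changed permanently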


You might also benefit from hash joins/aggregates, but you would have to
set work_mem to a very large value to get them.  I'd start by setting
work_mem in your session to 1TB and seeing if that changes the explain plan
(just explain, not explain analyze!).  If that produces the hash
joins/aggregates, then keep lowering work_mem until you find the minimum
that still produces the hash plans.  Then ponder whether it is safe to use
that much work_mem "for real", given your RAM and level of concurrent access.
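
A sketch of that process (the settings are session-local, and with plain
EXPLAIN nothing is executed, so even the absurd-looking 1TB value is safe
to try):

    set work_mem = '1TB';    -- session only; work_mem is a ceiling, nothing is allocated up front
    explain select ... ;     -- does the plan now show Hash Join / HashAggregate nodes?
    set work_mem = '64GB';   -- keep lowering and re-running plain EXPLAIN
    explain select ... ;     -- find the smallest value that still produces the hash plan
    reset work_mem;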

Cheers,

Jeff


Re: [PERFORM] Any advice tuning this query ?

2016-11-11 Thread Andreas Karlsson
I have a couple of suggestions that should lead to some minor
improvements, but in general I am surprised by the huge size of the
result set. Is your goal really to get a 43 million row result? When a
query returns that many rows, all possible query plans are usually more
or less bad.


1) You can remove "3" from the group by clause: extract(dow from d.date) is
derived from d.date, which we already sort by, so grouping on it only adds
sort-key width.


2) If (books, date) is the primary key of dates_per_books, we can also
safely remove "4" from the group by clause, further reducing the length
of the keys that we need to sort.


3) For a minor speed-up, change "coalesce(sum(case when i.invno is not
null then 1 else 0 end),0)" to "count(i.invno)"; count() already ignores NULLs.
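
Putting the three suggestions together, the query could look like the
sketch below (it assumes (books, date) really is the primary key of
dates_per_books; if it is not, keep week_num_fixed in the group by):

    select
        d.books,
        d.date publish_date,
        extract(dow from d.date) publish_dow,
        week_num_fixed,
        count(i.invno) as daily_cnt,    -- counts only rows where invno is not null
        coalesce(sum(i.activation_amount_sek),0) as daily_amt_sek
    from dates_per_books d
    left join publishing_data i on (d.books = i.books and d.date = i.publish_date)
    group by 1, 2;   -- publish_dow is derived from d.date; week_num_fixed depends on the assumed primary key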


Andreas




Re: [PERFORM] Any advice tuning this query ?

2016-11-11 Thread Devrim Gündüz

Hi,

On Fri, 2016-11-11 at 16:19 +0100, Henrik Ekenberg wrote:
>  Sort Method: external merge  Disk: 16782928kB

This query is generating a ~16 GB temp file on disk. Is this really the
amount of data you want to sort?

Regards,
-- 
Devrim GÜNDÜZ
EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Twitter: @DevrimGunduz , @DevrimGunduzTR




[PERFORM] Any advice tuning this query ?

2016-11-11 Thread Henrik Ekenberg

Hi,

I have a select that moves around a lot of data and takes a long time.
Any advice on tuning this query?

EXPLAIN (ANALYZE ON, BUFFERS ON)
    select
        d.books,
        d.date publish_date,
        extract(dow from d.date) publish_dow,
        week_num_fixed,
        coalesce(sum(case when i.invno is not null then 1 else 0 end),0) as daily_cnt,
        coalesce(sum(i.activation_amount_sek),0) as daily_amt_sek
    from dates_per_books d
    left join publishing_data i on (d.books = i.books and d.date = i.publish_date)
    group by 1,2,3,4;

( explain : https://explain.depesz.com/s/aDOi )

                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=44606264.52..48172260.66 rows=4318263 width=68) (actual time=839980.887..1029679.771 rows=43182733 loops=1)
   Group Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
   Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
   I/O Timings: read=399828.103
   ->  Sort  (cost=44606264.52..45104896.89 rows=199452945 width=48) (actual time=839980.840..933883.311 rows=283894005 loops=1)
         Sort Key: d.books, d.date, (date_part('dow'::text, (d.date)::timestamp without time zone)), d.week_num_fixed
         Sort Method: external merge  Disk: 16782928kB
         Buffers: shared hit=3, local hit=10153260 read=165591641, temp read=2097960 written=2097960
         I/O Timings: read=399828.103
         ->  Merge Left Join  (cost=191.15..13428896.40 rows=199452945 width=48) (actual time=0.031..734937.112 rows=283894005 loops=1)
               Merge Cond: ((d.books = i.books) AND (d.date = i.publish_date))
               Buffers: local hit=10153260 read=165591641
               I/O Timings: read=399828.103
               ->  Index Scan using books_date on dates_per_books d  (cost=0.56..1177329.91 rows=43182628 width=20) (actual time=0.005..33789.216 rows=43182733 loops=1)
                     Buffers: local hit=10 read=475818
                     I/O Timings: read=27761.376
               ->  Index Scan using activations_books_date on publishing_data i  (cost=0.57..7797117.25 rows=249348384 width=32) (actual time=0.004..579806.706 rows=249348443 loops=1)
                     Buffers: local hit=10153250 read=165115823
                     I/O Timings: read=372066.727
 Planning time: 2.864 ms
 Execution time: 1034284.193 ms
(21 rows)
