Re: Parallel Query

2019-11-14 Thread Luís Roberto Weck

On 13/11/2019 19:08, Jeff Janes wrote:
On Wed, Nov 13, 2019 at 3:59 PM Luís Roberto Weck
<luisrobe...@siscobra.com.br> wrote:




Indeed, reducing the costs made the query run in parallel, but the
improvement in speed was not worth the cost (CPU).


Could you show the plan for that?

Sure:

Finalize GroupAggregate  (cost=842675.56..1017018.29 rows=3470567 width=14) (actual time=61419.510..65635.188 rows=86 loops=1)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  Filter: (count(*) > 1)
  Rows Removed by Filter: 6787359
  ->  Gather Merge  (cost=842675.56..947606.94 rows=3470568 width=22) (actual time=51620.609..60648.085 rows=6787506 loops=1)
        Workers Planned: 4
        Workers Launched: 4
        ->  Partial GroupAggregate  (cost=842575.50..862097.45 rows=867642 width=22) (actual time=51585.526..53477.065 rows=1357501 loops=5)
              Group Key: c.concod, cp.conparnum, cp.conpardatven
              ->  Sort  (cost=842575.50..844744.61 rows=867642 width=14) (actual time=51585.514..51951.984 rows=1357506 loops=5)
                    Sort Key: c.concod, cp.conparnum, cp.conpardatven
                    Sort Method: quicksort  Memory: 112999kB
                    ->  Hash Join  (cost=34390.13..756996.76 rows=867642 width=14) (actual time=1087.591..49744.673 rows=1357506 loops=5)
                          Hash Cond: (cp.concod = c.concod)
                          ->  Parallel Seq Scan on contrato_parcela cp  (cost=0.00..714762.89 rows=2988089 width=14) (actual time=0.077..46674.986 rows=2392501 loops=5)
                          ->  Hash  (cost=23462.75..23462.75 rows=874190 width=8) (actual time=1080.189..1080.189 rows=879841 loops=5)
                                Buckets: 1048576  Batches: 1  Memory Usage: 42561kB
                                ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23462.75 rows=874190 width=8) (actual time=0.141..663.108 rows=879841 loops=5)
                                      Index Cond: (carcod = 100)
                                      Heap Fetches: 35197
Planning time: 1.045 ms
Execution time: 65734.134 ms
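
(For comparison: this parallel run takes 65734 ms versus 65214 ms for the serial plan shown later in the thread, while keeping five processes busy, the leader plus 4 workers, which is why the inner nodes report loops=5. That is the extra CPU cost referred to above.)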


Re: Parallel Query

2019-11-13 Thread Jeff Janes
On Wed, Nov 13, 2019 at 4:01 PM Luís Roberto Weck
<luisrobe...@siscobra.com.br> wrote:

>
> Maybe PostgreSQL can't find a way to calculate HAVING estimates?
>

I wasn't even thinking of the HAVING estimates; I was thinking of just the
raw aggregates.  It can't implement the HAVING until it has the raw aggregates
in hand. But what is the actual row count without the HAVING?  Well, I
notice now this line:

Rows Removed by Filter: 6787359

So the row count of rows=86 is mostly due to the HAVING, not due to the raw
aggregation, a point I overlooked initially.  So the planner is not
mistaken in thinking that a huge number of rows need to be passed up--it is
correct in thinking that.
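
For reference, here is the shape of the aggregate under discussion, reconstructed from the plan nodes in this thread; the actual SQL is in the pastebin links and may differ in details:

SELECT c.concod, cp.conparnum, cp.conpardatven
  FROM contrato c
  JOIN contrato_parcela cp ON cp.concod = c.concod
 WHERE c.carcod = 100
 GROUP BY c.concod, cp.conparnum, cp.conpardatven
HAVING count(*) > 1;  -- drop this line to get the raw aggregate

The arithmetic checks out: 86 output rows plus 6787359 removed by the filter gives 6787445 groups, matching the rows=6787445 reported by the bare Group node in the EXPLAIN ANALYZE elsewhere in the thread.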

Cheers,

Jeff


Re: Parallel Query

2019-11-13 Thread Jeff Janes
On Wed, Nov 13, 2019 at 3:59 PM Luís Roberto Weck
<luisrobe...@siscobra.com.br> wrote:

>
>
> Indeed, reducing the costs made the query run in parallel, but the
> improvement in speed was not worth the cost (CPU).
>

Could you show the plan for that?


Re: Parallel Query

2019-11-13 Thread Luís Roberto Weck



On 13/11/2019 17:40, Jeff Janes wrote:
On Wed, Nov 13, 2019 at 3:11 PM Luís Roberto Weck
<luisrobe...@siscobra.com.br> wrote:


Hi!

Is there a reason query 3 can't use parallel workers? q1 and q2 seem
very similar to it, but they can use up to 4 workers to run faster:

q1: https://pastebin.com/ufkbSmfB
q2: https://pastebin.com/Yt32zRNX
q3: https://pastebin.com/dqh7yKPb

The sort node on q3 takes almost 12 seconds, making the query run in
68 seconds, even if I had set enough work_mem to make it all in
memory.


The third one thinks it is going to find 3454539 output rows.  If it ran 
in parallel, it thinks it would be passing lots of rows up from the 
parallel workers, and charges a high price (parallel_tuple_cost = 0.1) 
for doing so.  So you can try lowering parallel_tuple_cost, or 
figuring out why the estimate is so bad.


Cheers,

Jeff

 Hi Jeff,

I don't think the "HAVING" clause is having any effect on the estimates:

WITHOUT "HAVING":

Group  (cost=1245134.08..1279680.28 rows=3454620 width=14)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  ->  Sort  (cost=1245134.08..1253770.63 rows=3454620 width=14)
        Sort Key: c.concod, cp.conparnum, cp.conpardatven
        ->  Hash Join  (cost=34366.64..869960.70 rows=3454620 width=14)
              Hash Cond: (cp.concod = c.concod)
              ->  Seq Scan on contrato_parcela cp  (cost=0.00..804248.08 rows=11941308 width=14)
              ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8)
                    ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23436.55 rows=874407 width=8)
                          Index Cond: (carcod = 100)

WITH "HAVING":
GroupAggregate  (cost=1245144.88..1322874.51 rows=3454650 width=14)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  Filter: (count(*) > 1)
  ->  Sort  (cost=1245144.88..1253781.51 rows=3454650 width=14)
    Sort Key: c.concod, cp.conparnum, cp.conpardatven
    ->  Hash Join  (cost=34366.64..869968.02 rows=3454650 width=14)
  Hash Cond: (cp.concod = c.concod)
  ->  Seq Scan on contrato_parcela cp (cost=0.00..804255.13 
rows=11941413 width=14)

  ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8)
    ->  Index Only Scan using contrato_iu0004 on 
contrato c  (cost=0.43..23436.55 rows=874407 width=8)

  Index Cond: (carcod = 100)

Maybe PostgreSQL can't find a way to calculate HAVING estimates?


Re: Parallel Query

2019-11-13 Thread Luís Roberto Weck



On 13/11/2019 17:47, Tomas Vondra wrote:

On Wed, Nov 13, 2019 at 05:16:44PM -0300, Luís Roberto Weck wrote:

Hi!

Is there a reason query 3 can't use parallel workers? q1 and q2 seem 
very similar to it, but they can use up to 4 workers to run faster:


q1: https://pastebin.com/ufkbSmfB
q2: https://pastebin.com/Yt32zRNX
q3: https://pastebin.com/dqh7yKPb

The sort node on q3 takes almost 12 seconds, making the query run in 
68 seconds, even if I had set enough work_mem to make it all in memory.




Most likely because it'd actually be slower. The trouble is the
aggregation does not actually reduce the cardinality, or at least the
planner does not expect that - the Sort and GroupAggregate are expected
to produce 3454539 rows. The last step of the aggregation has to receive
and merge data from all workers, which is not exactly free, and if there
is no reduction of cardinality it's likely cheaper to just do everything
in a single process serially.

What does the EXPLAIN ANALYZE output look like without the HAVING clause?

Try setting parallel_setup_cost and parallel_tuple_cost to 0. That might
trigger parallel query.

regards


Tomas,

EXPLAIN:

Group  (cost=1245130.37..1279676.46 rows=3454609 width=14)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  ->  Sort  (cost=1245130.37..1253766.89 rows=3454609 width=14)
        Sort Key: c.concod, cp.conparnum, cp.conpardatven
        ->  Hash Join  (cost=34366.64..869958.26 rows=3454609 width=14)
              Hash Cond: (cp.concod = c.concod)
              ->  Seq Scan on contrato_parcela cp  (cost=0.00..804245.73 rows=11941273 width=14)
              ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8)
                    ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23436.55 rows=874407 width=8)
                          Index Cond: (carcod = 100)

EXPLAIN ANALYZE:

Group  (cost=1245132.29..1279678.44 rows=3454615 width=14) (actual time=61860.985..64852.579 rows=6787445 loops=1)
  Group Key: c.concod, cp.conparnum, cp.conpardatven
  ->  Sort  (cost=1245132.29..1253768.83 rows=3454615 width=14) (actual time=61860.980..63128.557 rows=6787531 loops=1)
        Sort Key: c.concod, cp.conparnum, cp.conpardatven
        Sort Method: external merge  Disk: 172688kB
        ->  Hash Join  (cost=34366.64..869959.48 rows=3454615 width=14) (actual time=876.428..52675.140 rows=6787531 loops=1)
              Hash Cond: (cp.concod = c.concod)
              ->  Seq Scan on contrato_parcela cp  (cost=0.00..804246.91 rows=11941291 width=14) (actual time=0.010..44860.242 rows=11962505 loops=1)
              ->  Hash  (cost=23436.55..23436.55 rows=874407 width=8) (actual time=874.791..874.791 rows=879841 loops=1)
                    Buckets: 1048576  Batches: 1  Memory Usage: 42561kB
                    ->  Index Only Scan using contrato_iu0004 on contrato c  (cost=0.43..23436.55 rows=874407 width=8) (actual time=0.036..535.897 rows=879841 loops=1)
                          Index Cond: (carcod = 100)
                          Heap Fetches: 144438
Planning time: 1.252 ms
Execution time: 65214.007 ms
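
The "Sort Method: external merge  Disk: 172688kB" line shows the serial sort spilling about 169 MB to disk. A session-level work_mem increase should keep the sort in memory; a minimal sketch, with an illustrative size not taken from the thread:

SET work_mem = '256MB';
-- re-running the query should then show the Sort node using an
-- in-memory quicksort instead of "external merge"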


Indeed, reducing the costs made the query run in parallel, but the 
improvement in speed was not worth the cost (CPU).


Re: Parallel Query

2019-11-13 Thread Tomas Vondra

On Wed, Nov 13, 2019 at 05:16:44PM -0300, Luís Roberto Weck wrote:

Hi!

Is there a reason query 3 can't use parallel workers? q1 and q2 seem 
very similar to it, but they can use up to 4 workers to run faster:


q1: https://pastebin.com/ufkbSmfB
q2: https://pastebin.com/Yt32zRNX
q3: https://pastebin.com/dqh7yKPb

The sort node on q3 takes almost 12 seconds, making the query run in 
68 seconds, even if I had set enough work_mem to make it all in memory.




Most likely because it'd actually be slower. The trouble is the
aggregation does not actually reduce the cardinality, or at least the
planner does not expect that - the Sort and GroupAggregate are expected
to produce 3454539 rows. The last step of the aggregation has to receive
and merge data from all workers, which is not exactly free, and if there
is no reduction of cardinality it's likely cheaper to just do everything
in a single process serially.

What does the EXPLAIN ANALYZE output look like without the HAVING clause?

Try setting parallel_setup_cost and parallel_tuple_cost to 0. That might
trigger parallel query.
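
A minimal session-level sketch of that suggestion (q3 itself is in the pastebin links above):

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
-- then run EXPLAIN ANALYZE on q3 again to see whether a parallel
-- plan is chosen, and whether it is actually any faster

These zeroed costs are a diagnostic for the session only, not a recommended permanent setting.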

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Parallel Query

2019-11-13 Thread Jeff Janes
On Wed, Nov 13, 2019 at 3:11 PM Luís Roberto Weck
<luisrobe...@siscobra.com.br> wrote:

> Hi!
>
> Is there a reason query 3 can't use parallel workers? q1 and q2 seem
> very similar to it, but they can use up to 4 workers to run faster:
>
> q1: https://pastebin.com/ufkbSmfB
> q2: https://pastebin.com/Yt32zRNX
> q3: https://pastebin.com/dqh7yKPb
>
> The sort node on q3 takes almost 12 seconds, making the query run in 68
> seconds, even if I had set enough work_mem to make it all in memory.
>

The third one thinks it is going to find 3454539 output rows.  If it ran in
parallel, it thinks it would be passing lots of rows up from the parallel
workers, and charges a high price (parallel_tuple_cost = 0.1) for doing
so.  So you can try lowering parallel_tuple_cost, or figuring out why the
estimate is so bad.
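
(For scale: at the default parallel_tuple_cost of 0.1, passing the estimated 3454539 rows up through a Gather node adds roughly 345454 cost units to the parallel plan, which is enough to make it lose the cost comparison to the serial plan.)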

Cheers,

Jeff


Parallel Query

2019-11-13 Thread Luís Roberto Weck

Hi!

Is there a reason query 3 can't use parallel workers? q1 and q2 seem 
very similar to it, but they can use up to 4 workers to run faster:


q1: https://pastebin.com/ufkbSmfB
q2: https://pastebin.com/Yt32zRNX
q3: https://pastebin.com/dqh7yKPb

The sort node on q3 takes almost 12 seconds, making the query run in 68 
seconds, even if I had set enough work_mem to make it all in memory.


Running version 10.10.




Re: why does this query not use a parallel query

2018-03-02 Thread Tom Lane
Dave Cramer <davecra...@gmail.com> writes:
> Have a query:
> explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id =
> 55351078;

>  Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118
> (cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105
> rows=4625123 loops=1)

I don't think we have parallel IOS (index-only scans) yet (I might be wrong).
If so, it probably thinks this is cheaper than the best available parallel plan.

> If I just get the count it will use a parallel query

Likely a parallelized aggregation.
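
With count(), each worker can aggregate its own share of the rows and the leader only combines a handful of partial results, so almost nothing has to cross the Gather node. The plan shape is typically something like this (an illustrative sketch, not output from the poster's system; the scan type depends on version and costs):

Finalize Aggregate
  ->  Gather
        Workers Planned: N
        ->  Partial Aggregate
              ->  Parallel Seq Scan on mob_player_mob_118
                    Filter: (player_id = 55351078)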

regards, tom lane



why does this query not use a parallel query

2018-03-02 Thread Dave Cramer
Have a query:

explain analyze SELECT minion_id FROM mob_player_mob_118 WHERE player_id =
55351078;

                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using mob_player_mob_118_pkey on mob_player_mob_118  (cost=0.44..117887.06 rows=4623076 width=4) (actual time=0.062..3716.105 rows=4625123 loops=1)
   Index Cond: (player_id = 55351078)
   Heap Fetches: 1152408
 Planning time: 0.241 ms
 Execution time: 5272.171 ms

If I just get the count it will use a parallel query

explain analyze SELECT count(minion_id) FROM mob_player_mob_118 WHERE
player_id = 55351078;
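
One way to check whether any parallel plan is available for the first query is to make parallelism as cheap as possible for the session; a diagnostic sketch, not from the original message, using standard PostgreSQL settings:

SET max_parallel_workers_per_gather = 4;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN SELECT minion_id FROM mob_player_mob_118 WHERE player_id = 55351078;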

Thanks

Dave Cramer