Re: Extremely slow HashAggregate in simple UNION query

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 12:41:03 PM PDT, Tom Lane  wrote:
>Jeff Janes  writes:
>> Most of the time is not after the clock stops, but before the stepwise
>> ANALYZE clock starts.  If you just do an EXPLAIN rather than EXPLAIN
>> ANALYZE, that is also slow.  The giant hash table is created during the
>> planning step (or somewhere around there
>
>It's in executor startup, I believe.

I'm not sure I'm personally interested in doing so, but it'd not be hard to 
measure the executor startup time separately, and display it either on a 
per-node basis or as a total number.

I'm quite unconvinced that this thread is a compelling reason to do so (or 
really to do anything). But for some other workloads executor startup is a 
very large fraction of the total time, without massive misestimations. 
Optimizing that could be easier with that information available.
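
In the meantime, a crude way to approximate executor startup cost from psql - 
just a sketch, relying on Jeff's observation quoted above that plain EXPLAIN 
also pays for building the hash table during executor startup - is to compare 
the wall-clock time of a plain EXPLAIN against the Planning Time it reports:

\timing on
EXPLAIN (SUMMARY ON)
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
UNION
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;
-- psql's wall-clock time minus the reported Planning Time is roughly
-- executor startup plus parse/EXPLAIN overhead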

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Extremely slow HashAggregate in simple UNION query

2019-08-24 Thread Jeff Janes
On Thu, Aug 22, 2019 at 1:09 AM Pavel Stehule 
wrote:

> On Thu 22. 8. 2019 at 3:11, Jeff Janes  wrote:
>
>> ...


>> But the same advance in v12 which makes it harder to fool with your test
>> case also opens the possibility of fixing your real case.
>>
>
> I think the long time after query processing is much more interesting - the
> last row was processed in 13ms, but Execution Time was 69ms .. so some
> cleanup takes 56ms - that is pretty long.
>

Most of the time is not after the clock stops, but before the stepwise
ANALYZE clock starts.  If you just do an EXPLAIN rather than EXPLAIN
ANALYZE, that is also slow.  The giant hash table is created during the
planning step (or somewhere around there--I notice that EXPLAIN ANALYZE
output doesn't count it in what it labels as the planning step--but it is
some step that EXPLAIN without ANALYZE does execute, which to me makes it a
planning step).

For me, "perf top" shows kernel's __do_page_fault as the top
function.  tuplehash_iterate does show up at 20% (which I think is
overattributed, considering how little the speedup is when dropping
ANALYZE), but everything else just looks like kernel memory management code.

Cheers,

Jeff


Re: Extremely slow HashAggregate in simple UNION query

2019-08-22 Thread Felix Geisendörfer



> On 21. Aug 2019, at 20:26, Jeff Janes  wrote:
> 
> As noted elsewhere, v12 thwarts your attempts to deliberately design the bad 
> estimates.  You can still get them, you just have to work a bit harder at it:
> 
> CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select 
> generate_series($1,$2) $$ rows 1000 language sql;

Yeah, that's awesome! I didn't know about this until I ran into this issue; 
I'll definitely be using it for future estimation problems that are difficult 
to fix otherwise!

> I've made an extension which has a function which always returns true, but 
> lies about how often it is expected to return true. See the attachment.  With 
> that, you can fine-tune the planner.
> 
> CREATE EXTENSION pg_selectivities ;

Very cool and useful : )!

I think in most cases I'll be okay with declaring a function with a static ROWS 
estimate, but I'll consider your extension if I need more flexibility in the 
future!
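
For anyone finding this later: by a static ROWS estimate I just mean the ROWS 
attribute on CREATE FUNCTION / ALTER FUNCTION. A hypothetical sketch (not my 
actual function):

CREATE FUNCTION my_srf(bigint, bigint) RETURNS SETOF bigint AS $$
    SELECT generate_series($1, $2)
$$ LANGUAGE sql COST 100 ROWS 10;

-- or, for a function that already exists:
ALTER FUNCTION my_srf(bigint, bigint) ROWS 10;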

Thanks
Felix



Re: Extremely slow HashAggregate in simple UNION query

2019-08-22 Thread Pavel Stehule
On Thu 22. 8. 2019 at 3:11, Jeff Janes  wrote:

> On Tue, Aug 20, 2019 at 11:12 AM Felix Geisendörfer 
> wrote:
>  ...
>
>
>> [1] My actual query had bad estimates for other reasons (GIN Index), but
>> that's another story. The query above was of course deliberately designed
>> to have bad estimates.
>>
>
> As noted elsewhere, v12 thwarts your attempts to deliberately design the
> bad estimates.  You can still get them, you just have to work a bit harder
> at it:
>
> CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select
> generate_series($1,$2) $$ rows 1000 language sql;
>
> EXPLAIN ANALYZE
> SELECT * FROM j(1, 1) a, j(1, 1) b
> UNION
> SELECT * FROM j(1, 1) a, j(1, 1) b;
> QUERY PLAN
>
>
> --
>  HashAggregate  (cost=80021.00..100021.00 rows=2000000 width=16) (actual
> time=11.332..13.241 rows=1 loops=1)
>Group Key: a.a, b.b
>    ->  Append  (cost=0.50..70021.00 rows=2000000 width=16) (actual
> time=0.118..0.163 rows=2 loops=1)
>  ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16)
> (actual time=0.117..0.118 rows=1 loops=1)
>->  Function Scan on j a  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.087..0.088 rows=1 loops=1)
>->  Function Scan on j b  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.027..0.027 rows=1 loops=1)
>  ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16)
> (actual time=0.044..0.044 rows=1 loops=1)
>->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.022..0.022 rows=1 loops=1)
>->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.020..0.021 rows=1 loops=1)
>  Planning Time: 0.085 ms
>  Execution Time: 69.277 ms
> (11 rows)
>
> But the same advance in v12 which makes it harder to fool with your test
> case also opens the possibility of fixing your real case.
>

I think the long time after query processing is much more interesting - the
last row was processed in 13ms, but Execution Time was 69ms .. so some
cleanup takes 56ms - that is pretty long.


> I've made an extension which has a function which always returns true, but
> lies about how often it is expected to return true. See the attachment.
> With that, you can fine-tune the planner.
>
> CREATE EXTENSION pg_selectivities ;
>
> EXPLAIN ANALYZE
> SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001)
> UNION
> SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001);
> QUERY PLAN
>
>
> --
>  HashAggregate  (cost=45021.40..45021.60 rows=20 width=16) (actual
> time=0.226..0.227 rows=1 loops=1)
>Group Key: a.a, b.b
>->  Append  (cost=0.50..45021.30 rows=20 width=16) (actual
> time=0.105..0.220 rows=2 loops=1)
>  ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual
> time=0.104..0.105 rows=1 loops=1)
>Join Filter: pg_always('1e-05'::double precision)
>->  Function Scan on j a  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.066..0.066 rows=1 loops=1)
>->  Function Scan on j b  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.035..0.035 rows=1 loops=1)
>  ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual
> time=0.112..0.113 rows=1 loops=1)
>Join Filter: pg_always('1e-05'::double precision)
>->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.077..0.077 rows=1 loops=1)
>->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000
> width=8) (actual time=0.034..0.034 rows=1 loops=1)
>  Planning Time: 0.139 ms
>  Execution Time: 0.281 ms
>
> Cheers,
>
> Jeff
>
>


Re: Extremely slow HashAggregate in simple UNION query

2019-08-21 Thread Jeff Janes
On Tue, Aug 20, 2019 at 11:12 AM Felix Geisendörfer 
wrote:
 ...


> [1] My actual query had bad estimates for other reasons (GIN Index), but
> that's another story. The query above was of course deliberately designed
> to have bad estimates.
>

As noted elsewhere, v12 thwarts your attempts to deliberately design the
bad estimates.  You can still get them, you just have to work a bit harder
at it:

CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select
generate_series($1,$2) $$ rows 1000 language sql;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b;
QUERY PLAN

--
 HashAggregate  (cost=80021.00..100021.00 rows=2000000 width=16) (actual
time=11.332..13.241 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..70021.00 rows=2000000 width=16) (actual
time=0.118..0.163 rows=2 loops=1)
 ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16)
(actual time=0.117..0.118 rows=1 loops=1)
   ->  Function Scan on j a  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.087..0.088 rows=1 loops=1)
   ->  Function Scan on j b  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.027..0.027 rows=1 loops=1)
 ->  Nested Loop  (cost=0.50..20010.50 rows=1000000 width=16)
(actual time=0.044..0.044 rows=1 loops=1)
   ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.022..0.022 rows=1 loops=1)
   ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.020..0.021 rows=1 loops=1)
 Planning Time: 0.085 ms
 Execution Time: 69.277 ms
(11 rows)

But the same advance in v12 which makes it harder to fool with your test
case also opens the possibility of fixing your real case.

I've made an extension which has a function which always returns true, but
lies about how often it is expected to return true. See the attachment.
With that, you can fine-tune the planner.

CREATE EXTENSION pg_selectivities ;

EXPLAIN ANALYZE
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001)
UNION
SELECT * FROM j(1, 1) a, j(1, 1) b where pg_always(0.00001);
QUERY PLAN

--
 HashAggregate  (cost=45021.40..45021.60 rows=20 width=16) (actual
time=0.226..0.227 rows=1 loops=1)
   Group Key: a.a, b.b
   ->  Append  (cost=0.50..45021.30 rows=20 width=16) (actual
time=0.105..0.220 rows=2 loops=1)
 ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual
time=0.104..0.105 rows=1 loops=1)
   Join Filter: pg_always('1e-05'::double precision)
   ->  Function Scan on j a  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.066..0.066 rows=1 loops=1)
   ->  Function Scan on j b  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.035..0.035 rows=1 loops=1)
 ->  Nested Loop  (cost=0.50..22510.50 rows=10 width=16) (actual
time=0.112..0.113 rows=1 loops=1)
   Join Filter: pg_always('1e-05'::double precision)
   ->  Function Scan on j a_1  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.077..0.077 rows=1 loops=1)
   ->  Function Scan on j b_1  (cost=0.25..10.25 rows=1000
width=8) (actual time=0.034..0.034 rows=1 loops=1)
 Planning Time: 0.139 ms
 Execution Time: 0.281 ms
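
For anyone who can't open the attachment, the SQL-level declarations in an
extension like this would presumably look roughly as follows (hypothetical
names, not necessarily what the patch actually contains; the interesting part
is the C support function, which would answer the planner's
SupportRequestSelectivity request with the value passed as the argument):

CREATE FUNCTION pg_always_support(internal) RETURNS internal
    AS 'MODULE_PATHNAME' LANGUAGE C STRICT;

CREATE FUNCTION pg_always(float8) RETURNS boolean
    AS 'MODULE_PATHNAME' LANGUAGE C STRICT
    SUPPORT pg_always_support;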

Cheers,

Jeff


pg_selectivities.patch
Description: Binary data


Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Michael Lewis
I believe this would be relevant-
https://www.cybertec-postgresql.com/en/optimizer-support-functions/

It seems there is hope down the road to improve those estimates.
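
On v12 you can already see which functions have such a support function
attached by looking at pg_proc.prosupport - for example (assuming catalog
access; the column is empty for functions without one):

SELECT proname, prosupport
FROM pg_proc
WHERE proname = 'generate_series';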


Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Andres Freund
Hi,

On 2019-08-20 19:55:56 +0200, Felix Geisendörfer wrote:
> > On 20. Aug 2019, at 19:32, Andres Freund  wrote:
> > FWIW, that's not a mis-estimate I'm getting on master ;).  Obviously
> > that doesn't actually address your concern...
> 
> I suppose this is thanks to the new optimizer support functions
> mentioned by Michael and Pavel?

Right.


> > Under-sizing the hashtable just out of caution will add overhead to a lot
> > more common cases. That requires copying data around during growth, which
> > is far, far from free. Or you can use hashtables that don't need to copy,
> > but they're also considerably slower in the more common cases.
> 
> How does PostgreSQL currently handle the case where the initial hash
> table is under-sized due to the planner having underestimated things?
> Are the growth costs getting amortized by using an exponential growth
> function?

Yes. But that's still far from free.

Greetings,

Andres Freund




Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Felix Geisendörfer
Hi,

> On 20. Aug 2019, at 19:32, Andres Freund  wrote:
> 
> Hi,
> 
> On 2019-08-20 17:11:58 +0200, Felix Geisendörfer wrote:
>> 
>> HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual 
>> time=19.349..23.123 rows=1 loops=1)
> 
> FWIW, that's not a mis-estimate I'm getting on master ;).  Obviously
> that doesn't actually address your concern...

I suppose this is thanks to the new optimizer support functions
mentioned by Michael and Pavel?

Of course I'm very excited about those improvements, but yeah, my 
real query is mis-estimating things for totally different reasons not
involving any SRFs.

>> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
>> wondering if this could be fixed by taking a more dynamic approach for
>> allocating HashAggregate hash tables?
> 
> Under-sizing the hashtable just out of caution will add overhead to a lot
> more common cases. That requires copying data around during growth, which
> is far, far from free. Or you can use hashtables that don't need to copy,
> but they're also considerably slower in the more common cases.

How does PostgreSQL currently handle the case where the initial hash
table is under-sized due to the planner having underestimated things?
Are the growth costs getting amortized by using an exponential growth
function?

Anyway, I can accept that my situation is an edge case that doesn't justify
making things more complicated.

>> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of 
>> the query execution instead of 45ms [5].
> 
> Well, there's plenty of work that's not attributed to nodes. IIRC we don't
> track executor startup/shutdown overhead on a per-node basis.

I didn't know that, thanks for clarifying : ).



Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Andres Freund
Hi,

On 2019-08-20 17:11:58 +0200, Felix Geisendörfer wrote:
> today I debugged a query that was executing about 100x slower than expected, 
> and was very surprised by what I found.
>
> I'm posting to this list to see if this might be an issue that should be 
> fixed in PostgreSQL itself.
>
> Below is a simplified version of the query in question:
>
> SET work_mem='64MB';
> EXPLAIN ANALYZE
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
> UNION
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;
>
> HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual 
> time=19.349..23.123 rows=1 loops=1)

FWIW, that's not a mis-estimate I'm getting on master ;).  Obviously
that doesn't actually address your concern...


> 1. The query overestimates the final output rows by a factor of 2 million. [1]

Right. There's not really that much we can do about that in
general. That'll always be possible. Although we can obviously improve
the estimates a good bit more.


> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
> wondering if this could be fixed by taking a more dynamic approach for
> allocating HashAggregate hash tables?

Under-sizing the hashtable just out of caution will add overhead to a lot
more common cases. That requires copying data around during growth, which
is far, far from free. Or you can use hashtables that don't need to copy,
but they're also considerably slower in the more common cases.


> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of 
> the query execution instead of 45ms [5].

Well, there's plenty of work that's not attributed to nodes. IIRC we don't
track executor startup/shutdown overhead on a per-node basis. So I don't
really think this is necessarily something that suspicious.  Which
indeed seems to be what's happening here (this is with 11, to be able to
hit the problem with your reproducer):

+   33.01%  postgres  postgres  [.] tuplehash_iterate
-   18.39%  postgres  libc-2.28.so  [.] __memset_avx2_erms
   - 90.94% page_fault
__memset_avx2_erms
tuplehash_allocate
tuplehash_create
BuildTupleHashTableExt
build_hash_table
ExecInitAgg
ExecInitNode
InitPlan
standard_ExecutorStart

Greetings,

Andres Freund




Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Pavel Stehule
On Tue 20. 8. 2019 at 17:12, Felix Geisendörfer  wrote:

> Hi all,
>
> today I debugged a query that was executing about 100x slower than
> expected, and was very surprised by what I found.
>
> I'm posting to this list to see if this might be an issue that should be
> fixed in PostgreSQL itself.
>
> Below is a simplified version of the query in question:
>
> SET work_mem='64MB';
> EXPLAIN ANALYZE
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
> UNION
> SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;
>
> HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual
> time=19.349..23.123 rows=1 loops=1)
>   Group Key: a.a, b.b
>   ->  Append  (cost=0.01..70020.01 rows=2000000 width=8) (actual
> time=0.022..0.030 rows=2 loops=1)
> ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8)
> (actual time=0.021..0.022 rows=1 loops=1)
>   ->  Function Scan on generate_series a  (cost=0.00..10.00
> rows=1000 width=4) (actual time=0.014..0.014 rows=1 loops=1)
>   ->  Function Scan on generate_series b  (cost=0.00..10.00
> rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
> ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8)
> (actual time=0.006..0.007 rows=1 loops=1)
>   ->  Function Scan on generate_series a_1  (cost=0.00..10.00
> rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
>   ->  Function Scan on generate_series b_1  (cost=0.00..10.00
> rows=1000 width=4) (actual time=0.002..0.002 rows=1 loops=1)
> Planning Time: 0.101 ms
> Execution Time: 45.986 ms
>
> As you can see, it takes over 45ms (!) to execute what appears to be a very
> simple query. How is this possible?
>
> Based on my debugging, I think the following is going on:
>
> 1. The query overestimates the final output rows by a factor of 2 million.
> [1]
> 2. The executor uses the bad estimate to allocate a huge hash table [2],
> and the increased work_mem [3] gives it enough rope to hang itself [4].
> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of
> the query execution instead of 45ms [5].
>
> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
> wondering if this could be fixed by taking a more dynamic approach for
> allocating HashAggregate hash tables?
>

Table functions have no statistics - and the default ROWS value is 1000,
which is why the estimate is so far off. This specific case can be handled
better in PostgreSQL 12, where some functions like generate_series have
support functions that provide better estimates.
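
For example, a quick check (just the query, no special setup assumed):

EXPLAIN SELECT * FROM generate_series(1, 1);

On 12 the Function Scan should be estimated at 1 row thanks to the support
function; on 11 and older you get the default ROWS estimate of 1000.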

You can get a profile of your query with a profiler

https://wiki.postgresql.org/wiki/Profiling_with_perf

and see the reason why it is slow.

The speed of your example on PostgreSQL 12 is good - about 1ms.

But when I repeated your example, the speed was much worse. On the other
hand - nobody can expect an optimal plan with such a crazy misestimation.
It looks like something is wrong in some cleanup part.

Pavel




> Anyway, for my query using UNION ALL was acceptable, so I'm not too
> troubled by this. I was just really caught by surprise that bad estimates
> can not only cause bad query plans, but also cause good query plans to
> execute extremely slowly. I had never seen that before.
>
> Best Regards
> Felix
>
> [1] My actual query had bad estimates for other reasons (GIN Index), but
> that's another story. The query above was of course deliberately designed
> to have bad estimates.
> [2] nodeAgg.c: build_hash_table()
> [3] A lot of queries in my application benefit from increased work_mem.
> [4] execGrouping.c: nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) /
> entrysize));
> [5] In my actual query it was even worse, only 6 out of 40ms of the total
> execution time were reported as being spent in the query nodes.
>
>