Re: Extremely slow HashAggregate in simple UNION query

2019-08-22 Thread Felix Geisendörfer



> On 21. Aug 2019, at 20:26, Jeff Janes  wrote:
> 
> As noted elsewhere, v12 thwarts your attempts to deliberately design the bad 
> estimates.  You can still get them, you just have to work a bit harder at it:
> 
> CREATE FUNCTION j (bigint, bigint) returns setof bigint as $$ select generate_series($1,$2) $$ rows 1000 language sql;

Yeah, that's awesome! I didn't know about this until I ran into this issue, 
but I'll definitely be using it for future estimation problems that are 
difficult to fix otherwise!

> I've made an extension which has a function which always returns true, but 
> lies about how often it is expected to return true. See the attachment.  With 
> that, you can fine-tune the planner.
> 
> CREATE EXTENSION pg_selectivities ;

Very cool and useful : )!

I think in most cases I'll be okay with declaring a function with a static ROWS 
estimate, but I'll consider your extension if I need more flexibility in the 
future!
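Here's a rough sketch of the kind of static ROWS declaration I have in mind 
(the function and its row count are made up for illustration; only the ROWS 
clause matters):

-- Hypothetical SRF: without a ROWS clause the planner assumes the default of
-- 1000 result rows; declaring ROWS pins the estimate instead.
CREATE FUNCTION first_n(n bigint) RETURNS SETOF bigint AS $$
    SELECT generate_series(1, $1)
$$ ROWS 10 LANGUAGE sql;

-- The Function Scan on first_n should now be planned with rows=10.
EXPLAIN SELECT * FROM first_n(10);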

Thanks
Felix



Re: Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Felix Geisendörfer
Hi,

> On 20. Aug 2019, at 19:32, Andres Freund  wrote:
> 
> Hi,
> 
> On 2019-08-20 17:11:58 +0200, Felix Geisendörfer wrote:
>> 
>> HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual time=19.349..23.123 rows=1 loops=1)
> 
> FWIW, that's not a mis-estimate I'm getting on master ;).  Obviously
> that doesn't actually address your concern...

I suppose this is thanks to the new optimizer support functions
mentioned by Michael and Pavel?

Of course I'm very excited about those improvements, but yeah, my real 
query's estimates are off for totally different reasons that don't involve 
any SRFs.

>> I'm certainly a novice when it comes to PostgreSQL internals, but I'm
>> wondering if this could be fixed by taking a more dynamic approach for
>> allocating HashAggregate hash tables?
> 
> Under-sizing the hashtable just out of caution will add overhead to a
> lot more common cases. That requires copying data around during growth,
> which is far, far from free. Or you can use hashtables that don't need
> to copy, but they're also considerably slower in the more common cases.

How does PostgreSQL currently handle the case where the initial hash
table is under-sized due to the planner having underestimated things?
Are the growth costs getting amortized by using an exponential growth
function?

Anyway, I can accept that my situation is an edge case that doesn't justify
making things more complicated.

>> 3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of 
>> the query execution instead of 45ms [5].
> 
> Well, there's plenty work that's not attributed to nodes. IIRC we don't
> track executor startup/shutdown overhead on a per-node basis.

I didn't know that, thanks for clarifying : ).



Extremely slow HashAggregate in simple UNION query

2019-08-20 Thread Felix Geisendörfer
Hi all,

Today I debugged a query that was executing about 100x slower than expected, 
and I was very surprised by what I found.

I'm posting to this list to see if this might be an issue that should be fixed 
in PostgreSQL itself.

Below is a simplified version of the query in question:

SET work_mem='64MB';
EXPLAIN ANALYZE
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
UNION
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;

HashAggregate  (cost=80020.01..100020.01 rows=2000000 width=8) (actual time=19.349..23.123 rows=1 loops=1)
  Group Key: a.a, b.b
  ->  Append  (cost=0.01..70020.01 rows=2000000 width=8) (actual time=0.022..0.030 rows=2 loops=1)
        ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8) (actual time=0.021..0.022 rows=1 loops=1)
              ->  Function Scan on generate_series a  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.014..0.014 rows=1 loops=1)
              ->  Function Scan on generate_series b  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
        ->  Nested Loop  (cost=0.01..20010.01 rows=1000000 width=8) (actual time=0.006..0.007 rows=1 loops=1)
              ->  Function Scan on generate_series a_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.003 rows=1 loops=1)
              ->  Function Scan on generate_series b_1  (cost=0.00..10.00 rows=1000 width=4) (actual time=0.002..0.002 rows=1 loops=1)
Planning Time: 0.101 ms
Execution Time: 45.986 ms

As you can see, it takes over 45ms (!) to execute what appears to be a very 
simple query. How is this possible?

Based on my debugging, I think the following is going on:

1. The query overestimates the final output rows by a factor of 2 million. [1]
2. The executor uses the bad estimate to allocate a huge hash table [2], and 
the increased work_mem [3] gives it enough rope to hang itself [4].
3. Somehow EXPLAIN gets confused by this and only ends up tracking 23ms of the 
query execution instead of 45ms [5].

I'm certainly a novice when it comes to PostgreSQL internals, but I'm wondering 
if this could be fixed by taking a more dynamic approach for allocating 
HashAggregate hash tables?

Anyway, for my query using UNION ALL was acceptable, so I'm not too troubled by 
this. I was just really caught by surprise that bad estimates can not only 
cause bad query plans, but also cause good query plans to execute extremely 
slowly. I had never seen that before.
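For reference, here is the UNION ALL variant of the simplified query above. 
Since it doesn't de-duplicate, the plan should be just the Append over the two 
nested loops, with no HashAggregate (and no oversized hash table) on top:

SET work_mem='64MB';
EXPLAIN ANALYZE
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b
UNION ALL
SELECT * FROM generate_series(1, 1) a, generate_series(1, 1) b;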

Best Regards
Felix

[1] My actual query had bad estimates for other reasons (GIN Index), but that's 
another story. The query above was of course deliberately designed to have bad 
estimates.
[2] nodeAgg.c: build_hash_table()
[3] A lot of queries in my application benefit from increased work_mem.
[4] execGrouping.c: nbuckets = Min(nbuckets, (long) ((work_mem * 1024L) / entrysize));
[5] In my actual query it was even worse, only 6 out of 40ms of the total 
execution time were reported as being spent in the query nodes.



GIN Index has O(N^2) complexity for array overlap operator?

2018-09-07 Thread Felix Geisendörfer
Hi everybody,

I ran into an issue with using the && array operator on a GIN index of mine. 
Basically I have a query that looks like this:

SELECT * FROM example WHERE keys && ARRAY[...];

This works fine for a small number of array elements (N), but gets really slow 
as N gets bigger in what appears to be O(N^2) complexity.
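For concreteness, here's a hypothetical schema with the same shape as mine 
(the real table and data are in the notebook linked below; the column names 
and values here are made up):

-- Hypothetical stand-in for my real table: an integer array column
-- indexed with GIN and queried with the overlap operator (&&).
CREATE TABLE example (
    id   bigserial PRIMARY KEY,
    keys integer[] NOT NULL
);

CREATE INDEX example_keys_gin ON example USING gin (keys);

-- Gets slow once the array on the right-hand side grows large:
SELECT * FROM example WHERE keys && ARRAY[1, 2, 3];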

However, from studying the GIN data structure as described by the docs, it 
seems that the performance for this could be O(N). In fact, it's possible to 
coerce the query planner into an O(N) plan like this:

SELECT DISTINCT ON (example.id) * FROM unnest(ARRAY[...]) key JOIN example ON keys && ARRAY[key];

In order to illustrate this better, I've created a Jupyter notebook that 
populates an example table, shows the query plans for both queries, and most 
importantly benchmarks them and plots a time vs. array size (N) graph.

https://github.com/felixge/pg-slow-gin/blob/master/pg-slow-gin.ipynb

Please help me understand what causes the O(N^2) performance for query 1, and 
whether query 2 is the best way to work around this issue.

Thanks
Felix Geisendörfer

PS: I'm using Postgres 10, but also verified that this problem exists with 
Postgres 11.