Re: Extremely slow HashAggregate in simple UNION query

2019-08-24 Thread Andres Freund
Hi,

On August 24, 2019 12:41:03 PM PDT, Tom Lane  wrote:
>Jeff Janes  writes:
>> Most of the time is not after the clock stops, but before the
>stepwise
>> ANALYZE clock starts.  If you just do an EXPLAIN rather than EXPLAIN
>> ANALYZE, that is also slow.  The giant hash table is created during
>the
>> planning step (or somewhere around there
>
>It's in executor startup, I believe.

I'm sure personally interested in doing so, but it'd not be hard to measure the 
executor startup time separately. And display it either on a per node basis, or 
as a total number.

Quite unconvinced this thread is a convincing reason to do so (or really do 
anything). But for some other workloads executor startup is a very large 
fraction of the total time, without massive misestimations. Optimizing that 
could be easier with that information available.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: Extremely slow HashAggregate in simple UNION query

2019-08-24 Thread Jeff Janes
On Thu, Aug 22, 2019 at 1:09 AM Pavel Stehule 
wrote:

> čt 22. 8. 2019 v 3:11 odesílatel Jeff Janes  napsal:
>
>> ...


> But the same advance in v12 which makes it harder to fool with your test
>> case also opens the possibility of fixing your real case.
>>
>
> I think so much more interesting should be long time after query
> processing - last row was processed in 13ms, but Execution Time was 69ms ..
> so some cleaning is 56ms - that is pretty long.
>

Most of the time is not after the clock stops, but before the stepwise
ANALYZE clock starts.  If you just do an EXPLAIN rather than EXPLAIN
ANALYZE, that is also slow.  The giant hash table is created during the
planning step (or somewhere around there--I notice that EXPLAIN ANALYZE
output doesn't count it in what it labels as the planning step--but it is
some step that EXPLAIN without ANALYZE does execute, which to me makes it a
planning step).

For me, "perf top" shows kernel's __do_page_fault as the top
function.  tuplehash_iterate does show up at 20% (which I think is
overattributed, considering how little the speedup is when dropping
ANALYZE), but everything else just looks like kernel memory management code.

Cheers,

Jeff