On 10/11/2016 05:56 PM, Andres Freund wrote:
On 2016-10-11 04:29:31 +0200, Tomas Vondra wrote:
On 10/11/2016 04:07 AM, Andres Freund wrote:
On 2016-10-10 17:46:22 -0700, Andres Freund wrote:
TPC-DS (tpcds.ods)
------------------

In this case, I'd say the results are less convincing. There are quite a few
queries that got slower by ~10%, which is well above - for example queries
22 and 67. There are of course queries that got ~10% faster, and in total
the patched version executed more queries (so overall the result is slightly
positive, but not significantly).

That's interesting. I wonder whether that's plan changes just due to the
changing memory estimates, or what's causing that. I'll look into it.

Hm. Based on an initial look those queries aren't planned with any of
the affected codepaths.  Could this primarily be a question of
randomness? Would it perhaps make sense to run the tests in a comparable
order? Looking at tpcds.py and the output files, it seems that the query
order differes between the runs, that can easily explain bigger
difference than the above. For me a scale=1 run creates a database of
approximately 4.5GB, thus with shared_buffers=1GB execution order is
likely to have a significant performance impact.


Yes, I see similar plans (no bitmap index scans or hash aggregates). But the
difference is there, even when running the query alone (so it's not merely
due to the randomized ordering).

I wonder whether this is again due to compiler moving stuff around.

It looks like that. I looked through a significant set of plans where
there we time differences (generated on my machine), and none of them
had bitmap or hash groupings to any significant degree.  Comparing
profiles in those cases usually shows a picture like:
    24.98%   +0.32%  postgres            [.] slot_deform_tuple
    16.58%   -0.05%  postgres            [.] ExecMakeFunctionResultNoSets
    12.41%   -0.01%  postgres            [.] slot_getattr
     6.10%   +0.39%  postgres            [.] heap_getnext
     4.41%   -0.37%  postgres            [.] ExecQual
     3.08%   +0.12%  postgres            [.] ExecEvalScalarVarFast
     2.85%   -0.11%  postgres            [.] check_stack_depth
     2.48%   +0.42%  postgres            [.] ExecEvalConst
     2.44%   -0.33%  postgres            [.] heapgetpage
     2.34%   +0.11%  postgres            [.] ExecScan
     2.14%   -0.20%  postgres            [.] ExecStoreTuple

I.e. pretty random performance changes. This indeed looks like binary
layout changes.  Looking at these plans and at profiles spanning a run
of all queries shows that bitmap scans and hash aggregations, while
present, account for a very small amount of time in total. So tpc-ds
doesn't look particularly interesting to evaluate these patches - but
vey interesting for my slot deforming and qual evaluation patches.


Meh, that's annoying. Anyway, the reason why many of the TPC-DS queries can't really benefit from the patch is that a lot of the queries use grouping sets, and we only have sorted implementation for that.

I wonder whether the TPC-H differences are also affected by this ...

Btw, query_14.sql as generated by your templates (in pgperffarm) doesn't
seem to work here. And I never had the patience to run query_1.sql to
completion... Looks like we could very well use some planner
improvements here.


Yeah, planner improvements would be great ;-)

Query 14 needs an explicit cast to text, otherwise you get something like "can't cast unknown to text" error. Not sure if that's an expected issue or not, but the cast fixes it. I haven't pushed that to the pgperffarm repo yet, along with several other tooling fixes.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to