>>>>> "Tom" == Tom Lane <t...@sss.pgh.pa.us> writes:
With the high-priority questions out of the way, time to tackle the
rest:

 Tom> My single biggest complaint is about the introduction of struct
 Tom> GroupedVar. If we stick with that, we're going to have to teach
 Tom> an extremely large number of places that know about Vars to also
 Tom> know about GroupedVars. This will result in code bloat and
 Tom> errors of omission. If you think the latter concern is
 Tom> hypothetical, note that you can't get 40 lines into gsp1.patch
 Tom> without finding such an omission, namely the patch fails to
 Tom> teach pg_stat_statements.c about GroupedVars. (That also points
 Tom> up that some of the errors of omission will be in third-party
 Tom> code that we can't fix easily.)

Except that GroupedVar is created only late in planning, and so only a
small proportion of places need to know about it (and certainly
pg_stat_statements does not). It also can't end up attached to any
foreign scan or otherwise potentially third-party plan node.

 Tom> I think you should get rid of that concept and instead implement
 Tom> the behavior by having nodeAgg.c set the relevant fields of the
 Tom> representative tuple slot to NULL, so that a regular Var does
 Tom> the right thing.

We did consider that. Messing with the null flags of the slot didn't
seem like an especially clean approach. But if that's how you want
it...

 Tom> I don't really have any comments on the algorithms yet, having
 Tom> spent too much time trying to figure out underdocumented data
 Tom> structures to get to the algorithms. However, noting the
 Tom> addition of list_intersection_int() made me wonder whether you'd
 Tom> not be better off reducing the integer lists to bitmapsets a lot
 Tom> sooner, perhaps even at parse analysis.
list_intersection_int should not be time-critical; common queries do
not call it at all (simple cube or rollup clauses always have an empty
grouping set, causing the intersection test to bail immediately), and
in pathological worst-case constructions like putting a dozen
individually grouped columns in front of a 12-d cube (thus calling it
4096 times on lists at least 12 nodes long) it doesn't account for
more than a small percentage even with optimization off and debugging
and asserts on.

The code uses the list representation almost everywhere in parsing and
planning because in some places the order of elements matters, and I
didn't want to keep swapping between a bitmap and a list
representation. (We _do_ use bitmapsets where we're potentially going
to be doing an O(N^2) number of subset comparisons to build the graph
adjacency list for computing the minimal set of sort operations, and
at execution time.)

I didn't even consider using bitmaps for the output of parse analysis
because at that stage we want to preserve most of the original query
substructure (otherwise view deparse won't look anything like the
original query did).

 Tom> list_intersection_int() is going to be O(N^2) by nature. Maybe
 Tom> N can't get large enough to matter in this context, but I do see
 Tom> places that seem to be concerned about performance.

My main feeling on performance is that simple cube and rollup clauses
or short lists of grouping sets should parse and plan very quickly;
more complex cases should parse and plan fast enough that execution
time on any nontrivial input will swamp the parse/plan time; and the
most complex cases that aren't outright rejected should plan in no
more than a few seconds extra.

(We're limiting to 4096 grouping sets in any query level, which is
comparable to other databases and seems quite excessively high
compared to what people are actually likely to need.)

(Don't be fooled by the excessive EXPLAIN time on some queries.
There are performance issues in EXPLAIN output generation that have
nothing to do with this patch, and which I've not pinned down.)

-- 
Andrew (irc:RhodiumToad)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers