>>>>> "Mark" == Mark Dilger <hornschnor...@gmail.com> writes:
Mark> You define DiscreteKnapsack to take integer weights and double
Mark> values, and perform the usual Dynamic Programming algorithm to
Mark> solve. But the only place you call this, you pass in NULL for
Mark> the values, indicating that all the values are equal. I'm
Mark> confused why in this degenerate case you bother with the DP at
Mark> all. Can't you just load the knapsack from lightest to heaviest
Mark> after sorting, reducing the runtime complexity to O(n log n)?
Mark> It's been a long day. Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing, I should point out that the runtime cost is going
to be negligible in all practical cases. Even an 8-way cube (256
grouping sets) has only 70 rollups, and a 4-way cube has only 6; the
limit of 4096 grouping sets was chosen only because it was a nice
round number larger than comparable limits in other database
products. Other parts of the grouping sets code have worse time
bounds; the bipartite-match code used to minimize the number of
rollups is potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be
optimizing for here in the absence of accurate sort costs. Right now
what matters is saving as many sort steps as possible, since that's a
saving likely to be many orders of magnitude greater than the
difference between two sorts. The one heuristic that might be useful
is to prefer the smaller estimated size when other factors are equal,
purely on memory-usage grounds, but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and
maybe revisit it during beta if we get any relevant feedback from
people actually using grouping sets?

Mark> I'm guessing you intend to extend the code at some later date to
Mark> have different values for items. Is that right?

Well, as I mentioned in a reply to Andres, we probably should
eventually optimize for the greatest reduction in cost, and the code
as it stands would make that easy to add, but it would require more
accurate cost information than we have now: cost_sort doesn't take
into account the number or types of sort columns or the costs of
their comparison functions, so all sorts of the same input data are
costed equally.

-- 
Andrew (irc:RhodiumToad)
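
P.S. To illustrate the degenerate case Mark is asking about: when all
item values are equal, maximizing total value is the same as
maximizing the number of items taken, and a greedy lightest-first
pass is provably optimal (any feasible set of k items must weigh at
least as much as the k lightest items do). A minimal standalone
sketch follows; the names here are hypothetical, and this is not the
patch's actual DiscreteKnapsack() interface:

/*
 * Hypothetical sketch, not the patch's DiscreteKnapsack(): when all
 * values are equal, sort by weight and take items lightest-first.
 */
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for ints, written to avoid subtraction overflow */
static int
cmp_weight(const void *a, const void *b)
{
    int     wa = *(const int *) a;
    int     wb = *(const int *) b;

    return (wa > wb) - (wa < wb);
}

/*
 * Return how many items fit within max_weight, taking the lightest
 * items first.  Sorts "weights" in place; O(n log n) overall.
 */
static int
greedy_equal_values(int max_weight, int num_items, int *weights)
{
    int     used = 0;
    int     taken = 0;

    qsort(weights, num_items, sizeof(int), cmp_weight);

    for (int i = 0; i < num_items; i++)
    {
        /* sorted ascending, so once one item doesn't fit, none will */
        if (used + weights[i] > max_weight)
            break;
        used += weights[i];
        taken++;
    }
    return taken;
}

int
main(void)
{
    int     weights[] = {5, 2, 9, 1, 3};

    /* capacity 10 fits weights 1+2+3; prints "3 items fit" */
    printf("%d items fit\n", greedy_equal_values(10, 5, weights));
    return 0;
}

Of course, as noted above, at these problem sizes either approach is
effectively free; the DP's advantage is that it already handles the
non-degenerate case if real per-item values are supplied later.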