>>>>> "Mark" == Mark Dilger <hornschnor...@gmail.com> writes:
Mark> You define DiscreteKnapsack to take integer weights and double
Mark> values, and perform the usual Dynamic Programming algorithm to
Mark> solve. But the only place you call this, you pass in NULL for
Mark> the values, indicating that all the values are equal. I'm
Mark> confused why in this degenerate case you bother with the DP at
Mark> all. Can't you just load the knapsack from lightest to heaviest
Mark> after sorting, reducing the runtime complexity to O(n log n)?
Mark> It's been a long day. Sorry if I am missing something obvious.

That's actually a very good question.

(Though in passing, I should point out that the runtime cost is going
to be negligible in all practical cases. Even an 8-way cube (256
grouping sets) has only 70 rollups, and a 4-way cube has only 6; the
limit of 4096 grouping sets was chosen only because it was a nice
round number larger than comparable limits in other database
products. Other parts of the grouping sets code have worse time
bounds; the bipartite-match code used to minimize the number of
rollups is potentially O(n^2.5) in the number of grouping sets.)

Thinking about it, it's not at all obvious what we should be
optimizing for here in the absence of accurate sort costs. Right now
what matters is saving as many sort steps as possible, since that's a
saving likely to be many orders of magnitude greater than the
difference between two sorts. The one heuristic that might be useful
is to prefer the smaller estimated size when other factors are equal,
purely on memory-usage grounds, but even that seems a bit tenuous.

I'm inclined to leave things as they are in the current patch, and
maybe revisit it during beta if we get any relevant feedback from
people actually using grouping sets?

Mark> I'm guessing you intend to extend the code at some later date to
Mark> have different values for items. Is that right?

Well, as I mentioned in a reply to Andres, we probably should
eventually optimize for the greatest reduction in cost, and the code
as it stands would make that easy to add, but it would require more
accurate cost information than we have now: cost_sort doesn't take
into account the number or types of sort columns or the costs of
their comparison functions, so all sorts of the same input data are
costed equally.

-- 
Andrew (irc:RhodiumToad)
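
P.S. To illustrate the degenerate case Mark is asking about: when all
item values are equal, maximizing total value is the same as
maximizing the number of items taken, and a greedy lightest-first
pass is provably optimal (any feasible set of k items must weigh at
least as much as the k lightest items do). A minimal standalone
sketch follows; the names here are hypothetical, and this is not the
patch's actual DiscreteKnapsack() interface:

/*
 * Hypothetical sketch, not the patch's DiscreteKnapsack(): when all
 * values are equal, sort by weight and take items lightest-first.
 */
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for ints, written to avoid subtraction overflow */
static int
cmp_weight(const void *a, const void *b)
{
    int     wa = *(const int *) a;
    int     wb = *(const int *) b;

    return (wa > wb) - (wa < wb);
}

/*
 * Return how many items fit within max_weight, taking the lightest
 * items first.  Sorts "weights" in place; O(n log n) overall.
 */
static int
greedy_equal_values(int max_weight, int num_items, int *weights)
{
    int     used = 0;
    int     taken = 0;

    qsort(weights, num_items, sizeof(int), cmp_weight);

    for (int i = 0; i < num_items; i++)
    {
        /* sorted ascending, so once one item doesn't fit, none will */
        if (used + weights[i] > max_weight)
            break;
        used += weights[i];
        taken++;
    }
    return taken;
}

int
main(void)
{
    int     weights[] = {5, 2, 9, 1, 3};

    /* capacity 10 fits weights 1+2+3; prints "3 items fit" */
    printf("%d items fit\n", greedy_equal_values(10, 5, weights));
    return 0;
}

Of course, as noted above, at these problem sizes either approach is
effectively free; the DP's advantage is that it already handles the
non-degenerate case if real per-item values are supplied later.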