Konstantin Knizhnik <k.knizh...@postgrespro.ru> writes: > On 03/08/2016 07:01 AM, Tom Lane wrote: >> I've not spent a lot of time on this, but I think maybe what would make >> sense is to consider both the case where function calculations are >> postponed to after ORDER BY and the case where they aren't, and generate >> Paths for both. Neither approach is a slam-dunk win. For example, >> suppose that one of the tlist columns is md5(wide_column) --- it will >> likely not be preferable to pass the wide column data through the sort >> step rather than reducing it to a hash first. This would require some >> work in grouping_planner to track two possible pathtargets, and work in >> create_ordered_paths to generate paths for both possibilities. A possible >> objection is that this would add planning work even when no real benefit >> is possible; so maybe we should only consider the new way if the tlist has >> significant eval cost? Not sure about that. There is also something >> to be said for the idea that we should try to guarantee consistent >> semantics when the tlist contains volatile functions.
> Attached please find rebased patch. This may be rebased, but it doesn't seem to respond to any of my concerns above. In particular, if we're going to change behavior in this area, I think we need to try to ensure that volatile functions in the tlist will see consistent behavior no matter which plan shape is chosen. People may well be depending on the existing behavior for particular queries. If we're going to break their queries, I'd at least like to be able to say that there's now a predictable, well-defined semantics. Something like "volatile functions in the tlist that are not in sort/group columns are certain to be evaluated after sorting occurs, if there is an ORDER BY". This should not be dependent on whether there's a LIMIT, nor GROUP BY nor windowing functions, nor on random other whims of the optimizer. So I'm not at all happy with a patch that changes things only for the LIMIT-but-no-grouping-or-windows case. I'm not sure whether it's worth pursuing cost-based comparisons for functions that aren't volatile. In an ideal world we could deal with the md5() example I gave, but as things stand I suspect we usually have too little idea whether the result of f(x) is wider or narrower than x itself. (One thing we do know is that f's output won't be toasted, whereas x might be; so it might be a bad idea to bet on function results getting narrower.) So I'm afraid it's going to be really hard to tell whether we're making the sort step itself more or less expensive if we postpone function evals. But we do know for certain that we're adding a projection step that wasn't there before when we postpone functions to after the Sort. That kind of suggests that there should be some minimum estimated cost of the functions before we add that extra step (unless they are volatile of course). Or only do it if there is a LIMIT or we have a tuple_fraction suggesting that partial eval of the query is of interest. BTW, there's some additional refactoring I had had in mind to do in grouping_planner to make its handling of the targetlist a bit more organized; in particular, I'd like to see it using PathTarget representation more consistently throughout the post-scan-join steps. I had figured that was just neatnik-ism and could be done later, but we may need to do it now in order to be able to deal with these considerations in a cleaner fashion. I really don't like the way that this patch hacks up the behavior of make_scanjoin_target() without even a comment explaining its new API. The rough sketch of what I'd had in mind is that we should convert the processed, final tlist into PathTarget form immediately after query_planner, since we know we're going to need that eventually anyway. Then, if we need to do grouping/aggregation or windowing, derive modified PathTargets that we want to use for the inputs of those steps. (This'd require some infrastructure that doesn't currently exist for manipulating PathTargets, particularly the ability to copy a PathTarget and/or add a column to an existing PathTarget.) The way this optimization would fit into that structure is that the DISTINCT/ORDER BY steps would now also have a desired input pathtarget that might be different from the final target. regards, tom lane -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers