On Tue, Dec 26, 2023 at 10:23 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> I think it's a fool's errand to even try to separate different sort
> column orderings by cost. We simply do not have sufficiently accurate
> cost information. The previous patch in this thread got reverted because
> of that (well, also some implementation issues, but mostly that), and
> nothing has happened to make me think that another try will fare any
> better.

I'm late to the party, but I'd like to better understand what's being
argued here. If you're saying that, for some particular planner problem,
we should prefer a solution that doesn't need to know about the relative
cost of various sorts over one that does, I agree, for exactly the
reason that you state: our knowledge of sort costs won't be reliable,
and we will make mistakes. That's true in lots of situations, not just
related to sorts, because estimation is a hard problem. Heuristics not
based on cost are going to be, in many cases, more accurate than
heuristics based on cost. They're also often cheaper, since they often
let us reject possible approaches very early, without all the bother of
a cost comparison.

But if you're saying that it's utterly impossible to know whether
sorting text will be cheaper or more expensive than sorting 4-byte
integers, and that if a particular problem can be solved only by knowing
which one is cheaper we should just give up, then I disagree. In the
absence of any other information, it must be right, at the very least,
to bank on varlena data types being more expensive to sort than
fixed-length data types. How much more expensive is hard to know,
because toasted blobs are going to be more expensive to sort than short
varlenas. But even before you reach the comparison function, a
pass-by-value datum has a significantly lower access cost than a
pass-by-reference datum. The fact that the pass-by-reference value might
be huge only compounds the problem.

--
Robert Haas
EDB: http://www.enterprisedb.com
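
To make the ordering argued for above concrete, here is a minimal sketch of a cost-free heuristic that ranks sort columns by type class alone. It is not PostgreSQL planner code: `SortColumn`, `sort_cost_class`, and `cheapest_first` are hypothetical names, and the only catalog facts assumed are that `pg_type.typlen` is negative for variable-length types and that `pg_type.typbyval` marks pass-by-value types. The rank is ordinal only; it deliberately says nothing about how much more expensive one class is than another.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the two pg_type attributes this heuristic needs.
@dataclass(frozen=True)
class SortColumn:
    name: str
    typlen: int      # fixed width in bytes, or negative for variable-length
    typbyval: bool   # True if the value is passed by value


def sort_cost_class(col: SortColumn) -> int:
    """Coarse ordinal rank: 0 is assumed cheapest to sort, 2 most expensive."""
    if col.typlen > 0 and col.typbyval:
        return 0  # fixed-length, pass-by-value (for example int4)
    if col.typlen > 0:
        return 1  # fixed-length, pass-by-reference (for example uuid)
    return 2      # variable-length (for example text); may be large or toasted


def cheapest_first(cols):
    # sorted() is stable, so columns in the same class keep their input order.
    return sorted(cols, key=sort_cost_class)


cols = [
    SortColumn("note", typlen=-1, typbyval=False),   # text
    SortColumn("id", typlen=4, typbyval=True),       # int4
    SortColumn("token", typlen=16, typbyval=False),  # uuid
]
ordered = [c.name for c in cheapest_first(cols)]
```

Because the rank depends only on the type class, it can be evaluated before any cost comparison takes place, which is the kind of early rejection the message describes.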