On Wed, Jun 6, 2018 at 8:06 PM Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
> >>> Comparison cost can be approximated probabilistically as:
> >>>
> >>> cost_comp = sum(cost_op_n * (1.0 / ndistinct(col_1_to_n)))
> >>>
> >>> Where cost_op_n is the cost of the comparison function for column N,
> >>> and ndistinct(col_1_to_n) is an estimate of the number of distinct
> >>> values for the columns prior to N in the sort order.
> >>>
> >>> You can approximate ndistinct as the product of ndistinct of previous
> >>> columns, or use extended statistics when available.
> >>
> >> Sure. The trouble is this also assumes uniform distribution, which is
> >> not always the case.
> >
> > Well, (1.0 / ndistinct) = p(left == right).
> >
> > If you can compute a better p(left == right) with an MCV, you can get
> > a better estimate. If you have an MCV. But that'd be a bit expensive,
> > and I'm not sure it's all that relevant.
> >
> > To what degree does improving this produce better plans? As long as
> > average expected cost is relatively well estimated (as in, one sort
> > order relative to another sort order), it won't affect the decision.
> >
> I'd bet I can construct corner-case examples with vastly different
> behavior depending on column data distributions, so it's not entirely
> insignificant. How important it is in practice, I don't know.
I guess that can only be answered by building that solution and testing it against the corner cases.
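
To make the model being discussed concrete, here is a minimal
standalone C sketch (not planner code; the function and variable
names are hypothetical, not existing PostgreSQL APIs). It computes
the expected per-tuple comparison cost as the sum over columns of
cost_op_n times the probability that the comparison even reaches
column N, which under the uniform assumption is the product of
(1.0 / ndistinct) over the preceding columns:

#include <stdio.h>

/*
 * Expected cost of one multi-column tuple comparison under the
 * proposed model.  p_reach starts at 1.0 because the first column
 * is always compared; each iteration multiplies in the probability
 * that the current column compared equal (uniform assumption:
 * 1.0 / ndistinct), i.e. the chance the comparison continues to
 * the next column.
 */
static double
expected_comparison_cost(const double *cost_op,   /* per-column comparator cost */
                         const double *ndistinct, /* per-column ndistinct estimate */
                         int ncols)
{
    double total = 0.0;
    double p_reach = 1.0;

    for (int n = 0; n < ncols; n++)
    {
        total += cost_op[n] * p_reach;
        p_reach *= 1.0 / ndistinct[n];
    }
    return total;
}

int
main(void)
{
    /* Hypothetical 3-column sort key: cheap int, costly text, cheap int. */
    double cost_op[]   = {1.0, 10.0, 1.0};
    double ndistinct[] = {4.0, 1000.0, 50.0};

    /* Low-cardinality leading column: the text comparator runs often. */
    printf("cost = %.5f\n", expected_comparison_cost(cost_op, ndistinct, 3));

    /* Same columns with the selective text column first: it always runs,
     * but the later comparators almost never do. */
    double cost_op2[]   = {10.0, 1.0, 1.0};
    double ndistinct2[] = {1000.0, 4.0, 50.0};
    printf("cost = %.5f\n", expected_comparison_cost(cost_op2, ndistinct2, 3));

    return 0;
}

Note that the uniform-distribution assumption Tomas objects to is
isolated in the single line updating p_reach; an MCV-derived estimate
of p(left == right) would replace only that line, so the overall
shape of the estimate survives either choice.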