Thanks for having a look at this.

On Wed, 14 Oct 2020 at 04:16, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I'm on board with trying to get rid of NaN rowcount estimates more
> centrally.  I do not think it is a good idea to try to wire in a
> prohibition against zero rowcounts.  That is actually the correct
> thing in assorted scenarios --- one example recently under discussion
> was ModifyTable without RETURNING, and another is where we can prove
> that a restriction clause is constant-false.  At some point I think
> we are going to want to deal honestly with those cases instead of
> sweeping them under the rug.  So I'm disinclined to remove zero
> defenses that we'll just have to put back someday.

OK, that certainly limits the scope here.  It just means we can't get
rid of the <= 0 checks in the join costing functions.  The problem case
those checks were added for was a dummy Append, and we still have valid
cases where a join rel with a dummy Append on one side won't itself be
converted to a dummy rel.

> I think converting Inf to DBL_MAX, in hopes of avoiding creation of
> NaNs later, is fine.  (Note that applying rint() to that is quite
> useless --- in every floating-point system, values bigger than
> 2^number-of-mantissa-bits are certainly integral.)

Good point.
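
Just to see it for myself, a quick standalone check (plain C, nothing
to do with the patch):

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* From 2^52 up, adjacent doubles are at least 1.0 apart, so every
     * representable value is already integral and rint() is a no-op. */
    double big = ldexp(1.0, 53);    /* 2^53 */

    printf("%d\n", rint(big + 0.5) == big);     /* 1; the 0.5 is lost */
    printf("%d\n", rint(DBL_MAX) == DBL_MAX);   /* 1 */
    return 0;
}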

> I'm not sure why you propose to map NaN to one.  Wouldn't mapping it
> to Inf (and thence to DBL_MAX) make at least as much sense?  Probably
> more in fact.  We know that unwarranted one-row estimates are absolute
> death to our chances of picking a well-chosen plan.

That came about due to what the join costing functions were already
doing, i.e.:

/* Protect some assumptions below that rowcounts aren't zero or NaN */
if (inner_path_rows <= 0 || isnan(inner_path_rows))
    inner_path_rows = 1;

[1] didn't include an example of how the NaNs were being introduced, so
I was mostly just copying the logic that was added back in 72826fb3 to
fix that report.
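
For the record, the reason those guards need an explicit isnan() test
at all is that NaN compares false to everything, so the <= 0 part alone
would let a NaN sail straight through.  A standalone demo (not from the
patch):

#include <math.h>
#include <stdio.h>

int
main(void)
{
    double rows = NAN;

    /* All ordered comparisons involving NaN are false. */
    printf("%d %d %d\n", rows <= 0, rows > 0, rows == rows);  /* 0 0 0 */
    printf("%d\n", isnan(rows));                              /* 1 */
    return 0;
}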

> > I toyed around with the attached patch, but I'm still not that excited
> > about the clamping of infinite values to DBL_MAX.  The test case I
> > showed above with generate_Series(1,379) still ends up with NaN cost
> > estimates due to costing a sort with DBL_MAX rows.  When I was writing
> > the patch, I had it in my head that the costs per row will always be
> > lower than 1.
>
> Yeah, that is a good point.  Maybe instead of clamping to DBL_MAX,
> we should clamp rowcounts to something that provides some headroom
> for multiplication by per-row costs.  A max rowcount of say 1e100
> should serve fine, while still being comfortably more than any
> non-insane estimate.
>
> So now I'm imagining something like
>
> #define MAXIMUM_ROWCOUNT 1e100

That seems more reasonable.  We could likely push it a bit higher, but
I'm not all that motivated to: if a row estimate that large were ever
accurate, you could expect the heat death of the universe to arrive
before your query results, in which case the user would likely struggle
to find electrons to power their computer.
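
To put some numbers on the headroom: with DBL_MAX rows, even a tiny
per-tuple multiplier overflows the cost to infinity, and subtracting
two such costs yields NaN, whereas 1e100 rows stays far inside double
range.  A standalone demo (the 0.0025 multiplier is just a stand-in,
not the real cost model):

#include <float.h>
#include <math.h>
#include <stdio.h>

int
main(void)
{
    /* rough shape of a sort's CPU cost: rows * log2(rows) * per-tuple */
    double per_tuple = 0.0025;
    double overflowed = DBL_MAX * log2(DBL_MAX) * per_tuple;
    double capped = 1e100 * log2(1e100) * per_tuple;

    printf("%g\n", overflowed);                 /* inf */
    printf("%g\n", overflowed - overflowed);    /* nan, from inf - inf */
    printf("%g\n", capped);     /* ~8.3e99, well under DBL_MAX */
    return 0;
}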

> clamp_row_est(double nrows)
> {
>         /* Get rid of NaN, Inf, and impossibly large row counts */
>         if (isnan(nrows) || nrows >= MAXIMUM_ROWCOUNT)
>             nrows = MAXIMUM_ROWCOUNT;
>         else
>         ... existing logic ...

I've got something along those lines in the attached.

> Perhaps we should also have some sort of clamp for path cost
> estimates, at least to prevent them from being NaNs which
> is going to confuse add_path terribly.

Hmm, I'm not quite sure where to start with that one.  Many of the path
estimates will already have been through clamp_row_est(), but there are
various special requirements, e.g. Appends with no subpaths, so when to
apply the clamp would depend on the path type.  I'd say it would need
lots of careful analysis and a scattering of new calls in pathnode.c.

I've ended up removing the NaN checks from the join costing functions,
since clamp_row_est() now takes care of NaN.  There was no case
mentioned in [1] that showed how we hit that reported problem, so I
can't be entirely confident that removing them won't just reintroduce
it: the path row estimate that had the NaN might never have been
through clamp_row_est(), and many aren't.

David

[1] https://www.postgresql.org/message-id/7270.1302902842%40sss.pgh.pa.us
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index cd3716d494..733f7ea543 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -107,6 +107,13 @@
  */
 #define APPEND_CPU_COST_MULTIPLIER 0.5
 
+/*
+ * Maximum value for row estimates.  We cap row estimates to this to help
+ * ensure that costs based on these estimates remain within the range of what
+ * double can represent.  add_path() wouldn't act sanely given infinite or NaN
+ * cost values.
+ */
+#define MAXIMUM_ROWCOUNT 1e100
 
 double         seq_page_cost = DEFAULT_SEQ_PAGE_COST;
 double         random_page_cost = DEFAULT_RANDOM_PAGE_COST;
@@ -189,11 +196,14 @@ double
 clamp_row_est(double nrows)
 {
        /*
-        * Force estimate to be at least one row, to make explain output look
-        * better and to avoid possible divide-by-zero when interpolating costs.
-        * Make it an integer, too.
+        * Avoid infinite and NaN row estimates.  Costs derived from such values
+        * are going to be useless.  Also force the estimate to be at least one
+        * row, to make explain output look better and to avoid possible
+        * divide-by-zero when interpolating costs.  Make it an integer, too.
         */
-       if (nrows <= 1.0)
+       if (nrows > MAXIMUM_ROWCOUNT || isnan(nrows))
+               nrows = MAXIMUM_ROWCOUNT;
+       else if (nrows <= 1.0)
                nrows = 1.0;
        else
                nrows = rint(nrows);
@@ -2737,12 +2747,11 @@ final_cost_nestloop(PlannerInfo *root, NestPath *path,
        QualCost        restrict_qual_cost;
        double          ntuples;
 
-       /* Protect some assumptions below that rowcounts aren't zero or NaN */
-       if (outer_path_rows <= 0 || isnan(outer_path_rows))
+       /* Protect some assumptions below that rowcounts aren't zero */
+       if (outer_path_rows <= 0)
                outer_path_rows = 1;
-       if (inner_path_rows <= 0 || isnan(inner_path_rows))
+       if (inner_path_rows <= 0)
                inner_path_rows = 1;
-
        /* Mark the path with the correct row estimate */
        if (path->path.param_info)
                path->path.rows = path->path.param_info->ppi_rows;
@@ -2952,10 +2961,10 @@ initial_cost_mergejoin(PlannerInfo *root, JoinCostWorkspace *workspace,
                                innerendsel;
        Path            sort_path;              /* dummy for result of cost_sort */
 
-       /* Protect some assumptions below that rowcounts aren't zero or NaN */
-       if (outer_path_rows <= 0 || isnan(outer_path_rows))
+       /* Protect some assumptions below that rowcounts aren't zero */
+       if (outer_path_rows <= 0)
                outer_path_rows = 1;
-       if (inner_path_rows <= 0 || isnan(inner_path_rows))
+       if (inner_path_rows <= 0)
                inner_path_rows = 1;
 
        /*
@@ -3185,8 +3194,8 @@ final_cost_mergejoin(PlannerInfo *root, MergePath *path,
                                rescannedtuples;
        double          rescanratio;
 
-       /* Protect some assumptions below that rowcounts aren't zero or NaN */
-       if (inner_path_rows <= 0 || isnan(inner_path_rows))
+       /* Protect some assumptions below that rowcounts aren't zero */
+       if (inner_path_rows <= 0)
                inner_path_rows = 1;
 
        /* Mark the path with the correct row estimate */
