Justin Pryzby <pry...@telsasoft.com> writes:
> On Mon, Dec 30, 2019 at 02:18:17PM -0500, Tom Lane wrote:
>> This answer is simply broken.  You've caused it to estimate half
>> of the bucket, which is an insane estimate for the given bucket
>> boundaries and WHERE constraint.

> I'm fine if the isnan() logic changes, but the comment indicates it's
> intended to be hit for an infinite histogram bound, but that doesn't work
> for timestamps (convert_to_scalar() should return (double)INFINITY and not
> (double)INT64_MIN/MAX).

I suppose the code you're looking at is

                        binfrac = (val - low) / (high - low);

                        /*
                         * Watch out for the possibility that we got a NaN or
                         * Infinity from the division.  This can happen
                         * despite the previous checks, if for example "low"
                         * is -Infinity.
                         */
                        if (isnan(binfrac) ||
                            binfrac < 0.0 || binfrac > 1.0)
                            binfrac = 0.5;

This doesn't really have any goals beyond "make sure we get a result
between 0.0 and 1.0, even if the calculation went pear-shaped for
some reason".  You could make an argument that it should be like

                        if (isnan(binfrac))
                            binfrac = 0.5;   /* throw up our hands for NaN */
                        else if (binfrac <= 0.0)
                            binfrac = 0.0;   /* clamp in case of -Inf or -0 */
                        else if (binfrac > 1.0)
                            binfrac = 1.0;   /* clamp in case of +Inf */

which would probably produce saner results in edge cases like these.
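To make the edge cases concrete, here's a minimal standalone sketch (not
the selfuncs.c code itself; clamp_binfrac() is just a hypothetical wrapper
around the proposed logic).  Under IEEE 754, an infinite "low" makes the
division inf/inf = NaN, while an infinite "high" makes it finite/inf =
+0.0, which is exactly what the clamps catch:

    #include <math.h>
    #include <stdio.h>

    static double
    clamp_binfrac(double val, double low, double high)
    {
        double binfrac = (val - low) / (high - low);

        if (isnan(binfrac))
            binfrac = 0.5;          /* throw up our hands for NaN */
        else if (binfrac <= 0.0)
            binfrac = 0.0;          /* clamp in case of -Inf or -0 */
        else if (binfrac > 1.0)
            binfrac = 1.0;          /* clamp in case of +Inf */
        return binfrac;
    }

    int
    main(void)
    {
        /* ordinary bucket: 10 is halfway between 0 and 20 */
        printf("%g\n", clamp_binfrac(10.0, 0.0, 20.0));        /* 0.5 */
        /* low = -Inf: (val - low) / (high - low) = inf/inf = NaN */
        printf("%g\n", clamp_binfrac(10.0, -INFINITY, 20.0));  /* 0.5 */
        /* high = +Inf: finite/inf = +0.0, clamped to a hard 0.0 */
        printf("%g\n", clamp_binfrac(10.0, 0.0, INFINITY));    /* 0 */
        return 0;
    }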
I think it'd also obviate the need for fooling with the conversion in
convert_to_scalar: while DT_NOBEGIN/DT_NOEND wouldn't produce exactly
the same result (hard 0.0 or 1.0) as an infinity, they'd produce
results very close to that.
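(To put rough numbers on it: if convert_to_scalar() just hands back
(double)INT64_MAX, about 9.22e18, for a DT_NOEND histogram bound, then with
ordinary timestamp values on the order of 1e15 microseconds the division
(val - low) / (high - low) comes out around 1e-4 or less, where a true
+Infinity bound would give finite/inf = +0.0, i.e. a hard 0.0 after
clamping.  If my arithmetic is right, that difference shouldn't matter for
estimation purposes.)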

                        regards, tom lane

