Re: [PATCH] Support empty ranges with bounds information

Joel Jacobson Tue, 02 Mar 2021 09:53:15 -0800

On Tue, Mar 2, 2021, at 15:42, Tom Lane wrote:
> "Joel Jacobson" <j...@compiler.org> writes:
> > As discussed in the separate thread "[PATCH] regexp_positions ( string 
> > text, pattern text, flags text ) → setof int4range[]" [1]
> > it's currently not possible to create an empty range with bounds 
> > information.
> 
> > This patch tries to improve the situation by keeping the bounds information,
> > and allow accessing it via lower() and upper().
> 
> I think this is an actively bad idea.  We had a clean set-theoretic
> definition of ranges as sets of points, and with this we would not.
> We should not be whacking around the fundamental semantics of a
> whole class of data types on the basis that it'd be cute to make
> regexp_position return its result as int4range rather than int[].


I think there are *lots* of other use-cases where the current semantics of 
range types are very problematic.

The regexp_positions() patch just demonstrates one concrete example
on when real-life zero-length ranges can definitively have positions,
regardless of what the mathematicians thinks.
(I use the term "position" here since if lower=upper bound,
then we're talking about a position, since it has zero length.)

I think there is a risk lots of users coming from other programming environments
will misunderstand how ranges work, start implementing something using them,
only to later have to rewrite all their code using ranges due to eventually 
encountering the
zero-length corner-case for which there is no work-around (except not using 
ranges).

Imagine e.g. a Rust user, who has learned how ranges work in Rust,
and thinks the program below is valid and and expects it to output
"start 3 end 3 is_empty true".

fn main() {
    let r = std::ops::Range { start: 3, end: 3 };

    println!(
        "start {} end {} is_empty {}",
        r.start,
        r.end,
        r.is_empty()
    );
}

I think it would be a challenge to explain how PostgreSQL's range semantics
to this user, why you get NULL when trying to
access the start and end values for this range.

I feel this is a perfect example of when theory and practise has since long 
parted ways,
and the theory is only cute until you face the ugly reality.

That said, subtle changes are of course possibly dangerous,
and since I'm not a huge range type user myself,
I can't have an opinion on how many rely on the current null semantics for 
lower() and upper().

Argh! I wish we would have access to a large set of real-life real-time 
statistics on PostgreSQL SQL queries
and result sets from lots of different users around the world, similar to the 
regex test corpus.
It's very unfair e.g. Amazon with their Aurora could in theory collect such 
statistics on all their users,
so their Aurora-hackers could answers questions like

    "Do lower() and upper() ever return NULL in real-life for ranges?"

While all we can do is to rely on user reports and our imagination.
Maybe we should allow opting-in to contribute with statistics from production 
servers,
to help us better understand how PostgreSQL is used in real-life?
I see lots of problems with data privacy, business secrets etc, but perhaps 
there are something that can be done.

Oh well. At least it was fun to learn about how ranges are implemented behind 
the scenes.

If we cannot do a subtle change, then I think we should consider an entirely 
new range class,
just like multi-ranges are added in v14. Maybe "negrange" could be a good name?

> 
> If we did go forward with this, what would the implications be for
> multiranges?

None. It would only affect lower()/upper() for a single range created with 
bounds.

Before patch:

SELECT numrange(3,4) + numrange(5,5);
   [3,4)
SELECT upper(numrange(3,4) + numrange(5,5));
     4
SELECT numrange(5,5);
  empty
SELECT upper(numrange(5,5));
NULL

After patch:

SELECT numrange(3,4) + numrange(5,5);
   [3,4)
SELECT upper(numrange(3,4) + numrange(5,5));
     4
SELECT numrange(5,5);
  empty
SELECT upper(numrange(5,5));
     5

At the very least, I think we should in any case add test coverage of what 
lower()/upper() returns for empty ranges.

/Joel

Re: [PATCH] Support empty ranges with bounds information

Reply via email to