On Tue, Mar 2, 2021, at 15:42, Tom Lane wrote: > "Joel Jacobson" <j...@compiler.org> writes: > > As discussed in the separate thread "[PATCH] regexp_positions ( string > > text, pattern text, flags text ) → setof int4range[]" [1] > > it's currently not possible to create an empty range with bounds > > information. > > > This patch tries to improve the situation by keeping the bounds information, > > and allow accessing it via lower() and upper(). > > I think this is an actively bad idea. We had a clean set-theoretic > definition of ranges as sets of points, and with this we would not. > We should not be whacking around the fundamental semantics of a > whole class of data types on the basis that it'd be cute to make > regexp_position return its result as int4range rather than int[].
I think there are *lots* of other use-cases where the current semantics of range types are very problematic. The regexp_positions() patch just demonstrates one concrete example on when real-life zero-length ranges can definitively have positions, regardless of what the mathematicians thinks. (I use the term "position" here since if lower=upper bound, then we're talking about a position, since it has zero length.) I think there is a risk lots of users coming from other programming environments will misunderstand how ranges work, start implementing something using them, only to later have to rewrite all their code using ranges due to eventually encountering the zero-length corner-case for which there is no work-around (except not using ranges). Imagine e.g. a Rust user, who has learned how ranges work in Rust, and thinks the program below is valid and and expects it to output "start 3 end 3 is_empty true". fn main() { let r = std::ops::Range { start: 3, end: 3 }; println!( "start {} end {} is_empty {}", r.start, r.end, r.is_empty() ); } I think it would be a challenge to explain how PostgreSQL's range semantics to this user, why you get NULL when trying to access the start and end values for this range. I feel this is a perfect example of when theory and practise has since long parted ways, and the theory is only cute until you face the ugly reality. That said, subtle changes are of course possibly dangerous, and since I'm not a huge range type user myself, I can't have an opinion on how many rely on the current null semantics for lower() and upper(). Argh! I wish we would have access to a large set of real-life real-time statistics on PostgreSQL SQL queries and result sets from lots of different users around the world, similar to the regex test corpus. It's very unfair e.g. Amazon with their Aurora could in theory collect such statistics on all their users, so their Aurora-hackers could answers questions like "Do lower() and upper() ever return NULL in real-life for ranges?" While all we can do is to rely on user reports and our imagination. Maybe we should allow opting-in to contribute with statistics from production servers, to help us better understand how PostgreSQL is used in real-life? I see lots of problems with data privacy, business secrets etc, but perhaps there are something that can be done. Oh well. At least it was fun to learn about how ranges are implemented behind the scenes. If we cannot do a subtle change, then I think we should consider an entirely new range class, just like multi-ranges are added in v14. Maybe "negrange" could be a good name? > > If we did go forward with this, what would the implications be for > multiranges? None. It would only affect lower()/upper() for a single range created with bounds. Before patch: SELECT numrange(3,4) + numrange(5,5); [3,4) SELECT upper(numrange(3,4) + numrange(5,5)); 4 SELECT numrange(5,5); empty SELECT upper(numrange(5,5)); NULL After patch: SELECT numrange(3,4) + numrange(5,5); [3,4) SELECT upper(numrange(3,4) + numrange(5,5)); 4 SELECT numrange(5,5); empty SELECT upper(numrange(5,5)); 5 At the very least, I think we should in any case add test coverage of what lower()/upper() returns for empty ranges. /Joel