> On Mar 5, 2021, at 11:46 AM, Joel Jacobson <j...@compiler.org> wrote:
>
>
> /Joel
> <range.sql><0003-regexp-positions.patch>
I did a bit more testing:
+SELECT regexp_positions('foobarbequebaz', 'b', 'g');
+ regexp_positions
+------------------
+ {"[3,5)"}
+ {"[6,8)"}
+ {"[11,13)"}
+(3 rows)
+
I understand that these ranges are intended to be read as one-character
matches starting at positions 3, 6, and 11, but they look like they match
either two characters (read as half-open) or three characters (read as
inclusive), and users will likely be confused by that.
+SELECT regexp_positions('foobarbequebaz', '(?=beque)', 'g');
+ regexp_positions
+------------------
+ {"[6,7)"}
+(1 row)
+
This is a zero-length match. As above, it might be confusing that a
zero-length match reads this way.
+SELECT regexp_positions('foobarbequebaz', '(?<=z)', 'g');
+ regexp_positions
+------------------
+ {"[14,15)"}
+(1 row)
+
Same here, except this time position 15 is referenced, which is beyond the end
of the string.

I think a zero-length match at the end of this string should read as
{"[14,14)"}, and you have been forced to add one to avoid that collapsing down
to "empty", but I'd rather you found a different datatype than abuse the
definition of int4range.
It seems that you may have reached a similar conclusion down-thread?
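
For comparison only, and not part of the patch: a minimal sketch of how the
same three patterns read under a plain 0-based, half-open (start, end)
convention. It uses Python's standard re module as a stand-in. Python's regex
engine is not PostgreSQL's, and the helper name match_spans is hypothetical,
so this illustrates the notation rather than the behavior of regexp_positions.

```python
import re

SUBJECT = "foobarbequebaz"  # same 14-character test string as above


def match_spans(pattern, subject=SUBJECT):
    """Return the 0-based half-open (start, end) span of every match."""
    return [m.span() for m in re.finditer(pattern, subject)]


one_char = match_spans(r"b")            # three one-character matches
lookahead = match_spans(r"(?=beque)")   # zero-length match before "beque"
lookbehind = match_spans(r"(?<=z)")     # zero-length match at end of string

print(one_char)    # [(3, 4), (6, 7), (11, 12)]
print(lookahead)   # [(6, 6)]
print(lookbehind)  # [(14, 14)]

# With half-open spans the match length is simply end - start.
print([end - start for start, end in one_char])  # [1, 1, 1]
```

Read this way, the zero-length matches come out as (6, 6) and (14, 14), and
nothing past position 14 is ever mentioned. A pair of plain integers has no
"empty" canonical form to collapse into, which is the property int4range
lacks here.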
--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company