On Tue, Mar 2, 2021, at 01:12, Mark Dilger wrote:
> I like the idea so I did a bit of testing. I think the following should not
> error, but does:
>
> +SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
> +ERROR: range lower bound must be less than or equal to range upper bound
Doh! How stupid of me. I realize now I had an off-by-one thinko in my 0001 patch
using int4range.

I didn't use the raw "so" and "eo" values in regexp.c like I should have.
Instead, I incorrectly used (so + 1) as the startpos and just eo as the endpos,
so a zero-length match (so == eo) ended up with a lower bound greater than
its upper bound. This is what caused all the problems.

The fix is simple:
- lower.val = Int32GetDatum(so + 1);
+ lower.val = Int32GetDatum(so);
The example that gave the error now works properly:
SELECT regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
 regexp_positions
------------------
 {"[6,7)"}
(1 row)
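
To spell out the off-by-one, here is a small standalone C sketch. It is not the patch code: `struct bounds`, `make_bounds` and `exclusive_upper` are hypothetical names, and it assumes the range is built from inclusive bounds and then canonicalized to half-open form, which is what the `[6,7)` output above suggests.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the range bounds; not a PostgreSQL struct. */
struct bounds
{
    int lower;  /* inclusive */
    int upper;  /* inclusive */
};

/*
 * Build inclusive bounds from the regex match offsets "so" and "eo".
 * With buggy = true the lower bound is (so + 1), as in the 0001 patch.
 * Returns false when lower > upper, i.e. the condition behind
 * "range lower bound must be less than or equal to range upper bound".
 */
bool
make_bounds(int so, int eo, bool buggy, struct bounds *out)
{
    out->lower = buggy ? so + 1 : so;
    out->upper = eo;
    return out->lower <= out->upper;
}

/* An inclusive integer upper bound N canonicalizes to exclusive N + 1. */
int
exclusive_upper(const struct bounds *b)
{
    return b->upper + 1;
}
```

For the failing lookahead example the match is zero-length with so == eo == 6: the buggy variant yields lower = 7 and upper = 6 and is rejected, while the fixed variant yields the inclusive bounds 6 and 6, shown as [6,7) after canonicalization.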
I've also created a SQL PoC of the composite range type idea,
and convenience wrapper functions for int4range and int8range.
CREATE TYPE range AS (start int8, stop int8);
Helper functions:
range(start int8, stop int8) -> range
range(int8range) -> range
range(int4range) -> range
range(int8range[]) -> range[]
range(int4range[]) -> range[]
Demo:
regexp_positions() returns setof int4range[]:
SELECT r FROM regexp_positions('foobarbequebazilbarfbonk',
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
           r
-----------------------
 {"[3,7)","[6,12)"}
 {"[11,17)","[16,21)"}
(2 rows)
Convert int4range[] -> range[]:
SELECT range(r) FROM regexp_positions('foobarbequebazilbarfbonk',
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
         range
-----------------------
 {"(3,6)","(6,11)"}
 {"(11,16)","(16,20)"}
(2 rows)
"start" and "stop" fields:
SELECT (range(r[1])).* FROM regexp_positions('foobarbequebazilbarfbonk',
$re$(b[^b]+)(b[^b]+)$re$, 'g') AS r;
 start | stop
-------+------
     3 |    6
    11 |   16
(2 rows)
zero-length match at beginning:
SELECT r FROM regexp_positions('','^','g') AS r;
     r
-----------
 {"[0,1)"}
(1 row)
SELECT (range(r[1])).* FROM regexp_positions('','^','g') AS r;
 start | stop
-------+------
     0 |    0
(1 row)
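
The mapping from the half-open int4range output to the composite (start, stop) pair can be read off the demo results: [3,7) becomes (3,6) and [0,1) becomes (0,0), so start is the lower bound and stop is the upper bound minus one. A minimal C sketch of that arithmetic follows. It is inferred from the outputs above only, not taken from range.sql, and `struct range` and `range_from_halfopen` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical mirror of: CREATE TYPE range AS (start int8, stop int8) */
struct range
{
    int64_t start;
    int64_t stop;
};

/*
 * Convert a canonical half-open integer range [lower, upper) into the
 * composite form, where both fields are plain positions and no
 * inclusive/exclusive flags are carried along.
 */
struct range
range_from_halfopen(int64_t lower, int64_t upper)
{
    struct range r;

    r.start = lower;
    r.stop = upper - 1;     /* exclusive upper -> inclusive stop */
    return r;
}
```

With this mapping a zero-length match shows up as start == stop, as in the (0, 0) row above.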
My conclusion is that we should use setof int4range[] as the return value for
regexp_positions().
New patch attached.
The composite range type and helper functions are of course not at all
necessary,
but I think they would be a nice addition, to make it easier to work with ranges
for composite types. I intentionally didn't create anyrange versions of them,
because they can only support composite types, which don't require the
inclusive/exclusive semantics.
/Joel
range.sql
Description: Binary data
0003-regexp-positions.patch
Description: Binary data
