On Thu, Mar 4, 2021, at 16:40, Tom Lane wrote:
> "Joel Jacobson" <[email protected]> writes:
> > Having abandoned the cute idea that didn't work,
> > here comes a new patch with a regexp_positions() instead returning
> > setof record (start_pos integer[], end_pos integer[]).
>
> I wonder if a 2-D integer array wouldn't be a better idea,
> ie {{startpos1,length1},{startpos2,length2},...}. My experience
> with working with parallel arrays in SQL has been unpleasant.
I considered it, but I prefer two separate simple arrays for two reasons:
a) more pedagogic, it's at least then obvious what values are start and end
positions,
then you only have to understand what the values means.
b) 2-D doesn't arrays don't work well with unnest().
If you would unnest() the 2-D array you couldn't separate the start positions
from the end positions,
whereas with two separate, you could do:
SELECT unnest(start_pos) AS start_pos, unnest(end_pos) AS end_pos FROM
regexp_positions('foobarbequebazilbarfbonk', $re$(b[^b]+)(b[^b]+)$re$, 'g');
start_pos | end_pos
-----------+---------
3 | 6
6 | 11
11 | 16
16 | 20
(4 rows)
Can give some details on your unpleasant experiences of parallel arrays?
>
> Also, did you see
>
> https://www.postgresql.org/message-id/fc160ee0-c843-b024-29bb-97b5da61971f%40darold.net
>
>
> Seems like there may be some overlap in these proposals.
Yes, I saw it, it was sent shortly after my proposal, so I couldn't take it
into account.
Seems useful, except regexp_instr() seems redundant, I would rather have
regexp_positions(),
but maybe regexp_instr() should also be added for compatibility reasons.
/Joel