On Tue, Mar 9, 2021, at 17:42, Tom Lane wrote: > "Joel Jacobson" <[email protected]> writes: > > Tom - can you please give details on your unpleasant experiences with > > parallel arrays? > > The problems I can recall running into were basically down to not having > an easy way to iterate through parallel arrays. There are ways to do > that in SQL, certainly, but they all constrain how you write the query, > and usually force ugly stuff like splitting it into sub-selects.
I see now what you mean, many thanks for explaining.
>
> As an example, presuming that regexp_positions is defined along the
> lines of
>
> regexp_positions(str text, pat text, out starts int[], out lengths int[])
> returns setof record
+1
I think this is the most feasible best option so far.
Attached is a patch implementing it this way.
I changed the start to begin at 1, since this is how position ( substring text
IN string text ) → integer works.
SELECT * FROM regexp_positions('foobarbequebaz', '^', 'g');
starts | lengths
--------+---------
{1} | {0}
(1 row)
SELECT * FROM regexp_positions('foobarbequebaz', 'ba.', 'g');
starts | lengths
--------+---------
{4} | {3}
{12} | {3}
(2 rows)
Mark's examples:
SELECT * FROM regexp_positions('foObARbEqUEbAz', $re$(?=beque)$re$, 'i');
starts | lengths
--------+---------
{7} | {0}
(1 row)
SELECT * FROM regexp_positions('foobarbequebaz', '(?<=z)', 'g');
starts | lengths
--------+---------
{15} | {0}
(1 row)
I've also tested your template queries:
>
> then to actually get the identified substrings you'd have to do something
> like
>
> select
> substring([input string] from starts[i] for lengths[i])
> from
> regexp_positions([input string], [pattern]) r,
> lateral
> generate_series(1, array_length(starts, 1)) i;
select
substring('foobarbequebaz' from starts[i] for lengths[i])
from
regexp_positions('foobarbequebaz', 'ba.', 'g') r,
lateral
generate_series(1, array_length(starts, 1)) i;
substring
-----------
bar
baz
(2 rows)
> I think the last time I confronted this, we didn't have multi-array
> UNNEST. Now that we do, we can get rid of the generate_series(),
> but it's still not beautiful:
>
> select
> substring([input string] from s for l)
> from
> regexp_positions([input string], [pattern]) r,
> lateral
> unnest(starts, lengths) u(s,l);
select
substring('foobarbequebaz' from s for l)
from
regexp_positions('foobarbequebaz', 'ba.', 'g') r,
lateral
unnest(starts, lengths) u(s,l);
substring
-----------
bar
baz
(2 rows)
> Having said that, the other alternative with a 2-D array:
>
> regexp_positions(str text, pat text) returns setof int[]
>
> seems to still need UNNEST, though now it's not the magic multi-array
> UNNEST but this slicing version:
>
> select
> substring([input string] from u[1] for u[2])
> from
> regexp_positions([input string], [pattern]) r,
> lateral
> unnest_slice(r, 1) u;
Unable to test this one since there is no unnest_slice() (yet)
>
> Anyway, I'd counsel trying to write out SQL implementations
> of regexp_matches() and other useful things based on any
> particular regexp_positions() API you might be thinking about.
> Can we do anything useful without a LATERAL UNNEST thingie?
> Are some of them more legible than others?
Hmm, I cannot think of a way.
/Joel
0004-regexp-positions.patch
Description: Binary data
