On Thu, 15 Feb 2007, Peter Eisentraut wrote:

> Neil Conway wrote:
> > On Wed, 2007-02-14 at 16:49 -0800, Jeremy Drake wrote:
> > > What was the status of this?  Was there anything else I needed to
> > > do with this patch, or is it ready to be applied?  Let me know if
> > > there is anything else I need to do on this...
> >
> > Will do -- I'm planning to apply this as soon as I have the free
> > cycles to do so, likely tomorrow or Friday.
>
> I don't know which patch is actually being proposed now.  It would be
> good to make this more explicit and maybe include a synopsis of the
> functions in the email, so we know what's going on.

Sorry, my intent was just to check to see if I had gotten the patch
sufficiently fixed for Neil to apply and he just hadn't gotten to it yet
(which seems to be the case), or if there was something else he still
expected me to fix that I had missed in the prior discussions.  I suppose
I should have emailed him privately.

The patch in question can be seen in the archives here:

The functions added are:
* regexp_split(str text, pattern text) RETURNS SETOF text
  regexp_split(str text, pattern text, flags text) RETURNS SETOF text
   returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
   returns all capture groups when matching pattern against string in an
   array.
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
    (prematch text, fullmatch text, matches text[], postmatch text)
   returns all capture groups when matching pattern against string in an
   array.  also returns the entire match in fullmatch.  if the 'g' option
   is given, returns all matches in the string.  if the 'r' option is
   given, also return the text before and after the match in prematch and
   postmatch respectively.
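
As a rough illustration of the synopsis above, calls might look like the
following.  This is only a sketch: it assumes the patch is applied, the
input strings and patterns are made up for the example, and combining the
flags in one string ('gr'), the way regexp_replace accepts its flags, is an
assumption on my part.

```sql
-- Assumes the patch is applied; names and signatures are taken from the
-- synopsis above.  Sample strings and patterns are made up.

-- One row per section of the string between delimiters.
SELECT * FROM regexp_split('first, second, third', E',\\s*');

-- Two-argument form: the capture groups as a text[].
SELECT regexp_matches('foobarbequebaz', '(bar)(beque)');

-- Three-argument form with 'g': one row per match in the string.
SELECT fullmatch, matches
  FROM regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'g');

-- 'g' and 'r' together (assumed to combine): also ask for the text
-- before and after each match.
SELECT prematch, fullmatch, matches, postmatch
  FROM regexp_matches('foobarbequebazilbarfbonk', '(b[^b]+)(b[^b]+)', 'gr');
```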

> What confuses me about some of the functions I've seen in earlier
> patches in this thread is that they return setof something.  But in my
> mind, regular expression matches or string splits are inherently
> ordered, so an array would be the correct return type.

They do return SETOF.  Addressing them separately:

regexp_matches uses a text[] for the match groups.  If you specify the
global flag, it could return multiple matches.  Couple this with the
requested feature of pre- and postmatch returns (with its own flag) and
the return would turn into some sort of nasty array of tuples, or multiple
arrays.  It seems much cleaner to me to return a set of the matches found,
and I find the order in which the matches occur to be much less important
than the fact that they occurred and what they contain.

regexp_split returns setof text.  There is, in my opinion, a much stronger
case for returning an array here.  However, there are several issues with
that approach:

1. My experience with the array code leads me to believe that building up
   an array is an expensive proposition.  I know I could code it smarter so
   that the array is only constructed at the end.

2. With a set-returning function, it is possible to add a LIMIT clause, to
   prevent splitting up more of the string than is necessary.  It is also
   immediately possible to insert the values into a table, or do grouping
   on them, or call a function on each value.  Most of the time when I do a
   split, I intend to do something like this with each split value.

3. You can still get an array if you really want it:
   SELECT ARRAY(SELECT * FROM regexp_split('first, second, third', E',\\s*'))
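
To make the set-returning uses concrete, here is a sketch of the kinds of
queries I have in mind.  It assumes the patch is applied; the temp table
"words" and the sample strings are hypothetical and exist only for the
example.

```sql
-- Assumes the patch is applied.  The table "words" is hypothetical.

-- Only ask for the first two pieces.
SELECT * FROM regexp_split('first, second, third', E',\\s*') LIMIT 2;

-- Insert each piece as its own row.
CREATE TEMP TABLE words (word text);
INSERT INTO words SELECT * FROM regexp_split('a b a c b a', ' ');

-- Group on the pieces.
SELECT word, count(*) FROM words GROUP BY word ORDER BY count(*) DESC;

-- Call a function on each value.
SELECT upper(s) FROM regexp_split('first, second, third', E',\\s*') AS s;
```

None of these need the intermediate array, which is the main reason I
prefer the SETOF form for the split.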

No problem is so formidable that you can't just walk away from it.
                -- C. Schulz
