Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-12-20 Thread Peter Eisentraut

On 15.12.21 14:15, Gilles Darold wrote:

Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :

On 03.08.21 19:10, Tom Lane wrote:

Gilles Darold  writes:

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.


Pushed, then.  There's still lots of time to tweak the behavior of 
course.


I have a documentation follow-up to this.  It seems that these new 
functions are almost a de facto standard, whereas the SQL-standard 
functions are not implemented anywhere.  I propose the attached patch 
to update the subsection in the pattern-matching section to give more 
detail on this and suggest equivalent functions among these newly 
added ones.  What do you think?



I'm in favor to apply your changes to documentation. It is a good thing 
to precise the relation between this implementation of the regex_* 
functions and the SQL stardard.


ok, done




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-12-15 Thread Gilles Darold

Le 15/12/2021 à 13:41, Peter Eisentraut a écrit :

On 03.08.21 19:10, Tom Lane wrote:

Gilles Darold  writes:

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.


Pushed, then.  There's still lots of time to tweak the behavior of 
course.


I have a documentation follow-up to this.  It seems that these new 
functions are almost a de facto standard, whereas the SQL-standard 
functions are not implemented anywhere.  I propose the attached patch 
to update the subsection in the pattern-matching section to give more 
detail on this and suggest equivalent functions among these newly 
added ones.  What do you think?



I'm in favor to apply your changes to documentation. It is a good thing 
to precise the relation between this implementation of the regex_* 
functions and the SQL stardard.


--
Gilles Darold






Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-12-15 Thread Peter Eisentraut

On 03.08.21 19:10, Tom Lane wrote:

Gilles Darold  writes:

Sorry I have missed that, but I'm fine with this implemenation so let's
keep the v6 version of the patch and drop this one.


Pushed, then.  There's still lots of time to tweak the behavior of course.


I have a documentation follow-up to this.  It seems that these new 
functions are almost a de facto standard, whereas the SQL-standard 
functions are not implemented anywhere.  I propose the attached patch to 
update the subsection in the pattern-matching section to give more 
detail on this and suggest equivalent functions among these newly added 
ones.  What do you think?From a2dbd0e24a30b945a5d641ed773dc44f5e6b50c1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 15 Dec 2021 11:02:59 +0100
Subject: [PATCH] doc: More documentation on regular expressions and SQL
 standard

---
 doc/src/sgml/func.sgml   | 91 +---
 src/backend/catalog/sql_features.txt | 10 +--
 2 files changed, 88 insertions(+), 13 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5801299b27..e58efce586 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -7353,10 +7353,26 @@ Basic Regular Expressions
 
 

-   Differences from XQuery (LIKE_REGEX)
+   Differences from SQL Standard and XQuery
 

-LIKE_REGEX
+LIKE_REGEX
+   
+
+   
+OCCURRENCES_REGEX
+   
+
+   
+POSITION_REGEX
+   
+
+   
+SUBSTRING_REGEX
+   
+
+   
+TRANSLATE_REGEX

 

@@ -7364,16 +7380,75 @@ Differences from XQuery 
(LIKE_REGEX)

 
 
- Since SQL:2008, the SQL standard includes
- a LIKE_REGEX operator that performs pattern
+ Since SQL:2008, the SQL standard includes regular expression operators
+ and functions that performs pattern
  matching according to the XQuery regular expression
- standard.  PostgreSQL does not yet
- implement this operator, but you can get very similar behavior using
- the regexp_match() function, since XQuery
- regular expressions are quite close to the ARE syntax described above.
+ standard:
+ 
+  LIKE_REGEX
+  OCCURRENCES_REGEX
+  POSITION_REGEX
+  SUBSTRING_REGEX
+  TRANSLATE_REGEX
+ 
+ PostgreSQL does not currently implement these
+ operators and functions.  You can get approximately equivalent
+ functionality in each case as shown in .  (Various optional clauses on
+ both sides have been omitted in this table.)
+
+
+
+ Regular Expression Functions Equivalencies
+
+ 
+  
+   
+SQL standard
+PostgreSQL
+   
+  
+
+  
+   
+string LIKE_REGEX 
pattern
+regexp_like(string, 
pattern) or 
string ~ 
pattern
+   
+
+   
+OCCURRENCES_REGEX(pattern 
IN string
+regexp_count(string, 
pattern)
+   
+
+   
+POSITION_REGEX(pattern IN 
string
+regexp_instr(string, 
pattern)
+   
+
+   
+SUBSTRING_REGEX(pattern IN 
string
+regexp_substr(string, 
pattern)
+   
+
+   
+TRANSLATE_REGEX(pattern IN 
string WITH 
replacement
+regexp_replace(string, 
pattern, 
replacement)
+   
+  
+ 
+
+
+
+ Regular expression functions similar to those provided by PostgreSQL are
+ also available in a number of other SQL implementations, whereas the
+ SQL-standard functions are not as widely implemented.  Some of the
+ details of the regular expression syntax will likely differ in each
+ implementation.
 
 
 
+ The SQL-standard operators and functions use XQuery regular expressions,
+ which are quite close to the ARE syntax described above.
  Notable differences between the existing POSIX-based
  regular-expression feature and XQuery regular expressions include:
 
diff --git a/src/backend/catalog/sql_features.txt 
b/src/backend/catalog/sql_features.txt
index 9f424216e2..b8a78f4d41 100644
--- a/src/backend/catalog/sql_features.txt
+++ b/src/backend/catalog/sql_features.txt
@@ -323,11 +323,11 @@ F821  Local table references  NO
 F831   Full cursor update  NO  
 F831   Full cursor update  01  Updatable scrollable cursorsNO  
 F831   Full cursor update  02  Updatable ordered cursors   NO  
-F841   LIKE_REGEX predicateNO  
-F842   OCCURRENCES_REGEX function  NO  
-F843   POSITION_REGEX function NO  
-F844   SUBSTRING_REGEX functionNO  
-F845   TRANSLATE_REGEX functionNO  
+F841   LIKE_REGEX predicateNO  consider regexp_like()
+F842   OCCURRENCES_REGEX function  NO  consider 
regexp_matches()
+F843   POSITION_REGEX function NO  consider regexp_instr()
+F844   SUBSTRING_REGEX functionNO  

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Tom Lane
Gilles Darold  writes:
> Sorry I have missed that, but I'm fine with this implemenation so let's 
> keep the v6 version of the patch and drop this one.

Pushed, then.  There's still lots of time to tweak the behavior of course.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

Le 03/08/2021 à 15:39, Tom Lane a écrit :

Erik Rijkers  writes:

On 8/3/21 1:26 PM, Gilles Darold wrote:

Le 03/08/2021 à 11:45, Gilles Darold a écrit :

Actually I just found that the regexp_like() function doesn't support
the start parameter which is something we should support. I saw that
Oracle do not support it but DB2 does and I think we should also
support it. I will post a new version of the patch once it is done.

+1
I for one am in favor of this 'start'-argument addition.  Slightly
harder usage, but more precise manipulation.

As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace.  If Oracle doesn't have those options,
why do we need them?



Sorry I have missed that, but I'm fine with this implemenation so let's 
keep the v6 version of the patch and drop this one.


--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Tom Lane
Erik Rijkers  writes:
> On 8/3/21 1:26 PM, Gilles Darold wrote:
>> Le 03/08/2021 à 11:45, Gilles Darold a écrit :
>>> Actually I just found that the regexp_like() function doesn't support 
>>> the start parameter which is something we should support. I saw that 
>>> Oracle do not support it but DB2 does and I think we should also 
>>> support it. I will post a new version of the patch once it is done.

> +1

> I for one am in favor of this 'start'-argument addition.  Slightly 
> harder usage, but more precise manipulation.

As I said upthread, I am *not* in favor of making those DB2 additions.
We do not need to create ambiguities around those functions like the
one we have for regexp_replace.  If Oracle doesn't have those options,
why do we need them?

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Erik Rijkers

On 8/3/21 1:26 PM, Gilles Darold wrote:

Le 03/08/2021 à 11:45, Gilles Darold a écrit :
Actually I just found that the regexp_like() function doesn't support 
the start parameter which is something we should support. I saw that 
Oracle do not support it but DB2 does and I think we should also 
support it. I will post a new version of the patch once it is done.




+1

I for one am in favor of this 'start'-argument addition.  Slightly 
harder usage, but more precise manipulation.



Erik Rijkers




Here is a new version of the patch that adds the start parameter to 
regexp_like() function but while I'm adding support to this parameter it 
become less obvious for me that we should implement it. However feel 
free to not use this version if you think that adding the start 
parameter has no real interest.



Best regards,






Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

Le 03/08/2021 à 11:45, Gilles Darold a écrit :
Actually I just found that the regexp_like() function doesn't support 
the start parameter which is something we should support. I saw that 
Oracle do not support it but DB2 does and I think we should also 
support it. I will post a new version of the patch once it is done.



Here is a new version of the patch that adds the start parameter to 
regexp_like() function but while I'm adding support to this parameter it 
become less obvious for me that we should implement it. However feel 
free to not use this version if you think that adding the start 
parameter has no real interest.



Best regards,

--
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..2bc9060e47 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,80 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+integer
+   
+   
+Returns the number of times the POSIX regular
+expression pattern matches in
+the string; see
+.
+   
+   
+regexp_count('123456789012', '\d\d\d', 2)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, endoption integer
+ [, flags text
+ [, subexpr integer ] ] ] ] ] )
+integer
+   
+   
+Returns the position within string where
+the N'th match of the POSIX regular
+expression pattern occurs, or zero if there is
+no such match; see .
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')
+3
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)
+5
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+boolean
+   
+   
+Checks whether a match of the POSIX regular
+expression pattern occurs
+within string starting at
+the start'th character; see
+.
+   
+   
+regexp_like('Hello World', 'world$', 1, 'i')
+t
+   
+  
+
   

 
@@ -3117,8 +3191,9 @@ repeat('Pg', 4) PgPgPgPg
 text[]


-Returns captured substrings resulting from the first match of a POSIX
-regular expression to the string; see
+Returns captured substrings resulting from the first match of the
+POSIX regular expression pattern to
+the string; see
 .


@@ -3136,10 +3211,11 @@ repeat('Pg', 4) PgPgPgPg
 setof text[]


-Returns captured substrings resulting from the first match of a
-POSIX regular expression to the string,
-or multiple matches if the g flag is used;
-see .
+Returns captured substrings resulting from the first match of the
+POSIX regular expression pattern to
+the string, or all matches if
+the g flag is used; see
+.


 regexp_matches('foobarbequebaz', 'ba.', 'g')
@@ -3156,14 +3232,16 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text
+ [, start integer ]
+ [, flags text ] )
 text


-Replaces substrings resulting from the first match of a
-POSIX regular expression, or multiple substring matches
-if the g flag is used; see .
+Replaces the substring that is the first match to the POSIX
+regular expression pattern, or all matches
+if the g flag is used; see
+.


 regexp_replace('Thomas', '.[mN]a.', 'M')
@@ -3171,6 +3249,26 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+regexp_replace ( string text, pattern text, replacement text,
+ start integer,
+ N integer
+ [, flags text ] )
+text
+   
+   
+Replaces the substring that is the N'th
+match to the POSIX regular expression pattern,
+or all matches if N is zero; see
+.
+   
+   
+regexp_replace('Thomas', '.', 'X', 3, 2)
+ThoXas
+   
+  
+
   

 
@@ -3213,6 +3311,35 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, flags text
+ [, subexpr 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-03 Thread Gilles Darold

Le 02/08/2021 à 23:22, Gilles Darold a écrit :

Le 02/08/2021 à 01:21, Tom Lane a écrit :

Gilles Darold  writes:

[ v5-0001-regexp-foo-functions.patch ]

I've gone through this whole patch now, and found quite a lot that I did
not like.  In no particular order:

* Wrapping parentheses around the user's regexp doesn't work.  It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error.  With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected.  Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp.  So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.

* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code.  I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need.  In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.

* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it.  If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence".  Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better.  "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else.  The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead.  I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation.  Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).

* I spent a lot of time on the docs too.  A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.

* Also, the regression tests seemed misguided.  This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that.  So I don't think we need complex regexps
here.  We just need to verify that the parameters of these functions
act properly, and check their error cases.  That can be done much
more quickly and straightforwardly than what you had.


So here's a revised version that I like better.  I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.


Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.



Actually I just found that the regexp_like() function doesn't support 
the start parameter which is something we should support. I saw that 
Oracle do not support it but DB2 does and I think we should also support 
it. I will post a new version of the patch once it is done.



Best regards,

--
Gilles Darold





Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-02 Thread Gilles Darold
Le 02/08/2021 à 01:21, Tom Lane a écrit :
> Gilles Darold  writes:
>> [ v5-0001-regexp-foo-functions.patch ]
> I've gone through this whole patch now, and found quite a lot that I did
> not like.  In no particular order:
>
> * Wrapping parentheses around the user's regexp doesn't work.  It can
> turn an invalid regexp into a valid one: for example 'a)(b' should draw
> a syntax error.  With this patch, no error would be thrown, but the
> "outer" parens wouldn't do what you expected.  Worse, it can turn a
> valid regexp into an invalid one: the metasyntax options described in
> 9.7.3.4 only work at the start of the regexp.  So we have to handle
> whole-regexp cases honestly rather than trying to turn them into an
> instance of the parenthesized-subexpression case.
>
> * You did a lot of things quite inefficiently, apparently to avoid
> touching any existing code.  I think it's better to extend
> setup_regexp_matches() and replace_text_regexp() a little bit so that
> they can support the behaviors these new functions need.  In both of
> them, it's absolutely trivial to allow a search start position to be
> passed in; and it doesn't take much to teach replace_text_regexp()
> to replace only the N'th match.
>
> * Speaking of N'th, there is not much of anything that I like
> about Oracle's terminology for the function arguments, and I don't
> think we ought to adopt it.  If we're documenting the functions as
> processing the "N'th match", it seems to me to be natural to call
> the parameter "N" not "occurrence".  Speaking of the "occurrence'th
> occurrence" is just silly, not to mention long and easy to misspell.
> Likewise, "position" is a horribly vague term for the search start
> position; it could be interpreted to mean several other things.
> "start" seems much better.  "return_opt" is likewise awfully unclear.
> I went with "endoption" below, though I could be talked into something
> else.  The only one of Oracle's choices that I like is "subexpr" for
> subexpression number ... but you went with DB2's rather vague "group"
> instead.  I don't want to use their "capture group" terminology,
> because that appears nowhere else in our documentation.  Our existing
> terminology is "parenthesized subexpression", which seems fine to me
> (and also agrees with Oracle's docs).
>
> * I spent a lot of time on the docs too.  A lot of the syntax specs
> were wrong (where you put the brackets matters), many of the examples
> seemed confusingly overcomplicated, and the text explanations needed
> copy-editing.
>
> * Also, the regression tests seemed misguided.  This patch is not
> responsible for testing the regexp engine as such; we have tests
> elsewhere that do that.  So I don't think we need complex regexps
> here.  We just need to verify that the parameters of these functions
> act properly, and check their error cases.  That can be done much
> more quickly and straightforwardly than what you had.
>
>
> So here's a revised version that I like better.  I think this
> is pretty nearly committable, aside from the question of whether
> a too-large subexpression number should be an error or not.


Thanks a lot for the patch improvement and the guidance. I have read the
patch and I agree with your choices I think I was too much trying to
mimic the oraclisms. I don't think we should take care of the too-large
subexpression number, the regexp writer should always test its regular
expression and also this will not prevent him to chose the wrong capture
group number but just a non existing one.


Best regards,

-- 
Gilles Darold






Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
I wrote:
> ... aside from the question of whether
> a too-large subexpression number should be an error or not.

Oh ... poking around some more, I noticed a very nearby precedent.
regexp_replace's replacement string can include \1 to \9 to insert
the substring matching the N'th parenthesized subexpression.  But
if there is no such subexpression, you don't get an error, just
an empty insertion.  So that seems like an argument for not
throwing an error for an out-of-range subexpr parameter.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
Gilles Darold  writes:
> [ v5-0001-regexp-foo-functions.patch ]

I've gone through this whole patch now, and found quite a lot that I did
not like.  In no particular order:

* Wrapping parentheses around the user's regexp doesn't work.  It can
turn an invalid regexp into a valid one: for example 'a)(b' should draw
a syntax error.  With this patch, no error would be thrown, but the
"outer" parens wouldn't do what you expected.  Worse, it can turn a
valid regexp into an invalid one: the metasyntax options described in
9.7.3.4 only work at the start of the regexp.  So we have to handle
whole-regexp cases honestly rather than trying to turn them into an
instance of the parenthesized-subexpression case.

* You did a lot of things quite inefficiently, apparently to avoid
touching any existing code.  I think it's better to extend
setup_regexp_matches() and replace_text_regexp() a little bit so that
they can support the behaviors these new functions need.  In both of
them, it's absolutely trivial to allow a search start position to be
passed in; and it doesn't take much to teach replace_text_regexp()
to replace only the N'th match.

* Speaking of N'th, there is not much of anything that I like
about Oracle's terminology for the function arguments, and I don't
think we ought to adopt it.  If we're documenting the functions as
processing the "N'th match", it seems to me to be natural to call
the parameter "N" not "occurrence".  Speaking of the "occurrence'th
occurrence" is just silly, not to mention long and easy to misspell.
Likewise, "position" is a horribly vague term for the search start
position; it could be interpreted to mean several other things.
"start" seems much better.  "return_opt" is likewise awfully unclear.
I went with "endoption" below, though I could be talked into something
else.  The only one of Oracle's choices that I like is "subexpr" for
subexpression number ... but you went with DB2's rather vague "group"
instead.  I don't want to use their "capture group" terminology,
because that appears nowhere else in our documentation.  Our existing
terminology is "parenthesized subexpression", which seems fine to me
(and also agrees with Oracle's docs).

* I spent a lot of time on the docs too.  A lot of the syntax specs
were wrong (where you put the brackets matters), many of the examples
seemed confusingly overcomplicated, and the text explanations needed
copy-editing.

* Also, the regression tests seemed misguided.  This patch is not
responsible for testing the regexp engine as such; we have tests
elsewhere that do that.  So I don't think we need complex regexps
here.  We just need to verify that the parameters of these functions
act properly, and check their error cases.  That can be done much
more quickly and straightforwardly than what you had.


So here's a revised version that I like better.  I think this
is pretty nearly committable, aside from the question of whether
a too-large subexpression number should be an error or not.

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..80aac4965e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,78 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text
+ [, start integer
+ [, flags text ] ] )
+integer
+   
+   
+Returns the number of times the POSIX regular
+expression pattern matches in
+the string; see
+.
+   
+   
+regexp_count('123456789012', '\d\d\d', 2)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text
+ [, start integer
+ [, N integer
+ [, endoption integer
+ [, flags text
+ [, subexpr integer ] ] ] ] ] )
+integer
+   
+   
+Returns the position within string where
+the N'th match of the POSIX regular
+expression pattern occurs, or zero if there is
+no such match; see .
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')
+3
+   
+   
+regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i', 2)
+5
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text
+ [, flags text ] )
+boolean
+   
+   
+Checks whether a match of the POSIX regular
+expression pattern occurs
+within string; see
+.
+   
+   
+regexp_like('Hello World', 'world$', 'i')
+t
+   
+  
+
   

 
@@ -3117,8 +3189,9 @@ repeat('Pg', 4) PgPgPgPg
 text[]


-Returns captured substrings resulting from the first match of a POSIX
-regular expression to the string; see
+

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 01/08/2021 à 19:23, Tom Lane a écrit :
> I've been working through this patch, and trying to verify
> compatibility against Oracle and DB2, and I see some points that need
> discussion or at least recording for the archives.
>
> * In Oracle, while the documentation for regexp_instr says that
> return_option should only be 0 or 1, experimentation with sqlfiddle
> shows that any nonzero value is silently treated as 1.  The patch
> raises an error for other values, which I think is a good idea.
> (IBM's docs say that DB2 raises an error too, though I can't test
> that.)  We don't need to be bug-compatible to that extent.
>
> * What should happen when the subexpression/capture group number of
> regexp_instr or regexp_substr exceeds the number of parenthesized
> subexpressions of the regexp?  Oracle silently returns a no-match
> result (0 or NULL), as does this patch.  However, IBM's docs say
> that DB2 raises an error.  I'm inclined to think that this is
> likewise taking bug-compatibility too far, and that we should
> raise an error like DB2.  There are clearly cases where throwing
> an error would help debug a faulty call, while I'm less clear on
> a use-case where not throwing an error would be useful.
>
> * IBM's docs say that both regexp_count and regexp_like have
> arguments "string, pattern [, start] [, flags]" --- that is,
> each of start and flags can be independently specified or omitted.
> The patch follows Oracle, which has no start option for 
> regexp_like, and where you can't write flags for regexp_count
> without writing start.  This is fine by me, because doing these
> like DB2 would introduce the same which-argument-is-this issues
> as we're being forced to cope with for regexp_replace.  I don't
> think we need to accept ambiguity in these cases too.  But it's
> worth memorializing this decision in the thread.
>
> * The patch has most of these functions silently ignoring the 'g'
> flag, but I think they should raise errors instead.  Oracle doesn't
> accept a 'g' flag for these, so why should we?  The only case where
> that logic doesn't hold is regexp_replace, because depending on which
> syntax you use the 'g' flag might or might not be meaningful.  So
> for regexp_replace, I'd vote for silently ignoring 'g' if the
> occurrence-number parameter is given, while honoring it if not.
>
> I've already made changes in my local copy per the last item,
> but I've not done anything about throwing errors for out-of-range
> subexpression numbers.  Anybody have an opinion about that one?


I thought about this while I was implementing the functions and chose to
not throw an error because of the Oracle behavior and also with others
regular expression implementation. For example in Perl there is no error:


$ perl -e '$str="hello world"; $str =~ s/(l)/$20/; print "$str\n";'
helo world


Usually a regular expression is always tested by its creator to be sure
that this the right one and that it does what is expected. But I agree
that it could help the writer to debug its RE.


Also if I recall well Oracle and DB2 limit the number of capture groups
back references from \1 to \9 for Oracle and \0 to \9 for DB2. I have
chosen to not apply this limit, I don't see the interest of such a
limitation.



-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Gilles Darold
Le 30/07/2021 à 23:38, Tom Lane a écrit :
> Gilles Darold  writes:
>> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>>> I'm inclined to just drop the regexp_replace additions.  I don't think
>>> that the extra parameters Oracle provides here are especially useful.
>>> They're definitely not useful enough to justify creating compatibility
>>> hazards for.
>> I would not say that being able to replace the Nth occurrence of a
>> pattern matching is not useful but i agree that this is not a common
>> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
>> and I have though that we can not have compatibility issues because of
>> the different data type at the 4th parameter.
> Well, here's an example of the potential issues:
>
> [...]


Thanks for pointing me this case, I did not think that the prepared
statement could lead to this confusion.


>> Anyway, maybe we can just
>> rename the function even if I would prefer that regexp_replace() be
>> extended. For example:
>>     regexp_replace(source, pattern, replacement [, flags ]);
>>     regexp_substitute(source, pattern, replacement [, position ] [,
>> occurrence ] [, flags ]);
> Hmm.  Of course the entire selling point of this patch seems to be
> bug-compatibility with Oracle, so using different names is largely
> defeating the point :-(
>
> Maybe we should just hold our noses and do it.  The point that
> you'd get a recognizable failure if the wrong function were chosen
> reassures me a little bit.  We've seen a lot of cases where this
> sort of ambiguity results in the system just silently doing something
> different from what you expected, and I was afraid that that could
> happen here.


I join a new version of the patch that include a check of the option
parameter in the basic form of regexp_replace() and return an error in
ambiguous cases.


PREPARE rr AS SELECT regexp_replace('healthy, wealthy, and
wise','(\w+)thy', '\1ish', $1);
EXECUTE rr(1);
ERROR:  ambiguous use of the option parameter in regex_replace(),
value: 1
HINT:  you might set the occurrence parameter to force the use of
the extended form of regex_replace()


This is done by checking if the option parameter value is an integer and
throw the error in this case. I don't think of anything better.


Best regards,

-- 
Gilles Darold

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a5b6adc4bb..02d1f72e1e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3108,6 +3108,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3156,7 +3216,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3171,6 +3231,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

   

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-08-01 Thread Tom Lane
I've been working through this patch, and trying to verify
compatibility against Oracle and DB2, and I see some points that need
discussion or at least recording for the archives.

* In Oracle, while the documentation for regexp_instr says that
return_option should only be 0 or 1, experimentation with sqlfiddle
shows that any nonzero value is silently treated as 1.  The patch
raises an error for other values, which I think is a good idea.
(IBM's docs say that DB2 raises an error too, though I can't test
that.)  We don't need to be bug-compatible to that extent.

* What should happen when the subexpression/capture group number of
regexp_instr or regexp_substr exceeds the number of parenthesized
subexpressions of the regexp?  Oracle silently returns a no-match
result (0 or NULL), as does this patch.  However, IBM's docs say
that DB2 raises an error.  I'm inclined to think that this is
likewise taking bug-compatibility too far, and that we should
raise an error like DB2.  There are clearly cases where throwing
an error would help debug a faulty call, while I'm less clear on
a use-case where not throwing an error would be useful.

* IBM's docs say that both regexp_count and regexp_like have
arguments "string, pattern [, start] [, flags]" --- that is,
each of start and flags can be independently specified or omitted.
The patch follows Oracle, which has no start option for 
regexp_like, and where you can't write flags for regexp_count
without writing start.  This is fine by me, because doing these
like DB2 would introduce the same which-argument-is-this issues
as we're being forced to cope with for regexp_replace.  I don't
think we need to accept ambiguity in these cases too.  But it's
worth memorializing this decision in the thread.

* The patch has most of these functions silently ignoring the 'g'
flag, but I think they should raise errors instead.  Oracle doesn't
accept a 'g' flag for these, so why should we?  The only case where
that logic doesn't hold is regexp_replace, because depending on which
syntax you use the 'g' flag might or might not be meaningful.  So
for regexp_replace, I'd vote for silently ignoring 'g' if the
occurrence-number parameter is given, while honoring it if not.

I've already made changes in my local copy per the last item,
but I've not done anything about throwing errors for out-of-range
subexpression numbers.  Anybody have an opinion about that one?

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-07-30 Thread Tom Lane
Gilles Darold  writes:
> Le 26/07/2021 à 21:56, Tom Lane a écrit :
>> I'm inclined to just drop the regexp_replace additions.  I don't think
>> that the extra parameters Oracle provides here are especially useful.
>> They're definitely not useful enough to justify creating compatibility
>> hazards for.

> I would not say that being able to replace the Nth occurrence of a
> pattern matching is not useful but i agree that this is not a common
> case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
> and I have though that we can not have compatibility issues because of
> the different data type at the 4th parameter.

Well, here's an example of the potential issues:

regression=# create function rr(text,text,text,text) returns text
regression-# language sql as $$select 'text'$$;
CREATE FUNCTION
regression=# create function rr(text,text,text,int4) returns text
language sql as $$select 'int4'$$;
CREATE FUNCTION
regression=# select rr('a','b','c','d');
  rr  
--
 text
(1 row)

regression=# select rr('a','b','c',42);
  rr  
--
 int4
(1 row)

So far so good, but:

regression=# prepare rr as select rr('a','b','c',$1);
PREPARE
regression=# execute rr(12);  
  rr  
--
 text
(1 row)

So somebody trying to use the 4-parameter Oracle form from, say, JDBC
would get bit if they were sloppy about specifying parameter types.

The one saving grace is that digits aren't valid regexp flags,
so the outcome would be something like

regression=# select regexp_replace('a','b','c','12');
ERROR:  invalid regular expression option: "1"

which'd be less difficult to debug than silent misbehavior.
Conversely, if you thought you were passing flags but it somehow
got interpreted as a start position, that would fail too:

regression=# prepare rri as select rr('a','b','c', $1::int);
PREPARE
regression=# execute rri('gi');
ERROR:  invalid input syntax for type integer: "gi"
LINE 1: execute rri('gi');
^

Still, I bet a lot that we'd see periodic bug reports complaining
that it doesn't work.

> Anyway, maybe we can just
> rename the function even if I would prefer that regexp_replace() be
> extended. For example:
>     regexp_replace(source, pattern, replacement [, flags ]);
>     regexp_substitute(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ]);

Hmm.  Of course the entire selling point of this patch seems to be
bug-compatibility with Oracle, so using different names is largely
defeating the point :-(

Maybe we should just hold our noses and do it.  The point that
you'd get a recognizable failure if the wrong function were chosen
reassures me a little bit.  We've seen a lot of cases where this
sort of ambiguity results in the system just silently doing something
different from what you expected, and I was afraid that that could
happen here.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-07-27 Thread Gilles Darold
Le 26/07/2021 à 21:56, Tom Lane a écrit :
> Gilles Darold  writes:
>> [ v4-0001-regexp-foo-functions.patch ]
> I started to work through this and was distressed to realize that
> it's trying to redefine regexp_replace() in an incompatible way.
> We already have
>
> regression=# \df regexp_replace
>List of functions
>Schema   |  Name  | Result data type |  Argument data types   | 
> Type 
> ++--++--
>  pg_catalog | regexp_replace | text | text, text, text   | 
> func
>  pg_catalog | regexp_replace | text | text, text, text, text | 
> func
> (2 rows)
>
> The patch proposes to add (among other alternatives)
>
> +{ oid => '9608', descr => 'replace text using regexp',
> +  proname => 'regexp_replace', prorettype => 'text',
> +  proargtypes => 'text text text int4', prosrc => 
> 'textregexreplace_extended_no_occurrence' },
>
> which is going to be impossibly confusing for both humans and machines.
> I don't think we should go there.  Even if you managed to construct
> examples that didn't result in "ambiguous function" failures, that
> doesn't mean that ordinary mortals won't get bit that way.
>
> I'm inclined to just drop the regexp_replace additions.  I don't think
> that the extra parameters Oracle provides here are especially useful.
> They're definitely not useful enough to justify creating compatibility
> hazards for.


I would not say that being able to replace the Nth occurrence of a
pattern matching is not useful but i agree that this is not a common
case with replacement. Both Oracle [1] and IBM DB2 [2] propose this form
and I have though that we can not have compatibility issues because of
the different data type at the 4th parameter. Anyway, maybe we can just
rename the function even if I would prefer that regexp_replace() be
extended. For example:


    regexp_replace(source, pattern, replacement [, flags ]);

    regexp_substitute(source, pattern, replacement [, position ] [,
occurrence ] [, flags ]);


of course with only 3 parameters the two functions are the same.


What do you think about the renaming proposal instead of simply drop the
extended form of the function?


Best regards,


[1] https://docs.oracle.com/database/121/SQLRF/functions163.htm#SQLRF06302

[2] https://www.ibm.com/docs/en/db2oc?topic=functions-regexp-replace


-- 
Gilles Darold
http://www.darold.net/






Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-07-26 Thread Tom Lane
Gilles Darold  writes:
> [ v4-0001-regexp-foo-functions.patch ]

I started to work through this and was distressed to realize that
it's trying to redefine regexp_replace() in an incompatible way.
We already have

regression=# \df regexp_replace
   List of functions
   Schema   |  Name  | Result data type |  Argument data types   | Type 
++--++--
 pg_catalog | regexp_replace | text | text, text, text   | func
 pg_catalog | regexp_replace | text | text, text, text, text | func
(2 rows)

The patch proposes to add (among other alternatives)

+{ oid => '9608', descr => 'replace text using regexp',
+  proname => 'regexp_replace', prorettype => 'text',
+  proargtypes => 'text text text int4', prosrc => 
'textregexreplace_extended_no_occurrence' },

which is going to be impossibly confusing for both humans and machines.
I don't think we should go there.  Even if you managed to construct
examples that didn't result in "ambiguous function" failures, that
doesn't mean that ordinary mortals won't get bit that way.

I'm inclined to just drop the regexp_replace additions.  I don't think
that the extra parameters Oracle provides here are especially useful.
They're definitely not useful enough to justify creating compatibility
hazards for.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
>>

And the patch renamed.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of source by default.  Optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  Default value for occurrence
+ is 1, replace only the first occurrence.  The flags parameter
+ is an optional text string containing zero or more single-letter flags that change the
  function's behavior.  Flag i specifies case-insensitive
  matching, while flag g specifies replacement of each matching
- substring rather than only the first one.  Supported flags (though
- not g) are
+ substring rather than only the first one. When occurrence
+ is set flag g has no effect. If occurrence
+ is set to zero, all occurrences are replaced which is similar to flag g.
+ Supported flags (though not g) are
  described in .
 
 
@@ -5482,6 +5581,10 @@ regexp_replace('foobarbaz', 'b..', 'X', 'g')
fooXX
 regexp_replace('foobarbaz', 'b(..)', 'X\1Y', 'g')
fooXarYXazY
+regexp_replace('A PostgreSQL function', 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 15:53, Tom Lane a écrit :
> Chapman Flack  writes:
>> If this turns out to be a case of "attached the wrong patch, here's
>> the one that does implement foo_regex functions!" then I reserve an
>> objection to that. :)
> +1 to that.  Just to add a note, I do have some ideas about extending
> our regex parser so that it could duplicate the XQuery syntax --- none
> of the points we mention in 9.7.3.8 seem insurmountable.  I'm not
> planning to work on that in the near future, mind you, but I definitely
> think that we don't want to paint ourselves into a corner where we've
> already implemented the XQuery regex functions with the wrong behavior.
>
>   regards, tom lane


I apologize for confusing with the words and phrases I have used. This
patch implements the regexp_foo () functions which are available in most
RDBMS with the behavior described in the documentation. I have modified
the title of the patch in the commitfest to removed wrong use of XQUERY. 


I don't know too if the other RDBMS respect the XQUERY behavior but for
what I've seen for Oracle they are using limited regexp modifiers with
sometime not the same letter than PostgreSQL for the same behavior. I
have implemented these functions with the Oracle behavior in Orafce [1]
with a function that checks the modifiers used. This patch doesn't mimic
the Oracle behavior, it use the PostgreSQL behavior with regexp, the one
used by regex_replace() and regex_matches(). All regexp modifiers can be
used.


[1] https://github.com/orafce/orafce/blob/master/orafce--3.14--3.15.sql


-- 
Gilles Darold
http://www.darold.net/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Tom Lane
Chapman Flack  writes:
> If this turns out to be a case of "attached the wrong patch, here's
> the one that does implement foo_regex functions!" then I reserve an
> objection to that. :)

+1 to that.  Just to add a note, I do have some ideas about extending
our regex parser so that it could duplicate the XQuery syntax --- none
of the points we mention in 9.7.3.8 seem insurmountable.  I'm not
planning to work on that in the near future, mind you, but I definitely
think that we don't want to paint ourselves into a corner where we've
already implemented the XQuery regex functions with the wrong behavior.

regards, tom lane




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Chapman Flack
On 03/21/21 09:19, Gilles Darold wrote:
>>> On 2021.03.20. 19:48 Gilles Darold  wrote:
>>>  
>>> This is a new version of the patch that now implements all the XQUERY
>>> regexp functions as described in the standard, minus the differences of
>>> PostgerSQL regular expression explain in [1].
>>>
>>> The standard SQL describe functions like_regex(), occurrences_regex(),
>>> position_regex(), substring_regex() and translate_regex() which
>>> correspond to the commonly named functions regexp_like(),
>>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>>> reported by Chapman Flack in [2]. All these function are implemented in
>>> [v2-0001-xquery-regexp-functions.patch]

I quickly looked over this patch preparing to object if it actually
purported to implement the ISO foo_regex() named functions without
the ISO semantics, but a quick grep reassured me that it doesn't
implement any of those functions. It only supplies functions in
the alternative, apparently common de facto naming scheme regexp_foo().

To be clear, I think that's the right call. I do not think it would be
a good idea to supply functions that have the ISO names but not the
specified regex dialect.

A set of functions analogous to the ISO ones but differently named and
with a different regex dialect seems fine to me, especially if these
different names are de facto common, and as far as I can tell, that is
what this patch provides. So I have no objection to that. :)

It might then be fair to say that the /description/ of the patch as
implementing the XQuery-based foo_regex functions isn't quite right,
or at least carries a risk of jarring some readers into hasty
double-takes on Sunday mornings before coffee.

It might be clearer to just mention the close correspondence between
the functions in this differently-named set and the corresponding ISO ones.

If this turns out to be a case of "attached the wrong patch, here's
the one that does implement foo_regex functions!" then I reserve an
objection to that. :)

Regards,
-Chap




Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 21/03/2021 à 12:07, e...@xs4all.nl a écrit :
>> On 2021.03.20. 19:48 Gilles Darold  wrote:
>>  
>> This is a new version of the patch that now implements all the XQUERY
>> regexp functions as described in the standard, minus the differences of
>> PostgerSQL regular expression explain in [1].
>>
>> The standard SQL describe functions like_regex(), occurrences_regex(),
>> position_regex(), substring_regex() and translate_regex() which
>> correspond to the commonly named functions regexp_like(),
>> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
>> reported by Chapman Flack in [2]. All these function are implemented in
>> [v2-0001-xquery-regexp-functions.patch]
> Hi,
>
> Apply, compile and (world)check are fine. I haven't found errors in 
> functionality.
>
> I went through the docs, and came up with these changes in func.sgml, and 
> pg_proc.dat.
>
> Useful functions - thanks!
>
> Erik Rijkers


Thanks a lot Erik, here is a version of the patch with your corrections.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..36e446cb7b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Returns the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Returns the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluates the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+t
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' from 'o(.)b')   o
  substring matching the entire pattern should be inserted.  Write
  \\ if you need to put a literal backslash in the replacement
  text.
- The flags parameter is an optional text
- string containing zero or more single-letter flags that change the
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of source by default.  Optional occurrence
+ parameter indicates which occurrence of pattern in
+ source should be replaced.  Default value for occurrence
+ is 1, replace only the first occurrence.  The flags parameter
+ is an optional text string 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread er
> On 2021.03.20. 19:48 Gilles Darold  wrote:
>  
> This is a new version of the patch that now implements all the XQUERY
> regexp functions as described in the standard, minus the differences of
> PostgerSQL regular expression explain in [1].
> 
> The standard SQL describe functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex() which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
> reported by Chapman Flack in [2]. All these function are implemented in

> [v2-0001-xquery-regexp-functions.patch]

Hi,

Apply, compile and (world)check are fine. I haven't found errors in 
functionality.

I went through the docs, and came up with these changes in func.sgml, and 
pg_proc.dat.

Useful functions - thanks!

Erik Rijkers--- doc/src/sgml/func.sgml.orig	2021-03-21 03:59:37.884365465 +0100
+++ doc/src/sgml/func.sgml	2021-03-21 11:37:46.880644051 +0100
@@ -3106,7 +3106,7 @@
 integer


-Return the number of times a pattern occurs for a match of a POSIX
+Returns the number of times a pattern occurs for a match of a POSIX
 regular expression to the string; see
 .

@@ -3125,11 +3125,11 @@
 integer


-   Return the position within string where the
+   Returns the position within string where the
match of a POSIX regular expression occurs. It returns an integer
indicating the beginning or ending position of the matched substring,
depending on the value of the returnopt argument
-   (default beginning). If no match is found, then the function returns 0;
+   (default beginning). If no match is found the function returns 0;
see .


@@ -3147,12 +3147,12 @@
 boolean


-Evaluate the existence of a match to a POSIX regular expression
+Evaluates the existence of a match to a POSIX regular expression
 in string; see .


 regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
-3
+t

   
 
@@ -5773,7 +5773,7 @@

 
 
- The regexp_like function evaluate the existence of a match
+ The regexp_like function evaluates the existence of a match
  to a POSIX regular expression in string; returns a boolean
  resulting from matching a POSIX regular expression pattern to a string.  It has
  the syntax regexp_like(string,
@@ -5782,7 +5782,7 @@
  from the beginning of string. 
  The flags parameter is an optional text string
  containing zero or more single-letter flags that change the function's behavior.
- regexp_count accepts all the flags
+ regexp_like accepts all the flags
  shown in .
  This function is similar to regexp operator ~ when used without
  flags and similar to operator ~* when
@@ -5792,9 +5792,9 @@

 Some examples:
 
-SELECT 'found' FROM t1 WHERE regexp_like('Hello'||chr(10)||'world', '^world$', 'm');
- regexp_like
---
+SELECT 'found' FROM (values('Hello'||chr(10)||'world') as f(col) WHERE regexp_like(col, '^world$', 'm');
+ ?column? 
+--
 found
 (1 row)
 
@@ -5814,7 +5814,7 @@
  regexp_count accepts all the flags
  shown in .
  The g flag is forced internally to count all matches.
- This function returns 0 if there is no match or the number of match as
+ This function returns 0 if there is no match or the number of matches as
  an integer.
 
 
@@ -5853,17 +5853,17 @@
  the position of the character after the occurrence.
  The flags parameter is an optional text string
  containing zero or more single-letter flags that change the function's behavior.
- regexp_count accepts all the flags
+ regexp_instr accepts all the flags
  shown in .
  The g flag is forced internally to track all matches.
  For a pattern with capture groups, group is an integer indicating
- which capture in pattern is the target of the function.  A capture group is a part of the pattern
- enclosed in parentheses. Capture groups can be nested.  They are numbered in order in which their
- left parentheses appear in pattern. If group is zero, then the position
+ which capture in pattern is the target of the function.  A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested.  They are numbered in the order in which their
+ left parentheses appear in pattern. If group is zero, then the position
  of the entire substring that matches the pattern is returned. If pattern
- does not have at least group capture group, the function returns zero.
+ does not have at least group capture groups, the function returns zero.
  This function returns 0 if there is no match or the starting or ending position
- of match as an integer.
+ of a match as an integer.
  

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-21 Thread Gilles Darold
Le 20/03/2021 à 19:48, Gilles Darold a écrit :
>
> Hi,
>
>
> This is a new version of the patch that now implements all the XQUERY
> regexp functions as described in the standard, minus the differences
> of PostgerSQL regular expression explain in [1].
>
>
> The standard SQL describe functions like_regex(), occurrences_regex(),
> position_regex(), substring_regex() and translate_regex() which
> correspond to the commonly named functions regexp_like(),
> regexp_count(), regexp_instr(), regexp_substr() and regexp_replace()
> as reported by Chapman Flack in [2]. All these function are
> implemented in the patch. Syntax of the functions are:
>
>
> - regexp_like(string, pattern [, flags ])
>
> - regexp_count( string, pattern [, position ] [, flags ])
>
> - regexp_instr( string, pattern [, position ] [, occurrence ] [,
> returnopt ] [, flags ] [, group ])
>
> - regexp_substr( string, pattern [, position ] [, occurrence ] [,
> flags ] [, group ])
>
> - regexp_replace(source, pattern, replacement [, position ] [,
> occurrence ] [, flags ])
>
>
> In addition to previous patch version I have added the regexp()_like
> function and extended the existsing regex_replace() function. The
> patch documents these functions and adds regression tests for all
> functions. I will add it to the commitfest.
>
>
> An other regexp functions regexp_positions() that returns all
> occurrences that matched a POSIX regular expression is also developped
> by Joel Jacobson, see [2]. This function expands the list of regexp
> functions described in XQUERY.
>
>
> [1]
> https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP
>
> [2]
> https://www.postgresql.org/message-id/flat/bfd5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed
>
>

I would like to see these functions in PG 14 but it is a bit too late,
added to commitfest 2021-07.


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/



Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-20 Thread Gilles Darold
Hi,


This is a new version of the patch that now implements all the XQUERY
regexp functions as described in the standard, minus the differences of
PostgerSQL regular expression explain in [1].


The standard SQL describe functions like_regex(), occurrences_regex(),
position_regex(), substring_regex() and translate_regex() which
correspond to the commonly named functions regexp_like(),
regexp_count(), regexp_instr(), regexp_substr() and regexp_replace() as
reported by Chapman Flack in [2]. All these function are implemented in
the patch. Syntax of the functions are:


- regexp_like(string, pattern [, flags ])

- regexp_count( string, pattern [, position ] [, flags ])

- regexp_instr( string, pattern [, position ] [, occurrence ] [,
returnopt ] [, flags ] [, group ])

- regexp_substr( string, pattern [, position ] [, occurrence ] [, flags
] [, group ])

- regexp_replace(source, pattern, replacement [, position ] [,
occurrence ] [, flags ])


In addition to previous patch version I have added the regexp()_like
function and extended the existsing regex_replace() function. The patch
documents these functions and adds regression tests for all functions. I
will add it to the commitfest.


An other regexp functions regexp_positions() that returns all
occurrences that matched a POSIX regular expression is also developped
by Joel Jacobson, see [2]. This function expands the list of regexp
functions described in XQUERY.


[1]
https://www.postgresql.org/docs/13/functions-matching.html#FUNCTIONS-POSIX-REGEXP

[2]
https://www.postgresql.org/message-id/flat/bfd5-909d-408b-8531-95b32f18d4ab%40www.fastmail.com#3ec8ba658eeabcae2ac6ccca33bd1aed


-- 
Gilles Darold
LzLabs GmbH
http://www.lzlabs.com/

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index fee0561961..f5f08f1509 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,66 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Return the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Return the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found, then the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
+  
+   
+
+ regexp_like
+
+regexp_like ( string text, pattern text [, flags text ] )
+boolean
+   
+   
+Evaluate the existence of a match to a POSIX regular expression
+in string; see .
+   
+   
+regexp_like('Hello'||chr(10)||'world', '^world$', 'm')
+3
+   
+  
+
+
   

 
@@ -3144,7 +3204,7 @@ repeat('Pg', 4) PgPgPgPg
 
  regexp_replace
 
-regexp_replace ( string text, pattern text, replacement text [, flags text ] )
+regexp_replace ( string text, pattern text, replacement text [, position integer ] [, occurence integer ]  [, flags text ] )
 text


@@ -3157,6 +3217,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5373,18 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_like
+   
+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5451,6 +5541,8 @@ substring('foobar' from 'o(.)b')   o
  It has the syntax
  regexp_replace(source,
  pattern, replacement
+ , position 
+ , occurrence 
  , flags ).
  The source string is returned unchanged if
  there is no match to the pattern.  If there is a
@@ -5464,12 +5556,19 @@ substring('foobar' 

Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-03 Thread Gilles Darold
My apologies for the links in the head, the email formatting and the
missing patch, I accidently send the email too early.

--

Gilles

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf99f82149..88e08b40d2 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3097,6 +3097,47 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_count
+
+regexp_count ( string text, pattern text [, position integer ] [, flags text ] )
+integer
+   
+   
+Return the number of times a pattern occurs for a match of a POSIX
+regular expression to the string; see
+.
+   
+   
+regexp_count('123456789012', '\d{3}', 3)
+3
+   
+  
+
+  
+   
+
+ regexp_instr
+
+regexp_instr ( string text, pattern text [, position integer ] [, occurence integer ] [, returnopt integer ] [, flags text ] [, group integer ] )
+integer
+   
+   
+   Return the position within string where the
+   match of a POSIX regular expression occurs. It returns an integer
+   indicating the beginning or ending position of the matched substring,
+   depending on the value of the returnopt argument
+   (default beginning). If no match is found, then the function returns 0;
+   see .
+   
+   
+regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 4)
+7
+   
+  
+
   

 
@@ -3157,6 +3198,24 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ regexp_substr
+
+regexp_substr ( string text, pattern text [, position integer ] [, occurence integer ] [, flags text ] [, group integer ] )
+text
+   
+   
+   Return the substring within string corresponding to the
+   match of a POSIX regular expression; see .
+   
+   
+regexp_substr('1234567890 1234557890', '(123)(4(5[56])(78))', 1, 2, 'i', 3)
+55
+   
+  
+
   

 
@@ -5295,6 +5354,15 @@ substring('foobar' similar '#"o_b#"%' escape '#')NULL
 regexp_split_to_array

+   
+regexp_count
+   
+   
+regexp_instr
+   
+   
+regexp_substr
+   
 

  lists the available
@@ -5669,6 +5737,132 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
 in practice.  Other software systems such as Perl use similar definitions.

 
+
+ The regexp_count function returns the number of
+ captured substring(s) resulting from matching a POSIX regular
+ expression pattern to a string.  It has the syntax regexp_replace(
+ string, pattern ,
+ position  , flags ).
+ pattern is searched in string starting
+ from an optional position or from the beginning of string
+ by default.  The flags parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ regexp_count accepts all the flags
+ shown in .
+ The g flag is forced internally to count all matches.
+ This function returns 0 if there is no match or the number of match as
+ an integer.
+
+
+   
+Some examples:
+
+SELECT regexp_count('123123123123', '\d{3}', 1);
+ regexp_count 
+--
+4
+(1 row)
+
+SELECT regexp_count('Hello'||CHR(10)||'world!', '^world!$', 1, 'm');
+ regexp_count 
+--
+1
+(1 row)
+
+   
+
+
+ The regexp_instr function returns the beginning or ending
+ position of the matched substring resulting from matching a POSIX regular
+ expression pattern to a string.  It has the syntax regexp_instr(
+ string, pattern ,
+ position  , occurrence 
+ , returnopt 
+ , flags 
+ , group ).
+ pattern is searched in string starting
+ from an optional position or from the beginning
+ of string by default.  occurrence
+ indicates which occurrence of pattern in string
+ should be searched.  When returnopt is set to 0 (default) the function
+ returns the position of the first character of the occurrence.  When set to 1 returns the position
+ of the character after the occurrence.
+ The flags parameter is an optional text string
+ containing zero or more single-letter flags that change the function's behavior.
+ regexp_count accepts all the flags
+ shown in .
+ The g flag is forced internally to track all matches.
+ For a pattern with capture groups, group is an integer indicating
+ which capture in pattern is the target of the function.  A capture group is a part of the pattern
+ enclosed in parentheses. Capture groups can be nested.  They are numbered in order in which their
+ left parentheses appear in pattern. If group is zero, then the position
+ of the entire substring that matches the pattern is returned. If pattern
+ does not have at least group 

[PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace

2021-03-03 Thread Gilles Darold
Oracle:
https://docs.oracle.com/en/database/oracle/oracle-database/18/adfns/regexp.html#GUID-F14733F3-B943-4BAD-8489-F9704986386B
IBM:
https://www.ibm.com/support/producthub/db2/docs/content/SSEPGG_11.5.0/com.ibm.db2.luw.sql.ref.doc/doc/r0061494.html?pos=2
Z/OS:
https://www.ibm.com/support/knowledgecenter/SSEPEK_12.0.0/sqlref/src/tpc/db2z_bif_regexplike.html
EDB:
https://www.enterprisedb.com/edb-docs/d/edb-postgres-advanced-server/reference/database-compatibility-for-oracle-developers-reference-guide/9.6/Database_Compatibility_for_Oracle_Developers_Reference_Guide.1.098.html


Hi,


I would like to suggest adding the $subject functions to PostgreSQL. We
can do lot of things using regexp_matches() and regexp_replace() but
some time it consist on building complex queries that these functions
can greatly simplify.


Look like all RDBMS that embedded a regexp engine implement these
functions (Oracle, DB2, MySQL, etc) but I don't know if they are part of
the SQL standard. Probably using regexp_matches() can be enough even if
it generates more complex statements but having these functions in
PostgreSQL could be useful for users and code coming from theses RDBMS.


  - REGEXP_COUNT( string text, pattern text, [, position int] [, flags
text ] ) -> integer

        Return the number of times a pattern occurs in a source string
after a certain position, default from beginning.


        It can be implemented in PostgreSQL as a subquery using:

            SELECT count(*) FROM regexp_matches('A1B2C3', '[A-Z][0-9]',
'g'); -> 3

        To support positioning we have to use substr(), for example
starting at position 2:

            SELECT count(*) FROM regexp_matches(substr('A1B2C3', 2),
'[A-Z][0-9]'); -> 2

        With regexp_count() we can simply use it like this:

            SELECT regexp_count('A1B2C3', '[A-Z][0-9]'); -> 3
            SELECT regexp_count('A1B2C3', '[A-Z][0-9]', 2); -> 2


  - REGEXP_INSTR( string text, pattern text, [, position int] [,
occurrence int] [, return_opt int ] [, flags text ] [, group int] ) ->
integer

        Return the position in a string for a regular expression
pattern. It returns an integer indicating the beginning or ending
position of the matched substring, depending on the value of the
return_opt argument (default beginning). If no match is found, then the
function returns 0.

            * position: indicates the character where the search should
begin.
            * occurrence: indicates which occurrence of pattern found in
string should be search.
            * return_opt: 0 mean returns the position of the first
character of the occurrence, 1 mean returns the position of the
character following the occurrence.
            * flags: regular expression modifiers.
            * group: indicates which subexpression in pattern is the
target of the function.

        Example:

            SELECT regexp_instr('1234567890', '(123)(4(56)(78))', 1, 1,
0, 'i', 4); -> 7

        to obtain a PostgreSQL equivalent:

            SELECT position((SELECT (regexp_matches('1234567890',
'(123)(4(56)(78))', 'ig'))[4] offset 0 limit 1) IN '1234567890');


  - REGEXP_SUBSTR( string text, pattern text, [, position int] [,
occurrence int] [, flags text ] [, group int] ) -> text

        It is similar to regexp_instr(), but instead of returning the
position of the substring, it returns the substring itself.

        Example:

            SELECT regexp_substr('500 gilles''s street, 38000 Grenoble,
FR', ',[^,]+,'); -> , 38000 Grenoble,

        or with a more complex extraction:

            SELECT regexp_substr('1234567890', '(123)(4(56)(78))', 1, 1,
'i', 4); -> 78
            SELECT regexp_substr('1234567890 1234557890',
'(123)(4(5[56])(78))', 1, 2, 'i', 3); -> 55

        To obtain the same result for the last example we have to use:

            SELECT (SELECT * FROM regexp_matches('1234567890
1234557890', '(123)(4(5[56])(78))',  'g') offset 1 limit 2)[3];


I have not implemented the regexp_like() function, it is quite similar
than the ~ and ~* operators except that it can also support other
modifiers than 'i'. I can implement it easily and add it to the patch if
we want to supports all those common functions.

  - REGEXP_LIKE( string text, pattern text, [, flags text ] ) -> boolean

        Similar to the LIKE condition, except that it performs regular
expression matching instead of the simple pattern matching performed by
LIKE.

        Example:

            SELECT * FROM t1 WHERE regexp_like(col1, '^d$', 'm');

        to obtain a PostgreSQL equivalent:

            SELECT * FROM t1 WHERE regexp_match (col1, '^d$', 'm' ) IS
NOT NULL;


There is also a possible extension to regexp_replace() that I have not
implemented yet because it need more work than the previous functions.


  - REGEXP_REPLACE( string text, pattern text, replace_string text, [,
position int] [, occurrence int] [, flags text ] )

    Extend PostgreSQL regexp_replace() by adding position and occurrence
capabilities.

The