On Tue, Jan 9, 2024 at 8:52 AM Dian Fay <d...@nmfay.com> wrote: > > On Mon Jan 8, 2024 at 9:26 AM EST, jian he wrote: > > On Mon, Jan 8, 2024 at 8:44 AM Dian Fay <d...@nmfay.com> wrote: > > > The `regexp_replace` summary in table 9.10 is mismatched and still > > > specifies the first parameter name as `string` instead of `source`. > > > Since all the other functions use `string`, should `regexp_replace` do > > > the same or is this a case where an established "standard" diverges? > > > > got it. Thanks for pointing it out. > > > > in functions-matching.html > > if I change <replaceable>source</replaceable> to > > <replaceable>string</replaceable> then > > there are no markup "string" and markup "string", it's kind of > > slightly confusing. > > > > So does the following refactored description of regexp_replace make sense: > > > > The <replaceable>string</replaceable> is returned unchanged if > > there is no match to the <replaceable>pattern</replaceable>. If there > > is a > > match, the <replaceable>string</replaceable> is returned with the > > <replaceable>replacement</replaceable> string substituted for the > > matching > > substring. The <replaceable>replacement</replaceable> string can > > contain > > <literal>\</literal><replaceable>n</replaceable>, where > > <replaceable>n</replaceable> is 1 > > through 9, to indicate that the source substring matching the > > <replaceable>n</replaceable>'th parenthesized subexpression of > > the pattern should be > > inserted, and it can contain <literal>\&</literal> to indicate > > that the > > substring matching the entire pattern should be inserted. Write > > <literal>\\</literal> if you need to put a literal backslash in > > the replacement > > text. > > That change makes sense to me! I'll see about the section refactoring > after this lands.
I put the changes into the new patch.
From 8fa04ed1fecb48ca8254d2ed7e60a9013fa130a3 Mon Sep 17 00:00:00 2001 From: jian he <jian.universal...@gmail.com> Date: Wed, 10 Jan 2024 17:46:18 +0800 Subject: [PATCH v3 1/1] add function argument names to regex.* functions. Specifically add function argument names to the following funtions: regexp_replace, regexp_match, regexp_matches regexp_count, regexp_instr, regexp_like, regexp_substr, regexp_split_to_table regexp_split_to_array So it would be easier to understand these functions in psql via \df. Also now these functions can be called in different notaions. --- doc/src/sgml/func.sgml | 50 +++++++++++------------ src/include/catalog/pg_proc.dat | 71 ++++++++++++++++++++++++++------- 2 files changed, 82 insertions(+), 39 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index de78d58d..23ef07a5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3299,7 +3299,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in </indexterm> <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>start</parameter> <type>integer</type> - [, <parameter>N</parameter> <type>integer</type> + [, <parameter>occurrence</parameter> <type>integer</type> [, <parameter>endoption</parameter> <type>integer</type> [, <parameter>flags</parameter> <type>text</type> [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] ) @@ -3307,7 +3307,7 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in </para> <para> Returns the position within <parameter>string</parameter> where - the <parameter>N</parameter>'th match of the POSIX regular + the <parameter>occurrence</parameter>'th match of the POSIX regular expression <parameter>pattern</parameter> occurs, or zero if there is no such match; see <xref linkend="functions-posix-regexp"/>. </para> @@ -3413,14 +3413,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in <entry role="func_table_entry"><para role="func_signature"> <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>, <parameter>start</parameter> <type>integer</type>, - <parameter>N</parameter> <type>integer</type> + <parameter>occurrence</parameter> <type>integer</type> [, <parameter>flags</parameter> <type>text</type> ] ) <returnvalue>text</returnvalue> </para> <para> - Replaces the substring that is the <parameter>N</parameter>'th + Replaces the substring that is the <parameter>occurrence</parameter>'th match to the POSIX regular expression <parameter>pattern</parameter>, - or all such matches if <parameter>N</parameter> is zero; see + or all such matches if <parameter>occurrence</parameter> is zero; see <xref linkend="functions-posix-regexp"/>. </para> <para> @@ -3478,14 +3478,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in </indexterm> <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>start</parameter> <type>integer</type> - [, <parameter>N</parameter> <type>integer</type> + [, <parameter>occurrence</parameter> <type>integer</type> [, <parameter>flags</parameter> <type>text</type> [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ) <returnvalue>text</returnvalue> </para> <para> Returns the substring within <parameter>string</parameter> that - matches the <parameter>N</parameter>'th occurrence of the POSIX + matches the <parameter>occurrence</parameter>'th occurrence of the POSIX regular expression <parameter>pattern</parameter>, or <literal>NULL</literal> if there is no such match; see <xref linkend="functions-posix-regexp"/>. @@ -5888,13 +5888,13 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation> <para> The <function>regexp_instr</function> function returns the starting or - ending position of the <replaceable>N</replaceable>'th match of a + ending position of the <replaceable>occurrence</replaceable>'th match of a POSIX regular expression pattern to a string, or zero if there is no such match. It has the syntax <function>regexp_instr</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable> - <optional>, <replaceable>N</replaceable> + <optional>, <replaceable>occurrence</replaceable> <optional>, <replaceable>endoption</replaceable> <optional>, <replaceable>flags</replaceable> <optional>, <replaceable>subexpr</replaceable> @@ -5903,8 +5903,8 @@ regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation> in <replaceable>string</replaceable>, normally from the beginning of the string, but if the <replaceable>start</replaceable> parameter is provided then beginning from that character index. - If <replaceable>N</replaceable> is specified - then the <replaceable>N</replaceable>'th match of the pattern + If <replaceable>occurrence</replaceable> is specified + then the <replaceable>occurrence</replaceable>'th match of the pattern is located, otherwise the first match is located. If the <replaceable>endoption</replaceable> parameter is omitted or specified as zero, the function returns the position of the first @@ -6024,8 +6024,8 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1]; expression pattern to a string. It has the same syntax as <function>regexp_match</function>. This function returns no rows if there is no match, one row if there is - a match and the <literal>g</literal> flag is not given, or <replaceable>N</replaceable> - rows if there are <replaceable>N</replaceable> matches and the <literal>g</literal> flag + a match and the <literal>g</literal> flag is not given, or <replaceable>occurrence</replaceable> + rows if there are <replaceable>occurrence</replaceable> matches and the <literal>g</literal> flag is given. Each returned row is a text array containing the whole matched substring or the substrings matching parenthesized subexpressions of the <replaceable>pattern</replaceable>, just as described above @@ -6076,18 +6076,18 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; The <function>regexp_replace</function> function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax - <function>regexp_replace</function>(<replaceable>source</replaceable>, + <function>regexp_replace</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable> <optional>, <replaceable>start</replaceable> - <optional>, <replaceable>N</replaceable> + <optional>, <replaceable>occurrence</replaceable> </optional></optional> <optional>, <replaceable>flags</replaceable> </optional>). - (Notice that <replaceable>N</replaceable> cannot be specified + (Notice that <replaceable>occurrence</replaceable> cannot be specified unless <replaceable>start</replaceable> is, but <replaceable>flags</replaceable> can be given in any case.) - The <replaceable>source</replaceable> string is returned unchanged if + The <replaceable>string</replaceable> is returned unchanged if there is no match to the <replaceable>pattern</replaceable>. If there is a - match, the <replaceable>source</replaceable> string is returned with the + match, the <replaceable>string</replaceable> is returned with the <replaceable>replacement</replaceable> string substituted for the matching substring. The <replaceable>replacement</replaceable> string can contain <literal>\</literal><replaceable>n</replaceable>, where <replaceable>n</replaceable> is 1 @@ -6102,14 +6102,14 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; the string, but if the <replaceable>start</replaceable> parameter is provided then beginning from that character index. By default, only the first match of the pattern is replaced. - If <replaceable>N</replaceable> is specified and is greater than zero, - then the <replaceable>N</replaceable>'th match of the pattern + If <replaceable>occurrence</replaceable> is specified and is greater than zero, + then the <replaceable>occurrence</replaceable>'th match of the pattern is replaced. If the <literal>g</literal> flag is given, or - if <replaceable>N</replaceable> is specified and is zero, then all + if <replaceable>occurrence</replaceable> is specified and is zero, then all matches at or after the <replaceable>start</replaceable> position are replaced. (The <literal>g</literal> flag is ignored - when <replaceable>N</replaceable> is specified.) + when <replaceable>occurrence</replaceable> is specified.) The <replaceable>flags</replaceable> parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags (though @@ -6220,7 +6220,7 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo; <function>regexp_substr</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable> - <optional>, <replaceable>N</replaceable> + <optional>, <replaceable>occurrence</replaceable> <optional>, <replaceable>flags</replaceable> <optional>, <replaceable>subexpr</replaceable> </optional></optional></optional></optional>). @@ -6228,8 +6228,8 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo; in <replaceable>string</replaceable>, normally from the beginning of the string, but if the <replaceable>start</replaceable> parameter is provided then beginning from that character index. - If <replaceable>N</replaceable> is specified - then the <replaceable>N</replaceable>'th match of the pattern + If <replaceable>occurrence</replaceable> is specified + then the <replaceable>occurrence</replaceable>'th match of the pattern is returned, otherwise the first match is returned. The <replaceable>flags</replaceable> parameter is an optional text string containing zero or more single-letter flags that change the diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 79793927..3b4330a6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -3611,105 +3611,148 @@ prosrc => 'replace_text' }, { oid => '2284', descr => 'replace text using regexp', proname => 'regexp_replace', prorettype => 'text', - proargtypes => 'text text text', prosrc => 'textregexreplace_noopt' }, + proargtypes => 'text text text', + proargnames => '{string, pattern, replacement}', + prosrc => 'textregexreplace_noopt' }, { oid => '2285', descr => 'replace text using regexp', proname => 'regexp_replace', prorettype => 'text', - proargtypes => 'text text text text', prosrc => 'textregexreplace' }, + proargtypes => 'text text text text', + proargnames => '{string, pattern, replacement, flags}', + prosrc => 'textregexreplace' }, { oid => '6251', descr => 'replace text using regexp', proname => 'regexp_replace', prorettype => 'text', proargtypes => 'text text text int4 int4 text', + proargnames => '{string, pattern, replacement, start, occurrence, flags}', prosrc => 'textregexreplace_extended' }, { oid => '6252', descr => 'replace text using regexp', proname => 'regexp_replace', prorettype => 'text', proargtypes => 'text text text int4 int4', + proargnames => '{string, pattern, replacement, start, occurrence}', prosrc => 'textregexreplace_extended_no_flags' }, { oid => '6253', descr => 'replace text using regexp', proname => 'regexp_replace', prorettype => 'text', proargtypes => 'text text text int4', + proargnames => '{string, pattern, replacement, start}', prosrc => 'textregexreplace_extended_no_n' }, { oid => '3396', descr => 'find first match for regexp', proname => 'regexp_match', prorettype => '_text', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_match_no_flags' }, { oid => '3397', descr => 'find first match for regexp', proname => 'regexp_match', prorettype => '_text', - proargtypes => 'text text text', prosrc => 'regexp_match' }, + proargtypes => 'text text text', + proargnames => '{string, pattern, flags}', + prosrc => 'regexp_match' }, { oid => '2763', descr => 'find match(es) for regexp', proname => 'regexp_matches', prorows => '1', proretset => 't', prorettype => '_text', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_matches_no_flags' }, { oid => '2764', descr => 'find match(es) for regexp', proname => 'regexp_matches', prorows => '10', proretset => 't', prorettype => '_text', proargtypes => 'text text text', + proargnames => '{string, pattern, flags}', prosrc => 'regexp_matches' }, { oid => '6254', descr => 'count regexp matches', proname => 'regexp_count', prorettype => 'int4', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_count_no_start' }, { oid => '6255', descr => 'count regexp matches', proname => 'regexp_count', prorettype => 'int4', - proargtypes => 'text text int4', prosrc => 'regexp_count_no_flags' }, + proargtypes => 'text text int4', + proargnames => '{string, pattern, start}', + prosrc => 'regexp_count_no_flags' }, { oid => '6256', descr => 'count regexp matches', proname => 'regexp_count', prorettype => 'int4', - proargtypes => 'text text int4 text', prosrc => 'regexp_count' }, + proargtypes => 'text text int4 text', + proargnames => '{string, pattern, start, flags}', + prosrc => 'regexp_count' }, { oid => '6257', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_instr_no_start' }, { oid => '6258', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', - proargtypes => 'text text int4', prosrc => 'regexp_instr_no_n' }, + proargtypes => 'text text int4', + proargnames => '{string, pattern, start}', + prosrc => 'regexp_instr_no_n' }, { oid => '6259', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', - proargtypes => 'text text int4 int4', prosrc => 'regexp_instr_no_endoption' }, + proargtypes => 'text text int4 int4', + proargnames => '{string, pattern, start, occurrence}', + prosrc => 'regexp_instr_no_endoption' }, { oid => '6260', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4', + proargnames => '{string, pattern, start, occurrence, endoption}', prosrc => 'regexp_instr_no_flags' }, { oid => '6261', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text', + proargnames => '{string, pattern, start, occurrence, endoption, flags}', prosrc => 'regexp_instr_no_subexpr' }, { oid => '6262', descr => 'position of regexp match', proname => 'regexp_instr', prorettype => 'int4', proargtypes => 'text text int4 int4 int4 text int4', + proargnames => '{string, pattern, start, occurrence, endoption, flags, subexpr}', prosrc => 'regexp_instr' }, { oid => '6263', descr => 'test for regexp match', - proname => 'regexp_like', prorettype => 'bool', proargtypes => 'text text', + proname => 'regexp_like', prorettype => 'bool', + proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_like_no_flags' }, { oid => '6264', descr => 'test for regexp match', proname => 'regexp_like', prorettype => 'bool', - proargtypes => 'text text text', prosrc => 'regexp_like' }, + proargtypes => 'text text text', + proargnames => '{string, pattern,flags}', + prosrc => 'regexp_like' }, { oid => '6265', descr => 'extract substring that matches regexp', proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_substr_no_start' }, { oid => '6266', descr => 'extract substring that matches regexp', proname => 'regexp_substr', prorettype => 'text', - proargtypes => 'text text int4', prosrc => 'regexp_substr_no_n' }, + proargtypes => 'text text int4', + proargnames => '{string, pattern, start}', + prosrc => 'regexp_substr_no_n' }, { oid => '6267', descr => 'extract substring that matches regexp', proname => 'regexp_substr', prorettype => 'text', - proargtypes => 'text text int4 int4', prosrc => 'regexp_substr_no_flags' }, + proargtypes => 'text text int4 int4', + proargnames => '{string, pattern, start, occurrence}', + prosrc => 'regexp_substr_no_flags' }, { oid => '6268', descr => 'extract substring that matches regexp', proname => 'regexp_substr', prorettype => 'text', proargtypes => 'text text int4 int4 text', + proargnames => '{string, pattern, start, occurrence, flags}', prosrc => 'regexp_substr_no_subexpr' }, { oid => '6269', descr => 'extract substring that matches regexp', proname => 'regexp_substr', prorettype => 'text', - proargtypes => 'text text int4 int4 text int4', prosrc => 'regexp_substr' }, + proargtypes => 'text text int4 int4 text int4', + proargnames => '{string, pattern, start, occurrence, flags, subexpr}', + prosrc => 'regexp_substr' }, { oid => '2088', descr => 'split string by field_sep and return field_num', proname => 'split_part', prorettype => 'text', proargtypes => 'text text int4', prosrc => 'split_part' }, { oid => '2765', descr => 'split string by pattern', proname => 'regexp_split_to_table', prorows => '1000', proretset => 't', prorettype => 'text', proargtypes => 'text text', + proargnames => '{string, pattern}', prosrc => 'regexp_split_to_table_no_flags' }, { oid => '2766', descr => 'split string by pattern', proname => 'regexp_split_to_table', prorows => '1000', proretset => 't', prorettype => 'text', proargtypes => 'text text text', + proargnames => '{string, pattern, flags}', prosrc => 'regexp_split_to_table' }, { oid => '2767', descr => 'split string by pattern', proname => 'regexp_split_to_array', prorettype => '_text', - proargtypes => 'text text', prosrc => 'regexp_split_to_array_no_flags' }, + proargtypes => 'text text', + proargnames => '{string, pattern}', + prosrc => 'regexp_split_to_array_no_flags' }, { oid => '2768', descr => 'split string by pattern', proname => 'regexp_split_to_array', prorettype => '_text', - proargtypes => 'text text text', prosrc => 'regexp_split_to_array' }, + proargtypes => 'text text text', + proargnames => '{string, pattern, flags}', + prosrc => 'regexp_split_to_array' }, { oid => '9030', descr => 'convert int4 number to binary', proname => 'to_bin', prorettype => 'text', proargtypes => 'int4', prosrc => 'to_bin32' }, -- 2.34.1