On 9/17/19 12:09 PM, Erik Rijkers wrote: > On 2019-09-17 17:38, Jonathan S. Katz wrote: >> [regex.patch]
Thanks for the review! > "Several other parts of the SQL standard > also define LIKE_REGEX equivalents that refer > to this implementation, including the > SQL/JSON path like_regex filter." > > As I understand this text, 'concept' seems better. > I'd drop 'also', too. I rewrote this to be: "Several other parts of the SQL standard refer to the LIKE_REGEX specification to define similar operations, including..." > 2. > 'whereas the POSIX will those' should be > 'whereas POSIX will regard those' > or maybe 'read those' I used "treat those" > > 3. > + The SQL/JSON standard borrows its definition for how regular > expressions > + from the <literal>LIKE_REGEX</literal> operator, which in turns > uses the > + XQuery standard. > That sentence needs the verb 'work', no? 'for how regular expressions > work [..]' > Or alternatively drop 'how'. I dropped the "how". v2 attached. Thanks! Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..c867ea13de 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz',
'(?:(.*?)(\d+)(.*)){1,1}');
</para>
</sect3>
+ <sect3 id="posix-vs-like_regexp">
+ <title>Differences with <literal>LIKE_REGEX</literal></title>
+
+ <para>
+ The <literal>LIKE_REGEX</literal> operator is specified starting with
+ the SQL:2008 standard to provide a more robust specification for
+ comparisons using regular expressions. Several other parts of the SQL
+ standard refer to the <literal>LIKE_REGEX</literal> specification
+ to define similar operations, including the
+ <link linkend="jsonpath-regular-expressions">SQL/JSON path
+ <literal>like_regex</literal></link> filter.
+ </para>
+ <para>
+ The SQL standard states that regular expressions are evaluated according to
+ the XQuery standard for regular expressions. While POSIX regular
+ expressions are similar to XQuery regular expressions, there exist some
+ differences where the behavior deviates from what is defined for
+ <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for
+ <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded
strings,
+ whereas POSIX regular expressions can work on strings of any encoding.
+ </para>
+ <para>
+ Other differences include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Character class subtraction is not supported (for example, using the
+ following to search for only consonants:
+ <literal>[a-z-[aeiou]]</literal>).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>LIKE_REGEX</literal> specification states that a single
+ <literal>.</literal> should match a Windows newline
+ (<literal>\r\n</literal>) whereas POSIX will treat those as two separate
+ characters.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The format <literal>#NN</literal> where <literal>NN</literal>
+ represents two hex digits used for character class elements is not
+ supported. The same character class elements can be used with POSIX by
+ specifying <literal>\xNN</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Character class elements using <literal>\p{UnicodeProperty}</literal>
+ or the inverse <literal>\P{UnicodeProperty}</literal> are not supported.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Character class shorthands <literal>\i</literal>,
+ <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal>
+ are not supported.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The specification for <literal>LIKE_REGEX</literal> may allow for more
+ characters for the <literal>\w</literal> character class shorthand, and
+ by extensions, excludes more characters with the complement
+ <literal>\W</literal>. As PostgreSQL depends on the underlying system's
+ locale, this may cause the behavior of <literal>\w</literal> and
+ <literal>\W</literal> to be equivalent to what POSIX provides.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>x</literal> flag in PostgreSQL extends on the specification
+ for <literal>LIKE_REGEX</literal> by allowing for comments specified
+ with <literal>#</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ </sect3>
+
<!-- end re_syntax.n man page -->
</sect2>
@@ -11872,6 +11954,38 @@ table2-mapping
</sect3>
+ <sect3 id="jsonpath-regular-expressions">
+ <title>Regular Expressions</title>
+
+ <para>
+ SQL/JSON path expressions support the ability to match text using regular
+ expressions with the <literal>like_regex</literal> filter. For example,
+ the following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with a vowel:
+<programlisting>
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+</programlisting>
+ </para>
+
+ <para>
+ The SQL/JSON standard borrows its definition for regular expressions
+ from the <literal>LIKE_REGEX</literal> operator, which in turns uses the
+ XQuery standard. PostgreSQL does not support the
+ <literal>LIKE_REGEX</literal> operator as it currently implements
+ <link linkend="functions-posix-regexp">POSIX regular expressions</link>.
+ </para>
+
+ <para>
+ For its implementation of the SQL/JSON path <literal>like_regex</literal>
+ filter, PostgreSQL uses its POSIX implementation to evaluate the
+ regular expressions. While similar to the SQL standard specification for
+ the <literal>LIKE_REGEX</literal> operator, there are some noted
+ differences that you can read about in
+ <xref linkend="posix-vs-like_regexp" />.
+ </para>
+
+ </sect3>
+
<sect3 id="functions-sqljson-path-operators">
<title>SQL/JSON Path Operators and Methods</title>
@@ -12114,9 +12228,10 @@ table2-mapping
<entry><literal>like_regex</literal></entry>
<entry>
Tests pattern matching with POSIX regular expressions
- (see <xref linkend="functions-posix-regexp"/>). Supported flags
- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
- <literal>x</literal>, and <literal>q</literal>.</entry>
+ (see <xref linkend="jsonpath-regular-expressions"/> for additional
+ details). Supported flags are <literal>i</literal>,
+ <literal>s</literal>, <literal>m</literal>,
+ and <literal>q</literal>.</entry>
<entry><literal>["abc", "abd", "aBdC", "abdacb",
"babc"]</literal></entry>
<entry><literal>$[*] ? (@ like_regex "^ab.*c" flag
"i")</literal></entry>
<entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
signature.asc
Description: OpenPGP digital signature
