On 9/16/19 6:39 PM, Jonathan S. Katz wrote: > My main question is "where" -- I'm thinking somewhere in the JSON > path[2] section. After reading your email 3 times, I may have enough > knowledge to attempt some documentation on the regexp in JSON path.
Here is said attempt to document. Notes: - I centered it around the specification for LIKE_REGEX, which uses XQuery, but primarily noted where our implementation of POSIX regex's differs from what is specified for LIKE_REGEX vis-a-vis XQuery - I put the pith of the documentation in a subsection off of "POSIX regular expressions" - I noted that LIKE_REGEX is specified in SQL:2008, which I read on the Internet(tm) but was not able to confirm in the spec as I do not have a copy - For my explanation about the "x" flag differences, I talked about how we extended it, but I could not capture how Tom described the nuances above. - From the SQL/JSON path docs, I added a section on regular expressions stating what the behavior is, and referring back to the main regex docs - I removed the "x" flag being supported for like_regex in JSON path I also presume it needs a bit of wordsmithing / accuracy checks, but hope it's a good start and does not require a massive rewrite. Thanks, Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..74dbfd9e46 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz',
'(?:(.*?)(\d+)(.*)){1,1}');
</para>
</sect3>
+ <sect3 id="posix-vs-like_regexp">
+ <title>Differences with <literal>LIKE_REGEX</literal></title>
+
+ <para>
+ The <literal>LIKE_REGEX</literal> operator is specified starting with
+ the SQL:2008 standard to provide a more robust specification for
+ comparisons using regular expressions. Several other parts of the SQL
+ standard also define <literal>LIKE_REGEX</literal> equivalents that refer
+ to this implementation, including the
+ <link linkend="jsonpath-regular-expressions">SQL/JSON path
+ <literal>like_regex</literal></link> filter.
+ </para>
+ <para>
+ The SQL standard states that regular expressions are evaluated according to
+ the XQuery standard for regular expressions. While POSIX regular
+ expressions are similar to XQuery regular expressions, there exist some
+ differences where the behavior deviates from what is defined for
+ <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for
+ <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded
strings,
+ whereas POSIX regular expressions can work on strings of any encoding.
+ </para>
+ <para>
+ Other differences include:
+ <itemizedlist>
+ <listitem>
+ <para>
+ Character class subtraction is not supported (for example, using the
+ following to search for only consonants:
+ <literal>[a-z-[aeiou]]</literal>).
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>LIKE_REGEX</literal> specification states that a single
+ <literal>.</literal> should match a Windows newline
+ (<literal>\r\n</literal>) whereas the POSIX will those as two separate
+ characters.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The format <literal>#NN</literal> where <literal>NN</literal>
+ represents two hex digits used for character class elements is not
+ supported. The same character class elements can be used with POSIX by
+ specifying <literal>\xNN</literal>.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Character class elements using <literal>\p{UnicodeProperty}</literal>
+ or the inverse <literal>\P{UnicodeProperty}</literal> are not supported.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Character class shorthands <literal>\i</literal>,
+ <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal>
+ are not supported.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The specification for <literal>LIKE_REGEX</literal> may allow for more
+ characters for the <literal>\w</literal> character class shorthand, and
+ by extensions, excludes more characters with the complement
+ <literal>\W</literal>. As PostgreSQL depends on the underlying system's
+ locale, this may cause the behavior of <literal>\w</literal> and
+ <literal>\W</literal> to be equivalent to what POSIX provides.
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ The <literal>x</literal> flag in PostgreSQL extends on the specification
+ for <literal>LIKE_REGEX</literal> by allowing for comments specified
+ with <literal>#</literal>.
+ </para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ </sect3>
+
<!-- end re_syntax.n man page -->
</sect2>
@@ -11872,6 +11954,38 @@ table2-mapping
</sect3>
+ <sect3 id="jsonpath-regular-expressions">
+ <title>Regular Expressions</title>
+
+ <para>
+ SQL/JSON path expressions support the ability to match text using regular
+ expressions with the <literal>like_regex</literal> filter. For example,
+ the following SQL/JSON path query would case-insensitively match all
+ strings in an array that start with a vowel:
+<programlisting>
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+</programlisting>
+ </para>
+
+ <para>
+ The SQL/JSON standard borrows its definition for how regular expressions
+ from the <literal>LIKE_REGEX</literal> operator, which in turns uses the
+ XQuery standard. PostgreSQL does not support the
+ <literal>LIKE_REGEX</literal> operator as it currently implements
+ <link linkend="functions-posix-regexp">POSIX regular expressions</link>.
+ </para>
+
+ <para>
+ For its implementation of the SQL/JSON path <literal>like_regex</literal>
+ filter, PostgreSQL uses its POSIX implementation to evaluate the
+ regular expressions. While similar to the SQL standard specification for
+ the <literal>LIKE_REGEX</literal> operator, there are some noted
+ differences that you can read about in
+ <xref linkend="posix-vs-like_regexp" />.
+ </para>
+
+ </sect3>
+
<sect3 id="functions-sqljson-path-operators">
<title>SQL/JSON Path Operators and Methods</title>
@@ -12114,9 +12228,10 @@ table2-mapping
<entry><literal>like_regex</literal></entry>
<entry>
Tests pattern matching with POSIX regular expressions
- (see <xref linkend="functions-posix-regexp"/>). Supported flags
- are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
- <literal>x</literal>, and <literal>q</literal>.</entry>
+ (see <xref linkend="jsonpath-regular-expressions"/> for additional
+ details). Supported flags are <literal>i</literal>,
+ <literal>s</literal>, <literal>m</literal>,
+ and <literal>q</literal>.</entry>
<entry><literal>["abc", "abd", "aBdC", "abdacb",
"babc"]</literal></entry>
<entry><literal>$[*] ? (@ like_regex "^ab.*c" flag
"i")</literal></entry>
<entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
signature.asc
Description: OpenPGP digital signature
