On 9/16/19 6:39 PM, Jonathan S. Katz wrote:

> My main question is "where" -- I'm thinking somewhere in the JSON
> path[2] section. After reading your email 3 times, I may have enough
> knowledge to attempt some documentation on the regexp in JSON path.

Here is said attempt to document. Notes:

- I centered it around the specification for LIKE_REGEX, which uses
XQuery, but primarily noted where our implementation of POSIX regex's
differs from what is specified for LIKE_REGEX vis-a-vis XQuery

- I put the pith of the documentation in a subsection off of "POSIX
regular expressions"

- I noted that LIKE_REGEX is specified in SQL:2008, which I read on the
Internet(tm) but was not able to confirm in the spec as I do not have a copy

- For my explanation about the "x" flag differences, I talked about how
we extended it, but I could not capture how Tom described the nuances above.

- From the SQL/JSON path docs, I added a section on regular expressions
stating what the behavior is, and referring back to the main regex docs

- I removed the "x" flag being supported for like_regex in JSON path

I also presume it needs a bit of wordsmithing / accuracy checks, but
hope it's a good start and does not require a massive rewrite.

Thanks,

Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2b4fe0cb59..74dbfd9e46 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', 
'(?:(.*?)(\d+)(.*)){1,1}');
    </para>
    </sect3>
 
+   <sect3 id="posix-vs-like_regexp">
+   <title>Differences with <literal>LIKE_REGEX</literal></title>
+
+   <para>
+    The <literal>LIKE_REGEX</literal> operator is specified starting with
+    the SQL:2008 standard to provide a more robust specification for
+    comparisons using regular expressions. Several other parts of the SQL
+    standard also define <literal>LIKE_REGEX</literal> equivalents that refer
+    to this implementation, including the
+    <link linkend="jsonpath-regular-expressions">SQL/JSON path
+     <literal>like_regex</literal></link> filter.
+   </para>
+   <para>
+    The SQL standard states that regular expressions are evaluated according to
+    the XQuery standard for regular expressions. While POSIX regular
+    expressions are similar to XQuery regular expressions, there exist some
+    differences where the behavior deviates from what is defined for
+    <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for
+    <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded 
strings,
+    whereas POSIX regular expressions can work on strings of any encoding.
+   </para>
+   <para>
+    Other differences include:
+    <itemizedlist>
+     <listitem>
+      <para>
+       Character class subtraction is not supported (for example, using the
+       following to search for only consonants:
+       <literal>[a-z-[aeiou]]</literal>).
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The <literal>LIKE_REGEX</literal> specification states that a single
+       <literal>.</literal> should match a Windows newline
+       (<literal>\r\n</literal>) whereas the POSIX will those as two separate
+       characters.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The format <literal>#NN</literal> where <literal>NN</literal>
+       represents two hex digits used for character class elements is not
+       supported. The same character class elements can be used with POSIX by
+       specifying <literal>\xNN</literal>.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Character class elements using <literal>\p{UnicodeProperty}</literal>
+       or the inverse <literal>\P{UnicodeProperty}</literal> are not supported.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       Character class shorthands <literal>\i</literal>,
+       <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal>
+       are not supported.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The specification for <literal>LIKE_REGEX</literal> may allow for more
+       characters for the <literal>\w</literal> character class shorthand, and
+       by extensions, excludes more characters with the complement
+       <literal>\W</literal>. As PostgreSQL depends on the underlying system's
+       locale, this may cause the behavior of <literal>\w</literal> and
+       <literal>\W</literal> to be equivalent to what POSIX provides.
+      </para>
+     </listitem>
+     <listitem>
+      <para>
+       The <literal>x</literal> flag in PostgreSQL extends on the specification
+       for <literal>LIKE_REGEX</literal> by allowing for comments specified
+       with <literal>#</literal>.
+      </para>
+     </listitem>
+    </itemizedlist>
+   </para>
+
+  </sect3>
+
 <!-- end re_syntax.n man page -->
 
   </sect2>
@@ -11872,6 +11954,38 @@ table2-mapping
 
    </sect3>
 
+   <sect3 id="jsonpath-regular-expressions">
+    <title>Regular Expressions</title>
+
+    <para>
+     SQL/JSON path expressions support the ability to match text using regular
+     expressions with the <literal>like_regex</literal> filter. For example,
+     the following SQL/JSON path query would case-insensitively match all
+     strings in an array that start with a vowel:
+<programlisting>
+'$[*] ? (@ like_regex "^[aeiou]" flag "i")'
+</programlisting>
+    </para>
+
+    <para>
+     The SQL/JSON standard borrows its definition for how regular expressions
+     from the <literal>LIKE_REGEX</literal> operator, which in turns uses the
+     XQuery standard. PostgreSQL does not support the
+     <literal>LIKE_REGEX</literal> operator as it currently implements
+     <link linkend="functions-posix-regexp">POSIX regular expressions</link>.
+    </para>
+
+    <para>
+     For its implementation of the SQL/JSON path <literal>like_regex</literal>
+     filter, PostgreSQL uses its POSIX implementation to evaluate the
+     regular expressions. While similar to the SQL standard specification for
+     the <literal>LIKE_REGEX</literal> operator, there are some noted
+     differences that you can read about in
+     <xref linkend="posix-vs-like_regexp" />.
+    </para>
+
+   </sect3>
+
    <sect3 id="functions-sqljson-path-operators">
    <title>SQL/JSON Path Operators and Methods</title>
 
@@ -12114,9 +12228,10 @@ table2-mapping
         <entry><literal>like_regex</literal></entry>
         <entry>
           Tests pattern matching with POSIX regular expressions
-          (see <xref linkend="functions-posix-regexp"/>).  Supported flags
-          are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>,
-          <literal>x</literal>, and <literal>q</literal>.</entry>
+          (see <xref linkend="jsonpath-regular-expressions"/> for additional
+          details).  Supported flags are <literal>i</literal>,
+          <literal>s</literal>, <literal>m</literal>,
+          and <literal>q</literal>.</entry>
         <entry><literal>["abc", "abd", "aBdC", "abdacb", 
"babc"]</literal></entry>
         <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag 
"i")</literal></entry>
         <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to