On 9/16/19 6:39 PM, Jonathan S. Katz wrote: > My main question is "where" -- I'm thinking somewhere in the JSON > path[2] section. After reading your email 3 times, I may have enough > knowledge to attempt some documentation on the regexp in JSON path.
Here is said attempt to document. Notes: - I centered it around the specification for LIKE_REGEX, which uses XQuery, but primarily noted where our implementation of POSIX regex's differs from what is specified for LIKE_REGEX vis-a-vis XQuery - I put the pith of the documentation in a subsection off of "POSIX regular expressions" - I noted that LIKE_REGEX is specified in SQL:2008, which I read on the Internet(tm) but was not able to confirm in the spec as I do not have a copy - For my explanation about the "x" flag differences, I talked about how we extended it, but I could not capture how Tom described the nuances above. - From the SQL/JSON path docs, I added a section on regular expressions stating what the behavior is, and referring back to the main regex docs - I removed the "x" flag being supported for like_regex in JSON path I also presume it needs a bit of wordsmithing / accuracy checks, but hope it's a good start and does not require a massive rewrite. Thanks, Jonathan
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 2b4fe0cb59..74dbfd9e46 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -5968,6 +5968,88 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}'); </para> </sect3> + <sect3 id="posix-vs-like_regexp"> + <title>Differences with <literal>LIKE_REGEX</literal></title> + + <para> + The <literal>LIKE_REGEX</literal> operator is specified starting with + the SQL:2008 standard to provide a more robust specification for + comparisons using regular expressions. Several other parts of the SQL + standard also define <literal>LIKE_REGEX</literal> equivalents that refer + to this implementation, including the + <link linkend="jsonpath-regular-expressions">SQL/JSON path + <literal>like_regex</literal></link> filter. + </para> + <para> + The SQL standard states that regular expressions are evaluated according to + the XQuery standard for regular expressions. While POSIX regular + expressions are similar to XQuery regular expressions, there exist some + differences where the behavior deviates from what is defined for + <literal>LIKE_REGEX</literal>. Notably, regular expressions evaluated for + <literal>LIKE_REGEX</literal> are defined to work on Unicode encoded strings, + whereas POSIX regular expressions can work on strings of any encoding. + </para> + <para> + Other differences include: + <itemizedlist> + <listitem> + <para> + Character class subtraction is not supported (for example, using the + following to search for only consonants: + <literal>[a-z-[aeiou]]</literal>). + </para> + </listitem> + <listitem> + <para> + The <literal>LIKE_REGEX</literal> specification states that a single + <literal>.</literal> should match a Windows newline + (<literal>\r\n</literal>) whereas the POSIX will those as two separate + characters. + </para> + </listitem> + <listitem> + <para> + The format <literal>#NN</literal> where <literal>NN</literal> + represents two hex digits used for character class elements is not + supported. The same character class elements can be used with POSIX by + specifying <literal>\xNN</literal>. + </para> + </listitem> + <listitem> + <para> + Character class elements using <literal>\p{UnicodeProperty}</literal> + or the inverse <literal>\P{UnicodeProperty}</literal> are not supported. + </para> + </listitem> + <listitem> + <para> + Character class shorthands <literal>\i</literal>, + <literal>\I</literal>, <literal>\c</literal>, and <literal>\C</literal> + are not supported. + </para> + </listitem> + <listitem> + <para> + The specification for <literal>LIKE_REGEX</literal> may allow for more + characters for the <literal>\w</literal> character class shorthand, and + by extensions, excludes more characters with the complement + <literal>\W</literal>. As PostgreSQL depends on the underlying system's + locale, this may cause the behavior of <literal>\w</literal> and + <literal>\W</literal> to be equivalent to what POSIX provides. + </para> + </listitem> + <listitem> + <para> + The <literal>x</literal> flag in PostgreSQL extends on the specification + for <literal>LIKE_REGEX</literal> by allowing for comments specified + with <literal>#</literal>. + </para> + </listitem> + </itemizedlist> + </para> + + </sect3> + <!-- end re_syntax.n man page --> </sect2> @@ -11872,6 +11954,38 @@ table2-mapping </sect3> + <sect3 id="jsonpath-regular-expressions"> + <title>Regular Expressions</title> + + <para> + SQL/JSON path expressions support the ability to match text using regular + expressions with the <literal>like_regex</literal> filter. For example, + the following SQL/JSON path query would case-insensitively match all + strings in an array that start with a vowel: +<programlisting> +'$[*] ? (@ like_regex "^[aeiou]" flag "i")' +</programlisting> + </para> + + <para> + The SQL/JSON standard borrows its definition for how regular expressions + from the <literal>LIKE_REGEX</literal> operator, which in turns uses the + XQuery standard. PostgreSQL does not support the + <literal>LIKE_REGEX</literal> operator as it currently implements + <link linkend="functions-posix-regexp">POSIX regular expressions</link>. + </para> + + <para> + For its implementation of the SQL/JSON path <literal>like_regex</literal> + filter, PostgreSQL uses its POSIX implementation to evaluate the + regular expressions. While similar to the SQL standard specification for + the <literal>LIKE_REGEX</literal> operator, there are some noted + differences that you can read about in + <xref linkend="posix-vs-like_regexp" />. + </para> + + </sect3> + <sect3 id="functions-sqljson-path-operators"> <title>SQL/JSON Path Operators and Methods</title> @@ -12114,9 +12228,10 @@ table2-mapping <entry><literal>like_regex</literal></entry> <entry> Tests pattern matching with POSIX regular expressions - (see <xref linkend="functions-posix-regexp"/>). Supported flags - are <literal>i</literal>, <literal>s</literal>, <literal>m</literal>, - <literal>x</literal>, and <literal>q</literal>.</entry> + (see <xref linkend="jsonpath-regular-expressions"/> for additional + details). Supported flags are <literal>i</literal>, + <literal>s</literal>, <literal>m</literal>, + and <literal>q</literal>.</entry> <entry><literal>["abc", "abd", "aBdC", "abdacb", "babc"]</literal></entry> <entry><literal>$[*] ? (@ like_regex "^ab.*c" flag "i")</literal></entry> <entry><literal>"abc", "aBdC", "abdacb"</literal></entry>
signature.asc
Description: OpenPGP digital signature