Attached is the v2 patch, rebased onto current master.
On 18.09.2019 18:10, Nikita Glukhov wrote:
Unfortunately, the jsonpath lexer, in contrast to the jsonpath parser, was
written by Teodor and me without proper attention to the standard. The JSON path
lexical grammar is borrowed from the external ECMAScript standard [1], and we
did not study it carefully. There were numerous deviations from the ECMAScript
standard in our jsonpath implementation, most of which are fixed in the
attached patch:
1. Identifiers (unquoted JSON key names) should start with one of (see [2]):
- Unicode symbol having Unicode property "ID_Start" (see [3])
- Unicode escape sequence '\uXXXX' or '\u{X...}'
- '$'
- '_'
And they should continue with one of:
- Unicode symbol having Unicode property "ID_Continue" (see [3])
- Unicode escape sequence
- '$'
- ZWNJ
- ZWJ
2. '$' is also allowed inside identifiers, so it is possible to write
something like '$.a$$b'.
3. Variable references '$var' are regular identifiers that simply start with
the '$' sign, and there is no syntax like '$"var"', because quotes are not
allowed in identifiers.
4. Even if a Unicode escape sequence '\u' is used, it cannot produce special
symbols or whitespace, because identifiers are displayed without quoting
(i.e. '$\u{20}' cannot be displayed as '$" "', let alone as the string
'"$ "').
5. All codepoints in '\u{X...}' greater than 0x10FFFF should be forbidden.
6. The six single-character escape sequences (\b \t \r \f \n \v) should be
supported only inside quoted strings.
I don't know whether it is possible to check the Unicode properties "ID_Start"
and "ID_Continue" in Postgres, or what ZWNJ/ZWJ are. For now, the identifier's
starting character set is simply determined by excluding all recognized special
characters.
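As a rough illustration of the exclusion approach, the sketch below tests a single byte against the special-character string that the patch defines as JSONPATH_SPECIAL_CHARS. The function names is_special_char and could_start_identifier are hypothetical. The sketch applies only the exclusion rule: digits and non-ASCII bytes get no special treatment here, which may differ from what the lexer in the patch actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Same string as the JSONPATH_SPECIAL_CHARS definition in the patch. */
#define JSONPATH_SPECIAL_CHARS "?%.[]{}()|&!=<>@#,*:-+/~`;\\\"' \b\f\n\r\t\v"

/* Hypothetical helper: is this byte one of the recognized special characters? */
static bool
is_special_char(unsigned char c)
{
    /* strchr() would match the terminating NUL, so exclude it explicitly. */
    return c != '\0' && strchr(JSONPATH_SPECIAL_CHARS, c) != NULL;
}

/*
 * Hypothetical helper: under the exclusion rule alone, any byte that is
 * neither NUL nor a special character may start an identifier.
 */
static bool
could_start_identifier(unsigned char c)
{
    return c != '\0' && !is_special_char(c);
}
```

Note that '$' and '_' are absent from the special-character string, so both pass this check, consistent with items 1 and 2 above.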
The patch is not so simple, but I believe that it's not too late to fix v12.
[1]https://www.ecma-international.org/ecma-262/10.0/index.html#sec-ecmascript-language-lexical-grammar
[2]https://www.ecma-international.org/ecma-262/10.0/index.html#sec-names-and-keywords
[3]https://unicode.org/reports/tr31/
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
From fac98cad7a8dcb1354dd305c55372699d34659fe Mon Sep 17 00:00:00 2001
From: Nikita Glukhov
Date: Fri, 22 Mar 2019 15:15:38 +0300
Subject: [PATCH] Fix parsing of identifiers in jsonpath
---
src/backend/utils/adt/jsonpath.c | 11 +-
src/backend/utils/adt/jsonpath_gram.y | 6 +-
src/backend/utils/adt/jsonpath_scan.l | 146 +++---
src/test/regress/expected/jsonpath.out | 162 ++---
src/test/regress/sql/jsonpath.sql | 27 +
5 files changed, 252 insertions(+), 100 deletions(-)
diff --git a/src/backend/utils/adt/jsonpath.c b/src/backend/utils/adt/jsonpath.c
index e683cbef7c6..20c400a43cd 100644
--- a/src/backend/utils/adt/jsonpath.c
+++ b/src/backend/utils/adt/jsonpath.c
@@ -496,9 +496,14 @@ printJsonPathItem(StringInfo buf, JsonPathItem *v, bool inKey,
escape_json(buf, jspGetString(v, NULL));
break;
case jpiVariable:
- appendStringInfoChar(buf, '$');
- escape_json(buf, jspGetString(v, NULL));
- break;
+ {
+int32 len;
+char *name = jspGetString(v, &len);
+
+appendStringInfoChar(buf, '$');
+appendBinaryStringInfo(buf, name, len);
+break;
+ }
case jpiNumeric:
appendStringInfoString(buf,
DatumGetCString(DirectFunctionCall1(numeric_out,
diff --git a/src/backend/utils/adt/jsonpath_gram.y b/src/backend/utils/adt/jsonpath_gram.y
index 252f7051f65..ca47b96cb0e 100644
--- a/src/backend/utils/adt/jsonpath_gram.y
+++ b/src/backend/utils/adt/jsonpath_gram.y
@@ -348,8 +348,10 @@ makeItemVariable(JsonPathString *s)
JsonPathParseItem *v;
v = makeItemType(jpiVariable);
- v->value.string.val = s->val;
- v->value.string.len = s->len;
+
+ /* skip leading '$' */
+ v->value.string.val = &s->val[1];
+ v->value.string.len = s->len - 1;
return v;
}
diff --git a/src/backend/utils/adt/jsonpath_scan.l b/src/backend/utils/adt/jsonpath_scan.l
index 9650226f507..65d3626d9ac 100644
--- a/src/backend/utils/adt/jsonpath_scan.l
+++ b/src/backend/utils/adt/jsonpath_scan.l
@@ -20,6 +20,8 @@
#include "mb/pg_wchar.h"
#include "nodes/pg_list.h"
+#define JSONPATH_SPECIAL_CHARS "?%.[]{}()|&!=<>@#,*:-+/~`;\\\"' \b\f\n\r\t\v"
+
static JsonPathString scanstring;
/* Handles to the buffer that the lexer uses internally */
@@ -63,20 +65,21 @@ fprintf_to_ereport(const char *fmt, const char *msg)
* quoted variable names and C-style comments.
* Exclusive states:
* - quoted strings
- * - non-quoted strings
- * - quoted variable names
+ * - non-quoted identifiers
* - C-style comment
*/
%x xq
%x xnq
-%x xvq
%x xc
-special [\?\%\$\.\[\]\{\}\(\)\|\&\!\=\<\>\@\#\,\*:\-\+\/]
-blank [ \t\n\r\f]
-/* "other" means anything that's not special, blank, or '\' or '"' */
-other