[HACKERS] json/jsonb and unicode escapes

Andrew Dunstan Thu, 29 May 2014 16:14:35 -0700

Here is a draft patch for some of the issues to do with unicode escapes that Teodor raised the other day.

I think it does the right thing, although I want to add a few more regression cases before committing it.
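
For illustration, the check the patch adds to the backslash case can be sketched as a standalone C fragment. This is only a sketch under stated assumptions: is_unicode_escape() and escape_backslashes() are hypothetical names that do not appear in the patch, the output goes to a caller-supplied buffer rather than a StringInfo, and only the backslash handling of escape_json() is modelled (quotes and control characters are ignored).

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: returns nonzero if p points
 * at a backslash followed by 'u' and four hexadecimal digits. The && chain
 * short-circuits at the terminating NUL, so a short string is never read
 * past its end. The casts avoid passing a negative char to isxdigit().
 */
static int
is_unicode_escape(const char *p)
{
	return p[0] == '\\' && p[1] == 'u' &&
		isxdigit((unsigned char) p[2]) &&
		isxdigit((unsigned char) p[3]) &&
		isxdigit((unsigned char) p[4]) &&
		isxdigit((unsigned char) p[5]);
}

/*
 * Hypothetical stand-in for the backslash case of escape_json(): copies
 * src into dst, doubling every backslash except one that starts a \uXXXX
 * escape. dst must have room for 2 * strlen(src) + 1 bytes.
 */
static void
escape_backslashes(char *dst, const char *src)
{
	const char *p;

	for (p = src; *p; p++)
	{
		if (*p == '\\' && !is_unicode_escape(p))
			*dst++ = '\\';		/* emit the extra backslash */
		*dst++ = *p;
	}
	*dst = '\0';
}
```

With this sketch, an input containing \u0000 comes back unchanged, while an input containing \u000g (not four hex digits) has its backslash doubled, which matches the intent of the new branch in the patch.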


Comments welcome.

cheers

andrew

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a7364f3..47ab9be 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -2274,7 +2274,27 @@ escape_json(StringInfo buf, const char *str)
 				appendStringInfoString(buf, "\\\"");
 				break;
 			case '\\':
-				appendStringInfoString(buf, "\\\\");
+				/*
+				 * Unicode escapes are passed through as is. There is no
+				 * requirement that they denote a valid character in the
+				 * server encoding - indeed that is a big part of their 
+				 * usefulness.
+				 *
+				 * All we require is that they consist of \uXXXX where 
+				 * the Xs are hexadecimal digits. It is the responsibility
+				 * of the caller of, say, to_json() to make sure that the
+				 * unicode escape is valid. 
+				 *
+				 * In the case of a jsonb string value being escaped, the
+				 * only unicode escape that should be present is \u0000,
+				 * all the other unicode escapes will have been resolved.
+				 *
+				 */
+				if (p[1] == 'u' && isxdigit((unsigned char) p[2]) && isxdigit((unsigned char) p[3])
+					&& isxdigit((unsigned char) p[4]) && isxdigit((unsigned char) p[5]))
+					appendStringInfoCharMacro(buf, *p);
+				else
+					appendStringInfoString(buf, "\\\\");
 				break;
 			default:
 				if ((unsigned char) *p < ' ')
diff --git a/src/test/regress/expected/jsonb.out b/src/test/regress/expected/jsonb.out
index ae7c506..1e46939 100644
--- a/src/test/regress/expected/jsonb.out
+++ b/src/test/regress/expected/jsonb.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the
diff --git a/src/test/regress/expected/jsonb_1.out b/src/test/regress/expected/jsonb_1.out
index 38a95b4..955dc42 100644
--- a/src/test/regress/expected/jsonb_1.out
+++ b/src/test/regress/expected/jsonb_1.out
@@ -61,9 +61,9 @@ LINE 1: SELECT '"\u000g"'::jsonb;
 DETAIL:  "\u" must be followed by four hexadecimal digits.
 CONTEXT:  JSON data, line 1: "\u000g...
 SELECT '"\u0000"'::jsonb;		-- OK, legal escape
-   jsonb   
------------
- "\\u0000"
+  jsonb   
+----------
+ "\u0000"
 (1 row)
 
 -- use octet_length here so we don't get an odd unicode char in the

