On 05/29/2014 07:55 AM, Teodor Sigaev wrote:
# select  '"\uaBcD"'::json;
   json
----------
 "\uaBcD"

but

# select  '"\uaBcD"'::jsonb;
ERROR:  invalid input syntax for type json
LINE 1: select  '"\uaBcD"'::jsonb;
                ^
DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: ...

and

# select  '"\uaBcD"'::json -> 0;
ERROR:  invalid input syntax for type json
DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: ...
Time: 0,696 ms

More than, json looks strange:

# select  '["\uaBcD"]'::json;
    json
------------
 ["\uaBcD"]

but

# select  '["\uaBcD"]'::json->0;
ERROR:  invalid input syntax for type json
DETAIL: Unicode escape values cannot be used for code point values above 007F when the server encoding is not UTF8.
CONTEXT:  JSON data, line 1: [...

Looks like random parse rules.



It is documented that for json we don't check the validity of unicode escapes until we try to use them. That's because the original json parser didn't check them, and if we started doing so now users who pg_upgraded would find themselves with invalid data in the database. The rules for jsonb are more strict, because we actually resolve the unicode escapes when constructing the jsonb object. There is nothing at all random about it, although I agree it's slightly inconsistent. It's been the subject of some discussion on -hackers previously, IIRC. I actually referred to this difference in my talk at pgCon last Friday.

Frankly, if you want to use json/jsonb, you are going to be best served by using a UTF8-encoded database, or not using non-ascii unicode escapes in json at all.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to