On Tue, Jun 11, 2013 at 06:58:05PM -0400, Andrew Dunstan wrote:
>
> On 06/11/2013 06:26 PM, Noah Misch wrote:
>>
>>> As a final counter example, let me note that Postgres itself handles
>>> Unicode escapes differently in UTF8 databases - in other databases it
>>> only accepts Unicode escapes up to U+007f, i.e. ASCII characters.
>> I don't see a counterexample there; every database that accepts without error
>> a given Unicode escape produces from it the same text value. The proposal to
>> which I objected was akin to having non-UTF8 databases silently translate
>> E'\u0220' to E'\\u0220'.
>
> What?
>
> There will be no silent translation. The only debate here is about how
> these databases turn strings values inside a json datum into PostgreSQL
> text values via the documented operation of certain functions and
> operators. If the JSON datum doesn't already contain a unicode escape
> then nothing of what's been discussed would apply. Nothing whatever
> that's been proposed would cause a unicode escape sequence to be emitted
> that wasn't already there in the first place, and no patch that I have
> submitted has contained any escape sequence generation at all.

Under your proposal to which I was referring, this statement would return true
in UTF8 databases and false in databases of other encodings:

  SELECT '["\u0220"]'::json ->> 0 = E'\u0220'

Contrast the next statement, which would return false in UTF8 databases and
true in databases of other encodings:

  SELECT '["\u0220"]'::json ->> 0 = E'\\u0220'

Defining ->>(json,int) and ->>(json,text) in this way would be *akin to* having
"SELECT E'\u0220' = E'\\u0220'" return true in non-UTF8 databases. I refer to
user-visible semantics, not matters of implementation. Does that help to
clarify my earlier statement?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com
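
For readers following the thread, the user-visible behaviour described in the two SELECT statements above can be modelled with a small Python sketch. It is a toy model, not PostgreSQL code: the function name `extract_text_under_proposal` and its `server_encoding` parameter are hypothetical, and it assumes the proposal leaves a non-ASCII `\uXXXX` escape as literal text when the database encoding is not UTF8. Surrogate pairs and all other JSON escapes are ignored.

```python
import re

# Matches a JSON-style \uXXXX escape (four hex digits).
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def extract_text_under_proposal(json_string_body, server_encoding):
    """Toy model of ->> applied to a JSON string under the debated proposal.

    json_string_body is the raw text between the JSON quotes.
    server_encoding is a hypothetical stand-in for the database encoding.
    """
    def replace(match):
        code_point = int(match.group(1), 16)
        # Assumption: UTF8 databases decode every escape; other databases
        # decode only ASCII code points (up to U+007F).
        if server_encoding == "UTF8" or code_point <= 0x7F:
            return chr(code_point)
        # Otherwise the six-character escape text is passed through unchanged.
        return match.group(0)

    return _UNICODE_ESCAPE.sub(replace, json_string_body)


raw = r"\u0220"  # the string body inside '["\u0220"]'

utf8_result = extract_text_under_proposal(raw, "UTF8")
latin1_result = extract_text_under_proposal(raw, "LATIN1")

# In the model, the UTF8 result is the single character U+0220, while the
# LATIN1 result is the six characters backslash, u, 0, 2, 2, 0.
print(len(utf8_result), len(latin1_result))
```

In this model the same JSON input yields two different text values depending on the encoding, which is the user-visible difference the objection is about.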