On Sun, 2010-03-28 at 23:24 -0400, Joseph Adams wrote:
> Thus, here's an example of how (in my opinion) character sets and such
> should be handled in the JSON code:
> 
> Suppose the client's encoding is UTF-16, and the server's encoding is
> Latin-1.  When JSON is stored to the database:
>  1. The client is responsible for sending a valid UTF-16 JSON string.
>  2. PostgreSQL checks to make sure it is valid UTF-16, then converts
> it to UTF-8.
>  3. The JSON code parses it (to ensure it's valid).
>  4. The JSON code unparses it (to get a representation without
> needless whitespace).  It is given a flag indicating it should only
> output ASCII text.
>  5. The ASCII is stored in the server, since it is valid Latin-1.
> 
> When JSON is retrieved from the database:
>  1. ASCII is retrieved from the server
>  2. If the user needs to extract one or more fields, the JSON is parsed,
> and the fields are extracted.
>  3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

The problem I see here is that a data type output function is normally
not aware of the client encoding.  The alternatives I see are that you
either always escape everything to plain ASCII, so the text is valid in
every server encoding, which would result in pretty sad behavior for
users of languages that don't use a lot of ASCII characters, or you
decree a nonstandard JSON variant that temporarily uses whatever
encoding you decide on.
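
For what it's worth, here is a minimal standalone sketch (plain C, not
patch code, and not using any backend API) of what the first
alternative amounts to: every non-ASCII character in a UTF-8 JSON
string is rewritten as a \uXXXX escape (a surrogate pair above U+FFFF),
so the stored text is pure ASCII and therefore valid in any server
encoding.  The function name json_escape_to_ascii is made up for
illustration, and the escapes JSON also requires for quotes,
backslashes, and control characters are omitted to keep it short.

/*
 * Standalone illustration only: escape a UTF-8 string to ASCII-only
 * JSON string content.  Assumes the input is already-validated UTF-8.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s into *cp; return the number
 * of bytes consumed.  Assumes valid, complete UTF-8 input. */
static int
utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80)
    {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0)
    {
        *cp = ((uint32_t) (s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0)
    {
        *cp = ((uint32_t) (s[0] & 0x0F) << 12) |
              ((uint32_t) (s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    *cp = ((uint32_t) (s[0] & 0x07) << 18) |
          ((uint32_t) (s[1] & 0x3F) << 12) |
          ((uint32_t) (s[2] & 0x3F) << 6) | (s[3] & 0x3F);
    return 4;
}

/* Return a malloc'd copy of the UTF-8 string 'in' in which every
 * non-ASCII character is replaced by \uXXXX escapes (a surrogate pair
 * for code points above U+FFFF), so the result is plain ASCII. */
static char *
json_escape_to_ascii(const char *in)
{
    size_t      len = strlen(in);
    char       *out = malloc(len * 6 + 1);  /* 6 output bytes per input
                                             * byte is a generous bound */
    char       *p = out;
    const unsigned char *s = (const unsigned char *) in;

    while (*s)
    {
        uint32_t    cp;

        s += utf8_decode(s, &cp);

        if (cp < 0x80)
            *p++ = (char) cp;           /* ASCII passes through unchanged */
        else if (cp <= 0xFFFF)
            p += sprintf(p, "\\u%04X", (unsigned) cp);
        else
        {
            uint32_t    v = cp - 0x10000;   /* encode as UTF-16 surrogates */

            p += sprintf(p, "\\u%04X\\u%04X",
                         (unsigned) (0xD800 + (v >> 10)),
                         (unsigned) (0xDC00 + (v & 0x3FF)));
        }
    }
    *p = '\0';
    return out;
}

int
main(void)
{
    /* the literal assumes this source file is itself UTF-8-encoded */
    char       *escaped = json_escape_to_ascii("naïve 日本語 𝄞");

    /* prints: na\u00EFve \u65E5\u672C\u8A9E \uD834\uDD1E */
    printf("%s\n", escaped);
    free(escaped);
    return 0;
}

The size cost is exactly the sad behavior noted above: every non-ASCII
BMP character that took two or three bytes in UTF-8 now takes six.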


