On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander <mrylan...@gmail.com> wrote:
>> In practice, every parser/serializer I've used (including the one I
>> helped write) allows (and, often, forces) any non-ASCII character to
>> be encoded as \u followed by a string of four hex digits.
>
> Is it correct to say that the only feasible place where non-ASCII
> characters can be used is within string constants?
Yes. That includes object property names -- they, too, are quoted
string literals.

> If so, it might be
> reasonable to disallow characters with the high-bit set unless the
> server encoding is one of the flavors of Unicode of which the spec
> approves.  I'm tempted to think that when the server encoding is
> Unicode we really ought to allow Unicode characters natively, because
> turning a long string of two-byte wide chars into a long string of
> six-byte wide chars sounds pretty evil from a performance point of
> view.
>

+1

As an aside, \u-encoded (escaped) characters and native multi-byte
sequences (in any RFC-allowable Unicode encoding) are exactly
equivalent in JSON -- it's a storage and transmission format, and it
doesn't prescribe the application-internal representation of the data.
If it's faster (which it almost certainly is) not to mangle the data
when it's all staying server side, that seems like a useful
optimization.

For output to the client, however, it would be useful to provide a
\u-escaping function, which (AIUI) should always be safe regardless of
client encoding.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: mi...@esilibrary.com
 | web: http://www.esilibrary.com
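To make that last point concrete, below is a minimal standalone sketch
of the kind of \u-escaping function described above -- not anything
from the PostgreSQL tree; the name json_uescape and every detail are
illustrative, and it assumes UTF-8 input with no strict validity
checking. ASCII passes through untouched, everything else becomes
\uXXXX (with a UTF-16 surrogate pair for code points above the BMP),
so the output is pure ASCII and safe to send in any client encoding:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence starting at s; store the code point in *cp
 * and return the number of bytes consumed (0 on malformed input).
 * No overlong-form or range checking -- this is just a sketch. */
static int
utf8_decode(const unsigned char *s, unsigned int *cp)
{
    if (s[0] < 0x80)
    {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
    {
        *cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80)
    {
        *cp = ((s[0] & 0x0F) << 12) | ((s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80)
    {
        *cp = ((s[0] & 0x07) << 18) | ((s[1] & 0x3F) << 12) |
              ((s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Escape every non-ASCII code point in the UTF-8 string 'in' as \uXXXX.
 * Returns a malloc'd pure-ASCII string; the caller frees it. */
static char *
json_uescape(const char *in)
{
    const unsigned char *s = (const unsigned char *) in;
    /* worst case: every input byte becomes a surrogate pair (12 chars) */
    char *out = malloc(strlen(in) * 12 + 1);
    char *p = out;

    while (*s)
    {
        unsigned int cp;
        int len = utf8_decode(s, &cp);

        if (len == 0)               /* malformed byte: stop (or error out) */
            break;
        if (cp < 0x80)
            *p++ = (char) cp;       /* ASCII passes through unchanged */
        else if (cp <= 0xFFFF)
            p += sprintf(p, "\\u%04X", cp);
        else
        {
            cp -= 0x10000;          /* encode as a UTF-16 surrogate pair */
            p += sprintf(p, "\\u%04X\\u%04X",
                         0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
        s += len;
    }
    *p = '\0';
    return out;
}

int
main(void)
{
    /* "café" with a native two-byte é (0xC3 0xA9 in UTF-8) */
    char *escaped = json_uescape("caf\xc3\xa9");

    printf("%s\n", escaped);        /* prints: caf\u00E9 */
    free(escaped);
    return 0;
}

Since a conforming parser treats caf\u00E9 and a natively encoded café
as the very same string value, round-tripping through an escape like
this is lossless -- which is exactly why deferring it until output to
the client costs nothing in correctness.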