Felix Frank <[email protected]> writes:
>>> All other bytes (including ASCII) are unscathed. Is your point about the
>>> UTF8->UTF16 bit?
>> 
>> No, about emitting a four byte, two character sequence as twelve bytes
>> because it gratuitously escapes things:
>> 
>>     puts utf8_to_pson("ÂĎ")
>>     => "\u00c2\u010e"
>
> Ah, I see. So you're saying that when everything is UTF16BE anyway, it
> can as well be transferred binary with said encoding instead of escaping
> the encoding in addition.

*nod*, or just left as UTF-8, which would be my preference.  The
transformation to UTF-16 is only required because the \u escapes are
necessarily UTF-16; if the string is UTF-8 then it can just go as-is.

(The PSON encoder had a note somewhere to the effect that this avoids
 surprising poorly written JSON decoders that assume ASCII or 8-bit encoded
 text rather than UTF-8, by staying strictly in ASCII.  Which, inside puppet,
 isn't strictly required. ;)
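
To make the size difference concrete, here is a rough sketch in Ruby.  The
escape_non_ascii helper is made up for illustration and is not the PSON
encoder's actual code; it only mimics the "escape everything non-ASCII as
\uXXXX" behaviour discussed above, going through UTF-16 code units because
that is what the \u escape is defined in terms of.

```ruby
require 'json'

# Hypothetical helper (NOT PSON's real implementation): replace every
# non-ASCII character with JSON \uXXXX escapes built from UTF-16 code units.
def escape_non_ascii(str)
  str.each_char.map { |ch|
    next ch if ch.ord < 0x80
    ch.encode("UTF-16BE").unpack("n*").map { |unit| format("\\u%04x", unit) }.join
  }.join
end

text    = "\u00c2\u010e"            # the two characters from the example, as UTF-8
escaped = escape_non_ascii(text)

puts text.bytesize                  # 4
puts escaped                        # \u00c2\u010e
puts escaped.bytesize               # 12

# Both forms decode to the same string, so the escaping buys nothing
# for a decoder that already accepts UTF-8.
from_escaped = JSON.parse("[\"#{escaped}\"]").first
from_raw     = JSON.parse("[\"#{text}\"]").first
puts from_escaped == from_raw       # true
```

Characters outside the Basic Multilingual Plane come out as a surrogate
pair (two \u escapes) in the escaped form, which is where the UTF-16
requirement actually bites.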

> I'm inclined to second that, except I'm not sure whether there are
> conceivable non-ASCII non-UTF strings that include valid UTF16
> characters. The parser could not discern between UTF and binary then, or
> could it?

Given that, technically, the JSON content can't ever contain anything that
isn't a valid Unicode character, this is an undefined question. :)

Practically, I don't know what PSON does, but my expectation is that it would
treat the input as UTF-16 and break whenever binary data didn't actually fit
that pattern.
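
As an illustration of why text and binary can't be told apart reliably,
here is a small Ruby sketch.  It says nothing about what PSON itself does;
utf8_text? is a hypothetical helper that only asks Ruby whether a byte
sequence happens to be well-formed UTF-8.

```ruby
# Hypothetical check, not part of PSON: reinterpret the bytes as UTF-8
# (without changing them) and ask whether they are well-formed.
def utf8_text?(bytes)
  bytes.dup.force_encoding("UTF-8").valid_encoding?
end

text      = "\u00c2\u010e"            # real text, encoded as UTF-8
binary    = "\xff\xfe\x00\x80".b      # arbitrary bytes; 0xFF never occurs in UTF-8
lookalike = "\xc3\x82".b              # binary data that happens to be valid UTF-8

puts utf8_text?(text)        # true
puts utf8_text?(binary)      # false
puts utf8_text?(lookalike)   # true
```

The last case is the awkward one: some binary payloads are also valid
text, so a validity check can reject obvious garbage but can't prove that
a string was ever meant to be text.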

        Daniel
-- 
✣ Daniel Pittman            ✉ [email protected]            ☎ +61 401 155 707
               ♽ made with 100 percent post-consumer electrons

