Felix Frank <[email protected]> writes:
>>> All other bytes (including ASCII) are unscathed. Is your point about the
>>> UTF8->UTF16 bit?
>>
>> No, about emitting a four byte, two character sequence as twelve bytes
>> because it gratuitously escapes things:
>>
>> puts utf8_to_pson("ÂĎ")
>> => "\u00c2\u010e"
>
> Ah, I see. So you're saying that when everything is UTF16BE anyway, it
> can as well be transferred binary with said encoding instead of escaping
> the encoding in addition.

*nod*, or just left as UTF-8, which would be my preference. The
transformation to UTF-16 is only required because the \u escapes are
necessarily UTF-16; if the string is UTF-8 then it can just go as-is.

(The PSON encoder had a note somewhere to the effect that staying strictly
in ASCII avoids surprising poorly written JSON decoders that assume ASCII or
8-bit encoded text rather than UTF-8. Which, inside Puppet, isn't strictly
required. ;)
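
To make the size difference concrete, here is a rough sketch in plain Ruby.
It is not the PSON code: escape_non_ascii is a hypothetical helper I made up
for illustration, and it only mimics the \u escaping shown above, assuming
the input is valid UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper, NOT the real PSON encoder: replace every non-ASCII
# character with \uXXXX escapes built from its UTF-16 code units.
def escape_non_ascii(str)
  str.each_char.map { |ch|
    if ch.ascii_only?
      ch
    else
      # The escapes carry UTF-16 code units, so go through UTF-16BE first.
      ch.encode("UTF-16BE").unpack("n*").map { |unit| format("\\u%04x", unit) }.join
    end
  }.join
end

s = "ÂĎ"                      # two characters
escaped = escape_non_ascii(s)

puts s.bytesize               # 4 bytes when left as UTF-8
puts escaped                  # \u00c2\u010e
puts escaped.bytesize         # 12 bytes once escaped
```

Anything outside the BMP comes out as a surrogate pair (two escapes), which
is why the detour through UTF-16 is needed at all when escaping.
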
> I'm inclined to second that, except I'm not sure whether there are
> conceivable non-ASCII non-UTF strings that include valid UTF16
> characters. The parser could not discern between UTF and binary then, or
> could it?

Given that, technically, JSON content can't ever contain anything that
isn't a valid Unicode character, the question is undefined. :)

Practically, I don't know what PSON does, but my expectation is that it would
treat the data as UTF-16 and break whenever the binary data didn't happen to
form valid UTF-16.
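
As a small illustration of why arbitrary binary doesn't fit, here is a sketch
using only Ruby's own encoding checks. It says nothing about what PSON
actually does; the byte strings are made-up examples.

```ruby
# encoding: utf-8

# Made-up binary payload; .b gives a copy tagged as raw bytes.
binary  = "\xFF\xFE\x00\x01".b
as_utf8 = binary.dup.force_encoding("UTF-8")
puts as_utf8.valid_encoding?   # false: 0xFF never appears in UTF-8

text = "ÂĎ"
puts text.valid_encoding?      # true: ordinary UTF-8 text

# A lone high surrogate is not valid UTF-16 either.
lone = "\xD8\x3D".b.force_encoding("UTF-16BE")
puts lone.valid_encoding?      # false
```
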
Daniel
--
✣ Daniel Pittman ✉ [email protected] ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons