I must be missing something here. The Python 2-to-3 migration plan is to treat a value of type 'str' as unambiguously textual and a value of type 'bytes' as unambiguously data. Doesn't that line up with this proposal?
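Under the Python 3 model, the value's type alone records what the user meant, so mapping it onto the two AMQP types needs no guessing. A minimal sketch follows. choose_amqp_type is a hypothetical helper, not part of any Qpid API. The type labels are simply the names used in this thread, and the sketch assumes str16 carries UTF-8 text.

```python
from typing import Tuple, Union


def choose_amqp_type(value: Union[str, bytes, bytearray]) -> Tuple[str, bytes]:
    """Hypothetical helper: map a Python 3 value to an AMQP type label and payload."""
    if isinstance(value, str):
        # 'str' is unambiguously text: encode it (assumed UTF-8) and send as str16.
        return "str16", value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        # 'bytes' is unambiguously data: send the octets unchanged as vbin.
        return "vbin", bytes(value)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


print(choose_amqp_type("caf\u00e9"))  # ('str16', b'caf\xc3\xa9')
print(choose_amqp_type(b"\x00\xff"))  # ('vbin', b'\x00\xff')
```

This sketch cannot express the overloaded case (case 3 in Justin's message). In that case the type does not record intent, as with Python 2's 'str'.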
On Wed, Aug 21, 2013 at 11:49 AM, Rafael Schloming <[email protected]> wrote:
> I think for python at least if we were to treat ambiguous string values as
> text rather than data, we would be at odds with the python community's 2->3
> migration plan. The following thread has a useful discussion of this that
> is worth a careful read:
>
> http://stackoverflow.com/questions/1736228/python-data-vs-text/1736279#1736279
>
> --Rafael
>
> On Wed, Aug 21, 2013 at 11:31 AM, Justin Ross <[email protected]> wrote:
>
>> Jimmy, thanks for getting this started. I'd love your feedback to
>> help sort this out.
>>
>> I think these are the cases:
>>
>> 1. If the language string is unambiguously textual, send it as amqp str16
>> 2. If the language string is unambiguously arbitrary bytes, send it as
>> amqp vbin
>>
>> These are easy. We can tell the user's intention, and we can do the
>> right thing.
>>
>> 3. If the language string is an overloaded text/bytes type, as is
>> regrettably quite common, what do we do then?
>>
>> The current answer to this question is "send it as vbin". That's very
>> safe, insofar as it won't throw any sort of encoding exception. It
>> does not, however, always honor what I think is the user's more
>> typical intention: produce an ascii string at the other end.
>>
>> So for 3, I'd like to consider the possibility of, by default, sending
>> ambiguous language strings as ascii rendered to amqp str16. This
>> requires an encoding step that may produce errors. And maybe that's
>> just too obnoxious! That's what I'd like to know.
>>
>> In summary, if we have a way to determine what the user wanted (text
>> or bytes), we should try to carry that through on the wire. At the
>> following URL I've tried to map out what type information we can get
>> for each language. Please update it as you please.
>>
>> https://cwiki.apache.org/confluence/display/qpid/Language+support+for+unambiguous+text+string+and+byte+array+types
>>
>> On Wed, Aug 21, 2013 at 8:44 AM, Jimmy Jones <[email protected]>
>> wrote:
>> >> > AFAIK in perl, if you include unicode characters in a string it'll
>> >> > set the utf8 flag. If you don't include any unicode characters (eg. 7
>> >> > bit ascii, or raw bytes) the flag won't be set. So given a perl
>> >> > scalar that doesn't contain any utf8 characters, you don't know if
>> >> > it's a textual string (str16) or a binary string (vbin). There is a
>> >> > is_utf8_string function, but that'll only tell you if the string
>> >> > would be valid utf8, but it could be a binary string that happens to
>> >> > be valid utf8, so that's not really safe.
>> >>
>> >> You can explicitly mark it as utf8 using utf8::upgrade() though, right?
>> >> Certainly I tried that in a simple test and the property in question was
>> >> then sent as str16.
>> >
>> > Yes, if I as a user had a string that was textual, I could call
>> > utf8::upgrade() to ensure it got sent as str16. I guess this is similar in
>> > concept to calling setEncoding in C++, although maybe less natural in a
>> > dynamically typed language.
>>
>> It would be more reasonable to treat perl scalars as textual for our
>> API if perl offered a good way to explicitly handle byte arrays. My
>> (certainly insufficient) web browsing suggested that wasn't really
>> available, or not in a form recommended for use. Any candidates for a
>> serviceable explicitly-arbitrary-bytes-and-not-text-at-all "type" in
>> perl?
>>
>> Thanks!
>> Justin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
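Justin's proposed default for case 3 can be sketched as follows. The sketch models an overloaded value, such as a Python 2 'str' or a Perl scalar without the utf8 flag, as the raw bytes it holds. This is a hypothetical illustration, not Qpid code: encode_overloaded, its strict parameter, and looks_like_utf8 are invented names. With strict=True, the sketch follows the proposal and raises on non-ASCII input. With strict=False, it falls back to the current vbin answer. The sketch also assumes str16 carries UTF-8, of which ASCII is a subset.

```python
from typing import Tuple


def encode_overloaded(raw: bytes, strict: bool = True) -> Tuple[str, bytes]:
    """Hypothetical: apply the proposed case-3 default to an overloaded value.

    The value is passed as the bytes it holds, because its type does not say
    whether the user meant text or data.
    """
    try:
        # The proposed encoding step: the value must be pure 7-bit ASCII.
        raw.decode("ascii")
    except UnicodeDecodeError:
        if strict:
            # Proposal as stated: surface the encoding error to the user.
            raise
        # Alternative: keep the current, exception-free answer and send vbin.
        return "vbin", raw
    # ASCII octets are also valid UTF-8, so they can go out unchanged as str16.
    return "str16", raw


def looks_like_utf8(raw: bytes) -> bool:
    """A validity check in the spirit of Perl's is_utf8_string.

    It says nothing about intent: arbitrary binary data can also decode.
    """
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(encode_overloaded(b"hello"))                   # ('str16', b'hello')
print(encode_overloaded(b"\xc3\xa9", strict=False))  # ('vbin', b'\xc3\xa9')
print(looks_like_utf8(b"\xc3\xa9"))                  # True
```

The last call shows Jimmy's objection. b"\xc3\xa9" is valid UTF-8 even if the user meant it as two arbitrary bytes, so a validity check cannot stand in for knowing the user's intent.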
