Re: [rust-dev] Unicode vs hex escapes in Rust

Behdad Esfahbod Fri, 06 Jul 2012 15:41:24 -0700

On 07/04/2012 02:53 PM, Graydon Hoare wrote:
> On 12-07-04 6:55 AM, Behdad Esfahbod wrote:
>
>>    * Here: "\xHH, \uHHHH, \UHHHHHHHH Unicode escapes", I strongly suggest that
>> \xHH be modified to allow inputting direct UTF-8 bytes.  For ASCII it doesn't
>> make any difference.  For Latin1, it gives the impression that strings are
>> stored in Latin1, which is not the case.  It would also make C / Python
>> escaped strings directly usable in Rust.  I.e. '\xE2\x98\xBA' would be a single
>> character equivalent to '\u263a', not three Latin1 characters.
> 
> Heh. This is interesting! I hadn't noticed yet but you're not _entirely_
> giving the whole story.
> 
>   - \xNN means a utf8 byte: python2, python3 'bytes' literals,
>     perl, go, C, C++, C++-0x u8 literals, and ruby
> 
>   - \xNN means a unicode codepoint: python3 'string' literals,
>     javascript, scheme (at least racket follows spec; others
>     get it randomly wrong by implementation), and current rust.
> 
>   - \xNN illegal, but the octal version means a unicode codepoint:
>     java.
> 
> So, my inclination is to follow your suggestion and actually go with the C and
> C++ style. But it's not exactly universal!
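
To make the two readings of \xNN concrete, here is a minimal sketch in
present-day Rust syntax, which postdates this thread and is not the 2012
compiler under discussion. The function names are illustrative only and do
not come from the original messages. It takes the three escapes from the
example above and interprets them once as raw UTF-8 bytes and once as
individual code points.

```rust
/// Reading 1 (C / Python 2 style): each \xNN is one raw byte, and the
/// resulting byte sequence is decoded as UTF-8.
/// Hypothetical helper name, used here only for illustration.
fn as_utf8_bytes() -> String {
    // A byte-string literal keeps each \xNN as a single u8.
    let bytes: &[u8] = b"\xE2\x98\xBA";
    String::from_utf8(bytes.to_vec()).expect("the three bytes form valid UTF-8")
}

/// Reading 2 (code point style): each \xNN names one code point in the
/// range U+0000..=U+00FF, i.e. the Latin1 range.
/// Hypothetical helper name, used here only for illustration.
fn as_code_points() -> String {
    // char::from(u8) maps a byte value to the code point with the same number.
    [0xE2u8, 0x98, 0xBA].iter().map(|&b| char::from(b)).collect()
}
```

Under the first reading the result is the single character U+263A; under the
second it is three separate Latin1-range characters (U+00E2, U+0098, U+00BA).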


Thanks for the survey!  Indeed.  Programming languages are not my strong suit.

Cheers,
behdad
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
