Re: [rust-dev] Syntax of vectors, slices, etc

Joe Groff Tue, 24 Apr 2012 11:50:02 -0700

On Tue, Apr 24, 2012 at 11:30 AM, Matthieu Monrocq
<[email protected]> wrote:
> However, this is on the condition of considering strings as lists of
> codepoints, and not lists of bytes. Lists of bytes are useful in encoding and
> decoding operations, but to manipulate Arabic or Korean, they fall short:
> having users manipulate the strings byte-wise instead of codepoint-wise is a
> recipe for disaster outside of English and Latin-1 representable languages.
>
> I understand that this may seem contradictory to Rust's original direction
> of utf-8 encoded strings, but having worked with utf-8 strings using C++
> `std::string` I can assure you that apart from blindly passing them around,
> one cannot do much. All modifying operations require the use of Unicode
> aware libraries... even `substr`.


Well, that's why you should use ICU instead of builtin language
facilities for Unicode-aware processing. But there's a lot of code
that really does just need to blindly pass around pre-composed
strings, and an ICU or equivalent dependency (and in many cases even
UTF encoding/decoding) would be overkill for those applications. In
previous discussions about text processing on the list, IIRC it's been
decided that the builtin string facilities should remain low-level,
and bindings to ICU used for real text processing.
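
To make the byte-wise versus codepoint-wise distinction concrete, here is a
minimal sketch. It assumes present-day Rust standard library calls
(`chars`, `char_indices`, `str::get`), which are not the 2012 API under
discussion, and the helper names `byte_len`, `codepoint_len` and
`prefix_codepoints` are made up for illustration. It only shows the
low-level view; anything involving combining marks, collation or
normalization would still be a job for ICU or an equivalent.

```rust
// Sketch only: uses present-day Rust std APIs (an assumption; the 2012
// string API differed). Helper names are hypothetical.

/// Length of the string in UTF-8 bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Length of the string in Unicode code points.
fn codepoint_len(s: &str) -> usize {
    s.chars().count()
}

/// First `n` code points of `s`, always cut on a code point boundary.
/// Returns the whole string if it has fewer than `n` code points.
fn prefix_codepoints(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((idx, _)) => &s[..idx],
        None => s,
    }
}

fn main() {
    let s = "한국어"; // three Hangul syllables, three bytes each in UTF-8

    println!("bytes: {}", byte_len(s));
    println!("code points: {}", codepoint_len(s));

    // A byte-wise "substr" that lands inside a code point has no valid
    // result; `get` reports that as None instead of yielding broken text.
    println!("bytes 0..4: {:?}", s.get(0..4));

    // Cutting by code points always lands on a boundary.
    println!("first two code points: {}", prefix_codepoints(s, 2));
}
```

Code that just passes strings around never needs the second half of this;
it only matters once something starts slicing.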

-Joe
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
