On Tue, Apr 24, 2012 at 11:30 AM, Matthieu Monrocq <[email protected]> wrote:
> However this is on the condition of considering strings as lists of
> codepoints, and not lists of bytes. Lists of bytes are useful in encoding and
> decoding operations, but to manipulate Arabic or Korean, they fall short:
> having users manipulate the strings byte-wise instead of codepoint-wise is a
> recipe for disaster outside of English and Latin-1 representable languages.
>
> I understand that this may seem contradictory to Rust's original direction
> of utf-8 encoded strings, but having worked with utf-8 strings using C++
> `std::string` I can assure you that apart from blindly passing them around,
> one cannot do much. All modifying operations require the use of
> Unicode-aware libraries... even `substr`.
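
To make the byte-wise versus codepoint-wise distinction concrete, here is a minimal sketch. It assumes present-day Rust and its standard `str` methods, which postdate this thread, and `first_chars` is a hypothetical helper written only for illustration. It shows a Korean string whose byte length differs from its codepoint count, and a `substr`-like operation that has to find a codepoint boundary before slicing.

```rust
/// Hypothetical helper: returns the first `n` code points of `s`.
/// It walks the UTF-8 data to find the byte offset where the
/// (n+1)-th code point starts, then slices at that boundary.
fn first_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than n + 1 code points: return everything
    }
}

fn main() {
    let s = "한국어";

    // `len` counts bytes; `chars().count()` counts code points.
    println!("bytes = {}, code points = {}", s.len(), s.chars().count());

    // Byte offset 1 falls inside the first code point, so the
    // checked slice refuses it instead of producing broken UTF-8.
    println!("slice at byte 1: {:?}", s.get(..1));

    // Slicing by code point count works on any script.
    println!("first two code points: {}", first_chars(s, 2));
}
```

Note that this only handles code points. Grapheme clusters, normalization, and collation still need a Unicode-aware library.
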
Well, that's why you should use ICU instead of builtin language facilities
for Unicode-aware processing. But there's a lot of code that really does
just need to blindly pass around pre-composed strings, and an ICU or
equivalent dependency (and in many cases even UTF encoding/decoding) would
be overkill for those applications. In previous discussions about text
processing on the list, IIRC it's been decided that the builtin string
facilities should remain low-level, and bindings to ICU used for real text
processing.

-Joe
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
