In Rust, the built-in std::str type "is a sequence of unicode codepoints encoded as a stream of UTF-8 bytes".
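
To make the quoted definition concrete, here is a minimal sketch using the current Rust standard library (not necessarily the API as it stood when this was written). The helper names `utf8_byte_len`, `code_point_len` and `decode_latin1` are made up for illustration. It shows that a str is measured in UTF-8 bytes rather than code points, and that ISO-8859-1 input has to be decoded explicitly before it can become a str.

```rust
// Hypothetical helpers for illustration only; they are not part of std.

/// Length of the string in UTF-8 bytes (this is what `str::len` reports).
fn utf8_byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode code points in the string.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

/// Decode ISO-8859-1 bytes: each byte maps to the code point of the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    let s = "héllo"; // 'é' is U+00E9, which takes two bytes in UTF-8

    assert_eq!(utf8_byte_len(s), 6);
    assert_eq!(code_point_len(s), 5);
    assert_eq!(&s.as_bytes()[1..3], &[0xC3, 0xA9]);

    // "hé" in ISO-8859-1 is the two bytes 0x68 0xE9, which is not valid UTF-8,
    // so it cannot be viewed as a str without decoding first.
    let latin1 = [0x68u8, 0xE9];
    assert!(std::str::from_utf8(&latin1).is_err());
    assert_eq!(decode_latin1(&latin1), "hé");
}
```
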
Meanwhile, building on experience with Python 2 and 3, I think it's worth considering a more flexible design. A string would essentially be a rope in which each leaf specifies an encoding, e.g. UTF-8 or ISO-8859-1 (ideally expressed as one or two bytes). That is, a string may be composed of segments in different encodings.

At the I/O boundary you would then explicitly encode (and flatten) to a compatible encoding such as UTF-8. Likewise, data may be read as 8-bit raw and then "decoded" at a later stage. For instance, HTTP request headers are ISO-8859-1, but the entire input stream is 8-bit raw.

Sources:

- https://maltheborch.com/2014/04/pythons-missing-string-type
- http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
