On 01/05/14 09:53 AM, Malthe Borch wrote:
> In Rust, the built-in std::str type "is a sequence of unicode
> codepoints encoded as a stream of UTF-8 bytes".
> 
> Meanwhile, building on experience with Python 2 and 3, I think it's
> worth considering a more flexible design.
> 
> A string would be essentially a rope where each leaf specifies an
> encoding, e.g. UTF-8 or ISO8859-1 (ideally expressed as one or two
> bytes).
> 
> That is, a string may be composed of segments in different encodings.
> At the I/O boundary you would then explicitly encode (and flatten) to a
> compatible encoding such as UTF-8.
> 
> Likewise, data may be read as 8-bit raw and then "decoded" at a later
> stage. For instance, HTTP request headers are ISO8859-1, but the
> entire input stream is 8-bit raw.
> 
> Sources:
> 
> - https://maltheborch.com/2014/04/pythons-missing-string-type
> - http://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/

A string type needs to have one specific encoding, both for sane
performance and to make good use of the type system. Unicode doesn't map
1:1 onto other encodings, so those should be separate types, with
explicit conversion functions that expose encoding errors to the caller.
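
To make the "separate types, explicit conversions" point concrete, here is
a minimal sketch. The Latin1 type, its methods, and the Unencodable error
are hypothetical names invented for illustration; they are not part of the
standard library. The sketch assumes ISO 8859-1 bytes map directly onto
code points U+0000..=U+00FF, so one direction of the conversion cannot
fail while the other can.

```rust
/// Hypothetical newtype for ISO 8859-1 (Latin-1) bytes; not part of std.
#[derive(Debug, Clone, PartialEq)]
pub struct Latin1(Vec<u8>);

/// Hypothetical error: the first character that has no Latin-1 encoding.
#[derive(Debug, PartialEq)]
pub struct Unencodable(pub char);

impl Latin1 {
    /// Every byte sequence is valid Latin-1, so wrapping raw bytes cannot fail.
    pub fn from_bytes(bytes: Vec<u8>) -> Latin1 {
        Latin1(bytes)
    }

    /// Latin-1 -> UTF-8 is infallible: each byte is a code point in U+0000..=U+00FF.
    pub fn to_utf8(&self) -> String {
        self.0.iter().map(|&b| b as char).collect()
    }

    /// UTF-8 -> Latin-1 can fail, so the error is part of the signature.
    pub fn from_utf8(s: &str) -> Result<Latin1, Unencodable> {
        let mut out = Vec::with_capacity(s.len());
        for c in s.chars() {
            let cp = c as u32;
            if cp > 0xFF {
                // No Latin-1 representation: report the offending character.
                return Err(Unencodable(c));
            }
            out.push(cp as u8);
        }
        Ok(Latin1(out))
    }

    /// Borrow the underlying encoded bytes, e.g. for writing to a socket.
    pub fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}
```

With this shape, raw header bytes can be wrapped with Latin1::from_bytes
and decoded later via to_utf8, while going the other way forces the caller
to handle the Err case. The standard library's String::from_utf8 follows
the same pattern: it returns a Result rather than silently accepting bytes
that are not valid UTF-8.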


_______________________________________________
Rust-dev mailing list
https://mail.mozilla.org/listinfo/rust-dev
