On Thursday, May 1, 2014, Nathan Myers <[email protected]> wrote:

> It would be a mistake for a byte sequence container, stream, or string
> type to know anything about particular encodings. An encoding is an
> interpretation imposed on a byte sequence. Users of a sequence need to be
> able to choose what interpretation to apply without interference from some
> previous user's choice, and without need to make a copy.

You can "decode" an existing rope with an explicit codec without altering the stream; the codec is essentially metadata. For example, you can go from 8-bit raw to UTF-8. The byte stream does not change unless you "encode" (which really transcodes as it flattens the rope).

> As an example, a given string may be seen as raw bytes, as a series of
> delimited records, as Unicode code points within some of those records, as
> a series of JSON name-value pairs within such a record, and as a decimal
> number in a JSON value part. The same interpretations need to work on a
> raw byte stream that would not tolerate in-band Rust-specific annotations.

The encode operation would be free if the rope has only a single leaf and the codec is the same.

> The UTF-8 view of a string is an interesting special case. Depending on
> context, what is considered a "character" may be a code point of at most 4
> bytes, or any number of bytes representing a base and combining characters
> which might or might not be collapsible to a canonical, single code point,
> or a series of such constructs that is to be displayed as a ligature such
> as "Qu" or "ffi". (Some languages are best displayed as mostly ligatures.)
>

I think it's convenient that the string provides an encoding-aware interface. You normally want to work character by character, not byte by byte, if you have specified an encoding. Otherwise, just don't declare one and use 8-bit raw.
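
To make the decode/encode distinction concrete, here is a rough sketch of what I have in mind. It is not an existing API: `Rope`, `Codec`, `decode`, `encode`, and `len_chars` are all hypothetical names, the rope is simplified to a flat list of leaves, "character" means a code point, and treating 8-bit raw as Latin-1 when transcoding to UTF-8 is my own assumption.

```rust
use std::borrow::Cow;

/// Hypothetical codec tag; only the two codecs mentioned in this thread.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Codec {
    Raw8,
    Utf8,
}

/// A rope simplified to a list of byte leaves plus a codec tag.
/// The tag is metadata only; it lives outside the byte stream.
struct Rope {
    leaves: Vec<Vec<u8>>,
    codec: Codec,
}

impl Rope {
    fn from_raw(leaves: Vec<Vec<u8>>) -> Rope {
        Rope { leaves, codec: Codec::Raw8 }
    }

    /// "Decode": choose an interpretation. No byte is touched or copied.
    fn decode(&mut self, codec: Codec) {
        self.codec = codec;
    }

    /// "Encode": flatten the rope into one buffer in the target codec.
    /// Free (a borrow) when there is a single leaf and the codec matches.
    fn encode(&self, target: Codec) -> Cow<'_, [u8]> {
        if self.leaves.len() == 1 && self.codec == target {
            return Cow::Borrowed(self.leaves[0].as_slice());
        }
        let flat: Vec<u8> = self.leaves.concat();
        match (self.codec, target) {
            // Assumption: each raw byte is the code point U+0000..U+00FF.
            (Codec::Raw8, Codec::Utf8) => {
                let s: String = flat.iter().map(|&b| b as char).collect();
                Cow::Owned(s.into_bytes())
            }
            // Same codec, or UTF-8 viewed as raw: the bytes pass through.
            _ => Cow::Owned(flat),
        }
    }

    /// Encoding-aware length: bytes for raw, code points for UTF-8.
    /// Returns None if the bytes are not valid under the declared codec.
    fn len_chars(&self) -> Option<usize> {
        match self.codec {
            Codec::Raw8 => Some(self.leaves.iter().map(|leaf| leaf.len()).sum()),
            Codec::Utf8 => {
                // A code point may straddle two leaves, so validate the whole.
                let flat = self.leaves.concat();
                std::str::from_utf8(&flat).ok().map(|s| s.chars().count())
            }
        }
    }
}
```

With the UTF-8 bytes of "café" split across two leaves in the middle of the "é", `len_chars` gives 5 under 8-bit raw and 4 after `decode(Codec::Utf8)`, and the leaves are identical before and after. Only `encode` produces a new buffer, and not even that in the single-leaf, same-codec case.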
_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev
