On 05/01/2014 02:52 PM, Patrick Walton wrote:
> On 5/1/14 6:53 AM, Malthe Borch wrote:
>> In Rust, the built-in std::str type "is a sequence of unicode
>> codepoints encoded as a stream of UTF-8 bytes".
>> ...
>> A string would be essentially a rope where each leaf specifies an
>> encoding, e.g. UTF-8 or ISO8859-1 (ideally expressed as one or two
>> bytes).
>
> This is too complex for a systems language with a simple library.

In defining a library string we always grapple with how it
should differ from a raw (variable or fixed) array of bytes.
Ease of appending and of assigning into substrings always
comes up. In the old days, copies shared storage, but nowadays
that's considered evil. Indexed random-access lookup was once
thought essential, but with today's variable-sized characters,
strings have become sequential structures. We might snip out a
substring and splice another in its place, but we must identify
those places by stepping iterators to them. We need to put string
values in partial or total order, but no single ordering is
compellingly best. Equality depends on context.
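
To make the "stepping iterators" point concrete, here is a minimal sketch of a snip-and-splice on a UTF-8 string. It is written against the standard library of present-day Rust, which is an assumption on my part, and the helper names offset_of_char and splice_chars are made up for illustration. The character positions are found by walking the string from the front, since there is no constant-time indexing by character.

```rust
/// Byte offset of the `n`-th character (0-based) of `s`, found by stepping
/// an iterator from the start. `n` equal to the character count yields
/// `s.len()`; anything larger yields `None`.
fn offset_of_char(s: &str, n: usize) -> Option<usize> {
    s.char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .nth(n)
}

/// Return a copy of `s` with characters `from..to` replaced by `replacement`.
/// Returns `None` if either position is out of range or `from > to`.
fn splice_chars(s: &str, from: usize, to: usize, replacement: &str) -> Option<String> {
    let start = offset_of_char(s, from)?;
    let end = offset_of_char(s, to)?;
    if start > end {
        return None;
    }
    let mut out = String::with_capacity(s.len() + replacement.len());
    out.push_str(&s[..start]);
    out.push_str(replacement);
    out.push_str(&s[end..]);
    Some(out)
}
```

Note that "character" here means a Unicode scalar value; stepping by grapheme cluster would be yet another context-dependent choice.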
The outcome is that the context-independent requirements on
strings may not differ enough from those on an array of bytes to
justify a separate type. We might do better to give our byte arrays
a few stringy capabilities. Most users of strings don't need to know
anything about what's in them, and can operate on the raw byte
arrays. Using a string as a map key, though, implies choices:
fold case? canonicalize sequences? We need an object that can
remember those choices, and that the map can apply to strings
given to it.
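
One way such a choice-remembering object might look, as a rough sketch only: the names KeyPolicy, Exact, AsciiFoldCase and PolicyMap are hypothetical, not anything in the Rust library, and the folding shown covers ASCII only. Real case folding or canonicalization of Unicode sequences would need proper tables behind the same interface.

```rust
use std::collections::HashMap;

/// Hypothetical policy object: remembers how raw bytes become a map key.
trait KeyPolicy {
    fn canonical(&self, raw: &[u8]) -> Vec<u8>;
}

/// Keys compare byte for byte.
struct Exact;

impl KeyPolicy for Exact {
    fn canonical(&self, raw: &[u8]) -> Vec<u8> {
        raw.to_vec()
    }
}

/// Keys compare ignoring ASCII case (an illustrative stand-in for real folding).
struct AsciiFoldCase;

impl KeyPolicy for AsciiFoldCase {
    fn canonical(&self, raw: &[u8]) -> Vec<u8> {
        raw.to_ascii_lowercase()
    }
}

/// A map that applies its policy to every byte string handed to it.
struct PolicyMap<P: KeyPolicy, V> {
    policy: P,
    entries: HashMap<Vec<u8>, V>,
}

impl<P: KeyPolicy, V> PolicyMap<P, V> {
    fn new(policy: P) -> Self {
        PolicyMap { policy, entries: HashMap::new() }
    }

    fn insert(&mut self, key: &[u8], value: V) -> Option<V> {
        self.entries.insert(self.policy.canonical(key), value)
    }

    fn get(&self, key: &[u8]) -> Option<&V> {
        self.entries.get(&self.policy.canonical(key))
    }
}
```

The keys stay plain byte arrays throughout; only the map knows what equality means for them.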
Ideally what we use to express our interpretation of some set
of strings could be used on any sequence of bytes, not necessarily
contiguous in memory, not necessarily all in memory at once,
not necessarily even produced until called for.
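
As an illustration of an interpretation that does not care where the bytes live, here is a small sketch, again assuming present-day Rust, of a comparison that works on any source of bytes. The function name eq_ascii_fold is invented for the example, and ASCII folding is only a placeholder for whatever interpretation is wanted.

```rust
/// Compare two byte sequences under ASCII case folding. Bytes are pulled
/// from each source one at a time, so neither needs to be contiguous,
/// fully in memory, or even produced before it is asked for.
fn eq_ascii_fold<A, B>(a: A, b: B) -> bool
where
    A: IntoIterator<Item = u8>,
    B: IntoIterator<Item = u8>,
{
    let mut a = a.into_iter();
    let mut b = b.into_iter();
    loop {
        match (a.next(), b.next()) {
            // Both sequences ended together: equal.
            (None, None) => return true,
            // Matching bytes: keep going.
            (Some(x), Some(y)) if x.eq_ignore_ascii_case(&y) => continue,
            // Mismatch, or one sequence ended early.
            _ => return false,
        }
    }
}
```

The same function accepts bytes from a single array, from several chunks chained together, or from a generator that computes each byte on demand.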
The history of programming languages is littered with mistakes
around string types. There's no reason why Rust must repeat
them all.
Nathan Myers