On 29/05/14 06:38, Kevin Ballard wrote:
On May 28, 2014, at 1:26 PM, Benjamin Striegel <ben.strie...@gmail.com> wrote:
> Unicode is not a simple concept. UTF-8 on the other hand is a
> pretty simple concept.
I don't think we can fully divorce these two ideas. Understanding
UTF-8 still implies understanding the difference between code points,
code units, and grapheme clusters. If we have a single unadorned
`len` function, that implies the existence of a "default" length to a
UTF-8 string, which is a lie. It also *fails* to suggest the
existence of alternative measures of length of a UTF-8 string.
Finally, the choice of byte length as the default length metric
encourages the horrid status quo, which is the perpetuation of code
that is tested and works in ASCII environments but barfs as soon as
anyone from a sufficiently-foreign culture tries to use it.
Dedicating ourselves to Unicode support does us no good if the
remainder of our API encourages the depressingly-typical ASCII-ism
that pervades nearly every other language.
Do you honestly believe that calling it .byte_len() will do anything
besides confuse anyone who expects .len() to work, or that the
resulting code will look any different from code that simply uses
.byte_len() everywhere people use .len() today?
Forcing more verbose, annoying, unconventional names on people won't
actually change how they process strings. It will just confuse and
annoy them.
-Kevin
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
Changing the names of methods on strings seems very similar to how Path
does not implement Show (except with even stronger motivation, because
strings have at least 3 sensible interpretations of what the length
could be).
Huon
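
To make the competing notions of "length" concrete, here is a minimal
sketch in present-day Rust using only the standard library. The helper
name `lengths` is hypothetical and is not something proposed in this
thread; it just reports three measures side by side for the same
string.

```rust
/// Hypothetical helper (not a real Rust API): returns three different
/// "lengths" of the same string, using only the standard library.
fn lengths(s: &str) -> (usize, usize, usize) {
    // Number of UTF-8 code units, i.e. bytes. This is what `str::len` reports.
    let utf8_bytes = s.len();
    // Number of UTF-16 code units the string would occupy if re-encoded.
    let utf16_units = s.encode_utf16().count();
    // Number of code points (Unicode scalar values).
    let code_points = s.chars().count();
    (utf8_bytes, utf16_units, code_points)
}
```

For the string "e\u{301}\u{1D11E}" (an "e" followed by a combining
acute accent, then a musical G clef symbol), this returns (7, 4, 3),
while a reader would likely perceive two characters. Counting grapheme
clusters needs Unicode segmentation data that the standard library does
not provide; a third-party crate such as unicode-segmentation is one
option.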