On May 28, 2014, at 1:26 PM, Benjamin Striegel <ben.strie...@gmail.com> wrote:
> > Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple
> > concept.
>
> I don't think we can fully divorce these two ideas. Understanding UTF-8 still
> implies understanding the difference between code points, code units, and
> grapheme clusters. If we have a single unadorned `len` function, that implies
> the existence of a "default" length to a UTF-8 string, which is a lie. It
> also *fails* to suggest the existence of alternative measures of length of a
> UTF-8 string. Finally, the choice of byte length as the default length metric
> encourages the horrid status quo, which is the perpetuation of code that is
> tested and works in ASCII environments but barfs as soon as anyone from a
> sufficiently-foreign culture tries to use it. Dedicating ourselves to Unicode
> support does us no good if the remainder of our API encourages the
> depressingly-typical ASCII-ism that pervades nearly every other language.
Do you honestly believe that calling it .byte_len() will do anything besides
confuse anyone who expects .len() to work, and produce code that looks no
different from today's except that it uses .byte_len() everywhere people
use .len() now?
Forcing more verbose, annoying, unconventional names on people won't actually
change how they process strings. It will just confuse and annoy them.
-Kevin
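
For concreteness, here is a minimal sketch of the length measures this thread
is arguing about. It is written against present-day Rust's standard library,
not the 2014 API under discussion, and `StrLengths` and `measure` are
hypothetical names made up for illustration. Grapheme clusters are left out
because the standard library does not count them; that would need an external
crate.

```rust
/// Three std-only length measures of a UTF-8 string.
/// `StrLengths` and `measure` are hypothetical names, not part of any API.
#[derive(Debug, PartialEq)]
struct StrLengths {
    bytes: usize,       // UTF-8 code units; this is what `str::len` returns
    code_points: usize, // Unicode scalar values, via `chars()`
    utf16_units: usize, // UTF-16 code units, via `encode_utf16()`
}

fn measure(s: &str) -> StrLengths {
    StrLengths {
        bytes: s.len(),
        code_points: s.chars().count(),
        utf16_units: s.encode_utf16().count(),
    }
}

fn main() {
    // Plain ASCII, the same word with a combining acute accent (U+0301),
    // and a single emoji outside the Basic Multilingual Plane (U+1F642).
    for s in ["noel", "noe\u{301}l", "\u{1F642}"] {
        println!("{:?}: {:?}", s, measure(s));
    }
}
```

All three measures agree on the ASCII input and diverge on the other two,
which is the situation both sides of the naming argument are describing.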
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev