On May 28, 2014, at 1:26 PM, Benjamin Striegel <ben.strie...@gmail.com> wrote:

> > Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple 
> > concept.
> 
> I don't think we can fully divorce these two ideas. Understanding UTF-8 still 
> implies understanding the difference between code points, code units, and 
> grapheme clusters. If we have a single unadorned `len` function, that implies 
> the existence of a "default" length to a UTF-8 string, which is a lie. It 
> also *fails* to suggest the existence of alternative measures of length of a 
> UTF-8 string. Finally, the choice of byte length as the default length metric 
> encourages the horrid status quo, which is the perpetuation of code that is 
> tested and works in ASCII environments but barfs as soon as anyone from a 
> sufficiently foreign culture tries to use it. Dedicating ourselves to Unicode 
> support does us no good if the remainder of our API encourages the 
> depressingly typical ASCII-ism that pervades nearly every other language.
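To make the distinction above concrete, here is a minimal sketch that compares two of the "lengths" of the same UTF-8 string. It assumes modern (post-1.0) Rust, which postdates this 2014 thread: `str::len()` counts UTF-8 bytes (code units), and `str::chars().count()` counts Unicode scalar values (code points). The helper name `lengths` is hypothetical. Grapheme clusters are not covered because std does not segment them. The third-party `unicode-segmentation` crate provides that. Note that the precomposed and decomposed spellings of "café" look identical, yet they disagree on both counts.

```rust
/// Hypothetical helper: returns (UTF-8 byte length, code point count).
fn lengths(s: &str) -> (usize, usize) {
    // str::len() is the number of UTF-8 bytes, not "characters".
    let bytes = s.len();
    // chars() iterates over Unicode scalar values (code points).
    let code_points = s.chars().count();
    (bytes, code_points)
}

fn main() {
    let samples = [
        "hello",       // pure ASCII: bytes == code points
        "caf\u{e9}",   // precomposed "é" (U+00E9) takes 2 bytes
        "cafe\u{301}", // "e" + combining acute accent (U+0301), 2 code points
    ];
    for s in samples {
        let (bytes, code_points) = lengths(s);
        println!("{}: {} bytes, {} code points", s, bytes, code_points);
    }
}
```

Code tested only on ASCII input would see the two numbers agree and would never notice which one `len` actually measures.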

Do you honestly believe that calling it .byte_len() will do anything besides 
confuse anyone who expects .len() to work? Will it result in code that looks 
any different from just using .byte_len() everywhere people use .len() today?

Forcing more verbose, annoying, unconventional names on people won't actually 
change how they process strings. It will just confuse and annoy them.

-Kevin
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
