On 29/05/14 06:38, Kevin Ballard wrote:
On May 28, 2014, at 1:26 PM, Benjamin Striegel <ben.strie...@gmail.com> wrote:

> Unicode is not a simple concept. UTF-8 on the other hand is a pretty simple concept.

I don't think we can fully divorce these two ideas. Understanding UTF-8 still implies understanding the difference between code points, code units, and grapheme clusters. If we have a single unadorned `len` function, that implies the existence of a "default" length to a UTF-8 string, which is a lie. It also *fails* to suggest the existence of alternative measures of length of a UTF-8 string. Finally, the choice of byte length as the default length metric encourages the horrid status quo, which is the perpetuation of code that is tested and works in ASCII environments but barfs as soon as anyone from a sufficiently foreign culture tries to use it. Dedicating ourselves to Unicode support does us no good if the remainder of our API encourages the depressingly typical ASCII-ism that pervades nearly every other language.
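To make the "several lengths" point concrete, here is a short sketch in modern (post-1.0) Rust rather than the 2014 API under discussion; `length_measures` is a hypothetical helper name. It counts the same string three ways using only the standard library, and shows the byte-index slicing hazard that ASCII-only testing never hits. Grapheme clusters (what a reader would call "characters") are not covered by std and would need an external crate such as `unicode-segmentation`, so they are left out here.

```rust
// Hypothetical helper: three std-only measures of a string's "length".
// Returns (UTF-8 bytes, Unicode scalar values, UTF-16 code units).
fn length_measures(s: &str) -> (usize, usize, usize) {
    let bytes = s.len();                        // UTF-8 code units
    let code_points = s.chars().count();        // chars, an O(n) scan
    let utf16_units = s.encode_utf16().count(); // surrogate pairs count as 2
    (bytes, code_points, utf16_units)
}

fn main() {
    // 'e' + U+0301 COMBINING ACUTE ACCENT + U+1F600 GRINNING FACE:
    // a reader sees two "characters", and none of the counts below is 2.
    let s = "e\u{301}\u{1F600}";
    let (bytes, code_points, utf16_units) = length_measures(s);
    println!("bytes = {bytes}, code points = {code_points}, UTF-16 units = {utf16_units}");

    // Byte index 2 falls inside the two-byte encoding of U+0301,
    // so `&s[..2]` would panic at runtime; with ASCII input it never would.
    assert!(!s.is_char_boundary(2));

    // Taking a prefix by code points avoids the panic
    // (though it can still split a grapheme cluster).
    let first_two: String = s.chars().take(2).collect();
    println!("first two code points: {first_two}");
}
```

For pure ASCII input all three numbers coincide, which is exactly why byte-length code passes ASCII-only tests.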

Do you honestly believe that calling it .byte_len() will do anything besides confuse anyone who expects .len() to work, and result in code that looks no different from today's, except that .byte_len() appears everywhere people now use .len()?

Forcing more verbose, annoying, unconventional names on people won't actually change how they process strings. It will just confuse and annoy them.

-Kevin


_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev

Changing the names of methods on strings seems very similar to how Path does not implement Show (except with even stronger motivation, because strings have at least 3 sensible interpretations of what the length could be).
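A sketch of both halves of that analogy in modern Rust, where the formatting trait is `Display` rather than the 2014-era `Show`: `std::path::Path` still has no `Display` impl, so printing one requires an explicit `.display()` call. The `ExplicitLen` trait and its `byte_len`/`char_len` methods are hypothetical, illustrating the kind of explicitly named API debated in this thread; they are not part of std.

```rust
use std::path::Path;

// Hypothetical extension trait (not a std API): each length measure gets
// an explicit name, so no single method claims to be "the" length.
trait ExplicitLen {
    fn byte_len(&self) -> usize;
    fn char_len(&self) -> usize;
}

impl ExplicitLen for str {
    fn byte_len(&self) -> usize {
        self.len() // UTF-8 code units; O(1)
    }
    fn char_len(&self) -> usize {
        self.chars().count() // Unicode scalar values; O(n) scan
    }
}

fn main() {
    // Path does not implement Display: the caller must opt in with
    // .display(), which may lossily render non-UTF-8 path bytes.
    let p = Path::new("/tmp/na\u{ef}ve.txt");
    println!("{}", p.display());

    // "naïve": 'ï' (U+00EF) takes two bytes in UTF-8.
    let s = "na\u{ef}ve";
    println!("{} bytes, {} chars", s.byte_len(), s.char_len());
}
```

Note that `byte_len` here is literally `len` under another name, which is Kevin's objection in a nutshell; the argument for it is purely that the name forces the caller to notice which measure they picked.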


Huon