> Being too opinionated (regarding opinions that deviate from the norm) tends to put people off the language unless there's a clear benefit to forcing the alternative behavior.
We have already chosen to be opinionated by enforcing UTF-8 in our strings. This is an extension of that break with tradition.

> Today we only need to teach the simple concept that strings are utf-8 encoded

History has shown that understanding Unicode is not a simple concept. Asking for the "length" of a Unicode string is not a well-formed question, and we must express this in our API. I also don't agree with accessor functions that work on code units without warning, and for this reason I strongly disagree with supporting the [] operator on strings.

On Wed, May 28, 2014 at 2:42 PM, Kevin Ballard <ke...@sb.org> wrote:

> Breaking with established convention is a dangerous thing to do. Being too
> opinionated (regarding opinions that deviate from the norm) tends to put
> people off the language unless there's a clear benefit to forcing the
> alternative behavior.
>
> In this case, there's no compelling benefit to naming the thing
> .byte_len() over merely documenting that .len() is in code units.
> Everything else that doesn't explicitly say "char" on strings is in code
> units too, so it's sensible that .len() is too. But having strings that
> don't have an inherent "length" is confusing to anyone who hasn't already
> memorized this difference.
>
> Today we only need to teach the simple concept that strings are utf-8
> encoded, and the corresponding notion that all of the accessor methods on
> strings (including indexing using []) use code units unless they specify
> otherwise (e.g. unless they contain the word "char").
>
> -Kevin
>
> On May 28, 2014, at 10:54 AM, Benjamin Striegel <ben.strie...@gmail.com>
> wrote:
>
> > People expect there to be a .len()
>
> This is the assumption that I object to. People expect there to be a
> .len() because strings have been fundamentally broken since time
> immemorial. Make people type .byte_len() and be explicit about their desire
> to index via code units.
>
> On Wed, May 28, 2014 at 1:12 PM, Kevin Ballard <ke...@sb.org> wrote:
>
>> It's .len() because slicing and other related functions work on byte
>> indexes.
>>
>> We've had this discussion before in the past. People expect there to be a
>> .len(), and the only sensible .len() is byte length (because char length is
>> not O(1) and not appropriate for use with most string-manipulation
>> functions).
>>
>> Since Rust strings are UTF-8 encoded text, it makes sense for .len() to
>> be the number of UTF-8 code units. Which happens to be the number of bytes.
>>
>> -Kevin
>>
>> On May 28, 2014, at 7:07 AM, Benjamin Striegel <ben.strie...@gmail.com>
>> wrote:
>>
>> I think that the naming of `len` here is dangerously misleading. Naive
>> ASCII-users will be free to assume that this is counting codepoints rather
>> than bytes. I'd prefer the name `byte_len` in order to make the behavior
>> here explicit.
>>
>>
>> On Wed, May 28, 2014 at 5:55 AM, Simon Sapin <simon.sa...@exyr.org> wrote:
>>
>>> On 28/05/2014 10:46, Aravinda VK wrote:
>>>
>>>> Thanks. I didn't know about char_len.
>>>> `unicode_str.as_slice().char_len()` is giving number of code points.
>>>>
>>>> Sorry for the confusion, I was referring codepoint as character in my
>>>> mail. char_len gives the correct output for my requirement. I have
>>>> written javascript script to convert from string length to grapheme
>>>> cluster length for Kannada language.
>>>>
>>>
>>> Be careful, JavaScript’s String.length counts UCS-2 code units, not code
>>> points…
>>>
>>> --
>>> Simon Sapin
>>> _______________________________________________
>>> Rust-dev mailing list
>>> Rust-dev@mozilla.org
>>> https://mail.mozilla.org/listinfo/rust-dev
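
For reference, the three notions of "length" argued over in this thread can be sketched in present-day Rust. This is only an illustration under stated assumptions: the helper names `byte_len`, `code_point_len`, and `utf16_len` are hypothetical (the first borrows the name proposed above), and `chars().count()` stands in for the pre-1.0 `char_len()` mentioned in the quoted messages. Grapheme cluster counting is not shown, since the standard library does not provide it.

```rust
/// Length in UTF-8 code units, i.e. bytes. This is what `str::len` returns.
/// `byte_len` is a hypothetical name borrowed from the proposal in this thread.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (code points). Walks the whole string, so O(n).
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

/// Number of UTF-16 code units, the unit that JavaScript's String.length counts.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    // "Kannada" in Kannada script: KA, NA, VIRAMA, NA, DDA (five code points).
    let kannada = "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}";
    assert_eq!(byte_len(kannada), 15); // each of these code points takes 3 bytes
    assert_eq!(code_point_len(kannada), 5);
    assert_eq!(utf16_len(kannada), 5); // all five fit in one UTF-16 unit each

    // A code point outside the Basic Multilingual Plane.
    let emoji = "\u{1F600}";
    assert_eq!(byte_len(emoji), 4);
    assert_eq!(code_point_len(emoji), 1);
    assert_eq!(utf16_len(emoji), 2); // encoded as a surrogate pair

    // Slicing takes byte indexes: the first 3 bytes are the first letter.
    assert_eq!(&kannada[0..3], "\u{0C95}");

    println!("all checks passed");
}
```

The three counts agree only for ASCII text, which is the root of the naming disagreement: `.len()` is cheap and matches what slicing expects, but it is none of the things a reader coming from ASCII or from JavaScript is likely to assume.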