It's .len() because slicing and other related functions work on byte indexes.
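To make the point about byte indexes concrete, here is a minimal sketch in present-day Rust. The 2014 API discussed in this thread has since changed, so this is not the code we had then. It shows that `len()` counts bytes, that string slicing takes byte offsets which must fall on char boundaries, and how to convert "first n chars" into a byte offset. `prefix_chars` is a hypothetical helper written for illustration, not a std function.

```rust
/// Hypothetical helper: returns the prefix of `s` holding at most `n` chars.
/// Slicing takes byte offsets, so we look up the byte index where the
/// n-th char starts instead of slicing at `n` directly.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than n+1 chars: the whole string fits
    }
}

fn main() {
    // "héllo", with é as the single code point U+00E9 (2 bytes in UTF-8).
    let s = "h\u{e9}llo";

    // len() is the byte length: 5 chars, but 6 bytes.
    println!("{}", s.len()); // prints 6

    // Byte ranges must land on char boundaries; 0..3 covers "h" and "é".
    println!("{}", &s[0..3]); // prints hé

    // Byte 2 is the middle of 'é', so &s[0..2] would panic.
    assert!(!s.is_char_boundary(2));

    println!("{}", prefix_chars(s, 2)); // prints hé
}
```

Because every slicing operation is expressed in bytes, a `len()` in any other unit could not be used directly as a slice bound.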
We've had this discussion before. People expect there to be a .len(), and the only sensible .len() is byte length (because char length is not O(1) and is not appropriate for use with most string-manipulation functions). Since Rust strings are UTF-8 encoded text, it makes sense for .len() to be the number of UTF-8 code units, which happens to be the number of bytes.

-Kevin

On May 28, 2014, at 7:07 AM, Benjamin Striegel <ben.strie...@gmail.com> wrote:

> I think that the naming of `len` here is dangerously misleading. Naive
> ASCII-users will be free to assume that this is counting codepoints rather
> than bytes. I'd prefer the name `byte_len` in order to make the behavior here
> explicit.
>
>
> On Wed, May 28, 2014 at 5:55 AM, Simon Sapin <simon.sa...@exyr.org> wrote:
>> On 28/05/2014 10:46, Aravinda VK wrote:
>>> Thanks. I didn't know about char_len.
>>> `unicode_str.as_slice().char_len()` is giving number of code points.
>>>
>>> Sorry for the confusion, I was referring codepoint as character in my
>>> mail. char_len gives the correct output for my requirement. I have
>>> written javascript script to convert from string length to grapheme
>>> cluster length for Kannada language.
>>
>> Be careful, JavaScript’s String.length counts UCS-2 code units, not code
>> points…
>>
>>
>> --
>> Simon Sapin
>> _______________________________________________
>> Rust-dev mailing list
>> Rust-dev@mozilla.org
>> https://mail.mozilla.org/listinfo/rust-dev
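The thread above uses three units: bytes (what `.len()` returns), code points (what `char_len()` returned in 2014-era Rust), and the 16-bit code units that JavaScript's `String.length` reports. Here is a rough sketch comparing them with today's std API. In current Rust the code-point count is spelled `chars().count()`, which scans the whole string and is therefore O(n). The `lengths` function is a hypothetical name chosen for this example. Grapheme clusters, which the Kannada use case actually needs, aren't provided by std at all. Counting those would need a segmentation library such as the `unicode-segmentation` crate, which this sketch deliberately does not use.

```rust
/// Hypothetical helper: the length of `s` in three different units.
fn lengths(s: &str) -> (usize, usize, usize) {
    (
        s.len(),                  // UTF-8 code units, i.e. bytes; O(1)
        s.chars().count(),        // code points (Unicode scalar values); O(n) scan
        s.encode_utf16().count(), // 16-bit code units, as JavaScript's String.length counts
    )
}

fn main() {
    let samples = [
        "abc",
        // "ಕನ್ನಡ" (Kannada): five code points, each 3 bytes in UTF-8.
        "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}",
        // 'a' plus U+1F600, which is outside the BMP: 4 UTF-8 bytes, 2 UTF-16 units.
        "a\u{1F600}",
    ];
    for &s in &samples {
        let (bytes, chars, utf16) = lengths(s);
        println!("{:?}: bytes={} chars={} utf16={}", s, bytes, chars, utf16);
    }
}
```

For ASCII all three numbers agree, which is why the difference is easy to miss. The Kannada sample shows bytes diverging from code points, and the emoji shows 16-bit units diverging from code points, which is the case Simon warns about.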