It's .len() because slicing and other related functions work on byte indexes.

We've had this discussion before. People expect there to be a 
.len(), and the only sensible .len() is byte length (because counting chars is 
not O(1) and the result is not usable with most string-manipulation functions).

Since Rust strings are UTF-8 encoded text, it makes sense for .len() to be the 
number of UTF-8 code units, which is the same as the number of bytes.
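
To make the distinction concrete, here is a rough sketch written against
current stable Rust (so it uses `chars().count()` where this thread talks
about `char_len`). The helper names `byte_len` and `code_point_len` are made
up for illustration and are not part of the standard library. The sample
string is a five-code-point Kannada word spelled out with escapes.

```rust
/// Byte length: O(1), and the same unit that slicing uses.
/// (Hypothetical helper, just a named wrapper around `str::len`.)
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Code point count: O(n), because the whole string has to be decoded.
/// (Hypothetical helper, just a named wrapper around `chars().count()`.)
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Five code points from the Kannada block, three UTF-8 bytes each.
    let s = "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}";

    println!("bytes:       {}", byte_len(s));
    println!("code points: {}", code_point_len(s));

    // Slicing takes byte indexes, so the first code point is bytes 0..3.
    let first = &s[0..3];
    println!("first code point: {}", first);

    // A byte index inside a code point is not a valid slice boundary.
    // `get` returns None there, where `&s[0..1]` would panic.
    println!("boundary at byte 1? {}", s.is_char_boundary(1));
    println!("s.get(0..1) = {:?}", s.get(0..1));
}
```

Neither number is a grapheme cluster count. That needs Unicode segmentation
rules, which this sketch does not attempt.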

-Kevin

On May 28, 2014, at 7:07 AM, Benjamin Striegel <ben.strie...@gmail.com> wrote:

> I think that the naming of `len` here is dangerously misleading. Naive 
> ASCII-users will be free to assume that this is counting codepoints rather 
> than bytes. I'd prefer the name `byte_len` in order to make the behavior here 
> explicit.
> 
> 
> On Wed, May 28, 2014 at 5:55 AM, Simon Sapin <simon.sa...@exyr.org> wrote:
> On 28/05/2014 10:46, Aravinda VK wrote:
> Thanks. I didn't know about char_len.
> `unicode_str.as_slice().char_len()` is giving number of code points.
> 
> Sorry for the confusion, I was referring to a code point as a character
> in my mail. char_len gives the correct output for my requirement. I have
> written a JavaScript script to convert from string length to grapheme
> cluster length for the Kannada language.
> 
> Be careful, JavaScript’s String.length counts UTF-16 code units, not code 
> points…
> 
> 
> -- 
> Simon Sapin
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev