Re: [rust-dev] How to find Unicode string length in rustlang

Kevin Ballard Wed, 28 May 2014 10:13:43 -0700

It's .len() because slicing and other related functions work on byte indexes.
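
As a rough illustration of byte-indexed slicing, here is a minimal sketch against the current standard library (not the 2014 API discussed in this thread). The helper name `prefix_chars` is hypothetical and exists only for this example; `char_indices`, `get`, and range slicing are standard `str` methods.

```rust
/// Hypothetical helper: returns the first `n` chars of `s` as a slice.
/// It walks the string to find the byte offset where char `n` starts,
/// because slice bounds are byte offsets, not char counts.
fn prefix_chars(s: &str, n: usize) -> &str {
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s, // fewer than n + 1 chars: the whole string is the prefix
    }
}

fn main() {
    let s = "h\u{e9}llo"; // "héllo": U+00E9 takes 2 bytes in UTF-8

    // 5 chars, but 6 bytes.
    assert_eq!(s.len(), 6);

    // Bytes 0..3 cover 'h' (1 byte) plus U+00E9 (2 bytes).
    assert_eq!(&s[0..3], "h\u{e9}");

    // Byte 2 falls inside U+00E9, so the checked form returns None
    // (the unchecked form `&s[0..2]` would panic).
    assert!(s.get(0..2).is_none());

    println!("{}", prefix_chars(s, 2));
}
```

Because the offsets come from `char_indices`, the resulting slice always lands on a char boundary.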


We've had this discussion before. People expect there to be a 
.len(), and the only sensible .len() is byte length (because char length is not 
O(1) and not appropriate for use with most string-manipulation functions).

Since Rust strings are UTF-8 encoded text, it makes sense for .len() to be the 
number of UTF-8 code units, which is also the number of bytes because each 
UTF-8 code unit is one byte.
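
To make the different notions of "length" concrete, here is a small sketch using the current standard library. It is an assumption-laden illustration rather than the API of the time: `char_len` from this thread is replaced by `chars().count()`, and the function names `byte_len`, `code_point_len`, and `utf16_len` are hypothetical wrappers invented for the example.

```rust
/// Byte length: O(1), the number of UTF-8 code units.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Code point count: O(n), because the string must be decoded.
fn code_point_len(s: &str) -> usize {
    s.chars().count()
}

/// UTF-16 code unit count: O(n), re-encodes the string as UTF-16.
fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    // The word "Kannada" in Kannada script: five code points in the
    // U+0C80..U+0CFF block, each 3 bytes in UTF-8.
    let kn = "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}";
    assert_eq!(byte_len(kn), 15);
    assert_eq!(code_point_len(kn), 5);
    assert_eq!(utf16_len(kn), 5);

    // A code point outside the Basic Multilingual Plane: 4 bytes in
    // UTF-8, one code point, two UTF-16 code units (a surrogate pair).
    let astral = "\u{1F600}";
    assert_eq!(byte_len(astral), 4);
    assert_eq!(code_point_len(astral), 1);
    assert_eq!(utf16_len(astral), 2);

    println!("bytes={} code points={}", byte_len(kn), code_point_len(kn));
}
```

None of these counts grapheme clusters; the standard library has no grapheme segmentation, so that needs a separate implementation of the Unicode segmentation rules.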

-Kevin

On May 28, 2014, at 7:07 AM, Benjamin Striegel <[email protected]> wrote:

> I think that the naming of `len` here is dangerously misleading. Naive 
> ASCII-users will be free to assume that this is counting codepoints rather 
> than bytes. I'd prefer the name `byte_len` in order to make the behavior here 
> explicit.
> 
> 
> On Wed, May 28, 2014 at 5:55 AM, Simon Sapin <[email protected]> wrote:
> On 28/05/2014 10:46, Aravinda VK wrote:
> Thanks. I didn't know about char_len.
> `unicode_str.as_slice().char_len()` gives the number of code points.
> 
> Sorry for the confusion, I was referring to a code point as a character in my
> mail. char_len gives the correct output for my requirement. I have
> written a JavaScript script to convert from string length to grapheme
> cluster length for the Kannada language.
> 
> Be careful, JavaScript’s String.length counts UTF-16 code units, not code 
> points…
> 
> 
> -- 
> Simon Sapin
> _______________________________________________
> Rust-dev mailing list
> [email protected]
> https://mail.mozilla.org/listinfo/rust-dev
> 

