> Being too opinionated (regarding opinions that deviate from the norm)
> tends to put people off the language unless there's a clear benefit to
> forcing the alternative behavior.

We have already chosen to be opinionated by enforcing UTF-8 in our strings.
This is an extension of that break with tradition.

> Today we only need to teach the simple concept that strings are utf-8
> encoded

History has shown that understanding Unicode is not a simple concept.
Asking for the "length" of a Unicode string is not a well-formed question,
and we must express this in our API. I also don't agree with accessor
functions that work on code units without warning, and for this reason I
strongly disagree with supporting the [] operator on strings.
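As an illustration of why "length" is ambiguous, here is a minimal sketch in present-day Rust (the 2014 API under discussion, including `as_slice()` and `char_len()`, has since changed). `str::len` counts UTF-8 bytes in O(1); counting Unicode scalar values means decoding the whole string with `chars()`. The helper names `byte_count` and `scalar_count` are hypothetical, chosen only for this example.

```rust
/// Number of UTF-8 code units (bytes). This is what `str::len` returns, in O(1).
pub fn byte_count(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values ("chars"). This decodes the whole string, O(n).
/// Note: current Rust does not implement `s[i]` for `str` at all; individual
/// bytes are reached through `s.as_bytes()[i]`.
pub fn scalar_count(s: &str) -> usize {
    s.chars().count()
}
```

For the precomposed `"h\u{e9}llo"` this gives 6 bytes and 5 scalar values; for the decomposed `"he\u{301}llo"`, which renders the same way, it gives 7 bytes and 6 scalar values, so neither number is "the" length a reader perceives.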


On Wed, May 28, 2014 at 2:42 PM, Kevin Ballard <ke...@sb.org> wrote:

> Breaking with established convention is a dangerous thing to do. Being too
> opinionated (regarding opinions that deviate from the norm) tends to put
> people off the language unless there's a clear benefit to forcing the
> alternative behavior.
>
> In this case, there's no compelling benefit to naming the thing
> .byte_len() over merely documenting that .len() is in code units.
> Everything else that doesn't explicitly say "char" on strings is in code
> units too, so it's sensible that .len() is too. But having strings that
> don't have an inherent "length" is confusing to anyone who hasn't already
> memorized this difference.
>
> Today we only need to teach the simple concept that strings are utf-8
> encoded, and the corresponding notion that all of the accessor methods on
> strings (including indexing using []) use code units unless they specify
> otherwise (e.g. unless they contain the word "char").
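Kevin's convention, that accessors work in code units unless they say "char", is roughly what present-day Rust does for slicing, with one difference relevant to this thread: `str` does not support `s[i]` indexing by an integer. A sketch, assuming current stable Rust; `byte_at` and `slice_bytes` are hypothetical helper names.

```rust
/// Returns the byte (UTF-8 code unit) at `index`, or `None` if out of range.
/// Bytes are reached through `as_bytes()` because `str` has no `[usize]` index.
pub fn byte_at(s: &str, index: usize) -> Option<u8> {
    s.as_bytes().get(index).copied()
}

/// Slices `s` by byte offsets. Returns `None` if an offset is out of range
/// or falls inside a multi-byte character, cases where `&s[start..end]`
/// would panic instead.
pub fn slice_bytes(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}
```

In `"h\u{e9}llo"` the `é` occupies bytes 1 and 2 (`0xC3 0xA9`), so a slice ending at byte 2 is rejected rather than producing invalid UTF-8.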
>
> -Kevin
>
> On May 28, 2014, at 10:54 AM, Benjamin Striegel <ben.strie...@gmail.com>
> wrote:
>
> > People expect there to be a .len()
>
> This is the assumption that I object to. People expect there to be a
> .len() because strings have been fundamentally broken since time
> immemorial. Make people type .byte_len() and be explicit about their desire
> to index via code units.
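The explicit naming Benjamin proposes can be sketched as an extension trait. This is purely hypothetical: neither `ExplicitLen` nor a `byte_len` method exists in Rust's standard library, and the trait only forwards to `str::len` and `chars().count()`.

```rust
/// Hypothetical extension trait sketching explicit length names.
/// Not part of std; it only forwards to existing `str` methods.
pub trait ExplicitLen {
    /// Length in UTF-8 code units (bytes), O(1).
    fn byte_len(&self) -> usize;
    /// Length in Unicode scalar values, O(n) because the string must be decoded.
    fn char_len(&self) -> usize;
}

impl ExplicitLen for str {
    fn byte_len(&self) -> usize {
        self.len()
    }

    fn char_len(&self) -> usize {
        self.chars().count()
    }
}
```

Because `String` dereferences to `str`, both methods would also be callable on a `String` without a separate impl.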
>
>
> On Wed, May 28, 2014 at 1:12 PM, Kevin Ballard <ke...@sb.org> wrote:
>
>> It's .len() because slicing and other related functions work on byte
>> indexes.
>>
>> We've had this discussion before. People expect there to be a
>> .len(), and the only sensible .len() is byte length (because char length is
>> not O(1) and not appropriate for use with most string-manipulation
>> functions).
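Kevin's point that a char-based length would not be O(1) follows from UTF-8 being variable-width: locating the n-th char means decoding everything before it. A minimal sketch in present-day Rust, with `char_to_byte_offset` as a hypothetical helper, converts a char position into the byte offset that slicing expects.

```rust
use std::iter;

/// Converts a char (Unicode scalar value) index into a byte offset.
/// An index equal to the char count maps to `s.len()` (the end of the string);
/// anything beyond that returns `None`. Walks the string from the start, so O(n).
pub fn char_to_byte_offset(s: &str, char_index: usize) -> Option<usize> {
    s.char_indices()
        .map(|(byte_offset, _)| byte_offset)
        .chain(iter::once(s.len()))
        .nth(char_index)
}
```

For example, taking the first two chars of `"h\u{e9}llo"` requires byte offset 3, so `&s[..3]` yields `"h\u{e9}"`.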
>>
>> Since Rust strings are UTF-8 encoded text, it makes sense for .len() to
>> be the number of UTF-8 code units, which happens to be the number of bytes.
>>
>> -Kevin
>>
>> On May 28, 2014, at 7:07 AM, Benjamin Striegel <ben.strie...@gmail.com>
>> wrote:
>>
>> I think that the naming of `len` here is dangerously misleading. Naive
>> ASCII users will be free to assume that this is counting codepoints rather
>> than bytes. I'd prefer the name `byte_len` in order to make the behavior
>> here explicit.
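The trap Benjamin describes exists because, for pure ASCII text, the two counts coincide: every ASCII character is one UTF-8 byte, and every other character takes two to four. A short sketch (hypothetical helper `counts_agree`) makes the equivalence explicit.

```rust
/// Returns true when byte length and char count are equal. Since each char
/// takes at least one byte, equality holds exactly when every char takes one
/// byte, i.e. when the string is pure ASCII (the same as `str::is_ascii`).
pub fn counts_agree(s: &str) -> bool {
    s.len() == s.chars().count()
}
```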
>>
>>
>> On Wed, May 28, 2014 at 5:55 AM, Simon Sapin <simon.sa...@exyr.org> wrote:
>>
>>> On 28/05/2014 10:46, Aravinda VK wrote:
>>>
>>>> Thanks. I didn't know about char_len.
>>>> `unicode_str.as_slice().char_len()` gives the number of code points.
>>>>
>>>> Sorry for the confusion; I was referring to code points as characters in
>>>> my mail. char_len gives the correct output for my requirement. I have
>>>> written a JavaScript script to convert from string length to grapheme
>>>> cluster length for the Kannada language.
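Even a correct code point count is not the same as the grapheme cluster count Aravinda needs for Kannada. In the sketch below (present-day Rust; `KANNADA` and `code_points` are hypothetical names), the word "Kannada" in Kannada script is five code points and 15 UTF-8 bytes, but the virama (U+0CCD) joins the two NA letters into one conjunct, so a reader perceives fewer than five characters. Rust's standard library has no grapheme segmentation; third-party crates such as `unicode-segmentation` provide it.

```rust
/// "Kannada" in Kannada script: KA, NA, VIRAMA, NA, DDA.
pub const KANNADA: &str = "\u{0C95}\u{0CA8}\u{0CCD}\u{0CA8}\u{0CA1}";

/// Lists the code points of `s` as integers; a char-based count such as
/// the `char_len` mentioned above measures the length of this list.
pub fn code_points(s: &str) -> Vec<u32> {
    s.chars().map(|c| c as u32).collect()
}
```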
>>>>
>>>
>>> Be careful, JavaScript’s String.length counts UCS-2 code units, not code
>>> points…
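Simon's warning refers to the fact that JavaScript strings are sequences of 16-bit code units (UTF-16), so `String.length` counts those units and any character outside the Basic Multilingual Plane counts as two. A minimal Rust sketch (hypothetical helper `utf16_len`) shows the three counts diverging for a single emoji.

```rust
/// Number of UTF-16 code units, the unit JavaScript's `String.length` counts.
/// Characters above U+FFFF are encoded as a surrogate pair and count as two.
pub fn utf16_len(s: &str) -> usize {
    s.encode_utf16().count()
}
```

For U+1F600 (a single emoji) this gives 2, while `chars().count()` gives 1 and `len()` gives 4 bytes.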
>>>
>>>
>>> --
>>> Simon Sapin
>>> _______________________________________________
>>> Rust-dev mailing list
>>> Rust-dev@mozilla.org
>>> https://mail.mozilla.org/listinfo/rust-dev
>>>
>>