What about renaming len() to units()?

I don't see len() as a problem, but maybe as a potential source of
confusion. I also strongly believe that no one reads documentation if they
*think* they understand what the code is doing. Different people will see
len(), assume that it does whatever they want to do at the moment, and for
a significant portion of strings that they encounter it will seem like
their interpretation, whatever it is, is correct. So, why not rename len()
to something like units()? It's more explicit about the value that it's
actually producing than len() and it's not all that much longer to type. As
stated, exactly what a string is varies greatly between languages, so I
don't think that lacking a function named len() is bad. Granted, I would
expect that many people will look for a method named len() (or length())
on a string and, when they don't find one, will go to the
documentation and find units(). I think this is a good thing since the
documentation can then explain exactly what it does.
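
To make the ambiguity concrete, here is a rough sketch of three
plausible readings of "length" for the same string. Note that units(),
code_points() and utf16_units() are hypothetical free functions written
only for illustration (units() is just the name proposed above, not an
existing method), and the sketch assumes a current Rust toolchain where
str has len(), chars() and encode_utf16().

```rust
// Hypothetical helper: `units` is only the name proposed in this thread,
// not an existing std method. It returns what len() returns for a str:
// the number of UTF-8 code units (bytes).
fn units(s: &str) -> usize {
    s.len()
}

// Number of Unicode code points (scalar values) in the string.
fn code_points(s: &str) -> usize {
    s.chars().count()
}

// Number of UTF-16 code units the string would occupy.
fn utf16_units(s: &str) -> usize {
    s.encode_utf16().count()
}

fn main() {
    // ASCII, precomposed e-acute, e + combining acute, an astral-plane char.
    let samples = ["hello", "h\u{e9}llo", "he\u{301}llo", "\u{1F600}"];
    for &s in samples.iter() {
        println!(
            "{:?}: units={} code_points={} utf16_units={}",
            s,
            units(s),
            code_points(s),
            utf16_units(s)
        );
    }
}
```

For plain ASCII like "hello" all three agree (5), which is exactly why
every interpretation seems correct most of the time. They only diverge
on inputs like "h\u{e9}llo" (6 bytes, 5 code points) or "\u{1F600}"
(4 bytes, 1 code point, 2 UTF-16 code units).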

I much prefer len() to byte_len(), though. byte_len() seems like a bit much
to type and it seems like all the other methods on strings should then be
renamed with the byte_ prefix which seems unpleasant.

-Palmer Cox


On Thu, May 29, 2014 at 3:39 AM, Masklinn <maskl...@masklinn.net> wrote:

>
> On 2014-05-29, at 08:37 , Aravinda VK <hallimanearav...@gmail.com> wrote:
>
> > I think returning the length of a string in bytes is just fine. My
> > confusion came from not knowing that char_len is available in Rust.
> >
> > Python 2.7 returns the length of a string in bytes; Python 3 returns
> > the number of codepoints.
>
> Nope, depends on the string type *and* on compilation options.
>
> * Python 2's `str` and Python 3's `bytes` are byte sequences; their
>  len() returns their byte counts.
> * Python 2's `unicode` and Python 3's `str` before 3.3 return a code
>  unit count, where the units may be UCS2 or UCS4 (depending on whether
>  the interpreter was compiled with `--enable-unicode=ucs2`, the default,
>  or `--enable-unicode=ucs4`). Only the latter case is a true code point
>  count.
> * Python 3.3's `str` switched to the Flexible String Representation,
>  the build-time option disappeared and len() always returns the number
>  of codepoints.
>
> Note that in no case do len() operations take normalisation or visual
> composition into account.
>
> > JS returns number of codepoints.
>
> JS returns the number of UCS2 code units, which is twice the number of
> code points for those in astral planes.
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev
>