What about renaming len() to units()? I don't see len() as a problem, but maybe as a potential source of confusion. I also strongly believe that no one reads documentation if they *think* they understand what the code is doing. Different people will see len(), assume that it does whatever they want to do at the moment, and for a significant portion of the strings they encounter it will seem like their interpretation, whatever it is, is correct. So, why not rename len() to something like units()? It's more explicit than len() about the value that it's actually producing, and it's not all that much longer to type. As stated, exactly what a string is varies greatly between languages, so I don't think that lacking a function named len() is bad. Granted, I would expect that many people assume that a string will have a method named len() (or length()), and when they don't find one, they will go to the documentation and find units(). I think this is a good thing, since the documentation can then explain exactly what it does.
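To make the "seems correct for most strings" point concrete, here is a minimal sketch in present-day Rust. The method names may differ from the 2014 API being discussed, and `counts` is a hypothetical helper written for this illustration, not part of the standard library. It returns the three numbers people most often have in mind when they ask for a string's "length":

```rust
/// Hypothetical helper: returns (UTF-8 bytes, code points, UTF-16 code units).
fn counts(s: &str) -> (usize, usize, usize) {
    // str::len() counts UTF-8 code units, i.e. bytes.
    let bytes = s.len();
    // chars() iterates over Unicode scalar values (code points).
    let code_points = s.chars().count();
    // encode_utf16() yields the UTF-16 code units for the same text.
    let utf16_units = s.encode_utf16().count();
    (bytes, code_points, utf16_units)
}
```

For an ASCII string such as "hello" all three numbers agree, so every interpretation looks right. For "h\u{e9}llo" the byte count is one higher than the other two, and for a single character outside the Basic Multilingual Plane such as "\u{1D11E}" all three differ.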
I much prefer len() to byte_len(), though. byte_len() seems like a bit much to type, and it seems like all the other methods on strings should then be renamed with the byte_ prefix, which seems unpleasant.

-Palmer Cox

On Thu, May 29, 2014 at 3:39 AM, Masklinn <maskl...@masklinn.net> wrote:

> On 2014-05-29, at 08:37 , Aravinda VK <hallimanearav...@gmail.com> wrote:
> > I think returning length of string in bytes is just fine. Since I didn't
> > know about the availability of char_len in rust caused this confusion.
> >
> > python 2.7 - Returns length of string in bytes, Python 3 returns number
> > of codepoints.
>
> Nope, depends on the string type *and* on compilation options.
>
> * Python 2's `str` and Python 3's `bytes` are byte sequences, their
>   len() returns their byte counts.
> * Python 2's `unicode` and Python 3's `str` before 3.3 return a code
>   units count which may be UCS2 or UCS4 (depending on whether the
>   interpreter was compiled with `--enable-unicode=ucs2`, the default,
>   or `--enable-unicode=ucs4`). Only the latter case is a true code
>   points count.
> * Python 3.3's `str` switched to the Flexible String Representation,
>   the build-time option disappeared and len() always returns the number
>   of codepoints.
>
> Note that in no case do len() operations take normalisation or visual
> composition into account.
>
> > JS returns number of codepoints.
>
> JS returns the number of UCS2 code units, which is twice the number of
> code points for those in astral planes.
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev