On 2014-05-29 05:36, Kevin Ballard wrote:
[--snip--]
> And when dealing with a sequence in a precise encoding, the natural unit to
> work with is the code unit (and this has precedence in other languages, such
> as JavaScript, Obj-C, and Go).
>

JavaScript:

    $ node
    > var s = "hï"; // Note the accent
    undefined
    > s.length;
    2

Rust:

    $ cat hello.rs
    fn main() {
        let l = "hï".len(); // Note the accent
        println!("{:u}", l);
    }
    $ rustc hello.rs
    $ ./hello
    3

No matter how defective the notion of "length" may be, personally I think
that people will expect the former, but will be very surprised by the
latter. There are certainly cases where the JavaScript version is wrong,
but I conjecture that it "works" for the vast majority of cases that
people and programs are likely to encounter.

IMO, expecting people to read docs is a poor substitute for being explicit
in a method name about what the method does, especially when it costs only
5 characters. The Principle of Least Astonishment and all that.

As a rule, people don't read docs until they've encountered a "bug" in
their expectations vs. what the language/library actually does -- at which
point they're already annoyed and don't need to be further annoyed by the
realization that "it does something completely non-intuitive" (from their
perspective).

Thankfully the programming world has become more aware of i18n issues, but
for people who still predominantly use ASCII, such bugs may lie dormant for
a long time before anyone discovers them.

Just my €0.02.

Regards,

_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev