On 6/7/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:
> ... I will use XML character references to denote code points here.
> Wherever you see such a thing in this e-mail, replace it in your
> mind with the corresponding code point *immediately*. E.g.
> len(r'&#xc5;') == 1, but len(r'\u00c5') == 6.
> In the following code == should be false:
> if "L\u00F6wis" == "Lo\u0308wis":
>     print "Python is Unicode conforming in this respect."

> On 6/7/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > I think the default case should be that text operations produce the
> > expected result in the text domain, even at the expense of array
> > invariants.

(There was confusion -- an explicit escape such as \u probably stands
out enough to signal the non-default case. But even there, it would
also be reasonable to say "use something other than text.")

> > People who need arrays of code points have several ways to
> > get them, and the usual comparison operators will work on them
> > as desired.

> But regexps and other string operations won't, and those are the
> whole point of strings,

(I was thinking that regexps would actually take a buffer interface, but...)

How would you expect them to work on arrays of code points? What sort
of answer should the following produce?

# matches by code points, but doesn't look like it
"Lo&#x308;wis".startswith("Lo")

# if the above did match, then people will assume ö folds to o
"L&#xF6;wis".startswith("Lo")

# looks like it matches. Matches as text. Does not match as bytes.
"Lo&#x308;wis".startswith("L&#xF6;")

-jJ
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
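The three startswith() cases can be tried out directly. What follows is only a minimal sketch, assuming Python 3 (where str is a sequence of code points) and the standard unicodedata module. nfc_startswith is a hypothetical helper written for this illustration, not an existing string method, and NFC normalization is just one possible reading of "matches as text".

```python
import unicodedata

composed = "L\u00f6wis"     # o-with-diaeresis as a single code point (U+00F6)
decomposed = "Lo\u0308wis"  # plain 'o' followed by combining diaeresis (U+0308)


def nfc_startswith(text, prefix):
    """Hypothetical helper: prefix test after normalizing both sides to NFC."""
    text = unicodedata.normalize("NFC", text)
    prefix = unicodedata.normalize("NFC", prefix)
    return text.startswith(prefix)


# Comparison by code points, which is what str.startswith does.
by_code_point = (
    decomposed.startswith("Lo"),        # True: matches, but doesn't look like it
    composed.startswith("Lo"),          # False
    decomposed.startswith("L\u00f6"),   # False: looks like it should match
)

# Comparison after NFC normalization (an assumed definition of "as text").
by_nfc = (
    nfc_startswith(decomposed, "Lo"),       # False
    nfc_startswith(composed, "Lo"),         # False
    nfc_startswith(decomposed, "L\u00f6"),  # True
)

print(by_code_point)  # (True, False, False)
print(by_nfc)         # (False, False, True)
```

By code point only the first test matches; after normalization only the third does. Normalizing the prefix on its own is a simplification and may not be right for every prefix, so treat the helper as an illustration of the question rather than an answer to it.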