Re: [Python-3000] String comparison

Jim Jewett Thu, 07 Jun 2007 14:53:51 -0700

On 6/7/07, Rauli Ruohonen <[EMAIL PROTECTED]> wrote:

> ... I will use XML character references to denote code points here.
> Wherever you see such a thing in this e-mail, replace it in your
> mind with the corresponding code point *immediately*. E.g.
> len(r'&#00c5;') == 1, but len(r'\u00c5') == 6.


> In the following code == should be false:

> if "L\u00F6wis" == "Lo\u0308wis":
>     print "Python is Unicode conforming in this respect."

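(A minimal sketch of the two spellings in that comparison, assuming Python 3 str semantics, where == compares code point by code point. The helper name canonically_equal is hypothetical, not an existing or proposed API.)

```python
import unicodedata

precomposed = "L\u00F6wis"   # o-with-diaeresis as the single code point U+00F6
decomposed = "Lo\u0308wis"   # plain o followed by U+0308 COMBINING DIAERESIS


def canonically_equal(a, b):
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Plain == sees different code point sequences of different lengths.
print(precomposed == decomposed)
print(len(precomposed), len(decomposed))

# After normalization the two spellings compare equal.
print(canonically_equal(precomposed, decomposed))
```

Whether the default == should behave like the first print or the last one is exactly the question under discussion.
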
> On 6/7/07, Stephen J. Turnbull <[EMAIL PROTECTED]> wrote:
> > I think the default case should be that text operations produce the
> > expected result in the text domain, even at the expense of array
> > invariants.

(There was confusion -- an explicit escape such as \u probably stands
out enough to signal the non-default case.  But even there, it would
also be reasonable to say "use something other than text.")

> > People who need arrays of code points have several ways to
> > get them, and the usual comparison operators will work on them
> > as desired.

> But regexps and other string operations won't, and those are the
> whole point of strings,

(I was thinking that regexps would actually take a buffer interface, but...)

How would you expect them to work on arrays of code points?  What sort
of answer should the following produce?

    # matches by codepoints, but doesn't look like it
    "Lo&#0308;wis".startswith("Lo")

    # if the above did match, then people will assume ö folds to o
    "L&#00F6;wis".startswith("Lo")

    # looks like it matches.  Matches as text.  Does not match as bytes.
    "Lo&#0308;wis".startswith("L&#00F6;")

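(To make the three cases concrete, here is a rough sketch that runs each one three ways: on the raw code points, after NFC, and after NFD. It assumes Python 3 str and the standard unicodedata module; the names nfc, nfd and results are only for illustration.)

```python
import unicodedata


def nfc(s):
    """Compose: o + U+0308 becomes the single code point U+00F6."""
    return unicodedata.normalize("NFC", s)


def nfd(s):
    """Decompose: U+00F6 becomes o + U+0308."""
    return unicodedata.normalize("NFD", s)


decomposed = "Lo\u0308wis"
precomposed = "L\u00F6wis"

# The three (text, prefix) pairs from the questions above, in order.
cases = [
    (decomposed, "Lo"),
    (precomposed, "Lo"),
    (decomposed, "L\u00F6"),
]

results = []
for text, prefix in cases:
    results.append({
        "raw": text.startswith(prefix),
        "nfc": nfc(text).startswith(nfc(prefix)),
        "nfd": nfd(text).startswith(nfd(prefix)),
    })

for (text, prefix), r in zip(cases, results):
    print(ascii(text), ascii(prefix), r)
```

Neither normalization form settles it: under NFD all three match, including the "ö folds to o" case, while under NFC only the third does. A prefix test that never splits a base character from its combining marks would need something beyond normalization.
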
-jJ
