On 6/8/07, Jim Jewett <[EMAIL PROTECTED]> wrote:
> How would you expect them to work on arrays of code points?

Just like they do with Python 2.5 unicode objects, as long as the "array of code points" is str, not e.g. a numpy array or tuple of ints, which I don't expect to grow string methods :-)

> What sort of answer should the following produce?

That depends on what Python does when it reads in the source code. I think it should normalize to NFC (which Python 2.5 does not do).

> # matches by codepoints, but doesn't look like it
> "Löwis".startswith("Lo")
> # if the above did match, then people will assume ö folds to o
> "Löwis".startswith("Lo")
> # looks like it matches. Matches as text. Does not match as bytes.
> "Löwis".startswith("Lö")

Normalized to NFC:

"Löwis".startswith("Lo")
"Löwis".startswith("Lo")
"Löwis".startswith("Lö")

After this Python lexes, parses and executes. The first two are false, the last one true. All of the examples should look the same in your editor (at least ideally).

The following would, OTOH, be true, false, false:

"Lo\u0308wis".startswith("Lo")
"L\u00F6wis".startswith("Lo")
"Lo\u0308wis".startswith("L\u00F6")

As here the source code is pure ASCII, it's WYSIWYG everywhere.

Python 2.5's output with each:

>>> u"Löwis".startswith(u"Lo")
True
>>> u"Löwis".startswith(u"Lo")
False
>>> u"Löwis".startswith(u"Lö")
False
>>> u"Lo\u0308wis".startswith(u"Lo")
True
>>> u"L\u00F6wis".startswith(u"Lo")
False
>>> u"Lo\u0308wis".startswith(u"L\u00F6")
False

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
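
For anyone who wants to try the comparison without depending on how their editor or mail client encodes the literals, here is a small sketch using only ASCII escapes. It assumes a current Python 3 interpreter and normalizes explicitly at runtime with unicodedata.normalize, which is only a stand-in for the source-level NFC normalization suggested above, not something the interpreter does to string literals. The helper name nfc is made up for the illustration.

```python
import unicodedata

# Both spellings render as "Löwis" but differ at the code point level.
decomposed = "Lo\u0308wis"    # 'o' followed by U+0308 COMBINING DIAERESIS
precomposed = "L\u00F6wis"    # U+00F6 LATIN SMALL LETTER O WITH DIAERESIS


def nfc(s):
    """Hypothetical helper: return s normalized to NFC."""
    return unicodedata.normalize("NFC", s)


# Plain code point comparison, no normalization applied.
raw = [
    decomposed.startswith("Lo"),
    precomposed.startswith("Lo"),
    decomposed.startswith("L\u00F6"),
]

# Same comparisons after normalizing both operands to NFC.
normalized = [
    nfc(decomposed).startswith(nfc("Lo")),
    nfc(precomposed).startswith(nfc("Lo")),
    nfc(decomposed).startswith(nfc("L\u00F6")),
]

print(raw)         # [True, False, False]
print(normalized)  # [False, False, True]
```

The first list matches the "true, false, false" case for escaped literals, and the second matches the "first two false, last one true" result expected once everything is in NFC.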