On 01/05/2014 03:31 AM, Alex Kleider wrote:
I've been maintaining both a Python3 and a Python2.7 version.  The latter has
actually opened my eyes to more complexities. Specifically the need to use
unicode strings rather than Python2.7's default ascii.

So-called Unicode strings are not the solution to all problems. Take your 'á' as an example: it can be represented either by 1 "precomposed" code (Unicode code point) 0xe1, or by 2 codes (one for the "base" 'a', one for the "combining" acute accent, 0x301). Imagine you search for "Bogotá": how do you know which representation is used in the text you search? How do you know at all that there are multiple representations, and what they are? The routine will work iff, by chance, your *programming editor* (!) used the same representation as the software used to create the searched text...

Usually this is the case, because most text-creation software uses precomposed codes, when they exist, for composite characters. (But this fact just makes the issue rarer, harder to be aware of, and thus more difficult to cope with correctly in code. As far as I know, nearly no software does it.)
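
To make the "Bogotá" case concrete, here is a minimal sketch, assuming Python 3 (where str is Unicode) and the standard unicodedata module. The helper names nfc and contains_normalized are made up for illustration. One possible way to cope is to normalize both the search term and the searched text to the same form (NFC here) before comparing:

```python
import unicodedata

# Two representations of the same visible word (spelled with escapes so that
# the editor's own choice of representation cannot interfere).
precomposed = "Bogot\u00e1"   # 'á' as the single code point U+00E1
decomposed = "Bogota\u0301"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT


def nfc(s):
    """Return s in Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", s)


def contains_normalized(needle, haystack):
    """Hypothetical helper: substring search after normalizing both sides."""
    return nfc(needle) in nfc(haystack)


text = "Vuelo a " + decomposed   # text created with the decomposed form

print(precomposed == decomposed)               # False
print(len(precomposed), len(decomposed))       # 6 7
print(precomposed in text)                     # False: naive search misses it
print(contains_normalized(precomposed, text))  # True
```

Normalization only addresses this precomposed/decomposed mismatch; it is not a cure for every text-matching problem either.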

Denis
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
