Re: [Tutor] encoding question

spir Sun, 05 Jan 2014 02:08:06 -0800

On 01/05/2014 03:31 AM, Alex Kleider wrote:

> I've been maintaining both a Python3 and a Python2.7 version.  The latter has
> actually opened my eyes to more complexities. Specifically the need to use
> unicode strings rather than Python2.7's default ascii.

So-called Unicode strings are not the solution to all problems. Take your example 'á'. It can be represented either by 1 "precomposed" code (the Unicode code point 0xe1), or by 2 codes: one for the "base" 'a' (U+0061) and one for the "combining" '´' (U+0301). Imagine you search for "Bogotá": how do you know which representation is used in the text you are searching? How do you even know that there are multiple representations, and what they are? The routine will work iff, by chance, your *programming editor* (!) used the same representation as the software that created the searched text...
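To make the problem concrete, here is a small Python 3 sketch (my own illustration, not from the original exchange) that builds both spellings of "Bogotá" with explicit escapes, so the editor cannot silently pick one for us:

```python
import unicodedata

# 'á' as a single precomposed code point (U+00E1 LATIN SMALL LETTER A WITH ACUTE)
precomposed = "Bogot\u00e1"
# 'á' as base 'a' followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "Bogota\u0301"

print(len(precomposed), len(decomposed))  # 6 7
print(precomposed == decomposed)          # False, although both display as "Bogotá"

# A naive substring search for the decomposed form misses the precomposed text.
print(decomposed in "I flew to Bogot\u00e1 last year.")  # False

# Show what the last two code points of the decomposed form really are.
for ch in decomposed[-2:]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Both strings render identically on screen, yet they have different lengths and compare unequal.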

Usually it is the case, because most text-creation software uses precomposed codes for composite characters when they exist. (But this fact just makes the issue rarer, harder to be aware of, and thus more difficult to cope with correctly in code. As far as I know, almost no software handles it correctly.)
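One common way to cope with it is to normalize both strings to the same canonical form before comparing. The following is a minimal Python 3 sketch, assuming NFC (which favours precomposed code points) is an acceptable canonical form; the helper name `nfc_contains` is hypothetical:

```python
import unicodedata

def nfc_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring search after NFC-normalizing both sides.

    NFC turns 'a' + U+0301 into the precomposed U+00E1 where one exists,
    so both spellings of "Bogotá" compare equal afterwards.
    """
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)

text = "I flew to Bogota\u0301 last year."  # decomposed form in the searched text
query = "Bogot\u00e1"                       # precomposed form typed in the editor

print(query in text)              # False: plain substring search misses it
print(nfc_contains(text, query))  # True
```

This is a sketch, not a complete solution: it does not handle case differences, and matching a needle that stops just before a combining mark in the text can still give surprising results.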


Denis
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
