Jon Crump wrote:
> On Wed, 4 Jul 2007, Kent Johnson wrote:
>> First, don't confuse unicode and utf-8.
>
> Too late ;-) already pitifully confused.
This is a good place to start correcting that:
http://www.joelonsoftware.com/articles/Unicode.html

>> Second, convert the string to unicode and then title-case it, then
>> convert back to utf-8 if you need to:
>
> I'm having trouble figuring out where, in the context of my code, to
> effect these translations.

If s is your utf-8 string, then instead of s.title(), use
s.decode('utf-8').title().encode('utf-8')

> In parsing the text file, I depend on
> matching a re:
>
> if re.match(r'[A-Z]{2,}', line)
>
> to identify and process the place name data. If I translate the line to
> unicode, the re fails.

I don't know why that is; re works with unicode strings:

In [1]: import re

In [2]: re.match(r'[A-Z]{2,}', 'ABC')
Out[2]: <_sre.SRE_Match object at 0x12078e0>

In [3]: re.match(r'[A-Z]{2,}', u'ABC')
Out[3]: <_sre.SRE_Match object at 0x11c1f00>

Kent
_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
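For readers on Python 3, here is a minimal sketch of Kent's decode, title-case, re-encode advice and of the regex point. It assumes the line arrives as UTF-8 bytes, which is the Python 3 counterpart of a Python 2 utf-8 str (Python 3's str plays the role of Python 2's unicode). The helper title_utf8 and the place names are hypothetical, not from the original thread. The sketch also shows why title-casing the raw bytes goes wrong, and notes that [A-Z] only matches ASCII capitals.

```python
import re


def title_utf8(data: bytes) -> bytes:
    """Decode UTF-8 bytes, title-case the text, and re-encode it as UTF-8."""
    return data.decode('utf-8').title().encode('utf-8')


# Hypothetical place-name line, held as UTF-8 bytes as if read from a file.
raw = 'SAINT-ÉTIENNE'.encode('utf-8')

# Title-casing the bytes directly: bytes.title() only recognises ASCII
# letters, so the two bytes that encode 'É' count as non-letters and the
# 'T' after them is wrongly treated as the start of a new word.
naive = raw.title().decode('utf-8')

# Decoding first lets str.title() treat 'É' as a letter.
fixed = title_utf8(raw).decode('utf-8')

print(naive)  # Saint-ÉTienne
print(fixed)  # Saint-Étienne

# The same regex works on the decoded text (a str).
line = raw.decode('utf-8')
match = re.match(r'[A-Z]{2,}', line)
print(match.group())  # SAINT

# [A-Z] is ASCII-only, so a (hypothetical) name that begins with an
# accented capital does not match at position 0.
print(re.match(r'[A-Z]{2,}', 'ÉTAMPES'))  # None
```

Decoding once when the line is read and encoding once when it is written keeps every string operation in between (title(), re.match()) working on text rather than on bytes.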