On Oct 15, 10:57 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Oct 16, 2:33 am, Peter Bengtsson <[EMAIL PROTECTED]> wrote:
>
> > In UTF8, \u0141 is a capital L with a little dash through it as can be
> > seen in this image: http://static.peterbe.com/lukasz.png
>
> > I tried this:
> > >>> import unicodedata
> > >>> unicodedata.normalize('NFKD', u'\u0141').encode('ascii','ignore')
> > ''
>
> > I was hoping it would convert it to 'L' because that's what it
> > visually looks like. And I've seen it becoming a normal ascii L before
> > in other programs such as Thunderbird.
>
> > I also tried the other forms: 'NFC', 'NFKC', 'NFD', and 'NFKD' but
> > none of them helped.
>
> > What am I doing wrong?
>
> The character in question is NOT composed (in the way that Unicode
> means) of an 'L' and a little slash; hence the concepts of
> "normalization" and "decomposition" don't apply.
>
> To "asciify" such text, you need to build a look-up table that suits
> your purpose. unicodedata.decomposition() is (accidentally) useful in
> providing *some* of the entries for such a table.
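A minimal sketch of the look-up-table approach described in the reply, written for Python 3.7+ (the thread itself uses Python 2 syntax such as `u'\u0141'`). It assumes a hand-built table for characters with no Unicode decomposition (U+0141 among them) and falls back to `unicodedata.decomposition()` to derive a base letter for the rest, which is the "accidentally useful" source of *some* entries. The names `MANUAL_TABLE`, `base_from_decomposition` and `asciify` are hypothetical, and the table entries are illustrative only.

```python
import unicodedata

# Hypothetical, hand-built table for characters that carry no Unicode
# decomposition, so normalization cannot reduce them to ASCII.
# Extend it with whatever mappings suit your purpose.
MANUAL_TABLE = {
    "\u0141": "L",   # LATIN CAPITAL LETTER L WITH STROKE (the character in question)
    "\u0142": "l",   # LATIN SMALL LETTER L WITH STROKE
    "\u00d8": "O",   # LATIN CAPITAL LETTER O WITH STROKE
    "\u00f8": "o",   # LATIN SMALL LETTER O WITH STROKE
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
}


def base_from_decomposition(ch):
    """Return an ASCII base character derived from unicodedata.decomposition(),
    or None when the character has no usable decomposition."""
    decomp = unicodedata.decomposition(ch)  # '0065 0301' for 'é', '' for 'Ł'
    if not decomp:
        return None
    parts = decomp.split()
    if parts[0].startswith("<"):  # drop a compatibility tag such as '<compat>'
        parts = parts[1:]
    if not parts:
        return None
    base = chr(int(parts[0], 16))
    if base.isascii():
        return base
    # The first component may itself decompose further, e.g. U+01D5 -> U+00DC -> 'U'.
    return base_from_decomposition(base)


def asciify(text):
    out = []
    for ch in text:
        if ch.isascii():
            out.append(ch)
        elif ch in MANUAL_TABLE:
            out.append(MANUAL_TABLE[ch])
        else:
            # Characters with neither a table entry nor an ASCII base are
            # dropped, mirroring encode('ascii', 'ignore').
            out.append(base_from_decomposition(ch) or "")
    return "".join(out)


print(repr(unicodedata.decomposition("\u0141")))  # '' (nothing to decompose)
print(asciify("\u0141ukasz"))                     # Lukasz
print(asciify("caf\u00e9 \u00d8"))                # cafe O
```

The manual table is consulted before the decomposition fallback so that hand-chosen mappings (such as "ss" for U+00DF) always win over whatever the Unicode data would suggest.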
Thank you! That explains it.

-- 
http://mail.python.org/mailman/listinfo/python-list