Terry J. Reedy <tjre...@udel.edu> added the comment:

You are doing two different things to the original string: normalizing and 
encoding to ascii with errors ignored. Each should be tested separately.
On 3.2:
import unicodedata
s1 = "üfürükçü ağaç ve ıslıkçı çeşme"
s2 = unicodedata.normalize('NFKD', s1)
print(s2)
print(s2.encode('ascii','ignore'))

#prints
üfürükçü ağaç ve ıslıkçı çeşme
b'ufurukcu agac ve slkc cesme'

The dotless i (==  '\u0131') in s2 does not encode to ascii and is properly 
dropped when the error is ignored.
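
As a rough sketch of testing the two steps separately (the names `dropped` and `lost_letters` are only illustrative), one can list which non-ASCII code points remain after normalization and check which of them are not combining marks:

```python
import unicodedata

s1 = "üfürükçü ağaç ve ıslıkçı çeşme"

# Step 1: normalization only. Accented letters become base letter + combining mark.
s2 = unicodedata.normalize("NFKD", s1)

# Step 2 would be s2.encode('ascii', 'ignore'), which silently drops every
# non-ASCII code point. Collect those code points here instead of dropping them.
dropped = [c for c in s2 if ord(c) > 127]

# Combining marks are the expected losses; anything else is a whole letter lost.
lost_letters = {c for c in dropped if not unicodedata.combining(c)}

print(sorted(unicodedata.name(c) for c in lost_letters))
```

On the sample string this should report only LATIN SMALL LETTER DOTLESS I, which matches the missing letters in the bytes output above.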

I believe you are mistaken to think that unicodedata.normalize *should* turn 
the Turkish letter "ı" == "\u0131" into "i". unicodedata.decomposition("ı") 
returns an empty string, as it should (see below), because that character has 
no decomposition mapping in Unicode 6. So I am closing this issue as invalid.
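
A small check of that claim, using "ü" (U+00FC) as a contrasting character that does have a decomposition; the variable names are only for illustration:

```python
import unicodedata

dotless_i = "\u0131"  # LATIN SMALL LETTER DOTLESS I
u_umlaut = "\u00fc"   # LATIN SMALL LETTER U WITH DIAERESIS

# An empty string means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(dotless_i)))

# A character with a canonical decomposition lists its component code points.
print(repr(unicodedata.decomposition(u_umlaut)))
```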

Here is the entry from
http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt
0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049
That is explained here
http://www.unicode.org/reports/tr44/tr44-6.html#UnicodeData.txt
The blank after 'L' (bidi class - left to right) is for decomposition type and 
mapping. There is none, so unicodedata.decomposition is correct. The last three 
entries are for uppercase, lowercase, and titlecase conversions. Those are 
different from normalizations.
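
To make the field layout concrete, here is a sketch that splits the entry above into named fields. The field labels are my own shorthand for the 15 columns described in tr44, not identifiers from any library:

```python
record = "0131;LATIN SMALL LETTER DOTLESS I;Ll;0;L;;;;;N;;;0049;;0049"

# Shorthand labels (an assumption for readability) for the UnicodeData.txt columns.
field_names = [
    "code", "name", "general_category", "combining_class", "bidi_class",
    "decomposition", "decimal", "digit", "numeric", "mirrored",
    "unicode_1_name", "iso_comment", "uppercase", "lowercase", "titlecase",
]
fields = dict(zip(field_names, record.split(";")))

print(repr(fields["decomposition"]))          # blank: no decomposition
print(fields["uppercase"], fields["titlecase"])  # case mappings, not normalization

# The case mapping is what str.upper() uses; normalize() never consults it.
print("\u0131".upper())
```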

To reinforce this,
http://www.unicode.org/Public/6.0.0/ucd/NormalizationTest.txt
says explicitly
"@Part1 # Character by character test
# All characters not explicitly occurring in c1 of Part 1 have identical NFC, 
D, KC, KD forms."
'c1' is the first column (columns are numbered from 1).
In Part 1 of this list, 0130 is followed by 0132, omitting 0131, so the line 
above applies: 0131 is unchanged by all four normalization forms.
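
The same thing can be checked directly. This sketch compares 0131 with its neighbour 0130, which does appear in column 1 of the test file:

```python
import unicodedata

forms = ("NFC", "NFD", "NFKC", "NFKD")

dotless_i = "\u0131"      # omitted from c1, so all four forms should be identical
dotted_capital = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE, listed in c1

unchanged = all(unicodedata.normalize(f, dotless_i) == dotless_i for f in forms)
print(unchanged)

# For contrast, 0130 decomposes canonically to "I" plus COMBINING DOT ABOVE.
nfd_0130 = unicodedata.normalize("NFD", dotted_capital)
print([hex(ord(c)) for c in nfd_0130])
```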

After writing this, I discovered that Lib/test/test_normalization.py runs the 
complete test specified in NormalizationTest.txt, covering code points both 
with and without distinct normalization forms.

Side note: Python 2.6 is in security-fix-only mode.

----------
nosy: +terry.reedy
resolution:  -> invalid
status: open -> closed
versions: +Python 2.7, Python 3.2 -Python 2.6

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12846>
_______________________________________