New submission from Pekka Klärck <pekka.kla...@gmail.com>:

If I have two strings that look the same but are in different Unicode
normalization forms, it's very hard to see where the problem actually is:

>>> a = 'hyv\xe4'
>>> b = 'hyva\u0308'
>>> print(a)
hyvä
>>> print(b)
hyvä
>>> a == b
False
>>> print(repr(a))
'hyvä'
>>> print(repr(b))
'hyvä'
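
For illustration, here is a minimal sketch of how the difference can be made
visible with tools that already exist: the built-in `ascii()` and the standard
`unicodedata` module. The variable names are only for this example.

```python
import unicodedata

a = 'hyv\xe4'      # NFC: precomposed U+00E4
b = 'hyva\u0308'   # NFD: 'a' followed by U+0308 COMBINING DIAERESIS

# ascii() escapes every non-ASCII character, so the two forms differ visibly.
ascii_a = ascii(a)
ascii_b = ascii(b)
print(ascii_a)
print(ascii_b)

# Normalizing both strings to the same form makes them compare equal.
same_after_nfc = (unicodedata.normalize('NFC', a)
                  == unicodedata.normalize('NFC', b))
print(same_after_nfc)

# Listing the code point names shows exactly where the strings diverge.
names_b = [unicodedata.name(ch) for ch in b]
print(names_b)
```

This helps when debugging by hand, but it does not change what `repr()` based
error reports show.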

This affects, among other things, test automation frameworks that use `repr()`
in error reporting. For instance, both unittest and pytest report
`self.assertEqual('hyv\xe4', 'hyva\u0308')` like this:

AssertionError: 'hyvä' != 'hyvä'
- hyvä
+ hyvä

Because the NFC form is what strings normally use, I propose that `repr()`
show the decomposed form if the string is in NFD. In practice I'd like
`repr('hyva\u0308')` to yield `'hyva\u0308'`.
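
A rough sketch of the desired behaviour, written as a stand-alone helper. The
name `repr_escaping_combining` is hypothetical, and the rule it applies is an
assumption made for this example: it escapes every combining character rather
than first checking whether the whole string is in NFD.

```python
import unicodedata


def repr_escaping_combining(s):
    """Hypothetical helper: like repr(), but with combining marks escaped."""
    out = []
    # Start from the normal repr() so quoting and existing escapes are kept.
    for ch in repr(s):
        if unicodedata.combining(ch):
            # Non-zero combining class: show the character as an escape.
            cp = ord(ch)
            out.append('\\u%04x' % cp if cp <= 0xFFFF else '\\U%08x' % cp)
        else:
            out.append(ch)
    return ''.join(out)


print(repr_escaping_combining('hyva\u0308'))
print(repr_escaping_combining('hyv\xe4'))
```

With this, the decomposed string is shown with an explicit `\u0308` escape,
while the precomposed one is shown as `repr()` shows it today.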

----------
messages: 315504
nosy: pekka.klarck
priority: normal
severity: normal
status: open
title: `repr()` of string in NFC and NFD forms does not differ

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33317>
_______________________________________