[Python-Dev] Re: Preventing Unicode-related gotchas (Was: pre-PEP: Unicode Security Considerations for Python)

Jim J. Jewett Tue, 16 Nov 2021 15:52:32 -0800

Compatibility variants can look different, but they can also look identical.  
Allowing any non-ASCII characters was worrisome because of the security 
implications of confusables.  Squashing compatibility characters seemed the 
more conservative choice at the time.  Stestagg's example:
    е = lambda е, e: е if е > e else e
shows it wasn't perfect, but adding more invisible differences does have risks, 
even beyond the backwards incompatibility and the problem of editors (hopefully 
rare, but are we sure?) that don't distinguish between such characters in the 
way a programming language would prefer.
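
As a rough sketch of that distinction, assuming CPython 3's documented 
behaviour of normalizing identifiers to NFKC (PEP 3131): compatibility 
variants get squashed, while a cross-script confusable such as Cyrillic 
U+0435 does not.  The constant and helper names below are only illustrative.

```python
import unicodedata

CYRILLIC_IE = "\u0435"   # CYRILLIC SMALL LETTER IE, renders like Latin "e"
LATIN_E = "e"
FULLWIDTH_E = "\uff45"   # FULLWIDTH LATIN SMALL LETTER E, a compatibility variant


def nfkc(text):
    """Return text in NFKC, the form Python applies to identifiers."""
    return unicodedata.normalize("NFKC", text)


# A compatibility variant is folded into the plain ASCII letter ...
same_after_nfkc = nfkc(FULLWIDTH_E) == LATIN_E

# ... but a confusable from another script is left alone.
confusable_survives = nfkc(CYRILLIC_IE) != LATIN_E

namespace = {}
# U+207F (superscript n) is folded to "n", so this binds the name "xn".
exec("x\u207f = 8", namespace)
# The Cyrillic and Latin letters stay two separate names.
exec(f"{CYRILLIC_IE} = 1\n{LATIN_E} = 2", namespace)
```

After this runs, namespace holds "xn" plus two distinct one-letter names that 
most fonts render identically, which is why Stestagg's lambda is legal and 
behaves like max().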


I think (but won't swear) that there were also several problematic characters 
that really should have been treated as (at most) glyph variants, but ... 
weren't.  If I recall correctly, the largest number were Arabic presentation 
forms, but there were also a few characters that were in Unicode only to 
support round-trip conversion with a legacy charset, even if that charset had 
been declared buggy.  In at least a few of these cases, it seemed likely that a 
beginning user would expect them to be equivalent.
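
To make the two categories concrete, here is a small sketch using the 
standard unicodedata module.  The specific code points are my own picks for 
illustration (an Arabic presentation form and the Kelvin sign, which exists 
alongside the ordinary letter K for compatibility with older character sets); 
they are not necessarily the characters discussed at the time.

```python
import unicodedata

ALEF_ISOLATED = "\ufe8d"   # ARABIC LETTER ALEF ISOLATED FORM (presentation form)
ALEF = "\u0627"            # ARABIC LETTER ALEF
KELVIN_SIGN = "\u212a"     # KELVIN SIGN
LATIN_K = "K"

# The presentation form carries a compatibility decomposition tagged
# "<isolated>", so NFKC folds it into the ordinary letter.
alef_decomposition = unicodedata.decomposition(ALEF_ISOLATED)
alef_folded = unicodedata.normalize("NFKC", ALEF_ISOLATED)

# The Kelvin sign has a canonical (untagged) decomposition to "K", so even
# the milder NFC form already treats the two as the same character.
kelvin_decomposition = unicodedata.decomposition(KELVIN_SIGN)
kelvin_folded = unicodedata.normalize("NFC", KELVIN_SIGN)
```

In both cases the normalized identifier is the one a beginner would probably 
expect to have typed.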

-jJ
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/GNT3AG2SCVLMCJAZXSTIWFKKAYG25E7O/
Code of Conduct: http://python.org/psf/codeofconduct/
