Terry J. Reedy <tjre...@udel.edu> added the comment:

'Reserved words' include all double underscore words, like __reserved__.  Using 
such is allowed, but we reserve the right to break such code by adding a use 
for the word.  'def' is a keyword.  Using identifier normalization to smuggle 
keywords into compiled code is a clever hack.  But I am not sure that there is 
an actionable bug anywhere.  
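
To make the normalization step concrete, here is a minimal sketch using only the standard library.  The variable names are chosen for illustration; the three characters are the mathematical double-struck letters used later in this comment.

```python
import keyword
import unicodedata

# MATHEMATICAL DOUBLE-STRUCK SMALL D, E, F (U+1D555..U+1D557),
# written as escapes so the example survives any file encoding.
lookalike = "\U0001D555\U0001D556\U0001D557"

# NFKC folds the compatibility characters down to plain ASCII letters.
normalized = unicodedata.normalize("NFKC", lookalike)

# The raw spelling is not in the keyword list, but its normal form is.
raw_is_keyword = keyword.iskeyword(lookalike)          # False
normalized_is_keyword = keyword.iskeyword(normalized)  # True

print(repr(normalized), raw_is_keyword, normalized_is_keyword)
```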

The Unicode normalization rules are not defined by us.  Changing how we use 
them or creating a custom normalization form is not to be done lightly.

Should ast.parse raise?  The effect is the same as "globals()['𝕕𝕖𝕗']=1" (which 
is the same as passing 'def' or anything else that normalizes to it) and that 
in turn allows ">>> 𝕕𝕖𝕗", which returns 1.  Should such identifiers be outlawed?
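
The behavior in question can be reproduced with a short sketch like the one below.  It assumes a CPython that behaves as described in this issue (keyword check before normalization); the helper `reparses` and the other names are made up for the demonstration.

```python
import ast

# Same three double-struck letters as above, followed by an assignment.
source = "\U0001D555\U0001D556\U0001D557 = 1"

# The parser accepts the spelling as a NAME and stores its NFKC form.
tree = ast.parse(source)
stored_name = tree.body[0].targets[0].id

# unparse emits the stored (normalized) name, which is now a keyword.
regenerated = ast.unparse(tree)


def reparses(code):
    """Return True if code can be parsed again, False on SyntaxError."""
    try:
        ast.parse(code)
    except SyntaxError:
        return False
    return True


# Running the tree binds the name 'def' in the namespace, just as
# namespace['def'] = 1 would.
namespace = {}
exec(compile(tree, "<demo>", "exec"), namespace)

print(stored_name, repr(regenerated), reparses(regenerated), namespace["def"])
```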

https://docs.python.org/3/reference/lexical_analysis.html#identifiers says "All 
identifiers are converted into the normal form NFKC while parsing; comparison 
of identifiers is based on NFKC."  This does not say when an identifier is 
compared to the keyword set, before or after normalization.  Currently it is 
before.  Changing this to after could be considered a backwards-incompatible 
feature change that would require a deprecation period with syntax warnings.  
(Do other implementations also compare before normalization?)

Batuhan already quoted https://docs.python.org/3/library/ast.html#ast.unparse 
and I mostly agree with his comments.  The "would produce" part is contingent 
upon the result having no syntax errors, and that cannot be guaranteed.  What 
could be done is to check every identifier against keywords and change the 
first character to a chosen NFKD equivalent.  Although 'fixing' the ast this 
way would make unparse succeed in this case, there are other fixes that might 
also be suggested for the same reason. 

Until this is done in CPython, anyone who cares could write an AST visitor to 
make the same change before calling unparse.  Example code could be attached to 
this issue.
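
As a starting point, such a visitor might look like the sketch below.  It is not CPython code and not a complete solution: the names `disguise`, `KeywordEscaper`, `safe_unparse` and `IDENTIFIER_FIELDS` are invented here, the choice of fullwidth letters as the replacement is just one possible convention, and only a handful of node types are covered (import aliases, global/nonlocal names and pattern captures, for example, are not).

```python
import ast
import keyword
import unicodedata

# Fullwidth Latin letters sit at a fixed offset from their ASCII forms,
# and NFKC maps them back to ASCII.  Using them is an arbitrary choice.
FULLWIDTH_OFFSET = 0xFEE0

# Node types handled by this sketch and the field holding the identifier.
IDENTIFIER_FIELDS = {
    ast.Name: "id",
    ast.Attribute: "attr",
    ast.arg: "arg",
    ast.keyword: "arg",
    ast.FunctionDef: "name",
    ast.AsyncFunctionDef: "name",
    ast.ClassDef: "name",
}


def disguise(name):
    """Return name unchanged unless it is a hard keyword; otherwise
    replace its first letter with the fullwidth form."""
    if not keyword.iskeyword(name):
        return name
    candidate = chr(ord(name[0]) + FULLWIDTH_OFFSET) + name[1:]
    # The replacement must normalize back to the original identifier.
    if unicodedata.normalize("NFKC", candidate) != name:
        raise ValueError(f"no fullwidth spelling for {name!r}")
    return candidate


class KeywordEscaper(ast.NodeTransformer):
    """Rewrite keyword-valued identifiers so unparse output can be parsed."""

    def generic_visit(self, node):
        super().generic_visit(node)  # handle children first
        field = IDENTIFIER_FIELDS.get(type(node))
        if field is not None:
            value = getattr(node, field)
            if isinstance(value, str):  # keyword.arg may be None
                setattr(node, field, disguise(value))
        return node


def safe_unparse(tree):
    """Unparse tree after escaping keywords.  Mutates tree in place."""
    return ast.unparse(KeywordEscaper().visit(tree))


if __name__ == "__main__":
    tree = ast.parse("\U0001D555\U0001D556\U0001D557 = 1")
    code = safe_unparse(tree)
    print(ascii(code))
    ast.parse(code)  # should not raise if the escape worked
```

The output is only equivalent to the original after normalization, so anyone comparing source text byte for byte will still see a difference.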

----------
nosy: +terry.reedy
title: `ast.unparse` produces syntactically illegal code for identifiers that 
look like reserved words -> ast.unparse produces bad code for identifiers that 
become keywords

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue46520>
_______________________________________