New submission from Matt Bachmann:

PEP 3131 changed the definition of valid identifiers to match this pattern:

<XID_Start> <XID_Continue>*

Currently, if an identifier contains an invalid character, you get this error:

☺ = 4
SyntaxError: invalid character in identifier


This is fine in most cases. But in some cases the problem is not that the 
character is invalid, but that it may not be used to START the identifier. 
One example is the "combining grave accent" (U+0300), which is an 
XID_CONTINUE character but not an XID_START character.

So ̀e is an invalid identifier but è is a valid identifier. So the ̀ character 
is not invalid in all cases.
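
The distinction can be checked from Python itself. The following is only a 
rough sketch: classify() is a hypothetical helper, not part of the patch, and 
it relies on str.isidentifier(), which accepts "_" as a start character and 
does not apply the NFKC normalization the parser performs.

```python
def classify(ch):
    """Roughly classify one character as 'start', 'continue' or 'invalid'.

    Hypothetical helper for illustration; it is not taken from the patch.
    """
    if ch.isidentifier():
        # Usable as the first character of an identifier.
        return "start"
    if ("a" + ch).isidentifier():
        # Only usable after a valid start character.
        return "continue"
    return "invalid"


# U+0300 is the combining grave accent, U+263A is the white smiling face.
for ch in ("e", "\u0300", "\u263a"):
    print("U+%04X" % ord(ch), classify(ch))
```

Under these assumptions the combining grave accent is reported as a 
continue-only character, while the smiley is rejected in every position.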

The attached patch attempts to clarify this by providing a different error when 
the start character is invalid.

>>> ̀e = 4
  File "<stdin>", line 1
    ̀e = 4
     ^
SyntaxError: invalid start character in identifier

However, if the character is simply not allowed (it is neither an XID_START 
nor an XID_CONTINUE character), the original error is used:
>>> ☺smile = 4
  File "<stdin>", line 1
    ☺smile = 4
         ^
SyntaxError: invalid character in identifier
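
The choice between the two messages can be sketched in Python as follows. 
This is an illustration of the intended behaviour only, not the code in the 
attached patch; identifier_error() is a hypothetical name, and the sketch 
ignores tokenizer details (for example, a leading digit starts a number 
token rather than an identifier in the real tokenizer).

```python
def identifier_error(name):
    """Return the proposed error message for name, or None if it is valid.

    Hypothetical helper mirroring the behaviour described above.
    """
    if not name:
        raise ValueError("name must not be empty")
    if name.isidentifier():
        return None
    first = name[0]
    # The first character is acceptable later in an identifier,
    # just not in the first position.
    if not first.isidentifier() and ("a" + first).isidentifier():
        return "invalid start character in identifier"
    # Otherwise fall back to the original, generic message.
    return "invalid character in identifier"


print(identifier_error("\u0300e"))      # combining grave accent first
print(identifier_error("\u263asmile"))  # smiley is never allowed
print(identifier_error("e\u0300"))      # valid identifier
```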

----------
components: Unicode
files: clarify_unicode_identifier_errors.patch
keywords: patch
messages: 234222
nosy: Matt.Bachmann, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: Python 3 gives misleading errors when validating unicode identifiers
type: enhancement
versions: Python 3.2, Python 3.3, Python 3.4, Python 3.5, Python 3.6
Added file: 
http://bugs.python.org/file37755/clarify_unicode_identifier_errors.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23263>
_______________________________________