[issue40678] Full list of Python lexical rules

Terry J. Reedy Sat, 23 May 2020 00:20:30 -0700


Terry J. Reedy <[email protected]> added the comment:


First, note that the 3.8.3 grammar.html is stated to be the actual grammar 
used by the old parser, and it differs somewhat from the more human-readable 
grammar given in the reference manual.  The grammar is a bit different in 3.9, 
and I expect it will differ much more in 3.10 with the new PEG parser. 

In the grammar, the CAPITALIZED_NAMES are token names returned by the 
tokenizer/lexer.  This is a standard convention.  
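
To see such token names, here is a minimal sketch using the pure-Python 
tokenize module as a stand-in for the C tokenizer.  The helper name 
token_names is made up for illustration, and the sample line is arbitrary.

```python
import io
import tokenize


def token_names(source):
    """Return the token-type name of each token tokenize yields for source."""
    # generate_tokens() expects a readline callable that returns str lines.
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# Print the capitalized token names for one short assignment statement.
print(token_names("x = 1.5e3\n"))
```

The names printed are of the same kind as the capitalized terminals in the 
grammar file, for example NAME and NUMBER.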

I am pretty sure that the human-readable lexing rules in lexical_analysis are 
not what the lexer uses.  I presume the latter uses barely readable regular 
expressions, as the tokenize module does.

Compare the float grammar in 
https://docs.python.org/3/reference/lexical_analysis.html#floating-point-literals
to the float REs in tokenize.py:

def group(*choices): return '(' + '|'.join(choices) + ')'
def maybe(*choices): return group(*choices) + '?'
# The above are reused for multiple REs.
Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
Floatnumber = group(Pointfloat, Expfloat)

Note that this is (Python) code, not a text specification.  You or someone 
else can look at what the C lexer does.  But I think that the proposal should 
be rejected.
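
For anyone who wants to experiment, the following sketch rebuilds the float 
REs quoted above and checks a few literals against them.  The helper 
is_float_literal and the sample strings are my own additions for 
illustration; this exercises the tokenize.py patterns only and says nothing 
about what the C lexer does.

```python
import re


def group(*choices):
    return '(' + '|'.join(choices) + ')'


def maybe(*choices):
    return group(*choices) + '?'


# Same definitions as quoted from tokenize.py above.
Exponent = r'[eE][-+]?[0-9](?:_?[0-9])*'
Pointfloat = group(r'[0-9](?:_?[0-9])*\.(?:[0-9](?:_?[0-9])*)?',
                   r'\.[0-9](?:_?[0-9])*') + maybe(Exponent)
Expfloat = r'[0-9](?:_?[0-9])*' + Exponent
Floatnumber = group(Pointfloat, Expfloat)


def is_float_literal(text):
    """Hypothetical helper: True if the whole string matches Floatnumber."""
    return re.fullmatch(Floatnumber, text) is not None


# A few sample strings, chosen only to exercise each alternative.
for sample in ['3.14', '10.', '.001', '1e100', '3.14_15e-10',
               '1', '1__0.0', '1_.0', 'e5']:
    print(repr(sample), is_float_literal(sample))
```

Plain integers such as '1' and misplaced underscores such as '1__0.0' are 
not matched, which agrees with the rules in the reference manual.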

----------
nosy: +terry.reedy

_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue40678>
_______________________________________