New submission from Ammar Askar <am...@ammaraskar.com>:

As was pointed out in https://bugs.python.org/issue33766, there is an edge case
in the C tokenizer where it implicitly treats the end of input as a newline.
The tokenize module in the stdlib does not mirror the C code's behavior in
this case: it emits no NEWLINE token when the input lacks a trailing newline.

tokenizer.c:

  ~/cpython $ echo -n 'x' | ./python
  ----------
  NAME ("x")
  NEWLINE
  ENDMARKER

tokenize module:

  ~/cpython $ echo -n 'x' | ./python -m tokenize
  1,0-1,1:            NAME           'x'            
  2,0-2,0:            ENDMARKER      ''
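The same comparison can be made from Python directly. Below is a minimal sketch, assuming only the stdlib `io` and `tokenize` modules. `tokens_for` lists the tokens that `tokenize` produces. `with_implicit_newline` is a hypothetical helper (not part of the stdlib) that post-processes the stream to mirror the C tokenizer. If the last logical line was never terminated, it inserts an empty NEWLINE token before any trailing DEDENT/ENDMARKER tokens. The position given to the synthetic token is an assumption.

```python
from __future__ import annotations

import io
import tokenize


def tokens_for(source: str) -> list[tokenize.TokenInfo]:
    """Tokenize `source` (a str) with the stdlib tokenize module."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def with_implicit_newline(tokens: list[tokenize.TokenInfo]) -> list[tokenize.TokenInfo]:
    """Hypothetical helper: treat end of input as a newline, like the C tokenizer.

    If the token just before the trailing DEDENT/ENDMARKER run is not a
    NEWLINE or NL, insert an empty NEWLINE token there.
    """
    # Walk back over the trailing run of DEDENT/ENDMARKER tokens.
    cut = len(tokens)
    while cut > 0 and tokens[cut - 1].type in (tokenize.DEDENT, tokenize.ENDMARKER):
        cut -= 1
    # Empty input, or the last line was already terminated: nothing to do.
    if cut == 0 or tokens[cut - 1].type in (tokenize.NEWLINE, tokenize.NL):
        return tokens
    last = tokens[cut - 1]
    # Assumed position: a zero-width token at the end of the last real token.
    newline = tokenize.TokenInfo(tokenize.NEWLINE, "", last.end, last.end, last.line)
    return tokens[:cut] + [newline] + tokens[cut:]


def names(tokens: list[tokenize.TokenInfo]) -> list[str]:
    """Map tokens to their type names, e.g. 'NAME', 'NEWLINE'."""
    return [tokenize.tok_name[tok.type] for tok in tokens]


if __name__ == "__main__":
    print(names(tokens_for("x")))                         # what tokenize emits
    print(names(with_implicit_newline(tokens_for("x"))))  # C-tokenizer-like stream
```

If a given `tokenize` already emits the implicit NEWLINE, the helper returns the stream unchanged. That makes it safe to apply regardless of which behavior the interpreter has.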

The instrumentation that makes the C tokenizer dump its tokens is my own; I can
provide a diff that produces this output if needed.

----------
assignee: ammar2
components: Library (Lib)
messages: 319934
nosy: ammar2
priority: normal
severity: normal
status: open
title: Tokenize module does not mirror "end-of-input" is newline behavior
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33899>
_______________________________________