New submission from Ammar Askar <am...@ammaraskar.com>:
As was pointed out in https://bugs.python.org/issue33766, there is an edge case in the tokenizer in which it implicitly treats the end of input as a newline. The tokenize module in the stdlib does not mirror the C code's behavior in this case.

tokenizer.c:

    ~/cpython $ echo -n 'x' | ./python
    ----------
    NAME ("x")
    NEWLINE
    ENDMARKER

tokenize module:

    ~/cpython $ echo -n 'x' | ./python -m tokenize
    1,0-1,1: NAME 'x'
    2,0-2,0: ENDMARKER ''

The instrumentation that makes the C tokenizer dump its tokens is mine; I can provide a diff to produce that output if needed.

----------
assignee: ammar2
components: Library (Lib)
messages: 319934
nosy: ammar2
priority: normal
severity: normal
status: open
title: Tokenize module does not mirror "end-of-input" is newline behavior
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33899>
_______________________________________
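The tokenize-module side of the comparison in this report can also be reproduced from Python code instead of via `python -m tokenize`. The sketch below is a minimal illustration that assumes only the stdlib `io.StringIO`, `tokenize.generate_tokens` and `token.tok_name`; the helper name `token_names` is hypothetical. Which token list you get for input without a trailing newline may depend on the Python version you run it under. The report shows the tokenize module emitting NAME followed directly by ENDMARKER, while the C tokenizer emits a NEWLINE in between.

```python
import io
import tokenize
from token import tok_name


def token_names(source):
    """Return the token type names that the tokenize module emits for source.

    Hypothetical helper for illustration: it feeds a str to
    tokenize.generate_tokens through an in-memory readline.
    """
    readline = io.StringIO(source).readline
    return [tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Equivalent of `echo -n 'x'`: the input has no trailing newline.
print("without newline:", token_names("x"))
# An explicit trailing newline for comparison; this always yields a NEWLINE token.
print("with newline:   ", token_names("x\n"))
```

Comparing the two printed lists shows whether the running interpreter's tokenize module synthesizes a NEWLINE at end of input the way the C tokenizer does.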