I am creating this post as a courtesy to anyone interested in python's tokenize 
module.

**tl;dr:** Various posts, linked below, discuss a much better replacement for 
untokenize.  Do with it as you will.

This code is very unlikely to be buggy, but *please* let me know if you find 
problems with it.

**About the new untokenize**

This post: https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/VPqtB9lTEAAJ
announces a replacement for the untokenize function in tokenize.py: 
https://github.com/python/cpython/blob/3.8/Lib/tokenize.py

To summarize this post:

I have "discovered" a spectacular replacement for Untokenizer.untokenize in 
python's tokenize library module:

- The wretched, buggy, and impossible-to-fix add_whitespace method is gone.
- The new code has no significant 'if' statements, and knows almost nothing 
about tokens!

As I see it, the only possible failure modes involve the zero-length line 0.  
See the above post for a full discussion.
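
Purely for flavor, here is a minimal sketch of the central idea as I understand 
it from that post.  The names are mine, and the sketch assumes the original 
source is available (see the linked post for the real code): convert each 
token's (row, col) coordinates to absolute offsets, then join the tokens with 
the verbatim gaps between them, so nothing is ever guessed.

```python
import io
import tokenize

def untokenize_sketch(contents):
    """Sketch only: reconstruct `contents` from its own tokens.

    Token (row, col) coordinates become absolute offsets into
    `contents`, and the text *between* tokens is copied verbatim,
    so no whitespace is ever synthesized.
    """
    # offsets[i] is the absolute offset of the start of line i + 1.
    offsets = [0]
    for line in contents.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))
    results, prev_end = [], 0
    # generate_tokens (string mode) emits no ENCODING token at row 0,
    # sidestepping the "line 0" corner mentioned above.
    for tok in tokenize.generate_tokens(io.StringIO(contents).readline):
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        results.append(contents[prev_end:start])  # the gap, verbatim
        results.append(contents[start:end])       # the token, verbatim
        prev_end = end
    return ''.join(results)

# Round-trip: the output is the input, character for character.
source = "def f(a,  b):\n    return a+b  # trailing comment\n"
assert untokenize_sketch(source) == source
```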

**Testing**

This post: https://groups.google.com/d/msg/leo-editor/DpZ2cMS03WE/5X8IDzpgEAAJ 
discusses testing issues.
Imo, the new code should easily pass all existing unit tests.

The new code also passes a new unit test for Python issue 38663: 
https://bugs.python.org/issue38663,
a test the existing code fails, even in "compatibility mode" (2-tuples).
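
For readers who haven't followed the links, the test has roughly this shape.  
This is only an illustration of the pattern, not the actual test from the 
issue; the program that trips the existing code is given there.

```python
import io
import tokenize
import unittest

class TestRoundTrip(unittest.TestCase):
    """Illustration only, not the actual test for issue 38663:
    given full five-tuples, untokenize should reproduce the input
    source character for character."""

    def check(self, source):
        tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
        self.assertEqual(tokenize.untokenize(tokens), source)

    def test_round_trip(self):
        self.check("x = 1  +  2  # comment\n")

if __name__ == '__main__':
    unittest.main()
```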

Imo, the way is now clear for proper unit testing of python's Untokenizer class.

In particular, it is, imo, time to remove compatibility mode.  This hack has 
masked serious issues with untokenize:
https://bugs.python.org/issue?%40columns=id%2Cactivity%2Ctitle%2Ccreator%2Cassignee%2Cstatus%2Ctype&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file%3Acontent&%40search_text=untokenize&submit=search&status=-1%2C1%2C2%2C3
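
For anyone who hasn't met compatibility mode: when untokenize is given 
2-tuples, all position information is gone, so it must synthesize whitespace.  
A quick illustration (the exact synthesized spacing may vary between Python 
versions):

```python
import io
import tokenize

source = "x = 1  +  2\n"
tokens = tokenize.generate_tokens(io.StringIO(source).readline)
# Compatibility mode: pass only (type, string) 2-tuples; positions dropped.
result = tokenize.untokenize((t.type, t.string) for t in tokens)
print(repr(result))
# The result re-tokenizes to an equivalent stream, but the spacing is
# synthesized -- something like 'x =1 +2 \n', not the original text.
```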

**Summary**

The new untokenize is the way it is written in The Book.

I have done the heavy lifting on issue 38663. Python devs are free to do with 
it as they like.

Your choice will not affect me or Leo in any way. The new code will soon become 
the foundation of Leo's token-oriented commands.

Edward

P.S. I would imagine that tokenize.untokenize is pretty much off most devs' 
radar :-)

This Engineering Notebook post: 
https://groups.google.com/d/msg/leo-editor/aivhFnXW85Q/b2a8GHvEDwAJ
discusses (in way too much detail :-) why untokenize is important to me.

To summarize that post:

Imo, python devs are biased in favor of parse trees in programs involving text 
manipulations.  I assert that the "real" black and fstringify tools would be 
significantly simpler, clearer and faster if they used python's tokenize module 
instead of python's ast module. Leo's own "beautify" and "fstringify" commands 
prove my assertion to my own satisfaction.
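
As one small illustration of why I prefer tokens for this kind of work (a 
sketch in my own names, not Leo's actual code): a purely token-level edit 
preserves every comment and every run of whitespace for free, something an 
ast-based rewrite must struggle to recover.

```python
import io
import tokenize

def rename_sketch(source, old, new):
    """Sketch only: naively rename every NAME token equal to `old`.
    Gaps between tokens are copied verbatim from `source`, so every
    comment and every space survives the edit untouched."""
    # offsets[i] is the absolute offset of the start of line i + 1.
    offsets = [0]
    for line in source.splitlines(keepends=True):
        offsets.append(offsets[-1] + len(line))
    out, prev_end = [], 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        start = offsets[tok.start[0] - 1] + tok.start[1]
        end = offsets[tok.end[0] - 1] + tok.end[1]
        out.append(source[prev_end:start])  # the gap, verbatim
        text = source[start:end]
        out.append(new if tok.type == tokenize.NAME and text == old else text)
        prev_end = end
    return ''.join(out)

before = "total  = n1 +  n2  # keep my spacing!\n"
after = rename_sketch(before, "total", "subtotal")
assert after == "subtotal  = n1 +  n2  # keep my spacing!\n"
```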

This opinion will be controversial, so I want to make the strongest possible 
case. I need to prove that handling tokens can be done simply and correctly in 
all cases. This is a big ask, because python's tokens are complicated.  See the 
Lexical Analysis section of the Python Language Reference.

The new untokenize furnishes the required proof, and does so elegantly.

EKR