On 11/3/2019 11:12 AM, Edward K. Ream wrote:
> I am creating this post as a courtesy to anyone interested in python's
> tokenize module.
As one of the 46 contributors to this module, and as one who fixed
several untokenize bugs a few years ago, I am interested.
> **tl;dr:** Various posts, linked below, discuss a much better
> replacement for untokenize.
If that were true, I would be interested. But as explained below, I
don't believe it. Even if I did, https://bugs.python.org/issue38663
gives no evidence that you have signed the PSF contributor agreement.
In any case, it has no PR. We only use code that is actually
contributed on the issue or in a PR under that agreement.
To continue, the first two lines of tokenize.untokenize() are

    ut = Untokenizer()
    out = ut.untokenize(iterable)
Your leoBeautify.Untokenize class appears to be completely unsuited as
a replacement for tokenize.Untokenizer, as the APIs of the class and
its untokenize method are incompatible with the above.
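For reference, the documented stdlib round trip looks like this (a
minimal sketch, not run here; io.StringIO supplies the readline
callable that generate_tokens() expects):

    import io
    import tokenize

    code = "a = 1\nif a:\n    b = 2\n"
    # generate_tokens() yields TokenInfo 5-tuples from a readline callable;
    # untokenize() accepts the iterable and rebuilds the source.
    tokens = tokenize.generate_tokens(io.StringIO(code).readline)
    # With full 5-tuples, the round trip is exact for simple input.
    assert tokenize.untokenize(tokens) == code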
1. tokenize.Untokenizer() takes no argument. leoBeautify.Untokenize()
requires a 'contents' argument, a (unicode) string, that is otherwise
undocumented. At first glance, it appears that 'contents' needs to be
something like the desired output. (I could read the code where you
call Untokenizer to improve my guess, but not now.) Since our existing
tests do not pass 'contents', they should all fail.
2. tokenize.Untokenizer.untokenize(iterable) requires an iterable that
returns "sequences with at least two elements, the token type and the
token string."
https://docs.python.org/3/library/tokenize.html#tokenize.untokenize
One can generate Python code from a sequence of such pairs with the
guarantee that the resulting code will be tokenized by the tokenize
module into the same sequence of pairs.
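For example, a sketch of that guarantee (not run here):

    import io
    import tokenize

    code = "x=(1 +2)\n"
    old = [(t.type, t.string)
           for t in tokenize.generate_tokens(io.StringIO(code).readline)]
    # Compatibility mode: the whitespace of new_code may differ from
    # 'code', but the (type, string) sequence survives the round trip.
    new_code = tokenize.untokenize(old)
    new = [(t.type, t.string)
           for t in tokenize.generate_tokens(io.StringIO(new_code).readline)]
    assert new == old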
The doc continues: "Any additional sequence elements are ignored."
The intent is that a tool can tokenize a file, modify the token stream
(and thereby possibly invalidate the start, end, and line elements of
the original tokens) and generate a modified file.
[Note that the end index (4th element), when present, is not ignored
but is used to improve white space insertion. I believe that this
should be documented. What if the end index is no longer valid? Should
we also use the start index?]
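To illustrate what the position elements buy (a sketch; the exact
compatibility-mode spacing shown in the comments is my guess):

    import io
    import tokenize

    code = "x = (1  +  2)\n"
    toks = list(tokenize.generate_tokens(io.StringIO(code).readline))
    # Full 5-tuples: the start/end positions let untokenize() restore
    # the original spacing exactly.
    print(repr(tokenize.untokenize(toks)))        # 'x = (1  +  2)\n'
    # 2-tuples: compatibility mode inserts its own spacing,
    # e.g. 'x =(1 +2 )\n'.
    print(repr(tokenize.untokenize((t.type, t.string) for t in toks)))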
leoBeautify.Untokenize.untokenize() requires an iterable of 5-tuples.
It makes use of both the start and end elements, as well as the
mysterious required 'contents' string.
> I have "discovered" a spectacular replacement for
> Untokenizer.untokenize in python's tokenize library module:
To pass 'code == untokenize(tokenize(code))' (ignoring API details),
there is an even more spectacular replacement: rebuild the code from the
'line' elements. But while the above is an essential test, it is a toy
example with respect to applications. The challenge is to create a
correct and valid file from less information, possibly with only token
type and string. (The latter is 'compatibility mode'.)
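Here is a sketch of that 'line'-based rebuild (my illustration, not
run here; only stdlib names are used):

    import io
    import tokenize

    def rebuild(code):
        # Reassemble the source from the 'line' element of each token.
        # A multi-line string token carries all of its physical lines
        # in 'line', so split and key each physical line by row number.
        lines = {}
        for t in tokenize.generate_tokens(io.StringIO(code).readline):
            for i, phys in enumerate(t.line.splitlines(keepends=True)):
                lines.setdefault(t.start[0] + i, phys)
        return "".join(lines[row] for row in sorted(lines))

    code = 's = """a\nb"""\nx = 1\n'
    assert rebuild(code) == code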
> In particular, it is, imo, time to remove compatibility mode.
And break all usage that requires it? Before doing much more with
tokenize, I would want to understand better how it is actually used.
> Imo, python devs are biased in favor of parse trees in programs
> involving text manipulations. [snip]
So why have 46 of us contributed to this one module? This sort of
polemic is a net negative here. We are multiple individuals with
differing opinions.
--
Terry Jan Reedy