Re: Wow: a second look at asttokens, fstringify and black

2020-01-15 Thread Edward K. Ream
On Wednesday, January 15, 2020 at 11:40:23 AM UTC-5, Edward K. Ream wrote:

> I'll investigate how thoroughly asttokens attaches token lists to nodes. 
I had remembered that only ast nodes for statements have attached tokens, 
but now I am not sure at all about that.

Well, I haven't forgotten everything about asttokens.  This unit test:

def test_compare_tog_vs_asttokens(self):
    """Compare asttokens token lists with TOG token lists."""
    tag = 'test_compare_tog_vs_asttokens'
    try:
        import asttokens
    except Exception:
        self.skipTest('requires asttokens')
    contents = """g.blue('wrote %s' % p.x())"""
    expected = """g.blue(f'wrote {p.x()}')"""
    contents, tokens, tree = self.make_data(contents, description=tag)
    # Dump TOG data.
    dump_contents(contents)
    dump_tree(tokens, tree)
    # Dump asttokens data.
    print('= asttokens =\n')
    atok = asttokens.ASTTokens(contents, parse=False, filename=tag)
    atok.mark_tokens(tree)
    for node in asttokens.util.walk(tree):
        print(f"{node.__class__.__name__:>10} {atok.get_text(node)!s}")

Produces this output:

Contents...

1 g.blue('wrote %s' % p.x())

Tree...

parent       lines  node                 tokens
======       =====  ====                 ======
             1..2   0.Module:            newline.15(1:0)
0.Module            1.Expr:
1.Expr              2.Call:
2.Call       1      3.Attribute:         op.2=. name.3(blue)
3.Attribute  1      4.Name: id='g'       name.1(g)
2.Call       1      5.BinOp: op=%        op.7=%
5.BinOp      1      6.Str: s='wrote %s'  string.5('wrote %s')
5.BinOp             7.Call:
7.Call       1      8.Attribute:         op.10=. name.11(x) op.12=( op.13=)
8.Attribute  1      9.Name: id='p'       name.9(p)

= asttokens =

Module g.blue('wrote %s' % p.x())
  Expr g.blue('wrote %s' % p.x())
  Call g.blue('wrote %s' % p.x())
 Attribute g.blue
  Name g
 BinOp 'wrote %s' % p.x()
   Str 'wrote %s'
  Call p.x()
 Attribute p.x
  Name p

As you can see, the TOG class assigns tokens to only one node, which is 
much better than the asttokens way, shown above.

This concludes the round of exploration of asttokens. I'll figure out what 
the next steps are tomorrow.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/425e1563-642a--a6df-cded076a6e22%40googlegroups.com.


Re: Wow: a second look at asttokens, fstringify and black

2020-01-15 Thread Edward K. Ream
On Wednesday, January 15, 2020 at 11:40:23 AM UTC-5, Edward K. Ream wrote:

> I will certainly investigate using asttokens instead of the TOG class. 

I only plan to compare how asttokens attaches token lists to nodes. 
In fact, asttokens uses begin/end token indices rather than creating 
variable-length lists of tokens. The TOG does the same. For the 
comparison I'll change the names of the ivars, so that the AstDumper 
class in leoAst.py will report the token lists created by *asttokens.* 

> "Can the Fstringify class in leoAst.py use ast.walk?"

Clearly, the answer is "yes", because all ast.BinOp nodes (and their 
associated tokens) are disjoint. Actually changing the code would be 
make-work.

> The surprises recorded in this post are *good* news. The work so far has
> not been wasted:
>
> - It has been intensely enjoyable.
> - It has produced what imo is the best possible definition of token order.
> - It has given me expert's eyes, and created expert-level questions.
>

I forgot to mention the big improvements in TDD that result from using 
pytest and "traditional" unit tests. I have just created #1478: 
deprecate @test nodes.

Edward



Re: Discuss: a proposed answer to python issue #33337

2020-01-15 Thread Edward K. Ream
On Wed, Jan 15, 2020 at 2:45 PM Brad  wrote:

> This is very interesting work.

Thank you.

> As you probably know, the Python core developers are a bit curmudgeonly
> when it comes to suggestions that haven't come from themselves.
>

Hehe. I have some experience with that :-)

> Be aware that the responses could focus exclusively on all of the potential
> problems, with the inevitable: "Why don't you put this on PyPI first and
> see how it is received?"

An excellent idea, even if it does come from the python devs ;-).  Matt is
working on the packaging.

Thanks for your feedback.

Edward



Re: Discuss: a proposed answer to python issue #33337

2020-01-15 Thread Brad
Edward, 

This is very interesting work. 

As you probably know, the Python core developers are a bit curmudgeonly 
when it comes to suggestions that haven't come from themselves. 
Be aware that the responses could focus exclusively on all of the potential 
problems, with the inevitable: "Why don't you put this on PyPI first and 
see how it is received?"

-Brad

On Wednesday, January 15, 2020 at 12:42:31 AM UTC-7, Edward K. Ream wrote:
>
> On Tue, Jan 14, 2020 at 8:00 PM Matt Wilkie wrote:
>
> > you've worked incredibly hard to this point, and it must be really 
> exciting/enticing to be near a point of release and of shouting "hear ye, 
> hear ye" loud enough to attract attention, but don't make noise just yet!
>
> I agree that more work is needed.
>
>> At the very least I think leoAst needs to be run on its own (e.g. not
>> rely on `import leo ...`), and then have a concrete quick start example
>> or two like Terry and Btheado mentioned.
>>
>
> A quick-start example is a good idea. The first that comes to mind is the 
> code that would fstringify a file. leoBeautify.py contains three fstringify 
> commands that could be reworked. However, those commands do "too much" in 
> some sense.
>
> The challenge is to create a motivating example, which can be solved with 
> only a few lines of code. That's a big ask, because I imagine anything 
> useful might run to a few pages of code
>
> I think the Python devs (and everyone else) need to be told in more detail 
> why the tool is useful. That involves a discussion of what "unify the token 
> and ast worlds" means, and why it would be useful. That's what I'll discuss 
> in an upcoming post.
>
> Edward
>



Re: Please review the new docs for #1440

2020-01-15 Thread Edward K. Ream
On Wed, Jan 15, 2020 at 8:54 AM Brian Theado  wrote:

> But in its current form, this example doesn't do anything user visible.
> It doesn't return anything and it doesn't display anything. IMO, examples
> should as much as possible be standalone and do something the user can see.
> That way the user can paste the exact code into a REPL and see something
> happen. Or better, the documentation showing the example can also show the
> results.

All this is moot, at least for the moment, as explained in my latest post.

Edward



Wow: a second look at asttokens, fstringify and black

2020-01-15 Thread Edward K. Ream
This morning I took a second look at asttokens, fstringify and black. What 
I found astounded me.

I had studied the sources of asttokens a while back, but now, looking at 
the sources with expert's eyes, I see that the asttokens people have 
solved all the problems that took me so much work. 

I then looked again at the latest versions of fstringify and black. As 
explained below, there seems to be no way to recover fstringify's history. 
It may have been rewritten using a token-based approach. I may be suffering 
from false memories concerning black. The code is much better than I 
remember, and the git logs show only incremental improvements.

*asttokens*

To repeat, the asttokens code solves all the problems that it took me so 
long to solve:

- They somehow discovered that the tokens list contains the primary data. 
The parse tree is kinda "along for the ride".

- The asttokens traversal is *not* done in token order. 
MarkTokens._visit_after_children contains this comment:

  # Note how this doesn't assume that we iterate through children in order 
  # that corresponds to occurrence in source code.

- The asttokens code must handle what I call "insignificant" tokens with 
special code, just as my TOG class does.
  In particular, there must be special cases for commas and parens.

- The asttokens code must "reassign" parens in some cases, just as in the 
ReassignTokens class in leoAst.py.

- MarkTokens._expand_to_matching_pairs is similar to the match_parens 
function in leoAst.py.

There is a *lot* to like in asttokens:

- asttokens handles astroid trees as well as ast trees.

- asttokens works for both python 2 and 3.

- The visitors are much more compact than the visitors in the TOG class.

*fstringify*

To repeat, fstringify is much different from what I had remembered. To find 
out more, I git cloned the fstringify code.

Alas, gitk is almost useless: apparently, all the code was dumped into the 
v.0.1 rc (#1) version.

gitk transform.py does show one change: transform.py must handle ast.Call 
nodes on the RHS of the ast.BinOp. My newly-expert eyes understand exactly 
why the revisions are necessary. I made the same mistake myself.

*black*

Again, black is much different from what I had remembered. To find out 
more, I git cloned the black code.

This time there is a full gitk history. This only increases my confusion: 
all code changes appear to be incremental. I'm starting to think I have 
somehow conflated important memories.

*Questions*

1. After studying asttokens again, I asked myself, "Why didn't the 
fstringify and black people use asttokens?"

After studying fstringify and black again, I'm not sure that either tool 
needs extra help.

2. "Why doesn't the Fstringify class in leoAst.py use asttokens?"

I will certainly investigate using asttokens instead of the TOG class. I'll 
also investigate how thoroughly asttokens attaches token lists to nodes. I 
had remembered that only ast nodes for statements have attached tokens, but 
now I am not sure at all about that.

3. "Can the Fstringify class in leoAst.py use ast.walk?"

It would seem that it could. The code only needs to handle ast.BinOp 
nodes, so traversal order probably doesn't matter!
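
To make that concrete, here is a minimal stdlib sketch (not leoAst's 
actual code) showing that a scan for % operators needs nothing more 
than ast.walk:

```python
import ast

source = "a = 'x: %s' % x\nb = 'y: %s' % y\n"
tree = ast.parse(source)

# Collect every BinOp whose operator is %. ast.walk yields nodes in
# no specified order, but the BinOp nodes are disjoint, so the
# traversal order doesn't matter for this task.
mod_nodes = [
    node for node in ast.walk(tree)
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod)
]
print(len(mod_nodes))  # 2
```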

*Summary*

The asttokens, fstringify and black tools are much different than I 
remember. The fstringify and black tools are under intense development, but 
that may not explain my faulty memories.

The surprises recorded in this post are *good* news. The work so far has 
not been wasted:

- It has been intensely enjoyable.
- It has produced what imo is the best possible definition of token order.
- It has given me expert's eyes, and created expert-level questions.

I must somehow digest today's revelations. The announcement of leoAst.py 
must wait for days, or weeks. It may never happen.

Before doing anything else, I'll thoroughly compare how asttokens and the 
TOG class assign tokens to nodes. This must be done before I can form even 
tentative conclusions about the TOG class.

All comments welcome.

Edward



Re: Discuss: a proposed answer to python issue #33337

2020-01-15 Thread Edward K. Ream
On Tuesday, January 14, 2020 at 3:00:44 PM UTC-5, Terry Brown wrote:

> I wonder if a couple of demos would help

I'd like to thank you, Brian and Matt for your comments. Those comments 
have helped me not to make a fool of myself straightaway :-)

Something shocking has just happened. I'll discuss it in a separate 
thread.

Edward



Re: Please review the new docs for #1440

2020-01-15 Thread Brian Theado
On Wed, Jan 15, 2020 at 3:57 AM Edward K. Ream  wrote:

> On Tue, Jan 14, 2020 at 7:06 PM Brian Theado 
> wrote:
>
[...]

> The ast module is particularly deficient in this regard. The documentation
> for  ast.walk  is:
>
> " Recursively yield all descendant nodes in the tree starting at *node*
> (including *node* itself), in no specified order. This is useful if you
> only want to modify nodes in place and don’t care about the context."
>
> Hmm. This could be one motivating example. The TOG class inserts
> parent/child links, and TOT.traverse(tree) *is* the token order
> traversal. So a much more valuable version of ast.walk would be:
>
> tog = TokenOrderGenerator()
> tot = TokenOrderTraverser()
> contents, encoding, tokens, tree = tog.init_from_file(filename)
> tot.traverse(tree)
>

But in its current form, this example doesn't do anything user visible. It
doesn't return anything and it doesn't display anything. IMO, examples
should as much as possible be standalone and do something the user can see.
That way the user can paste the exact code into a REPL and see something
happen. Or better, the documentation showing the example can also show the
results.

As it is now, the user would have to define a subclass of TOT in order to
see anything. This is what I was complaining about in another thread. In
order to be usable, at a minimum a 5 line subclass needs to be written. I
proposed that you move the logic from that class into a function which
yields each result. That function could be used both by the TOT class and
by simple for loops (perfect for demo and explanations).

But your response was "You have little chance of changing my mind" :-).
Maybe my suggestion isn't the right approach, but I still think there is
something missing regarding ease-of-use.

[...]

> The injected links will be useful for any tool that wants to modify python
> source code. fstringify and black are two most prominent examples.  Now we
> come to the real motivation. This is the "hole" in the documentation I have
> been talking about.
>
> *Any* tool that wants to modify python text will benefit from having
> *both* a token-level view of the text *and* a parse-tree level view of
> the text. The asttokens package provides this dual view, but only for
> top-level python statements. In effect, the TOG class is a much simpler
> implementation of the asttokens package.
>
> This suggests that some wrapper functions, similar/identical to those in
> the asttokens package, would be useful.
>

If this means you will be able to write simple examples in the form of the
one on the front page of the asttokens package, then it sounds useful.

[...]

Thanks for all the other explanation.



Re: Please review the new docs for #1440

2020-01-15 Thread Edward K. Ream
On Tue, Jan 14, 2020 at 10:59 PM Brian Theado 
wrote:

> I like your second comment. It is well written and informative. Nice job.

Thanks.

> You stress "token order traversal". How does that contrast with the
> traversal provided by the ast library?

ast.walk is feeble. Formally, it visits nodes "in no particular order",
which actually means that it visits nodes in a useless order ;-)

In contrast, TOT.traverse (which requires the links injected by the TOG
class), visits nodes in token order, which is the order that will be most
useful for any tool that modifies python source code.

> Does your library still traverse the ast nodes in the same order as the
> ast library does, just that with each visited node you will have access
> to all the tokens corresponding to that node?

The ast module (ast.walk) defines no order at all.

Furthermore, the TOT.traverse method must run *after* the TOG class has
created all links. So when links *are* available, they are available to all
nodes.

You have, however, put your finger on something that probably should be
documented.

The Fstringify class in leoAst.py replaces many tokens with a single, new,
"fstringified" string.  In the process, it converts all the old tokens to
"do-nothing" tokens. This is easy to do. I'll omit the details, except to
say that all links are adjusted properly.

However, the Orange class may *increase* the net number of tokens. It
would be possible to readjust the links from nodes to tokens, but that
would be messy. Instead, I'll probably replace one or more tokens with an
*extension token*. Details are not clear yet.

> Back to the first comment:
>
> A *children* array from each ast node to its children. Order matters!
>
>
> In contrast, how does the ast library link the ast nodes? Not at all?
> Linked, but out of order? Something else?
>

Something else. Some (but not all!) nodes have lineno, col_offset,
end_lineno, and end_col_offset fields. For further fun, previous versions
of Python did not compute these fields properly.
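
For reference, here is what those fields look like in Python 3.8+, where
end positions are computed for statement and expression nodes (a quick
stdlib illustration, not leoAst code):

```python
import ast

# Parse a one-line program and inspect the position fields that
# ast.parse attaches to statement and expression nodes (3.8+).
# Context nodes such as ast.Load carry no position fields at all.
tree = ast.parse("x = 'a' + 'b'")
assign = tree.body[0]
print(assign.lineno, assign.col_offset)          # 1 0
print(assign.end_lineno, assign.end_col_offset)  # 1 13
```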

> And this:
>
> - For each token, *token.node* contains the ast node "responsible" for
>   the token.
> - For each ast node, *node.first_i* and *node.last_i* are indices into
>   the token list. These indices give the range of tokens that can be
>   said to be "generated" by the ast node.
>
> Does this imply the user can choose between traversing each token (and
> from the token reach the node) and traversing each node (and from the node
> reach each token)? In what case would the user want to do one over the
> other?
>

Another great question. My long answer to your previous reply mostly
answered this question.

Imo, the *token* view is usually the most natural view for tools that
modify python sources. The more I work with ast trees, the more their
deficiencies become apparent. I suspect the authors of fstringify and black
overvalue parse trees and undervalue tokens.

All the big Ahas involved in creating the TOG class involved realizing that
tokens were essential to help the traversal of the parse tree. I had no
idea that parse trees were so feeble. It was a big big change in my
attitude.

The Fstringify class is implicitly based on tokens, not parse trees. The
changes are made to the token list, and the result is
tokens_to_string(self.tokens). This is so elegant that readers may not
understand what is going on!

The Orange class will also be based on tokens. This will be much more
obvious in the Orange class because most of the "beautifying" will be done
on tokens. The beautifier converts input tokens to output tokens.

My present plan is to use parse-tree views only for difficult cases:

1. Determine the exact spacing for slices. The present code doesn't have
access to enough context. Using the parse tree should be the easy way to
get that context.

2. Determine how to break long lines. I'm not sure what will happen.
Perhaps a purely token-oriented view will suffice. We shall see.

Edward



Re: Please review the new docs for #1440

2020-01-15 Thread Edward K. Ream
On Tue, Jan 14, 2020 at 7:06 PM Brian Theado  wrote:

In the theory of operation:
>
> "The notion of a *token order traversal* of a parse tree is the
> foundation of this project"
>
> In contrast, what traversal order do parse trees provide?
>

None whatever. Traversals are defined by code, not by the tree itself.

The ast module is particularly deficient in this regard. The documentation
for  ast.walk  is:

" Recursively yield all descendant nodes in the tree starting at *node*
(including *node* itself), in no specified order. This is useful if you
only want to modify nodes in place and don’t care about the context."

Hmm. This could be one motivating example. The TOG class inserts
parent/child links, and TOT.traverse(tree) *is* the token order traversal.
So a much more valuable version of ast.walk would be:

tog = TokenOrderGenerator()
tot = TokenOrderTraverser()
contents, encoding, tokens, tree = tog.init_from_file(filename)
tot.traverse(tree)

> How is token order different/better? What does it allow me to do that I
> can't otherwise do with parse trees?

Great question. Perhaps I won't need a separate post after all. Here is a
long answer, which boiled down must become part of both the announcement
and the regular docs.

Recall that the python issue deals with deficiencies in ast-related tools.
The opening comment of that issues says: "the built-in AST does not
preserve comments or whitespace;"

This is only a small part of the problem facing anyone who wants to write a
program like fstringify or black:

1. The data in the parse tree does not preserve the *spelling* of comments
and strings. Why, I don't know, but that can't be helped. ast.parse creates
the initial parse trees, and ast.parse can't change in any way because the
ast module is cast in stone.

2. In contrast, the token list is what I have been calling the "ground
truth" of a program. Comment and string tokens *do* preserve spelling. It
is straightforward to recreate the program from the token list. That's what
the tokens_to_string function (in leoAst.py) does.
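
The stdlib itself can demonstrate the round trip. tokenize.untokenize
plays a role similar to the tokens_to_string function mentioned above
(this sketch uses only the standard library, not leoAst):

```python
import io
import tokenize

# Comment spelling and string spelling survive in the token stream,
# so the original program can be rebuilt from it exactly.
source = "x = 'wrote %s' % p.x()  # a comment\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
round_trip = tokenize.untokenize(tokens)
assert round_trip == source
```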

3. There is, in principle, no *short*, *easy *way to associate tokens with
ast nodes. The TOG class does this in what I firmly believe is the
simplest, clearest possible code. But the TOG class is far from short and
easy.

So the *first* answer to your question is: a token order traversal is what
makes the TOG class possible.

But why is the TOG class *itself *valuable? What can devs do with it that
they can't already do?

The TOG class inserts links between ast nodes and between nodes and tokens.
These links are what TOG does, and nothing else.

But now you ask, what good are these links? This is what I've never
properly explained.

The injected links will be useful for any tool that wants to modify python
source code; fstringify and black are the two most prominent examples. Now
we come to the real motivation. This is the "hole" in the documentation I
have been talking about.

*Any* tool that wants to modify python text will benefit from having *both*
a token-level view of the text *and* a parse-tree level view of the text.
The asttokens package provides this dual view, but only for top-level
python statements. In effect, the TOG class is a much simpler
implementation of the asttokens package.

This suggests that some wrapper functions, similar/identical to those in
the asttokens package, would be useful.

But I digress. Let me explain why the dual view (tokens *and* ast nodes) is
useful. This is something I've never explained because I started the
project knowing the answer.

*Tokens preserve linear text order. Parse trees define the meaning of
those tokens.*

Mostly, tools like fstringify and black will want to work at the token
level, because that is, or *should be* the most natural way to modify text:
just insert or delete the proper tokens.

Alas, at present, *the fstringify and black tools work at the parse tree
level*, despite *enormous *difficulties in doing so, because sometimes
those tools *must* have access to the meaning provided by the parse trees.

Example 1: (Fstringify) Potential f-strings are found by looking for an
ast.BinOp node of a special form: the LHS of the BinOp must be a string
literal, and the RHS must supply the values for the one or more % specs
in that string.
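
In stdlib terms, the check might look like this hedged sketch. It is not
fstringify's real code: among other things, it does not verify that the
% specs actually match the RHS operands, and it uses ast.Constant
(Python 3.8+; older trees use ast.Str):

```python
import ast

def fstring_candidates(source):
    # Yield BinOp nodes of the form <string literal> % <anything>.
    # A simplified sketch of the pattern described above.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.BinOp)
            and isinstance(node.op, ast.Mod)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.left.value, str)
        ):
            yield node

candidates = list(fstring_candidates("g.blue('wrote %s' % p.x())"))
print(len(candidates))  # 1
```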

Example 2: (Black) When splitting a long line, black must analyze the
*meaning* of the corresponding line in significant detail. It needs to do
this because in some cases black must insert parens which did not exist
previously in the program. For example:

a = << a very very long line, possibly continued by the backslash newline
convention >>

black will convert this to:

a = (
   line 1
   line 2...
)

where none of lines line1, line 2, etc contain backslash newlines.

*At present, both the fstringify and black tools are stuck in the "ast
ghetto".*

Much of what these tools do would be much much easier if the token view of
an ast node were available.  For