Re: [sympy] Re: Autolev Parser

Ondřej Čertík Mon, 21 May 2018 12:10:29 -0700


On Mon, May 21, 2018, at 1:07 PM, Ondřej Čertík wrote:
> Hi Nikhil,
> 
> Thanks for the answer. If I understand your points correctly, your main 
> points are:
> 
> * Nobody would want to rewrite the parser
> * Autolev language is simple enough, so AST is not worth the additional 
> complexity / code
> * Converting from CST to AST is not trivial in this case
> 
> I thought about these questions a lot and my view is the following:
> 
> Regarding the first point, people definitely might want to try some 
> other parsers, especially since Autolev is such a simple language. I 
> know for example that Aaron was not 100% convinced that ANTLR is the 
> best way. One disadvantage of ANTLR is that it produces very large 
> parser files. For example even for your simple grammar, the parser has 
> almost 3000 lines 
> (https://github.com/NikhilPappu/autolev-parser/blob/f6d110ff5a8cef3aea4f6357f343491572350393/AutolevParser.py)
>  
> and such big files are not super easy to deal with, for example they 
> grow the sympy git repository a lot, so we might not want to have those 
> checked in. But then people require Java and ANTLR to generate those. 
> And if it is possible to write a simpler parser, for example with 
> pyparsing or something similar, then perhaps in the future we might want 
> to go this route. Note that I personally like and recommend ANTLR, as it 
> gets the job done pretty easily and cleanly, I really like that it keeps 
> the grammar g4 files separate from the code.
> 
> Regarding the second point, If you look at your parser, you have to 
> handle variables and some other stuff, and it is all tied up to how 
> exactly ANTLR represents the nodes, as defined in your g4 file. So if I 
> want to polish / change the g4 file, I have to understand the rest of 
> your compiler. That's actually not so easy. As a particular example, 
> this line 
> https://github.com/NikhilPappu/autolev-parser/blob/f6d110ff5a8cef3aea4f6357f343491572350393/myListener.py#L205
>  
> depends on how exactly you write the rule in the g4 file, so that the 
> 2nd Child is what you expect it to be. If you rework the g4 file, this 
> will break. And since this is used all over the compiler, it's not easy 
> for me to quickly figure out what needs to be changed and checked. If it 
> used AST, then I just have to make sure my changes to the g4 file 
> generate exactly the same AST, and don't have to worry about the rest. 
> In practice, I just have to go to the visitor methods that deal with the 
> old g4 rules, and modify those for the new g4 rules, to generate the 
> same AST. So the AST approach is actually simpler, from this 
> perspective.


One important point that I want to stress is that we want it to be easy for 
other people to contribute. The AST separates the parser from the rest of the 
compiler, so people can contribute to the parser, without modifying or even 
knowing/understanding the rest of the compiler, and vice versa, people can fix 
and work on the compiler (starting from AST) without knowing anything about the 
parser.

Ondrej

> 
> Regarding the third point, I don't quite understand what you meant, as 
> the point of AST is that you lose syntax information that you don't 
> need. If there is some information that you need, it needs to be 
> included in AST.
> 
> For now, I suggest you keep doing what you are doing, but I will try to 
> send a PR with the initial stub for an AST, for this problem, so that 
> you get an idea how it all works.
> 
> No matter however, whether AST is used or not, we will need thorough 
> tests for the grammar, essentially every rule should be tested by tests. 
> So that if we switch to AST later, we can be sure that we didn't break 
> anything.
> 
> Ondrej
> 
> On Fri, May 18, 2018, at 10:39 AM, Nikhil Pappu wrote:
> > 
> > 
> > On Friday, May 18, 2018 at 4:08:29 PM UTC+5:30, Ondřej Čertík wrote:
> > >
> > >
> > >
> > > On Fri, May 18, 2018, at 1:32 AM, Nikhil Pappu wrote: 
> > > > Ondřej, Jason, 
> > > > 
> > > > I have written my first blog post <https://wp.me/p3Sr5x-2> discussing 
> > > the 
> > > > project details and status. 
> > > > Can you please go over it and provide feedback? 
> > >
> > > I think overall it looks very good. It looks like you got a good handle 
> > > of 
> > > ANTLR4. 
> > >
> > > I noticed you are using a listener: 
> > >
> >  
> > I prefer using listeners over visitors. I feel it helps me to focus on the 
> > logic than on the tree traversal. 
> > 
> > >
> > > One thing to consider is whether to directly use the grammar from ANTLR, 
> > > which is called a concrete syntax tree (CST), and construct a sympy 
> > > expression of whatever you want out of it directly. That is what you do 
> > > now 
> > > I think. 
> > >
> >  
> > Yes, I am using the CST approach.
> > 
> > 
> > > The alternative is to first construct an abstract syntax tree (AST) from 
> > > the CST. That way the AST will be the input to the rest of your 
> > > "compiler", 
> > > and it won't matter that we currently use ANTLR to construct it.
> > 
> >  
> > I only have a basic idea of ASTs. I have never implemented them so I do not 
> > have any experience using them.
> > The ANTLR book showcased only the CST approach so I am much more 
> > comfortable with that.
> > I think ANTLR generates something in between a CST and an AST but it does 
> > have a lot of clutter compared to an AST. 
> > I was just looking at some examples of using ASTs with ANTLR in 
> > stackoverflow but most of the examples were quite trivial.
> > The conversion from CST to AST for the Autolev grammar wouldn't be so 
> > trivial if we want to obtain a transformation with minimal loss.
> > Also, I would need to change the parser code quite significantly to deal 
> > with the new tree. 
> 
> 
> 
> > 
> > 
> > Later somebody can decide to write a hand parser, or some other tool. As 
> > > long as it produces the AST, that's all that matters. 
> > 
> > 
> > I'm not sure why someone would want to do that if we already have a working 
> > parser.
> > One might rather look towards changing the actual parser code itself in my 
> > opinion, whether to fix any issues or add new functionality.
> >  
> > 
> > > As it is currently, the CST depends on the exact rules as you write them 
> > > in the ANTLR grammar, and everytime you change the rules, you have to 
> > > rework your compiler. While if you use AST, then you only have to modify 
> > > the code that converts the (modified) CST to AST, but that's it. The rest 
> > > of the compiler uses AST, and so it doesn't need to change. 
> > 
> > 
> > That is an advantage but I don't think we would have to change the grammar 
> > all that much. I have tested the grammar on a lot of inputs and it seems to 
> > parse them just fine.
> >  
> > 
> > > Python, for example, uses this approach. It parses Python code into an 
> > > AST, and so they are free to change the parser any way they like ---- our 
> > > SymPy code that uses the Python AST doesn't need to change. 
> > 
> >  
> > I think that the AST approach would be a much better one in the case of 
> > more complex languages like Python or C.
> > I think this is not as important in this case as Autolev is a much simpler 
> > language. 
> > At its heart it mainly only consists or declarations, assignments and 
> > commands which makes direct parsing using a CST convenient enough.
> > 
> > >
> > >
> > > > Also, when will you be available to discuss things? 
> > >
> > > I am available over email only the next week. 
> > >
> > > Ondrej 
> > >
> > > > 
> > > > -- 
> > > > You received this message because you are subscribed to the Google 
> > > > Groups "sympy" group. 
> > > > To unsubscribe from this group and stop receiving emails from it, send 
> > > > an email to [email protected] <javascript:>. 
> > > > To post to this group, send email to [email protected] 
> > > <javascript:>. 
> > > > Visit this group at https://groups.google.com/group/sympy. 
> > > > To view this discussion on the web visit 
> > > > 
> > > https://groups.google.com/d/msgid/sympy/bc81cd14-ad9b-4503-a639-aa6f673f4c51%40googlegroups.com.
> > >  
> > >
> > > > For more options, visit https://groups.google.com/d/optout. 
> > >
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "sympy" group.
> > To unsubscribe from this group and stop receiving emails from it, send 
> > an email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at https://groups.google.com/group/sympy.
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/sympy/d207a582-c67e-4219-8a6f-1d5c6a640ebe%40googlegroups.com.
> > For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/1526929776.1412679.1379756688.26D80CC3%40webmail.messagingengine.com.
For more options, visit https://groups.google.com/d/optout.

Re: [sympy] Re: Autolev Parser

Reply via email to