Okay, some bad news - this might not qualify as a science fair project 
since it doesn't really have an "experiment". My teacher will double-check, 
but he wasn't too sure. However, I would still like to pursue this project 
as it interests me.

On Tuesday, September 4, 2012 11:19:11 AM UTC-7, Aaron Meurer wrote:
>
> Another thing you could look at is what should be done at the parsing 
> stage and what should be done after the parsing.  For example, "2 x", 
> "x y", and "tan x" are all the same syntax as far as the parser is 
> concerned (unless you want to put all predefined names in the grammar 
> itself), but the first two are implicit multiplication and the third 
> is implicit calling.  So maybe those should be parsed to the same 
> object and then differentiated in software somehow. Then comes 
> questions of how to interpret things like "tan x y" (tan(x)*y or 
> tan(x*y), or fail). 
>
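One way to realize the "same syntax, differentiate afterwards" idea is a small post-parse resolution step. A minimal sketch, assuming the parser emits a generic juxtaposition node for any two adjacent atoms, and that we keep a table of known function names (the node shapes and the table are illustrative, not SymPy's actual representation):

```python
# Illustrative table of names that should be treated as functions.
KNOWN_FUNCTIONS = {"sin", "cos", "tan", "log"}

def resolve_juxt(left, right):
    """Resolve a juxtaposition ("a b") into either a call or a product.

    "tan x" -> a call node, "2 x" and "x y" -> multiplication nodes.
    """
    if isinstance(left, str) and left in KNOWN_FUNCTIONS:
        return ("call", left, right)   # implicit calling
    return ("mul", left, right)        # implicit multiplication

print(resolve_juxt("2", "x"))    # ('mul', '2', 'x')
print(resolve_juxt("tan", "x"))  # ('call', 'tan', 'x')
```

The open question from the quote above ("tan x y") then becomes a question of how such nodes nest, which this sketch deliberately leaves unresolved.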

Yes, what I was thinking is that there would be a "whitespace expansion" 
step (probably after tokenization) that would convert input like 2xy into 
"2 x y", tokenize again, and then differentiate between those syntaxes 
when constructing some sort of AST.
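That whitespace-expansion step could start as simple as a pair of regex passes. This is a naive sketch: it treats every letter as a one-character symbol, so multi-letter names like "sin" would need a lookup table layered on top:

```python
import re

def expand_implicit(source):
    """Insert spaces between a digit and a following letter, and between
    adjacent letters, so "2xy" becomes "2 x y" before re-tokenizing.
    Lookaheads keep overlapping pairs (e.g. "xyz") fully expanded."""
    source = re.sub(r'(\d)(?=[a-zA-Z])', r'\1 ', source)        # 2x -> 2 x
    source = re.sub(r'([a-zA-Z])(?=[a-zA-Z])', r'\1 ', source)  # xy -> x y
    return source

print(expand_implicit("2xy"))   # 2 x y
print(expand_implicit("3abc"))  # 3 a b c
```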

Another interesting example that I thought of is something like 
> sin^2(x) for sin(x)**2 (the former is common notation for this, and 
> indeed SymPy even pretty prints it that way).  To parse the one like 
> the other would require changing the precedence order, as it normally 
> would be parsed as sin^(2(x)).  So you might think of ways to make 
> that work, and whether those ways work at the parsing stage, the 
> post-parsing stage, or both. 
>

Okay, so that's another thing to keep in mind - I'll have to compile a list 
of allowed syntactical elements sometime.
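The sin^2(x) case could also be handled as a purely textual rewrite before parsing, sidestepping the precedence change. A sketch under the assumption that function names come from a known table; it is deliberately simplistic and would not handle nested parentheses in the argument:

```python
import re

# Illustrative table of function names; a real version would come from
# the list of allowed syntactical elements.
FUNCTION_NAMES = "sin|cos|tan"

def rewrite_function_powers(source):
    """Rewrite f^n(arg) as (f(arg))^n, e.g. sin^2(x) -> (sin(x))^2.
    [^()]* restricts the argument to paren-free expressions."""
    pattern = r'\b(%s)\^(\d+)\(([^()]*)\)' % FUNCTION_NAMES
    return re.sub(pattern, r'(\1(\3))^\2', source)

print(rewrite_function_powers("sin^2(x) + cos^3(x*y)"))
# (sin(x))^2 + (cos(x*y))^3
```

Whether this belongs at the parsing stage or as a pre-pass is exactly the kind of question raised above.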

So what I would do is try things in order of easiest to hardest (and 
> natural language heuristics are one of the hardest), and stop working 
> when you either run out of time or feel that you've done enough. You 
> almost certainly won't get to do it all, but it's not clear just how 
> far you will get, so set yourself up to do as much as you can. 
>
> By the way, the standard library tokenize module is exactly the same 
> as the parser in SymPy, except we've extended ours to do some other 
> stuff (e.g., parse "x!" as factorial(x), wrap all undefined names in 
> Symbol, wrap all number literals in Integer or Float, etc.).  So for 
> the parts that are just extending tokenize, you should put it there. 
> For the rest, it should go in the parsing module (another good thing 
> to think about by the way is a good way of organizing the parsing 
> code; that was discussed a little bit on that other thread). 
>

Alright, I'll keep this in mind as I work on an API.
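For reference, the tokenize-extension approach can be tried directly with the standard library: a sketch that wraps every name in Symbol(...) and every number literal in Integer(...) by rewriting the token stream, similar in spirit to what Aaron describes above (the output format here is just illustrative text, not SymPy objects):

```python
import io
import tokenize

def wrap_tokens(source):
    """Wrap NAME tokens in Symbol(...) and NUMBER tokens in Integer(...)
    using the standard library tokenize module; other token types
    (NEWLINE, ENDMARKER, ...) are dropped in this sketch."""
    result = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok_type, tok_str, *_ in tokens:
        if tok_type == tokenize.NAME:
            result.append("Symbol('%s')" % tok_str)
        elif tok_type == tokenize.NUMBER:
            result.append("Integer(%s)" % tok_str)
        elif tok_type == tokenize.OP:
            result.append(tok_str)
    return " ".join(result)

print(wrap_tokens("2*x + y"))
# Integer(2) * Symbol('x') + Symbol('y')
```

Things like "x!" would need an extra pass over the stream, since "!" is not a valid Python token on its own.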

David Li
 

>
> Aaron Meurer 
>
> On Tue, Sep 4, 2012 at 3:29 AM, Joachim Durchholz 
> <[email protected]> wrote: 
> > On 04.09.2012 00:11, David Li wrote: 
> > 
> >> So perhaps some heuristic for differentiating 
> >> between various input languages and then interpreting them as Python 
> >> (Python, TeX, "English-like", etc.) could also be an interesting task. 
> > 
> > 
> > Heh. That's simple: 
> > - Have a grammar for each syntax that we have, 
> > - run the input through all grammars, 
> > - use the grammar that doesn't return an error. 
> > 
> > The fun begins when considering the following cases: 
> > 1) No grammar matches. 
> > 2) More than one grammar matches. 
> > 
> > For (1), you'd want to somehow rank the grammars according to how close 
> > the input is to each grammar, and assume the user really meant the 
> > closest one. 
> > 
> > For (2), you'd want to check if the different grammars all really mean 
> > the same. E.g. "1*1" should parse the same for all math grammars. Just 
> > continue processing. 
> > Otherwise, you'll have to ask the user. Or randomly guess one and let 
> > the user explicitly select grammars. 
> > 
> > There's also a slight complication for case (2): You may get different 
> > parse trees, but they'd boil down to the same operations. For example, 
> > grammars with different numbers of precedence levels tend to end up 
> > that way; 1*2 could end as 
> > 
> > op: * 
> >   int: 1 
> >   int: 2 
> > 
> > or as 
> > 
> > op: * 
> >   literal 
> >     int: 1 
> >   literal 
> >     int: 2 
> > 
> > where the second grammar would for some reason differentiate between 
> > literals, names, and other representations, where the first does not. 
> > 
> > You'll either need a pass that normalizes grammars, or require that 
> > commonalities between grammars are handled by identical rules. 
> > The first approach probably requires less work because SymPy already has 
> > routines for simplifying expressions; however, that makes error 
> > reporting more difficult because the transformations aren't built for 
> > keeping track of input line/column numbers. 
> > 
> > You see, there's enough to do :-) 
> > 
> > Not all aspects need to be addressed on the first round though. Just 
> > choose how much of this all you want to deal with, and code in a way 
> > that the rest can be added later without rewriting everything. 
> > 
> > 
> >> Since Gamma only deals with mathematical expressions (which is more 
> >> limited than Wolfram|Alpha) I believe at least some basic English-like 
> >> queries can be interpreted. 
> > 
> >> ... 
> > 
> >> Given how 
> >> difficult it is, though, I guess just being able to interpret 2x, sin 
> >> x, and integral of x^2 would be a nice step up in functionality. 
> > 
> > Indeed, that's easy enough. You can always write a grammar that accepts 
> > a subset of English. 
> > Main points: 
> > - Do not require parentheses for function parameters; a function call 
> >   is just: name {expr} 
> > - Make name {expr} bind weaker than all operators, so sin x+y is 
> >   equivalent to sin (x+y). 
> > 
> > 
> >> I should've been more specific about that. I thought that natural 
> >> language could help somewhat with the task, or at least point me 
> >> towards algorithms and ideas, which is why I mentioned it. 
> > 
> > 
> > That wouldn't have worked. Parsing natural language is really hard. And 
> > the algorithms beyond parsing aren't related much to natural language. 
> > 
> > Still, the natural language parsers should be suitable. 
> > 
> > 
> > -- 
> > You received this message because you are subscribed to the Google 
> > Groups "sympy" group. 
> > To post to this group, send email to [email protected]. 
> > To unsubscribe from this group, send email to 
> > [email protected]. 
> > For more options, visit this group at 
> > http://groups.google.com/group/sympy?hl=en. 
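The try-every-grammar dispatch described above can be sketched in a few lines; the two parser stubs here are hypothetical stand-ins for real Python and TeX grammars, and the closeness ranking for case (1) is left as a comment:

```python
# Hypothetical grammar stubs: each raises ValueError when it rejects
# the input, mimicking a real parser's failure mode.
def parse_python(source):
    if "\\" in source:
        raise ValueError("not Python-style input")
    return ("python", source)

def parse_tex(source):
    if "\\" not in source:
        raise ValueError("not TeX input")
    return ("tex", source)

GRAMMARS = [parse_python, parse_tex]

def dispatch(source):
    """Run the input through all grammars and collect the successes."""
    results = []
    for grammar in GRAMMARS:
        try:
            results.append(grammar(source))
        except ValueError:
            pass
    if not results:
        # Case (1): no grammar matched; rank grammars by closeness here.
        raise ValueError("no grammar matched")
    if len(results) > 1:
        # Case (2): check the parses agree, otherwise ask the user.
        pass
    return results[0]

print(dispatch(r"\frac{1}{2}"))  # ('tex', '\\frac{1}{2}')
print(dispatch("1/2"))           # ('python', '1/2')
```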
