Okay, some bad news - this might not qualify as a science fair project since it doesn't really have an "experiment". My teacher will double-check, but he wasn't too sure. However, I would still like to pursue this project as it interests me.
On Tuesday, September 4, 2012 11:19:11 AM UTC-7, Aaron Meurer wrote:
>
> Another thing you could look at is what should be done at the parsing
> stage and what should be done after the parsing. For example, "2 x",
> "x y", and "tan x" are all the same syntax as far as the parser is
> concerned (unless you want to put all predefined names in the grammar
> itself), but the first two are implicit multiplication and the third
> is implicit calling. So maybe those should be parsed to the same
> object and then differentiated in software somehow. Then come
> questions of how to interpret things like "tan x y" (tan(x)*y or
> tan(x*y), or fail).

Yes, what I was thinking is that there would be a "whitespace expansion"
step (probably after tokenization) that would convert statements like
2xy into "2 x y" and then tokenize again, and then differentiate between
those syntaxes when constructing some sort of AST.

> Another interesting example that I thought of is something like
> sin^2(x) for sin(x)**2 (the former is common notation for this, and
> indeed SymPy even pretty prints it that way). To parse the one like
> the other would require changing the precedence order, as it normally
> would be parsed as sin^(2(x)). So you might think of ways to make
> that work, and whether those ways work at the parsing stage, the
> post-parsing stage, or both.

Okay, so that's another thing to keep in mind - I'll have to compile a
list of allowed syntactical elements sometime.

> So what I would do is try things in order of easiest to hardest (and
> natural language heuristics are one of the hardest), and stop working
> when you either run out of time or feel that you've done enough. You
> almost certainly won't get to do it all, but it's not clear just how
> far you will get, so set yourself up to do as much as you can.
>
> By the way, the standard library tokenize module is exactly the same
> as the parser in SymPy, except we've extended ours to do some other
> stuff (e.g., parse "x!"
> as factorial(x), wrap all undefined names in Symbol, wrap all number
> literals in Integer or Float, etc.). So for the parts that are just
> extending tokenize, you should put it there. For the rest, it should
> go in the parsing module (another good thing to think about by the
> way is a good way of organizing the parsing code; that was discussed
> a little bit on that other thread).

Alright, I'll keep this in mind as I work on an API.

David Li

> Aaron Meurer
>
> On Tue, Sep 4, 2012 at 3:29 AM, Joachim Durchholz wrote:
> > On 04.09.2012 00:11, David Li wrote:
> >
> >> So perhaps some heuristic for differentiating between various
> >> input languages and then interpreting them as Python (Python,
> >> TeX, "English-like", etc.) could also be an interesting task.
> >
> > Heh. That's simple:
> > - Have a grammar for each syntax that we have,
> > - run the input through all grammars,
> > - use the grammar that doesn't return an error.
> >
> > The fun begins when considering the following cases:
> > 1) No grammar matches.
> > 2) More than one grammar matches.
> >
> > For (1), you'd want to somehow rank the grammars according to how
> > close the input is to each grammar, and assume the user really
> > meant the closest one.
> >
> > For (2), you'd want to check if the different grammars all really
> > mean the same. E.g. "1*1" should parse the same for all math
> > grammars. Just continue processing.
> > Otherwise, you'll have to ask the user. Or randomly guess one and
> > let the user explicitly select grammars.
> >
> > There's also a slight complication for case (2): You may get
> > different parse trees but they'd boil down to the same operations.
> > For example, grammars with different numbers of precedence levels
> > tend to end up that way; 1*2 could end as
> >
> >   op: *
> >     int: 1
> >     int: 2
> >
> > or as
> >
> >   op: *
> >     literal
> >       int: 1
> >     literal
> >       int: 2
> >
> > where the second grammar would for some reason differentiate
> > between literals, names, and other representations, where the
> > first does not.
> >
> > You'll either need a pass that normalizes grammars, or require
> > that commonalities between grammars are handled by identical rules.
> > The first approach probably requires less work because SymPy
> > already has routines for simplifying expressions; however, that
> > makes error reporting more difficult because the transformations
> > aren't built for keeping track of input line/column numbers.
> >
> > You see, there's enough to do :-)
> >
> > Not all aspects need to be addressed on the first round though.
> > Just choose how much of this all you want to deal with, and code
> > in a way that the rest can be added later without rewriting
> > everything.
> >
> >> Since Gamma only deals with mathematical expressions (which is
> >> more limited than Wolfram|Alpha) I believe at least some basic
> >> English-like queries can be interpreted.
> >
> >> ...
> >
> >> Given how difficult it is, though, I guess just being able to
> >> interpret 2x, sin x, and integral of x^2 would be a nice step up
> >> in functionality.
> >
> > Indeed, that's easy enough. You can always write a grammar that
> > accepts a subset of English.
> > Main points:
> > - Do not require parentheses for function parameters; a function
> >   call is just: name {expr}
> > - Make name {expr} bind weaker than all operators, so sin x+y is
> >   equivalent to sin (x+y).
> >
> >> I should've been more specific about that. I thought that
> >> natural language could help somewhat with the task, or at least
> >> point me towards algorithms and ideas, which is why I mentioned
> >> it.
> >
> > That wouldn't have worked. Parsing natural language is really hard.
> > And the algorithms beyond parsing aren't related much to natural
> > language.
> >
> > Still, the natural language parsers should be suitable.

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sympy/-/TwOwBrf0idAJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
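
A minimal sketch of the "whitespace expansion" idea discussed in this thread, using only the Python standard library. It is an illustration under stated assumptions, not SymPy's parser: the names KNOWN_FUNCTIONS, expand, and make_explicit are hypothetical, every run of letters that is not a known function name is split into single-letter symbols, and "tan x y" is read as tan(x)*y (reading it as tan(x*y), or failing, would be just as defensible).

```python
import re

# Hypothetical list of function names; a real parser would take these
# from its namespace instead of hard-coding them.
KNOWN_FUNCTIONS = {"sin", "cos", "tan", "log", "exp", "sqrt"}

# Numbers, runs of letters, or any single non-space character.
TOKEN_RE = re.compile(r"\d+\.\d+|\d+|[A-Za-z]+|\S")


def expand(text):
    """Tokenize and split unknown names: "2xy" -> ["2", "x", "y"]."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok.isalpha() and tok not in KNOWN_FUNCTIONS:
            tokens.extend(tok)  # assumption: symbols are single letters
        else:
            tokens.append(tok)
    return tokens


def is_atom(tok):
    """A number or a symbol, but not a function name."""
    return tok[0].isalnum() and tok not in KNOWN_FUNCTIONS


def make_explicit(tokens):
    """Return a string with explicit '*' and call parentheses."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Implicit multiplication: an operand directly follows an operand.
        if out and (is_atom(out[-1]) or out[-1] == ")") and (
            tok[0].isalnum() or tok == "("
        ):
            out.append("*")
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in KNOWN_FUNCTIONS and nxt != "(":
            # Implicit call: bind the function to the single next atom,
            # and give up on anything more complicated.
            if nxt is None or not is_atom(nxt):
                raise ValueError("cannot resolve implicit call of %r" % tok)
            out.extend([tok, "(", nxt, ")"])
            i += 2
        else:
            out.append(tok)
            i += 1
    return "".join(out)


for text in ["2xy", "tan x", "tan x y", "2 sin(x)"]:
    print(text, "->", make_explicit(expand(text)))
# 2xy -> 2*x*y
# tan x -> tan(x)
# tan x y -> tan(x)*y
# 2 sin(x) -> 2*sin(x)
```

The sketch deliberately does the name splitting right after tokenization and resolves multiplication versus calling in a second pass, mirroring the parse-then-differentiate split suggested above. Inputs such as "tan -x" raise ValueError, and notation like sin^2(x) is not handled at all.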
