Oh boy, this is going to be a big post. Responding to everyone in turn: *@Aaron:*
> Nonlinear, AC pattern matching is NP complete. Linear AC pattern matches > can be found in polynomial time. > > Interesting. Why is that? > Joachim got it right, having each match constrained by other matches, and a (potentially) combinatorially large set of available match combinations makes the nonlinear, associative-commutative problem much harder. This is a good paper on the subject: http://www.sciencedirect.com/science/article/pii/S0747717187800275 >The question is, do we want the most general match first, or the > > most specific? > > Whichever one is the correct behavior for a dispatch system. I *think* > that means specific, but I could be getting it backwards. > > I'm assuming that the whole point of having a set of patterns rather > than a single pattern at a time is so you can do this. > That's exactly the purpose. In that case, matching on constants rather than variables is the desired behavior (more specific patterns first, then more general). I'll think about ways to make this configurable, but right now I'm more interested in making it *work*. *@Francesco* That's cool. Is it addressing this issue? > https://github.com/sympy/sympy/issues/7748 Didn't know we had an issue on this, but yes - in part. The main purpose behind this was originally to get a good matcher for tiling the expression tree into computations for code generation, as there isn't necessarily a 1-to-1 match between nodes and C/Fortran functions. But since I had to write it for that, it made sense to make it general so it can be used everywhere else it's needed. You mean expressions like *Add(x, x, evaluate=False)* ? I think there > should be a way to check where an expression is in a canonical order. No, I mean anything where a variable occurs in a pattern more than once. So even sin(x)*cos(x), where x is a variable is nonlinear. Maybe it is preferrable that the first lazily generated substitution > appears in the same order as Mathematica's pattern matching. It would make > Mathematica's code more easily portable to SymPy. I think that would be incredibly hard to do. Especially since I have no access to or experience with mathematica. It all depends on the ordering algorithm they use for generating the potential matches. What kind of AST are you matching? It only depends on a preorder-traversal structure. To use the code right now, one needs to define a class that has 4 methods - `current`, `sub_terms`, `next`, and `skip`, which give the current term, its subterms, advance the preorder-traversal, and skip over a branch, respectively. This should allow it to be used with any tree-like term. There might be a better abstraction, but that's what I have right now in my prototype. One last complication: have you had any thought of how this could be > extended to an AST node containing a PermutationGroup instance determining > how its arguments commute/anticommute? That could have some applications in > the tensor module, but I guess it's hard or impossible to find a polynomial > algorithm for that. > I'm not sure what you mean by that. Can you explain? By the way, are you using Wild objects, or are you specifying wilds by an > extra parameter? > In unify.rewriterule, you just put in two expressions, and specify which > ones are variables (i.e. wilds) in another parameter. Furthermore, there is > an option to put a filtering lambda function and assumptions, so that > matches should be compatible with a test function and/or SymPy's new-style > assumptions. I'm using a similar method to the unification code. So far only "match-all" variables are supported - once it works it should be simple to modify the post-processing step to match on type/assumptions. *@Matt* Awesome, thanks for doing this. Well, thanks to you as well. This all started with the flatterms paper you sent me, along with the (ridiculously) long list of related papers to it I found. Another major use case is selecting the "best" match. In this case we > might also want to control the order in which matches come out of the > sequence so that matches that we think are best come first. In my ideal world I might be able to decide the order in which we traverse > the tree of possible matches. I don't know exactly how your algorithm > works but, in some algorithms, we obtain a set of partial matches at each > step and then have to decide which branch to pursue first. I've found that > accepting a user-provided objective function is useful. It allows the user > to specify a greedy search strategy. > The algorithm uses a discrimination net to do a "filtering" of the set of patterns, down to those that (potentially) match the given query term. At each node there are only 2 possible choices - follow the current constant symbol (if such a branch exists), or follow the variable symbol (all variables are treated as the same variable at this point). Backtracking is then used to eventually traverse all possible branches and get the full set of matches. Given that there are only two choices at each node, a user defined function would either say "follow variables first" or "follow constants first". After a the set of matches for the term is formed, then a matching substitution is determined. For non AC patterns, determining a matchset from the pattern is trivial, and is basically "done" after the discrimination net step. For AC, linear patterns, these can be simplified slightly based on prior known information from the discrimination net step, and then fed into a general, less efficient matching/unification routine. The nonlinear, AC patterns have to go right to this step. The thought behind this process is that most patterns will be removed by the cheap first "filtering" step, leaving only a small subset which require an expensive calculation to determine a matching substitution on. Given the above, a user defined "greedy" algorithm could generate the full set of potential matches, and then order them based on preference before diving into the expensive generation of a matching substitution. This may be a good idea. *@ Joachim* Covered some of your points elsewhere in this post. Is it a basic building block for other algorithms to use? Then favor > predictable performance over features. Something that has unpredictable > performance isn't very attractive to an algorithm author. > > Are you aiming for something that can solve as many situations as > possible? Then aim for more features. > I'm somewhat sceptical whether just pattern matching will be the right > approach, but I suspect you aren't after this one anyway :-) Aiming for a building block. I have my use cases for such a thing (see above), but it should be generally useful elsewhere. From what I understand, pattern matching and rewrite rules are common features in most computer algebra systems (Mathematica, maxima, etc...). The main reason for these questions is that some added space complexity, we can save some notes on the matches in the filtering step, which can then be used to provide a slight speedup in the creation of matching substitions. Or in the case of non-ac patterns, completely solve them. Doing so by default slightly slows down the determination of the set of patterns, but speeds up the creation of matching substitions. *@Richard* I don't know if your AC matcher is intended to be used for (say) > arithmetic expressions > or something else. But if arithmetic -- your stuff also needs to deal > with identities. > In that case if > you don't handle identities, your matcher becomes far less interesting. > Based on what I've read on the subject, I gather by identities you mean: pattern = x*y*z query = a*b Should match with {x: a, y: b, z: 1}, or a different permutation of that, where 1 is the identity for the mulitplication function node. Similarly 0 would be that for addition. Is that correct? I'm a mechanical engineer, not a computer scientist, so this is all a bit foreign to me. The literature accessible to me on Associative-Commutative-Identity many-to-one pattern matching is lacking, or I'm not the best at finding it. After some thinking, the formation of matches should still work mod identities, if they are as I described above. I'm not certain that the lack of this constitutes "far less interesting", but I can see how it could be useful. There's a long history of pattern matching "fast" including work by Richard > Jenks, (Scratchpad, predecessor of Axiom). The papers I've read have been almost exclusively from the theorem proving world. The work of Jim Christian on HIPER and Leo Bachmair on REVEAL specifically, which only cover matching mod AC. I only found one relevant paper by Jenks, and it had only 3 paragraphs on pattern matching (just describing the syntax of it in his system). I'm sure the code in macsyma/maxima would be quite useful, but being as lisp-illiterate as I am, I can't make much use of it. Thanks for the tip though. - Jim -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/sympy. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/1de2c998-5e31-4a97-a87a-18410b2c7c40%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.
