Re: [sympy] Writing a fast pattern matcher, updates and questions

James Crist Thu, 27 Nov 2014 19:49:50 -0800

Oh boy, this is going to be a big post. Responding to everyone in turn:

*@Aaron:*



> Nonlinear, AC pattern matching is NP complete. Linear AC pattern matches 

> can be found in polynomial time. 
>
> Interesting. Why is that? 
>

Joachim got it right, having each match constrained by other matches, and a 
(potentially) combinatorially large set of available match combinations 
makes the nonlinear, associative-commutative problem much harder. This is a 
good paper on the subject: 
http://www.sciencedirect.com/science/article/pii/S0747717187800275


>The question is, do we want the most general match first, or the 
> > most specific? 
>
> Whichever one is the correct behavior for a dispatch system. I *think* 
> that means specific, but I could be getting it backwards. 
>
> I'm assuming that the whole point of having a set of patterns rather 
> than a single pattern at a time is so you can do this. 
>

That's exactly the purpose. In that case, matching on constants rather than 
variables is the desired behavior (more specific patterns first, then more 
general). I'll think about ways to make this configurable, but right now 
I'm more interested in making it *work*.


*@Francesco*


That's cool. Is it addressing this issue?
> https://github.com/sympy/sympy/issues/7748


Didn't know we had an issue on this, but yes - in part. The main purpose 
behind this was originally to get a good matcher for tiling the expression 
tree into computations for code generation, as there isn't necessarily a 
1-to-1 match between nodes and C/Fortran functions. But since I had to 
write it for that, it made sense to make it general so it can be used 
everywhere else it's needed.

You mean expressions like *Add(x, x, evaluate=False)* ? I think there 
> should be a way to check where an expression is in a canonical order.


No, I mean anything where a variable occurs in a pattern more than once. So 
even sin(x)*cos(x), where x is a variable is nonlinear.


Maybe it is preferrable that the first lazily generated substitution 
> appears in the same order as Mathematica's pattern matching. It would make 
> Mathematica's code more easily portable to SymPy.


I think that would be incredibly hard to do. Especially since I have no 
access to or experience with mathematica. It all depends on the ordering 
algorithm they use for generating the potential matches.

What kind of AST are you matching?


It only depends on a preorder-traversal structure. To use the code right 
now, one needs to define a class that has 4 methods - `current`, 
`sub_terms`, `next`, and `skip`, which give the current term, its subterms, 
advance the preorder-traversal, and skip over a branch, respectively. This 
should allow it to be used with any tree-like term. There might be a better 
abstraction, but that's what I have right now in my prototype.

One last complication: have you had any thought of how this could be 
> extended to an AST node containing a PermutationGroup instance determining 
> how its arguments commute/anticommute? That could have some applications in 
> the tensor module, but I guess it's hard or impossible to find a polynomial 
> algorithm for that.
>

I'm not sure what you mean by that. Can you explain?

By the way, are you using Wild objects, or are you specifying wilds by an 
> extra parameter?
> In unify.rewriterule, you just put in two expressions, and specify which 
> ones are variables (i.e. wilds) in another parameter. Furthermore, there is 
> an option to put a filtering lambda function and assumptions, so that 
> matches should be compatible with a test function and/or SymPy's new-style 
> assumptions.


I'm using a similar method to the unification code. So far only "match-all" 
variables are supported - once it works it should be simple to modify the 
post-processing step to match on type/assumptions. 

 
*@Matt*

Awesome, thanks for doing this.


Well, thanks to you as well. This all started with the flatterms paper you 
sent me, along with the (ridiculously) long list of related papers to it I 
found.

Another major use case is selecting the "best" match.  In this case we 
> might also want to control the order in which matches come out of the 
> sequence so that matches that we think are best come first.


In my ideal world I might be able to decide the order in which we traverse 
> the tree of possible matches.  I don't know exactly how your algorithm 
> works but, in some algorithms, we obtain a set of partial matches at each 
> step and then have to decide which branch to pursue first.  I've found that 
> accepting a user-provided objective function is useful.  It allows the user 
> to specify a greedy search strategy.
>

The algorithm uses a discrimination net to do a "filtering" of the set of 
patterns, down to those that (potentially) match the given query term. At 
each node there are only 2 possible choices - follow the current constant 
symbol (if such a branch exists), or follow the variable symbol (all 
variables are treated as the same variable at this point). Backtracking is 
then used to eventually traverse all possible branches and get the full set 
of matches. Given that there are only two choices at each node, a user 
defined function would either say "follow variables first" or "follow 
constants first".

After a the set of matches for the term is formed, then a matching 
substitution is determined. For non AC patterns, determining a matchset 
from the pattern is trivial, and is basically "done" after the 
discrimination net step. For AC, linear patterns, these can be simplified 
slightly based on prior known information from the discrimination net step, 
and then fed into a general, less efficient matching/unification routine. 
The nonlinear, AC patterns have to go right to this step. 

The thought behind this process is that most patterns will be removed by 
the cheap first "filtering" step, leaving only a small subset which require 
an expensive calculation to determine a matching substitution on.

Given the above, a user defined "greedy" algorithm could generate the full 
set of potential matches, and then order them based on preference before 
diving into the expensive generation of a matching substitution. This may 
be a good idea.


*@ Joachim*

Covered some of your points elsewhere in this post.

Is it a basic building block for other algorithms to use? Then favor 
> predictable performance over features. Something that has unpredictable 
> performance isn't very attractive to an algorithm author. 
>
> Are you aiming for something that can solve as many situations as 
> possible? Then aim for more features. 
> I'm somewhat sceptical whether just pattern matching will be the right 
> approach, but I suspect you aren't after this one anyway :-) 


Aiming for a building block. I have my use cases for such a thing (see 
above), but it should be generally useful elsewhere. From what I 
understand, pattern matching and rewrite rules are common features in most 
computer algebra systems (Mathematica, maxima, etc...). The main reason for 
these questions is that some added space complexity, we can save some notes 
on the matches in the filtering step, which can then be used to provide a 
slight speedup in the creation of matching substitions. Or in the case of 
non-ac patterns, completely solve them. Doing so by default slightly slows 
down the determination of the set of patterns, but speeds up the creation 
of matching substitions.

*@Richard*

I don't know if your AC matcher is intended to be used for (say) 
>  arithmetic expressions
> or something else.  But if arithmetic -- your stuff also needs to deal 
> with identities.
> In that case if
> you don't handle identities, your matcher becomes far less interesting.
>

Based on what I've read on the subject, I gather by identities you mean:

pattern = x*y*z
query =  a*b

Should match with {x: a, y: b, z: 1}, or a different permutation of that, 
where 1 is the identity for the mulitplication function node. Similarly 0 
would be that for addition. Is that correct? I'm a mechanical engineer, not 
a computer scientist, so this is all a bit foreign to me.

The literature accessible to me on Associative-Commutative-Identity 
many-to-one pattern matching is lacking, or I'm not the best at finding it. 
After some thinking, the formation of matches should still work mod 
identities, if they are as I described above. I'm not certain that the lack 
of this constitutes "far less interesting", but I can see how it could be 
useful.

There's a long history of pattern matching "fast" including work by Richard 
> Jenks, (Scratchpad, predecessor of Axiom).


The papers I've read have been almost exclusively from the theorem proving 
world. The work of Jim Christian on HIPER and Leo Bachmair on REVEAL 
specifically, which only cover matching mod AC. I only found one relevant 
paper by Jenks, and it had only 3 paragraphs on pattern matching (just 
describing the syntax of it in his system). I'm sure the code in 
macsyma/maxima would be quite useful, but being as lisp-illiterate as I am, 
I can't make much use of it. Thanks for the tip though.

- Jim

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/1de2c998-5e31-4a97-a87a-18410b2c7c40%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [sympy] Writing a fast pattern matcher, updates and questions

Reply via email to