Re: [sympy] Writing a fast pattern matcher, updates and questions

Richard Fateman Thu, 27 Nov 2014 11:53:05 -0800

There's a long history of pattern matching "fast" including work by Richard 
Jenks, 
(Scratchpad, predecessor of Axiom).  The general scheme is to take a 
collection
of patterns and compile them into a tree form so that partial results from 
pattern 1
can be used to improve speed on pattern 2, etc.


I don't know if your AC matcher is intended to be used for (say) 
 arithmetic expressions
or something else.  But if arithmetic -- your stuff also needs to deal with 
identities.
In that case if
you don't handle identities, your matcher becomes far less interesting.

In addition to Jenks' stuff, you should certainly look  at Mathematica's 
pattern
matching, and perhaps my own matcher in Macsyma/ Maxima  "semantic"
matching.
Good luck
RJF

On Thursday, November 27, 2014 10:43:30 AM UTC-8, Joachim Durchholz wrote:
>
> Am 27.11.2014 um 00:19 schrieb James Crist: 
> > All, 
> > 
> > In my spare time, I've been working on implementing a fast pattern 
> matcher 
> > that accounts for Associative and Commutative symbols. It's going to be 
> a 
> > while before I'm ready to release the code (it needs some serious 
> cleanup), 
> > but as of now it is "partly functional". Some notation: 
> > 
> > T = set of known patterns. A trie like data structure is used to 
> preprocess 
> > these patterns to aid in fast matching. 
> > s = expression to be matched 
> > x, y = variables 
> > a, b, c = constants 
> > 
> > Currently the matcher can determine what patterns in T match s, in the 
> > presence of associative and commutative symbols, as long as the pattern 
> is 
> > "linear". By linear, I mean no repeated *variable* symbols (`x + y` is 
> > acceptable, `x + x` is not). 
>
> That's an important restriction. 
> It's probably also what makes the pattern matcher fast enough to be 
> useful :-) 
>
> > - Is there any use for just determining what patterns match, without 
> > generating a match substitution? Determining if a match exists is faster 
> > than determining a specific substitution. However, in the presence of 
> > nonlinear patterns a substitution needs to be found to verify the match 
> is 
> > valid. This decision determines how tightly coupled the two algorithms 
> will 
> > be. 
>
> See the next remark. 
>
> > - Is there need for nonlinear patterns? I plan to account for them, but 
> > they make the algorithm a bit more complicated. Nonlinear, AC pattern 
> > matching is NP complete. Linear AC pattern matches can be found in 
> > polynomial time. 
>
> This depends on what you're aiming for. 
>
> Is it a basic building block for other algorithms to use? Then favor 
> predictable performance over features. Something that has unpredictable 
> performance isn't very attractive to an algorithm author. 
>
> Are you aiming for something that can solve as many situations as 
> possible? Then aim for more features. 
> I'm somewhat sceptical whether just pattern matching will be the right 
> approach, but I suspect you aren't after this one anyway :-) 
>
> > - In a general case, there can be many substitutions that match a 
> specific 
> > pattern mod AC. For example, given `t = x*y`, and `s = a*b*c`, this can 
> > match with {x: a*b, y: c}, or {x:a, y: b*c}, etc... This is easy to 
> account 
> > for, and can be done lazily. Api-wise, are there cases where generating 
> > more than one matching substitution are needed? 
>
> I'm not sure that such cases will actually come up, but if they do (and 
> I see a significant probability that they will), it would be a pity if 
> your code would have to be rewritten from scratch. 
>
> So I'd recommend doing that, or at least keeping the path to this kind 
> of stuff open if possible. 
>
> > - The algorithm lazily determines matches. Currently preference is given 
> to 
> > matching variables before constants, but it could go the other way 
> around. 
> > i.e. if T = {a + b, x + b}, and s = a + b, then the matcher will find `x 
> + 
> > b` before `a + b`. I'd rather not make this configurable, as it 
> complicates 
> > things. The question is, do we want the most general match first, or the 
> > most specific? 
>
> It's almost a given that anybody who writes an algorithm on top of the 
> pattern matcher will find that the first match found isn't suitable but 
> some other matching order would fit their needs *exactly*. 
>
> You could tackle the complication, or just complete your work and assume 
> somebody else will find a less intrusive way to make it configurable. 
> For that, your code would have to be simple and easily understood - 
> which is something that would make the code more useful in the long term 
> anyway. 
>
> > - At the moment, the code is completely removed from dependence on 
> sympy, 
> > and can function on its own. As such, I'm planning on making it a 
> separate 
> > library that can be used by SymPy as a dependency. I understand we were 
> > against that in the past, but this may be changing? I'd rather not 
> maintain 
> > two separate codebases. 
>
> Heh. I'm seeing SymPy spin-offs of that kind happening lately. 
> I suspect SymPy could profit from rephrasing its "avoid external 
> dependencies" policy, but that's a different discussion, and possibly 
> one that can't be done before some more investigation has been done. 
>
> Regards, 
> Jo 
>

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/13facac4-a9f4-4057-9891-f130bf208c5b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: [sympy] Writing a fast pattern matcher, updates and questions

Reply via email to