All,
In my spare time, I've been working on implementing a fast pattern matcher
that accounts for Associative and Commutative symbols. It's going to be a
while before I'm ready to release the code (it needs some serious cleanup),
but as of now it is "partly functional". Some notation:
T = set of known patterns. A trie like data structure is used to preprocess
these patterns to aid in fast matching.
s = expression to be matched
x, y = variables
a, b, c = constants
Currently the matcher can determine what patterns in T match s, in the
presence of associative and commutative symbols, as long as the pattern is
"linear". By linear, I mean no repeated *variable* symbols (`x + y` is
acceptable, `x + x` is not). If no associative and commutative symbols are
present, it can also determine a match substitution to make `s` equivalent
with the pattern.
Some design questions before I continue work:
- Is there any use for just determining what patterns match, without
generating a match substitution? Determining if a match exists is faster
than determining a specific substitution. However, in the presence of
nonlinear patterns a substitution needs to be found to verify the match is
valid. This decision determines how tightly coupled the two algorithms will
be.
- Is there need for nonlinear patterns? I plan to account for them, but
they make the algorithm a bit more complicated. Nonlinear, AC pattern
matching is NP complete. Linear AC pattern matches can be found in
polynomial time.
- In a general case, there can be many substitutions that match a specific
pattern mod AC. For example, given `t = x*y`, and `s = a*b*c`, this can
match with {x: a*b, y: c}, or {x:a, y: b*c}, etc... This is easy to account
for, and can be done lazily. Api-wise, are there cases where generating
more than one matching substitution are needed?
- The algorithm lazily determines matches. Currently preference is given to
matching variables before constants, but it could go the other way around.
i.e. if T = {a + b, x + b}, and s = a + b, then the matcher will find `x +
b` before `a + b`. I'd rather not make this configurable, as it complicates
things. The question is, do we want the most general match first, or the
most specific?
- At the moment, the code is completely removed from dependence on sympy,
and can function on its own. As such, I'm planning on making it a separate
library that can be used by SymPy as a dependency. I understand we were
against that in the past, but this may be changing? I'd rather not maintain
two separate codebases.
- There's a `commutative` assumption in sympy, but there doesn't seem to be
an `associative` one. How can I check if a function/operator is
associative? Does this information need to be kept elsewhere?
- Jim
--
You received this message because you are subscribed to the Google Groups
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/42d9511d-e963-416f-9d4e-3e0435461422%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.