Status: Accepted
Owner: smichr
Labels: Type-Enhancement Priority-Medium

New issue 1601 by smichr: match trouble and tutorial
http://code.google.com/p/sympy/issues/detail?id=1601

This is quoted from issue 1429:
     a = Wild('a', exclude=[f(x)])
     b = Wild('b')
     p = a*f(x) + b
     eq1 = sin(x)
     eq2 = f(x) + sin(x)
     eq3 = f(x) + x + sin(x)
     eq4 = x + sin(x)
     print eq1.match(p)
     print eq2.match(p)
     print eq3.match(p)
     print eq4.match(p) # This one does not work

to match or not to match...as I talked this over with asmuerer I found this
to be a very challenging thing to understand. It seems like it should be so
easy to understand this pattern matching. I've been successfully greping
for years, but the behavior above lets you know there is maybe a little
more to it than you first think. And if you try to trace the code to see
what is going on...well, be patient and think recursively. So, after
generating all combinations of expression terms and pattern terms and
seeing which ones failed and which ones succeed and after looking at the
code, some clarity that I hope to preserve for posterity has come out of
the process.

Oh, before we get started, there is expr.match(pattern) syntax and
pattern.matches(expression) syntax which I rememeber the order by looing
for the ES in matchES and think of it being near the "ES" or "ESPRESSION"
(as they say in some parts of the world :)

Here is the expression matching algorithm:
If the pattern isn't the same as the expression then
     1) the pattern is split into wild and non-wild parts
     2) the non-wild parts are moved in a manner (consistent with the
pattern type, e.g. subtracted from a pattern that is an Add) to the
expression. (wild parts are also moved if they have been recognized as a
replacement in the process or if they appear in the expression being
tested.) You may still end up with non-wild parts and wild parts combined:
3 + w*x (where w is a wild symbol) will have the 3 subtracted from the
expression but the x will not also be moved, thus the wild pattern that
will be tested is the w*x.
     3) now for each term in the expression--in the order they appear when
printed (which is the reverse order that they are given by args)--each
piece of the wild pattern (in the same order) is checked to see if it
matches the expression. If it does, that wild term in the pattern is
replaced with what it matched and a test is done to see if the resulting
pattern yet matches the expression. If it does, you are done, otherwise you
test the next wild part of the pattern. If you go through all the patterns
without success, you go on to the next part of the expression and repeat
the tests of the wild parts. And finally, if you haven't successfully
converted the pattern to the expression through matched parts then the
match has failed.

So, what will 3 + w*x match? letting m be what it matches, we can write:
3 + w*x = m
w*x = m - 3
w = (m - 3)/x

Now, if there are not restrictions on w, m can be anything. 3+w*x will
match sin(x)+pi giving w = (sin(x)+pi-3)/x. If you were using the pattern
w*x you were probably didn't want an expression like (sin(x)+pi-3)/x coming
backk for w. If not, you have to put and exclusion on w saying what can't
appear in (m-3)/x. e.g. if you make w = Wild('w', exclude=[x]). Then
neither the sin(x) nor the x can be on the right, so m cannot have x in it.
Not x, not sin(x) nor f(x).diff(x) which all give True when tested with "x
in " as "x in sin(x) == True". So what *can* m be? g*x + 3 where g is
anything not containing x. e.g. sin(y) or 2 or (2+1/y), etc...it can be
anything BUT the 3 must be there for without it, the x will not vanish when
the 3 of the pattern is subtracted and the result is divided by x.

So let's do the same thing with the patterns above:

a = Wild('a', exclude=[f(x)])
b = Wild('b')
p = a*f(x) + b
eq1 = sin(x)
eq2 = f(x) + sin(x)
eq3 = f(x) + x + sin(x)
eq4 = x + sin(x)
print eq1.match(p)
print eq2.match(p)
print eq3.match(p)
print eq4.match(p)

let's write the pattern as w + nf*f(x) where w is wild and nf is wild with
the "not f(x)' exclusion.
w + nf*f(x) = sin(x) = eq1
there are no constant parts, so start testing the patterns. In order to
know what to do next you have to know what order the wild parts will come
out of the pattern. Remember that they are processed in the same order as
they appear when printed, so we test the w first:
does w match sin(x)? yes w = sin(x), now
   replace w with sin(x) and check
       nf*f(x) + sin(x) = f(x) + sin(x), now subtract constants...
       nf*f(x) = 0
       nf = 0/f(x) = 0
       and nf matches so
     nf = 0 and w = sin(x), check
         sin(x) + 0*nf = sin(x) is True and we are done

With less commentary now,
w+nf*f(x) =? f(x) + sin(x) = eq2 [args processed L to R]
     test f(x)
         w =? f(x) YES, replace w with f(x) in pattern; check if equal
             nf*f(x)+f(x) =? f(x) + sin(x)
             nf*f(x) =? sin(x)
             nf =? sin(x)/f(x) NO go to next wild part
         nf*f(x) = f(x)
         nf = 1 YES, replace and check
             w+1*f(x) =? f(x) + sin(x)
             w =? sin(x) YES
                 sin(x)+1*f(x) = f(x) + sin(x) YES and done.
                 ==> w=sin(x) and nf = 1

w+nf*f(x) =? x + f(x) + sin(x) = eq3 [args processedL to R]
     test x
         w =? x YES
             x+nf*f(x) =? x + f(x) + sin(x)
             nf*f(x) =? f(x)+sin(x)
             nf = (f(x)+sin(x))/f(x); NO by exclusion
         nf*f(x) =? x
         nf =? x/f(x) NO by exclusion
     test f(x)
         w=? f(x) YES
             f(x) + nf*f(x) =? x + f(x) + sin(x)
             nf*f(x) =? x + sin(x)
             nf =? (x + sin(x))/f(x) NO by exclusion
         nf*f(x) =? f(x)
         nf = 1 YES
             w+f(x) = x + f(x) + sin(x)
             w = x + sin(x) YES and done
             w ==> x + sin(x) and nf = 1

w+nf*f(x) =? x + sin(x) = eq4 [args processedL to R]
     test x
         w = x YES
             x+nf*f(x) =? x+sin(x)
             nf*f(x) = sin(x)
             nf = sin(x)/f(x) NO by exclusion
         nf*f(x) = x
         nf = x/f(x) NO
     test sin(x)
         w = sin(x) YES
             sin(x)+nf*f(x) = x + sin(x)
             nf*f(x) =? x
             nf = x/f(x) NO
         nf*f(x) = sin(x)
         nf = sin(x)/f(x) NO
     ==> no match

So if you want to know if a pattern will match something, print the two
quantities
and process what you see from left to right as follows:

         match(pattern,expr):
             if pattern == expression:
                 pattern succeeded
             else:
                 move any constants in pattern to expr
                     for term in expr as printed L to R
                         for p in pattern as printed L to R
                             if match(p,term):
                                 replace p in pattern with term
                                 if match(pattern,expr):
                                     pattern succeeded
                     pattern failed

If you've followed this far you might have noticed that basically, having
an exclude on a quantity means that it will have to appear in the
expression unless some other term will have already knocked it out. This
happened in eq1 with the w knocking out the sin(x) so the f(x) didn't have
to match against anything and the pattern succeeded. But in eq4 the x got
processed first and matched to the w; the expr doesn't go to zero, however
and so an f(x) *had* to be present in the remaining expression (or else it
would be impossible for f(x) to divide the expression and cancel out). So
the pattern failed.

That's tedious to think about. And there's a moral to the story. You
*probably* just want to know if an expr matches a certain form, answering a
question like (A) "is it linear?" or (B) "what are the coefficients on
certain terms?"

If you are doing (A) you can make your life a lot easier if you just put
and exclusion on *every* symbol to keep it from matching anything that is
in the non-wild parts of the pattern. If your non-wild terms are all
functions of x then just having exclude=[x] will keep wild patterns from
matching anything with x in it. So if you want to know if an expression is
a+b*x...make a and b exclude x, e.g. Wild('a', exclude =[x]) and the same
for b.  This will match 3, a*x, x+3, a*x+3, but not a+x**2 (since there is
no way to knock out bout x's with the pattern as long as constants exclude
x). You can also make sure your pattern succeeds if you use all wild
characters with no exclusions. But if you do, then it is likely that the
wild patterns will "eat" part of your pattern or give you matches where the
non-wild parts of your pattern are on the rhs--you'll have to test for
that. e.g. u_*x will give u_ = x if matched against x**2.

If you are doing (B) there may be a much faster and simpler way to get what
you want using as_independent(). If you already know the terms that you are
interested in (or a range of terms, say, in a polynomial or differential
equation) then ask for the coefficients of the terms of interest as in:

>>> eq
3 + 2*x + 4*D(f(x), x) + f(x)
>>> eq.coeff(x)
2
>>> eq.coeff(f(x).diff(x))
4

Or you can get them all as in:

>>> for a in eq.args:
...     print a.as_independent(x)
...
(3, 1)
(4, D(f(x), x))
(1, f(x))
(2, x)

For the expression,
     eq=3 + x + f(x)+4*f(x).diff(x)
and the pattern with all non-x wilds excluding f(x)
     pa = u+v*x+w*f(x)+z*f(x).diff(x)

A is about 100 x faster than loop B:

A)
     for a in eq.args:
         a.as_independent(x)
B)
     ok=eq.match(pa)



>>> eq.as_base_exp()
(3 + 2*x + 4*D(f(x), x) + f(x), 1)
>>> eq.as_basic()
3 + 2*x + 4*D(f(x), x) + f(x)
>>> eq.as_coeff_exponent(f(x))
(3 + 2*x + 5*f(x), 0)
>>> eq.as_coeff_exponent((x))
(3 + 2*x + 4*D(f(x), x) + f(x), 0)
>>> eq.as_coeff_exponent(3)
(3 + 2*x + 4*D(f(x), x) + f(x), 0)
>>> eq.as_coeff_factors()
(3, (4*D(f(x), x), f(x), 2*x))
>>> eq.as_coeff_terms()
(1, (3 + 2*x + 4*D(f(x), x) + f(x),))
>>> eq.as_coefficient(x)
>>> eq.as_coefficient(f(x))
>>> eq.as_coefficient(S(2))
>>> eq.as_independent(x)
(3, 2*x + 4*D(f(x), x) + f(x))
>>> eq.as_independent(f(x))
(3 + 2*x, 4*D(f(x), x) + f(x))
>>> eq.as_leading_term()
3 + 2*x + 4*D(f(x), x) + f(x)
>>> eq.as_powers_dict()
{3 + 2*x + 4*D(f(x), x) + f(x): 1}
>>> eq.coeff(f(x))
5
>>> eq.coeff((x))
2
>>> eq
3 + 2*x + 4*D(f(x), x) + f(x)

Perhaps there is interest in fleshing out part of the documentation with
modifications of the above.

/c

--
You received this message because you are listed in the owner
or CC fields of this issue, or because you starred this issue.
You may adjust your issue notification preferences at:
http://code.google.com/hosting/settings

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"sympy-issues" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at 
http://groups.google.com/group/sympy-issues?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to