I think the only way is to move SymPy away from its current approach
of automatically evaluating things when expressions are constructed.
To take a concrete example from
https://github.com/sympy/sympy/issues/10800, consider this expression:

a = 2*Integral(sin(exp(x)), (x, 1, oo)) + 2*Si(E)

This expression currently causes evalf() to hang (which is, of course,
a separate issue, but it is useful here just to get an idea of
where evalf() is overused).

Try doing almost anything with this expression and you'll see where
evalf() is being used. You can't even print the expression, because
the printer wants to numerically evaluate every numerical
subexpression so that they can be printed in order. You can't create
the expression a > 0 because > wants to automatically evaluate to True
or False when the arguments are numeric. This expression is numeric
(it is actually equal to pi, as doit() will reveal), but just because
something is known to be a number doesn't mean that we can efficiently
compute that number.

Obviously, the best "fix" here is to make it so that a.evalf()
actually computes quickly. But even if it computed a value, it would
still take some time to do that. There's no reason why the printers
really need to evaluate expressions just to sort them. And there's no
reason why > needs to automatically evaluate. If a user creates a >
expression and wants to know whether it can be simplified, they can call
simplify() or doit().
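To make the relational point concrete, here is a small sketch using current SymPy, where evaluate=False is the explicit opt-out (exact printing may vary between versions):

```python
from sympy import Gt, Integer

# Comparing two concrete numbers evaluates at construction time,
# producing a boolean rather than an unevaluated relational:
e1 = Integer(2) > Integer(1)
print(e1)  # True

# Keeping the relational unevaluated currently requires an explicit opt-out:
e2 = Gt(Integer(2), Integer(1), evaluate=False)
print(e2)  # 2 > 1
```

Under the design proposed here, the unevaluated form would be the default, and the True/False answer would only be computed by an explicit call such as doit() or simplify().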

This is just one example. This sort of thing happens all the time with
assumptions, and with other kinds of automatic simplification. We
could try to come up with a design where things can evaluate, but only
in cases involving minimal calculation, as you suggested. For example,
assumptions might automatically evaluate simple deductions like
positive -> real, but bypass expensive things that require computation,
like _eval_is_real. That would be complicated to implement, though. It
would be better to use a design where calculation is not expected to
happen automatically in the first place. That represents a pretty big
change in the way SymPy works.

If you look at the matrix expressions, they are close to the sort of
design we should be aiming for. If you create any expression with
matrix expressions, it is completely unevaluated, and only when you
call doit() does it attempt any simplifications, even really basic
ones (with the exception of shape checking, which already might be too
much).

>>> A = MatrixSymbol("A", n, n)
>>> MatAdd(A, A)
A + A
>>> MatAdd(A, A).doit()
2*A

Note that this is different from calling +, which calls doit() automatically:

>>> A + A
2*A

The important difference is that an algorithm that manipulates matrix
expressions by pulling them apart and rebuilding them from their args
will effectively create the class directly using MatAdd, not +, so no
evaluation will happen there. Such algorithms are where the
performance difference between evaluating and not evaluating will
matter the most.
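As an illustration of that rebuild pattern, here is a minimal sketch; rebuild is a hypothetical helper, not SymPy API, and it assumes the current MatAdd behaviour described above:

```python
from sympy import MatAdd, MatrixSymbol, symbols

def rebuild(expr):
    # Pull the expression apart into its args and reconstruct it through
    # its class (expr.func), the way manipulation algorithms typically do.
    if not expr.args:
        return expr
    return expr.func(*[rebuild(arg) for arg in expr.args])

n = symbols('n')
A = MatrixSymbol("A", n, n)

unevaluated = MatAdd(A, A)   # direct construction: stays A + A
print(rebuild(unevaluated))  # A + A -- rebuilding does not evaluate
print(unevaluated.doit())    # 2*A -- evaluation only on request
```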

Ideally a class like Add would not even have a __new__ method, so that
it is not possible for it to evaluate directly in its constructor.
Instead all evaluation would happen in a different method like doit().
Certain evaluations like x + x -> 2*x would still happen when using +
so that the end-user experience is still practical.
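Today that split is only available by explicitly opting out of evaluation; here is a sketch using the current evaluate=False flag to show the behaviour the proposal would make the default:

```python
from sympy import Add, symbols

x = symbols('x')

print(Add(x, x))               # 2*x -- the constructor evaluates today

u = Add(x, x, evaluate=False)  # explicit opt-out keeps the raw form
print(u)                       # x + x
print(u.doit())                # 2*x -- evaluation deferred to doit()
```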

This would also have the side effect of making unevaluated expressions
easier to work with. Right now you have to create them with things
like evaluate=False, and they are buggy when you use them. If they
were directly supported as the "default" way that expressions work,
then it would be straightforward to use them, and every function would
handle them correctly.

If we don't remove the automatic evaluations, then any other approach
to performance will always be an uphill battle: for any automatic
simplification you can construct expressions that make it prohibitively
slow, and because the simplification happens automatically, it will
wipe out any other performance gains you might have had.

Aaron Meurer

On Mon, Apr 18, 2022 at 8:29 AM S.Y. Lee <[email protected]> wrote:
>
> What direction would you suggest for fast expression evaluation?
> Would it be making the assumptions system faster?
> Or choosing carefully between 'easy assumptions' and 'hard
> assumptions' based on time-complexity analysis?
> Or not using the assumptions system at all, replacing it with
> faster (but more limited) decision procedures such as pattern matching
> or type theory?
>
> On Sunday, April 17, 2022 at 5:45:12 PM UTC+3 Oscar wrote:
>>
>> On Sun, 17 Apr 2022 at 15:02, David Bailey <[email protected]> wrote:
>> >
>> > On 17/04/2022 13:00, Oscar Benjamin wrote:
>> > > On Sun, 17 Apr 2022 at 11:47, David Bailey <[email protected]> wrote:
>> > >>
>> > >> I am curious how much of the difference in speed between SymPy and
>> > >> SymEngine you think is attributable to suboptimal design of the
>> > >> Python code, and how much to the choice of programming language.
>> > >> Clearly C++ is a much lower-level language than Python, and would
>> > >> presumably be intrinsically much faster, but would result in many
>> > >> hard-to-fix memory corruption bugs.
>> > > It's hard to disentangle these things. Both SymPy and SymEngine will
>> > > do a lot of symbolic processing behind the scenes to produce the
>> > > output that users expect to see. In general, SymPy will do a lot more
>> > > processing than SymEngine, which means that expressions will not
>> > > evaluate in the same way, but also that a comparison of the two
>> > > for speed is not easy to interpret as being about, e.g., C++ vs Python
>> > > or any other particular optimisation.
>> > >
>> > > For example, one of the things that is often slow in SymPy when you
>> > > have large expressions is assumptions queries. Every time you create
>> > > an expression like an Add, a Mul, exp, etc., there is a lot of
>> > > processing that goes on to determine if the expression can simplify
>> > > and this often involves checking "assumptions" using the core
>> > > assumptions system e.g.:
>> > >
>> > > >>> n = symbols('n', integer=True)
>> > > >>> sin(n*pi)
>> > > 0
>> > >
>> > > This kind of thing often dominates the runtime in SymPy when working
>> > > with large expressions. In SymEngine there are no assumptions and so
>> > > no assumptions checking is done at all:
>> > >
>> > > >>> from symengine import symbols, sin, pi
>> > > >>> n = symbols('n', integer=True)
>> > > >>> sin(n*pi)
>> > > sin(n*pi)
>> > > >>> n.is_integer
>> > > False
>> > >
>> > > Having a simpler evaluation scheme would make SymPy faster while still
>> > > working in Python. There are many more examples of this where SymPy
>> > > just hasn't been clearly designed with performance always in mind.
>> > > Contributors are often unaware of the performance implications of the
>> > > changes that they make and it's very easy for a seemingly innocent fix
>> > > in one place to result in significant slowdowns elsewhere.
>> > >
>> >
>> > Maybe I'm missing something, but if assumptions are so costly, couldn't
>> > every expression carry a flag, contains_assumptions, saying whether it
>> > contains assumptions? Then, when a larger expression was created, it would
>> > be simple to compute contains_assumptions for the larger expression.
>> >
>> > E.g. in your example, the expression for n would have
>> > contains_assumptions set to 1, and this would propagate through any
>> > expression containing n.
>> >
>> > I think this scheme would work because expressions are immutable.
>> >
>> > Couldn't such a flag be used to speed things up by locally turning off
>> > the assumption checking?
>>
>> The core assumptions system isn't just about assumptions that are
>> defined on symbols: every expression has "assumptions" e.g.:
>>
>> In [1]: (1 + sqrt(2)).is_positive
>> Out[1]: True
>>
>> You can read more about it here:
>> https://docs.sympy.org/dev/guides/assumptions.html
>>
>> Typically, what is slow when manipulating large expressions is
>> actually expressions that don't involve symbols at all. For example, to
>> answer the query (1 + sqrt(2)).is_positive, the expression is
>> numerically evaluated with evalf, which can be very expensive for large
>> expressions. This can also make operations with some expressions, e.g.
>> RootOf, Integral, etc., very slow. There is a tension between some
>> users/contributors wanting all assumptions queries to give a definite
>> answer and the fact that the "assumptions system" is invoked
>> (repeatedly) pretty much every time any new expression object is
>> created.
>>
>> --
>> Oscar
>
> --
> You received this message because you are subscribed to the Google Groups 
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/sympy/e99b97c2-4904-45be-852b-1a507915a22fn%40googlegroups.com.
