Re: [sympy] Faster symbolics

David Bailey Sun, 17 Apr 2022 07:02:17 -0700

On 17/04/2022 13:00, Oscar Benjamin wrote:

On Sun, 17 Apr 2022 at 11:47, David Bailey <[email protected]> wrote:

On 16/04/2022 12:18, Oscar Benjamin wrote:

Yes, this does work differently in SymEngine compared to SymPy. If
SymPy could be changed to work the same way then it could also be a
lot faster (without needing to be rewritten in C++). SymEngine has the
same overall design as SymPy but makes different choices for some
aspects of that design so that it is faster but at the same time
incompatible.

I must say, SymPy symbolics seem fast enough to me, but I haven't tried
to generate very large symbolic expressions.


I am curious as to how much of the difference in speed between SymPy and
SymEngine you think is attributable to the lack of the optimal design of
the Python code, and how much do you think is attributable to the choice
of computer language. Clearly C++ is a much lower level language than
Python, and would presumably be intrinsically much faster, but result in
many hard to fix memory corruption bugs.

It's hard to disentangle these things. Both SymPy and SymEngine will
do a lot of symbolic processing behind the scenes to produce the
output that users expect to see. In general SymPy will do a lot more
processing than SymEngine which means that expressions will not
evaluate in the same way but also means that a comparison of the two
for speed is not easy to interpret as being about e.g. C++ vs Python
or any other particular optimisation.

For example one of the things that is often slow in SymPy when you
have large expressions is assumptions queries. Every time you create
an expression like an Add or a Mul or exp etc there is a lot of
processing that goes on to determine if the expression can simplify
and this often involves checking "assumptions" using the core
assumptions system e.g.:

   >>> n = symbols('n', integer=True)
   >>> sin(n*pi)
   0

This kind of thing often dominates the runtime in SymPy when working
with large expressions. In SymEngine there are no assumptions and so
no assumptions checking is done at all:

   >>> from symengine import symbols, sin, pi
   >>> n = symbols('n', integer=True)
   >>> sin(n*pi)
   sin(n*pi)
   >>> n.is_integer
   >>> False

Having a simpler evaluation scheme would make SymPy faster while still
working in Python. There are many more examples of this where SymPy
just hasn't been clearly designed with performance always in mind.
Contributors are often unaware of the performance implications of the
changes that they make and it's very easy for a seemingly innocent fix
in one place to result in significant slowdowns elsewhere.

The main reason it's so hard to understand the performance
implications of anything is that most of the computation is implicit
(because of the class based design). Something as innocuous as a+b
results in a bunch of computation that you might not expect at a
higher level. In turn that means that e.g. sum(expressions) has
quadratic complexity and is therefore slow for a large list of inputs:

In [1]: from sympy import symbols

In [2]: %time ok = sum(symbols('x:1000'))
CPU times: user 6.72 s, sys: 8 ms, total: 6.72 s
Wall time: 6.73 s

You can see the same quadratic cost with SymEngine but at least for
this example it is something like 50x faster. That difference will be
due in part to the different algorithms/representations but also the
difference between Python and C++. In both cases the cost is O(n^2)
though when in principle it can be O(n). Note that in both cases
Add(*expressions) is O(n) and should be preferred to using sum. Again
though in principle Add(*expressions) could be made effectively O(1)
in a different design.

Oscar

Thanks for that interesting response!

Maybe I'm missing something, but if assumptions are so costly, couldn'tevery expression contain a flag contains_assumptions to say if itcontains assumptions. Then when a larger expression was created it wouldbe simple to compute the contains_assumptions for the larger expression.

E.g. in your example, the expression for n would havecontains_assumptions set to 1, and this would propagate through anyexpression containing n.


I think this scheme would work because expressions are immutable.

Couldn't such a flag be used to speed things up by locally turning offthe assumption checking?



David

--
You received this message because you are subscribed to the Google Groups 
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/dfed221a-5a4c-6c0c-11ec-564ea0a5a2dd%40dbailey.co.uk.

Re: [sympy] Faster symbolics

Reply via email to