On Sun, 17 Apr 2022 at 11:47, David Bailey <[email protected]> wrote:
>
> On 16/04/2022 12:18, Oscar Benjamin wrote:
> > Yes, this does work differently in SymEngine compared to SymPy. If
> > SymPy could be changed to work the same way then it could also be a
> > lot faster (without needing to be rewritten in C++). SymEngine has the
> > same overall design as SymPy but makes different choices for some
> > aspects of that design so that it is faster but at the same time
> > incompatible.
>
> I must say, SymPy symbolics seem fast enough to me, but I haven't tried
> to generate very large symbolic expressions.
>
> I am curious as to how much of the difference in speed between SymPy and
> SymEngine you think is attributable to the lack of the optimal design of
> the Python code, and how much do you think is attributable to the choice
> of computer language. Clearly C++ is a much lower level language than
> Python, and would presumably be intrinsically much faster, but result in
> many hard to fix memory corruption bugs.
It's hard to disentangle these things. Both SymPy and SymEngine will
do a lot of symbolic processing behind the scenes to produce the
output that users expect to see. In general SymPy will do a lot more
processing than SymEngine which means that expressions will not
evaluate in the same way but also means that a comparison of the two
for speed is not easy to interpret as being about e.g. C++ vs Python
or any other particular optimisation.
For example, one of the things that is often slow in SymPy when you
have large expressions is assumptions queries. Every time you create
an expression like an Add, a Mul or exp etc. there is a lot of
processing to determine whether the expression can simplify, and this
often involves checking "assumptions" using the core assumptions
system, e.g.:
>>> from sympy import symbols, sin, pi
>>> n = symbols('n', integer=True)
>>> sin(n*pi)
0
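To make that concrete, here is a sketch of the kind of queries the
assumptions system answers during automatic evaluation (the exact
internal query sequence is an assumption on my part; these are the
public-facing equivalents):

```python
# Sketch: the kind of assumptions queries SymPy's core runs during
# automatic evaluation. The exact internal sequence is an assumption
# on my part; these public queries show the mechanism.
from sympy import symbols, sin, pi

n = symbols('n', integer=True)
m = symbols('m')  # no assumptions declared

assert n.is_integer is True       # known from the declared assumption
assert (2*n).is_integer is True   # derived: integer times integer
assert m.is_integer is None       # unknown: three-valued logic

# Because m's integrality is unknown, sin(m*pi) stays unevaluated:
assert sin(n*pi) == 0
assert sin(m*pi) != 0
```

Every such query can recurse through the arguments of an expression,
which is why the cost grows with expression size.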
This kind of thing often dominates the runtime in SymPy when working
with large expressions. In SymEngine there are no assumptions and so
no assumptions checking is done at all:
>>> from symengine import symbols, sin, pi
>>> n = symbols('n', integer=True)
>>> sin(n*pi)
sin(n*pi)
>>> n.is_integer
False
Having a simpler evaluation scheme would make SymPy faster while still
working in Python. There are many more examples like this where SymPy
just hasn't been designed with performance consistently in mind.
Contributors are often unaware of the performance implications of the
changes that they make, and it's very easy for a seemingly innocent
fix in one place to result in significant slowdowns elsewhere.
The main reason it's so hard to understand the performance
implications of anything is that most of the computation is implicit
(because of the class-based design). Something as innocuous as a + b
triggers a bunch of computation that you might not expect at a higher
level. In turn that means that e.g. sum(expressions) has quadratic
complexity and is therefore slow for a large list of inputs:
In [1]: from sympy import symbols
In [2]: %time ok = sum(symbols('x:1000'))
CPU times: user 6.72 s, sys: 8 ms, total: 6.72 s
Wall time: 6.73 s
You can see the same quadratic cost with SymEngine, but at least for
this example it is something like 50x faster. That difference will be
due in part to the different algorithms/representations but also to
the difference between Python and C++. In both cases, though, the cost
is O(n^2) when in principle it could be O(n). Note that in both cases
Add(*expressions) is O(n) and should be preferred to using sum. Again,
though, in principle Add(*expressions) could be made effectively O(1)
in a different design.
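As a sketch of the difference (the 100-symbol size is just an
illustrative choice, small enough to run quickly):

```python
# Sketch: why sum() is O(n^2) while Add(*terms) is O(n) in SymPy.
# sum() folds the terms pairwise, building n intermediate Add objects
# and re-processing the accumulated terms each time; Add(*terms)
# processes the whole list in a single pass.
from sympy import Add, symbols

terms = symbols('x:100')  # x0, x1, ..., x99

slow = sum(terms)    # 100 intermediate Adds, each re-flattened
fast = Add(*terms)   # one Add over all the terms at once

assert slow == fast          # same result either way
assert len(fast.args) == 100
```

The results are identical; only the number of intermediate
expression objects (and hence the assumptions/flattening work)
differs.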
Oscar
--
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/CAHVvXxRBG08u7iZWzd4gSM7%3DLNdHQ-bYyMV7%3Dv1%3DxXmG8J9wyQ%40mail.gmail.com.