I just merged https://github.com/sympy/sympy/pull/8506, which fixes a
couple of pretty major performance issues. Whether or not you see better
performance in your own code after this will depend on what that code is
doing, but I'd like to use this as an opportunity to pontificate a bit on
performance.

In SymPy, we always need to be aware of the tradeoff between symbolic
capability and performance. Whenever you use a general algorithm, that
algorithm has the potential to be slow. There is no way around this, since
algorithms that work on general symbolic expressions can require computing
arbitrary things to get answers. It's true that algorithms can be made
faster, but there are also limits imposed by the mathematics itself.

Thus, there needs to be a very clear distinction between fast/dumb and
slow/smart functions. Anything that is fast/dumb should only do the bare
minimum to get an answer. Returning "I don't know" (in whatever form that
takes for the specific function) is preferable to spending a long time
trying to return an answer. In cases where the function cannot say "I don't
know" and a cheap answer could be mathematically wrong, we have to stipulate
that the function works structurally, not mathematically.

Here are some examples of things that should be fast/dumb/structural:

- == (structural equality)
- match
- xreplace
- any form of automatic evaluation/simplification (that is, it should never
be slow to just create an object)
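Here is a small sketch of what "structural" means for == and xreplace, using
ordinary SymPy calls. For contrast it also calls the slower mathematical
counterparts, equals and subs. The specific expressions are only
illustrative.

```python
from sympy import symbols

x, y, z = symbols('x y z')

# == compares expression trees: these are mathematically equal but
# built differently, so structural equality says False.
a = (x + 1)**2
b = x**2 + 2*x + 1
print(a == b)        # False
print(a.equals(b))   # True (mathematical check, potentially slow)

# xreplace only swaps exact subtrees. Mul(x, y, z) has no x*y subtree,
# so nothing changes; subs does some algebraic matching and finds it.
e = x*y*z
print(e.xreplace({x*y: 2}))   # x*y*z
print(e.subs(x*y, 2))         # 2*z
```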

Of course, we can always have slow/smart/mathematical versions of these,
like equals and subs. The assumptions system should also be considered
slow, and so should printing (as a side note, never include the string form
of an expression in an exception message, since a large expression could
take forever to print before the exception even gets raised).
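A minimal sketch of the exception advice, using a hypothetical validator
named require_symbol (not a SymPy function). The message names only the type
of the offending object, which is cheap no matter how large the expression
is.

```python
from sympy import sympify


def require_symbol(expr):
    """Hypothetical validator: return expr if it is a Symbol, else raise."""
    expr = sympify(expr)
    if not expr.is_Symbol:
        # Report only the class name. Formatting str(expr) here could take
        # a very long time for a huge expression, before anything is raised.
        raise TypeError(
            "expected a Symbol, got an object of type %s" % type(expr).__name__)
    return expr
```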

The fast algorithms should never call the slow ones, unless the type of
object they are called on is known well enough to guarantee that the call
will be fast.

In the pull request I merged, automatic evaluation of functions performed
as_real_imag() in some cases, just to check if the expression should
automatically evalf (example of a slow expression: sin((1.0 + 1.0*I)**10000
+ 1)). Another issue was that match was calling signsimp(), which is slow
enough to be considered a mathematical function.
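The kind of check that belongs in automatic evaluation can be sketched as
follows. This is a hypothetical helper, is_float_complex, written for
illustration and not the code from the pull request. It decides whether an
argument is a Float, or of the form a + b*I with a Float part, by inspecting
the expression tree only, instead of calling as_real_imag().

```python
from sympy import I


def is_float_complex(arg):
    """Hypothetical structural test, not SymPy's implementation.

    Return True if arg is a Float, or an Add of the form a + b*I where
    a or b is a Float. Only the expression tree is inspected; no
    as_real_imag(), assumptions, or simplification is used.
    """
    if arg.is_Float:
        return True
    if not arg.is_Add:
        return False
    re_part, rest = arg.as_coeff_Add()   # numeric term, remaining terms
    coeff, unit = rest.as_coeff_Mul()    # numeric factor, remaining factors
    if unit is not I:
        return False
    return bool(re_part.is_Float or coeff.is_Float)
```

Anything the helper does not recognize gets False ("don't evalf"), which is
the cheap "I don't know" answer rather than an expensive computation.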

In general, we should avoid doing things like this in the core. The
assumptions system is a big part of this, since the smarter the assumptions
get, the slower they will be in general (even if the common cases get
faster). We should not be using assumptions in the core (that is, in
automatic evaluation or other fast/dumb contexts).
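A minimal sketch of an automatic-evaluation rule that stays out of the
assumptions system. The class myabs is hypothetical (SymPy's real Abs does
considerably more). Its eval only looks at the structural flag is_Number and
otherwise returns None, leaving the object unevaluated instead of asking
something like arg.is_nonnegative.

```python
from sympy import Function


class myabs(Function):
    """Hypothetical absolute-value function with a fast/dumb eval."""

    @classmethod
    def eval(cls, arg):
        # is_Number is a class-level flag: checking it is a structural
        # test and never triggers an assumptions query.
        if arg.is_Number:
            return abs(arg)
        # Deliberately no `if arg.is_nonnegative: return arg` here; that
        # would run the assumptions system on every object creation.
        # Returning None means "I don't know": stay unevaluated.
        return None
```

With this, myabs(-3) gives 3, while myabs(p) for a Symbol p declared
positive stays as myabs(p); simplifying that is a job for a slower function
that the user calls explicitly.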

To achieve this, we need to do a few things:

- Be rigorous about this in pull requests.

- Get a benchmark machine and run airspeed velocity on it. We need to catch
performance regressions. The benchmark suite can be anything, although
obviously well-made benchmarks are better.
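A sketch of what a benchmark file for airspeed velocity (asv) could look
like. asv times every method whose name starts with time_ and runs setup
before timing. The class name, the workloads, and the sizes here are
hypothetical choices, not an existing SymPy benchmark suite.

```python
from sympy import Add, sin, symbols
from sympy.core.cache import clear_cache


class TimeFastOperations:
    """Hypothetical asv benchmarks for operations that should stay fast."""

    def setup(self):
        # SymPy caches constructed objects, so repeated calls may partly
        # measure cache lookups; clearing the cache here reduces that effect.
        clear_cache()
        self.x = symbols('x')
        self.xs = symbols('x0:1000')

    def time_create_sin(self):
        # Automatic evaluation should never make object creation slow.
        sin(self.x + 1)

    def time_create_large_add(self):
        Add(*self.xs)

    def time_structural_equality(self):
        _ = (self.x + 1)**2 == self.x**2 + 2*self.x + 1
```

Running asv run from a directory containing an asv.conf.json records these
timings per commit, which is what makes a regression visible.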

- If something is slow, investigate why it is slow with a profiler. The
best Python profilers are SnakeViz (snakeviz, a browser-based viewer for the
output of the built-in cProfile module), line_profiler, and pyinstrument.
These each profile Python in a different way, so each can be useful.
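For example, the built-in cProfile and pstats modules give a quick text
report, and the same data can be saved with dump_stats and opened in
SnakeViz. The workload, expanding (x + y)**30, is just a placeholder for
whatever is slow.

```python
import cProfile
import io
import pstats

from sympy import expand, symbols

x, y = symbols('x y')

profiler = cProfile.Profile()
profiler.enable()
result = expand((x + y)**30)   # placeholder for the slow operation
profiler.disable()

# Text report of the 10 most expensive calls, sorted by cumulative time.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
print(buf.getvalue())

# To explore the same data in SnakeViz, save it and run `snakeviz expand.prof`:
# profiler.dump_stats("expand.prof")
```

line_profiler and pyinstrument are usually driven from the command line
instead, for example kernprof -l -v script.py (after decorating the function
of interest with @profile) and pyinstrument script.py.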

- If a test is slow, don't just mark it as slow. Investigate why it is
slow. Bisect to see whether it used to be fast.
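A sketch of a script for automating that bisection with git bisect run,
which treats exit status 0 as good and 1 as bad. The file name
check_speed.py, the workload, and the 5-second cutoff are hypothetical;
replace them with the slow test's own code and a threshold that clearly
separates fast from slow.

```python
"""check_speed.py: hypothetical helper for `git bisect run python check_speed.py`."""
import sys
import time

from sympy import expand, symbols

THRESHOLD_SECONDS = 5.0  # hypothetical cutoff between "fast" and "slow"


def workload():
    # Placeholder: paste the body of the slow test here.
    x, y = symbols('x y')
    expand((x + y)**50)


def is_fast_enough(threshold=THRESHOLD_SECONDS):
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start < threshold


if __name__ == "__main__":
    # Exit 0 marks the commit good (fast), 1 marks it bad (slow).
    sys.exit(0 if is_fast_enough() else 1)
```

Keeping the script as an untracked file in the repository root means
import sympy picks up whichever commit git has checked out; then
git bisect start <slow-commit> <fast-commit> followed by
git bisect run python check_speed.py reports the first slow commit.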

- When we create functions that are supposed to be fast, benchmark them by
stress testing them, so that we can be assured that they won't slow
something down (I think _aresame is an example of this).
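One way to stress test a function that is supposed to be fast is to time it
on inputs of growing size and look at how the cost scales. The helper below
is a hypothetical sketch that uses timeit on xreplace over sums of n
symbols; the sizes and repeat count are arbitrary.

```python
import timeit

from sympy import Add, Symbol, symbols


def time_xreplace(n, number=5):
    """Hypothetical stress helper: average seconds per xreplace call on a sum of n symbols."""
    xs = symbols('x0:%d' % n)
    expr = Add(*xs)
    mapping = {xs[0]: Symbol('y')}
    total = timeit.timeit(lambda: expr.xreplace(mapping), number=number)
    return total / number


if __name__ == "__main__":
    # Growth close to linear is what one would hope for; if 10x more terms
    # costs far more than about 10x the time, the "fast" path is doing
    # more than the bare minimum.
    for n in (100, 1000, 5000):
        print(n, time_xreplace(n))
```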

- Stress test SymPy in general.

- When you fix a performance issue, add a test for it: something that
should run very fast, but would never finish if the wrong algorithm were
used. Put a comment on the test saying that it is a performance test, so
that nobody changes it just to make it pass when there is a regression.
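A sketch of such a test in pytest style. The test name and the workload
(building a sum of 5000 symbols, which a good algorithm handles quickly but
a quadratic one would make far slower) are hypothetical; the important part
is the comment.

```python
from sympy import Add, symbols


def test_large_add_construction_is_fast():
    # PERFORMANCE TEST. This is meant to run quickly. If it becomes slow,
    # fix the regression; do not shrink n, mark the test as slow, or
    # otherwise weaken it.
    n = 5000
    xs = symbols('x0:%d' % n)
    expr = Add(*xs)
    assert len(expr.args) == n
```

If a hard limit is wanted, a plugin such as pytest-timeout can enforce one
with @pytest.mark.timeout(seconds).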

It's easy to just look at SymPy and brush off its speed as a side effect of
being written in pure Python, but we should take the view that SymPy can be
faster than it is.

Aaron Meurer

-- 
You received this message because you are subscribed to the Google Groups 
"sympy" group.
Visit this group at http://groups.google.com/group/sympy.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/sympy/CAKgW%3D6JmNoN%2Bdjis9a5dsa1nyDRMmp106QAhZeYxcOrxJh%2Bfqg%40mail.gmail.com.