On Sun, 14 Jun 2020 at 07:10, Aaron Meurer <[email protected]> wrote:
>
> On Sat, Jun 13, 2020 at 5:50 AM Oscar Benjamin
> <[email protected]> wrote:
> >
> > On Sat, 13 Jun 2020 at 00:22, Aaron Meurer <[email protected]> wrote:
> > >
> > > On Sun, Jun 7, 2020 at 4:08 PM Oscar Benjamin
> > > <[email protected]> wrote:
> > > >
> > > > 3. What are the limitations of the new module? Where is it not
> > > > implemented?
> > > >
> > > > The new assumptions were made a lot faster last summer. Their
> > > > slowness was one major limitation before. They are still slow in
> > > > comparison to the old assumptions. The real limitation is the
> > > > fact that the old assumptions are used all across the codebase,
> > > > in both elementary evaluations and in more complex routines. A
> > > > huge amount of code would need to be changed in order to make
> > > > general routines like "integrate" use the new assumptions.
> > >
> > > The biggest thing that is missing is the other direction, to make
> > > expr.is_integer check ask(Q.integer(expr)). With the improved
> > > performance, this may now be possible to do, although I suspect
> > > more work is still required to make it so we don't slow down
> > > SymPy. For instance, IMO we need a better model of assumptions
> > > evaluation so that we can do fast checks vs. slow checks. The
> > > (new) assumptions do various levels of checks, starting from a
> > > simple "is this already assumed" and going all the way to a full
> > > SAT solve. But there's no way to limit how far it tries to go to
> > > get an answer. It's either all or nothing.
> >
> > We can't have `expr.is_integer` call `ask(Q.integer(expr))` without
> > first making the new assumptions free-standing. At the moment the
> > new assumptions are propped up by the old ones and depend heavily
> > on them for their queries.
> >
> > The old assumptions for something like `expr.is_integer` are
> > *already* too slow.
> > This is the main cause of slowness when working with large
> > expressions, as you can see if you profile e.g. a slow call to
> > expand:
> > https://github.com/sympy/sympy/pull/19485#issuecomment-639416355
>
> I didn't realize that. Do we know which parts are slow? Is it the
> logic in the core, or are the _eval handlers slow? I have a hard time
> reading the graph in that comment.
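Before digging into the profile, here is a minimal sketch of the two
query styles under discussion (the symbol names are just for
illustration):

```python
from sympy import Symbol, ask, Q

# Old assumptions: facts are attached to the symbol at creation time
# and queried through is_* properties on expressions built from it.
n = Symbol('n', integer=True)
print(n.is_integer)        # True
print((n + 1).is_integer)  # True

# New assumptions: facts are separate predicates passed to ask(), so
# they are not tied to the symbol object itself.
m = Symbol('m')
print(ask(Q.integer(m + 1), Q.integer(m)))  # True
```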
The call graph is complicated (too complicated? - note that what you
see is already a pruned graph). The key points are:

1. 99% of the time is spent in expand, so we are indeed profiling a
   slow call to expand.
2. 84% of the time is spent in _ask, so the bulk of the time taken by
   expand is in the old assumptions.
3. 83% of the time is spent in
   Expr._eval_is_extended_positive_negative, so most of the time is
   spent checking e.g. expr.is_positive.
4. 86% of the time is spent in evalf, so is_positive is slow because
   it is attempting to evaluate the expression numerically.

You can see where that all happens here:
https://github.com/sympy/sympy/blob/master/sympy/core/expr.py#L888-L918

Note that these are the handlers at the Expr level, so they affect any
subclass that does not override them. This means that assumptions
queries on objects with is_number=True (i.e. those that can in
principle be evalf()ed to a complex number) will very often end up
leading to numerical evaluation. The numerical evaluation seems to be
reliable enough in this usage, but it can be slow, and it fails for
really large/small numbers, e.g.:

>>> (exp(exp(exp(exp(100*pi))))-1).is_positive
OverflowError

You can also see that if numerical evaluation fails but the expression
is evalf()able and is algebraic, then the handler will try to
construct its minimal polynomial, which can be very expensive. It's
rare to hit this code path (the expression is algebraic and evalfs to
zero), but when we do it can lead to a substantial slowdown. From
memory there is one test that depends on using the minimal polynomial,
and it is something to do with assumptions on an expression involving
the golden ratio. There is also at least one issue about something
being very slow, and it's because it also hits this path.
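To make the evalf fallback concrete, here is a rough, simplified
sketch of what the numeric sign check amounts to. This is not the
actual implementation and the helper name is made up; it only mirrors
the idea of "evaluate numerically and read off the sign":

```python
from sympy import sqrt, Integer

def sign_by_evalf(expr, prec=15):
    # Hypothetical, simplified analogue of the numeric fallback in
    # Expr._eval_is_extended_positive_negative: evaluate to a number
    # at the given precision and compare against zero.
    approx = expr.evalf(prec)
    if approx.is_Number and approx != 0:
        return bool(approx > 0)
    return None  # could not decide at this precision

# sqrt(2) + sqrt(3) is about 3.146, so this resolves to True:
expr = sqrt(2) + sqrt(3) - Integer(3)
print(sign_by_evalf(expr))  # True
```

For a number like exp(exp(exp(exp(100*pi)))) the evalf step itself is
what blows up, which is where the OverflowError above comes from.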
These handlers are depended upon heavily, and in particular are used
by things like Add.is_positive, which overrides the method but calls
it via super:
https://github.com/sympy/sympy/blob/master/sympy/core/add.py#L643
before going on to the next potentially slow function, which is
_monotonic_sign:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/exprtools.py#L29

In the non-is_number case (expressions with symbols) the
_monotonic_sign function will often dominate the run time for
expressions with large Adds. The _monotonic_sign function does all
kinds of things, like attempting to compute the roots of a polynomial:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/exprtools.py#L123

The other thing that is potentially slow in the old assumptions is
that in the indeterminate case we often end up evaluating every
handler, e.g. if x and y are vanilla symbols with no assumptions then
(x*y).is_positive calls the following handlers:

    x.is_positive
    x.is_negative
    x.is_extended_negative
    x.is_extended_positive
    x*y.is_positive
    x*y.is_finite
    y.is_negative
    y.is_positive
    y.is_extended_positive
    y.is_extended_negative
    x*y.is_even
    x*y.is_integer
    x*y.is_odd
    x*y.is_imaginary
    x*y.is_zero
    x*y.is_irrational
    x*y.is_complex
    x*y.is_negative
    x*y.is_extended_negative
    x*y.is_composite
    x*y.is_extended_real
    x*y.is_rational
    x*y.is_commutative
    x*y.is_hermitian
    x*y.is_algebraic
    x*y.is_infinite
    x*y.is_extended_positive
    x*y.is_antihermitian

There is no Mul._eval_is_prime handler, but if there were then it
would also be called, because prime -> positive. This is something
that I think few people really understand: adding a handler slows
everything else down, because the handler will be called in situations
that you didn't expect, e.g. expr.is_imaginary will check
expr.is_prime and, conversely, expr.is_prime will check
expr.is_imaginary.
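You can see both sides of this from a Python session; a small sketch
of the indeterminate case and of the prime -> positive implication:

```python
from sympy import symbols

x, y = symbols('x y')  # vanilla symbols, no assumptions

# Indeterminate: None comes back only after the implication graph has
# consulted the many sibling handlers listed above.
print((x*y).is_positive)  # None

# prime -> positive in the implication graph, so a prime symbol is
# known positive without any explicit positivity fact being declared.
p = symbols('p', prime=True)
print(p.is_positive)  # True
```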
This means that we need to make every handler simple and fast, because
it will potentially be checked as part of unrelated queries during
basic operations like multiplying two expressions.

Another potential slowdown is that some queries evaluate new
expressions as part of the query, e.g.:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/power.py#L580-L583
When you already have a large expression, even something as simple as
taking the abs or adding one can be expensive.

Progress on this can be made, but we need other ways to replace the
handlers that are being removed in the situations where they are
depended upon. The new assumptions have a role to play in this: if we
can use them sparingly for more complicated queries, then we can
gradually remove the slow handlers from the old assumptions while
using and extending the new assumptions to take up the slack.

Another important step that I think we need is this PR:
https://github.com/sympy/sympy/pull/19107
which reworks the comparison (<, <=, ==, !=, >, >=) hierarchy. With
that we can significantly reduce the dependence on Add.is_positive
throughout the codebase. Currently every comparison lhs < rhs goes
through (lhs - rhs).is_negative, which then goes through all the slow
things I've listed above. Still, for all of that complexity, it isn't
possible to resolve something as simple as:

>>> x = Symbol('x', integer=True)
>>> 2**x < 2**(x + 1)
2**x < 2**(x + 1)

The PR would make that kind of case trivial without depending on any
of the slow things listed above.

--
Oscar
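The routing of comparisons through subtraction can be seen directly; a
sketch of the current behaviour (not of the PR):

```python
from sympy import Symbol

x = Symbol('x', integer=True)

# Under the current scheme, lhs < rhs is resolved by asking whether
# (lhs - rhs).is_negative, which runs through the slow handlers above.
diff = 2**x - 2**(x + 1)
print(diff.is_negative)  # None: the old handlers cannot decide this

# So the comparison stays as an unevaluated relational:
print(2**x < 2**(x + 1))
```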
