On Sun, 14 Jun 2020 at 07:10, Aaron Meurer <[email protected]> wrote:
>
> On Sat, Jun 13, 2020 at 5:50 AM Oscar Benjamin
> <[email protected]> wrote:
> >
> > On Sat, 13 Jun 2020 at 00:22, Aaron Meurer <[email protected]> wrote:
> > >
> > > On Sun, Jun 7, 2020 at 4:08 PM Oscar Benjamin
> > > <[email protected]> wrote:
> > > >
> > > > 3. What are the limitation of the new module? Where is it not 
> > > > implemented?
> > > >
> > > > The new assumptions were made a lot faster last Summer. Their slowness
> > > > was one major limitation before. They are still slow in comparison to
> > > > the old assumptions. The real limitation is the fact that the old
> > > > assumptions are used all across the codebase in both elementary
> > > > evaluations and in more complex routines. A huge amount of code would
> > > > need to be changed in order to make it so that general routines like
> > > > "integrate" use the new assumptions.
> > >
> > > The biggest thing that is missing is the other direction, to make
> > > expr.is_integer check ask(Q.integer(expr)). With the improved
> > > performance, this may now be possible to do, although I suspect more
> > > work is still required to make it so we don't slow down SymPy. For
> > > instance, IMO we need a better model of assumptions evaluation so that
> > > we can do fast checks vs. slow checks. The (new) assumptions do
> > > various levels of checks starting from a simple "is this already
> > > assumed" going all the way to a full SAT solve. But there's no way to
> > > limit how far it tries to go to get an answer. It's either all or
> > > nothing.
> >
> > We can't have `expr.is_integer` call `ask(Q.integer(expr))` without
> > first making the new assumptions free standing. At the moment the new
> > assumptions are propped up by the old and depend heavily on them for
> > their queries.
> >
> > The old assumptions for something like `expr.is_integer` are *already*
> > too slow. This is the main cause of slowness when working with large
> > expressions as you can see if you profile e.g. a slow call to expand:
> > https://github.com/sympy/sympy/pull/19485#issuecomment-639416355
>
> I didn't realize that. Do we know which parts are slow? Is it the
> logic in the core, or are the _eval handlers slow? I have a hard time
> reading the graph on that comment.

The call graph is complicated (too complicated? Note that what you
see is already a pruned graph). The key points are:

1. 99% of the time is spent in expand, so we're profiling a slow call to expand.
2. 84% of the time is spent in _ask, so the bulk of the time taken by
expand is in the old assumptions.
3. 83% of the time is spent in
Expr._eval_is_extended_positive_negative, so most of the time is spent
checking e.g. expr.is_positive.
4. 86% of the time is spent in evalf, so is_positive is slow because it
is attempting to evaluate the expression numerically.

You can see where that all happens here:
https://github.com/sympy/sympy/blob/master/sympy/core/expr.py#L888-L918

Note that this is on the handlers at the Expr level so it affects any
subclass that does not override these handlers. This means that
assumptions queries on objects with is_number=True (i.e. those that
can in principle be evalf()ed to a complex number) will very often end
up leading to numerical evaluation. The numerical evaluation seems to
be reliable enough in this usage, but it can be slow and it fails for
really large/small numbers, e.g.:

    >>> (exp(exp(exp(exp(100*pi))))-1).is_positive
    OverflowError
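The same failure mode can be reproduced with plain standard-library
floats (a rough sketch; the real handler goes through evalf/mpmath,
which has a much wider exponent range, but a tower of exponentials
like the one above defeats it in a similar way):

```python
import math

def tower_sign_check():
    """Float analogue of an evalf-based sign check on a tower of
    exponentials.  exp(100*pi) is about 3.7e136, so the next exp()
    already exceeds the double-precision range (max argument ~709.78)
    and raises OverflowError instead of returning a sign.
    """
    try:
        value = math.exp(math.exp(100 * math.pi)) - 1
        return value > 0
    except OverflowError:
        return None  # the numeric check could not decide the sign

print(tower_sign_check())  # None: overflow, sign undecided
```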

You can also see that if numerical evaluation fails but the expression
is evalf()-able and is algebraic then the handler will try to construct
its minimal polynomial, which can be very expensive. It's rare to hit
this code path (the expression must be algebraic and evalf to zero) but
when we do it can lead to a substantial slowdown. From memory there is
one test that depends on using the minimal polynomial; it has something
to do with assumptions on an expression involving the golden ratio.
There is also at least one open issue about something being very slow
because it also hits this path.

These handlers are depended upon heavily. In particular they are used
by things like Add.is_positive, which overrides the method but calls it
via super():
https://github.com/sympy/sympy/blob/master/sympy/core/add.py#L643
before going on to the next potentially slow function, _monotonic_sign:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/exprtools.py#L29

In the non is_number case (expressions with symbols) the
_monotonic_sign function will often dominate the run time for
expressions with large Adds. The _monotonic_sign function does all
kinds of things like attempting to compute the roots of a polynomial:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/exprtools.py#L123

The other thing that is potentially slow in the old assumptions is
that in the indeterminate case we often end up evaluating every
handler. For example, if x and y are vanilla symbols with no
assumptions then (x*y).is_positive calls the following handlers:

x.is_positive
x.is_negative
x.is_extended_negative
x.is_extended_positive
x*y.is_positive
x*y.is_finite
y.is_negative
y.is_positive
y.is_extended_positive
y.is_extended_negative
x*y.is_even
x*y.is_integer
x*y.is_odd
x*y.is_imaginary
x*y.is_zero
x*y.is_irrational
x*y.is_complex
x*y.is_negative
x*y.is_extended_negative
x*y.is_composite
x*y.is_extended_real
x*y.is_rational
x*y.is_commutative
x*y.is_hermitian
x*y.is_algebraic
x*y.is_infinite
x*y.is_extended_positive
x*y.is_antihermitian

There is no Mul._eval_is_prime handler, but if there were then it would
also be called, because prime -> positive. This is something that I
think few people really understand: adding a handler slows everything
else down, because the handler will be called in situations you didn't
expect. For example expr.is_imaginary will check expr.is_prime, and
conversely expr.is_prime will check expr.is_imaginary. This means we
need to make every handler simple and fast, because it will potentially
be checked as part of unrelated queries during basic operations like
multiplying two expressions.
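As a rough illustration of why that happens: the old assumptions close
a query over the fact-implication graph in both directions, so a
handler for any related fact can end up being consulted. Here is a
minimal sketch using a hypothetical, heavily pruned implication table
(not SymPy's real fact graph or its real resolution algorithm):

```python
from collections import deque

# Hypothetical, heavily pruned implication table: each fact maps to the
# facts it implies.  SymPy's real table (in sympy.core.assumptions) is
# much larger.
IMPLIES = {
    "prime": {"positive", "integer"},
    "positive": {"real", "nonzero"},
    "integer": {"rational"},
    "rational": {"real"},
    "imaginary": {"complex"},
    "real": {"complex"},
}

def handlers_consulted(fact):
    """Facts whose handlers may run when `fact` is queried: everything
    reachable from `fact` through the implication graph, traversed in
    both directions, since knowing any of them can settle the query."""
    # Build an undirected adjacency map from the directed implications.
    edges = {}
    for a, implied in IMPLIES.items():
        for b in implied:
            edges.setdefault(a, set()).add(b)
            edges.setdefault(b, set()).add(a)
    # Breadth-first search over the undirected graph.
    seen = {fact}
    queue = deque([fact])
    while queue:
        f = queue.popleft()
        for g in edges.get(f, ()):
            if g not in seen:
                seen.add(g)
                queue.append(g)
    return seen

# Querying is_imaginary can end up consulting the is_prime handler:
# imaginary - complex - real - positive - prime
print("prime" in handlers_consulted("imaginary"))  # True
```

The point of the sketch is only that the reachable set grows with every
handler added, in both query directions, which is why a new handler
taxes unrelated queries.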

Another potential slowdown is that some queries evaluate new
expressions as part of the query, e.g.:
https://github.com/sympy/sympy/blob/f7318dbfef2bf61e5ed8bc55f3e374880ec6ea67/sympy/core/power.py#L580-L583
When you already have a large expression, even something as simple as
taking the abs or adding one can be expensive.
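To see why that hurts, here is a toy cost model (pure Python, not
SymPy's actual classes): if a handler forms one auxiliary expression at
every node of a large expression, and constructing an expression costs
time proportional to its size (as real constructors roughly do when
they flatten and sort their arguments), the total work is quadratic.

```python
def size(expr):
    """Number of nodes in a tuple-tree 'expression'; leaves are strings."""
    if not isinstance(expr, tuple):
        return 1
    return 1 + sum(size(arg) for arg in expr[1:])

def query_cost(expr):
    """Constructor work done by a handler that builds one auxiliary
    expression (like Abs(expr) or expr + 1, costing size + 1 to build)
    at every internal node it visits."""
    if not isinstance(expr, tuple):
        return 0
    aux_cost = size(expr) + 1  # cost of building e.g. ("abs", expr)
    return aux_cost + sum(query_cost(arg) for arg in expr[1:])

def chain(n):
    """A nested Add of depth n, standing in for a large expression."""
    expr = "x"
    for _ in range(n):
        expr = ("add", expr, "1")
    return expr

# Doubling the expression size roughly quadruples the handler's work:
print(query_cost(chain(10)), query_cost(chain(20)))  # 130 460
```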

Progress on this can be made but we need other ways to replace the
handlers that are being removed in the situations where they are
depended upon. The new assumptions have a role to play in this: if we
can use them sparingly for more complicated queries then we can
gradually remove the slow handlers from the old assumptions while
using and extending the new assumptions to take up the slack.

Another important step that I think we need is this PR:
https://github.com/sympy/sympy/pull/19107
which reworks the comparison <, <=, ==, !=, >, >= hierarchy. With that
we can significantly reduce the dependence on Add.is_positive
throughout the codebase. Currently every comparison lhs < rhs goes
through (lhs - rhs).is_negative, which then goes through all of the
slow things I've listed above. Yet for all of that complexity it isn't
possible to resolve something as simple as:

    >>> x = Symbol('x', integer=True)
    >>> 2**x < 2**(x + 1)
    2**x < 2**(x + 1)

The PR would make that kind of case trivial without depending on any
of the slow things listed above.
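The current dispatch can be sketched like this (pure-Python stand-ins;
the real logic lives in sympy.core.relational and is more involved):

```python
def is_negative(expr):
    """Stand-in for expr.is_negative: True/False when the sign is
    decidable, None otherwise.  Here only plain numbers decide; in
    SymPy deciding this is what triggers the slow handler machinery."""
    if isinstance(expr, (int, float)):
        return expr < 0
    return None

def lt(lhs, rhs):
    """Sketch of how lhs < rhs is resolved today: form the difference
    and ask for its sign, falling back to an unevaluated relational."""
    try:
        diff = lhs - rhs          # may build a large new expression
    except TypeError:
        diff = None               # stand-in for a symbolic difference
    verdict = is_negative(diff) if diff is not None else None
    if verdict is not None:
        return verdict
    return ("Lt", lhs, rhs)       # unevaluated, like 2**x < 2**(x + 1)

print(lt(2, 3))                 # True
print(lt("2**x", "2**(x+1)"))   # ('Lt', '2**x', '2**(x+1)')
```

The PR instead lets the relational classes resolve structurally
comparable cases before falling back to the sign-of-difference check.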

--
Oscar
