On Thu, Mar 14, 2013 at 3:42 AM, Ondřej Čertík <[email protected]> wrote:
> On Thu, Mar 14, 2013 at 12:34 AM, Aaron Meurer <[email protected]> wrote:
>> This validates what I've often said, which is that if you use the
>> right algorithms and the right data structures, then Python can be
>> just as fast as a compiled alternative (especially since "the right
>> data structures" usually means fast built-in data structures).
>>
>> To me, right now, there are four main things that can make SymPy slow (in
>> order):
>>
>> - expand
>> - slow matrices/linear algebra
>> - slow polys (esp. in highly multivariate situations)
>> - the core
>>
>> expand() is the worst. If we can make that faster, then all of SymPy
>> will feel zippier.
>>
>> Slow linear algebra is responsible for slow solve(), but also for slow
>> integrate(), since heurisch() basically hangs there. I'm not sure where
>> else it is used (other than in the matrices themselves, obviously).
>>
>> Slow polys rarely make things slow directly, but they serve as a
>> barrier to making things faster: if you rewrite an algorithm using
>> the polys, it becomes slower whenever the polynomials have more than
>> a few variables (this is what happened with heurisch).
>>
>> Whenever I find myself interrupting a long computation, I'd say 70% of the
>> time it is in expand somewhere and 60% of the time it is in the polys.
>
> Can you give examples of expressions for which expand() is slow?
>
> When I benchmarked expand() about two years ago, on expressions
> like (x+y+z+1)^20, I noticed that the main expansion is done using
> multinomial_coefficients(), which is very fast (it could be made maybe
> 2x or so faster in C, but not 100x). However, 95% of the time is spent
> creating the symbolic expression from the coefficients.
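For readers following along, the multinomial step Ondřej mentions is easy to sketch in pure Python. This is an illustrative brute-force version, not SymPy's actual multinomial_coefficients() (which uses a much faster recurrence); it just shows why the coefficient computation itself is cheap compared to rebuilding the symbolic Add from the result:

```python
from math import factorial
from itertools import product

def multinomial_coefficients(m, n):
    """Map each exponent tuple (k1, ..., km) with k1 + ... + km == n
    to its coefficient n!/(k1! * ... * km!) in (x1 + ... + xm)**n."""
    coeffs = {}
    for ks in product(range(n + 1), repeat=m):
        if sum(ks) != n:
            continue
        c = factorial(n)
        for k in ks:
            c //= factorial(k)
        coeffs[ks] = c
    return coeffs

# Coefficients of (x + y + z + 1)**2; setting every variable to 1
# shows they must sum to 4**2 = 16.
coeffs = multinomial_coefficients(4, 2)
print(sum(coeffs.values()))  # 16
```

The expensive part in SymPy is not this dictionary, but turning each (exponent tuple, coefficient) pair back into a Mul of Pows inside an Add.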
I put a print statement at the top of Expr.expand and another at the end
where it returns, and based on the output for some test expressions, the
issue is not so much that it is slow but that it is called a lot of times.
I included a simple counter, and it is called thousands of times. Note that
the function is cached, so the print statements fire only once per
identical input.

I then added "print self" to the output, and found something very
startling: almost all of the inputs were already expanded. So I used a
keyword argument to create a global variable counting the number of times
that expand has been called, and the number of times that the input differs
from the output, and ran the tests on several submodules.

The results (number of times input == output / number of times expand() was
called):

solvers:   11487/13341 (86.1%)
integrals: 18952/21791 (86.97%)
physics:    6657/8394  (79.3%)
matrices:     418/500  (83.6%)
core:       1786/2109  (84.68%)
polys:      2619/2822  (92.8%)

Keep in mind that this is with the cache *on*, so this only counts globally
unique inputs to expand() for each submodule. For these modules, only about
15% of the time is expand doing anything useful. It would seem that it's
not so much expand((x + y + z)**20) that we need to make fast, but rather
just expand(x + y + z).

Some ideas:

- We should be more careful when creating Poly to use expand=False whenever
  we know for sure that the input will never need to be expanded. I got a
  lot of mileage out of this trick in the risch code.

- expand currently works by recursively calling _eval_expand_hint on an
  expression. So if you have something like 1 + 2*log(x*y), it calls
  (1 + 2*log(x*y))._eval_expand_log(), then Integer(1)._eval_expand_log(),
  then (2*log(x*y))._eval_expand_log(), and finally
  log(x*y)._eval_expand_log(). But only this final call actually does
  anything. That is a lot of recursive function calls, which can be slow in
  Python.
  We should think of ways to reduce this, while still keeping the API open
  (and backwards compatible if possible). What it *really* should do in
  this case is something like
  expr.xreplace(dict([(l, l._eval_expand_log()) for l in expr.atoms(log)])),
  except that according to the API, expr.atoms(log) should really be
  replaced with something that gets all objects with _eval_expand_log
  defined.

- For expand_mul and expand_multinomial, the real workhorses of expand(),
  we should add a fast check whether the expression is a polynomial. This
  would break the API as it currently sits, though. I guess we need sort of
  the reverse of the idea from
  https://code.google.com/p/sympy/issues/detail?id=3338.

- I wonder if it would be possible to do a sort of "structural" caching.
  How easy is it to say "is expression 1 the same as expression 2, up to a
  permutation of the Symbols?" (and actually give the permutation), and,
  most importantly, to determine that efficiently? How complicated would an
  expression have to be for it to be faster to apply a permutation of
  Symbols to a cached result instead of just doing things the way we do
  now? Even if it's slower, I guess it would be much more memory-efficient
  for the cache.

- How can we use Poly to make expand() faster? This is a question that I
  hope Mateusz has an answer to.

By the way, does anyone know of a more efficient way to perform the above
profile? I know that cProfile can count the number of times a function is
called, but is there any way to extend it to check input == output for each
call, and count the results? My simple method made things run very slowly.
I'm also curious what profiling methods you use, Mateusz. Anything beyond
cProfile and kernprof (and timeit)?

Aaron Meurer

>
> Which is why I am experimenting with a C++ core, to see how fast
> one can get.
>
>> The core I would say is actually not too slow, at least compared to
>> these other things, but since it is used everywhere, even marginal
>> speedups can add up.
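One cheap way to run this kind of input == output census without hand-editing Expr.expand is a counting decorator that can be monkeypatched over the real function. This is only a sketch with a made-up stand-in for expand() (it assumes nothing about SymPy internals, and the string-based "expand" below is purely illustrative):

```python
import functools

def count_fixed_points(func):
    """Wrap func, counting total calls and the calls whose output
    compares equal to the input (i.e. the call was a no-op)."""
    @functools.wraps(func)
    def wrapper(arg, *args, **kwargs):
        result = func(arg, *args, **kwargs)
        wrapper.calls += 1
        if result == arg:
            wrapper.fixed += 1
        return result
    wrapper.calls = 0
    wrapper.fixed = 0
    return wrapper

# Hypothetical stand-in for expand(): "expanding" a string just strips
# parentheses, so already-"expanded" input comes back unchanged.
@count_fixed_points
def expand(expr):
    return expr.replace("(", "").replace(")", "")

expand("x + y")    # no-op: output == input
expand("(x + y)")  # does real work
expand("x")        # no-op
print(expand.fixed, "of", expand.calls, "calls were no-ops")  # 2 of 3
```

Against the real code one would apply the same wrapper to Expr.expand at interpreter startup and read off the counters after a test run, which avoids editing the library and costs only one extra equality check per call.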
>>
>> Your work (I'm assuming) will fix point 3, and your continuing work
>> will fix point 2. Do you have any plans to fix point 1 (expand)?
>>
>> For point 4, of course, the best way forward is to remove the old
>> assumptions. In my opinion, though, speed is the least concern in the
>> core. Bigger issues are how to better manage canonicalization and how
>> to deal with global vs. local assumptions.
>
> Mateusz and I discussed this the last few days (I visited him in Wroclaw),
> and I think we both agreed that a good idea is for me (or others) to
> finish the experimental C++ core (https://github.com/certik/csympy) so
> that we can run some benchmarks and play with several things like hashes
> and canonicalization, and at the same time keep working on sympy
> (assumptions, as you wrote, and other things).
>
> The general idea is to have an API, so that we can keep sympy in pure
> Python by default, but if needed, be able to use a faster C++ core for the
> raw symbolics. One thing to do is decouple the core a little more from the
> rest of sympy, which should happen anyway.
>
> In general, I think it's quite amazing how fast Python actually is: sympy
> can compete very well with, for example, Maxima, if the right data
> structures and algorithms are used. So I think it's a good idea to keep
> sympy in pure Python. But an optional C++ core is something that I feel is
> necessary, if nothing else, at least to have an idea of how fast one can
> get.
>
> Ondrej
>
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sympy?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
