On Jun 17, 2010, at 8:18 AM, Kevin Hunter wrote:

> Och, sorry for the delay. I didn't get an email that you'd responded.
> Gotta love corporate filters ...
>
>> Is this related to cylindrical algebraic decomposition?
>
> No. The framework could perhaps be applied to problems in that area,
> but we are not specifically doing CAD of this nature.
>
>> What do you mean by reduction? Is it mainly cancellation of things like
>> 2*x - x and x**2/x, or is there more involved?
>
> Hah! That's a great question. :-) I'm fairly new to the code base,
> but that appears to be a good rough summation. We do a couple things
> more, but they're specific to us, and nothing that couldn't use some
> cleanup based on good object use and general outsourcing.

Well, the reason I ask is that, depending on what you do, different things will be more efficient. If you are going to be doing more polynomial-like manipulation (multiplying and taking integer powers of Adds, canceling rational functions), then you definitely want to stick with the Polys module. But if all you are doing is keeping terms together, then the core might be better.

Also, the slowness of Polys I mentioned only really comes in when you have as many different variables as you do in your test script. If you are actually only going to be working with a few variables at a time, not 60, then Polys should be faster, due to its use of gmpy coefficients.
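To make the distinction concrete, here is a minimal sketch of the two kinds of work described above; the expressions are made up for illustration and are not from Kevin's actual script:

```python
from sympy import symbols, cancel, Add

x = symbols('x')

# Polynomial-like manipulation: canceling a rational function is a job
# for the Polys machinery (cancel() uses it under the hood).
print(cancel((x**2 - x)/x))   # x - 1

# If all you need is keeping terms together, the core already collects
# 2*x - x style terms automatically when an Add is built.
print(Add(2*x, -x))           # x
```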
>> A slightly better way to do this is to send the list straight to Add.
>
> I was wondering if I was overlooking something like this. As
> efficient as list-processing is, it *still* incurs the overhead of an
> added function call and associated bookkeeping. Without looking
> deeper, I'll bet that's a good chunk of the difference between these
> two paths. This certainly does drop run-time costs tremendously. I
> also wonder if Add sets up a lazy action? I note that my timing
> methods claim that the Add function returns in half a second (0.5s),
> but the printing stage takes much longer. For reference, here's what
> I'm now using:

Running with 100, I get:

Calculation: 0.636074066162 seconds.
Printing: 29.7397909164 seconds.

If I change it to the reduce method, I get:

Calculation: 383.056835175 seconds.
Printing: 39.7317187786 seconds.

The reduce method is slower because it creates a new Add object at every step of the way, which then must go through and do all the x + x and x - x type simplifications. This is done efficiently using dictionaries, but it is still quadratic in time. Add(*list), on the other hand, creates an Add only once, going through the list a single time to cancel things out. So there is your O(...) difference: Add(*list) is O(n), reduce() is O(n**2).

As for printing, this is a common problem: printing a large expression can often take as much time as, or more than, was needed to calculate the expression in the first place. If you want to see your expression, there isn't much of a way around it, unless you can figure out how to make the printer faster. Otherwise, you could just avoid printing things. If you are using the interpreter, you can set sys.displayhook to something else, so that you will get:

>>> import sys
>>> sys.displayhook = lambda x: None
>>> 1
>>>

Or, if you want to see some things but not all, you can define your own custom printer in SymPy. See http://docs.sympy.org/modules/printing.html.
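Going back to the Add(*list) vs. reduce() point, it can be sketched like this (the term list here is invented for illustration, not Kevin's data):

```python
import operator
from functools import reduce
from sympy import Add, symbols

x = symbols('x')
terms = [(-1)**i * x**(i % 5) for i in range(50)]

fast = Add(*terms)                   # one pass over the whole list: O(n)
slow = reduce(operator.add, terms)   # builds a fresh Add per step: O(n**2)

# Both construction paths produce the same expression; only the
# construction cost differs.
assert fast == slow
```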
Also, the .args attribute is how things are stored internally, and you can use it with indexing to see only part of an expression. For example:

In [6]: a = expand((x + y)**100)

In [7]: print a.args[3:5]
(28258808871162574166368460400*x**42*y**58, 20116440213369968050635175200*x**59*y**41)

> What kind of caching? After running this once or twice, the Sympy
> code should be well primed in the OS cache. The results to which I
> alluded were consistent over multiple runs. Is there something more
> to which you're referring?

SymPy has its own caching. It only kicks in if you run it in iSymPy (i.e., you don't restart the interpreter each time).

> Gotcha. I'll have to play around with both of those. Good pointers.

You might also look at sympycore, like you said. It should have a faster core. It doesn't have the more advanced things that SymPy has, but if all you are doing is managing expressions like the one above, it should be enough. I haven't played with it much myself, so you will again have to just try it out and see how it works.

> Noted. I'm more worried about algorithmic efficiency (i.e. O(...))
> than I am by direct speedup. It depends on with whom you talk, but I
> tend to fall into the camp of viewing this kind of low-hanging fruit
> (Python -> C) as a one-time gain, not really a "true" measure of
> success. But, as always, context is everything.

Our idea with Cython is not to try to port everything, but only the very low-level stuff that is the true bottleneck and is already doing direct computation with Python ints. No matter what, we plan on keeping the core library pure Python (Cython will remain optional) and free of dependencies.

Aaron Meurer

--
You received this message because you are subscribed to the Google Groups "sympy" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
