On Thu, Apr 24, 2014 at 12:10 AM, mario <[email protected]> wrote:
> Ondřej, you wrote:
> For the sympy.polys in this example,
> is it using the sparse polynomial representation? I.e. it stores the
> symbols (x, y, z) and then stores their powers and coefficients
> in a dictionary, e.g.:
>
> 3*x^2*z + 2*x -> {3: (2, 0, 1), 2: (1, 0, 0)}
>
> ?
>
> In CSymPy, I am still implementing this datastructure, as it is very
> efficient, but it only works for polynomials. So at the moment, we
> can't benchmark this yet.
>
> In the dictionary in that example you meant {(2,0,1): 3, (1,0,0):2}

Ah, yes, of course.
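
For concreteness, here is a minimal Python sketch of that representation and of multiplication on top of it. This is illustrative only: the function name and the plain-dict layout are my own, not CSymPy's or sympy.polys' actual implementation.

```python
# Sparse polynomial in (x, y, z): map exponent tuples to coefficients.
# 3*x^2*z + 2*x  ->  {(2, 0, 1): 3, (1, 0, 0): 2}

def poly_mul(p, q):
    """Multiply two sparse polynomials stored as {exponent_tuple: coeff}."""
    result = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            # multiplying monomials adds their exponent vectors
            e = tuple(a + b for a, b in zip(e1, e2))
            result[e] = result.get(e, 0) + c1 * c2
    # drop terms whose coefficients cancelled
    return {e: c for e, c in result.items() if c != 0}

p = {(2, 0, 1): 3, (1, 0, 0): 2}   # 3*x^2*z + 2*x
q = {(0, 0, 0): 1, (1, 0, 0): -1}  # 1 - x
print(poly_mul(p, q))
```

This is essentially what the hash-map based multiplication does; the dense exponent tuples are exactly the part that ETuple replaces below.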

> It is not fast with many variables; typically many of the exponents are
> zero, so
> it is not efficient to iterate on an array of exponents, which are mostly
> zero.
> It is faster to use a data structure keeping only the nonzero exponents,
> like
> the ETuple used in PolyDict in Sage. The code for ETuple is short and
> simple.

Here is the documentation:

http://www.sagemath.org/doc/reference/polynomial_rings/sage/rings/polynomial/polydict.html

Thanks for the tips. In CSymPy, here is the implementation for sparse
polynomial multiplication:

https://github.com/sympy/csympy/blob/master/src/rings.cpp#L63

and 'umap_vec_mpz' is defined here:

https://github.com/sympy/csympy/blob/master/src/dict.h#L67

If you have tips on how to implement ETuple in C++ or how to improve upon this
data structure, please let me know.
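To make the ETuple idea concrete, here is a rough Python sketch of an exponent vector that stores only the nonzero positions, in the spirit of Sage's polydict. The class layout and method names are illustrative assumptions, not Sage's or CSymPy's actual API.

```python
class ETuple:
    """Exponent vector that keeps only nonzero positions, so iteration
    skips the zeros that dominate when there are many variables."""

    def __init__(self, length, nonzero):
        self.length = length          # total number of variables
        self.nonzero = dict(nonzero)  # position -> exponent (exponent != 0)

    def eadd(self, other):
        """Componentwise addition, used when multiplying monomials."""
        assert self.length == other.length
        out = dict(self.nonzero)
        for pos, exp in other.nonzero.items():
            s = out.get(pos, 0) + exp
            if s:
                out[pos] = s
            else:
                del out[pos]  # keep the invariant: no stored zeros
        return ETuple(self.length, out)

    def unpack(self):
        """Expand back to a dense exponent tuple."""
        return tuple(self.nonzero.get(i, 0) for i in range(self.length))

# x^2 * z times x  ->  x^3 * z; only 2 of the 3 positions are ever touched
e1 = ETuple(3, {0: 2, 2: 1})
e2 = ETuple(3, {0: 1})
print(e1.eadd(e2).unpack())  # (3, 0, 1)
```

In C++ this could presumably become a small struct holding parallel arrays of positions and exponents, with addition done by merging the two sorted position lists.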

Ondrej

>
> On Wednesday, April 23, 2014 5:35:59 PM UTC+2, Ondřej Čertík wrote:
>>
>> On Wed, Apr 23, 2014 at 9:21 AM, Mateusz Paprocki <[email protected]>
>> wrote:
>> > Hi,
>> >
>> > On 23 April 2014 07:36, Ondřej Čertík <[email protected]> wrote:
>> >> On Tue, Apr 22, 2014 at 9:52 PM, Aaron Meurer <[email protected]>
>> >> wrote:
>> >>> On Tue, Apr 22, 2014 at 10:21 PM, Ondřej Čertík <[email protected]>
>> >>> wrote:
>> >>>> On Tue, Apr 22, 2014 at 6:06 PM, Aaron Meurer <[email protected]>
>> >>>> wrote:
>> >>>>> On Tue, Apr 22, 2014 at 12:05 PM, Ondřej Čertík
>> >>>>> <[email protected]> wrote:
>> >>>>>> Hi Aaron,
>> >>>>>>
>> >>>>>> Those are good questions. Here are the answers:
>> >>>>>>
>> >>>>>> On Tue, Apr 22, 2014 at 10:13 AM, Aaron Meurer <[email protected]>
>> >>>>>> wrote:
>> >>>>>>> I have some high level questions about CSymPy.
>> >>>>>>>
>> >>>>>>> - What are the goals of the project?
>> >>>>>>
>> >>>>>> The goals of the project are:
>> >>>>>>
>> >>>>>> * Fastest symbolic manipulation library, compared to other codes,
>> >>>>>> commercial or opensource
>> >>>>>> (Sage, GiNaC, Mathematica, ...).
>> >>>>>>
>> >>>>>> * Extension/complement to SymPy
>> >>>>>>
>> >>>>>> * If the above two goals allow, be able to also call it from other
>> >>>>>> languages easily and efficiently (Julia, Ruby, Mathematica, ...)
>> >>>>>>
>> >>>>>> As to technical solution: the core should be a C++ library, which
>> >>>>>> can
>> >>>>>> depend on other compiled libraries if needed.
>> >>>>>> The core should not depend on Python or Ruby or Julia, but rather
>> >>>>>> be
>> >>>>>> just one language, C++. That lowers the barrier
>> >>>>>> of entry significantly, compared to a big mix of C++, Cython and
>> >>>>>> Python, makes it easier to make things fast
>> >>>>>> (you don't need to worry about Python at all). The Python (and
>> >>>>>> other
>> >>>>>> languages) wrappers should be just thin
>> >>>>>> wrappers around the C++ core (=just better syntax).
>> >>>>>>
>> >>>>>> There might be other technical solutions to this, but I know that I
>> >>>>>> can deliver the above goals with this solution
>> >>>>>> (and I failed to deliver with other solutions, like writing the
>> >>>>>> core
>> >>>>>> in Cython). So that's why we do it this way.
>> >>>>>>
>> >>>>>> Also, by being "just a C++ library", other people can use it in
>> >>>>>> their
>> >>>>>> projects. I hope to get the interest of a much broader
>> >>>>>> community that way, whose members can contribute back (somebody will need
>> >>>>>> fast
>> >>>>>> symbolic manipulation in Julia, so they
>> >>>>>> can just use CSymPy with Julia wrappers, and contribute
>> >>>>>> improvements back).
>> >>>>>>
>> >>>>>>>
>> >>>>>>> - What are the things that should definitely go in CSymPy?
>> >>>>>>
>> >>>>>> At the moment: all things to make specific applications fast, in
>> >>>>>> particular PyDy. For that, it needs basic
>> >>>>>> manipulation, differentiation, series expansion (I think) and
>> >>>>>> matrices. That's all roughly either done, or
>> >>>>>> on the way. Of course, lots of polishing is needed.
>> >>>>>
>> >>>>> I think that's already too much. Why is the series expansion slow in
>> >>>>> SymPy? Is it because the algorithms are slow? If so, then
>> >>>>> implementing
>> >>>>> the same inefficient algorithms in CSymPy won't help. They will be
>> >>>>> faster, but for large enough expressions they will still slow down.
>> >>>>> Is
>> >>>>> it because the expression manipulation is slow? In that case, if
>> >>>>> CSymPy has faster expression manipulation, then just use those
>> >>>>> expressions, but use the SymPy series algorithms.
>> >>>>
>> >>>> My experience is that it will actually help to implement the same
>> >>>> algorithm,
>> >>>> because there is a little overhead with any Python operation.
>> >>>
>> >>> Exactly. There is a "little" overhead. Not a huge overhead. It matters
>> >>> for the stuff that is in the inner loops, like addition and
>> >>> multiplication of terms, but for whole algorithms, which might be
>> >>> called only a few times (as opposed to a few hundred thousand times),
>> >>> it doesn't make a difference.
>> >>>
>> >>> This is all hypothetical without numbers (and btw, it would be awesome
>> >>> if you could provide real numbers here), but suppose these imaginary
>> >>> numbers were true:
>> >>>
>> >>> SymPy: 1x
>> >>> CSymPy with Python wrappers: 4x
>> >>> Raw CSymPy: 5x
>> >>>
>> >>> Then using CSymPy with Python would already be 4x faster than SymPy.
>> >>> Now doing everything in CSymPy would only be 1.25x faster than that.
>> >>>
>> >>> Now, if CSymPy integrates flawlessly, so that it just works (at least
>> >>> as far as the user is concerned), there is little complexity cost of
>> >>> CSymPy + Python. Definitely little enough to warrant the 4x speedup.
>> >>> But as soon as you take that away, i.e., you implement more and more
>> >>> in C++, or CSymPy differs enough from SymPy that the user needs to
>> >>> care about it (which the more that is in CSymPy, the more likely this
>> >>> is to happen), then the complexity cost skyrockets. Maybe 4x would
>> >>> still be worth it here. But not 1.25x.
>> >>>
>> >>>>So if you do
>> >>>> a lot of them (like in series expansion, where you create
>> >>>> intermediate
>> >>>> expressions
>> >>>> and so on --- and even if we use CSymPy, there is overhead in the
>> >>>> Python wrappers,
>> >>>> so essentially you don't want to be calling them too often, if you
>> >>>> *really* care
>> >>>> about performance), they will add up.
>> >>>
>> >>> Sure, you could just implement a whole CAS in C++. That's what some
>> >>> people have done already. But you have to factor in the costs of
>> >>> everything, not just the speed. The costs of:
>> >>>
>> >>> - How much more complicated the code is
>> >>> - Code duplication (and all the associated issues that come with it)
>> >>> - The additional overhead needed for CSymPy to interop with SymPy. The
>> >>> more CSymPy does, the harder this is.
>> >>>
>> >>> Is series expansion slow? Is it an inner loop (i.e., will it matter to
>> >>> people if it is slow)? Is it simple to implement ('simple' being a
>> >>> relative term of course; obviously no part of a CAS is completely
>> >>> simple)?  If the answer to any of those is "no", I think you should
>> >>> seriously consider whether it's worth implementing.
>> >>>
>> >>>>
>> >>>>>
>> >>>>> My points are:
>> >>>>>
>> >>>>> - I think CSymPy should focus on making expression manipulation fast
>> >>>>> (i.e., the things that are the inner loop of any symbolic
>> >>>>> algorithm).
>> >>>>> It should not reimplement the symbolic algorithms themselves. Those
>> >>>>> are implemented in SymPy. If they use the CSymPy objects instead of
>> >>>>> the SymPy objects, they will be faster.
>> >>>>>
>> >>>>> - I would focus more on making CSymPy interoperate with SymPy and
>> >>>>> less
>> >>>>> on reimplementing things that are in SymPy in CSymPy. Once there is
>> >>>>> interoperation, we can see what is still slow, and then (and only
>> >>>>> then) implement it in C++.
>> >>>>
>> >>>> The interoperability is important and that is mostly the job of the
>> >>>> Python wrappers.
>> >>>> The C++ API underneath can change a lot, i.e. if we figure out a
>> >>>> faster way to represent
>> >>>> things, we'll switch. I will spend lots of time making the
>> >>>> interoperability work.
>> >>>> This summer though, I want to think hard about raw speed and making
>> >>>> it work
>> >>>> for PyDy.
>> >>>>
>> >>>>>
>> >>>>>>
>> >>>>>>>
>> >>>>>>> - What are the things that should definitely not go in CSymPy?
>> >>>>>>
>> >>>>>> Things that don't need to be fast. Things like limits. Also things
>> >>>>>> that are in SymPy, where CSymPy can
>> >>>>>> be used as a drop in replacement for the engine: PyDy, some stuff
>> >>>>>> in
>> >>>>>> physics, and so on. There is no need
>> >>>>>> to rewrite PyDy in C++. Also most user's code would stay in Python.
>> >>>>>> They can just optionally change
>> >>>>>> to CSymPy for some intensive calculation, then finish the thing
>> >>>>>> with SymPy.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> - How will CSymPy be architected to allow things to happen in
>> >>>>>>> CSymPy
>> >>>>>>> when they can but fall back to SymPy when they cannot.
>> >>>>>>
>> >>>>>> Currently you can simply mix and match SymPy and CSymPy
>> >>>>>> expressions.
>> >>>>>> So you simply
>> >>>>>> convert an expression to SymPy to do some advanced manipulation,
>> >>>>>> and
>> >>>>>> convert to CSymPy
>> >>>>>> to do some fast manipulation. I am open to suggestions how to
>> >>>>>> improve this.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> My main concern here is that CSymPy has no clear separation from
>> >>>>>>> SymPy, and as a result it will end up growing larger and larger,
>> >>>>>>> until
>> >>>>>>> it becomes an independent CAS (which is fine if that's the goal,
>> >>>>>>> but
>> >>>>>>> my understanding was that it was supposed to be just a small fast
>> >>>>>>> core).
>> >>>>>>
>> >>>>>> The goals are written above. I am myself concentrating on speed,
>> >>>>>> that's what I really
>> >>>>>> want to nail down. And then enough features so that it's useful for
>> >>>>>> all the people who
>> >>>>>> found SymPy slow. However, let's say somebody comes and
>> >>>>>> reimplements the Gruntz
>> >>>>>> algorithm in CSymPy. Should we reject such a PR? My answer is that
>> >>>>>> if
>> >>>>>> the code is nice,
>> >>>>>> maintainable, much faster than SymPy and has the same or similar
>> >>>>>> features, I am ok
>> >>>>>> with merging it.  If the code is a mess, then not. But as I said, I
>> >>>>>> am
>> >>>>>> spending my own
>> >>>>>> time on things which people need, and faster limits don't seem to
>> >>>>>> be it.
>> >>>>>>
>> >>>>>>>
>> >>>>>>> In particular, if there is some feature of SymPy functions, how
>> >>>>>>> will
>> >>>>>>> CSymPy be architected so that it can take advantage of it
>> >>>>>>> without
>> >>>>>>> having to completely reimplement that function in C++?
>> >>>>>>
>> >>>>>> You can convert any expression back and forth, so you keep it in
>> >>>>>> SymPy
>> >>>>>> if you want to have some particular feature. See also the
>> >>>>>> conversation
>> >>>>>> and specific examples here:
>> >>>>>>
>> >>>>>> https://github.com/sympy/csympy/issues/153
>> >>>>>>
>> >>>>>>>
>> >>>>>>> For instance, a current goal of CSymPy is to implement trig
>> >>>>>>> functions.
>> >>>>>>> But this can be quite complicated if you consider all the
>> >>>>>>> different
>> >>>>>>> things you can do with trig functions. Without even thinking about
>> >>>>>>> trig simplification, there are complicated evaluation issues
>> >>>>>>> (e.g.,
>> >>>>>>> consider sin(pi/7).rewrite(sqrt) in SymPy). It would be a shame to
>> >>>>>>> reimplement all this logic twice, especially since it is not needed for
>> >>>>>>> performance.
>> >>>>>>
>> >>>>>> Agreed. On the other hand, we really need very fast trig functions.
>> >>>>>> The functionality
>> >>>>>> that we need is simplifications like sin(2*pi) -> 0,
>> >>>>>> differentiation
>> >>>>>> and series expansion.
>> >>>>>
>> >>>>> Why do you need to implement those in C++? If the expression
>> >>>>> manipulation is fast, then won't it be fine to have the actual
>> >>>>> formulas/algorithms in SymPy?
>> >>>>
>> >>>> Maybe, that depends on this issue:
>> >>>>
>> >>>> https://github.com/sympy/csympy/issues/153
>> >>>>
>> >>>> The problem is that once you start thinking about Python+C++ at once
>> >>>> and performance, things get complex quickly. It's much easier
>> >>>> to think in terms of C++ only and how to write the fastest possible
>> >>>> algorithm
>> >>>> (that is hard enough!). This sets the bar. Then one should try to see
>> >>>> if it is possible to match this with Python. Not the other way round,
>> >>>> because you need to set the bar first.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Apr 22, 2014 at 6:16 PM, Aaron Meurer <[email protected]>
>> >>>> wrote:
>> >>>>> On Tue, Apr 22, 2014 at 4:58 PM, Joachim Durchholz
>> >>>>> <[email protected]> wrote:
>> >>>>>> Hm.
>> >>>>>>
>> >>>>>> One:
>> >>>>>>> * Extension/complement to SymPy
>> >>>>>>
>> >>>>>> Two:
>> >>>>>>
>> >>>>>>
>> >>>>>>> That lowers the barrier
>> >>>>>>> of entry significantly, compared to a big mix of C++, Cython and
>> >>>>>>> Python, makes it easier to make things fast
>> >>>>>>> (you don't need to worry about Python at all).
>> >>>>>>
>> >>>>>>
>> >>>>>> That's neither an extension nor a complement; it's a replacement.
>> >>>>>>
>> >>>>>>
>> >>>>>>> The Python (and other
>> >>>>>>>
>> >>>>>>> languages) wrappers should be just thin
>> >>>>>>> wrappers around the C++ core (=just better syntax).
>> >>>>>>
>> >>>>>>
>> >>>>>> I.e. replace the engine if not the API.
>> >>>>>>
>> >>>>>> Not that I'm judging. I'm just pointing out perceived
>> >>>>>> inconsistencies.
>> >>>>>>
>> >>>>>> The more SymPy itself turns into a set of simplification rules,
>> >>>>>> the less
>> >>>>>> significance this will have in the end.
>> >>>>>>
>> >>>>>>
>> >>>>>>>> - How will CSymPy be architected to allow things to happen in
>> >>>>>>>> CSymPy
>> >>>>>>>> when they can but fall back to SymPy when they cannot.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> Currently you can simply mix and match SymPy and CSymPy
>> >>>>>>> expressions.
>> >>>>>>> So you simply
>> >>>>>>> convert an expression to SymPy to do some advanced manipulation,
>> >>>>>>> and
>> >>>>>>> convert to CSymPy
>> >>>>>>> to do some fast manipulation. I am open to suggestions how to
>> >>>>>>> improve
>> >>>>>>> this.
>> >>>>>>
>> >>>>>>
>> >>>>>> The alternative would be to have a data structure that can be
>> >>>>>> manipulated
>> >>>>>> from both the C++ and the Python side, but that's going to be
>> >>>>>> unnatural for
>> >>>>>> at least one of the sides.
>> >>>>>>
>> >>>>>> Note that the data structures can become large-ish, and if the
>> >>>>>> simplification becomes complicated there may be a lot of
>> >>>>>> back-and-forth.
>> >>>>>> It's possible that mixed execution will be slow for some algorithms
>> >>>>>> or use
>> >>>>>> cases for that reason.
>> >>>>>>
>> >>>>>> I do not think that this can be determined in advance, it's
>> >>>>>> something to
>> >>>>>> keep an eye out for during benchmarks.
>> >>>>>>
>> >>>>>>
>> >>>>>>>> My main concern here is that CSymPy has no clear separation from
>> >>>>>>>> SymPy, and as a result it will end up growing larger and larger,
>> >>>>>>>> until
>> >>>>>>>> it becomes an independent CAS (which is fine if that's the goal,
>> >>>>>>>> but
>> >>>>>>>> my understanding was that it was supposed to be just a small fast
>> >>>>>>>> core).
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> The goals are written above. I am myself concentrating on speed,
>> >>>>>>> that's what I really
>> >>>>>>> want to nail down.
>> >>>>>>
>> >>>>>>
>> >>>>>> I'm somewhat sceptical about this.
>> >>>>>> A conversion to C++ will give a constant-factor improvement.
>> >>>>>> Better algorithms can improve the big-Oh class.
>> >>>>>> Unless algorithmic improvements have been exhausted, this would be
>> >>>>>> premature
>> >>>>>> optimization. (Are algorithmic improvements exhausted yet?)
>> >>>>>
>> >>>>> That is true. I usually prefer to use faster algorithms.
>> >>>>>
>> >>>>> But to be the devil's advocate, there are two issues with this line
>> >>>>> of thinking:
>> >>>>>
>> >>>>> - Big O is basically useless. Consider the extreme effectiveness of
>> >>>>> SAT solvers (which solve an NP-complete problem), or the difference
>> >>>>> between the simplex and Khachiyan's algorithm, or AKS vs. more
>> >>>>> efficient deterministic primality testing algorithms. Asymptotic
>> >>>>> complexity is all fine, but at the end of the day, you don't care
>> >>>>> how
>> >>>>> fast your algorithm is for increasingly large inputs, you care how
>> >>>>> fast it is for *your* input.
>> >>>>>
>> >>>>> - Faster algorithms have a complexity cost. You can get closer to
>> >>>>> the
>> >>>>> metal in Python by being very careful about your use of data
>> >>>>> structures, and avoiding things that are slow in Python (like
>> >>>>> function
>> >>>>> calls), but the cost is high because you end up with code that is
>> >>>>> not
>> >>>>> only harder to read and maintain, but harder to keep in its fast
>> >>>>> state, because someone else who doesn't know all the little tricks
>> >>>>> might come along and change things in a way that seems equivalent
>> >>>>> but
>> >>>>> makes things slower.
>> >>>>
>> >>>> Precisely. That's why it's good to stick to just one language, C++,
>> >>>> and nail
>> >>>> the speed. That sets the bar. Then one can try to match the speed
>> >>>> with Python,
>> >>>> which sometimes is possible.
>> >>>>
>> >>>>>
>> >>>>> With that being said, C++ is itself enough of a complexity cost that
>> >>>>> doing this outweighs using it in many (most?) cases. (That's not
>> >>>>> just
>> >>>> a knock on C++; using any second language alongside Python brings a cost,
>> >>>>> both because there are now two languages to think about, and because
>> >>>>> of interoperability questions)
>> >>>>
>> >>>> Yes, again the reason to stick to just one language and only do thin
>> >>>> wrappers that allow using it as a black box.
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Tue, Apr 22, 2014 at 6:23 PM, Brian Granger <[email protected]>
>> >>>> wrote:
>> >>>>> I too feel that csympy should implement the absolute minimum
>> >>>>> possible
>> >>>>> for it to be fast. All actual mathematical algorithms should remain
>> >>>>> in
>> >>>>> sympy. Trying to pull lots of algorithms into csympy will fail - not
>> >>>>> that many people want to write complex mathematical algorithms in
>> >>>>> C++.
>> >>>>
>> >>>>
>> >>>> I understand your worries. Both yours and Aaron's and other people in
>> >>>> the SymPy community.
>> >>>> Let's be frank about this, here they are:
>> >>>
>> >>> Thanks for your frankness. So I think you do understand the issues.
>> >>>
>> >>>>
>> >>>> * keep things simple, maintainable, in Python, not introducing other
>> >>>> languages
>> >>>>
>> >>>> * especially not C++, which is notorious for being hilariously
>> >>>> complex. That's why we use Python, because it's better.
>> >>>
>> >>> Well, there are also languages that are fast and not C++, but we can
>> >>> have that discussion separately.
>> >>>
>> >>>>
>> >>>> * danger of splitting a community by introducing a separate CAS
>> >>>> (=wasting our resources, attention, developers, and so on), and we've
>> >>>> been on this path before, e.g. with sympycore, or even Sage ---
>> >>>> any improvement to those codes does not benefit SymPy. That does not
>> >>>> mean that there is anything bad with those, but one has to try to
>> >>>> focus, in our case SymPy, and just get things done, make it a useful
>> >>>> library so that people can use it and benefit from it.
>> >>>>
>> >>>> * that we should concentrate on features, and sacrifice some
>> >>>> reasonable speed (maybe 100x or so).
>> >>>>
>> >>>> * We should not be reimplementing SymPy in C++, that would be a big
>> >>>> waste.
>> >>>
>> >>> One of my worries is not listed here, which is that you are doing
>> >>> things completely backwards from good software design with CSymPy,
>> >>> which is getting speed first, and something that works later.
>> >>
>> >> The goal is to get something that works first, polished API later.
>> >> By "work" I mean fast and working for PyDy (at first).
>> >>
>> >>>
>> >>>>
>> >>>>
>> >>>> I thought about all these very deeply. And I can promise that I will
>> >>>> do my absolute best to make this work, with the SymPy community. I
>> >>>> might not have the best answer to all the worries, but I know that we
>> >>>> need to set the bar:
>> >>>>
>> >>>> * So implement trig functions in C++
>> >>>> * Benchmark against Mathematica, Sage, GiNaC
>> >>>> * Make sure it is as fast or faster
>> >>>> * See if we can match it with Python
>> >>>> * If so, great! If not, what is the penalty? 2x? 10x? 100x?
>> >>>
>> >>> That is exactly what I want to know.
>> >>>
>> >>>>
>> >>>> If we don't have this bar, then we miss on speed. And if we miss on
>> >>>> speed then there will always be reason why people would use other
>> >>>> software, because of speed. If, on the other hand, csympy is as fast
>> >>>> as state of the art, then it fixes the problem. And it will be
>> >>>> integrated with SymPy well, people can keep using SymPy.
>> >>>>
>> >>>> Ondrej
>> >>>
>> >>> I feel like without real numbers, we aren't going to get anywhere, so
>> >>> maybe you could provide some benchmarks. I'm not convinced about
>> >>> expand, because that relies pretty heavily on other things like
>> >>> multinomial coefficient generation. I'd rather see a benchmark that
>> >>> does the exact same expression manipulations everywhere.
>> >>>
>> >>> Feel free to suggest a better one. I'm just coming up with this from
>> >>> the seat of my pants, but something like
>> >>>
>> >>> a = x
>> >>> c = 1
>> >>> for i in range(1000): # Replace with larger numbers if necessary
>> >>>     a += c*i*x # If CSymPy expressions are mutable, modify this accordingly
>> >>>     c *= -1
>> >>
>> >> Sure. Here is the code:
>> >>
>> >> from csympy import var, Integer
>> >> #from sympy import var, Integer
>> >> var("x")
>> >> a = x
>> >> c = Integer(1)
>> >> N = 10**5
>> >> for i in range(N):
>> >>     a += c*i*x
>> >>     c *= -1
>> >> print a
>> >>
>> >>
>> >> SymPy:
>> >>
>> >> $ time python a.py
>> >> -49999*x
>> >>
>> >> real 0m35.262s
>> >> user 0m34.870s
>> >> sys 0m0.300s
>> >>
>> >> CSymPy:
>> >>
>> >> $ time python a.py
>> >> -49999x
>> >>
>> >> real 0m0.860s
>> >> user 0m0.852s
>> >> sys 0m0.004s
>> >>
>> >
>> > Comparing sympy.polys and sympy.core:
>> >
>> > In [1]: R, x = ring("x", ZZ)
>> >
>> > In [2]: y = Symbol("y")
>> >
>> > In [3]: N, a, c = 10**5, x, ZZ(1)
>> >
>> > In [4]: %time for i in range(N): a += c*i*x; c *= -1
>> > CPU times: user 564 ms, sys: 4.85 ms, total: 569 ms
>> > Wall time: 555 ms
>> >
>> > In [5]: N, a, c = 10**5, y, Integer(1)
>> >
>> > In [6]: %time for i in range(N): a += c*i*y; c *= -1
>> > CPU times: user 20 s, sys: 133 ms, total: 20.1 s
>> > Wall time: 20 s
>> >
>> >>
>> >> So this particular one is 41x faster in CSymPy. You can modify this to
>> >> generate some long expressions, e.g.:
>> >>
>> >> from csympy import var, Integer
>> >> #from sympy import var, Integer
>> >> var("x")
>> >> a = x
>> >> c = Integer(1)
>> >> N = 3000
>> >> for i in range(N):
>> >>     a += c*x**i
>> >>     c *= -1
>> >>
>> >> SymPy:
>> >>
>> >> $ time python a.py
>> >>
>> >> real 0m37.890s
>> >> user 0m37.626s
>> >> sys 0m0.152s
>> >>
>> >> CSymPy:
>> >>
>> >> $ time python a.py
>> >>
>> >> real 0m1.032s
>> >> user 0m1.020s
>> >> sys 0m0.012s
>> >>
>> >
>> > Comparing sympy.polys and sympy.core:
>> >
>> > In [1]: R, x = ring("x", ZZ)
>> >
>> > In [2]: y = Symbol("y")
>> >
>> > In [3]: N, a, c = 3000, x, ZZ(1)
>> >
>> > In [4]: %time for i in range(N): a += c*x**i; c *= -1
>> > CPU times: user 148 ms, sys: 4.3 ms, total: 152 ms
>> > Wall time: 147 ms
>> >
>> > In [5]: N, a, c = 3000, y, Integer(1)
>> >
>> > In [6]: %time for i in range(N): a += c*y**i; c *= -1
>> > CPU times: user 20.6 s, sys: 42.6 ms, total: 20.6 s
>> > Wall time: 20.6 s
>> >
>> > So, what's the difference between CSymPy's +=, *=, *, **, etc.
>> > operators and SymPy's ones? Are they in-place? What are the underlying
>> > data structures? Do they use the same set of rewrite rules? Do they
>> > take assumptions into account? When comparing sympy.polys and
>> > sympy.core, it's obvious that sympy.polys will be faster because it
>> > simply does a lot less compared to sympy.core.
>>
>> They don't use assumptions. I think assumptions should not be taken
>> into account in these manipulations, but rather using
>> refine().
>>
>> For the sympy.polys in this example,
>> is it using the sparse polynomial representation? I.e. it stores the
>> symbols (x, y, z) and then stores their powers and coefficients
>> in a dictionary, e.g.:
>>
>> 3*x^2*z + 2*x -> {3: (2, 0, 1), 2: (1, 0, 0)}
>>
>> ?
>>
>> In CSymPy, I am still implementing this datastructure, as it is very
>> efficient, but it only works for polynomials. So at the moment, we
>> can't benchmark this yet.
>>
>> The internal datastructure for the CSymPy benchmark above is currently
>> std::unordered_map inside CSymPy::Add.
>> So a better benchmark is something like this then:
>>
>> %time for i in range(N): a += c*x**(i*x); c *= -1
>>
>> which fails with:
>>
>> TypeError: int() argument must be a string or a number, not 'PolyElement'
>>
>> if you try to use polynomials.
>>
>> So in SymPy:
>>
>> In [1]: from sympy import Symbol, Integer
>>
>> In [2]: x = Symbol("x")
>>
>> In [3]: N, a, c = 3000, x, Integer(1)
>>
>> In [4]: %time for i in range(N): a += c*x**(i*x); c *= -1
>> CPU times: user 25.8 s, sys: 8 ms, total: 25.8 s
>> Wall time: 25.8 s
>>
>> And CSymPy:
>>
>> In [1]: from csympy import Symbol, Integer
>>
>> In [2]: x = Symbol("x")
>>
>> In [3]: N, a, c = 3000, x, Integer(1)
>>
>> In [4]: %time for i in range(N): a += c*x**(i*x); c *= -1
>> CPU times: user 1.17 s, sys: 0 ns, total: 1.17 s
>> Wall time: 1.17 s
>>
>>
>> Ondrej
>
> --
> You received this message because you are subscribed to the Google Groups
> "sympy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sympy.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/sympy/c64643bb-8e68-4fa5-8eee-8ed7f73c10a8%40googlegroups.com.
>
> For more options, visit https://groups.google.com/d/optout.
