> On Apr 23, 2014, at 12:38 AM, "Ondřej Čertík" <[email protected]> wrote:
>
>> On Tue, Apr 22, 2014 at 9:52 PM, Aaron Meurer <[email protected]> wrote:
>>> On Tue, Apr 22, 2014 at 10:21 PM, Ondřej Čertík <[email protected]> wrote:
>>>> On Tue, Apr 22, 2014 at 6:06 PM, Aaron Meurer <[email protected]> wrote:
>>>>> On Tue, Apr 22, 2014 at 12:05 PM, Ondřej Čertík <[email protected]> wrote:
>>>>> Hi Aaron,
>>>>>
>>>>> Those are good questions. Here are the answers:
>>>>>
>>>>>> On Tue, Apr 22, 2014 at 10:13 AM, Aaron Meurer <[email protected]> wrote:
>>>>>> I have some high-level questions about CSymPy.
>>>>>>
>>>>>> - What are the goals of the project?
>>>>>
>>>>> The goals of the project are:
>>>>>
>>>>> * Fastest symbolic manipulation library, compared to other codes,
>>>>> commercial or open source (Sage, GiNaC, Mathematica, ...).
>>>>>
>>>>> * Extension/complement to SymPy
>>>>>
>>>>> * If the above two goals allow, be able to also call it from other
>>>>> languages easily and efficiently (Julia, Ruby, Mathematica, ...)
>>>>>
>>>>> As to the technical solution: the core should be a C++ library, which
>>>>> can depend on other compiled libraries if needed. The core should not
>>>>> depend on Python or Ruby or Julia, but rather be just one language,
>>>>> C++. That lowers the barrier to entry significantly, compared to a big
>>>>> mix of C++, Cython and Python, and makes it easier to make things fast
>>>>> (you don't need to worry about Python at all). The Python (and other
>>>>> languages) wrappers should be just thin wrappers around the C++ core
>>>>> (= just better syntax).
>>>>>
>>>>> There might be other technical solutions to this, but I know that I
>>>>> can deliver the above goals with this solution (and I failed to
>>>>> deliver with other solutions, like writing the core in Cython). So
>>>>> that's why we do it this way.
>>>>>
>>>>> Also, by being "just a C++ library", other people can use it in their
>>>>> projects.
>>>>> I hope to get the interest of a much broader community that way, who
>>>>> can contribute back (somebody will need fast symbolic manipulation in
>>>>> Julia, so they can just use CSymPy with Julia wrappers, and contribute
>>>>> improvements back).
>>>>>
>>>>>>
>>>>>> - What are the things that should definitely go in CSymPy?
>>>>>
>>>>> At the moment: all the things needed to make specific applications
>>>>> fast, in particular PyDy. For that, it needs basic manipulation,
>>>>> differentiation, series expansion (I think) and matrices. That's all
>>>>> roughly either done, or on the way. Of course, lots of polishing is
>>>>> needed.
>>>>
>>>> I think that's already too much. Why is the series expansion slow in
>>>> SymPy? Is it because the algorithms are slow? If so, then implementing
>>>> the same inefficient algorithms in CSymPy won't help. They will be
>>>> faster, but for large enough expressions they will still slow down. Is
>>>> it because the expression manipulation is slow? In that case, if
>>>> CSymPy has faster expression manipulation, then just use those
>>>> expressions, but use the SymPy series algorithms.
>>>
>>> My experience is that it will actually help to implement the same
>>> algorithm, because there is a little overhead with any Python operation.
>>
>> Exactly. There is a "little" overhead, not a huge overhead. It matters
>> for the stuff that is in the inner loops, like addition and
>> multiplication of terms, but for whole algorithms, which might be
>> called only a few times (as opposed to a few hundred thousand times),
>> it doesn't make a difference.
>>
>> This is all hypothetical without numbers (and by the way, it would be
>> awesome if you could provide real numbers here), but suppose these
>> imaginary numbers were true:
>>
>> SymPy: 1x
>> CSymPy with Python wrappers: 4x
>> Raw CSymPy: 5x
>>
>> Then using CSymPy with Python would already be 4x faster than SymPy,
>> and doing everything in raw CSymPy would only be 1.25x faster than
>> that.
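Spelled out, the arithmetic behind those imaginary numbers is just two ratios (a trivial sketch; the 1x/4x/5x figures are the hypothetical ones from the message, not measurements):

```python
# Hypothetical relative speeds from the message above (not measurements):
# SymPy = 1x baseline, CSymPy via Python wrappers = 4x, raw CSymPy = 5x.
sympy_speed = 1.0
wrapped_speed = 4.0
raw_speed = 5.0

wrapped_over_sympy = wrapped_speed / sympy_speed  # wrappers vs. SymPy
raw_over_wrapped = raw_speed / wrapped_speed      # raw C++ vs. wrappers

print(wrapped_over_sympy, raw_over_wrapped)  # 4.0 1.25
```

The second ratio is the one at issue: under these numbers, moving from wrapped CSymPy to pure C++ buys only a further 25%.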
>>
>> Now, if CSymPy integrates flawlessly, so that it just works (at least
>> as far as the user is concerned), there is little complexity cost to
>> CSymPy + Python. Definitely little enough to warrant the 4x speedup.
>> But as soon as you take that away, i.e., you implement more and more
>> in C++, or CSymPy differs enough from SymPy that the user needs to
>> care about it (and the more that is in CSymPy, the more likely this is
>> to happen), then the complexity cost skyrockets. Maybe 4x would still
>> be worth it here. But not 1.25x.
>>
>>> So if you do a lot of them (like in series expansion, where you create
>>> intermediate expressions and so on --- and even if we use CSymPy,
>>> there is overhead in the Python wrappers, so essentially you don't
>>> want to be calling them too often, if you *really* care about
>>> performance), they will add up.
>>
>> Sure, you could just implement a whole CAS in C++. That's what some
>> people have done already. But you have to factor in the costs of
>> everything, not just the speed. The costs of:
>>
>> - How much more complicated the code is
>> - Code duplication (and all the associated issues that come with it)
>> - The additional overhead needed for CSymPy to interoperate with
>>   SymPy. The more CSymPy does, the harder this is.
>>
>> Is series expansion slow? Is it an inner loop (i.e., will it matter to
>> people if it is slow)? Is it simple to implement ('simple' being a
>> relative term, of course; obviously no part of a CAS is completely
>> simple)? If the answer to any of those is "no", I think you should
>> seriously consider whether it's worth implementing.
>>
>>>
>>>>
>>>> My points are:
>>>>
>>>> - I think CSymPy should focus on making expression manipulation fast
>>>> (i.e., the things that are the inner loop of any symbolic algorithm).
>>>> It should not reimplement the symbolic algorithms themselves. Those
>>>> are implemented in SymPy.
>>>> If they use the CSymPy objects instead of the SymPy objects, they
>>>> will be faster.
>>>>
>>>> - I would focus more on making CSymPy interoperate with SymPy and
>>>> less on reimplementing things that are in SymPy in CSymPy. Once
>>>> there is interoperation, we can see what is still slow, and then
>>>> (and only then) implement it in C++.
>>>
>>> The interoperability is important, and that is mostly the job of the
>>> Python wrappers. The C++ API underneath can change a lot, i.e. if we
>>> figure out a faster way to represent things, we'll switch. I will
>>> spend lots of time making the interoperability work. This summer,
>>> though, I want to think hard about raw speed and making it work for
>>> PyDy.
>>>
>>>>>>
>>>>>> - What are the things that should definitely not go in CSymPy?
>>>>>
>>>>> Things that don't need to be fast. Things like limits. Also things
>>>>> that are in SymPy, where CSymPy can be used as a drop-in
>>>>> replacement for the engine: PyDy, some stuff in physics, and so on.
>>>>> There is no need to rewrite PyDy in C++. Also, most users' code
>>>>> would stay in Python. They can just optionally change to CSymPy for
>>>>> some intensive calculation, then finish the thing with SymPy.
>>>>>
>>>>>>
>>>>>> - How will CSymPy be architected to allow things to happen in
>>>>>> CSymPy when they can, but fall back to SymPy when they cannot?
>>>>>
>>>>> Currently you can simply mix and match SymPy and CSymPy
>>>>> expressions. So you simply convert an expression to SymPy to do
>>>>> some advanced manipulation, and convert to CSymPy to do some fast
>>>>> manipulation. I am open to suggestions for how to improve this.
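The mix-and-match workflow described here might look roughly like the sketch below. The conversion API is an assumption on my part (a `csympy.sympify()` that accepts SymPy expressions, and a `_sympy_()` method on CSymPy objects, in the spirit of the discussion in issue #153; the real names may differ), so the import is guarded and the snippet falls back to plain SymPy when csympy is not installed:

```python
# Sketch only: the csympy conversion API used here (csympy.sympify and
# ._sympy_()) is assumed, not confirmed; see
# https://github.com/sympy/csympy/issues/153 for the actual discussion.
import sympy

x = sympy.Symbol("x")
expr = sympy.sin(2 * x)

try:
    import csympy
    fast = csympy.sympify(expr)            # SymPy -> CSymPy (assumed API)
    fast = fast.diff(csympy.Symbol("x"))   # fast manipulation in C++
    result = fast._sympy_()                # CSymPy -> SymPy (assumed API)
except ImportError:
    result = expr.diff(x)                  # csympy not installed: stay in SymPy

# Advanced manipulation (simplification, rewriting, ...) stays in SymPy:
print(result)  # 2*cos(2*x)
```

The point of the round trip is that only the hot manipulation crosses into C++; anything needing SymPy-only features is converted back first.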
>>>>>
>>>>>>
>>>>>> My main concern here is that CSymPy has no clear separation from
>>>>>> SymPy, and as a result it will end up growing larger and larger,
>>>>>> until it becomes an independent CAS (which is fine if that's the
>>>>>> goal, but my understanding was that it was supposed to be just a
>>>>>> small fast core).
>>>>>
>>>>> The goals are written above. I am myself concentrating on speed;
>>>>> that's what I really want to nail down, and then enough features so
>>>>> that it's useful for all the people who found SymPy slow. However,
>>>>> let's say somebody comes and reimplements the Gruntz algorithm in
>>>>> CSymPy. Should we reject such a PR? My answer is that if the code is
>>>>> nice, maintainable, much faster than SymPy and has the same or
>>>>> similar features, I am OK with merging it. If the code is a mess,
>>>>> then not. But as I said, I am spending my own time on things which
>>>>> people need, and faster limits don't seem to be it.
>>>>>
>>>>>>
>>>>>> In particular, if there is some feature of SymPy functions, how will
>>>>>> CSymPy be architected so that it can take advantage of it without
>>>>>> having to completely reimplement that function in C++?
>>>>>
>>>>> You can convert any expression back and forth, so you keep it in
>>>>> SymPy if you want to have some particular feature. See also the
>>>>> conversation and specific examples here:
>>>>>
>>>>> https://github.com/sympy/csympy/issues/153
>>>>>
>>>>>>
>>>>>> For instance, a current goal of CSymPy is to implement trig
>>>>>> functions. But this can be quite complicated if you consider all the
>>>>>> different things you can do with trig functions. Without even
>>>>>> thinking about trig simplification, there are complicated evaluation
>>>>>> issues (e.g., consider sin(pi/7).rewrite(sqrt) in SymPy). It would
>>>>>> be a shame to implement all this logic twice, especially since it is
>>>>>> not needed for performance.
>>>>>
>>>>> Agreed.
>>>>> On the other hand, we really need very fast trig functions. The
>>>>> functionality that we need is simplifications like sin(2*pi) -> 0,
>>>>> differentiation and series expansion.
>>>>
>>>> Why do you need to implement those in C++? If the expression
>>>> manipulation is fast, then won't it be fine to have the actual
>>>> formulas/algorithms in SymPy?
>>>
>>> Maybe; that depends on this issue:
>>>
>>> https://github.com/sympy/csympy/issues/153
>>>
>>> The problem is that once you start thinking about Python+C++ at once,
>>> plus performance, things get complex quickly. It's much easier to
>>> think in terms of C++ only and how to write the fastest possible
>>> algorithm (that is hard enough!). This sets the bar. Then one should
>>> try to see if it is possible to match this with Python. Not the other
>>> way round, because you need to set the bar first.
>>>
>>>> On Tue, Apr 22, 2014 at 6:16 PM, Aaron Meurer <[email protected]> wrote:
>>>>> On Tue, Apr 22, 2014 at 4:58 PM, Joachim Durchholz <[email protected]> wrote:
>>>>> Hm.
>>>>>
>>>>> One:
>>>>>
>>>>>> * Extension/complement to SymPy
>>>>>
>>>>> Two:
>>>>>
>>>>>> That lowers the barrier to entry significantly, compared to a big
>>>>>> mix of C++, Cython and Python, makes it easier to make things fast
>>>>>> (you don't need to worry about Python at all).
>>>>>
>>>>> That's not an extension nor a complement; it's a replacement.
>>>>>
>>>>>> The Python (and other languages) wrappers should be just thin
>>>>>> wrappers around the C++ core (= just better syntax).
>>>>>
>>>>> I.e., replace the engine if not the API.
>>>>>
>>>>> Not that I'm judging. I'm just pointing out perceived inconsistencies.
>>>>>
>>>>> The more SymPy itself turns into a set of simplification rules, the
>>>>> less significance this will have in the end.
>>>>>
>>>>>
>>>>>>> - How will CSymPy be architected to allow things to happen in
>>>>>>> CSymPy when they can, but fall back to SymPy when they cannot?
>>>>>>
>>>>>> Currently you can simply mix and match SymPy and CSymPy
>>>>>> expressions. So you simply convert an expression to SymPy to do
>>>>>> some advanced manipulation, and convert to CSymPy to do some fast
>>>>>> manipulation. I am open to suggestions for how to improve this.
>>>>>
>>>>> The alternative would be to have a data structure that can be
>>>>> manipulated from both the C++ and the Python side, but that's going
>>>>> to be unnatural for at least one of the sides.
>>>>>
>>>>> Note that the data structures can become large-ish, and if the
>>>>> simplification becomes complicated there may be a lot of
>>>>> back-and-forth. It's possible that mixed execution will be slow for
>>>>> some algorithms or use cases for that reason.
>>>>>
>>>>> I do not think that this can be determined in advance; it's
>>>>> something to keep an eye out for during benchmarks.
>>>>>
>>>>>>> My main concern here is that CSymPy has no clear separation from
>>>>>>> SymPy, and as a result it will end up growing larger and larger,
>>>>>>> until it becomes an independent CAS (which is fine if that's the
>>>>>>> goal, but my understanding was that it was supposed to be just a
>>>>>>> small fast core).
>>>>>>
>>>>>> The goals are written above. I am myself concentrating on speed;
>>>>>> that's what I really want to nail down.
>>>>>
>>>>> I'm somewhat sceptical about this. A conversion to C++ will give a
>>>>> linear improvement. Better algorithms can improve the big-O class.
>>>>> Unless algorithmic improvements have been exhausted, this would be
>>>>> premature optimization. (Are algorithmic improvements exhausted
>>>>> yet?)
>>>>
>>>> That is true. I usually prefer to use faster algorithms.
>>>>
>>>> But to play the devil's advocate, there are two issues with this
>>>> line of thinking:
>>>>
>>>> - Big O is basically useless. Consider the extreme effectiveness of
>>>> SAT solvers (which solve an NP-complete problem), or the difference
>>>> between the simplex and Khachiyan's algorithms, or AKS vs. more
>>>> efficient deterministic primality testing algorithms. Asymptotic
>>>> complexity is all fine, but at the end of the day, you don't care
>>>> how fast your algorithm is for increasingly large inputs; you care
>>>> how fast it is for *your* input.
>>>>
>>>> - Faster algorithms have a complexity cost. You can get closer to
>>>> the metal in Python by being very careful about your use of data
>>>> structures, and avoiding things that are slow in Python (like
>>>> function calls), but the cost is high, because you end up with code
>>>> that is not only harder to read and maintain, but harder to keep in
>>>> its fast state, because someone else who doesn't know all the little
>>>> tricks might come along and change things in a way that seems
>>>> equivalent but makes things slower.
>>>
>>> Precisely. That's why it's good to stick to just one language, C++,
>>> and nail the speed. That sets the bar. Then one can try to match the
>>> speed with Python, which sometimes is possible.
>>>
>>>> With that being said, C++ is itself enough of a complexity cost that
>>>> it outweighs the benefit in many (most?) cases. (That's not just a
>>>> knock on C++; using any second language alongside Python brings a
>>>> cost, both because there are now two languages to think about, and
>>>> because of interoperability questions.)
>>>
>>> Yes, again the reason to stick to just one language and only do thin
>>> wrappers that allow using it as a black box.
>>>
>>>> On Tue, Apr 22, 2014 at 6:23 PM, Brian Granger <[email protected]> wrote:
>>>> I too feel that csympy should implement the absolute minimum
>>>> possible for it to be fast.
>>>> All actual mathematical algorithms should remain in sympy. Trying to
>>>> pull lots of algorithms into csympy will fail - not that many people
>>>> want to write complex mathematical algorithms in C++.
>>>
>>> I understand your worries. Both yours and Aaron's and other people's
>>> in the SymPy community. Let's be frank about this; here they are:
>>
>> Thanks for your frankness. So I think you do understand the issues.
>>
>>> * keep things simple, maintainable, in Python, not introducing other
>>> languages
>>>
>>> * especially not C++, which is notorious for being hilariously
>>> complex. That's why we use Python, because it's better.
>>
>> Well, there are also languages that are fast and not C++, but we can
>> have that discussion separately.
>>
>>> * danger of splitting the community by introducing a separate CAS
>>> (= wasting our resources, attention, developers, and so on), and
>>> we've been down this path before, e.g. with sympycore, or even Sage
>>> --- any improvement to those codes does not benefit SymPy. That does
>>> not mean that there is anything bad about those, but one has to try
>>> to focus, in our case on SymPy, and just get things done: make it a
>>> useful library so that people can use it and benefit from it.
>>>
>>> * that we should concentrate on features, and sacrifice some
>>> reasonable speed (maybe 100x or so).
>>>
>>> * We should not be reimplementing SymPy in C++; that would be a big
>>> waste.
>>
>> One of my worries is not listed here, which is that with CSymPy you
>> are doing things completely backwards from good software design:
>> getting speed first, and something that works later.
>
> The goal is to get something that works first, polished API later.
> By "work" I mean fast and working for PyDy (at first).
>
>>
>>> I thought about all these very deeply. And I can promise that I will
>>> do my absolute best to make this work, with the SymPy community.
>>> I might not have the best answer to all the worries, but I know that
>>> we need to set the bar:
>>>
>>> * So implement trig functions in C++
>>> * Benchmark against Mathematica, Sage, GiNaC
>>> * Make sure it is as fast or faster
>>> * See if we can match it with Python
>>> * If so, great! If not, what is the penalty? 2x? 10x? 100x?
>>
>> That is exactly what I want to know.
>>
>>> If we don't have this bar, then we miss out on speed. And if we miss
>>> out on speed, then there will always be a reason why people would use
>>> other software, because of speed. If, on the other hand, csympy is as
>>> fast as the state of the art, then it fixes the problem. And it will
>>> be integrated with SymPy well; people can keep using SymPy.
>>>
>>> Ondrej
>>
>> I feel like without real numbers we aren't going to get anywhere, so
>> maybe you could provide some benchmarks. I'm not convinced about
>> expand, because that relies pretty heavily on other things like
>> multinomial coefficient generation. I'd rather see a benchmark that
>> does the exact same expression manipulations everywhere.
>>
>> Feel free to suggest a better one. I'm just coming up with this from
>> the seat of my pants, but something like
>>
>> a = x
>> c = 1
>> for i in range(1000):  # Replace with larger numbers if necessary
>>     a += c*i*x  # If CSymPy expressions are mutable, modify this accordingly
>>     c *= -1
>
> Sure. Here is the code:
>
> from csympy import var, Integer
> #from sympy import var, Integer
> var("x")
> a = x
> c = Integer(1)
> N = 10**5
> for i in range(N):
>     a += c*i*x
>     c *= -1
> print a
>
> SymPy:
>
> $ time python a.py
> -49999*x
>
> real    0m35.262s
> user    0m34.870s
> sys     0m0.300s
>
> CSymPy:
>
> $ time python a.py
> -49999x
>
> real    0m0.860s
> user    0m0.852s
> sys     0m0.004s
>
> So this particular one is 41x faster in CSymPy.
> You can modify this to generate some long expressions, e.g.:
>
> from csympy import var, Integer
> #from sympy import var, Integer
> var("x")
> a = x
> c = Integer(1)
> N = 3000
> for i in range(N):
>     a += c*x**i
>     c *= -1
>
> SymPy:
>
> $ time python a.py
>
> real    0m37.890s
> user    0m37.626s
> sys     0m0.152s
>
> CSymPy:
>
> $ time python a.py
>
> real    0m1.032s
> user    0m1.020s
> sys     0m0.012s
>
> This one is 37x faster. Sage takes about 3s on this benchmark. And so on.
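One way to run the first benchmark over a range of N, as a sketch (SymPy only; swap in the csympy imports to compare the two; the `bench` helper and the particular N values are illustrative, not from the thread):

```python
# Time the alternating-sum benchmark from above for several N.
# SymPy-only sketch; the helper name and N values are illustrative.
import time
from sympy import Symbol, Integer

def bench(N):
    """Run the a += c*i*x loop for a given N; return (seconds, result)."""
    x = Symbol("x")
    a = x
    c = Integer(1)
    start = time.perf_counter()
    for i in range(N):
        a += c * i * x
        c *= -1
    return time.perf_counter() - start, a

for N in (100, 1000, 3000):
    seconds, result = bench(N)
    # For even N the result collapses to (1 - N//2)*x, e.g. -49*x for N=100.
    print(N, round(seconds, 3), result)
```

Plotting seconds against N (e.g. on a log-log scale) then shows the scaling directly rather than a single data point.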
And how fast is the raw C++, without the Python wrappers?

Aaron Meurer

> We need to also benchmark Mathematica and GiNaC.
>
>>
>> This is a very basic test of how well terms are combined with
>> addition. It also creates a lot of terms, which might penalize a
>> system with too much caching (maybe it should be modified to not do
>> that; I don't know).
>>
>> 1000 should really be replaced with a much larger number, say
>> 10000000. Basically, if a benchmark runs in under a second, I'm wary
>> of it. That means you're benchmarking things that might be irrelevant
>> for larger expressions, when the speed really matters. I suppose
>> really you should benchmark 10**N for some range of N and plot the
>> results.
>
> Exactly; a plot depending on N is the best.
>
> Ondrej
>
>>
>> That's just one benchmark, and it tests one thing. We need to
>> benchmark all the major aspects of the core. A few others that come
>> to mind:
>>
>> - x1*...*xn + x2*...*xn*x1 + ... (basically, test that large
>> multiplications in different orders are efficiently canonicalized to
>> the same thing)
>> - (some large addition) - (the same addition, in a different order)
>> - x**x**x**x**...**x (do some stuff with it; basically, test that
>> highly nested expressions don't kill things. You could also test
>> sin(sin(sin(sin(...(x)...) or something)
>> - (Huge complicated expression in x).subs(x, y) (CSymPy implements
>> some kind of subs or xreplace natively, right?)
>>
>> It actually would be nice to have some kind of benchmarking site for
>> SymPy, similar to http://speed.pypy.org/.
>>
>> Aaron Meurer
>>
>> --
>> You received this message because you are subscribed to the Google
>> Groups "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it,
>> send an email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/sympy.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/sympy/CAKgW%3D6KX8h7FRnKWa%2BjFihSfiu-9rrexYZmjXymOimLsUQ7BeQ%40mail.gmail.com.
>> For more options, visit https://groups.google.com/d/optout.
