On Tue, Apr 22, 2014 at 6:06 PM, Aaron Meurer <[email protected]> wrote: > On Tue, Apr 22, 2014 at 12:05 PM, Ondřej Čertík <[email protected]> > wrote: >> Hi Aaron, >> >> Those are good questions. Here are the answers: >> >> On Tue, Apr 22, 2014 at 10:13 AM, Aaron Meurer <[email protected]> wrote: >>> I have some high level questions about CSymPy. >>> >>> - What are the goals of the project? >> >> The goals of the project are: >> >> * Fastest symbolic manipulation library, compared to other codes, >> commercial or opensource >> (Sage, GiNaC, Mathematica, ...). >> >> * Extension/complement to SymPy >> >> * If the above two goals allow, be able to also call it from other >> languages easily and efficiently (Julia, Ruby, Mathematica, ...) >> >> As to technical solution: the core should be a C++ library, which can >> depend on other compiled libraries if needed. >> The core should not depend on Python or Ruby or Julia, but rather be >> just one language, C++. That lowers the barrier >> of entry significantly, compared to a big mix of C++, Cython and >> Python, makes it easier to make things fast >> (you don't need to worry about Python at all). The Python (and other >> languages) wrappers should be just a thin >> wrappers around the C++ core (=just better syntax). >> >> There might be other technical solutions to this, but I know that I >> can deliver the above goals with this solution >> (and I failed to deliver with other solutions, like writing the core >> in Cython). So that's why we do it this way. >> >> Also, by being "just a C++ library", other people can use it in their >> projects. I hope to get interest of much broader >> community that way, who can contribute back (somebody will need fast >> symbolic manipulation in Julia, so they >> can just use CSymPy with Julia wrappers, and contribute improvements back). >> >>> >>> - What are the things that should definitely go in CSymPy? >> >> At the moment: all things to make specific applications fast, in >> particular PyDy. For that, it needs basic >> manipulation, differentiation, series expansion (I think) and >> matrices. That's all roughly either done, or >> on the way. Of course, lots of polishing is needed. > > I think that's already too much. Why is the series expansion slow in > SymPy? Is it because the algorithms are slow? If so, then implementing > the same inefficient algorithms in CSymPy won't help. They will be > faster, but for large enough expressions they will still slow down. Is > it because the expression manipulation is slow? In that case, if > CSymPy has faster expression manipulation, then just use those > expressions, but use the SymPy series algorithms.
My experience is that it will actually help to implement the same algorithm, because there is a little overhead with any Python operation. So if you do a lot of them (like in series expansion, where you create intermediate expressions and so on --- and even if we use CSymPy, there is overhead in the Python wrappers, so essentially you don't want to be calling them too often, if you *really* care about performance), they will add up. > > My points are: > > - I think CSymPy should focus on making expression manipulation fast > (i.e., the things that are the inner loop of any symbolic algorithm). > It should not reimplement the symbolic algorithms themselves. Those > are implemented in SymPy. If they use the CSymPy objects instead of > the SymPy objects, they will be faster. > > - I would focus more on making CSymPy interoperate with SymPy and less > on reimplementing things that are in SymPy in CSymPy. Once there is > interoperation, we can see what is still slow, and then (and only > then) implement it in C++. The interoperability is important and that is mostly the job of the Python wrappers. The C++ API underneath can change a lot, i.e. if we figure out a faster way to represent things, we'll switch. I will spend lots of time making the interoperability work. This summer though, I want to think hard about raw speed and making it work for PyDy. > >> >>> >>> - What are the things that should definitely not go in CSymPy? >> >> Things that don't need to be fast. Things like limits. Also things >> that are in SymPy, where CSymPy can >> be used as a drop in replacement for the engine: PyDy, some stuff in >> physics, and so on. There is no need >> to rewrite PyDy in C++. Also most user's code would stay in Python. >> They can just optionally change >> to CSymPy for some intensive calculation, then finish the thing with SymPy. >> >>> >>> - How will CSymPy be architectured to allow things to happen in CSymPy >>> when they can but fallback to SymPy when they cannot. >> >> Currently you can simply mix and match SymPy and CSymPy expressions. >> So you simply >> convert an expression to SymPy to do some advanced manipulation, and >> convert to CSymPy >> to do some fast manipulation. I am open to suggestions how to improve this. >> >>> >>> My main concern here is that CSymPy has not clear separation from >>> SymPy, and as a result it will end up growing larger and larger, until >>> it becomes an independent CAS (which is fine if that's the goal, but >>> my understanding was that it was supposed to be just a small fast >>> core). >> >> The goals are written above. I am myself concentrating on speed, >> that's what I really >> want to nail down. And then enough features so that it's useful for >> all the people who >> found SymPy slow. However, let's say somebody comes and reimplements the >> Gruntz >> algorithm in CSymPy. Should we reject such a PR? My answer is that if >> the code is nice, >> maintainable, much faster than SymPy and has the same or similar >> features, I am ok >> with merging it. If the code is a mess, then not. But as I said, I am >> spending my own >> time on things which people need, and faster limits don't seem to be it. >> >>> >>> In particular, if there is some feature of SymPy functions, how will >>> CSymPy be architectured so that it can take advantage of it without >>> having to completely reimplement that function in C++? >> >> You can convert any expression back and forth, so you keep it in SymPy >> if you want to have some particular feature. See also the conversation >> and specific examples here: >> >> https://github.com/sympy/csympy/issues/153 >> >>> >>> For instance, a current goal of CSymPy is to implement trig functions. >>> But this can be quite complicated if you consider all the different >>> things you can do with trig functions. Without even thinking about >>> trig simplification, there are complicated evaluation issues (e.g., >>> consider sin(pi/7).rewrite(sqrt) in SymPy). It would be a shame to >>> reimplement all this logic twice, especially it is not needed for >>> performance. >> >> Agreed. On the other hand, we really need very fast trig functions. >> The functionality >> that we need is simplifications like sin(2*pi) -> 0, differentiation >> and series expansion. > > Why do you need to implement those in C++? If the expression > manipulation is fast, then won't it be fine to have the actual > formulas/algorithms in SymPy? Maybe, that depends on this issue: https://github.com/sympy/csympy/issues/153 The problem is that once you start thinking about Python+C++ at once and performance, things get complex quickly. It's much easier to think in terms of C++ only and how to write the fastest possible algorithm (that is hard enough!). This sets the bar. Then one should try to see if it is possible to match this with Python. Not the other way round, because you need to set the bar first. On Tue, Apr 22, 2014 at 6:16 PM, Aaron Meurer <[email protected]> wrote: > On Tue, Apr 22, 2014 at 4:58 PM, Joachim Durchholz <[email protected]> wrote: >> Hm. >> >> One: >>> * Extension/complement to SymPy >> >> Two: >> >> >>> That lowers the barrier >>> of entry significantly, compared to a big mix of C++, Cython and >>> Python, makes it easier to make things fast >>> (you don't need to worry about Python at all). >> >> >> That's not an extension nor a complement, it's a replacement. >> >> >>> The Python (and other >>> >>> languages) wrappers should be just a thin >>> wrappers around the C++ core (=just better syntax). >> >> >> I.e. replace the engine if not the API. >> >> Not that I'm judging. I'm just pointing out perceived inconsistencies. >> >> The more SymPy itself turns into a set of simplification rulese, the less >> significance this will have in the end. >> >> >>>> - How will CSymPy be architectured to allow things to happen in CSymPy >>>> when they can but fallback to SymPy when they cannot. >>> >>> >>> Currently you can simply mix and match SymPy and CSymPy expressions. >>> So you simply >>> convert an expression to SymPy to do some advanced manipulation, and >>> convert to CSymPy >>> to do some fast manipulation. I am open to suggestions how to improve >>> this. >> >> >> The alternative would be to have a data structure that can be manipulated >> from both the C++ and the Python side, but that's going to be unnatural for >> at least one of the sides. >> >> Note that the data structures can become large-ish, and if the >> simplification becomes complicated there may be a lot of back-and-forth. >> It's possible that mixed execution will be slow for some algorithms or use >> cases for that reason. >> >> I do not think that this can be determined in advance, it's something to >> keep an eye out for during benchmarks. >> >> >>>> My main concern here is that CSymPy has not clear separation from >>>> SymPy, and as a result it will end up growing larger and larger, until >>>> it becomes an independent CAS (which is fine if that's the goal, but >>>> my understanding was that it was supposed to be just a small fast >>>> core). >>> >>> >>> The goals are written above. I am myself concentrating on speed, >>> that's what I really >>> want to nail down. >> >> >> I'm somewhat sceptical about this. >> A conversion to C++ will give a linear improvement. >> Better algorithms can improve the big-Oh class. >> Unless algorithmic improvements have been exhausted, this would be premature >> optimization. (Are algorithmic improvements exhausted yet?) > > That is true. I usually prefer to use faster algorithms. > > But to be the devil's advocate, there are two issues with this line of > thinking: > > - Big O is basically useless. Consider the extreme effectiveness of > SAT solvers (which solve an NP-complete problem), or the difference > between the simplex and Khachiyan's algorithm, or AKS vs. more > efficient deterministic primality testing algorithms. Asymptotic > complexity is all fine, but at the end of the day, you don't care how > fast your algorithm is for increasingly large inputs, you care how > fast it is for *your* input. > > - Faster algorithms have a complexity cost. You can get closer to the > metal in Python by being very careful about your use of data > structures, and avoiding things that are slow in Python (like function > calls), but the cost is high because you end up with code that is not > only harder to read and maintain, but harder to keep in its fast > state, because someone else who doesn't know all the little tricks > might come along and change things in a way that seems equivalent but > makes things slower. Precisely. That's why it's good to stick to just one language, C++, and nail the speed. That sets the bar. Then one can try to match the speed with Python, which sometimes is possible. > > With that being said, C++ is itself enough of a complexity cost that > doing this outweighs using it in many (most?) cases. (That's not just > a knock on C++; using any second language to Python brings a cost, > both because there are now two languages to think about, and because > of interoperability questions) Yes, again the reason why to stick to just one language and only do thin wrappers that allow to use it as a blackbox. On Tue, Apr 22, 2014 at 6:23 PM, Brian Granger <[email protected]> wrote: > I too feel that csympy should implement the absolute minimum possible > for it to be fast. All actual mathematical algorithms should remain in > sympy. Trying to pull lots of algorithms into csympy will fail - not > that many people want to write complex mathematical algorithms in C++. I understand your worries. Both yours and Aaron's and other people in the SymPy community. Let's be frank about this, here they are: * keep things simple, maintainable, in Python, not introducing other languages * especially not C++, which is notorious for being hilariously complex. That's why we use Python, because it's better. * danger of splitting a community by introducing a separate CAS (=wasting our resources, attention, developers, and so on), and we've been on this path before, e.g. with with sympycore, or even Sage --- any improvement to those codes does not benefit SymPy. That does not mean that there is anything bad with those, but one has to try to focus, in our case SymPy, and just get things done, make it a useful library so that people can use it and benefit from it. * that we should concentrate on features, and sacrifice some reasonable speed (maybe 100x or so). * We should not be reimplementing SymPy in C++, that would be a big waste. I thought about all these very deeply. And I can promise that I will do my absolute best to make this work, with the SymPy community. I might not have the best answer to all the worries, but I know that we need to set the bar: * So implement trig functions in C++ * Benchmark against Mathematica, Sage, GiNaC * Make sure it is as fast or faster * See if we can match it with Python * If so, great! If not, what is the penalty? 2x? 10x? 100x? If we don't have this bar, then we miss on speed. And if we miss on speed then there will always be reason why people would use other software, because of speed. If, on the other hand, csympy is as fast as state of the art, then it fixes the problem. And it will be integrated with SymPy well, people can keep using SymPy. Ondrej -- You received this message because you are subscribed to the Google Groups "sympy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/sympy. To view this discussion on the web visit https://groups.google.com/d/msgid/sympy/CADDwiVD6eeuO7E0qs_-GvngYENoM-qmUx_8rjfqmatL-Ywdhkw%40mail.gmail.com. For more options, visit https://groups.google.com/d/optout.
