For speed up how about parallel processing? Looking at Ryzen processors
16 cores and 32 threads are quite affordable these days. Is parallel
python mature code these days?
On 4/18/22 11:45 PM, Ondřej Čertík wrote:
Hi,
I am CCing the symengine list also. Let me say a few words about the design and
goals of SymEngine. It's a result of many years of experience with SymPy and
implementing high performance code in general. The goal is the fastest possible
symbolic manipulation. So we do not want to introduce slowdowns just to get a
different API, rather we should figure out how to create an API that works for
everybody, while keeping the speed.
The design is, as Isuru wrote below, quite close to optimal: SymEngine does not do any
unnecessary work. The C++ classes just "pass through" the data, which must be
prepared ahead of time. In order to get good speed, it needs to store the data in data
structures like a hash table, which restricts what it can represent, in return for good
speed. It seems that covers 90% of use cases. If you need to represent things differently
than what SymEngine can do (such as `x+x` instead of `2*x`), you can always use SymPy, or
we can implement some slower representations, but I think that should not be the default.
One idea for improvement is to represent all SymEngine classes in some kind of
a description language, similar to what we use for LFortran here:
https://gitlab.com/lfortran/lfortran/-/blob/8761fcee2bbc4cb924bf65f5d576541e58bb8b08/src/libasr/ASR.asdl,
and generate all the C++ classes automatically from it. This has many
advantages (much less code to maintain, consistent code, easy to add a new
symbolic function/object, etc.) and almost no downsides. One can then also
experiment with changing how things are represented.
Regarding the internal representation itself --- if anyone knows a faster
design, please let me know!
* One idea is not to use reference counted pointers at all, and just represent
everything by value. That means if you take a term from an Add, it will get
copied. When Add goes out of scope, it deallocates all its terms. Whether this
approach is overall faster is unclear, but it might be quite a bit faster if
you don't need to reuse subexpressions too much.
* Another orthogonal idea is to represent all symbols like "x" in a separate
structure (symbol table) and only reference them from expressions like x^3 +
2x^2+sin(x)...; The issue becomes again with memory management, but there are probably a
few ways forward.
* SymEngine was design to operate fast on general expressions. One can of
course get faster speed if you know that you are dealing with a polynomial or
other more specific structure, but this is an orthogonal idea. It seems just
getting the general case working really fast is worth it.
If anyone is interested in discussing this more, I am happy to have a video
meeting where we can discuss more. Just let me know!
Ondrej
On Fri, Apr 15, 2022, at 7:53 PM, Isuru Fernando wrote:
Hi Oscar,
Here's a few things that are different in SymEngine than SymPy.
As an example it would be trivial in SymPy to make substitutions
involving large expressions and many replacements much faster with a
small redesign. The problem though is backwards compatibility: each
expression class can implement its own _eval_subs method so for
backwards compatibility the subs implementation must recurse in the
same slow way once throughout the whole tree for each replacement that
is to be made. (There are ways to work around this but the design
makes it harder than it should be.)
In SymEngine, classes do not have such a method and therefore cannot
extend for eg: subs function.
However the subs function itself (SubsVisitor class technically) can be
extended.
This is a small example of a more general problem. Using classes means
you have to use the interfaces of the class which means that efficient
algorithms needing something not provided by that public interface are
impossible. Even worse both SymPy and SymEngine allow the
*constructors* of those classes to do nontrivial work without
providing any good way to override that. This means that you can't
even represent a symbolic expression without allowing arbitrary code
execution: it's impossible to optimize higher-level algorithms if you
have so little control over execution from the outside.
No, SymEngine does not do anything in the constructor of the C++ class
itself. (For eg: class "Add"). We have different functions (For eg:
function "add")
that do complicated functionality to create the C++ class. A user is
allowed
to create the C++ class directly and we assume that the data structure
that the
user passed is internally consistent in Release mode and we check user
input in Debug mode. This allows the user to do low level optimizations
when they know that going through the complicated function is
unnecessary.
There are ways that this can be improved in both SymPy and SymEngine
but I also think that for many important operations the basic design
used by these is limiting in a big-O sense: the design constrains what
algorithms can be used. SymEngine is faster than SymPy in a brute
force sense by using C++ rather than Python but it would be possible
to make something both faster and more flexible if a different design
was used at a basic level.
I disagree about this in SymEngine. For eg: SymPy's extensibility
of classes prevents optimizations that SymEngine can do because symengine
doesn't have an equivalence of `_eval_subs` or similar.
Some of the optimizations in SymEngine like Aaron said leads to better
big-O complexities. i.e. the ratio between the two modules is not always a
constant as input size increases.
Isuru
On Thu, Apr 14, 2022 at 1:26 PM Oscar Benjamin
<[email protected]> wrote:
On Thu, 14 Apr 2022 at 18:16, Tirthankar Mazumder
<[email protected]> wrote:
On Thursday, April 14, 2022 at 10:04:28 PM UTC+5:30 Oscar wrote:
On Tue, 12 Apr 2022 at 19:26, Matthias Köppe <[email protected]> wrote:
- status and plans regarding SymEngine
I don't know about the status of SymEngine. I can't say that I can see
any significant work happening on the SymPy side to integrate
SymEngine any further with SymPy. My personal view is that for faster
symbolics a different approach is needed in general but SymEngine
seems to have the same design flaws as SymPy itself in that respect.
Hi, as someone who is relatively new to the SymPy + SymEngine project, and
wants to work on SymEngine as a part of their GSoC, I would appreciate it if
you could elaborate a bit on what design flaws you are referring to.
The basic design of the way that symbolic expressions are represented
in SymPy and SymEngine is through the use of classes to represent
different types of expression e.g. there is a class Pow and an
expression like x**y is represented by an instance of that class
created as Pow(x, y).
Using classes like this makes it hard to extend the system with new
types of symbolic expression e.g. with SymEngine you can't make a new
kind of symbolic expression without writing C++ code so if you are
using SymEngine from Python then it isn't extensible. With SymPy you
could at least make your own symbolic expression class in Python but
then it still doesn't work so well because there are so many places in
the codebase where particular types of expression are special-cased
meaning that any new symbolic expression type cannot be handled by
functions like solve, integrate etc.
The other problem with using classes is that it means that the basic
data structure that is used to represent expressions is fixed from the
outside. In SymPy and SymEngine that data structure is a tree and all
algorithms recurse through that tree. More efficient data structures
for some given operation can't be used because the implementation of
each symbolic expression type requires that you always use instances
of the class meaning that you always have to have a tree and always
have to recurse in the same way.
As an example it would be trivial in SymPy to make substitutions
involving large expressions and many replacements much faster with a
small redesign. The problem though is backwards compatibility: each
expression class can implement its own _eval_subs method so for
backwards compatibility the subs implementation must recurse in the
same slow way once throughout the whole tree for each replacement that
is to be made. (There are ways to work around this but the design
makes it harder than it should be.)
This is a small example of a more general problem. Using classes means
you have to use the interfaces of the class which means that efficient
algorithms needing something not provided by that public interface are
impossible. Even worse both SymPy and SymEngine allow the
*constructors* of those classes to do nontrivial work without
providing any good way to override that. This means that you can't
even represent a symbolic expression without allowing arbitrary code
execution: it's impossible to optimise higher-level algorithms if you
have so little control over execution from the outside.
It's very hard later to change the public interfaces of the expression
classes because as soon as any downstream code has subclassed your
classes you are bound by backwards compatibility. (Subclassing across
the boundaries of different software projects leads to strong
coupling.)
There are ways that this can be improved in both SymPy and SymEngine
but I also think that for many important operations the basic design
used by these is limiting in a big-O sense: the design constrains what
algorithms can be used. SymEngine is faster than SymPy in a brute
force sense by using C++ rather than Python but it would be possible
to make something both faster and more flexible if a different design
was used at a basic level.
--
Oscar
--
You received this message because you are subscribed to the Google Groups
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
[email protected]
<mailto:sympy%[email protected]>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/CAHVvXxTXJRY%3DtJ%3DPCtkNBgmd_nWmWd-wm9ax_w0zcPtXe_gynA%40mail.gmail.com.
--
You received this message because you are subscribed to the Google
Groups "sympy" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com
<https://groups.google.com/d/msgid/sympy/CA%2B01voM%2Byu65B%2Brtv5HQEVoxkvF95Gf27L8%3DH7N7ZbJQWUbcUg%40mail.gmail.com?utm_medium=email&utm_source=footer>.
--
You received this message because you are subscribed to the Google Groups
"sympy" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/sympy/9d80022f-c82d-70e5-9530-4f763c93cbf9%40gmail.com.