Hi Oscar,
Thank you very much for your response. I appreciate it. There is a lot to
think about.
Regarding the example, this is a tough test! It shows that there is a bug
in the x86-64 Rust backend. Interestingly, the Python backend and the Rust
one on ARM64 (macOS) both give the correct answer:
from sympy import symbols
from symjit import compile_func

x = symbols('x')
e = x**2 + x
for _ in range(10):
    e = e**2 + e
ed = e.diff(x)
f = compile_func([x], [ed], backend='python')
In [23]: %time f(0.001)
CPU times: user 94 μs, sys: 9 μs, total: 103 μs
Wall time: 105 μs
Out[23]: array([1.02233423])
In the short term, my goal is to work on correctness. I appreciate this
example and similar ones that stress test the code generator.
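Incidentally, the correct value for this benchmark can be cross-checked
without any symbolic machinery at all. Since e_{n+1} = e_n**2 + e_n, the
chain rule gives ed_{n+1} = (2*e_n + 1) * ed_n, so a few lines of plain
float arithmetic reproduce the number Oscar obtained with exact rationals
(a sketch independent of any of the compilers discussed here):

```python
# Reference value without any symbolic machinery: for e_{n+1} = e_n**2 + e_n
# the chain rule gives ed_{n+1} = (2*e_n + 1) * ed_n, so forward-mode float
# arithmetic reproduces the exact-rational result to double precision.
x = 0.001
e = x**2 + x   # e_0
ed = 2*x + 1   # d(e_0)/dx
for _ in range(10):
    ed = (2*e + 1) * ed  # chain rule, using the old e
    e = e**2 + e

print(ed)  # ~1.02233422836607
```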
However, the bigger question is where symjit fits in the Python/sympy
ecosystem. It is lightweight because sympy expressions act as an
intermediate representation. LLVM and other compilers do a lot of work to
recreate the control flow graph (first as a tree and later on as a DAG)
from a linear sequence of instructions. Symjit doesn't do this because it
starts from a tree representation. Of course, as you mentioned, the
downside is that generating sympy expressions can be computationally
expensive. I don't know what the right abstraction is. Symjit already
converts sympy expressions into its internal tree structure (with nodes
like Unary and Binary). We could expose this structure to the users.
Moreover, it is possible to augment the tree structure by adding loops,
aggregate functions, and other functional constructs to allow for more
complex programs. However, this interface would be the key piece and
must be designed carefully.
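To make the idea concrete, here is a rough sketch of what such a tree IR
could look like (only the Unary/Binary names come from the discussion
above; everything else is hypothetical and not symjit's actual internals):

```python
# Hypothetical sketch of a tree IR with Unary/Binary nodes; only the node
# names come from the thread, the rest is illustrative, not symjit's code.
from dataclasses import dataclass
from typing import Union
import math

@dataclass
class Var:
    name: str

@dataclass
class Const:
    value: float

@dataclass
class Unary:
    op: str          # e.g. 'neg', 'sin'
    arg: 'Node'

@dataclass
class Binary:
    op: str          # e.g. 'add', 'mul', 'pow'
    left: 'Node'
    right: 'Node'

Node = Union[Var, Const, Unary, Binary]

def evaluate(node, env):
    """Tree-walking evaluator; a code generator walks the same shape."""
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Unary):
        v = evaluate(node.arg, env)
        return {'neg': lambda a: -a, 'sin': math.sin}[node.op](v)
    l = evaluate(node.left, env)
    r = evaluate(node.right, env)
    return {'add': lambda a, b: a + b,
            'mul': lambda a, b: a * b,
            'pow': lambda a, b: a ** b}[node.op](l, r)

# x**2 + x as a tree, independent of sympy:
x = Var('x')
tree = Binary('add', Binary('pow', x, Const(2.0)), x)
print(evaluate(tree, {'x': 3.0}))  # 12.0
```

Loops and aggregates would then just be more node kinds walked by the
same generator.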
Register allocation is a critical part of the code generator. It is
tricky because it depends on the exact ABI (which registers are used to
pass arguments, which ones are caller-saved and which ones are
callee-saved, the stack-frame layout, and so on), which differs across
architectures and (of course) between Windows and Linux even on the same
processor architecture. At least macOS follows the other *nix systems.
Symjit uses a simple algorithm for register allocation: it generates
code like a stack machine but then shadows some stack slots with scratch
registers when possible. LLVM uses a much more elaborate algorithm to
allocate registers, but for simple use cases like this one the results
are not that different.
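As a toy illustration of that scheme (hypothetical pseudo-instructions,
not symjit's actual output): generate stack-machine code first, then let
the shallow stack slots live in scratch registers while deeper slots
spill to memory:

```python
# Toy sketch of stack-machine code generation with register shadowing:
# the lowest stack slots live in scratch registers, deeper ones spill to
# memory. Purely illustrative; not symjit's actual implementation.
NUM_SCRATCH = 2  # pretend only two scratch registers are available

def loc(depth):
    """Map a stack-slot index to a scratch register or a spill slot."""
    return f'r{depth}' if depth < NUM_SCRATCH else f'spill{depth - NUM_SCRATCH}'

def codegen(rpn):
    """Compile an RPN token list, e.g. ['x', 'x', '*', 'x', '+'] for x*x + x."""
    code, depth = [], 0
    for tok in rpn:
        if tok in ('+', '*'):
            depth -= 1  # two operands collapse into one result
            code.append((tok, loc(depth - 1), loc(depth)))  # dst op= src
        else:
            code.append(('load', loc(depth), tok))
            depth += 1
    return code

def run(code, env):
    """Tiny interpreter for the pseudo-instructions, to check the output."""
    vals = {}
    for op, dst, src in code:
        if op == 'load':
            vals[dst] = env[src] if src in env else float(src)
        elif op == '+':
            vals[dst] += vals[src]
        else:  # '*'
            vals[dst] *= vals[src]
    return vals['r0']  # the final result ends up in the bottom slot

prog = codegen(['x', 'x', '*', 'x', '+'])  # x*x + x
print(prog)                   # the shallow slots become r0/r1
print(run(prog, {'x': 3.0}))  # 12.0
```

Because the stack depth is known at each point, the slot-to-register
mapping is fixed ahead of time; there is no graph coloring involved.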
Thanks again,
Shahriar
On Sat, Apr 12, 2025 at 11:32 AM Isuru Fernando <[email protected]> wrote:
> Hey Oscar,
>
> SymEngine's lambdify can be used too. It uses numpy arrays to support
> broadcasting
> and other features of sympy.
>
> from symengine import *
> x = symbols('x')
> e = x**2 + x
> for _ in range(10):
>     e = e**2 + e
> ed = e.diff(x)
>
> f = lambdify([x], [ed])
> print(f(.0001))
>
> To avoid a bit of overhead,
>
> import numpy as np
> from symengine.lib.symengine_wrapper import LLVMDouble
> a = np.array([0.0001])
> b = a.copy()
> f = LLVMDouble([x], ed, cse=True, opt_level=3)
> f.unsafe_real(a, b)
>
> The last one gives me
>
> In [43]: %time f.unsafe_real(a, b)
> CPU times: user 25 µs, sys: 2 µs, total: 27 µs
> Wall time: 30.5 µs
>
> In [49]: %timeit f.unsafe_real(a, b)
> 470 ns ± 170 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>
> compared to protosym's
>
> In [45]: %time f(.0001)
> CPU times: user 0 ns, sys: 44 µs, total: 44 µs
> Wall time: 47.7 µs
>
> In [47]: %timeit f(.0001)
> 399 ns ± 131 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
>
> SymEngine is a bit slower than protosym due to using memoryviews, but
> we can add an interface to avoid those.
>
> Regards,
> Isuru
>
>
>
> On Sat, Apr 12, 2025 at 8:02 AM Oscar Benjamin <[email protected]>
> wrote:
>
>> On Fri, 11 Apr 2025 at 20:49, Shahriar Iravanian <[email protected]>
>> wrote:
>> >
>> > The latest version of symjit (1.5.0) has just been published. By now,
>> the Rust backend is stabilized and generates code on Linux/Darwin/Windows
>> and x86-64 and arm64 machines.
>>
>> Wow this is amazing. I have been thinking for a long time that exactly
>> this is needed for precisely the reasons you show in the README but I
>> was working under the assumption that it would need something like
>> llvmlite. In protosym I added a lambdify function based on llvmlite
>> but symjit is just as fast without the massive llvmlite dependency and
>> can even be pure Python so super-portable.
>>
>> I am amazed at how simple the symjit code seems to be for what it
>> achieves. Maybe these things are not as complicated as they seem if
>> you are someone who just knows how to write machine code...
>>
>> I have a prototype of how I wanted this to work for sympy in protosym:
>>
>> https://github.com/oscarbenjamin/protosym
>>
>> For comparison this is how protosym does it:
>>
>> # pip install protosym llvmlite
>> from protosym.simplecas import x, y, cos, sin, lambdify, Matrix, Expr
>>
>> e = x**2 + x
>> for _ in range(10):
>>     e = e**2 + e
>> ed = e.diff(x)
>>
>> f = lambdify([x], ed)
>> print(f(.0001))
>>
>> The expression here is converted to LLVM IR and compiled with
>> llvmlite. I'll show a simpler expression as a demonstration:
>>
>> In [9]: print((sin(x)**2 + cos(x)).to_llvm_ir([x]))
>>
>> ; ModuleID = "mod1"
>> target triple = "unknown-unknown-unknown"
>> target datalayout = ""
>>
>> declare double @llvm.pow.f64(double %Val1, double %Val2)
>> declare double @llvm.sin.f64(double %Val)
>> declare double @llvm.cos.f64(double %Val)
>>
>> define double @"jit_func1"(double %"x")
>> {
>> %".0" = call double @llvm.sin.f64(double %"x")
>> %".1" = call double @llvm.pow.f64(double %".0", double 0x4000000000000000)
>> %".2" = call double @llvm.cos.f64(double %"x")
>> %".3" = fadd double %".1", %".2"
>> ret double %".3"
>> }
>>
>> For the particular benchmark ed shown above protosym is faster both at
>> compilation and evaluation:
>>
>> This is protosym:
>>
>> In [3]: %time f(0.001)
>> CPU times: user 37 μs, sys: 4 μs, total: 41 μs
>> Wall time: 51 μs
>> Out[3]: 1.0223342283660657
>>
>> In [4]: %timeit f(0.001)
>> 657 ns ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops
>> each)
>>
>> This is the equivalent with symjit's compile_func:
>>
>> In [3]: %time f(0.001)
>> CPU times: user 306 μs, sys: 8 μs, total: 314 μs
>> Wall time: 257 μs
>> Out[3]: array([0.00100401])
>>
>> In [4]: %timeit f(0.001)
>> 25.1 μs ± 148 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
>>
>> I think the reason for the speed difference here is that protosym first
>> converts the expression into a forward graph much like sympy's cse
>> function which handles all the repeating subexpressions efficiently. I
>> think symjit generates the code recursively without handling the
>> repeating subexpressions. Also the number from symjit here is
>> incorrect as confirmed by using exact rational numbers:
>>
>> In [6]: ed.subs({x: Rational(0.001)}).evalf()
>> Out[6]: 1.02233422836607
>>
>> I'm not sure whether that difference is due to the forward graph being
>> more numerically accurate or whether it is a bug in symjit.
>>
>> > Symjit also has a new plain Python-based backend, which depends only on
>> the Python standard library and numpy (the numpy dependency is not strictly
>> necessary) but can generate and run machine code routines. Currently, the
>> Python backend is used as a backup in cases where the compiled Rust code is
>> unavailable. However, it already works very well with a minimal performance
>> drop compared to the Rust backend.
>>
>> Possibly the most useful thing to do is to publish this as two
>> separate packages like symjit and symjit-rust. Then anyone can pip
>> install symjit for any Python version without needing binaries on PyPI
>> or a Rust toolchain installed locally. The symjit-rust backend can be
>> an optional dependency that makes things faster if installed. It could
>> also be possible to have symjit depend conditionally on symjit-rust,
>> but only for platforms where a binary is provided on PyPI. That way no
>> one ever ends up doing pip install symjit and having it fail due to a
>> missing Rust toolchain.
>>
>> > I would like to have your suggestions and recommendations about the
>> next steps. I hope to add features that align with the maintainers' goals
>> for sympy. Some possibilities:
>> >
>> > 1. Expanding on the current focus on numerical computation and
>> numpy/scipy/matplotlib inter-operability, for example, adding other data
>> types besides double (single floats, complex numbers...).
>>
>> I don't know how difficult it is to do these things but generally yes
>> those would be useful.
>>
>> > 2. Fast polynomial evaluation, not only for floating point types, but
>> also over Z, Zp, and Q. The Python-only backend can be tightly coupled to
>> the polynomial subsystem. However, I don't know how useful having such a
>> fast polynomial evaluation function is, but, for example, it may be useful
>> in the combinatorial phase of the Zassenhaus algorithm. On the other hand,
>> it seems that sympy pivots toward using Flint for many such computations.
>>
>> Generally SymPy is going to use FLINT for these things but FLINT is
>> only an optional dependency. Some downstream users may prefer not to
>> use FLINT because it has a different licence (LGPL) whereas symjit has
>> the MIT license that pairs better with SymPy's BSD license.
>>
>> If symjit provided more general capability to just generate machine
>> code then I am sure that SymPy could make use of it for many of these
>> things. It would probably make more sense for the code that implements
>> those things to be in SymPy itself though with symjit as an optional
>> dependency that provides the code generation.
>>
>> > 3. A different area would be the Satisfiability module, where writing
>> fast SAT/SMT solver, with or without interfacing with Z3 or other solvers,
>> is possible.
>>
>> That would also be great but again I wonder if it makes sense to
>> include such specific things in symjit itself.
>>
>> I think that what you have made here in symjit is something that
>> people will want to use more broadly than SymPy. Maybe the most useful
>> thing would be for symjit to focus on the core code generation and
>> execution as a primitive that other libraries can build on. In other
>> words the ideal thing here would be that symjit provides a general
>> interface so that e.g. sympy's lambdify function could use symjit to
>> generate the code that it wants rather than symjit providing a
>> compile_func function directly.
>>
>> One downside of compile_func is precisely the fact that its input has
>> to be a sympy expression and just creating sympy expressions is slow.
>> This is something that we want to improve in sympy but realistically
>> the way to improve that is by using other types/representations like
>> symengine or protosym etc. I have some ideas for building new
>> representations of expressions so that many internal parts of sympy
>> could use those instead of the current slow expressions. Unfortunately
>> it is not going to be possible to make the user-facing sympy
>> expressions much faster unless at some point there is a significant
>> break in compatibility.
>>
>> The ideal thing here would be for symjit to provide an interface that
>> can be used to generate the code without needing a SymPy expression as
>> input. For example how would protosym use symjit without needing to
>> create a SymPy expression?
>>
>> I think that the reason that protosym is faster for the benchmark
>> shown above is because of the forward graph and so symjit could use
>> the same idea. What might be best though is to leave that sort of
>> thing for other libraries that would build on symjit and for symjit to
>> focus on being very good at providing comprehensive code generation
>> capabilities. I think that many Python libraries would want to build
>> on symjit to do all sorts of things because being able to generate
>> machine code directly like this is better in many ways than existing
>> approaches like llvmlite, numba, numexpr etc.
>>
>> The thing that is nice about generating the LLVM IR as compared to
>> generating machine code directly is that it gives you unlimited
>> registers but then LLVM figures out how to use a finite number of
>> registers on the backend. This makes the IR particularly suitable for
>> dumping in the forward graph without needing to think about different
>> architectures. Can symjit's machine code builders achieve the same
>> sort of thing? It's not clear to me exactly how the registers are
>> being managed.
>>
>> There is one important architecture for SymPy that symjit does not yet
>> generate code for which is wasm e.g. so you can run it in the browser:
>>
>> https://live.sympy.org/
>>
>> I don't know whether this sort of thing is even possible in wasm
>> though with its different memory safety rules.
>>
>> Does symjit work with PyPy or GraalPython or can it only be for CPython?
>>
>> --
>> Oscar
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "sympy" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion visit
>> https://groups.google.com/d/msgid/sympy/CAHVvXxSb4wXbtEw%2BYqGSP%3DA%3DM1tTa_3NfpKwJ2UZywUKMKh6rQ%40mail.gmail.com
>> .
>>