Hello SciPy Team,
Recently I came across a blog article titled "Julia: 17X Faster than Python's
SciPy", which presents performance benchmarks comparing Julia and Python for
scientific computing tasks. The author reports significant performance gains
for Julia over Python's SciPy library, using scipy.optimize.curve_fit as the
main example.
The article can be found here:
https://www.matecdev.com/posts/julia-17x-faster-vs-python-scipy.html
Coincidentally, just a few days ago I discovered LowLevelCallable functions as
a way to speed up quadrature. I speculated that the overhead of Python
callbacks might explain the performance difference described in that article.
Upon further investigation, however, I realized that LowLevelCallable
functions are not currently accepted by the optimizers behind
scipy.optimize.curve_fit.
While searching for more information on this topic, I came across the GitHub
issue "Passing LowLevelCallable to Optimizers" (#11783), in which Mr. Gommers
from SciPy expressed reservations about accepting LowLevelCallable functions
as inputs to minimization problems in SciPy. He noted that there is not much
overhead in calling Python functions, so the potential benefits are likely to
be very small.
I conducted some experiments with scipy.integrate.quad and observed speed-ups
of roughly 10-20x when using LowLevelCallable functions instead of Python
callbacks for quadrature. These findings suggest that LowLevelCallable
functions can offer substantial performance improvements in certain scenarios.
This leads me to wonder: are the benefits of LowLevelCallable specific to the
scipy.integrate subpackage, or could it be beneficial for the scipy.optimize
subpackage as well?
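One very rough way to probe this, without touching SciPy internals, would be
to compare curve_fit driven by a plain Python model against the same model
compiled with Numba's @njit (the model, data and starting values below are
just an illustration, not a careful benchmark). curve_fit cannot take a
LowLevelCallable, so this does not remove the Python call overhead itself,
but the size of the gap would at least hint at how much time goes into the
model evaluation:

import numpy as np
import numba as nb
from scipy.optimize import curve_fit
import timeit

def model_py(x, a, b):
    return a * np.exp(-b * x)

model_jit = nb.njit(model_py)  # same model body, compiled with Numba

rng = np.random.default_rng(0)
xdata = np.linspace(0.1, 4.0, 50)
ydata = model_py(xdata, 2.5, 1.3) + 0.05 * rng.standard_normal(xdata.size)
model_jit(xdata, 1.0, 1.0)  # trigger JIT compilation before timing

t_py = min(timeit.repeat(lambda: curve_fit(model_py, xdata, ydata, p0=(1.0, 1.0)),
                         repeat=5, number=100))
t_jit = min(timeit.repeat(lambda: curve_fit(model_jit, xdata, ydata, p0=(1.0, 1.0)),
                          repeat=5, number=100))
print(f'plain Python model:  {t_py:.3f} s for 100 fits')
print(f'njit-compiled model: {t_jit:.3f} s for 100 fits')

If the jitted model is noticeably faster end-to-end, a substantial share of
curve_fit's runtime goes into the model evaluation, which is exactly the part
a low-level callback interface would target.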
I wanted to share these findings and seek your expert opinion on the matter.
Best regards,
Oyibo
Below is the code for the performance check. I used Numba to convert the
Python function into a LowLevelCallable.
# ****************************************************************************
import numpy as np
import numba as nb
import numba.types as nbt
import scipy.integrate as si
from scipy import LowLevelCallable
import timeit
# ****************************************************************************
# Case 1: Integrate with Python function
def py_integrand(t):
    return np.exp(-t) / t**2
result = si.quad(py_integrand, 1, np.inf)
print('Python:')
print('Integral Result:', result[0])
# Integral Result: 0.14849550677592208
# ****************************************************************************
# Case 2: Integrate with LowLevelCallable (use Numba to convert)
llc_integrand = nb.cfunc("float64(float64)")(py_integrand).ctypes
result = si.quad(llc_integrand, 1, np.inf)
print('\nLowLevelCallable:')
print('Integral Result:', result[0])
# Integral Result: 0.14849550677592208
# ****************************************************************************
# Case 3: Integrate with LowLevelCallable for multiple args
# (A little bit more code but more versatile and faster)
def LowLevelCallable_integrand(integrand_func):
    # Decorator: njit-compile the integrand and wrap it as a LowLevelCallable
    jitted_function = nb.njit(integrand_func)

    # C signature expected by quad: double func(int n, double *xx)
    @nb.cfunc(nbt.float64(nbt.intc, nbt.CPointer(nbt.float64)))
    def wrapped(n, xx):
        values = nb.carray(xx, n)  # view of x followed by any extra args
        return jitted_function(values)
    return LowLevelCallable(wrapped.ctypes)
@LowLevelCallable_integrand
def llc_integrand_gen(args):
    t = args[0]
    return np.exp(-t) / t**2
result = si.quad(llc_integrand_gen, 1, np.inf)
print('\nLowLevelCallable (generalized, can be used for additional args):')
print('Integral Result:', result[0])
# Integral Result: 0.14849550677592208
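# ****************************************************************************
# Illustrative sketch of how the same wrapper could handle an extra parameter
# (the parameter name and value here are arbitrary, not part of the benchmark):
# quad passes the entries of args after the integration variable, so xx holds
# [t, a] and the decorated function reads them positionally.
@LowLevelCallable_integrand
def llc_integrand_param(args):
    t, a = args[0], args[1]
    return np.exp(-a * t) / t**2
result = si.quad(llc_integrand_param, 1, np.inf, args=(1.0,))
print('\nLowLevelCallable with args=(1.0,):')
print('Integral Result:', result[0])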
# ****************************************************************************
# Measuring execution time
num_repeats = 10_000
num_runs = 10
time_py = np.mean(timeit.repeat(lambda: si.quad(py_integrand, 1, np.inf),
                                repeat=num_repeats, number=num_runs)) * 1000
time_llc = np.mean(timeit.repeat(lambda: si.quad(llc_integrand, 1, np.inf),
                                 repeat=num_repeats, number=num_runs)) * 1000
time_llc_gen = np.mean(timeit.repeat(lambda: si.quad(llc_integrand_gen, 1, np.inf),
                                     repeat=num_repeats, number=num_runs)) * 1000
print(f'\nExecution Time (mean of {num_runs} runs, {num_repeats} repeats):')
print(f'Python Callable: {time_py:.3f} ms')
print(f'LowLevelCallable: {time_llc:.3f} ms')
print(f'LowLevelCallable_gen: {time_llc_gen:.3f} ms')
print(f'time_py/time_llc: {time_py/time_llc:.1f}x')
print(f'time_py/time_llc_gen: {time_py/time_llc_gen:.1f}x')
# Execution Time (mean of 10 runs, 10000 repeats):
# Python Callable: 1.218 ms
# LowLevelCallable: 0.114 ms
# LowLevelCallable_gen: 0.065 ms
# time_py/time_llc: 10.6x
# time_py/time_llc_gen: 18.7x
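# ****************************************************************************
# For context (an extra check, not part of the timings above): the per-call
# overhead multiplies with the number of integrand evaluations QUADPACK makes
# per quad() call, which full_output exposes as 'neval'.
info = si.quad(py_integrand, 1, np.inf, full_output=True)[2]
print(f"\nIntegrand evaluations per quad call: {info['neval']}")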