[petsc-users] Python PETSc performance vs scipy ZVODE

Niclas Götting Wed, 09 Aug 2023 08:10:07 -0700

Hi all,

I'm currently trying to convert a quantum simulation from scipy toPETSc. The problem itself is extremely simple and of the form \dot{u}(t)= (A_const + f(t)*B_const)*u(t), where f(t) in this simple test case isa square function. The matrices A_const and B_const are extremely sparseand therefore I thought, the problem will be well suited for PETSc.Currently, I solve the ODE with the following procedure in scipy (I canprovide the necessary data files, if needed, but they are just sometrace-preserving, very sparse matrices):


import numpy as np
import scipy.sparse
import scipy.integrate

from tqdm import tqdm


l = np.load("../liouvillian.npy")
pump = np.load("../pump_operator.npy")
state = np.load("../initial_state.npy")

l = scipy.sparse.csr_array(l)
pump = scipy.sparse.csr_array(pump)

def f(t, y, *args):
    return (l + 0.5 * (5 < t < 10) * pump) @ y
    #return l @ y # Uncomment for f(t) = 0

dt = 0.1
NUM_STEPS = 200
res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)

solver =scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)

times = []
for i in tqdm(range(NUM_STEPS)):
    res[i, :] = solver.integrate(solver.t + dt)
    times.append(solver.t)

Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reportsabout 330it/s on my machine. When converting the code to PETSc, I cameto the following result (according to the chapterhttps://petsc.org/main/manual/ts/#special-cases)


import sys
import petsc4py
petsc4py.init(args=sys.argv)
import numpy as np
import scipy.sparse

from tqdm import tqdm
from petsc4py import PETSc

comm = PETSc.COMM_WORLD


def mat_to_real(arr):

return np.block([[arr.real, -arr.imag], [arr.imag,arr.real]]).astype(np.float64)


def mat_to_petsc_aij(arr):
    arr_sc_sp = scipy.sparse.csr_array(arr)
    mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
    rstart, rend = mat.getOwnershipRange()
    print(rstart, rend)
    print(arr.shape[0])
    print(mat.sizes)
    I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]

J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] :arr_sc_sp.indptr[rend]]

    V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]

    print(I.shape, J.shape, V.shape)
    mat.setValuesCSR(I, J, V)
    mat.assemble()
    return mat


l = np.load("../liouvillian.npy")
l = mat_to_real(l)
pump = np.load("../pump_operator.npy")
pump = mat_to_real(pump)
state = np.load("../initial_state.npy")
state = np.hstack([state.real, state.imag]).astype(np.float64)

l = mat_to_petsc_aij(l)
pump = mat_to_petsc_aij(pump)


jac = l.duplicate()
for i in range(8192):
    jac.setValue(i, i, 0)
jac.assemble()
jac += l

vec = l.createVecRight()
vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
vec.assemble()


dt = 0.1

ts = PETSc.TS().create(comm=comm)
ts.setFromOptions()
ts.setProblemType(ts.ProblemType.LINEAR)
ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
ts.setType(ts.Type.RK)
ts.setRKType(ts.RKType.RK3BS)
ts.setTime(0)
print("KSP:", ts.getKSP().getType())
print("KSP PC:",ts.getKSP().getPC().getType())
print("SNES :", ts.getSNES().getType())

def jacobian(ts, t, u, Amat, Pmat):
    Amat.zeroEntries()
    Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)

Amat.axpy(0.5 * (5 < t < 10), pump,structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)


ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)

#ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l) #Uncomment for f(t) = 0

ts.setRHSJacobian(jacobian, jac)

NUM_STEPS = 200
res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
times = []
rstart, rend = vec.getOwnershipRange()
for i in tqdm(range(NUM_STEPS)):
    time = ts.getTime()
    ts.setMaxTime(time + dt)
    ts.solve(vec)
    res[i, rstart:rend] = vec.getArray()[:]
    times.append(time)

I decomposed the complex ODE into a larger real ODE, so that I caneasily switch maybe to GPU computation later on. Now, the solutions ofboth scripts are very much identical, but PETSc runs about 3 timesslower at 120it/s on my machine. I don't use MPI for PETSc yet.

I strongly suppose that the problem lies within the jacobian definition,as PETSc is about 3 times *faster* than scipy with f(t) = 0 andtherefore a constant jacobian.


Thank you in advance.

All the best,
Niclas

[petsc-users] Python PETSc performance vs scipy ZVODE

Reply via email to