Alright. Again, thank you very much for taking the time to answer my beginner questions! Still a lot to learn.

Have a good day!

On 10.08.23 12:27, Stefano Zampini wrote:
Then just do the multiplications you need. My proposal was for the example function you were showing.

On Thu, Aug 10, 2023, 12:25 Niclas Götting <ngoett...@itp.uni-bremen.de> wrote:

    You are absolutely right for this specific case (I get about
    2400it/s instead of 2100it/s). However, the single square function
    will be replaced by a series of Gaussian pulses in the future,
    which will never be exactly zero. Maybe one could do an
    approximation and skip the second mult if the Gaussians are close
    to zero.
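
    For what it's worth, a minimal sketch of that thresholding idea,
    assuming a hypothetical envelope function gaussian_envelope(t) and
    a preallocated work vector tmp_vec (both names are illustrative,
    not from the actual code):

    def rhsfunc6(ts, t, u, F):
        l.mult(u, F)
        scale = gaussian_envelope(t)  # hypothetical smooth pulse envelope
        # skip the second mult while the pulse is negligible
        if abs(scale) > 1e-12:
            pump.mult(u, tmp_vec)
            F.axpy(scale, tmp_vec)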

    On 10.08.23 12:16, Stefano Zampini wrote:
    If you do the mult of "pump" inside an if, it should be faster.

    On Thu, Aug 10, 2023, 12:12 Niclas Götting
    <ngoett...@itp.uni-bremen.de> wrote:

        If I understood you right, this should be the resulting RHS:

        def rhsfunc5(ts, t, u, F):
            l.mult(u, F)
            pump.mult(u, tmp_vec)
            scale = 0.5 * (5 < t < 10)
            F.axpy(scale, tmp_vec)
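
        Here tmp_vec is a work vector created once outside the RHS; a
        minimal sketch of that setup (not shown in the original
        snippet) would be:

        tmp_vec = l.createVecRight()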

        It is a little bit slower than option 3, but at about
        2100it/s it is consistently ~10% faster than option 4.

        Thank you very much for the suggestion!

        On 10.08.23 11:47, Stefano Zampini wrote:
        I would use option 3. Keep a work vector and do a vector
        summation instead of the repeated multiplications by scale
        and 1/scale.

        I agree with you the docs are a little misleading here.

        On Thu, Aug 10, 2023, 11:40 Niclas Götting
        <ngoett...@itp.uni-bremen.de> wrote:

            Thank you both for the very quick answer!

            So far, I have compiled PETSc with debugging turned on,
            but I think it should still be faster than standard scipy
            in both cases. Actually, Stefano's answer has got me very
            far already; now I only define the RHS of the ODE and no
            Jacobian (I wonder why the documentation suggests
            otherwise, though). I had the following four attempts at
            implementing the RHS:

             1. def rhsfunc1(ts, t, u, F):
                    scale = 0.5 * (5 < t < 10)
                    (l + scale * pump).mult(u, F)
             2. def rhsfunc2(ts, t, u, F):
                    l.mult(u, F)
                    scale = 0.5 * (5 < t < 10)
                    (scale * pump).multAdd(u, F, F)
             3. def rhsfunc3(ts, t, u, F):
                    l.mult(u, F)
                    scale = 0.5 * (5 < t < 10)
                    if scale != 0:
                        pump.scale(scale)
                        pump.multAdd(u, F, F)
                        pump.scale(1/scale)
             4. def rhsfunc4(ts, t, u, F):
                    tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
                    l.mult(u, F)
                    scale = 0.5 * (5 < t < 10)
                    tmp_pump.axpy(scale, pump,
                                  structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
                    tmp_pump.multAdd(u, F, F)

            They all yield the same results, but at 50it/s, 800it/s,
            2300it/s and 1900it/s, respectively, which is a huge
            performance boost (almost 7 times as fast as scipy, with
            PETSc debugging still turned on). As the scale function
            will most likely be a Gaussian in the future, I think that
            option 3 will become numerically unstable (the matrix is
            repeatedly scaled by a value that can get arbitrarily
            close to zero and then by its reciprocal) and I'll have to
            go with option 4, which is already faster than I expected.
            If you think it is possible to speed up the RHS
            calculation even more, I'd be happy to hear your
            suggestions; the -log_view output is attached to this
            message.

            One last point: if I didn't misunderstand the
            documentation at
            https://petsc.org/release/manual/ts/#special-cases,
            should it maybe be changed?

            Best regards
            Niclas

            On 09.08.23 17:51, Stefano Zampini wrote:
            TSRK is an explicit solver. Unless you are changing the
            ts type from the command line, the explicit Jacobian
            should not be needed. On top of Barry's suggestion, I
            would suggest you write the explicit RHS instead of
            assembling a throwaway matrix every time that function
            needs to be sampled.
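
            As a rough illustration only (names are assumed, not taken
            from the exchange), such an explicit RHS in petsc4py could
            look like this, with tmp being a work vector created once,
            e.g. via l.createVecRight():

            def rhsfunc(ts, t, u, F):
                # F = l*u + f(t)*pump*u, without assembling a matrix
                l.mult(u, F)
                pump.mult(u, tmp)
                F.axpy(0.5 * (5 < t < 10), tmp)

            ts.setRHSFunction(rhsfunc)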

            On Wed, Aug 9, 2023, 17:09 Niclas Götting
            <ngoett...@itp.uni-bremen.de> wrote:

                Hi all,

                 I'm currently trying to convert a quantum simulation
                 from scipy to PETSc. The problem itself is extremely
                 simple and of the form
                 \dot{u}(t) = (A_const + f(t)*B_const)*u(t), where
                 f(t) in this simple test case is a square function.
                 The matrices A_const and B_const are extremely sparse,
                 and therefore I thought the problem would be well
                 suited for PETSc. Currently, I solve the ODE with the
                 following procedure in scipy (I can provide the
                 necessary data files if needed, but they are just
                 some trace-preserving, very sparse matrices):

                import numpy as np
                import scipy.sparse
                import scipy.integrate

                from tqdm import tqdm


                l = np.load("../liouvillian.npy")
                pump = np.load("../pump_operator.npy")
                state = np.load("../initial_state.npy")

                l = scipy.sparse.csr_array(l)
                pump = scipy.sparse.csr_array(pump)

                def f(t, y, *args):
                     return (l + 0.5 * (5 < t < 10) * pump) @ y
                     #return l @ y # Uncomment for f(t) = 0

                dt = 0.1
                NUM_STEPS = 200
                res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
                 solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
                times = []
                for i in tqdm(range(NUM_STEPS)):
                     res[i, :] = solver.integrate(solver.t + dt)
                     times.append(solver.t)

                 Here, A_const = l, B_const = pump and
                 f(t) = 0.5 * (5 < t < 10). tqdm reports about
                 330it/s on my machine. When converting the code to
                 PETSc, I came to the following result (according to
                 the chapter
                 https://petsc.org/main/manual/ts/#special-cases):

                import sys
                import petsc4py
                petsc4py.init(args=sys.argv)
                import numpy as np
                import scipy.sparse

                from tqdm import tqdm
                from petsc4py import PETSc

                comm = PETSc.COMM_WORLD


                 def mat_to_real(arr):
                      return np.block([[arr.real, -arr.imag],
                                       [arr.imag, arr.real]]).astype(np.float64)

                 def mat_to_petsc_aij(arr):
                      arr_sc_sp = scipy.sparse.csr_array(arr)
                      mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
                      rstart, rend = mat.getOwnershipRange()
                      print(rstart, rend)
                      print(arr.shape[0])
                      print(mat.sizes)
                      I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
                      J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
                      V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]

                      print(I.shape, J.shape, V.shape)
                      mat.setValuesCSR(I, J, V)
                      mat.assemble()
                      return mat


                l = np.load("../liouvillian.npy")
                l = mat_to_real(l)
                pump = np.load("../pump_operator.npy")
                pump = mat_to_real(pump)
                state = np.load("../initial_state.npy")
                 state = np.hstack([state.real, state.imag]).astype(np.float64)

                l = mat_to_petsc_aij(l)
                pump = mat_to_petsc_aij(pump)


                jac = l.duplicate()
                for i in range(8192):
                     jac.setValue(i, i, 0)
                jac.assemble()
                jac += l

                vec = l.createVecRight()
                 vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
                vec.assemble()


                dt = 0.1

                ts = PETSc.TS().create(comm=comm)
                ts.setFromOptions()
                ts.setProblemType(ts.ProblemType.LINEAR)
                ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
                ts.setType(ts.Type.RK)
                ts.setRKType(ts.RKType.RK3BS)
                ts.setTime(0)
                print("KSP:", ts.getKSP().getType())
                print("KSP PC:",ts.getKSP().getPC().getType())
                print("SNES :", ts.getSNES().getType())

                 def jacobian(ts, t, u, Amat, Pmat):
                      Amat.zeroEntries()
                      Amat.aypx(1, l,
                                structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
                      Amat.axpy(0.5 * (5 < t < 10), pump,
                                structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)

                 ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
                 # ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0
                 ts.setRHSJacobian(jacobian, jac)

                NUM_STEPS = 200
                res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
                times = []
                rstart, rend = vec.getOwnershipRange()
                for i in tqdm(range(NUM_STEPS)):
                     time = ts.getTime()
                     ts.setMaxTime(time + dt)
                     ts.solve(vec)
                     res[i, rstart:rend] = vec.getArray()[:]
                     times.append(time)

                 I decomposed the complex ODE into a larger real ODE
                 so that I can more easily switch to GPU computation
                 later on. Now, the solutions of both scripts are
                 practically identical, but PETSc runs about 3 times
                 slower, at 120it/s, on my machine. I don't use MPI
                 for PETSc yet.

                 I strongly suspect that the problem lies within the
                 Jacobian definition, as PETSc is about 3 times
                 *faster* than scipy with f(t) = 0 and therefore a
                 constant Jacobian.

                Thank you in advance.

                All the best,
                Niclas
