Re: [petsc-users] Preconditioning of Liouvillian Superoperator
Thank you very much for the input! I've spent a lot of time compressing the linear system to a quarter of its size. This resulted in a form, though, which cannot be represented by Kronecker products. Maybe I should return to the original form. The new structure of the linear system is as follows:

    +---+---+--------+
    | A | 0 | -2*B^T |
    +---+---+--------+
    | 0 | D |  -C^T  |
    +---+---+--------+
    | B | C |    D   |
    +---+---+--------+

The matrices D and C are huge and should therefore account for most of the computational cost. Looking at the visual structure of C, I assume that it can perhaps be written as a sum of (skew-)symmetric matrices. A, B, and D are not symmetric at all, though.

Can the block substructure alone be helpful for solving the linear system in a smarter manner?

On 1/31/24 18:45, Barry Smith wrote:

For large problems, preconditioners have to take advantage of some underlying mathematical structure of the operator to perform well (that is, to require few iterations). Just black-boxing the system with simple preconditioners will not be effective. So one needs to look at the Liouvillian superoperator's structure to see what one can take advantage of.

I first noticed that it can be represented as a Kronecker product: A x I, or a combination of Kronecker products? In theory, one can take advantage of Kronecker structure to solve such systems much more efficiently than naively solving the assembled huge system directly. In addition, it may be possible to use the Kronecker structure of the operator to perform matrix-vector products with the operator much more efficiently than by first explicitly forming the huge matrix representation and doing the multiplies with that. I suggest some googling with linear solver, preconditioning, Kronecker product.

On Jan 31, 2024, at 6:51 AM, Niclas Götting wrote:

Hi all,

I've been trying for the last couple of days to solve a linear system using iterative methods. The system size itself scales exponentially (64^N) with the number of components, so I get sizes of

* (64, 64) for one component
* (4096, 4096) for two components
* (262144, 262144) for three components

I can solve the first two cases with direct solvers and don't run into any problems; however, the last case is the first nontrivial one, and it is too large for a direct solution, which is why I believe I need an iterative solver. As I know the solution for the first two cases, I tried to reproduce them using GMRES and failed on the second, because GMRES didn't converge and seems to have been going in the wrong direction (the vector to which it "tries" to converge is a totally different one than the correct solution). I went as far as -ksp_max_it 100, which takes orders of magnitude longer than the LU solution, and I'd intuitively think that GMRES should not take *that* much longer than LU.

Here is the information I have about this (4096, 4096) system:

* not symmetric (which is why I went for GMRES)
* not singular (SVD: condition number 1.427743623238e+06, 0 of 4096 singular values are (nearly) zero)
* solving without preconditioning does not converge (DIVERGED_ITS)
* solving with ILU and natural ordering fails due to zeros on the diagonal
* solving with ILU and RCM ordering does not converge (DIVERGED_ITS)

After some searching I also found [this](http://arxiv.org/abs/1504.06768) paper, which mentions the use of ILUTP, which I believe should be used in PETSc via hypre; that, however, threw a SEGV for me, and I'm not sure it is worth debugging at this point, because I might be missing something entirely different.

Does anybody have an idea how this system could be solved in finite time, such that the method also scales to the three-component problem?

Thank you all very much in advance!

Best regards
Niclas
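As a rough illustration of how the block substructure alone could be handed to PETSc, the sketch below builds a PCFIELDSPLIT preconditioner from index sets for the three blocks. It is only a sketch under assumptions: the compressed system is assumed to be assembled as a single AIJ matrix M with the blocks stored contiguously (the A block of size nA followed by the two D-sized blocks of size nD each), and b is the corresponding right-hand side; M, b, nA, and nD are placeholders, not objects from this thread.

    from petsc4py import PETSc

    def solve_block_system(M, b, nA, nD):
        comm = M.getComm()
        # Index sets for the three contiguous row/column blocks of sizes nA, nD, nD
        is0 = PETSc.IS().createStride(nA, first=0, step=1, comm=comm)
        is1 = PETSc.IS().createStride(nD, first=nA, step=1, comm=comm)
        is2 = PETSc.IS().createStride(nD, first=nA + nD, step=1, comm=comm)

        ksp = PETSc.KSP().create(comm=comm)
        ksp.setOperators(M)
        ksp.setType(PETSc.KSP.Type.FGMRES)       # flexible outer Krylov method
        pc = ksp.getPC()
        pc.setType(PETSc.PC.Type.FIELDSPLIT)
        pc.setFieldSplitIS(("b0", is0), ("b1", is1), ("b2", is2))
        pc.setFieldSplitType(PETSc.PC.CompositeType.MULTIPLICATIVE)  # block Gauss-Seidel sweep
        ksp.setFromOptions()  # inner solvers tunable via -fieldsplit_b0_pc_type etc.

        x = M.createVecRight()
        ksp.solve(b, x)
        return x

Whether this helps depends entirely on how well the diagonal blocks A and D can themselves be preconditioned; it does not exploit any Kronecker structure.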
[petsc-users] Preconditioning of Liouvillian Superoperator
Hi all,

I've been trying for the last couple of days to solve a linear system using iterative methods. The system size itself scales exponentially (64^N) with the number of components, so I get sizes of

* (64, 64) for one component
* (4096, 4096) for two components
* (262144, 262144) for three components

I can solve the first two cases with direct solvers and don't run into any problems; however, the last case is the first nontrivial one, and it is too large for a direct solution, which is why I believe I need an iterative solver. As I know the solution for the first two cases, I tried to reproduce them using GMRES and failed on the second, because GMRES didn't converge and seems to have been going in the wrong direction (the vector to which it "tries" to converge is a totally different one than the correct solution). I went as far as -ksp_max_it 100, which takes orders of magnitude longer than the LU solution, and I'd intuitively think that GMRES should not take *that* much longer than LU.

Here is the information I have about this (4096, 4096) system:

* not symmetric (which is why I went for GMRES)
* not singular (SVD: condition number 1.427743623238e+06, 0 of 4096 singular values are (nearly) zero)
* solving without preconditioning does not converge (DIVERGED_ITS)
* solving with ILU and natural ordering fails due to zeros on the diagonal
* solving with ILU and RCM ordering does not converge (DIVERGED_ITS)

After some searching I also found [this](http://arxiv.org/abs/1504.06768) paper, which mentions the use of ILUTP, which I believe should be used in PETSc via hypre; that, however, threw a SEGV for me, and I'm not sure it is worth debugging at this point, because I might be missing something entirely different.

Does anybody have an idea how this system could be solved in finite time, such that the method also scales to the three-component problem?

Thank you all very much in advance!

Best regards
Niclas
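One concrete way to attack the zero-pivot failure reported above is to combine the RCM ordering with a diagonal shift in the ILU factorization. The sketch below is not from the thread; A and b stand for the already assembled (4096, 4096) operator and right-hand side, and the option values are illustrative starting points rather than known-good settings.

    from petsc4py import PETSc

    opts = PETSc.Options()
    opts["ksp_type"] = "gmres"
    opts["ksp_gmres_restart"] = 200               # larger restart than the default 30
    opts["ksp_max_it"] = 1000
    opts["pc_type"] = "ilu"
    opts["pc_factor_levels"] = 2                  # ILU(2) instead of ILU(0)
    opts["pc_factor_mat_ordering_type"] = "rcm"
    opts["pc_factor_shift_type"] = "nonzero"      # shift (near-)zero pivots away from zero

    ksp = PETSc.KSP().create()
    ksp.setOperators(A)
    ksp.setFromOptions()   # adding -ksp_monitor_true_residual on the command line helps diagnose stagnation
    x = A.createVecRight()
    ksp.solve(b, x)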
[petsc-users] TS docs wrong URLs in Examples
Hi all,

I noticed that all links to the examples under https://petsc.org/release/manualpages/TS/TS/ point to the wrong URLs. Instead of src/ts/**/*, they point to src/sys/**/*, which does not seem to be right. This is definitely a minor issue, but I couldn't see an obvious fix via "Edit this page", so here is the e-mail.

Best regards
Niclas
Re: [petsc-users] Python PETSc performance vs scipy ZVODE
On the basis of your suggestion, I tried using vode for the real-valued problem with scipy, and I get roughly the same speed as before, which could have three reasons:

1. (Z)VODE is slower than plain RK (however, I must admit that I'm not quite sure what (Z)VODE does precisely).
2. Sparse matrix operations in scipy are slow. Some of them are even written in pure Python.
3. The RHS function in scipy must *return* a vector and therefore allocates new memory on each iteration.

Parallelizing the code is of course a goal of mine, but I believe this will only become relevant for larger systems, which I want to investigate in the near future.

Regarding the RHS Jacobian, I see why defining RHSFunction vs. RHSJacobian should be computationally equivalent, but I found it much easier to optimize the RHSFunction in this case, and I'm not quite sure why the documentation so strictly recommends the pattern of only providing a Jacobian and not an RHS function, if the two should be equivalent.

Lastly, I'm aware that another performance boost awaits once the debugging functionality is turned off, but for this simple test I just wanted to see whether there is *any* improvement in performance, and I was very much surprised by the factor of 7 with debugging still turned on.

Thank you all for the interesting input and have a nice day!

Niclas

On 15.08.23 00:37, Zhang, Hong wrote:

PETSc is not necessarily faster than scipy for your problem when executed in serial, but you get benefits when running in parallel. The PETSc code you wrote uses float64 while your scipy code uses complex128, so the comparison may not be fair.

In addition, using the RHS Jacobian does not necessarily make your PETSc code slower. In your case, the bottleneck is the matrix operations. For best performance, you should avoid adding two sparse matrices (especially with different sparsity patterns), which is very costly. So one MatMult + one MultAdd is the best option. MatAXPY with the same nonzero pattern would be a bit slower, but still faster than MatAXPY with a subset nonzero pattern, which is what you used in the Jacobian function.

I echo Barry's suggestion that debugging should be turned off before you do any performance study.

Hong (Mr.)

On Aug 10, 2023, at 4:40 AM, Niclas Götting wrote:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I tried the following four implementations of the RHS:

    # Option 1
    def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)

    # Option 2
    def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)

    # Option 3
    def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)

    # Option 4
    def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but run at 50 it/s, 800 it/s, 2300 it/s, and 1900 it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on).

As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message.

One last point: if I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:

TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest you write the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.

On Wed, Aug 9, 2023, 17:09 Niclas Götting wrote:

Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.
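To make point 3 above concrete, here is a minimal side-by-side sketch (not from the thread) of the two callback styles, using the l and pump operators discussed in this thread: the scipy right-hand side has to return a newly allocated array on every call, while the petsc4py RHSFunction writes into the preallocated vector F and a work vector that is created only once.

    # scipy style: l and pump are scipy.sparse CSR matrices here;
    # the matrix-vector product allocates a fresh ndarray on every call.
    def f_scipy(t, y):
        return (l + 0.5 * (5 < t < 10) * pump) @ y

    # petsc4py style: l and pump are PETSc AIJ matrices; tmp_vec = l.createVecRight()
    # is created once outside the callback, so nothing is allocated per call.
    def f_petsc(ts, t, u, F):
        l.mult(u, F)
        if 5 < t < 10:
            pump.mult(u, tmp_vec)
            F.axpy(0.5, tmp_vec)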
Re: [petsc-users] Python PETSc performance vs scipy ZVODE
Alright. Again, thank you very much for taking the time to answer my beginner questions! Still a lot to learn. Have a good day!

On 10.08.23 12:27, Stefano Zampini wrote:

Then just do the multiplications you need. My proposal was for the example function you were showing.

On Thu, Aug 10, 2023, 12:25 Niclas Götting wrote:

You are absolutely right for this specific case (I get about 2400 it/s instead of 2100 it/s). However, the single square function will be replaced by a series of Gaussian pulses in the future, which will never be zero. Maybe one could do an approximation and skip the second mult if the Gaussians are close to zero.

On 10.08.23 12:16, Stefano Zampini wrote:

If you do the mult of "pump" inside an if, it should be faster.

On Thu, Aug 10, 2023, 12:12 Niclas Götting wrote:

If I understood you right, this should be the resulting RHS:

    def rhsfunc5(ts, t, u, F):
        l.mult(u, F)
        pump.mult(u, tmp_vec)
        scale = 0.5 * (5 < t < 10)
        F.axpy(scale, tmp_vec)

It is a little bit slower than option 3, but at about 2100 it/s it is consistently ~10% faster than option 4. Thank you very much for the suggestion!

On 10.08.23 11:47, Stefano Zampini wrote:

I would use option 3. Keep a work vector and do a vector summation instead of the multiple multiplications by scale and 1/scale. I agree with you that the docs are a little misleading here.

On Thu, Aug 10, 2023, 11:40 Niclas Götting wrote:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I tried the following four implementations of the RHS:

    # Option 1
    def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)

    # Option 2
    def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)

    # Option 3
    def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)

    # Option 4
    def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but run at 50 it/s, 800 it/s, 2300 it/s, and 1900 it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on).

As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message.

One last point: if I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:

TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest you write the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.

On Wed, Aug 9, 2023, 17:09 Niclas Götting wrote:

Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.
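For reference, the variant Stefano suggests above ("just do the multiplications you need") would look roughly like the sketch below for the square pulse: the pump term is skipped entirely outside the pulse window, which is where the ~2400 it/s figure quoted in this thread comes from. The name rhsfunc6 is hypothetical, and tmp_vec is the same work vector used by rhsfunc5.

    def rhsfunc6(ts, t, u, F):
        l.mult(u, F)
        if 5 < t < 10:                 # the square pulse is exactly zero outside this window
            pump.mult(u, tmp_vec)
            F.axpy(0.5, tmp_vec)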
Re: [petsc-users] Python PETSc performance vs scipy ZVODE
You are absolutely right for this specific case (I get about 2400 it/s instead of 2100 it/s). However, the single square function will be replaced by a series of Gaussian pulses in the future, which will never be zero. Maybe one could do an approximation and skip the second mult if the Gaussians are close to zero.

On 10.08.23 12:16, Stefano Zampini wrote:

If you do the mult of "pump" inside an if, it should be faster.

On Thu, Aug 10, 2023, 12:12 Niclas Götting wrote:

If I understood you right, this should be the resulting RHS:

    def rhsfunc5(ts, t, u, F):
        l.mult(u, F)
        pump.mult(u, tmp_vec)
        scale = 0.5 * (5 < t < 10)
        F.axpy(scale, tmp_vec)

It is a little bit slower than option 3, but at about 2100 it/s it is consistently ~10% faster than option 4. Thank you very much for the suggestion!

On 10.08.23 11:47, Stefano Zampini wrote:

I would use option 3. Keep a work vector and do a vector summation instead of the multiple multiplications by scale and 1/scale. I agree with you that the docs are a little misleading here.

On Thu, Aug 10, 2023, 11:40 Niclas Götting wrote:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I tried the following four implementations of the RHS:

    # Option 1
    def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)

    # Option 2
    def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)

    # Option 3
    def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)

    # Option 4
    def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but run at 50 it/s, 800 it/s, 2300 it/s, and 1900 it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on).

As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message.

One last point: if I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:

TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest you write the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.

On Wed, Aug 9, 2023, 17:09 Niclas Götting wrote:

Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.

Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files if needed, but they are just some trace-preserving, very sparse matrices):

    import numpy as np
    import scipy.sparse
    import scipy.integrate
    from tqdm import tqdm

    l = np.load("../liouvillian.npy")
    pump = np.load("../pump_operator.npy")
    state = np.load("../initial_state.npy")

    l = scipy.sparse.csr_array(l)
    pump = scipy.sparse.csr_array(pump)

    def f(t,
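The approximation mentioned at the top of this message (skipping the second mult when the Gaussian is negligible) could look roughly like the following sketch. The pulse parameters T0, SIGMA, AMP and the tolerance TOL are made up for illustration; l, pump, and the work vector tmp_vec are the objects already used in this thread.

    import numpy as np

    T0, SIGMA, AMP, TOL = 7.5, 1.0, 0.5, 1e-12   # illustrative pulse parameters

    def gaussian_scale(t):
        return AMP * np.exp(-0.5 * ((t - T0) / SIGMA) ** 2)

    def rhsfunc_gauss(ts, t, u, F):
        l.mult(u, F)
        scale = gaussian_scale(t)
        if scale > TOL:                # skip the pump term while the pulse is negligible
            pump.mult(u, tmp_vec)
            F.axpy(scale, tmp_vec)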
Re: [petsc-users] Python PETSc performance vs scipy ZVODE
If I understood you right, this should be the resulting RHS:

    def rhsfunc5(ts, t, u, F):
        l.mult(u, F)
        pump.mult(u, tmp_vec)
        scale = 0.5 * (5 < t < 10)
        F.axpy(scale, tmp_vec)

It is a little bit slower than option 3, but at about 2100 it/s it is consistently ~10% faster than option 4. Thank you very much for the suggestion!

On 10.08.23 11:47, Stefano Zampini wrote:

I would use option 3. Keep a work vector and do a vector summation instead of the multiple multiplications by scale and 1/scale. I agree with you that the docs are a little misleading here.

On Thu, Aug 10, 2023, 11:40 Niclas Götting wrote:

Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I tried the following four implementations of the RHS:

    # Option 1
    def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)

    # Option 2
    def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)

    # Option 3
    def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)

    # Option 4
    def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but run at 50 it/s, 800 it/s, 2300 it/s, and 1900 it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on).

As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message.

One last point: if I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:

TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest you write the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.

On Wed, Aug 9, 2023, 17:09 Niclas Götting wrote:

Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.

Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files if needed, but they are just some trace-preserving, very sparse matrices):

    import numpy as np
    import scipy.sparse
    import scipy.integrate
    from tqdm import tqdm

    l = np.load("../liouvillian.npy")
    pump = np.load("../pump_operator.npy")
    state = np.load("../initial_state.npy")

    l = scipy.sparse.csr_array(l)
    pump = scipy.sparse.csr_array(pump)

    def f(t, y, *args):
        return (l + 0.5 * (5 < t < 10) * pump) @ y
        #return l @ y  # Uncomment for f(t) = 0

    dt = 0.1
    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
    solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
    times = []
    for i in tqdm(range(NUM_STEPS)):
        res[i, :] = solver.integrate(solver.t + dt)
        times.append(solver.t)

Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330 it/s on my machine.

When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases):

    import sys
    import petsc4py
    petsc4py.init(args=sys.argv
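For completeness, rhsfunc5 above relies on a small amount of one-time setup that is not shown in the message: the work vector has to be created once outside the callback, and the function is registered with the TS in place of the Jacobian-based variant. A minimal sketch, assuming the l matrix and ts object from the original script quoted above:

    tmp_vec = l.createVecRight()   # work vector reused on every RHS evaluation
    ts.setRHSFunction(rhsfunc5)    # replaces computeRHSFunctionLinear + setRHSJacobian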
Re: [petsc-users] Python PETSc performance vs scipy ZVODE
Thank you both for the very quick answer!

So far, I compiled PETSc with debugging turned on, but I think it should still be faster than standard scipy in both cases. Actually, Stefano's answer has got me very far already; now I only define the RHS of the ODE and no Jacobian (I wonder why the documentation suggests otherwise, though). I tried the following four implementations of the RHS:

    # Option 1
    def rhsfunc1(ts, t, u, F):
        scale = 0.5 * (5 < t < 10)
        (l + scale * pump).mult(u, F)

    # Option 2
    def rhsfunc2(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        (scale * pump).multAdd(u, F, F)

    # Option 3
    def rhsfunc3(ts, t, u, F):
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        if scale != 0:
            pump.scale(scale)
            pump.multAdd(u, F, F)
            pump.scale(1/scale)

    # Option 4
    def rhsfunc4(ts, t, u, F):
        tmp_pump.zeroEntries()  # tmp_pump is pump.duplicate()
        l.mult(u, F)
        scale = 0.5 * (5 < t < 10)
        tmp_pump.axpy(scale, pump, structure=PETSc.Mat.Structure.SAME_NONZERO_PATTERN)
        tmp_pump.multAdd(u, F, F)

They all yield the same results, but run at 50 it/s, 800 it/s, 2300 it/s, and 1900 it/s, respectively, which is a huge performance boost (almost 7 times as fast as scipy, with PETSc debugging still turned on).

As the scale function will most likely be a Gaussian in the future, I think that option 3 will become numerically unstable and I'll have to go with option 4, which is already faster than I expected. If you think it is possible to speed up the RHS calculation even more, I'd be happy to hear your suggestions; the -log_view is attached to this message.

One last point: if I didn't misunderstand the documentation at https://petsc.org/release/manual/ts/#special-cases, should this maybe be changed?

Best regards
Niclas

On 09.08.23 17:51, Stefano Zampini wrote:

TSRK is an explicit solver. Unless you are changing the ts type from the command line, the explicit Jacobian should not be needed. On top of Barry's suggestion, I would suggest you write the explicit RHS instead of assembling a throwaway matrix every time that function needs to be sampled.

On Wed, Aug 9, 2023, 17:09 Niclas Götting wrote:

Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.

Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files if needed, but they are just some trace-preserving, very sparse matrices):

    import numpy as np
    import scipy.sparse
    import scipy.integrate
    from tqdm import tqdm

    l = np.load("../liouvillian.npy")
    pump = np.load("../pump_operator.npy")
    state = np.load("../initial_state.npy")

    l = scipy.sparse.csr_array(l)
    pump = scipy.sparse.csr_array(pump)

    def f(t, y, *args):
        return (l + 0.5 * (5 < t < 10) * pump) @ y
        #return l @ y  # Uncomment for f(t) = 0

    dt = 0.1
    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
    solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
    times = []
    for i in tqdm(range(NUM_STEPS)):
        res[i, :] = solver.integrate(solver.t + dt)
        times.append(solver.t)

Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330 it/s on my machine.

When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases):

    import sys
    import petsc4py
    petsc4py.init(args=sys.argv)
    import numpy as np
    import scipy.sparse
    from tqdm import tqdm
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD

    def mat_to_real(arr):
        return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)

    def mat_to_petsc_aij(arr):
        arr_sc_sp = scipy.sparse.csr_array(arr)
        mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
        rstart, rend = mat.getOwnershipRange()
        print(rstart, rend)
        print(arr.shape[0])
        print(mat.sizes)
        I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
        J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
        V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
        print(I.shape, J.shape, V.shape)
        mat.setValuesCSR(I, J, V)
        mat.assemble()
        return mat

    l = np.load("../liouvillian.npy")
    l = mat_to_real(l)
    pump = np.load("..
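Similarly, option 4 above assumes a scratch matrix that is created once, and a registration of the custom RHS with the TS; a minimal sketch, assuming the pump matrix and ts object from the original script quoted above:

    tmp_pump = pump.duplicate()    # same nonzero pattern as pump; values are rebuilt on each call
    ts.setRHSFunction(rhsfunc4)    # with an explicit RHS, TSRK needs no Jacobian callback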
[petsc-users] Python PETSc performance vs scipy ZVODE
Hi all,

I'm currently trying to convert a quantum simulation from scipy to PETSc. The problem itself is extremely simple and of the form

    \dot{u}(t) = (A_const + f(t)*B_const)*u(t),

where f(t) in this simple test case is a square function. The matrices A_const and B_const are extremely sparse, and therefore I thought the problem would be well suited for PETSc.

Currently, I solve the ODE with the following procedure in scipy (I can provide the necessary data files if needed, but they are just some trace-preserving, very sparse matrices):

    import numpy as np
    import scipy.sparse
    import scipy.integrate
    from tqdm import tqdm

    l = np.load("../liouvillian.npy")
    pump = np.load("../pump_operator.npy")
    state = np.load("../initial_state.npy")

    l = scipy.sparse.csr_array(l)
    pump = scipy.sparse.csr_array(pump)

    def f(t, y, *args):
        return (l + 0.5 * (5 < t < 10) * pump) @ y
        #return l @ y  # Uncomment for f(t) = 0

    dt = 0.1
    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 4096), dtype=np.complex128)
    solver = scipy.integrate.ode(f).set_integrator("zvode").set_initial_value(state)
    times = []
    for i in tqdm(range(NUM_STEPS)):
        res[i, :] = solver.integrate(solver.t + dt)
        times.append(solver.t)

Here, A_const = l, B_const = pump and f(t) = 5 < t < 10. tqdm reports about 330 it/s on my machine.

When converting the code to PETSc, I came to the following result (according to the chapter https://petsc.org/main/manual/ts/#special-cases):

    import sys
    import petsc4py
    petsc4py.init(args=sys.argv)
    import numpy as np
    import scipy.sparse
    from tqdm import tqdm
    from petsc4py import PETSc

    comm = PETSc.COMM_WORLD

    def mat_to_real(arr):
        return np.block([[arr.real, -arr.imag], [arr.imag, arr.real]]).astype(np.float64)

    def mat_to_petsc_aij(arr):
        arr_sc_sp = scipy.sparse.csr_array(arr)
        mat = PETSc.Mat().createAIJ(arr.shape[0], comm=comm)
        rstart, rend = mat.getOwnershipRange()
        print(rstart, rend)
        print(arr.shape[0])
        print(mat.sizes)
        I = arr_sc_sp.indptr[rstart : rend + 1] - arr_sc_sp.indptr[rstart]
        J = arr_sc_sp.indices[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
        V = arr_sc_sp.data[arr_sc_sp.indptr[rstart] : arr_sc_sp.indptr[rend]]
        print(I.shape, J.shape, V.shape)
        mat.setValuesCSR(I, J, V)
        mat.assemble()
        return mat

    l = np.load("../liouvillian.npy")
    l = mat_to_real(l)
    pump = np.load("../pump_operator.npy")
    pump = mat_to_real(pump)
    state = np.load("../initial_state.npy")
    state = np.hstack([state.real, state.imag]).astype(np.float64)

    l = mat_to_petsc_aij(l)
    pump = mat_to_petsc_aij(pump)

    jac = l.duplicate()
    for i in range(8192):
        jac.setValue(i, i, 0)
    jac.assemble()
    jac += l

    vec = l.createVecRight()
    vec.setValues(np.arange(state.shape[0], dtype=np.int32), state)
    vec.assemble()

    dt = 0.1

    ts = PETSc.TS().create(comm=comm)
    ts.setFromOptions()
    ts.setProblemType(ts.ProblemType.LINEAR)
    ts.setEquationType(ts.EquationType.ODE_EXPLICIT)
    ts.setType(ts.Type.RK)
    ts.setRKType(ts.RKType.RK3BS)
    ts.setTime(0)
    print("KSP:", ts.getKSP().getType())
    print("KSP PC:", ts.getKSP().getPC().getType())
    print("SNES :", ts.getSNES().getType())

    def jacobian(ts, t, u, Amat, Pmat):
        Amat.zeroEntries()
        Amat.aypx(1, l, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)
        Amat.axpy(0.5 * (5 < t < 10), pump, structure=PETSc.Mat.Structure.SUBSET_NONZERO_PATTERN)

    ts.setRHSFunction(PETSc.TS.computeRHSFunctionLinear)
    #ts.setRHSJacobian(PETSc.TS.computeRHSJacobianConstant, l, l)  # Uncomment for f(t) = 0
    ts.setRHSJacobian(jacobian, jac)

    NUM_STEPS = 200
    res = np.empty((NUM_STEPS, 8192), dtype=np.float64)
    times = []
    rstart, rend = vec.getOwnershipRange()
    for i in tqdm(range(NUM_STEPS)):
        time = ts.getTime()
        ts.setMaxTime(time + dt)
        ts.solve(vec)
        res[i, rstart:rend] = vec.getArray()[:]
        times.append(time)

I decomposed the complex ODE into a larger real ODE, so that I can easily switch to GPU computation later on, if needed. Now, the solutions of both scripts are virtually identical, but PETSc runs about 3 times slower, at 120 it/s on my machine. I don't use MPI for PETSc yet. I strongly suspect that the problem lies within the Jacobian definition, as PETSc is about 3 times *faster* than scipy with f(t) = 0 and therefore a constant Jacobian.

Thank you in advance.

All the best,
Niclas
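Since the complex ODE is decomposed into a real one of twice the size, the complex solution can be recovered afterwards by pairing the two halves of each real state vector: with u = [Re(psi); Im(psi)] and L = A + iB mapped to [[A, -B], [B, A]], the reconstruction is a simple slice. A short sketch for the serial run above (the variable name psi is illustrative):

    n = 4096                              # complex dimension of the two-component case
    psi = res[:, :n] + 1j * res[:, n:]    # shape (NUM_STEPS, 4096), dtype complex128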