Thibaut:

I'm writing a simple example using KSP directly -- will send to you soon.

Hong
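(Editorial note: for reference -- this is not Hong's forthcoming example -- here is a minimal C sketch of what driving MUMPS through KSP for repeated solves with a fixed operator typically looks like. The function name SolveManyRHS and its arguments are placeholders; the matrix and vectors are assumed to be created, preallocated and assembled elsewhere.)

    #include <petscksp.h>

    PetscErrorCode SolveManyRHS(Mat A_m, Vec *b, Vec *x, PetscInt nrhs)
    {
      KSP            ksp;
      PC             pc;
      PetscErrorCode ierr;
      PetscInt       i;

      PetscFunctionBeginUser;
      ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(ksp, A_m, A_m);CHKERRQ(ierr);
      ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);               /* pure direct solve, no Krylov iterations */
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverType(pc, MATSOLVERMUMPS);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);

      for (i = 0; i < nrhs; i++) {
        /* The LU factorization is computed at the first KSPSolve and reused
           for every subsequent solve as long as the operator is unchanged. */
        ierr = KSPSolve(ksp, b[i], x[i]);CHKERRQ(ierr);
      }
      ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }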
Hi Hong,

That sounds like a more reasonable approach; I had no idea the PETSc/MUMPS interface could provide such a level of control over the solve process. Therefore, after assembling the matrix A_0, I should do something along the lines of:

  MatGetFactor(A_0, MATSOLVERMUMPS, MAT_FACTOR_LU, F)
  MatLUFactorSymbolic(F, A_0, NULL, NULL, info)
  MatLUFactorNumeric(F, A_0, info)

and then call MatSolve? However, I don't understand: I thought F would remain the same during the whole process, but it's an input parameter of MatSolve, so would I need one F_m for each A_m? Which is not what you mentioned (do one symbolic factorization only).

On a side note, after preallocating and assembling the first matrix, should I create/assemble all the others with:

  MatDuplicate(A_0, MAT_DO_NOT_COPY_VALUES, A_m)
  calls to MatSetValues(...)
  MatAssemblyBegin(A_m, MAT_FINAL_ASSEMBLY)
  MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)

Is that the recommended/most scalable way of duplicating a matrix and its non-zero structure?

Thank you for your support and suggestions,

Thibaut

On 23/07/2019 18:38, Zhang, Hong wrote:

Thibaut:

Thanks for taking the time. I would typically run that on a small cluster node of 16 or 32 physical cores with 2 or 4 sockets. I use 16 or 32 MPI ranks and bind them to cores. The matrices would ALL have the same size and the same nonzero structure - it's just a few numerical values that would differ.

You may do one symbolic factorization of A_m and use it in the m-i loop:
- numeric factorization of A_m
- solve A_m x_m,i = b_m,i

In MUMPS, numeric factorization and solve are scalable. Repeated numeric factorizations of A_m are likely faster than reading data files from the disc.

Hong

This is a good point you've raised, as I don't think MUMPS is able to exploit that - I asked the question on their users list just to be sure. There are some options in SuperLU_DIST to reuse permutation arrays, but there's no I/O for that solver. And the native PETSc LU solver is not parallel?

I'm using high-order finite differences, so I'm suffering from a lot of fill-in, which is one of the reasons why storing factorizations in RAM is not viable. In comparison, I have almost unlimited disk space. I'm aware my need might seem counter-intuitive, but I'm really willing to sacrifice performance in the I/O part. My code is already heavily based on PETSc (preallocation, assembly for matrices/vectors) coupled with MUMPS, so I'm minimizing the loss of efficiency.

Thibaut

On 23/07/2019 17:13, Smith, Barry F. wrote:
> What types of computing systems will you be doing the computations on? Roughly how many MPI ranks?
>
> Are the matrices all the same size? Do they have the same or different nonzero structures? Would it be possible to use the same symbolic representation for all of them and just have different numerical values?
>
> Clusters and large scale computing centers are notoriously terrible at I/O; often I/O is orders of magnitude slower than compute/memory, making this type of workflow unrealistically slow. From a cost analysis point of view, often just buying lots of memory might be the most efficacious approach.
>
> That said, what you suggest might require only a few lines of code (though determining where to put them is the tricky part) depending on the MUMPS interface for saving a file to disk. What we would do is keep the PETSc wrapper that lives around the MUMPS matrix Mat_MUMPS but, using the MUMPS API, save the information in the DMUMPS_STRUC_C id; and then reload it when needed.
> The user level API could be something like MatMumpsSaveToDisk(Mat) and MatMumpsLoadFromDisk(Mat); they would just monkey with the DMUMPS_STRUC_C id; item.
>
> Barry
>
>> On Jul 23, 2019, at 9:24 AM, Thibaut Appel via petsc-users <petsc-users@mcs.anl.gov> wrote:
>>
>> Dear PETSc users,
>>
>> I need to solve several linear systems successively, with LU factorization, as part of an iterative process in my Fortran application code.
>>
>> The process would solve M systems (A_m)(x_m,i) = (b_m,i) for m=1,M at each iteration i, but compute the LU factorization of each A_m only once.
>> The RHSs (b_m,i+1) are computed from all the different (x_m,i) and all depend upon each other.
>>
>> The way I envisage to perform that is to use MUMPS to compute, successively, each of the LU factorizations (m) in parallel and store the factors on disk, creating/assembling/destroying the matrices A_m on the go.
>> Then, whenever needed, read the factors in parallel to solve the systems.
>> Since version 5.2, MUMPS has a save/restore feature that allows that, see http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf p. 20, 24 and 58.
>>
>> In its current state, the PETSc/MUMPS interface does not incorporate that feature. I'm an advanced Fortran programmer but not in C, so I don't think I would do an amazing job having a go inside src/mat/impls/aij/mpi/mumps/mumps.c.
>>
>> I was picturing something like creating as many KSP objects as linear systems to be solved, with some sort of flag to force the storage of LU factors on disk after the first call to KSPSolve. Then keep calling KSPSolve as many times as needed.
>>
>> Would you support such a feature?
>>
>> Thanks for your support,
>>
>> Thibaut
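(Editorial note: for reference, a minimal C sketch of the factor-reuse loop Hong suggests above: one factor object and one symbolic factorization shared by all A_m -- assuming, as stated in the thread, that they have identical size and nonzero structure -- and a numeric refactorization of each A_m at every outer iteration instead of saving/reloading factors on disk. The function name FactorAndSolveAll, the array layout and the RHS-update step are placeholders, not tested code.)

    #include <petscmat.h>

    PetscErrorCode FactorAndSolveAll(Mat *A, Vec *b, Vec *x, PetscInt M, PetscInt niter)
    {
      Mat            F;
      MatFactorInfo  info;
      PetscErrorCode ierr;
      PetscInt       i, m;

      PetscFunctionBeginUser;
      ierr = MatFactorInfoInitialize(&info);CHKERRQ(ierr);
      /* One factor object and one symbolic factorization, shared by all A_m
         (assumed to have identical size and nonzero structure). */
      ierr = MatGetFactor(A[0], MATSOLVERMUMPS, MAT_FACTOR_LU, &F);CHKERRQ(ierr);
      ierr = MatLUFactorSymbolic(F, A[0], NULL, NULL, &info);CHKERRQ(ierr);

      for (i = 0; i < niter; i++) {
        for (m = 0; m < M; m++) {
          /* Refactor numerically with the values of A[m]; per Hong's suggestion this is
             likely cheaper than writing the factors to disk and reading them back. */
          ierr = MatLUFactorNumeric(F, A[m], &info);CHKERRQ(ierr);
          ierr = MatSolve(F, b[m], x[m]);CHKERRQ(ierr);
        }
        /* ... update the RHS vectors b[m] from the solutions x[m] for the next iteration ... */
      }
      ierr = MatDestroy(&F);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }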