Thibaut:
I'm writing a simple example using KSP directly -- will send to you soon.
Hong

Hi Hong,

That sounds like a more reasonable approach; I had no idea the PETSc/MUMPS 
interface could provide such a level of control over the solve process. So, 
after assembling the matrix A_0, I should do something along the lines of:

MatGetFactor(A_0, MATSOLVERMUMPS, MAT_FACTOR_LU, F)

MatLUFactorSymbolic(F, A_0, NULL, NULL, info)

MatLUFactorNumeric(F, A_0, info)

and then call MatSolve? However, I don't quite understand: I thought F would 
remain the same during the whole process, but since it is an input parameter 
of MatSolve, wouldn't I need one F_m for each A_m? That is not what you 
mentioned (doing only one symbolic factorization).


On a side note, after preallocating and assembling the first matrix, should I 
create/assemble all the others with

MatDuplicate(A_0, MAT_DO_NOT_COPY_VALUES, A_m)

Calls to MatSetValues( ... )

MatAssemblyBegin(A_m, MAT_FINAL_ASSEMBLY)
MatAssemblyEnd(A_m, MAT_FINAL_ASSEMBLY)


Is that the recommended/most scalable way of duplicating a matrix + its 
non-zero structure?
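
For completeness, the setup I picture is roughly the following (again a C 
sketch with a made-up helper name, error checking omitted; I am assuming the 
duplicates inherit the preallocation/nonzero pattern of A_0, so no further 
preallocation calls are needed for them):

#include <petscmat.h>

/* Illustrative helper: duplicate the structure of A[0] into A[1..M-1] and
   fill in their numerical values. */
void duplicate_and_fill(Mat A[], PetscInt M)
{
  PetscInt m;

  for (m = 1; m < M; m++) {
    MatDuplicate(A[0], MAT_DO_NOT_COPY_VALUES, &A[m]);
    /* ... calls to MatSetValues(A[m], ...) filling the existing pattern ... */
    MatAssemblyBegin(A[m], MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A[m], MAT_FINAL_ASSEMBLY);
  }
}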


Thank you for your support and suggestions,

Thibaut


On 23/07/2019 18:38, Zhang, Hong wrote:
Thibaut:
Thanks for taking the time. I would typically run that on a small
cluster node of 16 or 32 physical cores with 2 or 4 sockets. I use 16 or
32 MPI ranks and bind them to cores.

The matrices would ALL have the same size and the same nonzero structure
- it's just a few numerical values that would differ.
You may do one symbolic factorization of A_m and reuse it in the (m, i) loop:
- numeric factorization of A_m
- solve A_m x_m,i = b_m,i
In MUMPS, the numeric factorization and the solve are scalable. Repeated numeric 
factorizations of A_m are likely faster than reading data files from disk.
Hong

That is a good point you've raised, as I don't think MUMPS is able to exploit
it - I asked the question on their users list just to be sure. There are some
options in SuperLU_DIST to reuse permutation arrays, but there is no I/O
feature for that solver. And the native PETSc LU solver is not parallel, is it?

I'm using high-order finite differences so I'm suffering from a lot of
fill-in, one of the reasons why storing factorizations in RAM is not
viable. In comparison, I have almost unlimited disk space.

I'm aware my need might seem counter-intuitive, but I'm really willing to
sacrifice performance in the I/O part. My code is already heavily based on
PETSc (preallocation, assembly of matrices/vectors) coupled with MUMPS, so
I'm already minimizing the loss of efficiency.

Thibaut

On 23/07/2019 17:13, Smith, Barry F. wrote:
>    What types of computing systems will you be doing the computations on? 
> Roughly how many MPI ranks?
>
> Are the matrices all the same size? Do they have the same or different 
> nonzero structures? Would it be possible to use the same symbolic 
> representation for all of them and just have different numerical values?
>
>    Clusters and large scale computing centers are notoriously terrible at IO; 
> often IO is orders of magnitude slower than compute/memory, making this type 
> of workflow unrealistically slow. From a cost-analysis point of view, often 
> just buying lots of memory might be the most efficacious approach.
>
>    That said, what you suggest might require only a few lines of code (though 
> determining where to put them is the tricky part) depending on the MUMPS 
> interface for saving a file to disk. What we would do is keep the PETSc 
> wrapper that lives around the MUMPS matrix, Mat_MUMPS, but use the MUMPS API 
> to save the information in the DMUMPS_STRUC_C id; and then reload it when needed.
>
>    The user level API could be something like
>
>    MatMumpsSaveToDisk(Mat) and MatMumpsLoadFromDisk(Mat); they would just 
> monkey with the DMUMPS_STRUC_C id; item.
>
>
>    Barry
>
>
>> On Jul 23, 2019, at 9:24 AM, Thibaut Appel via petsc-users 
>> <petsc-users@mcs.anl.gov> wrote:
>>
>> Dear PETSc users,
>>
>> I need to solve several linear systems successively, with LU factorization, 
>> as part of an iterative process in my Fortran application code.
>>
>> The process would solve M systems (A_m)(x_m,i) = (b_m,i), for m = 1,...,M, at 
>> each iteration i, while computing the LU factorization of each A_m only once.
>> The RHSs (b_m,i+1) are computed from all the different (x_m,i), so they all 
>> depend upon each other.
>>
>> The way I envisage performing that is to use MUMPS to compute, successively, 
>> each of the LU factorizations (m) in parallel and store the factors on disk, 
>> creating/assembling/destroying the matrices A_m on the fly.
>> Then whenever needed, read the factors in parallel to solve the systems. 
>> Since version 5.2, MUMPS has a save/restore feature that allows that, see 
>> http://mumps.enseeiht.fr/doc/userguide_5.2.1.pdf p.20, 24 and 58.
>>
>> In its current state, the PETSc/MUMPS interface does not incorporate that 
>> feature. I'm an advanced Fortran programmer but not in C so I don't think I 
>> would do an amazing job having a go inside 
>> src/mat/impls/aij/mpi/mumps/mumps.c.
>>
>> I was picturing something like creating as many KSP objects as linear 
>> systems to be solved, with some sort of flag to force the storage of LU 
>> factors on disk after the first call to KSPSolve. Then keep calling KSPSolve 
>> as many times as needed.
>>
>> Would you support such a feature?
>>
>> Thanks for your support,
>>
>> Thibaut
