Re: [petsc-users] Best practices for solving Dense Linear systems

Nidish Sat, 08 Aug 2020 14:56:52 -0700


On 8/7/20 12:55 PM, Barry Smith wrote:

On Aug 7, 2020, at 12:26 PM, Nidish wrote:
On 8/7/20 8:52 AM, Barry Smith wrote:
On Aug 7, 2020, at 1:25 AM, Nidish wrote:
Indeed - I was just using the default solver (GMRES with ILU).
Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), I find elemental to be extremely slow even for a 1000x1000 matrix.
What about on one process?
On just one process the performance is comparable.
Elemental generally won't be competitive for such tiny matrices.
For MPIAIJ it throws an error if I try "-pc_type lu".
Yes, there is no PETSc code for a sparse parallel direct solver, so this is expected.
   What about ?
    mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi

    mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi
Same results - the elemental version is MUCH slower (for 1000x1000).
Where will your dense matrices be coming from and how big will they be in practice? This will help determine if an iterative solver is appropriate. If they will be 100,000 for example then testing with 1000 will tell you nothing useful; you need to test with the problem size you care about.
The matrices in my application arise from substructuring/Component Mode Synthesis conducted on a system that is linear "almost everywhere", for example jointed systems. The procedure we follow is: build a mesh & identify the nodes corresponding to the interfaces, reduce the model using component mode synthesis to obtain a representation of the system using just the interface degrees-of-freedom along with some (~10s) generalized "modal coordinates". We conduct the non-linear analyses (transient, steady state harmonic, etc.) using these matrices.
I am interested in conducting non-linear mesh convergence for a particular system of interest wherein the interface DoFs are, approx, 4000, 8000, 12000, 16000. I'm fairly certain the dense matrices will not be larger. The
Ok, so it is not clear how well conditioned these dense matrices will be.
    There are three questions that need to be answered.
1) for your problem can iterative methods be used and will they require less work than direct solvers?
For direct LU the work is order N^3 to do the factorization with a relatively small constant. Because of smart organization inside dense LU the flops can be done very efficiently.
For GMRES with Jacobi preconditioning the work is order N^2 (the time for a dense matrix-vector product) for each iteration. If the number of iterations is small then the total work is much less than a direct solver. In the worst case the number of iterations is order N so the total work is order N^3, the same order as a direct method. But the efficiency of a dense matrix-vector product is much lower than the efficiency of a LU factorization so even if the work is the same order it can take longer. One should use mpidense as the matrix format for iterative.
With iterative methods YOU get to decide how accurate you need your solution, you do this by setting how small you want the residual to be (since you can't directly control the error). By default PETSc uses a relative decrease in the residual of 1e-5.
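[Editor's sketch] The work estimates in point 1 can be put into rough numbers. The snippet below is only an illustration under stated assumptions: it uses the textbook leading-order counts of (2/3) N^3 flops for a dense LU factorization and 2 N^2 flops for one dense matrix-vector product, and it ignores the GMRES orthogonalization work, the triangular solves, and the efficiency gap between matrix-vector products and LU noted above. The function names are hypothetical and not part of PETSc.

```python
def lu_flops(n):
    """Leading-order flop count of a dense LU factorization (assumed 2/3 * n^3)."""
    return (2.0 / 3.0) * n ** 3


def matvec_flops(n):
    """Flop count of one dense matrix-vector product (assumed 2 * n^2)."""
    return 2.0 * n ** 2


def break_even_iterations(n):
    """Number of matrix-vector products whose flops equal one LU factorization."""
    return lu_flops(n) / matvec_flops(n)


if __name__ == "__main__":
    # Interface sizes quoted in the thread.
    for n in (4000, 8000, 12000, 16000):
        print(f"N = {n:6d}: LU ~ {lu_flops(n):.3e} flops, "
              f"break-even at ~ {break_even_iterations(n):.0f} mat-vecs")
```

Under these assumptions the break-even point is N/3 matrix-vector products, so an iterative solve only wins on flop count if it converges in far fewer iterations than that.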
2) for your size problems can parallelism help?
I think it should, but elemental, since it requires a different data layout, has additional overhead cost to get the data into the optimal format for parallelism.
3) can parallelism help on YOUR machine? Just because a machine has multiple cores it may not be able to utilize them efficiently for solvers if the total machine memory bandwidth is limited.
So the first thing to do is on the machine you plan to use for your computations run the streams benchmark discussed in https://www.mcs.anl.gov/petsc/documentation/faq.html#computers this will give us some general idea of how much parallelism you can take advantage of. Is the machine a parallel cluster or just a single node?
After this I'll give you a few specific cases to run to get a feeling for what approach would be best for your problems,
   Barry

Thank you for the responses. Here's a pointwise response to your queries:

1) I am presently working with random matrices (with a large constant value in the diagonals to ensure diagonal dominance) before I start working with the matrices from my system. At the end of the day the matrices I expect to be using can be thought of to be Schur complements of a Laplacian operator.
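[Editor's sketch] To make the test setup in point 1 concrete, here is a small pure-Python illustration of a random matrix with a large constant added to the diagonal, solved iteratively until the relative residual drops below 1e-5 (the default relative tolerance mentioned earlier in the thread). It uses plain Jacobi iteration, not PETSc's GMRES with a Jacobi preconditioner, and every name in it is hypothetical. It is a sketch of the stopping criterion and of why strongly diagonally dominant test matrices tend to converge quickly, not a performance benchmark.

```python
import random


def make_diag_dominant(n, shift, seed=0):
    """Random n x n matrix with entries in [0, 1) plus `shift` on the diagonal."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    for i in range(n):
        a[i][i] += shift
    return a


def matvec(a, x):
    """Dense matrix-vector product."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def norm2(v):
    """Euclidean norm."""
    return sum(vi * vi for vi in v) ** 0.5


def jacobi_solve(a, b, rtol=1e-5, max_it=1000):
    """Jacobi iteration, stopping when ||b - A x|| <= rtol * ||b||.

    Returns (x, iterations, converged).
    """
    x = [0.0] * len(b)
    b_norm = norm2(b)
    for it in range(max_it):
        r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
        if norm2(r) <= rtol * b_norm:
            return x, it, True
        # Jacobi update: correct each entry by its residual over the diagonal.
        x = [xi + ri / a[i][i] for i, (xi, ri) in enumerate(zip(x, r))]
    r = [bi - axi for bi, axi in zip(b, matvec(a, x))]
    return x, max_it, norm2(r) <= rtol * b_norm


if __name__ == "__main__":
    n = 50
    a = make_diag_dominant(n, shift=float(n))  # diagonal exceeds each off-diagonal row sum
    b = [1.0] * n
    x, iterations, converged = jacobi_solve(a, b)
    print(f"converged={converged} after {iterations} iterations")
```

Because the diagonal shift controls how fast such an iteration converges, iteration counts measured on these synthetic matrices may not carry over to the Schur-complement matrices from the real application.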

2) Since my application is joint dynamics, I have a non-linear function that has to be evaluated at quadrature locations on a 2D mesh and integrated to form the residue vector as well as the Jacobian matrices. I therefore definitely expect a potential speedup for the function evaluations.

Since the matrices I will end up with will be dense (at least for static simulations), I wanted directions to find the best solver options for my problem.

3) I am presently on an octa-core (4 physical cores) machine with 16 Gigs of RAM. I plan to conduct code development and benchmarking on this machine before I start running larger models on a cluster I have access to.
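[Editor's sketch] For reference against the 16 GB of RAM mentioned above, the storage needed for one dense matrix at the interface sizes quoted in the thread can be estimated as follows. This assumes real double-precision entries (8 bytes each) and counts a single copy of the matrix only, with no factor copies or solver workspace; the helper name is hypothetical.

```python
BYTES_PER_DOUBLE = 8  # assumption: real, double-precision scalars


def dense_bytes(n):
    """Bytes needed to store one n x n dense matrix of doubles."""
    return n * n * BYTES_PER_DOUBLE


if __name__ == "__main__":
    for n in (4000, 8000, 12000, 16000):
        print(f"N = {n:6d}: {dense_bytes(n) / 1e9:.3f} GB")
```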

I was unable to run the streams benchmark on the cluster (PETSc 3.11.1 is installed there, and the benchmarks in the git directory were giving issues), but I was able to do this on my local machine - here's the output:


   scaling.log
   1  13697.5004   Rate (MB/s)
   2  13021.9505   Rate (MB/s) 0.950681
   3  12402.6925   Rate (MB/s) 0.905471
   4  12309.1712   Rate (MB/s) 0.898644
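[Editor's sketch] The last column of the log above appears to be each rate divided by the one-process rate. The short check below recomputes those ratios from the reported numbers and also divides by the process count to give a per-process efficiency. The variable and function names are hypothetical, and the reading of the third column is an assumption based only on the arithmetic.

```python
# Rates (MB/s) copied from the scaling.log output above, keyed by process count.
rates_mb_per_s = {1: 13697.5004, 2: 13021.9505, 3: 12402.6925, 4: 12309.1712}


def speedups(rates):
    """Aggregate bandwidth relative to the single-process rate."""
    base = rates[1]
    return {n: rate / base for n, rate in rates.items()}


def parallel_efficiency(rates):
    """Speedup divided by process count (1.0 would be ideal scaling)."""
    return {n: s / n for n, s in speedups(rates).items()}


if __name__ == "__main__":
    eff = parallel_efficiency(rates_mb_per_s)
    for n, s in speedups(rates_mb_per_s).items():
        print(f"{n} processes: speedup {s:.6f}, efficiency {eff[n]:.3f}")
```

Since the aggregate rate does not increase with more processes, memory-bandwidth-bound operations such as dense matrix-vector products would not be expected to speed up much on this machine, which is the concern raised in point 3 of the quoted message.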

Could you point me to the part in the documentation that speaks about the different options available for dealing with dense matrices? I just realized that bindings for MUMPS are available in PETSc.


Thank you very much,
Nidish

However for frequency domain simulations, we use matrices that are about 10 times the size of the original matrices (whose meshes have been shown to be convergent in static test cases).
Thank you,
Nidish
Barry
I'm attaching the code here, in case you'd like to have a look at what I've been trying to do.
The two configurations of interest are,

    $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij
    $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental

(for the GMRES with ILU) and,

    $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly
    $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type preonly

elemental seems to perform poorly in both cases.

Nidish

On 8/7/20 12:50 AM, Barry Smith wrote:
  What is the output of -ksp_view for the two cases?
It is not only the matrix format but also the matrix solver that matters. For example if you are using an iterative solver the elemental format won't be faster, you should use the PETSc MPIDENSE format. The elemental format is really intended for when you use a direct LU solver for the matrix. For tiny matrices like this an iterative solver could easily be faster than the direct solver, it depends on the conditioning (eigenstructure) of the dense matrix. Also the default PETSc solver uses block Jacobi with ILU on each process if using a sparse format; ILU applied to a dense matrix is actually LU, so your solver is probably also different between the MPIAIJ and the elemental.
  Barry
On Aug 7, 2020, at 12:30 AM, Nidish wrote:
Thank you for the response.
I've just been running some tests with matrices up to 2e4 dimensions (dense). When I compared the solution times for "-mat_type elemental" and "-mat_type mpiaij" running with 4 cores, I found the mpidense versions running way faster than elemental. I have not been able to make the elemental version finish up for 2e4 so far (my patience runs out faster).
What's going on here? I thought elemental was supposed to be superior for dense matrices.
I can share the code if that's appropriate for this forum (sorry, I'm new here).
Nidish
On Aug 6, 2020, at 23:01, Barry Smith wrote:
        On Aug 6, 2020, at 7:32 PM, Nidish wrote:
        I'm relatively new to PETSc, and my applications involve (for the most part) dense matrix solves. I read in the documentation that this is an area PETSc does not specialize in but instead recommends external libraries such as Elemental. I'm wondering if there are any "best" practices in this regard. Some questions I'd like answered are:

        1. Can I just declare my dense matrix as a sparse one and fill the whole matrix up? Do any of the others go this route? What're possible pitfalls/unfavorable outcomes for this? I understand the memory overhead probably shoots up.
       No, this isn't practical, the performance will be terrible.

        2. Are there any specific guidelines on when I can expect elemental to perform better in parallel than in serial?
       Because the computation to communication ratio for dense matrices is 
higher than for sparse you will see better parallel performance for dense 
problems of a given size than sparse problems of a similar size. In other words 
parallelism can help for dense matrices for relatively small problems, of 
course the specifics of your machine hardware and software also play a role.

        Barry

        Of course, I'm interested in any other details that may be important in this regard. Thank you, Nidish
--
Nidish
<ksps.cpp>