> On Aug 7, 2020, at 12:26 PM, Nidish <[email protected]> wrote:
> 
> 
> 
> On 8/7/20 8:52 AM, Barry Smith wrote:
>> 
>> 
>>> On Aug 7, 2020, at 1:25 AM, Nidish <[email protected] <mailto:[email protected]>> 
>>> wrote:
>>> 
>>> Indeed - I was just using the default solver (GMRES with ILU).
>>> 
>>> Using just standard LU (direct solve with "-pc_type lu -ksp_type preonly"), 
>>> I find elemental to be extremely slow even for a 1000x1000 matrix.
>>> 
>> 
>> What about on one process? 
> On just one process the performance is comparable.
>> 
>> Elemental generally won't be competitive for such tiny matrices. 
>>> For MPIaij it's throwing me an error if I tried "-pc_type lu".
>>> 
>> 
>>    Yes, there is no PETSc code for sparse parallel direct solver, this is 
>> expected.
>> 
>>    What about ?
>> 
>>> mpirun -n 1 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi
>>> 
>>> mpirun -n 4 ./ksps -N 1000 -mat_type mpidense -pc_type jacobi
> Same results - the elemental version is MUCH slower (for 1000x1000). 
>> Where will your dense matrices be coming from and how big will they be in 
>> practice? This will help determine if an iterative solver is appropriate. If 
>> they will be 100,000 for example then testing with 1000 will tell you 
>> nothing useful, you need to test with the problem size you care about.
> The matrices in my application arise from substructuring/Component Mode 
> Synthesis conducted on a system that is linear "almost everywhere", for 
> example jointed systems. The procedure we follow is: build a mesh & identify 
> the nodes corresponding to the interfaces, reduce the model using component 
> mode synthesis to obtain a representation of the system using just the 
> interface degrees-of-freedom along with some (~10s) generalized "modal 
> coordinates". We conduct the non-linear analyses (transient, steady state 
> harmonic, etc.) using this matrices. 
> 
> I am interested in conducting non-linear mesh convergence for a particular 
> system of interest wherein the interface DoFs are, approx, 4000, 8000, 12000, 
> 16000. I'm fairly certain the dense matrices will not be larger. The 
> 

   Ok, so it is not clear how well conditioned these dense matrices will be. 

    There are three questions that need to be answered.

1) for your problem can iterative methods be used and will they require less 
work than direct solvers.

       For direct LU the work is order N^3 to do the factorization with a 
relatively small constant. Because of smart organization inside dense LU the 
flops can be done very efficiently. 

       For GMRES with Jacobi preconditioning the work is order N^2 (the time 
for a dense matrix-vector product) for each iteration. If the number of 
iterations small than the total work is much less than a direct solver. In the 
worst case the number of iterations is order N so the total work is order N^3, 
the same order as a direct method.  But the efficiency of a dense matrix-vector 
product is much lower than the efficiency of a LU factorization so even if the 
work is the same order it can take longer.  One should use mpidense as the 
matrix format for iterative.

       With iterative methods YOU get to decide how accurate you need your 
solution, you do this by setting how small you want the residual to be (since 
you can't directly control the error). By default PETSc uses a relative 
decrease in the residual of 1e-5. 

2) for your size problems can parallelism help? 

    I think it should but elemental since it requires a different data layout 
has additional overhead cost to get the data into the optimal format for 
parallelism. 

3) can parallelism help on YOUR machine. Just because a machine has multiple 
cores it may not be able to utilize them efficiently for solvers if the total 
machine memory bandwidth is limited. 

   So the first thing to do is on the machine you plan to use for your 
computations run the streams benchmark discussed in 
https://www.mcs.anl.gov/petsc/documentation/faq.html#computers this will give 
us some general idea of how much parallelism you can take advantage of. Is the 
machine a parallel cluster or just a single node? 

   After this I'll give you a few specific cases to run to get a feeling for 
what approach would be best for your problems,

   Barry



> However for frequency domain simulations, we use matrices that are about 10 
> times the size of the original matrices (whose meshes have been shown to be 
> convergent in static test cases). 
> 
> Thank you,
> Nidish
> 
>> 
>> Barry
>> 
>>> I'm attaching the code here, in case you'd like to have a look at what I've 
>>> been trying to do. 
>>> 
>>> The two configurations of interest are,
>>> 
>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij
>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental
>>> 
>>> (for the GMRES with ILU) and,
>>> 
>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type mpiaij -pc_type lu -ksp_type preonly
>>> $> mpirun -n 4 ./ksps -N 1000 -mat_type elemental -pc_type lu -ksp_type 
>>> preonly
>>> 
>>> elemental seems to perform poorly in both cases.
>>> 
>>> Nidish
>>> 
>>> On 8/7/20 12:50 AM, Barry Smith wrote:
>>>> 
>>>>   What is the output of -ksp_view  for the two case?
>>>> 
>>>>   It is not only the matrix format but also the matrix solver that 
>>>> matters. For example if you are using an iterative solver the elemental 
>>>> format won't be faster, you should use the PETSc MPIDENSE format. The 
>>>> elemental format is really intended when you use a direct LU solver for 
>>>> the matrix. For tiny matrices like this an iterative solver could easily 
>>>> be faster than the direct solver, it depends on the conditioning 
>>>> (eigenstructure) of the dense matrix. Also the default PETSc solver uses 
>>>> block Jacobi with ILU on each process if using a sparse format, ILU        
>>>>            applied to a dense matrix is actually LU so your solver is 
>>>> probably different also between the MPIAIJ and the elemental. 
>>>> 
>>>>   Barry
>>>> 
>>>> 
>>>>   
>>>> 
>>>>> On Aug 7, 2020, at 12:30 AM, Nidish <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> Thank you for the response.
>>>>> 
>>>>> I've just been running some tests with matrices up to 2e4 dimensions 
>>>>> (dense). When I compared the solution times for "-mat_type elemental" and 
>>>>> "-mat_type mpiaij" running with 4 cores, I found the mpidense versions 
>>>>> running way faster than elemental. I have not been able to make the 
>>>>> elemental version finish up for 2e4 so far (my patience runs out faster). 
>>>>> 
>>>>> What's going on here? I thought elemental was supposed to be superior for 
>>>>> dense matrices.
>>>>> 
>>>>> I can share the code if that's appropriate for this forum (sorry, I'm new 
>>>>> here). 
>>>>> 
>>>>> Nidish
>>>>> On Aug 6, 2020, at 23:01, Barry Smith <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>>  On Aug 6, 2020, at 7:32 PM, Nidish <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>>  
>>>>>  I'm relatively new to PETSc, and my applications involve (for the most 
>>>>> part) dense matrix solves.
>>>>>  
>>>>>  I read in the documentation that this is an area PETSc does not 
>>>>> specialize in but instead recommends external libraries such as 
>>>>> Elemental. I'm wondering if there are any "best" practices in this 
>>>>> regard. Some questions I'd like answered are:
>>>>>  
>>>>>  1. Can I just declare my dense matrix as a sparse one and fill the whole 
>>>>> matrix up? Do any of the others go this route? What're possible 
>>>>> pitfalls/unfavorable outcomes for this? I understand the memory overhead 
>>>>> probably shoots up.
>>>>> 
>>>>>   No, this isn't practical, the performance will be terrible.
>>>>> 
>>>>>  2. Are there any specific guidelines on when I can expect elemental to 
>>>>> perform better in parallel than in serial?
>>>>> 
>>>>>   Because the computation to communication ratio for dense matrices is 
>>>>> higher than for sparse you will see better parallel performance for dense 
>>>>> problems of a given size than sparse problems of a similar size. In other 
>>>>> words parallelism can help for dense matrices for relatively small 
>>>>> problems, of course the specifics of your machine hardware and software 
>>>>> also play a role.
>>>>> 
>>>>>    Barry
>>>>> 
>>>>>  
>>>>>  Of course, I'm interesting in any other details that may be important in 
>>>>> this regard.
>>>>>  
>>>>>  Thank you,
>>>>>  Nidish
>>>>> 
>>>> 
>>> -- 
>>> Nidish
>>> <ksps.cpp>
>> 
> -- 
> Nidish

Reply via email to