Re: [deal.II] Re: Assemble function, long time

2022-03-14 Thread Bruno Turcksin
Hermes,

Sorry, I don't use petsc. Maybe someone else can help you.

Best,

Bruno

Le lun. 14 mars 2022 à 05:42, Hermes Sampedro  a
écrit :

> Dear Bruno,
>
> I have been reading the examples and documents you pointed out. I tried to
> use SolverGMRES with PreconditionILU. However, I am getting a runtime
> error that I cannot really understand when calling
> PETScWrappers::PreconditionILU preconditioner(system_matrix). It does
> seem to work with PETScWrappers::PreconditionBlockJacobi
> preconditioner(system_matrix). The solver function and the error that I get
> are the following:
>
> void LaplaceProblem::solve()
> {
>   PETScWrappers::MPI::Vector
>     completely_distributed_solution(locally_owned_dofs, mpi_communicator);
>   SolverControl cn(completely_distributed_solution.size(),
>                    1e-8 * system_rhs.l2_norm());
>   PETScWrappers::SolverGMRES solver(cn, mpi_communicator);
>   PETScWrappers::PreconditionILU preconditioner(system_matrix);
>   solver.solve(system_matrix, completely_distributed_solution,
>                system_rhs, preconditioner);
>   constraints.distribute(completely_distributed_solution);
>   locally_relevant_solution = completely_distributed_solution;
> }
>
>
> [0]PETSC ERROR: - Error Message
> --
>
> [0]PETSC ERROR: See
> https://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for
> possible LU and Cholesky solvers
>
> [0]PETSC ERROR: Could not locate a solver package for factorization type
> ILU and matrix type mpiaij.
>
> [0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
>
> [0]PETSC ERROR: Petsc Release Version 3.13.1, May 02, 2020
>
> [0]PETSC ERROR: ./waveLaplaceSolver on a  named gbarlogin1 by hsllo Fri
> Mar 11 11:05:23 2022
>
> [0]PETSC ERROR: Configure options
> --prefix=/zhome/32/9/115503/dealii-candi/petsc-3.13.1 --with-debugging=0
> --with-shared-libraries=1 --with-mpi=1 --with-x=0 --with-64-bit-indices=0
> --download-hypre=1 CC=mpicc CXX=mpicxx FC=mpif90
> --with-blaslapack-dir=/appl/OpenBLAS/0.3.17/XeonE5-2660v3/gcc-11.2.0/lib
> --with-parmetis-dir=/zhome/32/9/115503/dealii-candi/parmetis-4.0.3
> --with-metis-dir=/zhome/32/9/115503/dealii-candi/parmetis-4.0.3
> --download-scalapack=1 --download-mumps=1
>
> [0]PETSC ERROR: #1 MatGetFactor() line 4492 in
> /zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/mat/interface/matrix.c
>
> [0]PETSC ERROR: #2 PCSetUp_ILU() line 133 in
> /zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/ksp/pc/impls/factor/ilu/ilu.c
>
> [0]PETSC ERROR: #3 PCSetUp() line 894 in
> /zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/ksp/pc/interface/precon.c
>
> 
>
> Exception on processing:
>
> 
>
> An error occurred in line <431> of file
> 
> in function
>
> void dealii::PETScWrappers::PreconditionILU::initialize(const
> dealii::PETScWrappers::MatrixBase&, const
> dealii::PETScWrappers::PreconditionILU::AdditionalData&)
>
> The violated condition was:
>
> ierr == 0
>
> Additional information:
>
> deal.II encountered an error while calling a PETSc function.
>
> The description of the error provided by PETSc is "See
>
> https://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for
>
> possible LU and Cholesky solvers".
>
> The numerical value of the original error code is 92.
>
>
>
>
>
> Could you please help me to understand what is happening?
>
> Thank you again for your help
> Regards,
> H
> El viernes, 11 de marzo de 2022 a las 10:47:37 UTC+1, Hermes Sampedro
> escribió:
>
>> Dear Bruno and Wolfgang,
>>
>> thank you very much for your comments and help, it is very helpful.
>> Actually, I think that is what I am experiencing. When running with my
>> actual direct solver a system with 15 elements per direction (4th
>> polynomial order with 0.5million dof), the solver takes 50 seconds.
>> However, increasing to 30 elements per direction (3.5 million dof) the
>> solver takes 1.5 hours. I think this shows how it does not scale well in
>> terms of time as you mentioned. I will definitely try with the iterative
>> solver.
>>
>> Thank you again
>> Regards,
>> H.
>> El jueves, 10 de marzo de 2022 a las 17:17:47 UTC+1, Wolfgang Bangerth
>> escribió:
>>
>>> On 3/10/22 07:00, Hermes Sampedro wrote:
>>> > I am experiencing long computational times with the solver function.
>>> I am
>>> > trying to use DoFRenumbering::Cuthill_McKee(dof_handler),
>>> > DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
>>> > but I get even higher computational times. Am I doing something wrong?
>>>
>>> Renumbering makes an enormous difference for sparse direct solvers, and
>>> as a
>>> 

Re: [deal.II] Re: Assemble function, long time

2022-03-14 Thread Hermes Sampedro
 

Dear Bruno,

I have been reading the examples and documents you pointed out. I tried to
use SolverGMRES with PreconditionILU. However, I am getting a runtime
error that I cannot really understand when calling
PETScWrappers::PreconditionILU preconditioner(system_matrix). It does
seem to work with PETScWrappers::PreconditionBlockJacobi
preconditioner(system_matrix). The solver function and the error that I get
are the following:

void LaplaceProblem::solve()
{
  PETScWrappers::MPI::Vector
    completely_distributed_solution(locally_owned_dofs, mpi_communicator);
  SolverControl cn(completely_distributed_solution.size(),
                   1e-8 * system_rhs.l2_norm());
  PETScWrappers::SolverGMRES solver(cn, mpi_communicator);
  PETScWrappers::PreconditionILU preconditioner(system_matrix);
  solver.solve(system_matrix, completely_distributed_solution,
               system_rhs, preconditioner);
  constraints.distribute(completely_distributed_solution);
  locally_relevant_solution = completely_distributed_solution;
}


[0]PETSC ERROR: - Error Message 
--

[0]PETSC ERROR: See 
https://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for 
possible LU and Cholesky solvers

[0]PETSC ERROR: Could not locate a solver package for factorization type 
ILU and matrix type mpiaij.

[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html 
for trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.13.1, May 02, 2020 

[0]PETSC ERROR: ./waveLaplaceSolver on a  named gbarlogin1 by hsllo Fri Mar 
11 11:05:23 2022

[0]PETSC ERROR: Configure options 
--prefix=/zhome/32/9/115503/dealii-candi/petsc-3.13.1 --with-debugging=0 
--with-shared-libraries=1 --with-mpi=1 --with-x=0 --with-64-bit-indices=0 
--download-hypre=1 CC=mpicc CXX=mpicxx FC=mpif90 
--with-blaslapack-dir=/appl/OpenBLAS/0.3.17/XeonE5-2660v3/gcc-11.2.0/lib 
--with-parmetis-dir=/zhome/32/9/115503/dealii-candi/parmetis-4.0.3 
--with-metis-dir=/zhome/32/9/115503/dealii-candi/parmetis-4.0.3 
--download-scalapack=1 --download-mumps=1

[0]PETSC ERROR: #1 MatGetFactor() line 4492 in 
/zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/mat/interface/matrix.c

[0]PETSC ERROR: #2 PCSetUp_ILU() line 133 in 
/zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/ksp/pc/impls/factor/ilu/ilu.c

[0]PETSC ERROR: #3 PCSetUp() line 894 in 
/zhome/32/9/115503/dealii-candi/tmp/build/petsc-lite-3.13.1/src/ksp/pc/interface/precon.c



Exception on processing: 



An error occurred in line <431> of file 

 
in function

void dealii::PETScWrappers::PreconditionILU::initialize(const 
dealii::PETScWrappers::MatrixBase&, const 
dealii::PETScWrappers::PreconditionILU::AdditionalData&)

The violated condition was: 

ierr == 0

Additional information: 

deal.II encountered an error while calling a PETSc function.

The description of the error provided by PETSc is "See

https://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html for

possible LU and Cholesky solvers".

The numerical value of the original error code is 92.





Could you please help me to understand what is happening?

Thank you again for your help
Regards, 
H
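
For context on the error above: PETSc's built-in ILU factorization is only
implemented for sequential (seqaij) matrices, which is why it cannot locate a
solver package for the parallel mpiaij matrix, while
PETScWrappers::PreconditionBlockJacobi works, since it applies an incomplete
factorization to each process-local diagonal block instead. A minimal sketch
of that working variant, reusing the members from the solve() above:

   PETScWrappers::MPI::Vector
     completely_distributed_solution(locally_owned_dofs, mpi_communicator);
   SolverControl cn(completely_distributed_solution.size(),
                    1e-8 * system_rhs.l2_norm());
   PETScWrappers::SolverGMRES solver(cn, mpi_communicator);
   // Block Jacobi factorizes each process's diagonal block, so it does not
   // need a parallel ILU implementation.
   PETScWrappers::PreconditionBlockJacobi preconditioner(system_matrix);
   solver.solve(system_matrix, completely_distributed_solution,
                system_rhs, preconditioner);
   constraints.distribute(completely_distributed_solution);
   locally_relevant_solution = completely_distributed_solution;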
El viernes, 11 de marzo de 2022 a las 10:47:37 UTC+1, Hermes Sampedro 
escribió:

> Dear Bruno and Wolfgang, 
>
> thank you very much for your comments and help, it is very helpful. 
> Actually, I think that is what I am experiencing. When running with my 
> actual direct solver a system with 15 elements per direction (4th 
> polynomial order with 0.5million dof), the solver takes 50 seconds. 
> However, increasing to 30 elements per direction (3.5 million dof) the 
> solver takes 1.5 hours. I think this shows how it does not scale well in 
> terms of time as you mentioned. I will definitely try with the iterative 
> solver.
>
> Thank you again
> Regards, 
> H.
> El jueves, 10 de marzo de 2022 a las 17:17:47 UTC+1, Wolfgang Bangerth 
> escribió:
>
>> On 3/10/22 07:00, Hermes Sampedro wrote: 
>> > I am experiencing long computational times with the solver function.  I 
>> am 
>> > trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
>> > DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false) 
>> > but I get even higher computational times. Am I doing something wrong? 
>>
>> Renumbering makes an enormous difference for sparse direct solvers, and 
>> as a 
>> consequence all such solvers I know of do it internally (though they use 
>> variations of the "minimum degree" renumbering, rather than 
>> Cuthill-McKee). As 
>> a consequence, renumbering yourself likely makes no difference. 
>>
>> But, as you discover and as Bruno already pointed out, 

Re: [deal.II] Re: Assemble function, long time

2022-03-11 Thread Hermes Sampedro
Dear Bruno and Wolfgang, 

thank you very much for your comments and help, it is very helpful. 
Actually, I think that is what I am experiencing. When running with my 
actual direct solver a system with 15 elements per direction (4th 
polynomial order with 0.5million dof), the solver takes 50 seconds. 
However, increasing to 30 elements per direction (3.5 million dof) the 
solver takes 1.5 hours. I think this shows how it does not scale well in 
terms of time as you mentioned. I will definitely try with the iterative 
solver.

Thank you again
Regards, 
H.
El jueves, 10 de marzo de 2022 a las 17:17:47 UTC+1, Wolfgang Bangerth 
escribió:

> On 3/10/22 07:00, Hermes Sampedro wrote:
> > I am experiencing long computational times with the solver function.  I 
> am 
> > trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
> > DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
> > but I get even higher computational times. Am I doing something wrong?
>
> Renumbering makes an enormous difference for sparse direct solvers, and as 
> a 
> consequence all such solvers I know of do it internally (though they use 
> variations of the "minimum degree" renumbering, rather than 
> Cuthill-McKee). As 
> a consequence, renumbering yourself likely makes no difference.
>
> But, as you discover and as Bruno already pointed out, even with optimal 
> ordering, direct solvers do not scale well both in terms of overall time 
> and 
> in parallelism. You may want to take a look at the several video lectures 
> on 
> solvers and preconditioners to see what you can do about your case.
>
> Best
> W.
>
> -- 
> 
> Wolfgang Bangerth email: bang...@colostate.edu
> www: http://www.math.colostate.edu/~bangerth/
>
>



Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Wolfgang Bangerth

On 3/10/22 07:00, Hermes Sampedro wrote:
I am experiencing long computational times with the solver function.  I am 
trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)

but I get even higher computational times. Am I doing something wrong?


Renumbering makes an enormous difference for sparse direct solvers, and as a 
consequence all such solvers I know of do it internally (though they use 
variations of the "minimum degree" renumbering, rather than Cuthill-McKee). As 
a consequence, renumbering yourself likely makes no difference.


But, as you discover and as Bruno already pointed out, even with optimal 
ordering, direct solvers do not scale well both in terms of overall time and 
in parallelism. You may want to take a look at the several video lectures on 
solvers and preconditioners to see what you can do about your case.


Best
 W.

--

Wolfgang Bangerth  email: bange...@colostate.edu
   www: http://www.math.colostate.edu/~bangerth/



Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Bruno Turcksin
Yes, you should use your system_matrix. AdditionalData can be used to
modify the parameters used by ILU. The interface of PreconditionILU should
work very similarly to BlockJacobi; see
https://dealii.org/current/doxygen/deal.II/step_17.html#ElasticProblemsolve
There are several tutorials that use PETSc: steps 17, 18, 40, 50, and 55.
We also have tests that use PreconditionILU, e.g.
https://github.com/dealii/dealii/blob/master/tests/petsc/solver_03_precondition_ilu.cc

Bruno
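
A minimal sketch of what that looks like in the solve() function posted
earlier in the thread (AdditionalData left at its defaults; note that, as the
2022-03-14 exchange at the top of this thread shows, PETSc's own ILU
factorization is only available for sequential matrices, so with a parallel
mpiaij matrix this constructor throws and one falls back to, e.g.,
PreconditionBlockJacobi):

   PETScWrappers::PreconditionILU::AdditionalData ilu_data; // defaults
   PETScWrappers::PreconditionILU preconditioner(system_matrix, ilu_data);
   SolverControl cn(dof_handler.n_dofs(), 1e-8 * system_rhs.l2_norm());
   PETScWrappers::SolverGMRES solver(cn, mpi_communicator);
   solver.solve(system_matrix, completely_distributed_solution,
                system_rhs, preconditioner);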

Le jeu. 10 mars 2022 à 10:19, Hermes Sampedro  a
écrit :

> Dear Bruno,
>
> Thank you very much, I will try this.
> The last question, if it is not too much to ask, is about the matrix
> argument of the PreconditionILU constructor:
>
> PETScWrappers::PreconditionILU::PreconditionILU(
>   const MatrixBase &matrix,
>   const AdditionalData &additional_data = AdditionalData())
>
>  Is it correct in this case to use my system_matrix? Is there any example
> in the tutorials about this? I can not find any.
>
> Thank you very much
> Regards,
> H.
>
> El jueves, 10 de marzo de 2022 a las 15:29:59 UTC+1, bruno.t...@gmail.com
> escribió:
>
>> If your matrix is symmetric positive definite, you use CG.
>> Otherwise, you use GMRES.
>> Here is the page for ILU.
>>
>> Bruno
>>
>> Le jeu. 10 mars 2022 à 09:18, Hermes Sampedro  a
>> écrit :
>> >
>> > Thank you for your suggestions. Could you please suggest which
>> > functions work well for a Krylov solver? I cannot see examples.
>> > My actual code is implemented using PETSc (for the sparse matrix, solver,
>> > etc.). I can see that SLEPcWrappers::SolverKrylovSchur allows PETSc matrices.
>> >
>> >
>> > Thank you again
>> >
>> > El jueves, 10 de marzo de 2022 a las 15:12:19 UTC+1,
>> bruno.t...@gmail.com escribió:
>> >>
>> >> Hermes,
>> >>
>> >> I think Cuthill-McKee only works on symmetric matrices, is your matrix
>> >> symmetric? Also, the goal of Cuthill-McKee is to help with the fill in
>> >> of the matrix. There is no guarantee that it helps with the
>> >> performance. If you don't know which preconditioner to use, you can
>> >> use ILU (Incomplete LU decomposition). Basically, you use a direct
>> >> solver but you drop all the "small" entries in the matrix. It's not
>> >> the best preconditioner but you can control how much time you spend in
>> >> the "direct solver". The problem with direct solvers is that there is
>> >> not much you can do to speed them up. In practice, everybody uses
>> >> Krylov solvers because of the problems you are encountering now.
>> >>
>> >> Best,
>> >>
>> >> Bruno
>> >>
>> >> Le jeu. 10 mars 2022 à 09:00, Hermes Sampedro
>> >>  a écrit :
>> >> >
>> >> > Hi Bruno,
>> >> >
>> >> > Yes, for now, I have to use a direct solver due to the
>> preconditioner.
>> >> > I am experiencing long computational times with the solver function.
>> I am trying to use DoFRenumbering::Cuthill_McKee(dof_handler),
>> DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
>> >> > but I get even higher computational times. Am I doing something
>> wrong?
>> >> >
>> >> > In the setup_system() function I do:
>> >> > dof_handler.distribute_dofs(fe);
>> >> > DoFRenumbering::Cuthill_McKee(dof_handler);
>> >> >
>> >> > Then the solver is
>> >> > void LaplaceProblem::solve()
>> >> > {
>> >> > PETScWrappers::MPI::Vector
>> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
>> >> > SolverControl cn;
>> >> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
>> >> > solver.solve(system_matrix, completely_distributed_solution,
>> system_rhs);
>> >> > constraints.distribute(completely_distributed_solution);
>> >> > locally_relevant_solution = completely_distributed_solution;
>> >> > }
>> >> >
>> >> > Thank you
>> >> > Regards,
>> >> > H
>> >> >
>> >> > El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1,
>> bruno.t...@gmail.com escribió:
>> >> >>
>> >> >> Hermes,
>> >> >>
>> >> >> For large systems, Krylov solvers are faster and require less memory
>> >> >> than direct solvers. Direct solvers scale poorly, in terms of memory
>> >> >> and performance, with the number of unknowns. The only problem with
>> >> >> Krylov solvers is that you need to use a good preconditioner. The
>> >> >> choice of the preconditioner depends on the system that you want to
>> >> >> solve.
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Bruno
>> >> >>
>> >> >> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
>> >> >>  a écrit :
>> >> >> >
>> >> >> > Dear Bruno,
>> >> >> >

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Hermes Sampedro
Dear Bruno, 

Thank you very much, I will try this.
The last question, if it is not too much to ask, is about the matrix
argument of the PreconditionILU constructor:

PETScWrappers::PreconditionILU::PreconditionILU(
  const MatrixBase &matrix,
  const AdditionalData &additional_data = AdditionalData())

 Is it correct in this case to use my system_matrix? Is there any example 
in the tutorials about this? I can not find any.

Thank you very much
Regards, 
H.

El jueves, 10 de marzo de 2022 a las 15:29:59 UTC+1, bruno.t...@gmail.com 
escribió:

> If your matrix is symmetric positive definite, you use CG.
> Otherwise, you use GMRES.
> Here is the page for ILU.
>
> Bruno
>
> Le jeu. 10 mars 2022 à 09:18, Hermes Sampedro  a 
> écrit :
> >
> > Thank you for your suggestions. Could you please suggest which
> > functions work well for a Krylov solver? I cannot see examples.
> > My actual code is implemented using PETSc (for the sparse matrix, solver,
> > etc.). I can see that SLEPcWrappers::SolverKrylovSchur allows PETSc matrices.
> >
> >
> > Thank you again
> >
> > El jueves, 10 de marzo de 2022 a las 15:12:19 UTC+1, 
> bruno.t...@gmail.com escribió:
> >>
> >> Hermes,
> >>
> >> I think Cuthill-McKee only works on symmetric matrices, is your matrix
> >> symmetric? Also, the goal of Cuthill-McKee is to help with the fill in
> >> of the matrix. There is no guarantee that it helps with the
> >> performance. If you don't know which preconditioner to use, you can
> >> use ILU (Incomplete LU decomposition). Basically, you use a direct
> >> solver but you drop all the "small" entries in the matrix. It's not
> >> the best preconditioner but you can control how much time you spend in
> >> the "direct solver". The problem with direct solvers is that there is
> >> not much you can do to speed them up. In practice, everybody uses
> >> Krylov solvers because of the problems you are encountering now.
> >>
> >> Best,
> >>
> >> Bruno
> >>
> >> Le jeu. 10 mars 2022 à 09:00, Hermes Sampedro
> >>  a écrit :
> >> >
> >> > Hi Bruno,
> >> >
> >> > Yes, for now, I have to use a direct solver due to the preconditioner.
> >> > I am experiencing long computational times with the solver function. 
> I am trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
> DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
> >> > but I get even higher computational times. Am I doing something wrong?
> >> >
> >> > In the setup_system() function I do:
> >> > dof_handler.distribute_dofs(fe);
> >> > DoFRenumbering::Cuthill_McKee(dof_handler);
> >> >
> >> > Then the solver is
> >> > void LaplaceProblem::solve()
> >> > {
> >> > PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> >> > SolverControl cn;
> >> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> >> > solver.solve(system_matrix, completely_distributed_solution, 
> system_rhs);
> >> > constraints.distribute(completely_distributed_solution);
> >> > locally_relevant_solution = completely_distributed_solution;
> >> > }
> >> >
> >> > Thank you
> >> > Regards,
> >> > H
> >> >
> >> > El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1, 
> bruno.t...@gmail.com escribió:
> >> >>
> >> >> Hermes,
> >> >>
> >> >> For large systems, Krylov solvers are faster and require less memory
> >> >> than direct solvers. Direct solvers scale poorly, in terms of memory
> >> >> and performance, with the number of unknowns. The only problem with
> >> >> Krylov solvers is that you need to use a good preconditioner. The
> >> >> choice of the preconditioner depends on the system that you want to
> >> >> solve.
> >> >>
> >> >> Best,
> >> >>
> >> >> Bruno
> >> >>
> >> >> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
> >> >>  a écrit :
> >> >> >
> >> >> > Dear Bruno,
> >> >> >
> >> >> > Thank you again for your answer.
> >> >> >
> >> >> > I managed to solve now a system of 3.5 million DOF using the same 
> solver as I posted above, SparseDirectMUMPS. Now, in release mode, the 
> assembling takes a few minutes instead of hours, however, the solver 
> function takes approximately 1.5h (per frequency iteration) using 40 
> processes in parallel (similar to step-40).
> >> >> >
> >> >> > I was expecting to get faster performance when running in parallel 
> with 40 processes, especially because I need to run for several 
> frequencies. I would like to ask if you also would expect faster 
> performance. Would that be 

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Bruno Turcksin
If your matrix is symmetric positive definite, you use CG.
Otherwise, you use GMRES.
Here is the page for ILU.


Bruno
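
In the deal.II PETSc wrappers that choice looks roughly like the sketch below
(system_matrix, system_rhs and mpi_communicator follow the code posted
earlier in the thread; whether the matrix is symmetric positive definite
depends on the poster's formulation and is assumed to be known):

   SolverControl cn(dof_handler.n_dofs(), 1e-8 * system_rhs.l2_norm());

   // Symmetric positive definite system: conjugate gradients.
   PETScWrappers::SolverCG cg(cn, mpi_communicator);

   // General (non-symmetric or indefinite) system: GMRES.
   PETScWrappers::SolverGMRES gmres(cn, mpi_communicator);

   // Either one is used the same way, together with a preconditioner:
   //   solver.solve(system_matrix, solution, system_rhs, preconditioner);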

Le jeu. 10 mars 2022 à 09:18, Hermes Sampedro  a
écrit :
>
> Thank you for your suggestions. Could you please suggest which functions
> work well for a Krylov solver? I cannot see examples.
> My actual code is implemented using PETSc (for the sparse matrix, solver, etc.).
> I can see that SLEPcWrappers::SolverKrylovSchur allows PETSc matrices.
>
>
> Thank you again
>
> El jueves, 10 de marzo de 2022 a las 15:12:19 UTC+1, bruno.t...@gmail.com
escribió:
>>
>> Hermes,
>>
>> I think Cuthill-McKee only works on symmetric matrices, is your matrix
>> symmetric? Also, the goal of Cuthill-McKee is to help with the fill in
>> of the matrix. There is no guarantee that it helps with the
>> performance. If you don't know which preconditioner to use, you can
>> use ILU (Incomplete LU decomposition). Basically, you use a direct
>> solver but you drop all the "small" entries in the matrix. It's not
>> the best preconditioner but you can control how much time you spend in
>> the "direct solver". The problem with direct solvers is that there is
>> not much you can do to speed them up. In practice, everybody uses
>> Krylov solvers because of the problems you are encountering now.
>>
>> Best,
>>
>> Bruno
>>
>> Le jeu. 10 mars 2022 à 09:00, Hermes Sampedro
>>  a écrit :
>> >
>> > Hi Bruno,
>> >
>> > Yes, for now, I have to use a direct solver due to the preconditioner.
>> > I am experiencing long computational times with the solver function. I
am trying to use DoFRenumbering::Cuthill_McKee(dof_handler),
DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
>> > but I get even higher computational times. Am I doing something wrong?
>> >
>> > In the setup_system() function I do:
>> > dof_handler.distribute_dofs(fe);
>> > DoFRenumbering::Cuthill_McKee(dof_handler);
>> >
>> > Then the solver is
>> > void LaplaceProblem::solve()
>> > {
>> > PETScWrappers::MPI::Vector
completely_distributed_solution(locally_owned_dofs,mpi_communicator);
>> > SolverControl cn;
>> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
>> > solver.solve(system_matrix, completely_distributed_solution,
system_rhs);
>> > constraints.distribute(completely_distributed_solution);
>> > locally_relevant_solution = completely_distributed_solution;
>> > }
>> >
>> > Thank you
>> > Regards,
>> > H
>> >
>> > El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1,
bruno.t...@gmail.com escribió:
>> >>
>> >> Hermes,
>> >>
>> >> For large systems, Krylov solvers are faster and require less memory
>> >> than direct solvers. Direct solvers scale poorly, in terms of memory
>> >> and performance, with the number of unknowns. The only problem with
>> >> Krylov solvers is that you need to use a good preconditioner. The
>> >> choice of the preconditioner depends on the system that you want to
>> >> solve.
>> >>
>> >> Best,
>> >>
>> >> Bruno
>> >>
>> >> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
>> >>  a écrit :
>> >> >
>> >> > Dear Bruno,
>> >> >
>> >> > Thank you again for your answer.
>> >> >
>> >> > I managed to solve now a system of 3.5 million DOF using the same
solver as I posted above, SparseDirectMUMPS. Now, in release mode, the
assembling takes a few minutes instead of hours, however, the solver
function takes approximately 1.5h (per frequency iteration) using 40
processes in parallel (similar to step-40).
>> >> >
>> >> > I was expecting to get faster performance when running in parallel
with 40 processes, especially because I need to run for several
frequencies. I would like to ask if you also would expect faster
performance. Would that be solved using the solver that you suggested
(Krylov)?
>> >> >
>> >> >
>> >> > Thank you
>> >> >
>> >> > Regards,
>> >> >
>> >> > H
>> >> >
>> >> >
>> >> > El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1,
bruno.t...@gmail.com escribió:
>> >> >>
>> >> >> Hermes,
>> >> >>
>> >> >> The problem is that you are using a direct solver. Direct solvers
>> >> >> require a lot of memory because the inverse of a sparse matrix is
>> >> >> generally not sparse. If you use a LU decomposition, which I think
>> >> >> MUMPS does, you need a dense matrix to store the LU decomposition.
>> >> That's a lot of memory! You will need to use a Krylov solver for a
>> >> >> problem of this size.
>> >> >>
>> >> >> Best,
>> >> >>
>> >> >> Bruno
>> >> >>
>> >> >> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro 
a écrit :
>> >> >> >
>> >> >> > Dear Bruno,
>> >> >> >
>> >> >> > Thank you very much for the comments. The problem was that I was
running in Debug mode without knowing. Now, after changing to Release the
assembling time is considerably reduced.
>> >> >> >
>> >> >> > 

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Hermes Sampedro
Thank you for your suggestions. Could you please suggest which functions
work well for a Krylov solver? I cannot see examples.
My actual code is implemented using PETSc (for the sparse matrix, solver, etc.).
I can see that SLEPcWrappers::SolverKrylovSchur allows PETSc matrices.


Thank you again

El jueves, 10 de marzo de 2022 a las 15:12:19 UTC+1, bruno.t...@gmail.com 
escribió:

> Hermes,
>
> I think Cuthill-McKee only works on symmetric matrices, is your matrix
> symmetric? Also, the goal of Cuthill-McKee is to help with the fill in
> of the matrix. There is no guarantee that it helps with the
> performance. If you don't know which preconditioner to use, you can
> use ILU (Incomplete LU decomposition). Basically, you use a direct
> solver but you drop all the "small" entries in the matrix. It's not
> the best preconditioner but you can control how much time you spend in
> the "direct solver". The problem with direct solvers is that there is
> not much you can do to speed them up. In practice, everybody uses
> Krylov solvers because of the problems you are encountering now.
>
> Best,
>
> Bruno
>
> Le jeu. 10 mars 2022 à 09:00, Hermes Sampedro
>  a écrit :
> >
> > Hi Bruno,
> >
> > Yes, for now, I have to use a direct solver due to the preconditioner.
> > I am experiencing long computational times with the solver function. I 
> am trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
> DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
> > but I get even higher computational times. Am I doing something wrong?
> >
> > In the setup_system() function I do:
> > dof_handler.distribute_dofs(fe);
> > DoFRenumbering::Cuthill_McKee(dof_handler);
> >
> > Then the solver is
> > void LaplaceProblem::solve()
> > {
> > PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> > SolverControl cn;
> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> > solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> > constraints.distribute(completely_distributed_solution);
> > locally_relevant_solution = completely_distributed_solution;
> > }
> >
> > Thank you
> > Regards,
> > H
> >
> > El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1, 
> bruno.t...@gmail.com escribió:
> >>
> >> Hermes,
> >>
> >> For large systems, Krylov solvers are faster and require less memory
> >> than direct solvers. Direct solvers scale poorly, in terms of memory
> >> and performance, with the number of unknowns. The only problem with
> >> Krylov solvers is that you need to use a good preconditioner. The
> >> choice of the preconditioner depends on the system that you want to
> >> solve.
> >>
> >> Best,
> >>
> >> Bruno
> >>
> >> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
> >>  a écrit :
> >> >
> >> > Dear Bruno,
> >> >
> >> > Thank you again for your answer.
> >> >
> >> > I managed to solve now a system of 3.5 million DOF using the same 
> solver as I posted above, SparseDirectMUMPS. Now, in release mode, the 
> assembling takes a few minutes instead of hours, however, the solver 
> function takes approximately 1.5h (per frequency iteration) using 40 
> processes in parallel (similar to step-40).
> >> >
> >> > I was expecting to get faster performance when running in parallel 
> with 40 processes, especially because I need to run for several 
> frequencies. I would like to ask if you also would expect faster 
> performance. Would that be solved using the solver that you suggested 
> (Krylov)?
> >> >
> >> >
> >> > Thank you
> >> >
> >> > Regards,
> >> >
> >> > H
> >> >
> >> >
> >> > El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1, 
> bruno.t...@gmail.com escribió:
> >> >>
> >> >> Hermes,
> >> >>
> >> >> The problem is that you are using a direct solver. Direct solvers
> >> >> require a lot of memory because the inverse of a sparse matrix is
> >> >> generally not sparse. If you use a LU decomposition, which I think
> >> >> MUMPS does, you need a dense matrix to store the LU decomposition.
> >> >> That's a lot of memory! You will need to use a Krylov solver for a
> >> >> problem of this size.
> >> >>
> >> >> Best,
> >> >>
> >> >> Bruno
> >> >>
> >> >> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  
> a écrit :
> >> >> >
> >> >> > Dear Bruno,
> >> >> >
> >> >> > Thank you very much for the comments. The problem was that I was 
> running in Debug mode without knowing. Now, after changing to Release the 
> assembling time is considerably reduced.
> >> >> >
> >> >> > Moreover, I am experiencing another issue that I would like to 
> ask. My mesh is done with hyper_cube() in 3D and 5 refinements. The dof is 
> around 3 million. When running, I always get a memory issue and the program 
> stops. I realized that the problem is in the line that executes 
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> >> >> > I am using SparseMatrix and I do not fully understand where the 
> problem could come from. The matrices are initialized 

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Bruno Turcksin
Hermes,

I think Cuthill-McKee only works on symmetric matrices; is your matrix
symmetric? Also, the goal of Cuthill-McKee is to help with the fill-in
of the matrix. There is no guarantee that it helps with the
performance. If you don't know which preconditioner to use, you can
use ILU (Incomplete LU decomposition). Basically, you use a direct
solver but you drop all the "small" entries in the matrix. It's not
the best preconditioner but you can control how much time you spend in
the "direct solver". The problem with direct solvers is that there is
not much you can do to speed them up. In practice, everybody uses
Krylov solvers because of the problems you are encountering now.

Best,

Bruno
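
How much time is spent in this "incomplete direct solve" can be tuned through
the preconditioner's AdditionalData. A minimal sketch, assuming the fill-level
argument of PETScWrappers::PreconditionILU::AdditionalData (0 keeps only the
sparsity pattern of the matrix; larger values keep more fill and behave more
like a full LU). With PETSc this applies as-is only to sequential matrices;
in parallel the same idea is used per block through PreconditionBlockJacobi,
as in the 2022-03-14 exchange at the top of this thread:

   // ILU(1): keep one extra level of fill beyond the original sparsity pattern.
   PETScWrappers::PreconditionILU::AdditionalData ilu_data(1);
   PETScWrappers::PreconditionILU preconditioner(system_matrix, ilu_data);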

Le jeu. 10 mars 2022 à 09:00, Hermes Sampedro
 a écrit :
>
> Hi Bruno,
>
> Yes, for now, I have to use a direct solver due to the preconditioner.
> I am experiencing long computational times with the solver function.  I am 
> trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
> DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false)
> but I get even higher computational times. Am I doing something wrong?
>
> In the setup_system() function I do:
> dof_handler.distribute_dofs(fe);
> DoFRenumbering::Cuthill_McKee(dof_handler);
>
> Then the solver is
> void LaplaceProblem::solve()
> {
> PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> SolverControl cn;
> PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> constraints.distribute(completely_distributed_solution);
> locally_relevant_solution = completely_distributed_solution;
> }
>
> Thank you
> Regards,
> H
>
> El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1, bruno.t...@gmail.com 
> escribió:
>>
>> Hermes,
>>
>> For large systems, Krylov solvers are faster and require less memory
>> than direct solvers. Direct solvers scale poorly, in terms of memory
>> and performance, with the number of unknowns. The only problem with
>> Krylov solvers is that you need to use a good preconditioner. The
>> choice of the preconditioner depends on the system that you want to
>> solve.
>>
>> Best,
>>
>> Bruno
>>
>> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
>>  a écrit :
>> >
>> > Dear Bruno,
>> >
>> > Thank you again for your answer.
>> >
>> > I managed to solve now a system of 3.5 million DOF using the same solver 
>> > as I posted above, SparseDirectMUMPS. Now, in release mode, the assembling 
>> > takes a few minutes instead of hours, however, the solver function takes 
>> > approximately 1.5h (per frequency iteration) using 40 processes in 
>> > parallel (similar to step-40).
>> >
>> > I was expecting to get faster performance when running in parallel with 40 
>> > processes, especially because I need to run for several frequencies. I 
>> > would like to ask if you also would expect faster performance. Would that 
>> > be solved using the solver that you suggested (Krylov)?
>> >
>> >
>> > Thank you
>> >
>> > Regards,
>> >
>> > H
>> >
>> >
>> > El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1, bruno.t...@gmail.com 
>> > escribió:
>> >>
>> >> Hermes,
>> >>
>> >> The problem is that you are using a direct solver. Direct solvers
>> >> require a lot of memory because the inverse of a sparse matrix is
>> >> generally not sparse. If you use a LU decomposition, which I think
>> >> MUMPS does, you need a dense matrix to store the LU decomposition.
>> >> That's a lot of memory! You will need to use a Krylov solver for a
>> >> problem of this size.
>> >>
>> >> Best,
>> >>
>> >> Bruno
>> >>
>> >> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  a 
>> >> écrit :
>> >> >
>> >> > Dear Bruno,
>> >> >
>> >> > Thank you very much for the comments. The problem was that I was 
>> >> > running in Debug mode without knowing. Now, after changing to Release 
>> >> > the assembling time is considerably reduced.
>> >> >
>> >> > Moreover, I am experiencing another issue that I would like to ask. My 
>> >> > mesh is done with hyper_cube() in 3D and 5 refinements. The dof is 
>> >> > around 3 million. When running, I always get a memory issue and the 
>> >> > program stops. I realized that the problem is in the line that executes 
>> >> > solver.solve(system_matrix, completely_distributed_solution, 
>> >> > system_rhs);
>> >> > I am using SparseMatrix and I do not fully understand where the problem 
>> >> > could come from. The matrices are initialized beforehand; what do you
>> >> > think could be producing a memory issue in the solver?
>> >> >
>> >> > Below is the full solver function:
>> >> >
>> >> > template 
>> >> > void LaplaceProblem::solve()
>> >> > {
>> >> > PETScWrappers::MPI::Vector 
>> >> > completely_distributed_solution(locally_owned_dofs,mpi_communicator);
>> >> > SolverControl cn;
>> >> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
>> >> > solver.solve(system_matrix, completely_distributed_solution, 
>> >> > 

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Hermes Sampedro
Hi Bruno, 

Yes, for now, I have to use a direct solver due to the preconditioner. 
I am experiencing long computational times with the solver function.  I am 
trying to use DoFRenumbering::Cuthill_McKee(dof_handler), 
DoFRenumbering::boost::Cuthill_McKee(dof_handler,false,false) 

but I get even higher computational times. Am I doing something wrong?

In the setup_system() function I do: 
dof_handler.distribute_dofs(fe);
DoFRenumbering::Cuthill_McKee(dof_handler);

Then the solver is
void LaplaceProblem::solve()
{
  PETScWrappers::MPI::Vector
    completely_distributed_solution(locally_owned_dofs, mpi_communicator);
  SolverControl cn;
  PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
  solver.solve(system_matrix, completely_distributed_solution, system_rhs);
  constraints.distribute(completely_distributed_solution);
  locally_relevant_solution = completely_distributed_solution;
}

Thank you
Regards, 
H

El jueves, 10 de marzo de 2022 a las 14:54:13 UTC+1, bruno.t...@gmail.com 
escribió:

> Hermes,
>
> For large systems, Krylov solvers are faster and require less memory
> than direct solvers. Direct solvers scale poorly, in terms of memory
> and performance, with the number of unknowns. The only problem with
> Krylov solvers is that you need to use a good preconditioner. The
> choice of the preconditioner depends on the system that you want to
> solve.
>
> Best,
>
> Bruno
>
> Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
>  a écrit :
> >
> > Dear Bruno,
> >
> > Thank you again for your answer.
> >
> > I managed to solve now a system of 3.5 million DOF using the same solver 
> as I posted above, SparseDirectMUMPS. Now, in release mode, the assembling 
> takes a few minutes instead of hours, however, the solver function takes 
> approximately 1.5h (per frequency iteration) using 40 processes in parallel 
> (similar to step-40).
> >
> > I was expecting to get faster performance when running in parallel with 
> 40 processes, especially because I need to run for several frequencies. I 
> would like to ask if you also would expect faster performance. Would that 
> be solved using the solver that you suggested (Krylov)?
> >
> >
> > Thank you
> >
> > Regards,
> >
> > H
> >
> >
> > El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1, bruno.t...@gmail.com 
> escribió:
> >>
> >> Hermes,
> >>
> >> The problem is that you are using a direct solver. Direct solvers
> >> require a lot of memory because the inverse of a sparse matrix is
> >> generally not sparse. If you use a LU decomposition, which I think
> >> MUMPS does, you need a dense matrix to store the LU decomposition.
> >> That's a lot of memory! You will need to use a Krylov solver for a
> >> problem of this size.
> >>
> >> Best,
> >>
> >> Bruno
> >>
> >> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  a 
> écrit :
> >> >
> >> > Dear Bruno,
> >> >
> >> > Thank you very much for the comments. The problem was that I was 
> running in Debug mode without knowing. Now, after changing to Release the 
> assembling time is considerably reduced.
> >> >
> >> > Moreover, I am experiencing another issue that I would like to ask. 
> My mesh is done with hyper_cube() in 3D and 5 refinements. The dof is 
> around 3 million. When running, I always get a memory issue and the program 
> stops. I realized that the problem is in the line that executes 
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> >> > I am using SparseMatrix and I do not fully understand where the 
> problem could come from. The matrices are initialized beforehand; what do
> you think could be producing a memory issue in the solver?
> >> >
> >> > Below is the full solver function:
> >> >
> >> > template 
> >> > void LaplaceProblem::solve()
> >> > {
> >> > PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> >> > SolverControl cn;
> >> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> >> > solver.solve(system_matrix, completely_distributed_solution, 
> system_rhs);
> >> > constraints.distribute(completely_distributed_solution);
> >> > locally_relevant_solution = completely_distributed_solution;
> >> > }
> >> >
> >> >
> >> > Thank you again for your help
> >> > Regards
> >> > H.
> >> >
> >> > El jueves, 3 de marzo de 2022 a las 15:13:30 UTC+1, 
> bruno.t...@gmail.com escribió:
> >> >>
> >> >> Hermes,
> >> >>
> >> >> There are a couple of things that you could do but it probably won't 
> give you a significant speed up. Are you sure that you are running in 
> Release mode and not in Debug? Do you evaluate complicated functions in the 
> assembly?
> >> >> A couple changes that could help:
> >> >> - don't use fe.system_to_component_index(i).first and 
> fe.system_to_component_index(j).first everywhere. Just define const k = ... 
> and const m = ... and use k and m. That might help the compiler with some 
> optimizations
> >> >> - move the two if for the cell assembly outside the for loop on the 

Re: [deal.II] Re: Assemble function, long time

2022-03-10 Thread Bruno Turcksin
Hermes,

For large systems, Krylov solvers are faster and require less memory
than direct solvers. Direct solvers scale poorly, in terms of memory
and performance, with the number of unknowns. The only problem with
Krylov solvers is that you need to use a good preconditioner. The
choice of the preconditioner depends on the system that you want to
solve.

Best,

Bruno

Le jeu. 10 mars 2022 à 02:51, Hermes Sampedro
 a écrit :
>
> Dear Bruno,
>
> Thank you again for your answer.
>
> I managed to solve now a system of 3.5 million DOF using the same solver as I 
> posted above, SparseDirectMUMPS. Now, in release mode, the assembling takes a 
> few minutes instead of hours, however,  the solver function takes 
> approximately 1.5h (per frequency iteration) using 40 processes in parallel 
> (similar to step-40).
>
> I was expecting to get faster performance when running in parallel with 40 
> processes, especially because I need to run for several frequencies.  I would 
> like to ask if you also would expect faster performance. Would that be solved 
> using the solver that you suggested (Krylov)?
>
>
> Thank you
>
> Regards,
>
> H
>
>
> El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1, bruno.t...@gmail.com 
> escribió:
>>
>> Hermes,
>>
>> The problem is that you are using a direct solver. Direct solvers
>> require a lot of memory because the inverse of a sparse matrix is
>> generally not sparse. If you use a LU decomposition, which I think
>> MUMPS does, you need a dense matrix to store the LU decomposition.
>> That's a lot of memory! You will need to use a Krylov solver for a
>> problem of this size.
>>
>> Best,
>>
>> Bruno
>>
>> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  a écrit :
>> >
>> > Dear Bruno,
>> >
>> > Thank you very much for the comments. The problem was that I was running 
>> > in Debug mode without knowing. Now, after changing to Release the 
>> > assembling time is considerably reduced.
>> >
>> > Moreover, I am experiencing another issue that I would like to ask. My 
>> > mesh is done with hyper_cube() in 3D and 5 refinements. The dof is around 
>> > 3 million. When running, I always get a memory issue and the program 
>> > stops. I realized that the problem is in the line that executes 
>> > solver.solve(system_matrix, completely_distributed_solution, system_rhs);
>> > I am using SparseMatrix and I do not fully understand where the problem 
>> > could come from. The matrices are initialized beforehand; what do you
>> > think could be producing a memory issue in the solver?
>> >
>> > Below is the full solver function:
>> >
>> > template 
>> > void LaplaceProblem::solve()
>> > {
>> > PETScWrappers::MPI::Vector 
>> > completely_distributed_solution(locally_owned_dofs,mpi_communicator);
>> > SolverControl cn;
>> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
>> > solver.solve(system_matrix, completely_distributed_solution, system_rhs);
>> > constraints.distribute(completely_distributed_solution);
>> > locally_relevant_solution = completely_distributed_solution;
>> > }
>> >
>> >
>> > Thank you again for your help
>> > Regards
>> > H.
>> >
>> > El jueves, 3 de marzo de 2022 a las 15:13:30 UTC+1, bruno.t...@gmail.com 
>> > escribió:
>> >>
>> >> Hermes,
>> >>
>> >> There are a couple of things that you could do but it probably won't give 
>> >> you a significant speed up. Are you sure that you are running in Release 
>> >> mode and not in Debug? Do you evaluate complicated functions in the 
>> >> assembly?
>> >> A couple changes that could help:
>> >> - don't use fe.system_to_component_index(i).first and 
>> >> fe.system_to_component_index(j).first everywhere. Just define const k = 
>> >> ... and const m = ... and use k and m. That might help the compiler with 
>> >> some optimizations
>> >> - move the two if for the cell assembly outside the for loop on the 
>> >> quadrature point, similar to what you did for the boundaries. This could 
>> >> potentially help quite a bit if the cpu often gets the branch prediction 
>> >> wrong
>> >>
>> >> Best,
>> >>
>> >> Bruno
>> >>
>> >> On Thursday, March 3, 2022 at 4:31:04 AM UTC-5 hermes...@gmail.com wrote:
>> >>>
>> >>> Dear all,
>> >>>
>> >>> I am experiencing long times when computing the assembling and I would 
>> >>> like to ask if this is common or there is something wrong with my 
>> >>> implementation.
>> >>>
>> >>> My model is built in a similar way as step-29 and step-40 (using complex 
>> >>> values and solving with a direct solver using distributed parallel 
>> >>> implementation).
>> >>> Now I am running larger systems with 3.5million dof and the assembling 
>> >>> took 16h, while the solver function took much less.
>> >>>
>> >>> I can show the structure of my assembly_system() function to ask if 
>> >>> there is something that can be done in order to speed up the process:
>> >>>
>> >>> void Problem::assemble_system()
>> >>> {
>> >>> for (unsigned int i = 0; i < dofs_per_cell; ++i) {
>> >>> for (unsigned int j = 0; j < 

Re: [deal.II] Re: Assemble function, long time

2022-03-09 Thread Hermes Sampedro
 

Dear Bruno,

Thank you again for your answer. 

I managed to solve now a system of 3.5 million DOF using the same solver as 
I posted above, SparseDirectMUMPS. Now, in release mode, the assembling 
takes a few minutes instead of hours, however,  the solver function takes 
approximately 1.5h (per frequency iteration) using 40 processes in parallel 
(similar to step-40). 

I was expecting to get faster performance when running in parallel with 40 
processes, especially because I need to run for several frequencies.  I 
would like to ask if you also would expect faster performance. Would that 
be solved using the solver that you suggested (Krylov)?


Thank you

Regards, 

H

El lunes, 7 de marzo de 2022 a las 15:04:19 UTC+1, bruno.t...@gmail.com 
escribió:

> Hermes,
>
> The problem is that you are using a direct solver. Direct solvers
> require a lot of memory because the inverse of a sparse matrix is
> generally not sparse. If you use a LU decomposition, which I think
> MUMPS does, you need a dense matrix to store the LU decomposition.
> That's a lot of memory! You will need to use a Krylov solver for a
> problem of this size.
>
> Best,
>
> Bruno
>
> Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  a 
> écrit :
> >
> > Dear Bruno,
> >
> > Thank you very much for the comments. The problem was that I was running 
> in Debug mode without knowing. Now, after changing to Release the 
> assembling time is considerably reduced.
> >
> > Moreover, I am experiencing another issue that I would like to ask. My 
> mesh is done with hyper_cube() in 3D and 5 refinements. The dof is around 3 
> million. When running, I always get a memory issue and the program stops. I 
> realized that the problem is in the line that executes 
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> > I am using SparseMatrix and I do not fully understand where the problem 
> could come from. The matrices are initialized beforehand; what do you
> think could be producing a memory issue in the solver?
> >
> > Below is the full solver function:
> >
> > template 
> > void LaplaceProblem::solve()
> > {
> > PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> > SolverControl cn;
> > PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> > solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> > constraints.distribute(completely_distributed_solution);
> > locally_relevant_solution = completely_distributed_solution;
> > }
> >
> >
> > Thank you again for your help
> > Regards
> > H.
> >
> > El jueves, 3 de marzo de 2022 a las 15:13:30 UTC+1, bruno.t...@gmail.com 
> escribió:
> >>
> >> Hermes,
> >>
> >> There are a couple of things that you could do but it probably won't 
> give you a significant speed up. Are you sure that you are running in 
> Release mode and not in Debug? Do you evaluate complicated functions in the 
> assembly?
> >> A couple changes that could help:
> >> - don't use fe.system_to_component_index(i).first and 
> fe.system_to_component_index(j).first everywhere. Just define const k = ... 
> and const m = ... and use k and m. That might help the compiler with some 
> optimizations
> >> - move the two if for the cell assembly outside the for loop on the 
> quadrature point, similar to what you did for the boundaries. This could 
> potentially help quite a bit if the cpu often gets the branch prediction 
> wrong
> >>
> >> Best,
> >>
> >> Bruno
> >>
> >> On Thursday, March 3, 2022 at 4:31:04 AM UTC-5 hermes...@gmail.com 
> wrote:
> >>>
> >>> Dear all,
> >>>
> >>> I am experiencing long times when computing the assembling and I would 
> like to ask if this is common or there is something wrong with my 
> implementation.
> >>>
> >>> My model is built in a similar way as step-29 and step-40 (using 
> complex values and solving with a direct solver using distributed parallel 
> implementation).
> >>> Now I am running larger systems with 3.5million dof and the assembling 
> took 16h, while the solver function took much less.
> >>>
> >>> I can show the structure of my assembly_system() function to ask if 
> there is something that can be done in order to speed up the process:
> >>>
> >>> void Problem::assemble_system()
> >>> {
> >>> for (unsigned int i = 0; i < dofs_per_cell; ++i) {
> >>> for (unsigned int j = 0; j < dofs_per_cell; ++j)
> >>> {
> >>> for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
> >>> {
> >>> if (fe.system_to_component_index(i).first == 
> fe.system_to_component_index(j).first)
> >>> {
> >>> cell_matrix(i, j) += 
> >>> }
> >>> if (fe.system_to_component_index(i).first != 
> fe.system_to_component_index(j).first)
> >>> {
> >>> cell_matrix(i, j) += 
> >>> }
> >>> }
> >>>
> >>> // Boundaries
> >>> if (fe.system_to_component_index(i).first == 
> fe.system_to_component_index(j).first)
> >>> {
> >>> for (unsigned int face_no : GeometryInfo::face_indices())
> >>> if (cell->face(face_no)->at_boundary() 

Re: [deal.II] Re: Assemble function, long time

2022-03-07 Thread Bruno Turcksin
Hermes,

The problem is that you are using a direct solver. Direct solvers
require a lot of memory because the inverse of a sparse matrix is
generally not sparse. If you use a LU decomposition, which I think
MUMPS does, you need a dense matrix to store the LU decomposition.
That's a lot of memory! You will need to use a Krylov solver for a
problem of this size.

Best,

Bruno
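
A minimal sketch of the switch Bruno suggests, keeping the vectors and
constraint handling from the solve() quoted below. Unlike SparseDirectMUMPS,
an iterative solver needs a real stopping criterion in SolverControl, and it
needs a preconditioner; BlockJacobi is used here only as a simple parallel
default and has to be matched to the actual problem:

   PETScWrappers::MPI::Vector
     completely_distributed_solution(locally_owned_dofs, mpi_communicator);
   SolverControl cn(dof_handler.n_dofs(), 1e-8 * system_rhs.l2_norm());
   PETScWrappers::SolverGMRES solver(cn, mpi_communicator);
   PETScWrappers::PreconditionBlockJacobi preconditioner(system_matrix);
   solver.solve(system_matrix, completely_distributed_solution,
                system_rhs, preconditioner);
   constraints.distribute(completely_distributed_solution);
   locally_relevant_solution = completely_distributed_solution;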

Le dim. 6 mars 2022 à 07:19, Hermes Sampedro  a écrit :
>
> Dear Bruno,
>
> Thank you very much for the comments. The problem was that I was running in 
> Debug mode without knowing. Now, after changing to Release the assembling 
> time is considerably reduced.
>
> Moreover, I am experiencing another issue that I would like to ask. My mesh 
> is done with hyper_cube() in 3D and 5 refinements. The dof is around 3 
> million. When running, I always get a memory issue and the program stops. I 
> realized that the problem is in the line that executes 
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> I am using SparseMatrix and I do not fully understand where the problem could 
> come from. The matrices are initialized beforehand; what do you think could
> be producing a memory issue in the solver?
>
> Below is the full solver function:
>
> template 
> void LaplaceProblem::solve()
> {
> PETScWrappers::MPI::Vector 
> completely_distributed_solution(locally_owned_dofs,mpi_communicator);
> SolverControl cn;
> PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
> solver.solve(system_matrix, completely_distributed_solution, system_rhs);
> constraints.distribute(completely_distributed_solution);
> locally_relevant_solution = completely_distributed_solution;
> }
>
>
> Thank you again for your help
> Regards
> H.
>
> El jueves, 3 de marzo de 2022 a las 15:13:30 UTC+1, bruno.t...@gmail.com 
> escribió:
>>
>> Hermes,
>>
>> There are a couple of things that you could do but it probably won't give you 
>> a significant speed up. Are you sure that you are running in Release mode 
>> and not in Debug? Do you evaluate complicated functions in the assembly?
>> A couple changes that could help:
>> - don't use fe.system_to_component_index(i).first and 
>> fe.system_to_component_index(j).first  everywhere. Just define const k = ... 
>> and const m = ... and use k and m. That might help the compiler with some 
>> optimizations
>> - move the two if for the cell assembly outside the for loop on the 
>> quadrature point, similar to what you did for the boundaries. This could 
>> potentially help quite a bit if the cpu often gets the branch prediction 
>> wrong
>>
>> Best,
>>
>> Bruno
>>
>> On Thursday, March 3, 2022 at 4:31:04 AM UTC-5 hermes...@gmail.com wrote:
>>>
>>> Dear all,
>>>
>>> I am experiencing long times when computing the assembling and I would like 
>>> to ask if this is common or there is something wrong with my implementation.
>>>
>>> My model is built in a similar way as step-29 and step-40 (using complex 
>>> values and solving with a direct solver using distributed parallel 
>>> implementation).
>>> Now I am running larger systems with 3.5million dof and the assembling took 
>>> 16h, while the solver function took much less.
>>>
>>>  I can show the structure of my assembly_system() function to ask if there 
>>> is something that can be done in order to speed up the process:
>>>
>>> void Problem::assemble_system()
>>> {
>>>   for (unsigned int i = 0; i < dofs_per_cell; ++i)
>>>     {
>>>       for (unsigned int j = 0; j < dofs_per_cell; ++j)
>>>         {
>>>           for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
>>>             {
>>>               if (fe.system_to_component_index(i).first ==
>>>                   fe.system_to_component_index(j).first)
>>>                 {
>>>                   cell_matrix(i, j) += 
>>>                 }
>>>               if (fe.system_to_component_index(i).first !=
>>>                   fe.system_to_component_index(j).first)
>>>                 {
>>>                   cell_matrix(i, j) += 
>>>                 }
>>>             }
>>>
>>>           // Boundaries
>>>           if (fe.system_to_component_index(i).first ==
>>>               fe.system_to_component_index(j).first)
>>>             {
>>>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>>>                 if (cell->face(face_no)->at_boundary() &&
>>>                     (cell->face(face_no)->boundary_id() == 0))
>>>                   {
>>>                     fe_face_values.reinit(cell, face_no);
>>>                     for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>>>                       cell_matrix(i, j) += 
>>>                   }
>>>             }
>>>           if (fe.system_to_component_index(i).first !=
>>>               fe.system_to_component_index(j).first)
>>>             {
>>>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>>>                 {
>>>                   if (cell->face(face_no)->at_boundary() &&
>>>                       (cell->face(face_no)->boundary_id() == 0))
>>>                     {
>>>                       fe_face_values.reinit(cell, face_no);
>>>                       for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>>>                         cell_matrix(i, j) += 
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>>
>>> Thank you very much.
>>> Regards,
>>> Hermes
>

[deal.II] Re: Assemble function, long time

2022-03-06 Thread Hermes Sampedro
Dear Bruno, 

Thank you very much for the comments. The problem was that I was running in 
Debug mode without realizing it. Now, after switching to Release, the assembly 
time is considerably reduced.

Moreover, I am experiencing another issue that I would like to ask about. My 
mesh is built with hyper_cube() in 3D and 5 refinements; the number of DoFs 
is around 3 million. When running, I always hit a memory problem and the 
program stops. I realized that the problem is in the line that executes 
solver.solve(system_matrix, completely_distributed_solution, system_rhs);
I am using SparseMatrix and I do not fully understand where the problem 
could come from. The matrices are initialized beforehand, so what do you 
think could produce a memory issue in the solver?

Below is the full solver function:

template <int dim>
void LaplaceProblem<dim>::solve()
{
  PETScWrappers::MPI::Vector
    completely_distributed_solution(locally_owned_dofs, mpi_communicator);
  SolverControl cn;
  PETScWrappers::SparseDirectMUMPS solver(cn, mpi_communicator);
  solver.solve(system_matrix, completely_distributed_solution, system_rhs);
  constraints.distribute(completely_distributed_solution);
  locally_relevant_solution = completely_distributed_solution;
}
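
For reference, I understand that SparseDirectMUMPS computes a complete LU 
factorization of system_matrix, and that in 3D the fill-in of such a 
factorization for roughly 3 million unknowns can need far more memory than 
the assembled sparse matrix itself. If that is the cause, would switching to 
a preconditioned Krylov solver help? A rough sketch of the lines that would 
replace the SolverControl/solver pair above (the GMRES + block-Jacobi 
combination is only a guess on my side, not something I have verified for 
this problem):

  // A Krylov method only needs matrix-vector products, so its memory use
  // typically stays much closer to that of the assembled sparse matrix.
  SolverControl cn(system_rhs.size(), 1e-8 * system_rhs.l2_norm());
  PETScWrappers::SolverGMRES             solver(cn, mpi_communicator);
  PETScWrappers::PreconditionBlockJacobi preconditioner(system_matrix);
  solver.solve(system_matrix, completely_distributed_solution,
               system_rhs, preconditioner);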


Thank you again for your help
Regards
H.

On Thursday, March 3, 2022 at 15:30:13 UTC+1, bruno.t...@gmail.com 
wrote:

> Hermes,
>
> There are a couple of things that you could do, but they probably won't give 
> you a significant speed-up. Are you sure that you are running in Release 
> mode and not in Debug? Do you evaluate complicated functions in the 
> assembly?
> A couple of changes that could help:
> - don't use fe.system_to_component_index(i).first and 
> fe.system_to_component_index(j).first everywhere. Just define const k = ... 
> and const m = ... and use k and m. That might help the compiler with some 
> optimizations.
> - move the two ifs for the cell assembly outside the for loop over the 
> quadrature points, similar to what you did for the boundaries. This could 
> potentially help quite a bit if the CPU often gets the branch prediction 
> wrong. 
>
> Best,
>
> Bruno 
>
> On Thursday, March 3, 2022 at 4:31:04 AM UTC-5 hermes...@gmail.com wrote:
>
>> Dear all, 
>>
>> I am experiencing long assembly times and I would 
>> like to ask whether this is common or whether there is something wrong with my 
>> implementation.
>>
>> My model is built in a similar way to step-29 and step-40 (using complex 
>> values and solving with a direct solver in a distributed parallel 
>> implementation).
>> Now I am running larger systems with 3.5 million DoFs and the assembly 
>> took 16 h, while the solver function took much less.
>>
>> Here is the structure of my assemble_system() function, in case 
>> there is something that can be done to speed up the process:
>>
>>
>> void Problem::assemble_system()
>> {
>>   for (unsigned int i = 0; i < dofs_per_cell; ++i)
>>     {
>>       for (unsigned int j = 0; j < dofs_per_cell; ++j)
>>         {
>>           for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
>>             {
>>               if (fe.system_to_component_index(i).first ==
>>                   fe.system_to_component_index(j).first)
>>                 {
>>                   cell_matrix(i, j) += 
>>                 }
>>               if (fe.system_to_component_index(i).first !=
>>                   fe.system_to_component_index(j).first)
>>                 {
>>                   cell_matrix(i, j) += 
>>                 }
>>             }
>>
>>           // Boundaries
>>           if (fe.system_to_component_index(i).first ==
>>               fe.system_to_component_index(j).first)
>>             {
>>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>>                 if (cell->face(face_no)->at_boundary() &&
>>                     (cell->face(face_no)->boundary_id() == 0))
>>                   {
>>                     fe_face_values.reinit(cell, face_no);
>>                     for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>>                       cell_matrix(i, j) += 
>>                   }
>>             }
>>           if (fe.system_to_component_index(i).first !=
>>               fe.system_to_component_index(j).first)
>>             {
>>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>>                 {
>>                   if (cell->face(face_no)->at_boundary() &&
>>                       (cell->face(face_no)->boundary_id() == 0))
>>                     {
>>                       fe_face_values.reinit(cell, face_no);
>>                       for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>>                         cell_matrix(i, j) += 
>>                     }
>>                 }
>>             }
>>         }
>>     }
>> }
>>
>>
>> Thank you very much.
>> Regards, 
>> Hermes
>>
>

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/d80cbd1c-7d1f-4042-a9e8-e8382a721790n%40googlegroups.com.


[deal.II] Re: Assemble function, long time

2022-03-03 Thread Bruno Turcksin
Hermes,

There are a couple of things that you could do, but they probably won't give 
you a significant speed-up. Are you sure that you are running in Release 
mode and not in Debug? Do you evaluate complicated functions in the 
assembly?
A couple of changes that could help:
- don't use fe.system_to_component_index(i).first and 
fe.system_to_component_index(j).first everywhere. Just define const k = ... 
and const m = ... and use k and m. That might help the compiler with some 
optimizations.
- move the two ifs for the cell assembly outside the for loop over the 
quadrature points, similar to what you did for the boundaries. This could 
potentially help quite a bit if the CPU often gets the branch prediction 
wrong. Both changes are sketched below.
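
A minimal sketch of what those two changes could look like, reusing the 
variable names from your snippet quoted below (the integrands on the 
right-hand side of the += are left out here, just as in your message, so 
this is only a structural sketch rather than compilable code):

for (unsigned int i = 0; i < dofs_per_cell; ++i)
  for (unsigned int j = 0; j < dofs_per_cell; ++j)
    {
      // Look up the component indices once per (i, j) pair.
      const unsigned int comp_i = fe.system_to_component_index(i).first;
      const unsigned int comp_j = fe.system_to_component_index(j).first;

      // Branch once, outside the quadrature loop.
      if (comp_i == comp_j)
        for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
          cell_matrix(i, j) += /* same-component term */;
      else
        for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
          cell_matrix(i, j) += /* mixed-component term */;

      // The two boundary blocks can test comp_i and comp_j in the same way.
    }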

Best,

Bruno 

On Thursday, March 3, 2022 at 4:31:04 AM UTC-5 hermes...@gmail.com wrote:

> Dear all, 
>
> I am experiencing long assembly times and I would 
> like to ask whether this is common or whether there is something wrong with my 
> implementation.
>
> My model is built in a similar way to step-29 and step-40 (using complex 
> values and solving with a direct solver in a distributed parallel 
> implementation).
> Now I am running larger systems with 3.5 million DoFs and the assembly 
> took 16 h, while the solver function took much less.
>
> Here is the structure of my assemble_system() function, in case there 
> is something that can be done to speed up the process:
>
>
> void Problem::assemble_system()
> {
>   for (unsigned int i = 0; i < dofs_per_cell; ++i)
>     {
>       for (unsigned int j = 0; j < dofs_per_cell; ++j)
>         {
>           for (unsigned int q_point = 0; q_point < n_q_points; ++q_point)
>             {
>               if (fe.system_to_component_index(i).first ==
>                   fe.system_to_component_index(j).first)
>                 {
>                   cell_matrix(i, j) += 
>                 }
>               if (fe.system_to_component_index(i).first !=
>                   fe.system_to_component_index(j).first)
>                 {
>                   cell_matrix(i, j) += 
>                 }
>             }
>
>           // Boundaries
>           if (fe.system_to_component_index(i).first ==
>               fe.system_to_component_index(j).first)
>             {
>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>                 if (cell->face(face_no)->at_boundary() &&
>                     (cell->face(face_no)->boundary_id() == 0))
>                   {
>                     fe_face_values.reinit(cell, face_no);
>                     for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>                       cell_matrix(i, j) += 
>                   }
>             }
>           if (fe.system_to_component_index(i).first !=
>               fe.system_to_component_index(j).first)
>             {
>               for (unsigned int face_no : GeometryInfo<dim>::face_indices())
>                 {
>                   if (cell->face(face_no)->at_boundary() &&
>                       (cell->face(face_no)->boundary_id() == 0))
>                     {
>                       fe_face_values.reinit(cell, face_no);
>                       for (unsigned int q_point = 0; q_point < n_face_q_points; ++q_point)
>                         cell_matrix(i, j) += 
>                     }
>                 }
>             }
>         }
>     }
> }
>
>
> Thank you very much.
> Regards, 
> Hermes
>

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/249506f5-8ecb-42d5-b895-735be578d144n%40googlegroups.com.