Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-05 Thread Jean-Paul Pelteret
Dear Bruno,

> Thank you for the rapid and very detailed answer. This makes this community 
> so great.

Bruno T. has given you far more insight into the inner workings of the Trilinos 
linear algebra packages than I’d have been able to (thanks :-) ), so I’m really 
glad that we were able to help you and we appreciate the positive feedback!

Best,
Jean-Paul



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Bruno Turcksin
Bruno,

On Mon, 4 Mar 2019 at 09:21, Bruno Blais wrote:
> Would I see significant performance gain for a GMRES + ILU
> preconditioning setup by going from AztecOO to the Tpetra or Belos stack?

No, I don't think so; you should get very similar performance. On top of
that, deal.II does not support Tpetra matrices right now, so you would have
to make that work yourself.

> I guess that the best parallel performance increase I could get would be
> by using the AMG from ML, but I think this will require some experiments
> before I can set up the parameters correctly. My system is not elliptic and
> can be quite ugly at times due to stabilization.

There is a parameter that you can switch if your system is not elliptic
(see here), but you will probably need to tweak more than this parameter.
If you can start from a coarse mesh that you refine in deal.II, you can use
the geometric multigrid preconditioner in deal.II (see step-56).
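
For concreteness, here is a minimal sketch of where that switch lives when
the ML-based AMG preconditioner is built through the deal.II TrilinosWrappers.
The matrix and vector names and the parameter values are illustrative
assumptions, not settings recommended in this thread:

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

void solve_with_amg(const TrilinosWrappers::SparseMatrix &system_matrix,
                    TrilinosWrappers::MPI::Vector        &solution,
                    const TrilinosWrappers::MPI::Vector  &rhs)
{
  TrilinosWrappers::PreconditionAMG::AdditionalData amg_data;
  amg_data.elliptic              = false;  // the non-elliptic switch mentioned above
  amg_data.higher_order_elements = false;
  amg_data.smoother_sweeps       = 2;      // illustrative value only
  amg_data.aggregation_threshold = 1e-4;   // illustrative value only

  TrilinosWrappers::PreconditionAMG preconditioner;
  preconditioner.initialize(system_matrix, amg_data);

  SolverControl                 solver_control(1000, 1e-10 * rhs.l2_norm());
  TrilinosWrappers::SolverGMRES solver(solver_control);
  solver.solve(system_matrix, solution, rhs, preconditioner);
}

For a stabilized, non-elliptic system you should expect to experiment with
the smoother and aggregation settings beyond what is shown here.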

Best,

Bruno



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Bruno Blais
Dear Bruno,
Thanks, that definitely answers my questions.
I have one final, slightly related question.
Would I see significant performance gain for a GMRES + ILU preconditioning 
setup by going from AztecOO to the Tpetra or Belos stack?
I guess that the best parallel performance increase I could get would be by 
using the AMG from ML, but I think this will require some experiments 
before I can set up the parameters correctly. My system is not elliptic and 
can be quite ugly at times due to stabilization.

Thank you for everything.

On Monday, 4 March 2019 09:03:24 UTC-5, Bruno Turcksin wrote:
>
> On Mon, 4 Mar 2019 at 08:44, Bruno Blais wrote:
> > I'm using the wrapper, so I guess by default that means it is using the
> > AztecOO stack of solvers?
>
> Yes, that's right. You won't get any speedup using OpenMP with
> AztecOO; you need to switch to the Tpetra stack and Belos to use
> OpenMP (but we don't have wrappers for the whole Tpetra stack).
>
> >>   2) Why do you think that OpenMP would be faster than MPI? MPI is 
> usually faster than OpenMP unless you are very careful about your data 
> management. 
> > My original idea was that since in shared memory parallelism you could 
> precondition a larger chunk of the matrix as a whole, that the ILU 
> preconditioning would be more efficient in a shared-memory context than in 
> a distributed one. Thus you would need less GMRES iterations to solve > 
> your system. It seems I am wrong :) ? 
>
> Using larger blocks for ILU preconditioning will decrease the number 
> of GMRES iterations but you will spend more time in ILU, so it's hard 
> to say if it's worth it. 
>
> Best, 
>
> Bruno 
>



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Bruno Turcksin
On Mon, 4 Mar 2019 at 08:44, Bruno Blais wrote:
> I'm using the wrapper, so I guess by default that means it is using the 
> AztecOO stack of solvers?

Yes, that's right. You won't get any speedup using OpenMP with
AztecOO; you need to switch to the Tpetra stack and Belos to use
OpenMP (but we don't have wrappers for the whole Tpetra stack).

>>   2) Why do you think that OpenMP would be faster than MPI? MPI is usually 
>> faster than OpenMP unless you are very careful about your data management.
> My original idea was that since in shared memory parallelism you could 
> precondition a larger chunk of the matrix as a whole, the ILU 
> preconditioning would be more efficient in a shared-memory context than in a 
> distributed one. Thus you would need fewer GMRES iterations to solve your 
> system. It seems I am wrong :)?

Using larger blocks for ILU preconditioning will decrease the number
of GMRES iterations but you will spend more time in ILU, so it's hard
to say if it's worth it.
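
To make that trade-off concrete, here is a minimal sketch of the knobs
exposed by the deal.II wrapper for the Trilinos ILU preconditioner; the
values shown are assumptions for illustration only, not recommendations:

#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>

using namespace dealii;

void setup_ilu(const TrilinosWrappers::SparseMatrix &system_matrix,
               TrilinosWrappers::PreconditionILU    &preconditioner)
{
  TrilinosWrappers::PreconditionILU::AdditionalData ilu_data(
    /*ilu_fill=*/1,  // extra fill-in: fewer GMRES iterations, pricier ILU
    /*ilu_atol=*/0.,
    /*ilu_rtol=*/1.,
    /*overlap=*/1);  // overlap between the per-process blocks in parallel

  preconditioner.initialize(system_matrix, ilu_data);
}

Raising ilu_fill and overlap mimics the "larger block" effect discussed
above, at the cost of a more expensive factorization and application.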

Best,

Bruno



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Bruno Blais
Bruno,

On Monday, 4 March 2019 08:28:18 UTC-5, Bruno Turcksin wrote:

> Bruno,
>
> On Monday, March 4, 2019 at 7:41:27 AM UTC-5, Bruno Blais wrote:
>>
>> 2. Furthermore, when you compile Trilinos with OpenMP and you try to
>> compile the latest version of deal.II, you get a compilation error when
>> ".hpp" files from Kokkos are included. The error reads something like:
>> "Kokkos was compiled with OpenMP but the compiler did not pass an OpenMP
>> flag."
>>
>> This can be easily fixed by manually adding -fopenmp to the CXX flags
>> used by deal.II. However, would it not be a better idea to add a
>> DEALII_ENABLE_OpenMP flag directly in the CMake configuration to ensure
>> that if you turn that flag on, the -fopenmp flag is enabled?
>> Maybe I missed such an option. It just made me unsure whether I was doing
>> something supported or not.
>>
> deal.II does not use OpenMP for multithreading, so if -fopenmp is missing 
> it's because Trilinos did not export it correctly. Unless you mean that you 
> include Kokkos in your own code? In that case you are responsible for the 
> flags if you use OpenMP. 
>

Then the issue would be that my Trilinos installation did not export the 
flag correctly; I will check that out. It makes sense. I only use the 
solvers through the deal.II TrilinosWrappers; I have not tried to use the 
Trilinos solvers directly.
 

>
>> 3. When compiled with OpenMP, I got disappointingly poor performance, but
>> maybe this is because of the relatively small size of my application. I
>> would have (perhaps naively) expected that the time to solve a linear system
>> with GMRES using 1 MPI process with 4 OpenMP threads would be lower than the
>> time it takes with 4 MPI processes, but on my application this was not the
>> case. I was surprised because I was expecting my ILU preconditioning to work
>> better on a smaller number of cores, but maybe this is related to fill-in or
>> other issues?
>>
> Two things here:
>   1) Which package are you using? The Epetra stack does not support OpenMP,
> so you can compile with OpenMP but it won't be used.
>

I'm using the wrapper, so I guess by default that means it is using the 
AztecOO stack of solvers?
 

>   2) Why do you think that OpenMP would be faster than MPI? MPI is usually 
> faster than OpenMP unless you are very careful about your data management.
>

My original idea was that since in shared memory parallelism you could 
precondition a larger chunk of the matrix as a whole, the ILU 
preconditioning would be more efficient in a shared-memory context than in 
a distributed one. Thus you would need fewer GMRES iterations to solve your 
system. It seems I am wrong :)?

 

>
> Best,
>
Thanks, this is very interesting and enlightening.
 

>
> Bruno
>



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Bruno Turcksin
Bruno,

On Monday, March 4, 2019 at 7:41:27 AM UTC-5, Bruno Blais wrote:
>
> 2. Furthermore, when you compile Trilinos with OpenMP and you try to
> compile the latest version of deal.II, you get a compilation error when
> ".hpp" files from Kokkos are included. The error reads something like:
> "Kokkos was compiled with OpenMP but the compiler did not pass an OpenMP
> flag."
>
> This can be easily fixed by manually adding -fopenmp to the CXX flags used
> by deal.II. However, would it not be a better idea to add a
> DEALII_ENABLE_OpenMP flag directly in the CMake configuration to ensure
> that if you turn that flag on, the -fopenmp flag is enabled?
> Maybe I missed such an option. It just made me unsure whether I was doing
> something supported or not.
>
deal.II does not use OpenMP for multithreading, so if -fopenmp is missing 
it's because Trilinos did not export it correctly. Unless you mean that you 
include Kokkos in your own code? In that case you are responsible for the 
flags if you use OpenMP. 

>
> 3. When compiled with OpenMP, I got disappointingly poor performance, but
> maybe this is because of the relatively small size of my application. I
> would have (perhaps naively) expected that the time to solve a linear system
> with GMRES using 1 MPI process with 4 OpenMP threads would be lower than the
> time it takes with 4 MPI processes, but on my application this was not the
> case. I was surprised because I was expecting my ILU preconditioning to work
> better on a smaller number of cores, but maybe this is related to fill-in or
> other issues?
>
Two things here:
  1) Which package are you using? The Epetra stack does not support OpenMP,
so you can compile with OpenMP but it won't be used.
  2) Why do you think that OpenMP would be faster than MPI? MPI is usually 
faster than OpenMP unless you are very careful about your data management.
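
To make the setup under discussion concrete, here is a minimal sketch of a
GMRES + ILU solve through the deal.II TrilinosWrappers (the matrix and vector
names are assumed). With these classes the solve runs on the Epetra/AztecOO
stack, so the parallelism comes from distributing the matrix over MPI
processes rather than from OpenMP threads:

#include <deal.II/lac/solver_control.h>
#include <deal.II/lac/trilinos_precondition.h>
#include <deal.II/lac/trilinos_solver.h>
#include <deal.II/lac/trilinos_sparse_matrix.h>
#include <deal.II/lac/trilinos_vector.h>

using namespace dealii;

void solve_gmres_ilu(const TrilinosWrappers::SparseMatrix &system_matrix,
                     TrilinosWrappers::MPI::Vector        &solution,
                     const TrilinosWrappers::MPI::Vector  &rhs)
{
  // Each MPI process factorizes only its locally owned block of the matrix,
  // which is why the quality of the ILU (and hence the GMRES iteration count)
  // depends on the number of processes.
  TrilinosWrappers::PreconditionILU preconditioner;
  preconditioner.initialize(system_matrix);

  SolverControl                 solver_control(1000, 1e-8 * rhs.l2_norm());
  TrilinosWrappers::SolverGMRES solver(solver_control);
  solver.solve(system_matrix, solution, rhs, preconditioner);
}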

Best,

Bruno



Re: [deal.II] Shared memory parallelism of Trilinos

2019-03-04 Thread Jean-Paul Pelteret
Dear Bruno,

> The suggested compilation options for Trilinos do not include OpenMP, and 
> the flag is not enabled by default.


Can you please indicate where you read this? I’m looking at our documentation 
for interfacing to external libraries, i.e.
https://dealii.org/9.0.0/readme.html 
https://dealii.org/9.0.0/external-libs/trilinos.html 

and I don’t see any mention of OpenMP.

> Does this mean that compiling Trilinos with the suggested flags enables 
> shared-memory parallelism when using the TrilinosWrappers classes, or is it 
> necessary to compile Trilinos with OpenMP or threads enabled?

Trilinos does its own thread management, so irrespective of whether you've 
built it with threading enabled (i.e. with the OpenMP or pthreads options 
enabled), enabling/disabling threading in deal.II will have no effect on this. 
What might happen is that you could have scheduling conflicts, e.g. what 
is mentioned here
https://software.intel.com/en-us/forums/intel-threading-building-blocks/topic/281761

if you use multiple threading models in one code base. But I don't know enough 
about the topic to say anything more than mention the potential issue.
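
One thing that can help when mixing threading models is to cap the number of
threads that deal.II's own TBB-based parallelism uses, so it does not compete
with whatever Trilinos spawns. A minimal sketch of how one might do that (a
general suggestion, not something prescribed in this thread):

#include <deal.II/base/mpi.h>
#include <deal.II/base/multithread_info.h>

int main(int argc, char *argv[])
{
  // The third argument caps the number of TBB threads deal.II may use per
  // MPI process; here we restrict deal.II to one thread and leave the
  // remaining cores to the MPI processes and/or Trilinos' own threading.
  dealii::Utilities::MPI::MPI_InitFinalize mpi_initialization(argc, argv, 1);

  // The limit can also be changed later on:
  // dealii::MultithreadInfo::set_thread_limit(1);

  // ... set up and run the simulation ...
  return 0;
}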

> What are the official guidelines for this?

I would say that there are none: how you want to run your simulations 
(e.g. desktop, cluster computing, etc.) dictates which parallelism model (i.e. 
multi-threading, MPI, or hybrid) would work best for you. Would you care to 
elaborate on this so that we could maybe make some suggestions?

Best,
Jean-Paul

> On 03 Mar 2019, at 21:40, Bruno Blais  wrote:
> 
> Hello everyone,
> I have a quick question for which I have not found documentation.
> The suggested compilation options for Trilinos do not include OpenMP, and 
> the flag is not enabled by default. 
> deal.II by default also compiles using TBB for shared-memory parallelism. Does 
> this mean that compiling Trilinos with the suggested flags enables 
> shared-memory parallelism when using the TrilinosWrappers classes, or is it 
> necessary to compile Trilinos with OpenMP or threads enabled?
> What are the official guidelines for this?
> 
> Thanks a lot
> Bruno



[deal.II] Shared memory parallelism of Trilinos

2019-03-03 Thread Bruno Blais
Hello everyone,
I have a quick question for which I have not found documentation.
The suggested compilation options for Trilinos do not include OpenMP, and 
the flag is not enabled by default. 
deal.II by default also compiles using TBB for shared-memory parallelism. 
Does this mean that compiling Trilinos with the suggested flags enables 
shared-memory parallelism when using the TrilinosWrappers classes, 
or is it necessary to compile Trilinos with OpenMP or threads enabled?
What are the official guidelines for this?

Thanks a lot
Bruno

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.