Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-02 Thread timo Hyvärinen
Hi Wolfgang,

Thank you for your reply.

> 10 MB of leaks is small potatoes compared to the 10
> GB you allocate. You will have to figure out where all that memory is
> allocated.

I agree, this is certainly the right thing to do.

Sincerely,
Tim

On Mon, Oct 2, 2023 at 7:01 PM Wolfgang Bangerth 
wrote:

> On 10/2/23 00:46, timo Hyvärinen wrote:
> > (1) How can I make the program run faster using Valgrind profiling? I
> > know I should use cachegrind and callgrind, but what I don't know is
> > which things I should pay attention to in the cachegrind/callgrind
> > reports, and which properties have a significant impact on speed;
>
> Bruno already answered this, but I wanted to point you at the introduction
> of step-22 for an example.
>
> As for your other email: 10 MB of leaks is small potatoes compared to the
> 10 GB you allocate. You will have to figure out where all that memory is
> allocated. My recommendation would be to debug these sorts of issues on a
> local machine, rather than a cluster. You could set up a smaller test case
> that runs faster.
>
> Best
>   W.
>
> --
> Wolfgang Bangerth  email: bange...@colostate.edu
> www: http://www.math.colostate.edu/~bangerth/

-- 
The deal.II project is located at http://www.dealii.org/
For mailing list/forum options, see 
https://groups.google.com/d/forum/dealii?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
"deal.II User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to dealii+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/dealii/CAArwj0G5sJHFqtzMVj8wTx-HnBvc7qmiVdiN1O0%2BbWRqAwnm4A%40mail.gmail.com.


Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-02 Thread timo Hyvärinen
Hi Bruno,

Thank you for your reply!

Valgrind is indeed slow, so I reduced the grid size to 1/10 of the full size.
At least I found it easy to use on the cluster.

I don't know AddressSanitizer yet, so I will certainly need to look it up
first.

Best,
Tim

On Mon, Oct 2, 2023 at 4:20 PM Bruno Turcksin 
wrote:

> Tim,
>
> Valgrind is great but it is always slow. Instead you can use the
> AddressSanitizer from clang or gcc. It's much faster than Valgrind but I
> find the output is harder to read.
>
> Best,
>
> Bruno
>
> On Mon, Oct 2, 2023 at 02:47, timo Hyvärinen wrote:
>
>> Hi, Kendrick and Wolfgang,
>>
>> Thank you for your reply.
>>
>> I have two questions at hand:
>> (1) How can I make the program run faster using Valgrind profiling? I know
>> I should use cachegrind and callgrind, but what I don't know is which
>> things I should pay attention to in the cachegrind/callgrind reports, and
>> which properties have a significant impact on speed;
>> (2) GPU acceleration (this may be a bad question for this thread, but I
>> really want to ask). I know deal.II has CUDA wrappers and leverages Kokkos,
>> but what I don't know is how I can use them to speed up my matrix-based
>> Newton iteration code. A straightforward idea in my mind is to use the GPU
>> for system-matrix assembly, but I didn't see this in the only CUDA
>> tutorial, i.e., step-64. So I wonder what the common way is in deal.II to
>> use GPUs in matrix-based code.
>>
>> Tim,
>> Sincerely
>>
>> On Mon, Oct 2, 2023 at 2:10 AM kendrick zoom 
>> wrote:
>>
>>> Hello Wolfgang,
>>> How are you doing today?
>>>
>>> Well your question is not quite clear.
>>> So what exactly do you want to know?
>>>
>>> On Sun, Oct 1, 2023 at 3:40 PM Wolfgang Bangerth 
>>> wrote:
>>>
>>>> On 9/30/23 02:01, timo Hyvärinen wrote:
>>>> >
>>>> > So my questions here are: (i) Has this issue ever occurred in other
>>>> > deal.II applications, and how can it be solved other than by
>>>> > increasing the number of nodes or the requested memory? (ii) What
>>>> > kind of profiling/debugging tools do deal.II experts use nowadays to
>>>> > address memory issues? Should I build Valgrind myself? Does Valgrind
>>>> > only support MPI 2? My OpenMPI is v.3.
>>>>
>>>> Valgrind doesn't care.
>>>>
>>>> 6.5*10^5 unknowns with a quadratic element in 3d can probably be
>>>> expected to take in the range of 2-5 GB. That should fit into most
>>>> machines. But at the same time, this is a small enough problem that you
>>>> can run it under valgrind's memory profilers on any workstation or
>>>> laptop you have access to. You could also talk to the system
>>>> administrators of the cluster you work on to see whether they are
>>>> willing to give you a more up to date version of valgrind.
>>>>
>>>> Best
>>>>   W.
>>>>
>>>> --
>>>> Wolfgang Bangerth  email: bange...@colostate.edu
>>>> www: http://www.math.colostate.edu/~bangerth/


Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-02 Thread Wolfgang Bangerth

On 10/2/23 00:46, timo Hyvärinen wrote:
> (1) How can I make the program run faster using Valgrind profiling? I know
> I should use cachegrind and callgrind, but what I don't know is which
> things I should pay attention to in the cachegrind/callgrind reports, and
> which properties have a significant impact on speed;


Bruno already answered this, but I wanted to point you at the introduction of 
step-22 for an example.


As for your other email: 10 MB of leaks is small potatoes compared to the 10 
GB you allocate. You will have to figure out where all that memory is 
allocated. My recommendation would be to debug these sorts of issues on a 
local machine, rather than a cluster. You could set up a smaller test case 
that runs faster.


Best
 W.

--

Wolfgang Bangerth  email: bange...@colostate.edu
   www: http://www.math.colostate.edu/~bangerth/




Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-02 Thread Bruno Turcksin
Tim,

Valgrind is great but it is always slow. Instead you can use the
AddressSanitizer from clang or gcc. It's much faster than Valgrind but I
find the output is harder to read.

Best,

Bruno

On Mon, Oct 2, 2023 at 02:47, timo Hyvärinen wrote:

> Hi, Kendrick and Wolfgang,
>
> Thank you for your reply.
>
> I have two questions at hand:
> (1) How can I make the program run faster using Valgrind profiling? I know
> I should use cachegrind and callgrind, but what I don't know is which
> things I should pay attention to in the cachegrind/callgrind reports, and
> which properties have a significant impact on speed;
> (2) GPU acceleration (this may be a bad question for this thread, but I
> really want to ask). I know deal.II has CUDA wrappers and leverages Kokkos,
> but what I don't know is how I can use them to speed up my matrix-based
> Newton iteration code. A straightforward idea in my mind is to use the GPU
> for system-matrix assembly, but I didn't see this in the only CUDA
> tutorial, i.e., step-64. So I wonder what the common way is in deal.II to
> use GPUs in matrix-based code.
>
> Tim,
> Sincerely
>
> On Mon, Oct 2, 2023 at 2:10 AM kendrick zoom 
> wrote:
>
>> Hello Wolfgang,
>> How are you doing today?
>>
>> Well your question is not quite clear.
>> So what exactly do you want to know?
>>
>> On Sun, Oct 1, 2023 at 3:40 PM Wolfgang Bangerth 
>> wrote:
>>
>>> On 9/30/23 02:01, timo Hyvärinen wrote:
>>> >
>>> > So my questions here are: (i) Has this issue ever occurred in other
>>> > deal.II applications, and how can it be solved other than by
>>> > increasing the number of nodes or the requested memory? (ii) What
>>> > kind of profiling/debugging tools do deal.II experts use nowadays to
>>> > address memory issues? Should I build Valgrind myself? Does Valgrind
>>> > only support MPI 2? My OpenMPI is v.3.
>>>
>>> Valgrind doesn't care.
>>>
>>> 6.5*10^5 unknowns with a quadratic element in 3d can probably be
>>> expected to take in the range of 2-5 GB. That should fit into most
>>> machines. But at the same time, this is a small enough problem that you
>>> can run it under valgrind's memory profilers on any workstation or
>>> laptop you have access to. You could also talk to the system
>>> administrators of the cluster you work on to see whether they are
>>> willing to give you a more up to date version of valgrind.
>>>
>>> Best
>>>   W.
>>>
>>> --
>>> Wolfgang Bangerth  email: bange...@colostate.edu
>>> www: http://www.math.colostate.edu/~bangerth/

Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-01 Thread timo Hyvärinen
Hi, Kendrick and Wolfgang,

Thank you for your reply.

I have two questions at hand:
(1) How can I make the program run faster using Valgrind profiling? I know I
should use cachegrind and callgrind, but what I don't know is which things I
should pay attention to in the cachegrind/callgrind reports, and which
properties have a significant impact on speed;
(2) GPU acceleration (this may be a bad question for this thread, but I
really want to ask). I know deal.II has CUDA wrappers and leverages Kokkos,
but what I don't know is how I can use them to speed up my matrix-based
Newton iteration code. A straightforward idea in my mind is to use the GPU
for system-matrix assembly, but I didn't see this in the only CUDA tutorial,
i.e., step-64. So I wonder what the common way is in deal.II to use GPUs in
matrix-based code.

Tim,
Sincerely

On Mon, Oct 2, 2023 at 2:10 AM kendrick zoom  wrote:

> Hello Wolfgang,
> How are you doing today?
>
> Well your question is not quite clear.
> So what exactly do you want to know?
>
> On Sun, Oct 1, 2023 at 3:40 PM Wolfgang Bangerth 
> wrote:
>
>> On 9/30/23 02:01, timo Hyvärinen wrote:
>> >
>> > So my questions here are: (i) Has this issue ever occurred in other
>> > deal.II applications, and how can it be solved other than by increasing
>> > the number of nodes or the requested memory? (ii) What kind of
>> > profiling/debugging tools do deal.II experts use nowadays to address
>> > memory issues? Should I build Valgrind myself? Does Valgrind only
>> > support MPI 2? My OpenMPI is v.3.
>>
>> Valgrind doesn't care.
>>
>> 6.5*10^5 unknowns with a quadratic element in 3d can probably be expected
>> to take in the range of 2-5 GB. That should fit into most machines. But at
>> the same time, this is a small enough problem that you can run it under
>> valgrind's memory profilers on any workstation or laptop you have access
>> to. You could also talk to the system administrators of the cluster you
>> work on to see whether they are willing to give you a more up to date
>> version of valgrind.
>>
>> Best
>>   W.
>>
>> --
>> Wolfgang Bangerth  email: bange...@colostate.edu
>> www: http://www.math.colostate.edu/~bangerth/


Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-01 Thread timo Hyvärinen
Hi Wolfgang,

Thank you for your reply.

I did manage to run valgrind memcheck by launching the job
(--ntasks-per-node=64, --cpus-per-task=2) as
"
srun valgrind --tool=memcheck --leak-check=full --track-origins=yes
--suppressions=openmpi-valgrind.supp /my/project/path/binary > log.log
"
and this is the typical memcheck output per task:
"
==1024886== HEAP SUMMARY:
==1024886== in use at exit: 10,109,942 bytes in 20,666 blocks
==1024886==   total heap usage: 8,070,458 allocs, 8,049,792 frees, 10,241,574,631 bytes allocated
==1024886==
==1024855==
==1024906== HEAP SUMMARY:
==1024906== in use at exit: 10,129,660 bytes in 20,721 blocks
==1024906==   total heap usage: 6,674,001 allocs, 6,653,280 frees, 50,454,219,932 bytes allocated
==1024906==
==1024910== HEAP SUMMARY:
==1024910== in use at exit: 10,110,344 bytes in 20,671 blocks
==1024910==   total heap usage: 7,738,278 allocs, 7,717,607 frees, 9,563,207,155 bytes allocated
"
I'm not sure whether the OpenMPI Valgrind suppression file has any effect,
but there are about 10 MB of leaks per task.
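
To put the numbers in the report above side by side, here is a minimal Python sketch that extracts the two figures from a memcheck HEAP SUMMARY. The sample text is copied from the first task in the report (with the wrapped line rejoined); the function name and regular expressions are illustrative assumptions, not part of Valgrind. Note that memcheck's "bytes allocated" is the cumulative sum over all allocations made during the run, not the peak memory in use, so finding the peak would need a heap profiler such as Valgrind's massif.

```python
import re

# Sample copied from the memcheck report above (first task, line rejoined).
SAMPLE = """\
==1024886== HEAP SUMMARY:
==1024886== in use at exit: 10,109,942 bytes in 20,666 blocks
==1024886==   total heap usage: 8,070,458 allocs, 8,049,792 frees, 10,241,574,631 bytes allocated
"""

# Illustrative patterns; they assume each summary entry sits on one line.
IN_USE = re.compile(r"==(\d+)==\s+in use at exit: ([\d,]+) bytes")
TOTAL = re.compile(r"==(\d+)==\s+total heap usage: .*?([\d,]+) bytes allocated")


def summarize(log_text):
    """Return {pid: (bytes_in_use_at_exit, cumulative_bytes_allocated)}."""
    in_use, total = {}, {}
    for pid, number in IN_USE.findall(log_text):
        in_use[pid] = int(number.replace(",", ""))
    for pid, number in TOTAL.findall(log_text):
        total[pid] = int(number.replace(",", ""))
    # Keep only processes for which both figures were found.
    return {pid: (in_use[pid], total[pid]) for pid in in_use if pid in total}


summary = summarize(SAMPLE)
for pid, (at_exit, allocated) in summary.items():
    print(f"pid {pid}: {at_exit / 1e6:.1f} MB in use at exit, "
          f"{allocated / 1e9:.1f} GB allocated cumulatively, "
          f"ratio {at_exit / allocated:.2%}")
```

For this task the memory still in use at exit is roughly a thousandth of what was allocated over the whole run.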

Tim,
Sincerely

On Mon, Oct 2, 2023 at 1:40 AM Wolfgang Bangerth 
wrote:

> On 9/30/23 02:01, timo Hyvärinen wrote:
> >
> > So my questions here are: (i) Has this issue ever occurred in other
> > deal.II applications, and how can it be solved other than by increasing
> > the number of nodes or the requested memory? (ii) What kind of
> > profiling/debugging tools do deal.II experts use nowadays to address
> > memory issues? Should I build Valgrind myself? Does Valgrind only
> > support MPI 2? My OpenMPI is v.3.
>
> Valgrind doesn't care.
>
> 6.5*10^5 unknowns with a quadratic element in 3d can probably be expected
> to take in the range of 2-5 GB. That should fit into most machines. But at
> the same time, this is a small enough problem that you can run it under
> valgrind's memory profilers on any workstation or laptop you have access
> to. You could also talk to the system administrators of the cluster you
> work on to see whether they are willing to give you a more up to date
> version of valgrind.
>
> Best
>   W.
>
> --
> Wolfgang Bangerth  email: bange...@colostate.edu
> www: http://www.math.colostate.edu/~bangerth/



Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-01 Thread kendrick zoom
Hello Wolfgang,
How are you doing today?

Well your question is not quite clear.
So what exactly do you want to know?

On Sun, Oct 1, 2023 at 3:40 PM Wolfgang Bangerth 
wrote:

> On 9/30/23 02:01, timo Hyvärinen wrote:
> >
> > So my questions here are: (i) Has this issue ever occurred in other
> > deal.II applications, and how can it be solved other than by increasing
> > the number of nodes or the requested memory? (ii) What kind of
> > profiling/debugging tools do deal.II experts use nowadays to address
> > memory issues? Should I build Valgrind myself? Does Valgrind only
> > support MPI 2? My OpenMPI is v.3.
>
> Valgrind doesn't care.
>
> 6.5*10^5 unknowns with a quadratic element in 3d can probably be expected
> to take in the range of 2-5 GB. That should fit into most machines. But at
> the same time, this is a small enough problem that you can run it under
> valgrind's memory profilers on any workstation or laptop you have access
> to. You could also talk to the system administrators of the cluster you
> work on to see whether they are willing to give you a more up to date
> version of valgrind.
>
> Best
>   W.
>
> --
> Wolfgang Bangerth  email: bange...@colostate.edu
> www: http://www.math.colostate.edu/~bangerth/


Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-10-01 Thread Wolfgang Bangerth

On 9/30/23 02:01, timo Hyvärinen wrote:

> So my questions here are: (i) Has this issue ever occurred in other deal.II
> applications, and how can it be solved other than by increasing the number
> of nodes or the requested memory? (ii) What kind of profiling/debugging
> tools do deal.II experts use nowadays to address memory issues? Should I
> build Valgrind myself? Does Valgrind only support MPI 2? My OpenMPI is v.3.


Valgrind doesn't care.

6.5*10^5 unknowns with a quadratic element in 3d can probably be expected to 
take in the range of 2-5 GB. That should fit into most machines. But at the 
same time, this is a small enough problem that you can run it under valgrind's 
memory profilers on any workstation or laptop you have access to. You could 
also talk to the system administrators of the cluster you work on to see 
whether they are willing to give you a more up to date version of valgrind.
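
One way to arrive at a figure in this range is a back-of-the-envelope count of the nonzero entries in the system matrix. The Python sketch below is only an illustration under stated assumptions and is not necessarily how the estimate above was obtained: it assumes a uniform hexahedral mesh, that all nine solution components mentioned in this thread couple with each other, 12 bytes per stored entry (a double plus a 32-bit column index), and it ignores boundary rows, vectors, the mesh, and the preconditioner. The result is the total over all MPI ranks, not per rank.

```python
# Back-of-the-envelope size of one sparse system matrix.
N_DOFS = 6.5e5           # from the thread
N_COMPONENTS = 9         # from the thread: nine solution components
BYTES_PER_ENTRY = 8 + 4  # assumption: double value + 32-bit column index

# Q2 hexahedra on a uniform mesh: (nodes of this kind per cell,
# number of nodes that share at least one cell with such a node).
Q2_NODE_KINDS = [
    (1, 5 * 5 * 5),  # vertex node, touches 8 cells
    (3, 5 * 5 * 3),  # edge midpoint, touches 4 cells
    (3, 5 * 3 * 3),  # face center, touches 2 cells
    (1, 3 * 3 * 3),  # cell center, touches 1 cell
]


def average_row_length(n_components):
    """Average nonzeros per row, assuming all components couple."""
    nodes_per_cell = sum(count for count, _ in Q2_NODE_KINDS)
    coupled_nodes = sum(count * n for count, n in Q2_NODE_KINDS) / nodes_per_cell
    return coupled_nodes * n_components


row_length = average_row_length(N_COMPONENTS)
matrix_gb = N_DOFS * row_length * BYTES_PER_ENTRY / 1e9
print(f"about {row_length:.0f} entries per row, {matrix_gb:.1f} GB for one matrix")
```

Under these assumptions a single matrix takes about 4.5 GB in total; weaker coupling between components would lower the number.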


Best
 W.

--

Wolfgang Bangerth  email: bange...@colostate.edu
   www: http://www.math.colostate.edu/~bangerth/




Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-30 Thread Tim hyvärinen

Hi, dear community and developers,

Here is an update from my side on the questions and the issue I raised earlier:

About profiling/debugging tools, I found this thread on the
mailing list: https://groups.google.com/g/dealii/c/7_JJvipz0wY/m/aFU4pTuvAQAJ?hl=en.

About the out-of-memory error, my current solution is to undersubscribe the
node by giving each task 2 CPUs, which share 4 GB of memory. This certainly
cuts performance in half but saves the program from crashing.
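
The trade-off behind this workaround can be written down as a small Python sketch. The two constants are inferred from this thread rather than stated by the cluster documentation: 128 CPUs per node (the earlier job used --ntasks-per-node=128 --cpus-per-task=1) and 2 GB per CPU (two CPUs share 4 GB). Whether memory is actually allotted per CPU depends on how Slurm is configured on the cluster, so treat this as an assumption.

```python
# Assumptions inferred from the thread, not confirmed cluster settings.
CPUS_PER_NODE = 128
MEM_PER_CPU_GB = 2.0


def layout(cpus_per_task):
    """Return (MPI tasks per node, memory available to each task in GB)."""
    tasks_per_node = CPUS_PER_NODE // cpus_per_task
    return tasks_per_node, cpus_per_task * MEM_PER_CPU_GB


for cpus in (1, 2, 4):
    tasks, mem = layout(cpus)
    print(f"--cpus-per-task={cpus}: {tasks} tasks per node, {mem:.0f} GB per task")
```

Doubling the CPUs per task doubles the memory each MPI task can use but halves the number of tasks on the node, which matches the halved performance described above when the extra CPUs are not used by threads.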

I hope you can offer suggestions or comments.

Tim,
Sincerely
On Saturday, September 30, 2023 at 11:02:05 AM UTC+3 Tim hyvärinen wrote:

> Hi all, I'm back to this thread and discussion.
>
> I recompiled 9.3.3 in Release mode with the debug flag "-g". For a 3D
> system with linear finite elements (degree = 1), with about 9.3*10^4 DoFs,
> a batch job with --ntasks-per-node=128 --cpus-per-task=1 is more than 10
> times faster.
>
> When I use degree = 2 finite elements (uniform grid), the number of DoFs
> increases to 6.5*10^5, and a batch run with the same task/CPU setup gains
> about a 5 times speedup (as expected). However, the program crashes after
> two Newton iterations with the error message:
> "
> slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2795730.0. Some 
> of your processes may have been killed by the cgroup out-of-memory handler.
> srun: error: c: task 40: Out Of Memory
> srun: launch/slurm: _step_signal: Terminating StepId=2795730.0
> slurmstepd: error: *** STEP 2795730.0 ON c CANCELLED AT 
> 2023-09-XXTXX:XX:XX ***
> slurmstepd: error:  mpi/pmix_v3: _errhandler: c [0]: 
> pmixp_client_v2.c:212: Error handler invoked: status = -25, source = 
> [slurm.pmix.2795730.0:40]
> "
> where c is the node index.
>
> My first guess was a memory leak, so I tried to run Valgrind, and sadly
> noticed that the Valgrind on the cluster was compiled with gcc 8.5, while
> deal.II was built with gcc 11.2 (gcc 8.5 has been removed).
>
> So my questions here are: (i) Has this issue ever occurred in other
> deal.II applications, and how can it be solved other than by increasing
> the number of nodes or the requested memory? (ii) What kind of
> profiling/debugging tools do deal.II experts use nowadays to address
> memory issues? Should I build Valgrind myself? Does Valgrind only
> support MPI 2? My OpenMPI is v.3.
>
> Tim,
> Sincerely
>
>
> On Mon, Sep 18, 2023 at 3:47 AM Bruno Turcksin  
> wrote:
>
>> Timo,
>>
>> Yes, you want to profile the optimized library but you also want the 
>> debug info. Without it, the information given by the profiler usually makes 
>> little sense. So you compile in release mode but you use the following 
>> option when configuring your deal.II "-DCMAKE_CXX_FLAGS=-g"
>>
>> Best,
>>
>> Bruno
>>
>> On Sat, Sep 16, 2023 at 03:47, timo Hyvärinen wrote:
>>
>>> Hi Bruno,
>>>
>>> Thank you for your explanations.
>>>
>>> Seemingly, I should compile an optimized lib then do profiling. 
>>>
>>> Sincerely,
>>> Timo
>>>
>>> On Fri, Sep 15, 2023 at 11:04 PM Bruno Turcksin  
>>> wrote:
>>>
>>>> Timo,
>>>>
>>>> You will get vastly different results in debug and release modes for
>>>> two reasons. First, the compiler generates much faster code in release
>>>> mode compared to debug. Second, there are a lot of checks inside deal.II
>>>> that are only enabled in debug mode. This is great when you develop your
>>>> code because it helps you catch bugs early but it makes your code much
>>>> slower. In general, you want to develop your code in debug mode but your
>>>> production run should be done in release.
>>>>
>>>> Best,
>>>>
>>>> Bruno
>>>>
>>>> On Friday, September 15, 2023 at 1:53:59 PM UTC-4 Tim hyvärinen wrote:
>>>>
>>>> Hi Marc,
>>>>
>>>> Thank you for the reply.
>>>>
>>>> I compiled the lib in debug mode and didn't try the optimized version.
>>>> I didn't think this could be a significant issue, but based on your
>>>> question I infer that an optimized lib could improve performance a lot.
>>>>
>>>> Sincerely,
>>>> Timo
>>>>
>>>> On Fri, Sep 15, 2023 at 8:21 PM Marc Fehling  wrote:
>>>>
>>>> Hello Tim,
>>>>
>>>> > Yet, even though it is universally believed to be superior in terms
>>>> > of convergence properties, it is not widely used because it is often
>>>> > believed to be difficult to implement. One way to address this belief
>>>> > is to provide well-tested, easy to use software that provides this
>>>> > kind of functionality.
>>>>
>>>> Just to make sure: did you compile the deal.II library and your code in
>>>> Optimized mode/Release mode?
>>>>
>>>> Best,
>>>> Marc
>>>>
>>>> On Friday, September 15, 2023 at 3:17:39 AM UTC-6 Tim hyvärinen wrote:
>>>>
>>>> Dear deal.II community and developers,
>>>>
>>>> I have been using the deal.II framework (9.3.x) for a while on an HPC
>>>> machine. My project involves solving a vector-valued nonlinear PDE with
>>>> nine components. Currently, I've implemented a damped Newton iteration
>>>> with GMRES+AMG precon

Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-30 Thread timo Hyvärinen
Hi all, I'm back to this thread and discussion.

I recompiled 9.3.3 in Release mode with the debug flag "-g". For a 3D system
with linear finite elements (degree = 1), with about 9.3*10^4 DoFs, a batch
job with --ntasks-per-node=128 --cpus-per-task=1 is more than 10 times
faster.

When I use degree = 2 finite elements (uniform grid), the number of DoFs
increases to 6.5*10^5, and a batch run with the same task/CPU setup gains
about a 5 times speedup (as expected). However, the program crashes after
two Newton iterations with the error message:
"
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2795730.0. Some
of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: c: task 40: Out Of Memory
srun: launch/slurm: _step_signal: Terminating StepId=2795730.0
slurmstepd: error: *** STEP 2795730.0 ON c CANCELLED AT
2023-09-XXTXX:XX:XX ***
slurmstepd: error:  mpi/pmix_v3: _errhandler: c [0]:
pmixp_client_v2.c:212: Error handler invoked: status = -25, source =
[slurm.pmix.2795730.0:40]
"
where c is the node index.

My first guess was a memory leak, so I tried to run Valgrind, and sadly
noticed that the Valgrind on the cluster was compiled with gcc 8.5, while
deal.II was built with gcc 11.2 (gcc 8.5 has been removed).

So my questions here are: (i) Has this issue ever occurred in other deal.II
applications, and how can it be solved other than by increasing the number
of nodes or the requested memory? (ii) What kind of profiling/debugging
tools do deal.II experts use nowadays to address memory issues? Should I
build Valgrind myself? Does Valgrind only support MPI 2? My OpenMPI is v.3.

Tim,
Sincerely


On Mon, Sep 18, 2023 at 3:47 AM Bruno Turcksin 
wrote:

> Timo,
>
> Yes, you want to profile the optimized library but you also want the debug
> info. Without it, the information given by the profiler usually makes
> little sense. So you compile in release mode but you use the following
> option when configuring your deal.II "-DCMAKE_CXX_FLAGS=-g"
>
> Best,
>
> Bruno
>
> On Sat, Sep 16, 2023 at 03:47, timo Hyvärinen wrote:
>
>> Hi Bruno,
>>
>> Thank you for your explanations.
>>
>> Seemingly, I should compile an optimized lib then do profiling.
>>
>> Sincerely,
>> Timo
>>
>> On Fri, Sep 15, 2023 at 11:04 PM Bruno Turcksin 
>> wrote:
>>
>>> Timo,
>>>
>>> You will get vastly different results in debug and release modes for two
>>> reasons. First, the compiler generates much faster code in release mode
>>> compared to debug. Second, there are a lot of checks inside deal.II that
>>> are only enabled in debug mode. This is great when you develop your code
>>> because it helps you catch bugs early but it makes your code much slower.
>>> In general, you want to develop your code in debug mode but your production
>>> run should be done in release.
>>>
>>> Best,
>>>
>>> Bruno
>>>
>>> On Friday, September 15, 2023 at 1:53:59 PM UTC-4 Tim hyvärinen wrote:
>>>
>>> hi, Marc,
>>>
>>> Thank you for the reply.
>>>
>>> I compiled the lib in debug mode and didn't try the optimized version.
>>> I didn't think this could be a significant issue, but based on your
>>> question I infer that an optimized lib could improve performance a lot.
>>>
>>> Sincerely,
>>> Timo
>>>
>>> On Fri, Sep 15, 2023 at 8:21 PM Marc Fehling  wrote:
>>>
>>> Hello Tim,
>>>
>>> > Yet, even though it is universally believed to be superior in terms
>>> of convergence properties, it is not widely used because it is often
>>> believed to be difficult to implement. One way to address this belief is to
>>> provide well-tested, easy to use software that provides this kind of
>>> functionality.
>>>
>>>
>>> Just to make sure: did you compile the deal.II library and your code in
>>> Optimized mode/Release mode?
>>>
>>> Best,
>>> Marc
>>>
>>> On Friday, September 15, 2023 at 3:17:39 AM UTC-6 Tim hyvärinen wrote:
>>>
>>> Dear dealii community and developers,
>>>
>>> I have been using the deal.II framework (9.3.x) for a while on an HPC
>>> machine. My project involves solving a vector-valued nonlinear PDE with
>>> nine components. Currently, I have implemented a damped Newton iteration
>>> with GMRES and an AMG preconditioner, using MPI on a distributed-memory
>>> architecture.
>>>
>>> A simple timing tells me that assembling the system matrix takes 99%
>>> of the total running time in every Newton iteration. I guess there is
>>> a lot of idle CPU time during assembly because I don't take advantage
>>> of thread parallelism yet.
>>>
>>> So here is my question: which tutorial steps demonstrate how to
>>> implement hybrid MPI-thread parallelism? I've found that step-48
>>> discusses this, but I wonder whether there are any other tutorial
>>> programs to look at. I also wonder if any of you have suggestions about
>>> MPI+thread parallelism within the deal.II framework.
>>>
>>> Sincerely,
>>> Timo Hyvarinen
>>>

Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-17 Thread Bruno Turcksin
Timo,

Yes, you want to profile the optimized library, but you also want the debug
info. Without it, the information given by the profiler usually makes
little sense. So compile in release mode, but add the following
option when configuring deal.II: "-DCMAKE_CXX_FLAGS=-g".

Best,

Bruno
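
Before profiling, it can help to confirm from inside the program which kind of build is actually running. The following is only a minimal sketch, not part of deal.II: the function name build_description is made up for illustration, it assumes that DEBUG is the macro defined in a deal.II debug build, and it relies on __OPTIMIZE__, which GCC and Clang define when optimization is enabled. Whether -g was passed cannot be detected this way.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports how this translation unit was compiled.
// Assumption: DEBUG is defined in debug builds (as deal.II debug mode does);
// __OPTIMIZE__ is defined by GCC and Clang for optimization levels above -O0.
inline std::string build_description()
{
  std::string s;
#ifdef DEBUG
  s += "debug checks: on";
#else
  s += "debug checks: off";
#endif
#ifdef __OPTIMIZE__
  s += ", compiler optimization: on";
#else
  s += ", compiler optimization: off";
#endif
  return s;
}
```

Printing this string once at program start makes it harder to profile a debug build by accident.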




Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-16 Thread timo Hyvärinen
Hi Bruno,

Thank you for your explanations.

It seems I should compile an optimized library and then do the profiling.

Sincerely,
Timo




Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-15 Thread Bruno Turcksin
Timo,

You will get vastly different results in debug and release modes for two 
reasons. First, the compiler generates much faster code in release mode 
compared to debug. Second, there are a lot of checks inside deal.II that 
are only enabled in debug mode. This is great when you develop your code 
because it helps you catch bugs early but it makes your code much slower. 
In general, you want to develop your code in debug mode but your production 
run should be done in release.

Best,

Bruno
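
To illustrate the second point, here is a minimal sketch of a check that exists only in debug builds. SKETCH_ASSERT and checked_entry are hypothetical names chosen for this example; deal.II's own Assert macro is more elaborate, and the only assumption taken from it is that the check is active when DEBUG is defined and compiles to nothing otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical debug-only assertion: active when DEBUG is defined,
// an empty statement (zero run-time cost) in release builds.
#ifdef DEBUG
#  define SKETCH_ASSERT(cond)                                   \
    do {                                                        \
      if (!(cond)) {                                            \
        std::fprintf(stderr, "Assertion failed: %s\n", #cond);  \
        std::abort();                                           \
      }                                                         \
    } while (false)
#else
#  define SKETCH_ASSERT(cond) do { } while (false)
#endif

// Element access with a range check that is paid for only in debug mode.
inline double checked_entry(const std::vector<double> &v, std::size_t i)
{
  SKETCH_ASSERT(i < v.size());
  return v[i];
}
```

In an inner loop that runs once per quadrature point, a check like this is executed millions of times, which is one reason debug timings say little about release performance.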




Re: [deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-15 Thread timo Hyvärinen
hi, Marc,

Thank you for the reply.

I compiled the library in debug mode and have not tried the optimized version.
I didn't think this could be a significant issue, but from your question I
infer that an optimized library could improve performance a lot.

Sincerely,
Timo




[deal.II] Re: question about hybrid MPI and TBB thread parallelism

2023-09-15 Thread Marc Fehling
Hello Tim,

> A simple timing tells me the assembly process of system-matrix takes 99%
> of the whole running time in every newton iteration.

Just to make sure: did you compile the deal.II library and your code in
Optimized mode/Release mode?

Best,
Marc
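
For reference, a "simple timing" of the kind quoted above could look like the sketch below. It is not taken from the original poster's code: PhaseTimer and share are hypothetical names, and the measurement uses only std::chrono wall-clock time on one process. deal.II also ships a TimerOutput class for this purpose, which is not used here to keep the example self-contained.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: accumulates wall-clock seconds spent in one phase.
struct PhaseTimer
{
  double seconds = 0.0;

  // Runs `work` and adds its elapsed wall-clock time to `seconds`.
  template <typename F>
  void run(F &&work)
  {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    seconds += std::chrono::duration<double>(stop - start).count();
  }
};

// Fraction of the total time spent in one phase (0 if nothing was measured).
inline double share(const PhaseTimer &phase, const PhaseTimer &total)
{
  return total.seconds > 0.0 ? phase.seconds / total.seconds : 0.0;
}
```

Wrapping the assembly call and the whole Newton step in two such timers gives the share of assembly per iteration; the number is only meaningful when measured on a release build.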


On Friday, September 15, 2023 at 3:17:39 AM UTC-6 Tim hyvärinen wrote:

> Dear dealii community and developers,
>
> I have been using the deal.II framework (9.3.x) for a while on an HPC
> machine. My project involves solving a vector-valued nonlinear PDE with
> nine components. Currently, I have implemented a damped Newton iteration
> with GMRES and an AMG preconditioner, using MPI on a distributed-memory
> architecture.
>
> A simple timing tells me that assembling the system matrix takes 99%
> of the total running time in every Newton iteration. I guess there is
> a lot of idle CPU time during assembly because I don't take advantage of
> thread parallelism yet.
>
> So here is my question: which tutorial steps demonstrate how to
> implement hybrid MPI-thread parallelism? I've found that step-48
> discusses this, but I wonder whether there are any other tutorial
> programs to look at. I also wonder if any of you have suggestions about
> MPI+thread parallelism within the deal.II framework.
>
> Sincerely,
> Timo Hyvarinen 
>
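
As a rough illustration of what thread-parallel assembly inside one MPI rank means, the sketch below splits the per-cell work across threads and then adds the results in a single serial pass. This mirrors the worker/copier split that deal.II's WorkStream is built around, but it is not deal.II code: local_contribution and assemble_threaded are hypothetical names, the "cell" is just an index, and the "global matrix" is reduced to one number. In a hybrid run, each MPI rank would do this for its locally owned cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the expensive per-cell integration.
inline double local_contribution(std::size_t cell)
{
  return static_cast<double>(cell) * 0.5;
}

// Worker phase: each thread fills a disjoint slice of `local`, so no locks
// are needed. Copier phase: one serial loop adds everything into the result.
inline double assemble_threaded(std::size_t n_cells, unsigned n_threads)
{
  n_threads = std::max(1u, n_threads);
  std::vector<double> local(n_cells, 0.0);
  std::vector<std::thread> pool;
  const std::size_t chunk = (n_cells + n_threads - 1) / n_threads;

  for (unsigned t = 0; t < n_threads; ++t)
  {
    const std::size_t begin = std::min(n_cells, t * chunk);
    const std::size_t end   = std::min(n_cells, begin + chunk);
    pool.emplace_back([&local, begin, end] {
      for (std::size_t c = begin; c < end; ++c)
        local[c] = local_contribution(c);
    });
  }
  for (auto &th : pool)
    th.join();

  double global = 0.0;
  for (const double v : local)
    global += v; // serial "copy to global" step
  return global;
}
```

Keeping the write to shared data in one serial step is the design choice that avoids races; the threads only ever touch their own slice.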


