Dear Spandan,

A minor addition to Erik's statement. If you haven't already, I would recommend 
outputting Carpet timing statistics. Unless your 24-hour run does nothing but 
evolve (no output, no checkpointing, no regridding, etc.), a 30-minute test 
will give you a false sense of the expected evolution time per hour.
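
For example, a few parameter-file settings along these lines (a sketch: it 
assumes the TimerReport thorn is in your thornlist, and the output intervals 
are only illustrative) make Carpet and TimerReport report where the walltime 
actually goes:

  # Sketch: periodic timer output (assumes thorn TimerReport is active;
  # the intervals are illustrative)
  Carpet::output_timers_every             = 1024
  TimerReport::out_every                  = 1024
  TimerReport::out_filename               = "TimerReport"
  TimerReport::output_all_timers_together = yes
  TimerReport::n_top_timers               = 40

With that you can see directly how much of a short test run went into initial 
data, regridding, checkpointing, and I/O rather than into the evolution itself.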

Cheers,
Samuel

From: Erik Schnetter <[email protected]>
To: Spandan Sarma 19306 <[email protected]>
CC: [email protected]
Date: Dec 19, 2022 18:33:07
Subject: Re: [Users] Issue with Multiple Node Simulation on cluster

> Spandan
> 
> It is quite possible that different build options, different MPI
> options, or different parameter settings would improve the performance
> of your calculation. Performance optimisation is a difficult topic,
> and it's impossible to say anything in general. A good starting point
> would be to run your simulation on a different system, to run a
> different parameter file with a known setup on your system, and then
> to compare.
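> 
> For example (a sketch; the parameter file name and the resource numbers are
> only placeholders), running a standard benchmark-style parameter file
> through SimFactory both on your cluster and on a well-understood machine
> gives you a baseline to compare against:
> 
>   ./simfactory/bin/sim create-run bench-32 \
>       --parfile par/qc0-mclachlan.par --procs 32 --walltime 0:30:00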
> 
> -erik
> 
> 
> 
> 
> On Thu, Dec 15, 2022 at 4:19 AM Spandan Sarma 19306
> <[email protected]> wrote:
>> 
>> Dear Erik and Steven,
>> 
>> Thank you so much for the suggestions. We changed the runscript to add -x 
>> OMP_NUM_THREADS to the command line, and that solved the issue of the total 
>> number of threads being 144; it is now 32 (equal to the number of procs).
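>> 
>> For reference, the relevant part of our Open MPI runscript now looks roughly 
>> like this (a sketch; the @...@ placeholders are the usual SimFactory 
>> substitutions, and the surrounding options depend on the machine):
>> 
>>   export OMP_NUM_THREADS=@NUM_THREADS@
>>   # -x exports the variable to all remote ranks, so every node uses the
>>   # intended thread count instead of its own default
>>   mpirun -np @NUM_PROCS@ -x OMP_NUM_THREADS @EXECUTABLE@ -L 3 @PARFILE@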
>> 
>> Also, the number of iterations completed within the 24-hour walltime has 
>> increased to 132105 on 32 procs, compared to just 240 before. Although this 
>> is a huge improvement, we had expected somewhat more. For a shorter walltime 
>> (30 mins) we got 2840, 2140, and 1216 iterations on 32, 16, and 8 procs 
>> respectively. Are there any further changes we can make to improve on this?
>> 
>> The new runscript and the output file for 32 procs are attached below (no 
>> changes were made to the machine file, option list, or submit script from 
>> before).
>> 
>> On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <[email protected]> wrote:
>>> 
>>> It's not too late to do a check, though, to see if all other nodes have
>>> the same OMP_NUM_THREADS value. Maybe that's the warning? It sounds like
>>> it should be an error.
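>>> 
>>> A quick way to check outside of Cactus is something along these lines (a
>>> sketch for Open MPI; it starts one process per node, and each one prints
>>> what that node sees):
>>> 
>>>   # nodes that did not receive the variable will print "unset"
>>>   mpirun -npernode 1 sh -c 'echo "$(hostname): OMP_NUM_THREADS=${OMP_NUM_THREADS:-unset}"'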
>>> 
>>> --Steve
>>> 
>>> On 12/8/2022 5:23 PM, Erik Schnetter wrote:
>>>> Steve
>>>> 
>>>> Code that runs as part of the Cactus executable is running too late
>>>> for this. At that time, OpenMP has already been initialized.
>>>> 
>>>> There is the environment variable "CACTUS_NUM_THREADS" which is
>>>> checked at run time, but only if it is set (for backward
>>>> compatibility). Most people do not bother setting it, leaving this
>>>> error undetected. There is a warning output, but these are generally
>>>> ignored.
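>>>> 
>>>> A cheap safeguard is therefore to set it in the runscript and export it
>>>> together with OMP_NUM_THREADS, roughly like this (Open MPI syntax, as a
>>>> sketch):
>>>> 
>>>>   export CACTUS_NUM_THREADS=${OMP_NUM_THREADS}
>>>>   # with both variables exported, Cactus can check the actual thread
>>>>   # count on every process instead of silently running with a wrong one
>>>>   mpirun -np 32 -x OMP_NUM_THREADS -x CACTUS_NUM_THREADS ...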
>>>> 
>>>> -erik
>>>> 
>>>> On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt <[email protected]> 
>>>> wrote:
>>>>> We could probably add some startup code in which MPI broadcasts the
>>>>> OMP_NUM_THREADS setting to all the other processes and either checks the
>>>>> value of the environment variable or calls omp_set_num_threads() or some
>>>>> such.
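>>>>> 
>>>>> Roughly along these lines (just a sketch of the idea, not actual Cactus
>>>>> code):
>>>>> 
>>>>>   #include <mpi.h>
>>>>>   #include <omp.h>
>>>>>   #include <stdlib.h>
>>>>> 
>>>>>   /* Broadcast rank 0's OMP_NUM_THREADS setting and apply it on every
>>>>>      process, so all ranks agree on the thread count. */
>>>>>   void sync_num_threads(void) {
>>>>>     int rank, nthreads = 0;
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     if (rank == 0) {
>>>>>       const char *env = getenv("OMP_NUM_THREADS");
>>>>>       nthreads = env ? atoi(env) : omp_get_max_threads();
>>>>>     }
>>>>>     MPI_Bcast(&nthreads, 1, MPI_INT, 0, MPI_COMM_WORLD);
>>>>>     if (nthreads > 0)
>>>>>       omp_set_num_threads(nthreads);
>>>>>   }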
>>>>> 
>>>>> --Steve
>>>>> 
>>>>> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
>>>>>> …
>>>> 
>>>> 
>> 
>> 
>> 
>> --
>> Spandan Sarma
>> BS-MS' 19
>> Department of Physics (4th Year),
>> IISER Bhopal
> 
> 
> 
> -- 
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
