Dear Spandan,

A minor addition to Erik's statement. If you haven't already, I would
recommend outputting Carpet timing statistics. Unless your 24-hour run is
pure evolution (no output, no checkpointing, no regridding, etc.), a
30-minute test will give you a false sense of the expected evolution time
per hour of walltime.
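As a starting point, something along these lines in the parameter file
enables periodic timer output (a minimal sketch; the parameter names and
values are quoted from memory and should be checked against the Carpet and
TimerReport documentation):

    # Sketch only -- verify names against the thorn documentation.
    ActiveThorns = "TimerReport"

    Carpet::output_timers_every             = 512
    TimerReport::out_every                  = 512
    TimerReport::out_filename               = "TimerReport"
    TimerReport::output_all_timers_together = yes
    TimerReport::n_top_timers               = 40

Comparing the evolution timers against the output, checkpointing, and
regridding timers over a run of realistic length shows where the walltime
actually goes, which is far more informative than extrapolating from a
30-minute job.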
Cheers,
Samuel

From: Erik Schnetter <[email protected]>
To: Spandan Sarma 19306 <[email protected]>
CC: [email protected]
Date: Dec 19, 2022 18:33:07
Subject: Re: [Users] Issue with Multiple Node Simulation on cluster

> Spandan
>
> It is quite possible that different build options, different MPI
> options, or different parameter settings would improve the performance
> of your calculation. Performance optimisation is a difficult topic,
> and it's impossible to say anything in general. A good starting point
> would be to run your simulation on a different system, to run a
> different parameter file with a known setup on your system, and then
> to compare.
>
> -erik
>
> On Thu, Dec 15, 2022 at 4:19 AM Spandan Sarma 19306
> <[email protected]> wrote:
>>
>> Dear Erik and Steven,
>>
>> Thank you so much for the suggestions. We changed the runscript to add
>> -x OMP_NUM_THREADS to the command line, and it solved the issue of the
>> total number of threads being 144. It is now set to 32 (equal to the
>> number of procs).
>>
>> Also, the number of iterations has increased to 132105 for 32 procs
>> (24 hr walltime), compared to just 240 before. Although this is a huge
>> increase, we expected a bit more. For a shorter walltime (30 min) we
>> obtained 2840, 2140, and 1216 iterations for 32, 16, and 8 procs,
>> respectively. Are there any further changes we can make to improve on
>> this?
>>
>> The new runscript and the output file for 32 procs are attached below
>> (no changes were made to the machine file, option list, and submit
>> script from before).
>>
>> On Fri, Dec 9, 2022 at 8:13 PM Steven R. Brandt <[email protected]> wrote:
>>>
>>> It's not too late to do a check, though, to see if all other nodes
>>> have the same OMP_NUM_THREADS value. Maybe that's the warning? It
>>> sounds like it should be an error.
>>>
>>> --Steve
>>>
>>> On 12/8/2022 5:23 PM, Erik Schnetter wrote:
>>>> Steve
>>>>
>>>> Code that runs as part of the Cactus executable is running too late
>>>> for this. At that time, OpenMP has already been initialized.
>>>>
>>>> There is the environment variable "CACTUS_NUM_THREADS", which is
>>>> checked at run time, but only if it is set (for backward
>>>> compatibility). Most people do not bother setting it, leaving this
>>>> error undetected. A warning is output, but these are generally
>>>> ignored.
>>>>
>>>> -erik
>>>>
>>>> On Thu, Dec 8, 2022 at 3:48 PM Steven R. Brandt <[email protected]>
>>>> wrote:
>>>>> We could probably add some startup code in which MPI broadcasts the
>>>>> OMP_NUM_THREADS setting to all the other processes and either checks
>>>>> the value of the environment variable or calls omp_set_num_threads()
>>>>> or some such.
>>>>>
>>>>> --Steve
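To illustrate the kind of check Steve describes here, a stand-alone sketch
(not actual Cactus code; the function names are invented for illustration)
of broadcasting rank 0's thread count and overriding any rank that
disagrees:

    /* Sketch of the startup check suggested above; illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>
    #include <omp.h>

    /* Broadcast rank 0's OpenMP thread count; ranks that differ (e.g.
       because OMP_NUM_THREADS was not exported to their node) warn and
       override their own setting. */
    static void check_num_threads(void)
    {
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      const char *env = getenv("OMP_NUM_THREADS");
      int nthreads = env ? atoi(env) : omp_get_max_threads();

      int root_nthreads = nthreads;
      MPI_Bcast(&root_nthreads, 1, MPI_INT, 0, MPI_COMM_WORLD);

      if (nthreads != root_nthreads) {
        fprintf(stderr,
                "rank %d: %d OpenMP threads, but rank 0 has %d; overriding\n",
                rank, nthreads, root_nthreads);
        omp_set_num_threads(root_nthreads);
      }
    }

    int main(int argc, char **argv)
    {
      MPI_Init(&argc, &argv);
      check_num_threads();
      MPI_Finalize();
      return 0;
    }

Remote ranks only see OMP_NUM_THREADS if the launcher exports it to them
(e.g. with mpirun's -x option, as in Spandan's updated runscript), which is
exactly the failure mode a check like this would catch.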
>>>>>
>>>>> On 12/8/2022 9:03 AM, Erik Schnetter wrote:
>>>>>> …
>>
>> --
>> Spandan Sarma
>> BS-MS' 19
>> Department of Physics (4th Year),
>> IISER Bhopal
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/

_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
