Jim
You can enable thorn SystemTopology, but disable its setting that makes it
change CPU bindings. This will output the bindings that OpenMP chooses.
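The OpenMP runtime itself can also be asked to report its settings and bindings via standard environment variables; a minimal runscript sketch (values are illustrative only, not a recommendation):

```shell
# Illustrative runscript fragment: ask the OpenMP runtime to print its
# internal control variables and where it binds each thread.
export OMP_DISPLAY_ENV=verbose       # OpenMP >= 4.0: print all ICVs at startup
export OMP_DISPLAY_AFFINITY=true     # OpenMP >= 5.0: print each thread's binding
export OMP_PROC_BIND=spread          # example binding policy
export OMP_PLACES=cores              # one place per physical core
```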
The problem with setting the bindings via OpenMP is that OpenMP is not
aware of the multiple MPI processes, and thus cannot prevent that multiple
Hi Roland, all,
I tried the changes Roland made to the runscript on stampede2. The point was
to see whether, by choosing a different OpenMP binding than the Stampede2
default, we can achieve better run speeds without enabling
hwloc/SystemTopology. The answer is yes.
I looked at the cas
Hello Jim,
thank you for benchmarking these. I have just updated the defaults in
simfactory to be 2 threads per rank (i.e. 24 MPI ranks per node), since this
gave you the fastest simulation when using hwloc (though not without it).
I suspect the hwloc requirement is due to bad default layout of the
threads by T
Very good! That looks like a 25% speed improvement in the mid-range of #MPI
processes per node.
It also looks as if the maximum speed is achieved by using between 8 and 24
MPI processes per node, i.e. between 2 and 6 OpenMP threads per MPI process.
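Assuming a 48-core Stampede2 Skylake node (KNL nodes differ), that sweet spot corresponds to rank/thread splits that exactly fill the node; a quick sketch:

```shell
# Enumerate rank/thread splits of a 48-core node (assumed core count)
# covering the 8-24 ranks-per-node range discussed above.
cores_per_node=48
splits=""
for ranks in 8 12 16 24; do
  threads=$((cores_per_node / ranks))
  splits="$splits $ranks ranks x $threads threads;"
done
echo "$splits"
```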
-erik
On Mon, Feb 19, 2018 at 10:07 AM, James H
Hello Jim,
thank you very much for giving this a spin.
Yours,
Roland
> Hi Erik, Roland, all,
>
> After our discussion on last week's telecon, I followed Roland's instructions
> on how to get the branch which has changes to how Carpet handles prolongation
> with respect to OpenMP. I reran my
Hi, for another application we found that 4-8 MPI ranks per node were
necessary to saturate network bandwidth. Since the application was
network-bandwidth limited, this was key to performance. Joel
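As a sketch of how such a per-node rank count could be requested, here is an example using Open MPI's mapping syntax (illustration only; Stampede2 jobs actually launch through SLURM/ibrun, and all values are made up):

```shell
# Compose an example launch line placing 6 ranks on each node with
# 8 threads per rank (executable name and counts are illustrative).
ranks_per_node=6
threads_per_rank=8
cmd="mpirun --map-by ppr:${ranks_per_node}:node:pe=${threads_per_rank} --bind-to core ./cactus_sim"
echo "$cmd"
```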
Original message From: James Healy Date