Jeff Hammond <jeff.scie...@gmail.com> writes:

> MPC is a great idea, although it poses some challenges w.r.t. globals
> and such (however, see below). Unfortunately, "MPC conforms to the
> POSIX Threads, OpenMP 3.1 and MPI 1.3 standards"
> (http://mpc.hpcframework.paratools.com/), it does not do me much good
> (I'm a heavy-duty RMA user).

Yes, though it may be enough for various hybrid things of interest here
and presumably in CEA.

> For those that are interested in MPC, the Intel compilers (on Linux)
> support an option to change how TLS works so that MPC works.

In the free world it needs a patched gcc and binutils as far as I
remember, which would be painful.

>> For what it's worth, you have to worry about the batch resource
>> manager as well as the MPI, and you may need to ensure you're
>> allocated complete nodes. There are known problems with IMPI and SGE
>> specifically, and several times I've made users a lot happier with
>> OMPI/GCC.
>
> This is likely because GCC uses one OpenMP thread when the user does
> not set OMP_NUM_THREADS, whereas Intel will use one per virtual
> processor (divided by MPI processes, but only if it can figure out how
> many).

It's particularly because you need a suitable allocation and binding
from the resource manager in the first place.

> Both behaviors are compliant with the OpenMP standard. GCC is doing
> the conservative thing, whereas Intel is trying to maximize
> performance in the case of OpenMP-only applications (more common than
> you think) and MPI+OpenMP applications where Intel MPI is used. As
> experienced HPC users always set OMP_NUM_THREADS (and OMP_PROC_BIND,
> OMP_WAIT_POLICY or implementation-specific equivalents) explicitly
> anyways, this should not be a problem.

Ho, ho! I won't argue off-topic about what happens on the systems I
support, but I guess others see similar issues.

> As for not getting complete nodes, one is either in the cloud or the
> shared debug queue and performance is secondary.

I'm afraid people live in different worlds!

> But as always, one should be able to set OMP_NUM_THREADS,
> OMP_PROC_BIND, OMP_WAIT_POLICY to get the right behavior.
>
> My limited experience with SGE has caused me to conclude that any
> problems associated with SGE + $X are almost certainly the fault of
> SGE and not $X.

I'd say that's too strong, but it's often made clear I don't understand
SGE. I know problems people have had with IMPI/ifort have been resolved
in the upgrade churn, but I don't know what they were.

>> Sure, but the trouble is that "everyone knows" you need the hybrid
>> stuff. Are there good examples of using MPI-3 instead/in comparison?
>> I'd be particularly interested in convincing chemists, though as they
>> don't believe in deadlock and won't measure things, that's probably a
>> lost cause. Not all chemists, of course.
>
> PETSc
> (http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf)

PETSc can't be using MPI-3, because I'm in the process of fixing rpm
packaging for the current version and building it with ompi 1.6.
(Exascale is only of interest if/when there are spin-offs useful for
university-scale systems.) I was hoping for a running example.
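To be concrete, the sort of thing I have in mind is the MPI-3
shared-memory window route, roughly as in the sketch below (my own
untested sketch, assuming an MPI that provides the MPI-3 calls
MPI_Comm_split_type and MPI_Win_allocate_shared):

/* Sketch of "MPI-3 instead of threads": ranks on a node share one
 * array through a shared-memory window and address it directly. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* One communicator per shared-memory node. */
    MPI_Comm node;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    /* Each rank contributes a slice; the allocation is contiguous
     * across ranks by default. */
    const MPI_Aint n = 1000;
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(n * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, node, &mine, &win);

    /* Rank 0's base address lets any rank walk the whole array. */
    MPI_Aint sz;
    int disp;
    double *base;
    MPI_Win_shared_query(win, 0, &sz, &disp, &base);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    for (MPI_Aint i = 0; i < n; i++)
        mine[i] = nrank;              /* fill my slice with plain stores */
    MPI_Win_sync(win);
    MPI_Barrier(node);
    MPI_Win_sync(win);
    if (nrank == 0)
        printf("last element: %g\n", base[(MPI_Aint)nsize * n - 1]);
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&node);
    MPI_Finalize();
    return 0;
}

i.e. ordinary loads and stores within the node and MPI between nodes,
with no threads anywhere.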
> Quantum chemistry or molecular dynamics?

The former is what I mainly see.

> Parts of quantum chemistry are so flop heavy that stupid fork-join
> MPI+OpenMP is just fine. I'm doing this in NWChem coupled cluster
> codes. I fork-join in every kernel even though this is shameful,
> because my kernels do somewhere between 4 and 40 billion FMAs and
> touch between 0.5 and 5 GB of memory. For methods that aren't
> coupled-cluster, OpenMP is not always a good solution, and certainly
> not for legacy codes that aren't thread-safe. OpenMP may be useful
> within a core to exploit >1 thread per core (if necessary) and
> certainly "#pragma omp simd" should be exploited when appropriate, but
> scaling OpenMP beyond ~4 threads in most quantum chemistry codes
> requires an intensive rewrite. Because of load-balancing issues in
> atomic integral computations, TBB or OpenMP tasking may be more
> appropriate.

Pity that doesn't help to make the case, but thanks :-/. The re-write
does seem to have been done in various cases.
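For anyone on the list who hasn't met it, the fork-join pattern being
referred to is roughly the shape below (just my own toy illustration,
not anything taken from NWChem):

/* Fork-join MPI+OpenMP: threads exist only inside each flop-heavy
 * kernel; MPI calls are made from a single thread in between. */
#include <mpi.h>
#include <stdlib.h>

/* Stand-in for one kernel: fork threads, do the work, join. */
static void kernel(double *c, const double *a, const double *b, int n)
{
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) {
#pragma omp simd
        for (int j = 0; j < n; j++)
            c[i * n + j] += a[i * n + j] * b[i * n + j];
    }
}   /* implicit join: back to one thread per MPI rank */

int main(int argc, char **argv)
{
    /* FUNNELED suffices when only the master thread calls MPI. */
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int n = 1024;
    double *a = calloc((size_t)n * n, sizeof *a);
    double *b = calloc((size_t)n * n, sizeof *b);
    double *c = calloc((size_t)n * n, sizeof *c);

    kernel(c, a, b, n);                       /* fork-join region */
    MPI_Allreduce(MPI_IN_PLACE, c, n * n, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);            /* single-threaded MPI */

    free(a); free(b); free(c);
    MPI_Finalize();
    return 0;
}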
> If you want to have a more detailed discussion of programming models
> for computational chemistry, I'd be happy to take that discussion
> offline.

I'd be happy to hear of them, but I'm just trying to support a range of
users rather than write this stuff, and the wisdom must be worth
publishing if it hasn't been already. I see the issue mostly in comp
chem, and being able to refer chemists to a chemist is potentially
useful, but it's presumably more general.

Thanks for the thoughts.