Jeff Hammond <jeff.scie...@gmail.com> writes:

> MPC is a great idea, although it poses some challenges w.r.t. globals and
> such (however, see below).  Unfortunately, since "MPC conforms to the POSIX
> Threads, OpenMP 3.1 and MPI 1.3 standards"
> (http://mpc.hpcframework.paratools.com/), it does not do me much good (I'm a
> heavy-duty RMA user).

Yes, though it may be enough for various hybrid things of interest
here and presumably at CEA.

> For those that are interested in MPC, the Intel compilers (on Linux)
> support an option to change how TLS works so that MPC works.

In the free world it needs a patched gcc and binutils as far as I
remember, which would be painful.

>> For what it's worth, you have to worry about the batch resource manager
>> as well as the MPI, and you may need to ensure you're allocated complete
>> nodes.  There are known problems with IMPI and SGE specifically, and
>> several times I've made users a lot happier with OMPI/GCC.
>>
>
> This is likely because GCC uses one OpenMP thread when the user does not
> set OMP_NUM_THREADS, whereas Intel will use one per virtual processor
> (divided by MPI processes, but only if it can figure out how many).

It's particularly because you need a suitable allocation and binding
from the resource manager in the first place.

> Both
> behaviors are compliant with the OpenMP standard.  GCC is doing the
> conservative thing, whereas Intel is trying to maximize performance in the
> case of OpenMP-only applications (more common than you think) and
> MPI+OpenMP applications where Intel MPI is used.  As experienced HPC users
> always set OMP_NUM_THREADS (and OMP_PROC_BIND, OMP_WAIT_POLICY or
> implementation-specific equivalents) explicitly anyways, this should not be
> a problem.

Ho, ho!

I won't go off-topic arguing about what happens on the systems I
support, but I guess others see similar issues.
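
For what it's worth, when this comes up I end up asking users to run a
small check, roughly the sketch below (my own, nothing official; it
assumes Linux for sched_getaffinity and an OpenMP-capable mpicc), to
see what thread count and affinity each rank actually gets:

  /* check_hybrid.c -- my own minimal check, not from any package:
     print what each MPI rank actually gets for OpenMP threads and
     CPU affinity.  Assumes Linux (sched_getaffinity) and OpenMP. */
  #define _GNU_SOURCE
  #include <mpi.h>
  #include <omp.h>
  #include <sched.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      cpu_set_t mask;
      sched_getaffinity(0, sizeof mask, &mask);  /* cores we may run on */
      printf("rank %d: max OpenMP threads %d, cores in affinity mask %d\n",
             rank, omp_get_max_threads(), CPU_COUNT(&mask));

      MPI_Finalize();
      return 0;
  }

Compile with something like "mpicc -fopenmp check_hybrid.c" and run it
under the resource manager; the defaults being argued about then show
up per rank.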

> As for not getting complete nodes, one is either in the cloud or the shared
> debug queue and performance is secondary.

I'm afraid people live in different worlds!

> But as always, one should be
> able to set OMP_NUM_THREADS, OMP_PROC_BIND, OMP_WAIT_POLICY to get the
> right behavior.

> My limited experience with SGE has caused me to conclude that any problems
> associated with SGE + $X are almost certainly the fault of SGE and not $X.

I'd say that's too strong, but it's often made clear I don't understand
SGE.  I know that problems people have had with IMPI/ifort have been
resolved in the upgrade churn, but I don't know what they were.

>> Sure, but the trouble is that "everyone knows" you need the hybrid
>> stuff.  Are there good examples of using MPI-3 instead/in comparison?
>> I'd be particularly interested in convincing chemists, though as they
>> don't believe in deadlock and won't measure things, that's probably a
>> lost cause.  Not all chemists, of course.
>
> PETSc (
> http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf
> )

PETSc can't be relying on MPI-3, because I'm in the process of fixing
rpm packaging for the current version and building it with OMPI 1.6.
(Exascale is only of interest if/when the spin-offs are useful for
university-scale systems.)  I was hoping for a running example.
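
For concreteness, the sort of running example I mean for "MPI-3
instead of threads" is a shared-memory window: ranks on a node share
an allocation directly rather than spawning OpenMP threads.  The
sketch below is only my own toy, not taken from PETSc or anywhere
else:

  /* shmem_win.c -- my own toy illustration of the MPI-3
     shared-memory window idea: ranks on one node share an
     allocation directly instead of using OpenMP threads. */
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      /* Split off the ranks that share this node's memory. */
      MPI_Comm node;
      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &node);
      int nrank, nsize;
      MPI_Comm_rank(node, &nrank);
      MPI_Comm_size(node, &nsize);

      /* Each rank contributes one double to a node-wide shared array. */
      double *mine;
      MPI_Win win;
      MPI_Win_allocate_shared(sizeof(double), sizeof(double),
                              MPI_INFO_NULL, node, &mine, &win);
      *mine = (double)nrank;
      MPI_Win_fence(0, win);            /* crude synchronisation */

      if (nrank == 0) {                 /* rank 0 reads everyone's slot */
          MPI_Aint size; int disp; double *base;
          for (int r = 0; r < nsize; r++) {
              MPI_Win_shared_query(win, r, &size, &disp, &base);
              printf("slot %d holds %g\n", r, *base);
          }
      }

      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }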

> Quantum chemistry or molecular dynamics?

The former is what I mainly see.

> Parts of quantum chemistry are so
> flop heavy that stupid fork-join MPI+OpenMP is just fine.  I'm doing this
> in NWChem coupled cluster codes.  I fork-join in every kernel even though
> this is shameful, because my kernels do somewhere between 4 and 40 billion
> FMAs and touch between 0.5 and 5 GB of memory.  For methods that aren't
> coupled-cluster, OpenMP is not always a good solution, and certainly not
> for legacy codes that aren't thread-safe.  OpenMP may be useful within a
> core to exploit >1 thread per core (if necessary) and certainly "#pragma
> omp simd" should be exploited when appropriate, but scaling OpenMP beyond
> ~4 threads in most quantum chemistry codes requires an intensive rewrite.
> Because of load-balancing issues in atomic integral computations, TBB or
> OpenMP tasking may be more appropriate.

A pity that doesn't help to make the case, but thanks :-/.  The rewrite
does seem to have been done in various cases.
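
For anyone else following, the fork-join pattern described above looks
roughly like the toy below (my own illustration, not the NWChem code):
threads exist only for the duration of a flop-heavy kernel, with MPI
communication between kernels.

  /* forkjoin.c -- toy sketch of the fork-join MPI+OpenMP pattern:
     each rank runs a threaded kernel, then returns to one thread. */
  #include <mpi.h>
  #include <stdlib.h>

  /* Stand-in kernel: C += A*B on local blocks, threaded only here. */
  static void kernel(int n, const double *a, const double *b, double *c)
  {
      #pragma omp parallel for      /* fork: threads live only in here */
      for (int i = 0; i < n; i++)
          for (int k = 0; k < n; k++) {
              double aik = a[i*n + k];
              #pragma omp simd      /* vectorise the innermost loop */
              for (int j = 0; j < n; j++)
                  c[i*n + j] += aik * b[k*n + j];
          }                         /* join: back to one thread per rank */
  }

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      int n = 512;                  /* arbitrary block size for the example */
      double *a = calloc((size_t)n*n, sizeof *a);
      double *b = calloc((size_t)n*n, sizeof *b);
      double *c = calloc((size_t)n*n, sizeof *c);

      kernel(n, a, b, c);           /* MPI communication would go between kernels */

      free(a); free(b); free(c);
      MPI_Finalize();
      return 0;
  }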

> If you want to have a more detailed discussion of programming models for
> computational chemistry, I'd be happy to take that discussion offline.

I'd be happy to hear of them, but I'm just trying to support a range of
users rather than write this stuff myself, and the wisdom must be worth
publishing if it hasn't been already.  I see the issue mostly in comp
chem, and being able to refer chemists to a chemist is potentially
useful, but it's presumably more general.

Thanks for the thoughts.
