Satish Balay <[email protected]> writes:

> Works for me on vesta with [the following on sys/examples/tutorials/ex1]
>
> runjob --np 8192  --ranks-per-node 16 --cwd $PWD --block  VST-00440-33771-512 
> : $PWD/ex1 -log_summary

This is only 512 nodes (8192 ranks at 16 per node).  According to ALCF,
the probability of MPI_Bcast crossing paths goes way up at more than
1024 nodes.  IBM should really fix this problem, but until then, the
workaround is to fall back to the reference implementations
(PAMID_COLLECTIVES=0), which are sometimes also faster (go figure).
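
A minimal sketch of applying the workaround to the command quoted above.
It assumes runjob's --envs option is how the variable reaches the compute
nodes (that option does not appear in the quoted command), and it only
prints the command rather than launching it:

```shell
#!/bin/sh
# Sketch only: builds and prints the command instead of running it,
# since runjob is available only on the Blue Gene/Q front end.
NP=8192
RANKS_PER_NODE=16
BLOCK=VST-00440-33771-512   # block name taken from the quoted command

# Node count implied by the rank layout: 8192 / 16 = 512.
NODES=$((NP / RANKS_PER_NODE))
echo "nodes: $NODES"

# Assumption: --envs exports PAMID_COLLECTIVES=0 to the job's environment.
CMD="runjob --np $NP --ranks-per-node $RANKS_PER_NODE --envs PAMID_COLLECTIVES=0 --cwd $PWD --block $BLOCK : $PWD/ex1 -log_summary"
echo "$CMD"
```

Comparing -log_summary output with and without the variable set would show
whether the reference collectives are also faster for a given run.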
