Satish Balay <[email protected]> writes:
> Works for me on vesta with [the following on sys/examples/tutorials/ex1]
>
> runjob --np 8192 --ranks-per-node 16 --cwd $PWD --block VST-00440-33771-512
> : $PWD/ex1 -log_summary

That run uses only 512 nodes (8192 ranks at 16 per node). According to ALCF, the probability of MPI_Bcast crossing paths goes way up beyond 1024 nodes. IBM should really fix this problem, but until then the workaround is to fall back to the reference implementations (PAMID_COLLECTIVES=0), which are sometimes also faster (go figure).
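
For reference, a rough sketch of how the workaround might be applied to the command quoted above. It assumes that runjob passes environment variables to the compute ranks through its --envs option. The block name is simply copied from the quoted run, and the script only prints the command (a dry run) rather than submitting anything.

```shell
#!/bin/sh
# Job parameters copied from the quoted runjob command.
NP=8192
RANKS_PER_NODE=16
BLOCK=VST-00440-33771-512

# Node count implied by those parameters.
NODES=$((NP / RANKS_PER_NODE))
echo "nodes=$NODES"

# Assumption: --envs forwards PAMID_COLLECTIVES=0 to the job, which
# selects the reference collective implementations.
# \$PWD is kept literal so the printed command can be pasted as-is.
CMD="runjob --np $NP --ranks-per-node $RANKS_PER_NODE --envs PAMID_COLLECTIVES=0 --cwd \$PWD --block $BLOCK : \$PWD/ex1 -log_summary"
echo "$CMD"
```

Comparing -log_summary output with and without the variable set should show whether the reference collectives are actually faster for a given run.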
