On Wed, 18 Dec 2013, Jed Brown wrote:

> Satish Balay <[email protected]> writes:
>
> > Works for me on vesta with [the following on sys/examples/tutorials/ex1]
> >
> > runjob --np 8192 --ranks-per-node 16 --cwd $PWD --block VST-00440-33771-512 : $PWD/ex1 -log_summary
>
> This is only 512 nodes. According to ALCF, the probability of MPI_Bcast
> crossing paths goes way up at more than 1024 nodes. IBM should really
> fix this problem, but until then, the workaround is to fall back to the
> reference implementations (PAMID_COLLECTIVES=0), which are sometimes
> also faster (go figure).
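[For reference, the fallback can be requested per-job through runjob's
--envs flag - a sketch reusing the vesta run from above; double-check the
exact --envs syntax against the runjob docs:]

  # fall back to the reference MPI collectives (ALCF workaround)
  runjob --np 8192 --ranks-per-node 16 --envs PAMID_COLLECTIVES=0 \
    --cwd $PWD --block VST-00440-33771-512 : $PWD/ex1 -log_summary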
I had a chat with Derek this morning. The error case was with 512 nodes
[same as above] but with --ranks-per-node 4 or 8 - and this was on cetus.
The hang was confirmed [via the debugger] to be in PetscInitialize(), and
-skip_petscrc went past the hang.

Will try reproducing the problem on cetus - roughly with the invocations
sketched below.

Satish
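[Sketch only - I don't have the cetus block name, so <CETUS-BLOCK> is a
placeholder, and --np assumes 512 nodes at --ranks-per-node 4:]

  # hangs in PetscInitialize() [512 nodes x 4 ranks-per-node]
  runjob --np 2048 --ranks-per-node 4 --cwd $PWD --block <CETUS-BLOCK> \
    : $PWD/ex1 -log_summary
  # same run with -skip_petscrc gets past the hang
  runjob --np 2048 --ranks-per-node 4 --cwd $PWD --block <CETUS-BLOCK> \
    : $PWD/ex1 -skip_petscrc -log_summary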
