Hello,

Last week I opened a PR to add the configuration files for Expanse to simfactory. Expanse is an example of the new generation of AMD supercomputers. Other examples are Anvil, another of the new XSEDE machines, and Puma, the newest cluster at The University of Arizona.
I have some experience with Puma and Expanse and I would like to share some thoughts, some of which come from interacting with the admins of Expanse. The problem is that I am finding terrible multi-node performance on both these machines, and I don't know if this will be a common thread among new AMD clusters. These supercomputers share similar characteristics.

First, they have a very high core count per node (typically 128) but low memory per core (typically 2 GB). Under these conditions, it is very easy to have a job killed by the OOM daemon. My suspicion is that it is rank 0 that runs out of memory, and the entire run is aborted.

Second, depending on the MPI implementation, MPI collective operations can be extremely expensive. I was told that the best implementation (at the moment) is mvapich 2.3.6. This seems to be due to the high core count.

Third, I found that the code does not scale well, which is possibly related to the previous point. If your job fits on a single node, it will run wonderfully. However, if you run the same simulation on two nodes, the code will actually be slower. This indicates that there is no strong scaling at all from 1 node to 2 (128 to 256 cores, or 32 to 64 MPI ranks). Using mvapich 2.3.6 improves the situation, but it is still faster to use fewer nodes. (My benchmark is a par file I have tested extensively on Frontera.)

I am working with Expanse's support staff to see what we can do, but I wonder if anyone has had a positive experience with this architecture and has some tips to share.

Gabriele
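
P.S. For anyone who wants to compare numbers on their own cluster, here is a minimal sketch of how the strong-scaling efficiency from 1 to 2 nodes can be quantified from the wall times of the two runs. The timings in it are placeholders for illustration, not measurements from Expanse.

# Minimal sketch: strong-scaling efficiency from wall times.
# The wall times below are placeholders, not actual Expanse measurements.

def strong_scaling_efficiency(t_base, t_scaled, nodes_base, nodes_scaled):
    """Ideal strong scaling: doubling the nodes halves the wall time.
    Efficiency = (t_base * nodes_base) / (t_scaled * nodes_scaled)."""
    return (t_base * nodes_base) / (t_scaled * nodes_scaled)

# Placeholder wall times (seconds) for the same par file on 1 and 2 nodes.
t_1node = 1000.0
t_2nodes = 1100.0   # slower than the 1-node run, as described above

eff = strong_scaling_efficiency(t_1node, t_2nodes, 1, 2)
print(f"Strong-scaling efficiency from 1 to 2 nodes: {eff:.0%}")
# 100% would be ideal scaling; here the result is about 45%, i.e. the
# 2-node run does not even match the throughput of the 1-node run.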
