Hi Roland, thanks for your answer.
> The test suites are quick-running parfiles with small grids, so running
> them on large numbers of MPI ranks (they are designed for 1 or 2 MPI
> ranks) can lead to unexpected situations (such as an MPI rank having no
> grid points at all).
>
> Generally, if the tests work for 1, 2, 4 ranks (4 being the largest
> number of procs requested by any test.ccl file) then this is sufficient.

Frontera and Stampede2 run the tests with 24/28 MPI processes, and the tests
still pass there. I am particularly looking at the test ADMMass/tov_carpet.par,
where the numbers are off but no error is thrown. Another example is
Exact/de_Sitter.par. Other tests do fail because of Carpet errors, which might
be what you are describing.

> Can you create a pull request for the "linux" architecture file with
> the changes for the AMD compiler you found, please? So far it seems you
> mostly only changed the detection part; does it then not also require
> some changes in the "set values" part of the file? E.g. default values
> for optimization, preprocessor or so?

Where is the repo? I am not too familiar with what that file is supposed to
set; I only changed what was needed to at least start the compilation.

Gabriele

On Wed, Aug 18, 2021 at 8:20 AM Roland Haas <[email protected]> wrote:
> Hello Gabriele,
>
> Thank you for contributing these.
>
> The test suites are quick-running parfiles with small grids, so running
> them on large numbers of MPI ranks (they are designed for 1 or 2 MPI
> ranks) can lead to unexpected situations (such as an MPI rank having no
> grid points at all).
>
> Generally, if the tests work for 1, 2, 4 ranks (4 being the largest
> number of procs requested by any test.ccl file) then this is sufficient.
>
> In principle even running on more MPI ranks should work, so if you know
> which tests fail with the larger number of MPI ranks and were to list
> them in a ticket, maybe someone could look into this.
>
> Note that you can undersubscribe compute nodes, in particular for
> tests, if you do not need / want to use all cores.
>
> Can you create a pull request for the "linux" architecture file with
> the changes for the AMD compiler you found, please? So far it seems you
> mostly only changed the detection part; does it then not also require
> some changes in the "set values" part of the file? E.g. default values
> for optimization, preprocessor or so?
>
> Yours,
> Roland
>
> > Hello,
> >
> > Two days ago, I opened a PR to the simfactory repo to add Expanse,
> > the newest machine at the San Diego Supercomputing Center, based on
> > AMD Epyc "Rome" CPUs and part of XSEDE. In the meantime, I realized
> > that some tests are failing miserably, but I couldn't figure out why.
> >
> > Before I describe what I found, let me start with a side note on AMD
> > compilers.
> >
> > <side note>
> >
> > There are four compilers available on Expanse: GNU, Intel, AMD, and PGI.
> > I did not touch the PGI compilers. I briefly tried (and failed) to
> > compile with the AMD compilers (aocc and flang). I did not try hard,
> > and it seems that most of the libraries on Expanse are compiled with
> > gcc anyway.
> >
> > A first step to support these compilers is adding the lines:
> >
> > elif test "`$F90 --version 2>&1 | grep AMD`" ; then
> >   LINUX_F90_COMP=AMD
> >
> > elif test "`$CC --version 2>&1 | grep AMD`" ; then
> >   LINUX_C_COMP=AMD
> >
> > elif test "`$CXX --version 2>&1 | grep AMD`" ; then
> >   LINUX_CXX_COMP=AMD
> >
> > in the obvious places in flesh/lib/make/known-architectures/linux.
> >
> > </side note>
> >
> > I successfully compiled the Einstein Toolkit with
> > - gcc 10.2.0 and OpenMPI 4.0.4
> > - gcc 9.2.0 and OpenMPI 4.0.4
> > - Intel 2019 and Intel MPI 2019
> >
> > I noticed that some tests, like ADMMass/tov_carpet.par, gave
> > completely incorrect results. For example, the expected value is 1.3,
> > but I would find 1.6.
> >
> > I disabled all the optimizations, but the test would keep failing. In the
> > end, I noticed that if I ran with 8/16/32 MPI processes per node, and
> > the corresponding number of OpenMP threads (128/N_MPI), the test
> > would fail, but if I ran with 4/2/1 MPI processes, the test would pass.
> >
> > Most of my experiments were with gcc 10, but the test also fails with
> > the Intel suite.
> >
> > I tried increasing the OMP_STACK_SIZE to a very large value, but
> > it didn't help.
> >
> > Any idea of what the problem might be?
> >
> > Gabriele
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://pgp.mit.edu .
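
For reference, a run of the test suites with an explicit MPI/OpenMP split, on a
deliberately undersubscribed node, can be launched through simfactory roughly as
sketched below. The simulation name and the core counts are placeholders, not
Expanse-specific settings, and the option names are worth double-checking
against simfactory's built-in help for the installed version:

  # Run the test suites on 4 MPI ranks x 2 OpenMP threads (8 cores in total),
  # undersubscribing a 128-core Expanse node by limiting the cores used per
  # node with --ppn-used.  "expanse-tests" is just a placeholder name.
  ./simfactory/bin/sim create-submit expanse-tests \
      --testsuite \
      --procs 8 --num-threads 2 --ppn-used 8 \
      --walltime 1:00:00

Running the same split at 4/2/1 versus 8/16/32 ranks is then only a matter of
changing --procs and --num-threads.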
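
On the "set values" question: the per-compiler defaults Roland refers to might
look roughly like the sketch below, which simply mirrors for LINUX_*_COMP=AMD
what the file already does for the other compilers. The Cactus variable names
(C_OPTIMISE_FLAGS and friends) are the standard ones, but the flag choices are
generic clang-style guesses for AOCC/flang and have not been tested; the exact
placement and syntax has to follow the existing entries in
lib/make/known-architectures/linux.

  # Hypothetical defaults for the AMD (AOCC/flang) compilers; untested sketch.
  if test "$LINUX_C_COMP" = 'AMD' ; then
    : ${C_OPTIMISE_FLAGS='-O2'}
    : ${C_DEBUG_FLAGS='-g'}
    : ${C_OPENMP_FLAGS='-fopenmp'}
  fi
  if test "$LINUX_CXX_COMP" = 'AMD' ; then
    : ${CXX_OPTIMISE_FLAGS='-O2'}
    : ${CXX_DEBUG_FLAGS='-g'}
    : ${CXX_OPENMP_FLAGS='-fopenmp'}
  fi
  if test "$LINUX_F90_COMP" = 'AMD' ; then
    : ${F90_OPTIMISE_FLAGS='-O2'}
    : ${F90_DEBUG_FLAGS='-g'}
    : ${F90_OPENMP_FLAGS='-fopenmp'}
  fi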
_______________________________________________
Users mailing list
[email protected]
http://lists.einsteintoolkit.org/mailman/listinfo/users
