Jeff Hammond <[email protected]> writes:

> On Thu, Jul 23, 2020 at 9:35 PM Satish Balay <[email protected]> wrote:
>
>> On Thu, 23 Jul 2020, Jeff Hammond wrote:
>>
>> > Open-MPI refuses to let users over subscribe without an extra flag to
>> > mpirun.
>>
>> Yes - and when using this flag - it lets the run through - but there is
>> still performance degradation in oversubscribe mode.
>>
>> > I think Intel MPI has an option for blocking poll that supports
>> > oversubscription “nicely”.
>>
>> What option is this? Is it compile time option or something for mpiexec?
>>
>
> https://software.intel.com/content/www/us/en/develop/articles/tuning-the-intel-mpi-library-advanced-techniques.html
>
> Apply wait mode to oversubscribed jobs
>
> This option is particularly relevant for oversubscribed MPI jobs. The goal
> is to enable the wait mode of the progress engine in order to wait for
> messages without polling the fabric(s). This can save CPU cycles but
> decreases the message-response rate (latency), so it should be used with
> caution. To enable wait mode simply use:
>
> I_MPI_WAIT_MODE=1

Has anyone tested ch4:ucx?

$ rg UCX_PERF_WAIT_MODE
src/mpid/ch4/netmod/ucx/ucx/test/gtest/common/test_perf.cc
190: params.wait_mode = UCX_PERF_WAIT_MODE_LAST;

src/mpid/ch4/netmod/ucx/ucx/src/tools/perf/perftest.c
513: params->wait_mode = UCX_PERF_WAIT_MODE_LAST;

src/mpid/ch4/netmod/ucx/ucx/src/tools/perf/api/libperf.h
70: UCX_PERF_WAIT_MODE_PROGRESS, /* Repeatedly call progress */
71: UCX_PERF_WAIT_MODE_SLEEP, /* Go to sleep */
72: UCX_PERF_WAIT_MODE_SPIN, /* Spin without calling progress */
73: UCX_PERF_WAIT_MODE_LAST

modules/ucx/test/gtest/common/test_perf.cc
189: params.wait_mode = UCX_PERF_WAIT_MODE_LAST;

modules/ucx/src/tools/perf/perftest.c
553: params->wait_mode = UCX_PERF_WAIT_MODE_LAST;

modules/ucx/src/tools/perf/api/libperf.h
71: UCX_PERF_WAIT_MODE_PROGRESS, /* Repeatedly call progress */
72: UCX_PERF_WAIT_MODE_SLEEP, /* Go to sleep */
73: UCX_PERF_WAIT_MODE_SPIN, /* Spin without calling progress */
74: UCX_PERF_WAIT_MODE_LAST
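
For anyone comparing the modes, here is a rough sketch of how I read the difference between "repeatedly call progress" and "go to sleep" in those libperf.h comments. It is not UCX or MPICH code: wait_mode_t, fake_engine_t, fake_progress and wait_for_completion are all made-up names, and the "message" simply arrives after a fixed number of progress calls. A real implementation would more likely block on an event from the fabric than nap for a fixed interval, so treat the nanosleep() as a stand-in.

```c
#define _POSIX_C_SOURCE 199309L /* for nanosleep() */
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names for illustration only; this is not the UCX API. */
typedef enum {
    WAIT_MODE_PROGRESS, /* busy poll: call progress back to back */
    WAIT_MODE_SLEEP     /* give up the core between progress calls */
} wait_mode_t;

typedef struct {
    int  polls_until_done; /* stand-in for a message that arrives after N polls */
    long progress_calls;   /* how many times progress was called */
    long sleeps;           /* how many times the waiter napped */
} fake_engine_t;

/* Stand-in progress engine: returns true once the "message" has arrived. */
static bool fake_progress(fake_engine_t *e)
{
    e->progress_calls++;
    if (e->polls_until_done > 0)
        e->polls_until_done--;
    return e->polls_until_done == 0;
}

/* Wait for completion, either spinning on progress or napping between polls. */
static void wait_for_completion(fake_engine_t *e, wait_mode_t mode)
{
    const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000 }; /* 1 us */

    assert(e != NULL);
    while (!fake_progress(e)) {
        if (mode == WAIT_MODE_SLEEP) {
            /* Let another rank on the same core run, at the cost of latency. */
            nanosleep(&nap, NULL);
            e->sleeps++;
        }
    }
}
```

With more ranks than cores, the PROGRESS loop keeps every waiting rank runnable, so the scheduler has to preempt spinners before the rank that actually has work gets a turn. The SLEEP variant frees the core, which is the same trade the Intel text above describes: fewer wasted cycles, worse latency.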
