Hi Solal,

On 04/02/2020 08:39, Solal Amouyal wrote:
> From the information provided by microway:
>
>   * 9x Intel 6540 = 11.25 TFlops (CPU taken at median flops)
>   * 2x V100 = 14-16 TFlops.
>
> So theoretically, the 2 GPUs should offer better performance, but not as
> much as I've experienced. The issue lies somewhere else.
>
> I'll start profiling and see if the MPI isn't an issue (shouldn't be
> with only 18 ranks). I'll also benchmark my BLAS to see how it performs
> with respect to other measurements found online. From what I understand,
> as PyFR is written in Python, it heavily relies on BLAS for compute
> performance.

So a few things to check. First is the compiler. Sometimes I've got
better results with ICC than GCC (but always be sure to use the latest
version). Secondly, I think that this case (where anti-aliasing is
disabled) is limited not by FLOP/s but by memory bandwidth. Thus PyFR
will probably be using GiMMiK rather than vendor BLAS on both platforms.

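
To make the bandwidth point concrete, here is a rough roofline-style sketch. It is only an illustration: the function name, the bandwidth figure and the arithmetic intensity are assumptions on my part, not measurements of this case. The idea is that the attainable rate is capped by whichever is smaller, the peak FLOP/s or the memory bandwidth multiplied by the FLOPs performed per byte moved.

```python
def attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Roofline estimate of attainable FLOP/s.

    peak_flops    -- peak compute rate in FLOP/s
    mem_bandwidth -- sustained memory bandwidth in bytes/s
    intensity     -- arithmetic intensity in FLOP per byte moved
    """
    return min(peak_flops, mem_bandwidth * intensity)


def ridge_intensity(peak_flops, mem_bandwidth):
    """Intensity (FLOP/byte) above which a kernel becomes compute bound."""
    return peak_flops / mem_bandwidth


if __name__ == "__main__":
    # Illustrative numbers only.
    peak = 14e12       # FLOP/s: lower end of the 2x V100 figure quoted above
    bandwidth = 1.8e12  # bytes/s: ASSUMED aggregate peak for two GPUs
    intensity = 1.0     # FLOP/byte: HYPOTHETICAL value for a low-intensity kernel

    rate = attainable_flops(peak, bandwidth, intensity)
    print(f"ridge point: {ridge_intensity(peak, bandwidth):.1f} FLOP/byte")
    print(f"attainable:  {rate / 1e12:.2f} TFLOP/s of {peak / 1e12:.0f} peak")
```

If the real intensity of the kernels sits well below the ridge point, comparing the two platforms by their peak TFLOP/s says little; the ratio of their sustained memory bandwidths is the more useful number.
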
On CPUs one thing you can do to improve performance is to make libxsmm
available on the shared library path. If available, PyFR will call into
this for sparse (and dense) BLAS and it tends to outperform everything
else.

Another thing to check is that the OpenMP threads are not all getting
pinned to the same core. This can happen with some combinations of
OpenMP runtimes and MPI libraries. One thing you might want to try here
is running one MPI rank per core (with OMP_NUM_THREADS=1) and seeing if
this makes a difference.

Regards, Freddie.
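
P.S. For the libxsmm and pinning checks above, a small script along these lines may help. It is a sketch under assumptions: the helper names are made up, `os.sched_getaffinity` only exists on Linux, and `ctypes.util.find_library` is Python's generic lookup, which is not necessarily the same search PyFR itself performs. The affinity mask shown is the one each rank starts with, which OpenMP threads typically inherit; it does not show where individual threads end up.

```python
import ctypes.util
import os


def allowed_cpus():
    """Return the sorted CPU ids this process may run on, or None if unsupported."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux only
        return sorted(os.sched_getaffinity(0))
    return None


def find_libxsmm():
    """Return the name found by Python's generic library lookup, or None."""
    # Assumption: the library is installed under the base name "xsmm".
    return ctypes.util.find_library("xsmm")


if __name__ == "__main__":
    # Run once per MPI rank, launched the same way as the real job.
    print(f"pid {os.getpid()}: allowed CPUs = {allowed_cpus()}")
    print(f"pid {os.getpid()}: OMP_NUM_THREADS = {os.environ.get('OMP_NUM_THREADS')}")
    print(f"pid {os.getpid()}: libxsmm lookup = {find_libxsmm()}")
```

If every rank reports a single allowed CPU while OMP_NUM_THREADS is greater than one, the threads of that rank are likely sharing one core.
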
