Dear Wien2k mailing list,

in a recent discussion with Profs. Marks and Blaha it turned out that, under some circumstances, the thread parallelization in Wien2k and its interaction with the threading environment variables of BLAS/LAPACK libraries (MKL, but possibly also OpenBLAS and others) can behave unexpectedly. This can lead either to incomplete utilization of the nodes (underload) or to too many contending threads (overload), both of which slow the calculations down.
Short story with just three points:

- Occasionally check the load of your nodes when running (with "top", a similar program, or the reporting of your job scheduler). If it is much higher or lower than the number of cores, then this could be a problem; please continue reading.
- If you have previously set MKL_NUM_THREADS, OPENBLAS_NUM_THREADS or any other equivalent BLAS/LAPACK-specific threading variable, please unset it.
- If you linked with non-default MKL settings or with a different threaded BLAS/LAPACK such as OpenBLAS, please make sure that your BLAS/LAPACK library is internally threaded with OpenMP (not pthreads, TBB or any other threading library) and that it uses the same OpenMP library as Wien2k. One example of a problematic configuration is compiling Wien2k with gfortran and MKL, with libiomp5 used for MKL threading but libgomp for the OpenMP threading in Wien2k itself.

Best regards
Pavel

P.S.: Long story for people interested in the technical details:

Wien2k links with the threaded MKL by default, and threaded OpenBLAS is usually also the default that distributions provide. In Wien2k versions before 19, when running k-parallel without OMP_NUM_THREADS (or the BLAS-specific equivalent environment variables) set, the parallel BLAS/LAPACK libraries usually try to use the maximum number of cores, leading to overload if multiple k-points were running on a single node. This was fixed in Wien2k 19.1, where the threading is now explicitly controlled from the .machines file, and when no threading is specified it defaults to one thread per process.

Another problem is with the BLAS/LAPACK-specific threading variables such as MKL_NUM_THREADS, OPENBLAS_NUM_THREADS, etc. They have higher priority than OMP_NUM_THREADS, which Wien2k sets internally based on the omp_xxx:y lines in the .machines file, and can therefore override the threading chosen by the user. Unsetting them makes the parallel BLAS/LAPACK obey the settings from the .machines file.
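As a rough illustration of the point about library-specific variables, the following Python sketch lists which of them are set in the current environment. It is not part of Wien2k; the function names are made up for this example, and the tuple contains only the two variables named in this mail, so it would need extending for other BLAS/LAPACK libraries.

```python
import os

# Only the two variables named in this mail; extend the tuple for other
# BLAS/LAPACK libraries (the names to add are up to you, none are assumed here).
BLAS_THREAD_VARS = ("MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def blas_thread_overrides(env):
    """Return the library-specific thread variables that are set in env."""
    return {name: env[name] for name in BLAS_THREAD_VARS if name in env}


def report(env):
    """Build one human-readable line per variable that is set."""
    overrides = blas_thread_overrides(env)
    lines = [
        f"{name}={value} is set and may take priority over OMP_NUM_THREADS"
        for name, value in sorted(overrides.items())
    ]
    if not lines:
        lines.append("no BLAS/LAPACK-specific thread variables are set")
    return lines


if __name__ == "__main__":
    # Inspect the environment of the shell or job script that starts Wien2k.
    for line in report(os.environ):
        print(line)
```

Running this in the same shell or job script that starts the calculation shows what the BLAS/LAPACK library would see; any variable it reports is a candidate for unsetting.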
More problems can occur when combining different threading models in Wien2k and BLAS/LAPACK (such as OpenMP and POSIX threads), or when using OpenMP threading in both but with different OpenMP libraries (for example Intel and GNU). This is most likely to happen when using gfortran and a distro-provided OpenBLAS, as its default threading is with pthreads.

The OpenMP parallelization in Wien2k works in such a way that there are some explicit OpenMP parallel regions, which may also contain BLAS/LAPACK calls. In other places the BLAS/LAPACK calls are made from serial regions, and we depend on parallelization at the BLAS/LAPACK level. If OpenMP and the same OpenMP library are used everywhere, then in the first case the BLAS/LAPACK library recognizes that it is already being called from a parallel region and runs single-threaded, while in the second case it runs with multiple threads as expected. If threading models or threading libraries are mixed, the BLAS/LAPACK calls from OpenMP parallel regions have no way of knowing that there are already multiple threads, and each can spawn more threads, leading again to overload.
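To put numbers on the overload described in the long story, here is a deliberately simplified Python model of the peak thread count on one node. It is only a sketch: the function names and the example figures (16 cores, 4 k-parallel processes, 4 threads each) are hypothetical, and real behavior depends on the libraries involved.

```python
def threads_on_node(k_processes, omp_threads, blas_threads, blas_shares_openmp):
    """Estimate the peak number of busy threads on one node (simplified model)."""
    if blas_shares_openmp:
        # Same OpenMP library everywhere: BLAS calls made inside a parallel
        # region run single-threaded, so each process peaks at omp_threads.
        per_process = omp_threads
    else:
        # Mixed threading models or libraries: every OpenMP thread may call
        # a BLAS routine that starts blas_threads threads of its own.
        per_process = omp_threads * blas_threads
    return k_processes * per_process


def classify(threads, cores):
    """Compare an estimated thread count with the number of cores."""
    if threads > cores:
        return "overload"
    if threads < cores:
        return "underload"
    return "balanced"


if __name__ == "__main__":
    cores = 16  # hypothetical node
    for shared in (True, False):
        n = threads_on_node(4, 4, 4, blas_shares_openmp=shared)
        print(f"shared OpenMP runtime: {shared}, threads: {n}, {classify(n, cores)}")
```

With these example figures the matched configuration gives 16 threads on 16 cores, while the mixed one gives 64, which is the kind of load that "top" would show as far above the core count.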