On Fri, Sep 1, 2017 at 7:40 AM, Jakub Kruzik <[email protected]> wrote:
> Hi,
>
> I am looking at the single-node performance of MUMPS and SuperLU on KNL 7230
> (on Theta). I am using KSP example ex2
> (http://www.mcs.anl.gov/petsc/petsc-current/src/ksp/ksp/examples/tutorials/ex2.c.html)
> with m x n = 2880 x 2880. KNL runs in cache and quad modes.
>
> Times in seconds for 24 cores:
> mumps: 279
> superlu: 326
> cg: 116
>
> Times in seconds for 64 cores:
> mumps: 316
> superlu: 410
> cg: 49
>
> The performance for 24 cores is OK - both direct solvers are roughly 3.5
> times slower than on 2x E5-2680v3. (According to people from Intel, the
> single-core performance of KNL is about 3-4 times lower than that of
> E5-2680v3.) However, strong scalability is really bad.
>
> I am using the cray-petsc/3.7.6.0 module. I tried my own PETSc compilation
> with MKL and MUMPS/SuperLU installed by PETSc configure, but the results are
> similar.
>
> Please find attached the Theta submission script and logs for KNL and Haswells.
>
> Why is the performance of direct solvers on a full node so bad?
>

Admittedly it was for different computations, but we saw strong scaling
degradation after 32 cores of KNL in

  https://arxiv.org/abs/1705.09907

and we also saw strong scaling tail off as the problem size got this small in

  https://arxiv.org/abs/1705.03625

  Thanks,

     Matt

> Best,
>
> Jakub
>

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
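For reference, the strong-scaling behaviour in the quoted timings can be quantified with a short calculation. This is only a sketch: the times are the ones posted above, while the function name `strong_scaling` and the choice of the 24-core run as the baseline are assumptions made for illustration, not part of the original measurements or of any PETSc tooling.

```python
# Wall-clock times in seconds from the quoted message: (24 cores, 64 cores).
TIMES = {
    "mumps": (279.0, 316.0),
    "superlu": (326.0, 410.0),
    "cg": (116.0, 49.0),
}


def strong_scaling(t_base, t_new, p_base=24, p_new=64):
    """Return (speedup, parallel efficiency) relative to the p_base run.

    Hypothetical helper: ideal speedup is p_new / p_base, and efficiency
    is the measured speedup divided by that ideal value.
    """
    speedup = t_base / t_new
    efficiency = speedup / (p_new / p_base)
    return speedup, efficiency


if __name__ == "__main__":
    for solver, (t24, t64) in TIMES.items():
        s, e = strong_scaling(t24, t64)
        print(f"{solver:8s} speedup {s:5.2f}  efficiency {100 * e:5.1f}%")
```

A speedup below 1 means the 64-core run was slower than the 24-core run, which is the case for both direct solvers here, while cg still gains from the extra cores.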
