Yes, that's worth a try. Xiangdong, if you want to employ the MKL implementations for BAIJ MatMult() and friends, you can do so by configuring petsc-master with a recent version of MKL and then using the option "-mat_type baijmkl" (on the command line or set in your PETSC_OPTIONS environment variable).
Note that the above requires a version of MKL that is recent enough to have the sparse inspector-executor routines. MKL is now free, so I recommend installing the latest version.

(You can also try using the sparse MKL routines with AIJ format matrices by using either "-mat_type aijmkl" or "-mat_seqaij_type seqaijmkl". This will use MKL for MatMult()-type operations and some sparse matrix-matrix products.)

Best regards,
Richard

On Tue, Nov 14, 2017 at 2:42 PM, Smith, Barry F. <[email protected]> wrote:

>
>   Use MKL versions of block formats?
>
> > On Nov 14, 2017, at 4:40 PM, Richard Tran Mills <[email protected]> wrote:
> >
> > On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <[email protected]> wrote:
> >
> >> On Nov 13, 2017, at 10:49 PM, Xiangdong <[email protected]> wrote:
> >>
> >> 1) How about the vectorization of BAIJ format?
> >
> > BAIJ kernels are optimized with manual unrolling, but not with AVX
> > intrinsics. So the vectorization relies on the compiler's ability.
> > It may or may not get vectorized depending on the compiler's
> > optimization decisions. But vectorization is not essential for the
> > performance of most BAIJ kernels.
> >
> > I know that this has come up in previous discussions, but I'm guessing
> > that the manual unrolling actually impedes the ability of many modern
> > compilers to optimize the BAIJ calculations. I suppose we ought to have a
> > switch to enable or disable the use of the unrolled versions? (And, further
> > down the road, some sort of performance model to tell us what the setting
> > for the switch should be...)
> >
> > --Richard
> >
> >> If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to
> >> do anything special (more than AVX flag) for the compiler to vectorize it?
> >
> > In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal
> > for AVX512. But other block sizes would make vectorization less profitable
> > because of the remainders.
> >
> >> 2) Could you please update the linear solver table to label the
> >> preconditioners/solvers compatible with ELL format?
> >> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
> >
> > This is still a work in progress. The easiest thing to do would be to
> > use ELL for the Jacobian matrix and other formats (e.g. AIJ) for the
> > preconditioners. Then you would not need to worry about which
> > preconditioners are compatible. An example can be found at
> > ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
> > For preconditioners such as block jacobi and mg (with bjacobi or with
> > sor), you can use ELL for both the preconditioner and the Jacobian,
> > and expect a considerable gain since MatMult is the dominating operation.
> >
> > The makefile for ex5adj includes a few use cases that demonstrate how
> > ELL plays with various preconditioners.
> >
> > Hong (Mr.)
> >
> >> Thank you.
> >>
> >> Xiangdong
> >>
> >> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <[email protected]> wrote:
> >> Most operations in PETSc would not benefit much from vectorization
> >> since they are memory-bound. But this should not discourage you from
> >> compiling PETSc with AVX2/AVX512. We have added a new matrix format
> >> (currently named ELL, but will be changed to SELL shortly) that can make
> >> MatMult ~2X faster than the AIJ format. The MatMult kernel is
> >> hand-optimized with AVX intrinsics. It works on any Intel processor that
> >> supports AVX or AVX2 or AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake.
> >> On the other hand, we have been optimizing the AIJ MatMult kernel for these
> >> architectures as well. And one has to use AVX compiler flags in order to
> >> take advantage of the optimized kernels and the new matrix format.
> >>
> >> Hong (Mr.)
> >>
> >> > On Nov 12, 2017, at 10:35 PM, Xiangdong <[email protected]> wrote:
> >> >
> >> > Hello everyone,
> >> >
> >> > Can someone comment on the vectorization of PETSc? For example, for
> >> > the MatMult function, will it perform better or run faster if it is
> >> > compiled with avx2 or avx512?
> >> >
> >> > Thank you.
> >> >
> >> > Best,
> >> > Xiangdong
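
To make the block-size remark in the quoted thread more concrete, here is a minimal sketch in C. It is not PETSc code: the function names, the column-by-column block layout, and the plain (non-unrolled) loop are all assumptions chosen for illustration. It counts how many doubles fit in one SIMD register and how many rows of a block are left over as a remainder, and it shows a simple block multiply-add that a compiler may or may not vectorize.

```c
#include <stddef.h>

/* Hypothetical helper: number of doubles that fit in one SIMD register
   of the given width in bits (256 for AVX/AVX2, 512 for AVX-512). */
size_t lanes_per_register(size_t register_bits) {
    return register_bits / (8 * sizeof(double));
}

/* Hypothetical helper: rows of a bs-row block column that are left over
   after filling as many whole registers as possible. */
size_t remainder_rows(size_t bs, size_t register_bits) {
    return bs % lanes_per_register(register_bits);
}

/* y += A * x for one dense bs-by-bs block. The block is assumed to be
   stored column by column; this layout is an assumption of the sketch.
   The inner loop runs over contiguous memory, so a compiler is free to
   vectorize it, but nothing here forces it to. */
void block_multiply_add(size_t bs, const double *restrict a,
                        const double *restrict x, double *restrict y) {
    for (size_t j = 0; j < bs; ++j) {
        const double xj = x[j];
        for (size_t i = 0; i < bs; ++i) {
            y[i] += a[j * bs + i] * xj;
        }
    }
}
```

Under these assumptions, a block size of 4 leaves no remainder with 256-bit registers and a block size of 8 leaves none with 512-bit registers, while a block size of 5 leaves one row over in the 256-bit case. Whether the loop is actually vectorized still depends on the compiler and its flags.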
