On Nov 13, 2017, at 10:49 PM, Xiangdong wrote:

1) How about the vectorization of BAIJ format?

BAIJ kernels are optimized with manual unrolling, but not with AVX intrinsics, 
so vectorization is left to the compiler.
A given kernel may or may not get vectorized, depending on the compiler's 
optimization decisions. However, vectorization is not essential for the 
performance of most BAIJ kernels.

If the block size s is 2 or 4, would it be ideal for AVX? Do I need to do 
anything special (beyond the AVX flag) for the compiler to vectorize it?

In double precision, a block size of 4 would be good for AVX/AVX2, and 8 would 
be ideal for AVX512. Other block sizes would make vectorization less profitable 
because they leave remainders that do not fill a whole vector register.
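
To make the remainder argument concrete, here is a minimal Python sketch of the 
lane arithmetic. It assumes 8-byte doubles, 256-bit registers for AVX/AVX2 and 
512-bit registers for AVX512, and a deliberately simple model in which each 
block row is processed on its own. The helper name lane_usage is made up for 
illustration and is not part of PETSc.

```python
def lane_usage(block_size, vector_bits, scalar_bits=64):
    """Return (lanes, full_vectors, remainder, utilization) for one block row.

    lanes:        scalars that fit in one SIMD register
    full_vectors: registers completely filled by block_size scalars
    remainder:    leftover scalars that need a partial (masked or scalar) step
    utilization:  fraction of lanes doing useful work over all steps
    """
    lanes = vector_bits // scalar_bits
    full_vectors, remainder = divmod(block_size, lanes)
    steps = full_vectors + (1 if remainder else 0)
    utilization = block_size / (steps * lanes)
    return lanes, full_vectors, remainder, utilization


# Assumed register widths: 256 bits (AVX/AVX2) and 512 bits (AVX512).
for bits in (256, 512):
    for bs in (2, 3, 4, 5, 8):
        lanes, full, rem, util = lane_usage(bs, bits)
        print(f"bits={bits} bs={bs}: lanes={lanes} full={full} "
              f"remainder={rem} utilization={util:.3f}")
```

Under this model, block sizes 4 (256-bit) and 8 (512-bit) use every lane, while 
a block size of 2 fills only half of a 256-bit register and a block size of 5 
needs a second, mostly empty step. A real kernel may vectorize across blocks 
instead, so this is only a rough guide.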

2) Could you please update the linear solver table to label the 
preconditioners/solvers compatible with ELL format?
http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html

This is still a work in progress. The easiest thing to do would be to use 
ELL for the Jacobian matrix and other formats (e.g. AIJ) for the 
preconditioners.
Then you would not need to worry about which preconditioners are compatible. An 
example can be found at 
ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
For preconditioners such as block jacobi and mg (with bjacobi or with sor), you 
can use ELL for both the preconditioner and the Jacobian,
and expect a considerable gain since MatMult is the dominating operation.
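
For readers unfamiliar with ELL-type storage, the following Python sketch shows 
the generic textbook ELLPACK idea: every row is padded to the same length and 
entries are stored column-major, so the inner loop of MatMult runs over 
contiguous memory with a fixed trip count. This is an illustration only, not 
PETSc's implementation; the function names csr_to_ell and ell_matvec and the 
small test matrix are made up for this example.

```python
def csr_to_ell(rowptr, colidx, values):
    """Convert CSR arrays to a padded, column-major ELLPACK layout."""
    nrows = len(rowptr) - 1
    # Every row is padded to the length of the longest row.
    width = max(rowptr[i + 1] - rowptr[i] for i in range(nrows))
    # Entry j of row i lives at index j * nrows + i, so the j-th entries of
    # consecutive rows are adjacent in memory.
    ell_col = [0] * (width * nrows)
    ell_val = [0.0] * (width * nrows)
    for i in range(nrows):
        for j, k in enumerate(range(rowptr[i], rowptr[i + 1])):
            ell_col[j * nrows + i] = colidx[k]
            ell_val[j * nrows + i] = values[k]
    return width, ell_col, ell_val


def ell_matvec(nrows, width, ell_col, ell_val, x):
    """Compute y = A * x from the padded layout."""
    y = [0.0] * nrows
    for j in range(width):
        # Padded slots hold value 0.0 and column 0, so they add nothing.
        for i in range(nrows):
            y[i] += ell_val[j * nrows + i] * x[ell_col[j * nrows + i]]
    return y


# Example matrix [[1, 0, 2], [0, 3, 0], [4, 5, 6]] in CSR form.
rowptr = [0, 2, 3, 6]
colidx = [0, 2, 1, 0, 1, 2]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.0, 1.0, 1.0]

width, ell_col, ell_val = csr_to_ell(rowptr, colidx, values)
print(ell_matvec(3, width, ell_col, ell_val, x))
```

The padding costs extra storage when row lengths vary widely. Sliced variants, 
which the planned SELL name suggests, usually limit this by padding within 
small groups of rows rather than across the whole matrix; that refinement is 
not shown here.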

The makefile for ex5adj includes a few use cases that demonstrate how ELL plays 
with various preconditioners.

Hong (Mr.)

Thank you.

Xiangdong

On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong wrote:
Most operations in PETSc would not benefit much from vectorization since they 
are memory-bound. But this should not discourage you from compiling PETSc with 
AVX2/AVX512. We have added a new matrix format (currently named ELL, but will 
be changed to SELL shortly) that can make MatMult ~2X faster than the AIJ 
format. The MatMult kernel is hand-optimized with AVX intrinsics. It works on 
any Intel processor that supports AVX, AVX2, or AVX512, e.g. Haswell, 
Broadwell, Xeon Phi, Skylake. On the other hand, we have been optimizing the 
AIJ MatMult kernel for these architectures as well. Note that one has to use 
AVX compiler flags in order to take advantage of the optimized kernels and the 
new matrix format.
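
A rough back-of-the-envelope estimate illustrates why a sparse MatMult tends to 
be memory-bound. The Python sketch below assumes a CSR-like (AIJ-style) layout 
with 8-byte values and 4-byte column indices, counts 2 flops per nonzero, and 
ignores traffic for the vectors and row pointers. The peak and bandwidth 
figures are placeholders chosen for illustration, not measurements of any real 
machine, and the function names are hypothetical.

```python
def spmv_arithmetic_intensity(value_bytes=8, index_bytes=4):
    """Flops per byte for y = A*x, counting only matrix value/index traffic."""
    flops_per_nonzero = 2  # one multiply and one add
    bytes_per_nonzero = value_bytes + index_bytes
    return flops_per_nonzero / bytes_per_nonzero


def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Simple roofline bound: limited by compute peak or by memory traffic."""
    return min(peak_gflops, intensity * bandwidth_gbs)


ai = spmv_arithmetic_intensity()
# Placeholder hardware numbers (assumptions, not real measurements).
bound = attainable_gflops(ai, peak_gflops=1000.0, bandwidth_gbs=100.0)
print(f"intensity = {ai:.3f} flop/byte, bound = {bound:.1f} GFlop/s")
```

With these placeholder numbers the bandwidth-limited bound sits far below the 
compute peak, which is consistent with the remark that wider vector units 
alone bring little gain for such kernels.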

Hong (Mr.)

> On Nov 12, 2017, at 10:35 PM, Xiangdong wrote:
>
> Hello everyone,
>
> Can someone comment on the vectorization of PETSc? For example, for the 
> MatMult function, will it perform better or run faster if it is compiled with 
> avx2 or avx512?
>
> Thank you.
>
> Best,
> Xiangdong


