Use MKL versions of block formats?

> On Nov 14, 2017, at 4:40 PM, Richard Tran Mills <rtmi...@anl.gov> wrote:
> 
> On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
> 
> 
>> On Nov 13, 2017, at 10:49 PM, Xiangdong <epsco...@gmail.com> wrote:
>> 
>> 1) How about the vectorization of BAIJ format?
> 
> BAIJ kernels are optimized with manual unrolling, but not with AVX 
> intrinsics, so vectorization depends entirely on the compiler: the loops 
> may or may not be vectorized, depending on its optimization decisions. But 
> vectorization is not essential for the performance of most BAIJ kernels.
> 
> I know that this has come up in previous discussions, but I'm guessing that 
> the manual unrolling actually impedes the ability of many modern compilers to 
> optimize the BAIJ calculations. I suppose we ought to have a switch to enable 
> or disable the use of the unrolled versions? (And, further down the road, 
> some sort of performance model to tell us what the setting for the switch 
> should be...)
> 
> --Richard
> 
> 
>> If the block size s is 2 or 4, would it be ideal for AVXs? Do I need to do 
>> anything special (more than AVX flag) for the compiler to vectorize it?
> 
> In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal for 
> AVX512. But other block sizes would make vectorization less profitable 
> because of the remainders.
> 
>> 2) Could you please update the linear solver table to label the 
>> preconditioners/solvers compatible with ELL format?
>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
> 
> This is still a work in progress. The easiest thing to do would be to use 
> ELL for the Jacobian matrix and other formats (e.g. AIJ) for the 
> preconditioners; then you would not need to worry about which 
> preconditioners are compatible. 
> An example can be found at 
> ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
> For preconditioners such as block Jacobi and MG (with block Jacobi or SOR 
> smoothing), you can use ELL for both the preconditioner and the Jacobian, 
> and expect a considerable gain, since MatMult is the dominant operation.
> 
> The makefile for ex5adj includes a few use cases that demonstrate how ELL 
> plays with various preconditioners.
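For reference, selecting the format and preconditioner at runtime might look like the following. This is a sketch only: the exact option values, in particular the name used for the ELL/SELL type and the `-dm_mat_type` prefix, are assumptions and may differ in your PETSc version, so check the ex5adj makefile for the actual flags.

```shell
# ELL for the Jacobian only; the default AIJ for the preconditioner
# (option names are assumptions; see the ex5adj makefile for exact flags)
./ex5adj -dm_mat_type ell -pc_type ilu

# ELL for both the Jacobian and the preconditioner, with block Jacobi
./ex5adj -dm_mat_type ell -pc_type bjacobi
```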
> 
> Hong (Mr.)
> 
>> Thank you.
>> 
>> Xiangdong
>> 
>> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <hongzh...@anl.gov> wrote:
>> Most operations in PETSc would not benefit much from vectorization, since 
>> they are memory-bound. But that should not discourage you from compiling 
>> PETSc with AVX2/AVX-512. We have added a new matrix format (currently named 
>> ELL, but it will be renamed SELL shortly) that can make MatMult ~2X faster 
>> than the AIJ format. The MatMult kernel is hand-optimized with AVX 
>> intrinsics. It works on any Intel processor that supports AVX, AVX2, or 
>> AVX-512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. We have also been 
>> optimizing the AIJ MatMult kernel for these architectures. Note that one 
>> has to use the AVX compiler flags in order to take advantage of the 
>> optimized kernels and the new matrix format.
>> 
>> Hong (Mr.)
>> 
>> > On Nov 12, 2017, at 10:35 PM, Xiangdong <epsco...@gmail.com> wrote:
>> >
>> > Hello everyone,
>> >
>> > Can someone comment on the vectorization of PETSc? For example, for the 
>> > MatMult function, will it perform better or run faster if it is compiled 
>> > with avx2 or avx512?
>> >
>> > Thank you.
>> >
>> > Best,
>> > Xiangdong
>> 
>> 
> 
> 
