Use MKL versions of block formats?
> On Nov 14, 2017, at 4:40 PM, Richard Tran Mills <rtmi...@anl.gov> wrote:
>
> On Tue, Nov 14, 2017 at 12:13 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
>
>
>> On Nov 13, 2017, at 10:49 PM, Xiangdong <epsco...@gmail.com> wrote:
>>
>> 1) How about the vectorization of BAIJ format?
>
> BAIJ kernels are optimized with manual unrolling, but not with AVX
> intrinsics. So the vectorization relies on the compiler's ability.
> It may or may not get vectorized depending on the compiler's optimization
> decisions. But vectorization is not essential for the performance of most
> BAIJ kernels.
>
> I know that this has come up in previous discussions, but I'm guessing that
> the manual unrolling actually impedes the ability of many modern compilers to
> optimize the BAIJ calculations. I suppose we ought to have a switch to enable
> or disable the use of the unrolled versions? (And, further down the road,
> some sort of performance model to tell us what the setting for the switch
> should be...)
>
> --Richard
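
To make the unrolled-versus-loop distinction concrete, here is a minimal sketch in plain C. It is not PETSc's BAIJ code: the function names and the `USE_UNROLLED_BLOCK4` macro are hypothetical, and it assumes each 4x4 block is stored column-major. It only illustrates what a compile-time switch between a hand-unrolled kernel and a plain-loop kernel could look like.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; not PETSc's actual BAIJ kernels.
   Assumption: a[] holds one dense 4x4 block in column-major order. */

/* y += A*x written as plain loops, leaving vectorization to the compiler. */
void block4_mult_loop(const double *a, const double *x, double *y)
{
  for (size_t j = 0; j < 4; j++) {
    for (size_t i = 0; i < 4; i++) {
      y[i] += a[4 * j + i] * x[j];
    }
  }
}

/* The same operation with the loops unrolled by hand. */
void block4_mult_unrolled(const double *a, const double *x, double *y)
{
  const double x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3];

  y[0] += a[0] * x0 + a[4] * x1 + a[8]  * x2 + a[12] * x3;
  y[1] += a[1] * x0 + a[5] * x1 + a[9]  * x2 + a[13] * x3;
  y[2] += a[2] * x0 + a[6] * x1 + a[10] * x2 + a[14] * x3;
  y[3] += a[3] * x0 + a[7] * x1 + a[11] * x2 + a[15] * x3;
}

/* Hypothetical compile-time switch between the two variants. */
void block4_mult(const double *a, const double *x, double *y)
{
#if defined(USE_UNROLLED_BLOCK4)
  block4_mult_unrolled(a, x, y);
#else
  block4_mult_loop(a, x, y);
#endif
}
```

The two variants add terms in a different order, so results can differ in the last bits for general floating-point input. Which one is faster depends on the compiler and target, which is why a switch (and eventually a performance model) would be useful.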
>
>
>> If the block size is 2 or 4, would it be ideal for AVX? Do I need to do
>> anything special (more than AVX flag) for the compiler to vectorize it?
>
> In double precision, 4 would be good for AVX/AVX2, and 8 would be ideal for
> AVX512. But other block sizes would make vectorization less profitable
> because of the remainders.
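
The arithmetic behind these numbers can be sketched in a few lines of C. A double is 64 bits, so a 256-bit AVX/AVX2 register holds 4 doubles and a 512-bit AVX512 register holds 8. The helper names below are hypothetical, and the model is deliberately simplified: it assumes a block row of `bs` doubles is processed with whole registers and the last register is padded, whereas a real kernel might handle the remainder with scalar code instead.

```c
/* Hypothetical helpers for illustration only. */

/* Number of 64-bit doubles that fit in a SIMD register of the given width. */
int lanes_per_register(int register_bits)
{
  return register_bits / 64;
}

/* Whole registers needed to cover bs doubles (last one possibly partial). */
int registers_needed(int bs, int lanes)
{
  return (bs + lanes - 1) / lanes;
}

/* Lanes left idle in the last register: the "remainder" cost in this model. */
int idle_lanes(int bs, int lanes)
{
  return registers_needed(bs, lanes) * lanes - bs;
}
```

In this model a block size of 4 on AVX2 or 8 on AVX512 leaves no idle lanes, while a block size of 5 on AVX2 needs two registers and leaves 3 of the 8 lanes idle.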
>
>> 2) Could you please update the linear solver table to label the
>> preconditioners/solvers compatible with ELL format?
>> http://www.mcs.anl.gov/petsc/documentation/linearsolvertable.html
>
> This is still a work in progress. The easiest thing to do would be to use
> ELL for the Jacobian matrix and other formats (e.g. AIJ) for the
> preconditioners.
> Then you would not need to worry about which preconditioners are compatible.
> An example can be found at
> ts/examples/tutorials/advection-diffusion-reaction/ex5adj.c.
> For preconditioners such as block jacobi and mg (with bjacobi or with sor),
> you can use ELL for both the preconditioner and the Jacobian,
> and expect a considerable gain since MatMult is the dominant operation.
>
> The makefile for ex5adj includes a few use cases that demonstrate how ELL
> plays with various preconditioners.
>
> Hong (Mr.)
>
>> Thank you.
>>
>> Xiangdong
>>
>> On Mon, Nov 13, 2017 at 11:32 AM, Zhang, Hong <hongzh...@anl.gov> wrote:
>> Most operations in PETSc would not benefit much from vectorization since
>> they are memory-bound. But this should not discourage you from compiling
>> PETSc with AVX2/AVX512. We have added a new matrix format (currently named
>> ELL, but will be changed to SELL shortly) that can make MatMult ~2X faster
>> than the AIJ format. The MatMult kernel is hand-optimized with AVX
>> intrinsics. It works on any Intel processor that supports AVX, AVX2, or
>> AVX512, e.g. Haswell, Broadwell, Xeon Phi, Skylake. On the other hand, we
>> have been optimizing the AIJ MatMult kernel for these architectures as well.
>> And one has to use AVX compiler flags in order to take advantage of the
>> optimized kernels and the new matrix format.
>>
>> Hong (Mr.)
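
One way to check whether the AVX compiler flags actually took effect is to look at the macros the compiler predefines. The sketch below is not part of PETSc and the function name is hypothetical. It relies only on `__AVX__`, `__AVX2__`, and `__AVX512F__`, which GCC, Clang, and the Intel compiler define when the corresponding instruction set is enabled (for example with `-mavx2`, `-mavx512f`, or `-march=native` on GCC and Clang).

```c
/* Hypothetical helper: report the widest AVX level enabled at compile time,
   based on the compiler's predefined macros. */
const char *simd_level(void)
{
#if defined(__AVX512F__)
  return "AVX512";
#elif defined(__AVX2__)
  return "AVX2";
#elif defined(__AVX__)
  return "AVX";
#else
  return "none";
#endif
}
```

If this returns "none" for the flags used in the build, code paths guarded by these macros will not be compiled in.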
>>
>> > On Nov 12, 2017, at 10:35 PM, Xiangdong <epsco...@gmail.com> wrote:
>> >
>> > Hello everyone,
>> >
>> > Can someone comment on the vectorization of PETSc? For example, for the
>> > MatMult function, will it perform better or run faster if it is compiled
>> > with avx2 or avx512?
>> >
>> > Thank you.
>> >
>> > Best,
>> > Xiangdong
>>
>>
>
>