This group from Intel has been working with me and David for the past several
months.

Do you have any recommendations for the issue they are encountering? Can we use 
multithreaded MKL routines in place of MatILUFactor and MatSolve (I believe for 
some matrix formats, but at least for AIJ and BAIJ)?

Thanks,

Dinesh


From: "Mudigere, Dheevatsa" <[email protected]>
To: Dinesh Kaushik <[email protected]>, Dinesh Kaushik <[email protected]>
Cc: David E Keyes <[email protected]>, "Deshpande, Anand M" <[email protected]>
Subject: MKL with PETSc

Hi Dinesh,

I had a question regarding interfacing MKL routines with PETSc.
Now that we have a fairly well-optimized flux kernel on both Xeon and Xeon Phi, 
we are moving on to the other key kernels. Among them, the next major 
contributors to the execution time are the ILU decomposition (called within the 
preconditioner once every time step) and the direct solve with the factored 
matrix (called in every inner GMRES iteration). From the initial performance 
profile (below), these two operations together account for 31% of the sequential 
execution time on a single Xeon node, while on Xeon Phi their contribution is 
~50%.

As you already know, the following PETSc routines are used for these 
operations: MatILUFactor and MatSolve. These are higher-level interfaces; 
depending on the sparse matrix storage format (AIJ or BAIJ), the more specific 
lower-level symbolic factorization, numeric factorization, and solve routines 
are dispatched. Unfortunately, these PETSc routines are not multithreaded and 
cannot leverage the available fine-grained parallelism. As a first step in 
optimizing these operations, we want to replace these PETSc calls with 
multithreaded MKL routines. This would give us a good idea of how well these 
operations scale on a single node (Xeon and Xeon Phi) with multiple threads.
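For reference, below is a minimal sketch of the MKL side of such a replacement, 
assuming one-based CSR arrays (which these MKL routines require); the wrapper 
ilu0_solve_mkl and its buffer handling are only illustrative, not part of PETSc 
or MKL:

/* Rough sketch, not PETSc code: factor a CSR matrix with MKL's ILU0
 * (dcsrilu0) and apply the two triangular solves with mkl_dcsrtrsv.
 * Assumes one-based CSR arrays (ia, ja, a); bilu0 must hold nnz entries,
 * tmp and y hold n entries each. */
#include "mkl_rci.h"
#include "mkl_spblas.h"

int ilu0_solve_mkl(MKL_INT n, const double *a, const MKL_INT *ia,
                   const MKL_INT *ja, const double *rhs, double *y,
                   double *bilu0, double *tmp)
{
  MKL_INT ipar[128] = {0};
  double  dpar[128] = {0.0};
  MKL_INT ierr = 0;

  ipar[30] = 1;       /* replace small diagonal entries of U ...        */
  dpar[30] = 1.0e-16; /* ... whose magnitude falls below this value ... */
  dpar[31] = 1.0e-10; /* ... with this value, instead of failing        */

  dcsrilu0(&n, a, ia, ja, bilu0, ipar, dpar, &ierr); /* ILU0 factorization */
  if (ierr != 0) return (int)ierr;

  /* Forward solve L*tmp = rhs (unit lower triangle of bilu0),
     then backward solve U*y = tmp. */
  mkl_dcsrtrsv("L", "N", "U", &n, bilu0, ia, ja, rhs, tmp);
  mkl_dcsrtrsv("U", "N", "N", &n, bilu0, ia, ja, tmp, y);
  return 0;
}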
It is in this regard that I wanted your help: what is the best way to get PETSc 
to call MKL routines for these operations? For now, I have managed to do this 
by modifying the PETSc functions themselves (MatLUFactorNumeric_SeqAIJ_Inode) 
to use the MKL routines. This is somewhat of a "dirty hack": I am shunting out 
the actual logic and calling the MKL functions instead, and I am not taking all 
the precautions needed to maintain compatibility with other functions. I wanted 
to check with you whether there is a better and more systematic way to do this, 
without having to hack around inside the PETSc library routines.
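One alternative that might avoid this, sketched below under the assumption that 
a shell preconditioner is acceptable for the single-node scaling study (the 
helpers MyMKLILU0Setup and MyMKLILU0Solve are placeholders for the MKL calls 
sketched above), is to wrap the MKL ILU0 in a user-defined PCSHELL so that no 
PETSc source has to be modified:

#include <petscksp.h>

typedef struct {
  Mat     A;       /* SeqAIJ matrix to precondition               */
  double *bilu0;   /* ILU0 factors in CSR, filled by the MKL call */
} MKLILUCtx;

/* Placeholder helpers that extract the CSR arrays from the SeqAIJ matrix
   (e.g. via MatGetRowIJ / MatSeqAIJGetArray) and call the MKL routines. */
extern PetscErrorCode MyMKLILU0Setup(MKLILUCtx *ctx);
extern PetscErrorCode MyMKLILU0Solve(MKLILUCtx *ctx, Vec b, Vec x);

static PetscErrorCode PCSetUp_MKLILU(PC pc)
{
  MKLILUCtx     *ctx;
  PetscErrorCode ierr;
  ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
  ierr = MyMKLILU0Setup(ctx);CHKERRQ(ierr);        /* factor with MKL */
  return 0;
}

static PetscErrorCode PCApply_MKLILU(PC pc, Vec b, Vec x)
{
  MKLILUCtx     *ctx;
  PetscErrorCode ierr;
  ierr = PCShellGetContext(pc, (void**)&ctx);CHKERRQ(ierr);
  ierr = MyMKLILU0Solve(ctx, b, x);CHKERRQ(ierr);  /* two triangular solves */
  return 0;
}

/* Attach the shell preconditioner to the KSP that runs GMRES. */
PetscErrorCode AttachMKLILU(KSP ksp, Mat A, MKLILUCtx *ctx)
{
  PC             pc;
  PetscErrorCode ierr;
  ctx->A = A;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCSHELL);CHKERRQ(ierr);
  ierr = PCShellSetContext(pc, ctx);CHKERRQ(ierr);
  ierr = PCShellSetSetUp(pc, PCSetUp_MKLILU);CHKERRQ(ierr);
  ierr = PCShellSetApply(pc, PCApply_MKLILU);CHKERRQ(ierr);
  return 0;
}

The GMRES iteration itself is untouched this way, so the cost of the MKL 
factorization and solves could be timed in isolation.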
PETSc already supports hypre, SuperLU, and several other such performance 
libraries; is there also a way to support MKL?
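For comparison, this is roughly how one of those external packages is selected 
today through the public API, assuming a PETSc 3.x build configured with 
SuperLU (an MKL backend would presumably need an analogous factorization 
implementation registered with PETSc):

#include <petscksp.h>

PetscErrorCode UseSuperLUFactor(KSP ksp)
{
  PC             pc;
  PetscErrorCode ierr;
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
  /* Equivalent to the run-time options
       -pc_type lu -pc_factor_mat_solver_package superlu
     (the option is named -pc_factor_mat_solver_type in later releases). */
  ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU);CHKERRQ(ierr);
  return 0;
}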

Your help on this will be greatly appreciated.

Thanks,
Dheevatsa


[Embedded images: initial performance profiles on Xeon and Xeon Phi]


