Dinesh,

   It would be better if they just email us directly, if they are afraid of 
public review they can use [email protected] since no public records of 
kept of those.

    Barry

On Dec 21, 2013, at 9:02 PM, Dinesh K Kaushik <[email protected]> 
wrote:

> This group of Intel people has been working with me and David for past many 
> months.
> 
> Do you have any recommendation for the issue they are encountering? Can we 
> use the multithreaded routines such as MatILUFactor and MatSolve from MKL (I 
> believe for some matrix formats- but at least for BAIJ and AIJ)?
> 
> Thanks,
> 
> Dinesh
> 
> 
> From: <Mudigere>, Dheevatsa <[email protected]>
> To: Dinesh Kaushik <[email protected]>, Dinesh Kaushik 
> <[email protected]>
> Cc: David E Keyes <[email protected]>, "Deshpande, Anand M" 
> <[email protected]>
> Subject: MKL with PETSc
> 
> Hi Dinesh,
>  
> I had a question regarding interfacing MKL routines with PETSC.
> Now that we have a fairly optimized flux kernel both on the Xeon and 
> Xeon-Phi, we are progressing on to the other key kernels. Among them, the 
> next major contributors to the execution time are the - ILU decomposition 
> (called within the pre-conditioner once every time step)  and the direct 
> solver (called every inner GMRES iteration,  using preconditioned matrix).  
> Form the initial performance profile (below) it can be seen that these two 
> operations together contribute to 31% of the sequential execution time on a 
> single node on Xeon and on Xeon-phi the contribution of these two operations 
> are ~50%.
>  
> As you would already know, the following PETSc routines are used for these 
> operations – MatILUFactor and MatSolve. These are higher level interfaces and 
> depending on the sparse matrix storage format – AIJ or BAIJ, the more 
> specific lower-level MatSymbolic, MatNumeric  and MatSolve routines are used. 
> Unfortunately, these PETSc routines are not multi-threaded and can’t leverage 
> the available fine grained parallelism. As a first step in optimizing these 
> operations, we want to replace these PETSc calls with multi-threaded MKL 
> routines. This would give us a good idea of how well these operations scale 
> on single node (Xeon and Xeon-Phi) with multiple threads.
> So, it’s in this regard that I wanted your help – to know what’s the best way 
> to reverse communicate from PETSc to use MKL routines for these operations. 
> For now, I have managed to do this by modifying the PETSC functions 
> themselves (MatLUFactorNumeric_SeqAIJ_Inode) to use the MKL routines. This is 
> somewhat a “dirty hack”, where I am shunting out the actual logic and calling 
> the MKL functions instead and I am not taking the all the precautions to 
> maintaining compatibility with other functions.  I wanted to check with you 
> if there is a better and more systematic way to do this, without having to 
> hack around with the PETSc library routines ?
> PETSc already supports hypre, suprelu and several other such performance 
> libraries, is there also a way to support MKL ?
>  
> Your help on this will be greatly appreciated.
>  
> Thanks,
> Dheevatsa
>  
>  
> <image002.png><image004.png>
> 
> 
> This message and its contents including attachments are intended solely for 
> the original recipient. If you are not the intended recipient or have 
> received this message in error, please notify me immediately and delete this 
> message from your computer system. Any unauthorized use or distribution is 
> prohibited. Please consider the environment before printing this email.
> <image002.png><image004.png>

Reply via email to