Dinesh,
It would be better if they just emailed us directly; if they are afraid of
public review they can use [email protected], since no public records are
kept of those emails.
Barry
On Dec 21, 2013, at 9:02 PM, Dinesh K Kaushik <[email protected]>
wrote:
> This group of Intel people has been working with me and David for many
> months.
>
> Do you have any recommendation for the issue they are encountering? Can we
> use multithreaded MKL routines for operations such as MatILUFactor and
> MatSolve (I believe for some matrix formats, but at least for BAIJ and AIJ)?
>
> Thanks,
>
> Dinesh
>
>
> From: <Mudigere>, Dheevatsa <[email protected]>
> To: Dinesh Kaushik <[email protected]>, Dinesh Kaushik
> <[email protected]>
> Cc: David E Keyes <[email protected]>, "Deshpande, Anand M"
> <[email protected]>
> Subject: MKL with PETSc
>
> Hi Dinesh,
>
> I had a question regarding interfacing MKL routines with PETSc.
> Now that we have a fairly optimized flux kernel on both the Xeon and the
> Xeon Phi, we are moving on to the other key kernels. Among them, the next
> major contributors to the execution time are the ILU decomposition (called
> within the preconditioner once every time step) and the triangular solve
> (called in every inner GMRES iteration with the factored preconditioner
> matrix). From the initial performance profile (below) it can be seen that
> these two operations together account for 31% of the sequential execution
> time on a single Xeon node, while on the Xeon Phi their contribution is
> ~50%.
>
> As you already know, the following PETSc routines are used for these
> operations: MatILUFactor and MatSolve. These are higher-level interfaces,
> and depending on the sparse matrix storage format (AIJ or BAIJ) they
> dispatch to the more specific lower-level routines for the symbolic
> factorization, the numeric factorization, and the solve (e.g.
> MatLUFactorSymbolic, MatLUFactorNumeric, and the format-specific MatSolve
> implementations). Unfortunately, these PETSc routines are not
> multi-threaded and can’t leverage the available fine-grained parallelism.
> As a first step in optimizing these operations, we want to replace these
> PETSc calls with multi-threaded MKL routines. This would give us a good
> idea of how well these operations scale on a single node (Xeon and Xeon
> Phi) with multiple threads.
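>
> To make the idea concrete, here is a rough sketch of the kind of MKL calls
> I have in mind for the ILU(0) factorization and the two triangular solves,
> assuming the matrix has already been extracted into one-based CSR arrays.
> The helper name is just illustrative, and I still need to confirm that
> dcsrilu0 and mkl_dcsrtrsv are the right routines (and how well they are
> threaded); the signatures are taken from the MKL sparse BLAS/RCI
> documentation.
>
> #include <stdlib.h>
> #include "mkl.h"
>
> /* Illustrative helper: given an n x n matrix in one-based CSR arrays
>    (ia, ja, a), compute an ILU(0) factorization with MKL's dcsrilu0 and
>    apply it to a vector b via two triangular solves with mkl_dcsrtrsv.
>    ipar/dpar (which control e.g. zero-diagonal handling) are left at
>    defaults here; a real code would set them as described in the MKL
>    manual. */
> static int ilu0_apply_mkl(MKL_INT n, double *a, MKL_INT *ia, MKL_INT *ja,
>                           double *b, double *x)
> {
>   MKL_INT  ipar[128] = {0};
>   double   dpar[128] = {0.0};
>   MKL_INT  ierr  = 0;
>   double  *bilu0 = malloc((size_t)(ia[n] - 1) * sizeof(double)); /* nnz of A */
>   double  *tmp   = malloc((size_t)n * sizeof(double));
>
>   /* ILU(0): the factor values share the sparsity pattern of A */
>   dcsrilu0(&n, a, ia, ja, bilu0, ipar, dpar, &ierr);
>   if (ierr == 0) {
>     /* forward solve L*tmp = b (unit lower triangle of bilu0) */
>     mkl_dcsrtrsv("L", "N", "U", &n, bilu0, ia, ja, b, tmp);
>     /* backward solve U*x = tmp (non-unit upper triangle) */
>     mkl_dcsrtrsv("U", "N", "N", &n, bilu0, ia, ja, tmp, x);
>   }
>   free(tmp);
>   free(bilu0);
>   return (int)ierr;
> }
>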
> So it is in this regard that I wanted your help: to know the best way to
> have PETSc call out to MKL routines for these operations.
> For now, I have managed to do this by modifying the PETSc functions
> themselves (MatLUFactorNumeric_SeqAIJ_Inode) to use the MKL routines. This
> is somewhat of a “dirty hack”, where I am shunting out the actual logic and
> calling the MKL functions instead, and I am not taking all the precautions
> needed to maintain compatibility with the other functions. I wanted to
> check with you: is there a better and more systematic way to do this,
> without having to hack around in the PETSc library routines?
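>
> For context, the kind of access I would need from outside the library looks
> roughly like the sketch below, assuming a sequential AIJ matrix and a real,
> double-precision PETSc build. MatGetRowIJ and MatSeqAIJGetArray are the
> PETSc calls I have found for exposing the CSR arrays; the helper name is
> just illustrative and most error handling is trimmed.
>
> #include <petscmat.h>
>
> /* Illustrative: expose one-based CSR arrays of a SeqAIJ matrix so they
>    can be handed to MKL, without modifying PETSc source.  The caller must
>    later release them with MatRestoreRowIJ and MatSeqAIJRestoreArray. */
> PetscErrorCode get_csr_for_mkl(Mat A, PetscInt *n, const PetscInt **ia,
>                                const PetscInt **ja, PetscScalar **vals)
> {
>   PetscBool      done;
>   PetscErrorCode ierr;
>
>   /* shift = 1 asks for one-based row pointers and column indices,
>      matching what dcsrilu0/mkl_dcsrtrsv expect */
>   ierr = MatGetRowIJ(A, 1, PETSC_FALSE, PETSC_FALSE, n, ia, ja, &done);CHKERRQ(ierr);
>   if (!done) SETERRQ(PETSC_COMM_SELF, PETSC_ERR_SUP, "CSR arrays not available for this Mat type");
>   ierr = MatSeqAIJGetArray(A, vals);CHKERRQ(ierr);
>   return 0;
> }
>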
> PETSc already supports hypre, SuperLU, and several other such performance
> libraries; is there also a way to support MKL?
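>
> For comparison, this is roughly how we currently hand the factorization off
> to an external package such as SuperLU through PETSc's MatSolverPackage
> mechanism (the names below are from the PETSc 3.4-era API and the helper is
> just illustrative). I am wondering whether an MKL backend could plug into
> the same mechanism.
>
> #include <petscksp.h>
>
> /* Illustrative: solve A x = b with an LU factorization delegated to an
>    external package (SuperLU here), assuming A, b, x are already assembled.
>    The same choice can be made at run time with
>    -ksp_type preonly -pc_type lu -pc_factor_mat_solver_package superlu */
> PetscErrorCode solve_with_external_lu(Mat A, Vec b, Vec x)
> {
>   KSP            ksp;
>   PC             pc;
>   PetscErrorCode ierr;
>
>   ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
>   /* the MatStructure flag was dropped from KSPSetOperators in later
>      PETSc releases */
>   ierr = KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN);CHKERRQ(ierr);
>   ierr = KSPSetType(ksp, KSPPREONLY);CHKERRQ(ierr);  /* direct solve only */
>   ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
>   ierr = PCSetType(pc, PCLU);CHKERRQ(ierr);
>   ierr = PCFactorSetMatSolverPackage(pc, MATSOLVERSUPERLU);CHKERRQ(ierr);
>   ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
>   ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
>   ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
>   return 0;
> }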
>
> Your help on this will be greatly appreciated.
>
> Thanks,
> Dheevatsa
>
>
> [attached images: image002.png, image004.png]
>
>