After profiling our code, we found that most of the time is spent in MatSolve_SeqAIJ_NaturalOrdering, which on inspection simply performs the forward and backward triangular solves with an already-factored ILU matrix.
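To make concrete what kind of kernel we mean, here is a minimal sketch of the two sweeps. It is not PETSc's code, and PETSc's actual storage layout for factored matrices may differ. It assumes the L and U factors are stored together in one CSR matrix with sorted column indices, that L has an implicit unit diagonal, and that the diagonal of U is stored explicitly. The function name ilu_solve_csr and the small 3x3 example are made up for illustration.

```python
def ilu_solve_csr(row_ptr, col_idx, vals, b):
    """Solve (L U) x = b for a combined LU factor in CSR format.

    Assumptions (illustrative only, not PETSc's layout):
      - strictly lower entries of each row belong to L (unit diagonal implied)
      - diagonal and upper entries of each row belong to U
      - column indices within each row are sorted in increasing order
    """
    n = len(b)
    x = [float(v) for v in b]

    # Forward sweep: solve L y = b, overwriting x with y.
    for i in range(n):
        s = x[i]
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j >= i:          # reached the U part of this row
                break
            s -= vals[k] * x[j]
        x[i] = s

    # Backward sweep: solve U x = y, overwriting y with x.
    for i in range(n - 1, -1, -1):
        s = x[i]
        diag = None
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j == i:
                diag = vals[k]
            elif j > i:
                s -= vals[k] * x[j]
        if diag is None or diag == 0.0:
            raise ZeroDivisionError("missing or zero pivot in row %d" % i)
        x[i] = s / diag

    return x


# Hypothetical 3x3 example:
#   L = [[1,0,0],[2,1,0],[0,3,1]],  U = [[2,1,0],[0,1,4],[0,0,5]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals = [2.0, 1.0, 2.0, 1.0, 4.0, 3.0, 5.0]
b = [4.0, 22.0, 57.0]   # equals L * U * [1, 2, 3]
x = ilu_solve_csr(row_ptr, col_idx, vals, b)
```

Each row depends on the rows before it (forward) or after it (backward), so the inner loops are short, irregular, and memory-bound, which is why we are wondering whether a vendor-tuned routine would do better.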
We think we should be able to see an improvement by replacing these solves with optimized versions from Intel MKL (or another optimized BLAS library). For example, Intel MKL provides these routines:

https://software.intel.com/en-us/node/468572

Is it possible to replace the PETSc triangular solves with a more optimized version?

Thanks,
Randy M.