I am looking into ways to improve the performance of the VecMDot_Seq routine. I am focusing on the variant that gets called when PETSC_THREADCOMM_ACTIVE and PETSC_USE_FORTRAN_KERNEL_MDOT are NOT defined.
My current version of PETSc is 3.4.5, solely because of a user requirement. I am linking against MKL.

I tried and failed to implement VecMDot_Seq as a call to cblas_dgemv in ~/mpi/pvec2.c:

    cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 1);

I could not figure out a way to extract the vectors from 'Vec y[]' and store them as rows of an allocated array. This user post starts off with a similar request (how to construct a matrix from many vectors):

https://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026848.html

I understand that this sort of memory shuffling is expensive. I was just hoping to prove to myself that it's possible.

The action performed by VecMDot_Seq is the same as a matrix-vector multiplication, so I was wondering why it wasn't implemented as a call to ?gemv.

Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
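The packing step described above can be sketched in plain C. This is a minimal sketch under stated assumptions, not PETSc's implementation: it assumes real (double) scalars, uses raw double arrays in place of 'Vec y[]', and the function names pack_rows, gemv_rowmajor and mdot_packed are hypothetical. Inside PETSc the per-vector pointers would presumably be obtained with VecGetArrayRead() on each entry of y[], which is an assumption that should be checked against 3.4.5. A plain loop stands in for the cblas_dgemv call so the sketch runs without MKL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy nv vectors of length n into one contiguous row-major array,
 * so that row i holds y[i]. The caller owns the returned buffer.
 * Assumption: in PETSc each y[i] pointer would come from the Vec's
 * array accessor; here they are ordinary double arrays. */
static double *pack_rows(int nv, int n, const double *const y[])
{
    assert(nv > 0 && n > 0);
    double *b = malloc((size_t)nv * (size_t)n * sizeof *b);
    if (!b) return NULL;
    for (int i = 0; i < nv; ++i)
        memcpy(b + (size_t)i * n, y[i], (size_t)n * sizeof *b);
    return b;
}

/* Reference row-major matrix-vector product: work = b * x.
 * With a CBLAS library linked, this loop is where
 *   cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, x, 1, 0., work, 1);
 * would go instead. */
static void gemv_rowmajor(int nv, int n, const double *b,
                          const double *x, double *work)
{
    for (int i = 0; i < nv; ++i) {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
            sum += b[(size_t)i * n + j] * x[j];
        work[i] = sum;
    }
}

/* Hypothetical stand-in for a multi-dot product on real scalars:
 * work[i] = x . y[i]. Returns 0 on success, 1 if allocation fails. */
static int mdot_packed(int nv, int n, const double *x,
                       const double *const y[], double *work)
{
    double *b = pack_rows(nv, n, y);
    if (!b) return 1;
    gemv_rowmajor(nv, n, b, x, work);
    free(b);
    return 0;
}
```

Calling mdot_packed(nv, n, x, y, work) fills work with one dot product per vector. The copy in pack_rows touches every entry of every y[i] once, which is the memory shuffling cost mentioned above. That cost is comparable to the work of the dot products themselves, so the sketch shows that the approach is possible rather than that it is faster.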
