Barry,
Thanks for the reply.
Here is my attempt. Do you see anything wrong with it?
#if 1
#include <mkl.h>
PetscInt n = xin->map->n,i,j;
const PetscScalar *xbase,*a;
PetscScalar *b;
Vec *yy;
yy = (Vec*)y;
ierr = PetscMalloc(n*nv*sizeof(PetscScalar),&b);CHKERRQ(ierr);
for(i=0; i<nv; i++) {
ierr = VecGetArrayRead(yy[i],&a);CHKERRQ(ierr);
for(j=0; j<n; j++) {
b[j] = a[j];
}
b += n;
ierr = VecRestoreArrayRead(yy[i],&a);CHKERRQ(ierr);
}
ierr = VecGetArrayRead(xin,&xbase);CHKERRQ(ierr);
cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0.,
work, 1);
ierr = VecRestoreArrayRead(xin,&xbase);CHKERRQ(ierr);
for(i=0; i<nv; i++) {
PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
}
ierr = PetscFree(b);CHKERRQ(ierr);
#else
ierr = VecMDot_Seq(xin,nv,y,work);CHKERRQ(ierr);
PetscInt i;
for(i=0; i<nv; i++) {
PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
}
#endif
Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
________________________________________
From: Smith, Barry F. [[email protected]]
Sent: Wednesday, December 27, 2017 14:58
To: Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
Cc: [email protected]
Subject: Re: [petsc-users] coding VecMDot_Seq as gemv
> On Dec 27, 2017, at 2:53 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
> <[email protected]> wrote:
>
> I am looking into ways to improve performance of the VecMDot_Seq routine. I
> am focusing on the variant that gets called when PETSC_THREADCOMM_ACTIVE and
> PETSC_USE_FORTRAN_KERNEL_MDOT are NOT defined.
>
> My current version of PETSc is 3.4.5 due solely to user requirement. I am
> linking against MKL.
>
> I tried and failed to implement VecMDot_Seq as a call to cblas_dgemv in
> ~/mpi/pvec2.c
>
> cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work,
> 1);
>
> I could not figure out a way to extract the vectors from 'Vec y[]' and store
> them as rows of an allocated array.
Use VecGetArray on each one then do a memcpy of those values into the big
allocated array (or a loop copy or whatever).
>
> This user post starts off with a similar request (how to construct a matrix
> from many vectors)
> https://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026848.html
>
> I understand that this sort of memory shuffling is expensive. I was just
> hoping to prove the point to myself that it's possible.
Yes the memory shuffling is possible and not that difficult, perhaps you
just have the rows and columns backwards?
>
> The action performed by VecMDot_Seq is the same as matrix-vector
> multiplication, so I was wondering why it wasn't implemented as a call ?gemv?
The reason is that we want each of the vectors to be independent of the
other ones; to use gemv they have be shared in a single common array.
I doubt that copying into a single array and calling the gemv will ever be
faster. Better to just try to optimize/vectorize better the current code.
>
> Daniel Kokron
> NASA Ames (ARC-TN)
> SciCon group