Barry,

Thanks for the reply.

Here is my attempt.  Do you see anything wrong with it?

#if 1
#include <mkl.h>
  PetscInt n = xin->map->n,i,j;
  const PetscScalar *xbase,*a;
  PetscScalar *b;
  Vec         *yy;
  yy   = (Vec*)y;

  ierr = PetscMalloc(n*nv*sizeof(PetscScalar),&b);CHKERRQ(ierr);
  for(i=0; i<nv; i++) {
     ierr = VecGetArrayRead(yy[i],&a);CHKERRQ(ierr);
     for(j=0; j<n; j++) {
        b[j] = a[j];
     }
     b += n;
     ierr = VecRestoreArrayRead(yy[i],&a);CHKERRQ(ierr);
  }
  ierr = VecGetArrayRead(xin,&xbase);CHKERRQ(ierr);
    cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., 
work, 1);
  ierr = VecRestoreArrayRead(xin,&xbase);CHKERRQ(ierr);
  for(i=0; i<nv; i++) {
     PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
  }
  ierr = PetscFree(b);CHKERRQ(ierr);
#else
  ierr = VecMDot_Seq(xin,nv,y,work);CHKERRQ(ierr);
  PetscInt i;
  for(i=0; i<nv; i++) {
     PetscPrintf(PETSC_COMM_SELF,"VecMDot_MPI: %D %.16f\n",i,work[i]);
  }
#endif


Daniel Kokron
NASA Ames (ARC-TN)
SciCon group
________________________________________
From: Smith, Barry F. [[email protected]]
Sent: Wednesday, December 27, 2017 14:58
To: Kokron, Daniel S. (ARC-606.2)[CSRA, LLC]
Cc: [email protected]
Subject: Re: [petsc-users] coding VecMDot_Seq as gemv

> On Dec 27, 2017, at 2:53 PM, Kokron, Daniel S. (ARC-606.2)[CSRA, LLC] 
> <[email protected]> wrote:
>
> I am looking into ways to improve performance of the VecMDot_Seq routine.  I 
> am focusing on the variant that gets called when PETSC_THREADCOMM_ACTIVE and 
> PETSC_USE_FORTRAN_KERNEL_MDOT are NOT defined.
>
> My current version of PETSc is 3.4.5 due solely to user requirement.  I am 
> linking against MKL.
>
> I tried and failed to implement VecMDot_Seq as a call to cblas_dgemv in 
> ~/mpi/pvec2.c
>
> cblas_dgemv(CblasRowMajor, CblasNoTrans, nv, n, 1., b, n, xbase, 1, 0., work, 
> 1);
>
> I could not figure out a way to extract the vectors from 'Vec y[]' and store 
> them as rows of an allocated array.

  Use VecGetArray on each one then do a memcpy of those values into the big 
allocated array (or a loop copy or whatever).
>
> This user post starts off with a similar request (how to construct a matrix 
> from many vectors)
> https://lists.mcs.anl.gov/pipermail/petsc-users/2015-August/026848.html
>
> I understand that this sort of memory shuffling is expensive.  I was just 
> hoping to prove the point to myself that it's possible.

   Yes the memory shuffling is possible and not that difficult, perhaps you 
just have the rows and columns backwards?

>
> The action performed by VecMDot_Seq is the same as matrix-vector 
> multiplication, so I was wondering why it wasn't implemented as a call ?gemv?

   The reason is that we want each of the vectors to be independent of the 
other ones; to use gemv they have be shared in a single common array.

   I doubt that copying into a single array and calling the gemv will ever be 
faster. Better to just try to optimize/vectorize better the current code.


>
> Daniel Kokron
> NASA Ames (ARC-TN)
> SciCon group

Reply via email to