Hi Barry,
Thanks for your email. Sorry I did not make it clear. Here is a more detailed
version:
PetscInt dim, i, j;
PetscInt NDOF = 3, NX = 5, NY = 5;
for (dim = 0; dim < NDOF; dim++) {
  for (i = 0; i < NX; i++) {
    for (j = 0; j < NY; j++) {
      /* compute inpx: set the values of Vec inpx (256*256 entries) */
      /* compute inpw: set the values of Vec inpw (256*256 entries) */
      /* fast convolution: I am following ex158 in src/mat/examples, using
         the PETSc-FFTW interface; the Mat is created with MatCreateFFT() */
    }
  }
}
The values of inpx and inpw change with the indices dim, i and j, but their
lengths stay the same, and each convolution can be computed independently of
the others. I am considering two options:
Option 1: use MPI to perform the fast convolutions for the different pairs of
inpx and inpw simultaneously, i.e., run the NDOF*NX*NY convolutions in parallel.
Option 2: inside the convolution, define an extended matrix and vector that
store the values from all NDOF*NX*NY convolutions, and call MatMult(),
VecPointwiseMult() and MatMultTranspose() once on the extended objects.
I would very much appreciate your comments. Thanks.
> Subject: Re: [petsc-users] efficiency of parallel convolution
> From: [email protected]
> Date: Wed, 6 Aug 2014 10:13:34 -0500
> CC: [email protected]
> To: [email protected]
>
>
> It is difficult to understand what you are doing here. What is dim? What are
> NX and NY? Is the length of inpx and inpw 256*256? Are you using a PETSc
> Mat like AIJ to apply the “fast convolution” or some custom MATSHELL? Is the
> “fast convolution” the same for each dim, i and j or is it different ?
>
> Barry
>
> On Aug 5, 2014, at 1:24 AM, LikunTan <[email protected]> wrote:
>
> > Hi all,
> >
> > I am computing matrix-vector products using fast convolution, and this
> > has to be done many times. Here is a brief outline of my code:
> >
> > for(dim=0; dim<NDOF; dim++)
> > {
> > for(i=0; i<NX; i++)
> > {
> > for(j=0; j<NY; j++)
> > {
> > //compute inpx
> > //compute inpw
> > //fast convolution
> > }
> > }
> > }
> >
> > The fast convolution has to be computed many times within the for loops.
> > The size of the input vector is 256*256. The most time-consuming parts
> > of the fast convolution are MatMult(), VecPointwiseMult() and
> > MatMultTranspose(). The optimal number of processors is 2; using more
> > processors reduces the efficiency. In this case, could you please suggest
> > a way to improve the efficiency and make full use of parallelization?
> > Thanks.
>