Hi Barry,

Thanks for your email. Sorry I did not make it clear. Here is a more detailed
version:

int dim, i, j;
int NDOF = 3, NX = 5, NY = 5;

for (dim = 0; dim < NDOF; dim++)
{
    for (i = 0; i < NX; i++)
    {
        for (j = 0; j < NY; j++)
        {
            // compute inpx: set values for Vec inpx, which has a length of 256*256
            // compute inpw: set values for Vec inpw, which has a length of 256*256
            // fast convolution: I am following ex158 in src/mat/examples using the
            // PETSc/FFTW interface; the Mat is created with MatCreateFFT()
        }
    }
}

The values of inpx and inpw change with the indices dim, i and j, but their 
lengths stay the same throughout, and each convolution can be computed 
independently. I am considering two options:

Option 1: use MPI to run the fast convolution for each (inpx, inpw) pair 
simultaneously, i.e., do the NDOF*NX*NY convolutions in parallel.

Option 2: inside the convolution, define an extended matrix and vector that 
hold the values from all NDOF*NX*NY convolutions, and apply MatMult(), 
VecPointwiseMult() and MatMultTranspose() to the extended objects at once.

I would very much appreciate your comments. Thanks.



> Subject: Re: [petsc-users] efficiency of parallel convolution
> From: [email protected]
> Date: Wed, 6 Aug 2014 10:13:34 -0500
> CC: [email protected]
> To: [email protected]
> 
> 
>   It is difficult to understand what you are doing here. What is dim? What is 
> NX and NY?   Is the length of inpx and inpw 256*256 ?  Are you using a PETSc 
> Mat like AIJ to apply the “fast convolution” or some custom MATSHELL?  Is the 
> “fast convolution” the same for each dim, i and j or is it different ?
> 
>   Barry
> 
> On Aug 5, 2014, at 1:24 AM, LikunTan <[email protected]> wrote:
> 
> > Hi all,
> > 
> > I am calculating the multiplication of matrix and vector using fast 
> > convolution, but this has to be done for many times. Here is a brief 
> > framework of my code:
> > 
> > for(dim=0; dim<NDOF; dim++)
> > {
> >      for(i=0; i<NX; i++)
> >      {
> >          for(j=0; j<NY; j++)
> >          {
> >                //compute inpx
> >                //compute inpw
> >                //fast convolution
> >           }
> >      }
> > }
> > 
> > The fast convolution needs to compute multiple times within the for loops. 
> > The dimension of the input vector is 256*256. The most time consuming parts 
> > are MatMult(), VecPointwiseMult() and MatMultTranspose() during fast 
> > convolution. The optimal number of processors is 2. Further increase of 
> > processor numbers will reduce the efficiency. In this case, would you 
> > please suggest a way to improve efficiency and fully make use of 
> > parallelization?  Thanks.
> 
                                          
