I believe you very likely should be doing variants of both option 1 and 
option 2 at the same time.  That is, use several processors, each computing some 
of the transforms, but also use the FFTW advanced interface, which "computes 
transforms of multiple or strided arrays" (see http://www.fftw.org/fftw3.pdf), 
so that each call to FFTW does several transforms at once.

   This means you will have to call the FFTW advanced-interface functions yourself 
to set up a plan that does multiple transforms together, instead of using the 
PETSc MatMult(), but I believe it will likely be much faster than doing them one 
at a time.

   Barry


On Aug 6, 2014, at 1:39 PM, LikunTan <[email protected]> wrote:

> Hi Barry,
> 
> Thanks for your email. Sorry I did not make it clear; here is a more detailed 
> version:
> 
> int dim, i, j;
> int NDOF=3,  NX=5, NY=5;
> 
> for(dim=0; dim<NDOF; dim++)
> {
>     for(i=0; i<NX; i++)
>     {
>         for(j=0; j<NY; j++)
>         {
>             // compute inpx: set values for Vec inpx, which has dimension 256*256
>             // compute inpw: set values for Vec inpw, which has dimension 256*256
>             // fast convolution: I am following ex158 in src/mat/examples using
>             // the PETSc-FFTW interface; the Mat is created with MatCreateFFT()
>         }
>     }
> }
> 
> The values of inpx and inpw change with the indices dim, i and j, but their 
> lengths are always the same, and the convolutions can be calculated 
> independently. I am thinking about two options:
> option 1: use MPI to do the fast convolutions for each inpx and inpw 
> simultaneously, i.e., do the NDOF*NX*NY convolutions in parallel
> option 2: inside the convolution, define an extended matrix and vector that 
> hold all the values from the NDOF*NX*NY convolutions, and apply MatMult(), 
> VecPointwiseMult() and MatMultTranspose() to the extended objects all at 
> once.
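
For concreteness, a single 256x256 convolution done with the PETSc-FFTW
interface, in the spirit of ex158, looks roughly like the sketch below. This is
written from memory rather than copied from ex158, so treat the exact calls, in
particular MatCreateVecsFFTW() and the 1/(256*256) normalization of the backward
transform, as assumptions to check against the example:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, xhat, y;          /* input, its transform, and the result      */
  Vec      w, what;
  PetscInt dim[2] = {256, 256};

  PetscInitialize(&argc, &argv, NULL, NULL);

  MatCreateFFT(PETSC_COMM_WORLD, 2, dim, MATFFTW, &A);
  MatCreateVecsFFTW(A, &x, &xhat, &y);       /* vectors with the FFTW layout  */
  VecDuplicate(x, &w);
  VecDuplicate(xhat, &what);

  /* ... set the values of x (inpx) and w (inpw) here ... */

  MatMult(A, x, xhat);                 /* xhat = FFT(x)                       */
  MatMult(A, w, what);                 /* what = FFT(w)                       */
  VecPointwiseMult(xhat, xhat, what);  /* xhat = FFT(x) .* FFT(w)             */
  MatMultTranspose(A, xhat, y);        /* y = backward FFT of the product     */
  VecScale(y, 1.0/((PetscReal)(dim[0]*dim[1])));  /* backward FFT is unscaled */

  VecDestroy(&x); VecDestroy(&w); VecDestroy(&xhat);
  VecDestroy(&what); VecDestroy(&y);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}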
> 
> I would very much appreciate your comments. Thanks.
> 
> 
> 
> > Subject: Re: [petsc-users] efficiency of parallel convolution
> > From: [email protected]
> > Date: Wed, 6 Aug 2014 10:13:34 -0500
> > CC: [email protected]
> > To: [email protected]
> > 
> > 
> > It is difficult to understand what you are doing here. What is dim? What are 
> > NX and NY? Are the lengths of inpx and inpw 256*256? Are you using a PETSc 
> > Mat like AIJ to apply the “fast convolution”, or some custom MATSHELL? Is 
> > the “fast convolution” the same for each dim, i and j, or is it different?
> > 
> > Barry
> > 
> > On Aug 5, 2014, at 1:24 AM, LikunTan <[email protected]> wrote:
> > 
> > > Hi all,
> > > 
> > > I am calculating matrix-vector multiplications using fast convolution, 
> > > but this has to be done many times. Here is a brief outline of my code:
> > > 
> > > for(dim=0; dim<NDOF; dim++)
> > > {
> > >     for(i=0; i<NX; i++)
> > >     {
> > >         for(j=0; j<NY; j++)
> > >         {
> > >             //compute inpx
> > >             //compute inpw
> > >             //fast convolution
> > >         }
> > >     }
> > > }
> > > 
> > > The fast convolution needs to be computed many times within the for 
> > > loops. The dimension of the input vector is 256*256. The most 
> > > time-consuming parts of the fast convolution are MatMult(), 
> > > VecPointwiseMult() and MatMultTranspose(). The optimal number of 
> > > processors is 2; increasing the number of processors further reduces 
> > > the efficiency. In this case, would you please suggest a way to improve 
> > > the efficiency and make full use of parallelization? Thanks.
> >
