It is difficult to understand what you are doing here. What is dim? What are 
NX and NY? Is the length of inpx and inpw 256*256? Are you using a PETSc 
Mat like AIJ to apply the “fast convolution” or some custom MATSHELL? Is the 
“fast convolution” the same for each dim, i and j, or is it different?
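
By a custom MATSHELL I mean wrapping the fast convolution as a matrix-free 
operator, roughly as in the sketch below. The routine name ConvShellMult and 
the ConvCtx fields are only placeholders, and the body of the routine is left 
as a comment, since your post does not show how the convolution is applied.

  #include <petscmat.h>

  typedef struct {
    Mat transform;        /* whatever operator(s) the convolution applies */
    Vec weights, work;    /* pointwise multipliers and scratch space      */
  } ConvCtx;

  PetscErrorCode ConvShellMult(Mat A,Vec x,Vec y)
  {
    ConvCtx        *ctx;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = MatShellGetContext(A,(void**)&ctx);CHKERRQ(ierr);
    /* apply the fast convolution to x and put the result in y,
       using the data stored in ctx */
    PetscFunctionReturn(0);
  }

  /* nlocal and N are the local and global vector sizes */
  ConvCtx        ctx;    /* filled once with transforms, weights, work vectors */
  Mat            A;
  PetscErrorCode ierr;

  ierr = MatCreateShell(PETSC_COMM_WORLD,nlocal,nlocal,N,N,(void*)&ctx,&A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A,MATOP_MULT,(void(*)(void))ConvShellMult);CHKERRQ(ierr);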

  Barry

On Aug 5, 2014, at 1:24 AM, LikunTan <[email protected]> wrote:

> Hi all,
> 
> I am computing a matrix-vector multiplication using a fast convolution, 
> but this has to be done many times. Here is a brief framework of my 
> code:
> 
> for(dim=0; dim<NDOF; dim++)
> {
>      for(i=0; i<NX; i++)
>      {
>          for(j=0; j<NY; j++)
>          {
>                //compute inpx
>                //compute inpw
>                //fast convolution
>           }
>      }
> }
> 
> The fast convolution needs to be computed multiple times within the for loops. 
> The dimension of the input vector is 256*256. The most time-consuming parts 
> are MatMult(), VecPointwiseMult() and MatMultTranspose() during the fast 
> convolution. The optimal number of processors is 2; increasing the number of 
> processors further reduces efficiency. In this case, would you please 
> suggest a way to improve efficiency and fully make use of parallelization?  
> Thanks.
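
If I am reading the description above correctly, each pass of the inner loop 
does something like the following. Only the MatMult/VecPointwiseMult/
MatMultTranspose sequence and the 256*256 vector length come from the post; 
the names conv, tmp, and out and the surrounding setup are my guesses.

  Vec            inpx,inpw,tmp,out;   /* each of global length 256*256 */
  Mat            conv;                /* however the convolution operator is stored;
                                         its creation is omitted here */
  PetscInt       dim,i,j;             /* NDOF, NX, NY as in your code  */
  PetscErrorCode ierr;

  /* work vectors created once, outside the loops */
  ierr = VecCreateMPI(PETSC_COMM_WORLD,PETSC_DECIDE,256*256,&inpx);CHKERRQ(ierr);
  ierr = VecDuplicate(inpx,&inpw);CHKERRQ(ierr);
  ierr = VecDuplicate(inpx,&tmp);CHKERRQ(ierr);
  ierr = VecDuplicate(inpx,&out);CHKERRQ(ierr);

  for (dim=0; dim<NDOF; dim++) {
    for (i=0; i<NX; i++) {
      for (j=0; j<NY; j++) {
        /* compute inpx and inpw for this (dim,i,j) */
        ierr = MatMult(conv,inpx,tmp);CHKERRQ(ierr);
        ierr = VecPointwiseMult(tmp,tmp,inpw);CHKERRQ(ierr);
        ierr = MatMultTranspose(conv,tmp,out);CHKERRQ(ierr);
      }
    }
  }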
