It is difficult to understand what you are doing here. What is dim? What are NX and NY? Is the length of inpx and inpw 256*256? Are you using a PETSc Mat like AIJ to apply the “fast convolution” or some custom MATSHELL? Is the “fast convolution” the same for each dim, i and j, or is it different?
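For reference, a matrix-free operator of that kind is set up with MatCreateShell()/MatShellSetOperation(). A minimal sketch, where ConvCtx and ConvMult() are placeholder names for whatever data and work the fast convolution actually needs:

#include <petscmat.h>

/* Placeholder context for whatever the fast convolution needs */
typedef struct {
  Vec kernel_hat;   /* e.g. a transformed kernel; hypothetical field */
} ConvCtx;

/* User-supplied multiply, y = A*x, applied via the fast convolution */
static PetscErrorCode ConvMult(Mat A, Vec x, Vec y)
{
  ConvCtx        *ctx;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatShellGetContext(A, (void **)&ctx);CHKERRQ(ierr);
  /* ... apply the fast convolution to x and store the result in y ... */
  PetscFunctionReturn(0);
}

int main(int argc, char **argv)
{
  Mat            A;
  ConvCtx        ctx;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  /* A 256*256 shell matrix; its MatMult() now invokes ConvMult() */
  ierr = MatCreateShell(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE,
                        256*256, 256*256, &ctx, &A);CHKERRQ(ierr);
  ierr = MatShellSetOperation(A, MATOP_MULT, (void (*)(void))ConvMult);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}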
  Barry

On Aug 5, 2014, at 1:24 AM, LikunTan <[email protected]> wrote:

> Hi all,
>
> I am calculating the multiplication of a matrix and a vector using fast
> convolution, but this has to be done many times. Here is a brief framework
> of my code:
>
> for(dim=0; dim<NDOF; dim++)
> {
>   for(i=0; i<NX; i++)
>   {
>     for(j=0; j<NY; j++)
>     {
>       //compute inpx
>       //compute inpw
>       //fast convolution
>     }
>   }
> }
>
> The fast convolution needs to be computed multiple times within the for
> loops. The dimension of the input vector is 256*256. The most time-consuming
> parts are MatMult(), VecPointwiseMult() and MatMultTranspose() during the
> fast convolution. The optimal number of processors is 2; increasing the
> number of processors further reduces the efficiency. In this case, would you
> please suggest a way to improve efficiency and fully make use of
> parallelization? Thanks.
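For reference, the MatMult()/VecPointwiseMult()/MatMultTranspose() pattern described above maps onto PETSc's FFT matrix interface. A minimal sketch of one such convolution, assuming a PETSc build configured with FFTW and complex scalars; kernel_hat is a placeholder for the transformed inpw, and filling the vectors is left out:

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            F;                 /* FFT "matrix": MatMult = forward, MatMultTranspose = backward */
  Vec            x, x_hat, w_hat, y, kernel_hat;
  PetscInt       dims[2] = {256, 256};
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);

  ierr = MatCreateFFT(PETSC_COMM_WORLD, 2, dims, MATFFTW, &F);CHKERRQ(ierr);
  ierr = MatCreateVecsFFTW(F, &x, &x_hat, &y);CHKERRQ(ierr);
  ierr = VecDuplicate(x_hat, &w_hat);CHKERRQ(ierr);
  ierr = VecDuplicate(x_hat, &kernel_hat);CHKERRQ(ierr);

  /* ... fill x from inpx and kernel_hat from the transformed inpw ... */

  ierr = MatMult(F, x, x_hat);CHKERRQ(ierr);                      /* forward FFT */
  ierr = VecPointwiseMult(w_hat, x_hat, kernel_hat);CHKERRQ(ierr);/* multiply in frequency space */
  ierr = MatMultTranspose(F, w_hat, y);CHKERRQ(ierr);             /* backward FFT */
  ierr = VecScale(y, 1.0/(256.0*256.0));CHKERRQ(ierr);            /* FFTW backward transform is unscaled */

  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&x_hat);CHKERRQ(ierr);
  ierr = VecDestroy(&w_hat);CHKERRQ(ierr);
  ierr = VecDestroy(&y);CHKERRQ(ierr);
  ierr = VecDestroy(&kernel_hat);CHKERRQ(ierr);
  ierr = MatDestroy(&F);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}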
