Hi all,
I am computing matrix-vector products using fast convolution, and this has to
be done many times. Here is a brief outline of my code:
for(dim=0; dim<NDOF; dim++){
    for(i=0; i<NX; i++)
    {
        for(j=0; j<NY; j++)
        {
            //compute inpx
            //compute inpw
            //fast convolution
        }
    }
}
The fast convolution has to be computed many times inside the for loops. The
dimension of the input vector is 256*256. The most time-consuming parts are
MatMult(), VecPointwiseMult() and MatMultTranspose() during the fast
convolution. The optimal number of processors is 2; increasing the number of
processors further reduces the efficiency. In this case, could you please
suggest a way to improve efficiency and make full use of parallelization?
Thanks.
