"Jin, Shuangshuang" <[email protected]> writes:

>   
> ////////////////////////////////////////////////////////////////////////////////////////
>   // This proves to be the most time-consuming block in the computation:
>   // Assign values to J matrix for the first 2*n rows (constant values)
>   ... (skipped)
>
>   // Assign values to J matrix for the following 2*n rows (depends on X values)
>   for (i = 0; i < n; i++) {
>     for (j = 0; j < n; j++) {
>        ...(skipped)

This is a dense iteration.  Are the entries really mostly nonzero?  Why
is your i loop over all rows instead of only over xstart to xstart+xlen?

>   }
>   
> ////////////////////////////////////////////////////////////////////////////////////////
>
>   for (i = 0; i < 4*n; i++) {
>     rowcol[i] = i;
>   }
>
>   // Compute function over the locally owned part of the grid
>   for (i = xstart; i < xstart+xlen; i++) {
>     ierr = MatSetValues(*B, 1, &i, 4*n, rowcol, &J[i][0], INSERT_VALUES); CHKERRQ(ierr);

This seems to be creating a distributed dense matrix from a dense matrix J
of the global dimension.  Is that correct?  You need to _distribute_ the
work of computing the matrix entries if you want to see a speedup.
