"Jin, Shuangshuang" <[email protected]> writes:

> ////////////////////////////////////////////////////////////////////////////////////////
> // This proves to be the most time-consuming block in the computation:
> // Assign values to J matrix for the first 2*n rows (constant values)
> ... (skipped)
>
> // Assign values to J matrix for the following 2*n rows (depends on X values)
> for (i = 0; i < n; i++) {
>   for (j = 0; j < n; j++) {
>     ...(skipped)
This is a dense iteration. Are the entries really mostly nonzero? And why
does your i loop run over all rows instead of only over the locally owned
rows, xstart to xstart+xlen?
> }
>
> ////////////////////////////////////////////////////////////////////////////////////////
>
> for (i = 0; i < 4*n; i++) {
> rowcol[i] = i;
> }
>
> // Compute function over the locally owned part of the grid
> for (i = xstart; i < xstart+xlen; i++) {
> ierr = MatSetValues(*B, 1, &i, 4*n, rowcol, &J[i][0], INSERT_VALUES);
> CHKERRQ(ierr);
This seems to be creating a distributed dense matrix from a dense matrix
J of the full global dimension, computed redundantly on every process. Is
that correct? You need to _distribute_ the work of computing the matrix
entries if you want to see a speedup: each process should compute and set
only the rows it owns.
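A minimal sketch of the distributed pattern, in case it helps. This is not your code: the matrix size, the row-filling formula, and all variable names here are placeholders; the point is only that each rank queries its ownership range and computes/sets just those rows, rather than building the whole global J everywhere.

```c
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            B;
  PetscInt       n = 4, N, rstart, rend, i, j, *cols;
  PetscScalar    *row;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
  N = 4*n; /* placeholder global size, standing in for your 4*n */

  /* Let PETSc choose the row distribution across processes */
  ierr = MatCreateDense(PETSC_COMM_WORLD, PETSC_DECIDE, PETSC_DECIDE, N, N, NULL, &B);CHKERRQ(ierr);

  ierr = PetscMalloc2(N, &row, N, &cols);CHKERRQ(ierr);
  for (j = 0; j < N; j++) cols[j] = j;

  /* Each rank computes and inserts only its locally owned rows */
  ierr = MatGetOwnershipRange(B, &rstart, &rend);CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    for (j = 0; j < N; j++) row[j] = (PetscScalar)(i + j); /* placeholder for the real J(i,j) */
    ierr = MatSetValues(B, 1, &i, N, cols, row, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(B, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  ierr = PetscFree2(row, cols);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return 0;
}
```

With this structure the expensive entry computation is divided across ranks automatically, and no process ever allocates the full global J.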
