Are you calling VecSetValues() for the vector entries?

    You can run with -info, save the output to a file, and search the file for 
the word "stash" to see how many of the vector and matrix entries are being 
communicated between processes. If this number is very high, that is a 
problem. You can also send the output to us if you like.

   Barry

On Aug 30, 2013, at 5:50 PM, "Jin, Shuangshuang" <[email protected]> 
wrote:

> I'm sorry, I made a wrong statement in my last email. The f functions in my 
> IFunction are also already computed in a distributed way, and the 24 seconds 
> each for Function and Jacobian evaluation were measured with this 
> implementation. What else can I do?
> 
> Thanks,
> Shuangshuang
> 
> -----Original Message-----
> From: Barry Smith [mailto:[email protected]] 
> Sent: Friday, August 30, 2013 3:48 PM
> To: Jin, Shuangshuang
> Cc: Jed Brown; Shri ([email protected]); [email protected]
> Subject: Re: [petsc-users] Performance of PETSc TS solver
> 
> 
>   I would next parallelize the function evaluation, since it is the single 
> largest consumer of time and should presumably be faster in parallel. After 
> that, revisit the -log_summary output to decide whether the Jacobian 
> evaluation can be improved.
> 
>   Barry
> 
> On Aug 30, 2013, at 5:28 PM, "Jin, Shuangshuang" <[email protected]> 
> wrote:
> 
>> Hello, here is an update on my status. I just managed to "_distribute_ the 
>> work of computing the Jacobian matrix" as you suggested, so each processor 
>> now computes only its part of the Jacobian entries instead of the whole 
>> global matrix. The computation time dropped from 351 seconds to 55 seconds, 
>> which is much better but still slower than I expected given the small 
>> problem size (4n functions in IFunction and a 4n*4n Jacobian matrix in 
>> IJacobian, with n = 288).
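For scale, here are the sizes quoted above. This assumes the Jacobian is treated as dense, as the 4n-column MatSetValues() calls later in the thread suggest, and stored as 8-byte doubles:

```latex
N = 4n = 4 \times 288 = 1152, \qquad
N^2 = 1152^2 = 1\,327\,104 \ \text{entries}, \qquad
8N^2 = 10\,616\,832 \ \text{bytes} \approx 10.6\ \text{MB}
```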
>> 
>> I looked at the log profile again and saw that most of the computation time 
>> is still spent in Function Eval and Jacobian Eval:
>> 
>> Event                                                                                      ----- Global -----    ----- Stage ------
>>                   Count Ratio   Time (s) Ratio     Flops Ratio     Mess  Avg len   Reduct  %T  %F  %M  %L  %R    %T  %F  %M  %L  %R Mflop/s
>> TSStep              600   1.0 5.6103e+01   1.0  9.42e+08  25.6  3.0e+06  2.9e+02  7.0e+04  93 100  99  99  92   152 100  99  99 110     279
>> TSFunctionEval     2996   1.0 2.9608e+01   4.1  0.00e+00   0.0  0.0e+00  0.0e+00  3.0e+04  30   0   0   0  39    50   0   0   0  47       0
>> TSJacobianEval     1796   1.0 2.3436e+01   1.0  0.00e+00   0.0  5.4e+02  3.8e+01  1.3e+04  39   0   0   0  16    64   0   0   0  20       0
>> Warning -- total time of event greater than time of entire stage -- something is wrong with the timer
>> SNESSolve           600   1.0 5.5692e+01   1.1  9.42e+08  25.7  3.0e+06  2.9e+02  6.4e+04  88 100  99  99  84   144 100  99  99 101     281
>> SNESFunctionEval   2396   1.0 2.3715e+01   3.4  1.04e+06   1.0  0.0e+00  0.0e+00  2.4e+04  25   0   0   0  31    41   0   0   0  38       1
>> SNESJacobianEval   1796   1.0 2.3447e+01   1.0  0.00e+00   0.0  5.4e+02  3.8e+01  1.3e+04  39   0   0   0  16    64   0   0   0  20       0
>> SNESLineSearch     1796   1.0 1.8313e+01   1.0  1.54e+08  31.4  4.9e+05  2.9e+02  2.5e+04  30  16  16  16  33    50  16  16  16  39     139
>> KSPGMRESOrthog     9090   1.0 1.1399e+00   4.1  1.60e+07   1.0  0.0e+00  0.0e+00  9.1e+03   1   3   0   0  12     2   3   0   0  14     450
>> KSPSetUp           3592   1.0 2.8342e-02   1.0  0.00e+00   0.0  0.0e+00  0.0e+00  3.0e+01   0   0   0   0   0     0   0   0   0   0       0
>> KSPSolve           1796   1.0 2.3052e+00   1.0  7.87e+08  25.2  2.5e+06  2.9e+02  2.0e+04   4  84  83  83  26     6  84  83  83  31    5680
>> PCSetUp            3592   1.0 9.1255e-02   1.7  6.47e+05   2.5  0.0e+00  0.0e+00  1.8e+01   0   0   0   0   0     0   0   0   0   0     159
>> PCSetUpOnBlocks    1796   1.0 6.6802e-02   2.3  6.47e+05   2.5  0.0e+00  0.0e+00  1.2e+01   0   0   0   0   0     0   0   0   0   0     217
>> PCApply           10886   1.0 2.6064e-01   1.3  4.70e+06   1.5  0.0e+00  0.0e+00  0.0e+00   0   1   0   0   0     1   1   0   0   0     481
>> 
>> I was wondering why SNESFunctionEval and SNESJacobianEval each took over 23 
>> seconds, whereas KSPSolve took only 2.3 seconds, about 10 times less. Is 
>> this normal? Do you have any further suggestions on how to reduce the 
>> FunctionEval and JacobianEval time?
>> (Currently, in IFunction my f function is computed sequentially; in 
>> IJacobian, the Jacobian matrix is computed in a distributed way.)
>> 
>> Thanks,
>> Shuangshuang
>> 
>> -----Original Message-----
>> From: Jed Brown [mailto:[email protected]] On Behalf Of Jed Brown
>> Sent: Friday, August 16, 2013 5:00 PM
>> To: Jin, Shuangshuang; Barry Smith; Shri ([email protected])
>> Cc: [email protected]
>> Subject: RE: [petsc-users] Performance of PETSc TS solver
>> 
>> "Jin, Shuangshuang" <[email protected]> writes:
>> 
>>> 
>>> ////////////////////////////////////////////////////////////////////////////
>>> // This proves to be the most time-consuming block in the computation:
>>> // Assign values to J matrix for the first 2*n rows (constant values)
>>> ... (skipped)
>>> 
>>> // Assign values to J matrix for the following 2*n rows (depends on X values)
>>> for (i = 0; i < n; i++) {
>>>   for (j = 0; j < n; j++) {
>>>      ...(skipped)
>> 
>> This is a dense iteration.  Are the entries really mostly nonzero?  Why is 
>> your i loop over all rows instead of only over xstart to xstart+xlen?
>> 
>>> }
>>> 
>>> ////////////////////////////////////////////////////////////////////////////
>>> 
>>> for (i = 0; i < 4*n; i++) {
>>>   rowcol[i] = i;
>>> }
>>> 
>>> // Compute function over the locally owned part of the grid
>>> for (i = xstart; i < xstart+xlen; i++) {
>>>   ierr = MatSetValues(*B, 1, &i, 4*n, rowcol, &J[i][0], INSERT_VALUES); CHKERRQ(ierr);
>> 
>> This seems to be creating a distributed dense matrix from a dense matrix J 
>> of the global dimension. Is that correct? You need to _distribute_ the work 
>> of computing the matrix entries if you want to see a speedup.
> 
