I’m using the Trapezoidal method, selected with the command-line option “-ts_theta_endpoint”:
ierr = TSCreate(PETSC_COMM_WORLD, &ts); CHKERRQ(ierr);
ierr = TSSetType(ts, TSTHETA); CHKERRQ(ierr);
ierr = TSThetaSetTheta(ts, 0.5); CHKERRQ(ierr);
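(For reference, the endpoint variant can also be requested programmatically; a minimal sketch, assuming a PETSc version that provides TSThetaSetEndpoint:)

ierr = TSThetaSetEndpoint(ts, PETSC_TRUE); CHKERRQ(ierr); /* theta = 0.5 with the endpoint form gives the trapezoid rule */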
I just did a quick try with the Rosenbrock methods, and it diverged.
I didn’t use VecSetValues. I only used MatSetValues multiple times inside
IJacobian.
I tried the -info option. The output file is too large to send out. I searched
for “Stash” and found 118678 hits in the file. All of them look like this:
Line 1668: [16] MatStashScatterBegin_Private(): No of messages: 0
Line 1669: [16] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
Line 1670: [27] MatStashScatterBegin_Private(): No of messages: 0
Line 1671: [27] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
Line 1672: [28] MatStashScatterBegin_Private(): No of messages: 0
Line 1673: [28] MatAssemblyBegin_MPIAIJ(): Stash has 0 entries, uses 0 mallocs.
Line 1674: [11] MatStashScatterBegin_Private(): No of messages: 0
Thanks,
Shuangshuang
From: [email protected] On Behalf Of Jed Brown
Sent: Friday, August 30, 2013 3:52 PM
To: Barry Smith
Cc: PETSc users list; Shrirang Abhyankar; Jin, Shuangshuang
Subject: Re: [petsc-users] Performance of PETSc TS solver
Also, which TS method are you using? Rosenbrock methods will amortize a lot of
assembly cost by reusing the matrix for several stages.
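(If you want to retry a Rosenbrock method once the assembly cost is sorted out, a minimal sketch of selecting one; the particular variant ra34pw2 is only an example:)

-ts_type rosw -ts_rosw_type ra34pw2

or, equivalently in code:

ierr = TSSetType(ts, TSROSW); CHKERRQ(ierr);
ierr = TSRosWSetType(ts, TSROSWRA34PW2); CHKERRQ(ierr);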
On Aug 30, 2013 3:48 PM, "Barry Smith" <[email protected]> wrote:
I would next parallelize the function evaluation since it is the single
largest consumer of time and should presumably be faster in parallel. After
that revisit the -log_summary again to decide if the Jacobian evaluation can be
improved.
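(For illustration, a minimal sketch of a distributed IFunction evaluation; X and F stand for the state and residual vectors passed to IFunction, eval_residual_i is a hypothetical per-row helper, and the scatter assumes every f_i may need any entry of X:)

Vec               Xall;
VecScatter        scat;
const PetscScalar *xa;
PetscInt          rstart, rend, i;
PetscScalar       fi;

/* gather the (small) global state onto every process, since each f_i may couple to all of X */
ierr = VecScatterCreateToAll(X, &scat, &Xall); CHKERRQ(ierr);
ierr = VecScatterBegin(scat, X, Xall, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
ierr = VecScatterEnd(scat, X, Xall, INSERT_VALUES, SCATTER_FORWARD); CHKERRQ(ierr);
ierr = VecGetArrayRead(Xall, &xa); CHKERRQ(ierr);

/* each process fills only its locally owned rows of the residual */
ierr = VecGetOwnershipRange(F, &rstart, &rend); CHKERRQ(ierr);
for (i = rstart; i < rend; i++) {
  fi = eval_residual_i(i, xa);                        /* hypothetical per-row evaluation */
  ierr = VecSetValue(F, i, fi, INSERT_VALUES); CHKERRQ(ierr);
}
ierr = VecAssemblyBegin(F); CHKERRQ(ierr);
ierr = VecAssemblyEnd(F); CHKERRQ(ierr);

ierr = VecRestoreArrayRead(Xall, &xa); CHKERRQ(ierr);
ierr = VecScatterDestroy(&scat); CHKERRQ(ierr);
ierr = VecDestroy(&Xall); CHKERRQ(ierr);

With n = 288 the gathered state is only 4n = 1152 entries, so the scatter cost is negligible compared to evaluating the residual rows in parallel.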
Barry
On Aug 30, 2013, at 5:28 PM, "Jin, Shuangshuang" <[email protected]> wrote:
> Hello, I'm trying to update some of my status here. I just managed to
> "_distribute_ the work of computing the Jacobian matrix" as you suggested, so
> each processor only computes a part of the elements of the Jacobian matrix
> instead of the whole global Jacobian matrix. I observed a reduction of the
> computation time from 351 seconds to 55 seconds, which is much better but
> still slower than I expected, given that the problem size is small (4n
> functions in IFunction, and a 4n*4n Jacobian matrix in IJacobian, n = 288).
>
> I looked at the log profile again and saw that most of the computation time
> is still spent in Function Eval and Jacobian Eval:
>
> TSStep 600 1.0 5.6103e+01 1.0 9.42e+08 25.6 3.0e+06 2.9e+02
> 7.0e+04 93 100 99 99 92 152 100 99 99 110 279
> TSFunctionEval 2996 1.0 2.9608e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+04 30 0 0 0 39 50 0 0 0 47 0
> TSJacobianEval 1796 1.0 2.3436e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01
> 1.3e+04 39 0 0 0 16 64 0 0 0 20 0
> Warning -- total time of event greater than time of entire stage -- something
> is wrong with the timer
> SNESSolve 600 1.0 5.5692e+01 1.1 9.42e+08 25.7 3.0e+06 2.9e+02
> 6.4e+04 88 100 99 99 84 144 100 99 99 101 281
> SNESFunctionEval 2396 1.0 2.3715e+01 3.4 1.04e+06 1.0 0.0e+00 0.0e+00
> 2.4e+04 25 0 0 0 31 41 0 0 0 38 1
> SNESJacobianEval 1796 1.0 2.3447e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01
> 1.3e+04 39 0 0 0 16 64 0 0 0 20 0
> SNESLineSearch 1796 1.0 1.8313e+01 1.0 1.54e+08 31.4 4.9e+05 2.9e+02
> 2.5e+04 30 16 16 16 33 50 16 16 16 39 139
> KSPGMRESOrthog 9090 1.0 1.1399e+00 4.1 1.60e+07 1.0 0.0e+00 0.0e+00
> 9.1e+03 1 3 0 0 12 2 3 0 0 14 450
> KSPSetUp 3592 1.0 2.8342e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 3.0e+01 0 0 0 0 0 0 0 0 0 0 0
> KSPSolve 1796 1.0 2.3052e+00 1.0 7.87e+08 25.2 2.5e+06 2.9e+02
> 2.0e+04 4 84 83 83 26 6 84 83 83 31 5680
> PCSetUp 3592 1.0 9.1255e-02 1.7 6.47e+05 2.5 0.0e+00 0.0e+00
> 1.8e+01 0 0 0 0 0 0 0 0 0 0 159
> PCSetUpOnBlocks 1796 1.0 6.6802e-02 2.3 6.47e+05 2.5 0.0e+00 0.0e+00
> 1.2e+01 0 0 0 0 0 0 0 0 0 0 217
> PCApply 10886 1.0 2.6064e-01 1.3 4.70e+06 1.5 0.0e+00 0.0e+00
> 0.0e+00 0 1 0 0 0 1 1 0 0 0 481
>
> I was wondering why SNESFunctionEval and SNESJacobianEval each took over 23
> seconds, while KSPSolve only took 2.3 seconds, which is 10 times faster. Is
> this normal? Do you have any more suggestions on how to reduce the
> FunctionEval and JacobianEval time?
> (Currently, in IFunction my f function is formulated sequentially; in
> IJacobian, the Jacobian matrix is formulated in a distributed fashion.)
>
> Thanks,
> Shuangshuang
>
>
>
>
>
> -----Original Message-----
> From: Jed Brown [mailto:[email protected]] On Behalf Of Jed Brown
> Sent: Friday, August 16, 2013 5:00 PM
> To: Jin, Shuangshuang; Barry Smith; Shri ([email protected])
> Cc: [email protected]
> Subject: RE: [petsc-users] Performance of PETSc TS solver
>
> "Jin, Shuangshuang"
> <[email protected]<mailto:[email protected]>> writes:
>
>>
>> ////////////////////////////////////////////////////////////////////////////////////////
>> // This proves to be the most time-consuming block in the computation:
>> // Assign values to J matrix for the first 2*n rows (constant values)
>> ... (skipped)
>>
>> // Assign values to J matrix for the following 2*n rows (depends on X
>> values)
>> for (i = 0; i < n; i++) {
>> for (j = 0; j < n; j++) {
>> ...(skipped)
>
> This is a dense iteration. Are the entries really mostly nonzero? Why is
> your i loop over all rows instead of only over xstart to xstart+xlen?
>
>> }
>>
>> ////////////////////////////////////////////////////////////////////////////////////////
>>
>> for (i = 0; i < 4*n; i++) {
>> rowcol[i] = i;
>> }
>>
>> // Compute function over the locally owned part of the grid
>> for (i = xstart; i < xstart+xlen; i++) {
>> ierr = MatSetValues(*B, 1, &i, 4*n, rowcol, &J[i][0],
>> INSERT_VALUES); CHKERRQ(ierr);
>
> This seems to be creating a distributed dense matrix from a dense matrix J
> of the global dimension. Is that correct? You need to _distribute_ the work
> of computing the matrix entries if you want to see a speedup.
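(A minimal sketch of restricting the Jacobian assembly to the locally owned rows and inserting only the nonzero entries; it reuses n, xstart, xlen, J, B, and ierr from the code above, while the cols/vals scratch buffers are introduced here for illustration:)

PetscInt    i, j, ncols;
PetscInt    *cols = (PetscInt *)    malloc(4*n*sizeof(PetscInt));
PetscScalar *vals = (PetscScalar *) malloc(4*n*sizeof(PetscScalar));

for (i = xstart; i < xstart + xlen; i++) {    /* only the locally owned rows */
  ncols = 0;
  for (j = 0; j < 4*n; j++) {
    if (J[i][j] != 0.0) {                     /* insert only the nonzero entries of row i */
      cols[ncols] = j;
      vals[ncols] = J[i][j];
      ncols++;
    }
  }
  ierr = MatSetValues(*B, 1, &i, ncols, cols, vals, INSERT_VALUES); CHKERRQ(ierr);
}
free(cols);
free(vals);
ierr = MatAssemblyBegin(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
ierr = MatAssemblyEnd(*B, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

The bigger win, as noted above, is to compute only rows xstart to xstart+xlen of J in the first place rather than filling the full dense matrix on every process.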