You can choose the number of rows per process so that each has about the same number of entries. "Residual" meant IFunction and/or RHSFunction, when applicable.

On Aug 31, 2013 3:53 PM, "Jin, Shuangshuang" <[email protected]> wrote:
> Hi, Jed,
>
> I think you have a good point here. The load imbalance might be a big
> problem for us, since the Jacobian matrix is not symmetric, and the
> distributed computation of each part of the Jacobian matrix elements on
> different processors can vary a lot. However, that's what the matrix
> looks like. Do we have any control over that? And what do you mean by
> "distribute the work for residual evaluation better"? I think I can only
> distribute the IFunction and IJacobian computation, but have no control
> over residual evaluation. Isn't it a black box inside TS?
>
> For the gprof run Barry suggested, I tried compiling with gcc -pg in
> sequential mode, but no gmon.out file was created after running the
> executable...
>
> Thanks,
> Shuangshuang
>
>
> On 8/30/13 4:57 PM, "Jed gov>" <[email protected]> wrote:
>
> "Jin, Shuangshuang" <[email protected]> writes:
>
> > Hello, I'm trying to update some of my status here. I just managed to
> > "_distribute_ the work of computing the Jacobian matrix" as you
> > suggested, so each processor only computes a part of the elements of
> > the Jacobian matrix instead of a global Jacobian matrix. I observed a
> > reduction of the computation time from 351 seconds to 55 seconds,
> > which is much better but still slower than I expected given that the
> > problem size is small. (4n functions in IFunction, and 4n*4n Jacobian
> > matrix in IJacobian, n = 288).
> >
> > I looked at the log profile again, and saw that most of the
> > computation time is still spent in Function Eval and Jacobian Eval:
> >
> > TSStep             600 1.0 5.6103e+01 1.0 9.42e+0825.6 3.0e+06 2.9e+02 7.0e+04 93100 99 99 92 152100 99 99110 279
> > TSFunctionEval    2996 1.0 2.9608e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+04 30 0 0 0 39 50 0 0 0 47 0
>
> The load imbalance is pretty significant here, so maybe you can
> distribute the work for residual evaluation better?
>
> > TSJacobianEval    1796 1.0 2.3436e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39 0 0 0 16 64 0 0 0 20 0
> > Warning -- total time of even greater than time of entire stage -- something is wrong with the timer
>
> SNESSolve contains the Jacobian and residual evaluations, as well as
> KSPSolve. Pretty much all the cost is in those three things.
>
> > SNESSolve          600 1.0 5.5692e+01 1.1 9.42e+0825.7 3.0e+06 2.9e+02 6.4e+04 88100 99 99 84 144100 99 99101 281
> > SNESFunctionEval  2396 1.0 2.3715e+01 3.4 1.04e+06 1.0 0.0e+00 0.0e+00 2.4e+04 25 0 0 0 31 41 0 0 0 38 1
> > SNESJacobianEval  1796 1.0 2.3447e+01 1.0 0.00e+00 0.0 5.4e+02 3.8e+01 1.3e+04 39 0 0 0 16 64 0 0 0 20 0
> > SNESLineSearch    1796 1.0 1.8313e+01 1.0 1.54e+0831.4 4.9e+05 2.9e+02 2.5e+04 30 16 16 16 33 50 16 16 16 39 139
> > KSPGMRESOrthog    9090 1.0 1.1399e+00 4.1 1.60e+07 1.0 0.0e+00 0.0e+00 9.1e+03 1 3 0 0 12 2 3 0 0 14 450
> > KSPSetUp          3592 1.0 2.8342e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+01 0 0 0 0 0 0 0 0 0 0 0
> > KSPSolve          1796 1.0 2.3052e+00 1.0 7.87e+0825.2 2.5e+06 2.9e+02 2.0e+04 4 84 83 83 26 6 84 83 83 31 5680
> > PCSetUp           3592 1.0 9.1255e-02 1.7 6.47e+05 2.5 0.0e+00 0.0e+00 1.8e+01 0 0 0 0 0 0 0 0 0 0 159
> > PCSetUpOnBlocks   1796 1.0 6.6802e-02 2.3 6.47e+05 2.5 0.0e+00 0.0e+00 1.2e+01 0 0 0 0 0 0 0 0 0 0 217
> > PCApply          10886 1.0 2.6064e-01 1.3 4.70e+06 1.5 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 481
> >
> > I was wondering why SNESFunctionEval and SNESJacobianEval took over 23
> > seconds each, while KSPSolve only took 2.3 seconds, which is 10 times
> > faster. Is this normal? Do you have any more suggestions on how to
> > reduce the FunctionEval and JacobianEval time?
>
> It means that the linear systems are easy to solve (probably because
> they are small), but the IFunction and IJacobian are expensive. As
> Barry says, you might be able to speed it up by sequential optimization.
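
To make the suggestion at the top of this message concrete, here is a minimal sketch of choosing the number of rows per process so that each process gets about the same number of entries. It is plain Python with no PETSc calls. The function name balanced_row_counts and the per-row entry counts are hypothetical, made up for illustration. In a real code the resulting counts would presumably be supplied as the local row sizes when the matrix and vectors are created (in PETSc, for instance, via MatSetSizes and VecSetSizes); that wiring is an assumption and is not shown.

```python
from typing import List


def balanced_row_counts(row_nnz: List[int], nprocs: int) -> List[int]:
    """Split rows into nprocs contiguous blocks with roughly equal entry counts.

    row_nnz[i] is the number of entries (work) in row i.
    Returns the number of rows given to each process; the counts sum to
    len(row_nnz). This is a simple greedy split, not an optimal partition.
    """
    total = sum(row_nnz)
    nrows = len(row_nnz)
    counts = []
    start = 0        # first row of the current process
    cumulative = 0   # entries assigned to all processes so far
    for p in range(nprocs):
        if p == nprocs - 1:
            # The last process takes whatever rows remain.
            end = nrows
        else:
            # Advance until this process reaches its share of the total.
            target = total * (p + 1) / nprocs
            end = start
            while end < nrows and cumulative < target:
                cumulative += row_nnz[end]
                end += 1
        counts.append(end - start)
        start = end
    return counts


if __name__ == "__main__":
    # Hypothetical data: two dense rows followed by ten sparse rows.
    row_nnz = [10, 10] + [1] * 10
    # An equal-row split (4, 4, 4) would give 22, 4 and 4 entries per process.
    print(balanced_row_counts(row_nnz, 3))  # prints [1, 1, 10]
```

With the hypothetical data above, each of the three processes ends up with 10 entries. If some processes still show a large time ratio in the log after such a split, the per-row cost is probably not proportional to the entry count, and a different weight per row would be needed.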
