On 8/2/06, Matt Funk <mafunk at nmsu.edu> wrote:
> Hi Matt,
>
> thanks for all the help so far. The -info option is really very helpful. So i
> think i straightened the actual errors out. However, now i am back to the
> original question i had. That is why it takes so much longer on 4 procs than
> on 1 proc.

So you have a 1.5 load imbalance for MatMult(), which probably cascades to give
the 133! load imbalance for VecDot(). You probably have either:

  1) VERY bad load imbalance

  2) a screwed up network

  3) bad contention on the network (loaded cluster)

Can you help us narrow this down?

  Matt

> I profiled the KSPSolve(...) as stage 2:
>
> For 1 proc i have:
> --- Event Stage 2: Stage 2 of ChomboPetscInterface
>
> VecDot           4000 1.0 4.9158e-02   1.0 4.74e+08   1.0 0.0e+00 0.0e+00 0.0e+00   0  18   0   0   0   2  18   0   0   0  474
> VecNorm          8000 1.0 2.1798e-01   1.0 2.14e+08   1.0 0.0e+00 0.0e+00 4.0e+03   1  36   0   0  28   7  36   0   0  33  214
> VecAYPX          4000 1.0 1.3449e-01   1.0 1.73e+08   1.0 0.0e+00 0.0e+00 0.0e+00   0  18   0   0   0   5  18   0   0   0  173
> MatMult          4000 1.0 3.6004e-01   1.0 3.24e+07   1.0 0.0e+00 0.0e+00 0.0e+00   1   9   0   0   0  12   9   0   0   0   32
> MatSolve         8000 1.0 1.0620e+00   1.0 2.19e+07   1.0 0.0e+00 0.0e+00 0.0e+00   3  18   0   0   0  36  18   0   0   0   22
> KSPSolve         4000 1.0 2.8338e+00   1.0 4.52e+07   1.0 0.0e+00 0.0e+00 1.2e+04   7 100   0   0  84  97 100   0   0 100   45
> PCApply          8000 1.0 1.1133e+00   1.0 2.09e+07   1.0 0.0e+00 0.0e+00 0.0e+00   3  18   0   0   0  38  18   0   0   0   21
>
>
> for 4 procs i have :
> --- Event Stage 2: Stage 2 of ChomboPetscInterface
>
> VecDot           4000 1.0 3.5884e+01 133.7 2.17e+07 133.7 0.0e+00 0.0e+00 4.0e+03   8  18   0   0   5   9  18   0   0  14    1
> VecNorm          8000 1.0 3.4986e-01   1.3 4.43e+07   1.3 0.0e+00 0.0e+00 8.0e+03   0  36   0   0  10   0  36   0   0  29  133
> VecSet           8000 1.0 3.5024e-02   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0    0
> VecAYPX          4000 1.0 5.6790e-02   1.3 1.28e+08   1.3 0.0e+00 0.0e+00 0.0e+00   0  18   0   0   0   0  18   0   0   0  410
> VecScatterBegin  4000 1.0 6.0042e+01   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  38   0   0   0   0  45   0   0   0   0    0
> VecScatterEnd    4000 1.0 5.9364e+01   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  37   0   0   0   0  44   0   0   0   0    0
> MatMult          4000 1.0 1.1959e+02   1.4 3.46e+04   1.4 0.0e+00 0.0e+00 0.0e+00  75   9   0   0   0  89   9   0   0   0    0
> MatSolve         8000 1.0 2.8150e-01   1.0 2.16e+07   1.0 0.0e+00 0.0e+00 0.0e+00   0  18   0   0   0   0  18   0   0   0   83
> MatLUFactorNum      1 1.0 1.3685e-04   1.1 5.64e+06   1.1 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0   21
> MatILUFactorSym     1 1.0 2.3389e-04   1.2 0.00e+00   0.0 0.0e+00 0.0e+00 2.0e+00   0   0   0   0   0   0   0   0   0   0    0
> MatGetOrdering      1 1.0 9.6083e-05   1.2 0.00e+00   0.0 0.0e+00 0.0e+00 2.0e+00   0   0   0   0   0   0   0   0   0   0    0
> KSPSetup            1 1.0 2.1458e-06   2.2 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00   0   0   0   0   0   0   0   0   0   0    0
> KSPSolve         4000 1.0 1.2200e+02   1.0 2.63e+05   1.0 0.0e+00 0.0e+00 2.8e+04  84 100   0   0  34 100 100   0   0 100    1
> PCSetUp             1 1.0 5.0187e-04   1.2 1.68e+06   1.2 0.0e+00 0.0e+00 4.0e+00   0   0   0   0   0   0   0   0   0   0    6
> PCSetUpOnBlocks  4000 1.0 1.2104e-02   2.2 1.34e+05   2.2 0.0e+00 0.0e+00 4.0e+00   0   0   0   0   0   0   0   0   0   0    0
> PCApply          8000 1.0 8.4254e-01   1.2 8.27e+06   1.2 0.0e+00 0.0e+00 8.0e+03   1  18   0   0  10   1  18   0   0  29   28
> ------------------------------------------------------------------------------------------------------------------------
>
> Now if i understand it right, all these calls summarize all calls between the
> pop and push commands. That would mean that the majority of the time is spent
> in the MatMult and within that the VecScatterBegin and VecScatterEnd
> commands (if i understand it right).
>
> My problem size is really small. So i was wondering if the problem lies in
> that (namely that the major time is simply spent communicating between
> processors, or whether there is still something wrong with how i wrote the
> code?)
>
>
> thanks
> mat
>
>
>
> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote:
> > On 8/1/06, Matt Funk <mafunk at nmsu.edu> wrote:
> > > Actually the errors occur on my calls to PETSc functions after calling
> > > PetscInitialize.
> >
> > Yes, it is the error I pointed out in the last message.
> >
> >   Matt
> >
> > > mat
> >

--
"Failure has a thousand explanations. Success doesn't need one"
-- Sir Alec Guiness
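
The arithmetic behind the question above can be checked directly from the quoted logs. The following is only a rough sketch: it uses the times (in seconds) copied from the two Stage 2 summaries, and the dictionary and variable names are made up for illustration. Since the logged times are per-event maxima over processes, adding VecScatterBegin and VecScatterEnd gives only an approximate picture.

```python
# Times in seconds, copied from the two "Event Stage 2" summaries quoted above.
# The dictionary and variable names are illustrative, not part of PETSc.
one_proc = {"KSPSolve": 2.8338e+00, "MatMult": 3.6004e-01}
four_proc = {
    "KSPSolve": 1.2200e+02,
    "MatMult": 1.1959e+02,
    "VecScatterBegin": 6.0042e+01,
    "VecScatterEnd": 5.9364e+01,
}

# How much slower the whole solve is on 4 processes than on 1.
slowdown = four_proc["KSPSolve"] / one_proc["KSPSolve"]

# Share of the solve time spent inside MatMult in each run.
matmult_share_1 = one_proc["MatMult"] / one_proc["KSPSolve"]
matmult_share_4 = four_proc["MatMult"] / four_proc["KSPSolve"]

# Share of the 4-process MatMult time attributed to the two scatter events
# (approximate: each logged time is a maximum over processes).
scatter_time = four_proc["VecScatterBegin"] + four_proc["VecScatterEnd"]
scatter_share = scatter_time / four_proc["MatMult"]

print(f"KSPSolve slowdown on 4 procs: {slowdown:.1f}x")
print(f"MatMult share of KSPSolve, 1 proc:  {matmult_share_1:.1%}")
print(f"MatMult share of KSPSolve, 4 procs: {matmult_share_4:.1%}")
print(f"Scatter share of MatMult, 4 procs:  {scatter_share:.1%}")
```

On these numbers the 4-process solve is dominated by MatMult, and nearly all of that MatMult time is logged under the two scatter events, which is consistent with the reading in the quoted message that the time goes into communication rather than computation.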
