Is there a limit to how big an attachment can be? The file is 1.3 MB. I tried to send it twice and neither email went through. I also sent it directly to Barry's and Matthew's email addresses. I hope that got through?
mat

On Tuesday 15 August 2006 14:44, Matthew Knepley wrote:
> I don't think it matters initially since the problem is BIG imbalances.
>
>   Matt
>
> On 8/15/06, Matt Funk <mafunk at nmsu.edu> wrote:
> > Do you want me to use the debug version or the optimized version of
> > PETSc?
> >
> > mat
> >
> > On Tuesday 15 August 2006 13:56, Barry Smith wrote:
> > >   Please send the entire -info output as an attachment to me. (Not
> > > in the email.) I'll study it in more detail.
> > >
> > >   Barry
> > >
> > > On Tue, 15 Aug 2006, Matt Funk wrote:
> > > > On Tuesday 15 August 2006 11:52, Barry Smith wrote:
> > > >>> MatSolve  16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00
> > > >>> 0.0e+00
> > > >>                                  ^^^^^
> > > >>                                  balance
> > > >>
> > > >>   Hmmm, I would guess that the matrix entries are not so well
> > > >> balanced? One process takes 1.4 times as long for the triangular
> > > >> solves as the other, so either one matrix has many more entries or
> > > >> one processor is slower than the other.
> > > >>
> > > >>   Barry
> > > >
> > > > Well, it would seem that way at first, but I don't know how that could
> > > > be, since I allocate an exactly equal number of points on both
> > > > processors (see previous email).
> > > > Further, I used the -mat_view_info option. Here is what it gives me:
> > > >
> > > > ...
> > > > Matrix Object:
> > > >   type=mpiaij, rows=119808, cols=119808
> > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168
> > > > [1] PetscCommDuplicateUsing internal PETSc communicator 91 168
> > > > ...
> > > >
> > > > ...
> > > > Matrix Object:
> > > >   type=seqaij, rows=59904, cols=59904
> > > >   total: nonzeros=407400, allocated nonzeros=407400
> > > >     not using I-node routines
> > > > Matrix Object:
> > > >   type=seqaij, rows=59904, cols=59904
> > > >   total: nonzeros=407400, allocated nonzeros=407400
> > > >     not using I-node routines
> > > > ...
> > > >
> > > > So to me it looks well split up. Is there anything else that
> > > > somebody can think of? The machine I am running on has all identical
> > > > processors.
> > > >
> > > > By the way, I am not sure if the '[1] PetscCommDuplicateUsing
> > > > internal PETSc communicator 91 168' is something I need to worry
> > > > about??
> > > >
> > > > mat
> > > >
> > > >> On Tue, 15 Aug 2006, Matt Funk wrote:
> > > >>> Hi Matt,
> > > >>>
> > > >>> sorry for the delay since the last email, but there were some other
> > > >>> things I needed to do.
> > > >>>
> > > >>> Anyway, I hope that maybe I can get some more help from you guys
> > > >>> with respect to the load-imbalance problem I have. Here is the
> > > >>> situation: I run my code on 2 procs. I profile my KSPSolve call,
> > > >>> and here is what I get:
> > > >>>
> > > >>> ...
> > > >>>
> > > >>> --- Event Stage 2: Stage 2 of ChomboPetscInterface
> > > >>>
> > > >>> VecDot           20000 1.0 1.9575e+01 2.0 2.39e+08 2.0 0.0e+00 0.0e+00 2.0e+04  2  8   0   0  56    7  8   0   0  56   245
> > > >>> VecNorm          16000 1.0 4.0559e+01 3.2 1.53e+08 3.2 0.0e+00 0.0e+00 1.6e+04  3  7   0   0  44   13  7   0   0  44    95
> > > >>> VecCopy           4000 1.0 1.5148e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0   0    1  0   0   0   0     0
> > > >>> VecSet           16000 1.0 3.1937e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0   0    1  0   0   0   0     0
> > > >>> VecAXPY          16000 1.0 8.2395e+00 1.4 3.22e+08 1.4 0.0e+00 0.0e+00 0.0e+00  1  7   0   0   0    3  7   0   0   0   465
> > > >>> VecAYPX           8000 1.0 3.6370e+00 1.3 3.46e+08 1.3 0.0e+00 0.0e+00 0.0e+00  0  3   0   0   0    2  3   0   0   0   527
> > > >>> VecScatterBegin  12000 1.0 5.7721e-01 1.1 0.00e+00 0.0 2.4e+04 2.1e+04 0.0e+00  0  0 100 100   0    0  0 100 100   0     0
> > > >>> VecScatterEnd    12000 1.0 1.4059e+01 9.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0   0   0   0    4  0   0   0   0     0
> > > >>> MatMult          12000 1.0 5.0567e+01 1.0 1.82e+08 1.0 2.4e+04 2.1e+04 0.0e+00  5 32 100 100   0   25 32 100 100   0   361
> > > >>> MatSolve         16000 1.0 1.1285e+02 1.4 1.50e+08 1.4 0.0e+00 0.0e+00 0.0e+00 10 43   0   0   0   47 43   0   0   0   214
> > > >>> MatLUFactorNum       1 1.0 1.9693e-02 1.0 5.79e+07 1.1 0.0e+00 0.0e+00 0.0e+00  0  0   0   0   0    0  0   0   0   0   110
> > > >>> MatILUFactorSym      1 1.0 7.8840e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0   0   0   0    0  0   0   0   0     0
> > > >>> MatGetOrdering       1 1.0 1.2250e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0   0   0   0    0  0   0   0   0     0
> > > >>> KSPSetup             1 1.0 1.1921e-06 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0   0   0   0    0  0   0   0   0     0
> > > >>> KSPSolve          4000 1.0 2.0419e+02 1.0 1.39e+08 1.0 2.4e+04 2.1e+04 3.6e+04 21 100 100 100 100  100 100 100 100 100  278
> > > >>> PCSetUp              1 1.0 2.8828e-02 1.1 3.99e+07 1.1 0.0e+00 0.0e+00 3.0e+00  0  0   0   0   0    0  0   0   0   0    75
> > > >>> PCSetUpOnBlocks   4000 1.0 3.5605e-02 1.1 3.35e+07 1.1 0.0e+00 0.0e+00 3.0e+00  0  0   0   0   0    0  0   0   0   0    61
> > > >>> PCApply          16000 1.0 1.1661e+02 1.4 1.46e+08 1.4 0.0e+00 0.0e+00 0.0e+00 10 43   0   0   0   49 43   0   0   0   207
> > > >>> ------------------------------------------------------------------------------------------------------------------------
> > > >>>
> > > >>> ...
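As a cross-check on the balance question discussed above, each rank can also report its own matrix statistics directly, rather than reading them off the -mat_view_info output. A rough sketch (assuming an already assembled Mat A; the one-argument PetscSynchronizedFlush matches the 2.3-era API, newer PETSc releases also take a FILE argument):

    #include "petscmat.h"

    /* Print per-rank nonzero counts and memory for an assembled matrix A.
       MAT_LOCAL asks for the numbers of this process only. */
    PetscErrorCode PrintLocalMatInfo(Mat A)
    {
      MatInfo        info;
      PetscMPIInt    rank;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
      ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
               "[%d] nz_used %g  nz_allocated %g  mallocs %g  memory %g\n",
               rank,(double)info.nz_used,(double)info.nz_allocated,
               (double)info.mallocs,(double)info.memory);CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr); /* 2.3-era call; newer PETSc adds a FILE* argument */
      PetscFunctionReturn(0);
    }

If both ranks print the same nz_used, the stored entries really are split evenly and the MatSolve imbalance has to come from somewhere else.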
> > > >>>
> > > >>>
> > > >>> Some things to note are the following:
> > > >>> I allocate my vector as:
> > > >>>
> > > >>>   VecCreateMPI(PETSC_COMM_WORLD,                    // communicator
> > > >>>                a_totallocal_numPoints[a_thisproc],  // local points on this proc
> > > >>>                a_totalglobal_numPoints,             // total number of global points
> > > >>>                &m_globalRHSVector);                 // the vector to be created
> > > >>>
> > > >>> where the vector a_totallocal_numPoints is:
> > > >>>   a_totallocal_numPoints: 59904 59904
> > > >>>
> > > >>> The matrix is allocated as:
> > > >>>
> > > >>>   m_ierr = MatCreateMPIAIJ(PETSC_COMM_WORLD,          // communicator
> > > >>>              a_totallocal_numPoints[a_thisproc],      // number of local rows (that is, rows residing on this proc)
> > > >>>              a_totallocal_numPoints[a_thisproc],      // number of columns corresponding to the local part of the parallel vector
> > > >>>              a_totalglobal_numPoints,                 // number of global rows
> > > >>>              a_totalglobal_numPoints,                 // number of global columns
> > > >>>              PETSC_NULL,
> > > >>>              a_NumberOfNZPointsInDiagonalMatrix,
> > > >>>              PETSC_NULL,
> > > >>>              a_NumberOfNZPointsInOffDiagonalMatrix,
> > > >>>              &m_globalMatrix);
> > > >>>
> > > >>> With the -info option I checked that there are no extra mallocs at
> > > >>> all. My problem setup is symmetric, so it seems that everything is
> > > >>> set up so that it should be essentially perfectly balanced.
> > > >>> However, the numbers given above certainly do not reflect that.
> > > >>>
> > > >>> In all other parts of my code (except the PETSc call),
> > > >>> I get the expected, almost perfect load balance.
> > > >>>
> > > >>> Is there anything that I am overlooking? Any help is greatly
> > > >>> appreciated.
> > > >>>
> > > >>> thanks
> > > >>> mat
> > > >>>
> > > >>> On Wednesday 02 August 2006 16:21, Matt Funk wrote:
> > > >>>> Hi Matt,
> > > >>>>
> > > >>>> It could be a bad load imbalance because I don't let PETSc decide.
> > > >>>> I need to fix that anyway, so I think I'll try that first and then
> > > >>>> let you know. Thanks though for the quick response and for helping
> > > >>>> me interpret those numbers ...
> > > >>>>
> > > >>>> mat
> > > >>>>
> > > >>>> On Wednesday 02 August 2006 15:50, Matthew Knepley wrote:
> > > >>>>> On 8/2/06, Matt Funk <mafunk at nmsu.edu> wrote:
> > > >>>>>> Hi Matt,
> > > >>>>>>
> > > >>>>>> thanks for all the help so far. The -info option is really very
> > > >>>>>> helpful. So I think I straightened the actual errors out.
> > > >>>>>> However, now I am back to the original question I had, that is,
> > > >>>>>> why it takes so much longer on 4 procs than on 1 proc.
> > > >>>>>
> > > >>>>> So you have a 1.5 load imbalance for MatMult(), which probably
> > > >>>>> cascades to give the 133! load imbalance for VecDot(). You
> > > >>>>> probably have either:
> > > >>>>>
> > > >>>>> 1) VERY bad load imbalance
> > > >>>>>
> > > >>>>> 2) a screwed up network
> > > >>>>>
> > > >>>>> 3) bad contention on the network (loaded cluster)
> > > >>>>>
> > > >>>>> Can you help us narrow this down?
> > > >>>>>
> > > >>>>>   Matt
> > > >>>>>
> > > >>>>>> I profiled the KSPSolve(...)
> > > >>>>>> as stage 2:
> > > >>>>>>
> > > >>>>>> For 1 proc I have:
> > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface
> > > >>>>>>
> > > >>>>>> VecDot      4000 1.0 4.9158e-02 1.0 4.74e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  18  0  0   0    2  18  0  0   0   474
> > > >>>>>> VecNorm     8000 1.0 2.1798e-01 1.0 2.14e+08 1.0 0.0e+00 0.0e+00 4.0e+03  1  36  0  0  28    7  36  0  0  33   214
> > > >>>>>> VecAYPX     4000 1.0 1.3449e-01 1.0 1.73e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  18  0  0   0    5  18  0  0   0   173
> > > >>>>>> MatMult     4000 1.0 3.6004e-01 1.0 3.24e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1   9  0  0   0   12   9  0  0   0    32
> > > >>>>>> MatSolve    8000 1.0 1.0620e+00 1.0 2.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3  18  0  0   0   36  18  0  0   0    22
> > > >>>>>> KSPSolve    4000 1.0 2.8338e+00 1.0 4.52e+07 1.0 0.0e+00 0.0e+00 1.2e+04  7 100  0  0  84   97 100  0  0 100    45
> > > >>>>>> PCApply     8000 1.0 1.1133e+00 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3  18  0  0   0   38  18  0  0   0    21
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> For 4 procs I have:
> > > >>>>>> --- Event Stage 2: Stage 2 of ChomboPetscInterface
> > > >>>>>>
> > > >>>>>> VecDot           4000 1.0 3.5884e+01 133.7 2.17e+07 133.7 0.0e+00 0.0e+00 4.0e+03  8  18  0  0   5    9  18  0  0  14     1
> > > >>>>>> VecNorm          8000 1.0 3.4986e-01   1.3 4.43e+07   1.3 0.0e+00 0.0e+00 8.0e+03  0  36  0  0  10    0  36  0  0  29   133
> > > >>>>>> VecSet           8000 1.0 3.5024e-02   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0   0    0   0  0  0   0     0
> > > >>>>>> VecAYPX          4000 1.0 5.6790e-02   1.3 1.28e+08   1.3 0.0e+00 0.0e+00 0.0e+00  0  18  0  0   0    0  18  0  0   0   410
> > > >>>>>> VecScatterBegin  4000 1.0 6.0042e+01   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00 38   0  0  0   0   45   0  0  0   0     0
> > > >>>>>> VecScatterEnd    4000 1.0 5.9364e+01   1.4 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00 37   0  0  0   0   44   0  0  0   0     0
> > > >>>>>> MatMult          4000 1.0 1.1959e+02   1.4 3.46e+04   1.4 0.0e+00 0.0e+00 0.0e+00 75   9  0  0   0   89   9  0  0   0     0
> > > >>>>>> MatSolve         8000 1.0 2.8150e-01   1.0 2.16e+07   1.0 0.0e+00 0.0e+00 0.0e+00  0  18  0  0   0    0  18  0  0   0    83
> > > >>>>>> MatLUFactorNum      1 1.0 1.3685e-04   1.1 5.64e+06   1.1 0.0e+00 0.0e+00 0.0e+00  0   0  0  0   0    0   0  0  0   0    21
> > > >>>>>> MatILUFactorSym     1 1.0 2.3389e-04   1.2 0.00e+00   0.0 0.0e+00 0.0e+00 2.0e+00  0   0  0  0   0    0   0  0  0   0     0
> > > >>>>>> MatGetOrdering      1 1.0 9.6083e-05   1.2 0.00e+00   0.0 0.0e+00 0.0e+00 2.0e+00  0   0  0  0   0    0   0  0  0   0     0
> > > >>>>>> KSPSetup            1 1.0 2.1458e-06   2.2 0.00e+00   0.0 0.0e+00 0.0e+00 0.0e+00  0   0  0  0   0    0   0  0  0   0     0
> > > >>>>>> KSPSolve         4000 1.0 1.2200e+02   1.0 2.63e+05   1.0 0.0e+00 0.0e+00 2.8e+04 84 100  0  0  34  100 100  0  0 100     1
> > > >>>>>> PCSetUp             1 1.0 5.0187e-04   1.2 1.68e+06   1.2 0.0e+00 0.0e+00 4.0e+00  0   0  0  0   0    0   0  0  0   0     6
> > > >>>>>> PCSetUpOnBlocks  4000 1.0 1.2104e-02   2.2 1.34e+05   2.2 0.0e+00 0.0e+00 4.0e+00  0   0  0  0   0    0   0  0  0   0     0
> > > >>>>>> PCApply          8000 1.0 8.4254e-01   1.2 8.27e+06   1.2 0.0e+00 0.0e+00 8.0e+03  1  18  0  0  10    1  18  0  0  29    28
> > > >>>>>> ------------------------------------------------------------------------------------------------------------------------
> > > >>>>>>
> > > >>>>>> Now, if I understand it right, all these calls summarize all
> > > >>>>>> calls between the push and pop commands. That would mean that
> > > >>>>>> the majority of the time is spent in MatMult, and within
> > > >>>>>> that in the VecScatterBegin and VecScatterEnd commands (if I
> > > >>>>>> understand it right).
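For reference, a stage such as "Stage 2 of ChomboPetscInterface" in the tables above simply collects every PETSc event executed between the stage push and the matching pop, so all of KSPSolve's internal events (MatMult, MatSolve, VecDot, ...) are charged to it. A minimal sketch of that bracketing, written against a current PETSc (older releases such as the 2.3 series in this thread use a different PetscLogStageRegister argument order):

    #include <petscksp.h>

    /* Bracket the solves with a named logging stage; the -log_summary
       output then reports the events inside it as a separate stage. */
    PetscErrorCode SolveInStage(KSP ksp, Vec b, Vec x, PetscInt nsolves)
    {
      PetscLogStage  stage;
      PetscInt       i;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = PetscLogStageRegister("Stage 2 of ChomboPetscInterface",&stage);CHKERRQ(ierr); /* register once */
      ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
      for (i = 0; i < nsolves; i++) {
        ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);   /* everything in here lands in the stage */
      }
      ierr = PetscLogStagePop();CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }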
> > > >>>>>>
> > > >>>>>> My problem size is really small. So I was wondering whether the
> > > >>>>>> problem lies in that (namely, that most of the time is simply
> > > >>>>>> spent communicating between processors), or whether there is
> > > >>>>>> still something wrong with how I wrote the code?
> > > >>>>>>
> > > >>>>>> thanks
> > > >>>>>> mat
> > > >>>>>>
> > > >>>>>> On Tuesday 01 August 2006 18:28, Matthew Knepley wrote:
> > > >>>>>>> On 8/1/06, Matt Funk <mafunk at nmsu.edu> wrote:
> > > >>>>>>>> Actually the errors occur on my calls to PETSc functions
> > > >>>>>>>> after calling PetscInitialize.
> > > >>>>>>>
> > > >>>>>>> Yes, it is the error I pointed out in the last message.
> > > >>>>>>>
> > > >>>>>>>   Matt
> > > >>>>>>>
> > > >>>>>>>> mat
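On the earlier point about not letting PETSc decide the layout: whichever way the local sizes are chosen, the resulting row split can be printed per rank and compared with the intended 59904/59904 layout. A rough sketch, assuming the Vec and Mat from the quoted VecCreateMPI/MatCreateMPIAIJ calls (again using the 2.3-era one-argument PetscSynchronizedFlush):

    #include "petscmat.h"

    /* Print the ownership range each process ends up with for the
       matrix rows and the parallel vector entries. */
    PetscErrorCode PrintOwnership(Mat A, Vec x)
    {
      PetscInt       rstart,rend,vstart,vend;
      PetscMPIInt    rank;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);
      ierr = MatGetOwnershipRange(A,&rstart,&rend);CHKERRQ(ierr);
      ierr = VecGetOwnershipRange(x,&vstart,&vend);CHKERRQ(ierr);
      ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
               "[%d] rows %d to %d (%d local), vector entries %d to %d (%d local)\n",
               rank,(int)rstart,(int)(rend-1),(int)(rend-rstart),
               (int)vstart,(int)(vend-1),(int)(vend-vstart));CHKERRQ(ierr);
      ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }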
