What does the partition look like? Suppose you randomly assigned nodes to processes; then, in the typical case, all of a node's neighbors would be on other processes. The "diagonal block" would then be nearly diagonal, and the off-diagonal block would be huge, requiring communication with many other processes.
"Smith, Barry F. via petsc-users" writes:

>   The load balance is definitely out of whack.
>
> BuildTwoSidedF         1 1.0 1.6722e-02 41.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
> MatMult              138 1.0 2.6604e+02  7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00  2  4 13 13  0  15 25 100 100  0  2935476
> MatAssemblyBegin       1 1.0 1.6807e-02 36.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
> MatAssemblyEnd         1 1.0 3.5680e-01  3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
> VecNorm                2 1.0 4.4252e+01 74.8 1.73e+07 1.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  0   5  0   0   0  1    12780
> VecCopy                6 1.0 6.5655e-02  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
> VecAXPY                2 1.0 1.3793e-02  2.7 1.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0 41000838
> VecScatterBegin      138 1.0 1.1653e+02 85.8 0.00e+00 0.0 8.2e+07 7.8e+06 0.0e+00  1  0 13 13  0   4  0 100 100  0        0
> VecScatterEnd        138 1.0 1.3653e+02 22.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0   0   0  0        0
> VecSetRandom           1 1.0 9.6668e-01  2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>
>   Note that VecCopy/AXPY/SetRandom, which are all embarrassingly parallel,
> have a balance ratio above 2, which means some processes have more than
> twice the work of others. Meanwhile, the ratio for anything with
> communication is extremely imbalanced: some processes get to the
> synchronization point well before other processes.
>
>   The first thing I would do is worry about the load imbalance. What is its
> cause? Is it one process with much less work than the others (not great,
> but not terrible), one process with much more work than the others
> (terrible), or something in between? I think once you get a handle on the
> load balance the rest may fall into place; otherwise we still have some
> exploring to do. This is not expected behavior for a good machine with a
> good network and a well-balanced job.
>   After you understand the load balancing, you may need to use one of the
> parallel performance visualization tools to see why the synchronization is
> out of whack.
>
>   Good luck
>
>   Barry
>
>
>> On Jun 21, 2019, at 9:27 AM, Ale Foggia wrote:
>>
>> I'm sending one with a bit less time.
>> I'm also timing the functions with std::chrono, and for the case of 180
>> seconds the program runs out of memory (and crashes) before the PETSc log
>> gets printed, so I know the time only from my function. Anyway, in every
>> case, the times from std::chrono and from the PETSc log match.
>>
>> (The large times are in part "4b- Building offdiagonal part" or "Event Stage
>> 5: Offdiag").
>>
>> On Fri, Jun 21, 2019 at 16:09, Zhang, Junchao wrote:
>>
>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia wrote:
>> Thanks, both of you, for your answers.
>>
>> On Thu, Jun 20, 2019 at 22:20, Smith, Barry F. wrote:
>>
>>   Note that this is a one-time cost if the nonzero structure of the matrix
>> stays the same. It will not happen in future MatAssemblies.
>>
>> > On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users wrote:
>> >
>> > Those messages were used to build the MatMult communication pattern for
>> > the matrix. They were not part of the matrix-entry passing you imagined,
>> > but they did happen in MatAssemblyEnd. If you want to make sure processors
>> > do not set remote entries, you can use
>> > MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an
>> > error when an off-proc entry is set.
>>
>> I started being concerned about this when I saw that the assembly was taking
>> a few hundred seconds in my code, like 180 seconds, which for me is a
>> considerable time. Do you think (or maybe you need more information to
>> answer this) that this time is "reasonable" for communicating the pattern
>> for the matrix?
>> I already checked that I'm not setting any remote entries.
>> It is not reasonable. Could you send the -log_view output of that test with
>> the 180-second MatAssembly?
>>
>> Also I see (in my code) that even if there are no messages being passed
>> during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>>
>> >
>> > --Junchao Zhang
>> >
>> > On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users wrote:
>> > Hello all!
>> >
>> > During the conference I showed you a problem happening during
>> > MatAssemblyEnd in a particular code that I have. Now I have tried the same
>> > with a simple code (a symmetric problem corresponding to the Laplacian
>> > operator in 1D, from the SLEPc Hands-On exercises). As I understand it
>> > (please correct me if I'm wrong), in this case the elements of the matrix
>> > are computed locally by each process, so there should not be any
>> > communication during the assembly. However, the log shows that messages
>> > are being passed. Also, the number of messages changes with the number of
>> > processes used and the size of the matrix. Could you please help me
>> > understand this?
>> >
>> > I attach the code I used and the log I get for a small problem.
>> >
>> > Cheers,
>> > Ale
>> >
>>
>> <log.txt>
