You could access the VecScatter inside the matrix-multiply and call VecScatterView() on it with an ASCII viewer set to the format PETSC_VIEWER_ASCII_INFO (make sure you use this format). It reports how much communication is being done and how many neighbors are being communicated with.

   Barry

> On Jun 21, 2019, at 10:56 AM, Jed Brown <[email protected]> wrote:
>
> What is the partition like? Suppose you randomly assigned nodes to
> processes; then in the typical case, all neighbors would be on different
> processors. Then the "diagonal block" would be nearly diagonal and the
> off-diagonal block would be huge, requiring communication with many
> other processes.
>
> "Smith, Barry F. via petsc-users" <[email protected]> writes:
>
>> The load balance is definitely out of whack.
>>
>> BuildTwoSidedF       1 1.0 1.6722e-02 41.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>> MatMult            138 1.0 2.6604e+02  7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00  2  4 13 13  0  15 25 100 100  0  2935476
>> MatAssemblyBegin     1 1.0 1.6807e-02 36.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>> MatAssemblyEnd       1 1.0 3.5680e-01  3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>> VecNorm              2 1.0 4.4252e+01 74.8 1.73e+07 1.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  0   5  0   0   0  1    12780
>> VecCopy              6 1.0 6.5655e-02  2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>> VecAXPY              2 1.0 1.3793e-02  2.7 1.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0 41000838
>> VecScatterBegin    138 1.0 1.1653e+02 85.8 0.00e+00 0.0 8.2e+07 7.8e+06 0.0e+00  1  0 13 13  0   4  0 100 100  0        0
>> VecScatterEnd      138 1.0 1.3653e+02 22.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0   0   0  0        0
>> VecSetRandom         1 1.0 9.6668e-01  2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0   0   0  0        0
>>
>> Note that VecCopy/AXPY/SetRandom, which are all embarrassingly parallel,
>> have a balance ratio above 2, which means some processes have more than
>> twice the work of others. Meanwhile the ratio for anything with
>> communication is extremely imbalanced: some processes get to the
>> synchronization point well before other processes.
>>
>> The first thing I would do is worry about the load imbalance. What is its
>> cause? Is it one process with much less work than the others (not great
>> but not terrible), or is it one process with much more work than the
>> others (terrible), or something in between? I think once you get a handle
>> on the load balance the rest may fall into place; otherwise we still have
>> some exploring to do. This is not expected behavior for a good machine
>> with a good network and a well-balanced job. After you understand the
>> load balancing you may need to use one of the parallel performance
>> visualization tools to see why the synchronization is out of whack.
>>
>> Good luck
>>
>> Barry
>>
>>> On Jun 21, 2019, at 9:27 AM, Ale Foggia <[email protected]> wrote:
>>>
>>> I'm sending one with a bit less time.
>>> I'm also timing the functions with std::chrono, and for the case of 180
>>> seconds the program runs out of memory (and crashes) before the PETSc
>>> log gets printed, so I know the time only from my function. Anyway, in
>>> every case, the times from std::chrono and the PETSc log match.
>>>
>>> (The large times are in part "4b- Building offdiagonal part" or "Event
>>> Stage 5: Offdiag").
>>>
>>> On Fri, Jun 21, 2019 at 16:09, Zhang, Junchao (<[email protected]>) wrote:
>>>
>>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia <[email protected]> wrote:
>>> Thanks both of you for your answers,
>>>
>>> On Thu, Jun 20, 2019 at 22:20, Smith, Barry F. (<[email protected]>) wrote:
>>>
>>> Note that this is a one-time cost if the nonzero structure of the matrix
>>> stays the same. It will not happen in future MatAssemblies.
>>>
>>>> On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users
>>>> <[email protected]> wrote:
>>>>
>>>> Those messages were used to build the MatMult communication pattern for
>>>> the matrix. They were not part of the matrix entries-passing you
>>>> imagined, but indeed happened in MatAssemblyEnd. If you want to make
>>>> sure processors do not set remote entries, you can use
>>>> MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate
>>>> an error when an off-proc entry is set.
>>>
>>> I started being concerned about this when I saw that the assembly was
>>> taking a few hundred seconds in my code, like 180 seconds, which for me
>>> is a considerable time. Do you think (or maybe you need more information
>>> to answer this) that this time is "reasonable" for communicating the
>>> pattern for the matrix? I already checked that I'm not setting any
>>> remote entries.
>>> It is not reasonable. Could you send the log view of that test with the
>>> 180-second MatAssembly?
>>>
>>> Also I see (in my code) that even if there are no messages being passed
>>> during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>>>
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users
>>>> <[email protected]> wrote:
>>>> Hello all!
>>>>
>>>> During the conference I showed you a problem happening during
>>>> MatAssemblyEnd in a particular code that I have. Now, I tried the same
>>>> with a simple code (a symmetric problem corresponding to the Laplacian
>>>> operator in 1D, from the SLEPc Hands-On exercises). As I understand
>>>> (and please, correct me if I'm wrong), in this case the elements of the
>>>> matrix are computed locally by each process, so there should not be any
>>>> communication during the assembly. However, in the log I get that there
>>>> are messages being passed. Also, the number of messages changes with
>>>> the number of processes used and the size of the matrix. Could you
>>>> please help me understand this?
>>>>
>>>> I attach the code I used and the log I get for a small problem.
>>>>
>>>> Cheers,
>>>> Ale
>>>>
>>>
>>> <log.txt>
