What is the partition like?  Suppose you randomly assigned nodes to
processes; then in the typical case, all neighbors would be on different
processors.  Then the "diagonal block" would be nearly diagonal and the
off-diagonal block would be huge, requiring communication with many
other processes.
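
A minimal sketch of this effect, assuming a 1D chain of nodes where node i
is coupled only to node i+1 (as in the 1D Laplacian test discussed further
down the thread). The function names are hypothetical and this is not PETSc
code; it only counts how many neighbor links cross a process boundary under
a contiguous versus a random assignment.

```python
import random

def off_process_neighbors(owner):
    """Count links (i, i+1) of a 1D chain whose two ends have different owners."""
    return sum(1 for i in range(len(owner) - 1) if owner[i] != owner[i + 1])

def contiguous_partition(n, nprocs):
    # Each process owns one contiguous block of nodes.
    return [i * nprocs // n for i in range(n)]

def random_partition(n, nprocs, seed=0):
    # Each node is assigned to a process uniformly at random (illustrative only).
    rng = random.Random(seed)
    return [rng.randrange(nprocs) for _ in range(n)]

n, nprocs = 10000, 16
contig = off_process_neighbors(contiguous_partition(n, nprocs))
rand = off_process_neighbors(random_partition(n, nprocs))
print("contiguous:", contig, "off-process links out of", n - 1)
print("random:    ", rand, "off-process links out of", n - 1)
```

With contiguous blocks only the links at the block boundaries cross
processes; with a random assignment nearly every link does, which is the
"huge off-diagonal block" case.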

"Smith, Barry F. via petsc-users" <[email protected]> writes:

>    The load balance is definitely out of whack. 
>
>
>
> BuildTwoSidedF         1 1.0 1.6722e-02  41.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0        0
> MatMult              138 1.0 2.6604e+02   7.4 3.19e+10 2.1 8.2e+07 7.8e+06 0.0e+00  2  4 13 13  0  15  25 100 100   0  2935476
> MatAssemblyBegin       1 1.0 1.6807e-02  36.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0        0
> MatAssemblyEnd         1 1.0 3.5680e-01   3.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0        0
> VecNorm                2 1.0 4.4252e+01  74.8 1.73e+07 1.0 0.0e+00 0.0e+00 2.0e+00  1  0  0  0  0   5   0   0   0   1    12780
> VecCopy                6 1.0 6.5655e-02   2.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0        0
> VecAXPY                2 1.0 1.3793e-02   2.7 1.73e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0 41000838
> VecScatterBegin      138 1.0 1.1653e+02  85.8 0.00e+00 0.0 8.2e+07 7.8e+06 0.0e+00  1  0 13 13  0   4   0 100 100   0        0
> VecScatterEnd        138 1.0 1.3653e+02  22.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4   0   0   0   0        0
> VecSetRandom           1 1.0 9.6668e-01   2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0   0   0   0   0        0
>
> Note that VecCopy/AXPY/SetRandom, which are all embarrassingly parallel, have
> a balance ratio above 2, which means some processes have more than twice the
> work of others. Meanwhile, anything with communication is extremely
> imbalanced: some processes get to the synchronization point well before
> other processes.
>
> The first thing I would do is worry about the load imbalance. What is its
> cause? Is it one process with much less work than the others (not great but
> not terrible), one process with much more work than the others (terrible),
> or something in between? I think once you get a handle on the load balance
> the rest may fall into place; otherwise we still have some exploring to do.
> This is not expected behavior for a good machine with a good network and a
> well-balanced job. After you understand the load balancing you may need to
> use one of the parallel performance visualization tools to see why the
> synchronization is out of whack.
>
>    Good luck
>
>   Barry
>
>
>> On Jun 21, 2019, at 9:27 AM, Ale Foggia <[email protected]> wrote:
>> 
>> I'm sending one with a bit less time.
>> I'm also timing the functions with std::chrono, and in the 180-second
>> case the program runs out of memory (and crashes) before the PETSc log
>> gets printed, so I know the time only from my own timer. Anyway, in
>> every case, the std::chrono times and the PETSc log match.
>> 
>> (The large times are in part "4b- Building offdiagonal part" or "Event Stage 
>> 5: Offdiag").
>> 
>> On Fri, Jun 21, 2019 at 16:09, Zhang, Junchao (<[email protected]>)
>> wrote:
>> 
>> 
>> On Fri, Jun 21, 2019 at 8:07 AM Ale Foggia <[email protected]> wrote:
>> Thanks both of you for your answers,
>> 
>> On Thu, Jun 20, 2019 at 22:20, Smith, Barry F. (<[email protected]>)
>> wrote:
>> 
>>   Note that this is a one-time cost if the nonzero structure of the matrix
>> stays the same. It will not happen in future MatAssemblies.
>> 
>> > On Jun 20, 2019, at 3:16 PM, Zhang, Junchao via petsc-users 
>> > <[email protected]> wrote:
>> > 
>> > Those messages were used to build the MatMult communication pattern for
>> > the matrix. They were not part of the passing of matrix entries that you
>> > imagined, but they did indeed happen in MatAssemblyEnd. If you want to make
>> > sure processes do not set remote entries, you can use
>> > MatSetOption(A,MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE), which will generate an
>> > error when an off-proc entry is set.
>> 
>> I started being concerned about this when I saw that the assembly was taking
>> a few hundred seconds in my code, like 180 seconds, which for me is a
>> considerable time. Do you think (or maybe you need more information to
>> answer this) that this time is "reasonable" for communicating the pattern
>> for the matrix? I already checked that I'm not setting any remote entries.
>>
>> It is not reasonable. Could you send the log view of that test with the
>> 180-second MatAssembly?
>>  
>> Also I see (in my code) that even if there are no messages being passed 
>> during MatAssemblyBegin, it is taking time and the "ratio" is very big.
>> 
>> > 
>> > 
>> > --Junchao Zhang
>> > 
>> > 
>> > On Thu, Jun 20, 2019 at 4:13 AM Ale Foggia via petsc-users 
>> > <[email protected]> wrote:
>> > Hello all!
>> > 
>> > During the conference I showed you a problem happening during 
>> > MatAssemblyEnd in a particular code that I have. Now, I tried the same 
>> > with a simple code (a symmetric problem corresponding to the Laplacian 
>> > operator in 1D, from the SLEPc Hands-On exercises). As I understand (and 
>> > please, correct me if I'm wrong), in this case the elements of the matrix 
>> > are computed locally by each process so there should not be any 
>> > communication during the assembly. However, in the log I get that there 
>> > are messages being passed. Also, the number of messages changes with the 
>> > number of processes used and the size of the matrix. Could you please help 
>> > me understand this?
>> > 
>> > I attach the code I used and the log I get for a small problem.
>> > 
>> > Cheers,
>> > Ale
>> > 
>> 
>> <log.txt>
