Barry Smith <[email protected]> writes:

>    For the default PCMG using PCREDUNDANT on the coarse level with LU 
>
> VecScatterBegin     2356 1.0 1.6390e+01 4.7 0.00e+00 0.0 5.9e+09 1.7e+01 
> 0.0e+00  2  0 98 99  0   2  0 98 99  0     0
> VecScatterEnd       2356 1.0 4.1647e+02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00 69  0  0  0  0  69  0  0  0  0     0

433/2356 = 0.18 seconds per scatter.

> so gathering the vector values from all processes onto the one process is
> killing performance. Presumably it is just using the "default" VecScatter,
> which sends individual messages to all processes; that is terribly slow. If
> the VecScatter were smart enough to switch to an alltoall here, it would
> actually help an enormous amount.
>
> ---------
>
>   For DMDAREPART, the two sets of VecScatters are still killing you
>
> VecScatterBegin     2907 1.0 2.7453e-02 2.1 0.00e+00 0.0 9.2e+07 7.6e+02 
> 0.0e+00  0  0 92 99  0   0  0 98 99  0     0
> VecScatterEnd       2907 1.0 1.8748e-01 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  2  0  0  0  0   2  0  0  0  0     0

(0.027 + 0.187)/2907 = about 74 µs per scatter.

> VecScatterBegin     1393 3.0 3.2119e-02 112.6 0.00e+00 0.0 5.9e+06 3.2e+01 
> 0.0e+00  0  0  6  0  0   0  0 99 99  0     0
> VecScatterEnd       1393 3.0 3.2946e-01 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 
> 0.0e+00  4  0  0  0  0  99  0  0  0  0     0

(0.032 + 0.329)/1393 = about 260 µs per scatter.

These scatter costs don't seem that bad.
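
For anyone who wants to redo the arithmetic, here is a small sketch. It
assumes the first number after the count ratio in each log line is the time
in seconds, and it sums Begin and End for each pair. The labels "first
scatter" and "second scatter" are mine, not from the logs.

```python
# (count, VecScatterBegin seconds, VecScatterEnd seconds), copied from the
# log excerpts quoted in this message. The dictionary keys are labels chosen
# here for illustration only.
logs = {
    "PCREDUNDANT": (2356, 1.6390e+01, 4.1647e+02),
    "DMDAREPART, first scatter": (2907, 2.7453e-02, 1.8748e-01),
    "DMDAREPART, second scatter": (1393, 3.2119e-02, 3.2946e-01),
}


def seconds_per_scatter(count, begin, end):
    """Average cost of one scatter: (Begin time + End time) / count."""
    return (begin + end) / count


per_scatter = {name: seconds_per_scatter(*row) for name, row in logs.items()}

for name, t in per_scatter.items():
    print(f"{name}: {t:.3e} s per scatter")
```

The PCREDUNDANT case comes out more than two orders of magnitude slower per
scatter than either DMDAREPART scatter.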

> Ideas on how to proceed? From anyone?

We actually have a lot of irregular/high-degree communication patterns
in DMPlex's use of SF.  I think it would be valuable to add an analysis
component that builds a good mapping to the communication primitives.
It's not clear to me that it's needed here at the present scale.
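
To make the idea concrete, here is a toy sketch of what such an analysis
might look at. Nothing here is PETSc or SF API: the function name, the
`sends` representation, and the `dense_fraction` threshold are all
hypothetical, and a real component would also need message sizes and
measurements from the machine.

```python
def suggest_primitive(sends, nranks, dense_fraction=0.5):
    """Guess a communication primitive from the pattern alone.

    sends maps each source rank to the set of ranks it sends to.
    dense_fraction is an arbitrary, made-up threshold for illustration.
    """
    in_degree = [0] * nranks
    edges = 0
    for src, dests in sends.items():
        for dst in dests:
            if dst != src:  # ignore local copies
                in_degree[dst] += 1
                edges += 1

    receivers = [r for r in range(nranks) if in_degree[r] > 0]

    # Every other rank sends to a single root: a gather-like pattern.
    if len(receivers) == 1 and in_degree[receivers[0]] == nranks - 1:
        return "gather"

    # A large fraction of all rank pairs communicate: a dense collective
    # such as alltoall is a candidate.
    if nranks > 1 and edges >= dense_fraction * nranks * (nranks - 1):
        return "alltoall"

    # Otherwise the pattern is sparse; individual messages are reasonable.
    return "point-to-point"


# Example: ranks 1..3 all send to rank 0.
print(suggest_primitive({1: {0}, 2: {0}, 3: {0}}, nranks=4))
```

A sparse nearest-neighbor pattern falls through to "point-to-point", which
is the case where the default scatter is already reasonable.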
