128^3 is the entire mesh. The blue line (1 phase) is with DMPlexDistribute; the red line is with the two-stage approach.
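Schematically, the two-stage layout looks roughly like the petsc4py/mpi4py sketch below. The names node_comm/leader_comm and the hexahedral box mesh are only illustrative, and the second stage (migrating each leader's subdomain onto its node communicator and splitting it there) is only indicated by a comment, since that migration is the part the modified flow implements and is not shown here.

# Rough sketch of the two-stage (node-aware) one-to-all distribution idea.
# Stage 1: rank 0 distributes the mesh to one "leader" rank per node.
# Stage 2 (not shown): each leader partitions its subdomain within its node.
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc
from mpi4py import MPI

world = MPI.COMM_WORLD

# Ranks sharing a node end up in the same shared-memory communicator.
node_comm = world.Split_type(MPI.COMM_TYPE_SHARED)

# One rank per node (the node-local rank 0) forms the leader communicator.
is_leader = node_comm.Get_rank() == 0
leader_comm = world.Split(0 if is_leader else MPI.UNDEFINED, world.Get_rank())

if is_leader:
    # Stage 1: the 128^3 box mesh is created serially on leader rank 0 and
    # then distributed over the (few) node leaders only.
    dm = PETSc.DMPlex().createBoxMesh([128, 128, 128], simplex=False,
                                      comm=leader_comm)
    dm.distribute(overlap=0)

    # Stage 2 would go here: migrate this leader's subdomain onto node_comm
    # and partition it among the node-local ranks; that step is what the
    # modified distribution flow implements and is omitted from this sketch.

The point is that the expensive one-to-all round only involves one rank per node, and the remaining decomposition is node-local.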
On Sun, Mar 7, 2021, 16:20 Mark Adams <[email protected]> wrote:

> Is phase 1 the old method and 2 the new?
> Is this 128^3 mesh per process?
>
> On Sun, Mar 7, 2021 at 7:27 AM Stefano Zampini <[email protected]> wrote:
>
>>> [2] On the robustness and performance of entropy stable discontinuous
>>> collocation methods for the compressible Navier-Stokes equations,
>>> Rojas et al. https://arxiv.org/abs/1911.10966
>>
>> This is not the proper reference; here is the correct one:
>> https://www.sciencedirect.com/science/article/pii/S0021999120306185?dgcid=rss_sd_all
>> However, the algorithm is only outlined there, and performance related to
>> mesh distribution is not really reported.
>> We observed a large gain for large core counts and one-to-all
>> distributions (from minutes to seconds) by splitting the several
>> communication rounds needed by DMPlex into stages: first from rank 0 to
>> one rank per node, and then decomposing independently within each node.
>> Attached is the total time of one-to-all DMPlexDistribute for a 128^3
>> mesh.
>>
>>>> The attached plots suggest (A), (B), and (C) are happening for the
>>>> Cahn-Hilliard problem (from the firedrake-bench repo) on a 2D 8Kx8K
>>>> unit-square mesh. The implementation is here [1]. Versions are
>>>> Firedrake, PyOp2: 20200204.0; PETSc 3.13.1; ParMETIS 4.0.3.
>>>>
>>>> Two questions, one on (A) and the other on (B)+(C):
>>>>
>>>> 1. Is the (A) result expected? Given (A), any effort to improve the
>>>> quality of the compiled assembly kernels (or anything else other than
>>>> mesh distribution) appears futile, since it takes 1% of end-to-end
>>>> execution time, or am I missing something?
>>>>
>>>> 1a. Is mesh distribution fundamentally necessary for any FEM framework,
>>>> or is it only needed by Firedrake? If the latter, how do other
>>>> frameworks partition the mesh and execute in parallel with MPI while
>>>> avoiding the non-scalable mesh distribution step?
>>>>
>>>> 2. Results (B) and (C) suggest that the mesh distribution step does not
>>>> scale. Is it a fundamental property of the mesh distribution problem
>>>> that it has a central bottleneck in the master process, or is it a
>>>> limitation of the current implementation in PETSc-DMPlex?
>>>>
>>>> 2a. Our (B) result seems to agree with Figure 4 (left) of [2]. Figure 6
>>>> of [2] suggests a way to reduce the time spent on the sequential
>>>> bottleneck via "parallel mesh refinement", which creates high-resolution
>>>> meshes from an initial coarse mesh. Is this approach implemented in
>>>> DMPlex? If so, any pointers on how to try it out with Firedrake? If not,
>>>> any other directions for reducing this bottleneck?
>>>>
>>>> 2b. Figure 6 in [3] shows plots for Assembly and Solve steps that scale
>>>> well up to 96 cores -- is mesh distribution included in those times? Is
>>>> anyone reading this aware of any other publications with evaluations of
>>>> Firedrake that measure mesh distribution (or explain how to avoid or
>>>> exclude it)?
>>>>
>>>> Thank you for your time and any info or tips.
>>>>
>>>> [1]
>>>> https://github.com/ISI-apex/firedrake-bench/blob/master/cahn_hilliard/firedrake_cahn_hilliard_problem.py
>>>>
>>>> [2] Unstructured Overlapping Mesh Distribution in Parallel, Matthew G.
>>>> Knepley, Michael Lange, Gerard J. Gorman, 2015.
>>>> https://arxiv.org/pdf/1506.06194.pdf
>>>>
>>>> [3] Efficient mesh management in Firedrake using PETSc-DMPlex, Michael
>>>> Lange, Lawrence Mitchell, Matthew G. Knepley and Gerard J. Gorman, SISC,
>>>> 38(5), S143-S155, 2016. http://arxiv.org/abs/1506.07749
>>
>> --
>> Stefano
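Regarding question 2a above: DMPlex does support regular refinement of an already-distributed mesh, so one way to sidestep the one-to-all bottleneck is to distribute a much coarser mesh and refine it in parallel afterwards, along the lines of Fig. 6 of [2]. A rough petsc4py sketch of that pattern (the coarse size 16^3 and the three refinement rounds are just placeholders chosen to reach 128^3):

# Distribute a coarse mesh, then refine in parallel: the non-scalable
# one-to-all step only ever sees the small coarse mesh.
import sys
import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

# Coarse 16^3 box mesh, created on rank 0 (sizes are placeholders).
dm = PETSc.DMPlex().createBoxMesh([16, 16, 16], simplex=False,
                                  comm=PETSc.COMM_WORLD)

# Cheap one-to-all distribution of the coarse mesh.
dm.distribute(overlap=0)

# Three rounds of regular refinement, done locally on every rank:
# 16^3 cells -> 128^3 cells without any further global redistribution.
for _ in range(3):
    dm = dm.refine()

In Firedrake, if I remember correctly, the analogous route is to build a coarse mesh plus a MeshHierarchy and then work on the finest level, but the Firedrake developers can confirm.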
