On Thu, Nov 9, 2017 at 5:53 AM, Vaclav Hapla <[email protected]> wrote:
> Thanks for the discussion. I think we got further. > > 8. 11. 2017 v 20:42, Matthew Knepley <[email protected]>: > > On Wed, Nov 8, 2017 at 2:27 PM, Jed Brown <[email protected]> wrote: > >> Matthew Knepley <[email protected]> writes: >> >> > On Wed, Nov 8, 2017 at 1:49 PM, Jed Brown <[email protected]> wrote: >> > >> >> Matthew Knepley <[email protected]> writes: >> >> >> >> >> > No, this is the right structure. >> >> >> >> >> >> Oh come on. You're defending a quadratic algorithm. >> >> >> >> >> >> ierr = ParMETIS_V3_PartKway(vtxdist, xadj, adjncy, vwgt, >> adjwgt, >> >> >> &wgtflag, &numflag, &ncon, &nparts, tpwgts, ubvec, options, >> &edgeCut, >> >> >> assignment, &comm); >> >> >> // ... >> >> >> for (p = 0, i = 0; p < nparts; ++p) { >> >> >> for (v = 0; v < nvtxs; ++v) { >> >> >> if (assignment[v] == p) points[i++] = v; >> >> >> } >> >> >> } >> >> >> >> >> >> MatPartitioningApply creates an IS with "assignment" and can be >> >> >> converted to a global numbering with ISPartitioningToNumbering. You >> >> >> could as well have an ISPartitioningToSectionAndIS() that produces >> your >> >> >> representation, preferably without this silly quadratic algorithm. >> >> >> >> >> > >> >> > Time it. Tell me if it matters. Telling me it matters in the long >> run is >> >> > metaphysics. >> > > Jed replied exactly what I needed to know - whether there's some easy > conversion from the natural IS output of MatPartitioningApply into > IS+PetscSection DMPlexDistribute needs. He showed there actually is a way, > and it won't have worse performance (theoretical performance benefit). > Moreover, what you do now for every PetscPartitioner implementation (all > those "Convert to PetscSection+IS" blocks) would be done just in one place > (design benefit). In this context, the request for timing was slightly > unfair, I think. > > >> >> >> I realize ParMETIS isn't scalable and that if you have a modest number >> >> of parts and only a million or so elements per rank, the cost of what >> >> you do here will be acceptable for most uses. >> >> >> >> But you didn't refute my point that ISPartitioningToSectionAndIS can >> >> produce the representation you want. >> > >> > >> > I do not think its an either or thing. Many equivalent interfaces are >> > possible, >> > so I should have told Vaclav "I think this is the right one", but I >> thought >> > that >> > was implicit in me being the responder, and none of us thinking that >> there >> > is >> > a true "right" in interface design. >> > >> > >> >> The IS you're creating is similar >> >> to the inverse of the Numbering (which is a permutation). You are the >> >> one that replaced a scalable algorithm that has existed for a long time >> >> and uses types correctly with PetscPartitioner which has some ugly >> >> warts, duplicates a lot of code, and isn't a viable replacement for >> >> MatPartitioning. >> > >> > >> > 1) MatPartitioning is not a replacement for what I do. According to you, >> > that >> > should end the argument right there. >> > >> > 2) Deep inside MatPartitioning, it must split things up as I do in >> order to >> > pack >> > stuff to be sent. I think it should be exposed as I have done. >> >> Pack what stuff to be sent? PetscPartitionerPartition() takes the >> arguments to ParMETIS directly as arrays. To use MatPartitioning, you >> pass those same arrays to MatCreateMPIAdj which literally just holds the >> pointers. Then MatPartitioningApply passes those on to ParMETIS or >> whichever partitioning package. PetscPartitionerPartition_* >> implementations depends on DM strictly as a way to define the weights. >> That isn't more general or better; it's more restrictive. >> > > This sounds like deliberate misunderstanding. We are not talking about the > input to > ParMetis, but the output. The biggest shortcoming of ParMetis (and indeed > all mesh > partitioners) is that they tell you what must move, but do not move it. > This is the value > add of Plex, it moves the mesh (including all connectivity) and the field > data according > to the partition. In order to do this communication, you always want the > partition segmented > by rank, as it is in the Section/IS output. > > > I think we are talking here about both input and output! > > What PetscPartitionerPartition actually does is getting the adjacency > information from the DM using > DMPlexCreatePartitionerGraph(DM dm, PetscInt height, > PetscInt *numVertices, PetscInt **offsets, PetscInt **adjacency, IS * > globalNumbering) > and these raw arrays are then passed to > (*part->ops->partition)(part, dm, size, numVertices, start, adjacency, > partSection, partition); > > This I don't like from couple of reasons: > 1) the type-specific function has completely different arguments than the > interface function - this is slightly non-standard, and if you talk about > exposing, there should be an interface function with these arguments > 2) on the other hand you derive IS+PetscSection output from the "raw" > partitioner output inside each PetscPartitionerPartition_* separately > ("Convert > to PetscSection+IS" blocks) > 3) so part->ops->partition mixes "raw" inputs and "derived" outputs which > sounds messy > > I fully agree with Jed (and it seems it's compatible with Barry and > Lisandro) this should be doable in a clearer way using MatCreateMPIAdj, > MatPartitioningSetAdjacency, MatPartitioningApply, > ISPartitioningToSectionAndIS > (the latter to be done). > > > >> I see no reason why PetscPartitioner cannot be implemented as a very >> small amount of code around MatPartitioning (little more than the >> interface function). The opposite would necessarily bring in concepts >> like DM that are unnecessary when a user is partitioning matrices and >> graphs. > > > I am not arguing that it should not be done. I am saying > - I do not want to do it, because the payoff is small > - I do not want it to break anything that currently works > - I do not want it to break the interface we currently have to > PetscPartitioner because I think it is genuinely good > > > I think the payoff isn't necessarily small. Just imagine how much code > could be avoided for further maintenance, or if there would be some new > partitioner we wanted to interface. And I think the code much clearer to > anyone who wants to learn how it works or even improve something. > > I'm not that afraid about breaking. Grep tells me there's exactly one call > to PetscPartitionerPartition in whole PETSc - in DMPlexDistribute. > > One thing I always liked on PETSc is a culture of not sticking to status > quo! > > > >> > 3) I think you exaggerate the faults of the code for effect. Do what you >> > must. >> > >> > >> >> And now you seem to be arguing against Vaclav unifying >> >> it, claiming technical rationale that don't appear to hold up. >> > >> > >> > Of course, I disagree with you, and think you make no attempt to >> understand >> > why >> > someone would write this code, because you never solve the problems its >> > necessary >> > for. >> > > I understand you spent the most time with DMPlex, and it's a masterpiece! > But it can always be beneficial to take into account a fresh view of > someone who was not involved before. > > I think I need to create a proof-of-concept. I would start by employing > MatPartitioning in PetscPartitionerPartition with anything outside of this > function untouched for now (as already suggested in #192), if you agree. > > Or what about implementing a special temporary PetscPartitioner > implementation wrapping MatPartitioning? > PETSCPARTITIONERMATPARTITIONING sounds crazy, though :) But could be a > good starting point. > This is probably easier, and allows nice regression testing, namely run all tests using EXTRA_OPTIONS="-petscparitioner_type matpartitioning". I think that should be alright for correctness. There are some parallel redistribution tests in Plex. We will need at least one performance regression. Look at how I do it here: https://bitbucket.org/petsc/petsc/src/312beb00c9b3e1e8ec8fac64a948a1af779da02f/src/dm/impls/plex/examples/tests/ex9.c?at=master&fileviewer=file-view-default You can make custom events, directly access the times, and compare. You could run the two versions and check for degradation for a sequence of meshes in parallel. Thanks, Matt > Thanks > > Vaclav > > >> I have written a parallel unstructured FEM that did this sort of >> partitioning and redistribution on the fly. I wrote it in terms of >> MatPartitioning and it worked fine. I abandoned it because it had >> screwy dependencies and you had abandoned Sieve to work on >> DMComplex/DMPlex. I'm happy to have made that change, but the concepts >> and requirements are hardly foreign to me. > > > Worked fine for you :) It always different with other people. > > >> I don't >> >> understand why, but it certainly looks like PetscPartitioner can be >> >> implemented as a thin layer around MatPartitioning, then that layer can >> >> be refactored into helper functions at the IS/PetscSection level. >> > >> > >> > It might be true. To me, it looked like Mat stuff was draped on so it >> would >> > be hard >> > to pull the raw result out of the partitioner, and that things would get >> > created (like >> > the MatMult Scatter) which I did not want. >> >> Barry created the MPIADJ representations in 1997 precisely for this >> purpose. >> > > Fair. > > Matt > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/> > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ <http://www.caam.rice.edu/~mk51/>
