On Fri, Jun 24, 2011 at 11:30 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
> Moving this discussion to petsc-dev since it is relevant for much more
> than this one preconditioner.
>
> In PETSc the Mat object is responsible for the layout of the matrix
> (graph) data and the Vec object is responsible for the layout of the
> vector data. This will be true within a compute node, as it has always
> been outside of the compute nodes. If one uses a DM to create the Mat and
> Vec objects, then the DM may decide the data layout (just as it currently
> does between processes) within the node for NUMA reasons (i.e., that
> touch-all business).
>
> I think it is ok to assume that the input (on-process part of the) Mats
> and Vecs have good data layout across the node memory (as well as across
> the nodes, as you assumed n years ago). If that is not the case, then we
> will need generic tools to repartition them anyway, used in a
> preprocessing step before the KSP is applied and before GAMG ever sees
> the data.
>
> I don't think you should get too hung up on the multiple-thread business
> in your first cut; just do something and it will evolve as we figure out
> what we are doing.

All fast point-location algorithms use coarse divisions and then some
resolution step at the end. We could just do the same thing here, which I
think is what you propose. I agree that it's not a big deal exactly what
happens on a node (1 thread, 1,000 threads, etc.).

   Matt

>
>   Barry
>
>
> On Jun 24, 2011, at 11:17 AM, Mark F. Adams wrote:
>
> > Folks, I'm looking at designing this GAMG code and I'm running into
> > issues that are pretty general in PETSc, so I want to bring them up.
> >
> > The setup for these GAMG methods is simplified if subdomains are small,
> > so I'm thinking that just writing a flat program per MPI process will
> > not be good long term. These issues are very general and they touch on
> > the parallel data model of PETSc, so I don't want to go off and just do
> > something without discussing this.
> >
> > Generally, as we move to multi/many-core nodes, a mesh that is well
> > partitioned onto nodes is not good enough. Data locality within the
> > node needs to be addressed, and it probably needs to be addressed
> > explicitly. The GAMG algorithm requires finding fine-grid points in the
> > coarse-grid mesh (and if the coarse grid does not cover the fine grid,
> > then something has to be done, like finding the nearest coarse-grid
> > triangle/tet/element). Doing this well requires a lot of unstructured
> > mesh work and data structures, and it can get goopy. I'd like to
> > simplify this, at least for an initial implementation, but I want to
> > build in a good data model for, say, machines in 5-10 years. The flat
> > MPI model that I use in Prometheus is probably not a good model.
> >
> > Let me throw out a straw man to at least get the discussion going. My
> > first thought for the GAMG setup is to use METIS to partition the local
> > vertices into subdomains of some approximate size (say 100, or 10^D)
> > and to have an array or arrays of coordinates, for instance, on each
> > node. Put as much of the GAMG setup as possible into an outer loop over
> > the subdomains on a node; this outer loop could be multi-threaded
> > eventually. This basically lets GAMG get away with N^2-like algorithms,
> > because N is an internal parameter (e.g., 100), and it addresses
> > general data-locality issues. Anyway, this is just a straw man; I have
> > not thought it through completely.
> >
> > An exercise that might help to ground this discussion is to think about
> > a case study of a "perfect" code in 5-10 years.
> > GAMG has promise, I think, for convective/hyperbolic problems where the
> > geometric coarse functions can be a big win. So perhaps we could think
> > about a PFLOTRAN problem on a domain that would be suitable for GAMG,
> > or an ice sheet problem. These data-locality issues should be addressed
> > at the application level if feasible, but I don't know what a
> > reasonable model would be. For instance, my model in Prometheus is that
> > the user provides me with a well-partitioned fine-grid matrix. I do not
> > repartition the fine grid, but I also do not make any assumptions about
> > data locality within an MPI process. We may want to demand more of
> > users if they expect to go to extreme scale. Anyway, just some
> > thoughts,
> >
> > Mark

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
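
A minimal sketch of the "coarse divisions plus a resolution step" point
location that Matt describes above, assuming a 2D coarse mesh and a uniform
background grid of bins. Every name here (BinGrid, BinOf, LocatePoint,
PointInElement) is hypothetical, not PETSc or Prometheus API, and building
the per-bin candidate lists (registering each coarse element in every bin
its bounding box overlaps) is omitted:

#include <math.h>

typedef struct { double xmin, ymin, xmax, ymax; } BBox;

typedef struct {
  int   nx, ny;        /* number of bins in each direction                */
  BBox  domain;        /* bounding box of the whole coarse mesh           */
  int  *start, *count; /* CSR-style candidate-list offsets, one per bin   */
  int  *elems;         /* coarse-element ids, grouped by bin (setup of    */
                       /* these lists is not shown in this sketch)        */
} BinGrid;

/* Coarse division: map a point to its bin, clamping to the grid. */
static int BinOf(const BinGrid *g, double x, double y)
{
  double hx = (g->domain.xmax - g->domain.xmin) / g->nx;
  double hy = (g->domain.ymax - g->domain.ymin) / g->ny;
  int    i  = (int)floor((x - g->domain.xmin) / hx);
  int    j  = (int)floor((y - g->domain.ymin) / hy);
  if (i < 0) i = 0;
  if (j < 0) j = 0;
  if (i > g->nx - 1) i = g->nx - 1;
  if (j > g->ny - 1) j = g->ny - 1;
  return j * g->nx + i;
}

/* Placeholder for a real triangle/tet containment test. */
static int PointInElement(int elem, double x, double y)
{
  (void)elem; (void)x; (void)y;
  return 0;
}

/* Resolution step: test only the candidates stored in the point's bin. */
static int LocatePoint(const BinGrid *g, double x, double y)
{
  int b = BinOf(g, x, y);
  for (int k = g->start[b]; k < g->start[b] + g->count[b]; k++) {
    if (PointInElement(g->elems[k], x, y)) return g->elems[k];
  }
  return -1; /* not covered by the coarse grid: fall back to a
                nearest-element search, as Mark mentions above */
}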
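
And a minimal sketch of Mark's straw man, under stated assumptions: the
METIS 5 C API (METIS_PartGraphKway), a caller-owned CSR adjacency graph
(xadj, adjncy) for the on-process vertices, and a subdomain target of
roughly 100 vertices as floated above. The function names are hypothetical;
the outer loop over subdomains is the part that could be threaded later:

#include <metis.h>
#include <stdlib.h>

/* Split a local graph of nvtxs vertices into subdomains of ~target
   vertices; on success, (*part_out)[v] gives the subdomain of vertex v. */
static int PartitionIntoSmallSubdomains(idx_t nvtxs, idx_t *xadj,
                                        idx_t *adjncy, idx_t target,
                                        idx_t **part_out, idx_t *nparts_out)
{
  idx_t  ncon = 1, objval = 0;
  idx_t  nparts = (nvtxs + target - 1) / target;  /* e.g. target = 100 */
  idx_t *part = malloc((size_t)nvtxs * sizeof(*part));

  if (!part) return 1;
  if (nparts < 2) {                        /* tiny problem: one subdomain */
    for (idx_t v = 0; v < nvtxs; v++) part[v] = 0;
    nparts = 1;
  } else if (METIS_PartGraphKway(&nvtxs, &ncon, xadj, adjncy,
                                 NULL, NULL, NULL, &nparts, NULL, NULL,
                                 NULL, &objval, part) != METIS_OK) {
    free(part);
    return 1;
  }
  *part_out   = part;
  *nparts_out = nparts;
  return 0;
}

/* The setup then becomes an outer loop over subdomains: each iteration
   touches only ~target vertices, so an N^2 kernel inside is cheap, and
   this is the loop that could eventually be multi-threaded.  (A real code
   would bucket vertices by subdomain first instead of rescanning part[].) */
static void SetupOverSubdomains(idx_t nvtxs, const idx_t *part, idx_t nparts)
{
  for (idx_t s = 0; s < nparts; s++) {
    for (idx_t v = 0; v < nvtxs; v++) {
      if (part[v] != s) continue;
      /* ... gather vertex v into the subdomain work set, then run the
         per-subdomain aggregation / coarse-point selection here ... */
    }
  }
}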