On Sep 10, 2013, at 8:55 PM, Jed Brown <[email protected]> wrote:

> Matthew Knepley <[email protected]> writes:
>>> PetscSF cannot currently scatter individual bytes (4 bytes minimum), and
>>> even if it could, it's a horribly inefficient representation (4 or 8
>>> bytes of metadata for each byte of payload).  The quick fix at that
>>> moment was to send in units of size PetscInt (the struct was always
>>> going to be divisible by that size).

   Since when are PETSc people interested in quick fixes? We are only concerned 
with doing things right; quick fixes lead to bad code and bad ideas.

>>> 
>> 
>> The way I understand it, he never had a single MPI_BYTE, but always a bunch.
>> Shouldn't that packing handle that case?
> 
> In DMPlexDistributeData, your fieldSF is blown up so that every unit has
> its own metadata (rank, offset).  That's a terrible mechanism for moving
> structs with hundreds of bytes.
> 
> I would rather not make PetscSF deal with fully-heterogeneous data
> (where each node can have a different size) because it makes indexing a
> nightmare (you need a PetscSection or similar just to get in the door; I
> don't want PetscSF to depend on PetscSection).

   Having PetscSF depend on PetscSection would be insane (after all, 
PetscSection has information about constrained variables and other silly 
nonsense), but having PetscSF depend on XX makes perfect sense.

A review:

(current situation; with unimportant stuff removed)

+  sf - star forest
.  nroots - number of root vertices on the current process (these are possible 
targets for other processes to attach leaves)
.  nleaves - number of leaf vertices on the current process, each of these 
references a root on any process
.  ilocal - locations of leaves in leafdata buffers, pass NULL for contiguous 
storage
.  iremote - remote locations of root vertices for each leaf on the current 
process

PetscSFSetGraph(PetscSF sf,PetscInt nroots,PetscInt nleaves,PetscInt *ilocal,PetscSFNode *iremote)

typedef struct {
  PetscInt rank;                /* Rank of owner */  (why is this not 
PetscMPIInt?)
  PetscInt index;               /* Index of node on rank */
} PetscSFNode;
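
For concreteness, a minimal sketch (my own illustration, not from the thread) of 
how the graph gets set today; it assumes the full PetscSFSetGraph() signature, 
which also takes the PetscCopyMode arguments stripped from the prototype above, 
and omits error checking for brevity. Each process exposes two roots and has two 
leaves pointing at the corresponding roots on the next rank:

#include <petscsf.h>

int main(int argc, char **argv)
{
  PetscSF     sf;
  PetscSFNode iremote[2];
  PetscMPIInt rank, size;
  PetscInt    i;

  PetscInitialize(&argc, &argv, NULL, NULL);
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Comm_size(PETSC_COMM_WORLD, &size);
  for (i = 0; i < 2; i++) {
    iremote[i].rank  = (rank + 1) % size;  /* rank that owns the root        */
    iremote[i].index = i;                  /* index of the root on that rank */
  }
  PetscSFCreate(PETSC_COMM_WORLD, &sf);
  /* ilocal == NULL means the leaves are contiguous in the leaf buffers */
  PetscSFSetGraph(sf, 2, 2, NULL, PETSC_COPY_VALUES, iremote, PETSC_COPY_VALUES);
  PetscSFDestroy(&sf);
  PetscFinalize();
  return 0;
}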

Abstractly, nleaves and ilocal are the indices (into an array) you are copying 
to; iremote are the indices (into an array) you are copying from.

Now if PETSc had an abstract concept of indexing (funny no one ever put that in 
PETSc decades ago) it would look like

PetscSFSetGraph(PetscSF sf,PetscInt nroots,  toIndices  , fromIndices)

but wait; PETSc does have an abstraction for indexing (into regular arrays) 
called IS. Come to think of it, PETSc has another abstraction for indexing 
(into arrays with different-sized items) called XX. Actually, thinking a tiny 
bit more, one realizes that there is a third: PetscSFNode is a way of indexing 
(into regular arrays) on a bunch of different processes. If we come up with a 
couple more related, but inconsistent, syntaxes for indexing, we'll have code 
as good as Trilinos.

   So let's fix this up! One abstraction for indexing that handles regular 
arrays, arrays with different-sized items, and both kinds of arrays on a bunch 
of different processes, with simple enough syntax that it can be used whenever 
indexing is needed, including passing to PetscSF! (We can ignore the need for 
MPI datatypes in communication for now.)
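
Purely as a hypothetical sketch of what that single abstraction could look like 
at the interface level (none of these routines exist; the names just mirror the 
ISCreateGeneral()/ISCreateStride() conventions for illustration):

typedef struct _p_XX *XX;

PetscErrorCode XXCreateGeneral(MPI_Comm comm,PetscInt n,const PetscMPIInt ranks[],const PetscInt offsets[],const PetscInt sizes[],XX *xx);
PetscErrorCode XXCreateStride(MPI_Comm comm,PetscInt n,PetscInt first,PetscInt step,XX *xx);  /* local, uniform-size, strided special case */
PetscErrorCode PetscSFSetGraphXX(PetscSF sf,PetscInt nroots,XX toIndices,XX fromIndices);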

   Concrete examples to demonstrate this need not be so difficult. A completely 
general set of indices for this situation could be handled by 

typedef struct {
  PetscInt     nindices;
  PetscMPIInt *ranks;
  PetscInt    *offsets;
  PetscInt    *sizes;
} XXData;

Hardly more than PetscSFNode.  Or could be handled as an array of structs.

Special cases (sketched in code below) would include: 
    all on one process, so ranks is not needed
    all items the same size, so sizes is not needed
    contiguous (or strided) memory, so offsets is not needed
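
A sketch of those special cases (illustration only; XXData is the proposed 
struct above, not existing PETSc code), where any array that is not needed is 
simply left NULL:

XXData idx;

idx.nindices = 100;
idx.ranks    = NULL;   /* everything lives on this process                   */
idx.offsets  = NULL;   /* contiguous storage, so explicit offsets not needed */
idx.sizes    = NULL;   /* every item has the same size                       */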

If we decided not to go with an abstract object but instead with just a 
concrete data structure that could handle all cases, it could look something like

typedef struct {
  PetscInt     nindices;
  PetscMPIInt *ranks;
  PetscInt    *offsets;
  PetscInt    *sizes;
  PetscInt     chunksize, strideoffset;
} XXData;
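
As a usage sketch only, and with assumed semantics for the two new fields (I am 
not pinning them down here): say chunksize is the uniform item size and 
strideoffset the starting offset of a strided layout. Then a strided, 
uniform-size, all-local index set needs no arrays at all:

XXData idx;

idx.nindices     = 50;
idx.ranks        = NULL;   /* all local                          */
idx.offsets      = NULL;   /* implied by chunksize/strideoffset  */
idx.sizes        = NULL;   /* uniform size: chunksize            */
idx.chunksize    = (PetscInt)sizeof(PetscScalar);
idx.strideoffset = 4;      /* layout starts 4 items in           */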

For completeness I note that with IS (for example in VecScatter creation with 
parallel vectors) we don't use the rank/offset notation; we use global offset 
== rstart[rank] + offset, but that is something I'm sure we can deal with.
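
A small sketch of the conversion I mean (my code, not PETSc's): with the usual 
ownership-range array, where rstart[r] is the first global index owned by rank 
r and rstart has size+1 entries (as returned by, e.g., VecGetOwnershipRanges()), 
going from a global offset to the (rank, offset) pair of PetscSFNode is just

static void GlobalToRankOffset(const PetscInt rstart[],PetscMPIInt size,PetscInt global,PetscMPIInt *rank,PetscInt *offset)
{
  PetscMPIInt r = 0;
  while (r + 1 < size && rstart[r + 1] <= global) r++;  /* linear scan; use bisection in real code */
  *rank   = r;
  *offset = global - rstart[r];
}

and the reverse direction is global = rstart[rank] + offset.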


    QED

    Comments?



