Hi Matt,

> > In a purely CPU-driven execution, there is a pointer to the data
> > (*data), which is assumed to reside in a single linear piece of
> > memory (please correct me if I'm wrong), yet may be managed by some
> > external routines (VecOps).
>
> No, the 'data' is actually a pointer to the implementation class (it is
> helpful to compare this to other class headers, which all have
> the data pointer). In this case, it would be Vec_Seq or Vec_MPI:
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/src/vec/vec/impls/dvecimpl.h#l14
>
> In fact, it is VECHEADER that has the array:
>
> http://petsc.cs.iit.edu/petsc/petsc-dev/annotate/0b92fc173218/include/petsc-private/vecimpl.h#l435
>
> Jed started the practice of linking to code, and I think it's the bee's
> knees. You are correct that all these implementations
> assume a piece of linear memory on the CPU. On the GPU, we synchronize
> some linear memory with Cusp vectors.

I'm aware of the redirection to Vec_Seq and Vec_MPI (see Section 3); my sentence just took a slight shortcut here. Anyway, thanks for pointing that out :-)

> > As accelerators enter the game (indicated by PETSC_HAVE_CUSP), the
> > concept of a vector having one pointer to its data is undermined.
> > Now, Vec can possibly have data in CPU RAM and on one (multiple
> > with txpetscgpu) CUDA accelerator. 'valid_GPU_array' indicates which
> > of the two memory domains holds the most recent data, possibly both.
>
> There is an implementation of PETSc Vecs with non-contiguous memory for
> SAMRAI.

Thanks, I'll have a look at this.

> > (...)
> > -- 4. Concluding remarks --
> >
> > Even though the mere question of how to hold memory handles is
> > certainly less complex than a full unification of actual operations
> > at runtime, this first step needs to be done right in order to have
> > a solid foundation to build on. Thus, if you guys spot any
> > weaknesses in the proposed modifications, please let me know. I
> > tried to align everything such that it integrates nicely into PETSc,
> > yet I don't know many of the implementation details yet...
>
> I can't tell from the above how we would synchronize memory. Perhaps it
> would be easy to show with an example
> of how this would work, as opposed to the current system.

Memory synchronization is intertwined with the actual runtime behavior (data manipulation), so I focused only on the data structure. Basically, the synchronization would be accomplished in essentially the same way as now (VecCUSPCopyToGPU(), VecCUSPCopyFromGPU(), etc.; I can't look up the exact names right now), but possibly with finer granularity (cf. VecCUSPCopyFromGPUSome()). The important point here, however, is the independence from the implementation libraries. Otherwise we would have to maintain a separate memory management implementation for each GPU library we might interface with.

Thanks and best regards,
Karli
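
To illustrate the point about independence from the implementation libraries, here is a minimal sketch of how a validity flag in the spirit of 'valid_GPU_array' could be kept separate from the backend-specific copy routines. All names (MemHandle, MemBackendOps, handle_host_read(), etc.) are hypothetical and not part of PETSc, and the "device" is plain heap-style memory so that the sketch runs without a GPU. It only shows the bookkeeping, not a proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of this is existing PETSc API. */

/* Which memory domain holds the most recent data (cf. valid_GPU_array). */
typedef enum { MEM_VALID_HOST, MEM_VALID_DEVICE, MEM_VALID_BOTH } MemValid;

/* Transfer routines supplied by the glue code of each GPU library. */
typedef struct {
  void (*to_device)(void *dev, const void *host, size_t bytes);
  void (*to_host)(void *host, const void *dev, size_t bytes);
} MemBackendOps;

typedef struct {
  void *host;               /* linear memory on the CPU */
  void *device;             /* opaque handle owned by the backend */
  size_t bytes;
  MemValid valid;
  const MemBackendOps *ops; /* the only backend-dependent part */
  int transfers;            /* bookkeeping for this demo only */
} MemHandle;

/* Copy device -> host only if the host copy is stale. */
static void handle_sync_host(MemHandle *h) {
  if (h->valid == MEM_VALID_DEVICE) {
    h->ops->to_host(h->host, h->device, h->bytes);
    h->transfers++;
    h->valid = MEM_VALID_BOTH;
  }
}

/* Copy host -> device only if the device copy is stale. */
static void handle_sync_device(MemHandle *h) {
  if (h->valid == MEM_VALID_HOST) {
    h->ops->to_device(h->device, h->host, h->bytes);
    h->transfers++;
    h->valid = MEM_VALID_BOTH;
  }
}

/* Read access keeps both copies valid; write access invalidates the other side. */
static const double *handle_host_read(MemHandle *h) {
  handle_sync_host(h);
  return h->host;
}
static double *handle_host_write(MemHandle *h) {
  handle_sync_host(h);
  h->valid = MEM_VALID_HOST;
  return h->host;
}
static double *handle_device_write(MemHandle *h) {
  handle_sync_device(h);
  h->valid = MEM_VALID_DEVICE;
  return h->device;
}

/* Stand-in backend: the "device" is ordinary CPU memory. */
static void fake_copy(void *dst, const void *src, size_t bytes) {
  memcpy(dst, src, bytes);
}
static const MemBackendOps fake_ops = { fake_copy, fake_copy };

/* Host write, "device" update, host read; returns the number of transfers. */
static int demo_transfer_count(double *last_entry) {
  double host_buf[4] = {0}, dev_buf[4] = {0};
  MemHandle h = { host_buf, dev_buf, sizeof host_buf, MEM_VALID_HOST, &fake_ops, 0 };

  double *x = handle_host_write(&h);        /* no transfer: host is valid */
  for (int i = 0; i < 4; i++) x[i] = i + 1.0;

  double *d = handle_device_write(&h);      /* transfer: host -> device */
  for (int i = 0; i < 4; i++) d[i] *= 2.0;  /* stands in for a GPU kernel */

  const double *r = handle_host_read(&h);   /* transfer: device -> host */
  (void)handle_host_read(&h);               /* no transfer: both valid */

  *last_entry = r[3];
  return h.transfers;
}
```

Supporting another GPU library would then, under these assumptions, only require a different MemBackendOps table, while the validity logic stays shared. A finer granularity along the lines of VecCUSPCopyFromGPUSome() would need per-range validity information, which this sketch leaves out.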
