Hi Paul,

thanks for the comments. I'll look into whether we can introduce an intermediate layer for CUDA and OpenCL, e.g. Vec_Seq -> Vec_CUDA -> Vec_Thrust. This should allow us to define a broader set of operations on Vec_CUDA (and similarly for matrices), particularly those not covered by CUSparse and Thrust.
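To make the layering idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the class and method names (VecSeq, VecCUDA, VecThrust, scale, dot, backend) are hypothetical, no GPU calls are made, and a plain list stands in for both host and device storage. It is not the PETSc implementation.

```python
class VecSeq:
    """Host-side vector; a Python list stands in for the CPU array."""

    def __init__(self, values):
        self.data = [float(v) for v in values]

    def dot(self, other):
        # Reference implementation on the host.
        return sum(a * b for a, b in zip(self.data, other.data))

    def backend(self):
        return "seq"


class VecCUDA(VecSeq):
    """Hypothetical runtime-level layer between VecSeq and library backends.

    Operations that no library covers would live here as hand-written
    kernels; in this sketch they are emulated on the host.
    """

    def scale(self, alpha):
        # Placeholder for a custom kernel (assumption: not provided by a library).
        self.data = [alpha * v for v in self.data]

    def backend(self):
        return "cuda"


class VecThrust(VecCUDA):
    """Hypothetical library-specific layer on top of VecCUDA."""

    def dot(self, other):
        # A real backend would call a library reduction here; this sketch
        # simply reuses the inherited host loop.
        return super().dot(other)

    def backend(self):
        return "thrust"
```

With this structure, anything defined on the intermediate layer (here: scale) is automatically available to every library-specific type derived from it, while each library layer only overrides what it can do better.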
Best regards,
Karli

On 10/09/2012 03:48 PM, Paul Mullowney wrote:
> I think the current vector class should be Vec_Thrust with
>
> -vec_type thrust (not cusp)
>
> First, most of the vector functions are computed from kernels in the
> Thrust library (although there may be an occasional CUSP or CUBLAS
> function call). Second, it is not clear how long CUSP is going to
> survive ... and I think Nvidia puts more energy into CUSPARSE and Thrust.
>
> I think a Vec_CUDA would be very useful ... there is a lot you could do
> with this that you can't currently do with Thrust.
>
> I think separating the Mat types into CUSP and CUSPARSE is sensible.
>
> -Paul
>
>
>
>>> Hi guys,
>>>
>>> as our discussion of memory is more and more drifting apart towards
>>> runtime and scheduling aspects, I'll try to wrap up the key points of
>>> the memory part of the discussion and postpone all runtime/execution
>>> aspects to 'Part 2' of the series.
>>>
>>> * The proposed unification of memory handles (CPU and GPU) within
>>> *data of Vec could not find any backup, rather the GPU handles should
>>> remain in GPUarray (or any equivalent for OpenCL/CUDA). However, it
>>> is not yet clear whether we want to stick with library-specific names
>>> such as Vec_CUSP, or whether we want to go with runtime-specific
>>> names such as Vec_CUDA and Vec_OpenCL and probably dispatch into
>>> library-specific routines from there. Jed pointed out that Vec_OpenCL
>>> is probably too fuzzy, suggesting that Vec_LIBRARYNAME is the better
>>> option.
>>
>> The Vec_CUSP is most definitely built on top of CUSP and is not
>> built around generic CUDA hence going to Vec_CUDA from Vec_CUSP
>> doesn't make sense to me. If we had (have? as an alternative) a Vec
>> class that was built directly on CUDA then it could be called
>> Vec_CUDA. Similarly if Vec_OpenCL is built directly on generic OpenCL
>> then that name is fine, if it is built on top of something like
>> ViennaCL then Vec_ViennaCL would be the way to go.
>>
>> Barry
>>
>>
>> Paul has put in some code based on cusparse, I haven't had the energy
>> to see how that works. Perhaps there should be a Vec_CUSparse to that.
>>
>>> * Barry backs up my suggestion to have multi-GPU support for a single
>>> process, whereas Jed and Matt suggest to map one GPU to one
>>> MPI-process for reasons of simplicity. As the usual application of
>>> multi-GPU is within sparse matrix-vector products and block-based
>>> preconditioners, I note the following:
>>> - Such implementations are basically available out-of-the-box with MPI.
>>> - According to the manual, block-based preconditioners can also be
>>> configured on a per-process basis, thus allowing to use the
>>> individual streaming processors on a GPU efficiently (there is no
>>> native synchronization possible between streaming processors within a
>>> single kernel!).
>>> - The current multi-GPU support using txpetscgpu focuses on sparse
>>> matrix-vector products only (there are some hints in
>>> src/ksp/pc/impls/factor/ilu that forward-backward substitutions for
>>> ILU preconditioners on GPUs may also be available, yet I haven't
>>> found any actual code/kernels for that).
>>> Consequently, from the available functionality it seems that we can
>>> live with a one-GPU-per-process option.
>>>
>>> * Adding a bit of meta information to arrays in main RAM (without
>>> splitting up the actual buffer) for increased cache-awareness
>>> requires a demonstration of significant performance benefits for any
>>> further consideration.
>>>
>>> If my wrap-up missed some part of the discussion, please let me/us
>>> know. I'll now move on to the actual runtime and come up with more
>>> concrete ideas in 'Part 2' :-)
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>
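
The one-GPU-per-process option from the quoted wrap-up can be illustrated with a small sketch. The function name assign_devices and its parameters are hypothetical, and a round-robin assignment is assumed purely for illustration; a real code would obtain the node-local rank from MPI and then select the device through the CUDA runtime (e.g. cudaSetDevice).

```python
def assign_devices(ranks_per_node, gpus_per_node):
    """Map each node-local MPI rank to a GPU index (round-robin).

    Hypothetical helper: with as many ranks as GPUs on a node, every
    process gets exactly one GPU of its own. With more ranks than GPUs,
    several processes share a device.
    """
    if gpus_per_node <= 0:
        raise ValueError("at least one GPU per node is required")
    return {rank: rank % gpus_per_node for rank in range(ranks_per_node)}
```

For example, assign_devices(2, 2) gives each of the two processes its own GPU, which is the case the discussion settles on; calling it with more ranks than GPUs shows how devices would end up being shared.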
