CUDA: Part 1: Memory

Karl Rupp Wed, 10 Oct 2012 09:18:55 -0500

Hi Paul,

Thanks for the comments. I'll look into whether we can have an
intermediate layer for CUDA and OpenCL, e.g.
  Vec_Seq -> Vec_CUDA -> Vec_Thrust.
This should allow us to define a broader set of operations on Vec_CUDA
(and similarly for matrices), particularly those not covered by CUSPARSE
and Thrust.
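To make the proposed layering a bit more concrete, here is a minimal sketch in C of how each level could start from its parent's operation table and override only what it implements itself: Vec_CUDA adds operations written directly against the runtime, and Vec_Thrust replaces the ones a library covers. All names (VecOps, vec_set_type_seq, vec_set_type_cuda, vec_set_type_thrust, and so on) are hypothetical and are not PETSc's actual API. The "device" work is replaced by host loops so the sketch compiles and runs anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical operation table for this sketch; not PETSc's actual _VecOps. */
typedef struct Vec Vec;
typedef struct {
  const char *type;
  double (*dot)(const Vec *, const Vec *);
  void (*scale)(Vec *, double);
} VecOps;

struct Vec {
  size_t n;
  double *array; /* host array; a real GPU type would also keep a device handle */
  VecOps ops;
};

/* Vec_Seq layer: plain host implementations. */
static double seq_dot(const Vec *x, const Vec *y) {
  assert(x->n == y->n);
  double s = 0.0;
  for (size_t i = 0; i < x->n; ++i) s += x->array[i] * y->array[i];
  return s;
}

static void seq_scale(Vec *x, double a) {
  for (size_t i = 0; i < x->n; ++i) x->array[i] *= a;
}

/* Vec_CUDA layer: operations written directly against the CUDA runtime
   (e.g. hand-written kernels not covered by CUSPARSE/Thrust).
   Stand-in here: a host loop. */
static void cuda_scale(Vec *x, double a) {
  for (size_t i = 0; i < x->n; ++i) x->array[i] *= a;
}

/* Vec_Thrust layer: operations delegated to a library.
   Stand-in here for a Thrust reduction: a host loop. */
static double thrust_dot(const Vec *x, const Vec *y) {
  assert(x->n == y->n);
  double s = 0.0;
  for (size_t i = 0; i < x->n; ++i) s += x->array[i] * y->array[i];
  return s;
}

void vec_set_type_seq(Vec *v) {
  v->ops.type = "seq";
  v->ops.dot = seq_dot;
  v->ops.scale = seq_scale;
}

void vec_set_type_cuda(Vec *v) {
  vec_set_type_seq(v);       /* start from the parent's table; inherited host ops
                                would need a device-to-host sync in a real design */
  v->ops.type = "cuda";
  v->ops.scale = cuda_scale; /* override with a runtime-level implementation */
}

void vec_set_type_thrust(Vec *v) {
  vec_set_type_cuda(v);      /* inherit everything from Vec_CUDA (and Vec_Seq) */
  v->ops.type = "thrust";
  v->ops.dot = thrust_dot;   /* override only what the library provides */
}
```

With this arrangement, an operation missing from Thrust or CUSPARSE is written once in the Vec_CUDA layer and is automatically available to every library-specific type built on top of it.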


Best regards,
Karli


On 10/09/2012 03:48 PM, Paul Mullowney wrote:
> I think the current vector class should be Vec_Thrust with
>
> -vec_type thrust (not cusp)
>
> First, most of the vector functions are computed by kernels from the
> Thrust library (although there may be an occasional CUSP or CUBLAS
> function call). Second, it is not clear how long CUSP is going to
> survive, and I think NVIDIA puts more energy into CUSPARSE and Thrust.
>
> I think a Vec_CUDA would be very useful: there is a lot you could do
> with it that you can't currently do with Thrust.
>
> I think separating the Mat types into CUSP and CUSPARSE is sensible.
>
> -Paul
>
>
>
>>> Hi guys,
>>>
>>> as our discussion of memory is drifting more and more towards
>>> runtime and scheduling aspects, I'll try to wrap up the key points of
>>> the memory part of the discussion and postpone all runtime/execution
>>> aspects to 'Part 2' of the series.
>>>
>>> * The proposed unification of memory handles (CPU and GPU) within
>>> the *data member of Vec found no support; instead, the GPU handles
>>> should remain in GPUarray (or an equivalent for OpenCL/CUDA).
>>> However, it is not yet clear whether we want to stick with
>>> library-specific names such as Vec_CUSP, or go with runtime-specific
>>> names such as Vec_CUDA and Vec_OpenCL and probably dispatch into
>>> library-specific routines from there. Jed pointed out that Vec_OpenCL
>>> is probably too fuzzy, suggesting that Vec_LIBRARYNAME is the better
>>> option.
>>
>>       Vec_CUSP is most definitely built on top of CUSP and not
>> around generic CUDA, hence going from Vec_CUSP to Vec_CUDA
>> doesn't make sense to me. If we had (have? as an alternative) a Vec
>> class that was built directly on CUDA, then it could be called
>> Vec_CUDA. Similarly, if Vec_OpenCL is built directly on generic OpenCL,
>> then that name is fine; if it is built on top of something like
>> ViennaCL, then Vec_ViennaCL would be the way to go.
>>
>>      Barry
>>
>>
>> Paul has put in some code based on CUSPARSE; I haven't had the energy
>> to see how that works. Perhaps there should be a Vec_CUSparse for that.
>>
>>> * Barry backs my suggestion to have multi-GPU support within a single
>>> process, whereas Jed and Matt suggest mapping one GPU to one
>>> MPI process for reasons of simplicity. As multi-GPU is usually
>>> applied to sparse matrix-vector products and block-based
>>> preconditioners, I note the following:
>>> - Such implementations are basically available out of the box with MPI.
>>> - According to the manual, block-based preconditioners can also be
>>> configured on a per-process basis, which allows the individual
>>> streaming multiprocessors on a GPU to be used efficiently (there is no
>>> native synchronization possible between streaming multiprocessors
>>> within a single kernel!).
>>> - The current multi-GPU support using txpetscgpu focuses on sparse
>>> matrix-vector products only (there are some hints in
>>> src/ksp/pc/impls/factor/ilu that forward and backward substitutions
>>> for ILU preconditioners on GPUs may also be available, yet I haven't
>>> found any actual code/kernels for that).
>>> Consequently, given the available functionality, it seems that we
>>> can live with a one-GPU-per-process model.
>>>
>>> * Adding a bit of metadata to arrays in main RAM (without splitting
>>> up the actual buffer) for better cache-awareness will only be
>>> considered further if it demonstrates significant performance benefits.
>>>
>>> If my wrap-up missed some part of the discussion, please let me/us
>>> know. I'll now move on to the actual runtime and come up with more
>>> concrete ideas in 'Part 2' :-)
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>
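For the one-GPU-per-process model from the wrap-up, each MPI process owns a contiguous block of rows (which is also the natural block for a per-process block preconditioner, e.g. one block-Jacobi block per process) and binds to a single device. Below is a minimal sketch in C, assuming the node-local rank is already known. The rows are split so that the first n % size ranks get one extra row, and devices are assigned round-robin. The helper names (RowRange, owned_rows, device_for_rank) are hypothetical. In real code the device count would come from cudaGetDeviceCount() and the choice would be applied with cudaSetDevice(); both calls are left out so the sketch builds without CUDA.

```c
#include <assert.h>

/* Half-open range [start, end) of global rows owned by one process. */
typedef struct {
  long start;
  long end;
} RowRange;

/* Contiguous split of n rows over size processes: the first n % size
   ranks receive one extra row, so local sizes differ by at most one. */
RowRange owned_rows(long n, int size, int rank) {
  assert(size > 0 && rank >= 0 && rank < size);
  long base = n / size;
  long extra = n % size;
  RowRange r;
  r.start = rank * base + (rank < extra ? rank : extra);
  r.end = r.start + base + (rank < extra ? 1 : 0);
  return r;
}

/* One GPU per process: map the node-local rank onto the node's devices.
   With more processes than devices, several processes share one GPU. */
int device_for_rank(int local_rank, int num_devices) {
  assert(num_devices > 0 && local_rank >= 0);
  return local_rank % num_devices;
}
```

For example, 10 rows over 3 processes give the ranges [0,4), [4,7) and [7,10). Each process then runs its local sparse matrix-vector product and its block of the preconditioner on its own device, and MPI handles the ghost-value exchange as it does today.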

[petsc-dev] Unification approach for OpenMP/Threads/OpenCL/CUDA: Part 1: Memory
