But surely you won’t want to have to create the CUSP or ViennaCL objects on the fly each time from the CUDA/OpenCL “raw pointers”?
Why not? I can just 'wrap' an existing memory buffer and then use it with the operations each library provides. Note that the cost of setting up such a wrapper is negligible compared to the cost of launching a CUDA or OpenCL kernel.
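To illustrate why such a wrapper is cheap, here is a minimal sketch of a non-owning view over an existing buffer. The names `BufferView`, `axpy`, and `axpy_raw` are hypothetical and are not part of the CUSP or ViennaCL API. Plain host memory stands in for a device buffer so that the snippet runs without a GPU.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning wrapper (NOT the CUSP or ViennaCL API).
// Constructing it only stores a pointer and a length: no allocation, no copy.
template <typename T>
class BufferView {
public:
    BufferView(T* data, std::size_t size) : data_(data), size_(size) {
        assert(data != nullptr || size == 0);
    }
    std::size_t size() const { return size_; }
    // View semantics: a const view still refers to the same elements.
    T& operator[](std::size_t i) const { return data_[i]; }

private:
    T* data_;           // memory is owned by the caller, never freed here
    std::size_t size_;
};

// Library-style operation working on views: y <- a * x + y.
// Views are passed by value because copying them is trivial.
template <typename T>
void axpy(T a, BufferView<const T> x, BufferView<T> y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += a * x[i];
    }
}

// A caller that only holds raw pointers creates the views on the fly
// for each call and discards them afterwards.
inline void axpy_raw(double a, const double* x, double* y, std::size_t n) {
    BufferView<const double> xv(x, n);
    BufferView<double> yv(y, n);
    axpy(a, xv, yv);
}
```

Each call to `axpy_raw` builds two views, which amounts to a couple of pointer and size assignments. With a real GPU library the same idea applies: the wrapper refers to the existing device buffer, and the kernel launch dominates the cost.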
Best regards, Karli
