Hi Mani,

> 1) In ViennaCL, queue.finish() seems to be called only if we enable the
> flags VIENNACL_DEBUG_ALL or VIENNACL_DEBUG_KERNEL. How do I ensure that
> my custom kernel finishes when the debug mode is not enabled?

Use this:

viennacl::ocl::enqueue(kernel(*vars, *dvars_dt, *Fvars));
viennacl::backend::finish();

However, you don't need to call finish() at all. Commands in an in-order OpenCL command queue (the default) are executed in the order they were enqueued, and reads from the device are synchronized with that queue, so any subsequent operation is guaranteed to work on the latest data.
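For illustration only, the ordering guarantee can be modelled on the host. This is a minimal sketch, not OpenCL or ViennaCL code: the class InOrderQueue and its methods are hypothetical, and it assumes an in-order queue (an out-of-order queue would need explicit events). Commands run strictly in submission order, and a blocking read first drains everything enqueued before it, so the host sees the result of the last kernel without an explicit finish().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical host-only model of an in-order command queue.
// Enqueued commands are recorded and later run strictly in FIFO order.
class InOrderQueue {
public:
  void enqueue(std::function<void()> cmd) { pending_.push(std::move(cmd)); }

  // Runs every pending command in submission order (what finish() waits for).
  void finish() {
    while (!pending_.empty()) {
      pending_.front()();
      pending_.pop();
    }
  }

  // Models a blocking read: all earlier commands complete before the copy
  // is made, so the returned data reflects every previously enqueued kernel.
  std::vector<double> blocking_read(const std::vector<double>& device_buf) {
    finish();
    return device_buf;
  }

  std::size_t pending() const { return pending_.size(); }

private:
  std::queue<std::function<void()>> pending_;
};
```

A real device executes asynchronously, but the observable ordering for the host is the same as in this model: enqueue two "kernels", then one blocking read returns data reflecting both.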



> 2) My platform 0 has only a GPU. So when I launch my custom kernel
> inside the residual function, it does indeed evaluate on the GPU. In
> particular, suppose my residual function (for the seq case) is like this:
>
> PetscErrorCode ComputeResidual(TS ts,
>                                 PetscScalar t,
>                                 Vec X, Vec dX_dt,
>                                 Vec F, void *ptr)
> {
>      VecViennaCLGetArrayRead(X, &x);
>      VecViennaCLGetArrayWrite(F, &f);
>
>      viennacl::ocl::enqueue(myKernel(*x, *f));
> // Put something here to finish the kernel.
>
>      VecViennaCLRestoreArrayRead(X, &x);
>      VecViennaCLRestoreArrayWrite(F, &f);
> }
>
> and I execute as given below:
>
> ./program
>
> then the code inside the ComputeResidual function runs on the GPU
> but everything else runs on the CPU, right? (since I did not specify
> -dm_vec_type viennacl -dm_mat_type aijviennacl).

If I remember correctly, the VecViennaCL*() functions return an error for vectors that are not of ViennaCL type, which is the case here. You'll see it as soon as you check their return values.


> Now suppose I execute as given below:
>
> ./program -dm_vec_type viennacl -dm_mat_type aijviennacl
>
> then every vector operation occurs using the ViennaCL code
> in vecviennacl.cxx. And since my default platform is 0 (only having an
> NVIDIA GPU), I thought everything would run on the GPU. However, with the
> ViennaCL debug mode, I get the following messages for the vector operations:
>
> ViennaCL: Starting 1D-kernel 'assign_cpu'...
> ViennaCL: Global work size: '16384'...
> ViennaCL: Local work size: '128'...
> ViennaCL: Kernel assign_cpu finished!
>
> How is it possible that part of the ViennaCL code is using my CPU (which
> is on a completely different platform, #1) while the custom kernel is
> launched on my GPU (platform #0)?

The kernel 'assign_cpu' indicates that the operation
 x[i] <- alpha
is performed on the OpenCL device, where alpha is a scalar value located in main RAM (i.e. provided from CPU RAM, hence the 'cpu' suffix). The suffix does not mean that the kernel runs on the CPU. All ViennaCL-related operations are executed on the GPU, as expected.
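To make the naming concrete, here is a plain C++ emulation of what a kernel like 'assign_cpu' computes. It is a sketch, not ViennaCL's actual kernel source: the function names and the strided loop over work items are assumptions for illustration. The point is that alpha is an ordinary by-value argument coming from host RAM, while the assignment itself is done by the work items of the device that owns x. With the sizes from the debug output (global 16384, local 128), 16384 / 128 = 128 work groups are launched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side emulation of a 1D "x[i] <- alpha" kernel.
// alpha is passed by value (it lives in host RAM), so no device buffer
// is needed for the scalar. Each of the global_size work items starts at
// its global id and strides by global_size, so any vector length is covered.
void assign_host_scalar(std::vector<float>& x, float alpha,
                        std::size_t global_size) {
  for (std::size_t gid = 0; gid < global_size; ++gid) {        // one pass per work item
    for (std::size_t i = gid; i < x.size(); i += global_size) { // strided loop
      x[i] = alpha;
    }
  }
}

// Number of work groups for a 1D launch; assumes global_size is a
// multiple of local_size, as OpenCL 1.x requires.
std::size_t num_work_groups(std::size_t global_size, std::size_t local_size) {
  return global_size / local_size;
}
```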

Note to self: we should include the active device name in the debug output. :-)

Best regards,
Karli
