Hi Mani,

> I have a few questions regarding the usage of ViennaCL in PETSc.

1) In the residual evaluation function:

PetscErrorCode ComputeResidual(TS ts,
                                PetscReal t,
                                Vec X, Vec dX_dt,
                                Vec F, void *ptr)
{
     DM da;
     Vec localX;
     TSGetDM(ts, &da);
     DMGetLocalVector(da, &localX);

     DMGlobalToLocalBegin(da, X, INSERT_VALUES, localX);
     DMGlobalToLocalEnd(da, X, INSERT_VALUES, localX);

     // X is only read by the kernel; the residual F is written by it.
     const viennacl::vector<PetscScalar> *x;
     viennacl::vector<PetscScalar> *f;
     VecViennaCLGetArrayRead(localX, &x);
     VecViennaCLGetArrayWrite(F, &f);

     viennacl::ocl::enqueue(myKernel(*x, *f));
// Should it be viennacl::ocl::enqueue(myKernel(x, f))?

It should be viennacl::ocl::enqueue(myKernel(*x, *f));
Usually you also want to pass the vector sizes to the kernel. Don't forget to cast them to the matching OpenCL types (e.g. cl_uint).


     VecViennaCLRestoreArrayRead(localX, &x);
     VecViennaCLRestoreArrayWrite(F, &f);
     DMRestoreLocalVector(da, &localX);
     return 0;
}
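
The advice about casting kernel size arguments can be illustrated without an OpenCL installation. The following is a minimal sketch, assuming a host where std::size_t may be wider than the 32-bit cl_uint; cl_uint_like and to_cl_uint are hypothetical names introduced here and are not part of ViennaCL or PETSc. An OpenCL kernel parameter declared as unsigned int expects exactly 4 bytes, so a host-side size should be range-checked and narrowed before it is passed.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Stand-in for OpenCL's cl_uint, which is a 32-bit unsigned integer.
// (Hypothetical alias so the sketch compiles without OpenCL headers.)
using cl_uint_like = std::uint32_t;

// Narrow a host-side size to the 32-bit type a kernel argument expects,
// refusing values that would otherwise be silently truncated.
inline cl_uint_like to_cl_uint(std::size_t n)
{
    if (n > std::numeric_limits<cl_uint_like>::max())
        throw std::overflow_error("size does not fit into a 32-bit kernel argument");
    return static_cast<cl_uint_like>(n);
}
```

With the real headers, and assuming myKernel were written to take a third unsigned int argument, the call might then look like viennacl::ocl::enqueue(myKernel(*x, *f, to_cl_uint(x->size())));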

Will the residual evaluation occur on the GPU/accelerator depending on
where we choose the ViennaCL array computations to occur? As I
understand, if we simply use VecGetArray in the residual evaluation
function, then the residual evaluation is still done on the CPU even
though the solves are done on the GPU.

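If you use VecViennaCLGetArrayWrite(), the data will be marked as valid on the GPU, so the residual evaluation happens in the OpenCL kernel you provide, on whichever device ViennaCL is using. This is already the case in the code snippet above. Write-only access does not copy the existing host values to the device, though, so any vector the kernel reads as input should be obtained with VecViennaCLGetArrayRead() instead.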
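
To make the difference between the access modes concrete, here is a toy model of the host/device validity bookkeeping. It is only a sketch under stated assumptions: ToyVec, Valid, and the member functions are hypothetical names, and the real PETSc implementation differs in detail. The model assumes that read access uploads host data only when the device copy is stale, that write-only access skips the upload and leaves the device copy as the only valid one, and that a host-side access (the analogue of VecGetArray) pulls the data back.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Which copy of the data currently holds the up-to-date values.
enum class Valid { Host, Device, Both };

// Toy vector with a host buffer and a simulated "device" buffer.
struct ToyVec {
    std::vector<double> host;
    std::vector<double> device;
    Valid valid = Valid::Host;
    int transfers = 0;  // counts simulated host<->device copies

    explicit ToyVec(std::vector<double> v)
        : host(std::move(v)), device(host.size()) {}

    // Read access on the device: upload only if the device copy is stale.
    const std::vector<double>& device_read() {
        if (valid == Valid::Host) { device = host; ++transfers; valid = Valid::Both; }
        return device;
    }

    // Write-only access on the device: no upload, so the previous device
    // contents must not be relied on. Afterwards only the device is valid.
    std::vector<double>& device_write() {
        valid = Valid::Device;
        return device;
    }

    // Host access: download first if only the device copy is valid.
    // The host may modify the data, so the device copy becomes stale.
    std::vector<double>& host_access() {
        if (valid == Valid::Device) { host = device; ++transfers; }
        valid = Valid::Host;
        return host;
    }
};

// "Kernel" stand-in: f = 2 * x, computed entirely on the device buffers.
inline void toy_residual(ToyVec& x, ToyVec& f) {
    const std::vector<double>& xd = x.device_read();
    std::vector<double>& fd = f.device_write();
    for (std::size_t i = 0; i < xd.size(); ++i) fd[i] = 2.0 * xd[i];
}
```

In this model, calling toy_residual repeatedly uploads x once and never uploads f; a transfer back to the host happens only when host_access() is called, which mirrors why mixing host-side array access into a GPU residual evaluation costs extra copies.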


2) How does one choose on which device the ViennaCL array computations
will occur? I was looking for some flags like -viennacl
cpu/gpu/accelerator but could not find any in -help.

Use one of:
 -viennacl_device_cpu
 -viennacl_device_gpu
 -viennacl_device_accelerator


3) How can one pass compiler flags when building OpenCL kernels in ViennaCL?

You could do that through the ViennaCL API directly, but I'm not sure whether you really want to do this. Which flags do you want to set? My experience is that these options have little to no effect on performance, particularly for the memory-bandwidth-limited case. This is also the reason why I haven't provided a PETSc routine for this.

Best regards,
Karli
