Ah yes, thanks Karl.  I remember that now.  That said, are there
recommendations on how kernels should be written to handle the padded
columns?  I am imagining some if/else guards or loop limits on the indices,
but thought I would ask here before I start down that path.  Looking
through the existing kernels, I see checks along the lines of
'global_size(0) < size', where I assume size refers to one of the dimensions?
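
To make the question concrete, here is a minimal host-side sketch of the
index arithmetic I have in mind.  It is not ViennaCL code: PaddedMatrix,
round_up, matmul_work_item and the padding multiple `pad` are hypothetical
names of mine, and I am assuming a row-major layout in which element (i, j)
lives at i * internal_cols + j.  In a real kernel the logical and padded
dimensions would presumably come from size1()/size2() and
internal_size1()/internal_size2() and be passed in as extra arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a padded, row-major device buffer.
// rows/cols are the logical dimensions; internal_* include the padding.
struct PaddedMatrix {
    std::size_t rows, cols;
    std::size_t internal_rows, internal_cols;
    std::vector<int> data;  // internal_rows * internal_cols entries, padding is zero

    // pad is an assumed padding multiple, chosen by the caller for illustration.
    PaddedMatrix(std::size_t r, std::size_t c, std::size_t pad)
        : rows(r), cols(c),
          internal_rows(round_up(r, pad)), internal_cols(round_up(c, pad)),
          data(internal_rows * internal_cols, 0) {}

    // Smallest multiple of `multiple` that is >= n.
    static std::size_t round_up(std::size_t n, std::size_t multiple) {
        return ((n + multiple - 1) / multiple) * multiple;
    }

    // Element access uses the padded row length as the stride.
    int& at(std::size_t i, std::size_t j) { return data[i * internal_cols + j]; }
    int at(std::size_t i, std::size_t j) const { return data[i * internal_cols + j]; }
};

// What a single work item would do; row/col play the role of
// get_global_id(0) and get_global_id(1).
void matmul_work_item(std::size_t row, std::size_t col, const PaddedMatrix& A,
                      const PaddedMatrix& B, PaddedMatrix& C) {
    // Guard: work items that land in the padded region do nothing.
    if (row >= C.rows || col >= C.cols) return;

    int tmp = 0;
    // Loop limit is the logical inner dimension, not the padded one.
    for (std::size_t k = 0; k < A.cols; ++k) {
        tmp += A.at(row, k) * B.at(k, col);
    }
    C.at(row, col) = tmp;
}

// Mimics a launch whose global range covers the padded sizes of C.
void matmul(const PaddedMatrix& A, const PaddedMatrix& B, PaddedMatrix& C) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    for (std::size_t row = 0; row < C.internal_rows; ++row) {
        for (std::size_t col = 0; col < C.internal_cols; ++col) {
            matmul_work_item(row, col, A, B, C);
        }
    }
}
```

The two points I would expect to carry over to the OpenCL version are the
stride (padded row length, not the logical one) and the early return for
indices beyond the logical size.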

If so, I humbly suggest that, although the padding is mentioned in the
documentation of the matrix types, either an example or an explanation
would be valuable in the custom kernel section (at the very least, another
friendly reminder).  Not all repetition is bad :)

Thanks,
Charles

On Mon, May 23, 2016 at 2:03 PM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:

> Hi,
>
>
> On 05/23/2016 05:38 PM, Charles Determan wrote:
>
>> I am experimenting with the custom OpenCL kernel functionality,
>> specifically a naive matrix multiplication as an example.
>>
>> My OpenCL Kernel:
>> __kernel void iMatMult(const int Mdim, const int Pdim,
>>                         __global const int *A, __global const int *B,
>>                         __global int *C) {
>>
>>      // Get the index of the elements to be processed
>>      const int globalRow = get_global_id(0); // C Row ID
>>      const int globalCol = get_global_id(1); // C Col ID
>>      int tmp = 0;
>>
>>      // Do the operation
>>      for(int k=0; k < Pdim; k++){
>>          tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>>      }
>>      C[globalCol*Mdim+globalRow] = tmp;
>> }
>>
>>
>> Relevant C++ code
>> where vcl_* refer to viennacl::matrix<int>
>> and my_kernel is a string referring to the kernel above:
>>
>>      int M = vcl_A.size2();
>>      int P = vcl_A.size1();
>>
>>      // add kernel to program
>>      viennacl::ocl::program & my_prog =
>>          viennacl::ocl::current_context().add_program(my_kernel, "my_kernel");
>>
>>      // get compiled kernel function
>>      viennacl::ocl::kernel & my_kernel_mul =
>>          my_prog.get_kernel("iMatMult");
>>
>>      // execute kernel
>>      viennacl::ocl::enqueue(my_kernel_mul(M, P, vcl_A, vcl_B, vcl_C));
>>
>>
>> Oddly, the results in the vcl_C object are incorrect.  But if I manually
>> go through OpenCL using the C++ API, the results are correct (the API
>> code is here:
>> https://github.com/cdeterman/gpuR/blob/develop/src/gpuMatrix_igemm.cpp).
>> Did I miss something?
>>
>
> Yes, you missed the internal data layout: rows/columns may be padded with
> zeros:
> http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
> (this is done for performance reasons, but it is, unfortunately, often
> overlooked by users)
>
> Best regards,
> Karli
>
>> _______________________________________________
>> ViennaCL-devel mailing list
>> ViennaCL-devel@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/viennacl-devel
>>
>>
>