Karl,
I believe I figured it out. Your comment about the global sizes made me
realize that the defaults don't account for a second dimension. Once I
set the global work size for the second dimension, the kernel works
properly. Thank you for listening and pointing me to different things to
check.
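
For anyone who finds this thread later, here is a minimal sketch in plain C
of why the second dimension matters. It is not ViennaCL or OpenCL code: it
only emulates a 2-D launch on the host, with `row` standing in for
get_global_id(0) and `col` for get_global_id(1). The names imatmult_item,
launch_2d, round_up, lda, ldb and ldc are illustrative assumptions, and the
matrices are assumed to be row-major with padded rows. If the launch is
effectively 1-D, get_global_id(1) is always 0, so only column 0 of C is
ever written.

```c
#include <stddef.h>

/* Work done by one work-item: C = A * B for row-major int matrices with
   padded rows. A is M x K (row stride lda), B is K x N (row stride ldb),
   C is M x N (row stride ldc). All names are illustrative. */
static void imatmult_item(size_t row, size_t col,
                          int M, int N, int K,
                          int lda, int ldb, int ldc,
                          const int *A, const int *B, int *C)
{
    /* Guard with >= (not >) so that work-items in the rounded-up part
       of the global range return without touching memory. */
    if (row >= (size_t)M || col >= (size_t)N)
        return;

    int tmp = 0;
    for (int k = 0; k < K; ++k)
        tmp += A[row * lda + k] * B[(size_t)k * ldb + col];
    C[row * ldc + col] = tmp;
}

/* Emulates a 2-D NDRange of global0 x global1 work-items.
   row plays the role of get_global_id(0), col of get_global_id(1). */
void launch_2d(size_t global0, size_t global1,
               int M, int N, int K,
               int lda, int ldb, int ldc,
               const int *A, const int *B, int *C)
{
    for (size_t col = 0; col < global1; ++col)
        for (size_t row = 0; row < global0; ++row)
            imatmult_item(row, col, M, N, K, lda, ldb, ldc, A, B, C);
}

/* Round n up to the next multiple of `multiple`, e.g. so that a global
   size is divisible by the chosen local size. */
size_t round_up(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}
```

Calling launch_2d with global1 = 1 mimics a launch where only dimension 0
was given a size; calling it with both sizes rounded up to cover M and N
fills all of C. In ViennaCL, the equivalent step is giving dimension 1 of
the kernel a global (and local) work size before enqueueing it; the exact
calls are described on the custom-kernels manual page.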
Regards,
Charles
On Fri, Jun 10, 2016 at 7:42 AM, Charles Determan <cdeterma...@gmail.com>
wrote:
> I neglected one further question you had:
>
> Which local and global work sizes do you use?
>
> I am not setting any local/global work sizes as I thought the defaults
> specified by ViennaCL were supposed to be sufficient as noted in the
> documentation (
> http://viennacl.sourceforge.net/doc/manual-custom-kernels.html) - 'The
> default work sizes suffice for most cases'.
>
> Regards,
> Charles
>
> On Fri, Jun 10, 2016 at 7:35 AM, Charles Determan <cdeterma...@gmail.com>
> wrote:
>
>> Karl,
>>
>> I am trying to adapt from a previous kernel I knew worked on an unpadded
>> matrix.
>>
>> __kernel void iMatMult(const int Mdim, const int Pdim,
>> __global const int *A, __global const int *B,
>> __global int *C) {
>>
>> int k;
>>
>> // Get the index of the elements to be processed
>> const int globalRow = get_global_id(0); // C Row ID
>> const int globalCol = get_global_id(1); // C Col ID
>> int tmp = 0;
>>
>> // Do the operation
>> for(k=0; k < Pdim; k++){
>> tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>> }
>> C[globalCol*Mdim+globalRow] = tmp;
>> }
>>
>> So when you ask - "where is the third dimension? Are you assuming C to be
>> M-by-M?"
>>
>> I haven't passed a third dimension as Mdim is the number of columns and
>> Pdim is the number of rows in matrix 'A'.
>>
>> Which values do you pass to the kernel? Which local and global work sizes
>> do you use?
>>
>> Right now I am passing Mdim, Pdim, MdimPad (padded number of columns),
>> PdimPad (padded number of rows), and three matrices.
>>
>> I'm confused with your use of MdimPad and PdimPad here. As currently
>> written, A has Mdim columns, and B has Pdim columns. But this doesn't agree
>> with the if-check above, where C is assumed Mdim-by-Mdim.
>>
>> I am using MdimPad and PdimPad to index the matrix elements because they
>> are padded (this is new to me for writing OpenCL kernels). C is intended
>> to be square but I can't even get it to work with a square matrix. That
>> line actually looks like I intended to have:
>>
>> if (globalRow > MdimPad || globalCol > PdimPad)
>> return;
>>
>> but that still doesn't fix the problem for me.
>>
>> The last line assumes C to be M-by-M. Is this the case?
>>
>> Again, I am trying to base this off the previous kernel which I thought
>> worked for non-square matrices but I could very well be mistaken. The
>> entire goal here is to just get a basic working integer gemm kernel for
>> square or rectangular matrices. I really didn't think it would be
>> difficult but I think I have fallen into a rabbit hole at this point and
>> am likely just confusing myself.
>>
>> Regards,
>> Charles
>>
>>
>> On Fri, Jun 10, 2016 at 3:40 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>>
>>> Hi Charles,
>>>
>>> Here is the current kernel
>>>> with all the different attempts commented out (where MdimPad and PdimPad
>>>> are the padded dimensions).
>>>>
>>>
>>> where is the third dimension? Are you assuming C to be M-by-M?
>>>
>>>
>>>
>>> If I don't have a size condition check, the
>>>> device quickly runs out of resources (Error: ViennaCL: FATAL ERROR:
>>>> CL_OUT_OF_RESOURCES ). Any thoughts? I feel like I must be missing
>>>> something simple at this point.
>>>>
>>>
>>> Which values do you pass to the kernel? Which local and global work
>>> sizes do you use?
>>>
>>>
>>>
>>>
>>> __kernel void iMatMult(const int Mdim, const int MdimPad,
>>>> const int Pdim, const int PdimPad,
>>>> __global const int *A, __global const int *B,
>>>> __global int *C) {
>>>>
>>>> // Get the index of the elements to be processed
>>>> const int globalRow = get_global_id(0); // C Row ID
>>>> const int globalCol = get_global_id(1); // C Col ID
>>>> int tmp = 0;
>>>>
>>>> if (globalRow > MdimPad || globalCol > MdimPad)
>>>> return;
>>>>
>>>
>>> Here it should be enough to check against Mdim.
>>>
>>> printf("globalCol = %d\n", globalCol);
>>>> printf("globalRow = %d\n", globalRow);
>>>>
>>>> // Do the operation
>>>> for(int k=0; k < Pdim; k++){
>>>> tmp += A[globalRow * MdimPad + k] * B[globalCol+PdimPad*k];
>>>>
>>>
>>> I'm confused with your use of MdimPad and PdimPad here. As currently
>>> written, A has Mdim columns, and B has Pdim columns. But this doesn't agree
>>> with the if-check above, where C is assumed Mdim-by-Mdim.
>>>
>>> }
>>>>
>>>> C[globalCol+MdimPad*globalRow] = tmp;
>>>>
>>>
>>> The last line assumes C to be M-by-M. Is this the case?
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>>
>
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel