Hi,

> I believe I figured it out; your comment about the global sizes
> allowed me to realize that the defaults don't account for a second
> dimension. Once I set that, I am able to get the kernel to work
> properly. Thank you for listening and directing me to different
> points to check.

ah, great, I'm glad it's now working! :-)
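
In case someone finds this thread in the archives later: here is a
minimal (untested) sketch of what setting the second dimension
explicitly can look like, assuming the kernel was registered via
add_program()/get_kernel() as on the custom-kernels manual page you
linked. The 16-by-16 local size and all variable names are just
placeholders:

    // fetch the custom kernel registered earlier (placeholder names)
    viennacl::ocl::kernel & k = my_prog.get_kernel("iMatMult");

    // launch a 2D index space: one work item per entry of C
    k.local_work_size(0, 16);   // placeholder local sizes
    k.local_work_size(1, 16);

    // round the padded dimensions up to a multiple of the local size
    // (match them to whichever dimensions C actually has)
    k.global_work_size(0, ((MdimPad + 15) / 16) * 16);
    k.global_work_size(1, ((PdimPad + 15) / 16) * 16);

    // pass the arguments in the same order as in the kernel signature
    viennacl::ocl::enqueue(k(static_cast<cl_int>(Mdim),
                             static_cast<cl_int>(MdimPad),
                             static_cast<cl_int>(Pdim),
                             static_cast<cl_int>(PdimPad),
                             A, B, C));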

Best regards,
Karli




> On Fri, Jun 10, 2016 at 7:42 AM, Charles Determan <[email protected]> wrote:
>
>     I neglected to answer one further question you had:
>
>     Which local and global work sizes do you use?
>
>     I am not setting any local/global work sizes as I thought the
>     defaults specified by ViennaCL were supposed to be sufficient as
>     noted in the documentation
>     (http://viennacl.sourceforge.net/doc/manual-custom-kernels.html) -
>     'The default work sizes suffice for most cases'.
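>
>     (In case it is useful for comparison: if I am reading
>     viennacl/ocl/kernel.hpp correctly, the sizes currently set on a
>     kernel can be printed with something along these lines, where 'k'
>     stands for the viennacl::ocl::kernel object; just a sketch, not
>     code I have actually run:)
>
>         #include <iostream>
>
>         // print the global/local work sizes ViennaCL has set
>         std::cout << "global: " << k.global_work_size(0) << " x "
>                   << k.global_work_size(1) << std::endl;
>         std::cout << "local:  " << k.local_work_size(0) << " x "
>                   << k.local_work_size(1) << std::endl;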
>
>     Regards,
>     Charles
>
>     On Fri, Jun 10, 2016 at 7:35 AM, Charles Determan <[email protected]> wrote:
>
>         Karl,
>
>         I am trying to adapt this from a previous kernel that I know
>         worked on an unpadded matrix.
>
>         __kernel void iMatMult(const int Mdim, const int Pdim,
>                                __global const int *A,
>                                __global const int *B,
>                                __global int *C) {
>
>              int k;
>
>              // Get the index of the elements to be processed
>              const int globalRow = get_global_id(0); // C Row ID
>              const int globalCol = get_global_id(1); // C Col ID
>              int tmp = 0;
>
>              // Do the operation
>              for (k = 0; k < Pdim; k++) {
>                  tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>              }
>              C[globalCol*Mdim+globalRow] = tmp;
>         }
>
>         So when you ask - "where is the third dimension? Are you
>         assuming C to be M-by-M?"
>
>         I haven't passed a third dimension, as Mdim is the number of
>         columns and Pdim is the number of rows in matrix 'A'.
>
>         Which values do you pass to the kernel? Which local and global
>         work sizes do you use?
>
>         Right now I am passing Mdim, Pdim, MdimPad (padded number of
>         columns), PdimPad (padded number of rows), and three matrices.
>
>         I'm confused with your use of MdimPad and PdimPad here. As
>         currently written, A has Mdim columns, and B has Pdim columns.
>         But this doesn't agree with the if-check above, where C is
>         assumed Mdim-by-Mdim.
>
>         I am using MdimPad and PdimPad to index the matrix elements
>         because the matrices are padded (writing OpenCL kernels for
>         padded data is new to me).  C is intended to be square, but I
>         can't even get it to work with a square matrix.  That if-check
>         has a typo; what I actually intended is:
>
>         if (globalRow > MdimPad || globalCol > PdimPad)
>             return;
>
>         but that still doesn't fix the problem for me.
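>
>         (For context on where those values come from: MdimPad and
>         PdimPad are meant to be the padded, internal dimensions of the
>         viennacl::matrix objects, along the lines of the sketch below;
>         the variable names are placeholders rather than my actual code:)
>
>             // logical vs. padded (internal) dimensions of viennacl::matrix vcl_A
>             int Mdim    = static_cast<int>(vcl_A.size2());          // columns
>             int Pdim    = static_cast<int>(vcl_A.size1());          // rows
>             int MdimPad = static_cast<int>(vcl_A.internal_size2()); // padded columns
>             int PdimPad = static_cast<int>(vcl_A.internal_size1()); // padded rows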
>
>         The last line assumes C to be M-by-M. Is this the case?
>
>         Again, I am trying to base this on the previous kernel, which I
>         thought worked for non-square matrices, but I could very well be
>         mistaken.  The entire goal here is just to get a basic working
>         integer GEMM kernel for square or rectangular matrices.  I
>         really didn't think it would be difficult, but I seem to have
>         fallen into a rabbit hole at this point and am likely just
>         confusing myself.
>
>         Regards,
>         Charles
>
>
>         On Fri, Jun 10, 2016 at 3:40 AM, Karl Rupp <[email protected]> wrote:
>
>             Hi Charles,
>
>                 Here is the current kernel with all the different
>                 attempts commented out (where MdimPad and PdimPad are
>                 the padded dimensions).
>
>
>             where is the third dimension? Are you assuming C to be M-by-M?
>
>
>
>                 If I don't have a size condition check, the device
>                 quickly runs out of resources (Error: ViennaCL: FATAL
>                 ERROR: CL_OUT_OF_RESOURCES).  Any thoughts?  I feel
>                 like I must be missing something simple at this point.
>
>
>             Which values do you pass to the kernel? Which local and
>             global work sizes do you use?
>
>
>
>
>                 __kernel void iMatMult(const int Mdim, const int MdimPad,
>                                        const int Pdim, const int PdimPad,
>                                        __global const int *A,
>                                        __global const int *B,
>                                        __global int *C) {
>
>                      // Get the index of the elements to be processed
>                      const int globalRow = get_global_id(0); // C Row ID
>                      const int globalCol = get_global_id(1); // C Col ID
>                      int tmp = 0;
>
>                      if (globalRow > MdimPad || globalCol > MdimPad)
>                          return;
>
>
>             Here it should be enough to check against Mdim.
>
>                      printf("globalCol = %d\n", globalCol);
>                      printf("globalRow = %d\n", globalRow);
>
>                      // Do the operation
>                      for (int k = 0; k < Pdim; k++) {
>                          tmp += A[globalRow * MdimPad + k] * B[globalCol + PdimPad * k];
>
>
>             I'm confused with your use of MdimPad and PdimPad here. As
>             currently written, A has Mdim columns, and B has Pdim
>             columns. But this doesn't agree with the if-check above,
>             where C is assumed Mdim-by-Mdim.
>
>                      }
>
>                      C[globalCol + MdimPad*globalRow] = tmp;
>
>
>             The last line assumes C to be M-by-M. Is this the case?
>
>             Best regards,
>             Karli
>
>
>
>


