Hi,

> I believe I figured it out; your comment about the global sizes allowed
> me to realize that the defaults don't account for a second dimension.
> Once I set that, I am able to get the kernel to work properly. Thank you
> for listening and directing me to different points to check.

ah, great, I'm glad it's now working! :-)

Best regards,
Karli

> On Fri, Jun 10, 2016 at 7:42 AM, Charles Determan <[email protected]
> <mailto:[email protected]>> wrote:
>
> I neglected one further question you had
>
> Which local and global work sizes do you use?
>
> I am not setting any local/global work sizes as I thought the
> defaults specified by ViennaCL were supposed to be sufficient as
> noted in the documentation
> (http://viennacl.sourceforge.net/doc/manual-custom-kernels.html) -
> 'The default work sizes suffice for most cases'.
>
> Regards,
> Charles
>
> On Fri, Jun 10, 2016 at 7:35 AM, Charles Determan
> <[email protected] <mailto:[email protected]>> wrote:
>
> Karl,
>
> I am trying to adapt from a previous kernel I knew worked on an
> unpadded matrix.
>
> __kernel void iMatMult(const int Mdim, const int Pdim,
>                        __global const int *A, __global const int *B,
>                        __global int *C) {
>
>     int k;
>
>     // Get the index of the elements to be processed
>     const int globalRow = get_global_id(0); // C Row ID
>     const int globalCol = get_global_id(1); // C Col ID
>     int tmp = 0;
>
>     // Do the operation
>     for(k=0; k < Pdim; k++){
>         tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>     }
>     C[globalCol*Mdim+globalRow] = tmp;
> }
>
> So when you ask - "where is the third dimension? Are you
> assuming C to be M-by-M?"
>
> I haven't passed a third dimension as Mdim is the number of
> columns and Pdim is the number of rows in matrix 'A'.
>
> Which values do you pass to the kernel? Which local and global
> work sizes do you use?
>
> Right now I am passing Mdim, Pdim, MdimPad (padded number of
> columns), PdimPad (padded number of rows), and three matrices.
>
> I'm confused with your use of MdimPad and PdimPad here. As
> currently written, A has Mdim columns, and B has Pdim columns.
> But this doesn't agree with the if-check above, where C is
> assumed Mdim-by-Mdim.
>
> I am using MdimPad and PdimPad to index the matrix elements
> because they are padded (this is new to me for writing OpenCL
> kernels). C is intended to be square but I can't even get it to
> work with a square matrix. That line actually looks like I
> intended to have:
>
>     if (globalRow > MdimPad || globalCol > PdimPad)
>         return;
>
> but that still doesn't fix the problem for me.
>
> The last line assumes C to be M-by-M. Is this the case?
>
> Again, I am trying to base this off the previous kernel which I
> thought worked for non-square matrices but I could very well be
> mistaken. The entire goal here is to just get a basic working
> integer gemm kernel for square or rectangular matrices. I
> really didn't think it would be difficult but I think I have
> fallen in a rabbit hole at this point and likely just confusing
> myself.
>
> Regards,
> Charles
>
> On Fri, Jun 10, 2016 at 3:40 AM, Karl Rupp
> <[email protected] <mailto:[email protected]>> wrote:
>
> Hi Charles,
>
> Here is the current kernel with all the different attempts
> commented out (where MdimPad and PdimPad are the padded
> dimensions).
>
> where is the third dimension? Are you assuming C to be M-by-M?
>
> If I don't have a size condition check, the device quickly runs
> out of resources (Error: ViennaCL: FATAL ERROR:
> CL_OUT_OF_RESOURCES ). Any thoughts? I feel like I must be
> missing something simple at this point.
>
> Which values do you pass to the kernel? Which local and
> global work sizes do you use?
>
> __kernel void iMatMult(const int Mdim, const int MdimPad,
>                        const int Pdim, const int PdimPad,
>                        __global const int *A, __global const int *B,
>                        __global int *C) {
>
>     // Get the index of the elements to be processed
>     const int globalRow = get_global_id(0); // C Row ID
>     const int globalCol = get_global_id(1); // C Col ID
>     int tmp = 0;
>
>     if (globalRow > MdimPad || globalCol > MdimPad)
>         return;
>
> Here it should be enough to check against Mdim.
>
>     printf("globalCol = %d\n", globalCol);
>     printf("globalRow = %d\n", globalRow);
>
>     // Do the operation
>     for(int k=0; k < Pdim; k++){
>         tmp += A[globalRow * MdimPad + k] *
>                B[globalCol+PdimPad*k];
>
> I'm confused with your use of MdimPad and PdimPad here. As
> currently written, A has Mdim columns, and B has Pdim
> columns. But this doesn't agree with the if-check above,
> where C is assumed Mdim-by-Mdim.
>
>     }
>
>     C[globalCol+MdimPad*globalRow] = tmp;
>
> The last line assumes C to be M-by-M. Is this the case?
>
> Best regards,
> Karli

_______________________________________________
ViennaCL-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/viennacl-devel
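
Editorial note: the points raised in this thread (a two-dimensional global range, a bounds check against the logical sizes, and indexing with the padded sizes) can be illustrated with a small host-side sketch in plain C. This is not the kernel Charles ended up with, which the thread does not show. It assumes row-major storage where each row is padded out to a pitch, and every name in it (imatmult_item, imatmult_launch, round_up, lda/ldb/ldc, the demo data) is hypothetical. The two nested loops stand in for a 2D launch whose global sizes are rounded up past the matrix dimensions, which is why the guard uses >= on the logical sizes rather than > on the padded ones.

```c
#include <assert.h>

/* One simulated work-item. row and col stand in for get_global_id(0)
 * and get_global_id(1). M, K, N are the logical sizes (A is M-by-K,
 * B is K-by-N, C is M-by-N); lda, ldb, ldc are the padded row pitches
 * (assumed row-major layout, hypothetical names). */
void imatmult_item(int row, int col, int M, int K, int N,
                   int lda, int ldb, int ldc,
                   const int *A, const int *B, int *C)
{
    /* Guard with >= against the logical sizes: work-items that fall
     * in the rounded-up part of the global range do nothing. */
    if (row >= M || col >= N)
        return;

    int tmp = 0;
    for (int k = 0; k < K; k++)
        tmp += A[row * lda + k] * B[k * ldb + col]; /* pitches, not M/K/N */
    C[row * ldc + col] = tmp;
}

/* Round n up to the next multiple of `multiple`. */
int round_up(int n, int multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}

/* Simulated 2D launch: BOTH dimensions get a global size. */
void imatmult_launch(int M, int K, int N, int lda, int ldb, int ldc,
                     const int *A, const int *B, int *C, int local)
{
    int global0 = round_up(M, local); /* rows    */
    int global1 = round_up(N, local); /* columns */
    for (int row = 0; row < global0; row++)
        for (int col = 0; col < global1; col++)
            imatmult_item(row, col, M, K, N, lda, ldb, ldc, A, B, C);
}

/* Demo: 2x3 times 3x2, every row padded to a pitch of 4.
 * C must hold 2 * 4 = 8 ints; padding is pre-filled with -1. */
void imatmult_demo(int *C)
{
    const int A[2 * 4] = {1, 2, 3, 0,   4, 5, 6, 0};
    const int B[3 * 4] = {7, 8, 0, 0,   9, 10, 0, 0,   11, 12, 0, 0};
    for (int i = 0; i < 8; i++)
        C[i] = -1;
    imatmult_launch(2, 3, 2, 4, 4, 4, A, B, C, 4);
}
```

Calling imatmult_demo on an 8-element buffer fills the 2-by-2 logical block of C and leaves the padding entries at -1. On a real device the same guard is what keeps the extra work-items from reading or writing out of bounds. As far as I understand the ViennaCL manual linked above, the per-dimension global and local sizes are set on the kernel object before enqueueing, and the second dimension has to be set explicitly, which matches what resolved the problem in this thread.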
