Hey,
2014-08-14 22:10 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:

> Hi,
>
>> The GEMM kernel(s) are getting pretty tricky, with quite a few fallbacks
>> involved. This gets hard to test, so I thought it could be a good idea
>> to discuss this. Basically, here is how it works:
>>
>> A = [A1 A2; A3 A4]
>> B = [B1 B2; B3 B4]
>> C = [C1 C2; C3 C4]
>>
>> where each block is divided according to the corresponding block sizes of
>> the template. For example, A1 is the largest sub-block whose size is a
>> multiple of the tuple (ML, KL), where ML is the number of rows computed by
>> each work group, and KL is the "width step" for computing the inner
>> products. (If the kernel uses local memory, it loads successive blocks of
>> size ML*KL in each work group.)
>>
>> A few kernels are enqueued so that:
>> C1  = A1*B1  [optimized kernel]
>> C1 += A2*B3  [fallback, if needed]
>> C2  = A1*B2  [fallback, if needed]
>> C2 += A2*B4  [fallback, if needed]
>> etc.
>>
>> Basically, one optimized kernel does the bulk of the work, and the other
>> ones do the "clean-up". This works well for full matrices and ranges.
>> When slices are involved, things get more complicated. If the stride is
>> on the non-leading dimension (stride2 for column-major matrices), then it
>> can be incorporated into the optimized kernel (by appending ld *= stride2
>> at the beginning of the kernel). However, if stride1 > 1, then we need to
>> use the fallback kernel. This is a reasonable thing to do: in most
>> applications I know of, only one stride is used at a time (we want a
>> subset of the rows/columns of a given matrix).
>>
>> However, this becomes really messy to test! Basically, I think that, to
>> have an exhaustive enough test suite, we should go for:
>>
>> - Matrices of complicated, arbitrary sizes (143, 284, 395). It is
>>   important to space them by more than 128, to make sure that A1, B1
>>   and C1 are not square.
>> - Ranges of similar complicated sizes.
>> - An "optimized" range, e.g. (128, 256, 384).
>> - Matrix row-wise slices, matrix column-wise slices, and matrix slices
>>   in both directions.
>
> As far as I can tell, all you need to do is to adjust the matrix sizes in
> the existing GEMM tests? It covers all this already. What am I missing?

Well, essentially it's about readjusting the sizes, yes. But the tests
should be slightly different and allow for multiple passes over multiple
size tuples.

>> I am ready to rewrite the GEMM tests accordingly, but any thoughts on
>> the procedure would be appreciated!
>
> The GEMM tests are quite an issue already, because they consume a lot of
> time, particularly on weaker systems. A substantial part of the problem is
> the verification on the CPU with uBLAS, which both adds a uBLAS dependency
> and is also rather slow. The current test sizes are pretty much the
> minimum possible, but they still take minutes to complete. Without a
> proper strategy to deal with this, chances are high that we make our test
> system almost unmanageable... Any clever approaches appreciated!

With the current approach, I've noticed that something a bit silly is being
done: products are computed many, many more times than necessary. For each
combination of row/column layouts, A*B has to be computed only once for
full/range/stride. Then C += A*B and C -= A*B can be tested on the GPU
without recomputing A*B on the CPU. Right now, the CPU product is computed
something like 8*27*12 = 2592 times. We could equally well test our GEMM
implementation with only 27*4 = 108 reference computations (all the
full/range/stride combinations for all the transposition possibilities).

Also, the test file is about 800 lines long, which is a bit discouraging to
modify :-p I'll refurbish it using macros and such. As a side note, most
tests could really benefit from using macros. I lost a couple of hours a
few days ago because the vector tests report a failure on the dot product
when the plane rotation is faulty.
There are a couple of similar glitches here and there. Perhaps we should fix
them during the large code refactoring session we've been planning for a
couple of weeks already :-p

Philippe

> Best regards,
> Karli
------------------------------------------------------------------------------
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel