Hey,
2014-08-14 22:10 GMT+02:00 Karl Rupp <r...@iue.tuwien.ac.at>:

> Hi,
>
>> The GEMM kernel(s) are getting pretty tricky, with quite a few fallbacks
>> involved. This gets hard to test, so I thought it could be a good idea
>> to discuss this. Basically, here is how it works:
>>
>> A = [A1 A2; A3 A4]
>> B = [B1 B2; B3 B4]
>> C = [C1 C2; C3 C4]
>>
>> where each block is divided according to the corresponding block sizes of
>> the template. For example, A1 is the largest sub-block whose size is a
>> multiple of the tuple (ML, KL), where ML is the number of rows computed by
>> each work group, and KL is the "width step" for computing the inner
>> products. (If the kernel uses local memory, it loads successive blocks of
>> size ML*KL in each work group.)
>>
>> A few kernels are enqueued so that:
>> C1  = A1*B1  [optimized kernel]
>> C1 += A2*B3  [fallback, if needed]
>> C2  = A1*B2  [fallback, if needed]
>> C2 += A2*B4  [fallback, if needed]
>> etc.
>>
>> Basically, one optimized kernel does the bulk of the work, and the other
>> ones do the "clean-up". This works well for full matrices and ranges.
>> When slices are involved, things get more complicated. If the stride is
>> on the non-leading dimension (stride2 for column-major matrices), then it
>> can be incorporated into the optimized kernel (by appending ld *= stride2
>> at the beginning of the kernel). However, if stride1 > 1, then we need to
>> use the fallback kernel. This is a reasonable thing to do: in most
>> applications I know of, only one stride is used at a time (we want a
>> subset of the rows/columns of a given matrix).
>>
>> However, this becomes really messy to test! Basically, I think that, to
>> have an exhaustive enough test suite, we should go for:
>>
>> - Matrices of complicated, arbitrary sizes (143, 284, 395). It is
>>   important to space them by more than 128, to make sure that A1, B1
>>   and C1 are not square.
>> - Ranges of similar complicated sizes.
>> - An "optimized" range, e.g. (128, 256, 384).
>> - Matrix row-wise slices, matrix column-wise slices, and matrix slices
>>   in both directions.
>
> As far as I can tell, all you need to do is to adjust the matrix sizes in
> the existing GEMM tests? It covers all this already. What am I missing?

Well, essentially it's about readjusting the sizes, yes. But the tests
should be slightly different and allow for multiple passes over multiple
size tuples.

>> I am ready to rewrite the GEMM tests accordingly, but any thoughts on
>> the procedure would be appreciated!
>
> The GEMM tests are quite an issue already, because they consume a lot of
> time, particularly on weaker systems. A substantial part of the problem is
> the verification on the CPU with uBLAS, which both adds a uBLAS dependency
> and is also rather slow. The current test sizes are pretty much the
> minimum possible, but they still take minutes to complete. Without a
> proper strategy to deal with this, chances are high that we make our test
> system almost unmanageable... Any clever approaches appreciated!

With the current approach, I've noticed that something a bit silly is being
done: products are computed many, many more times than necessary. For each
combination of row/column layouts, A*B has to be computed only once for
full/range/stride. Then C += A*B and C -= A*B can be tested on the GPU
without recomputing A*B on the CPU. Right now, the CPU product is computed
something like 8*27*12 = 2592 times. We could equally well test our GEMM
implementation with only 27*4 = 108 reference computations (all the
full/range/stride combinations for all the transposition possibilities).

Also, the test file is about 800 lines long, which is a bit discouraging to
modify :-p I'll refurbish it using macros and such. As a side note, most
tests could really benefit from using macros. I lost a couple of hours a
few days ago because the vector tests report a failure on the dot product
when the plane rotation is faulty.
There are a couple of similar glitches here and there. Perhaps we should fix
them during the large code refactoring session we've been planning for a
couple of weeks already :-p

Philippe

> Best regards,
> Karli
------------------------------------------------------------------------------
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel