Hi,
> We actually need two sets of files: One for dumping the benchmark
> results, one for holding the 'best' parameter configuration. For
> dumping results, we probably want something more lightweight than XML:
> - JSON
> - Just CSV files with a metadata section, e.g.
> #
Hi,
2013/8/3 Karl Rupp
> Hi,
>
>> A padding of 256 looks pretty expensive to me, resulting in a lot of
>> unnecessary FLOPs in the worst case. Can you please assemble a list of
>> all GEMM kernel configuration parameters and their execution times
>> for the GTX 470, Tesla C2050, H
Hi,
> A padding of 256 looks pretty expensive to me, resulting in a lot of
> unnecessary FLOPs in the worst case. Can you please assemble a list of
> all GEMM kernel configuration parameters and their execution times
> for the GTX 470, Tesla C2050, HD 7970 and HD 5850? mL, nL, and kL
>
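
To put a number on the worst case mentioned above, here is a minimal sketch. It assumes each GEMM dimension is simply rounded up to the next multiple of the padding and that the kernel computes on the full padded operands. The helper names padded_size and gemm_padding_overhead are hypothetical and not part of ViennaCL.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a ViennaCL function): round n up to the next
// multiple of `padding`.
inline std::size_t padded_size(std::size_t n, std::size_t padding)
{
  assert(padding > 0);
  return ((n + padding - 1) / padding) * padding;
}

// Ratio of FLOPs spent on the padded product to FLOPs of the unpadded
// product for C (m x n) = A (m x k) * B (k x n).
// Assumes m, n and k are all nonzero.
inline double gemm_padding_overhead(std::size_t m, std::size_t n,
                                    std::size_t k, std::size_t padding)
{
  double padded = 2.0 * padded_size(m, padding)
                      * padded_size(n, padding)
                      * padded_size(k, padding);
  double exact  = 2.0 * m * n * k;
  return padded / exact;
}
```

Under these assumptions, a 257 x 257 x 257 product with a padding of 256 is rounded up to 512 in every dimension, so gemm_padding_overhead(257, 257, 257, 256) comes out at almost 8. For sizes that are already multiples of the padding, the ratio is exactly 1.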
Hi,
2013/8/2 Karl Rupp
> Hey,
>
>> Hmm, I'm not completely sure.
>> The best GEMM configurations are generally not located "around"
>> (distance-wise in the parameter space) the sweet spot, since perturbing
>> one parameter can result in disastrous performance.
>>
>
> Yeah, I agree, the
Hey,
>
> Hmm, I'm not completely sure.
> The best GEMM configurations are generally not located "around"
> (distance-wise in the parameter space) the sweet spot, since perturbing
> one parameter can result in disastrous performance.
Yeah, I agree, the sweet spot may not be defined 'distance-wise
Hi hi,
2013/8/2 Karl Rupp
> Hi,
>
> > I've been thinking a bit about dynamically zero-padding
> > viennacl::matrix<> for full hardware use (best bandwidth for BLAS1,
> > BLAS2, best performance for BLAS3).
> >
> > Basically, the big problem arising is that the blocking parameter is not
> > de
Hi,
> I've been thinking a bit about dynamically zero-padding
> viennacl::matrix<> for full hardware use (best bandwidth for BLAS1,
> BLAS2, best performance for BLAS3).
>
> Basically, the big problem arising is that the blocking parameter is not
> dependent on the hardware or the matrix, but ra
Hi everybody,
I've been thinking a bit about dynamically zero-padding viennacl::matrix<>
for full hardware use (best bandwidth for BLAS1, BLAS2, best performance
for BLAS3).
Basically, the big problem arising is that the blocking parameter is not
dependent on the hardware or the matrix, but rath