Hi hi,
2013/8/2 Karl Rupp
> Hi,
>
> > I've been thinking a bit about dynamically zero-padding
> > viennacl::matrix<> for full hardware use (best bandwidth for BLAS1,
> > BLAS2, best performance for BLAS3).
> >
> > Basically, the big problem arising is that the blocking-parameter is not
> > de
Hey,
>
> Hmm, I'm not completely sure.
> The best GEMM performance is generally not located "around" (distance-wise
> in the parameter space) the sweet spot, since perturbing a single
> parameter can result in disastrous performance.
Yeah, I agree; the sweet spot may not be defined 'distance-wise
FYI:
OpenMP 4.0 specification has been released, which includes support for
accelerators, thread affinity, Fortran 2003, etc.:
http://www.hpcwire.com/hpcwire/2013-07-31/openmp_40_specification_released_with_significant_new_standard_features.html
And CUDA 5.5 has been released:
http://www.hpcwir
Hi Evan,
> OpenMP 4.0 specification has been released, which includes support for
> accelerators, thread affinity, Fortran 2003, etc.:
>
> http://www.hpcwire.com/hpcwire/2013-07-31/openmp_40_specification_released_with_significant_new_standard_features.html
Thanks! I consider the thread affinity
I hope it won't take years. I saw a presentation earlier today saying
that they have already started on spec 5 and are hoping to have most of
it squared away by SC13.
-E
On Fri, Aug 2, 2013 at 3:20 PM, Karl Rupp wrote:
> Hi Evan,
>
> > OpenMP 4.0 specification has been released, which includes support for
Hey,
> I hope it won't take years.
Sure, the first compiler implementations will be available in no time.
However, it will take years until enterprise distributions like CentOS
running on our clusters have upgraded to these compilers. We still have
clusters here running GCC 4.2.x...
> I saw a presentation earlier today th
Hi Phil,
the tests are now split into lighter-weight units by separating single
and double precision. In addition, `matrix-test` has been split into
row-major and column-major tests. This should now allow you to build
with `make -j4` on weaker machines with limited RAM.
Best regards,
Karli
On 0
Hi,
2013/8/2 Karl Rupp
> Hey,
>
>> Hmm, I'm not completely sure.
>> The best GEMM performance is generally not located "around" (distance-wise
>> in the parameter space) the sweet spot, since perturbing a single
>> parameter can result in disastrous performance.
>>
>
> Yeah, I agree, the
Hi,
> A padding of 256 looks pretty expensive to me, resulting in a lot of
> unnecessary FLOPs in the worst case. Can you please assemble a list of
> all GEMM kernel configuration parameters and their execution times
> for the GTX 470, Tesla C2050, HD 7970 and HD 5850? mL, nL, and kL
>