Re: [ViennaCL-devel] zero-padding datastructures...

2013-08-02 Thread Philippe Tillet
Hi hi, 2013/8/2 Karl Rupp > Hi, > > > I've been thinking a bit about dynamically zero-padding > > viennacl::matrix<> for full hardware use ( best bandwidth for BLAS1, > > BLAS2, best performance for BLAS3). > > > > Basically, the big problem arising is that the blocking-parameter is not > > de

Re: [ViennaCL-devel] zero-padding datastructures...

2013-08-02 Thread Karl Rupp
Hey, > > Hmm, I'm not completely sure. > The best GEMM performance is not located "around" (distance-wise in the > parameter space) the sweet spot, generally, since perturbing one > parameter can result in disastrous performance. Yeah, I agree, the sweet spot may not be defined 'distance-wise

[ViennaCL-devel] openmp 4.0

2013-08-02 Thread Evan Bollig
FYI: OpenMP 4.0 specification has been released, which includes support for accelerators, thread affinity, Fortran 2003, etc.: http://www.hpcwire.com/hpcwire/2013-07-31/openmp_40_specification_released_with_significant_new_standard_features.html And CUDA 5.5 has been released: http://www.hpcwir

Re: [ViennaCL-devel] openmp 4.0

2013-08-02 Thread Karl Rupp
Hi Evan, > OpenMP 4.0 specification has been released, which includes support for > accelerators, thread affinity, Fortran 2003, etc.: > > http://www.hpcwire.com/hpcwire/2013-07-31/openmp_40_specification_released_with_significant_new_standard_features.html Thanks! I consider the thread affinity

Re: [ViennaCL-devel] openmp 4.0

2013-08-02 Thread Evan Bollig
I hope it won't take years. I saw a presentation earlier today that they already have spec 5 started and are hoping to have most of it squared away by SC13. -E On Fri, Aug 2, 2013 at 3:20 PM, Karl Rupp wrote: > Hi Evan, > > > OpenMP 4.0 specification has been released, which includes support for

Re: [ViennaCL-devel] openmp 4.0

2013-08-02 Thread Karl Rupp
Hey, > I hope it won't take years. First compiler implementations will be available in no time, sure. However, it will take years until enterprise cluster systems like CentOS have upgraded to these compilers. We still have clusters here with GCC 4.2.x... > I saw a presentation earlier today th

Re: [ViennaCL-devel] Compilation load of matrix-test-*

2013-08-02 Thread Karl Rupp
Hi Phil, the tests are now split into more light-weight units by separating single and double precision. matrix-test was additionally split into row-major and column-major tests. This should now allow you to build with `make -j4` on weaker machines with limited RAM. Best regards, Karli On 0

Re: [ViennaCL-devel] zero-padding datastructures...

2013-08-02 Thread Philippe Tillet
Hi, 2013/8/2 Karl Rupp > Hey, > > > > > >> Hmm, I'm not completely sure. >> The best GEMM performance is not located "around" (distance-wise in the >> parameter space) the sweet spot, generally, since perturbing one >> parameter can result in disastrous performance. >> > > Yeah, I agree, the

Re: [ViennaCL-devel] zero-padding datastructures...

2013-08-02 Thread Karl Rupp
Hi, > A padding of 256 looks pretty expensive to me, resulting in a lot of > unnecessary FLOPs in the worst case. Can you please assemble a list of > all GEMM kernel configuration parameters and their execution times > for the GTX 470, Tesla C2050, HD 7970 and HD 5850? mL, nL, and kL >
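
To make the cost concern about a padding of 256 concrete, here is a minimal sketch of how rounding each GEMM dimension up to a multiple of the padding inflates the FLOP count. This is not ViennaCL code: `padded_size` and `gemm_flop_overhead` are hypothetical helper names, and the sketch assumes that a dense GEMM of an m x k matrix with a k x n matrix costs 2*m*n*k FLOPs and that all three dimensions are padded to the same multiple.

```cpp
#include <cstddef>

// Hypothetical helper (not part of ViennaCL):
// round n up to the next multiple of pad (pad must be > 0).
std::size_t padded_size(std::size_t n, std::size_t pad)
{
  return ((n + pad - 1) / pad) * pad;
}

// Hypothetical helper: ratio of FLOPs on the padded problem to FLOPs on the
// unpadded problem for C = A * B, with A of size m x k and B of size k x n.
// Assumes a cost model of 2*m*n*k FLOPs and the same padding for m, n and k.
double gemm_flop_overhead(std::size_t m, std::size_t n, std::size_t k,
                          std::size_t pad)
{
  double padded = 2.0 * padded_size(m, pad) * padded_size(n, pad)
                      * padded_size(k, pad);
  double plain  = 2.0 * m * n * k;
  return padded / plain;
}
```

Under these assumptions the worst case sits just above a multiple of the padding: for m = n = k = 257 and a padding of 256, every dimension grows to 512, so the padded product performs nearly eight times the FLOPs of the unpadded one. The relative overhead shrinks as the matrices grow.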