Hi again, thanks, the compilation problem is fixed. Unfortunately, there's still the invalid work group size error showing up. Output from viennacl-info:
Address Bits: 32
Available: 1
Compiler Available: 1
Endian Little: 1
Error Correction Support: 0
Execution Capabilities: CL_EXEC_KERNEL
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64
Global Mem Cache Size: 0 Bytes
Global Mem Cache Type: CL_NONE
Global Mem Cacheline Size: 0 Bytes
Global Mem Size: 1073414144 Bytes
Host Unified Memory: 0
Image Support: 1
Image2D Max Height: 16383
Image2D Max Width: 4096
Image3D Max Depth: 2048
Image3D Max Height: 2048
Image3D Max Width: 2048
Local Mem Size: 16384 Bytes
Local Mem Type: CL_LOCAL
Max Clock Frequency: 1476 MHz
Max Compute Units: 30
Max Constant Args: 9
Max Constant Buffer Size: 65536 Bytes
Max Mem Alloc Size: 268353536 Bytes
Max Parameter Size: 4352 Bytes
Max Read Image Args: 128
Max Samplers: 16
Max Work Group Size: 512
Max Work Item Dimensions: 3
Max Work Item Sizes: 512 512 64
Max Write Image Args: 8
Mem Base Addr Align: 2048
Min Data Type Align Size: 128 Bytes
Name: GeForce GTX 285
Native Vector Width char: 1
Native Vector Width short: 1
Native Vector Width int: 1
Native Vector Width long: 1
Native Vector Width float: 1
Native Vector Width double: 1
Native Vector Width half: 0
OpenCL C Version: OpenCL C 1.1
Platform: 0xbf45c0
Preferred Vector Width char: 1
Preferred Vector Width short: 1
Preferred Vector Width int: 1
Preferred Vector Width long: 1
Preferred Vector Width float: 1
Preferred Vector Width double: 1
Preferred Vector Width half: 0
Profile: FULL_PROFILE
Profiling Timer Resolution: 1000 ns
Queue Properties: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_QUEUE_PROFILING_ENABLE
Single FP Config: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
Type: GPU
Vendor: NVIDIA Corporation
Vendor ID: 4318
Version: OpenCL 1.0 CUDA
Driver Version: 304.43

Maybe the work group size exceeds 512? It works well on the GTX 470, though...

Best regards,
Karli

On 08/13/2013 11:01 AM, Philippe Tillet wrote:
> Hi hi,
>
> 2013/8/13 Karl Rupp <r...@iue.tuwien.ac.at>
>
> > Hi,
> >
> > > On GPUs with 16kB of shared memory (e.g. GTX 285), the generated
> > > GEMM kernels now exceed the available memory:
> > >
> > > Log: ptxas error : Entry function 'kernel_0x207f4b0_0' uses too
> > > much shared data (0x40a0 bytes + 0x10 bytes system, 0x4000 max)
> > >
> > > This is because of
> > >   __local float lhs_buf[4128];
> > > which is more than the total 16kB of shared memory (already ignoring
> > > some overhead for kernel parameters, etc.). Phil, could you please
> > > cut this default down to only half the work group size, i.e. half
> > > the shared memory?
> > >
> > > I also got a CL_INVALID_WORK_GROUP_SIZE in
> > > blas3_prod_double-test-opencl, but this may be a follow-up issue.
> > >
> > > Okay, I will do that :) This brings back another thing I wanted to
> > > discuss. Since any device for a given vendor can have 16kB of shared
> > > memory, this means that the vendor defaults will actually have to be
> > > very conservative. A way to solve this issue is to have some
> > > "generation defaults"... the problem is that it is pretty difficult
> > > to achieve without parsing the device name, which is a bit dirty in
> > > my opinion... Do you think this is a good idea?
> >
> > We can directly query the available local device memory (which is the
> > reason why I added all this buffering to the device class). Am I missing
> > something?
>
> Yes, we could. But having the combination {vendor, local memory} seems a
> bit weird to me, I think {vendor, generation} makes more sense, don't
> you think?
>
> Best regards,
> Philippe
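
To make the two limits discussed in this thread concrete, here is a minimal sketch of the checks involved, using the numbers reported by viennacl-info for the GTX 285 (16384 bytes of local memory, max work group size 512, max work item sizes 512 512 64). The names `DeviceLimits`, `gtx285_limits`, `fits_local_memory` and `valid_work_group` are hypothetical helpers for illustration only, not ViennaCL API. In real code the values would be queried via `clGetDeviceInfo` rather than hard-coded.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not part of ViennaCL). In real code these values
// would come from clGetDeviceInfo with CL_DEVICE_LOCAL_MEM_SIZE,
// CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_DEVICE_MAX_WORK_ITEM_SIZES.
struct DeviceLimits {
  std::size_t local_mem_bytes;
  std::size_t max_work_group_size;
  std::size_t max_work_item_sizes[3];
};

// Limits copied from the viennacl-info output for the GeForce GTX 285.
inline DeviceLimits gtx285_limits() {
  return DeviceLimits{16384, 512, {512, 512, 64}};
}

// True if a __local buffer of `elements` items of `element_bytes` each fits
// into local memory, optionally leaving `reserved_bytes` free for overhead.
inline bool fits_local_memory(const DeviceLimits& dev, std::size_t elements,
                              std::size_t element_bytes,
                              std::size_t reserved_bytes = 0) {
  assert(element_bytes > 0);
  return elements * element_bytes + reserved_bytes <= dev.local_mem_bytes;
}

// True if the local sizes respect both the per-dimension limits and the
// total work group size limit of the device.
inline bool valid_work_group(const DeviceLimits& dev, std::size_t lx,
                             std::size_t ly = 1, std::size_t lz = 1) {
  if (lx == 0 || ly == 0 || lz == 0) return false;
  if (lx > dev.max_work_item_sizes[0] ||
      ly > dev.max_work_item_sizes[1] ||
      lz > dev.max_work_item_sizes[2]) {
    return false;
  }
  return lx * ly * lz <= dev.max_work_group_size;
}
```

With these numbers, `fits_local_memory(gtx285_limits(), 4128, 4)` is false, since 4128 floats take 16512 bytes against 16384 available, which matches the ptxas complaint. A 32x32 local size (chosen here purely for illustration, not taken from the generated kernels) would fail `valid_work_group` because 1024 exceeds 512. Note that the per-kernel limit reported by `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE` can be lower than the device maximum, and a global size that is not a multiple of the local size also yields CL_INVALID_WORK_GROUP_SIZE on OpenCL 1.x, so these checks are necessary rather than sufficient.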
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel