Hi again,

On Tue, Feb 14, 2012 at 09:51:15AM -0500, Andreas Kloeckner wrote:
> On Tue, 14 Feb 2012 09:51:03 +0100, Holger Rapp <[email protected]> wrote:
> [snipped code snippet]
> > > OpenCL 1.2 spec, 6.9b):
> > >
> > > An image type (image2d_t, image3d_t, image2d_array_t, image1d_t,
> > > image1d_buffer_t or image1d_array_t) can only be used as the type of a
> > > function argument. An image function argument cannot be modified.
> > > Elements of an image can only be accessed using built-in functions
> > > described in section 6.12.14. An image type cannot be used to declare a
> > > variable, a structure or union field, an array of images, a pointer to
> > > an image, or the return type of a function. An image type cannot be
> > > used with the __private, __local and __constant address space
> > > qualifiers. The image3d_t type cannot be used with the __write_only
> > > access qualifier unless the cl_khr_3d_image_writes extension is
> > > enabled. An image type cannot be used with the __read_write access
> > > qualifier, which is reserved for future use.
> >
> > That cleans that up. Much obliged for the answer. This leads to a direct
> > followup though: is there a best practice to pass a variable amount of
> > images to a kernel?
>
> Not sure that's possible. I think that number needs to be known at
> compile time. (My response to that would be, well, simply compile and
> cache the kernel on-demand.)
I will try that.
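Roughly along these lines, I suppose (an untested sketch: the helper name and the cache dict are made up, and the generated testme kernel has an empty placeholder body; the point is only caching one compiled program per image count):

import pyopencl as cl

_prg_cache = {}

def kernel_for_n_images(ctx, n):
    """Build and cache a kernel that takes exactly n read-only 2D images."""
    if n not in _prg_cache:
        # Generate the argument list for this image count and compile once.
        args = ", ".join("__read_only image2d_t img%d" % i for i in range(n))
        src = "void __kernel testme(%s) { /* placeholder body */ }" % args
        _prg_cache[n] = cl.Program(ctx, src).build()
    return _prg_cache[n].testme

The number of images is then still fixed at compile time, as you say, but each count only pays the build cost once.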
> > > > 2) I pass the params_buf as __constant to my kernel. I have some
> > > > functions doing arithmetic with DualQuaternions and I have to first
> > > > copy all data from my structure before working with them, e.g.
> > > >
> > > >   void conjugate(const DualQuaternion * a, DualQuaternion * rv);
> > > >
> > > >   DualQuaternion rv;
> > > >   conjugate(&measurement->w2c, &rv);
> > > >
> > > > gives this error:
> > > >
> > > >   passing 'DualQuaternion __attribute__((address_space(2)))const *'
> > > >   discards qualifiers, expected 'DualQuaternion const *'
> > > >
> > > > while
> > > >
> > > >   DualQuaternion temp = measurement->w2c;
> > > >   conjugate(&temp, &rv);
> > > >
> > > > is working okay.
> > > >
> > > > I understand the reason for this, I think: functions need to work in
> > > > one address space only. But is there a way to pass my structures to
> > > > my kernel so that the explicit copy is not needed?
> > >
> > > My advice would be to pass the arguments to conjugate() by value and use
> > > a return value. This avoids issues of address space matching
> > > (i.e. declaring __constant args in conjugate()), and any half-way smart
> > > compiler will generate equivalent code anyway.
> >
> > My profiling shows that this is not the case: passing const DualQuaternion*
> > is roughly 20% faster than passing const DualQuaternion. Maybe I need to
> > activate optimization? Would that be cl.Program.build(["-O2"])?
>
> Good question. If you're on Nv, maybe start by looking at the PTX.
> (prg.binaries[0])

I guess this is beyond my capabilities. I am actually on NVIDIA hardware, but
using Apple's OpenCL. I will just turn optimization on and keep monitoring my
performance.
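For the record, this is roughly what I mean by turning optimization on (an untested sketch: as far as I can tell "-O2" is not among the standard OpenCL build options, so -cl-mad-enable and -cl-fast-relaxed-math merely stand in as examples here, and the PTX dump is of course only meaningful on NVIDIA):

import pyopencl as cl

ctx = cl.create_some_context()

src = """
void __kernel testme(__read_only image2d_t a) {
}
"""

# Build options are passed as a plain string; these two are standard
# OpenCL options, chosen only as an example of "turning optimization on".
prg = cl.Program(ctx, src).build(options="-cl-mad-enable -cl-fast-relaxed-math")

# On NVIDIA the program "binary" is PTX text, so this shows what the
# compiler actually generated (per the prg.binaries[0] suggestion).
print(prg.binaries[0])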
I already ran into a new problem. The following program fails for me on one box (Linux, 64 bit) but not on the other (Apple), and I would like to understand why it fails on the Linux box. Below are the sample program, the output when it is run, and the properties of the Linux card. I see no apparent reason why the image should not work - it is only about 30 MB (1200 x 1600 pixels x 4 channels x 4 bytes = 30,720,000 bytes). Can other programs influence the amount of memory available on the card?
------------------- SNIP -------------------
import pyopencl as cl
import numpy as np
def test():
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    sgmf = np.empty((1200, 1600, 4), dtype=np.dtype(np.float32, align=True))
    sgmf_buf = cl.image_from_array(ctx, sgmf, 4)

    prg = """
    void __kernel testme(__read_only image2d_t a) {
    }
    """
    prg = cl.Program(ctx, prg).build()
    prg.testme(queue, (16,), None, sgmf_buf)

test()
------------------- SNAP -------------------
crash:
------------------- SNIP -------------------
Traceback (most recent call last):
  File "./test.py", line 24, in <module>
    test()
  File "./test.py", line 22, in test
    prg.testme(queue, (16,), None, sgmf_buf)
  File "/usr/local/lib/python2.7/dist-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/__init__.py", line 452, in kernel_call
    global_offset, wait_for, g_times_l=g_times_l)
pyopencl.MemoryError: clEnqueueNDRangeKernel failed: mem object allocation failure
------------------- SNAP -------------------
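As a sanity check against the limits in the dump below, a small sketch (the helper name is made up; the attributes are the standard pyopencl.Device info properties, and 16 bytes per pixel assumes the RGBA float32 image from above):

import pyopencl as cl

def print_image_limits(dev, width, height, bytes_per_pixel=16):
    """Print the device limits relevant to allocating a 2D image."""
    print("image size:         %d bytes" % (width * height * bytes_per_pixel))
    print("MAX_MEM_ALLOC_SIZE: %d" % dev.max_mem_alloc_size)
    print("GLOBAL_MEM_SIZE:    %d" % dev.global_mem_size)
    print("IMAGE_SUPPORT:      %d" % dev.image_support)
    print("IMAGE2D_MAX_WIDTH:  %d" % dev.image2d_max_width)
    print("IMAGE2D_MAX_HEIGHT: %d" % dev.image2d_max_height)

ctx = cl.create_some_context()
for dev in ctx.devices:
    print_image_limits(dev, 1600, 1200)

Going by the numbers below, 1600 x 1200 x 16 bytes = 30,720,000 bytes, which is well under MAX_MEM_ALLOC_SIZE (134,217,728) and within the IMAGE2D limits, so the documented limits do not explain the failure by themselves.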
Output of dump-properties from the example directory:
------------------- SNIP -------------------
===========================================================================
<pyopencl.Platform 'NVIDIA CUDA' at 0x128c3e0>
===========================================================================
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
NAME: NVIDIA CUDA
PROFILE: FULL_PROFILE
VENDOR: NVIDIA Corporation
VERSION: OpenCL 1.1 CUDA 4.2.1
---------------------------------------------------------------------------
<pyopencl.Device 'GeForce 8400 GS' on 'NVIDIA CUDA' at 0x1244080>
---------------------------------------------------------------------------
ADDRESS_BITS: 32
AFFINITY_DOMAINS_EXT: <error>
AVAILABLE: 1
COMPILER_AVAILABLE: 1
COMPUTE_CAPABILITY_MAJOR_NV: 1
COMPUTE_CAPABILITY_MINOR_NV: 1
DOUBLE_FP_CONFIG: 0
DRIVER_VERSION: 295.20
ENDIAN_LITTLE: 1
ERROR_CORRECTION_SUPPORT: 0
EXECUTION_CAPABILITIES: 1
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics
GLOBAL_MEM_CACHELINE_SIZE: 0
GLOBAL_MEM_CACHE_SIZE: 0
GLOBAL_MEM_CACHE_TYPE: 0
GLOBAL_MEM_SIZE: 268107776
GPU_OVERLAP_NV: 1
HALF_FP_CONFIG: <error>
HOST_UNIFIED_MEMORY: 0
IMAGE2D_MAX_HEIGHT: 16383
IMAGE2D_MAX_WIDTH: 4096
IMAGE3D_MAX_DEPTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_WIDTH: 2048
IMAGE_SUPPORT: 1
INTEGRATED_MEMORY_NV: 0
KERNEL_EXEC_TIMEOUT_NV: 1
LOCAL_MEM_SIZE: 16384
LOCAL_MEM_TYPE: 1
MAX_CLOCK_FREQUENCY: 918
MAX_COMPUTE_UNITS: 2
MAX_CONSTANT_ARGS: 9
MAX_CONSTANT_BUFFER_SIZE: 65536
MAX_MEM_ALLOC_SIZE: 134217728
MAX_PARAMETER_SIZE: 4352
MAX_READ_IMAGE_ARGS: 128
MAX_SAMPLERS: 16
MAX_WORK_GROUP_SIZE: 512
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_ITEM_SIZES: [512, 512, 64]
MAX_WRITE_IMAGE_ARGS: 8
MEM_BASE_ADDR_ALIGN: 2048
MIN_DATA_TYPE_ALIGN_SIZE: 128
NAME: GeForce 8400 GS
NATIVE_VECTOR_WIDTH_CHAR: 1
NATIVE_VECTOR_WIDTH_DOUBLE: 0
NATIVE_VECTOR_WIDTH_FLOAT: 1
NATIVE_VECTOR_WIDTH_HALF: 0
NATIVE_VECTOR_WIDTH_INT: 1
NATIVE_VECTOR_WIDTH_LONG: 1
NATIVE_VECTOR_WIDTH_SHORT: 1
OPENCL_C_VERSION: OpenCL C 1.0
PARENT_DEVICE_EXT: <error>
PARTITION_STYLE_EXT: <error>
PARTITION_TYPES_EXT: <error>
PLATFORM: <pyopencl.Platform 'NVIDIA CUDA' at 0x128c3e0>
PREFERRED_VECTOR_WIDTH_CHAR: 1
PREFERRED_VECTOR_WIDTH_DOUBLE: 0
PREFERRED_VECTOR_WIDTH_FLOAT: 1
PREFERRED_VECTOR_WIDTH_HALF: 0
PREFERRED_VECTOR_WIDTH_INT: 1
PREFERRED_VECTOR_WIDTH_LONG: 1
PREFERRED_VECTOR_WIDTH_SHORT: 1
PROFILE: FULL_PROFILE
PROFILING_TIMER_OFFSET_AMD: <error>
PROFILING_TIMER_RESOLUTION: 1000
QUEUE_PROPERTIES: 3
REFERENCE_COUNT_EXT: <error>
REGISTERS_PER_BLOCK_NV: 8192
SINGLE_FP_CONFIG: 62
TYPE: 4
VENDOR: NVIDIA Corporation
VENDOR_ID: 4318
VERSION: OpenCL 1.0 CUDA
WARP_SIZE_NV: 32
IMAGE2D READ_ONLY FORMATS: R-F, R-HALF_F, R-UN8, R-UN16, R-SN16, R-S8, R-S16, R-S32, R-U8, R-U16, R-U32, A-F, A-HALF_F, A-UN8, A-UN16, A-SN16, A-S8, A-S16, A-S32, A-U8, A-U16, A-U32, RG-F, RG-HALF_F, RG-UN8, RG-UN16, RG-SN16, RG-S8, RG-S16, RG-S32, RG-U8, RG-U16, RG-U32, RA-F, RA-HALF_F, RA-UN8, RA-UN16, RA-SN16, RA-S8, RA-S16, RA-S32, RA-U8, RA-U16, RA-U32, RGBA-F, RGBA-HALF_F, RGBA-UN8, RGBA-UN16, RGBA-SN16, RGBA-S8, RGBA-S16, RGBA-S32, RGBA-U8, RGBA-U16, RGBA-U32, BGRA-UN8, BGRA-S8, BGRA-U8, <unknown channel order 0x10b7>-UN8, <unknown channel order 0x10b7>-S8, <unknown channel order 0x10b7>-U8, INTENSITY-F, INTENSITY-HALF_F, INTENSITY-UN8, INTENSITY-UN16, INTENSITY-SN16, LUMINANCE-F, LUMINANCE-HALF_F, LUMINANCE-UN8, LUMINANCE-UN16, LUMINANCE-SN16
IMAGE3D READ_ONLY FORMATS: R-F, R-HALF_F, R-UN8, R-UN16, R-SN16, R-S8, R-S16, R-S32, R-U8, R-U16, R-U32, A-F, A-HALF_F, A-UN8, A-UN16, A-SN16, A-S8, A-S16, A-S32, A-U8, A-U16, A-U32, RG-F, RG-HALF_F, RG-UN8, RG-UN16, RG-SN16, RG-S8, RG-S16, RG-S32, RG-U8, RG-U16, RG-U32, RA-F, RA-HALF_F, RA-UN8, RA-UN16, RA-SN16, RA-S8, RA-S16, RA-S32, RA-U8, RA-U16, RA-U32, RGBA-F, RGBA-HALF_F, RGBA-UN8, RGBA-UN16, RGBA-SN16, RGBA-S8, RGBA-S16, RGBA-S32, RGBA-U8, RGBA-U16, RGBA-U32, BGRA-UN8, BGRA-S8, BGRA-U8, <unknown channel order 0x10b7>-UN8, <unknown channel order 0x10b7>-S8, <unknown channel order 0x10b7>-U8, INTENSITY-F, INTENSITY-HALF_F, INTENSITY-UN8, INTENSITY-UN16, INTENSITY-SN16, LUMINANCE-F, LUMINANCE-HALF_F, LUMINANCE-UN8, LUMINANCE-UN16, LUMINANCE-SN16
------------------- SNAP -------------------

Thanks again.
Holger
