Hi again,

On Tue, Feb 14, 2012 at 09:51:15AM -0500, Andreas Kloeckner wrote:
On Tue, 14 Feb 2012 09:51:03 +0100, Holger Rapp <[email protected]> wrote:
> [snipped code snippet]
>OpenCL 1.2 spec, 6.9b):
>
>An image type (image2d_t, image3d_t, image2d_array_t, image1d_t,
>image1d_buffer_t or image1d_array_t) can only be used as the type of a
>function argument. An image function argument cannot be modified. Elements of 
an
>image can only be accessed using built-in functions described in section 
6.12.14.
>An image type cannot be used to declare a variable, a structure or union 
field, an array of
>images, a pointer to an image, or the return type of a function. An image type 
cannot be
>used with the __private, __local and __constant address space qualifiers. The
>image3d_t type cannot be used with the __write_only access qualifier unless the
>cl_khr_3d_image_writes extension is enabled. An image type cannot be used
>with the __read_write access qualifer which is reserved for future use.
That cleans that up. Much obliged for the answer. This leads to a direct
followup though: is there a best practice to pass a variable amount of
images to a kernel?

Not sure that's possible. I think that number needs to be known
at compile time. (My response to that would be, well simply compile and
cache the kernel on-demand.)
I will try that.


>> 2) I pass the params_buf as __constant to my kernel. I have some
>> functions doing arithmetic with DualQuaternions and I have to first copy
>> all data from my structure before working with them: e.g.
>>
>> void conjugate(const DualQuaternion * a, DualQuaternion * rv);
>>
>> DualQuaternion rv;
>> conjugate(&measurement->w2c, &rv);
>> Gives this error:
>> passing 'DualQuaternion __attribute__((address_space(2)))const *' discards 
qualifiers, expected 'DualQuaternion const *'
>>
>> DualQuaternion temp = measurement->w2c;
>> conjugate(&w2c, &rv);
>> is working okay.
>>
>> I understand the reason for this I think: functions need to work in one
>> address space only. But is there a way to pass my structures to my kernel
>> that the explicit copy is not needed?
>
>My advice would be to pass the arguments to conjugate() by value and use
>a return value. This avoids issues of address space matching
>(i.e. declaring __constant args in conjugate()), and any half-way smart
>compiler will generate equivalent code anyway.

My profiling shoes that this not the case. Passing const
DualQuaternion* is roughly 20% faster than passing const
DualQuaternion. Maybe I need to activate optimization or so? Would
that be cl.Program.build(["-O2"])?
Good question. If you're on Nv, maybe start by looking at the
PTX. (prg.binaries[0])
I guess this is beyond my capabilities. I am actually on nvidia but using Apples OpenCL. I will just turn optimization on and keep monitoring my performance.

I already ran into a new problem. The following program fails for me on one box (Linux 64 bit) but not the other (Apple). I'd like to know why it fails on the Linux box. Below is the sample program, the output when it is ran and the properties of the linux card. I see no apparent reason why the image does not work - it only has ~30 MB. Can other programs influence the amount of memory available on the card?

------------------- SNIP -------------------
import pyopencl as cl
import numpy as np

def test():
    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    sgmf = np.empty((1200,1600,4), dtype=np.dtype(np.float32, align=True))
    sgmf_buf = cl.image_from_array(ctx, sgmf, 4)

    prg ="""
void __kernel testme(__read_only image2d_t a) {
}
"""
    prg = cl.Program(ctx, prg).build()
    prg.testme(queue, (16,), None, sgmf_buf)

test()
------------------- SNAP -------------------

crash:
------------------- SNIP -------------------
Traceback (most recent call last):
  File "./test.py", line 24, in <module>
    test()
  File "./test.py", line 22, in test
    prg.testme(queue, (16,), None, sgmf_buf)
  File 
"/usr/local/lib/python2.7/dist-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
 line 452, in kernel_call
    global_offset, wait_for, g_times_l=g_times_l)
pyopencl.MemoryError: clEnqueueNDRangeKernel failed: mem object allocation 
failure
------------------- SNAP -------------------

output of dump-properties from the example directory.

------------------- SNIP -------------------
===========================================================================
<pyopencl.Platform 'NVIDIA CUDA' at 0x128c3e0>
===========================================================================
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll NAME: NVIDIA CUDA
PROFILE: FULL_PROFILE
VENDOR: NVIDIA Corporation
VERSION: OpenCL 1.1 CUDA 4.2.1
---------------------------------------------------------------------------
<pyopencl.Device 'GeForce 8400 GS' on 'NVIDIA CUDA' at 0x1244080>
---------------------------------------------------------------------------
ADDRESS_BITS: 32
AFFINITY_DOMAINS_EXT: <error>
AVAILABLE: 1
COMPILER_AVAILABLE: 1
COMPUTE_CAPABILITY_MAJOR_NV: 1
COMPUTE_CAPABILITY_MINOR_NV: 1
DOUBLE_FP_CONFIG: 0
DRIVER_VERSION: 295.20
ENDIAN_LITTLE: 1
ERROR_CORRECTION_SUPPORT: 0
EXECUTION_CAPABILITIES: 1
EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics GLOBAL_MEM_CACHELINE_SIZE: 0
GLOBAL_MEM_CACHE_SIZE: 0
GLOBAL_MEM_CACHE_TYPE: 0
GLOBAL_MEM_SIZE: 268107776
GPU_OVERLAP_NV: 1
HALF_FP_CONFIG: <error>
HOST_UNIFIED_MEMORY: 0
IMAGE2D_MAX_HEIGHT: 16383
IMAGE2D_MAX_WIDTH: 4096
IMAGE3D_MAX_DEPTH: 2048
IMAGE3D_MAX_HEIGHT: 2048
IMAGE3D_MAX_WIDTH: 2048
IMAGE_SUPPORT: 1
INTEGRATED_MEMORY_NV: 0
KERNEL_EXEC_TIMEOUT_NV: 1
LOCAL_MEM_SIZE: 16384
LOCAL_MEM_TYPE: 1
MAX_CLOCK_FREQUENCY: 918
MAX_COMPUTE_UNITS: 2
MAX_CONSTANT_ARGS: 9
MAX_CONSTANT_BUFFER_SIZE: 65536
MAX_MEM_ALLOC_SIZE: 134217728
MAX_PARAMETER_SIZE: 4352
MAX_READ_IMAGE_ARGS: 128
MAX_SAMPLERS: 16
MAX_WORK_GROUP_SIZE: 512
MAX_WORK_ITEM_DIMENSIONS: 3
MAX_WORK_ITEM_SIZES: [512, 512, 64]
MAX_WRITE_IMAGE_ARGS: 8
MEM_BASE_ADDR_ALIGN: 2048
MIN_DATA_TYPE_ALIGN_SIZE: 128
NAME: GeForce 8400 GS
NATIVE_VECTOR_WIDTH_CHAR: 1
NATIVE_VECTOR_WIDTH_DOUBLE: 0
NATIVE_VECTOR_WIDTH_FLOAT: 1
NATIVE_VECTOR_WIDTH_HALF: 0
NATIVE_VECTOR_WIDTH_INT: 1
NATIVE_VECTOR_WIDTH_LONG: 1
NATIVE_VECTOR_WIDTH_SHORT: 1
OPENCL_C_VERSION: OpenCL C 1.0 PARENT_DEVICE_EXT: <error>
PARTITION_STYLE_EXT: <error>
PARTITION_TYPES_EXT: <error>
PLATFORM: <pyopencl.Platform 'NVIDIA CUDA' at 0x128c3e0>
PREFERRED_VECTOR_WIDTH_CHAR: 1
PREFERRED_VECTOR_WIDTH_DOUBLE: 0
PREFERRED_VECTOR_WIDTH_FLOAT: 1
PREFERRED_VECTOR_WIDTH_HALF: 0
PREFERRED_VECTOR_WIDTH_INT: 1
PREFERRED_VECTOR_WIDTH_LONG: 1
PREFERRED_VECTOR_WIDTH_SHORT: 1
PROFILE: FULL_PROFILE
PROFILING_TIMER_OFFSET_AMD: <error>
PROFILING_TIMER_RESOLUTION: 1000
QUEUE_PROPERTIES: 3
REFERENCE_COUNT_EXT: <error>
REGISTERS_PER_BLOCK_NV: 8192
SINGLE_FP_CONFIG: 62
TYPE: 4
VENDOR: NVIDIA Corporation
VENDOR_ID: 4318
VERSION: OpenCL 1.0 CUDA
WARP_SIZE_NV: 32
IMAGE2D READ_ONLY FORMATS: R-F, R-HALF_F, R-UN8, R-UN16, R-SN16, R-S8, R-S16, R-S32, R-U8, R-U16, 
R-U32, A-F, A-HALF_F, A-UN8, A-UN16, A-SN16, A-S8, A-S16, A-S32, A-U8, A-U16, A-U32, RG-F, 
RG-HALF_F, RG-UN8, RG-UN16, RG-SN16, RG-S8, RG-S16, RG-S32, RG-U8, RG-U16, RG-U32, RA-F, 
RA-HALF_F, RA-UN8, RA-UN16, RA-SN16, RA-S8, RA-S16, RA-S32, RA-U8, RA-U16, RA-U32, RGBA-F, 
RGBA-HALF_F, RGBA-UN8, RGBA-UN16, RGBA-SN16, RGBA-S8, RGBA-S16, RGBA-S32, RGBA-U8, RGBA-U16, 
RGBA-U32, BGRA-UN8, BGRA-S8, BGRA-U8, <unknown channel order 0x10b7>-UN8, <unknown 
channel order 0x10b7>-S8, <unknown channel order 0x10b7>-U8, INTENSITY-F, 
INTENSITY-HALF_F, INTENSITY-UN8, INTENSITY-UN16, INTENSITY-SN16, LUMINANCE-F, LUMINANCE-HALF_F, 
LUMINANCE-UN8, LUMINANCE-UN16, LUMINANCE-SN16

IMAGE3D READ_ONLY FORMATS: R-F, R-HALF_F, R-UN8, R-UN16, R-SN16, R-S8, R-S16, R-S32, R-U8, R-U16, 
R-U32, A-F, A-HALF_F, A-UN8, A-UN16, A-SN16, A-S8, A-S16, A-S32, A-U8, A-U16, A-U32, RG-F, 
RG-HALF_F, RG-UN8, RG-UN16, RG-SN16, RG-S8, RG-S16, RG-S32, RG-U8, RG-U16, RG-U32, RA-F, 
RA-HALF_F, RA-UN8, RA-UN16, RA-SN16, RA-S8, RA-S16, RA-S32, RA-U8, RA-U16, RA-U32, RGBA-F, 
RGBA-HALF_F, RGBA-UN8, RGBA-UN16, RGBA-SN16, RGBA-S8, RGBA-S16, RGBA-S32, RGBA-U8, RGBA-U16, 
RGBA-U32, BGRA-UN8, BGRA-S8, BGRA-U8, <unknown channel order 0x10b7>-UN8, <unknown 
channel order 0x10b7>-S8, <unknown channel order 0x10b7>-U8, INTENSITY-F, 
INTENSITY-HALF_F, INTENSITY-UN8, INTENSITY-UN16, INTENSITY-SN16, LUMINANCE-F, LUMINANCE-HALF_F, 
LUMINANCE-UN8, LUMINANCE-UN16, LUMINANCE-SN16
------------------- SNAP -------------------


Thanks again.

Holger

Attachment: pgppK1jgWnt21.pgp
Description: PGP signature

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl

Reply via email to