Hi Andreas,

The driver information is as follows:

filename: /lib/modules/3.13.0-61-generic/kernel/drivers/video/nvidia.ko
alias:          char-major-195-*
version:        352.30
supported:      external
license:        NVIDIA
alias:          pci:v000010DEd00000E00sv*sd*bc04sc80i00*
alias:          pci:v000010DEd*sv*sd*bc03sc02i00*
alias:          pci:v000010DEd*sv*sd*bc03sc00i00*
depends:        drm
vermagic:       3.13.0-61-generic SMP mod_unload modversions
parm:           NVreg_Mobile:int
parm:           NVreg_ResmanDebugLevel:int
parm:           NVreg_RmLogonRC:int
parm:           NVreg_ModifyDeviceFiles:int
parm:           NVreg_DeviceFileUID:int
parm:           NVreg_DeviceFileGID:int
parm:           NVreg_DeviceFileMode:int
parm:           NVreg_UpdateMemoryTypes:int
parm:           NVreg_InitializeSystemMemoryAllocations:int
parm:           NVreg_UsePageAttributeTable:int
parm:           NVreg_MapRegistersEarly:int
parm:           NVreg_RegisterForACPIEvents:int
parm:           NVreg_CheckPCIConfigSpace:int
parm:           NVreg_EnablePCIeGen3:int
parm:           NVreg_EnableMSI:int
parm:           NVreg_MemoryPoolSize:int
parm:           NVreg_RegistryDwords:charp
parm:           NVreg_RmMsg:charp
parm:           NVreg_AssignGpus:charp

Using only one queue is not an option for us: more than one device has to be used
with advanced synchronization between them, so the event system needs to be in a
usable state. Roughly, the pattern looks like the sketch below.
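Here is a minimal sketch of the multi-device pattern this is ultimately for.
All names (q_dev0, q_dev1, the "touch" kernel) are placeholders for
illustration, not our real application code, and it assumes two GPUs on the
first platform:

import numpy as np
import pyopencl as cl

platform = cl.get_platforms()[0]
devs = platform.get_devices()
ctx = cl.Context(devs[:2])              # one context shared by both devices
q_dev0 = cl.CommandQueue(ctx, devs[0])
q_dev1 = cl.CommandQueue(ctx, devs[1])

prg = cl.Program(ctx, """
__kernel void touch(__global float *a) { a[get_global_id(0)] += 1.0f; }
""").build()

h_data = np.zeros(1024, dtype=np.float32)
d_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, h_data.nbytes)

# Upload on device 0's queue, compute on device 1's queue. The cross-queue
# dependency below is exactly what breaks if event_wait_list is ignored.
copy_ev = cl.enqueue_copy(q_dev0, d_buf, h_data, is_blocking=False)
prg.touch(q_dev1, (1024,), None, d_buf, wait_for=[copy_ev])
q_dev1.finish()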
Thanks for any help.

Jonathan

On 2016-04-05 21:59, Andreas Kloeckner wrote:
Hi Jonathan,

"Schock, Jonathan" <jonathan.sch...@tum.de> writes:
The important bits that seem to trigger the behaviour are:

import pyopencl as cl
import numpy as np

platform = cl.get_platforms()[0]
devs = platform.get_devices()
device1 = devs[0]

h_data = np.arange(81*9000).reshape(90, 90, 90).astype(np.float32, order='F')

ctx = cl.Context([device1])
queue = cl.CommandQueue(ctx)
queue2 = cl.CommandQueue(ctx)

h_image_shape = h_data.shape

with open('Kernel.cl', 'r') as f:
    fstr = f.read()
prg = cl.Program(ctx, fstr).build()

mf = cl.mem_flags  # this definition was missing in the original snippet
d_image = cl.Image(ctx, mf.READ_ONLY,
                   cl.ImageFormat(cl.channel_order.INTENSITY,
                                  cl.channel_type.FLOAT),
                   h_image_shape)

# non-blocking copy on queue; the kernel on queue2 is supposed to wait on wev1
wev1 = cl.enqueue_copy(queue, d_image, h_data, is_blocking=False,
                       origin=(0, 0, 0), region=h_image_shape)
prg.sum(queue2, (h_image_shape[0],), None, d_image, wait_for=[wev1])

The kernel does some simple number crunching on the input image.
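Kernel.cl itself is not important for the problem. A hypothetical stand-in
consistent with the prg.sum call above (one image argument, 1-D launch),
inlined here as a Python string, could look like this:

kernel_src = """
__kernel void sum(__read_only image3d_t img)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE
                        | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
    int x = get_global_id(0);
    float acc = 0.0f;
    for (int y = 0; y < 90; ++y)
        for (int z = 0; z < 90; ++z)
            acc += read_imagef(img, smp, (int4)(x, y, z, 0)).x;
    // accumulate only; the test just needs the kernel to run for a while
}
"""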

I'm measuring with nvvp (one result is attached), and you can clearly see
that the kernel launches long before the copy has ended. I already tested
with OoO disabled: same behaviour.
The implementation is 'Tesla K10.G2.8GB' on 'NVIDIA CUDA'.

For completeness: What driver version is this?

I am pretty certain that what you're seeing is non-compliant behavior by
the Nvidia implementation: it is consistent with Nvidia simply ignoring
the event_wait_list.
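One quick way to confirm that from the host side (a sketch reusing the names
from your snippet): block on the copy event yourself before launching. If the
overlap in nvvp disappears, the device-side event_wait_list is being ignored.

wev1 = cl.enqueue_copy(queue, d_image, h_data, is_blocking=False,
                       origin=(0, 0, 0), region=h_image_shape)
wev1.wait()  # host blocks here until the copy has really finished
prg.sum(queue2, (h_image_shape[0],), None, d_image)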

Beyond that, using two queues *and* ooo is redundant. ooo alone will
already allow concurrency.
https://www.khronos.org/registry/cl/sdk/1.2/docs/man/xhtml/clCreateCommandQueue.html
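For reference, the single-ooo-queue version would look like this (a sketch
with your names; the wait_for list alone carries the dependency):

ooo_props = cl.command_queue_properties.OUT_OF_ORDER_EXEC_MODE_ENABLE
ooo_queue = cl.CommandQueue(ctx, properties=ooo_props)
copy_ev = cl.enqueue_copy(ooo_queue, d_image, h_data, is_blocking=False,
                          origin=(0, 0, 0), region=h_image_shape)
prg.sum(ooo_queue, (h_image_shape[0],), None, d_image, wait_for=[copy_ev])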

I would probably recommend using a single queue until overlapping
becomes crucial.
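That is, something like this (sketch, same names as above): with one
in-order queue, submission order alone guarantees the kernel runs after
the copy, no events needed.

q = cl.CommandQueue(ctx)  # in-order (default) queue
cl.enqueue_copy(q, d_image, h_data, is_blocking=False,
                origin=(0, 0, 0), region=h_image_shape)
prg.sum(q, (h_image_shape[0],), None, d_image)  # starts after the copy
q.finish()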

Andreas
