Forgot the most important thing... are you sure nothing else is running on
that GPU? Maybe an OpenCL, CUDA, or OpenGL application?
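As an aside, the pipelining idea below can be illustrated without any OpenCL at all. This is a minimal pure-Python sketch (not OpenCL; the thread pool merely stands in for asynchronous device work) showing why enqueueing all work up front and waiting at the end amortizes per-call overhead, compared to waiting after every launch:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel(_):
    # Stand-in for device work; a real kernel would run on the GPU.
    time.sleep(0.01)

with ThreadPoolExecutor(max_workers=8) as pool:
    t0 = time.perf_counter()
    # "Schedule" all launches without waiting in between.
    futures = [pool.submit(fake_kernel, i) for i in range(16)]
    t_sched = time.perf_counter() - t0
    # Only now block until everything has finished.
    for f in futures:
        f.result()
    t_total = time.perf_counter() - t0

print("scheduled in %.4f s, finished in %.4f s" % (t_sched, t_total))
```

Scheduling returns almost immediately while the work overlaps; a loop that waited after each submission would pay the full latency 16 times over.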


On 12 June 2014 14:52, CRV§ADER//KY <[email protected]> wrote:

> You're right that something is wrong. There's no way to justify 0.2
> seconds of overhead per iteration.
> Could you confirm that you are not doing any buffer copies in the meantime?
> Did you retry with a different OpenCL platform (e.g. AMD CPU or Intel CPU)?
>
> What happens if you pipeline the kernel execution?
>
> events = []
> kernel_total = 0
>
> t0 = time.time()
> for i in range(64):
>     # Enqueue all kernels up front; do not wait between launches.
>     events.append(prog.sha1( queue , shape , None , in_buf , out_buf ,
> ..<other buffers> ))
>
> t2 = time.time()
> print("Scheduling time: %f" % (t2 - t0))
> t1 = t2
>
> for event in events:
>     event.wait()
>     t2 = time.time()
>     # Event profile timestamps are in nanoseconds.
>     kernel_elapsed = 1e-9 * ( event.profile.end - event.profile.start )
>     kernel_total += kernel_elapsed
>     print("Real run time: %f, Kernel time: %f" % ( t2 - t1, kernel_elapsed ))
>     t1 = t2
>
> print("Total real run time: %f, Total kernel time: %f" % ( t2 - t0,
> kernel_total ))
>
>
> On 11 June 2014 23:15, Abhilash Dighe <[email protected]> wrote:
>
>> Hi,
>>
>> I was hoping to get some insight on my observations. I am using PyOpenCL
>> version 2 with an NVIDIA Tesla M2090 to run a kernel that computes the
>> SHA1 algorithm over variably sized data blocks. I'm trying to measure the
>> execution time of my kernel, but I get different readings depending on
>> whether I use PyOpenCL's profiling events or the standard Python time
>> library. My code is structured as:
>>
>>
>> hash_start = time.time()
>> hash_event = prog.sha1( queue , shape , None , in_buf , out_buf ,
>> ..<other buffers> )
>> hash_event.wait()
>> hash_end = time.time()
>> add_hash_CPU_time( hash_end - hash_start )
>> add_hash_GPU_time( 1e-9 * ( hash_event.profile.end -
>> hash_event.profile.start ) )
>>
>> These are the results for a test case of size 3 GB. The kernel gets
>> called 64 times and runs 12288 threads each time.
>>
>> Total OpenCL profiling time = 1.56s
>> Total CPU wall clock time = 13.79s
>>
>> I need some help understanding the cause of this inconsistency, or
>> whether there is a mistake in how I am recording the timings.
>>
>> Regards,
>> Abhilash Dighe
>>
>> _______________________________________________
>> PyOpenCL mailing list
>> [email protected]
>> http://lists.tiker.net/listinfo/pyopencl
>>
>>
>