Re: [PyOpenCL] GPU speed wrt number of threads

2018-06-06 Thread aseem hegshetye
The findings below assume I already have a 20 million * 57 bits int array on the GPU.

> On Jun 6, 2018, at 3:05 AM, aseem hegshetye wrote:
>
> Hi,
> I did some testing with number of threads. I changed number of threads and
> recorded the time in seconds it took for the pyopencl

Re: [PyOpenCL] GPU speed wrt number of threads

2018-06-06 Thread aseem hegshetye
Hi, I did some testing with number of threads. I changed number of threads and recorded the time in seconds it took for the pyopencl kernel to execute. Following are the results:

No_of_threads --- Time in seconds
10,000 -- 202
20,000 -- 170
24,000 -- 209
30,000 -- 224

Re: [PyOpenCL] GPU speed wrt number of threads

2018-06-06 Thread Sven Warris
Hi Aseem, This may be caused by memory access collisions and/or a lack of coalesced memory access. This technical report gives some pointers: https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf Do you use atomic operations? Or maybe you have too many thread fences? I have no
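
To make the coalescing point concrete, here is a rough sketch in plain Python (no GPU needed) that counts how many memory segments a group of adjacent threads would touch for a given access stride. The figures used (32 threads issuing a request together, 128-byte segments, 4-byte elements) and the function name `segments_touched` are assumptions for illustration only; they are not taken from this thread, and real hardware behaviour depends on the device and driver.

```python
# Assumed figures for illustration only (not taken from the thread):
WARP_SIZE = 32        # threads that issue a memory request together
SEGMENT_BYTES = 128   # size of one memory transaction
ELEMENT_BYTES = 4     # one 32-bit int per element


def segments_touched(stride, base=0):
    """Count the distinct memory segments one group of threads touches.

    Thread t is assumed to read the element at index base + t * stride.
    Fewer segments means the accesses can be combined (coalesced).
    """
    segments = set()
    for t in range(WARP_SIZE):
        byte_address = (base + t * stride) * ELEMENT_BYTES
        segments.add(byte_address // SEGMENT_BYTES)
    return len(segments)


if __name__ == "__main__":
    for stride in (1, 2, 32):
        print("stride", stride, "->", segments_touched(stride), "segment(s)")
```

Under these assumptions, adjacent threads reading adjacent elements share a single segment, while a stride of 32 elements puts every thread in its own segment. If the kernel indexes the array so that neighbouring threads read far-apart elements, that could be one reason the timings do not improve with more threads.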

[PyOpenCL] GPU speed wrt number of threads

2018-06-06 Thread aseem hegshetye
Hi, Does GPU speed drop exponentially as the number of threads increases beyond a certain number? I used to allocate number of threads = number of transactions in the data under consideration. For a Tesla K80 I see an exponential drop in speed above 30290 threads. If true, is it a best practice to keep number
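
One common way to decouple the thread count from the number of transactions is to launch a fixed number of threads and let each thread walk over several transactions. The sketch below shows only the index arithmetic in plain Python; `round_up`, `items_for_thread`, and the work-group size of 256 are hypothetical names and values chosen for illustration, not something stated in this thread.

```python
def round_up(n, multiple):
    """Round n up to the next multiple, e.g. to pad a global size
    so it is divisible by the work-group size."""
    return ((n + multiple - 1) // multiple) * multiple


def items_for_thread(thread_id, n_threads, n_items):
    """Indices handled by one thread when n_threads share n_items.

    Thread t takes items t, t + n_threads, t + 2 * n_threads, ...
    so adjacent threads touch adjacent items on each pass.
    """
    return range(thread_id, n_items, n_threads)


if __name__ == "__main__":
    n_items = 30290            # thread count mentioned in the question
    work_group_size = 256      # assumed value, device dependent
    print(round_up(n_items, work_group_size))

    # Every item is covered exactly once, whatever the thread count.
    n_threads = 1024
    covered = sorted(
        i for t in range(n_threads)
        for i in items_for_thread(t, n_threads, n_items)
    )
    print(covered == list(range(n_items)))
```

Inside a kernel the same pattern would be a loop that starts at the thread's global id and steps by the global size. This lets the thread count be tuned by measurement instead of being tied to the data size.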