The findings below assume I already have a 20 million * 57-bit int array on 
the GPU.

> On Jun 6, 2018, at 3:05 AM, aseem hegshetye <aseem.hegshe...@gmail.com> wrote:
> 
> Hi,
> I did some testing with the number of threads: I varied the thread count and 
> recorded the time, in seconds, the PyOpenCL kernel took to execute.
> The results:
> 
> Number of threads -- Time in seconds
> 10,000 -- 202
> 20,000 -- 170
> 24,000 -- 209
> 30,000 -- 224
> 30,714 -- 659
> Thanks
> Aseem
> 
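A side note on the numbers above: 30,714 is not a multiple of any common 
work-group size. In OpenCL, when you pass an explicit local size the global 
size must be a multiple of it, so a common pattern is to pad the global size 
up and bounds-check inside the kernel. A minimal host-side sketch (the 
work-group size of 256 is just an illustrative choice, not specific to your 
kernel):

```python
def round_up(global_size, local_size):
    # Round global_size up to the nearest multiple of local_size;
    # the surplus work-items then exit early via a bounds check
    # in the kernel (e.g. "if (get_global_id(0) >= n) return;").
    remainder = global_size % local_size
    if remainder == 0:
        return global_size
    return global_size + local_size - remainder

print(round_up(30714, 256))  # -> 30720
```

This way the launch geometry stays friendly to the hardware regardless of the 
exact transaction count.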
>> On Wed, Jun 6, 2018 at 1:54 AM, Sven Warris <s...@warris.nl> wrote:
>> Hi Aseem,
>> 
>> This may be caused by memory access collisions and/or a lack of coalesced 
>> memory access. This technical report gives some pointers:
>> https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
>> Do you use atomic operations? Or maybe you have too many thread fences? 
>> I have no problem starting many threads: the number of threads alone is not 
>> the issue. 
>> 
>> Cheers,
>> Sven
>> 
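To make the coalescing point concrete, here is a small host-side simulation. 
The 128-byte segment size is typical of NVIDIA hardware and the stride of 57 
merely mirrors the array width mentioned at the top of the thread; both are 
illustrative assumptions. Consecutive work-items reading consecutive elements 
touch a single memory segment per warp, while a large stride forces one 
segment per work-item:

```python
def segments_touched(indices, elem_bytes=4, segment_bytes=128):
    # Count the distinct memory segments one warp's load touches.
    # Fewer segments per load means better coalescing.
    return len({(i * elem_bytes) // segment_bytes for i in indices})

warp = range(32)                                    # 32 work-items in a warp
coalesced = segments_touched([i for i in warp])     # consecutive indices
strided = segments_touched([i * 57 for i in warp])  # stride-57 indices
print(coalesced, strided)  # -> 1 32
```

A 32x difference in memory transactions per load easily dwarfs any effect of 
the launch size itself.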
>> 
>> Op 6-6-2018 om 8:37 schreef aseem hegshetye:
>>> Hi,
>>> Does GPU speed drop sharply as the number of threads increases beyond a 
>>> certain point? I used to set the number of threads equal to the number of 
>>> transactions in the data under consideration.
>>> On a Tesla K80 I see a sharp drop in speed above 30,290 threads. 
>>> If so, is it best practice to keep the number of threads low and iterate 
>>> over the data to get results at optimum speed? 
>>> How do I find the best number of threads for a GPU?
>>> 
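On iterating over the data with fewer threads: a common way to do this is a 
grid-stride loop, where a fixed number of work-items each walk the array in 
steps of the global size (kernel-side, something like 
`for (int i = gid; i < n; i += get_global_size(0))`). A host-side sketch of 
the indexing, just to show the coverage property:

```python
def grid_stride_indices(gid, global_size, n):
    # Elements handled by work-item `gid` when `global_size` work-items
    # cooperatively cover n elements in a grid-stride loop.
    return list(range(gid, n, global_size))

# Every element is covered exactly once, with no overlap between work-items.
n, gsize = 100, 8
covered = sorted(i for g in range(gsize) for i in grid_stride_indices(g, gsize, n))
assert covered == list(range(n))
```

This decouples the launch size from the data size, so you can pick a global 
size that saturates the device and let each work-item process several 
elements.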
>>> Thanks
>>> Aseem
>>> 
>>> 
>>>  _______________________________________________
>>> PyOpenCL mailing list
>>> PyOpenCL@tiker.net
>>> https://lists.tiker.net/listinfo/pyopencl
>> 
>> 
> 