The findings below assume that I already have a 20 million * 57 bits int array on the GPU.
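Since Sven mentions coalesced access below: whether reads of that array coalesce depends on how it is flattened. A minimal pure-Python sketch of the two index schemes (names like `row_major`/`field_major` are mine, just for illustration; this only models the addressing, not actual device memory):

```python
# A hypothetical 20M-transaction x 57-field array flattened to 1D.
NUM_TX, NUM_FIELDS = 20_000_000, 57

def row_major(t, f):
    # Transaction-major layout: thread t reading field f hits t*57 + f.
    # Consecutive threads are 57 ints apart -> strided, not coalesced.
    return t * NUM_FIELDS + f

def field_major(t, f):
    # Field-major (structure-of-arrays) layout: thread t reading field f
    # hits f*NUM_TX + t. Consecutive threads hit consecutive ints -> coalesced.
    return f * NUM_TX + t

# Adjacent threads reading the same field:
assert row_major(1, 0) - row_major(0, 0) == NUM_FIELDS  # stride of 57
assert field_major(1, 0) - field_major(0, 0) == 1       # unit stride
```

If the kernel's global id indexes transactions, the field-major layout is the one that lets a warp's loads merge into one memory transaction.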
> On Jun 6, 2018, at 3:05 AM, aseem hegshetye <aseem.hegshe...@gmail.com> wrote:
>
> Hi,
> I did some testing with the number of threads. I varied the number of threads and recorded the time in seconds it took for the PyOpenCL kernel to execute. The results:
>
> No_of_threads --- Time in seconds
> 10,000 -- 202
> 20,000 -- 170
> 24,000 -- 209
> 30,000 -- 224
> 30,714 -- 659
>
> Thanks
> Aseem
>
>> On Wed, Jun 6, 2018 at 1:54 AM, Sven Warris <s...@warris.nl> wrote:
>> Hi Aseem,
>>
>> This may be caused by memory access collisions and/or a lack of coalesced memory access. This technical report gives some pointers:
>> https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
>> Do you use atomic operations? Or maybe you have too many thread fences?
>> I have no problem starting many threads: the number of threads alone is not the issue.
>>
>> Cheers,
>> Sven
>>
>> On 6-6-2018 at 8:37, aseem hegshetye wrote:
>>> Hi,
>>> Does GPU speed drop sharply once the number of threads increases beyond a certain point? I used to allocate number of threads = number of transactions in the data under consideration.
>>> On a Tesla K80 I see a steep drop in speed above 30,290 threads.
>>> If so, is it best practice to keep the number of threads low and iterate over the data to get results at optimum speed?
>>> How do I find the best number of threads for a GPU?
>>>
>>> Thanks
>>> Aseem
>>>
>>> _______________________________________________
>>> PyOpenCL mailing list
>>> PyOpenCL@tiker.net
>>> https://lists.tiker.net/listinfo/pyopencl
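The "keep the number of threads low and iterate over the data" idea above is usually done with a grid-stride loop: launch a fixed pool of work-items and have each one step through the data with a stride equal to the pool size. A minimal pure-Python sketch of that index mapping (function name is mine, for illustration only; in the real kernel this loop would live in OpenCL C):

```python
def indices_for_thread(gid, num_threads, num_transactions):
    # Work-item `gid` processes gid, gid + num_threads, gid + 2*num_threads, ...
    # until it runs past the data. This keeps the launch size fixed regardless
    # of how many transactions there are.
    return list(range(gid, num_transactions, num_threads))

num_threads, num_transactions = 4, 10

# Every transaction is covered exactly once across the thread pool:
covered = sorted(i for g in range(num_threads)
                 for i in indices_for_thread(g, num_threads, num_transactions))
assert covered == list(range(num_transactions))
```

With this scheme you can pick a launch size that suits the device (e.g. a small multiple of the number of compute units times the work-group size) instead of one work-item per transaction.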