As a general note: once you sort out the resources issue, it is *very* important to retune your block and grid sizes after switching from compute capability 2.0 (Tesla C2075) to compute capability 3.x (Tesla K40c). When I first switched my code to the new architecture, I saw almost no improvement, or even actual regressions, in performance. It wasn't until I re-benchmarked different grid configurations that I discovered the problem.
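A minimal sketch of how such a retuning/auto-tuning stage can work. The `time_kernel` callable here is hypothetical — in a real PyCUDA program it would launch your kernel with the given block size and time it (e.g. with pycuda.driver.Event), and the candidate list is just an illustrative range:

```python
def pick_best_block_size(time_kernel, candidates=(64, 128, 192, 256, 512, 1024)):
    """Return the candidate block size with the lowest measured run time.

    `time_kernel(block_size)` is expected to launch the kernel with that
    block size and return its elapsed time; raising an exception marks the
    configuration as unusable (e.g. "launch out of resources").
    """
    timings = {}
    for size in candidates:
        try:
            timings[size] = time_kernel(size)
        except Exception:
            # Skip configurations the device cannot launch on this card.
            continue
    if not timings:
        raise RuntimeError("no candidate block size could be launched")
    return min(timings, key=timings.get)
```

Running this once at startup (per kernel, per card) and caching the result avoids hard-coding a block size tuned for the old architecture.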
In fact, I now sometimes include an auto-tuning stage in my CUDA programs to dynamically select from a range of reasonable block sizes based on runtime benchmarks of my important kernels.

On Apr 2, 2014, at 1:46 AM, Jerome Kieffer <[email protected]> wrote:

> On Wed, 2 Apr 2014 17:41:59 +1300
> Alistair McDougall <[email protected]> wrote:
>
>> Hi,
>> I have previously been using PyCUDA on a Tesla C2075 as part of my
>> astrophysics research. We recently installed a Tesla K40c and I was hoping
>> to just run the same code on the new card; however, I am receiving
>> "pycuda._driver.LaunchError: cuLaunchKernel failed: launch out of resources"
>> errors.
>>
>> A quick google search for "PyCUDA Tesla K40c" returned a minimal set of
>> results, which led me to wonder: has anyone tried running PyCUDA on this
>> card?
>
> Hi,
> I ran into similar bugs with our K20 and was scratching my head for a
> while, until people from Nvidia told me that driver 319 had problems with
> the GK110-based Tesla cards. Driver 331 has run without glitches for a
> while now.
>
> Hope this helps.
>
> --
> Jérôme Kieffer
> tel +33 476 882 445
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
