Hi Bruce,

That's an excellent problem for a GPU. However, because each problem uses a fair amount of memory, how the memory is accessed will dominate your performance gains (as is typical when using a GPU). For example, tf won't fit in the shared memory or cache of a multiprocessor, so you'll also want to divide the problem again.

If you don't need to get this working for routine usage, though, you might just try using numba primitives to move it to a GPU. I haven't used them, so I can't attest that they will give you a good answer. On the other hand, this is the sort of problem that makes learning CUDA and PyCUDA easy, so you might as well give it a shot.

Regards,
Craig
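To illustrate the "divide the problem again" idea: below is a minimal CPU-side sketch of the same accumulation, tiled over chunks of frequencies so each partial problem stays small. This is the structure a GPU version (numba or PyCUDA) would need when tf won't fit in shared memory. `msin_chunked` and the chunk size are my own hypothetical names, not anything from Bruce's code; only the math (Msin = sum over i of Mf[i] * 2*cos(2*pi*farray[i]*tf)) is taken from his MSIN function.

```python
import numpy as np

def msin_chunked(farray, Mf, tf, chunk=256):
    """Accumulate Msin = sum_i Mf[i] * 2*cos(2*pi*farray[i]*tf).

    Processes `chunk` frequencies at a time so the intermediate
    (chunk, *tf.shape) array stays small -- the same tiling a GPU
    kernel would use when tf exceeds shared memory.
    """
    Msin = np.zeros_like(tf)
    for start in range(0, farray.size, chunk):
        f = farray[start:start + chunk]   # shape (c,)
        m = Mf[start:start + chunk]       # shape (c,)
        # Broadcast (c,1,1) against (n,k) -> (c,n,k), then reduce over c.
        phase = 2.0 * np.pi * f[:, None, None] * tf[None, :, :]
        Msin += np.einsum('c,cnk->nk', 2.0 * m, np.cos(phase))
    return Msin
```

On a GPU the outer chunk loop would become a loop inside the kernel (or separate kernel launches), with each thread owning one element of tf and accumulating over the frequency chunk, so the expensive tf reads are amortized across many cos evaluations.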
On Sat, Mar 28, 2015 at 8:29 AM Bruce Labitt <[email protected]> wrote:

> From reading the documentation, I am confused whether parallelizing this
> kind of function is worth doing in pycuda.
>
> I'm trying to add the effect of phase noise into a radar simulation. The
> simulation is written in Scipy/numpy. Currently I am using joblib to run
> on multiple cores. It is too slow for the scenarios I wish to try. It
> does work for a small number of targets and reduced phase noise array
> sizes. The following is the current approach:
>
> Function to parallelize:
>
>     def MSIN( farray, Mf, tf, jj ):
>         """
>         farray, Mf, tf, jj
>
>         farray  array of frequencies (size = 10000)
>         Mf      array of coefficients (size = 10000)
>         tf      2D array ~[2048 x 256] of time
>         jj      list of indices (fraction of the problem to solve)
>         """
>         Msin = 0.0
>         for ii in jj:
>             Msin = Msin + Mf[ii] * 2.0*cos( 2.0*pi*farray[ii]*tf )
>         return Msin
>
> Current method to call the function in parallel (multiprocessing):
>
>     # ====================================================
>     # Parallel computes the function MSIN with njobs cores
>     # ====================================================
>     MMM = Parallel(n_jobs=njobs, max_nbytes=None)\
>         (delayed(MSIN)( f, aa, tf1, ii ) for ii in idx)
>     Msin = reduce(add, MMM)  # add all the results of the cores together
>
> Any suggestions to port this to pycuda? Reasonable candidate?
>
> In essence, it is accumulating a scalar-weighted cos function for many
> elements of a 2D array. It 'feels' like it should be portable. Any road
> blocks foreseen? The 2D array of times is contiguous in the sense of
> stride, but there are discontinuous jumps in time values in the array,
> which I do not think is a problem.
> I have, from DumpProperties.py:
>
>     Device #0: GeForce GTX 680M
>      Compute Capability: 3.0
>      Total Memory: 4193984 KB
>      CAN_MAP_HOST_MEMORY: 1
>      CLOCK_RATE: 758000
>      MAX_BLOCK_DIM_X: 1024
>      MAX_BLOCK_DIM_Y: 1024
>      MAX_BLOCK_DIM_Z: 64
>      MAX_GRID_DIM_X: 2147483647
>      MAX_GRID_DIM_Y: 65535
>      MAX_GRID_DIM_Z: 65535
>
> CUDA 6.5
>
> Thanks in advance for any insight, or suggestions on how to attack the
> problem.
>
> -Bruce
>
> _______________________________________________
> PyCUDA mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pycuda
