On Wed, Aug 12, 2009 at 11:31 AM, M. Badawy <[email protected]> wrote:
> Well put. Thing is, I'm attending a summer school on CUDA right now,
> and it seems that micro-managing the threads, blocks, warps,
> registers, etc. is not for the faint of heart.
Think of it as a puzzle. I'll take memory bank conflict avoidance over
sudoku any day :)

> I am not a programmer, and I doubt that I will ever have the time to
> do all this fine-tuning to achieve optimal performance. This also
> depends on the code, so it may not be that hard for a lot of tasks
> that lend themselves well to parallel processing.

What problem are you trying to solve? Maybe you're blessed with a large
problem with ridiculously fine-grained parallelism.

> An interesting remark mentioned today was that there is a lot of
> testing going on right now to automate the fine-tuning process, and it
> was mentioned that a certain algorithm managed to squeeze 15~20% more
> performance out than the human-optimized code. The optimizations done
> by the algorithm would have taken a person weeks to implement. These
> fine-tuning features will be implemented in CUDA later, but it seems
> not any time soon.

Link?

> My guess is that once CUDA gets smart enough, it may then be easier
> for the non-professional programmer to use any tool whatsoever without
> worrying too much about performance.

I would not hold my breath :) I would be surprised if CUDA programming
changes qualitatively before some completely different architecture
with a different programming model comes along.

_______________________________________________
PyCUDA mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net
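[Editor's aside, not part of the thread: to make the "automated
fine-tuning" idea concrete, here is a minimal brute-force autotuner
sketch in plain Python. The `run`/`configs` interface and the synthetic
cost model below are invented for illustration; a real CUDA tuner would
time actual kernel launches (e.g. via PyCUDA) across candidate
block/grid shapes and pick the fastest.]

```python
import time

def autotune(run, configs, repeats=3):
    """Return the config in `configs` for which `run(config)` is fastest.

    Each candidate is timed `repeats` times and the minimum is kept,
    since the minimum is the least noisy estimate of true cost.
    """
    best, best_t = None, float("inf")
    for cfg in configs:
        times = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run(cfg)
            times.append(time.perf_counter() - t0)
        t = min(times)
        if t < best_t:
            best, best_t = cfg, t
    return best

# Stand-in workload (hypothetical): pretend larger "block sizes"
# amortize per-block overhead but eventually cost more themselves,
# so total work is roughly n/block + block, minimized near sqrt(n).
def fake_kernel(block_size):
    n = 1 << 16
    steps = n // block_size + block_size  # purely synthetic cost model
    s = 0
    for i in range(steps):
        s += i
    return s

print(autotune(fake_kernel, [32, 64, 128, 256, 512]))
```

The design choice is deliberately dumb: exhaustive search over a small
hand-picked candidate set, which is essentially what early GPU
autotuners did before smarter search strategies came along.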
