Robert,

What domain sizes are you studying in this problem? A 4-point stencil is memory bound, so you shouldn't expect to outperform the STREAM benchmark (with the bound calculated using the appropriate reuse ratio, which depends on how far you unroll the kernel).
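To make the bound concrete, here is a rough sketch of the arithmetic in Python. The function names are made up for illustration, and the 20 GB/s bandwidth, 40 GFLOP/s peak, and 4 flops per point are placeholder assumptions, not measurements of your Nehalem or Tesla. Substitute the numbers you get from running STREAM on your own hardware.

```python
# Back-of-the-envelope bound for a memory-bound stencil.
# All hardware numbers below are hypothetical placeholders.

def stencil_bytes_per_point(loads_from_memory, stores=1, word_bytes=8):
    """Bytes moved to/from main memory per updated grid point.

    loads_from_memory is the number of neighbor values that actually
    come from DRAM: 4 with no reuse, approaching 1 with ideal cache
    or register reuse.
    """
    return (loads_from_memory + stores) * word_bytes


def bandwidth_bound_rate(stream_bw_gbs, bytes_per_point):
    """Upper bound on grid-point updates per second."""
    return stream_bw_gbs * 1e9 / bytes_per_point


def roofline_gflops(peak_gflops, stream_bw_gbs, flops_per_point, bytes_per_point):
    """Attainable GFLOP/s: the lower of the compute and bandwidth limits."""
    intensity = flops_per_point / bytes_per_point  # flops per byte
    return min(peak_gflops, intensity * stream_bw_gbs)


stream_bw = 20.0  # GB/s, assumed STREAM result
peak = 40.0       # GFLOP/s, assumed double-precision peak

for label, loads in (("no reuse", 4), ("ideal reuse", 1)):
    nbytes = stencil_bytes_per_point(loads)
    rate = bandwidth_bound_rate(stream_bw, nbytes)
    gflops = roofline_gflops(peak, stream_bw, 4, nbytes)
    print(f"{label}: {nbytes} B/point, {rate:.3g} points/s, {gflops:.3g} GFLOP/s")
```

If your measured update rate is already close to the bandwidth-bound figure, further tuning of the arithmetic won't help much; the gap between the two reuse cases shows how much there is to gain from blocking or unrolling.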

Have you looked at Volkov's work on this problem? They have a very good CUDA implementation for 3-D stencil operators, and a lot of what they say applies in 2-D:

http://www.cs.berkeley.edu/~volkov/volkov10-parcfd.pdf

Also, you can't use 'top' as a reliable measure of computational performance when analyzing numerical code. You need to work out the number of floating-point instructions (or memory bandwidth) your CPU or GPU is capable of per cycle and look at the requirements of your operator.

Good luck,
Aron

On Tue, Sep 20, 2011 at 10:21 PM, Robert L Cloud <[email protected]> wrote:
> Hi,
>
> I've done some analysis comparing CPU(on a nehalem) and GPU(on a tesla)
> performance of PyOpenCL to parallel Cython using OpenMP. The performance of
> PyOpenCL on the CPU(Intel Nehalem with AMD OpenCL 1.1) was very poor, even
> slower than a single threaded Cython program. I realize that my OpenCL
> implementation was fairly poor, but I expected performance to be a bit
> better than it was.
>
> The analysis is available here:
> http://www.rcloud.me/2011/09/20/pyopencl-implementation/
>
> I'm hoping that someone can give some insight into how to improve it or why
> it is so bad.
>
> Also, I would like to run the analysis again with the Intel OpenCL driver,
> but can't get PyOpenCL to recognize both Intel and AMD platforms, when I run
> get_platforms it only shows AMD. Here is my siteconf.py file:
>
> rcloud@Vertex:~/sources/pyopencl-2011.1.2$ cat siteconf.py
> BOOST_INC_DIR = []
> BOOST_LIB_DIR = []
> BOOST_COMPILER = 'gcc43'
> BOOST_PYTHON_LIBNAME = ['boost_python-gcc43-mt']
> USE_SHIPPED_BOOST = True
> CL_TRACE = False
> CL_ENABLE_GL = False
> CL_ENABLE_DEVICE_FISSION = True
> CL_INC_DIR = ['/home/rcloud/sources/amd/AMD-APP-SDK-v2.5-RC2-lnx64/include']
> CL_LIB_DIR = ['/home/rcloud/sources/amd/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64', '/usr/lib64']
> CL_LIBNAME = ['OpenCL']
> CXXFLAGS = []
> LDFLAGS = []
>
>
> thanks in advance,
> --
> Robert L Cloud
>
> ,,Warum willst du dich von uns Allen
> Und unsrer Meinung entfernen?"
> Ich schreibe nicht, euch zu gefallen;
> Ihr sollt was lernen.
> --Goethe
>
> http://www.robertlouiscloud.com
>
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
>
>
