Robert,

What domain sizes are you studying in this problem?  A 4-point stencil is
memory bound, so you shouldn't expect to outperform the STREAM benchmark
(scaled by the appropriate reuse ratio, which depends on how far you
unroll the kernel).
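
To make that bound concrete, here is a rough sketch of the arithmetic.  The
bandwidth figure is a placeholder, not a measurement of your machine, and the
function name and the one-write-per-point model are my own assumptions for
illustration:

```python
def max_updates_per_second(stream_bw, reads_per_point,
                           writes_per_point=1, bytes_per_value=8):
    """Upper bound on stencil updates/s for a memory-bound kernel.

    stream_bw is the sustained bandwidth in bytes/s (e.g. from STREAM).
    reads_per_point is how many values must come from main memory per
    updated point: 1 with perfect neighbor reuse, 4 with no reuse at all.
    """
    bytes_per_point = (reads_per_point + writes_per_point) * bytes_per_value
    return stream_bw / bytes_per_point


# Hypothetical sustained bandwidth of 20 GB/s (assumption, not measured).
bw = 20e9

ideal = max_updates_per_second(bw, reads_per_point=1)  # perfect reuse
worst = max_updates_per_second(bw, reads_per_point=4)  # no reuse

print("ideal reuse: %.3g updates/s" % ideal)
print("no reuse:    %.3g updates/s" % worst)
```

Your measured update rate should land somewhere between those two numbers;
if it is far below the "no reuse" figure, something other than memory
bandwidth is holding the kernel back.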

Have you looked at Volkov's work on this problem?  They have a very good
CUDA implementation for 3-D stencil operators, and a lot of what they say
applies in 2-D: http://www.cs.berkeley.edu/~volkov/volkov10-parcfd.pdf

Also, you can't use 'top' as a reliable measure of computational performance
when analyzing numerical code.  You need to work out the number of
floating-point instructions per cycle (or the memory bandwidth) your CPU or
GPU is capable of and compare that with the requirements of your operator.
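
As a sketch of that comparison, the usual roofline-style estimate takes the
smaller of the peak floating-point rate and (arithmetic intensity x memory
bandwidth).  The peak and bandwidth numbers below are placeholders, and the
count of 4 flops per point (3 adds and 1 multiply) is my assumption about
your operator:

```python
def attainable_flops(peak_flops, bandwidth, flops_per_point, bytes_per_point):
    """Roofline-style estimate of the attainable rate in flop/s.

    Returns the estimate and whether the kernel is memory bound.
    """
    intensity = flops_per_point / float(bytes_per_point)  # flop per byte
    memory_roof = intensity * bandwidth
    return min(peak_flops, memory_roof), memory_roof < peak_flops


# Hypothetical hardware numbers (assumptions, not vendor specifications).
peak = 40e9       # peak double-precision flop/s
bandwidth = 20e9  # sustained memory bandwidth in bytes/s

# 4-point stencil in double precision with ideal reuse:
# one 8-byte read and one 8-byte write per point.
rate, memory_bound = attainable_flops(peak, bandwidth,
                                      flops_per_point=4, bytes_per_point=16)

print("attainable: %.3g flop/s, memory bound: %s" % (rate, memory_bound))
```

Comparing the flop rate you actually measure against an estimate like this
tells you far more than CPU utilization does.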

Good luck,
Aron

On Tue, Sep 20, 2011 at 10:21 PM, Robert L Cloud <[email protected]> wrote:

> Hi,
>
> I've done some analysis comparing CPU (on a Nehalem) and GPU (on a Tesla)
> performance of PyOpenCL to parallel Cython using OpenMP.  The performance of
> PyOpenCL on the CPU (Intel Nehalem with AMD OpenCL 1.1) was very poor, even
> slower than a single-threaded Cython program.  I realize that my OpenCL
> implementation was fairly poor, but I expected performance to be a bit
> better than it was.
>
> The analysis is available here:
> http://www.rcloud.me/2011/09/20/pyopencl-implementation/
>
> I'm hoping that someone can give some insight into how to improve it or why
> it is so bad.
>
> Also, I would like to run the analysis again with the Intel OpenCL driver,
> but I can't get PyOpenCL to recognize both the Intel and AMD platforms; when
> I run get_platforms it only shows AMD.  Here is my siteconf.py file:
>
> rcloud@Vertex:~/sources/pyopencl-2011.1.2$ cat siteconf.py
> BOOST_INC_DIR = []
> BOOST_LIB_DIR = []
> BOOST_COMPILER = 'gcc43'
> BOOST_PYTHON_LIBNAME = ['boost_python-gcc43-mt']
> USE_SHIPPED_BOOST = True
> CL_TRACE = False
> CL_ENABLE_GL = False
> CL_ENABLE_DEVICE_FISSION = True
> CL_INC_DIR = ['/home/rcloud/sources/amd/AMD-APP-SDK-v2.5-RC2-lnx64/include']
> CL_LIB_DIR = ['/home/rcloud/sources/amd/AMD-APP-SDK-v2.5-RC2-lnx64/lib/x86_64', '/usr/lib64']
> CL_LIBNAME = ['OpenCL']
> CXXFLAGS = []
> LDFLAGS = []
>
>
> thanks in advance,
> --
> Robert L Cloud
>
> "Why do you want to set yourself apart
> From all of us and our opinion?"
> I do not write to please you;
> You are meant to learn something.
>                    --Goethe
>
> http://www.robertlouiscloud.com
>
>
> _______________________________________________
> PyOpenCL mailing list
> [email protected]
> http://lists.tiker.net/listinfo/pyopencl
>
>