Hi Zach,

Many thanks for all this, and your continued interest in the project!
A couple of general points:

1.) If you are interested in comparisons between different backends, you may want to check this out: http://arxiv.org/abs/1409.0405

2.) When looking at absolute (and even relative) performance of different backends, the very small 2D example test cases are somewhat pathological; the matrices that end up being repeatedly multiplied together are not very big, and hence are unlikely to get a good fraction of peak out of dgemm.

3.) Regarding the failure of the CUDA backend: which GPU and which version of CUDA do you have on the Mac?

Cheers,

Peter

On 4 Nov 2014, at 02:34, Zach Davis <[email protected]> wrote:

Monday, 3 November 2014

Hi Freddie,

I'm still looking into the OpenCL backend, but I think I was finally able to get the OpenMP backend up and running under OS X. I've collected a few relative performance benchmark results that perhaps some in the community might be interested in. Ideally, I would like to apply this test to see similar results for both the CUDA and OpenCL backends.

My test system was a modest 2.4 GHz quad-core Intel Core i7 Ivy Bridge processor (i7-3630QM) with 16 GB of 1600 MHz DDR3 RAM running OS X v10.10.

I modified the compiler flags in ${PYFR_ROOT}/pyfr/backends/openmp/compiler.py, replacing the -march=native option with -mtune=native. This change isn't necessary for the clang-omp compiler, but it was necessary for the gcc-4.9 compiler I tried; to keep things consistent, I left that option changed across compilers. As a check, I took the fastest run in my test matrix (i.e. the very last case), re-ran it with -mtune=native reverted back to -march=native, and observed no change in runtime.
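The flag swap described above is a one-character-class change, so it can also be scripted rather than edited by hand. A minimal sketch (the flag list below is hypothetical, purely for illustration; the real flags live in pyfr/backends/openmp/compiler.py):

```python
# Sketch: replace -march=native with -mtune=native in a list of compiler
# flags, mirroring the manual edit described above (flags here are made up)
def retune(flags):
    """Return a copy of flags with -march=native swapped for -mtune=native."""
    return [f.replace('-march=native', '-mtune=native') for f in flags]

print(retune(['-std=c99', '-march=native', '-fopenmp']))
# -> ['-std=c99', '-mtune=native', '-fopenmp']
```

Note that -mtune=native only tunes instruction scheduling for the host CPU, whereas -march=native also enables host-specific instruction sets; for dgemm-dominated runs the difference is often negligible, which is consistent with the unchanged runtime reported above.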
I ran the couette_flow_2d example case using a single partition, initiating pyfr-sim as follows:

pyfr-sim -p -n 100 -b openmp run couette_flow_2d.pyfrm couette_flow_2d.ini

The results for the openmp backend tests follow:

Backend                          Compiler   Environment        Time
cblas-mt = Accelerate Framework  gcc-4.9    OMP_NUM_THREADS=4  07m 48s
cblas-st = Accelerate Framework  gcc-4.9    OMP_NUM_THREADS=4  10m 52s
cblas-mt = Accelerate Framework  gcc-4.9    OMP_NUM_THREADS=8  12m 20s
cblas-st = OpenBLAS 0.2.12       gcc-4.9    OMP_NUM_THREADS=4  11m 01s
cblas-mt = OpenBLAS 0.2.12       gcc-4.9    OMP_NUM_THREADS=4  07m 46s
cblas-mt = Accelerate Framework  clang-omp  OMP_NUM_THREADS=4  04m 24s
cblas-st = Accelerate Framework  clang-omp  OMP_NUM_THREADS=4  04m 23s
cblas-mt = OpenBLAS 0.2.12       clang-omp  OMP_NUM_THREADS=4  04m 12s
cblas-st = OpenBLAS 0.2.12       clang-omp  OMP_NUM_THREADS=4  04m 10s

The third case shows that hyperthreading is a no-no, as I'm sure you're already aware. I was actually surprised that Apple's Accelerate Framework was less performant than OpenBLAS, and I've convinced myself that gcc-4.9 (v4.9.2) is garbage.

To install an OpenMP-capable version of clang I used Homebrew and this brew recipe (https://github.com/Homebrew/homebrew/pull/33278). I also had to compile and install Intel's OpenMP Runtime Library (https://www.openmprtl.org/download#stable-releases). I downloaded the version listed at the top of the table (Version 20140926), unpacked it, and invoked make with make compiler=clang. Next, I moved the *.dylib and *.h files to their respective lib and include directories under /usr/local. Lastly, I set C_INCLUDE_PATH and CPLUS_INCLUDE_PATH to include /usr/local/include, and DYLD_LIBRARY_PATH to include /usr/local/lib.
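As a sanity check on install steps like the above, it can be handy to confirm that the header and library actually sit on the search paths before building anything against them. A small sketch (the file names omp.h and libiomp5.dylib are the Intel OpenMP runtime's usual ones, assumed here):

```python
import os

def find_in_paths(fname, env_var):
    """Return the first directory listed in env_var that contains fname,
    or None if the variable is unset or fname is not found."""
    for d in os.environ.get(env_var, '').split(os.pathsep):
        if d and os.path.exists(os.path.join(d, fname)):
            return d
    return None

# With the exports above in place, both of these should print a directory
print('omp.h found in:', find_in_paths('omp.h', 'C_INCLUDE_PATH'))
print('libiomp5.dylib found in:', find_in_paths('libiomp5.dylib', 'DYLD_LIBRARY_PATH'))
```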
Now something has recently changed with either pycuda under OS X or PyFR, because initiating a similar test using the cuda backend results in the following traceback:

pyfr-sim -p -n 100 -b cuda run couette_flow_2d.pyfrm couette_flow_2d.ini

Traceback (most recent call last):
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/pyfr-sim", line 112, in <module>
    main()
  File "/usr/local/lib/python2.7/site-packages/mpmath/ctx_mp.py", line 1301, in g
    return f(*args, **kwargs)
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/pyfr-sim", line 82, in main
    backend = get_backend(args.backend, cfg)
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/__init__.py", line 11, in get_backend
    return subclass_where(BaseBackend, name=name.lower())(cfg)
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/cuda/base.py", line 33, in __init__
    from pycuda.autoinit import context
  File "/usr/local/lib/python2.7/site-packages/pycuda/autoinit.py", line 4, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no device

I remember that when first installing and running PyFR (~v0.2), this worked just fine using the default backend. I'm curious what has changed.

Best Regards,

Zach

On Nov 3, 2014, at 2:06 PM, Freddie Witherden <[email protected]> wrote:

On 03/11/14 21:57, Zach Davis wrote:

It appears that didn't work: PyFR complains about being unable to find the OpenMP header file. Looking at compiler.py in ${PYFR_ROOT}/pyfr/backends/openmp/, on line 55 you use the value of cc to get the path of the compiler to be used. Unfortunately, on OS X this is a symbolic link to clang. Is there an environment variable that PyFR supports that will allow you to change which C compiler is used? Setting the shell environment variable CC is ignored, so I was hoping there might be an alternative way to explicitly specify the compiler PyFR uses.
I have both gcc-4.9 (4.9.2) and an OpenMP-compatible build of clang, which I've named clang-omp, to test; however, I can't figure out how to direct PyFR to use these rather than the cc symbolic link (which points to the clang bundled with Apple's Xcode command line tools).

Note, you also outlined an example of getting the copy of gcc-4.8 installed on your Mac to compile a simple Hello World example. I believe that if you replace the -march=native option with something like -msse4.2 or -mtune=native, the code snippet compiles without the error, although I'm not certain that is relevant to what you were pointing out.

The compiler used by PyFR can be changed in the configuration file. For example, on my Linux system I have:

[backend-openmp]
cc = gcc-4.8.3

If you are going to be experimenting, you might want to put:

[backend-openmp]
cc = ${CC}

and then you can simply export CC in your shell to be your desired compiler. We do not currently support expansions such as:

[backend-openmp]
cc = gcc -fsomething

The 'cc' field must be an executable. Similarly, we do not -- currently -- permit one to append arguments to the compiler invocation. (Although this can be trivially accomplished with a one-line shell script, should any user require this feature.)

With regards to GCC on the Mac, yes, it is -march=native that is causing the trouble. I would, however, rather that compilers not try to emit assembly instructions which they know cannot be assembled on the current system!

Regards, Freddie.

--
You received this message because you are subscribed to the Google Groups "PyFR Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send an email to [email protected].
Visit this group at http://groups.google.com/group/pyfrmailinglist.
For more options, visit https://groups.google.com/d/optout.
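As an aside on the cuInit failure reported earlier in the thread: a minimal standalone probe, exercising the same cuda.init() call that pycuda.autoinit makes, can help distinguish a missing or broken pycuda install from a driver that simply sees no device. A sketch (not part of PyFR):

```python
def cuda_status():
    """Probe the CUDA driver the same way pycuda.autoinit does (a sketch)."""
    try:
        import pycuda.driver as cuda
    except ImportError:
        return 'pycuda not installed'
    try:
        # This is the call that raises "cuInit failed: no device" above
        cuda.init()
        return 'ok: %d device(s) visible' % cuda.Device.count()
    except Exception as e:
        return 'cuInit failed: %s' % e

print(cuda_status())
```

A "no device" result from cuInit usually points at the driver/GPU combination (e.g. no CUDA-capable GPU, or a driver/toolkit mismatch after an OS update) rather than at PyFR itself.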
