Monday, 3 November 2014
Hi Freddie,

I have another follow-up question for you. I was able to confirm that, using the openmp backend, clang on OS X was unable to compile the requisite kernel due to its lack of OpenMP support, as you mentioned. I found a post on Stack Overflow referencing partial OpenMP support in Xcode 6 (clang 3.5) on OS X, which can be activated using the -Xclang -fopenmp=libiomp5 options (http://stackoverflow.com/questions/26159225/openmp-support-in-xcode-6-clang-3-5). The responder, Alexey Bataev, is doing this development work at Intel, I believe. Would passing these options to clang++ enable the kernel to compile on OS X, or are we still left waiting for the OpenMP implementation to be fully supported?

Best Regards,
Zach

> On Nov 3, 2014, at 2:05 AM, Freddie Witherden <[email protected]> wrote:
>
> Hi Zach,
>
> On 03/11/14 00:15, Zach Davis wrote:
>> I'm giving your tuning idea a go. It appears to be a somewhat slow and
>> compute-intensive process. One question I have related to this tune
>> executable is whether this process works on the installation or on the
>> files compiled in the build directory (i.e. do I need to run make
>> install again after this has completed)?
>
> You should not need to run make install again. The tune utility will
> create a .kdb file for each GPU on the system in the directory specified
> by CLBLAS_STORAGE_PATH. This variable must also be set in the shell
> where you run PyFR. Currently, as far as I can tell, it only tunes GPUs
> and not CPUs.
>
>> Another question I have is what to provide for the openmp backend on
>> systems running OS X. I set the cblas-mt parameter to
>> /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework
>> in one of the example case input files; however, PyFR gives a
>> RuntimeError: Unable to load cblas. It was so much easier with CUDA!
>> Thanks for your time today.
>
> There are a couple of gotchas when it comes to running PyFR on OS X.
> The first is that frameworks are actually directories, not shared
> libraries. So what you actually want is:
>
>   /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib
>
> In terms of BLAS libraries, I would recommend using a single-threaded
> BLAS library where possible and letting PyFR handle the multi-threading;
> such libraries are specified as cblas-st. Both ATLAS and OpenBLAS can
> be built as shared libraries without thread support. Intel's MKL
> library can be told to use only a single thread by setting an
> environment variable. I am unsure if something similar exists for
> Accelerate.
>
> The next problem is that clang on the Mac does not support OpenMP. The
> kernels generated by PyFR will therefore fail to compile. Furthermore,
> clang currently does not do a particularly good job of optimising
> floating-point code when compared with GCC/ICC. It should therefore be
> avoided.
>
> A simple solution to this is to install a copy of GCC on your Mac.
> Unfortunately, many of the builds of GCC for OS X are `buggy':
>
>   calcium:Programming freddie$ cat test.c
>   #include <stdio.h>
>
>   int main()
>   {
>       printf("Hello %f\n", 3.14);
>       return 0;
>   }
>   calcium:Programming freddie$ gcc-mp-4.8 -Ofast -march=native test.c
>   /var/folders/rs/zwdffscn1qlgntxyby6vct800000gn/T//ccyZ8fE9.s:13:no such instruction: `vmovsd LC0(%rip), %xmm0'
>
> where we can see that GCC 4.8 on my Mac cannot compile a simple Hello
> World type application successfully. The underlying reason for this is
> that the assembler GCC uses does not understand AVX instructions (like
> 'vmovsd'), but for whatever reason GCC tries to emit them anyway.
>
> It is therefore necessary to first get a working build of GCC (or hack
> it to disable the emission of AVX instructions). Alternatively, if you
> have a license for ICC, this should work out of the box without issue.
>
> Finally, when running PyFR using the OpenMP backend, the recommended
> environment variables (at least when not using MPI) are:
>
>   export OMP_PROC_BIND=true
>   export OMP_NUM_THREADS=n
>
> where n is the number of physical cores on the system. If you do all of
> these things, performance in excess of 50% of peak FLOP/s is possible;
> the backend really does perform well!
>
> Regards, Freddie.
>
> --
> You received this message because you are subscribed to the Google Groups
> "PyFR Mailing List" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send an email to [email protected].
> Visit this group at http://groups.google.com/group/pyfrmailinglist.
> For more options, visit https://groups.google.com/d/optout.
