Hi Zach,
On 03/11/14 00:15, Zach Davis wrote:
> I'm giving your tuning idea a go. It appears to be a somewhat slow and
> compute-intensive process. One question I have related to this tune
> executable is whether it works on the installation or on the files
> compiled in the build directory (i.e. do I need to run make
> install again after it has completed)?
You should not need to run make install again. The tune utility will
create a .kdb file for each GPU on the system in the directory specified
by CLBLAS_STORAGE_PATH. This variable must also be set in the shell
where you run PyFR. Currently, as far as I can tell, it only tunes GPUs
and not CPUs.
> Another question I have is what to provide for the openmp backend on
> systems running OS X. I set the cblas-mt parameter to
> /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework
> in one of the example case input files; however, PyFR gives a
> RuntimeError: Unable to load cblas. It was so much easier with CUDA!
> Thanks for your time today.
There are a couple of gotchas when it comes to running PyFR on OS X.
The first is that frameworks are actually directories, not shared
libraries. So what you want is the library inside the bundle:
/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib
In terms of BLAS libraries, I would recommend using a single-threaded
BLAS library where possible and letting PyFR handle the multi-threading;
such libraries are specified as cblas-st. Both ATLAS and OpenBLAS can
be built as shared libraries without thread support. Intel's MKL
library can be told to use only a single thread by setting the
MKL_NUM_THREADS environment variable to 1. I am unsure if something
similar exists for Accelerate.
The next problem is that clang on the Mac does not support OpenMP. The
kernels generated by PyFR will therefore fail to compile. Furthermore,
clang currently does not do a particularly good job at optimising
floating point code when compared with GCC/ICC. It should therefore be
avoided.
A simple solution to this is to install a copy of GCC on your Mac.
Unfortunately, many of the builds of GCC for OS X are "buggy":
calcium:Programming freddie$ cat test.c
#include <stdio.h>
int main()
{
    printf("Hello %f\n", 3.14);
    return 0;
}
calcium:Programming freddie$ gcc-mp-4.8 -Ofast -march=native test.c
/var/folders/rs/zwdffscn1qlgntxyby6vct800000gn/T//ccyZ8fE9.s:13:no such instruction: `vmovsd LC0(%rip), %xmm0'
where we can see that GCC 4.8 on my Mac cannot compile a simple Hello
World-type program. The underlying reason is that the assembler GCC
uses does not understand AVX instructions (such as vmovsd), yet GCC
emits them anyway (here because -march=native enables AVX for this CPU).
It is therefore necessary to first get a working build of GCC (or hack
it to disable the emission of AVX instructions). Alternatively, if you
have a license for ICC this should work out of the box without issue.
Finally, when running PyFR using the OpenMP backend, the recommended
environment variables (at least when not using MPI) are:
export OMP_PROC_BIND=true
export OMP_NUM_THREADS=n
where n is the number of physical cores on the system. If you do all of
these things, performance in excess of 50% of peak FLOP/s is possible;
the backend really does perform well!
Regards, Freddie.
--
You received this message because you are subscribed to the Google Groups "PyFR Mailing List" group.
Visit this group at http://groups.google.com/group/pyfrmailinglist.