I have a recent iMac with 4 physical cores (8 logical cores with 
Hyper-Threading).  I would have thought that, for a large enough N, 
peakflops(N) should increase with the number of threads I allow BLAS to 
use.  I do find that peakflops(N) with 1 thread is about half of 
peakflops(N) with 2 threads, but there is no further gain from going to 4 
threads.  Are my expectations wrong here, or is it possible that BLAS is 
somehow configured incorrectly on my machine?  In the example below, 
N = 6755, a size relevant for my work, but the results are similar with 
N = 5000 or N = 10000.

Here is my versioninfo():
julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753* (2016-09-19 18:14 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin15.6.0)
  CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

Here is an example peakflops() exercise:
julia> BLAS.set_num_threads(1)

julia> mean(peakflops(6755) for i=1:10)
5.225580459387056e10

julia> BLAS.set_num_threads(2)

julia> mean(peakflops(6755) for i=1:10)
1.004317640281997e11

julia> BLAS.set_num_threads(4)

julia> mean(peakflops(6755) for i=1:10)
9.838116463900085e10
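
The same comparison could be run as a single sweep over thread counts, which makes it easier to see where the scaling stops. This is only a sketch for Julia 0.5, where BLAS, peakflops and mean are available without imports. The extra thread counts (3 and 8), the discarded warm-up call, and the names N, reps and nthreads are assumptions added for illustration and are not part of the runs above.

```julia
# Sketch for Julia 0.5 (BLAS, peakflops and mean live in Base there).
# On Julia 1.x one would first need `using LinearAlgebra, Statistics`.
N = 6755      # matrix size used in the runs above
reps = 10     # number of peakflops calls averaged per thread count

for nthreads in (1, 2, 3, 4, 8)
    BLAS.set_num_threads(nthreads)   # cap the number of BLAS threads
    peakflops(N)                     # warm-up run, result discarded
    flops = mean(peakflops(N) for i = 1:reps)
    println(nthreads, " threads: ", flops / 1e9, " GFLOPS")
end
```

If the figure for 3 threads also sits near the 2-thread figure, that would suggest the plateau starts right after 2 threads rather than being specific to the 4-thread setting.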




