I have a different MacBook Pro generation and get
$ make streams NPMAX=4
cd src/benchmarks/streams; /usr/bin/make --no-print-directory streams
/Users/barrysmith/Src/PETSc/arch-mpich/bin/mpicc -o MPIVersion.o -c -fPIC -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0
-I/Users/barrysmith/Src/PETSc/include
-I/Users/barrysmith/Src/PETSc/arch-mpich/include -I/opt/X11/include
-I/opt/local/include `pwd`/MPIVersion.c
Number of MPI processes 1 Processor names Barrys-MacBook-Pro-3.local
Triad: 10417.1979 Rate (MB/s)
Number of MPI processes 2 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 14673.8802 Rate (MB/s)
Number of MPI processes 3 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 14998.7656 Rate (MB/s)
Number of MPI processes 4 Processor names Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad: 15001.2941 Rate (MB/s)
------------------------------------------------
np speedup
1 1.0
2 1.41
3 1.44
4 1.44
Is mine a better machine, since I get a speedup of 1.44 while you get no
speedup?
No. The total memory bandwidth each of our machines can sustain is about
15001.2941 MB/s (my Triad rate with 4 processes). My machine, which I am
guessing is a little older than yours, cannot utilize all of that memory
bandwidth with a single core; one core reaches only 10417.1979 MB/s. On your
machine a single core can utilize all of the memory bandwidth, hence when you
use the second core you get no speedup. I get speedup because the second core
utilizes the extra memory bandwidth the first core did not utilize. On the
other hand, your machine will run PETSc programs a good bit faster on one core
than mine. So parallelism will not give you any real benefit on your laptop,
while on mine it does, but in the end code will run slightly faster on your
machine, so your machine is better than mine.
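
For anyone who wants to check the arithmetic, here is a minimal Python sketch
that recomputes the speedup column from the Triad rates above and compares it
with a very simple saturation model. The function names and the model itself
are illustrative assumptions, not part of PETSc or the streams benchmark: the
model just assumes each core delivers the single-core rate until the sum hits
the highest rate observed, and then stays flat.

```python
# Triad rates in MB/s, copied from the streams output above,
# keyed by the number of MPI processes.
triad_mb_s = {1: 10417.1979, 2: 14673.8802, 3: 14998.7656, 4: 15001.2941}


def measured_speedup(rates):
    """Speedup of each run relative to the single-process Triad rate."""
    base = rates[1]
    return {n: rate / base for n, rate in sorted(rates.items())}


def saturation_speedup(n, one_core_rate, peak_rate):
    """Simplified model (an assumption, not part of PETSc): n cores each
    deliver one_core_rate until the total reaches peak_rate, then flat."""
    return min(n * one_core_rate, peak_rate) / one_core_rate


if __name__ == "__main__":
    one_core = triad_mb_s[1]
    # Use the highest measured rate as a stand-in for sustainable bandwidth.
    peak = max(triad_mb_s.values())
    for n, s in measured_speedup(triad_mb_s).items():
        model = saturation_speedup(n, one_core, peak)
        print(f"{n} measured {s:.2f} model {model:.2f}")
```

In this model, a machine where one core already reaches the peak rate gets a
speedup of 1.0 for every process count, which is the "no speedup" case
described above.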
Barry
> On Dec 4, 2014, at 7:51 PM, Jed Brown <[email protected]> wrote:
>
> "Sun, Hui" <[email protected]> writes:
>
>> Thank you Jed. I don't know how to use "lstopo" from the hwloc,
>
> A search engine will solve that problem.
>
>> but I looked up the cores and memory from the hardware overview from
>> my MAC, it has
>>
>> Number of Processors: 1
>> Total Number of Cores: 2
>>
>> Besides, as you said, there are 4 logical cores due to hyperthreading.
>> However, I'm still expecting to get speed doubled because I have 2 real
>> cores. So where is the restriction then?
>
> Memory bandwidth, as stated in my email and the page I linked.