Thank you Barry and Jed for your explanations. I think I understand it a little 
bit better now. 

Hui


________________________________________
From: Barry Smith [[email protected]]
Sent: Thursday, December 04, 2014 7:37 PM
To: Jed Brown
Cc: Sun, Hui; [email protected]
Subject: Re: [petsc-users] Parallelization efficiency diagnose

   I have a different MacBook Pro generation and get

$ make streams NPMAX=4
cd src/benchmarks/streams; /usr/bin/make  --no-print-directory streams
/Users/barrysmith/Src/PETSc/arch-mpich/bin/mpicc -o MPIVersion.o -c -fPIC -Wall 
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -g3 -O0   
-I/Users/barrysmith/Src/PETSc/include 
-I/Users/barrysmith/Src/PETSc/arch-mpich/include -I/opt/X11/include 
-I/opt/local/include    `pwd`/MPIVersion.c
Number of MPI processes 1 Processor names  Barrys-MacBook-Pro-3.local
Triad:        10417.1979   Rate (MB/s)
Number of MPI processes 2 Processor names  Barrys-MacBook-Pro-3.local 
Barrys-MacBook-Pro-3.local
Triad:        14673.8802   Rate (MB/s)
Number of MPI processes 3 Processor names  Barrys-MacBook-Pro-3.local 
Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad:        14998.7656   Rate (MB/s)
Number of MPI processes 4 Processor names  Barrys-MacBook-Pro-3.local 
Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local Barrys-MacBook-Pro-3.local
Triad:        15001.2941   Rate (MB/s)
------------------------------------------------
np  speedup
1 1.0
2 1.41
3 1.44
4 1.44
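
(The speedup column above appears to be each Triad rate divided by the
single-process Triad rate. A minimal sketch, assuming that definition, which
reproduces the table from the rates printed above; the names `triad_mb_s` and
`speedups` are illustrative and not part of PETSc:)

```python
# Triad rates (MB/s) copied from the output above, keyed by MPI process count.
triad_mb_s = {1: 10417.1979, 2: 14673.8802, 3: 14998.7656, 4: 15001.2941}


def speedups(rates):
    """Return {np: rate(np) / rate(1)}, rounded to two decimals.

    Assumption: speedup is defined relative to the one-process Triad rate.
    """
    base = rates[1]
    return {n: round(rate / base, 2) for n, rate in sorted(rates.items())}


if __name__ == "__main__":
    print("np  speedup")
    for n, s in speedups(triad_mb_s).items():
        print(n, s)
```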

Is mine a better machine since I get a speedup of 1.44 while you get no
speedup?

No, the total memory bandwidth each of our machines can sustain is about
15001 MB/s (the Triad rate). My machine, which I am guessing is a little
older than yours, cannot utilize all that memory bandwidth with a single core
(Triad: 10417.1979 MB/s). On your machine a single core can utilize all of
the memory bandwidth, hence when you use the second core you get no speedup.
I get speedup because the second core utilizes the extra memory bandwidth the
first core did not utilize. On the other hand, your machine will run PETSc
programs a good bit faster on one core than mine. So parallelism will not
give you any real benefit on your laptop, while on mine it does; but in the
end code will run slightly faster on your machine, so your machine is better
than mine.
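
(The argument above can be sketched as a simple saturation model: for a
memory-bandwidth-bound code, the achievable rate is capped by the total
bandwidth the machine can sustain. This is a simplified illustration under
that assumption, not a PETSc routine; `predicted_speedup` and its parameters
are hypothetical names, and the second machine's numbers below are made up
purely to show the "one core already saturates memory" case.)

```python
def predicted_speedup(n_cores, single_core_bw, total_bw):
    """Speedup of a bandwidth-bound code under a simple saturation model.

    Assumption: each core can draw at most `single_core_bw`, and all cores
    together can draw at most `total_bw` (same units, e.g. MB/s).
    """
    achievable = min(n_cores * single_core_bw, total_bw)
    return achievable / single_core_bw


if __name__ == "__main__":
    # Rates from the output above: one core reaches ~10417 MB/s, while the
    # machine as a whole sustains ~15001 MB/s.
    for n in (1, 2, 3, 4):
        print(n, round(predicted_speedup(n, 10417.1979, 15001.2941), 2))

    # Hypothetical machine where a single core already saturates memory:
    # adding cores gives no speedup in this model.
    for n in (1, 2):
        print(n, predicted_speedup(n, 15000.0, 15000.0))
```

The model gives 1.44 for two processes on the first machine, whereas the
measured value is 1.41, so it should be read only as an upper bound on what
extra cores can deliver for bandwidth-limited work.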

  Barry


> On Dec 4, 2014, at 7:51 PM, Jed Brown <[email protected]> wrote:
>
> "Sun, Hui" <[email protected]> writes:
>
>> Thank you Jed. I don't know how to use "lstopo" from the hwloc,
>
> A search engine will solve that problem.
>
>> but I looked up the cores and memory in the hardware overview on
>> my Mac; it has
>>
>> Number of Processors:        1
>> Total Number of Cores:       2
>>
>> Besides, as you said, there are 4 logical cores due to hyperthreading.
>> However, I still expect the speed to double because I have 2 real
>> cores. So where is the restriction then?
>
> Memory bandwidth, as stated in my email and the page I linked.
