I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling.

I'm using -bind-to-core without any other options (the default mapping is -bycore, I believe).

In the numbers below, the first figure is the number of cores and the second is the run number (except for n=1, each case was run 3 times).  Any thoughts on why n=15 should be so much slower than n=16?  I also measured the RSS of the running processes: for the n=15 cases the rank 0 process uses about 2x more memory than the other ranks, whereas for the n=16 cases all ranks use the same amount of memory.
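
For reference, here is a minimal sketch of how the per-rank RSS comparison could be reproduced from inside the job itself. This is not the code behind the numbers above, and the file/program names are made up; it just has each rank read VmRSS from /proc/self/status and rank 0 gather and print the values, so the n=15 vs n=16 memory pattern can be checked directly.

/* rss_check.c -- illustrative sketch only (hypothetical names):
 * each rank reads its own VmRSS from /proc/self/status and rank 0
 * prints one line per rank, so per-rank memory can be compared.
 * Build: mpicc rss_check.c -o rss_check
 */
#include <mpi.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* Return resident set size in kB as reported by the kernel, or -1 on error. */
static long vmrss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f) return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            kb = strtol(line + 6, NULL, 10);
            break;
        }
    }
    fclose(f);
    return kb;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long mine = vmrss_kb();
    long *all = NULL;
    if (rank == 0) all = malloc(size * sizeof *all);

    /* Collect every rank's RSS on rank 0 and print the per-rank values. */
    MPI_Gather(&mine, 1, MPI_LONG, all, 1, MPI_LONG, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int r = 0; r < size; ++r)
            printf("rank %d: VmRSS = %ld kB\n", r, all[r]);
        free(all);
    }

    MPI_Finalize();
    return 0;
}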

Thanks for any insights,

Ed

n1.1:    6.9530
n2.1:    7.0185
n2.2:    7.0313
n3.1:    8.2069
n3.2:    8.1628
n3.3:    8.1311
n4.1:    7.5307
n4.2:    7.5323
n4.3:    7.5858
n5.1:    9.5693
n5.2:    9.5104
n5.3:    9.4821
n6.1:    8.9821
n6.2:    8.9720
n6.3:    8.9541
n7.1:    10.640
n7.2:    10.650
n7.3:    10.638
n8.1:    8.6822
n8.2:    8.6630
n8.3:    8.6903
n9.1:    9.5058
n9.2:    9.5255
n9.3:    9.4809
n10.1:    10.484
n10.2:    10.452
n10.3:    10.516
n11.1:    11.327
n11.2:    11.316
n11.3:    11.318
n12.1:    12.285
n12.2:    12.303
n12.3:    12.272
n13.1:    13.127
n13.2:    13.113
n13.3:    13.113
n14.1:    14.035
n14.2:    13.989
n14.3:    14.021
n15.1:    14.533
n15.2:    14.529
n15.3:    14.586
n16.1:    8.6542
n16.2:    8.6731
n16.3:    8.6586