I am running single-node Sandy Bridge cases with OpenMPI and looking at scaling. I'm using -bind-to-core with no other binding options (the default mapping is -bycore, I believe).

In the numbers below, the first figure is the core count and the second is the run number (every case except n=1 was repeated 3 times). Any thoughts on why n=15 should be so much slower than n=16? I also measured the RSS of the running processes: for the n=15 cases, the rank 0 process uses about 2x more memory than all the other ranks, whereas for the n=16 cases all ranks use the same amount of memory.

 n    run 1     run 2     run 3
 1    6.9530
 2    7.0185    7.0313
 3    8.2069    8.1628    8.1311
 4    7.5307    7.5323    7.5858
 5    9.5693    9.5104    9.4821
 6    8.9821    8.9720    8.9541
 7   10.640    10.650    10.638
 8    8.6822    8.6630    8.6903
 9    9.5058    9.5255    9.4809
10   10.484    10.452    10.516
11   11.327    11.316    11.318
12   12.285    12.303    12.272
13   13.127    13.113    13.113
14   14.035    13.989    14.021
15   14.533    14.529    14.586
16    8.6542    8.6731    8.6586

Thanks for insights,
Ed
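For what it's worth, here is a minimal sketch of one way to sample per-rank RSS on Linux, by reading VmRSS from /proc/<pid>/status (the helper name and the self-measurement demo are my own illustration, not necessarily how the numbers above were gathered):

```python
import os

def rss_kb(pid):
    """Return the resident set size of a process in kB, read from /proc."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                # Line looks like: "VmRSS:     12345 kB"
                return int(line.split()[1])
    return None  # no VmRSS line (e.g. a kernel thread)

# Demo: sample this process's own RSS
print(rss_kb(os.getpid()), "kB")
```

Running something like this against each rank's PID (e.g. from the hostfile node, polled once the job is steady) is enough to spot the rank-0 asymmetry described above.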