Interesting.

Try the native OFED benchmarks -- i.e., take MPI out of the equation and see 
whether the raw/native performance of the network between the two device types 
shows the same dichotomy.

(e.g., ibv_rc_pingpong)
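
Something along these lines, run for a qib-to-qib pair, an mlx4-to-mlx4 pair, 
and a mixed pair (the device names and hostname below are just examples -- 
check ibv_devices on each host):

  # on the first node ("server" side); pick the HCA with -d
  ibv_rc_pingpong -d mlx4_0 -s 65536 -n 1000

  # on the second node, pointing at the first node's hostname (e.g., node1)
  ibv_rc_pingpong -d qib0 -s 65536 -n 1000 node1

If the mixed pair looks fine at the verbs level, the problem is probably up in 
the MPI layer; if it shows the same slowdown (or hang), it's below MPI.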


On Jul 15, 2011, at 7:58 PM, David Warren wrote:

> All with OFED 1.4 and kernel 2.6.32 (that's what I can get to today)
> qib to qib:
> 
> # OSU MPI Latency Test v3.3
> # Size            Latency (us)
> 0                         0.29
> 1                         0.32
> 2                         0.31
> 4                         0.32
> 8                         0.32
> 16                        0.35
> 32                        0.35
> 64                        0.47
> 128                       0.47
> 256                       0.50
> 512                       0.53
> 1024                      0.66
> 2048                      0.88
> 4096                      1.24
> 8192                      1.89
> 16384                     3.94
> 32768                     5.94
> 65536                     9.79
> 131072                   18.93
> 262144                   37.36
> 524288                   71.90
> 1048576                 189.62
> 2097152                 478.55
> 4194304                1148.80
> 
> # OSU MPI Bandwidth Test v3.3
> # Size        Bandwidth (MB/s)
> 1                         2.48
> 2                         5.00
> 4                        10.04
> 8                        20.02
> 16                       33.22
> 32                       67.32
> 64                      134.65
> 128                     260.30
> 256                     486.44
> 512                     860.77
> 1024                   1385.54
> 2048                   1940.68
> 4096                   2231.20
> 8192                   2343.30
> 16384                  2944.99
> 32768                  3213.77
> 65536                  3174.85
> 131072                 3220.07
> 262144                 3259.48
> 524288                 3277.05
> 1048576                3283.97
> 2097152                3288.91
> 4194304                3291.84
> 
> # OSU MPI Bi-Directional Bandwidth Test v3.3
> # Size     Bi-Bandwidth (MB/s)
> 1                         3.10
> 2                         6.21
> 4                        13.08
> 8                        26.91
> 16                       41.00
> 32                       78.17
> 64                      161.13
> 128                     312.08
> 256                     588.18
> 512                     968.32
> 1024                   1683.42
> 2048                   2513.86
> 4096                   2948.11
> 8192                   2918.39
> 16384                  3370.28
> 32768                  3543.99
> 65536                  4159.99
> 131072                 4709.73
> 262144                 4733.31
> 524288                 4795.44
> 1048576                4753.69
> 2097152                4786.11
> 4194304                4779.40
> 
> mlx4 to mlx4:
> # OSU MPI Latency Test v3.3
> # Size            Latency (us)
> 0                         1.62
> 1                         1.66
> 2                         1.67
> 4                         1.66
> 8                         1.70
> 16                        1.71
> 32                        1.75
> 64                        1.91
> 128                       3.11
> 256                       3.32
> 512                       3.66
> 1024                      4.46
> 2048                      5.57
> 4096                      6.62
> 8192                      8.95
> 16384                    11.07
> 32768                    15.94
> 65536                    25.57
> 131072                   44.93
> 262144                   83.58
> 524288                  160.85
> 1048576                 315.47
> 2097152                 624.68
> 4194304                1247.17
> 
> # OSU MPI Bandwidth Test v3.3
> # Size        Bandwidth (MB/s)
> 1                         1.80
> 2                         4.21
> 4                         8.79
> 8                        18.14
> 16                       35.79
> 32                       68.58
> 64                      132.72
> 128                     221.89
> 256                     399.62
> 512                     724.13
> 1024                   1267.36
> 2048                   1959.22
> 4096                   2354.26
> 8192                   2519.50
> 16384                  3225.44
> 32768                  3227.86
> 65536                  3350.76
> 131072                 3369.86
> 262144                 3378.76
> 524288                 3384.02
> 1048576                3386.60
> 2097152                3387.97
> 4194304                3388.66
> 
> # OSU MPI Bi-Directional Bandwidth Test v3.3
> # Size     Bi-Bandwidth (MB/s)
> 1                         1.70
> 2                         3.86
> 4                        10.42
> 8                        20.99
> 16                       41.22
> 32                       79.17
> 64                      151.25
> 128                     277.64
> 256                     495.44
> 512                     843.44
> 1024                    162.53
> 2048                   2427.23
> 4096                   2989.63
> 8192                   3587.58
> 16384                  5391.08
> 32768                  6051.56
> 65536                  6314.33
> 131072                 6439.04
> 262144                 6506.51
> 524288                 6539.51
> 1048576                6558.34
> 2097152                6567.24
> 4194304                6555.76
> 
> mixed (qib to mlx4):
> # OSU MPI Latency Test v3.3
> # Size            Latency (us)
> 0                         3.81
> 1                         3.88
> 2                         3.86
> 4                         3.85
> 8                         3.92
> 16                        3.93
> 32                        3.93
> 64                        4.02
> 128                       4.60
> 256                       4.80
> 512                       5.14
> 1024                      5.94
> 2048                      7.26
> 4096                      8.50
> 8192                     10.98
> 16384                    19.92
> 32768                    26.35
> 65536                    39.93
> 131072                   64.45
> 262144                  106.93
> 524288                  191.89
> 1048576                 358.31
> 2097152                 694.25
> 4194304                1429.56
> 
> # OSU MPI Bandwidth Test v3.3
> # Size        Bandwidth (MB/s)
> 1                         0.64
> 2                         1.39
> 4                         2.76
> 8                         5.58
> 16                       11.03
> 32                       22.17
> 64                       43.70
> 128                     100.49
> 256                     179.83
> 512                     305.87
> 1024                    544.68
> 2048                    838.22
> 4096                   1187.74
> 8192                   1542.07
> 16384                  1260.93
> 32768                  1708.54
> 65536                  2180.45
> 131072                 2482.28
> 262144                 2624.89
> 524288                 2680.55
> 1048576                2728.58
> never gets past here
> 
> # OSU MPI Bi-Directional Bandwidth Test v3.3
> # Size     Bi-Bandwidth (MB/s)
> 1                         0.41
> 2                         0.83
> 4                         1.68
> 8                         3.37
> 16                        6.71
> 32                       13.37
> 64                       26.64
> 128                      63.47
> 256                     113.23
> 512                     202.92
> 1024                    362.48
> 2048                    578.53
> 4096                    830.31
> 8192                   1143.16
> 16384                  1303.02
> 32768                  1913.07
> 65536                  2463.83
> 131072                 2793.83
> 262144                 2918.32
> 524288                 2987.92
> 1048576                3033.31
> never gets past here
> 
> 
> 
> On 07/15/11 09:03, Jeff Squyres wrote:
>> I don't think too many people have done combined QLogic + Mellanox runs, so 
>> this probably isn't a well-explored space.
>> 
>> Can you run some microbenchmarks to see what kind of latency / bandwidth 
>> you're getting between nodes of the same type and nodes of different types?
>> 
>> On Jul 14, 2011, at 8:21 PM, David Warren wrote:
>> 
>>   
>>> On my test runs (a WRF run just long enough to go beyond the spinup influence):
>>> On just 6 of the old mlx4 machines I get about 00:05:30 runtime.
>>> On 3 mlx4 and 3 qib nodes I get an avg of 00:06:20.
>>> So the slowdown is about 11+%.
>>> When this is a full run, 11% becomes a very long time.  This has held for 
>>> some longer tests as well, before I went to OFED 1.6.
>>> 
>>> On 07/14/11 05:55, Jeff Squyres wrote:
>>>     
>>>> On Jul 13, 2011, at 7:46 PM, David Warren wrote:
>>>> 
>>>> 
>>>>       
>>>>> I finally got access to the systems again (the original ones are part of 
>>>>> our real-time system). I thought I would try one other test I had set up 
>>>>> first.  I went to OFED 1.6 and it started running with no errors. It must 
>>>>> have been an OFED bug. Now I just have the speed problem. Does anyone have 
>>>>> a way to make the mixture of mlx4 and QLogic work together without slowing 
>>>>> down?
>>>>> 
>>>>>         
>>>> What do you mean by "slowing down"?
>>>> 
>>>> 
>>>>       
>>>     
>> 
>>   


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

