Hi all,

Here is BMI pingpong performance using Opteron 285s (dual core, dual socket 2.4 GHz):

Length          Lat (us)        BW (MB/s)
1               3.90            0.26
64              4.38            14.60
128             4.87            26.29
256             6.34            40.37
512             6.85            74.75
1024            8.22            124.52
2048            9.31            219.94
4096            11.62           352.49
8192            17.04           480.67
32768           39.90           821.21
1048576         1044.83         1003.59

With MX registration cache:
1048576         917.61          1142.72


and native MX performance for comparison:

   Length   Latency(us)    Bandwidth(MB/s)
        1       2.200          0.455
       64       2.763         23.167
      128       2.908         44.024
      256       4.439         57.677
      512       5.129         99.825
     1024       6.530        156.815
     2048       7.556        271.043
     4096      10.145        403.766
     8192      14.743        555.635
    32768      37.490        874.046
  1048576     879.213       1192.630

On these machines, BMI adds only about 1.7 us of latency over native MX.
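
For anyone who wants to check the per-size overhead rather than just the 1-byte case, here is a rough Python sketch that subtracts the native MX latency from the BMI latency for each message size. The numbers are copied from the two tables above; the names (BMI_LAT_US, MX_LAT_US, latency_overhead) are made up for this illustration and are not part of BMI or MX. The 1 MB row is left out since it is dominated by transfer time rather than per-message overhead.

```python
# One-way latencies in microseconds, copied from the two tables above.
# Dictionary and function names are hypothetical, chosen for this sketch only.
BMI_LAT_US = {1: 3.90, 64: 4.38, 128: 4.87, 256: 6.34, 512: 6.85,
              1024: 8.22, 2048: 9.31, 4096: 11.62, 8192: 17.04, 32768: 39.90}
MX_LAT_US = {1: 2.200, 64: 2.763, 128: 2.908, 256: 4.439, 512: 5.129,
             1024: 6.530, 2048: 7.556, 4096: 10.145, 8192: 14.743, 32768: 37.490}


def latency_overhead(bmi, mx):
    """Return BMI-minus-MX latency (us) for every size present in both tables."""
    return {size: round(bmi[size] - mx[size], 3)
            for size in sorted(bmi) if size in mx}


overhead = latency_overhead(BMI_LAT_US, MX_LAT_US)
for size, delta in overhead.items():
    print(f"{size:>8} bytes: +{delta:.3f} us")
```

For the 1-byte message this gives 3.90 - 2.20 = 1.70 us, which is where the figure above comes from.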

I would normally expect MX to get about 1225 MB/s (out of the 1250 MB/s line rate) and I would expect BMI to get about 1200 MB/s. I will look into this tomorrow.
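
To put the gap in perspective, a small Python sketch that expresses the measured and expected 1 MB bandwidths as a percentage of the 1250 MB/s line rate. All figures come from this message; the function name pct_of_line_rate and the labels are invented for the illustration.

```python
LINE_RATE_MB_S = 1250.0  # line rate quoted above

# 1 MB bandwidths in MB/s: measured values from the tables, expected values from the text.
BANDWIDTH_MB_S = {
    "BMI measured": 1003.59,
    "BMI measured (MX registration cache)": 1142.72,
    "MX measured": 1192.630,
    "MX expected": 1225.0,
    "BMI expected": 1200.0,
}


def pct_of_line_rate(bw_mb_s, line_rate_mb_s=LINE_RATE_MB_S):
    """Return bandwidth as a percentage of the line rate."""
    return 100.0 * bw_mb_s / line_rate_mb_s


for label, bw in BANDWIDTH_MB_S.items():
    print(f"{label:<40} {bw:8.2f} MB/s  {pct_of_line_rate(bw):5.1f}% of line rate")
```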

Overall, raw BMI performance is good and imposes little overhead.

Scott


On Jan 26, 2007, at 4:35 PM, Scott Atchley wrote:

Hi Murali,

Ok, I will check them out.

In the meantime, I have written a test similar to IMB PingPong that uses BMI directly. It should work with TCP, GM, MX, and IB. Below are results from some old Xeons with Myrinet-2000 cards (250 MB/s link rate).

The latency is one-way and throughput is bi-directional.
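
As a sanity check when reading the tables in this thread: the BW column is numerically consistent with message length divided by the reported latency, taking 1 MB as 10^6 bytes (bytes per microsecond equals MB/s). The Python sketch below only illustrates that relationship. It is an observation about the printed numbers, not a description of how the test computes them, and bandwidth_mb_s is a name made up for this example.

```python
def bandwidth_mb_s(length_bytes, latency_us):
    """Bytes per microsecond, which equals MB/s when 1 MB = 10**6 bytes."""
    return length_bytes / latency_us


# (length in bytes, latency in us, BW in MB/s) rows copied from tables in this thread.
ROWS = [
    (1048576, 1044.83, 1003.59),  # BMI on the Opterons
    (1048576, 4583.91, 228.75),   # bmi_mx on the Xeons
    (64, 2.763, 23.167),          # native MX on the Opterons
]

for length, lat_us, reported in ROWS:
    computed = bandwidth_mb_s(length, lat_us)
    print(f"{length:>8} bytes: computed {computed:9.2f} MB/s, reported {reported:9.2f} MB/s")
```

Small differences in the last digit are expected because the latencies in the tables are already rounded.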

I will also write a version that tests unexpected messages up to unexpected max size.

Scott

bmi_mx results:

Length          Lat (us)        BW (MB/s)
1               7.97            0.13
64              8.85            7.23
256             11.99           21.35
512             13.76           37.20
1024            17.76           57.67
4096            32.49           126.07
8192            54.38           150.65
32768           158.41          206.85
1048576         4583.91         228.75

With the registration cache:
1048576         4305.72         243.53


For comparison, these are mx_pingpong (raw MX) results for the same message sizes:

   Length   Latency(us)    Bandwidth(MB/s)
        1       3.466          0.288
       64       4.587         13.954
      256       7.141         35.852
      512       9.089         56.329
     1024      12.937         79.153
     4096      26.696        153.431
     8192      46.815        174.987
    32768     154.087        212.659
  1048576    4271.931        245.457

BMI and/or bmi_mx adds about 4.5 us of latency over raw MX. It can get close to line rate with or without the MX registration cache.
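
To see how much the registration cache matters for large messages, here is a short Python sketch comparing the 1 MB bmi_mx latencies against raw MX as a relative overhead. The three latencies are copied from the tables above; relative_overhead_pct and the constant names are hypothetical, used only for this example.

```python
# 1 MB (1048576-byte) one-way latencies in microseconds, from the tables above.
MX_1MB_US = 4271.931         # raw MX (mx_pingpong)
BMI_1MB_US = 4583.91         # bmi_mx without the registration cache
BMI_1MB_CACHE_US = 4305.72   # bmi_mx with the registration cache


def relative_overhead_pct(bmi_us, mx_us):
    """Return the extra latency of BMI over raw MX as a percentage of the MX latency."""
    return 100.0 * (bmi_us - mx_us) / mx_us


no_cache = relative_overhead_pct(BMI_1MB_US, MX_1MB_US)
with_cache = relative_overhead_pct(BMI_1MB_CACHE_US, MX_1MB_US)
print(f"without registration cache: {no_cache:.2f}% slower than raw MX")
print(f"with registration cache:    {with_cache:.2f}% slower than raw MX")
```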

Scott
_______________________________________________
Pvfs2-developers mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-developers
