> From: Or Gerlitz
> Sent: Monday, September 18, 2006 5:45 AM
> To: Michael S. Tsirkin
> Cc: OPENIB
> Subject: Re: [openib-general] [PATCH] osm: PathRecord prefer 1K MTU for MT23108 devices
>
> Michael S. Tsirkin wrote:
> > Quoting r. Or Gerlitz <[EMAIL PROTECTED]>:
> >> Eitan Zahavi wrote:
> >>> The following patch solves an issue with OpenSM preferring the largest MTU
> >>> for PathRecord/MultiPathRecord for paths going to or from MT23108 (Tavor)
> >>> devices instead of using a 1K MTU, which is best for this device.
>
> >> Doesn't the 2K MTU issue with Tavor come into play only under RC QPs?
>
> > I don't think so, no. Tavor supports 2K MTU, but it has better performance
> > with 1K MTU than 2K MTU. QP type should not matter.
>
> Can you double-check that, please? As far as I know there is something like a
> 40-50% BW drop with Tavor/RC/2048 vs Tavor/RC/1024, but the BW with
> Tavor/UD/2048 is **no less** than with Tavor/UD/1024.
>
> So it's very common for IPoIB net device implementations to expose a 2044- or
> 1500-byte MTU to the OS, e.g. to cope with Ethernet and reduce IP
> fragmentation/reassembly of UDP/TCP traffic.
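The 2044-byte figure quoted above comes from IPoIB's 4-byte encapsulation header: a 2048-byte IB MTU leaves 2044 bytes for the net device in datagram mode. Below is a minimal libibverbs sketch of reading the active port MTU and deriving that number; the choice of the first HCA and port 1 is an illustrative assumption, not something taken from the thread.

/*
 * Illustrative sketch only: read the active port MTU with libibverbs and
 * derive the IPoIB datagram-mode MTU.  IPoIB prepends a 4-byte
 * encapsulation header, which is where 2048 - 4 = 2044 comes from.
 * Assumes the first HCA and port 1; build with -libverbs.
 */
#include <stdio.h>
#include <infiniband/verbs.h>

static int mtu_enum_to_bytes(enum ibv_mtu mtu)
{
	switch (mtu) {
	case IBV_MTU_256:  return 256;
	case IBV_MTU_512:  return 512;
	case IBV_MTU_1024: return 1024;
	case IBV_MTU_2048: return 2048;
	case IBV_MTU_4096: return 4096;
	default:           return -1;
	}
}

int main(void)
{
	struct ibv_device **dev_list = ibv_get_device_list(NULL);
	struct ibv_context *ctx;
	struct ibv_port_attr attr;
	int mtu;

	if (!dev_list || !dev_list[0]) {
		fprintf(stderr, "no IB devices found\n");
		return 1;
	}

	ctx = ibv_open_device(dev_list[0]);
	if (!ctx || ibv_query_port(ctx, 1, &attr)) {
		fprintf(stderr, "failed to open device / query port 1\n");
		return 1;
	}

	mtu = mtu_enum_to_bytes(attr.active_mtu);
	printf("active IB MTU:      %d bytes\n", mtu);
	printf("IPoIB datagram MTU: %d bytes\n", mtu - 4); /* e.g. 2048 - 4 = 2044 */

	ibv_close_device(ctx);
	ibv_free_device_list(dev_list);
	return 0;
}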
Putting this in the SM alone and making it a fabric-wide setting is inappropriate. The performance difference depends on application message size, which can vary per ULP and per application: one MPI application may send mostly large messages while another sends mostly small ones, and the same can be true of applications over other ULPs such as uDAPL, SDP, etc. (the verbs sketch appended below shows the per-connection hook involved).

The root issue is that the Tavor HCA has one credit too few to truly double-buffer at 2K MTU. However, at message sizes > 1K but < 2K the 2K MTU performs better. Here are some MPI bandwidth results (message size in bytes on the left):

Tavor w/ 2K MTU:
  512    140.394173
 1024    310.553002
 1500    407.003858
 1800    435.538752
 2048    392.831026
 4096    417.592991

Tavor w/ 1K MTU:
  512    140.261964
 1024    300.789425
 1500    379.746835
 1800    416.726957
 2048    425.227096
 4096    501.442289

Note that the message sizes shown on the left do not include MPI headers, so the actual IB message size is approximately 50 bytes larger.

So we see that at IB message sizes < 1024 (the 512-byte MPI message), performance is the same. At IB message sizes > 1024 and < 2048 (the 1024-1800 byte MPI messages), performance is best with the 2K MTU. At IB message sizes > 2048 (the 2048-4096 byte MPI messages above), performance is best with the 1K MTU. At larger IB message sizes (the 4096-byte MPI message) the gap starts to widen, and ultimately at a 128K message size (not shown) the difference between 1K and 2K MTU reaches its peak of about 50%.

Todd Rimmer
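The path MTU for an RC connection is ultimately programmed by the ULP itself when it transitions the QP to RTR, which is where a per-application choice (rather than a fabric-wide one) would naturally live. Below is a minimal verbs sketch of that transition, assuming the remote QPN/LID/PSN have already been exchanged; the function and parameter names (rc_qp_to_rtr, prefer_1k_mtu, sm_path_mtu) are illustrative, not an existing API.

/*
 * Illustrative sketch only: the RC path MTU is a per-connection attribute
 * set at the INIT->RTR transition, so a ULP that knows its message-size
 * profile (e.g. mostly > 2K messages on Tavor) can request IBV_MTU_1024
 * for its own connections regardless of what the SM returned.
 * Connection setup and error handling are omitted; names are made up.
 */
#include <infiniband/verbs.h>

int rc_qp_to_rtr(struct ibv_qp *qp, enum ibv_mtu sm_path_mtu,
                 int prefer_1k_mtu,              /* e.g. Tavor + large messages */
                 uint32_t dest_qpn, uint16_t dest_lid,
                 uint32_t rq_psn, uint8_t port_num)
{
	struct ibv_qp_attr attr = {
		.qp_state           = IBV_QPS_RTR,
		/* the ULP, not the SM, has the final say for this QP */
		.path_mtu           = (prefer_1k_mtu && sm_path_mtu > IBV_MTU_1024)
					  ? IBV_MTU_1024 : sm_path_mtu,
		.dest_qp_num        = dest_qpn,
		.rq_psn             = rq_psn,
		.max_dest_rd_atomic = 1,
		.min_rnr_timer      = 12,
		.ah_attr = {
			.dlid          = dest_lid,
			.sl            = 0,
			.src_path_bits = 0,
			.port_num      = port_num,
		},
	};

	return ibv_modify_qp(qp, &attr,
			     IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
			     IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
			     IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER);
}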