On Thu, 2006-04-20 at 21:03, Bernard King-Smith wrote:
> Grant Grundler wrote:
> > > Currently we only get 40% of the link bandwidth compared to
> > > 85% for 10 GigE. (Yes, I know the cost differences, which favor IB.)
>
> Grant> 10gige is getting 85% without TOE?
> Grant> Or are they distributing event handling across several CPUs?
>
> On 10 GigE they are using large send to the adapter, where a 60K buffer
> is read by the adapter and fragmented into 1500 or 9000 byte Ethernet
> packets. Essentially they offload the fragmentation of TCP buffers into
> Ethernet packets to the adapter. This is similar to RC mode in IB
> fragmenting larger buffers into 2000 byte link frames/packets.
>
> > However, two things hurt user level protocols. First is scaling and
> > memory requirements. Looking at parallel file systems on large
> > clusters, SDP ended up consuming so much memory it couldn't be used.
> > With N-by-N socket connections per node, the buffer space and QP
> > memory that SDP requires got out of control. There is something to be
> > said for sharing buffer and QP space across lots of sockets.
>
> Grant> My guess is it's an easier problem to fix SDP than reducing TCP/IP
> Grant> cache/CPU footprint. I realize only a subset of apps can (or will
> Grant> try to) use SDP because of setup/config issues. I still believe SDP
> Grant> is useful to a majority of apps without having to recompile them.
>
> I agree that reducing any protocol footprint is a very challenging job;
> however, going to a larger MTU drops the overhead much faster. If IB
> supported a 60K MTU then the TCP/IP overhead would be 1/30 of what we
> measure today.
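To put rough numbers on the large-send and MTU arithmetic quoted above, here
is a back-of-the-envelope sketch in Python. The 60K buffer and the ~2000 byte
IB frame are Bernie's figures from the quoted text, and header overhead is
ignored for simplicity:

# How many packets (and hence how many trips through the per-packet part
# of the stack) a single 60K send turns into at various MTUs. Header
# overhead is deliberately ignored; this is only meant to show the ratios.
import math

SEND_BUFFER = 60 * 1024  # the 60K large-send buffer discussed above

for label, mtu in [("Ethernet", 1500),
                   ("jumbo Ethernet", 9000),
                   ("IB frame (~2K)", 2000),
                   ("hypothetical 60K MTU", 60 * 1024)]:
    packets = math.ceil(SEND_BUFFER / mtu)
    print(f"{label:22s} MTU {mtu:6d}: {packets:3d} packets per 60K send")

The 2000 byte case comes out to roughly 30 packets per send, which is where
the "1/30 of what we measure today" figure above comes from.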
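And on the N-by-N SDP memory point a few paragraphs up, a toy model of why
dedicated per-connection buffering stops scaling. The per-connection sizes
here are illustrative assumptions, not measurements of any particular SDP
stack:

# Per-node memory when every socket to every peer carries its own dedicated
# buffers and QP state, as with SDP. All sizes are assumed for illustration.

def per_node_memory(nodes,
                    buf_per_conn=2 * 128 * 1024,  # assumed 128K send + 128K recv
                    qp_overhead=64 * 1024):       # assumed QP/CQ state per conn
    conns = nodes - 1                             # one socket to each other node
    return conns * (buf_per_conn + qp_overhead)

for n in (64, 256, 1024):
    mb = per_node_memory(n) / (1024 * 1024)
    print(f"{n:5d} nodes: ~{mb:6.1f} MB per node just for connection state")

Sharing a buffer pool across sockets turns the per-connection buffer term
into a roughly constant pool size, which is exactly the "sharing buffer and
QP space across lots of sockets" point above.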
Then what you want is IPoIB-CM, where the MTU can be much larger (a minimal
sketch of switching into connected mode is at the bottom of this message).

-- Hal

> Traversing the TCP/IP stack once for a 60K packet costs much less than
> traversing it 30 times with 2000 byte packets for the same amount of data
> transmitted.
>
> > The other issue is flow control across hundreds of autonomous sockets.
> > In TCP/IP, traffic can be managed so that there is some fairness
> > (multiplexing, QoS etc.) across all active sockets. For user level
> > protocols like SDP and uDAPL, you can't manage traffic across multiple
> > autonomous user application connections because there is nowhere to see
> > all of them at the same time for management. This can lead to
> > overrunning adapters or timeouts to the applications. This tends to be
> > a large system problem when you have lots of CPUs.
>
> Grant> I'm not competent to disagree in detail.
> Grant> Fabian Tillier and Caitlin Bestler can (and have) addressed this.
>
> I would be very interested in any pointers to their work.
>
> > The footprint of IPoIB + TCP/IP is large, as on any system. However, as
> > you get to higher CPU counts, the issue becomes less of a problem since
> > more unused CPU cycles are available. However, affinity (CPU and memory)
> > and cacheline miss issues get greater.
>
> Grant> Hrm... the concept of "unused CPU cycles" is bugging me as someone
> Grant> who occasionally gets to run benchmarks. If a system today has
> Grant> unused CPU cycles, then will adding a faster link change the CPU
> Grant> load if the application doesn't change?
>
> This goes back to systems where the system is busy doing nothing,
> generally while waiting for memory, a cache line miss, or I/O to disks.
> This is where hyperthreading has shown some speedups: benchmarks that
> previously were totally CPU limited see a gain. The unused cycles are
> "wait" cycles in which something can run if it can get in quickly. You
> can't fit the whole TCP stack into those wait cycles, but small parts of
> the stack or driver could fit in the other thread. Yes, I do benchmarking
> and was skeptical at first.
>
> Grant> thanks,
> Grant> grant
>
> Bernie King-Smith
> IBM Corporation
> Server Group
> Cluster System Performance
> [EMAIL PROTECTED]  (845)433-8483
> Tie. 293-8483 or wombat2 on NOTES
>
> "We are not responsible for the world we are born into, only for the
> world we leave when we die. So we have to accept what has gone before us
> and work to change the only thing we can,
> -- The Future." William Shatner
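For anyone who wants to try connected mode, here is the minimal sketch
promised above. Assumptions: a Linux IPoIB driver built with CM support, an
interface named ib0, iproute2 installed, and root privileges; check your
kernel's IPoIB documentation before relying on this:

#!/usr/bin/env python
# Sketch: switch an IPoIB interface to connected mode and raise its MTU.
# Assumes a Linux IPoIB-CM driver, an interface named ib0, and root.
import subprocess

IFACE = "ib0"    # assumption: adjust to your IPoIB interface name
CM_MTU = 65520   # upper MTU limit for IPoIB connected mode

# The IPoIB driver exposes a datagram/connected knob in sysfs.
with open(f"/sys/class/net/{IFACE}/mode", "w") as f:
    f.write("connected\n")

# Raise the interface MTU now that connected mode allows it.
subprocess.check_call(["ip", "link", "set", IFACE, "mtu", str(CM_MTU)])

# Confirm the mode actually changed.
with open(f"/sys/class/net/{IFACE}/mode") as f:
    print(IFACE, "mode:", f.read().strip())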
