Hello Grant,

[EMAIL PROTECTED] wrote on 04/19/2006 09:42:26 AM:
> I've looked at this tradeoff pretty closely with ia64 (1.5GHz)
> by pinning netperf to a different CPU than the one handling interrupts.
> By moving netperf RX traffic off the CPU handling interrupts,
> the 1.5GHz ia64 box goes from 2.8 Gb/s to around 3.5 Gb/s.
> But the "service demand" (CPU time per KB payload) goes up
> from ~2.3 usec/KB to ~3.1 usec/KB - cacheline misses go up dramatically.

Yes, binding netperf/netserver to the same CPU definitely benefits
cacheline locality. But that CPU becomes the bottleneck: one CPU is not
sufficient to drain a faster network device/HCA.
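For reference, the pinning itself can be done with sched_setaffinity(2)
(roughly what netperf's CPU-binding option or taskset does). A minimal
sketch; the choice of CPU 0 is arbitrary:

    /* Minimal sketch: pin the calling process to one CPU with
     * sched_setaffinity(2).  CPU 0 is an arbitrary choice here. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
            cpu_set_t mask;

            CPU_ZERO(&mask);
            CPU_SET(0, &mask);      /* run only on CPU 0 */
            if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
                    perror("sched_setaffinity");
                    return 1;
            }
            /* ... run the benchmark workload here ... */
            return 0;
    }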

> I expect splitting the RX/TX completions would achieve something
> similar since we are just "slicing" the same problem from a different
> angle.  Apps typically do both RX and TX and will be running on one
> CPU. So on one path they will be missing cachelines.

It's different. Binding the CPU guarantees that packets go to the same
CPU. The WC handler is not in interrupt context; it can deliver packets
to different CPUs.
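To make the split-CQ idea concrete, here is a sketch using the
userspace libibverbs API (the IPoIB patch itself works against the
kernel verbs API; the function name and CQ depth of 256 here are
illustrative, not from the patch):

    /* Sketch only: one CQ per direction instead of a single shared CQ,
     * so send and receive completions can be reaped independently. */
    #include <infiniband/verbs.h>

    int create_split_cqs(struct ibv_context *ctx,
                         struct ibv_cq **send_cq, struct ibv_cq **recv_cq)
    {
            *send_cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
            *recv_cq = ibv_create_cq(ctx, 256, NULL, NULL, 0);
            if (!*send_cq || !*recv_cq)
                    return -1;
            return 0;
    }

A QP would then be created with ibv_qp_init_attr's send_cq and recv_cq
fields pointing at the two CQs, so each direction's completions can be
handled separately, possibly on different CPUs.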

> Anyway, my take is IPoIB perf isn't as critical as SDP and RDMA perf.
> If folks really care about perf, they have to migrate away from
> IPoIB to either SDP or directly use RDMA (uDAPL or something).
> Splitting RX/TX completions might help initial adoption, but
> that isn't where the big wins in perf are.

IPoIB perf is important for people who still use old applications.
Under some workloads we do see IPoIB double its bidirectional
performance with the patch that splits the CQ, tunes the poll interval,
and polls more entries per WC call.
It's a huge improvement.
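As a sketch of the "poll more entries" part, again with userspace verbs
rather than the kernel API, and with an arbitrary batch size of 16:

    /* Sketch: reap up to 16 work completions per ibv_poll_cq() call
     * rather than one at a time.  The batch size is illustrative. */
    #include <infiniband/verbs.h>

    int drain_cq(struct ibv_cq *cq)
    {
            struct ibv_wc wc[16];
            int i, n, total = 0;

            while ((n = ibv_poll_cq(cq, 16, wc)) > 0) {
                    for (i = 0; i < n; i++) {
                            if (wc[i].status != IBV_WC_SUCCESS)
                                    return -1;  /* error path abbreviated */
                            /* ... complete the corresponding send/recv ... */
                    }
                    total += n;
            }
            return n < 0 ? n : total;
    }

Batching amortizes the per-call overhead across many completions, which
is where part of the win comes from.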

Thanks
Shirley Ma
IBM Linux Technology Center
15300 SW Koll Parkway
Beaverton, OR 97006-6063
Phone(Fax): (503) 578-7638



_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
