On Tue, Jan 04, 2005 at 01:10:15PM -0800, Roland Dreier wrote:
>     Josh> I'm seeing about 364 MB/s between 2 PCIe Xeon 3.2GHz boxes
>     Josh> using netperf-2.3pl1.
>
> Are you using MSI-X?  To use it, set CONFIG_PCI_MSI=y when you build
> your kernel and either "modprobe ib_mthca msi_x=1"...
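For anyone else following along, this is roughly the sequence I mean by
"MSI-X enabled" below. The CONFIG_PCI_MSI and msi_x=1 names are straight
from Roland's note above; the modprobe.conf line is just one way of making
it persistent and may need adjusting for your distro:

    grep CONFIG_PCI_MSI .config     # want CONFIG_PCI_MSI=y in the kernel build
    modprobe -r ib_mthca            # reload the HCA driver with MSI-X requested
    modprobe ib_mthca msi_x=1
    echo "options ib_mthca msi_x=1" >> /etc/modprobe.conf   # optional: persist across reboots
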
Good news: Topspin firmware 3.3.2 can run netperf w/MSI-X on ia64 too.
Bad news: I'm getting weak perf #s on the ZX1 boxes (~1580 Mbps == ~200 MB/s).

This is with MSI-X enabled on both systems: the RX2600 sending TCP_STREAM
traffic to the RX4640 through a Topspin 12-port switch. The RX2600 has a
"Low Profile" (Cougarcub) HCA and the RX4640 has a "Cougar" installed in
"dual rope" slots.

/opt/netperf/netperf -l 60 -H 10.0.1.81 -t TCP_STREAM -i 5,2 -I 99,5 -- -m 8192 -s 262144 -S 262144
TCP STREAM TEST to 10.0.1.81 : +/-2.5% @ 99% conf.
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

262142 262142   8192    60.00    1588.33

q-syscollect on the netperf client (RX2600, dual 1.5GHz):

ionize:~/.q# q-view kernel-cpu1.info#0 | less
Flat profile of CPU_CYCLES in kernel-cpu1.hist#0:
Each histogram sample counts as 1.00034m seconds
% time      self     cumul     calls  self/call  tot/call  name
 25.09     14.98     14.98     80.7k       186u      186u  default_idle
  9.73      5.81     20.79     35.9M       162n      162n  _spin_unlock_irqrestore
  5.63      3.36     24.15     27.8M       121n      136n  ipt_do_table
  4.27      2.55     26.70     15.0M       170n      170n  do_csum
  3.49      2.08     28.78     6.95M       300n      300n  __copy_user
  2.66      1.59     30.37     14.3M       111n      673n  nf_iterate
  2.63      1.57     31.94     5.82M       270n      729n  tcp_transmit_skb
  2.59      1.54     33.49     68.5M      22.5n     33.2n  local_bh_enable
  2.33      1.39     34.88     6.79M       205n         -  tcp_packet
  1.83      1.09     35.97      355k      3.08u     32.4u  tcp_sendmsg
  1.57      0.94     36.91     2.32M       405n     2.11u  ipoib_ib_completion
  1.48      0.88     37.79     5.92M       149n      162n  ip_queue_xmit
  1.46      0.87     38.67     2.46M       354n     2.41u  mthca_eq_int
  1.20      0.72     39.39     6.93M       104n      376n  ip_conntrack_in
  1.17      0.70     40.08     7.52M      92.6n     92.6n  time_interpolator_get_offset
...

And on the "netserver" (RX4640, 4x 1.3GHz) side:

Flat profile of CPU_CYCLES in kernel-cpu3.hist#0:
Each histogram sample counts as 551.305u seconds
% time      self     cumul     calls  self/call  tot/call  name
 34.69     18.97     18.97     16.6M      1.15u     1.15u  do_csum
  7.58      4.15     23.12     19.4M       213n      213n  _spin_unlock_irqrestore
  6.67      3.65     26.76     61.4k      59.4u     59.4u  default_idle
  5.33      2.91     29.68     22.3M       131n      149n  ipt_do_table
  3.02      1.65     31.33     1.93M       856n     8.35u  ipoib_ib_completion
  2.73      1.49     32.82     6.45M       231n      231n  __copy_user
  2.61      1.43     34.25     11.2M       128n     1.32u  nf_iterate
  2.30      1.26     35.51     5.55M       227n         -  tcp_packet
  2.06      1.12     36.63     51.3M      21.9n     25.4n  local_bh_enable
  1.97      1.08     37.71     5.51M       195n      273n  tcp_v4_rcv
  1.43      0.78     38.49     1.77M       443n     9.63u  mthca_eq_int
  1.35      0.74     39.23     5.28M       139n     1.93u  netif_receive_skb
  1.19      0.65     39.88     5.60M       116n     1.59u  ip_conntrack_in
  1.14      0.62     40.50     5.53M       113n     2.92u  tcp_rcv_established
  1.03      0.56     41.06     5.31M       106n      135n  ip_route_input
  1.02      0.56     41.62     5.24M       107n     1.80u  ip_rcv
  0.91      0.50     42.12     5.43M      91.6n      369n  ip_local_deliver_finish
  0.90      0.49     42.61     5.51M      89.7n     89.7n  netif_rx
  0.89      0.49     43.10     1.93M       253n     9.13u  handle_IRQ_event
  0.85      0.46     43.56     33.7M      13.8n     13.8n  _read_lock_bh
...

_spin_unlock_irqrestore is a clue that we are spending time in interrupt
handlers, and that time isn't getting measured. top was reporting
"netserver" consuming ~80% of one CPU and netperf consuming ~60% of one
CPU; the other CPUs were idle on both boxes. Something else is slowing
things down... I know these boxes are capable of 800-900 MB/s on the
PCI bus.

hth,
grant
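
P.S. A rough way to see how much of that unmeasured interrupt work lands on
each CPU is to watch the interrupt counters and per-CPU utilization while
netperf runs. This isn't from the q-syscollect data above, just the usual
procfs/sysstat tools, and "mthca" is my guess at how the HCA's MSI-X vectors
are labelled in /proc/interrupts on these boxes:

    # per-vector interrupt counts; adjust the pattern to however the HCA's
    # MSI-X vectors are actually named on your system
    watch -n1 'grep -i mthca /proc/interrupts'

    # per-CPU utilization, including interrupt/softirq time if your sysstat
    # version breaks those out
    mpstat -P ALL 5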
