/me bounces! With openib-1321, ib_ipoib seems to be working! I couldn't reproduce the problem where ping sometimes didn't work. The current issue is misaligned accesses in the kernel. Here's a "cleaner" set of data.

ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_mthca
ib_mthca: Mellanox InfiniBand HCA driver v0.06-pre (November 8, 2004)
ib_mthca: Initializing Mellanox Technology MT23108 InfiniHost (0000:81:00.0)
GSI 60 (level, low) -> CPU 0 (0x0000) vector 67
ACPI: PCI interrupt 0000:81:00.0[A] -> GSI 60 (level, low) -> IRQ 67
ionize:/usr/src/linux-ia64-release-2.6.10# elilo -v --efiboot
ionize:/usr/src/linux-ia64-release-2.6.10# modprobe ib_ipoib
ionize:/usr/src/linux-ia64-release-2.6.10# cat /sys/class/infiniband/mthca0/ports/?/state
4: ACTIVE
4: ACTIVE
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib0 10.0.0.2 netmask 255.255.255.0 broadcast 10.0.0.255
ionize:/usr/src/linux-ia64-release-2.6.10# ifconfig ib1 10.0.1.2 netmask 255.255.255.0 broadcast 10.0.1.255
ionize:/usr/src/linux-ia64-release-2.6.10# ping 10.0.0.1
PING 10.0.0.1 (10.0.0.1) 56(84) bytes of data.
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001be010
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=1 ttl=64 time=14.4 ms
kernel unaligned access to 0xe0000002ff5fe05c, ip=0xa0000002001bef10
64 bytes from 10.0.0.1: icmp_seq=2 ttl=64 time=0.571 ms
64 bytes from 10.0.0.1: icmp_seq=3 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=4 ttl=64 time=0.067 ms
64 bytes from 10.0.0.1: icmp_seq=5 ttl=64 time=0.069 ms
64 bytes from 10.0.0.1: icmp_seq=6 ttl=64 time=0.068 ms

--- 10.0.0.1 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5001ms
rtt min/avg/max/mdev = 0.067/2.551/14.463/5.330 ms
ionize:/usr/src/linux-ia64-release-2.6.10#
ionize:/usr/src/linux-ia64-release-2.6.10# cd /opt/netperf/
ionize:/opt/netperf# ls
netperf    snapshot_script   tcp_rr_script      udp_rr_script
netserver  tcp_range_script  tcp_stream_script  udp_stream_script
ionize:/opt/netperf# ./snapshot_script 10.0.1.1
Netperf snapshot script started at Fri Dec 10 13:00:02 PST 2004
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
kernel unaligned access to 0xe0000002ff5fd85c, ip=0xa0000002001bef10
...

Misaligned access reports are rate limited by the kernel. The above is just the tip of the iceberg.

a0000002001bee60 t ipoib_start_xmit [ib_ipoib]
a0000002001bf880 t ipoib_get_stats [ib_ipoib]

The "netserver" (rx4640) is getting the following:

kernel unaligned access to 0xe0000001008b0f5c, ip=0xa000000200152f10

a000000200152e60 t ipoib_start_xmit [ib_ipoib]
a000000200153880 t ipoib_get_stats [ib_ipoib]

Based on the IP and offset (0x5c), I'll guess this is the same problem on both sides. Still looking at it.

FYA,

Starting 32x4 TCP_STREAM tests at Fri Dec 10 13:09:36 PST 2004
------------------------------------
Testing with the following command line:
/opt/netperf/netperf -t TCP_STREAM -l 60 -H 10.0.1.1 -i 10,3 -I 99,5 -- -s 32768 -S 32768 -m 4096
...
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec

262142 262142 4096    60.00   1164.65

Fixing the alignment issue should help here. Then I can start drilling a bit deeper on bottlenecks.

hth,
grant
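
For anyone following along, the symbol lookup and alignment arithmetic behind the "same problem on both sides" guess can be sketched in a few lines of Python. This is only an illustration: the helper names `resolve` and `misalignment` are made up here, the symbol tables hold just the entries quoted in this mail, and it assumes the two listed symbols bracket the faulting ip on each machine.

```python
import bisect

# Symbol addresses copied from the listings in this mail (sorted by address).
LOCAL_SYMBOLS = [
    (0xa0000002001bee60, "ipoib_start_xmit"),
    (0xa0000002001bf880, "ipoib_get_stats"),
]
REMOTE_SYMBOLS = [  # the "netserver" (rx4640) side
    (0xa000000200152e60, "ipoib_start_xmit"),
    (0xa000000200153880, "ipoib_get_stats"),
]


def resolve(ip, symbols):
    """Return (name, offset) for the nearest symbol at or below ip.

    Returns None when ip lies below every symbol in the (partial) table.
    Hypothetical helper; `symbols` must be sorted by address.
    """
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    return name, ip - addr


def misalignment(addr, size):
    """Bytes by which addr misses a size-byte boundary (0 means aligned)."""
    return addr % size


if __name__ == "__main__":
    for label, ip, table in [
        ("local", 0xa0000002001bef10, LOCAL_SYMBOLS),
        ("remote", 0xa000000200152f10, REMOTE_SYMBOLS),
    ]:
        name, off = resolve(ip, table)
        print(f"{label}: {name}+{off:#x}")

    # Data addresses reported in the unaligned-access messages.
    for addr in (0xe0000002ff5fe05c, 0xe0000002ff5fd85c, 0xe0000001008b0f5c):
        print(f"{addr:#x}: mod 4 = {misalignment(addr, 4)}, "
              f"mod 8 = {misalignment(addr, 8)}")
```

Both sides resolve to ipoib_start_xmit+0xb0, and all three reported data addresses are 4-byte aligned but sit 4 bytes past an 8-byte boundary. That would be consistent with an 8-byte access to a field that is only 4-byte aligned, though that is an inference from the arithmetic and not something the logs confirm.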
