Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI
Hi Roy,

I am sure we can figure out what is going on, thanks for the report. Can you run one test for me? Please try without the InterruptThrottleRate driver parameter, but with LRO enabled.

Since you are here at the same campus as we are I hope I can maybe just get direct access to your machines.

On Wed, 2009-11-11 at 15:24 -0800, Larsen, Roy K wrote:

I believe there is a problem with the software LRO in the ixgbe driver. With LRO enabled, my cluster application hangs where two processes have data to send to each other as indicated by looking at the send queue with netstat(8) but it is not making progress even though the receive queues are empty. If I build the driver without LRO (make CFLAGS_EXTRA=-DIXGBE_NO_LRO install), this issue goes away. These are compute nodes that do not do routing or IP forwarding. The hang is easily reproduced. The particulars follow.

Roy Larsen
Intel Corp.
roy.k.lar...@intel.com
JF5-3-J4
--

Red Hat EL5.3 (2.6.18-128.el5 kernel)
Dual socket Nehalem 2.9GHz nodes (8 cores) with 12GB of memory, hyper-threading disabled
Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
Fujitsu xg700 switch
8 nodes (64 cores)
Intel MPI 4.0.0.014

[r...@cstnh-1 library]# ethtool -i eth2
driver: ixgbe
version: 2.0.44.14-NAPI
firmware-version: 1.8-0
bus-info: :02:00.0

ixgbe driver loaded with following options:
modprobe ixgbe InterruptThrottleRate=0,0

netstat -t on node nh1-eth2
Proto Recv-Q Send-Q Local Address     Foreign Address   State
tcp        0   5224 nh1-eth2:55716    nh2-eth2:44115    ESTABLISHED

netstat -t on node nh2-eth2
Proto Recv-Q Send-Q Local Address     Foreign Address   State
tcp        0 331648 nh2-eth2:44115    nh1-eth2:55716    ESTABLISHED

The tcpdump(8) trace shows the connection is not making progress

[r...@cstnh-1 library]# tcpdump -i eth2 -v host nh2-eth2 and host nh1-eth2 and port 55716 and port 44115
tcpdump-tnic: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
17:17:17.092120 IP (tos 0x0, ttl 64, id 95, offset 0, flags [DF], proto TCP (6), length 1500)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 162998588, win 382, length 1460
17:17:17.092227 IP (tos 0x0, ttl 64, id 56737, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
17:17:17.348305 IP (tos 0x0, ttl 64, id 56738, offset 0, flags [DF], proto TCP (6), length 1500)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
17:17:17.348326 IP (tos 0x0, ttl 64, id 96, offset 0, flags [DF], proto TCP (6), length 40)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
17:17:17.348331 IP (tos 0x0, ttl 64, id 56739, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
17:18:08.548706 IP (tos 0x0, ttl 64, id 97, offset 0, flags [DF], proto TCP (6), length 1500)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
17:18:08.548711 IP (tos 0x0, ttl 64, id 56740, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
17:18:09.060306 IP (tos 0x0, ttl 64, id 56741, offset 0, flags [DF], proto TCP (6), length 1500)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
17:18:09.060327 IP (tos 0x0, ttl 64, id 98, offset 0, flags [DF], proto TCP (6), length 40)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
17:18:09.060332 IP (tos 0x0, ttl 64, id 56742, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
17:19:51.461901 IP (tos 0x0, ttl 64, id 99, offset 0, flags [DF], proto TCP (6), length 1500)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
17:19:51.461909 IP (tos 0x0, ttl 64, id 56743, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
17:19:52.484306 IP (tos 0x0, ttl 64, id 56744, offset 0, flags [DF], proto TCP (6), length 1500)
    nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
17:19:52.484328 IP (tos 0x0, ttl 64, id 100, offset 0, flags [DF], proto TCP (6), length 40)
    nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
17:19:52.484333 IP (tos 0x0, ttl 64, id 56745, offset 0, flags [DF], proto TCP (6), length 40)
    nh1-eth2.55716
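One way to read the trace above is to look at the spacing of the 1460-byte segments that nh2-eth2 sends (IP ids 95, 97 and 99) while the acknowledgement numbers on both sides stay fixed at 39426 and 1. The sketch below is only an illustration: the helper names are made up here, and treating those three segments as retransmissions of the same data is an assumption, since the capture does not show sequence numbers. It converts the tcpdump timestamps to microseconds and takes the differences.

```c
#include <stdint.h>
#include <stdio.h>

/* Timestamps of the three 1460-byte segments sent by nh2-eth2 in the
 * trace (IP ids 95, 97, 99). Assumed here to be retransmissions. */
static const char *const nh2_data_ts[3] = {
    "17:17:17.092120",
    "17:18:08.548706",
    "17:19:51.461901",
};

/* Parse "HH:MM:SS.uuuuuu" into microseconds since midnight.
 * Returns -1 if the string does not match that layout. */
static int64_t ts_to_usec(const char *ts)
{
    int h, m, s;
    long us;

    if (sscanf(ts, "%2d:%2d:%2d.%6ld", &h, &m, &s, &us) != 4)
        return -1;
    return (((int64_t)h * 60 + m) * 60 + s) * 1000000 + us;
}

/* Gap in microseconds between two timestamps from the same day. */
static int64_t gap_usec(const char *earlier, const char *later)
{
    return ts_to_usec(later) - ts_to_usec(earlier);
}
```

Calling gap_usec(nh2_data_ts[0], nh2_data_ts[1]) and gap_usec(nh2_data_ts[1], nh2_data_ts[2]) gives about 51.46 s and 102.91 s. The second gap is almost exactly twice the first, which is what exponential backoff of the TCP retransmission timer would produce, and fits the report that neither side is making progress.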
[E1000-devel] How to verify the result of RSS hash function?
Dear list,

I have an 82572 (driver e1000e) and an 82576 (driver igb) in hand. The following code was added in *_configure_rx to turn RSS on.

==
static const u8 rsshash[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa };
u32 mrqc;
u32 j;

mrqc = 0x0002;

/* Fill out hash function seeds */
for (j = 0; j < 10; j++) {
	u32 rsskey = rsshash[(j * 4)];
	rsskey |= rsshash[(j * 4) + 1] << 8;
	rsskey |= rsshash[(j * 4) + 2] << 16;
	rsskey |= rsshash[(j * 4) + 3] << 24;
	E1000_WRITE_REG_ARRAY(hw, E1000_RSSRK(0), j, rsskey);
}

mrqc |= (E1000_MRQC_RSS_FIELD_IPV4 |
	 E1000_MRQC_RSS_FIELD_IPV4_TCP);
E1000_WRITE_REG(hw, E1000_MRQC, mrqc);

rxcsum = E1000_READ_REG(hw, E1000_RXCSUM);
rxcsum |= E1000_RXCSUM_PCSD;
E1000_WRITE_REG(hw, E1000_RXCSUM, rxcsum);
==

According to the manual, developers can get a Dword result of the RSS hash function to verify it. However, I didn't get the right data, so I seem to be misunderstanding something. How can I read this Dword result correctly?

Thanks in advance.

--Junchang
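To check whatever Dword the hardware reports, it helps to have an expected value computed in software. The sketch below is a plain user-space Toeplitz hash, assuming the NIC follows the standard RSS definition: the input is source address, destination address, source port and destination port in network byte order, and the 40-byte key is the one from the snippet above. The function names are illustrative only and are not part of the e1000e or igb drivers.

```c
#include <stddef.h>
#include <stdint.h>

/* Same 40-byte key as in the driver snippet. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/* Toeplitz hash: for every set bit of the input (most significant bit
 * first), XOR in the 32-bit window of the key starting at that bit. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    /* Window initially holds key bits 0..31. Assumes key_len >= 4. */
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];
    size_t next = 4; /* index of the next key byte to shift in */

    for (size_t i = 0; i < data_len; i++) {
        /* Pad with zero bits if the input outruns the key. */
        uint8_t next_byte = next < key_len ? key[next] : 0;

        next++;
        for (int bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            window = (window << 1) | ((uint32_t)(next_byte >> bit) & 1u);
        }
    }
    return hash;
}

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hash over source and destination IPv4 address only.
 * Addresses are passed as host-order integers, e.g. 66.9.149.187
 * is 0x420995bb. */
static uint32_t rss_hash_ipv4(uint32_t saddr, uint32_t daddr)
{
    uint8_t in[8];

    put_be32(in, saddr);
    put_be32(in + 4, daddr);
    return toeplitz_hash(rss_key, sizeof(rss_key), in, sizeof(in));
}

/* Hash over addresses plus TCP source and destination port. */
static uint32_t rss_hash_ipv4_tcp(uint32_t saddr, uint32_t daddr,
                                  uint16_t sport, uint16_t dport)
{
    uint8_t in[12];

    put_be32(in, saddr);
    put_be32(in + 4, daddr);
    in[8] = (uint8_t)(sport >> 8);
    in[9] = (uint8_t)sport;
    in[10] = (uint8_t)(dport >> 8);
    in[11] = (uint8_t)dport;
    return toeplitz_hash(rss_key, sizeof(rss_key), in, sizeof(in));
}
```

As a sanity check, this key is the one used in Microsoft's published RSS verification vectors. For a TCP packet from 66.9.149.187:2794 to 161.142.100.80:1766 those vectors list 0x51ccc178 with ports and 0x323e8fc2 for the addresses alone, so rss_hash_ipv4_tcp(0x420995bb, 0xa18e6450, 2794, 1766) should return the former. If the value read back from the NIC differs, the byte order of the key written to the RSSRK registers and which fields the MRQC setting actually hashes are the first things worth comparing.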
Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI
I did, in fact, try with default interrupt settings and the hang persisted. Getting access to the cluster shouldn't be a problem at all.

Roy

-Original Message-
From: Brandeburg, Jesse
Sent: Thursday, November 12, 2009 12:16 AM
To: Larsen, Roy K
Cc: e1000-de...@lists.sf.net
Subject: Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI

Hi Roy,

I am sure we can figure out what is going on, thanks for the report. Can you run one test for me? Please try without the InterruptThrottleRate driver parameter, but with LRO enabled.

Since you are here at the same campus as we are I hope I can maybe just get direct access to your machines.

On Wed, 2009-11-11 at 15:24 -0800, Larsen, Roy K wrote:

I believe there is a problem with the software LRO in the ixgbe driver. With LRO enabled, my cluster application hangs where two processes have data to send to each other as indicated by looking at the send queue with netstat(8) but it is not making progress even though the receive queues are empty. If I build the driver without LRO (make CFLAGS_EXTRA=-DIXGBE_NO_LRO install), this issue goes away. These are compute nodes that do not do routing or IP forwarding. The hang is easily reproduced. The particulars follow.

Roy Larsen
Intel Corp.