Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI

2009-11-12 Thread Jesse Brandeburg
Hi Roy, I am sure we can figure out what is going on, thanks for the
report.

Can you run one test for me? Please try without the
InterruptThrottleRate driver parameter, but with LRO enabled.

Since you are here at the same campus as we are I hope I can maybe just
get direct access to your machines.

On Wed, 2009-11-11 at 15:24 -0800, Larsen, Roy K wrote:
 I believe there is a problem with the software LRO in the ixgbe driver. With
 LRO enabled, my cluster application hangs: two processes each have data to
 send to the other (netstat(8) shows non-empty send queues), but the
 connection makes no progress even though the receive queues are empty. If I
 build the driver without LRO (make CFLAGS_EXTRA=-DIXGBE_NO_LRO install),
 the issue goes away. These are compute nodes that do no routing or IP
 forwarding. The hang is easily reproduced. The particulars follow.
 
 Roy Larsen
 Intel Corp.
 roy.k.lar...@intel.com
 JF5-3-J4
 
 --
 
 Red Hat EL5.3 (2.6.18-128.el5 kernel)
 Dual socket Nehalem 2.9GHz nodes (8 cores) with 12GB of memory, hyper-threading disabled
 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
 Fujitsu xg700 switch
 8 nodes (64 cores)
 Intel MPI 4.0.0.014
 
 [r...@cstnh-1 library]# ethtool -i eth2
 driver: ixgbe
 version: 2.0.44.14-NAPI
 firmware-version: 1.8-0
 bus-info: :02:00.0
 
 ixgbe driver loaded with following options:
 modprobe ixgbe InterruptThrottleRate=0,0
 
 netstat -t on node nh1-eth2
 
 Proto Recv-Q Send-Q Local Address           Foreign Address         State
 tcp        0   5224 nh1-eth2:55716          nh2-eth2:44115          ESTABLISHED
 
 netstat -t on node nh2-eth2
 
 Proto Recv-Q Send-Q Local Address           Foreign Address         State
 tcp        0 331648 nh2-eth2:44115          nh1-eth2:55716          ESTABLISHED
 
 The tcpdump(8) trace shows the connection is not making progress
 
 [r...@cstnh-1 library]# tcpdump -i eth2 -v host nh2-eth2 and host nh1-eth2 and port 55716 and port 44115
 tcpdump-tnic: listening on eth2, link-type EN10MB (Ethernet), capture size 96 bytes
 17:17:17.092120 IP (tos 0x0, ttl 64, id 95, offset 0, flags [DF], proto TCP (6), length 1500)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 162998588, win 382, length 1460
 17:17:17.092227 IP (tos 0x0, ttl 64, id 56737, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
 17:17:17.348305 IP (tos 0x0, ttl 64, id 56738, offset 0, flags [DF], proto TCP (6), length 1500)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
 17:17:17.348326 IP (tos 0x0, ttl 64, id 96, offset 0, flags [DF], proto TCP (6), length 40)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
 17:17:17.348331 IP (tos 0x0, ttl 64, id 56739, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
 17:18:08.548706 IP (tos 0x0, ttl 64, id 97, offset 0, flags [DF], proto TCP (6), length 1500)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
 17:18:08.548711 IP (tos 0x0, ttl 64, id 56740, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
 17:18:09.060306 IP (tos 0x0, ttl 64, id 56741, offset 0, flags [DF], proto TCP (6), length 1500)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
 17:18:09.060327 IP (tos 0x0, ttl 64, id 98, offset 0, flags [DF], proto TCP (6), length 40)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
 17:18:09.060332 IP (tos 0x0, ttl 64, id 56742, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
 17:19:51.461901 IP (tos 0x0, ttl 64, id 99, offset 0, flags [DF], proto TCP (6), length 1500)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], ack 1, win 382, length 1460
 17:19:51.461909 IP (tos 0x0, ttl 64, id 56743, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], cksum 0x99b4 (correct), ack 39426, win 382, length 0
 17:19:52.484306 IP (tos 0x0, ttl 64, id 56744, offset 0, flags [DF], proto TCP (6), length 1500)
 nh1-eth2.55716 > nh2-eth2.44115: Flags [.], ack 39426, win 382, length 1460
 17:19:52.484328 IP (tos 0x0, ttl 64, id 100, offset 0, flags [DF], proto TCP (6), length 40)
 nh2-eth2.44115 > nh1-eth2.55716: Flags [.], cksum 0x99b6 (correct), ack 1, win 382, length 0
 17:19:52.484333 IP (tos 0x0, ttl 64, id 56745, offset 0, flags [DF], proto TCP (6), length 40)
 nh1-eth2.55716  

[E1000-devel] How to verify the result of RSS hash function?

2009-11-12 Thread Junchang Wang
Dear list,
I have an 82572 (driver e1000e) and an 82576 (driver igb) on hand. I added
the following code to *_configure_rx to turn RSS on.

==
   /* 40-byte RSS hash key */
   static const u8 rsshash[40] = {
      0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
      0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
      0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
      0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
      0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa };
   u32 mrqc;
   u32 j;

   mrqc = 0x0002;

   /* Fill out hash function seeds: pack four key bytes into each of
    * the ten 32-bit RSSRK registers, lowest byte first */
   for (j = 0; j < 10; j++) {
      u32 rsskey = rsshash[(j * 4)];
      rsskey |= rsshash[(j * 4) + 1] << 8;
      rsskey |= rsshash[(j * 4) + 2] << 16;
      rsskey |= rsshash[(j * 4) + 3] << 24;
      E1000_WRITE_REG_ARRAY(hw, E1000_RSSRK(0), j, rsskey);
   }

   /* Hash on the IPv4 header and on the IPv4/TCP 4-tuple */
   mrqc |= (E1000_MRQC_RSS_FIELD_IPV4 |
            E1000_MRQC_RSS_FIELD_IPV4_TCP);

   E1000_WRITE_REG(hw, E1000_MRQC, mrqc);

   /* Set PCSD in the receive checksum control register */
   rxcsum = E1000_READ_REG(hw, E1000_RXCSUM);
   rxcsum |= E1000_RXCSUM_PCSD;
   E1000_WRITE_REG(hw, E1000_RXCSUM, rxcsum);


According to the manual, developers can read the Dword result of the RSS
hash function in order to verify it. However, I didn't get the right data,
so I seem to be misunderstanding something. How can I obtain this Dword
result correctly?
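
One way to check whatever value the hardware reports is to compute the hash in software and compare. The following is only a sketch in plain user-space C, assuming the adapter follows the Microsoft RSS definition of the Toeplitz hash: the input is source address, destination address, source port, destination port, all in network byte order, hashed against the 40-byte key above. The names toeplitz_hash and rss_hash_ipv4_tcp are made up for illustration and are not part of e1000e or igb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same 40-byte key as in the driver snippet above. */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa
};

/*
 * Toeplitz hash: walk the input bit by bit, most significant bit first.
 * For every input bit that is set, XOR in the 32-bit window of the key
 * that starts at the same bit offset, then slide the window by one bit.
 */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    uint32_t window;
    size_t i;
    int b;

    assert(key_len >= 4);
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < len; i++) {
        /* Key byte whose bits enter the window while byte i is hashed. */
        uint8_t next = (i + 4 < key_len) ? key[i + 4] : 0;

        for (b = 7; b >= 0; b--) {
            if (data[i] & (1u << b))
                result ^= window;
            window = (window << 1) | ((uint32_t)(next >> b) & 1u);
        }
    }
    return result;
}

/* Hypothetical helper: build the 12-byte IPv4/TCP input and hash it. */
static uint32_t rss_hash_ipv4_tcp(const uint8_t src_ip[4],
                                  const uint8_t dst_ip[4],
                                  uint16_t src_port, uint16_t dst_port)
{
    uint8_t input[12];

    memcpy(input, src_ip, 4);
    memcpy(input + 4, dst_ip, 4);
    input[8] = (uint8_t)(src_port >> 8);
    input[9] = (uint8_t)(src_port & 0xff);
    input[10] = (uint8_t)(dst_port >> 8);
    input[11] = (uint8_t)(dst_port & 0xff);

    return toeplitz_hash(rss_key, sizeof(rss_key), input, sizeof(input));
}
```

The key posted above is the one used in Microsoft's RSS verification suite, so the published vectors can serve as a sanity check: for a packet from 66.9.149.187:2794 to 161.142.100.80:1766, rss_hash_ipv4_tcp() should return 0x51ccc178, and hashing only the two addresses (8 bytes) with toeplitz_hash() should give 0x323e8fc2. If the software result matches those vectors but not the value read from the adapter, the mismatch is more likely in how the key is loaded or where the hash is read from than in the hash itself.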

Thanks in advance.
--Junchang
--
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel


Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI

2009-11-12 Thread Larsen, Roy K
I did, in fact, try with default interrupt settings and the hang persisted.  
Getting access to the cluster shouldn't be a problem at all.

Roy

-Original Message-
From: Brandeburg, Jesse
Sent: Thursday, November 12, 2009 12:16 AM
To: Larsen, Roy K
Cc: e1000-de...@lists.sf.net
Subject: Re: [E1000-devel] LRO botch with 82598EB 2.0.44.14-NAPI

Hi Roy, I am sure we can figure out what is going on, thanks for the
report.

Can you run one test for me? Please try without the
InterruptThrottleRate driver parameter, but with LRO enabled.

Since you are here at the same campus as we are I hope I can maybe just
get direct access to your machines.
