Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-21 Thread Lennart Sorensen
On Tue, May 21, 2019 at 09:51:33AM -0700, Alexander Duyck wrote:
> I think we need to narrow this down a bit more. Let's try forcing the
> lookup table all to one value and see if traffic is still going to
> queue 0.
> 
> Specifically what we need to do is run the following command to try and
> force all RSS traffic to queue 8; you can verify the result with
> "ethtool -x":
> ethtool -X <dev> weight 0 0 0 0 0 0 0 0 1
> 
> If that works and the IPSec traffic goes to queue 8 then we are likely
> looking at some sort of input issue, either in the parsing or the
> population of things like the input mask that we can then debug
> further.
> 
> If traffic still goes to queue 0 then that tells us the output of the
> RSS hash and lookup table are being ignored, this would imply either
> some other filter is rerouting the traffic or is directing us to limit
> the queue index to 0 bits.

# ethtool -x eth2
RX flow hash indirection table for eth2 with 12 RX ring(s):
    0:  7 7 7 7 7 7 7 7
    8:  7 7 7 7 7 7 7 7
   16:  7 7 7 7 7 7 7 7
   24:  7 7 7 7 7 7 7 7
   32:  7 7 7 7 7 7 7 7
...
  472:  7 7 7 7 7 7 7 7
  480:  7 7 7 7 7 7 7 7
  488:  7 7 7 7 7 7 7 7
  496:  7 7 7 7 7 7 7 7
  504:  7 7 7 7 7 7 7 7
RSS hash key:
0b:1f:ae:ed:60:04:7d:e5:8a:2b:43:3f:1d:ee:6c:99:89:29:94:b0:25:db:c7:4b:fa:da:4d:3f:e8:cc:bc:00:ad:32:01:d6:1c:30:3f:f8:79:3e:f4:48:04:1f:51:d2:5a:39:f0:90
root@ECA:~# ethtool --show-priv-flags eth2
Private flags for eth2:
MFP  : off
LinkPolling  : off
flow-director-atr: off
veb-stats: off
hw-atr-eviction  : on
legacy-rx: off

All ipsec packets are still hitting queue 0.

Seems it is completely ignoring RSS for these packets.  That is
impressively weird.

-- 
Len Sorensen
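
For anyone repeating this check, here is a minimal Python sketch that parses "ethtool -x" output and reports which queues the indirection table actually points at. The sample text is an abbreviated copy of the output above; the function name parse_indirection_table is hypothetical, and the sketch assumes the "index: q q q ..." row layout shown in this thread.

```python
import re

# Abbreviated copy of the "ethtool -x eth2" output from this thread.
SAMPLE = """\
RX flow hash indirection table for eth2 with 12 RX ring(s):
    0:  7 7 7 7 7 7 7 7
    8:  7 7 7 7 7 7 7 7
   16:  7 7 7 7 7 7 7 7
RSS hash key:
0b:1f:ae:ed:60:04:7d:e5:8a:2b:43:3f:1d:ee:6c:99:89:29:94:b0:25:db:c7:4b:fa:da:4d:3f:e8:cc:bc:00:ad:32:01:d6:1c:30:3f:f8:79:3e:f4:48:04:1f:51:d2:5a:39:f0:90
"""

# A table row is "<start index>:" followed by whitespace-separated queue numbers.
ROW = re.compile(r"\s*(\d+):\s+((?:\d+\s*)+)$")


def parse_indirection_table(text: str) -> dict:
    """Map indirection-table index -> queue number, skipping non-table lines."""
    table = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, hash key, "..." or blank line
        base = int(match.group(1))
        for offset, queue in enumerate(match.group(2).split()):
            table[base + offset] = int(queue)
    return table


if __name__ == "__main__":
    table = parse_indirection_table(SAMPLE)
    print("entries parsed:", len(table))
    print("queues referenced:", sorted(set(table.values())))
```

If the set of referenced queues has a single member and traffic still lands somewhere else, the lookup table is not what is steering those packets.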


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-21 Thread Alexander Duyck
On Tue, May 21, 2019 at 8:15 AM Lennart Sorensen wrote:
>
> On Fri, May 17, 2019 at 03:20:02PM -0700, Alexander Duyck wrote:
> > I was hoping it would work too. It seemed like it should have been the
> > answer since it definitely didn't seem right. Now it has me wondering
> > about some of the other code in the driver.
> >
> > By any chance have you run anything like DPDK on any of the X722
> > interfaces on this system recently? I ask because it occurs to me that
> > if you had and it loaded something like a custom parsing profile it
> > could cause issues similar to this.
>
> I have never used DPDK on anything.  I was hoping never to do so. :)
>
> This system has so far booted Debian (with a 4.19 kernel) and our own OS
> (which has a 4.9 kernel).
>
> > A debugging step you might try would be to revert back to my earlier
> > patch that only displayed the input mask instead of changing it. Once
> > you have done that you could look at doing a full power cycle on the
> > system by either physically disconnecting the power, or using the
> > power switch on the power supply itself if one is available. It is
> > necessary to disconnect the motherboard/NIC from power in order to
> > fully clear the global state stored in the device as it is retained
> > when the system is in standby.
> >
> > What I want to verify is if the input mask that we have run into is
> > the natural power-on input mask or if that is something that was
> > overridden by something else. The mask change I made should be reset
> > if the system loses power, and then it will either default back to the
> > value with the 6's if that is its natural state, or it will match
> > what I had if it was not.
> >
> > Other than that I really can't think up too much else. I suppose there
> > is the possibility of the NVM either setting up a DCB setting or
> > HREGION register causing an override that is limiting the queues to 1.
> > However, the likelihood of that should be really low.
>
> Here is the register dump after a full power off:
>
> i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
> i40e: Copyright (c) 2013 - 2014 Intel Corporation.
> i40e :3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e :3d:00.0: The driver for the device detected a newer version of the 
> NVM image than expected. Please install the most recent version of the 
> network driver.
> i40e :3d:00.0: MAC address: a4:bf:01:4e:0c:87
> i40e :3d:00.0: flow_type: 63 input_mask:0x4000
> i40e :3d:00.0: flow_type: 46 input_mask:0x0007fff8
> i40e :3d:00.0: flow_type: 45 input_mask:0x0007fff8
> i40e :3d:00.0: flow_type: 44 input_mask:0x00078000
> i40e :3d:00.0: flow_type: 43 input_mask:0x0007fffe
> i40e :3d:00.0: flow_type: 42 input_mask:0x0007fffe
> i40e :3d:00.0: flow_type: 41 input_mask:0x0007fffe
> i40e :3d:00.0: flow_type: 40 input_mask:0x0007fffe
> i40e :3d:00.0: flow_type: 39 input_mask:0x0007fffe
> i40e :3d:00.0: flow_type: 36 input_mask:0x00060600
> i40e :3d:00.0: flow_type: 35 input_mask:0x00060600
> i40e :3d:00.0: flow_type: 34 input_mask:0x000606078000
> i40e :3d:00.0: flow_type: 33 input_mask:0x00060606
> i40e :3d:00.0: flow_type: 32 input_mask:0x00060606
> i40e :3d:00.0: flow_type: 31 input_mask:0x00060606
> i40e :3d:00.0: flow_type: 30 input_mask:0x00060606
> i40e :3d:00.0: flow_type: 29 input_mask:0x00060606
> i40e :3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN 
> Geneve VEPA
> i40e :3d:00.1: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
> i40e :3d:00.1: The driver for the device detected a newer version of the 
> NVM image than expected. Please install the most recent version of the 
> network driver.
> i40e :3d:00.1: MAC address: a4:bf:01:4e:0c:88
> i40e :3d:00.1: flow_type: 63 input_mask:0x4000
> i40e :3d:00.1: flow_type: 46 input_mask:0x0007fff8
> i40e :3d:00.1: flow_type: 45 input_mask:0x0007fff8
> i40e :3d:00.1: flow_type: 44 input_mask:0x00078000
> i40e :3d:00.1: flow_type: 43 input_mask:0x0007fffe
> i40e :3d:00.1: flow_type: 42 input_mask:0x0007fffe
> i40e :3d:00.1: flow_type: 41 input_mask:0x0007fffe
> i40e :3d:00.1: flow_type: 40 input_mask:0x0007fffe
> i40e :3d:00.1: flow_type: 39 input_mask:0x0007fffe
> i40e :3d:00.1: flow_type: 36 input_mask:0x00060600
> i40e :3d:00.1: flow_type: 35 input_mask:0x00060600
> i40e :3d:00.1: flow_type: 34 input_mask:0x000606078000
> i40e :3d:00.1: flow_type: 33 input_mask:0x00060606
> i40e :3d:00.1: flow_type: 32 input_mask:0x00060606
> i40e :3d:00.1: flow_type: 31 input_mask:0x00060606
> i40e :3d:00.1: flow_type: 30 input_mask:0x00060606
> i40e :

Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-21 Thread Lennart Sorensen
On Fri, May 17, 2019 at 03:20:02PM -0700, Alexander Duyck wrote:
> I was hoping it would work too. It seemed like it should have been the
> answer since it definitely didn't seem right. Now it has me wondering
> about some of the other code in the driver.
> 
> By any chance have you run anything like DPDK on any of the X722
> interfaces on this system recently? I ask because it occurs to me that
> if you had and it loaded something like a custom parsing profile it
> could cause issues similar to this.

I have never used DPDK on anything.  I was hoping never to do so. :)

This system has so far booted Debian (with a 4.19 kernel) and our own OS
(which has a 4.9 kernel).

> A debugging step you might try would be to revert back to my earlier
> patch that only displayed the input mask instead of changing it. Once
> you have done that you could look at doing a full power cycle on the
> system by either physically disconnecting the power, or using the
> power switch on the power supply itself if one is available. It is
> necessary to disconnect the motherboard/NIC from power in order to
> fully clear the global state stored in the device as it is retained
> when the system is in standby.
> 
> What I want to verify is if the input mask that we have run into is
> the natural power-on input mask or if that is something that was
> overridden by something else. The mask change I made should be reset
> if the system loses power, and then it will either default back to the
> value with the 6's if that is its natural state, or it will match
> what I had if it was not.
> 
> Other than that I really can't think up too much else. I suppose there
> is the possibility of the NVM either setting up a DCB setting or
> HREGION register causing an override that is limiting the queues to 1.
> However, the likelihood of that should be really low.

Here is the register dump after a full power off:

i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.1.7-k
i40e: Copyright (c) 2013 - 2014 Intel Corporation.
i40e :3d:00.0: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
i40e :3d:00.0: The driver for the device detected a newer version of the 
NVM image than expected. Please install the most recent version of the network 
driver.
i40e :3d:00.0: MAC address: a4:bf:01:4e:0c:87
i40e :3d:00.0: flow_type: 63 input_mask:0x4000
i40e :3d:00.0: flow_type: 46 input_mask:0x0007fff8
i40e :3d:00.0: flow_type: 45 input_mask:0x0007fff8
i40e :3d:00.0: flow_type: 44 input_mask:0x00078000
i40e :3d:00.0: flow_type: 43 input_mask:0x0007fffe
i40e :3d:00.0: flow_type: 42 input_mask:0x0007fffe
i40e :3d:00.0: flow_type: 41 input_mask:0x0007fffe
i40e :3d:00.0: flow_type: 40 input_mask:0x0007fffe
i40e :3d:00.0: flow_type: 39 input_mask:0x0007fffe
i40e :3d:00.0: flow_type: 36 input_mask:0x00060600
i40e :3d:00.0: flow_type: 35 input_mask:0x00060600
i40e :3d:00.0: flow_type: 34 input_mask:0x000606078000
i40e :3d:00.0: flow_type: 33 input_mask:0x00060606
i40e :3d:00.0: flow_type: 32 input_mask:0x00060606
i40e :3d:00.0: flow_type: 31 input_mask:0x00060606
i40e :3d:00.0: flow_type: 30 input_mask:0x00060606
i40e :3d:00.0: flow_type: 29 input_mask:0x00060606
i40e :3d:00.0: Features: PF-id[0] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN Geneve 
VEPA
i40e :3d:00.1: fw 3.10.52896 api 1.6 nvm 4.00 0x80001577 1.1767.0
i40e :3d:00.1: The driver for the device detected a newer version of the 
NVM image than expected. Please install the most recent version of the network 
driver.
i40e :3d:00.1: MAC address: a4:bf:01:4e:0c:88
i40e :3d:00.1: flow_type: 63 input_mask:0x4000
i40e :3d:00.1: flow_type: 46 input_mask:0x0007fff8
i40e :3d:00.1: flow_type: 45 input_mask:0x0007fff8
i40e :3d:00.1: flow_type: 44 input_mask:0x00078000
i40e :3d:00.1: flow_type: 43 input_mask:0x0007fffe
i40e :3d:00.1: flow_type: 42 input_mask:0x0007fffe
i40e :3d:00.1: flow_type: 41 input_mask:0x0007fffe
i40e :3d:00.1: flow_type: 40 input_mask:0x0007fffe
i40e :3d:00.1: flow_type: 39 input_mask:0x0007fffe
i40e :3d:00.1: flow_type: 36 input_mask:0x00060600
i40e :3d:00.1: flow_type: 35 input_mask:0x00060600
i40e :3d:00.1: flow_type: 34 input_mask:0x000606078000
i40e :3d:00.1: flow_type: 33 input_mask:0x00060606
i40e :3d:00.1: flow_type: 32 input_mask:0x00060606
i40e :3d:00.1: flow_type: 31 input_mask:0x00060606
i40e :3d:00.1: flow_type: 30 input_mask:0x00060606
i40e :3d:00.1: flow_type: 29 input_mask:0x00060606
i40e :3d:00.1: Features: PF-id[1] VSIs: 34 QP: 12 TXQ: 13 RSS VxLAN Geneve 
VEPA
i40e :3d:00.1 eth2: NIC Link is Up, 1000 Mbps Full Duplex, Flow Control: 
None


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-13 Thread Alexander Duyck
On Mon, May 13, 2019 at 9:55 AM Lennart Sorensen wrote:
>
> On Fri, May 03, 2019 at 04:59:35PM -0400, Lennart Sorensen wrote:
> > On Fri, May 03, 2019 at 10:19:47AM -0700, Alexander Duyck wrote:
> > > The TCP flow could be bypassing RSS and may be using ATR to decide
> > > where the Rx packets are processed. Now that I think about it there is
> > > a possibility that ATR could be interfering with the queue selection.
> > > You might try disabling it by running:
> > > ethtool --set-priv-flags <dev> flow-director-atr off
> >
> > Hmm, I thought I had killed ATR (I certainly meant to), but it appears
> > I had not.  I will experiment to see if that makes a difference.
> >
> > > The problem is RSS can be bypassed for queue selection by things like
> > > ATR which I called out above. One possibility is that the
> > > encryption you were using was leaving the skb->encapsulation flag set,
> > > and the NIC might have misidentified the packets as something it could
> > > parse and set up a bunch of rules that were rerouting incoming traffic
> > > based on outgoing traffic. Disabling the feature should switch off
> > > that behavior if that is in fact the case.
> > >
> > > You are probably fine using 40 queues. That isn't an even power of two
> > > so it would actually improve the entropy a bit since the lower bits
> > > don't have a many:1 mapping to queues.
> >
> > I will let you know Monday how my tests go with atr off.  I really
> > thought that was off already since it was supposed to be.  We always
> > try to turn that off because it does not work well.
>
> OK it took a while to try a bunch of stuff to make sure ATR really really
> was off.
>
> I still see the problem it seems.
>
> # ethtool --show-priv-flags eth2
> Private flags for eth2:
> MFP  : off
> LinkPolling  : off
> flow-director-atr: off
> veb-stats: off
> hw-atr-eviction  : on
> legacy-rx: off
>
> # ethtool -i eth2
> driver: i40e
> version: 2.1.7-k
> firmware-version: 4.00 0x80001577 1.1767.0
> expansion-rom-version:
> bus-info: :3d:00.1
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
>
> Here are two packets that for some reason both go to queue 0 which
> seems odd.  As far as I can tell all of the packets for UDP port 4500
> traffic from any IP is going to queue 0.
>
> UDP from 10.49.1.50:4500 to 10.49.1.1:4500 encapsulating ESP:
>
> a4bf 014e 0c88 001f 45ff f410 0800 45e0
> 0060 166e 4000 4011 0b1b 0af9 0132 0af9
> 0101 1194 1194 004c   0201 
>  4eaf 2f76 58cd aae0 4d92 8cb7 0835
> 1141 7a23 9f06 f323 b816 1a2b c88d 322c
> 5f16 d4a6 ba72 7c89 2258 9d20 085e d6ed
> c7a4 5cc1 3ef2 0753 783d b691 e9d6
>
> UDP from 10.49.1.51:4500 to 10.49.1.1:4500 encapsulating ESP:
>
> a4bf 014e 0c88 20f3 99ae c688 0800 45e0
> 0060 1671 4000 4011 0b17 0af9 0133 0af9
> 0101 1194 1194 004c   0200 
>  4ec5 253f 27f1 7fdd 4d82 0697 bef2
> 45bd 281f 8ecf ac4f 06ed 79ba 3cbb 5eaf
> 494b 146e a013 8b93 1c38 8aef da3f a73d
> 6f13 5f80 e946 82e2 7da7 21e8 9d03
>
>
> # ethtool -x eth2
> RX flow hash indirection table for eth2 with 12 RX ring(s):
>     0:  0  1  2  3  4  5  6  7
>     8:  8  9 10 11  0  1  2  3
>    16:  4  5  6  7  8  9 10 11
> ...
>   488:  8  9 10 11  0  1  2  3
>   496:  4  5  6  7  8  9 10 11
>   504:  0  1  2  3  4  5  6  7
> RSS hash key:
> 60:56:66:39:8e:70:46:02:5d:33:5e:9c:5f:f6:fa:9d:ac:50:63:7c:ca:01:23:22:07:a3:8a:23:98:fd:38:5b:74:96:7e:72:0c:aa:83:fc:10:aa:6d:35:bb:8c:4e:eb:46:03:07:6a
>
> Changing the key to:
>
> aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55
>
> makes no change in the queue the packets are going to.
>
> --
> Len Sorensen

So I recreated the first packet you listed via text2pcap, replayed it
on my test system via tcpreplay, updated my configuration to 12
queues, and used the 2 hash keys you listed. On the X710 I had
available for testing, the traffic moved between queues 4 and 8 as I
changed the key value.

Unfortunately I don't have an X722 to test with. I'm suspecting that
there may be some difference in the RSS setup, specifically it seems
like values in the PFQF_HENA register were changed for the X722 part
that may be causing the issues we are seeing.

I will see if I can get someone from the networking division to take a
look at this since I don't have access to the part in question nor a
datasheet for it so I am not sure if I can help much more.

Thanks.

- Alex
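
As a software cross-check of the key experiment above, here is a minimal Python sketch of the standard Toeplitz RSS hash applied to the two flows with both keys. It assumes the hash input is the usual IPv4 4-tuple (source address, destination address, source port, destination port) in network byte order and that the queue comes from a 512-entry round-robin table with 12 queues. Which fields the X710 or X722 really feed into the hash for NAT-T traffic is exactly what is in question here, so the printed queue numbers are illustrative only. The helper names are hypothetical, and the addresses are the ones named in the text of the message.

```python
import socket
import struct


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """32-bit Toeplitz hash: XOR the 32-bit key window at every set input bit."""
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input length")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # Key bits i .. i+31, counting from the most significant bit.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def ipv4_udp_tuple(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Assumed hash input: src addr, dst addr, src port, dst port (big-endian)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))


def queue_for(hash_value: int, num_queues: int = 12, table_size: int = 512) -> int:
    """Queue chosen by a round-robin indirection table (an assumption)."""
    return (hash_value % table_size) % num_queues


# The two 52-byte keys quoted in this thread.
KEY_ORIGINAL = bytes.fromhex(
    "60:56:66:39:8e:70:46:02:5d:33:5e:9c:5f:f6:fa:9d:ac:50:63:7c:ca:01:23:22:"
    "07:a3:8a:23:98:fd:38:5b:74:96:7e:72:0c:aa:83:fc:10:aa:6d:35:bb:8c:4e:eb:"
    "46:03:07:6a".replace(":", ""))
KEY_AA55 = bytes.fromhex("aa55" * 26)

if __name__ == "__main__":
    for src in ("10.49.1.50", "10.49.1.51"):
        data = ipv4_udp_tuple(src, "10.49.1.1", 4500, 4500)
        for name, key in (("original", KEY_ORIGINAL), ("aa:55", KEY_AA55)):
            h = toeplitz_hash(key, data)
            print(f"{src} key={name}: hash=0x{h:08x} queue={queue_for(h)}")
```

With the repeating aa:55 key every 32-bit key window has identical upper and lower halves, so the upper 16 bits of any resulting hash equal the lower 16.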


Re: [Intel-wired-lan] i40e X722 RSS problem with NAT-Traversal IPsec packets

2019-05-03 Thread Alexander Duyck
On Fri, May 3, 2019 at 8:14 AM Lennart Sorensen wrote:
>
> On Thu, May 02, 2019 at 01:59:46PM -0700, Alexander Duyck wrote:
> > If I recall correctly RSS is only using something like the lower 9
> > bits (indirection table size of 512) of the resultant hash on the
> > X722, even fewer if you have fewer queues that are a power of 2 and
> > happen to program the indirection table in a round robin fashion. So
> > for example on my system setup with 32 queues it is technically only
> > using the lower 5 bits of the hash.
> >
> > One issue as a result of that is that you can end up with swaths of
> > bits that don't really seem to impact the hash all that much since it
> > will never actually change those bits of the resultant hash. In order
> > to guarantee that every bit in the input impacts the hash you have to
> > make certain you have no gaps in the key wider than the bits you
> > examine in the final hash.
> >
> > A quick and dirty way to verify that the hash key is part of the issue
> > would be to use something like a simple repeating value such as AA:55
> > as your hash key. With something like that every bit you change in the
> > UDP port number should result in a change in the final RSS hash for
> > queue counts of 3 or greater. The downside is the upper 16 bits of the
> > hash are identical to the lower 16 so the actual hash value itself
> > isn't as useful.
>
> OK I set the hkey to
> aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55:aa:55
> and still only see queue 0 and 2 getting hit with a couple of dozen
> different UDP port numbers I picked.  Changing the hash with ethtool to
> that didn't even move where the tcp packets for my ssh connection are
> going (they are always on queue 2 it seems).

The TCP flow could be bypassing RSS and may be using ATR to decide
where the Rx packets are processed. Now that I think about it there is
a possibility that ATR could be interfering with the queue selection.
You might try disabling it by running:
ethtool --set-priv-flags <dev> flow-director-atr off

> Does it just not hash UDP packets correctly?  Is it even doing RSS?
> (the register I checked claimed it is).

The problem is RSS can be bypassed for queue selection by things like
ATR which I called out above. One possibility is that the
encryption you were using was leaving the skb->encapsulation flag set,
and the NIC might have misidentified the packets as something it could
parse and set up a bunch of rules that were rerouting incoming traffic
based on outgoing traffic. Disabling the feature should switch off
that behavior if that is in fact the case.

> This system has 40 queues assigned by default since that is how many
> CPUs there are.  Changing it to a lower number didn't make a difference
> (I tried 32 and 8).

You are probably fine using 40 queues. That isn't an even power of two
so it would actually improve the entropy a bit since the lower bits
don't have a many:1 mapping to queues.
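
The point about power-of-two queue counts can be illustrated with a short Python sketch. It assumes a 512-entry indirection table filled round robin, as in the "ethtool -x" output earlier in the thread, and the function names are hypothetical. It reports how many low-order hash bits actually influence the queue choice.

```python
def round_robin_table(num_queues: int, table_size: int = 512) -> list:
    """Indirection table filled round robin: entry i points at queue i % num_queues."""
    return [i % num_queues for i in range(table_size)]


def effective_hash_bits(table: list) -> int:
    """Smallest k such that the queue depends only on the low k bits of the index.

    Assumes the table size is a power of two.
    """
    size = len(table)
    index_bits = size.bit_length() - 1
    for k in range(index_bits + 1):
        mask = (1 << k) - 1
        if all(table[i] == table[i & mask] for i in range(size)):
            return k
    return index_bits


if __name__ == "__main__":
    for queues in (8, 12, 32, 40):
        bits = effective_hash_bits(round_robin_table(queues))
        print(f"{queues} queues: {bits} hash bits matter")
    # A table forced to a single queue does not depend on the hash at all.
    print("all entries 7:", effective_hash_bits([7] * 512), "hash bits matter")
```

With 32 queues only the low 5 bits select the queue, while with 40 (or 12) queues all 9 bits of the table index matter, which is the entropy improvement described above.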