date:20160516

Re: [PATCH net] netfilter: nf_conntrack: Use net_mutex for helper unregistration.

2016-05-16 Thread Joe Stringer

On 6 May 2016 at 04:03, Pablo Neira Ayuso  wrote:
> Hi Joe,
>
> On Thu, May 05, 2016 at 03:50:37PM -0700, Joe Stringer wrote:
>> diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c
>> index 3b40ec575cd5..6860b19be406 100644
>> --- a/net/netfilter/nf_conntrack_helper.c
>> +++ b/net/netfilter/nf_conntrack_helper.c
>> @@ -449,10 +449,10 @@ void nf_conntrack_helper_unregister(struct nf_conntrack_helper *me)
>>*/
>>   synchronize_rcu();
>>
>> - rtnl_lock();
>> + mutex_lock(&net_mutex);
>>   for_each_net(net)
>>   __nf_conntrack_helper_unregister(me, net);
>> - rtnl_unlock();
>> + mutex_unlock(&net_mutex);
>
> This simple solution works because we have no .exit callbacks in any
> of our helpers. Otherwise, the helper code may already be gone by the
> time the worker has a chance to run to release the netns.
>
> If so, probably I can append this as comment to this function so we
> don't forget. If we ever have .exit callbacks (I don't expect so), we
> would need to wait for worker completion.

Hi Pablo,

Did you want me to re-spin this patch or look into another approach?

[PATCH] asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions

2016-05-16 Thread John Stultz

In testing with HiKey, we found that since
commit 3f30b158eba5 ("asix: On RX avoid creating bad Ethernet
frames"),
we're seeing lots of noise during network transfers:

[  239.027993] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988
[  239.037310] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x54ebb5ec, offset 4
[  239.045519] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xcdffe7a2, offset 4
[  239.275044] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988
[  239.284355] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x1d36f59d, offset 4
[  239.292541] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0xaef3c1e9, offset 4
[  239.518996] asix 1-1.1:1.0 eth0: asix_rx_fixup() Data Header synchronisation was lost, remaining 988
[  239.528300] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x2881912, offset 4
[  239.536413] asix 1-1.1:1.0 eth0: asix_rx_fixup() Bad Header Length 0x5638f7e2, offset 4

And network throughput ends up being pretty bursty and slow, with
an overall throughput of at best ~30kB/s (whereas previously we
got 1.1MB/s with the slower USB1.1 "full speed" host).

We found the issue was also reproducible on an x86_64 system,
using a "high-speed" USB2.0 port, but the throughput did not
measurably drop (possibly due to the scp transfer being CPU
bound on my slow test hardware).

After lots of debugging, I found the check added in the
problematic commit seems to be calculating the offset
incorrectly.

In the normal case, in the main loop of the function, we do
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop):

	rx->header = get_unaligned_le32(skb->data + offset);
	offset += sizeof(u32);

But the problematic patch calculates:

	offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
	rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to compare the offset calculations used
to find rx->header shows that the one in the problematic code is
always too large by sizeof(u32).

Thus, this patch removes the incorrect " + sizeof(u32)" addition
in the problematic calculation, and resolves the issue.

Cc: Dean Jenkins 
Cc: "David B. Robins" 
Cc: Mark Craske 
Cc: Emil Goode 
Cc: "David S. Miller" 
Cc: YongQin Liu 
Cc: Guodong Xu 
Cc: Ivan Vecera 
Cc: linux-...@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: stable  #4.4+
Reported-by: Yongqin Liu 
Signed-off-by: John Stultz 
---
 drivers/net/usb/asix_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/usb/asix_common.c b/drivers/net/usb/asix_common.c
index 0c5c22b..7de5ab5 100644
--- a/drivers/net/usb/asix_common.c
+++ b/drivers/net/usb/asix_common.c
@@ -66,7 +66,7 @@ int asix_rx_fixup_internal(struct usbnet *dev, struct sk_buff *skb,
 	 * buffer.
 	 */
 	if (rx->remaining && (rx->remaining + sizeof(u32) <= skb->len)) {
-		offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
+		offset = ((rx->remaining + 1) & 0xfffe);
 		rx->header = get_unaligned_le32(skb->data + offset);
 		offset = 0;
 
-- 
1.9.1
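
The offset arithmetic described in the commit message can be sketched in plain C. This is only an illustration, not the driver code: the helper names are hypothetical, and the value 988 is taken from the "remaining 988" log lines quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a length up to the next even byte count, as the driver does
 * with "(len + 1) & 0xfffe". Hypothetical helper name. */
static size_t demo_align_even(size_t len)
{
	return (len + 1) & 0xfffe;
}

/* Header offset as computed by the problematic commit. */
static size_t demo_header_offset_before(size_t remaining)
{
	return demo_align_even(remaining) + sizeof(uint32_t);
}

/* Header offset as computed after this patch. */
static size_t demo_header_offset_after(size_t remaining)
{
	return demo_align_even(remaining);
}
```

With remaining = 988, the patched calculation yields 988 while the earlier one yields 992, which is the sizeof(u32) discrepancy the message describes.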

[PATCH 2/2] net: Fix coding style warnings and errors.

2016-05-16 Thread Amit Ghadge

Clean up checkpatch warnings and errors:

* WARNING: Block comments use * on subsequent lines
* WARNING: Missing a blank line after declarations
* WARNING: networking block comments don't use an empty /* line, use /*
* ERROR: code indent should use tabs where possible
* WARNING: please, no space before tabs
* WARNING: please, no spaces at the start of a line
* WARNING: line over 80 characters
* ERROR: space prohibited after that open parenthesis '('

Signed-off-by: Amit Ghadge 
---
 drivers/net/Space.c | 21 ++---
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/Space.c b/drivers/net/Space.c
index 67977f1..b5e92a6 100644
--- a/drivers/net/Space.c
+++ b/drivers/net/Space.c
@@ -35,8 +35,8 @@
 #include 
 
 /* A unified ethernet device probe.  This is the easiest way to have every
-   ethernet adaptor have the name "eth[0123...]".
-   */
+ * ethernet adaptor have the name "eth[0123...]".
+ */
 
 struct devprobe2 {
struct net_device *(*probe)(int unit);
@@ -46,6 +46,7 @@ struct devprobe2 {
 static int __init probe_list2(int unit, struct devprobe2 *p, int autoprobe)
 {
struct net_device *dev;
+
for (; p->probe; p++) {
if (autoprobe && p->status)
continue;
@@ -58,8 +59,7 @@ static int __init probe_list2(int unit, struct devprobe2 *p, int autoprobe)
return -ENODEV;
 }
 
-/*
- * ISA probes that touch addresses < 0x400 (including those that also
+/*ISA probes that touch addresses < 0x400 (including those that also
  * look for EISA/PCI cards in addition to ISA cards).
  */
 static struct devprobe2 isa_probes[] __initdata = {
@@ -86,11 +86,11 @@ static struct devprobe2 isa_probes[] __initdata = {
 #endif
 #ifdef CONFIG_CS89x0
 #ifndef CONFIG_CS89x0_PLATFORM
-   {cs89x0_probe, 0},
+   {cs89x0_probe, 0},
 #endif
 #endif
-#if defined(CONFIG_MVME16x_NET) || defined(CONFIG_BVME6000_NET)/* Intel I82596 */
-   {i82596_probe, 0},
+#if defined(CONFIG_MVME16x_NET) || defined(CONFIG_BVME6000_NET)/* Intel */
+   {i82596_probe, 0},  /* I82596 */
 #endif
 #ifdef CONFIG_NI65
{ni65_probe, 0},
@@ -118,13 +118,12 @@ static struct devprobe2 m68k_probes[] __initdata = {
{mac8390_probe, 0},
 #endif
 #ifdef CONFIG_MAC89x0
-   {mac89x0_probe, 0},
+   {mac89x0_probe, 0},
 #endif
{NULL, 0},
 };
 
-/*
- * Unified ethernet device probe, segmented per architecture and
+/* Unified ethernet device probe, segmented per architecture and
  * per bus interface. This drives the legacy devices only for now.
  */
 
@@ -135,7 +134,7 @@ static void __init ethif_probe2(int unit)
if (base_addr == 1)
return;
 
-   (void)( probe_list2(unit, m68k_probes, base_addr == 0) &&
+   (void)(probe_list2(unit, m68k_probes, base_addr == 0) &&
probe_list2(unit, isa_probes, base_addr == 0));
 }
 
-- 
2.5.5

RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-16 Thread Dexuan Cui

> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, May 16, 2016 1:16
> To: Dexuan Cui 
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan ; Haiyang Zhang ;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
>
> From: Dexuan Cui 
> Date: Sun, 15 May 2016 09:52:42 -0700
>
> > Changes since v10
> >
> > 1) add module params: send_ring_page, recv_ring_page. They can be used to
> > enlarge the ringbuffer size to get better performance, e.g.,
> > # modprobe hv_sock  recv_ring_page=16 send_ring_page=16
> > By default, recv_ring_page is 3 and send_ring_page is 2.
> >
> > 2) add module param max_socket_number (the default is 1024).
> > A user can enlarge the number to create more than 1024 hv_sock sockets.
> > By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
> > (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)
>
> This is papering around my objections, and creates module parameters which
> I am fundamentally against.
>
> You're making the facility unusable by default, just to work around my
> memory consumption concerns.
>
> What will end up happening is that everyone will simply increase the
> values.
>
> You're not really addressing the core issue, and I will be ignoring your
> future submissions of this change until you do.

David,
I am sorry I came across as ignoring your feedback; that was not my intention.
The current host side design for this feature is such that each socket
connection needs its own channel, which consists of:

1. A ring buffer for host to guest communication
2. A ring buffer for guest to host communication

The memory for the ring buffers has to be pinned down as this will be accessed
both from interrupt level in Linux guest and from the host OS at any time.

To address your concerns, I am planning to re-implement both the receive path
and the send path so that no additional pinned memory will be needed.

Receive Path:
When the application does a read on the socket, we will dynamically allocate
the buffer and perform the read operation on the incoming ring buffer. Since
we will be in the process context, we can sleep here and will set the
"GFP_KERNEL | __GFP_NOFAIL" flags. This buffer will be freed once the
application consumes all the data.

Send Path:
On the send side, we will construct the payload to be sent directly on the
outgoing ringbuffer.

So, with these changes, the only memory that will be pinned down will be the
memory for the ring buffers on a per-connection basis and this memory will be
pinned down until the connection is torn down.

Please let me know if this addresses your concerns.

 Thanks,
-- Dexuan
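
The memory figures discussed in this thread can be checked with a small calculation. This is a sketch under stated assumptions: 4 KB pages, the v11 defaults quoted above (3 receive ring pages, 2 send ring pages, 1+1 buffer pages), and a hypothetical helper name. The second figure assumes the ring defaults stay the same once the extra buffers are dropped, which the thread does not confirm.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_BYTES 4096u	/* assumption: 4 KB pages */

/* Pinned bytes for a number of sockets, given per-socket page counts. */
static size_t demo_pinned_bytes(size_t sockets, size_t recv_ring_pages,
				size_t send_ring_pages, size_t buf_pages)
{
	return sockets * (recv_ring_pages + send_ring_pages + buf_pages) *
	       DEMO_PAGE_BYTES;
}
```

demo_pinned_bytes(1024, 3, 2, 2) gives 28 MB, matching the v11 cover letter; demo_pinned_bytes(1024, 3, 2, 0) gives 20 MB for the ring buffers alone.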

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread Lorenzo Colitti

On Tue, May 17, 2016 at 11:24 AM, David Ahern  wrote:
> As I mentioned, we can print the unsupported error once, or per socket
> matched along with the socket params, e.g.:
>
> +   } else if (errno == EOPNOTSUPP) {
> +   printf("Operation not supported for:\n");
> +   inet_show_sock(h, diag_arg->f, diag_arg->protocol);
>
> Actively suppressing all error messages is just wrong. I get the flooding
> issue so I'm fine with just printing it once.

I disagree, but then I'm the one who wrote it in the first place, so
you wouldn't expect me to agree. :-) Let's see what Stephen says.

[net-next 1/2] ixgbe: use correct mask when enabling sriov

2016-05-16 Thread Jeff Kirsher

From: Emil Tantilov 

Swap the parameters in GENMASK in order to generate the correct mask.

This change fixes Tx hangs when enabling SRIOV.

Signed-off-by: Emil Tantilov 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d08fbcf..7bbf9b1 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3767,9 +3767,9 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
reg_offset = (VMDQ_P(0) >= 32) ? 1 : 0;
 
/* Enable only the PF's pool for Tx/Rx */
-   IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), GENMASK(vf_shift, 31));
+   IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset), GENMASK(31, vf_shift));
IXGBE_WRITE_REG(hw, IXGBE_VFRE(reg_offset ^ 1), reg_offset - 1);
-   IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), GENMASK(vf_shift, 31));
+   IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset), GENMASK(31, vf_shift));
IXGBE_WRITE_REG(hw, IXGBE_VFTE(reg_offset ^ 1), reg_offset - 1);
if (adapter->bridge_mode == BRIDGE_MODE_VEB)
IXGBE_WRITE_REG(hw, IXGBE_PFDTXGSWC, IXGBE_PFDTXGSWC_VT_LBEN);
-- 
2.5.5
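
Why the argument order matters can be seen with a 32-bit stand-in for the mask macro. This is a sketch, not the kernel definition: genmask32() is a hypothetical helper that follows the same "high bit first, low bit second" convention as GENMASK(h, l), and both arguments are assumed to be in the range 0..31.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit analogue of GENMASK(h, l): sets bits l..h inclusive. */
static uint32_t genmask32(unsigned int h, unsigned int l)
{
	return (UINT32_MAX << l) & (UINT32_MAX >> (31 - h));
}
```

For a shift of 4, genmask32(31, 4) is 0xfffffff0, while the swapped call genmask32(4, 31) is 0, so with the swapped order no pool bits end up set in this model.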

[net-next 0/2][pull request] 10GbE Intel Wired LAN Driver Updates 2016-05-16

2016-05-16 Thread Jeff Kirsher

This series contains 2 fixes to ixgbe only.

Emil fixes transmit hangs when enabling SRIOV by swapping the parameters
in GENMASK in order to generate the correct mask.

Alex fixes his previous patch b83e30104bd9 ("ixgbe/ixgbevf: Add support
for GSO partial") where he somehow transposed the location of setting
the VLAN features in netdev->features and the configuration of the
vlan_features.

The following are changes since commit 7e2c3aea4398d079745b9faa2c17b6cbd010f221:
  net: also make sch_handle_egress() drop monitor ready
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 10GbE

Alexander Duyck (1):
  ixgbe: Fix VLAN features error

Emil Tantilov (1):
  ixgbe: use correct mask when enabling sriov

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

-- 
2.5.5

[net-next 2/2] ixgbe: Fix VLAN features error

2016-05-16 Thread Jeff Kirsher

From: Alexander Duyck 

It looks like at some point I somehow transposed the location of setting
the VLAN features in netdev->features and the configuration of the
vlan_features.  As a result the driver is now generating a warning about
vlan_features being set up incorrectly.

This patch corrects that by placing the update of netdev->features to
include the VLAN features so that it is after the point where we write
netdev->features into netdev->vlan_features.

Fixes: b83e30104bd9 ("ixgbe/ixgbevf: Add support for GSO partial")
Signed-off-by: Alexander Duyck 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 7bbf9b1..9f3677c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9508,15 +9508,15 @@ skip_sriov:
if (pci_using_dac)
netdev->features |= NETIF_F_HIGHDMA;
 
+   netdev->vlan_features |= netdev->features | NETIF_F_TSO_MANGLEID;
+   netdev->hw_enc_features |= netdev->vlan_features;
+   netdev->mpls_features |= NETIF_F_HW_CSUM;
+
/* set this bit last since it cannot be part of vlan_features */
netdev->features |= NETIF_F_HW_VLAN_CTAG_FILTER |
NETIF_F_HW_VLAN_CTAG_RX |
NETIF_F_HW_VLAN_CTAG_TX;
 
-   netdev->vlan_features |= netdev->features | NETIF_F_TSO_MANGLEID;
-   netdev->hw_enc_features |= netdev->vlan_features;
-   netdev->mpls_features |= NETIF_F_HW_CSUM;
-
netdev->priv_flags |= IFF_UNICAST_FLT;
netdev->priv_flags |= IFF_SUPP_NOFCS;
 
-- 
2.5.5
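
The ordering issue fixed by this patch can be modelled with two bit fields. This is a simplified sketch: the struct, the flag values, and the function names are hypothetical stand-ins, and the extra NETIF_F_TSO_MANGLEID bit from the real code is left out.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_F_HIGHDMA		(UINT64_C(1) << 0)	/* hypothetical bit values */
#define DEMO_F_VLAN_OFFLOADS	(UINT64_C(7) << 1)	/* filter | rx | tx */

struct demo_netdev {
	uint64_t features;
	uint64_t vlan_features;
};

/* Order before the patch: VLAN bits are set first, so they leak
 * into vlan_features when features is copied. */
static void demo_setup_before_fix(struct demo_netdev *dev)
{
	dev->features |= DEMO_F_VLAN_OFFLOADS;
	dev->vlan_features |= dev->features;
}

/* Order after the patch: copy first, then set the VLAN bits last. */
static void demo_setup_after_fix(struct demo_netdev *dev)
{
	dev->vlan_features |= dev->features;
	dev->features |= DEMO_F_VLAN_OFFLOADS;
}
```

Both orders end with the same features value; only the second keeps the VLAN offload bits out of vlan_features.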

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread David Ahern


On 5/16/16 8:04 PM, Lorenzo Colitti wrote:

> Given that the filter can specify a number of sockets, some of which
> can and some of which can't be closed, and that whether a given socket
> can be closed is only known at the time we attempt to close it, there
> is a choice between two bad outcomes:
>
> 1. Users try to use "ss -K" with a kernel that doesn't support it, and
> get confused about why it does nothing and doesn't print an error
> message.
> 2. Users use "ss -K" with a kernel that does support it, and get
> irritated by seeing one error message per TCP_TIME_WAIT socket, UDP
> socket, etc.


As I mentioned, we can print the unsupported error once, or per socket
matched along with the socket params, e.g.:


+   } else if (errno == EOPNOTSUPP) {
+   printf("Operation not supported for:\n");
+   inet_show_sock(h, diag_arg->f, diag_arg->protocol);

Actively suppressing all error messages is just wrong. I get the 
flooding issue so I'm fine with just printing it once.
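
The "print it once" option could look roughly like the following. This is a sketch only, not code from ss: the function name, the static flag, and the message text are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Set after the first EOPNOTSUPP report so later ones stay quiet. */
static bool demo_eopnotsupp_reported;

/* Report EOPNOTSUPP at most once per run.
 * Returns true only when a message was actually printed. */
static bool demo_report_unsupported_once(int err, FILE *out)
{
	if (err != EOPNOTSUPP || demo_eopnotsupp_reported)
		return false;

	demo_eopnotsupp_reported = true;
	fprintf(out, "SOCK_DESTROY: operation not supported for some sockets\n");
	return true;
}
```

Called from the per-socket error path, this prints on the first unsupported socket and stays silent for the rest, which avoids the flooding concern while still giving the user a hint.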

Re: [PATCH 1/2] net: ethernet: fec-mpc52xx: use phydev from struct net_device

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Tue, 17 May 2016 00:32:33 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH 2/2] net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Tue, 17 May 2016 00:32:34 +0200

> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH net-next] bpf, doc: fix typo on bpf_asm descriptions

2016-05-16 Thread David Miller

From: Daniel Borkmann 
Date: Mon, 16 May 2016 23:06:53 +0200

> Fix description of some of the bpf_asm tool related jump instructions
> and generally move them to format A  k.
> 
> Reported-by: Sebastian Amend 
> Signed-off-by: Daniel Borkmann 

Applied.

Re: [PATCH] stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set

2016-05-16 Thread David Miller

From: Ezequiel Garcia 
Date: Mon, 16 May 2016 12:41:07 -0300

> Commit f748be531d70 ("stmmac: support new GMAC4") reverted a previous fix
> by mistake. This commit re-applies said fix:
> 
>   commit dec2165ff38a99f937fe61875d102c6c8596c815
>   Author: Sonic Zhang 
>   Date:   Thu Jan 22 14:55:57 2015 +0800
>   stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set
> 
>   Clear the TX COE bit when force_thresh_dma_mode is set, even if the
>   hardware DMA capability reports support for it.
> 
>   Tested on BF609.
> 
>   Signed-off-by: Sonic Zhang 
>   Acked-by: Giuseppe Cavallaro 
>   Signed-off-by: David S. Miller 
> 
> Tested on LPC4350 Hitex board.
> 
> Fixes: f748be531d70 ("stmmac: support new GMAC4")
> Signed-off-by: Ezequiel Garcia 

Applied.

Re: [PATCH 2/2] net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 16:52:37 +0200

> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH 1/2] net: ethernet: fs-enet: use phydev from struct net_device

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 16:52:36 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: BUG: use-after-free in netlink_dump

2016-05-16 Thread David Miller

From: Herbert Xu 
Date: Mon, 16 May 2016 17:28:16 +0800

> Subject: netlink: Fix dump skb leak/double free
> 
> When we free cb->skb after a dump, we do it after releasing the
> lock.  This means that a new dump could have started in the
> meantime and we'll end up freeing their skb instead of ours.
> 
> This patch saves the skb and module before we unlock so we free
> the right memory.
> 
> Fixes: 16b304f3404f ("netlink: Eliminate kmalloc in netlink dump operation.")
> Reported-by: Baozeng Ding 
> Signed-off-by: Herbert Xu 

Applied and queued up for -stable, thanks.

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread Lorenzo Colitti

On Tue, May 17, 2016 at 10:52 AM, David Ahern  wrote:
> Code is not set up to handle that. Only option seems to be to at least dump
> an error message, but the message cannot relate any of the specifics about
> the filter. So something like this, though it dumps the message per socket
> matched by the filter. Could throttle it to once.
> [...]
> if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
> -   if (errno == EOPNOTSUPP || errno == ENOENT) {
> -   /* Socket can't be closed, or is already closed. */
> +   if (errno == ENOENT) {
> +   /* socket is already closed. */
> +   return 0;
> +   /* Socket can't be closed OR config is not enabled */
> +   } else if (errno == EOPNOTSUPP) {
> +   perror("SOCK_DESTROY answers");

The reason the code was written like that is that I didn't want to
print one error message for every socket that can't be closed - such
as TIME_WAIT sockets or UDP sockets.

Given that the filter can specify a number of sockets, some of which
can and some of which can't be closed, and that whether a given socket
can be closed is only known at the time we attempt to close it, there
is a choice between two bad outcomes:

1. Users try to use "ss -K" with a kernel that doesn't support it, and
get confused about why it does nothing and doesn't print an error
message.
2. Users use "ss -K" with a kernel that does support it, and get
irritated by seeing one error message per TCP_TIME_WAIT socket, UDP
socket, etc.

Personally I think it's more important to avoid #2 than #1, because #1
is one time (only if you're compiling your own kernel), but #2 is
forever. Also, I think it's consistent with other behaviours in ss -
for example, if the kernel doesn't support SOCK_DIAG for UDP, you just
get nothing back if you run "ss -u".

That said, I'm not the maintainer of this code. Stephen, any thoughts?

[PATCH 3.14 14/17] VSOCK: do not disconnect socket when peer has shutdown SEND only

2016-05-16 Thread Greg Kroah-Hartman

3.14-stable review patch.  If anyone has any objections, please let me know.

--

From: Ian Campbell 

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket families' stream_recvmsg callbacks, doing a shutdown
here does not seem to be the norm, and removing it does not seem to have
had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
on the vmci transport.

Signed-off-by: Ian Campbell 
Cc: "David S. Miller" 
Cc: Stefan Hajnoczi 
Cc: Claudio Imbrenda 
Cc: Andy King 
Cc: Dmitry Torokhov 
Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/vmw_vsock/af_vsock.c |   21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1796,27 +1796,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb
else if (sk->sk_shutdown & RCV_SHUTDOWN)
err = 0;
 
-   if (copied > 0) {
-   /* We only do these additional bookkeeping/notification steps
-* if we actually copied something out of the queue pair
-* instead of just peeking ahead.
-*/
-
-   if (!(flags & MSG_PEEK)) {
-   /* If the other side has shutdown for sending and there
-* is nothing more to read, then modify the socket
-* state.
-*/
-   if (vsk->peer_shutdown & SEND_SHUTDOWN) {
-   if (vsock_stream_has_data(vsk) <= 0) {
-   sk->sk_state = SS_UNCONNECTED;
-   sock_set_flag(sk, SOCK_DONE);
-   sk->sk_state_change(sk);
-   }
-   }
-   }
+   if (copied > 0)
err = copied;
-   }
 
 out_wait:
finish_wait(sk_sleep(sk), &wait);
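
The request/reply pattern that the commit message describes (send a request, shutdown(SHUT_WR), then wait for the reply) can be shown with an ordinary socket pair. This sketch deliberately uses AF_UNIX stream sockets rather than AF_VSOCK so that it runs without a hypervisor; it illustrates the expected half-close semantics and says nothing about the vsock transports themselves. The function name is hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of reply bytes the client reads after it has
 * already shut down its sending side, or -1 on any failure. */
static ssize_t demo_half_close_roundtrip(char *reply, size_t reply_len)
{
	int sv[2];
	char req[16];
	ssize_t n = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Client: send the request, then shut down only the sending side. */
	if (write(sv[0], "ping", 4) != 4 || shutdown(sv[0], SHUT_WR) < 0)
		goto out;

	/* Server: read the request, then observe end-of-stream. */
	if (read(sv[1], req, sizeof(req)) != 4 || memcmp(req, "ping", 4) != 0)
		goto out;
	if (read(sv[1], req, sizeof(req)) != 0)
		goto out;

	/* Server: the other direction is still open, so reply. */
	if (write(sv[1], "pong", 4) != 4)
		goto out;

	/* Client: the reply is readable despite the earlier SHUT_WR. */
	n = read(sv[0], reply, reply_len);
out:
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

A socket that was torn down entirely when the peer stopped sending would not be able to deliver the "pong" here, which is the behaviour the patch removes for vsock.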

Re: [PATCH net-next] tipc: check nl sock before parsing nested attributes

2016-05-16 Thread David Miller

From: Richard Alpe 
Date: Mon, 16 May 2016 11:14:54 +0200

> Make sure the socket for which the user is listing publication exists
> before parsing the socket netlink attributes.
> 
> Prior to this patch a call without any socket caused a NULL pointer
> dereference in tipc_nl_publ_dump().
> 
> Tested-and-reported-by: Baozeng Ding 
> Signed-off-by: Richard Alpe 

Applied and queued up for -stable.

Re: [PATCH net-next] fq_codel: fix memory limitation drift

2016-05-16 Thread David Miller

From: Eric Dumazet 
Date: Sun, 15 May 2016 18:16:38 -0700

> From: Eric Dumazet 
> 
> memory_usage must be decreased in dequeue_func(), not in
> fq_codel_dequeue(), otherwise packets dropped by Codel algo
> are missing this decrease.
> 
> Also we need to clear memory_usage in fq_codel_reset()
> 
> Fixes: 95b58430abe7 ("fq_codel: add memory limitation per queue")
> Signed-off-by: Eric Dumazet 

Applied.

[PATCH 4.4 31/73] VSOCK: do not disconnect socket when peer has shutdown SEND only

2016-05-16 Thread Greg Kroah-Hartman

4.4-stable review patch.  If anyone has any objections, please let me know.

--

From: Ian Campbell 

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket families' stream_recvmsg callbacks, doing a shutdown
here does not seem to be the norm, and removing it does not seem to have
had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
on the vmci transport.

Signed-off-by: Ian Campbell 
Cc: "David S. Miller" 
Cc: Stefan Hajnoczi 
Cc: Claudio Imbrenda 
Cc: Andy King 
Cc: Dmitry Torokhov 
Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/vmw_vsock/af_vsock.c |   21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1794,27 +1794,8 @@ vsock_stream_recvmsg(struct socket *sock
else if (sk->sk_shutdown & RCV_SHUTDOWN)
err = 0;
 
-   if (copied > 0) {
-   /* We only do these additional bookkeeping/notification steps
-* if we actually copied something out of the queue pair
-* instead of just peeking ahead.
-*/
-
-   if (!(flags & MSG_PEEK)) {
-   /* If the other side has shutdown for sending and there
-* is nothing more to read, then modify the socket
-* state.
-*/
-   if (vsk->peer_shutdown & SEND_SHUTDOWN) {
-   if (vsock_stream_has_data(vsk) <= 0) {
-   sk->sk_state = SS_UNCONNECTED;
-   sock_set_flag(sk, SOCK_DONE);
-   sk->sk_state_change(sk);
-   }
-   }
-   }
+   if (copied > 0)
err = copied;
-   }
 
 out_wait:
finish_wait(sk_sleep(sk), &wait);

Re: [PATCH 1/2] net: ethernet: gianfar: use phydev from struct net_device

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 01:30:08 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread David Ahern


On 5/16/16 7:20 PM, Lorenzo Colitti wrote:
> On Tue, May 17, 2016 at 10:14 AM, David Ahern wrote:
>>> For example, EOPNOTSUPP can just mean "this socket can't be closed
>>> because it's a timewait or NEW_SYN_RECV socket". In hindsight it might
>>> have been better to return EBADFD in those cases, but that still
>>> doesn't solve the UI problem. If the user does something like "ss -K
>>> dport = :443", the user would expect the command to kill all TCP
>>> sockets and not just abort if there happens to be a UDP socket to port
>>> 443 (which can't be closed because UDP doesn't currently implement
>>> SOCK_DESTROY).
>>
>> Silently doing nothing is just as bad - or worse. I was running in circles
>> trying to figure out why nothing was happening and ss was exiting 0.
>
> At least that's documented to be the case in the man page.
>
> On the other hand, if your patch is applied, there will be no way to
> close more than one socket if one of them returns EOPNOTSUPP. On a
> busy server where things go into TIME_WAIT all the time, you might
> never be able to close all sockets.
>
> If you want to inform the user, then you could do so via the return
> value of ss - e.g., return 0 if at least one socket was printed and
> closed, or 1 otherwise.

Code is not set up to handle that. Only option seems to be to at least dump
an error message, but the message cannot relate any of the specifics
about the filter. So something like this, though it dumps the message per
socket matched by the filter. Could throttle it to once.


diff --git a/misc/ss.c b/misc/ss.c
index 23fff19d9199..1925c6fd9c36 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2264,8 +2264,12 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,

if (!(diag_arg->f->families & (1 << r->idiag_family)))
return 0;
if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
-   if (errno == EOPNOTSUPP || errno == ENOENT) {
-   /* Socket can't be closed, or is already closed. */
+   if (errno == ENOENT) {
+   /* socket is already closed. */
+   return 0;
+   /* Socket can't be closed OR config is not enabled */
+   } else if (errno == EOPNOTSUPP) {
+   perror("SOCK_DESTROY answers");
return 0;
} else {
perror("SOCK_DESTROY answers");

Re: [PATCH 1/2] net: ethernet: ftgmac100: use phydev from struct net_device

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 01:35:13 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH 2/2] net: ethernet: ftgmac100: use phy_ethtool_{get|set}_link_ksettings

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 01:35:14 +0200

> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Regarding vxlan unicast configuration

2016-05-16 Thread Ajith Adapa

Hi,

I am trying a vxlan unicast configuration on back-to-back connected
interfaces of VM1 and VM2, both running Fedora 23 with a 4.2 kernel.
Below is my configuration:

VM1

ip address add 100.1.1.1/24 dev enp0s8
ifconfig enp0s8 up
ip link add name vxlan42 type vxlan id 42 dev enp0s8 remote 50.1.1.2 local 50.1.1.1 dstport 4789
ip address add 50.1.1.1/24 dev vxlan42
ip link set up vxlan42

VM2

ip address add 100.1.1.2/24 dev enp0s8
ifconfig enp0s8 up
ip link add name vxlan42 type vxlan id 42 dev enp0s8 remote 50.1.1.1 local 50.1.1.2 dstport 4789
ip address add 50.1.1.2/24 dev vxlan42
ip link set up vxlan42


Now when I try to ping 50.1.1.1 from VM2, I am receiving ARP packets
on VM1 which are not vxlan tagged. As a result ping is not working.

I am able to successfully configure multicast based vxlan but having
issues with vxlan unicast.

Is there something wrong with my configuration?

Regards,
Ajith

Re: [PATCH 2/2] net: ethernet: gianfar: use phy_ethtool_{get|set}_link_ksettings

2016-05-16 Thread David Miller

From: Philippe Reynes 
Date: Mon, 16 May 2016 01:30:09 +0200

> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.

Re: [PATCH net-next] tuntap: introduce tx skb ring

2016-05-16 Thread Jason Wang




On 2016-05-16 16:08, Michael S. Tsirkin wrote:

On Mon, May 16, 2016 at 03:52:11PM +0800, Jason Wang wrote:


On 2016-05-16 12:23, Michael S. Tsirkin wrote:

On Mon, May 16, 2016 at 09:17:01AM +0800, Jason Wang wrote:

We used to queue tx packets in sk_receive_queue, this is less
efficient since it requires spinlocks to synchronize between producer
and consumer.

This patch tries to address this by using circular buffer which allows
lockless synchronization. This is done by switching from
sk_receive_queue to a tx skb ring with a new flag IFF_TX_RING and when
this is set:

Why do we need a new flag? Is there a userspace-visible
behaviour change?

Probably yes since tx_queue_length does not work.

So the flag name should reflect the behaviour somehow, not
the implementation.


- store pointer to skb in circular buffer in tun_net_xmit(), and read
   it from the circular buffer in tun_do_read().
- introduce a new proto_ops peek which could be implemented by
   specific socket which does not use sk_receive_queue.
- store skb length in circular buffer too, and implement a lockless
   peek for tuntap.
- change vhost_net to use proto_ops->peek() instead
- new spinlocks were introduced to synchronize among producers (and so
   did for consumers).

Pktgen test shows about 9% improvement on guest receiving pps:

Before: ~148pps
After : ~161pps

(I'm not sure non-blocking read is still needed, so it was not included
  in this patch)

How do you mean? Of course we must support blocking and non-blocking
read - userspace uses it.

Ok, will add this.


Signed-off-by: Jason Wang 
---
---
  drivers/net/tun.c   | 157 +---
  drivers/vhost/net.c |  16 -
  include/linux/net.h |   1 +
  include/uapi/linux/if_tun.h |   1 +
  4 files changed, 165 insertions(+), 10 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 425e983..6001ece 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -71,6 +71,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
@@ -130,6 +131,8 @@ struct tap_filter {
  #define MAX_TAP_FLOWS  4096
  #define TUN_FLOW_EXPIRE (3 * HZ)
+#define TUN_RING_SIZE 256

Can we resize this according to tx_queue_len set by user?

We can, but it needs lots of other changes, e.g. being notified when
tx_queue_len is changed by the user.

Some kind of notifier?


Yes, maybe.


Probably better than a new user interface.


Ok.




And if tx_queue_length is not a power of 2,
we probably need a modulus to calculate the capacity.

Is that really that important for speed?


Not sure, I can test.


If yes, round it up to next power of two.


Right, this sounds like a good solution.


You can also probably wrap it with a conditional instead.
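
The two options discussed here (round the ring size up to a power of two so
the index can be masked, or wrap the index with a conditional) can be
sketched in plain userspace C as below. This is only an illustration, not
code from the patch: ring_roundup_pow2(), ring_next_masked() and
ring_next_cond() are hypothetical names. In the kernel, roundup_pow_of_two()
would be the natural helper for the first one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two (hypothetical helper). */
static size_t ring_roundup_pow2(size_t n)
{
	size_t p = 1;

	/* Keep the shift below from overflowing. */
	assert(n >= 1 && n <= SIZE_MAX / 2 + 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* Advance an index in a ring whose size is a power of two. */
static size_t ring_next_masked(size_t idx, size_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (idx + 1) & (size - 1);
}

/* Advance an index in a ring of arbitrary size, without a modulus. */
static size_t ring_next_cond(size_t idx, size_t size)
{
	idx++;
	return idx == size ? 0 : idx;
}
```

With a tx_queue_len of 500, the first approach would allocate 512 slots,
while the second keeps exactly 500 at the cost of a branch per increment.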


+#define TUN_RING_MASK (TUN_RING_SIZE - 1)
  struct tun_pcpu_stats {
u64 rx_packets;
@@ -142,6 +145,11 @@ struct tun_pcpu_stats {
u32 rx_frame_errors;
  };
+struct tun_desc {
+   struct sk_buff *skb;
+   int len; /* Cached skb len for peeking */
+};
+
  /* A tun_file connects an open character device to a tuntap netdevice. It
   * also contains all socket related structures (except sock_fprog and 
tap_filter)
   * to serve as one transmit queue for tuntap device. The sock_fprog and
@@ -167,6 +175,13 @@ struct tun_file {
};
struct list_head next;
struct tun_struct *detached;
+   /* reader lock */
+   spinlock_t rlock;
+   unsigned long tail;
+   struct tun_desc tx_descs[TUN_RING_SIZE];
+   /* writer lock */
+   spinlock_t wlock;
+   unsigned long head;
  };
  struct tun_flow_entry {
@@ -515,7 +530,27 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
  static void tun_queue_purge(struct tun_file *tfile)
  {
+   unsigned long head, tail;
+   struct tun_desc *desc;
+   struct sk_buff *skb;
skb_queue_purge(&tfile->sk.sk_receive_queue);
+   spin_lock(&tfile->rlock);
+
+   head = ACCESS_ONCE(tfile->head);
+   tail = tfile->tail;
+
+   /* read tail before reading descriptor at tail */
+   smp_rmb();

I think you mean read *head* here

Right.




+
+   while (CIRC_CNT(head, tail, TUN_RING_SIZE) >= 1) {
+   desc = &tfile->tx_descs[tail];
+   skb = desc->skb;
+   kfree_skb(skb);
+   tail = (tail + 1) & TUN_RING_MASK;
+   /* read descriptor before incrementing tail. */
+   smp_store_release(&tfile->tail, tail & TUN_RING_MASK);
+   }
+   spin_unlock(&tfile->rlock);
skb_queue_purge(&tfile->sk.sk_error_queue);
  }


Barrier pairing seems messed up. Could you tag
each barrier with its pair pls?
E.g. add /* Barrier A for pairing */ Before barrier and
its pair.

Ok.

for both tun_queue_purge() and tun_do_read():

smp_rmb() is paired with smp_store_release() in tun_net_xmit().

this seems at least an overkill. rmb would normally be paired with wmb,
not a full mb within release.
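
The pairing being asked for can be illustrated with a small
single-producer/single-consumer ring. This is a userspace sketch using C11
atomics as a stand-in for the kernel primitives, not the tun code: struct
ring, ring_produce() and ring_consume() are hypothetical names, and each
barrier is tagged with its partner in the style suggested above. It does not
settle which kernel barrier is the right one for the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 256u
#define RING_MASK (RING_SIZE - 1u)

struct ring {
	void *slots[RING_SIZE];
	_Atomic unsigned long head;	/* only written by the producer */
	_Atomic unsigned long tail;	/* only written by the consumer */
};

/* Returns 0 on success, -1 if the ring is full. */
static int ring_produce(struct ring *r, void *item)
{
	unsigned long head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/* Barrier B (acquire): pairs with Barrier B (release) in ring_consume(). */
	unsigned long tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	assert(item != NULL);
	if (((head + 1) & RING_MASK) == tail)
		return -1;
	r->slots[head] = item;
	/* Barrier A (release): slot write is visible before the new head. */
	atomic_store_explicit(&r->head, (head + 1) & RING_MASK,
			      memory_order_release);
	return 0;
}

/* Returns the oldest item, or NULL if the ring is empty. */
static void *ring_consume(struct ring *r)
{
	unsigned long tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Barrier A (acquire): pairs with Barrier A (release) in ring_produce(). */
	unsigned long head = atomic_load_explicit(&r->head, memory_order_acquire);
	void *item;

	if (head == tail)
		return NULL;
	item = r->slots[tail];
	/* Barrier B (release): slot read is done before the slot is handed back. */
	atomic_store_explicit(&r->tail, (tail + 1) & RING_MASK,
			      memory_order_release);
	return item;
}
```

The sketch only covers one producer and one consumer. With several
producers or consumers, something like the rlock/wlock spinlocks from the
patch would still be needed on each side.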


wmb is not enough here. We need

[PATCH 4.5 036/101] VSOCK: do not disconnect socket when peer has shutdown SEND only

2016-05-16 Thread Greg Kroah-Hartman

4.5-stable review patch.  If anyone has any objections, please let me know.

--

From: Ian Campbell 

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket family's stream_recvmsg callbacks doing a shutdown
here does not seem to be the norm and removing it does not seem to have
had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches, I'm unsure of the impact
on the vmci transport.
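
The request/half-close/reply pattern described above can be sketched as
follows. This is a minimal illustration that assumes an AF_UNIX socketpair
as a stand-in for a vsock connection, so it runs without any hypervisor
transport. half_close_roundtrip() is a hypothetical name, and both "client"
and "server" live in one process purely for brevity.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns the number of reply bytes the client reads, or -1 on failure. */
static ssize_t half_close_roundtrip(char *reply, size_t len)
{
	char req[16];
	ssize_t n = -1;
	int sv[2];

	assert(reply != NULL && len > 0);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	/* Client: send the request, then shut down the sending side only. */
	if (write(sv[0], "ping", 4) != 4 || shutdown(sv[0], SHUT_WR) != 0)
		goto out;

	/* Server: read the request, then see EOF caused by the half-close. */
	if (read(sv[1], req, sizeof(req)) != 4 ||
	    read(sv[1], req, sizeof(req)) != 0)
		goto out;

	/* Server: the reply direction is still open. */
	if (write(sv[1], "pong", 4) != 4)
		goto out;

	/* Client: still able to receive after its own SHUT_WR. */
	n = read(sv[0], reply, len);
out:
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

A client written this way relies on the socket staying usable for reads
after the peer has seen the SEND shutdown, which is the behaviour the patch
restores for vsock.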

Signed-off-by: Ian Campbell 
Cc: "David S. Miller" 
Cc: Stefan Hajnoczi 
Cc: Claudio Imbrenda 
Cc: Andy King 
Cc: Dmitry Torokhov 
Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Greg Kroah-Hartman 
---
 net/vmw_vsock/af_vsock.c |   21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1789,27 +1789,8 @@ vsock_stream_recvmsg(struct socket *sock
else if (sk->sk_shutdown & RCV_SHUTDOWN)
err = 0;
 
-   if (copied > 0) {
-   /* We only do these additional bookkeeping/notification steps
-* if we actually copied something out of the queue pair
-* instead of just peeking ahead.
-*/
-
-   if (!(flags & MSG_PEEK)) {
-   /* If the other side has shutdown for sending and there
-* is nothing more to read, then modify the socket
-* state.
-*/
-   if (vsk->peer_shutdown & SEND_SHUTDOWN) {
-   if (vsock_stream_has_data(vsk) <= 0) {
-   sk->sk_state = SS_UNCONNECTED;
-   sock_set_flag(sk, SOCK_DONE);
-   sk->sk_state_change(sk);
-   }
-   }
-   }
+   if (copied > 0)
err = copied;
-   }
 
 out:
release_sock(sk);

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread Lorenzo Colitti

On Tue, May 17, 2016 at 10:14 AM, David Ahern  wrote:
>>
>> For example, EOPNOTSUPP can just mean "this socket can't be closed
>> because it's a timewait or NEW_SYN_RECV socket". In hindsight it might
>> have been better to return EBADFD in those cases, but that still
>> doesn't solve the UI problem. If the user does something like "ss -K
>> dport = :443", the user would expect the command to kill all TCP
>> sockets and not just abort if there happens to be a UDP socket to port
>> 443 (which can't be closed because UDP doesn't currently implement
>> SOCK_DESTROY).
>
>
> Silently doing nothing is just as bad - or worse. I was running in circles 
> trying to figure out why nothing was happening and ss was exiting 0.

At least that's documented to be the case in the man page.

On the other hand, if your patch is applied, there will be no way to
close more than one socket if one of them returns EOPNOTSUPP. On a
busy server where things go into TIME_WAIT all the time, you might
never be able to close all sockets.

If you want to inform the user, then you could do so via the return
value of ss - e.g., return 0 if at least one socket was printed and
closed, or 1 otherwise.
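
A rough sketch of that exit-status idea, under the assumption that each
kill attempt yields either 0 or an errno value. kill_matching() and its
results array are hypothetical and only model the control flow. This is not
ss code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * results[i] is the outcome of one hypothetical per-socket kill attempt:
 * 0 on success, otherwise an errno value.
 * Returns 0 if at least one socket was closed, 1 if none was,
 * and -1 if an unexpected error stopped the walk.
 */
static int kill_matching(const int *results, size_t n, size_t *closed)
{
	size_t i;

	assert(results != NULL && closed != NULL);
	*closed = 0;
	for (i = 0; i < n; i++) {
		if (results[i] == 0)
			(*closed)++;
		else if (results[i] == EOPNOTSUPP || results[i] == ENOENT)
			continue;	/* can't be closed, or already gone: skip */
		else
			return -1;	/* unexpected error: stop */
	}
	return *closed > 0 ? 0 : 1;
}
```

With this shape, a TIME_WAIT or UDP socket in the middle of the dump does
not stop later TCP sockets from being closed, and the caller can still tell
"nothing was closed" apart from success.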

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread David Ahern


On 5/16/16 7:01 PM, Lorenzo Colitti wrote:

> On Tue, May 17, 2016 at 8:53 AM, David Ahern  wrote:
>> @@ -2264,7 +2264,7 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,
>> if (!(diag_arg->f->families & (1 << r->idiag_family)))
>> return 0;
>> if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
>> -   if (errno == EOPNOTSUPP || errno == ENOENT) {
>> +   if (errno == ENOENT) {
>> /* Socket can't be closed, or is already closed. */
>> return 0;
>> } else {
>
> I don't think you can do this without breaking the functionality of -K.
>
> The else branch will cause show_one_inet_sock to return -1, which will
> cause rtnl_dump_filter to abort and not close any other sockets that
> the user requested killing. That's incorrect, because getting
> EOPNOTSUPP on one socket doesn't necessarily mean we'll get EOPNOTSUPP
> on any future sockets in the same dump.
>
> For example, EOPNOTSUPP can just mean "this socket can't be closed
> because it's a timewait or NEW_SYN_RECV socket". In hindsight it might
> have been better to return EBADFD in those cases, but that still
> doesn't solve the UI problem. If the user does something like "ss -K
> dport = :443", the user would expect the command to kill all TCP
> sockets and not just abort if there happens to be a UDP socket to port
> 443 (which can't be closed because UDP doesn't currently implement
> SOCK_DESTROY).

Silently doing nothing is just as bad - or worse. I was running in 
circles trying to figure out why nothing was happening and ss was 
exiting 0.

Re: [PATCH] net: diag: Tell user if support for destroying TCP sockets is not enabled

2016-05-16 Thread David Ahern


On 5/16/16 6:49 PM, Lorenzo Colitti wrote:

> On Tue, May 17, 2016 at 8:53 AM, David Ahern  wrote:
>> +#else
>> +static int tcp_diag_destroy(struct sk_buff *in_skb,
>> +   const struct inet_diag_req_v2 *req)
>> +{
>> +   return -EOPNOTSUPP;
>> +}
>>  #endif
>
> I don't understand why you need this. inet_diag_cmd_exact already
> returns EOPNOTSUPP if tcp_diag_handler.destroy is NULL:
>
> else if (cmd == SOCK_DIAG_BY_FAMILY)
> err = handler->dump_one(in_skb, nlh, req);
> else if (cmd == SOCK_DESTROY && handler->destroy)
> err = handler->destroy(in_skb, req);
> else
> err = -EOPNOTSUPP;
>
> Is this not working for some reason?



hmmm kernel patch is not needed. Suppression was happening in ss.

Re: [PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread Lorenzo Colitti

On Tue, May 17, 2016 at 8:53 AM, David Ahern  wrote:
> @@ -2264,7 +2264,7 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,
> if (!(diag_arg->f->families & (1 << r->idiag_family)))
> return 0;
> if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
> -   if (errno == EOPNOTSUPP || errno == ENOENT) {
> +   if (errno == ENOENT) {
> /* Socket can't be closed, or is already closed. */
> return 0;
> } else {

I don't think you can do this without breaking the functionality of -K.

The else branch will cause show_one_inet_sock to return -1, which will
cause rtnl_dump_filter to abort and not close any other sockets that
the user requested killing. That's incorrect, because getting
EOPNOTSUPP on one socket doesn't necessarily mean we'll get EOPNOTSUPP
on any future sockets in the same dump.

For example, EOPNOTSUPP can just mean "this socket can't be closed
because it's a timewait or NEW_SYN_RECV socket". In hindsight it might
have been better to return EBADFD in those cases, but that still
doesn't solve the UI problem. If the user does something like "ss -K
dport = :443", the user would expect the command to kill all TCP
sockets and not just abort if there happens to be a UDP socket to port
443 (which can't be closed because UDP doesn't currently implement
SOCK_DESTROY).

Re: [PATCH] net: diag: Tell user if support for destroying TCP sockets is not enabled

2016-05-16 Thread Lorenzo Colitti

On Tue, May 17, 2016 at 8:53 AM, David Ahern  wrote:
> +#else
> +static int tcp_diag_destroy(struct sk_buff *in_skb,
> +   const struct inet_diag_req_v2 *req)
> +{
> +   return -EOPNOTSUPP;
> +}
>  #endif

I don't understand why you need this. inet_diag_cmd_exact already
returns EOPNOTSUPP if tcp_diag_handler.destroy is NULL:

else if (cmd == SOCK_DIAG_BY_FAMILY)
err = handler->dump_one(in_skb, nlh, req);
else if (cmd == SOCK_DESTROY && handler->destroy)
err = handler->destroy(in_skb, req);
else
err = -EOPNOTSUPP;

Is this not working for some reason?
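
The dispatch pattern quoted above (an optional callback that falls through
to -EOPNOTSUPP when it is NULL) can be shown in isolation. The sketch below
is standalone userspace C with hypothetical names (struct diag_handler,
diag_dispatch(), the stub callbacks). It mirrors the shape of the quoted
snippet, not the actual inet_diag code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum diag_cmd { DIAG_DUMP_ONE, DIAG_DESTROY };

struct diag_handler {
	int (*dump_one)(int req);
	int (*destroy)(int req);	/* left NULL when support is compiled out */
};

static int stub_dump_one(int req) { (void)req; return 0; }
static int stub_destroy(int req) { (void)req; return 0; }

static const struct diag_handler handler_without_destroy = {
	.dump_one = stub_dump_one,
};

static const struct diag_handler handler_with_destroy = {
	.dump_one = stub_dump_one,
	.destroy = stub_destroy,
};

static int diag_dispatch(const struct diag_handler *h, enum diag_cmd cmd,
			 int req)
{
	assert(h != NULL && h->dump_one != NULL);
	if (cmd == DIAG_DUMP_ONE)
		return h->dump_one(req);
	if (cmd == DIAG_DESTROY && h->destroy)
		return h->destroy(req);
	return -EOPNOTSUPP;	/* missing callback or unknown command */
}
```

Because the NULL check already yields -EOPNOTSUPP, a dedicated stub that
returns the same value adds nothing at the dispatch level.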

linux-next: manual merge of the net-next tree with the arm64 tree

2016-05-16 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  arch/arm64/Kconfig

between commit:

  8ee708792e1c ("arm64: Kconfig: remove redundant HAVE_ARCH_TRANSPARENT_HUGEPAGE definition")

from the arm64 tree and commit:

  606b5908 ("bpf: split HAVE_BPF_JIT into cBPF and eBPF variant")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/arm64/Kconfig
index 8845c0d100d7,e6761ea2feec..
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@@ -59,9 -58,7 +59,9 @@@ config ARM6
select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
select HAVE_ARCH_SECCOMP_FILTER
select HAVE_ARCH_TRACEHOOK
 +  select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 +  select HAVE_ARM_SMCCC
-   select HAVE_BPF_JIT
+   select HAVE_EBPF_JIT
select HAVE_C_RECORDMCOUNT
select HAVE_CC_STACKPROTECTOR
select HAVE_CMPXCHG_DOUBLE

Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

2016-05-16 Thread John Stultz

On Wed, May 11, 2016 at 3:00 PM, Dean Jenkins  wrote:
>
> Your observations are consistent with missing URBs from the USB host
> controller.
>
> Here is a summary of what I think is happening in your case:
>
> Good case:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> URB #3: 988 octets of 1514 Ethernet frame (C)
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> Therefore, Ethernet frame (C) is spanning URBs #2 and #3.
>
> Bad case, URB #3 is lost:
> URB #1: 1514 octets of 1514 Ethernet frame (A)
> URB #2: 1514 octets of 1514 Ethernet frame (B) + 526 octets of 1514 Ethernet
> frame (C)
> Remaining is 988
> URB #4: 1514 octets of 1514 Ethernet frame (D)
>
> But when URB #4 is analysed the 32-bit Header word is not found after 988
> octets in the URB buffer so "sync lost".
> The end of Ethernet frame (C) is missing so drop the Ethernet frame.
> Now look at the start of the URB #4 buffer and find a 32-bit header word so
> Ethernet frame (D) can be consumed.
>
> So I think the commit is acting as intended and you are suffering from lost
> URBs.

No. I went digging on this for a bit longer, and it looks like it's
just that you're calculating the offset wrong in your check.

I was wondering why without your patch we wouldn't see "Bad Header
Length" messages, since if the remaining was 988 and the skb->len was
2048 as seen in my logs, without your patch we should copy the 988
bytes out, clear remaining, and then continue processing the rest of the
skb, which calculates the header and checks the size. If we really
lost the URB, we should throw an error at that point, since really
we'd be midway through the following frame.  But we just don't see
that with your patch removed.

Looking more closely, in the main loop, we do:
(where offset is zero, or set to "offset += (copy_length + 1) &
0xfffe" in the previous loop)

    rx->header = get_unaligned_le32(skb->data + offset);
    offset += sizeof(u32);

But your check calculates:

    offset = ((rx->remaining + 1) & 0xfffe) + sizeof(u32);
    rx->header = get_unaligned_le32(skb->data + offset);

Adding some debug logic to check the offset calculations used to find
rx->header, the one in your code is always too large by sizeof(u32).

So removing the extra addition in your offset calculation seems to
solve this for me.
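
The difference between the two calculations can be reproduced outside the
driver. The sketch below is a standalone illustration with hypothetical
helper names (get_le32(), next_header_offset(), next_header_offset_buggy(),
header_at()). It only models the arithmetic quoted above, using the 988
remaining bytes from the logs and an arbitrary header value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from an unaligned buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Where the next header word starts: remaining bytes, rounded up to even. */
static size_t next_header_offset(size_t remaining)
{
	return (remaining + 1) & 0xfffe;
}

/* The calculation from the check: the same offset plus sizeof(u32). */
static size_t next_header_offset_buggy(size_t remaining)
{
	return ((remaining + 1) & 0xfffe) + sizeof(uint32_t);
}

/* Read the header word found at the given offset. */
static uint32_t header_at(const uint8_t *buf, size_t len, size_t offset)
{
	assert(offset + sizeof(uint32_t) <= len);
	return get_le32(buf + offset);
}
```

With 988 bytes remaining, the first helper lands on the header word at
offset 988, while the second reads four bytes later, inside the next
frame's payload.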

I'll send out a patch here shortly.

thanks
-john

[PATCH v2 net-next] bpf: arm64: remove callee-save registers use for tmp registers

2016-05-16 Thread Yang Shi

In the current implementation of the ARM64 eBPF JIT, R23 and R24 are used as
tmp registers, but they are callee-saved registers. This leads to a
variable-size JIT prologue and epilogue. The latest blinding constant change
prefers a constant-size prologue and epilogue. AAPCS reserves R9 ~ R15 as temp
registers which do not need to be saved/restored across a function call. So,
replace R23 and R24 with R10 and R11, and remove the tmp_used flag to save 2
instructions for some jited BPF programs.

CC: Daniel Borkmann 
Acked-by: Zi Shen Lim 
Signed-off-by: Yang Shi 
---
Changelog v1 --> v2:
  * Updated stack diagram
  * Added the comment from Zi for the commit log
  * Added Zi's Acked-by

Apply on top of Daniel's blinding constant patchset

 arch/arm64/net/bpf_jit_comp.c | 34 +-
 1 file changed, 5 insertions(+), 29 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d0d5190..49ba37e 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -51,9 +51,9 @@ static const int bpf2a64[] = {
[BPF_REG_9] = A64_R(22),
/* read-only frame pointer to access stack */
[BPF_REG_FP] = A64_R(25),
-   /* temporary register for internal BPF JIT */
-   [TMP_REG_1] = A64_R(23),
-   [TMP_REG_2] = A64_R(24),
+   /* temporary registers for internal BPF JIT */
+   [TMP_REG_1] = A64_R(10),
+   [TMP_REG_2] = A64_R(11),
/* temporary register for blinding constants */
[BPF_REG_AX] = A64_R(9),
 };
@@ -61,7 +61,6 @@ static const int bpf2a64[] = {
 struct jit_ctx {
const struct bpf_prog *prog;
int idx;
-   int tmp_used;
int epilogue_offset;
int *offset;
u32 *image;
@@ -154,8 +153,6 @@ static void build_prologue(struct jit_ctx *ctx)
const u8 r8 = bpf2a64[BPF_REG_8];
const u8 r9 = bpf2a64[BPF_REG_9];
const u8 fp = bpf2a64[BPF_REG_FP];
-   const u8 tmp1 = bpf2a64[TMP_REG_1];
-   const u8 tmp2 = bpf2a64[TMP_REG_2];
 
/*
 * BPF prog stack layout
@@ -167,7 +164,7 @@ static void build_prologue(struct jit_ctx *ctx)
 *| ... | callee saved registers
 *+-+
 *| | x25/x26
-* BPF fp register => -80:+-+ <= (BPF_FP)
+* BPF fp register => -64:+-+ <= (BPF_FP)
 *| |
 *| ... | BPF prog stack
 *| |
@@ -189,8 +186,6 @@ static void build_prologue(struct jit_ctx *ctx)
/* Save callee-saved register */
emit(A64_PUSH(r6, r7, A64_SP), ctx);
emit(A64_PUSH(r8, r9, A64_SP), ctx);
-   if (ctx->tmp_used)
-   emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);
 
/* Save fp (x25) and x26. SP requires 16 bytes alignment */
emit(A64_PUSH(fp, A64_R(26), A64_SP), ctx);
@@ -210,8 +205,6 @@ static void build_epilogue(struct jit_ctx *ctx)
const u8 r8 = bpf2a64[BPF_REG_8];
const u8 r9 = bpf2a64[BPF_REG_9];
const u8 fp = bpf2a64[BPF_REG_FP];
-   const u8 tmp1 = bpf2a64[TMP_REG_1];
-   const u8 tmp2 = bpf2a64[TMP_REG_2];
 
/* We're done with BPF stack */
emit(A64_ADD_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
@@ -220,8 +213,6 @@ static void build_epilogue(struct jit_ctx *ctx)
emit(A64_POP(fp, A64_R(26), A64_SP), ctx);
 
/* Restore callee-saved register */
-   if (ctx->tmp_used)
-   emit(A64_POP(tmp1, tmp2, A64_SP), ctx);
emit(A64_POP(r8, r9, A64_SP), ctx);
emit(A64_POP(r6, r7, A64_SP), ctx);
 
@@ -317,7 +308,6 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx)
emit(A64_UDIV(is64, dst, dst, src), ctx);
break;
case BPF_MOD:
-   ctx->tmp_used = 1;
emit(A64_UDIV(is64, tmp, dst, src), ctx);
emit(A64_MUL(is64, tmp, tmp, src), ctx);
emit(A64_SUB(is64, dst, dst, tmp), ctx);
@@ -390,49 +380,41 @@ emit_bswap_uxt:
/* dst = dst OP imm */
case BPF_ALU | BPF_ADD | BPF_K:
case BPF_ALU64 | BPF_ADD | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_ADD(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_SUB | BPF_K:
case BPF_ALU64 | BPF_SUB | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_SUB(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_AND | BPF_K:
case BPF_ALU64 | BPF_AND | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_AND(is64, dst, dst, tmp), ctx);

Re: [PATCH v6 net-next 14/14] ip4ip6: Support for GSO/GRO

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 2:33 PM, Tom Herbert  wrote:
> Signed-off-by: Tom Herbert 
> ---
>  include/net/inet_common.h |  5 +
>  net/ipv4/af_inet.c| 12 +++-
>  net/ipv6/ip6_offload.c| 33 -
>  net/ipv6/ip6_tunnel.c |  3 +++
>  4 files changed, 47 insertions(+), 6 deletions(-)
>
> diff --git a/include/net/inet_common.h b/include/net/inet_common.h
> index 109e3ee..5d68342 100644
> --- a/include/net/inet_common.h
> +++ b/include/net/inet_common.h
> @@ -39,6 +39,11 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short 
> family,
>  int inet_recv_error(struct sock *sk, struct msghdr *msg, int len,
> int *addr_len);
>
> +struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff 
> *skb);
> +int inet_gro_complete(struct sk_buff *skb, int nhoff);
> +struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> +netdev_features_t features);
> +
>  static inline void inet_ctl_sock_destroy(struct sock *sk)
>  {
> if (sk)
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 25040b1..377424e 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1192,8 +1192,8 @@ int inet_sk_rebuild_header(struct sock *sk)
>  }
>  EXPORT_SYMBOL(inet_sk_rebuild_header);
>
> -static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> -   netdev_features_t features)
> +struct sk_buff *inet_gso_segment(struct sk_buff *skb,
> +netdev_features_t features)
>  {
> bool udpfrag = false, fixedid = false, encap;
> struct sk_buff *segs = ERR_PTR(-EINVAL);
> @@ -1280,9 +1280,9 @@ static struct sk_buff *inet_gso_segment(struct sk_buff 
> *skb,
>  out:
> return segs;
>  }
> +EXPORT_SYMBOL(inet_gso_segment);
>
> -static struct sk_buff **inet_gro_receive(struct sk_buff **head,
> -struct sk_buff *skb)
> +struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>  {
> const struct net_offload *ops;
> struct sk_buff **pp = NULL;
> @@ -1398,6 +1398,7 @@ out:
>
> return pp;
>  }
> +EXPORT_SYMBOL(inet_gro_receive);
>
>  static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
>  struct sk_buff *skb)
> @@ -1449,7 +1450,7 @@ int inet_recv_error(struct sock *sk, struct msghdr 
> *msg, int len, int *addr_len)
> return -EINVAL;
>  }
>
> -static int inet_gro_complete(struct sk_buff *skb, int nhoff)
> +int inet_gro_complete(struct sk_buff *skb, int nhoff)
>  {
> __be16 newlen = htons(skb->len - nhoff);
> struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
> @@ -1479,6 +1480,7 @@ out_unlock:
>
> return err;
>  }
> +EXPORT_SYMBOL(inet_gro_complete);
>
>  static int ipip_gro_complete(struct sk_buff *skb, int nhoff)
>  {
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 332d6a0..22e90e5 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -16,6 +16,7 @@
>
>  #include 
>  #include 
> +#include 
>
>  #include "ip6_offload.h"
>
> @@ -268,6 +269,21 @@ static struct sk_buff **sit_ip6ip6_gro_receive(struct 
> sk_buff **head,
> return ipv6_gro_receive(head, skb);
>  }
>
> +static struct sk_buff **ip4ip6_gro_receive(struct sk_buff **head,
> +  struct sk_buff *skb)
> +{
> +   /* Common GRO receive for SIT and IP6IP6 */
> +
> +   if (NAPI_GRO_CB(skb)->encap_mark) {
> +   NAPI_GRO_CB(skb)->flush = 1;
> +   return NULL;
> +   }
> +
> +   NAPI_GRO_CB(skb)->encap_mark = 1;
> +
> +   return inet_gro_receive(head, skb);
> +}
> +
>  static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
>  {
> const struct net_offload *ops;
> @@ -307,6 +323,13 @@ static int ip6ip6_gro_complete(struct sk_buff *skb, int 
> nhoff)
> return ipv6_gro_complete(skb, nhoff);
>  }
>
> +static int ip4ip6_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +   skb->encapsulation = 1;
> +   skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP6;
> +   return inet_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
> .type = cpu_to_be16(ETH_P_IPV6),
> .callbacks = {
> @@ -324,6 +347,14 @@ static const struct net_offload sit_offload = {
> },
>  };
>
> +static const struct net_offload ip4ip6_offload = {
> +   .callbacks = {
> +   .gso_segment= inet_gso_segment,
> +   .gro_receive= ip4ip6_gro_receive,
> +   .gro_complete   = ip4ip6_gro_complete,
> +   },
> +};
> +
>  static const struct net_offload ip6ip6_offload = {
> .callbacks = {
> .gso_segment= ipv6_gso_segment,
> @@ -331,7 +362,6 @@ static const struct net_offload ip6ip6_offload = {
>

[PATCH iproute2] ss: Tell user about -EOPNOTSUPP for SOCK_DESTROY

2016-05-16 Thread David Ahern

Silent failures are not friendly to the user. If a command is
not supported tell the user about it.

Signed-off-by: David Ahern 
---
 misc/ss.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index 23fff19d9199..bd7214c85938 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -2264,7 +2264,7 @@ static int show_one_inet_sock(const struct sockaddr_nl *addr,
if (!(diag_arg->f->families & (1 << r->idiag_family)))
return 0;
if (diag_arg->f->kill && kill_inet_sock(h, arg) != 0) {
-   if (errno == EOPNOTSUPP || errno == ENOENT) {
+   if (errno == ENOENT) {
/* Socket can't be closed, or is already closed. */
return 0;
} else {
-- 
2.1.4

[PATCH] net: diag: Tell user if support for destroying TCP sockets is not enabled

2016-05-16 Thread David Ahern

Commit c1e64e298b8c added support for destroying TCP sockets but it is
wrapped in a config option. If the option is not enabled the user is given
no feedback and ss for example just exits 0 which is not a friendly UI:

$ ss -4  state established sport = :22
Netid  Recv-Q Send-Q  Local Address:Port Peer Address:Port
tcp0  0   10.1.1.2:ssh   192.168.2.50:47438

$ ss -4  -K state established sport = :22 dport = :47438
Netid  Recv-Q Send-Q  Local Address:Port Peer Address:Port
(nothing else in the output and the connection lives on).

Fix by returning an error to the user if the config option is not
enabled:

$ ss -4 -K state established sport = :22 dport = :47450
Netid  Recv-Q Send-Q  Local Address:Port Peer Address:Port
SOCK_DESTROY answers: Operation not supported

Fixes: c1e64e298b8c ("net: diag: Support destroying TCP sockets.")
Signed-off-by: David Ahern 
---
 net/ipv4/tcp_diag.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index 4d610934fb39..99590423d468 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -60,6 +60,12 @@ static int tcp_diag_destroy(struct sk_buff *in_skb,
 
return sock_diag_destroy(sk, ECONNABORTED);
 }
+#else
+static int tcp_diag_destroy(struct sk_buff *in_skb,
+   const struct inet_diag_req_v2 *req)
+{
+   return -EOPNOTSUPP;
+}
 #endif
 
 static const struct inet_diag_handler tcp_diag_handler = {
@@ -68,9 +74,7 @@ static const struct inet_diag_handler tcp_diag_handler = {
.idiag_get_info  = tcp_diag_get_info,
.idiag_type  = IPPROTO_TCP,
.idiag_info_size = sizeof(struct tcp_info),
-#ifdef CONFIG_INET_DIAG_DESTROY
.destroy = tcp_diag_destroy,
-#endif
 };
 
 static int __init tcp_diag_init(void)
-- 
2.1.4

Re: [PATCH v6 net-next 13/14] ip6ip6: Support for GSO/GRO

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 2:33 PM, Tom Herbert  wrote:
> Signed-off-by: Tom Herbert 
> ---
>  net/ipv6/ip6_offload.c | 24 +---
>  net/ipv6/ip6_tunnel.c  |  3 +++
>  2 files changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
> index 787e55f..332d6a0 100644
> --- a/net/ipv6/ip6_offload.c
> +++ b/net/ipv6/ip6_offload.c
> @@ -253,9 +253,11 @@ out:
> return pp;
>  }
>
> -static struct sk_buff **sit_gro_receive(struct sk_buff **head,
> -   struct sk_buff *skb)
> +static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head,
> +  struct sk_buff *skb)
>  {
> +   /* Common GRO receive for SIT and IP6IP6 */
> +
> if (NAPI_GRO_CB(skb)->encap_mark) {
> NAPI_GRO_CB(skb)->flush = 1;
> return NULL;
> @@ -298,6 +300,13 @@ static int sit_gro_complete(struct sk_buff *skb, int 
> nhoff)
> return ipv6_gro_complete(skb, nhoff);
>  }
>
> +static int ip6ip6_gro_complete(struct sk_buff *skb, int nhoff)
> +{
> +   skb->encapsulation = 1;
> +   skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP6;
> +   return ipv6_gro_complete(skb, nhoff);
> +}
> +
>  static struct packet_offload ipv6_packet_offload __read_mostly = {
> .type = cpu_to_be16(ETH_P_IPV6),
> .callbacks = {
> @@ -310,11 +319,19 @@ static struct packet_offload ipv6_packet_offload 
> __read_mostly = {
>  static const struct net_offload sit_offload = {
> .callbacks = {
> .gso_segment= ipv6_gso_segment,
> -   .gro_receive= sit_gro_receive,
> +   .gro_receive= sit_ip6ip6_gro_receive,
> .gro_complete   = sit_gro_complete,
> },
>  };
>
> +static const struct net_offload ip6ip6_offload = {
> +   .callbacks = {
> +   .gso_segment= ipv6_gso_segment,
> +   .gro_receive= sit_ip6ip6_gro_receive,
> +   .gro_complete   = ip6ip6_gro_complete,
> +   },
> +};
> +
>  static int __init ipv6_offload_init(void)
>  {
>
> @@ -326,6 +343,7 @@ static int __init ipv6_offload_init(void)
> dev_add_offload(&ipv6_packet_offload);
>
> inet_add_offload(&sit_offload, IPPROTO_IPV6);
> +   inet6_add_offload(&ip6ip6_offload, IPPROTO_IPV6);
>
> return 0;
>  }
> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index 8076c7a..d205f17 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -1238,6 +1238,9 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device 
> *dev)
> if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
> fl6.flowi6_mark = skb->mark;
>
> +   if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
> +   return -1;
> +
> err = ip6_tnl_xmit(skb, dev, dsfield, , encap_limit, ,
>IPPROTO_IPV6);
> if (err != 0) {

So one piece you are missing here is
skb_set_inner_ipproto(IPPROTO_IPV6).  Without that the tunnel offload
could be a bit confused as the inner protocol type defaults to
ENCAP_TYPE_ETHER.
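
As a side note on the quoted sit_ip6ip6_gro_receive() hunk, the encap_mark
check it carries can be modelled in a few lines: the first tunnel header
sets the mark, and a second one on the same packet requests a flush instead
of being aggregated. The sketch is standalone C with hypothetical names
(struct gro_state, tunnel_gro_enter()). It is not the kernel's NAPI_GRO_CB
handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct gro_state {
	bool encap_mark;	/* a tunnel header was already seen */
	bool flush;		/* packet must not be aggregated */
};

/* Returns true if the packet may go on to the inner protocol's GRO receive. */
static bool tunnel_gro_enter(struct gro_state *st)
{
	assert(st != NULL);
	if (st->encap_mark) {
		st->flush = true;
		return false;
	}
	st->encap_mark = true;
	return true;
}
```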

- Alex

Re: [PATCH net-next] bpf: arm64: remove callee-save registers use for tmp registers

2016-05-16 Thread Shi, Yang


On 5/16/2016 4:45 PM, Z Lim wrote:

> Hi Yang,
>
> On Mon, May 16, 2016 at 4:09 PM, Yang Shi  wrote:
>> In the current implementation of the ARM64 eBPF JIT, R23 and R24 are used as
>> tmp registers, but they are callee-saved registers. This leads to a
>> variable-size JIT prologue and epilogue. The latest blinding constant change
>> prefers a constant-size prologue and epilogue. AAPCS reserves R9 ~ R15 as temp
>> registers which do not need to be saved/restored across a function call. So,
>> replace R23 and R24 with R10 and R11, and remove the tmp_used flag.
>>
>> CC: Zi Shen Lim 
>> CC: Daniel Borkmann 
>> Signed-off-by: Yang Shi 
>> ---
>
> Couple suggestions, but otherwise:
> Acked-by: Zi Shen Lim 
>
> 1. Update the diagram. I think it should now be:
>
> -* BPF fp register => -80:+-+ <= (BPF_FP)
> +* BPF fp register => -64:+-+ <= (BPF_FP)

Nice catch. I forgot the stack diagram.

> 2. Add a comment in commit log along the lines of: this is an
> optimization saving 2 instructions per jited BPF program.

Sure, will address in V2.

Thanks,
Yang

> Thanks :)
>
> z
>
>> Apply on top of Daniel's blinding constant patchset.
>>
>>  arch/arm64/net/bpf_jit_comp.c | 32 
>>  1 file changed, 4 insertions(+), 28 deletions(-)

Re: [ethtool 0/3][pull request] Intel Wired LAN Driver Updates 2016-05-03

2016-05-16 Thread Jeff Kirsher

On Wed, 2016-05-04 at 09:44 -0700, Jeff Kirsher wrote:
> This series contains updates to ixgbe in ethtool.
> 
> Preethi adds missing device IDs and mac_type definitions, also updated
> the display registers for x550, x550em_x/a.  Cleaned up the format string
> storage by taking advantage of "for" loops.
> 
> The following are changes since commit
> deb1c6613ec14fd828d321e38c7bea45fe559bd5:
>   Release version 4.5.
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/ethtool master
> 
> Preethi Banala (3):
>   ethtool/ixgbe: Add device ID and mac_type definitions
>   ethtool/ixgbe: Correct offsets and support x550, x550em_x, x550em_a
>   ethtool/ixgbe: Reduce format string storage
> 
>  ixgbe.c | 173 +++---
> --
>  1 file changed, 95 insertions(+), 78 deletions(-)
> 

Ping?  Ben do you have these changes queued up for ethtool?


Re: [PATCH net-next] bpf: arm64: remove callee-save registers use for tmp registers

2016-05-16 Thread Z Lim

Hi Yang,

On Mon, May 16, 2016 at 4:09 PM, Yang Shi  wrote:
> In the current implementation of ARM64 eBPF JIT, R23 and R24 are used for
> tmp registers, which are callee-saved registers. This leads to variable size
> of JIT prologue and epilogue. The latest blinding constant change prefers a
> constant size of prologue and epilogue. AAPCS reserves R9 ~ R15 for temp
> registers which do not need to be saved/restored during function calls. So,
> replace R23 and R24 with R10 and R11, and remove the tmp_used flag.
>
> CC: Zi Shen Lim 
> CC: Daniel Borkmann 
> Signed-off-by: Yang Shi 
> ---

Couple suggestions, but otherwise:
Acked-by: Zi Shen Lim 

1. Update the diagram. I think it should now be:

-* BPF fp register => -80:+-+ <= (BPF_FP)
+* BPF fp register => -64:+-+ <= (BPF_FP)

2. Add a comment in commit log along the lines of: this is an
optimization saving 2 instructions per jited BPF program.

Thanks :)

z

> Apply on top of Daniel's blinding constant patchset.
>
>  arch/arm64/net/bpf_jit_comp.c | 32 
>  1 file changed, 4 insertions(+), 28 deletions(-)
>
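
To make the "2 instructions" figure in the review concrete, here is a small
user-space C model. It is only a sketch, not the JIT code: it counts just the
A64_PUSH/A64_POP register pairs that appear in the patch's prologue and
epilogue hunks, and the function names are made up for illustration.

```c
#include <stdbool.h>

/* Register pairs pushed in build_prologue() before the patch:
 * (r6, r7), (r8, r9), optionally (tmp1, tmp2), and (fp, x26). */
static int pushes_before(bool tmp_used)
{
    return 2 + (tmp_used ? 1 : 0) + 1;
}

/* After the patch the tmp registers are x10/x11, which the patch
 * description treats as temporaries, so the (tmp1, tmp2) pair is
 * never pushed and the count is constant. */
static int pushes_after(void)
{
    return 3;
}

/* Instructions saved per program: one push in the prologue plus the
 * matching pop in the epilogue. */
static int insns_saved(bool tmp_used)
{
    return 2 * (pushes_before(tmp_used) - pushes_after());
}
```

With tmp_used set, the model gives 2 saved instructions; without it, the
prologue was already the same size as the new fixed one.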

[PATCH net-next] bpf: arm64: remove callee-save registers use for tmp registers

2016-05-16 Thread Yang Shi

In the current implementation of ARM64 eBPF JIT, R23 and R24 are used for
tmp registers, which are callee-saved registers. This leads to variable size
of JIT prologue and epilogue. The latest blinding constant change prefers a
constant size of prologue and epilogue. AAPCS reserves R9 ~ R15 for temp
registers which do not need to be saved/restored during function calls. So,
replace R23 and R24 with R10 and R11, and remove the tmp_used flag.

CC: Zi Shen Lim 
CC: Daniel Borkmann 
Signed-off-by: Yang Shi 
---
Apply on top of Daniel's blinding constant patchset.

 arch/arm64/net/bpf_jit_comp.c | 32 
 1 file changed, 4 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index d0d5190..ef3055a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -51,9 +51,9 @@ static const int bpf2a64[] = {
[BPF_REG_9] = A64_R(22),
/* read-only frame pointer to access stack */
[BPF_REG_FP] = A64_R(25),
-   /* temporary register for internal BPF JIT */
-   [TMP_REG_1] = A64_R(23),
-   [TMP_REG_2] = A64_R(24),
+   /* temporary registers for internal BPF JIT */
+   [TMP_REG_1] = A64_R(10),
+   [TMP_REG_2] = A64_R(11),
/* temporary register for blinding constants */
[BPF_REG_AX] = A64_R(9),
 };
@@ -61,7 +61,6 @@ static const int bpf2a64[] = {
 struct jit_ctx {
const struct bpf_prog *prog;
int idx;
-   int tmp_used;
int epilogue_offset;
int *offset;
u32 *image;
@@ -154,8 +153,6 @@ static void build_prologue(struct jit_ctx *ctx)
const u8 r8 = bpf2a64[BPF_REG_8];
const u8 r9 = bpf2a64[BPF_REG_9];
const u8 fp = bpf2a64[BPF_REG_FP];
-   const u8 tmp1 = bpf2a64[TMP_REG_1];
-   const u8 tmp2 = bpf2a64[TMP_REG_2];
 
/*
 * BPF prog stack layout
@@ -189,8 +186,6 @@ static void build_prologue(struct jit_ctx *ctx)
/* Save callee-saved register */
emit(A64_PUSH(r6, r7, A64_SP), ctx);
emit(A64_PUSH(r8, r9, A64_SP), ctx);
-   if (ctx->tmp_used)
-   emit(A64_PUSH(tmp1, tmp2, A64_SP), ctx);
 
/* Save fp (x25) and x26. SP requires 16 bytes alignment */
emit(A64_PUSH(fp, A64_R(26), A64_SP), ctx);
@@ -210,8 +205,6 @@ static void build_epilogue(struct jit_ctx *ctx)
const u8 r8 = bpf2a64[BPF_REG_8];
const u8 r9 = bpf2a64[BPF_REG_9];
const u8 fp = bpf2a64[BPF_REG_FP];
-   const u8 tmp1 = bpf2a64[TMP_REG_1];
-   const u8 tmp2 = bpf2a64[TMP_REG_2];
 
/* We're done with BPF stack */
emit(A64_ADD_I(1, A64_SP, A64_SP, STACK_SIZE), ctx);
@@ -220,8 +213,6 @@ static void build_epilogue(struct jit_ctx *ctx)
emit(A64_POP(fp, A64_R(26), A64_SP), ctx);
 
/* Restore callee-saved register */
-   if (ctx->tmp_used)
-   emit(A64_POP(tmp1, tmp2, A64_SP), ctx);
emit(A64_POP(r8, r9, A64_SP), ctx);
emit(A64_POP(r6, r7, A64_SP), ctx);
 
@@ -317,7 +308,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
emit(A64_UDIV(is64, dst, dst, src), ctx);
break;
case BPF_MOD:
-   ctx->tmp_used = 1;
emit(A64_UDIV(is64, tmp, dst, src), ctx);
emit(A64_MUL(is64, tmp, tmp, src), ctx);
emit(A64_SUB(is64, dst, dst, tmp), ctx);
@@ -390,49 +380,41 @@ emit_bswap_uxt:
/* dst = dst OP imm */
case BPF_ALU | BPF_ADD | BPF_K:
case BPF_ALU64 | BPF_ADD | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_ADD(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_SUB | BPF_K:
case BPF_ALU64 | BPF_SUB | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_SUB(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_AND | BPF_K:
case BPF_ALU64 | BPF_AND | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_AND(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_OR | BPF_K:
case BPF_ALU64 | BPF_OR | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_ORR(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_XOR | BPF_K:
case BPF_ALU64 | BPF_XOR | BPF_K:
-   ctx->tmp_used = 1;
emit_a64_mov_i(is64, tmp, imm, ctx);
emit(A64_EOR(is64, dst, dst, tmp), ctx);
break;
case BPF_ALU | BPF_MUL | BPF_K:
case BPF_ALU64 | BPF_MUL | BPF_K:
-   ctx->tmp_used = 1;
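
The BPF_MOD hunk above shows why a scratch register is needed at all: the JIT
emits UDIV, MUL and SUB, with tmp holding the intermediate quotient and
product. A minimal C sketch of that three-step sequence follows. The function
name is hypothetical, and it assumes src is non-zero by the time these
instructions run.

```c
#include <stdint.h>

/* Mirrors the three emitted instructions for BPF_MOD:
 *   UDIV tmp, dst, src
 *   MUL  tmp, tmp, src
 *   SUB  dst, dst, tmp
 * Assumes src != 0. */
static uint64_t bpf_mod_model(uint64_t dst, uint64_t src)
{
    uint64_t tmp;

    tmp = dst / src;   /* UDIV */
    tmp = tmp * src;   /* MUL  */
    dst = dst - tmp;   /* SUB  */
    return dst;
}
```

Since tmp never has to survive past the sequence that uses it, holding it in a
register that needs no save/restore fits this usage.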

Re: [PATCH v2] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-16 Thread Francois Romieu

Ard Biesheuvel  :
[...]
> This is a followup to 'r8169: default to 64-bit DMA on systems without memory
> below 4 GB' [1]. At the request of Francois, this version bases the decision
> whether to use 64-bit DMA by default on whether the device is PCIe and
> sufficiently recent, rather than whether the platform requires 64-bit DMA
> because it does not have any memory below 4 GB to begin with. This is safer,
> since it will prevent the use of such problematic cards on these platforms.

Testing has not been conclusive. It apparently works but I have not been
able to set addresses above 4 GB for the Rx or Tx descriptor rings yet.

-- 
Ueimor

Re: [PATCH v6 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 3:37 PM, Tom Herbert  wrote:
> On Mon, May 16, 2016 at 3:25 PM, Alexander Duyck
>  wrote:
>> On Mon, May 16, 2016 at 2:33 PM, Tom Herbert  wrote:
>>> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
>>> for getting encap hlen, setting up encap on a tunnel, performing
>>> encapsulation operation.
>>>
>>> Signed-off-by: Tom Herbert 
>>> ---
>>>  include/net/ip6_tunnel.h  | 58 
>>>  net/ipv4/ip_tunnel_core.c |  5 +++
>>>  net/ipv6/ip6_tunnel.c | 85 
>>> ++-
>>>  3 files changed, 139 insertions(+), 9 deletions(-)
>>>
>>
>> So it looks like you completely dropped the two spots that were
>> updating mtu and max_headroom with the t->hlen.  I thought you needed
>> to at least have a check that used t->encap_hlen here in order to
>> avoid overflowing the buffer or exceeding skb_headroom, or am I
>> missing something?
>>
> Sorry, you're probably right. max_headroom seems to be an absolute
> value. mtu being calculated seems relative to what is in skbuff
> already.

The second invocation of max_headroom is an absolute value.  The first
one is used to measure if there is enough space for the headers we
will need to add.  My thought is that is why we need encap_hlen to be
added to the first case so we make sure there is enough room for the
UDP header if one is present plus the IPv6 header.

Also I just found another issue in this patch.  In ip6_tnl_dev_setup
you can probably just drop all references to "t" since you only assign
the pointer but you never actually access it.  I only noticed because
I was looking at adding support for TSO to the tunnel itself.

- Alex
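
The headroom concern can be illustrated with a small stand-alone C sketch.
This is not the ip6_tunnel code; the helper names and the link_reserve
parameter are assumptions made for the example. The only fixed facts used are
the 40-byte IPv6 header and the 8-byte UDP header.

```c
#include <stddef.h>

#define IPV6_HDR_LEN 40  /* fixed IPv6 header size */
#define UDP_HDR_LEN   8  /* UDP header size, e.g. for FOU */

/* Hypothetical helper: bytes of headroom needed in front of the payload. */
static size_t needed_headroom(size_t link_reserve, size_t tun_hlen,
                              size_t encap_hlen)
{
    return link_reserve + IPV6_HDR_LEN + tun_hlen + encap_hlen;
}

/* Returns 1 if the buffer's current headroom can hold all headers. */
static int headroom_ok(size_t skb_headroom, size_t link_reserve,
                       size_t tun_hlen, size_t encap_hlen)
{
    return skb_headroom >= needed_headroom(link_reserve, tun_hlen, encap_hlen);
}
```

With 60 bytes of headroom and a 16-byte link reserve, the check passes if
encap_hlen is left out (56 bytes needed) but fails once the UDP header is
counted (64 bytes needed), which is the case the review is worried about.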

Re: task_diag: add a new interface to get information about processes

2016-05-16 Thread Andrew Vagin

On Wed, May 04, 2016 at 08:39:51PM -0700, Andy Lutomirski wrote:
> 
> Linus, this is Yet Another Credential Fuckup, except that it hasn't
> happened yet, so it's okay.  The tl;dr is that Andrey wants to add an
> interface to ask a pidns some questions, and netlink looks natural,
> except that using netlink sockets to interrogate a pidns seems rather
> problematic.  I would also love to see a decent interface for
> interrogating user namespaces, and again, netlink would be great,
> except that it's a socket and makes no sense in this context.
> 
> Netlink had, and possibly still has, tons of serious security bugs
> involving code checking send() callers' creds.  I found and fixed a
> few a couple years ago.  To reiterate once again, send() CANNOT use
> caller creds safely.  (I feel like I say this once every few weeks.
> It's getting old.)
> 
> I realize that it's convenient to use a socket as a context to keep
> state between syscalls, but it has some annoying side effects:
> 
>  - It makes people want to rely on send()'s caller's creds.
> 
>  - It's miserable in combination with seccomp.
> 
>  - It doesn't play nicely with namespaces.
> 
>  - It makes me wonder why things like task_diag, which have nothing to
> do with networking, seem to get tangled up with networking.
> 
> 
> Would it be worth considering adding a parallel interface, using it
> for new things, and slowly migrating old use cases over?
> 
> int issue_kernel_command(int ns, int command, const struct iovec *iov,
> int iovcnt, int flags);
> 
> ns is an actual namespace fd or:
> 
> KERNEL_COMMAND_CURRENT_NETNS
> KERNEL_COMMAND_CURRENT_PIDNS
> etc, or a special one:
> KERNEL_COMMAND_GLOBAL.  KERNEL_COMMAND_GLOBAL can't be used in a
> non-root namespace.

A request can depend on several namespaces. For example, we can request
credentials for a specified task. In this case we may want to specify
both the pid and user namespaces.

> 
> KERNEL_COMMAND_GLOBAL works even for namespaced things, if the
> relevant current ns is the init namespace.  (This feature is optional,
> but it would allow gradually namespacing global things.)
> 
> command is an enumerated command.  Each command implies a namespace
> type, and, if you feed this thing the wrong namespace type, you get
> EINVAL.  The high bit of command indicates whether it's read-only
> command.
> 
> iov gives a command in the format expected, which, for the most part,
> would be a netlink message.
> 
> The return value is an fd that you can call read/readv on to read the
> response.  It's not a socket (or at least you can't do normal socket
> operations on it if it is a socket behind the scenes).  The
> implementation of read() promises *not* to look at caller creds.  The
> returned fd is unconditionally cloexec -- it's 2016 already.  Sheesh.
> 
> When you've read all the data, all you can do is close the fd.  You
> can't issue another command on the same fd.  You also can't call
> write() or send() on the fd unless someone has a good reason why you
> should be able to and why it's safe.  You can't issue another command
> on the same fd.
> 
> 
> I imagine that the implementation could re-use a bunch of netlink code
> under the hood.

I agree with this interface. I'd be interested to hear an opinion from
the other side. Stephen, could you share your comments about these
netlink issues and this new interface?

Thanks,
Andrew

> 
> 
> --Andy
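
As a rough illustration of the command encoding described in the proposal
(high bit marks a read-only command, wrong namespace type yields EINVAL), here
is a C sketch. Every name and value in it is hypothetical; nothing like this
exists in the kernel, and the proposal gives no concrete numbers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 31 marks a read-only command. */
#define KCMD_READONLY_BIT (UINT32_C(1) << 31)

/* Hypothetical namespace types a command may imply. */
enum kcmd_ns_type { KCMD_NS_NET, KCMD_NS_PID, KCMD_NS_USER };

static bool kcmd_is_readonly(uint32_t command)
{
    return (command & KCMD_READONLY_BIT) != 0;
}

/* Returns 0 if the supplied namespace type matches the one the command
 * implies, or -EINVAL otherwise, as the proposal describes. */
static int kcmd_check_ns(enum kcmd_ns_type implied, enum kcmd_ns_type given)
{
    return implied == given ? 0 : -EINVAL;
}
```

A command that depends on more than one namespace, as raised above, would not
fit this single-type check, which is one reason the point matters.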

Re: [PATCH v6 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Tom Herbert

On Mon, May 16, 2016 at 3:25 PM, Alexander Duyck
 wrote:
> On Mon, May 16, 2016 at 2:33 PM, Tom Herbert  wrote:
>> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
>> for getting encap hlen, setting up encap on a tunnel, performing
>> encapsulation operation.
>>
>> Signed-off-by: Tom Herbert 
>> ---
>>  include/net/ip6_tunnel.h  | 58 
>>  net/ipv4/ip_tunnel_core.c |  5 +++
>>  net/ipv6/ip6_tunnel.c | 85 
>> ++-
>>  3 files changed, 139 insertions(+), 9 deletions(-)
>>
>
> So it looks like you completely dropped the two spots that were
> updating mtu and max_headroom with the t->hlen.  I thought you needed
> to at least have a check that used t->encap_hlen here in order to
> avoid overflowing the buffer or exceeding skb_headroom, or am I
> missing something?
>
Sorry, you're probably right. max_headroom seems to be an absolute
value. mtu being calculated seems relative to what is in skbuff
already.

Tom

> - Alex

[PATCH] net: don't lose features in netdev_add_tso_features()

2016-05-16 Thread Dimitris Michailidis

The goal of netdev_add_tso_features() is to enable all TSO features but
it unintentionally loses NETIF_F_ALL_FOR_ALL features. This is because
the netdev_increment_features() it calls clears any NETIF_F_ALL_FOR_ALL
bits that aren't included in the incremental features and none of them
are included in NETIF_F_ALL_TSO. The behavior can be seen by enabling
tx-nocache-copy on the slaves and noticing the feature remains off at
the master.

Fix this by including NETIF_F_ALL_FOR_ALL in the incremental features.

Signed-off-by: Dave Platt 
Signed-off-by: Dimitris Michailidis 
---
 include/linux/netdevice.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c2f5112..da45388 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3978,7 +3978,12 @@ netdev_features_t netdev_increment_features(netdev_features_t all,
 static inline netdev_features_t netdev_add_tso_features(netdev_features_t features,
netdev_features_t mask)
 {
-   return netdev_increment_features(features, NETIF_F_ALL_TSO, mask);
+   /* OR in NETIF_F_ALL_FOR_ALL to preserve any of its bits already present
+* in features
+*/
+   return netdev_increment_features(features,
+NETIF_F_ALL_TSO | NETIF_F_ALL_FOR_ALL,
+mask);
 }
 
 int __netdev_update_features(struct net_device *dev);
-- 
2.8.0.rc3.226.g39d4020
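
The effect described in the commit message can be reproduced with a simplified
user-space model. This is a sketch under stated assumptions: the bit values
are invented, only one "one-for-all" and one "all-for-all" feature exist, and
increment_features() keeps just the two masking steps relevant here rather
than the full kernel function.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Invented bit assignments for the model. */
#define F_TSO           (UINT64_C(1) << 0)  /* stands in for NETIF_F_ALL_TSO */
#define F_NOCACHE_COPY  (UINT64_C(1) << 1)  /* an all-for-all feature */
#define F_ONE_FOR_ALL   F_TSO
#define F_ALL_FOR_ALL   F_NOCACHE_COPY

/* Simplified model of netdev_increment_features(). */
static features_t increment_features(features_t all, features_t one,
                                     features_t mask)
{
    all |= one & F_ONE_FOR_ALL & mask;  /* add one-for-all bits */
    all &= one | ~F_ALL_FOR_ALL;        /* drop all-for-all bits not in 'one' */
    return all;
}

/* Before the patch: only the TSO bits are passed as 'one'. */
static features_t add_tso_old(features_t features, features_t mask)
{
    return increment_features(features, F_TSO, mask);
}

/* After the patch: the all-for-all bits are ORed into 'one'. */
static features_t add_tso_new(features_t features, features_t mask)
{
    return increment_features(features, F_TSO | F_ALL_FOR_ALL, mask);
}
```

Starting from features with only F_NOCACHE_COPY set, the old variant returns
just F_TSO, while the new one returns both bits.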

[PATCH 1/2] net: ethernet: fec-mpc52xx: use phydev from struct net_device

2016-05-16 Thread Philippe Reynes

The private structure contains a pointer to phydev, but the structure
net_device already contains such a pointer. So we can remove the phydev
pointer from the private structure, and update the driver to use the
one contained in struct net_device.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/freescale/fec_mpc52xx.c |   43 --
 1 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c
index f444714..bcf0600 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
@@ -66,7 +66,6 @@ struct mpc52xx_fec_priv {
/* MDIO link details */
unsigned int mdio_speed;
struct device_node *phy_node;
-   struct phy_device *phydev;
enum phy_state link;
int seven_wire_mode;
 };
@@ -165,7 +164,7 @@ static int mpc52xx_fec_alloc_rx_buffers(struct net_device *dev, struct bcom_task
 static void mpc52xx_fec_adjust_link(struct net_device *dev)
 {
struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-   struct phy_device *phydev = priv->phydev;
+   struct phy_device *phydev = dev->phydev;
int new_state = 0;
 
if (phydev->link != PHY_DOWN) {
@@ -215,16 +214,17 @@ static void mpc52xx_fec_adjust_link(struct net_device *dev)
 static int mpc52xx_fec_open(struct net_device *dev)
 {
struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = NULL;
int err = -EBUSY;
 
if (priv->phy_node) {
-   priv->phydev = of_phy_connect(priv->ndev, priv->phy_node,
- mpc52xx_fec_adjust_link, 0, 0);
-   if (!priv->phydev) {
+   phydev = of_phy_connect(priv->ndev, priv->phy_node,
+   mpc52xx_fec_adjust_link, 0, 0);
+   if (!phydev) {
dev_err(&dev->dev, "of_phy_connect failed\n");
return -ENODEV;
}
-   phy_start(priv->phydev);
+   phy_start(phydev);
}
 
if (request_irq(dev->irq, mpc52xx_fec_interrupt, IRQF_SHARED,
@@ -268,10 +268,9 @@ static int mpc52xx_fec_open(struct net_device *dev)
  free_ctrl_irq:
free_irq(dev->irq, dev);
  free_phy:
-   if (priv->phydev) {
-   phy_stop(priv->phydev);
-   phy_disconnect(priv->phydev);
-   priv->phydev = NULL;
+   if (phydev) {
+   phy_stop(phydev);
+   phy_disconnect(phydev);
}
 
return err;
@@ -280,6 +279,7 @@ static int mpc52xx_fec_open(struct net_device *dev)
 static int mpc52xx_fec_close(struct net_device *dev)
 {
struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
 
netif_stop_queue(dev);
 
@@ -291,11 +291,10 @@ static int mpc52xx_fec_close(struct net_device *dev)
free_irq(priv->r_irq, dev);
free_irq(priv->t_irq, dev);
 
-   if (priv->phydev) {
+   if (phydev) {
/* power down phy */
-   phy_stop(priv->phydev);
-   phy_disconnect(priv->phydev);
-   priv->phydev = NULL;
+   phy_stop(phydev);
+   phy_disconnect(phydev);
}
 
return 0;
@@ -766,10 +765,9 @@ static void mpc52xx_fec_reset(struct net_device *dev)
 static int mpc52xx_fec_get_ksettings(struct net_device *dev,
 struct ethtool_link_ksettings *cmd)
 {
-   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-   struct phy_device *phydev = priv->phydev;
+   struct phy_device *phydev = dev->phydev;
 
-   if (!priv->phydev)
+   if (!phydev)
return -ENODEV;
 
return phy_ethtool_ksettings_get(phydev, cmd);
@@ -778,10 +776,9 @@ static int mpc52xx_fec_get_ksettings(struct net_device *dev,
 static int mpc52xx_fec_set_ksettings(struct net_device *dev,
 const struct ethtool_link_ksettings *cmd)
 {
-   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
-   struct phy_device *phydev = priv->phydev;
+   struct phy_device *phydev = dev->phydev;
 
-   if (!priv->phydev)
+   if (!phydev)
return -ENODEV;
 
return phy_ethtool_ksettings_set(phydev, cmd);
@@ -811,12 +808,12 @@ static const struct ethtool_ops mpc52xx_fec_ethtool_ops = {
 
 static int mpc52xx_fec_ioctl(struct net_device *dev, struct ifreq *rq, int cmd)
 {
-   struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
 
-   if (!priv->phydev)
+   if (!phydev)
return -ENOTSUPP;
 
-   return phy_mii_ioctl(priv->phydev, rq, cmd);
+   return phy_mii_ioctl(phydev, rq, cmd);
 }
 
 static const struct net_device_ops mpc52xx_fec_netdev_ops = {
-- 
1.7.4.4

[PATCH 2/2] net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings

2016-05-16 Thread Philippe Reynes

There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
so we can use them instead of defining the same code in the driver.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/freescale/fec_mpc52xx.c |   26 ++
 1 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_mpc52xx.c b/drivers/net/ethernet/freescale/fec_mpc52xx.c
index bcf0600..446ae9d 100644
--- a/drivers/net/ethernet/freescale/fec_mpc52xx.c
+++ b/drivers/net/ethernet/freescale/fec_mpc52xx.c
@@ -762,28 +762,6 @@ static void mpc52xx_fec_reset(struct net_device *dev)
 
 /* ethtool interface */
 
-static int mpc52xx_fec_get_ksettings(struct net_device *dev,
-struct ethtool_link_ksettings *cmd)
-{
-   struct phy_device *phydev = dev->phydev;
-
-   if (!phydev)
-   return -ENODEV;
-
-   return phy_ethtool_ksettings_get(phydev, cmd);
-}
-
-static int mpc52xx_fec_set_ksettings(struct net_device *dev,
-const struct ethtool_link_ksettings *cmd)
-{
-   struct phy_device *phydev = dev->phydev;
-
-   if (!phydev)
-   return -ENODEV;
-
-   return phy_ethtool_ksettings_set(phydev, cmd);
-}
-
 static u32 mpc52xx_fec_get_msglevel(struct net_device *dev)
 {
struct mpc52xx_fec_priv *priv = netdev_priv(dev);
@@ -801,8 +779,8 @@ static const struct ethtool_ops mpc52xx_fec_ethtool_ops = {
.get_msglevel = mpc52xx_fec_get_msglevel,
.set_msglevel = mpc52xx_fec_set_msglevel,
.get_ts_info = ethtool_op_get_ts_info,
-   .get_link_ksettings = mpc52xx_fec_get_ksettings,
-   .set_link_ksettings = mpc52xx_fec_set_ksettings,
+   .get_link_ksettings = phy_ethtool_get_link_ksettings,
+   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
 
-- 
1.7.4.4
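
For readers without the kernel tree at hand, the wrapper removed by this patch
has the shape sketched below. The types here are minimal user-space stand-ins,
not the kernel definitions, and phy_ksettings_get() is a stub written for the
example. The assumption, which the patch relies on, is that the generic helper
performs the same NULL check on the net_device's phydev pointer.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel types; only the fields used here. */
struct phy_device { int speed; };
struct net_device { struct phy_device *phydev; };
struct link_ksettings { int speed; };

/* Stub standing in for phy_ethtool_ksettings_get(). */
static int phy_ksettings_get(struct phy_device *phydev,
                             struct link_ksettings *cmd)
{
    cmd->speed = phydev->speed;
    return 0;
}

/* Same shape as the removed mpc52xx_fec_get_ksettings() wrapper. */
static int get_link_ksettings(struct net_device *ndev,
                              struct link_ksettings *cmd)
{
    struct phy_device *phydev = ndev->phydev;

    if (!phydev)
        return -ENODEV;

    return phy_ksettings_get(phydev, cmd);
}
```

Nothing in the wrapper is driver specific once phydev lives in struct
net_device, which is why patch 1/2 has to come first.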

RE: [Intel-wired-lan] [PATCH] e1000e: prevent division by zero if TIMINCA is zero

2016-05-16 Thread Brown, Aaron F

> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Denys Vlasenko
> Sent: Friday, May 6, 2016 12:42 PM
> To: Kirsher, Jeffrey T 
> Cc: intel-wired-...@lists.osuosl.org; Denys Vlasenko
> ; LKML ;
> netdev@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] e1000e: prevent division by zero if
> TIMINCA is zero
> 
> Users report that under VMWare, er32(TIMINCA) returns zero.
> This causes division by zero at init time as follows:
> 
>  ==>incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
> for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
> /* latch SYSTIMH on read of SYSTIML */
> systim_next = (cycle_t)er32(SYSTIML);
> systim_next |= (cycle_t)er32(SYSTIMH) << 32;
> 
> time_delta = systim_next - systim;
> temp = time_delta;
>  >  rem = do_div(temp, incvalue);
> 
> This change makes kernel survive this, and users report that
> NIC does work after this change.
> 
> Since on real hardware incvalue is never zero, this should not affect
> real hardware use case.
> 
> Signed-off-by: Denys Vlasenko 
> CC: Jeff Kirsher 
> CC: "Ruinskiy, Dima" 
> CC: intel-wired-...@lists.osuosl.org
> CC: netdev@vger.kernel.org
> CC: LKML 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

As Mark Rustad pointed out I recall this was earlier rejected as something that 
is a VMWare error and it should be fixed there so that existing VMs will start 
working without installing a new driver.  Having said that, it does not seem to 
be causing any harm in my testing, so...

Tested-by: Aaron Brown
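
The diff itself is not quoted in this reply, so the following is only a sketch
of the kind of guard being discussed, written as plain user-space C. The
function name is hypothetical, and the fallback for a zero incvalue (treat any
non-zero delta as "not a multiple") is an assumption, not a statement of what
the patch does.

```c
#include <stdint.h>

/* Returns non-zero when time_delta is not a whole multiple of incvalue.
 * With incvalue == 0 the division is skipped entirely; the value returned
 * in that case is an assumed fallback. */
static uint32_t delta_remainder(uint64_t time_delta, uint32_t incvalue)
{
    if (incvalue == 0)
        return time_delta != 0;

    return (uint32_t)(time_delta % incvalue);
}
```

On hardware where incvalue is never zero, the first branch is never taken, so
the result is the same as the unguarded division.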

Re: [PATCH v6 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 2:33 PM, Tom Herbert  wrote:
> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
> for getting encap hlen, setting up encap on a tunnel, performing
> encapsulation operation.
>
> Signed-off-by: Tom Herbert 
> ---
>  include/net/ip6_tunnel.h  | 58 
>  net/ipv4/ip_tunnel_core.c |  5 +++
>  net/ipv6/ip6_tunnel.c | 85 
> ++-
>  3 files changed, 139 insertions(+), 9 deletions(-)
>

So it looks like you completely dropped the two spots that were
updating mtu and max_headroom with the t->hlen.  I thought you needed
to at least have a check that used t->encap_hlen here in order to
avoid overflowing the buffer or exceeding skb_headroom, or am I
missing something?

- Alex

[Patch net] net_sched: close another race condition in tcf_mirred_release()

2016-05-16 Thread Cong Wang

We saw the following extra refcount release on veth device:

  kernel: [7957821.463992] unregister_netdevice: waiting for mesos50284 to become free. Usage count = -1

Since we heavily use mirred action to redirect packets to veth, I think
this is caused by the following race condition:

CPU0:
tcf_mirred_release(): (in RCU callback)
struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);

CPU1:
mirred_device_event():
    spin_lock_bh(&mirred_list_lock);
    list_for_each_entry(m, &mirred_list, tcfm_list) {
        if (rcu_access_pointer(m->tcfm_dev) == dev) {
            dev_put(dev);
            /* Note : no rcu grace period necessary, as
             * net_device are already rcu protected.
             */
            RCU_INIT_POINTER(m->tcfm_dev, NULL);
        }
    }
    spin_unlock_bh(&mirred_list_lock);

CPU0:
tcf_mirred_release():
    spin_lock_bh(&mirred_list_lock);
    list_del(&m->tcfm_list);
    spin_unlock_bh(&mirred_list_lock);
    if (dev)   // < Still refers to the old m->tcfm_dev
        dev_put(dev);  // < dev_put() is called on it again

The action init code path is good because it is impossible to modify
an action that is being removed.

So, fix this by moving everything under the spinlock.

Fixes: 2ee22a90c7af ("net_sched: act_mirred: remove spinlock in fast path")
Fixes: 6bd00b850635 ("act_mirred: fix a race condition on mirred_list")
Cc: Jamal Hadi Salim 
Signed-off-by: Cong Wang 
---
 net/sched/act_mirred.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 8f3948d..78db6d4 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -36,14 +36,15 @@ static DEFINE_SPINLOCK(mirred_list_lock);
 static void tcf_mirred_release(struct tc_action *a, int bind)
 {
struct tcf_mirred *m = to_mirred(a);
-   struct net_device *dev = rcu_dereference_protected(m->tcfm_dev, 1);
+   struct net_device *dev;
 
/* We could be called either in a RCU callback or with RTNL lock held. */
spin_lock_bh(&mirred_list_lock);
list_del(&m->tcfm_list);
-   spin_unlock_bh(&mirred_list_lock);
+   dev = rcu_dereference_protected(m->tcfm_dev, 1);
if (dev)
dev_put(dev);
+   spin_unlock_bh(&mirred_list_lock);
 }
 
 static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {
-- 
2.1.0
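
The interleaving in the commit message can be replayed deterministically in
user space. The sketch below is a single-threaded model with stand-in structs
and a plain integer refcount; it only replays the order of operations and does
not use real locking or RCU. All names are made up for the example.

```c
#include <stddef.h>

/* Stand-ins for net_device and tcf_mirred; only what the model needs. */
struct dev { int refcnt; };
struct mirred { struct dev *tcfm_dev; };

/* Models mirred_device_event(): drop the reference and clear the pointer. */
static void device_event(struct mirred *m, struct dev *dev)
{
    if (m->tcfm_dev == dev) {
        dev->refcnt--;
        m->tcfm_dev = NULL;
    }
}

/* Old order: the pointer is sampled before the lock would be taken, so
 * the device event can run in between. Returns the final refcount. */
static int release_old_order(void)
{
    struct dev d = { .refcnt = 1 };
    struct mirred m = { .tcfm_dev = &d };
    struct dev *snap = m.tcfm_dev;  /* CPU0: read outside the lock */

    device_event(&m, &d);           /* CPU1: runs under the lock */
    if (snap)
        snap->refcnt--;             /* CPU0: second put on the same dev */
    return d.refcnt;
}

/* New order: the pointer is sampled inside the critical section, so it
 * already reflects what the device event did. */
static int release_new_order(void)
{
    struct dev d = { .refcnt = 1 };
    struct mirred m = { .tcfm_dev = &d };
    struct dev *snap;

    device_event(&m, &d);           /* CPU1 ran first */
    snap = m.tcfm_dev;              /* CPU0: read under the lock */
    if (snap)
        snap->refcnt--;
    return d.refcnt;
}
```

The old order ends with a refcount of -1, matching the "Usage count = -1" in
the log above; the new order ends at 0.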

Re: [PATCH net-next] bpf, doc: fix typo on bpf_asm descriptions

2016-05-16 Thread Alexei Starovoitov

On Mon, May 16, 2016 at 11:06:53PM +0200, Daniel Borkmann wrote:
> Fix description of some of the bpf_asm tool related jump instructions
> and generally move them to format A  k.
> 
> Reported-by: Sebastian Amend 
> Signed-off-by: Daniel Borkmann 

Acked-by: Alexei Starovoitov

[PATCH v6 net-next 01/14] gso: Remove arbitrary checks for unsupported GSO

2016-05-16 Thread Tom Herbert

In several gso_segment functions there are checks of gso_type against
a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
attempt to identify unsupported GSO types, but since the stack is
the one that set these GSO types in the first place this seems
unnecessary to do. If a combination isn't valid in the first
place, the stack should not allow setting it.

This is a code simplification, especially when adding new GSO types.

Signed-off-by: Tom Herbert 
---
 net/ipv4/af_inet.c | 18 --
 net/ipv4/gre_offload.c | 14 --
 net/ipv4/tcp_offload.c | 19 ---
 net/ipv4/udp_offload.c | 10 --
 net/ipv6/ip6_offload.c | 18 --
 net/ipv6/udp_offload.c | 13 -
 net/mpls/mpls_gso.c| 11 +--
 7 files changed, 1 insertion(+), 102 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 2e6e65f..7f08d45 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1205,24 +1205,6 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
int ihl;
int id;
 
-   if (unlikely(skb_shinfo(skb)->gso_type &
-~(SKB_GSO_TCPV4 |
-  SKB_GSO_UDP |
-  SKB_GSO_DODGY |
-  SKB_GSO_TCP_ECN |
-  SKB_GSO_GRE |
-  SKB_GSO_GRE_CSUM |
-  SKB_GSO_IPIP |
-  SKB_GSO_SIT |
-  SKB_GSO_TCPV6 |
-  SKB_GSO_UDP_TUNNEL |
-  SKB_GSO_UDP_TUNNEL_CSUM |
-  SKB_GSO_TCP_FIXEDID |
-  SKB_GSO_TUNNEL_REMCSUM |
-  SKB_GSO_PARTIAL |
-  0)))
-   goto out;
-
skb_reset_network_header(skb);
nhoff = skb_network_header(skb) - skb_mac_header(skb);
if (unlikely(!pskb_may_pull(skb, sizeof(*iph))))
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index e88190a..ecd1e09 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -26,20 +26,6 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
int gre_offset, outer_hlen;
bool need_csum, ufo;
 
-   if (unlikely(skb_shinfo(skb)->gso_type &
-   ~(SKB_GSO_TCPV4 |
- SKB_GSO_TCPV6 |
- SKB_GSO_UDP |
- SKB_GSO_DODGY |
- SKB_GSO_TCP_ECN |
- SKB_GSO_TCP_FIXEDID |
- SKB_GSO_GRE |
- SKB_GSO_GRE_CSUM |
- SKB_GSO_IPIP |
- SKB_GSO_SIT |
- SKB_GSO_PARTIAL)))
-   goto out;
-
if (!skb->encapsulation)
goto out;
 
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 02737b6..5c59649 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -83,25 +83,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 
if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
-   int type = skb_shinfo(skb)->gso_type;
-
-   if (unlikely(type &
-~(SKB_GSO_TCPV4 |
-  SKB_GSO_DODGY |
-  SKB_GSO_TCP_ECN |
-  SKB_GSO_TCP_FIXEDID |
-  SKB_GSO_TCPV6 |
-  SKB_GSO_GRE |
-  SKB_GSO_GRE_CSUM |
-  SKB_GSO_IPIP |
-  SKB_GSO_SIT |
-  SKB_GSO_UDP_TUNNEL |
-  SKB_GSO_UDP_TUNNEL_CSUM |
-  SKB_GSO_TUNNEL_REMCSUM |
-  0) ||
-!(type & (SKB_GSO_TCPV4 |
-  SKB_GSO_TCPV6))))
-   goto out;
 
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
 
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6b7459c..81f253b 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -209,16 +209,6 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 
if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
-   int type = skb_shinfo(skb)->gso_type;
-
-   if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY |
- SKB_GSO_UDP_TUNNEL |
- SKB_GSO_UDP_TUNNEL_CSUM |
- SKB_GSO_TUNNEL_REMCSUM |
- SKB_GSO_IPIP |
-

[PATCH v6 net-next 10/14] fou: Add encap ops for IPv6 tunnels

2016-05-16 Thread Tom Herbert

This patch adds a new fou6 module that provides encapsulation
operations for IPv6.

Signed-off-by: Tom Herbert 
---
 include/net/fou.h |   2 +-
 net/ipv6/Makefile |   1 +
 net/ipv6/fou6.c   | 140 ++
 3 files changed, 142 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv6/fou6.c

diff --git a/include/net/fou.h b/include/net/fou.h
index 7d2fda2..f5cc691 100644
--- a/include/net/fou.h
+++ b/include/net/fou.h
@@ -9,7 +9,7 @@
 #include 
 
 size_t fou_encap_hlen(struct ip_tunnel_encap *e);
-static size_t gue_encap_hlen(struct ip_tunnel_encap *e);
+size_t gue_encap_hlen(struct ip_tunnel_encap *e);
 
 int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
   u8 *protocol, __be16 *sport, int type);
diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
index 5e9d6bf..7ec3129 100644
--- a/net/ipv6/Makefile
+++ b/net/ipv6/Makefile
@@ -42,6 +42,7 @@ obj-$(CONFIG_IPV6_VTI) += ip6_vti.o
 obj-$(CONFIG_IPV6_SIT) += sit.o
 obj-$(CONFIG_IPV6_TUNNEL) += ip6_tunnel.o
 obj-$(CONFIG_IPV6_GRE) += ip6_gre.o
+obj-$(CONFIG_NET_FOU) += fou6.o
 
 obj-y += addrconf_core.o exthdrs_core.o ip6_checksum.o ip6_icmp.o
 obj-$(CONFIG_INET) += output_core.o protocol.o $(ipv6-offload)
diff --git a/net/ipv6/fou6.c b/net/ipv6/fou6.c
new file mode 100644
index 000..c972d0b
--- /dev/null
+++ b/net/ipv6/fou6.c
@@ -0,0 +1,140 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static void fou6_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
+  struct flowi6 *fl6, u8 *protocol, __be16 sport)
+{
+   struct udphdr *uh;
+
+   skb_push(skb, sizeof(struct udphdr));
+   skb_reset_transport_header(skb);
+
+   uh = udp_hdr(skb);
+
+   uh->dest = e->dport;
+   uh->source = sport;
+   uh->len = htons(skb->len);
+   udp6_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM6), skb,
+ &fl6->saddr, &fl6->daddr, skb->len);
+
+   *protocol = IPPROTO_UDP;
+}
+
+int fou6_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+ u8 *protocol, struct flowi6 *fl6)
+{
+   __be16 sport;
+   int err;
+   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM6 ?
+   SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+
+   err = __fou_build_header(skb, e, protocol, &sport, type);
+   if (err)
+   return err;
+
+   fou6_build_udp(skb, e, fl6, protocol, sport);
+
+   return 0;
+}
+EXPORT_SYMBOL(fou6_build_header);
+
+int gue6_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+ u8 *protocol, struct flowi6 *fl6)
+{
+   __be16 sport;
+   int err;
+   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM6 ?
+   SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+
+   err = __gue_build_header(skb, e, protocol, &sport, type);
+   if (err)
+   return err;
+
+   fou6_build_udp(skb, e, fl6, protocol, sport);
+
+   return 0;
+}
+EXPORT_SYMBOL(gue6_build_header);
+
+#ifdef CONFIG_NET_FOU_IP_TUNNELS
+
+static const struct ip6_tnl_encap_ops fou_ip6tun_ops = {
+   .encap_hlen = fou_encap_hlen,
+   .build_header = fou6_build_header,
+};
+
+static const struct ip6_tnl_encap_ops gue_ip6tun_ops = {
+   .encap_hlen = gue_encap_hlen,
+   .build_header = gue6_build_header,
+};
+
+static int ip6_tnl_encap_add_fou_ops(void)
+{
+   int ret;
+
+   ret = ip6_tnl_encap_add_ops(&fou_ip6tun_ops, TUNNEL_ENCAP_FOU);
+   if (ret < 0) {
+   pr_err("can't add fou6 ops\n");
+   return ret;
+   }
+
+   ret = ip6_tnl_encap_add_ops(&gue_ip6tun_ops, TUNNEL_ENCAP_GUE);
+   if (ret < 0) {
+   pr_err("can't add gue6 ops\n");
+   ip6_tnl_encap_del_ops(&fou_ip6tun_ops, TUNNEL_ENCAP_FOU);
+   return ret;
+   }
+
+   return 0;
+}
+
+static void ip6_tnl_encap_del_fou_ops(void)
+{
+   ip6_tnl_encap_del_ops(&fou_ip6tun_ops, TUNNEL_ENCAP_FOU);
+   ip6_tnl_encap_del_ops(&gue_ip6tun_ops, TUNNEL_ENCAP_GUE);
+}
+
+#else
+
+static int ip6_tnl_encap_add_fou_ops(void)
+{
+   return 0;
+}
+
+static void ip6_tnl_encap_del_fou_ops(void)
+{
+}
+
+#endif
+
+static int __init fou6_init(void)
+{
+   int ret;
+
+   ret = ip6_tnl_encap_add_fou_ops();
+
+   return ret;
+}
+
+static void __exit fou6_fini(void)
+{
+   ip6_tnl_encap_del_fou_ops();
+}
+
+module_init(fou6_init);
+module_exit(fou6_fini);
+MODULE_AUTHOR("Tom Herbert ");
+MODULE_LICENSE("GPL");
-- 
2.8.0.rc2

[PATCH v6 net-next 11/14] ip6_gre: Add support for fou/gue encapsulation

2016-05-16 Thread Tom Herbert

Add netlink and setup for encapsulation

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_gre.c | 79 +++---
 1 file changed, 75 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 4541fa5..6fb1b89 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -729,7 +729,7 @@ static void ip6gre_tnl_link_config(struct ip6_tnl *t, int set_mtu)
 
t->tun_hlen = gre_calc_hlen(t->parms.o_flags);
 
-   t->hlen = t->tun_hlen;
+   t->hlen = t->encap_hlen + t->tun_hlen;
 
t_hlen = t->hlen + sizeof(struct ipv6hdr);
 
@@ -1022,9 +1022,7 @@ static int ip6gre_tunnel_init_common(struct net_device *dev)
}
 
tunnel->tun_hlen = gre_calc_hlen(tunnel->parms.o_flags);
-
-   tunnel->hlen = tunnel->tun_hlen;
-
+   tunnel->hlen = tunnel->tun_hlen + tunnel->encap_hlen;
t_hlen = tunnel->hlen + sizeof(struct ipv6hdr);
 
dev->hard_header_len = LL_MAX_HEADER + t_hlen;
@@ -1290,15 +1288,57 @@ static void ip6gre_tap_setup(struct net_device *dev)
dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 }
 
+static bool ip6gre_netlink_encap_parms(struct nlattr *data[],
+  struct ip_tunnel_encap *ipencap)
+{
+   bool ret = false;
+
+   memset(ipencap, 0, sizeof(*ipencap));
+
+   if (!data)
+   return ret;
+
+   if (data[IFLA_GRE_ENCAP_TYPE]) {
+   ret = true;
+   ipencap->type = nla_get_u16(data[IFLA_GRE_ENCAP_TYPE]);
+   }
+
+   if (data[IFLA_GRE_ENCAP_FLAGS]) {
+   ret = true;
+   ipencap->flags = nla_get_u16(data[IFLA_GRE_ENCAP_FLAGS]);
+   }
+
+   if (data[IFLA_GRE_ENCAP_SPORT]) {
+   ret = true;
+   ipencap->sport = nla_get_be16(data[IFLA_GRE_ENCAP_SPORT]);
+   }
+
+   if (data[IFLA_GRE_ENCAP_DPORT]) {
+   ret = true;
+   ipencap->dport = nla_get_be16(data[IFLA_GRE_ENCAP_DPORT]);
+   }
+
+   return ret;
+}
+
 static int ip6gre_newlink(struct net *src_net, struct net_device *dev,
struct nlattr *tb[], struct nlattr *data[])
 {
struct ip6_tnl *nt;
struct net *net = dev_net(dev);
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
+   struct ip_tunnel_encap ipencap;
int err;
 
nt = netdev_priv(dev);
+
+   if (ip6gre_netlink_encap_parms(data, &ipencap)) {
+   int err = ip6_tnl_encap_setup(nt, &ipencap);
+
+   if (err < 0)
+   return err;
+   }
+
ip6gre_netlink_parms(data, &nt->parms);
 
if (ip6gre_tunnel_find(net, &nt->parms, dev->type))
@@ -1345,10 +1385,18 @@ static int ip6gre_changelink(struct net_device *dev, struct nlattr *tb[],
struct net *net = nt->net;
struct ip6gre_net *ign = net_generic(net, ip6gre_net_id);
struct __ip6_tnl_parm p;
+   struct ip_tunnel_encap ipencap;
 
if (dev == ign->fb_tunnel_dev)
return -EINVAL;
 
+   if (ip6gre_netlink_encap_parms(data, &ipencap)) {
+   int err = ip6_tnl_encap_setup(nt, &ipencap);
+
+   if (err < 0)
+   return err;
+   }
+
ip6gre_netlink_parms(data, &p);
 
t = ip6gre_tunnel_locate(net, &p, 0);
@@ -1400,6 +1448,14 @@ static size_t ip6gre_get_size(const struct net_device *dev)
nla_total_size(4) +
/* IFLA_GRE_FLAGS */
nla_total_size(4) +
+   /* IFLA_GRE_ENCAP_TYPE */
+   nla_total_size(2) +
+   /* IFLA_GRE_ENCAP_FLAGS */
+   nla_total_size(2) +
+   /* IFLA_GRE_ENCAP_SPORT */
+   nla_total_size(2) +
+   /* IFLA_GRE_ENCAP_DPORT */
+   nla_total_size(2) +
0;
 }
 
@@ -1422,6 +1478,17 @@ static int ip6gre_fill_info(struct sk_buff *skb, const struct net_device *dev)
nla_put_be32(skb, IFLA_GRE_FLOWINFO, p->flowinfo) ||
nla_put_u32(skb, IFLA_GRE_FLAGS, p->flags))
goto nla_put_failure;
+
+   if (nla_put_u16(skb, IFLA_GRE_ENCAP_TYPE,
+   t->encap.type) ||
+   nla_put_be16(skb, IFLA_GRE_ENCAP_SPORT,
+t->encap.sport) ||
+   nla_put_be16(skb, IFLA_GRE_ENCAP_DPORT,
+t->encap.dport) ||
+   nla_put_u16(skb, IFLA_GRE_ENCAP_FLAGS,
+   t->encap.flags))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
@@ -1440,6 +1507,10 @@ static const struct nla_policy ip6gre_policy[IFLA_GRE_MAX + 1] = {
[IFLA_GRE_ENCAP_LIMIT] = { .type = NLA_U8 },
[IFLA_GRE_FLOWINFO]= { .type = NLA_U32 },
[IFLA_GRE_FLAGS]   = { .type = NLA_U32 },
+   [IFLA_GRE_ENCAP_TYPE]   = { .type = NLA_U16 },
+   [IFLA_GRE_ENCAP_FLAGS]  = { .type = NLA_U16 },
+   [IFLA_GRE_ENCAP_SPORT]  = { .type = NLA_U16 },
+

[PATCH v6 net-next 03/14] ipv6: Fix nexthdr for reinjection

2016-05-16 Thread Tom Herbert

In ip6_input_finish the nexthdr protocol is retrieved from the
next header offset that is returned in the cb of the skb.
This method does not work for UDP encapsulation that may not
even have a concept of a nexthdr field (e.g. FOU).

This patch checks for a final protocol (INET6_PROTO_FINAL) when a
protocol handler returns > 0. If the protocol is not final then
resubmission is performed on nhoff value. If the protocol is final
then the nexthdr is taken to be the return value.

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_input.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index f185cbc..d35dff2 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -236,6 +236,7 @@ resubmit:
nhoff = IP6CB(skb)->nhoff;
nexthdr = skb_network_header(skb)[nhoff];
 
+resubmit_final:
raw = raw6_local_deliver(skb, nexthdr);
ipprot = rcu_dereference(inet6_protos[nexthdr]);
if (ipprot) {
@@ -263,10 +264,21 @@ resubmit:
goto discard;
 
ret = ipprot->handler(skb);
-   if (ret > 0)
-   goto resubmit;
-   else if (ret == 0)
+   if (ret > 0) {
+   if (ipprot->flags & INET6_PROTO_FINAL) {
+   /* Not an extension header, most likely UDP
+* encapsulation. Use return value as nexthdr
+* protocol not nhoff (which presumably is
+* not set by handler).
+*/
+   nexthdr = ret;
+   goto resubmit_final;
+   } else {
+   goto resubmit;
+   }
+   } else if (ret == 0) {
__IP6_INC_STATS(net, idev, IPSTATS_MIB_INDELIVERS);
+   }
} else {
if (!raw) {
if (xfrm6_policy_check(NULL, XFRM_POLICY_IN, skb)) {
-- 
2.8.0.rc2
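
The dispatch rule described in patch 03/14 can be modeled outside the kernel as a minimal sketch. The names `PROTO_FINAL` and `next_protocol` are hypothetical stand-ins chosen for illustration, not kernel symbols, and the flag value is an assumption. The sketch only shows how the next protocol number is chosen once a handler has returned a positive value.

```c
#include <stdint.h>

/* Hypothetical stand-in for the INET6_PROTO_FINAL flag bit (value assumed). */
#define PROTO_FINAL 0x2u

/*
 * Choose the protocol number to dispatch next after a handler returned
 * ret > 0. 'hdr' models the network header and 'nhoff' the offset of the
 * next-header byte (the role IP6CB(skb)->nhoff plays in the patch).
 */
static uint8_t next_protocol(unsigned int flags, int ret,
                             const uint8_t *hdr, unsigned int nhoff)
{
    if (flags & PROTO_FINAL)
        return (uint8_t)ret;  /* final protocol: ret itself is the nexthdr */
    return hdr[nhoff];        /* extension header: re-read the byte at nhoff */
}
```

With a final protocol such as UDP carrying FOU, the handler's return value is used directly; for extension headers the byte at `nhoff` is read again, as before the patch.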

[PATCH v6 net-next 14/14] ip4ip6: Support for GSO/GRO

2016-05-16 Thread Tom Herbert

Signed-off-by: Tom Herbert 
---
 include/net/inet_common.h |  5 +
 net/ipv4/af_inet.c| 12 +++-
 net/ipv6/ip6_offload.c| 33 -
 net/ipv6/ip6_tunnel.c |  3 +++
 4 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/include/net/inet_common.h b/include/net/inet_common.h
index 109e3ee..5d68342 100644
--- a/include/net/inet_common.h
+++ b/include/net/inet_common.h
@@ -39,6 +39,11 @@ int inet_ctl_sock_create(struct sock **sk, unsigned short family,
 int inet_recv_error(struct sock *sk, struct msghdr *msg, int len,
int *addr_len);
 
+struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb);
+int inet_gro_complete(struct sk_buff *skb, int nhoff);
+struct sk_buff *inet_gso_segment(struct sk_buff *skb,
+netdev_features_t features);
+
 static inline void inet_ctl_sock_destroy(struct sock *sk)
 {
if (sk)
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 25040b1..377424e 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1192,8 +1192,8 @@ int inet_sk_rebuild_header(struct sock *sk)
 }
 EXPORT_SYMBOL(inet_sk_rebuild_header);
 
-static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
-   netdev_features_t features)
+struct sk_buff *inet_gso_segment(struct sk_buff *skb,
+netdev_features_t features)
 {
bool udpfrag = false, fixedid = false, encap;
struct sk_buff *segs = ERR_PTR(-EINVAL);
@@ -1280,9 +1280,9 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 out:
return segs;
 }
+EXPORT_SYMBOL(inet_gso_segment);
 
-static struct sk_buff **inet_gro_receive(struct sk_buff **head,
-struct sk_buff *skb)
+struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
 {
const struct net_offload *ops;
struct sk_buff **pp = NULL;
@@ -1398,6 +1398,7 @@ out:
 
return pp;
 }
+EXPORT_SYMBOL(inet_gro_receive);
 
 static struct sk_buff **ipip_gro_receive(struct sk_buff **head,
 struct sk_buff *skb)
@@ -1449,7 +1450,7 @@ int inet_recv_error(struct sock *sk, struct msghdr *msg, int len, int *addr_len)
return -EINVAL;
 }
 
-static int inet_gro_complete(struct sk_buff *skb, int nhoff)
+int inet_gro_complete(struct sk_buff *skb, int nhoff)
 {
__be16 newlen = htons(skb->len - nhoff);
struct iphdr *iph = (struct iphdr *)(skb->data + nhoff);
@@ -1479,6 +1480,7 @@ out_unlock:
 
return err;
 }
+EXPORT_SYMBOL(inet_gro_complete);
 
 static int ipip_gro_complete(struct sk_buff *skb, int nhoff)
 {
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 332d6a0..22e90e5 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -16,6 +16,7 @@
 
 #include 
 #include 
+#include 
 
 #include "ip6_offload.h"
 
@@ -268,6 +269,21 @@ static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head,
return ipv6_gro_receive(head, skb);
 }
 
+static struct sk_buff **ip4ip6_gro_receive(struct sk_buff **head,
+  struct sk_buff *skb)
+{
+   /* Common GRO receive for SIT and IP6IP6 */
+
+   if (NAPI_GRO_CB(skb)->encap_mark) {
+   NAPI_GRO_CB(skb)->flush = 1;
+   return NULL;
+   }
+
+   NAPI_GRO_CB(skb)->encap_mark = 1;
+
+   return inet_gro_receive(head, skb);
+}
+
 static int ipv6_gro_complete(struct sk_buff *skb, int nhoff)
 {
const struct net_offload *ops;
@@ -307,6 +323,13 @@ static int ip6ip6_gro_complete(struct sk_buff *skb, int nhoff)
return ipv6_gro_complete(skb, nhoff);
 }
 
+static int ip4ip6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+   skb->encapsulation = 1;
+   skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP6;
+   return inet_gro_complete(skb, nhoff);
+}
+
 static struct packet_offload ipv6_packet_offload __read_mostly = {
.type = cpu_to_be16(ETH_P_IPV6),
.callbacks = {
@@ -324,6 +347,14 @@ static const struct net_offload sit_offload = {
},
 };
 
+static const struct net_offload ip4ip6_offload = {
+   .callbacks = {
+   .gso_segment= inet_gso_segment,
+   .gro_receive= ip4ip6_gro_receive,
+   .gro_complete   = ip4ip6_gro_complete,
+   },
+};
+
 static const struct net_offload ip6ip6_offload = {
.callbacks = {
.gso_segment= ipv6_gso_segment,
@@ -331,7 +362,6 @@ static const struct net_offload ip6ip6_offload = {
.gro_complete   = ip6ip6_gro_complete,
},
 };
-
 static int __init ipv6_offload_init(void)
 {
 
@@ -344,6 +374,7 @@ static int __init ipv6_offload_init(void)
 
inet_add_offload(&sit_offload, IPPROTO_IPV6);
inet6_add_offload(&ip6ip6_offload, IPPROTO_IPV6);
+   inet6_add_offload(&ip4ip6_offload, IPPROTO_IPIP);
 
return

[PATCH v6 net-next 06/14] fou: Call setup_udp_tunnel_sock

2016-05-16 Thread Tom Herbert

Use helper function to set up UDP tunnel related information for a fou
socket.

Signed-off-by: Tom Herbert 
---
 net/ipv4/fou.c | 50 --
 1 file changed, 16 insertions(+), 34 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index eeec7d6..6cbc725 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -448,31 +448,13 @@ static void fou_release(struct fou *fou)
kfree_rcu(fou, rcu);
 }
 
-static int fou_encap_init(struct sock *sk, struct fou *fou, struct fou_cfg *cfg)
-{
-   udp_sk(sk)->encap_rcv = fou_udp_recv;
-   udp_sk(sk)->gro_receive = fou_gro_receive;
-   udp_sk(sk)->gro_complete = fou_gro_complete;
-   fou_from_sock(sk)->protocol = cfg->protocol;
-
-   return 0;
-}
-
-static int gue_encap_init(struct sock *sk, struct fou *fou, struct fou_cfg *cfg)
-{
-   udp_sk(sk)->encap_rcv = gue_udp_recv;
-   udp_sk(sk)->gro_receive = gue_gro_receive;
-   udp_sk(sk)->gro_complete = gue_gro_complete;
-
-   return 0;
-}
-
 static int fou_create(struct net *net, struct fou_cfg *cfg,
  struct socket **sockp)
 {
struct socket *sock = NULL;
struct fou *fou = NULL;
struct sock *sk;
+   struct udp_tunnel_sock_cfg tunnel_cfg;
int err;
 
/* Open UDP socket */
@@ -491,33 +473,33 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 
fou->flags = cfg->flags;
fou->port = cfg->udp_config.local_udp_port;
+   fou->type = cfg->type;
+   fou->sock = sock;
+
+   memset(&tunnel_cfg, 0, sizeof(tunnel_cfg));
+   tunnel_cfg.encap_type = 1;
+   tunnel_cfg.sk_user_data = fou;
+   tunnel_cfg.encap_destroy = NULL;
 
/* Initial for fou type */
switch (cfg->type) {
case FOU_ENCAP_DIRECT:
-   err = fou_encap_init(sk, fou, cfg);
-   if (err)
-   goto error;
+   tunnel_cfg.encap_rcv = fou_udp_recv;
+   tunnel_cfg.gro_receive = fou_gro_receive;
+   tunnel_cfg.gro_complete = fou_gro_complete;
+   fou->protocol = cfg->protocol;
break;
case FOU_ENCAP_GUE:
-   err = gue_encap_init(sk, fou, cfg);
-   if (err)
-   goto error;
+   tunnel_cfg.encap_rcv = gue_udp_recv;
+   tunnel_cfg.gro_receive = gue_gro_receive;
+   tunnel_cfg.gro_complete = gue_gro_complete;
break;
default:
err = -EINVAL;
goto error;
}
 
-   fou->type = cfg->type;
-
-   udp_sk(sk)->encap_type = 1;
-   udp_encap_enable();
-
-   sk->sk_user_data = fou;
-   fou->sock = sock;
-
-   inet_inc_convert_csum(sk);
+   setup_udp_tunnel_sock(net, sock, &tunnel_cfg);
 
sk->sk_allocation = GFP_ATOMIC;
 
-- 
2.8.0.rc2

[PATCH v6 net-next 07/14] fou: Split out {fou,gue}_build_header

2016-05-16 Thread Tom Herbert

Create __fou_build_header and __gue_build_header. These implement the
protocol generic parts of building the fou and gue header.
fou_build_header and gue_build_header implement the IPv4 specific
functions and call the __*_build_header functions.

Signed-off-by: Tom Herbert 
---
 include/net/fou.h |  8 
 net/ipv4/fou.c| 47 +--
 2 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/include/net/fou.h b/include/net/fou.h
index 19b8a0c..7d2fda2 100644
--- a/include/net/fou.h
+++ b/include/net/fou.h
@@ -11,9 +11,9 @@
 size_t fou_encap_hlen(struct ip_tunnel_encap *e);
 static size_t gue_encap_hlen(struct ip_tunnel_encap *e);
 
-int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-u8 *protocol, struct flowi4 *fl4);
-int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-u8 *protocol, struct flowi4 *fl4);
+int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+  u8 *protocol, __be16 *sport, int type);
+int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+  u8 *protocol, __be16 *sport, int type);
 
 #endif
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 6cbc725..f4f2ddd 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -780,6 +780,22 @@ static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
*protocol = IPPROTO_UDP;
 }
 
+int __fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+  u8 *protocol, __be16 *sport, int type)
+{
+   int err;
+
+   err = iptunnel_handle_offloads(skb, type);
+   if (err)
+   return err;
+
+   *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+   skb, 0, 0, false);
+
+   return 0;
+}
+EXPORT_SYMBOL(__fou_build_header);
+
 int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 u8 *protocol, struct flowi4 *fl4)
 {
@@ -788,26 +804,21 @@ int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
__be16 sport;
int err;
 
-   err = iptunnel_handle_offloads(skb, type);
+   err = __fou_build_header(skb, e, protocol, &sport, type);
if (err)
return err;
 
-   sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-  skb, 0, 0, false);
fou_build_udp(skb, e, fl4, protocol, sport);
 
return 0;
 }
 EXPORT_SYMBOL(fou_build_header);
 
-int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-u8 *protocol, struct flowi4 *fl4)
+int __gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+  u8 *protocol, __be16 *sport, int type)
 {
-   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
-  SKB_GSO_UDP_TUNNEL;
struct guehdr *guehdr;
size_t hdrlen, optlen = 0;
-   __be16 sport;
void *data;
bool need_priv = false;
int err;
@@ -826,8 +837,8 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
return err;
 
/* Get source port (based on flow hash) before skb_push */
-   sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-  skb, 0, 0, false);
+   *sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+   skb, 0, 0, false);
 
hdrlen = sizeof(struct guehdr) + optlen;
 
@@ -872,6 +883,22 @@ int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
 
}
 
+   return 0;
+}
+EXPORT_SYMBOL(__gue_build_header);
+
+int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+u8 *protocol, struct flowi4 *fl4)
+{
+   int type = e->flags & TUNNEL_ENCAP_FLAG_CSUM ? SKB_GSO_UDP_TUNNEL_CSUM :
+  SKB_GSO_UDP_TUNNEL;
+   __be16 sport;
+   int err;
+
+   err = __gue_build_header(skb, e, protocol, &sport, type);
+   if (err)
+   return err;
+
fou_build_udp(skb, e, fl4, protocol, sport);
 
return 0;
-- 
2.8.0.rc2

[PATCH v6 net-next 08/14] fou: Support IPv6 in fou

2016-05-16 Thread Tom Herbert

This patch adds receive path support for IPv6 with fou.

- Add address family to fou structure for open sockets. This supports
  AF_INET and AF_INET6. Lookups for fou ports are performed on both the
  port number and family.
- In fou and gue receive adjust tot_len in IPv4 header or payload_len
  based on address family.
- Allow AF_INET6 in FOU_ATTR_AF netlink attribute.
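
The family-dependent length fix-up from the second bullet can be sketched in user space as follows. The structs `v4_len` and `v6_len` and the function `adjust_outer_len` are hypothetical, reduced models that carry only the length field of each header; they are not the kernel's `struct iphdr` or `struct ipv6hdr`.

```c
#include <stdint.h>
#include <arpa/inet.h>   /* htons, ntohs */
#include <sys/socket.h>  /* AF_INET, AF_INET6 */

/* Hypothetical reduced header views: only the length fields are modeled. */
struct v4_len { uint16_t tot_len; };      /* network byte order */
struct v6_len { uint16_t payload_len; };  /* network byte order */

/* Subtract 'len' removed bytes from the length field matching 'family'. */
static void adjust_outer_len(int family, struct v4_len *h4,
                             struct v6_len *h6, uint16_t len)
{
    if (family == AF_INET)
        h4->tot_len = htons((uint16_t)(ntohs(h4->tot_len) - len));
    else
        h6->payload_len = htons((uint16_t)(ntohs(h6->payload_len) - len));
}
```

Only the header that matches the socket's family is touched, which is why the patch stores the family in the fou structure.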

Signed-off-by: Tom Herbert 
---
 net/ipv4/fou.c | 47 +++
 1 file changed, 35 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index f4f2ddd..5f9207c 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -21,6 +21,7 @@ struct fou {
u8 protocol;
u8 flags;
__be16 port;
+   u8 family;
u16 type;
struct list_head list;
struct rcu_head rcu;
@@ -47,14 +48,17 @@ static inline struct fou *fou_from_sock(struct sock *sk)
return sk->sk_user_data;
 }
 
-static int fou_recv_pull(struct sk_buff *skb, size_t len)
+static int fou_recv_pull(struct sk_buff *skb, struct fou *fou, size_t len)
 {
-   struct iphdr *iph = ip_hdr(skb);
-
/* Remove 'len' bytes from the packet (UDP header and
 * FOU header if present).
 */
-   iph->tot_len = htons(ntohs(iph->tot_len) - len);
+   if (fou->family == AF_INET)
+   ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
+   else
+   ipv6_hdr(skb)->payload_len =
+   htons(ntohs(ipv6_hdr(skb)->payload_len) - len);
+
__skb_pull(skb, len);
skb_postpull_rcsum(skb, udp_hdr(skb), len);
skb_reset_transport_header(skb);
@@ -68,7 +72,7 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
if (!fou)
return 1;
 
-   if (fou_recv_pull(skb, sizeof(struct udphdr)))
+   if (fou_recv_pull(skb, fou, sizeof(struct udphdr)))
goto drop;
 
return -fou->protocol;
@@ -141,7 +145,11 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 
hdrlen = sizeof(struct guehdr) + optlen;
 
-   ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
+   if (fou->family == AF_INET)
+   ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
+   else
+   ipv6_hdr(skb)->payload_len =
+   htons(ntohs(ipv6_hdr(skb)->payload_len) - len);
 
/* Pull csum through the guehdr now . This can be used if
 * there is a remote checksum offload.
@@ -426,7 +434,8 @@ static int fou_add_to_port_list(struct net *net, struct fou *fou)
 
mutex_lock(&fn->fou_lock);
list_for_each_entry(fout, &fn->fou_list, list) {
-   if (fou->port == fout->port) {
+   if (fou->port == fout->port &&
+   fou->family == fout->family) {
mutex_unlock(&fn->fou_lock);
return -EALREADY;
}
@@ -471,8 +480,9 @@ static int fou_create(struct net *net, struct fou_cfg *cfg,
 
sk = sock->sk;
 
-   fou->flags = cfg->flags;
fou->port = cfg->udp_config.local_udp_port;
+   fou->family = cfg->udp_config.family;
+   fou->flags = cfg->flags;
fou->type = cfg->type;
fou->sock = sock;
 
@@ -524,12 +534,13 @@ static int fou_destroy(struct net *net, struct fou_cfg *cfg)
 {
struct fou_net *fn = net_generic(net, fou_net_id);
__be16 port = cfg->udp_config.local_udp_port;
+   u8 family = cfg->udp_config.family;
int err = -EINVAL;
struct fou *fou;
 
mutex_lock(&fn->fou_lock);
list_for_each_entry(fou, &fn->fou_list, list) {
-   if (fou->port == port) {
+   if (fou->port == port && fou->family == family) {
fou_release(fou);
err = 0;
break;
@@ -567,8 +578,15 @@ static int parse_nl_config(struct genl_info *info,
if (info->attrs[FOU_ATTR_AF]) {
u8 family = nla_get_u8(info->attrs[FOU_ATTR_AF]);
 
-   if (family != AF_INET)
-   return -EINVAL;
+   switch (family) {
+   case AF_INET:
+   break;
+   case AF_INET6:
+   cfg->udp_config.ipv6_v6only = 1;
+   break;
+   default:
+   return -EAFNOSUPPORT;
+   }
 
cfg->udp_config.family = family;
}
@@ -659,6 +677,7 @@ static int fou_nl_cmd_get_port(struct sk_buff *skb, struct genl_info *info)
struct fou_cfg cfg;
struct fou *fout;
__be16 port;
+   u8 family;
int ret;
 
ret = parse_nl_config(info, &cfg);
@@ -668,6 +687,10 @@ static int fou_nl_cmd_get_port(struct sk_buff *skb, struct genl_info *info)
if (port == 0)
return -EINVAL;
 
+   family = cfg.udp_config.family;
+   if (family !=

[PATCH v6 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Tom Herbert

Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
for getting encap hlen, setting up encap on a tunnel, performing
encapsulation operation.
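
The lookup pattern behind `ip6_encap_hlen()` can be sketched in user space as a table of ops indexed by encapsulation type. All names below (`encap_table`, `encap_hlen_lookup`, `udp_only_ops`, and the two constants) are hypothetical stand-ins, and the sketch deliberately leaves out the RCU protection that the kernel version needs.

```c
#include <stddef.h>
#include <errno.h>

#define MAX_ENCAP_OPS 8   /* stand-in for MAX_IPTUN_ENCAP_OPS */
#define ENCAP_NONE    0   /* stand-in for TUNNEL_ENCAP_NONE */

struct encap { unsigned int type; };

struct encap_ops {
    size_t (*encap_hlen)(const struct encap *e);
};

/* Registry indexed by encapsulation type; NULL means "not registered". */
static const struct encap_ops *encap_table[MAX_ENCAP_OPS];

/* Return the extra header length for 'e', or a negative errno value. */
static int encap_hlen_lookup(const struct encap *e)
{
    const struct encap_ops *ops;

    if (e->type == ENCAP_NONE)
        return 0;
    if (e->type >= MAX_ENCAP_OPS)
        return -EINVAL;

    ops = encap_table[e->type];
    if (ops && ops->encap_hlen)
        return (int)ops->encap_hlen(e);
    return -EINVAL;
}

/* Example callback: a bare UDP header is 8 bytes long. */
static size_t udp_only_hlen(const struct encap *e)
{
    (void)e;
    return 8;
}

static const struct encap_ops udp_only_ops = { .encap_hlen = udp_only_hlen };
```

Keeping the table separate from the tunnel code is what lets an encapsulation module register itself at load time without the tunnel module depending on it.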

Signed-off-by: Tom Herbert 
---
 include/net/ip6_tunnel.h  | 58 
 net/ipv4/ip_tunnel_core.c |  5 +++
 net/ipv6/ip6_tunnel.c | 85 ++-
 3 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index fb9e015..d325c81 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -52,10 +52,68 @@ struct ip6_tnl {
__u32 o_seqno;  /* The last output seqno */
int hlen;   /* tun_hlen + encap_hlen */
int tun_hlen;   /* Precalculated header length */
+   int encap_hlen; /* Encap header length (FOU,GUE) */
+   struct ip_tunnel_encap encap;
int mlink;
+};
 
+struct ip6_tnl_encap_ops {
+   size_t (*encap_hlen)(struct ip_tunnel_encap *e);
+   int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
+   u8 *protocol, struct flowi6 *fl6);
 };
 
+extern const struct ip6_tnl_encap_ops __rcu *
+   ip6tun_encaps[MAX_IPTUN_ENCAP_OPS];
+
+int ip6_tnl_encap_add_ops(const struct ip6_tnl_encap_ops *ops,
+ unsigned int num);
+int ip6_tnl_encap_del_ops(const struct ip6_tnl_encap_ops *ops,
+ unsigned int num);
+int ip6_tnl_encap_setup(struct ip6_tnl *t,
+   struct ip_tunnel_encap *ipencap);
+
+static inline int ip6_encap_hlen(struct ip_tunnel_encap *e)
+{
+   const struct ip6_tnl_encap_ops *ops;
+   int hlen = -EINVAL;
+
+   if (e->type == TUNNEL_ENCAP_NONE)
+   return 0;
+
+   if (e->type >= MAX_IPTUN_ENCAP_OPS)
+   return -EINVAL;
+
+   rcu_read_lock();
+   ops = rcu_dereference(ip6tun_encaps[e->type]);
+   if (likely(ops && ops->encap_hlen))
+   hlen = ops->encap_hlen(e);
+   rcu_read_unlock();
+
+   return hlen;
+}
+
+static inline int ip6_tnl_encap(struct sk_buff *skb, struct ip6_tnl *t,
+   u8 *protocol, struct flowi6 *fl6)
+{
+   const struct ip6_tnl_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (t->encap.type == TUNNEL_ENCAP_NONE)
+   return 0;
+
+   if (t->encap.type >= MAX_IPTUN_ENCAP_OPS)
+   return -EINVAL;
+
+   rcu_read_lock();
+   ops = rcu_dereference(ip6tun_encaps[t->encap.type]);
+   if (likely(ops && ops->build_header))
+   ret = ops->build_header(skb, &t->encap, protocol, fl6);
+   rcu_read_unlock();
+
+   return ret;
+}
+
 /* Tunnel encapsulation limit destination sub-option */
 
 struct ipv6_tlv_tnl_enc_lim {
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index cc66a20..afd6b59 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -51,6 +52,10 @@ const struct ip_tunnel_encap_ops __rcu *
iptun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
 EXPORT_SYMBOL(iptun_encaps);
 
+const struct ip6_tnl_encap_ops __rcu *
+   ip6tun_encaps[MAX_IPTUN_ENCAP_OPS] __read_mostly;
+EXPORT_SYMBOL(ip6tun_encaps);
+
 void iptunnel_xmit(struct sock *sk, struct rtable *rt, struct sk_buff *skb,
   __be32 src, __be32 dst, __u8 proto,
   __u8 tos, __u8 ttl, __be16 df, bool xnet)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index e79330f..ec53612 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1125,10 +1125,14 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
}
 
max_headroom = LL_RESERVED_SPACE(dst->dev) + sizeof(struct ipv6hdr)
-   + dst->header_len;
+   + dst->header_len + t->hlen;
if (max_headroom > dev->needed_headroom)
dev->needed_headroom = max_headroom;
 
+   err = ip6_tnl_encap(skb, t, &proto, fl6);
+   if (err)
+   return err;
+
skb_push(skb, sizeof(struct ipv6hdr));
skb_reset_network_header(skb);
ipv6h = ipv6_hdr(skb);
@@ -1280,6 +1284,7 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
struct net_device *dev = t->dev;
struct __ip6_tnl_parm *p = &t->parms;
struct flowi6 *fl6 = &t->fl.u.ip6;
+   int t_hlen;
 
memcpy(dev->dev_addr, &p->laddr, sizeof(struct in6_addr));
memcpy(dev->broadcast, &p->raddr, sizeof(struct in6_addr));
@@ -1303,6 +1308,10 @@ static void ip6_tnl_link_config(struct ip6_tnl *t)
else
dev->flags &= ~IFF_POINTOPOINT;
 
+   t->tun_hlen = 0;
+   t->hlen = t->encap_hlen + t->tun_hlen;
+   t_hlen = t->hlen + sizeof(struct ipv6hdr);
+
if (p->flags & IP6_TNL_F_CAP_XMIT) {
int strict =

[PATCH v6 net-next 12/14] ip6_tunnel: Add support for fou/gue encapsulation

2016-05-16 Thread Tom Herbert

Add netlink and setup for encapsulation

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_tunnel.c | 72 +++
 1 file changed, 72 insertions(+)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ec53612..8076c7a 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1796,13 +1796,55 @@ static void ip6_tnl_netlink_parms(struct nlattr *data[],
parms->proto = nla_get_u8(data[IFLA_IPTUN_PROTO]);
 }
 
+static bool ip6_tnl_netlink_encap_parms(struct nlattr *data[],
+   struct ip_tunnel_encap *ipencap)
+{
+   bool ret = false;
+
+   memset(ipencap, 0, sizeof(*ipencap));
+
+   if (!data)
+   return ret;
+
+   if (data[IFLA_IPTUN_ENCAP_TYPE]) {
+   ret = true;
+   ipencap->type = nla_get_u16(data[IFLA_IPTUN_ENCAP_TYPE]);
+   }
+
+   if (data[IFLA_IPTUN_ENCAP_FLAGS]) {
+   ret = true;
+   ipencap->flags = nla_get_u16(data[IFLA_IPTUN_ENCAP_FLAGS]);
+   }
+
+   if (data[IFLA_IPTUN_ENCAP_SPORT]) {
+   ret = true;
+   ipencap->sport = nla_get_be16(data[IFLA_IPTUN_ENCAP_SPORT]);
+   }
+
+   if (data[IFLA_IPTUN_ENCAP_DPORT]) {
+   ret = true;
+   ipencap->dport = nla_get_be16(data[IFLA_IPTUN_ENCAP_DPORT]);
+   }
+
+   return ret;
+}
+
 static int ip6_tnl_newlink(struct net *src_net, struct net_device *dev,
   struct nlattr *tb[], struct nlattr *data[])
 {
struct net *net = dev_net(dev);
struct ip6_tnl *nt, *t;
+   struct ip_tunnel_encap ipencap;
 
nt = netdev_priv(dev);
+
+   if (ip6_tnl_netlink_encap_parms(data, &ipencap)) {
+   int err = ip6_tnl_encap_setup(nt, &ipencap);
+
+   if (err < 0)
+   return err;
+   }
+
ip6_tnl_netlink_parms(data, &nt->parms);
 
t = ip6_tnl_locate(net, &nt->parms, 0);
@@ -1819,10 +1861,17 @@ static int ip6_tnl_changelink(struct net_device *dev, struct nlattr *tb[],
struct __ip6_tnl_parm p;
struct net *net = t->net;
struct ip6_tnl_net *ip6n = net_generic(net, ip6_tnl_net_id);
+   struct ip_tunnel_encap ipencap;
 
if (dev == ip6n->fb_tnl_dev)
return -EINVAL;
 
+   if (ip6_tnl_netlink_encap_parms(data, &ipencap)) {
+   int err = ip6_tnl_encap_setup(t, &ipencap);
+
+   if (err < 0)
+   return err;
+   }
ip6_tnl_netlink_parms(data, &p);
 
t = ip6_tnl_locate(net, &p, 0);
@@ -1863,6 +1912,14 @@ static size_t ip6_tnl_get_size(const struct net_device *dev)
nla_total_size(4) +
/* IFLA_IPTUN_PROTO */
nla_total_size(1) +
+   /* IFLA_IPTUN_ENCAP_TYPE */
+   nla_total_size(2) +
+   /* IFLA_IPTUN_ENCAP_FLAGS */
+   nla_total_size(2) +
+   /* IFLA_IPTUN_ENCAP_SPORT */
+   nla_total_size(2) +
+   /* IFLA_IPTUN_ENCAP_DPORT */
+   nla_total_size(2) +
0;
 }
 
@@ -1880,6 +1937,17 @@ static int ip6_tnl_fill_info(struct sk_buff *skb, const struct net_device *dev)
nla_put_u32(skb, IFLA_IPTUN_FLAGS, parm->flags) ||
nla_put_u8(skb, IFLA_IPTUN_PROTO, parm->proto))
goto nla_put_failure;
+
+   if (nla_put_u16(skb, IFLA_IPTUN_ENCAP_TYPE,
+   tunnel->encap.type) ||
+   nla_put_be16(skb, IFLA_IPTUN_ENCAP_SPORT,
+tunnel->encap.sport) ||
+   nla_put_be16(skb, IFLA_IPTUN_ENCAP_DPORT,
+tunnel->encap.dport) ||
+   nla_put_u16(skb, IFLA_IPTUN_ENCAP_FLAGS,
+   tunnel->encap.flags))
+   goto nla_put_failure;
+
return 0;
 
 nla_put_failure:
@@ -1903,6 +1971,10 @@ static const struct nla_policy ip6_tnl_policy[IFLA_IPTUN_MAX + 1] = {
[IFLA_IPTUN_FLOWINFO]   = { .type = NLA_U32 },
[IFLA_IPTUN_FLAGS]  = { .type = NLA_U32 },
[IFLA_IPTUN_PROTO]  = { .type = NLA_U8 },
+   [IFLA_IPTUN_ENCAP_TYPE] = { .type = NLA_U16 },
+   [IFLA_IPTUN_ENCAP_FLAGS]= { .type = NLA_U16 },
+   [IFLA_IPTUN_ENCAP_SPORT]= { .type = NLA_U16 },
+   [IFLA_IPTUN_ENCAP_DPORT]= { .type = NLA_U16 },
 };
 
 static struct rtnl_link_ops ip6_link_ops __read_mostly = {
-- 
2.8.0.rc2
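
The attribute parsing added in patch 12/14 follows a simple pattern: zero the result, copy every attribute that is present, and report whether any was seen so the caller knows if encapsulation setup is needed. A minimal user-space sketch of that pattern follows; the enum, `struct encap_cfg`, and `parse_encap_attrs` are hypothetical, and plain pointers to `uint16_t` stand in for netlink attributes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { ATTR_TYPE, ATTR_FLAGS, ATTR_SPORT, ATTR_DPORT, ATTR_MAX };

struct encap_cfg { uint16_t type, flags, sport, dport; };

/*
 * 'attrs' holds optional values; a NULL slot means the attribute was not
 * supplied. Returns true if at least one attribute was present.
 */
static bool parse_encap_attrs(const uint16_t **attrs, struct encap_cfg *cfg)
{
    bool seen = false;

    memset(cfg, 0, sizeof(*cfg));
    if (!attrs)
        return false;

    if (attrs[ATTR_TYPE])  { cfg->type  = *attrs[ATTR_TYPE];  seen = true; }
    if (attrs[ATTR_FLAGS]) { cfg->flags = *attrs[ATTR_FLAGS]; seen = true; }
    if (attrs[ATTR_SPORT]) { cfg->sport = *attrs[ATTR_SPORT]; seen = true; }
    if (attrs[ATTR_DPORT]) { cfg->dport = *attrs[ATTR_DPORT]; seen = true; }

    return seen;
}
```

Returning a flag rather than an error lets callers skip the setup step entirely when no encapsulation attribute was sent.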

[PATCH v6 net-next 13/14] ip6ip6: Support for GSO/GRO

2016-05-16 Thread Tom Herbert

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_offload.c | 24 +---
 net/ipv6/ip6_tunnel.c  |  3 +++
 2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 787e55f..332d6a0 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -253,9 +253,11 @@ out:
return pp;
 }
 
-static struct sk_buff **sit_gro_receive(struct sk_buff **head,
-   struct sk_buff *skb)
+static struct sk_buff **sit_ip6ip6_gro_receive(struct sk_buff **head,
+  struct sk_buff *skb)
 {
+   /* Common GRO receive for SIT and IP6IP6 */
+
if (NAPI_GRO_CB(skb)->encap_mark) {
NAPI_GRO_CB(skb)->flush = 1;
return NULL;
@@ -298,6 +300,13 @@ static int sit_gro_complete(struct sk_buff *skb, int nhoff)
return ipv6_gro_complete(skb, nhoff);
 }
 
+static int ip6ip6_gro_complete(struct sk_buff *skb, int nhoff)
+{
+   skb->encapsulation = 1;
+   skb_shinfo(skb)->gso_type |= SKB_GSO_IPXIP6;
+   return ipv6_gro_complete(skb, nhoff);
+}
+
 static struct packet_offload ipv6_packet_offload __read_mostly = {
.type = cpu_to_be16(ETH_P_IPV6),
.callbacks = {
@@ -310,11 +319,19 @@ static struct packet_offload ipv6_packet_offload __read_mostly = {
 static const struct net_offload sit_offload = {
.callbacks = {
.gso_segment= ipv6_gso_segment,
-   .gro_receive= sit_gro_receive,
+   .gro_receive= sit_ip6ip6_gro_receive,
.gro_complete   = sit_gro_complete,
},
 };
 
+static const struct net_offload ip6ip6_offload = {
+   .callbacks = {
+   .gso_segment= ipv6_gso_segment,
+   .gro_receive= sit_ip6ip6_gro_receive,
+   .gro_complete   = ip6ip6_gro_complete,
+   },
+};
+
 static int __init ipv6_offload_init(void)
 {
 
@@ -326,6 +343,7 @@ static int __init ipv6_offload_init(void)
dev_add_offload(&ipv6_packet_offload);
 
inet_add_offload(&sit_offload, IPPROTO_IPV6);
+   inet6_add_offload(&ip6ip6_offload, IPPROTO_IPV6);
 
return 0;
 }
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 8076c7a..d205f17 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1238,6 +1238,9 @@ ip6ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev)
if (t->parms.flags & IP6_TNL_F_USE_ORIG_FWMARK)
fl6.flowi6_mark = skb->mark;
 
+   if (iptunnel_handle_offloads(skb, SKB_GSO_IPXIP6))
+   return -1;
+
err = ip6_tnl_xmit(skb, dev, dsfield, &fl6, encap_limit, &mtu,
   IPPROTO_IPV6);
if (err != 0) {
-- 
2.8.0.rc2
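
The shared GRO receive helper in patch 13/14 allows only one level of IP-in-IP style encapsulation per packet. A minimal sketch of that guard, with a hypothetical `struct gro_state` standing in for the two `NAPI_GRO_CB()` fields involved:

```c
#include <stdbool.h>

/* Hypothetical per-packet GRO state; only the two bits used here. */
struct gro_state {
    bool encap_mark;  /* an encapsulation layer was already seen */
    bool flush;       /* packet must not be merged */
};

/*
 * Returns true when the inner header may be handed to the next GRO
 * receive stage, false when a second encapsulation level was detected
 * and the packet has been marked for flushing instead.
 */
static bool gro_enter_tunnel(struct gro_state *st)
{
    if (st->encap_mark) {
        st->flush = true;
        return false;
    }
    st->encap_mark = true;
    return true;
}
```

The first call marks the packet and lets processing continue; a second call on the same packet sets the flush flag, which corresponds to the helper returning NULL in the patch.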

[PATCH v6 net-next 05/14] net: Cleanup encap items in ip_tunnels.h

2016-05-16 Thread Tom Herbert

Consolidate all the ip_tunnel_encap definitions in one spot in the
header file. Also, move ip_encap_hlen and ip_tunnel_encap from
ip_tunnel.c to ip_tunnels.h so they can be called without a dependency
on ip_tunnel module. Similarly, move iptun_encaps to ip_tunnel_core.c.

Signed-off-by: Tom Herbert 
---
 include/net/ip_tunnels.h  | 76 ---
 net/ipv4/ip_tunnel.c  | 45 
 net/ipv4/ip_tunnel_core.c |  4 +++
 3 files changed, 62 insertions(+), 63 deletions(-)

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index d916b43..dbf 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -171,22 +171,6 @@ struct ip_tunnel_net {
struct ip_tunnel __rcu *collect_md_tun;
 };
 
-struct ip_tunnel_encap_ops {
-   size_t (*encap_hlen)(struct ip_tunnel_encap *e);
-   int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
-   u8 *protocol, struct flowi4 *fl4);
-};
-
-#define MAX_IPTUN_ENCAP_OPS 8
-
-extern const struct ip_tunnel_encap_ops __rcu *
-   iptun_encaps[MAX_IPTUN_ENCAP_OPS];
-
-int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
-   unsigned int num);
-int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
-   unsigned int num);
-
 static inline void ip_tunnel_key_init(struct ip_tunnel_key *key,
  __be32 saddr, __be32 daddr,
  u8 tos, u8 ttl, __be32 label,
@@ -251,8 +235,6 @@ void ip_tunnel_delete_net(struct ip_tunnel_net *itn, struct 
rtnl_link_ops *ops);
 void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
const struct iphdr *tnl_params, const u8 protocol);
 int ip_tunnel_ioctl(struct net_device *dev, struct ip_tunnel_parm *p, int cmd);
-int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
-   u8 *protocol, struct flowi4 *fl4);
 int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict);
 int ip_tunnel_change_mtu(struct net_device *dev, int new_mtu);
 
@@ -271,9 +253,67 @@ int ip_tunnel_changelink(struct net_device *dev, struct 
nlattr *tb[],
 int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
  struct ip_tunnel_parm *p);
 void ip_tunnel_setup(struct net_device *dev, int net_id);
+
+struct ip_tunnel_encap_ops {
+   size_t (*encap_hlen)(struct ip_tunnel_encap *e);
+   int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
+   u8 *protocol, struct flowi4 *fl4);
+};
+
+#define MAX_IPTUN_ENCAP_OPS 8
+
+extern const struct ip_tunnel_encap_ops __rcu *
+   iptun_encaps[MAX_IPTUN_ENCAP_OPS];
+
+int ip_tunnel_encap_add_ops(const struct ip_tunnel_encap_ops *op,
+   unsigned int num);
+int ip_tunnel_encap_del_ops(const struct ip_tunnel_encap_ops *op,
+   unsigned int num);
+
 int ip_tunnel_encap_setup(struct ip_tunnel *t,
  struct ip_tunnel_encap *ipencap);
 
+static inline int ip_encap_hlen(struct ip_tunnel_encap *e)
+{
+   const struct ip_tunnel_encap_ops *ops;
+   int hlen = -EINVAL;
+
+   if (e->type == TUNNEL_ENCAP_NONE)
+   return 0;
+
+   if (e->type >= MAX_IPTUN_ENCAP_OPS)
+   return -EINVAL;
+
+   rcu_read_lock();
+   ops = rcu_dereference(iptun_encaps[e->type]);
+   if (likely(ops && ops->encap_hlen))
+   hlen = ops->encap_hlen(e);
+   rcu_read_unlock();
+
+   return hlen;
+}
+
+static inline int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
+ u8 *protocol, struct flowi4 *fl4)
+{
+   const struct ip_tunnel_encap_ops *ops;
+   int ret = -EINVAL;
+
+   if (t->encap.type == TUNNEL_ENCAP_NONE)
+   return 0;
+
+   if (t->encap.type >= MAX_IPTUN_ENCAP_OPS)
+   return -EINVAL;
+
+   rcu_read_lock();
+   ops = rcu_dereference(iptun_encaps[t->encap.type]);
+   if (likely(ops && ops->build_header))
+   ret = ops->build_header(skb, &t->encap, protocol, fl4);
+   rcu_read_unlock();
+
+   return ret;
+}
+
 /* Extract dsfield from inner protocol */
 static inline u8 ip_tunnel_get_dsfield(const struct iphdr *iph,
   const struct sk_buff *skb)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index a69ed94..d8f5e0a 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -443,29 +443,6 @@ drop:
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_rcv);
 
-static int ip_encap_hlen(struct ip_tunnel_encap *e)
-{
-   const struct ip_tunnel_encap_ops *ops;
-   int hlen = -EINVAL;
-
-   if (e->type == TUNNEL_ENCAP_NONE)
-   return 0;
-
-   if (e->type >= MAX_IPTUN_ENCAP_OPS)
-   return -EINVAL;
-
-

[PATCH v6 net-next 02/14] net: define gso types for IPx over IPv4 and IPv6

2016-05-16 Thread Tom Herbert

This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to describe an IP-in-IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.
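
For illustration only: a small sketch of how an outer-protocol flag combined with an inner TCP flag recovers the distinction that the removed IPIP and SIT types used to carry. The bit positions and demo_* names are hypothetical; the kernel's real values live in include/linux/skbuff.h and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum demo_gso_type {
	DEMO_GSO_TCPV4  = 1 << 0,	/* inner segment is TCP over IPv4 */
	DEMO_GSO_TCPV6  = 1 << 1,	/* inner segment is TCP over IPv6 */
	DEMO_GSO_IPXIP4 = 1 << 2,	/* any IP over an outer IPv4 header */
	DEMO_GSO_IPXIP6 = 1 << 3,	/* any IP over an outer IPv6 header */
};

/* IPv4 in IPv4: what a dedicated IPIP type would have signalled. */
static bool demo_is_v4_in_v4(unsigned int gso_type)
{
	return (gso_type & DEMO_GSO_IPXIP4) && (gso_type & DEMO_GSO_TCPV4);
}

/* IPv6 in IPv4: what a dedicated SIT type would have signalled. */
static bool demo_is_v6_in_v4(unsigned int gso_type)
{
	return (gso_type & DEMO_GSO_IPXIP4) && (gso_type & DEMO_GSO_TCPV6);
}
```

The outer flag names only the encapsulating protocol, so two flags cover the four inner/outer combinations that would otherwise each need a dedicated type.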

Signed-off-by: Tom Herbert 
---
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  5 ++---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |  5 ++---
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  3 +--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  3 +--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |  3 +--
 drivers/net/ethernet/intel/igb/igb_main.c |  3 +--
 drivers/net/ethernet/intel/igbvf/netdev.c |  3 +--
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 +--
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  3 +--
 include/linux/netdev_features.h   | 12 ++--
 include/linux/netdevice.h |  4 ++--
 include/linux/skbuff.h|  4 ++--
 net/core/ethtool.c|  4 ++--
 net/ipv4/af_inet.c|  2 +-
 net/ipv4/ipip.c   |  2 +-
 net/ipv6/ip6_offload.c|  4 ++--
 net/ipv6/sit.c|  4 ++--
 net/netfilter/ipvs/ip_vs_xmit.c   | 17 +++--
 19 files changed, 37 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c 
b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index d465bd7..0a5b770 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -13259,12 +13259,11 @@ static int bnx2x_init_dev(struct bnx2x *bp, struct 
pci_dev *pdev,
NETIF_F_RXHASH | NETIF_F_HW_VLAN_CTAG_TX;
if (!chip_is_e1x) {
dev->hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |
-   NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT;
+   NETIF_F_GSO_IPXIP4;
dev->hw_enc_features =
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_SG |
NETIF_F_TSO | NETIF_F_TSO_ECN | NETIF_F_TSO6 |
-   NETIF_F_GSO_IPIP |
-   NETIF_F_GSO_SIT |
+   NETIF_F_GSO_IPXIP4 |
NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL;
}
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 5a0dca3..72a2eff 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6311,7 +6311,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM | NETIF_F_SG |
   NETIF_F_TSO | NETIF_F_TSO6 |
   NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_GRE |
-  NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT |
+  NETIF_F_GSO_IPXIP4 |
   NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_GRE_CSUM |
   NETIF_F_GSO_PARTIAL | NETIF_F_RXHASH |
   NETIF_F_RXCSUM | NETIF_F_LRO | NETIF_F_GRO;
@@ -6321,8 +6321,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
NETIF_F_TSO | NETIF_F_TSO6 |
NETIF_F_GSO_UDP_TUNNEL | NETIF_F_GSO_GRE |
NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_GRE_CSUM |
-   NETIF_F_GSO_IPIP | NETIF_F_GSO_SIT |
-   NETIF_F_GSO_PARTIAL;
+   NETIF_F_GSO_IPXIP4 | NETIF_F_GSO_PARTIAL;
dev->gso_partial_features = NETIF_F_GSO_UDP_TUNNEL_CSUM |
NETIF_F_GSO_GRE_CSUM;
dev->vlan_features = dev->hw_features | NETIF_F_HIGHDMA;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 1cd0ebf..242a1ff 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9083,8 +9083,7 @@ static int i40e_config_netdev(struct i40e_vsi *vsi)
   NETIF_F_TSO6 |
   NETIF_F_GSO_GRE  |
   NETIF_F_GSO_GRE_CSUM |
-  NETIF_F_GSO_IPIP |
-  NETIF_F_GSO_SIT  |
+

[PATCH v6 net-next 04/14] ipv6: Change "final" protocol processing for encapsulation

2016-05-16 Thread Tom Herbert

When performing foo-over-UDP, UDP packets are processed by the
encapsulation handler which returns another protocol to process.
This may result in processing two (or more) protocols in the
loop that are marked as INET6_PROTO_FINAL. The actions taken
for hitting a final protocol, in particular the skb_postpull_rcsum,
can only be performed once.

This patch adds a check of whether a final protocol has been seen. The
rules are:
  - If the final protocol has not been seen any protocol is processed
(final and non-final). In the case of a final protocol, the final
actions are taken (like the skb_postpull_rcsum)
  - If a final protocol has been seen (e.g. an encapsulating UDP
header) then no further non-final protocols are allowed
(e.g. extension headers). For more final protocols the
final actions are not taken (e.g. skb_postpull_rcsum).
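
For illustration only: the two rules above written as a small decision function. The enum and function names are hypothetical; in the patch the same logic is expressed inline with the have_final local and a goto discard.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_action {
	DEMO_PROCESS_WITH_FINAL_ACTIONS,	/* first final protocol seen */
	DEMO_PROCESS,				/* handled, no final actions */
	DEMO_DISCARD,				/* non-final after a final */
};

/* Decide what to do with the next protocol handler in the chain.
 * *have_final is updated in place, like the local flag in the patch.
 */
static enum demo_action demo_next_proto(bool proto_is_final, bool *have_final)
{
	if (*have_final)
		return proto_is_final ? DEMO_PROCESS : DEMO_DISCARD;

	if (proto_is_final) {
		*have_final = true;
		return DEMO_PROCESS_WITH_FINAL_ACTIONS;
	}
	return DEMO_PROCESS;
}
```

Walking an extension header, then an encapsulating UDP header, then an inner final protocol through this function yields DEMO_PROCESS, DEMO_PROCESS_WITH_FINAL_ACTIONS and DEMO_PROCESS in turn; a non-final protocol after that point is discarded.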

Signed-off-by: Tom Herbert 
---
 net/ipv6/ip6_input.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index d35dff2..94611e4 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -223,6 +223,7 @@ static int ip6_input_finish(struct net *net, struct sock 
*sk, struct sk_buff *sk
unsigned int nhoff;
int nexthdr;
bool raw;
+   bool have_final = false;
 
/*
 *  Parse extension headers
@@ -242,9 +243,21 @@ resubmit_final:
if (ipprot) {
int ret;
 
-   if (ipprot->flags & INET6_PROTO_FINAL) {
+   if (have_final) {
+   if (!(ipprot->flags & INET6_PROTO_FINAL)) {
+   /* Once we've seen a final protocol don't
+* allow encapsulation on any non-final
+* ones. This allows foo in UDP encapsulation
+* to work.
+*/
+   goto discard;
+   }
+   } else if (ipprot->flags & INET6_PROTO_FINAL) {
const struct ipv6hdr *hdr;
 
+   /* Only do this once for first final protocol */
+   have_final = true;
+
/* Free reference early: we don't need it any more,
   and it may hold ip_conntrack module loaded
   indefinitely. */
-- 
2.8.0.rc2

[PATCH v6 net-next 00/14] ipv6: Enable GUEoIPv6 and more fixes for v6 tunneling

2016-05-16 Thread Tom Herbert

This patch set:
  - Fixes GRE6 to process translate flags correctly from configuration
  - Adds support for GSO and GRO for ip6ip6 and ip4ip6
  - Adds support for FOU and GUE in IPv6
  - Supports GRE, ip6ip6 and ip4ip6 over FOU/GUE
  - Fixes ip6_input to deal with UDP encapsulations
  - Includes some other minor fixes

v2:
  - Removed a check of GSO types in MPLS
  - Define GSO type SKB_GSO_IPXIP6 and SKB_GSO_IPXIP4 (based on input
from Alexander)
  - Don't define GSO types specifically for IP6IP6 and IP4IP6, above
fix makes that unnecessary
  - Don't bother clearing encapsulation flag in UDP tunnel segment
(another item suggested by Alexander).

v3:
  - Address some minor comments from Alexander

v4:
  - Rebase on changes to fix IP TX tunnels
  - Fix MTU issues in ip4ip6, ip6ip6
  - Add test data for above

v5:
  - Address feedback from Shmulik Ladkani regarding extension header
code that does not return next header but instead relies
on returning value via nhoff. Solution here is to fix EH
processing to return nexthdr value.
  - Refactored IPv4 encaps so that we won't need to create
an ip6_tunnel_core.c when adding encap support for IPv6.

v6:
  - Fix build issues with regard to new GSO constants
  - Fix MTU calculation issues in ip6_tunnel.c pointed out by Alex
  - Add encap_hlen into headroom for GREv6 to work with FOU/GUE

Tested:
   Tested a variety of cases, but not the full matrix (which is quite
   large now). Most of the obvious cases (e.g. GRE) work fine. There are
   probably still some issues with GSO/GRO being effective in all cases.

- IPv4/GRE/GUE/IPv6 with RCO
  1 TCP_STREAM
6616 Mbps
  200 TCP_RR
1244043 tps
141/243/446 90/95/99% latencies
86.61% CPU utilization

- IPv6/GRE/GUE/IPv6 with RCO
  1 TCP_STREAM
6940 Mbps
  200 TCP_RR
1270903 tps
138/236/440 90/95/99% latencies
87.51% CPU utilization

 - IP6IP6
  1 TCP_STREAM
2576 Mbps
  200 TCP_RR
498981 tps
388/498/631 90/95/99% latencies
19.75% CPU utilization (1 CPU saturated)

 - IP6IP6/GUE with RCO
  1 TCP_STREAM
2031 Mbps
  200 TCP_RR
1233818 tps
143/244/451 90/95/99% latencies
87.57% CPU utilization

 - IP4IP6
  1 TCP_STREAM
2371 Mbps
  200 TCP_RR
763774 tps
250/318/466 90/95/99% latencies
35.25% CPU utilization (1 CPU saturated)

 - IP4IP6/GUE with RCO
  1 TCP_STREAM
2054 Mbps
  200 TCP_RR
1196385 tps
148/251/460 90/95/99% latencies
87.56% CPU utilization

 - GRE with keyid
  200 TCP_RR
744173 tps
258/332/461 90/95/99% latencies
34.59% CPU utilization (1 CPU saturated)
  

Tom Herbert (14):
  gso: Remove arbitrary checks for unsupported GSO
  net: define gso types for IPx over IPv4 and IPv6
  ipv6: Fix nexthdr for reinjection
  ipv6: Change "final" protocol processing for encapsulation
  net: Cleanup encap items in ip_tunnels.h
  fou: Call setup_udp_tunnel_sock
  fou: Split out {fou,gue}_build_header
  fou: Support IPv6 in fou
  ip6_tun: Add infrastructure for doing encapsulation
  fou: Add encap ops for IPv6 tunnels
  ip6_gre: Add support for fou/gue encapsulation
  ip6_tunnel: Add support for fou/gue encapsulation
  ip6ip6: Support for GSO/GRO
  ip4ip6: Support for GSO/GRO

 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |   5 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   5 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   |   3 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   3 +-
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |   3 +-
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |   3 +-
 drivers/net/ethernet/intel/igb/igb_main.c |   3 +-
 drivers/net/ethernet/intel/igbvf/netdev.c |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   3 +-
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   3 +-
 include/linux/netdev_features.h   |  12 +-
 include/linux/netdevice.h |   4 +-
 include/linux/skbuff.h|   4 +-
 include/net/fou.h |  10 +-
 include/net/inet_common.h |   5 +
 include/net/ip6_tunnel.h  |  58 
 include/net/ip_tunnels.h  |  76 +++---
 net/core/ethtool.c|   4 +-
 net/ipv4/af_inet.c|  32 ++---
 net/ipv4/fou.c| 144 +++
 net/ipv4/gre_offload.c|  14 --
 net/ipv4/ip_tunnel.c  |  45 --
 net/ipv4/ip_tunnel_core.c |   9 ++
 net/ipv4/ipip.c   |   2 +-
 net/ipv4/tcp_offload.c|  19 ---
 net/ipv4/udp_offload.c|  10

[PATCH net-next] bpf, doc: fix typo on bpf_asm descriptions

2016-05-16 Thread Daniel Borkmann

Fix description of some of the bpf_asm tool related jump instructions
and generally move them to format A <op> k.
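
For illustration only: the corrected descriptions read as "accumulator A on the left, constant k on the right". A minimal sketch of those conditions follows; the enum and function names are hypothetical, and A and k are treated as unsigned 32-bit values as in classic BPF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum demo_jmp {
	DEMO_JEQ, DEMO_JNE, DEMO_JLT, DEMO_JLE,
	DEMO_JGT, DEMO_JGE, DEMO_JSET,
};

/* Returns true when the conditional jump would be taken for
 * accumulator value a and immediate constant k.
 */
static bool demo_jump_taken(enum demo_jmp op, uint32_t a, uint32_t k)
{
	switch (op) {
	case DEMO_JEQ:  return a == k;
	case DEMO_JNE:  return a != k;
	case DEMO_JLT:  return a <  k;
	case DEMO_JLE:  return a <= k;
	case DEMO_JGT:  return a >  k;
	case DEMO_JGE:  return a >= k;
	case DEMO_JSET: return (a & k) != 0;
	}
	return false;
}
```

The operand order matters for the ordered comparisons: with A = 5 and k = 3, jgt is taken and jlt is not, which is the opposite of what the old "k > A" wording suggested.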

Reported-by: Sebastian Amend 
Signed-off-by: Daniel Borkmann 
---
 Documentation/networking/filter.txt | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/Documentation/networking/filter.txt 
b/Documentation/networking/filter.txt
index 6aef0b5..b9a4edf 100644
--- a/Documentation/networking/filter.txt
+++ b/Documentation/networking/filter.txt
@@ -216,14 +216,14 @@ opcodes as defined in linux/filter.h stand for:
 
   jmp  6     Jump to label
   ja   6     Jump to label
-  jeq  7, 8  Jump on k == A
-  jneq 8     Jump on k != A
-  jne  8     Jump on k != A
-  jlt  8     Jump on k < A
-  jle  8     Jump on k <= A
-  jgt  7, 8  Jump on k > A
-  jge  7, 8  Jump on k >= A
-  jset 7, 8  Jump on k & A
+  jeq  7, 8  Jump on A == k
+  jneq 8     Jump on A != k
+  jne  8     Jump on A != k
+  jlt  8     Jump on A <  k
+  jle  8     Jump on A <= k
+  jgt  7, 8  Jump on A >  k
+  jge  7, 8  Jump on A >= k
+  jset 7, 8  Jump on A &  k
 
   add  0, 4 A + 
   sub  0, 4 A - 
-- 
1.9.3

Re: [PATCH] ixgbe: take online CPU number as MQ max limit when alloc_etherdev_mq()

2016-05-16 Thread Jeff Kirsher

On Fri, 2016-05-13 at 14:56 +0900, Ethan Zhao wrote:
> Allocating 64 Tx/Rx as default doesn't benefit performance when fewer
> CPUs were assigned, especially when DCB is enabled, so we should take
> num_online_cpus() as top limit, and also to make sure every TC has
> at least one queue, take the MAX_TRAFFIC_CLASS as bottom limit of queues
> number.
> 
> Signed-off-by: Ethan Zhao 
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 4 
>  1 file changed, 4 insertions(+)

Dropping this patch based on Alex's and John's feedback.


Re: [PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-16 Thread Dan Murphy

On 05/16/2016 01:52 PM, Alexander Graf wrote:
> If you compile without OF_MDIO support in an RGMII configuration, we fail
> to configure the dp83867 phy today by writing garbage into its configuration
> registers.
>
> On the other hand if you do compile with OF_MDIO and the phy gets loaded via
> device tree, you have to have the properties set in the device tree, otherwise
> we fail to load the driver and don't even attach the generic phy driver to
> the interface anymore.
>
> To make things slightly more consistent, make the rgmii configuration 
> properties
> optional and allow a user to omit them in their device tree.
>
> Signed-off-by: Alexander Graf 
> ---
>  drivers/net/phy/dp83867.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index 94cc278..1b01680 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -65,6 +65,7 @@ struct dp83867_private {
>   int rx_id_delay;
>   int tx_id_delay;
>   int fifo_depth;
> + int values_are_sane;
>  };
>  
>  static int dp83867_ack_interrupt(struct phy_device *phydev)
> @@ -113,15 +114,30 @@ static int dp83867_of_init(struct phy_device *phydev)
>   ret = of_property_read_u32(of_node, "ti,rx-internal-delay",
>  &dp83867->rx_id_delay);
>   if (ret)
> - return ret;
> + goto invalid_dt;
>  
>   ret = of_property_read_u32(of_node, "ti,tx-internal-delay",
>  &dp83867->tx_id_delay);
>   if (ret)
> - return ret;
> + goto invalid_dt;

Optional means you may or may not have the entries.

I would prefer to wrap the DT reading with the interface type check:

if (phydev->interface == PHY_INTERFACE_MODE_RGMII_TXID ||
    phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) {
	ret = of_property_read_u32(of_node, "ti,tx-internal-delay",
				   &dp83867->tx_id_delay);
	if (ret)
		goto invalid_dt;
}

Otherwise this continues to mandate that you declare all the DT entries
when in fact you may only have to declare one.

And if the other interfaces are declared then the DT entries are ignored. Also,
configuring the internal delay is not required per section 8.9, footnote 3 of
the data sheet.
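
For illustration only: a userspace sketch of the per-mode check suggested above, where a property is required only if the selected interface mode actually uses it. The demo_* types are hypothetical stand-ins for the device tree node and PHY interface mode, and treating the RX delay symmetrically (RXID and ID) is an assumption that the review comment does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_iface {
	DEMO_RGMII,		/* no internal delay */
	DEMO_RGMII_ID,		/* internal delay on both RX and TX */
	DEMO_RGMII_RXID,	/* internal delay on RX only */
	DEMO_RGMII_TXID,	/* internal delay on TX only */
	DEMO_SGMII,		/* not an RGMII mode at all */
};

/* Hypothetical stand-in for a device tree node: which properties exist. */
struct demo_dt {
	bool has_rx_delay;
	bool has_tx_delay;
};

static bool demo_needs_rx_delay(enum demo_iface i)
{
	return i == DEMO_RGMII_ID || i == DEMO_RGMII_RXID;
}

static bool demo_needs_tx_delay(enum demo_iface i)
{
	return i == DEMO_RGMII_ID || i == DEMO_RGMII_TXID;
}

/* Returns 0 when every property the interface mode needs is present and
 * -1 otherwise. Properties for unused delays are never looked at.
 */
static int demo_of_init(enum demo_iface iface, const struct demo_dt *dt)
{
	if (demo_needs_rx_delay(iface) && !dt->has_rx_delay)
		return -1;
	if (demo_needs_tx_delay(iface) && !dt->has_tx_delay)
		return -1;
	return 0;
}
```

With this shape a board using the TXID mode only has to declare the TX delay, and a board using plain RGMII or a non-RGMII mode does not need either property.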

Dan



>  
> - return of_property_read_u32(of_node, "ti,fifo-depth",
> + ret = of_property_read_u32(of_node, "ti,fifo-depth",
>  &dp83867->fifo_depth);
> + if (ret)
> + goto invalid_dt;
> +
> + dp83867->values_are_sane = 1;
> +
> + return 0;
> +
> +invalid_dt:
> + phydev_err(phydev, "missing properties in device tree");
> +
> + /*
> +  * We can still run with a broken dt by not using any of the optional
> +  * parameters, so just don't set dp83867->values_are_sane.
> +  */
> + return 0;
>  }
>  #else
>  static int dp83867_of_init(struct phy_device *phydev)
> @@ -150,6 +166,15 @@ static int dp83867_config_init(struct phy_device *phydev)
>   dp83867 = (struct dp83867_private *)phydev->priv;
>   }
>  
> + /*
> +  * With no or broken device tree, we don't have the values that we would
> +  * want to configure the phy with. In that case, cross our fingers and
> +  * assume that firmware did everything correctly for us or that we don't
> +  * need them.
> +  */
> + if (!dp83867->values_are_sane)
> + return 0;
> +
>   if (phy_interface_is_rgmii(phydev)) {
>   ret = phy_write(phydev, MII_DP83867_PHYCTRL,
>   (dp83867->fifo_depth << 
> DP83867_PHYCR_FIFO_DEPTH_SHIFT));


-- 
--
Dan Murphy

Re: [PATCH v5 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 12:28 PM, Tom Herbert  wrote:
> On Mon, May 16, 2016 at 12:24 PM, Alexander Duyck
>  wrote:
>> On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
>>> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
>>> for getting encap hlen, setting up encap on a tunnel, performing
>>> encapsulation operation.
>>>
>>> Signed-off-by: Tom Herbert 
>>> ---
>>>  include/net/ip6_tunnel.h  | 58 ++
>>>  net/ipv4/ip_tunnel_core.c |  5 +++
>>>  net/ipv6/ip6_tunnel.c | 89 
>>> +--
>>>  3 files changed, 141 insertions(+), 11 deletions(-)
>>
>> So a bisect is pointing to this patch as causing a regression in IPv6
>> GRE throughput from 20 Gb/s to .04 Mb/s
>>
>> <...>
>>
>>> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
>>> index e79330f..9f0ea85 100644
>>> --- a/net/ipv6/ip6_tunnel.c
>>> +++ b/net/ipv6/ip6_tunnel.c
>>> @@ -1010,7 +1010,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct 
>>> net_device *dev, __u8 dsfield,
>>> struct dst_entry *dst = NULL, *ndst = NULL;
>>> struct net_device *tdev;
>>> int mtu;
>>> -   unsigned int max_headroom = sizeof(struct ipv6hdr);
>>> +   unsigned int max_headroom = sizeof(struct ipv6hdr) + t->hlen;
>>> int err = -1;
>>>
>>> /* NBMA tunnel */
>>> @@ -1063,7 +1063,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct 
>>> net_device *dev, __u8 dsfield,
>>>  t->parms.name);
>>> goto tx_err_dst_release;
>>> }
>>> -   mtu = dst_mtu(dst) - sizeof(*ipv6h);
>>> +   mtu = dst_mtu(dst) - sizeof(*ipv6h) - t->hlen;
>>> if (encap_limit >= 0) {
>>> max_headroom += 8;
>>> mtu -= 8;
>>
>> So I am pretty sure this bit here is causing the regression.  Your skb
>> already has a GRE header added and it is included in skb->len.  In the
>> tests just below here you are comparing skb->len to mtu, but you now
>> have the GRE header included twice so it is going to fail.  Odds are
>> this should be t->encap_hlen, and not t->hlen.
>>
> Good catch! Fixing now...

Actually I think the one other case above for max_headroom probably
should be encap_hlen as well.  After all we don't need to allocate
headroom for something we have already placed in the skb.
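
For illustration only: the double counting described above, reduced to arithmetic. The helper name and the numbers are assumptions (a 1500-byte path MTU and a 4-byte base GRE header that is already part of skb->len); the real check in ip6_tnl_xmit() has further conditions.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_IPV6_HDR_LEN 40	/* fixed IPv6 header size */

/* Does a packet of skb_len bytes fit, given the path MTU and the extra
 * header length that gets subtracted from it before the comparison?
 */
static bool demo_fits(int path_mtu, int extra_hlen, int skb_len)
{
	int mtu = path_mtu - DEMO_IPV6_HDR_LEN - extra_hlen;

	return skb_len <= mtu;
}
```

A 1456-byte inner packet plus the 4-byte GRE header gives skb_len = 1460. Subtracting only the IPv6 header leaves 1460 bytes and the packet fits; subtracting the GRE header a second time leaves 1456 and the same packet is rejected, which is consistent with the throughput collapse reported above.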

I'm still digging into the patch set.  If I find anything else I will
let you know.  I'm hoping to be able to test ip6ip6 hardware tunnel
offloads by the end of today.

- Alex

Re: [PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-16 Thread Andrew Lunn

On Mon, May 16, 2016 at 08:52:43PM +0200, Alexander Graf wrote:
> If you compile without OF_MDIO support in an RGMII configuration, we fail
> to configure the dp83867 phy today by writing garbage into its configuration
> registers.
> 
> On the other hand if you do compile with OF_MDIO and the phy gets loaded via
> device tree, you have to have the properties set in the device tree, otherwise
> we fail to load the driver and don't even attach the generic phy driver to
> the interface anymore.
> 
> To make things slightly more consistent, make the rgmii configuration 
> properties
> optional and allow a user to omit them in their device tree.

The binding document actually says they are required. It would be good
to make the binding documentation and the code consistent.

   Andrew

Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)

2016-05-16 Thread Roman Yeryomin

On 16 May 2016 at 19:04, Dave Taht  wrote:
> On Mon, May 16, 2016 at 1:14 AM, Roman Yeryomin  wrote:
>> On 16 May 2016 at 01:34, Roman Yeryomin  wrote:
>>> On 6 May 2016 at 22:43, Dave Taht  wrote:
>>>> On Fri, May 6, 2016 at 11:56 AM, Roman Yeryomin  wrote:
>>>>> On 6 May 2016 at 21:43, Roman Yeryomin  wrote:
>>>>>> On 6 May 2016 at 15:47, Jesper Dangaard Brouer  wrote:
>>>>>>>
>>>>>>> I've created a OpenWRT ticket[1] on this issue, as it seems that someone[2]
>>>>>>> closed Felix'es OpenWRT email account (bad choice! emails bouncing).
>>>>>>> Sounds like OpenWRT and the LEDE https://www.lede-project.org/ project
>>>>>>> is in some kind of conflict.
>>>>>>>
>>>>>>> OpenWRT ticket [1] https://dev.openwrt.org/ticket/22349
>>>>>>>
>>>>>>> [2] http://thread.gmane.org/gmane.comp.embedded.openwrt.devel/40298/focus=40335
>>>>>>
>>>>>> OK, so, after porting the patch to 4.1 openwrt kernel and playing a
>>>>>> bit with fq_codel limits I was able to get 420Mbps UDP like this:
>>>>>> tc qdisc replace dev wlan0 parent :1 fq_codel flows 16 limit 256
>>>>>
>>>>> Forgot to mention, I've reduced drop_batch_size down to 32
>>>>
>>>> 0) Not clear to me if that's the right line, there are 4 wifi queues,
>>>> and the third one
>>>> is the BE queue.
>>>
>>> That was an example, sorry, should have stated that. I've applied same
>>> settings to all 4 queues.
>>>
>>>> That is too low a limit, also, for normal use. And:
>>>> for the purpose of this particular UDP test, flows 16 is ok, but not
>>>> ideal.
>>>
>>> I played with different combinations, it doesn't make any
>>> (significant) difference: 20-30Mbps, not more.
>>> What numbers would you propose?
>>>
>>>> 1) What's the tcp number (with a simultaneous ping) with this latest patchset?
>>>> (I care about tcp performance a lot more than udp floods - surviving a
>>>> udp flood yes, performance, no)
>>>
>>> During the test (both TCP and UDP) it's roughly 5ms in average, not
>>> running tests ~2ms. Actually I'm now wondering if target is working at
>>> all, because I had same result with target 80ms..
>>> So, yes, latency is good, but performance is poor.
>>>
>>>> before/after?
>>>>
>>>> tc -s qdisc show dev wlan0 during/after results?
>>>
>>> during the test:
>>>
>>> qdisc mq 0: root
>>>  Sent 1600496000 bytes 1057194 pkt (dropped 1421568, overlimits 0 requeues 
>>> 17)
>>>  backlog 1545794b 1021p requeues 17
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 1601271168 bytes 1057706 pkt (dropped 1422304, overlimits 0 requeues 
>>> 17)
>>>  backlog 1541252b 1018p requeues 17
>>>   maxpacket 1514 drop_overlimit 1422304 new_flow_count 35 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 1
>>> qdisc fq_codel 8004: parent :4 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>>
>>>
>>> after the test (60sec):
>>>
>>> qdisc mq 0: root
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 
>>> 28)
>>>  backlog 0b 0p requeues 28
>>> qdisc fq_codel 8001: parent :1 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8002: parent :2 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>   maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>>>   new_flows_len 0 old_flows_len 0
>>> qdisc fq_codel 8003: parent :3 limit 1024p flows 16 quantum 1514
>>> target 80.0ms ce_threshold 32us interval 100.0ms ecn
>>>  Sent 3084996052 bytes 2037744 pkt (dropped 2770176, overlimits 0 requeues 
>>> 28)
>>>  backlog 0b 0p requeues 28
>>>   maxpacket 1514 drop_overlimit 2770176

[PATCH] i40e: Fix errors resulted while turning off TSO

2016-05-16 Thread Tushar Dave

On systems with 128 CPUs, turning off TSO results in errors,

i40e :03:00.0: failed to get tracking for 1 vectors for VSI 400, err=-12
i40e :03:00.0: Couldn't create FDir VSI
i40e :03:00.0: i40e_ptp_init: PTP not supported on eth0
i40e :03:00.0: couldn't add VEB, err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOENT
i40e :03:00.0: rebuild of switch failed: -1, will try to set up simple PF connection
i40e :03:00.0 eth0: adding 00:10:e0:8a:24:b6 vid=0

Enabling FD_SB without checking the availability of an MSI-X vector is the
root cause. This change adds the necessary check.

Signed-off-by: Tushar Dave 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c |8 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 68f2204..80dcb5c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -270,6 +270,7 @@ struct i40e_pf {
 #endif /* I40E_FCOE */
u16 num_lan_qps;   /* num lan queues this PF has set up */
u16 num_lan_msix;  /* num queue vectors for the base PF vsi */
+   u16 num_fdsb_msix; /* num queue vectors for sideband Fdir */
int queues_left;   /* queues left unclaimed */
u16 alloc_rss_size;/* allocated RSS queues */
u16 rss_size_max;  /* HW defined max RSS queues */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8f3b53e..9248863 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7170,7 +7170,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
vsi->alloc_queue_pairs = 1;
vsi->num_desc = ALIGN(I40E_FDIR_RING_COUNT,
  I40E_REQ_DESCRIPTOR_MULTIPLE);
-   vsi->num_q_vectors = 1;
+   vsi->num_q_vectors = pf->num_fdsb_msix;
break;
 
case I40E_VSI_VMDQ2:
@@ -7558,9 +7558,11 @@ static int i40e_init_msix(struct i40e_pf *pf)
/* reserve one vector for sideband flow director */
if (pf->flags & I40E_FLAG_FD_SB_ENABLED) {
if (vectors_left) {
+   pf->num_fdsb_msix = 1;
v_budget++;
vectors_left--;
} else {
+   pf->num_fdsb_msix = 0;
pf->flags &= ~I40E_FLAG_FD_SB_ENABLED;
}
}
@@ -8443,7 +8445,9 @@ bool i40e_set_ntuple(struct i40e_pf *pf, 
netdev_features_t features)
/* Enable filters and mark for reset */
if (!(pf->flags & I40E_FLAG_FD_SB_ENABLED))
need_reset = true;
-   pf->flags |= I40E_FLAG_FD_SB_ENABLED;
+   /* enable FD_SB only if there is MSI-X vector */
+   if (pf->num_fdsb_msix > 0)
+   pf->flags |= I40E_FLAG_FD_SB_ENABLED;
} else {
/* turn off filters, mark for reset and clear SW filter list */
if (pf->flags & I40E_FLAG_FD_SB_ENABLED) {
-- 
1.7.1

i40e: Errors while turning off TSO

2016-05-16 Thread tndave


On systems with 128 CPUs, turning off TSO results in errors.

Errors:
i40e :03:00.0: failed to get tracking for 1 vectors for VSI 400, err=-12
i40e :03:00.0: Couldn't create FDir VSI
i40e :03:00.0: i40e_ptp_init: PTP not supported on eth0
i40e :03:00.0: couldn't add VEB, err I40E_ERR_ADMIN_QUEUE_ERROR
aq_err I40E_AQ_RC_ENOENT
i40e :03:00.0: rebuild of switch failed: -1, will try to set up
simple PF connection
i40e :03:00.0 eth0: adding 00:10:e0:8a:24:b6 vid=0


From kernel log:
i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 1.4.8-k
i40e: Copyright (c) 2013 - 2014 Intel Corporation.
i40e :03:00.0: fw 4.40.35115 api 1.4 nvm 4.53 0x80001e8c 0.0.0
i40e :03:00.0: MAC address: 00:10:e0:8a:24:b6
i40e :03:00.0: i40e_ptp_init: PTP not supported on eth0
i40e :03:00.0: PCI-Express: Speed 8.0GT/s Width x8
i40e :03:00.0: Features: PF-id[0] VFs: 32 VSIs: 34 QP: 128 RX: 1BUF
RSS FD_ATR VxLAN VEPA

As per the log above, feature FD_SB (sideband flow director)) is not
enabled. Because there are no enough MSI-X vectors available.
(Device function caps report 129 MSI-X vectors in this case. And driver
reserved 1 of them for misc interrupt and rest 128 for 128 QP. So no
vector left for FD_SB. Therefore driver disables FD_SB)

However turning off TSO invokes i40e_set_ntuple()
that enables FD_SB and returns true, issues reset in i40e_set_features()
i.e i40e_do_reset.

Later during reset, driver fails to find irq for FD_SB from irq pile
(and it won't because there was no irq vector assigned for FD_SB).
This results in the very first error,
'i40e :03:00.0: failed to get tracking for 1 vectors for VSI 400,
err=-12'

I believe that before enabling FD_SB in i40e_set_ntuple(), the driver
should check whether an MSI-X vector is available for FD_SB.

Sending patch in separate email.

(FWIW, if the number of CPUs is reduced to 64, I don't see the issue
described above because in that case only 64 of the 129 MSI-X vectors
get assigned to QPs. The remaining ones are used for features like FD_SB.)
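
The vector accounting described above can be sketched in a few lines of
C. This is a simplified user-space model, not the driver's allocation
code: fdsb_vectors_left() and can_enable_fdsb() are made-up names, and
the model assumes one misc vector plus one queue-pair vector per CPU,
as in the description.

```c
/*
 * Hypothetical model of the MSI-X budget: one vector goes to the misc
 * interrupt, one to each queue pair (one QP per CPU in this model),
 * and whatever remains can serve features such as FD_SB.
 */
int fdsb_vectors_left(int msix_total, int num_cpus)
{
	int left = msix_total - 1;	/* misc interrupt */

	if (left < 0)
		return 0;
	if (num_cpus > left)
		num_cpus = left;	/* cannot hand out more than exist */
	return left - num_cpus;
}

/* Mirrors the proposed check: enable FD_SB only if a vector exists. */
int can_enable_fdsb(int msix_total, int num_cpus)
{
	return fdsb_vectors_left(msix_total, num_cpus) > 0;
}
```

With 129 vectors, 128 CPUs leave nothing for FD_SB, while 64 CPUs leave
64 spare vectors, which matches the behaviour reported above.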

-Tushar

Re: [PATCH v5 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Tom Herbert

On Mon, May 16, 2016 at 12:24 PM, Alexander Duyck
 wrote:
> On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
>> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
>> for getting encap hlen, setting up encap on a tunnel, performing
>> encapsulation operation.
>>
>> Signed-off-by: Tom Herbert 
>> ---
>>  include/net/ip6_tunnel.h  | 58 ++
>>  net/ipv4/ip_tunnel_core.c |  5 +++
>>  net/ipv6/ip6_tunnel.c | 89 
>> +--
>>  3 files changed, 141 insertions(+), 11 deletions(-)
>
> So a bisect is pointing to this patch as causing a regression in IPv6
> GRE throughput from 20 Gb/s to .04 Mb/s
>
> <...>
>
>> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
>> index e79330f..9f0ea85 100644
>> --- a/net/ipv6/ip6_tunnel.c
>> +++ b/net/ipv6/ip6_tunnel.c
>> @@ -1010,7 +1010,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct 
>> net_device *dev, __u8 dsfield,
>> struct dst_entry *dst = NULL, *ndst = NULL;
>> struct net_device *tdev;
>> int mtu;
>> -   unsigned int max_headroom = sizeof(struct ipv6hdr);
>> +   unsigned int max_headroom = sizeof(struct ipv6hdr) + t->hlen;
>> int err = -1;
>>
>> /* NBMA tunnel */
>> @@ -1063,7 +1063,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct 
>> net_device *dev, __u8 dsfield,
>>  t->parms.name);
>> goto tx_err_dst_release;
>> }
>> -   mtu = dst_mtu(dst) - sizeof(*ipv6h);
>> +   mtu = dst_mtu(dst) - sizeof(*ipv6h) - t->hlen;
>> if (encap_limit >= 0) {
>> max_headroom += 8;
>> mtu -= 8;
>
> So I am pretty sure this bit here is causing the regression.  Your skb
> already has a GRE header added and it is included in skb->len.  In the
> tests just below here you are comparing skb->len to mtu, but you now
> have the GRE header included twice so it is going to fail.  Odds are
> this should be t->encap_hlen, and not t->hlen.
>
Good catch! Fixing now...

> - Alex

Re: [PATCH v5 net-next 09/14] ip6_tun: Add infrastructure for doing encapsulation

2016-05-16 Thread Alexander Duyck

On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
> Add encap_hlen and ip_tunnel_encap structure to ip6_tnl. Add functions
> for getting encap hlen, setting up encap on a tunnel, performing
> encapsulation operation.
>
> Signed-off-by: Tom Herbert 
> ---
>  include/net/ip6_tunnel.h  | 58 ++
>  net/ipv4/ip_tunnel_core.c |  5 +++
>  net/ipv6/ip6_tunnel.c | 89 
> +--
>  3 files changed, 141 insertions(+), 11 deletions(-)

So a bisect is pointing to this patch as causing a regression in IPv6
GRE throughput from 20 Gb/s to .04 Mb/s

<...>

> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
> index e79330f..9f0ea85 100644
> --- a/net/ipv6/ip6_tunnel.c
> +++ b/net/ipv6/ip6_tunnel.c
> @@ -1010,7 +1010,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device 
> *dev, __u8 dsfield,
> struct dst_entry *dst = NULL, *ndst = NULL;
> struct net_device *tdev;
> int mtu;
> -   unsigned int max_headroom = sizeof(struct ipv6hdr);
> +   unsigned int max_headroom = sizeof(struct ipv6hdr) + t->hlen;
> int err = -1;
>
> /* NBMA tunnel */
> @@ -1063,7 +1063,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device 
> *dev, __u8 dsfield,
>  t->parms.name);
> goto tx_err_dst_release;
> }
> -   mtu = dst_mtu(dst) - sizeof(*ipv6h);
> +   mtu = dst_mtu(dst) - sizeof(*ipv6h) - t->hlen;
> if (encap_limit >= 0) {
> max_headroom += 8;
> mtu -= 8;

So I am pretty sure this bit here is causing the regression.  Your skb
already has a GRE header added and it is included in skb->len.  In the
tests just below here you are comparing skb->len to mtu, but you now
have the GRE header included twice so it is going to fail.  Odds are
this should be t->encap_hlen, and not t->hlen.
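
The double counting can be illustrated with a small user-space model.
This is only a sketch: exceeds_mtu() is a made-up helper, and the
numbers used below (1500-byte path MTU, 4-byte GRE header) are
assumptions for illustration, not values taken from the test setup.

```c
#include <stddef.h>

#define IPV6_HDR_LEN 40	/* size of the fixed IPv6 header */

/*
 * Hypothetical model of the length check in the transmit path.
 * skb_len already contains the GRE header, so only headers that are
 * still to be added may be subtracted from the path MTU.
 */
int exceeds_mtu(size_t skb_len, size_t path_mtu, size_t hdrs_to_add)
{
	size_t overhead = IPV6_HDR_LEN + hdrs_to_add;

	if (path_mtu <= overhead)
		return 1;	/* no room for any payload */
	return skb_len > path_mtu - overhead;
}
```

For a 1460-byte skb (1456 bytes of payload plus the 4-byte GRE header),
exceeds_mtu(1460, 1500, 4) reports the packet as too big because the
GRE header is subtracted a second time, whereas exceeds_mtu(1460, 1500, 0)
lets it through.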

- Alex

Re: [PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-16 Thread Florian Fainelli

On 05/16/2016 11:52 AM, Alexander Graf wrote:
> If you compile without OF_MDIO support in an RGMII configuration, we fail
> to configure the dp83867 phy today by writing garbage into its configuration
> registers.
> 
> On the other hand if you do compile with OF_MDIO and the phy gets loaded via
> device tree, you have to have the properties set in the device tree, otherwise
> we fail to load the driver and don't even attach the generic phy driver to
> the interface anymore.
> 
> To make things slightly more consistent, make the rgmii configuration 
> properties
> optional and allow a user to omit them in their device tree.
> 
> Signed-off-by: Alexander Graf 
> ---
>  drivers/net/phy/dp83867.c | 31 ---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
> index 94cc278..1b01680 100644
> --- a/drivers/net/phy/dp83867.c
> +++ b/drivers/net/phy/dp83867.c
> @@ -65,6 +65,7 @@ struct dp83867_private {
>   int rx_id_delay;
>   int tx_id_delay;
>   int fifo_depth;
> + int values_are_sane;

This could be a boolean type.

>  };
>  
>  static int dp83867_ack_interrupt(struct phy_device *phydev)
> @@ -113,15 +114,30 @@ static int dp83867_of_init(struct phy_device *phydev)
>   ret = of_property_read_u32(of_node, "ti,rx-internal-delay",
>                                &dp83867->rx_id_delay);
>   if (ret)
> - return ret;
> + goto invalid_dt;
>  
>   ret = of_property_read_u32(of_node, "ti,tx-internal-delay",
>                                &dp83867->tx_id_delay);
>   if (ret)
> - return ret;
> + goto invalid_dt;
>  
> - return of_property_read_u32(of_node, "ti,fifo-depth",
> + ret = of_property_read_u32(of_node, "ti,fifo-depth",
>                                &dp83867->fifo_depth);
> + if (ret)
> + goto invalid_dt;
> +
> + dp83867->values_are_sane = 1;
> +
> + return 0;
> +
> +invalid_dt:
> + phydev_err(phydev, "missing properties in device tree");

phydev_warn() maybe?

Other than that, this looks okay to me.
-- 
Florian
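
The "optional properties plus a boolean flag" idea from this review can
be sketched outside the kernel as follows. Everything here is a
hypothetical stand-in: read_prop() replaces the device tree lookup (a
NULL pointer means the property is missing), and struct phy_params and
params_init() are illustrative names rather than the driver's own.

```c
#include <stdbool.h>
#include <stddef.h>

struct phy_params {
	int rx_id_delay;
	int tx_id_delay;
	int fifo_depth;
	bool values_are_sane;	/* true only if all three were read */
};

/* Hypothetical property lookup: a NULL source means "not present". */
static int read_prop(const int *prop, int *out)
{
	if (!prop)
		return -1;
	*out = *prop;
	return 0;
}

/* A missing property is not fatal; it only leaves the flag cleared. */
int params_init(struct phy_params *p, const int *rx, const int *tx,
		const int *fifo)
{
	p->values_are_sane = false;

	if (read_prop(rx, &p->rx_id_delay) ||
	    read_prop(tx, &p->tx_id_delay) ||
	    read_prop(fifo, &p->fifo_depth))
		return 0;

	p->values_are_sane = true;
	return 0;
}
```

A caller would then program the delay and FIFO registers only when
values_are_sane is true, and leave the existing configuration alone
otherwise.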

[PATCH v2 2/2] phy dp83867: Make rgmii parameters optional

2016-05-16 Thread Alexander Graf

If you compile without OF_MDIO support in an RGMII configuration, we fail
to configure the dp83867 phy today by writing garbage into its configuration
registers.

On the other hand if you do compile with OF_MDIO and the phy gets loaded via
device tree, you have to have the properties set in the device tree, otherwise
we fail to load the driver and don't even attach the generic phy driver to
the interface anymore.

To make things slightly more consistent, make the rgmii configuration properties
optional and allow a user to omit them in their device tree.

Signed-off-by: Alexander Graf 
---
 drivers/net/phy/dp83867.c | 31 ---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 94cc278..1b01680 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -65,6 +65,7 @@ struct dp83867_private {
int rx_id_delay;
int tx_id_delay;
int fifo_depth;
+   int values_are_sane;
 };
 
 static int dp83867_ack_interrupt(struct phy_device *phydev)
@@ -113,15 +114,30 @@ static int dp83867_of_init(struct phy_device *phydev)
ret = of_property_read_u32(of_node, "ti,rx-internal-delay",
                           &dp83867->rx_id_delay);
if (ret)
-   return ret;
+   goto invalid_dt;
 
ret = of_property_read_u32(of_node, "ti,tx-internal-delay",
                           &dp83867->tx_id_delay);
if (ret)
-   return ret;
+   goto invalid_dt;
 
-   return of_property_read_u32(of_node, "ti,fifo-depth",
+   ret = of_property_read_u32(of_node, "ti,fifo-depth",
                           &dp83867->fifo_depth);
+   if (ret)
+   goto invalid_dt;
+
+   dp83867->values_are_sane = 1;
+
+   return 0;
+
+invalid_dt:
+   phydev_err(phydev, "missing properties in device tree");
+
+   /*
+* We can still run with a broken dt by not using any of the optional
+* parameters, so just don't set dp83867->values_are_sane.
+*/
+   return 0;
 }
 #else
 static int dp83867_of_init(struct phy_device *phydev)
@@ -150,6 +166,15 @@ static int dp83867_config_init(struct phy_device *phydev)
dp83867 = (struct dp83867_private *)phydev->priv;
}
 
+   /*
+* With no or broken device tree, we don't have the values that we would
+* want to configure the phy with. In that case, cross our fingers and
+* assume that firmware did everything correctly for us or that we don't
+* need them.
+*/
+   if (!dp83867->values_are_sane)
+   return 0;
+
if (phy_interface_is_rgmii(phydev)) {
ret = phy_write(phydev, MII_DP83867_PHYCTRL,
(dp83867->fifo_depth << 
DP83867_PHYCR_FIFO_DEPTH_SHIFT));
-- 
1.8.5.6

[PATCH v2 1/2] phy dp83867: Fix compilation with CONFIG_OF_MDIO=m

2016-05-16 Thread Alexander Graf

When CONFIG_OF_MDIO is configured as module, the #define for it really
is CONFIG_OF_MDIO_MODULE, not CONFIG_OF_MDIO. So if we are compiling it
as module, the dp83867 doesn't see that OF_MDIO was selected and doesn't
read the dt rgmii parameters.

The fix is simple: Use IS_ENABLED(). It checks for both - module as well
as compiled in code.
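
The difference between the two tests can be reproduced in user space.
The macros below are a simplified imitation of the kernel's IS_ENABLED()
machinery with made-up helper names (ARG_PLACEHOLDER_1, TAKE_SECOND,
IS_DEFINED), and the first #define merely pretends that Kconfig selected
OF_MDIO=m; treat it as a sketch, not as the kernel's actual header.

```c
/* Pretend Kconfig chose OF_MDIO=m: only the _MODULE symbol exists. */
#define CONFIG_OF_MDIO_MODULE 1

/* Simplified imitation of the IS_ENABLED() preprocessor trick. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND(ignored, val, ...) val
#define IS_DEFINED(x) IS_DEFINED_1(x)
#define IS_DEFINED_1(val) IS_DEFINED_2(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_2(arg1_or_junk) TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define IS_ENABLED(option) \
	(IS_DEFINED(option) || IS_DEFINED(option##_MODULE))

/* Plain #ifdef only sees the built-in (=y) symbol. */
int seen_by_ifdef(void)
{
#ifdef CONFIG_OF_MDIO
	return 1;
#else
	return 0;	/* taken when OF_MDIO=m */
#endif
}

/* IS_ENABLED() is true for both =y and =m. */
int seen_by_is_enabled(void)
{
#if IS_ENABLED(CONFIG_OF_MDIO)
	return 1;
#else
	return 0;
#endif
}
```

With only CONFIG_OF_MDIO_MODULE defined, seen_by_ifdef() returns 0 and
seen_by_is_enabled() returns 1, which is the mismatch the patch
addresses.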

Signed-off-by: Alexander Graf 
---
 drivers/net/phy/dp83867.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 2afa61b..94cc278 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -99,7 +99,7 @@ static int dp83867_config_intr(struct phy_device *phydev)
return phy_write(phydev, MII_DP83867_MICR, micr_status);
 }
 
-#ifdef CONFIG_OF_MDIO
+#if IS_ENABLED(CONFIG_OF_MDIO)
 static int dp83867_of_init(struct phy_device *phydev)
 {
struct dp83867_private *dp83867 = phydev->priv;
-- 
1.8.5.6

Re: [ANNOUNCE] Netdev 1.2 conference

2016-05-16 Thread Stephen Hemminger

On Tue, 17 May 2016 00:36:58 +0900
Hajime Tazaki  wrote:

> Following the successful Netdev 0.1 in Ottawa, Canada and
> 1.1 in Seville, Spain, we are happy to announce the third Netdev conference:
> Netdev 1.2 (year 1, conference 2) from 5th to 7th October 2016 in Tokyo,
> Japan (http://netdevconf.org/1.2/).

I understand that a free date for a conference is hard to find,
but those dates overlap with LinuxCon Europe in Berlin. There may not
be a lot of overlap in possible attendees, but perhaps there is a
better date?

linux-4.6/net/kcm/kcmsock.c:1508: bad if test ?

2016-05-16 Thread David Binderman

Hello there,

[linux-4.6/net/kcm/kcmsock.c:1508]: (style) Checking if unsigned
variable 'copied' is less than zero.

Source code is

if (copied < 0) {

but

   size_t copied;

Suggest code rework.
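
A minimal sketch of why the test can never fire, and of one possible
rework using a signed type. The function names are made up for
illustration, and whether a signed variable is the right fix for
kcmsock.c is an assumption; a compiler may also warn about the first
comparison, which is the point of the report.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Always 0: a size_t is never negative, so an error goes unnoticed. */
int unsigned_check_fires(size_t copied)
{
	return copied < 0;
}

/* A signed type keeps negative error codes visible to the test. */
int signed_check_fires(ssize_t copied)
{
	return copied < 0;
}
```

Passing an error code such as -14 shows the difference: converted to
size_t it becomes a huge positive value and the first check stays
silent, while the signed variant reports it.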


Regards

David Binderman

Re: [PATCH v5 net-next 03/14] ipv6: Fix nexthdr for reinjection

2016-05-16 Thread Tom Herbert

On Mon, May 16, 2016 at 11:19 AM, Shmulik Ladkani
 wrote:
> Hi,
>
> On Sun, 15 May 2016 16:42:24 -0700 Tom Herbert  wrote:
>> In ip6_input_finish the nexthdr protocol is retrieved from the
>> next header offset that is returned in the cb of the skb.
>> This method does not work for UDP encapsulation that may not
>> even have a concept of a nexthdr field (e.g. FOU).
>>
>> This patch checks for a final protocol (INET6_PROTO_FINAL) when a
>> protocol handler returns > 1. If the protocol is not final then
>
> If you respin due to other reasons:  s/> 1/> 0/
>
Will do. Thanks!

Tom

>> resubmission is performed on nhoff value. If the protocol is final
>> then the nexthdr is taken to be the return value.
>>
>> Signed-off-by: Tom Herbert 
>
> Reviewed-by: Shmulik Ladkani

Re: [PATCH v5 net-next 02/14] net: define gso types for IPx over IPv4 and IPv6

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 11:28 AM, Tom Herbert  wrote:
> On Mon, May 16, 2016 at 11:13 AM, Alexander Duyck
>  wrote:
>> On Mon, May 16, 2016 at 11:07 AM, Tom Herbert  wrote:
>>> On Mon, May 16, 2016 at 9:32 AM, Alexander Duyck
>>>  wrote:
>>>> On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
>>>>> This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
>>>>> SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
>>>>> NETIF_F_GSO_IPXIP6. These are used to describe IP in IP
>>>>> tunnel and what the outer protocol is. The inner protocol
>>>>> can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
>>>>> SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
>>>>> are removed (these are both instances of SKB_GSO_IPXIP4).
>>>>> SKB_GSO_IPXIP6 will be used when support for GSO with IP
>>>>> encapsulation over IPv6 is added.
>>>>>
>>>>> Signed-off-by: Tom Herbert 
>>>>> ---
>>>>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  5 ++---
>>>>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |  4 ++--
>>>>>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 +--
>>>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  3 +--
>>>>>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  3 +--
>>>>>  drivers/net/ethernet/intel/i40evf/i40evf_main.c   |  3 +--
>>>>>  drivers/net/ethernet/intel/igb/igb_main.c |  3 +--
>>>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 +--
>>>>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  3 +--
>>>>>  include/linux/netdev_features.h   | 12 ++--
>>>>>  include/linux/netdevice.h |  4 ++--
>>>>>  include/linux/skbuff.h|  4 ++--
>>>>>  net/core/ethtool.c|  4 ++--
>>>>>  net/ipv4/af_inet.c|  2 +-
>>>>>  net/ipv4/ipip.c   |  2 +-
>>>>>  net/ipv6/ip6_offload.c|  4 ++--
>>>>>  net/ipv6/sit.c|  4 ++--
>>>>>  net/netfilter/ipvs/ip_vs_xmit.c   | 17 +++--
>>>>>  18 files changed, 36 insertions(+), 47 deletions(-)
>>>>
>>>> It looks like you missed drivers/net/ethernet/intel/igb/netdev.c.  If
>>>> you don't get it then it will break the build.
>>>>
>>>
>>> I don't see that file in the netdev branch, maybe it's new?
>>
>> Nope, it has been there for a while.  It got patched to support IPIP
>> and SIT tunnels in the same patch that updated igb/igb_main.c.
>>
>
> Looks like it's
>
> drivers/net/ethernet/intel/igbvf/netdev.c
>
>> I am also looking into the other patches now.  It looks like something
>> broke hardware offloads again as I am only getting .9 Mb/s for IPv6
>> based GRE tunnels with your patches applied.  I'm trying to bisect it
>> now.
>>
> What hardware are you using?

I'm using an Intel X710, it is an i40e based NIC.

- Alex

Re: [PATCH iproute2] ip link: Add support for kernel side filtering

2016-05-16 Thread David Ahern


On 5/16/16 12:27 PM, David Ahern wrote:

> In general older kernels do not parse the attributes appended to the get
> request.


sorry, wrong wording: the attributes are parsed but ignored. I just 
checked an older 3.4 kernel tree and that is true there as well as prior 
to the kernel commit for this feature.

Re: [PATCH v5 net-next 02/14] net: define gso types for IPx over IPv4 and IPv6

2016-05-16 Thread Tom Herbert

On Mon, May 16, 2016 at 11:13 AM, Alexander Duyck
 wrote:
> On Mon, May 16, 2016 at 11:07 AM, Tom Herbert  wrote:
>> On Mon, May 16, 2016 at 9:32 AM, Alexander Duyck
>>  wrote:
>>> On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
>>>> This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
>>>> SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
>>>> NETIF_F_GSO_IPXIP6. These are used to describe IP in IP
>>>> tunnel and what the outer protocol is. The inner protocol
>>>> can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
>>>> SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
>>>> are removed (these are both instances of SKB_GSO_IPXIP4).
>>>> SKB_GSO_IPXIP6 will be used when support for GSO with IP
>>>> encapsulation over IPv6 is added.
>>>>
>>>> Signed-off-by: Tom Herbert 
>>>> ---
>>>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  5 ++---
>>>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |  4 ++--
>>>>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 +--
>>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  3 +--
>>>>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  3 +--
>>>>  drivers/net/ethernet/intel/i40evf/i40evf_main.c   |  3 +--
>>>>  drivers/net/ethernet/intel/igb/igb_main.c |  3 +--
>>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 +--
>>>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  3 +--
>>>>  include/linux/netdev_features.h   | 12 ++--
>>>>  include/linux/netdevice.h |  4 ++--
>>>>  include/linux/skbuff.h|  4 ++--
>>>>  net/core/ethtool.c|  4 ++--
>>>>  net/ipv4/af_inet.c|  2 +-
>>>>  net/ipv4/ipip.c   |  2 +-
>>>>  net/ipv6/ip6_offload.c|  4 ++--
>>>>  net/ipv6/sit.c|  4 ++--
>>>>  net/netfilter/ipvs/ip_vs_xmit.c   | 17 +++--
>>>>  18 files changed, 36 insertions(+), 47 deletions(-)
>>>
>>> It looks like you missed drivers/net/ethernet/intel/igb/netdev.c.  If
>>> you don't get it then it will break the build.
>>>
>>
>> I don't see that file in the netdev branch, maybe it's new?
>
> Nope, it has been there for a while.  It got patched to support IPIP
> and SIT tunnels in the same patch that updated igb/igb_main.c.
>

Looks like it's

drivers/net/ethernet/intel/igbvf/netdev.c

> I am also looking into the other patches now.  It looks like something
> broke hardware offloads again as I am only getting .9 Mb/s for IPv6
> based GRE tunnels with your patches applied.  I'm trying to bisect it
> now.
>
What hardware are you using?

Tom

> - Alex

Re: [PATCH iproute2] ip link: Add support for kernel side filtering

2016-05-16 Thread David Ahern


On 5/16/16 12:19 PM, Stephen Hemminger wrote:

> On Wed, 11 May 2016 06:51:58 -0700
> David Ahern  wrote:
>
>> Kernel gained support for filtering link dumps with commit dc599f76c22b
>> ("net: Add support for filtering link dump by master device and kind").
>> Add support to ip link command. If a user passes master device or
>> kind to ip link command they are added to the link dump request message.
>>
>> Signed-off-by: David Ahern 
>> ---
>>  include/libnetlink.h |  6 ++
>>  ip/ipaddress.c   | 33 -
>>  lib/libnetlink.c | 28 
>>  3 files changed, 66 insertions(+), 1 deletion(-)
>
> Was this tested on older kernels?  Don't want to add something that breaks
> when run on old kernels that are in stable distros.



Yes. Not really far back but older 4.x kernels.

In general older kernels do not parse the attributes appended to the get 
request. This is very similar to the neigh filter added by 
b8c753245bad3f13a03b105b724ff406d278c753.

Re: [PATCH] phy dp83867: depend on CONFIG_OF_MDIO

2016-05-16 Thread Dan Murphy

Alex

On 05/16/2016 12:57 PM, Alexander Graf wrote:
> Hi Dan,
>
> On 16.05.16 15:38, Dan Murphy wrote:
>> Alexander
>>
>> On 05/16/2016 06:28 AM, Alexander Graf wrote:
>>> The DP83867 phy driver doesn't actually work when CONFIG_OF_MDIO isn't 
>>> enabled.
>>> It simply passes the device tree test, but leaves all internal configuration
>>> initialized at 0. Then it configures the phy with those values and renders a
>>> previously working configuration useless.
>>>
>>> This patch makes sure that we only build the DP83867 phy code when
>>> CONFIG_OF_MDIO is set, to not run into that problem.
>>>
>>> Signed-off-by: Alexander Graf 
>>> ---
>>>  drivers/net/phy/Kconfig   | 1 +
>>>  drivers/net/phy/dp83867.c | 7 ---
>>>  2 files changed, 1 insertion(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
>>> index 6dad9a9..4265ad5 100644
>>> --- a/drivers/net/phy/Kconfig
>>> +++ b/drivers/net/phy/Kconfig
>>> @@ -148,6 +148,7 @@ config DP83848_PHY
>>>  
>>>  config DP83867_PHY
>>> tristate "Drivers for Texas Instruments DP83867 Gigabit PHY"
>>> +   depends on OF_MDIO
>>> ---help---
>>>   Currently supports the DP83867 PHY.
>>>  
>>> diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
>>> index 2afa61b..ff867ba 100644
>>> --- a/drivers/net/phy/dp83867.c
>>> +++ b/drivers/net/phy/dp83867.c
>>> @@ -99,7 +99,6 @@ static int dp83867_config_intr(struct phy_device *phydev)
>>> return phy_write(phydev, MII_DP83867_MICR, micr_status);
>>>  }
>>>  
>>> -#ifdef CONFIG_OF_MDIO
>>>  static int dp83867_of_init(struct phy_device *phydev)
>>>  {
>>> struct dp83867_private *dp83867 = phydev->priv;
>>> @@ -123,12 +122,6 @@ static int dp83867_of_init(struct phy_device *phydev)
>>> return of_property_read_u32(of_node, "ti,fifo-depth",
>>>                                &dp83867->fifo_depth);
>>>  }
>>> -#else
>>> -static int dp83867_of_init(struct phy_device *phydev)
>>> -{
>>> -   return 0;
>>> -}
>>> -#endif /* CONFIG_OF_MDIO */
>>>  
>>>  static int dp83867_config_init(struct phy_device *phydev)
>>>  {
>> I don't think we want this to depend solely on OF_MDIO.
>>
>> The #else case should probably be coded to look at platform data, if
>> it exists.  I don't have any boards that still used platform data to test 
>> this
>> out so I did not feel comfortable adding code I could not test.
> Since there was no code to look at platform data, those boards would be
> broken just as well today, no? So at the end of the day, this change
> should be no regression for them.

As Andrew pointed out, if you are not using RGMII you don't need the internal
delay or fifo_depth, so making the driver depend on OF_MDIO does not make
sense.

The DP83867 RGMII tx and rx delays and FIFO depth should really be changed to
optional parameters and only programmed if set.

Dan
>
> Alex


-- 
--
Dan Murphy

Re: [PATCH iproute2 -next] ingress, clsact: don't add TCA_OPTIONS to nl msg

2016-05-16 Thread Stephen Hemminger

On Sun, 15 May 2016 18:36:03 +0200
Daniel Borkmann  wrote:

> In ingress and clsact qdisc TCA_OPTIONS are ignored, since it's
> parameterless. In tc, we add an empty addattr_l(... TCA_OPTIONS,
> NULL, 0) to the netlink message nevertheless. This has the
> side effect that when someone tries a 'tc qdisc replace' and
> already an existing such qdisc is present, tc fails with
> EINVAL here.
> 
> Reason is that in the kernel, this invokes qdisc_change() when
> such requested qdisc is already present. When TCA_OPTIONS are
> passed to modify parameters, it looks whether qdisc implements
> .change() callback, and if not present (like in both cases here)
> it returns with error. Rather than adding an empty stub to the
> kernel that ignores TCA_OPTIONS again, just don't add TCA_OPTIONS
> to the netlink message in the first place.
> 
> Before:
> 
>   # tc qdisc replace dev foo clsact# first try
>   # tc qdisc replace dev foo clsact# second one
>   RTNETLINK answers: Invalid argument
> 
> After:
> 
>   # tc qdisc replace dev foo clsact
>   # tc qdisc replace dev foo clsact
>   # tc qdisc replace dev foo clsact
> 
> Signed-off-by: Daniel Borkmann 
> ---
>  tc/q_clsact.c  | 1 -
>  tc/q_ingress.c | 1 -
>  2 files changed, 2 deletions(-)

Applied to net-next
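
The failure mode described in the quoted commit message can be modelled
in a few lines. This is a user-space sketch under assumptions:
struct qdisc_ops_model and qdisc_change_model() are made-up names that
only mimic the decision described above (options present, but no
.change() callback), not the kernel function itself.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a qdisc's operations table. */
struct qdisc_ops_model {
	int (*change)(void);	/* NULL for parameterless qdiscs */
};

/*
 * Models the "replace an existing qdisc" path: options in the message
 * require a .change() callback, otherwise the request is rejected.
 */
int qdisc_change_model(const struct qdisc_ops_model *ops,
		       bool has_tca_options)
{
	if (has_tca_options) {
		if (!ops->change)
			return -EINVAL;
		return ops->change();
	}
	return 0;	/* nothing to change, replace succeeds */
}
```

For a parameterless qdisc such as clsact, a request carrying an empty
TCA_OPTIONS attribute fails with -EINVAL in this model, while the same
request without the attribute succeeds, which is why dropping the
attribute in tc is sufficient.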

Re: [iproute2 net-next repost 1/2] devlink: implement shared buffer support

2016-05-16 Thread Stephen Hemminger

On Sat, 14 May 2016 15:21:01 +0200
Jiri Pirko  wrote:

> From: Jiri Pirko 
> 
> Implement kernel devlink shared buffer interface. Introduce new object
> "sb" and allow to browse the shared buffer parameters and also change
> configuration.
> 
> Signed-off-by: Jiri Pirko 
> ---
>  devlink/devlink.c | 653 
> +-
>  1 file changed, 652 insertions(+), 1 deletion(-)

Both applied to net-next

Re: [PATCH iproute2] ip link: Add support for kernel side filtering

2016-05-16 Thread Stephen Hemminger

On Wed, 11 May 2016 06:51:58 -0700
David Ahern  wrote:

> Kernel gained support for filtering link dumps with commit dc599f76c22b
> ("net: Add support for filtering link dump by master device and kind").
> Add support to ip link command. If a user passes master device or
> kind to ip link command they are added to the link dump request message.
> 
> Signed-off-by: David Ahern 
> ---
>  include/libnetlink.h |  6 ++
>  ip/ipaddress.c   | 33 -
>  lib/libnetlink.c | 28 
>  3 files changed, 66 insertions(+), 1 deletion(-)
> 

Was this tested on older kernels?  Don't want to add something that breaks
when run on old kernels that are in stable distros.

Re: [PATCH v5 net-next 03/14] ipv6: Fix nexthdr for reinjection

2016-05-16 Thread Shmulik Ladkani

Hi,

On Sun, 15 May 2016 16:42:24 -0700 Tom Herbert  wrote:
> In ip6_input_finish the nexthdr protocol is retrieved from the
> next header offset that is returned in the cb of the skb.
> This method does not work for UDP encapsulation that may not
> even have a concept of a nexthdr field (e.g. FOU).
> 
> This patch checks for a final protocol (INET6_PROTO_FINAL) when a
> protocol handler returns > 1. If the protocol is not final then

If you respin due to other reasons:  s/> 1/> 0/

> resubmission is performed on nhoff value. If the protocol is final
> then the nexthdr is taken to be the return value.
> 
> Signed-off-by: Tom Herbert 

Reviewed-by: Shmulik Ladkani

Re: [iproute2 PATCH 1/1] tc fix ife late binding

2016-05-16 Thread Stephen Hemminger

On Sun,  8 May 2016 11:28:49 -0400
Jamal Hadi Salim  wrote:

> From: Jamal Hadi Salim 
> 
> following late action binding didn't work:
> 
> sudo tc actions add action ife encode \
> type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1
> 
> sudo tc filter add dev lo parent : protocol ip prio 2 u32\
> match ip src 127.0.0.2/32 flowid 1:2 action ife index 1
> 
> Signed-off-by: Jamal Hadi Salim 

Ok, applied all the ife patches (for 4.6)

Re: iwlwifi: mvm: add reorder buffer per queue

2016-05-16 Thread Dave Taht

I can't even describe how much I hate the concept of the reorder
buffer in general. Ordering is the endpoint's problem.

Someday, after we get fq_codeled, short queues again, I'll be able to show why.

On Mon, May 16, 2016 at 4:41 AM, Luca Coelho  wrote:
> On Fri, 2016-05-13 at 11:54 +0300, Dan Carpenter wrote:
>> Hello Sara Sharon,
>>
>> The patch b915c10174fb: "iwlwifi: mvm: add reorder buffer per queue"
>> from Mar 23, 2016, leads to the following static checker warnings:
>>
>>   drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:912
>> iwl_mvm_rx_mpdu_mq()
>>   error: potential NULL dereference 'sta'.
>>
>>   drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c:912
>> iwl_mvm_rx_mpdu_mq()
>>   error: we previously assumed 'sta' could be null (see line 796)
>
> Thanks for the analysis and report, Dan!
>
> I have queued a fix for this through our internal tree.
>
> --
> Cheers,
> Luca.



-- 
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org

Re: [PATCH v5 net-next 02/14] net: define gso types for IPx over IPv4 and IPv6

2016-05-16 Thread Alexander Duyck

On Mon, May 16, 2016 at 11:07 AM, Tom Herbert  wrote:
> On Mon, May 16, 2016 at 9:32 AM, Alexander Duyck
>  wrote:
>> On Sun, May 15, 2016 at 4:42 PM, Tom Herbert  wrote:
>>> This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
>>> SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
>>> NETIF_F_GSO_IPXIP6. These are used to describe IP in IP
>>> tunnel and what the outer protocol is. The inner protocol
>>> can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
>>> SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
>>> are removed (these are both instances of SKB_GSO_IPXIP4).
>>> SKB_GSO_IPXIP6 will be used when support for GSO with IP
>>> encapsulation over IPv6 is added.
>>>
>>> Signed-off-by: Tom Herbert 
>>> ---
>>>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c  |  5 ++---
>>>  drivers/net/ethernet/broadcom/bnxt/bnxt.c |  4 ++--
>>>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  3 +--
>>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  3 +--
>>>  drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  3 +--
>>>  drivers/net/ethernet/intel/i40evf/i40evf_main.c   |  3 +--
>>>  drivers/net/ethernet/intel/igb/igb_main.c |  3 +--
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 +--
>>>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |  3 +--
>>>  include/linux/netdev_features.h   | 12 ++--
>>>  include/linux/netdevice.h |  4 ++--
>>>  include/linux/skbuff.h|  4 ++--
>>>  net/core/ethtool.c|  4 ++--
>>>  net/ipv4/af_inet.c|  2 +-
>>>  net/ipv4/ipip.c   |  2 +-
>>>  net/ipv6/ip6_offload.c|  4 ++--
>>>  net/ipv6/sit.c|  4 ++--
>>>  net/netfilter/ipvs/ip_vs_xmit.c   | 17 +++--
>>>  18 files changed, 36 insertions(+), 47 deletions(-)
>>
>> It looks like you missed drivers/net/ethernet/intel/igb/netdev.c.  If
>> you don't get it then it will break the build.
>>
>
> I don't see that file in the netdev branch, maybe it's new?

Nope, it has been there for a while.  It got patched to support IPIP
and SIT tunnels in the same patch that updated igb/igb_main.c.

I am also looking into the other patches now.  It looks like something
broke hardware offloads again as I am only getting .9 Mb/s for IPv6
based GRE tunnels with your patches applied.  I'm trying to bisect it
now.

- Alex

Re: pull-request: wireless-drivers-next 2016-05-13

2016-05-16 Thread Coelho, Luciano

On Mon, 2016-05-16 at 17:08 +0300, Kalle Valo wrote:
> Kalle Valo  writes:
> 
> > 
> > Kalle Valo  writes:
> > 
> > > 
> > > The following changes since commit
> > > ede00a5ceb4d903a8c137a52bb77d574baaef8bd:
> > > 
> > >   Merge tag 'wireless-drivers-next-for-davem-2016-05-02' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-
> > > drivers-next (2016-05-03 00:35:16 -0400)
> > > 
> > > are available in the git repository at:
> > > 
> > > 
> > >   git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-
> > > drivers-next.git tags/wireless-drivers-next-for-davem-2016-05-13
> > Please don't pull this yet, there might be something wrong now with
> > merges and need to check that first.
> Ok, like discussed in thread "linux-next: manual merge of the
> wireless-drivers-next tree with the net-next tree" there seems to be
> a
> problem on net-next in function iwl_mvm_set_tx_cmd(). Here is how I
> propose to fix this.
> 
> When pulling the tag above you should get a conflict like this:
> 
> diff --cc drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> index 880210917a6f,779bafcbc9a1..
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
> @@@ -294,7 -295,7 +294,11 @@@ void iwl_mvm_set_tx_cmd(struct iwl_mvm 
> tx_cmd->tx_flags = cpu_to_le32(tx_flags);
> /* Total # bytes to be transmitted */
> tx_cmd->len = cpu_to_le16((u16)skb->len +
> ++<<<<<<< HEAD
>  +  (uintptr_t)info->driver_data[0]);
> ++=======
> +   (uintptr_t)skb_info->driver_data[0]);
> ++>>>>>>> master
> tx_cmd->life_time = cpu_to_le32(TX_CMD_LIFE_TIME_INFINITE);
> tx_cmd->sta_id = sta_id;
> 
> Pick the latter with skb_info and then add skb_info to the beginning
> of
> the same function. So the function should be:
> 
> void iwl_mvm_set_tx_cmd(struct iwl_mvm *mvm, struct sk_buff *skb,
>   struct iwl_tx_cmd *tx_cmd,
>   struct ieee80211_tx_info *info, u8 sta_id)
> {
>   struct ieee80211_tx_info *skb_info = IEEE80211_SKB_CB(skb);
>   struct ieee80211_hdr *hdr = (void *)skb->data;
>   __le16 fc = hdr->frame_control;
>   u32 tx_flags = le32_to_cpu(tx_cmd->tx_flags);
>   u32 len = skb->len + FCS_LEN;
>   u8 ac;
> 
> [...]
> 
>   tx_cmd->tx_flags = cpu_to_le32(tx_flags);
>   /* Total # bytes to be transmitted */
>   tx_cmd->len = cpu_to_le16((u16)skb->len +
>   (uintptr_t)skb_info->driver_data[0]);
>   tx_cmd->life_time = cpu_to_le32(TX_CMD_LIFE_TIME_INFINITE);
>   tx_cmd->sta_id = sta_id;
> 
> Sorry about the hassle and please let me know if you have any
> problems.
> Adding Luca and Emmanuel just in case I missed something.

ACK.  This looks correct.  I just diffed the iwlwifi-next.git tree (at
commit a525d0eab17d -- which is where I merge iwlwifi-fixes into
iwlwifi-next) with net-next.git master and the difference [1] is
exactly what you proposed to fix.

[1] http://pastebin.coelho.fi/1b6907cdb9a25413.txt

--
Cheers,
Luca.
