Re: [PATCH] openvswitch: allow management from inside user namespaces

2016-01-29 Thread Eric W. Biederman
Tycho Andersen  writes:

> Operations with the GENL_ADMIN_PERM flag fail permissions checks because
> this flag means we call netlink_capable, which uses the init user ns.
>
> Instead, let's do permissions checks in each function, but use the netlink
> socket's user ns instead of the initial one, to allow management of
> openvswitch resources from inside a user ns.
>
> The motivation for this is to be able to run openvswitch in unprivileged
> containers. I've tested this and it seems to work, but I really have no
> idea about the security consequences of this patch, so thoughts would be
> much appreciated.

So at a quick look using ns_capable this way is probably buggy.

netlink is goofy (because historically we got this wrong), and I forget
what the specific rules are.  The general rule is that you need to do
your permission checks on open/create/connect and not inside send/write
while processing data.  Otherwise there is a class of privileged
applications where you can set their stdout to some precreated file
descriptor and their output can be made to act as a command, bypassing
your permission checks.

Eric

> Reported-by: James Page 
> Signed-off-by: Tycho Andersen 
> CC: Eric Biederman 
> CC: Pravin Shelar 
> CC: Justin Pettit 
> CC: "David S. Miller" 
> ---
>  net/openvswitch/datapath.c | 63 
> ++
>  1 file changed, 53 insertions(+), 10 deletions(-)
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index deadfda..aacfb11 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -557,6 +557,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
> struct genl_info *info)
>   int err;
>   bool log = !a[OVS_PACKET_ATTR_PROBE];
>  
> + err = -EPERM;
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + goto err;
> +
>   err = -EINVAL;
>   if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
>   !a[OVS_PACKET_ATTR_ACTIONS])
> @@ -654,7 +658,7 @@ static const struct nla_policy 
> packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
>  
>  static const struct genl_ops dp_packet_genl_ops[] = {
>   { .cmd = OVS_PACKET_CMD_EXECUTE,
> -   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
> +   .flags = 0,
> .policy = packet_policy,
> .doit = ovs_packet_cmd_execute
>   }
> @@ -920,6 +924,10 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   int error;
>   bool log = !a[OVS_FLOW_ATTR_PROBE];
>  
> + error = -EPERM;
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + goto error;
> +
>   /* Must have key and actions. */
>   error = -EINVAL;
>   if (!a[OVS_FLOW_ATTR_KEY]) {
> @@ -1104,6 +1112,10 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, 
> struct genl_info *info)
>   bool log = !a[OVS_FLOW_ATTR_PROBE];
>   bool ufid_present;
>  
> + error = -EPERM;
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + goto error;
> +
>   /* Extract key. */
>   error = -EINVAL;
>   if (!a[OVS_FLOW_ATTR_KEY]) {
> @@ -1274,6 +1286,9 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct 
> genl_info *info)
>   bool log = !a[OVS_FLOW_ATTR_PROBE];
>   bool ufid_present;
>  
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + return -EPERM;
> +
>   ufid_present = ovs_nla_get_ufid(, a[OVS_FLOW_ATTR_UFID], log);
>   if (a[OVS_FLOW_ATTR_KEY]) {
>   ovs_match_init(, , NULL);
> @@ -1391,12 +1406,12 @@ static const struct nla_policy 
> flow_policy[OVS_FLOW_ATTR_MAX + 1] = {
>  
>  static const struct genl_ops dp_flow_genl_ops[] = {
>   { .cmd = OVS_FLOW_CMD_NEW,
> -   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
> +   .flags = 0,
> .policy = flow_policy,
> .doit = ovs_flow_cmd_new
>   },
>   { .cmd = OVS_FLOW_CMD_DEL,
> -   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
> +   .flags = 0,
> .policy = flow_policy,
> .doit = ovs_flow_cmd_del
>   },
> @@ -1407,7 +1422,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
> .dumpit = ovs_flow_cmd_dump
>   },
>   { .cmd = OVS_FLOW_CMD_SET,
> -   .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
> +   .flags = 0,
> .policy = flow_policy,
> .doit = ovs_flow_cmd_set,
>   },
> @@ -1530,8 +1545,12 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
> genl_info *info)
>   struct datapath *dp;
>   struct vport *vport;
>   struct ovs_net *ovs_net;
> + struct net *net = sock_net(skb->sk);
>   int err, i;
>  
> + if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> + return -EPERM;
> +
>   

Re: [PATCH net] ipv4: early demux should be aware of fragments

2016-01-29 Thread Eric Dumazet
On Wed, 2016-01-27 at 17:02 -0500, David Miller wrote:
> From: Eric Dumazet 
> Date: Tue, 26 Jan 2016 16:59:42 -0800
> 
> > From: Eric Dumazet 
> > 
> > We should not assume a valid protocol header is present,
> > as this is not the case for IPv4 fragments.
> > 
> > Lets avoid extra cache line misses and potential bugs
> > if we actually find a socket and incorrectly uses its dst.
> > 
> > Signed-off-by: Eric Dumazet 
> 
> Applied, thanks.

David, I do not see this patch in net tree ?

Thanks !




Re: [PATCH net] net: dsa: mv88e6xxx: fix port VLAN maps

2016-01-29 Thread Kevin Smith
On 01/28/2016 08:18 PM, Andrew Lunn wrote:
> On Thu, Jan 28, 2016 at 04:54:37PM -0500, Vivien Didelot wrote:
>> Currently the port based VLAN maps should be configured to allow every
>> port to egress frames on all other ports, except themselves.
>>
>> The debugfs interface shows that they are misconfigured. For instance, a
>> 7-port switch has the following content in the related register 0x06:
>>
>> GLOBAL GLOBAL2 SERDES   0123456
>>  ...
>>  6:  1fa41f0f   4   7f   7e   7d   7c   7b   7a   79
>>  ...
>>
>> This means that port 3 is allowed to talk to port 2-6, but cannot talk
>> to ports 0 and 1. With this fix, port 3 can correctly talk to all ports
>> except 3 itself:
>>
>> GLOBAL GLOBAL2 SERDES   0123456
>>  ...
>>  6:  1fa41f0f   4   7e   7d   7b   77   6f   5f   3f
>>  ...
>>
>> Fixes: ede8098d0fef ("net: dsa: mv88e6xxx: bridges do not need an FID")
>> Reported-by: Kevin Smith 
>> Signed-off-by: Vivien Didelot 
> Reviewed-by: Andrew Lunn 
Tested-by: Kevin Smith 


Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread Marcelo Ricardo Leitner
On Fri, Jan 29, 2016 at 11:57:46AM +0100, Michael Tuexen wrote:
> > On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner 
> >  wrote:
> > 
> > On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote:
> >> 
> >>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner 
> >>>  wrote:
> >>> 
> >>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
> > On 28 Jan 2016, at 14:51, David Laight  wrote:
> > 
> > From: Marcelo Ricardo Leitner
> >> Sent: 27 January 2016 17:07
> >> This patchset is merely a RFC for the moment. There are some
> >> controversial points that I'd like to discuss before actually proposing
> >> the patches.
> > 
> > You also need to look at how a 'user' can actually get SCTP to
> > merge data chunks in the first place.
> > 
> > With Nagle disabled (and it probably has to be since the data flow
> > is unlikely to be 'command-response' or 'unidirectional bulk')
> > it is currently almost impossible to get more than one chunk
> > into an ethernet frame.
> > 
> > Support for MSG_MORE would help.
>  What about adding support for the explicit EOR mode as specified in
>  https://tools.ietf.org/html/rfc6458#section-8.1.26
> >>> 
> >>> Seizing the moment to clarify my understanding on that. :)
> >>> Such multiple calls to send system calls will result in a single data
> >>> chunk. Is that so? That's what I get from that text and also from this
> >> No. It results in a single user message. This means you can send
> >> a user message larger than the send buffer size. How the user message
> >> is fragmented in DATA chunks is transparent to the upper layer.
> >> 
> >> Does this make things clearer?
> > 
> > I think so, yes. So it allows delaying setting the Ending fragment bit
> > until the application set SCTP_EOR. All the rest before this stays as
> > before: first send() will generate a chunk with Beginning bit set and
> > may generate some other middle-fragments (no B nor E bit set) if
> > necessary, second to N-1 call to send will generate only middle
> > fragments, while the last send, with SCTP_EOF, will then set the Ending
> > fragment in the last one. Right?
> Yes. But there are no restrictions on the user data provided in send()
> calls and DATA chunks. So you can
> send(10 byte, no SCTP_EOR)
> resulting in one DATA chunk with the B bit, several with no B and no E bit.
> send(10 byte, no SCTP_EOR)
> resulting in several chunks with no B and no E bit.
> send(10 byte, SCTP_EOR)
> resulting in several chunks with no B and no E bit and one (the last) chunk
> with the E bit.
> 
> On the other hand you can do
> send(1 byte, no SCTP_EOR)
> resulting in a single DATA chunk with the E bit set.
> send(1 byte, no SCTP_EOR)
> send(1 byte, no SCTP_EOR)
> send(1 byte, no SCTP_EOR)
> send(1 byte, no SCTP_EOR)
> send(1 byte, no SCTP_EOR)
> All resulting in a single DATA chunk with 5 bytes user data and no B or E bit.
> (For example if Nagle is enabled and only after the last send call the SACK 
> arrives).
> send(1 byte, SCTP_EOR)
> results in a single DATA chunk with the E bist set.

Cool, thanks Michael. It will be quite fun to mix this with MSG_MORE
logic, I think :)

Best regards,
Marcelo



[PATCH v2 1/7] net: moxart: use correct accessors for DMA memory

2016-01-29 Thread Arnd Bergmann
The moxart ethernet driver confuses coherent DMA buffers with
MMIO registers.

moxart_ether.c: In function 'moxart_mac_setup_desc_ring':
moxart_ether.c:146:428: error: passing argument 1 of '__fswab32' makes integer 
from pointer without a cast [-Werror=int-conversion]
moxart_ether.c:74:39: warning: incorrect type in argument 3 (different address 
spaces)
moxart_ether.c:74:39:expected void *cpu_addr
moxart_ether.c:74:39:got void [noderef] *tx_desc_base

This leaves the basic logic alone and uses normal pointers for
the virtual address of the descriptor. As we cannot use readl/writel
to access them, we also introduce our own moxart_desc_read
moxart_desc_write helpers that perform the same endianess swap
as the original code, but without the address space conversion.

The barriers are made explicit here where needed: Even in the worst-case
scenario, we just have to use a rmb() after checking ownership so
we don't read any input data before we are sure it is value, and we
use wmb() before transferring ownership back to the device.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/moxa/moxart_ether.c | 46 +---
 drivers/net/ethernet/moxa/moxart_ether.h |  4 +--
 2 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/moxa/moxart_ether.c 
b/drivers/net/ethernet/moxa/moxart_ether.c
index a10c928bbd6b..00cfd95ca59d 100644
--- a/drivers/net/ethernet/moxa/moxart_ether.c
+++ b/drivers/net/ethernet/moxa/moxart_ether.c
@@ -28,6 +28,16 @@
 
 #include "moxart_ether.h"
 
+static inline void moxart_desc_write(u32 data, u32 *desc)
+{
+   *desc = cpu_to_le32(data);
+}
+
+static inline u32 moxart_desc_read(u32 *desc)
+{
+   return le32_to_cpu(*desc);
+}
+
 static inline void moxart_emac_write(struct net_device *ndev,
 unsigned int reg, unsigned long value)
 {
@@ -112,7 +122,7 @@ static void moxart_mac_enable(struct net_device *ndev)
 static void moxart_mac_setup_desc_ring(struct net_device *ndev)
 {
struct moxart_mac_priv_t *priv = netdev_priv(ndev);
-   void __iomem *desc;
+   void *desc;
int i;
 
for (i = 0; i < TX_DESC_NUM; i++) {
@@ -121,7 +131,7 @@ static void moxart_mac_setup_desc_ring(struct net_device 
*ndev)
 
priv->tx_buf[i] = priv->tx_buf_base + priv->tx_buf_size * i;
}
-   writel(TX_DESC1_END, desc + TX_REG_OFFSET_DESC1);
+   moxart_desc_write(TX_DESC1_END, desc + TX_REG_OFFSET_DESC1);
 
priv->tx_head = 0;
priv->tx_tail = 0;
@@ -129,8 +139,8 @@ static void moxart_mac_setup_desc_ring(struct net_device 
*ndev)
for (i = 0; i < RX_DESC_NUM; i++) {
desc = priv->rx_desc_base + i * RX_REG_DESC_SIZE;
memset(desc, 0, RX_REG_DESC_SIZE);
-   writel(RX_DESC0_DMA_OWN, desc + RX_REG_OFFSET_DESC0);
-   writel(RX_BUF_SIZE & RX_DESC1_BUF_SIZE_MASK,
+   moxart_desc_write(RX_DESC0_DMA_OWN, desc + RX_REG_OFFSET_DESC0);
+   moxart_desc_write(RX_BUF_SIZE & RX_DESC1_BUF_SIZE_MASK,
   desc + RX_REG_OFFSET_DESC1);
 
priv->rx_buf[i] = priv->rx_buf_base + priv->rx_buf_size * i;
@@ -141,12 +151,12 @@ static void moxart_mac_setup_desc_ring(struct net_device 
*ndev)
if (dma_mapping_error(>dev, priv->rx_mapping[i]))
netdev_err(ndev, "DMA mapping error\n");
 
-   writel(priv->rx_mapping[i],
+   moxart_desc_write(priv->rx_mapping[i],
   desc + RX_REG_OFFSET_DESC2 + RX_DESC2_ADDRESS_PHYS);
-   writel(priv->rx_buf[i],
+   moxart_desc_write((uintptr_t)priv->rx_buf[i],
   desc + RX_REG_OFFSET_DESC2 + RX_DESC2_ADDRESS_VIRT);
}
-   writel(RX_DESC1_END, desc + RX_REG_OFFSET_DESC1);
+   moxart_desc_write(RX_DESC1_END, desc + RX_REG_OFFSET_DESC1);
 
priv->rx_head = 0;
 
@@ -201,14 +211,15 @@ static int moxart_rx_poll(struct napi_struct *napi, int 
budget)
  napi);
struct net_device *ndev = priv->ndev;
struct sk_buff *skb;
-   void __iomem *desc;
+   void *desc;
unsigned int desc0, len;
int rx_head = priv->rx_head;
int rx = 0;
 
while (rx < budget) {
desc = priv->rx_desc_base + (RX_REG_DESC_SIZE * rx_head);
-   desc0 = readl(desc + RX_REG_OFFSET_DESC0);
+   desc0 = moxart_desc_read(desc + RX_REG_OFFSET_DESC0);
+   rmb(); /* ensure desc0 is up to date */
 
if (desc0 & RX_DESC0_DMA_OWN)
break;
@@ -250,7 +261,8 @@ static int moxart_rx_poll(struct napi_struct *napi, int 
budget)
priv->stats.multicast++;
 
 rx_next:
-   writel(RX_DESC0_DMA_OWN, desc + RX_REG_OFFSET_DESC0);
+   wmb(); /* prevent setting ownership back too early */
+

Re: [PATCH net v2 2/2] ipv6/udp: use sticky pktinfo egress ifindex on connect()

2016-01-29 Thread Hannes Frederic Sowa

On 29.01.2016 12:30, Paolo Abeni wrote:

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni 


Acked-by: Hannes Frederic Sowa 




Re: [PATCH net v2 1/2] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

2016-01-29 Thread Hannes Frederic Sowa

On 29.01.2016 12:30, Paolo Abeni wrote:

The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni 
--
v1 -> v2 move the ip6_route_output() implementation into the header


Acked-by: Hannes Frederic Sowa 



Yours Truely Mrs Machiko Kumiko

2016-01-29 Thread Mrs Machiko Kumiko
Hi Dear, 

I am Mrs Machiko Kumiko, from Japan, I Have Been Diagnosed with esophageal 
Cancer. 
I Have Chosen you to distribute my Funds to Charities homes in your Country, 
so if you wish to Carry out this humanitarian Work kindly Get back to me for 
FURTHER details. 

Yours Truely Mrs Machiko Kumiko


[PATCH net v2 0/2] pv6: fix sticky pktinfo behaviour

2016-01-29 Thread Paolo Abeni
Currently:

ip addr add dev eth0 2001:0010::1/64
ip addr add dev eth1 2001:0020::1/64
ping6 -I eth0 2001:0020::2

do not lead to the expected results, i.e. eth1 is used as the
egress interface.

This is due to two related issues in handling sticky pktinfo,
used by ping6 to enforce the device binding:

- ip6_dst_lookup_flow()/ip6_dst_lookup_tail() do not really enforce
flowi6_oif match
- ipv6 udp connect() just ignore flowi6_oif

These patches address each issue individually.

The kernel has never enforced the egress interface specified
via the sticky pktinfo, except briefly between the commits
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
and
d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"),
but the ping6 tools was unaffected up to iputils-20100214,
since before it used SO_BINDTODEVICE to enforce the egress
interface.

Paolo Abeni (2):
  ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
  ipv6/udp: use sticky pktinfo egress ifindex on connect()

 include/net/ip6_route.h | 12 ++--
 net/ipv6/datagram.c |  3 +++
 net/ipv6/ip6_output.c   |  6 +-
 net/ipv6/route.c|  7 +++
 4 files changed, 21 insertions(+), 7 deletions(-)

-- 
1.8.3.1



[PATCH net v2 2/2] ipv6/udp: use sticky pktinfo egress ifindex on connect()

2016-01-29 Thread Paolo Abeni
Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni 
---
 net/ipv6/datagram.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 517c55b..4281621 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -162,6 +162,9 @@ ipv4_connected:
fl6.fl6_dport = inet->inet_dport;
fl6.fl6_sport = inet->inet_sport;
 
+   if (!fl6.flowi6_oif)
+   fl6.flowi6_oif = np->sticky_pktinfo.ipi6_ifindex;
+
if (!fl6.flowi6_oif && (addr_type_ADDR_MULTICAST))
fl6.flowi6_oif = np->mcast_oif;
 
-- 
1.8.3.1



[PATCH net v2 1/2] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

2016-01-29 Thread Paolo Abeni
The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni 
--
v1 -> v2 move the ip6_route_output() implementation into the header
---
 include/net/ip6_route.h | 12 ++--
 net/ipv6/ip6_output.c   |  6 +-
 net/ipv6/route.c|  7 +++
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index 877f682..295d291 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -64,8 +64,16 @@ static inline bool rt6_need_strict(const struct in6_addr 
*daddr)
 
 void ip6_route_input(struct sk_buff *skb);
 
-struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
-  struct flowi6 *fl6);
+struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock 
*sk,
+struct flowi6 *fl6, int flags);
+
+static inline struct dst_entry *ip6_route_output(struct net *net,
+const struct sock *sk,
+struct flowi6 *fl6)
+{
+   return ip6_route_output_flags(net, sk, fl6, 0);
+}
+
 struct dst_entry *ip6_route_lookup(struct net *net, struct flowi6 *fl6,
   int flags);
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 23de98f..a163102 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -909,6 +909,7 @@ static int ip6_dst_lookup_tail(struct net *net, const 
struct sock *sk,
struct rt6_info *rt;
 #endif
int err;
+   int flags = 0;
 
/* The correct way to handle this would be to do
 * ip6_route_get_saddr, and then ip6_route_output; however,
@@ -940,10 +941,13 @@ static int ip6_dst_lookup_tail(struct net *net, const 
struct sock *sk,
dst_release(*dst);
*dst = NULL;
}
+
+   if (fl6->flowi6_oif)
+   flags |= RT6_LOOKUP_F_IFACE;
}
 
if (!*dst)
-   *dst = ip6_route_output(net, sk, fl6);
+   *dst = ip6_route_output_flags(net, sk, fl6, flags);
 
err = (*dst)->error;
if (err)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 3c8834b..ed44663 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1183,11 +1183,10 @@ static struct rt6_info *ip6_pol_route_output(struct net 
*net, struct fib6_table
return ip6_pol_route(net, table, fl6->flowi6_oif, fl6, flags);
 }
 
-struct dst_entry *ip6_route_output(struct net *net, const struct sock *sk,
-   struct flowi6 *fl6)
+struct dst_entry *ip6_route_output_flags(struct net *net, const struct sock 
*sk,
+struct flowi6 *fl6, int flags)
 {
struct dst_entry *dst;
-   int flags = 0;
bool any_src;
 
dst = l3mdev_rt6_dst_by_oif(net, fl6);
@@ -1208,7 +1207,7 @@ struct dst_entry *ip6_route_output(struct net *net, const 
struct sock *sk,
 
return fib6_rule_lookup(net, fl6, flags, ip6_pol_route_output);
 }
-EXPORT_SYMBOL(ip6_route_output);
+EXPORT_SYMBOL_GPL(ip6_route_output_flags);
 
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry 
*dst_orig)
 {
-- 
1.8.3.1



[PATCH v2 0/7] network driver fixes

2016-01-29 Thread Arnd Bergmann
This is an updated series of fixes for the network device drivers
that showed warnings in ARM randconfig.

Changes since v1 are:

dropped "net: macb: avoid uninitialized variables", already fixed in net-next

dropped "net: fddi/defxx: avoid warning about uninitialized variable
use", already fixed in net-next

added missing barriers in "net: moxart: use correct accessors for
DMA memory"

clarified "net: bgmac: clarify CONFIG_BCMA dependency" changelog


Arnd Bergmann (7):
  net: moxart: use correct accessors for DMA memory
  net: davinci_cpdma: use dma_addr_t for DMA address
  net: hp100: remove unnecessary #ifdefs
  net: bgmac: clarify CONFIG_BCMA dependency
  net: vxge: avoid unused function warnings
  net: nb8800: avoid uninitialized variable warning
  net: tg3: avoid uninitialized variable warning

 drivers/net/ethernet/aurora/nb8800.c   |  4 +--
 drivers/net/ethernet/broadcom/Kconfig  |  5 ++-
 drivers/net/ethernet/broadcom/tg3.c|  2 +-
 drivers/net/ethernet/hp/hp100.c| 18 --
 drivers/net/ethernet/moxa/moxart_ether.c   | 46 +-
 drivers/net/ethernet/moxa/moxart_ether.h   |  4 +--
 drivers/net/ethernet/neterion/vxge/vxge-main.c | 31 +++--
 drivers/net/ethernet/ti/davinci_cpdma.c| 12 +++
 8 files changed, 57 insertions(+), 65 deletions(-)

-- 
2.7.0



Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread Michael Tuexen

> On 29 Jan 2016, at 12:26, Marcelo Ricardo Leitner  
> wrote:
> 
> On Fri, Jan 29, 2016 at 11:57:46AM +0100, Michael Tuexen wrote:
>>> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner 
>>>  wrote:
>>> 
>>> On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote:
 
> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner 
>  wrote:
> 
> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
>>> On 28 Jan 2016, at 14:51, David Laight  wrote:
>>> 
>>> From: Marcelo Ricardo Leitner
 Sent: 27 January 2016 17:07
 This patchset is merely a RFC for the moment. There are some
 controversial points that I'd like to discuss before actually proposing
 the patches.
>>> 
>>> You also need to look at how a 'user' can actually get SCTP to
>>> merge data chunks in the first place.
>>> 
>>> With Nagle disabled (and it probably has to be since the data flow
>>> is unlikely to be 'command-response' or 'unidirectional bulk')
>>> it is currently almost impossible to get more than one chunk
>>> into an ethernet frame.
>>> 
>>> Support for MSG_MORE would help.
>> What about adding support for the explicit EOR mode as specified in
>> https://tools.ietf.org/html/rfc6458#section-8.1.26
> 
> Seizing the moment to clarify my understanding on that. :)
> Such multiple calls to send system calls will result in a single data
> chunk. Is that so? That's what I get from that text and also from this
 No. It results in a single user message. This means you can send
 a user message larger than the send buffer size. How the user message
 is fragmented in DATA chunks is transparent to the upper layer.
 
 Does this make things clearer?
>>> 
>>> I think so, yes. So it allows delaying setting the Ending fragment bit
>>> until the application set SCTP_EOR. All the rest before this stays as
>>> before: first send() will generate a chunk with Beginning bit set and
>>> may generate some other middle-fragments (no B nor E bit set) if
>>> necessary, second to N-1 call to send will generate only middle
>>> fragments, while the last send, with SCTP_EOF, will then set the Ending
>>> fragment in the last one. Right?
>> Yes. But there are no restrictions on the user data provided in send()
>> calls and DATA chunks. So you can
>> send(10 byte, no SCTP_EOR)
>> resulting in one DATA chunk with the B bit, several with no B and no E bit.
>> send(10 byte, no SCTP_EOR)
>> resulting in several chunks with no B and no E bit.
>> send(10 byte, SCTP_EOR)
>> resulting in several chunks with no B and no E bit and one (the last) chunk
>> with the E bit.
>> 
>> On the other hand you can do
>> send(1 byte, no SCTP_EOR)
>> resulting in a single DATA chunk with the E bit set.
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> send(1 byte, no SCTP_EOR)
>> All resulting in a single DATA chunk with 5 bytes user data and no B or E 
>> bit.
>> (For example if Nagle is enabled and only after the last send call the SACK 
>> arrives).
>> send(1 byte, SCTP_EOR)
>> results in a single DATA chunk with the E bist set.
> 
> Cool, thanks Michael. It will be quite fun to mix this with MSG_MORE
> logic, I think :)
Don't know. In FreeBSD we do support SCTP_EOR, but not MSG_MORE, which seems
to be Linux specific.

Best regards
Michael
> 
> Best regards,
> Marcelo
> 
> 



[PATCH] openvswitch: allow management from inside user namespaces

2016-01-29 Thread Tycho Andersen
Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's do permissions checks in each function, but use the netlink
socket's user ns instead of the initial one, to allow management of
openvswitch resources from inside a user ns.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

Reported-by: James Page 
Signed-off-by: Tycho Andersen 
CC: Eric Biederman 
CC: Pravin Shelar 
CC: Justin Pettit 
CC: "David S. Miller" 
---
 net/openvswitch/datapath.c | 63 ++
 1 file changed, 53 insertions(+), 10 deletions(-)

diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index deadfda..aacfb11 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -557,6 +557,10 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
int err;
bool log = !a[OVS_PACKET_ATTR_PROBE];
 
+   err = -EPERM;
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   goto err;
+
err = -EINVAL;
if (!a[OVS_PACKET_ATTR_PACKET] || !a[OVS_PACKET_ATTR_KEY] ||
!a[OVS_PACKET_ATTR_ACTIONS])
@@ -654,7 +658,7 @@ static const struct nla_policy 
packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
 
 static const struct genl_ops dp_packet_genl_ops[] = {
{ .cmd = OVS_PACKET_CMD_EXECUTE,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = 0,
  .policy = packet_policy,
  .doit = ovs_packet_cmd_execute
}
@@ -920,6 +924,10 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
int error;
bool log = !a[OVS_FLOW_ATTR_PROBE];
 
+   error = -EPERM;
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   goto error;
+
/* Must have key and actions. */
error = -EINVAL;
if (!a[OVS_FLOW_ATTR_KEY]) {
@@ -1104,6 +1112,10 @@ static int ovs_flow_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
bool log = !a[OVS_FLOW_ATTR_PROBE];
bool ufid_present;
 
+   error = -EPERM;
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   goto error;
+
/* Extract key. */
error = -EINVAL;
if (!a[OVS_FLOW_ATTR_KEY]) {
@@ -1274,6 +1286,9 @@ static int ovs_flow_cmd_del(struct sk_buff *skb, struct 
genl_info *info)
bool log = !a[OVS_FLOW_ATTR_PROBE];
bool ufid_present;
 
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   return -EPERM;
+
ufid_present = ovs_nla_get_ufid(, a[OVS_FLOW_ATTR_UFID], log);
if (a[OVS_FLOW_ATTR_KEY]) {
ovs_match_init(, , NULL);
@@ -1391,12 +1406,12 @@ static const struct nla_policy 
flow_policy[OVS_FLOW_ATTR_MAX + 1] = {
 
 static const struct genl_ops dp_flow_genl_ops[] = {
{ .cmd = OVS_FLOW_CMD_NEW,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = 0,
  .policy = flow_policy,
  .doit = ovs_flow_cmd_new
},
{ .cmd = OVS_FLOW_CMD_DEL,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = 0,
  .policy = flow_policy,
  .doit = ovs_flow_cmd_del
},
@@ -1407,7 +1422,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
  .dumpit = ovs_flow_cmd_dump
},
{ .cmd = OVS_FLOW_CMD_SET,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = 0,
  .policy = flow_policy,
  .doit = ovs_flow_cmd_set,
},
@@ -1530,8 +1545,12 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
struct datapath *dp;
struct vport *vport;
struct ovs_net *ovs_net;
+   struct net *net = sock_net(skb->sk);
int err, i;
 
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   return -EPERM;
+
err = -EINVAL;
if (!a[OVS_DP_ATTR_NAME] || !a[OVS_DP_ATTR_UPCALL_PID])
goto err;
@@ -1655,8 +1674,12 @@ static int ovs_dp_cmd_del(struct sk_buff *skb, struct 
genl_info *info)
 {
struct sk_buff *reply;
struct datapath *dp;
+   struct net *net = sock_net(skb->sk);
int err;
 
+   if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
+   return -EPERM;
+
reply = ovs_dp_cmd_alloc_info(info);
if (!reply)
return -ENOMEM;
@@ -1688,8 +1711,12 @@ static int ovs_dp_cmd_set(struct sk_buff *skb, struct 
genl_info *info)
 {
struct sk_buff *reply;
struct datapath *dp;
+   

[PATCH v2 3/7] net: hp100: remove unnecessary #ifdefs

2016-01-29 Thread Arnd Bergmann
Building the hp100 ethernet driver causes warnings when both the PCI
and EISA drivers are disabled:

ethernet/hp/hp100.c: In function 'hp100_module_init':
ethernet/hp/hp100.c:3047:2: warning: label 'out3' defined but not used 
[-Wunused-label]
ethernet/hp/hp100.c: At top level:
ethernet/hp/hp100.c:2828:13: warning: 'cleanup_dev' defined but not used 
[-Wunused-function]

We can easily avoid the warnings and make the driver look slightly
nicer by removing the #ifdefs that check for the CONFIG_PCI and
CONFIG_EISA, as all the registration functions are designed to
have no effect when the buses are disabled.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/hp/hp100.c | 18 --
 1 file changed, 18 deletions(-)

diff --git a/drivers/net/ethernet/hp/hp100.c b/drivers/net/ethernet/hp/hp100.c
index 1d5c3e16d8f4..3daf2d4a7ca0 100644
--- a/drivers/net/ethernet/hp/hp100.c
+++ b/drivers/net/ethernet/hp/hp100.c
@@ -194,7 +194,6 @@ static const char *hp100_isa_tbl[] = {
 };
 #endif
 
-#ifdef CONFIG_EISA
 static struct eisa_device_id hp100_eisa_tbl[] = {
{ "HWPF180" }, /* HP J2577 rev A */
{ "HWP1920" }, /* HP 27248B */
@@ -205,9 +204,7 @@ static struct eisa_device_id hp100_eisa_tbl[] = {
{ "" } /* Mandatory final entry ! */
 };
 MODULE_DEVICE_TABLE(eisa, hp100_eisa_tbl);
-#endif
 
-#ifdef CONFIG_PCI
 static const struct pci_device_id hp100_pci_tbl[] = {
{PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_J2585A, PCI_ANY_ID, PCI_ANY_ID,},
{PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_J2585B, PCI_ANY_ID, PCI_ANY_ID,},
@@ -219,7 +216,6 @@ static const struct pci_device_id hp100_pci_tbl[] = {
{}  /* Terminating entry */
 };
 MODULE_DEVICE_TABLE(pci, hp100_pci_tbl);
-#endif
 
 static int hp100_rx_ratio = HP100_DEFAULT_RX_RATIO;
 static int hp100_priority_tx = HP100_DEFAULT_PRIORITY_TX;
@@ -2842,7 +2838,6 @@ static void cleanup_dev(struct net_device *d)
free_netdev(d);
 }
 
-#ifdef CONFIG_EISA
 static int hp100_eisa_probe(struct device *gendev)
 {
struct net_device *dev = alloc_etherdev(sizeof(struct hp100_private));
@@ -2884,9 +2879,7 @@ static struct eisa_driver hp100_eisa_driver = {
.remove  = hp100_eisa_remove,
 }
 };
-#endif
 
-#ifdef CONFIG_PCI
 static int hp100_pci_probe(struct pci_dev *pdev,
   const struct pci_device_id *ent)
 {
@@ -2955,7 +2948,6 @@ static struct pci_driver hp100_pci_driver = {
.probe  = hp100_pci_probe,
.remove = hp100_pci_remove,
 };
-#endif
 
 /*
  *  module section
@@ -3032,23 +3024,17 @@ static int __init hp100_module_init(void)
err = hp100_isa_init();
if (err && err != -ENODEV)
goto out;
-#ifdef CONFIG_EISA
err = eisa_driver_register(_eisa_driver);
if (err && err != -ENODEV)
goto out2;
-#endif
-#ifdef CONFIG_PCI
err = pci_register_driver(_pci_driver);
if (err && err != -ENODEV)
goto out3;
-#endif
  out:
return err;
  out3:
-#ifdef CONFIG_EISA
eisa_driver_unregister (_eisa_driver);
  out2:
-#endif
hp100_isa_cleanup();
goto out;
 }
@@ -3057,12 +3043,8 @@ static int __init hp100_module_init(void)
 static void __exit hp100_module_exit(void)
 {
hp100_isa_cleanup();
-#ifdef CONFIG_EISA
eisa_driver_unregister (_eisa_driver);
-#endif
-#ifdef CONFIG_PCI
pci_unregister_driver (_pci_driver);
-#endif
 }
 
 module_init(hp100_module_init)
-- 
2.7.0



[PATCH v2 4/7] net: bgmac: clarify CONFIG_BCMA dependency

2016-01-29 Thread Arnd Bergmann
The bgmac driver depends on BCMA_HOST_SOC, which is only used
when CONFIG_BCMA is enabled. However, it is a bool option and can
be set when CONFIG_BCMA=m, and then bgmac can be built-in, leading
to an obvious link error:

drivers/built-in.o: In function `bgmac_init':
:(.init.text+0x7f2c): undefined reference to `__bcma_driver_register'
drivers/built-in.o: In function `bgmac_exit':
:(.exit.text+0x110a): undefined reference to `bcma_driver_unregister'

To avoid this case, we need to depend on both BCMA and BCMA_SOC,
as this patch does. I'm also trying to make the dependency more
readable by splitting it into three lines, and adding a COMPILE_TEST
alternative so we can test-build it in all configurations that
support BCMA.

The added dependency on FIXED_PHY addresses a related issue where
we cannot call fixed_phy_register() when CONFIG_FIXED_PHY=m and
CONFIG_BGMAC=y.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/broadcom/Kconfig | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig 
b/drivers/net/ethernet/broadcom/Kconfig
index 8550df189ceb..19f7cd02e085 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -151,8 +151,11 @@ config BNX2X_VXLAN
 
 config BGMAC
tristate "BCMA bus GBit core support"
-   depends on BCMA_HOST_SOC && HAS_DMA && (BCM47XX || ARCH_BCM_5301X)
+   depends on BCMA && BCMA_HOST_SOC
+   depends on HAS_DMA
+   depends on BCM47XX || ARCH_BCM_5301X || COMPILE_TEST
select PHYLIB
+   select FIXED_PHY
---help---
  This driver supports GBit MAC and BCM4706 GBit MAC cores on BCMA bus.
  They can be found on BCM47xx SoCs and provide gigabit ethernet.
-- 
2.7.0



[PATCH v2 2/7] net: davinci_cpdma: use dma_addr_t for DMA address

2016-01-29 Thread Arnd Bergmann
The davinci_cpdma mixes up physical addresses as seen from the CPU
and DMA addresses as seen from a DMA master, since it can operate
on both normal memory or an on-chip buffer. If dma_addr_t is
different from phys_addr_t, this means we get a compile-time warning
about the type mismatch:

ethernet/ti/davinci_cpdma.c: In function 'cpdma_desc_pool_create':
ethernet/ti/davinci_cpdma.c:182:48: error: passing argument 3 of 
'dma_alloc_coherent' from incompatible pointer type 
[-Werror=incompatible-pointer-types]
   pool->cpumap = dma_alloc_coherent(dev, size, >phys,
In file included from ethernet/ti/davinci_cpdma.c:21:0:
dma-mapping.h:398:21: note: expected 'dma_addr_t * {aka long long unsigned int 
*}' but argument is of type 'phys_addr_t * {aka unsigned int *}'
 static inline void *dma_alloc_coherent(struct device *dev, size_t size,

This slightly restructures the code so the address we use for
mapping RAM into a DMA address is always a dma_addr_t, avoiding
the warning. The code is correct even if both types are 32-bit
because the DMA master in this device only supports 32-bit addressing
anyway, independent of the types that are used.

We still assign this value to pool->phys, and that is wrong if
the driver is ever used with an IOMMU, but that value appears to
be never used, so there is no problem really. I've added a couple
of comments about where we do things that are slightly violating
the API.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 657b65bf5cac..18bf3a8fdc50 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -82,7 +82,7 @@ struct cpdma_desc {
 
 struct cpdma_desc_pool {
phys_addr_t phys;
-   u32 hw_addr;
+   dma_addr_t  hw_addr;
void __iomem*iomap; /* ioremap map */
void*cpumap;/* dma_alloc map */
int desc_size, mem_size;
@@ -152,7 +152,7 @@ struct cpdma_chan {
  * abstract out these details
  */
 static struct cpdma_desc_pool *
-cpdma_desc_pool_create(struct device *dev, u32 phys, u32 hw_addr,
+cpdma_desc_pool_create(struct device *dev, u32 phys, dma_addr_t hw_addr,
int size, int align)
 {
int bitmap_size;
@@ -176,13 +176,13 @@ cpdma_desc_pool_create(struct device *dev, u32 phys, u32 
hw_addr,
 
if (phys) {
pool->phys  = phys;
-   pool->iomap = ioremap(phys, size);
+   pool->iomap = ioremap(phys, size); /* should be memremap? */
pool->hw_addr = hw_addr;
} else {
-   pool->cpumap = dma_alloc_coherent(dev, size, >phys,
+   pool->cpumap = dma_alloc_coherent(dev, size, >hw_addr,
  GFP_KERNEL);
-   pool->iomap = pool->cpumap;
-   pool->hw_addr = pool->phys;
+   pool->iomap = (void __iomem __force *)pool->cpumap;
+   pool->phys = pool->hw_addr; /* assumes no IOMMU, don't use this 
value */
}
 
if (pool->iomap)
-- 
2.7.0



[PATCH v2 7/7] net: tg3: avoid uninitialized variable warning

2016-01-29 Thread Arnd Bergmann
The tg3_set_eeprom() function correctly initializes the 'start' variable,
but gcc generates a false warning:

drivers/net/ethernet/broadcom/tg3.c: In function 'tg3_set_eeprom':
drivers/net/ethernet/broadcom/tg3.c:12057:4: warning: 'start' may be used 
uninitialized in this function [-Wmaybe-uninitialized]

I have not come up with a way to restructure the code in a way that
avoids the warning without making it less readable, so this adds an
initialization for the declaration to shut up that warning.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/broadcom/tg3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/tg3.c 
b/drivers/net/ethernet/broadcom/tg3.c
index 9293675df7ba..49eea8981332 100644
--- a/drivers/net/ethernet/broadcom/tg3.c
+++ b/drivers/net/ethernet/broadcom/tg3.c
@@ -12016,7 +12016,7 @@ static int tg3_set_eeprom(struct net_device *dev, 
struct ethtool_eeprom *eeprom,
int ret;
u32 offset, len, b_offset, odd_len;
u8 *buf;
-   __be32 start, end;
+   __be32 start = 0, end;
 
if (tg3_flag(tp, NO_NVRAM) ||
eeprom->magic != TG3_EEPROM_MAGIC)
-- 
2.7.0



[PATCH v2 5/7] net: vxge: avoid unused function warnings

2016-01-29 Thread Arnd Bergmann
When CONFIG_PCI_MSI is disabled, we get warnings about unused functions
in the vxge driver:

drivers/net/ethernet/neterion/vxge/vxge-main.c:2121:13: warning: 
'adaptive_coalesce_tx_interrupts' defined but not used [-Wunused-function]
drivers/net/ethernet/neterion/vxge/vxge-main.c:2149:13: warning: 
'adaptive_coalesce_rx_interrupts' defined but not used [-Wunused-function]

We could add another #ifdef here, but it's nicer to avoid those warnings
for good by converting the existing #ifdef to if(IS_ENABLED()), which has
the same effect but provides better compile-time coverage in general,
and lets the compiler understand better when the function is intentionally
unused.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/neterion/vxge/vxge-main.c | 31 ++
 1 file changed, 12 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c 
b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index 50d5604833ed..e0993eba5df3 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -2223,8 +2223,6 @@ static irqreturn_t vxge_isr_napi(int irq, void *dev_id)
return IRQ_NONE;
 }
 
-#ifdef CONFIG_PCI_MSI
-
 static irqreturn_t vxge_tx_msix_handle(int irq, void *dev_id)
 {
struct vxge_fifo *fifo = (struct vxge_fifo *)dev_id;
@@ -2442,16 +2440,13 @@ static void vxge_rem_msix_isr(struct vxgedev *vdev)
if (vdev->config.intr_type == MSI_X)
pci_disable_msix(vdev->pdev);
 }
-#endif
 
 static void vxge_rem_isr(struct vxgedev *vdev)
 {
-#ifdef CONFIG_PCI_MSI
-   if (vdev->config.intr_type == MSI_X) {
+   if (IS_ENABLED(CONFIG_PCI_MSI) &&
+   vdev->config.intr_type == MSI_X) {
vxge_rem_msix_isr(vdev);
-   } else
-#endif
-   if (vdev->config.intr_type == INTA) {
+   } else if (vdev->config.intr_type == INTA) {
synchronize_irq(vdev->pdev->irq);
free_irq(vdev->pdev->irq, vdev);
}
@@ -2460,11 +2455,10 @@ static void vxge_rem_isr(struct vxgedev *vdev)
 static int vxge_add_isr(struct vxgedev *vdev)
 {
int ret = 0;
-#ifdef CONFIG_PCI_MSI
int vp_idx = 0, intr_idx = 0, intr_cnt = 0, msix_idx = 0, irq_req = 0;
int pci_fun = PCI_FUNC(vdev->pdev->devfn);
 
-   if (vdev->config.intr_type == MSI_X)
+   if (IS_ENABLED(CONFIG_PCI_MSI) && vdev->config.intr_type == MSI_X)
ret = vxge_enable_msix(vdev);
 
if (ret) {
@@ -2475,7 +2469,7 @@ static int vxge_add_isr(struct vxgedev *vdev)
vdev->config.intr_type = INTA;
}
 
-   if (vdev->config.intr_type == MSI_X) {
+   if (IS_ENABLED(CONFIG_PCI_MSI) && vdev->config.intr_type == MSI_X) {
for (intr_idx = 0;
 intr_idx < (vdev->no_of_vpath *
VXGE_HW_VPATH_MSIX_ACTIVE); intr_idx++) {
@@ -2576,9 +2570,8 @@ static int vxge_add_isr(struct vxgedev *vdev)
vdev->vxge_entries[intr_cnt].in_use = 1;
vdev->vxge_entries[intr_cnt].arg = >vpaths[0];
}
-INTA_MODE:
-#endif
 
+INTA_MODE:
if (vdev->config.intr_type == INTA) {
snprintf(vdev->desc[0], VXGE_INTR_STRLEN,
"%s:vxge:INTA", vdev->ndev->name);
@@ -3889,12 +3882,12 @@ static void vxge_device_config_init(struct 
vxge_hw_device_config *device_config,
if (max_mac_vpath > VXGE_MAX_MAC_ADDR_COUNT)
max_mac_vpath = VXGE_MAX_MAC_ADDR_COUNT;
 
-#ifndef CONFIG_PCI_MSI
-   vxge_debug_init(VXGE_ERR,
-   "%s: This Kernel does not support "
-   "MSI-X. Defaulting to INTA", VXGE_DRIVER_NAME);
-   *intr_type = INTA;
-#endif
+   if (!IS_ENABLED(CONFIG_PCI_MSI)) {
+   vxge_debug_init(VXGE_ERR,
+   "%s: This Kernel does not support "
+   "MSI-X. Defaulting to INTA", VXGE_DRIVER_NAME);
+   *intr_type = INTA;
+   }
 
/* Configure whether MSI-X or IRQL. */
switch (*intr_type) {
-- 
2.7.0



Re: [PATCH] mac80211: minstrel_ht: fix out-of-bound in minstrel_ht_set_best_prob_rate

2016-01-29 Thread Felix Fietkau
On 2016-01-29 09:35, Konstantin Khlebnikov wrote:
> Patch fixes this splat
> 
> BUG: KASAN: slab-out-of-bounds in minstrel_ht_update_stats.isra.7+0x6e1/0x9e0
> [mac80211] at addr 8800cee640f4 Read of size 4 by task swapper/3/0
> 
> Signed-off-by: Konstantin Khlebnikov 
> Link: 
> http://lkml.kernel.org/r/calygninyjhsavne35qs6ucgasb2dx1_i5hcravuox14otz2...@mail.gmail.com
Acked-by: Felix Fietkau 


Re: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread Michael Tuexen
> On 29 Jan 2016, at 02:18, Marcelo Ricardo Leitner  
> wrote:
> 
> On Fri, Jan 29, 2016 at 12:36:05AM +0100, Michael Tuexen wrote:
>> 
>>> On 28 Jan 2016, at 22:03, Marcelo Ricardo Leitner 
>>>  wrote:
>>> 
>>> On Thu, Jan 28, 2016 at 06:54:06PM +0100, Michael Tuexen wrote:
> On 28 Jan 2016, at 14:51, David Laight  wrote:
> 
> From: Marcelo Ricardo Leitner
>> Sent: 27 January 2016 17:07
>> This patchset is merely a RFC for the moment. There are some
>> controversial points that I'd like to discuss before actually proposing
>> the patches.
> 
> You also need to look at how a 'user' can actually get SCTP to
> merge data chunks in the first place.
> 
> With Nagle disabled (and it probably has to be since the data flow
> is unlikely to be 'command-response' or 'unidirectional bulk')
> it is currently almost impossible to get more than one chunk
> into an ethernet frame.
> 
> Support for MSG_MORE would help.
 What about adding support for the explicit EOR mode as specified in
 https://tools.ietf.org/html/rfc6458#section-8.1.26
>>> 
>>> Seizing the moment to clarify my understanding on that. :)
>>> Such multiple calls to send system calls will result in a single data
>>> chunk. Is that so? That's what I get from that text and also from this
>> No. It results in a single user message. This means you can send
>> a user message larger than the send buffer size. How the user message
>> is fragmented in DATA chunks is transparent to the upper layer.
>> 
>> Does this make things clearer?
> 
> I think so, yes. So it allows delaying setting the Ending fragment bit
> until the application set SCTP_EOR. All the rest before this stays as
> before: first send() will generate a chunk with Beginning bit set and
> may generate some other middle-fragments (no B nor E bit set) if
> necessary, second to N-1 call to send will generate only middle
> fragments, while the last send, with SCTP_EOF, will then set the Ending
> fragment in the last one. Right?
Yes. But there are no restrictions on the user data provided in send()
calls and DATA chunks. So you can
send(10 byte, no SCTP_EOR)
resulting in one DATA chunk with the B bit, several with no B and no E bit.
send(10 byte, no SCTP_EOR)
resulting in several chunks with no B and no E bit.
send(10 byte, SCTP_EOR)
resulting in several chunks with no B and no E bit and one (the last) chunk
with the E bit.

On the other hand you can do
send(1 byte, no SCTP_EOR)
resulting in a single DATA chunk with the E bit set.
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
send(1 byte, no SCTP_EOR)
All resulting in a single DATA chunk with 5 bytes user data and no B or E bit.
(For example if Nagle is enabled and only after the last send call the SACK 
arrives).
send(1 byte, SCTP_EOR)
results in a single DATA chunk with the E bist set.

Best regards
Michael
> 
> Thanks,
> Marcelo
> 
>> 
>> Best regards
>> Michael
>>> snippet:
>>> "Sending a message using sendmsg() is atomic unless explicit end of
>>> record (EOR) marking is enabled on the socket specified by sd (see
>>> Section 8.1.26)."
>>> 
>>> Best regards,
>>> Marcelo
>>> 
 Best regards
 Michael
> 
> Given the current implementation you can get almost the required
> behaviour by turning nagle off and on repeatedly.
> 
> I did wonder whether the queued data could actually be picked up
> be a Heartbeat chunk that is probing a different remote address
> (which would be bad news).
> 
>   David
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
 
>>> 
>> 
> 



[PATCH v2 6/7] net: nb8800: avoid uninitialized variable warning

2016-01-29 Thread Arnd Bergmann
The nb8800_poll() function initializes the 'next' variable in the
loop looking for new input data. We know this will be called at
least once because 'budget' is a guaranteed to be a positive number
when we enter the function, but the compiler doesn't know that
and warns when the variable is used later:

drivers/net/ethernet/aurora/nb8800.c: In function 'nb8800_poll':
drivers/net/ethernet/aurora/nb8800.c:350:21: warning: 'next' may be used 
uninitialized in this function [-Wmaybe-uninitialized]

Changing the 'while() {}' loop to 'do {} while()' makes it obvious
to the compiler what is going on so it no longer warns.

Signed-off-by: Arnd Bergmann 
Acked-by: Mans Rullgard 
---
 drivers/net/ethernet/aurora/nb8800.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index ecc4a334c507..f71ab2647a3b 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -302,7 +302,7 @@ static int nb8800_poll(struct napi_struct *napi, int budget)
nb8800_tx_done(dev);
 
 again:
-   while (work < budget) {
+   do {
struct nb8800_rx_buf *rxb;
unsigned int len;
 
@@ -330,7 +330,7 @@ again:
rxd->report = 0;
last = next;
work++;
-   }
+   } while (work < budget);
 
if (work) {
priv->rx_descs[last].desc.config |= DESC_EOC;
-- 
2.7.0



Re: [PATCH net v2 0/2] pv6: fix sticky pktinfo behaviour

2016-01-29 Thread Hannes Frederic Sowa

On 29.01.2016 12:30, Paolo Abeni wrote:

The kernel has never enforced the egress interface specified
via the sticky pktinfo, except briefly between the commits
741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
and
d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"),
but the ping6 tools was unaffected up to iputils-20100214,
since before it used SO_BINDTODEVICE to enforce the egress
interface.


I think because of the bad situation in terms of uapi and that recently 
behavior switched multiple times in kernel and iputils, enforcing the 
outgoing interface seems to be the right thing to do for me.


Also it matches expected behavior by ping6. Treating the interface as 
enforced is from a standards PoV the correct thing to do.


Thanks,
Hannes



Re: [PATCH 0/9] network driver fixes

2016-01-29 Thread Arnd Bergmann
On Thursday 28 January 2016 16:14:13 David Miller wrote:
> From: Arnd Bergmann 
> Date: Wed, 27 Jan 2016 15:04:50 +0100
> 
> > These are all fixes for relatively harmless bugs that showed up
> > in my randconfig testing, so they should not be needed for v4.5
> > but get merged into net-next.
> > 
> > I've managed to address all 'uninitialized variable' warnings that
> > I get in ARM randconfig kernels now, this series includes the
> > last five I got in network drivers. They are often really annoying
> > warnings but when we get new ones, they often are about actual
> > bugs in corner cases, so I'm trying hard to eliminate the false
> > positives here to get people to pay attention to added warnings.
> > I've recently tried building with an older gcc and found tons more
> > that are all bogus, this series only addresses the ones that
> > gcc-5.2 finds.
> 
> Arnd I'm expecting a respin of this series to address with the
> feedback you've been given.
> 

Done. I wasn't sure if you planned to pick up the patches
individually, thanks for the clarification.

Arnd


Re: [PATCH net 2/2] net/mlx5e: Use a private copy of netdev ops

2016-01-29 Thread Or Gerlitz
On Fri, Jan 29, 2016 at 6:00 AM, David Miller  wrote:

> This is not the canonical way to fix this.  Please look at how
> other drivers handle this situation before inventing your own
> way of solving the problem.

Dave, I was aware that other drivers do that differently, but the
maintainer here preferred to go this way and I didn't see a deep
reason why not.

> The proper way is to have multiple netdevice ops instances, and simply
> choose the one which is correct for the chip in question.

yep, but - this can become little lengthy if you have N ndos shared
among (say) three different profiles with m1, m2, m3 distinct ndos -
so we thought it can be a bit more elegant to do selective assignment
within a function and not as three globals.

> You should also mark the ops as const.  They should never _ever_ be
> modified at runtime.

okay

Or.


Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Johannes Berg
On Thu, 2016-01-28 at 19:54 -0600, Larry Finger wrote:
> 
> I have been running an RTL8821AE since kernel 3.18 without hitting
> this problem 
> using a TRENDnet AC1750 dual-band AP. The UniFi may be doing
> something that the 
> driver is not expecting.

Are you quite sure you're actually using VHT though, perhaps the AP
somehow turned it off? It seems unlikely that you could successfully
use it in any way given that RATE_INFO_FLAGS_VHT_MCS doesn't show up in
the driver or rate scaling at all.


johannes


pull-request: wireless-drivers 2016-01-29

2016-01-29 Thread Kalle Valo
Hi Dave,

few fixes for 4.5. Nothing really standing out, see the tag for
more info. Please let me know if you have any problems.

Kalle

The following changes since commit a200dcb34693084e56496960d855afdeaaf9578f:

  Merge tag 'for_linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost (2016-01-18 16:44:24 
-0800)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers.git 
tags/wireless-drivers-for-davem-2016-01-29

for you to fetch changes up to f9ead9beef3f44be0db4b542a8c2ce698fb1530e:

  Merge tag 'iwlwifi-for-kalle-2016-01-26_2' of 
https://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes 
(2016-01-27 14:19:18 +0200)



iwlwifi

* Fix support for 3168 device:
  * NVM version
  * firmware file name
  * device IDs
* Fix a compilation warning in dvm calibration code
* Fix the TPC (reduced Tx Power) code. This fixes performance issues
* Add device IDs for 8265

rtx2x00

* fix monitor mode regression dating back to 4.1

brcmfmac

* fix sdio initialisation related crash

rtlwifi

* rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded

ath9k

* ignore eeprom magic mismatch on flash based devices


Arnd Bergmann (1):
  ssb: mark ssb_bus_register as __maybe_unused

Eli Cooper (1):
  rt2x00: fix monitor mode regression

Felix Fietkau (2):
  brcmfmac: add missing include
  ath9k_hw: ignore eeprom magic mismatch on flash based devices

Gregory Greenman (1):
  iwlwifi: mvm: rs: fix TPC statistics handling

Hante Meuleman (1):
  brcmfmac: fix sdio sg table alloc crash

Kalle Valo (2):
  Merge ath-current from ath.git
  Merge tag 'iwlwifi-for-kalle-2016-01-26_2' of 
https://git.kernel.org/.../iwlwifi/iwlwifi-fixes

Larry Finger (1):
  rtlwifi: rtl8821ae: Fix 5G failure when EEPROM is incorrectly encoded

Michael Buesch (1):
  ssb: Set linux-wireless as MAINTAINERS list

Oren Givon (3):
  iwlwifi: add new 3168 series devices support
  iwlwifi: add device ID for 8265
  iwlwifi: update support for 3168 series firmware and NVM

 MAINTAINERS|2 +-
 drivers/net/wireless/ath/ath9k/eeprom.c|   12 ++--
 .../wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c  |   40 ---
 .../wireless/broadcom/brcm80211/brcmfmac/common.c  |1 +
 .../wireless/broadcom/brcm80211/brcmfmac/sdio.c|5 ++
 .../wireless/broadcom/brcm80211/brcmfmac/sdio.h|1 +
 drivers/net/wireless/intel/iwlwifi/iwl-7000.c  |   23 --
 drivers/net/wireless/intel/iwlwifi/mvm/fw-api-tx.h |6 +-
 drivers/net/wireless/intel/iwlwifi/mvm/rs.c|   74 +---
 drivers/net/wireless/intel/iwlwifi/mvm/tx.c|5 +-
 drivers/net/wireless/intel/iwlwifi/pcie/drv.c  |4 ++
 drivers/net/wireless/ralink/rt2x00/rt2400pci.c |4 +-
 drivers/net/wireless/ralink/rt2x00/rt2500pci.c |4 +-
 drivers/net/wireless/ralink/rt2x00/rt2500usb.c |4 +-
 drivers/net/wireless/ralink/rt2x00/rt2800lib.c |3 +-
 drivers/net/wireless/ralink/rt2x00/rt2x00.h|1 +
 drivers/net/wireless/ralink/rt2x00/rt2x00config.c  |5 ++
 drivers/net/wireless/ralink/rt2x00/rt2x00mac.c |5 --
 drivers/net/wireless/ralink/rt2x00/rt61pci.c   |4 +-
 drivers/net/wireless/ralink/rt2x00/rt73usb.c   |4 +-
 drivers/net/wireless/realtek/rtlwifi/regd.c|2 +-
 drivers/ssb/main.c |7 +-
 22 files changed, 136 insertions(+), 80 deletions(-)

-- 
Kalle Valo


[PATCH] mac80211: fix memory leak

2016-01-29 Thread Sudip Mukherjee
On error we jumped to the error label and returned the error code but we
missed releasing sinfo.

Signed-off-by: Sudip Mukherjee 
---
 net/mac80211/sta_info.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 6c198e6..36e75c4 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -561,6 +561,7 @@ static int sta_info_insert_finish(struct sta_info *sta) 
__acquires(RCU)
__cleanup_single_sta(sta);
  out_err:
mutex_unlock(>sta_mtx);
+   kfree(sinfo);
rcu_read_lock();
return err;
 }
-- 
1.9.1



[PATCH] mac80211: minstrel_ht: fix out-of-bound in minstrel_ht_set_best_prob_rate

2016-01-29 Thread Konstantin Khlebnikov
Patch fixes this splat

BUG: KASAN: slab-out-of-bounds in minstrel_ht_update_stats.isra.7+0x6e1/0x9e0
[mac80211] at addr 8800cee640f4 Read of size 4 by task swapper/3/0

Signed-off-by: Konstantin Khlebnikov 
Link: 
http://lkml.kernel.org/r/calygninyjhsavne35qs6ucgasb2dx1_i5hcravuox14otz2...@mail.gmail.com
---
 net/mac80211/rc80211_minstrel_ht.c |9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/rc80211_minstrel_ht.c 
b/net/mac80211/rc80211_minstrel_ht.c
index 3928dbd24e25..93bf2b743e20 100644
--- a/net/mac80211/rc80211_minstrel_ht.c
+++ b/net/mac80211/rc80211_minstrel_ht.c
@@ -414,15 +414,16 @@ minstrel_ht_set_best_prob_rate(struct minstrel_ht_sta 
*mi, u16 index)
(max_tp_group != MINSTREL_CCK_GROUP))
return;
 
+   max_gpr_group = mg->max_group_prob_rate / MCS_GROUP_RATES;
+   max_gpr_idx = mg->max_group_prob_rate % MCS_GROUP_RATES;
+   max_gpr_prob = mi->groups[max_gpr_group].rates[max_gpr_idx].prob_ewma;
+
if (mrs->prob_ewma > MINSTREL_FRAC(75, 100)) {
cur_tp_avg = minstrel_ht_get_tp_avg(mi, cur_group, cur_idx,
mrs->prob_ewma);
if (cur_tp_avg > tmp_tp_avg)
mi->max_prob_rate = index;
 
-   max_gpr_group = mg->max_group_prob_rate / MCS_GROUP_RATES;
-   max_gpr_idx = mg->max_group_prob_rate % MCS_GROUP_RATES;
-   max_gpr_prob = 
mi->groups[max_gpr_group].rates[max_gpr_idx].prob_ewma;
max_gpr_tp_avg = minstrel_ht_get_tp_avg(mi, max_gpr_group,
max_gpr_idx,
max_gpr_prob);
@@ -431,7 +432,7 @@ minstrel_ht_set_best_prob_rate(struct minstrel_ht_sta *mi, 
u16 index)
} else {
if (mrs->prob_ewma > tmp_prob)
mi->max_prob_rate = index;
-   if (mrs->prob_ewma > 
mg->rates[mg->max_group_prob_rate].prob_ewma)
+   if (mrs->prob_ewma > max_gpr_prob)
mg->max_group_prob_rate = index;
}
 }



Re: wlcore: fix error handling in wlcore_event_fw_logger

2016-01-29 Thread Kalle Valo

> wlcore_read/wlcore_write can return negative values so it should
> be assigned to signed variable.
> 
> The problem has been detected using proposed semantic patch
> scripts/coccinelle/tests/unsigned_lesser_than_zero.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2120705
> 
> Fixes: 3719c17e1816 ("wlcore/wl18xx: fw logger over sdio")
> Signed-off-by: Andrzej Hajda 

Thanks, applied to wireless-drivers-next.git.

Kalle Valo


[PATCH net 4/6] net: mvneta: Modify the queue related fields from each cpu

2016-01-29 Thread Gregory CLEMENT
In the MVNETA_INTR_* registers, the queues related fields are per cpu,
according to the datasheet (comment in [] are added by me):
"In a multi-CPU system, bits of RX[or TX] queues for which the access by
the reading[or writing] CPU is disabled are read as 0, and cannot be
cleared[or written]."

That means that each time we want to manipulate these bits we had to do
it on each cpu and not only on the current cpu.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 100 --
 1 file changed, 46 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 861b7e0d7d5f..1ed813d478e8 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1038,6 +1038,43 @@ static void mvneta_set_autoneg(struct mvneta_port *pp, 
int enable)
}
 }
 
+static void mvneta_percpu_unmask_interrupt(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are unmasked, but actually only the ones
+* mapped to this CPU will be unmasked
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_MASK,
+   MVNETA_RX_INTR_MASK_ALL |
+   MVNETA_TX_INTR_MASK_ALL |
+   MVNETA_MISCINTR_INTR_MASK);
+}
+
+static void mvneta_percpu_mask_interrupt(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are masked, but actually only the ones
+* mapped to this CPU will be masked
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
+   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
+   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+}
+
+static void mvneta_percpu_clear_intr_cause(void *arg)
+{
+   struct mvneta_port *pp = arg;
+
+   /* All the queue are cleared, but actually only the ones
+* mapped to this CPU will be cleared
+*/
+   mvreg_write(pp, MVNETA_INTR_NEW_CAUSE, 0);
+   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
+   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
+}
+
 /* This method sets defaults to the NETA port:
  * Clears interrupt Cause and Mask registers.
  * Clears all MAC tables.
@@ -1055,14 +1092,10 @@ static void mvneta_defaults_set(struct mvneta_port *pp)
int max_cpu = num_present_cpus();
 
/* Clear all Cause registers */
-   mvreg_write(pp, MVNETA_INTR_NEW_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
+   on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
 
/* Mask all interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
mvreg_write(pp, MVNETA_INTR_ENABLE, 0);
 
/* Enable MBUS Retry bit16 */
@@ -2528,31 +2561,6 @@ static int mvneta_setup_txqs(struct mvneta_port *pp)
return 0;
 }
 
-static void mvneta_percpu_unmask_interrupt(void *arg)
-{
-   struct mvneta_port *pp = arg;
-
-   /* All the queue are unmasked, but actually only the ones
-* maped to this CPU will be unmasked
-*/
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK,
-   MVNETA_RX_INTR_MASK_ALL |
-   MVNETA_TX_INTR_MASK_ALL |
-   MVNETA_MISCINTR_INTR_MASK);
-}
-
-static void mvneta_percpu_mask_interrupt(void *arg)
-{
-   struct mvneta_port *pp = arg;
-
-   /* All the queue are masked, but actually only the ones
-* maped to this CPU will be masked
-*/
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
-}
-
 static void mvneta_start_dev(struct mvneta_port *pp)
 {
int cpu;
@@ -2603,13 +2611,10 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
mvneta_port_disable(pp);
 
/* Clear all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_MISC_CAUSE, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_CAUSE, 0);
+   on_each_cpu(mvneta_percpu_clear_intr_cause, pp, true);
 
/* Mask all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
 
mvneta_tx_reset(pp);
mvneta_rx_reset(pp);
@@ -2916,9 +2921,7 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
}
 
/* Mask all ethernet port interrupts */
-   mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
-   mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);

[PATCH net 6/6] net: mvneta: Fix race condition during stopping

2016-01-29 Thread Gregory CLEMENT
When stopping the port, the CPU notifier are still there whereas the
mvneta_stop_dev function calls mvneta_percpu_disable() on each CPUs.
It was possible to have a new CPU coming at this point which could be
racy.

This patch adds a flag preventing executing the code notifier for a new CPU
when the port is stopping.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 3358c9a70467..22962fac47c2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -372,6 +372,7 @@ struct mvneta_port {
int rxq_def;
/* protect  */
spinlock_t lock;
+   bool is_stopping;
 
/* Core clock */
struct clk *clk;
@@ -2914,6 +2915,11 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+   /* Configuring the driver for a new CPU while the
+* driver is stopping is racy, so just avoid it.
+*/
+   if (pp->is_stopping)
+   break;
netif_tx_stop_all_queues(pp->dev);
 
/* We have to synchronise on tha napi of each CPU
@@ -3052,9 +3058,17 @@ static int mvneta_stop(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
 
+   /* Inform that we are stopping so we don't want to setup the
+* driver for new CPUs in the notifiers
+*/
+   pp->is_stopping = true;
mvneta_stop_dev(pp);
mvneta_mdio_remove(pp);
unregister_cpu_notifier(>cpu_notifier);
+   /* Now that the notifier are unregistered, we can clear the
+* flag
+*/
+   pp->is_stopping = false;
on_each_cpu(mvneta_percpu_disable, pp, true);
free_percpu_irq(dev->irq, pp->ports);
mvneta_cleanup_rxqs(pp);
-- 
2.5.0



RE: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread David Laight
From: 'Marcelo Ricardo Leitner'
> Sent: 28 January 2016 20:56
> On Thu, Jan 28, 2016 at 05:30:24PM +, David Laight wrote:
> > From: 'Marcelo Ricardo Leitner'
> > > Sent: 28 January 2016 15:53
> > > On Thu, Jan 28, 2016 at 01:51:02PM +, David Laight wrote:
...
> > > > With Nagle disabled (and it probably has to be since the data flow
> > > > is unlikely to be 'command-response' or 'unidirectional bulk')
> > > > it is currently almost impossible to get more than one chunk
> > > > into an ethernet frame.
> > > >
> > > > Support for MSG_MORE would help.
> > > >
> > > > Given the current implementation you can get almost the required
> > > > behaviour by turning nagle off and on repeatedly.
> > >
> > > That's pretty much expected, I think. Without Nagle, if bandwidth and
> > > cwnd allow, segment will be sent. GSO by itself shouldn't cause a
> > > buffering to protect from that.
> > >
> > > If something causes a bottleneck, tx may get queue up.  Like if I do a
> > > stress test in my system, generally receiver side is slower than sender,
> > > so I end up having tx buffers pretty easily. It mimics bandwidth
> > > restrictions.
> >
> > Imagine a using M2UA to connect local machines (one running mtp3, the other 
> > mtp2).
> > Configure two linksets of 16 signalling links and perform a double-reflect
> > loopback test.
> > The SCTP connection won't ever saturate, so every msu ends up in its own
> > ethernet packet.
> > It is easy to generate 1000's of ethernet frames/sec on a single connection.
> >
> > (We do this with something not entirely quite like M2UA over TCP,
> > even then it is very hard to get multiple message into a single
> > ethernet frame.)
> >
> 
> Agreed, GSO won't help much in there without a corking feature like
> MSG_MORE or that on/off switch on Nagle you mentioned.
> 
> Thing about this (and also GRO) is on identifying how much can be spent
> on waiting for the next chunk/packet without causing issues to the
> application. Nagle is there and helps quite a lot, but timing-sensitive
> applications will turn it off.

Nagle is only any good for unidirectional data (think file transfer)
and command-response (think telnet or rlogin), for anything else
it generates 200ms+ delays.
The SIGTRAN protocols (M3UA etc) are all specified to use SCTP
and have no real relationship between the send and receive data.
SCTP is being used to replace 64kb/s links so a few milliseconds
of delay (1ms is 8 byte times) probably don't matter.
The Nagle timeout is, however far too long - so has to be disabled.

So never mind GSO to improve 'bulk' output, the big SCTP performance
issue (as I see it) is the inability for a lot of workloads to
ever get more than one small data chunk into an ethernet frame.

Which other (application) protocols are using SCTP?
It doesn't seem an appropriate protocol for bulk data.

David



Re: [PATCH net v2 1/2] ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

2016-01-29 Thread David Ahern

On 1/29/16 4:30 AM, Paolo Abeni wrote:

The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni 
--


Acked-by: David Ahern 



RE: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread David Laight
> From: 'Marcelo Ricardo Leitner' [mailto:marcelo.leit...@gmail.com]
> Sent: 28 January 2016 20:56
...
> > > But yes, agreed, MSG_MORE is at least a welcomed compliment here,
> > > specially for applications generating a train of chunks. Will put that in
> > > my ToDo here, thanks.
> >
> > I've posted a patch in the past for MSG_MORE, didn't quite work.
> 
> Ahh cool. Can you share the archive link please? Maybe I can take it
> from there then.

I think the last record is:
https://patchwork.ozlabs.org/patch/372404

David


[PATCH net 1/6] net: mvneta: Fix for_each_present_cpu usage

2016-01-29 Thread Gregory CLEMENT
This patch convert the for_each_present in on_each_cpu, instead of
applying on the present cpus it will be applied only on the online cpus.
This fix a bug reported on
http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173.

Using the macro on_each_cpu (instead of a for_each_* loop) also ensures
that all the calls will be done all at once.

Fixes: f86428854480 ("net: mvneta: Statically assign queues to CPUs")
Reported-by: Stefan Roese 
Suggested-by: Jisheng Zhang 
Suggested-by: Russell King 
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 662c2ee268c7..90ff5c7e19ea 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2564,7 +2564,7 @@ static void mvneta_start_dev(struct mvneta_port *pp)
mvneta_port_enable(pp);
 
/* Enable polling on the port */
-   for_each_present_cpu(cpu) {
+   for_each_online_cpu(cpu) {
struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
napi_enable(>napi);
@@ -2589,7 +2589,7 @@ static void mvneta_stop_dev(struct mvneta_port *pp)
 
phy_stop(pp->phy_dev);
 
-   for_each_present_cpu(cpu) {
+   for_each_online_cpu(cpu) {
struct mvneta_pcpu_port *port = per_cpu_ptr(pp->ports, cpu);
 
napi_disable(>napi);
@@ -3057,13 +3057,11 @@ err_cleanup_rxqs:
 static int mvneta_stop(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
-   int cpu;
 
mvneta_stop_dev(pp);
mvneta_mdio_remove(pp);
unregister_cpu_notifier(>cpu_notifier);
-   for_each_present_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_disable, pp, true);
+   on_each_cpu(mvneta_percpu_disable, pp, true);
free_percpu_irq(dev->irq, pp->ports);
mvneta_cleanup_rxqs(pp);
mvneta_cleanup_txqs(pp);
-- 
2.5.0



RE: [RFC PATCH net-next 0/3] sctp: add GSO support

2016-01-29 Thread David Laight
From: 'Marcelo Ricardo Leitner'
> Sent: 28 January 2016 20:56
...
> > > > I did wonder whether the queued data could actually be picked up
> > > > be a Heartbeat chunk that is probing a different remote address
> > > > (which would be bad news).
> > >
> > > I don't follow. You mean if a heartbeat may get stuck in queue or if
> > > sending of a heartbeat can end up carrying additional data by accident?
> >
> > My suspicion was that the heartbeat would carry the queued data.
> 
> I'm afraid I'm  still not following, sorry. You mean that this GSO patch
> would cause the heartbeat to carry queued data? If yes, no, because for
> SCTP side of it it mangles the packet size and make it look bigger
> instead of handling multiple packets. It will then break this large
> sctp_packet into several sk_buff and glue them together as if they were
> GROed, allowing skb_segment to just split them back. The reason the
> sctp_packet is generated, being it due to user data or control chunks
> like heartbeats, is not modified.

I'm thinking of the code prior to your GSO changes.

IIRC with nagle enabled data chunks are built and put onto an
internal queue to be sent later (with nagle disabled they are
sent at the end of the tx processing provided (IIRC) there is window).
Any message that forces a transmit picks up the queued data - and
I think this might include heartbeats

I didn't read the code closely enough to find out whether this
was true or not.

David



Re: [PATCH] openvswitch: allow management from inside user namespaces

2016-01-29 Thread Tycho Andersen
Hi Eric,

Thanks for the review.

On Fri, Jan 29, 2016 at 08:29:55AM -0600, Eric W. Biederman wrote:
> Tycho Andersen  writes:
> 
> > Operations with the GENL_ADMIN_PERM flag fail permissions checks because
> > this flag means we call netlink_capable, which uses the init user ns.
> >
> > Instead, let's do permissions checks in each function, but use the netlink
> > socket's user ns instead of the initial one, to allow management of
> > openvswitch resources from inside a user ns.
> >
> > The motivation for this is to be able to run openvswitch in unprivileged
> > containers. I've tested this and it seems to work, but I really have no
> > idea about the security consequences of this patch, so thoughts would be
> > much appreciated.
> 
> So at a quick look using ns_capable this way is probably buggy.
> 
> netlink is goofy (because historically we got this wrong), and I forget
> what the specific rules are.  The general rule is that you need to do
> your permission checks on open/create/connect and not inside send/write
> while processing data.  Otherwise there is a class of privileged
> applications where you can set their stdout to some precreated file
> descriptor and their output can be made to act as a command, bypassing
> your permission checks.

It's worth noting that this patch doesn't move the checks (i.e., they
are still done at write time currently in the kernel), it just relaxes
them to root in the user ns instead of real root. This means I can
currently exploit netlink this way as an unprivileged, just not from
within an unprivileged container.

An alternate version of this patch below might be more favorable, as
we may want to do something like this elsewhere in netlink. I think it
also clarifies the situation a bit, at the cost of adding another
flag.

A third option would be to move this check to connect time, but that
would force everything in the family (since that's the only thing you
connect /to/ in netlink) to have the same required permissions, which
might (probably?) break stuff; e.g. you can call OVS_FLOW_CMD_GET
without CAP_NET_ADMIN, but if we changed everything in the family to
require it, that would break.

Tycho
---
 include/uapi/linux/genetlink.h |  1 +
 net/netlink/genetlink.c|  6 --
 net/openvswitch/datapath.c | 20 ++--
 3 files changed, 15 insertions(+), 12 deletions(-)

diff --git a/include/uapi/linux/genetlink.h b/include/uapi/linux/genetlink.h
index c3363ba..5512c90 100644
--- a/include/uapi/linux/genetlink.h
+++ b/include/uapi/linux/genetlink.h
@@ -21,6 +21,7 @@ struct genlmsghdr {
 #define GENL_CMD_CAP_DO0x02
 #define GENL_CMD_CAP_DUMP  0x04
 #define GENL_CMD_CAP_HASPOL0x08
+#define GENL_UNS_ADMIN_PERM0x10
 
 /*
  * List of reserved static generic netlink identifiers:
diff --git a/net/netlink/genetlink.c b/net/netlink/genetlink.c
index f830326..6bbb3eb 100644
--- a/net/netlink/genetlink.c
+++ b/net/netlink/genetlink.c
@@ -576,8 +576,10 @@ static int genl_family_rcv_msg(struct genl_family *family,
if (ops == NULL)
return -EOPNOTSUPP;
 
-   if ((ops->flags & GENL_ADMIN_PERM) &&
-   !netlink_capable(skb, CAP_NET_ADMIN))
+   if (((ops->flags & GENL_ADMIN_PERM) &&
+   !netlink_capable(skb, CAP_NET_ADMIN)) ||
+   ((ops->flags & GENL_UNS_ADMIN_PERM) &&
+   !netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)))
return -EPERM;
 
if ((nlh->nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP) {
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index deadfda..d6f7fe9 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -654,7 +654,7 @@ static const struct nla_policy 
packet_policy[OVS_PACKET_ATTR_MAX + 1] = {
 
 static const struct genl_ops dp_packet_genl_ops[] = {
{ .cmd = OVS_PACKET_CMD_EXECUTE,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
  .policy = packet_policy,
  .doit = ovs_packet_cmd_execute
}
@@ -1391,12 +1391,12 @@ static const struct nla_policy 
flow_policy[OVS_FLOW_ATTR_MAX + 1] = {
 
 static const struct genl_ops dp_flow_genl_ops[] = {
{ .cmd = OVS_FLOW_CMD_NEW,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
  .policy = flow_policy,
  .doit = ovs_flow_cmd_new
},
{ .cmd = OVS_FLOW_CMD_DEL,
- .flags = GENL_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
+ .flags = GENL_UNS_ADMIN_PERM, /* Requires CAP_NET_ADMIN privilege. */
  .policy = flow_policy,
  .doit = ovs_flow_cmd_del
},
@@ -1407,7 +1407,7 @@ static const struct genl_ops dp_flow_genl_ops[] = {
  .dumpit = ovs_flow_cmd_dump
},
{ .cmd = OVS_FLOW_CMD_SET,
-   

Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Johannes Berg
On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:
> 
> Upon further inspection, my log has the line "rtl8821ae :02:00.0
> wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
> first.
> 
Likely TKIP; enable only WPA2 (CCMP) on the AP.

johannes


[PATCH net 5/6] net: mvneta: The mvneta_percpu_elect function should be atomic

2016-01-29 Thread Gregory CLEMENT
Electing a CPU must be done in an atomic way: it should be done after or
before the removal/insertion of a CPU and this function is not reentrant.

During the loop of mvneta_percpu_elect we associates the queues to the
CPUs, if there is a topology change during this loop, then the mapping
between the CPUs and the queues could be wrong. During this loop the
interrupt mask is also updating for each CPUs, It should not be changed
in the same time by other part of the driver.

This patch adds spinlock to create the needed critical sections.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 1ed813d478e8..3358c9a70467 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -370,6 +370,8 @@ struct mvneta_port {
struct net_device *dev;
struct notifier_block cpu_notifier;
int rxq_def;
+   /* protect  */
+   spinlock_t lock;
 
/* Core clock */
struct clk *clk;
@@ -2855,6 +2857,11 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
 {
int online_cpu_idx, max_cpu, cpu, i = 0;
 
+   /* Electing a CPU must done in an atomic way: it should be
+* done after or before the removal/insertion of a CPU and
+* this function is not reentrant.
+*/
+   spin_lock(>lock);
online_cpu_idx = pp->rxq_def % num_online_cpus();
max_cpu = num_present_cpus();
 
@@ -2893,6 +2900,7 @@ static void mvneta_percpu_elect(struct mvneta_port *pp)
i++;
 
}
+   spin_unlock(>lock);
 };
 
 static int mvneta_percpu_notifier(struct notifier_block *nfb,
@@ -2947,8 +2955,13 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
case CPU_DOWN_PREPARE:
case CPU_DOWN_PREPARE_FROZEN:
netif_tx_stop_all_queues(pp->dev);
+   /* Thanks to this lock we are sure that any pending
+* cpu election is done
+*/
+   spin_lock(>lock);
/* Mask all ethernet port interrupts */
on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
+   spin_unlock(>lock);
 
napi_synchronize(>napi);
napi_disable(>napi);
-- 
2.5.0



[PATCH net 0/6] mvneta fixes for SMP

2016-01-29 Thread Gregory CLEMENT
Hi,

Following this bug report:
http://thread.gmane.org/gmane.linux.ports.arm.kernel/468173 and the
suggestions from Russell King, I reviewed all the code involving
multi-CPU. It ended with this series of patches which should improve
the stability of the driver.

The first patch fix a real bug, the other fix potential issues in the
driver.

Thanks,

Gregory

Gregory CLEMENT (6):
  net: mvneta: Fix for_each_present_cpu usage
  net: mvneta: Use on_each_cpu when possible
  net: mvneta: Remove unused code
  net: mvneta: Modify the queue related fields from each cpu
  net: mvneta: The mvneta_percpu_elect function should be atomic
  net: mvneta: Fix race condition during stopping

 drivers/net/ethernet/marvell/mvneta.c | 160 +-
 1 file changed, 82 insertions(+), 78 deletions(-)

-- 
2.5.0



[PATCH net 3/6] net: mvneta: Remove unused code

2016-01-29 Thread Gregory CLEMENT
Since the commit 2dcf75e2793c ("net: mvneta: Associate RX queues with
each CPU") all the percpu irq are used and unmask at initialization, so
there is no point to unmask them first.

Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 3d6e3137f305..861b7e0d7d5f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3009,14 +3009,6 @@ static int mvneta_open(struct net_device *dev)
goto err_cleanup_txqs;
}
 
-   /* Even though the documentation says that request_percpu_irq
-* doesn't enable the interrupts automatically, it actually
-* does so on the local CPU.
-*
-* Make sure it's disabled.
-*/
-   mvneta_percpu_disable(pp);
-
/* Enable per-CPU interrupt on all the CPU to handle our RX
 * queue interrupts
 */
-- 
2.5.0



[PATCH net 2/6] net: mvneta: Use on_each_cpu when possible

2016-01-29 Thread Gregory CLEMENT
Instead of using a for_each_* loop in which we just call the
smp_call_function_single macro, it is more simple to directly use the
on_each_cpu macro. Moreover, this macro ensures that the calls will be
done all at once.

Suggested-by: Russell King 
Signed-off-by: Gregory CLEMENT 
---
 drivers/net/ethernet/marvell/mvneta.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c 
b/drivers/net/ethernet/marvell/mvneta.c
index 90ff5c7e19ea..3d6e3137f305 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2555,7 +2555,7 @@ static void mvneta_percpu_mask_interrupt(void *arg)
 
 static void mvneta_start_dev(struct mvneta_port *pp)
 {
-   unsigned int cpu;
+   int cpu;
 
mvneta_max_rx_size_set(pp, pp->pkt_size);
mvneta_txq_max_tx_size_set(pp, pp->pkt_size);
@@ -2571,9 +2571,8 @@ static void mvneta_start_dev(struct mvneta_port *pp)
}
 
/* Unmask interrupts. It has to be done from each CPU */
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_unmask_interrupt,
-pp, true);
+   on_each_cpu(mvneta_percpu_unmask_interrupt, pp, true);
+
mvreg_write(pp, MVNETA_INTR_MISC_MASK,
MVNETA_CAUSE_PHY_STATUS_CHANGE |
MVNETA_CAUSE_LINK_CHANGE |
@@ -2988,7 +2987,7 @@ static int mvneta_percpu_notifier(struct notifier_block 
*nfb,
 static int mvneta_open(struct net_device *dev)
 {
struct mvneta_port *pp = netdev_priv(dev);
-   int ret, cpu;
+   int ret;
 
pp->pkt_size = MVNETA_RX_PKT_SIZE(pp->dev->mtu);
pp->frag_size = SKB_DATA_ALIGN(MVNETA_RX_BUF_SIZE(pp->pkt_size)) +
@@ -3021,9 +3020,7 @@ static int mvneta_open(struct net_device *dev)
/* Enable per-CPU interrupt on all the CPU to handle our RX
 * queue interrupts
 */
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_enable,
-pp, true);
+   on_each_cpu(mvneta_percpu_enable, pp, true);
 
 
/* Register a CPU notifier to handle the case where our CPU
@@ -3310,9 +3307,7 @@ static int  mvneta_config_rss(struct mvneta_port *pp)
 
netif_tx_stop_all_queues(pp->dev);
 
-   for_each_online_cpu(cpu)
-   smp_call_function_single(cpu, mvneta_percpu_mask_interrupt,
-pp, true);
+   on_each_cpu(mvneta_percpu_mask_interrupt, pp, true);
 
/* We have to synchronise on the napi of each CPU */
for_each_online_cpu(cpu) {
-- 
2.5.0



Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Larry Finger

On 01/29/2016 10:15 AM, Johannes Berg wrote:

On Fri, 2016-01-29 at 10:12 -0600, Larry Finger wrote:


Upon further inspection, my log has the line "rtl8821ae :02:00.0
wlp2s0: disabling HT/VHT due to WEP/TKIP use". I need to fix that
first.


Likely TKIP; enable only WPA2 (CCMP) on the AP.


I found and fixed it. The AP was using only 20 MHz channels, but now is 
configured for 20/40/80. That duplicates the warning for me and the lack of 
throughput even though it associates and authenticates.


Larry




Re: [PATCH V3] net: emac: emac gigabit ethernet controller driver

2016-01-29 Thread Timur Tabi
Gilad is no longer working for Qualcomm, so I'm taking over (as best as 
I can) this driver.  Let's just say it's going to be a learning experience.


Rob Herring wrote:

diff --git a/Documentation/devicetree/bindings/net/qcom-emac.txt 
b/Documentation/devicetree/bindings/net/qcom-emac.txt
new file mode 100644
index 000..8d58a40
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qcom-emac.txt
@@ -0,0 +1,68 @@
+Qualcomm EMAC Gigabit Ethernet Controller
+
+Required properties:
+- cell-index : EMAC controller instance number.
+- compatible : Should be "qcom,emac".


This should be more specific with the SOC name.


The emac is present on a lot of Qualcomm SOCs, and there are only a few 
variations of it.  It's not really SOC-specific, and the hardware 
version can be queried by the driver.


What did you have in mind?  I don't know even have a list of which SOCs 
has an emac, especially since it's mostly on MSM parts, and I work on 
the server SOC.



+- reg : Offset and length of the register regions for the device
+- reg-names : Register region names referenced in 'reg' above.
+   Required register resource entries are:
+   "base"   : EMAC controller base register block.
+   "csr": EMAC wrapper register block.
+   Optional register resource entries are:
+   "ptp": EMAC PTP (1588) register block.
+  Required if 'qcom,emac-tstamp-en' is present.
+   "sgmii"  : EMAC SGMII PHY register block.
+- interrupts : Interrupt numbers used by this controller
+- interrupt-names : Interrupt resource names referenced in 'interrupts' above.
+   Required interrupt resource entries are:
+   "emac_core0"   : EMAC core0 interrupt.
+   "sgmii_irq"   : EMAC SGMII interrupt.
+- qcom,emac-gpio-mdc  : GPIO pin number of the MDC line of MDIO bus.
+- qcom,emac-gpio-mdio : GPIO pin number of the MDIO line of MDIO bus.


Use the standard binding for GPIO controlled MDIO bus.


I'm not familiar with that one.  Are you talking about 
bindings/net/mdio-gpio.txt?



+- phy-addr: Specifies phy address on MDIO bus.
+   Required if the optional property "qcom,no-external-phy"
+   is not specified.


Don't you think you will need to know the specific phy device or other
properties of the phy?


That, I can't answer.  Aren't all MDIO devices basically the same?  It's 
been a while since I've worked on them.



+Optional properties:
+- qcom,emac-tstamp-en   : Enables the PTP (1588) timestamping feature.
+ Include this only if PTP (1588) timestamping
+ feature is needed. If included, "ptp" register
+ base should be specified.


Isn't this a user enabled feature if the h/w supports it?


Is there a sysfs entry for that?  We were planning on having a similar 
ACPI property.




Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Larry Finger

Linus,

Attached is a trial patch that fixes the problem on my system. As I told 
Johannes earlier, my AP was not configured to use VHT, thus I did not see the 
problem.


The test patch that Johannes sent earlier was close. The section needed to add 
VHT rates is:


--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv 
*rtlpriv,
((wireless_mode == WIRELESS_MODE_N_5G) ||
 (wireless_mode == WIRELESS_MODE_N_24G)))
rate->flags |= IEEE80211_TX_RC_MCS;
+   if (sta && sta->vht_cap.vht_supported &&
+   (wireless_mode == WIRELESS_MODE_AC_5G))
+   rate->flags |= IEEE80211_TX_RC_VHT_MCS;
}
 }

Larry

>From bd34ac0c3caa9ff982194256b0e96772a17e719d Mon Sep 17 00:00:00 2001
From: Larry Finger 
Date: Fri, 29 Jan 2016 11:29:10 -0600
Subject: [PATCH] rtlwifi: Fix warning from ieee80211_get_tx_rates() when using
 5G
To: kv...@codeaurora.org
Cc: linux-wirel...@vger.kernel.org,
de...@driverdev.osuosl.org

When using a 5G-capable device with VHT rates enabled, the following
warning results:

WARNING: CPU: 3 PID: 2253 at net/mac80211/rate.c:625 ieee80211_get_tx_rates+0x22e/0x620 [mac80211]()
Modules linked in: rtl8821ae btcoexist rtl_pci rtlwifi fuse drbg ansi_cprng ctr ccm bnep bluetooth af_packet nfs fscache vboxpci(O) vboxnetadp(O) vboxne
tflt(O) vboxdrv(O) arc4 snd_hda_codec_generic x86_pkg_temp_thermal rtsx_pci_sdmmc mmc_core rtsx_pci_ms kvm_intel memstick iwlmvm kvm mac80211 snd_hda_intel snd_hda_cod
ec snd_hwdep snd_hda_core irqbypass snd_pcm iwlwifi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 snd_timer lrw gf128mul glue_h
elper ablk_helper cryptd snd cfg80211 pcspkr serio_raw e1000e rtsx_pci lpc_ich ptp xhci_pci mfd_core pps_core xhci_hcd soundcore toshiba_acpi thermal sparse_keymap wmi
 toshiba_bluetooth rfkill acpi_cpufreq battery ac processor dm_mod i915 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
drm sr_mod cdrom video button sg autofs4 [last unloaded: rtlwifi]
CPU: 3 PID: 2253 Comm: Timer Tainted: GW  O4.5.0-rc1-wl+ #79
Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.20   04/17/2014
  a05c4be6 8802262036d8 813d7912 
  880226203710 8106bcb6 8800c6831300 8800c6831330
   8800c683133c 880065923638 880226203720
Call Trace:
[] dump_stack+0x4b/0x79
  [] warn_slowpath_common+0x86/0xc0
  [] warn_slowpath_null+0x1a/0x20
  [] ieee80211_get_tx_rates+0x22e/0x620 [mac80211]
  [] ? rtl_is_special_data+0x32/0x240 [rtlwifi]
  [] ? rate_control_get_rate+0xce/0x150 [mac80211]
  [] ? trace_hardirqs_on+0xd/0x10
  [] ? __local_bh_enable_ip+0x65/0xd0
--- traceback terminated here ---

The problem is that IEEE80211_TX_RC_VHT_MCS is not set in the rate flags.

Reported-by: Linus Torvalds 
Cc: Johannes Berg 
Signed-off-by: Larry Finger 
Cc: Stable 
---
 drivers/net/wireless/realtek/rtlwifi/rc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rc.c b/drivers/net/wireless/realtek/rtlwifi/rc.c
index 74c14ce..e7eae63 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rc.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rc.c
@@ -138,6 +138,9 @@ static void _rtl_rc_rate_set_series(struct rtl_priv *rtlpriv,
 		((wireless_mode == WIRELESS_MODE_N_5G) ||
 		 (wireless_mode == WIRELESS_MODE_N_24G)))
 			rate->flags |= IEEE80211_TX_RC_MCS;
+		if (sta && sta->vht_cap.vht_supported &&
+		(wireless_mode == WIRELESS_MODE_AC_5G))
+			rate->flags |= IEEE80211_TX_RC_VHT_MCS;
 	}
 }
 
-- 
2.1.4



[RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Sowmini Varadhan

There is an unaligned access at __skb_flow_dissect when it calls
ip6_flowlabel() with the call stack

  __skb_flow_dissect+0x2a8/0x87c
  eth_get_headlen+0x5c/0xaxa4
  ixgbe_clean_rx_irq+0x5cc/0xb20 [ixgbe]
  ixgbe_poll+0x5a4/0x760 [ixgbe]
  net_rx_action+0x13c/0x354
:

Essentially, ixgbe_pull_tail() is trying to figure out how much
to pull, in order to have an aligned buffer:

pull_len = eth_get_headlen(va, IXGBE_RX_HDR_SIZE);

/* align pull length to size of long to optimize memcpy performance */
skb_copy_to_linear_data(skb, va, ALIGN(pull_len, sizeof(long)));

and seems like the unaligned access is unavoidable here (see comments
in __skb_get_poff, for example).

This (below) is what I came up with, to get rid of the unaligned access
errors on sparc, Is there a better solution? (Not having access
to struct ip6_hdr in this file made put_unaligned usage non-obvious)


--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -102,6 +102,17 @@ __be32 __skb_flow_get_ports(const struct sk_buff *skb, int 
 }
 EXPORT_SYMBOL(__skb_flow_get_ports);
 
+static inline __be32 ip6_flowlabel_align(const u8 *hdr)
+{
+   union {
+   __u8 w[4];
+   __u32 flow;
+   } ip6_flow;
+
+   memcpy(ip6_flow.w, hdr, 4);
+   return (ip6_flow.flow & IPV6_FLOWLABEL_MASK);
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specifie
@@ -230,7 +241,7 @@ ipv6:
key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
}
 
-   flow_label = ip6_flowlabel(iph);
+   flow_label = ip6_flowlabel_align((const u8 *)iph);
if (flow_label) {
if (dissector_uses_key(flow_dissector,
   FLOW_DISSECTOR_KEY_FLOW_LABEL)) {




Re: [PATCH RESEND net] nfc: use GFP_USER for user-controlled kmalloc

2016-01-29 Thread Cong Wang
On Wed, Jan 27, 2016 at 1:05 PM, Eric Dumazet  wrote:
> On Wed, 2016-01-27 at 11:50 -0800, Cong Wang wrote:
>
>> Hmm? I think nfc_llcp_send_ui_frame() needs to do some fragmention
>> with this temporary memory, or you are saying msg_iter has some
>> API available to seek the pointer? Even if so, it doesn't look like
>> suitable for -stable.
>>
>
> memcpy_from_msg(msg_data, msg, len) will overwrite the msg_data with len
> bytes, or return an error.
>
> So prior msg_data content does not matter.
>
> kzalloc() before a memset() or memcpy() sounds defensive programming,
> kmalloc() is a bit faster.

Oh, you mean s/kzalloc/kmalloc/, I thought you mean s/kzalloc//. ;)


Re: [PATCH] net_sched: drr: check for NULL pointer in drr_dequeue

2016-01-29 Thread Cong Wang
On Wed, Jan 27, 2016 at 7:30 PM, Bernie Harris
 wrote:
> There are cases where qdisc_dequeue_peeked can return NULL, and the result
> is dereferenced later on in the function.
>
> Similarly to the other qdisc dequeue functions, check whether the skb
> pointer is NULL and if it is, goto out.
>
> Signed-off-by: Bernie Harris 

Reviewed-by: Cong Wang 


Re: [v2 PATCH 9/26] eCryptfs: Use skcipher and shash

2016-01-29 Thread Tyler Hicks
On 2016-01-25 10:29:33, Herbert Xu wrote:
> On Sun, Jan 24, 2016 at 07:10:50PM +0100, Julia Lawall wrote:
> > Maybe the goto on line 1726 needs a preceding mutex_unlock?
> 
> Good catch! Thanks.
> 
> ---8<---
> This patch replaces uses of ablkcipher and blkcipher with skcipher,
> and the long obsolete hash interface with shash.
> 
> Signed-off-by: Herbert Xu 

Acked-by: Tyler Hicks 

I have no problem with you taking this through the cryptodev tree.
Thanks!

Tyler

> 
> diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
> index 80d6901..11255cb 100644
> --- a/fs/ecryptfs/crypto.c
> +++ b/fs/ecryptfs/crypto.c
> @@ -23,6 +23,8 @@
>   * 02111-1307, USA.
>   */
>  
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -30,7 +32,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -74,6 +75,19 @@ void ecryptfs_from_hex(char *dst, char *src, int dst_size)
>   }
>  }
>  
> +static int ecryptfs_hash_digest(struct crypto_shash *tfm,
> + char *src, int len, char *dst)
> +{
> + SHASH_DESC_ON_STACK(desc, tfm);
> + int err;
> +
> + desc->tfm = tfm;
> + desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
> + err = crypto_shash_digest(desc, src, len, dst);
> + shash_desc_zero(desc);
> + return err;
> +}
> +
>  /**
>   * ecryptfs_calculate_md5 - calculates the md5 of @src
>   * @dst: Pointer to 16 bytes of allocated memory
> @@ -88,45 +102,26 @@ static int ecryptfs_calculate_md5(char *dst,
> struct ecryptfs_crypt_stat *crypt_stat,
> char *src, int len)
>  {
> - struct scatterlist sg;
> - struct hash_desc desc = {
> - .tfm = crypt_stat->hash_tfm,
> - .flags = CRYPTO_TFM_REQ_MAY_SLEEP
> - };
> + struct crypto_shash *tfm;
>   int rc = 0;
>  
>   mutex_lock(_stat->cs_hash_tfm_mutex);
> - sg_init_one(, (u8 *)src, len);
> - if (!desc.tfm) {
> - desc.tfm = crypto_alloc_hash(ECRYPTFS_DEFAULT_HASH, 0,
> -  CRYPTO_ALG_ASYNC);
> - if (IS_ERR(desc.tfm)) {
> - rc = PTR_ERR(desc.tfm);
> + tfm = crypt_stat->hash_tfm;
> + if (!tfm) {
> + tfm = crypto_alloc_shash(ECRYPTFS_DEFAULT_HASH, 0, 0);
> + if (IS_ERR(tfm)) {
> + rc = PTR_ERR(tfm);
>   ecryptfs_printk(KERN_ERR, "Error attempting to "
>   "allocate crypto context; rc = [%d]\n",
>   rc);
>   goto out;
>   }
> - crypt_stat->hash_tfm = desc.tfm;
> - }
> - rc = crypto_hash_init();
> - if (rc) {
> - printk(KERN_ERR
> -"%s: Error initializing crypto hash; rc = [%d]\n",
> -__func__, rc);
> - goto out;
> + crypt_stat->hash_tfm = tfm;
>   }
> - rc = crypto_hash_update(, , len);
> + rc = ecryptfs_hash_digest(tfm, src, len, dst);
>   if (rc) {
>   printk(KERN_ERR
> -"%s: Error updating crypto hash; rc = [%d]\n",
> -__func__, rc);
> - goto out;
> - }
> - rc = crypto_hash_final(, dst);
> - if (rc) {
> - printk(KERN_ERR
> -"%s: Error finalizing crypto hash; rc = [%d]\n",
> +"%s: Error computing crypto hash; rc = [%d]\n",
>  __func__, rc);
>   goto out;
>   }
> @@ -234,10 +229,8 @@ void ecryptfs_destroy_crypt_stat(struct 
> ecryptfs_crypt_stat *crypt_stat)
>  {
>   struct ecryptfs_key_sig *key_sig, *key_sig_tmp;
>  
> - if (crypt_stat->tfm)
> - crypto_free_ablkcipher(crypt_stat->tfm);
> - if (crypt_stat->hash_tfm)
> - crypto_free_hash(crypt_stat->hash_tfm);
> + crypto_free_skcipher(crypt_stat->tfm);
> + crypto_free_shash(crypt_stat->hash_tfm);
>   list_for_each_entry_safe(key_sig, key_sig_tmp,
>_stat->keysig_list, crypt_stat_list) {
>   list_del(_sig->crypt_stat_list);
> @@ -342,7 +335,7 @@ static int crypt_scatterlist(struct ecryptfs_crypt_stat 
> *crypt_stat,
>struct scatterlist *src_sg, int size,
>unsigned char *iv, int op)
>  {
> - struct ablkcipher_request *req = NULL;
> + struct skcipher_request *req = NULL;
>   struct extent_crypt_result ecr;
>   int rc = 0;
>  
> @@ -358,20 +351,20 @@ static int crypt_scatterlist(struct ecryptfs_crypt_stat 
> *crypt_stat,
>   init_completion();
>  
>   mutex_lock(_stat->cs_tfm_mutex);
> - req = ablkcipher_request_alloc(crypt_stat->tfm, GFP_NOFS);
> + req = skcipher_request_alloc(crypt_stat->tfm, GFP_NOFS);
>   if (!req) {
>   

Re: [PATCH net-next 08/40] net: fec: move cbd_bufaddr assignment closer to the mapping function

2016-01-29 Thread Troy Kisky
On 1/28/2016 3:02 PM, Arnd Bergmann wrote:
> On Thursday 28 January 2016 14:25:32 Troy Kisky wrote:
>> Signed-off-by: Troy Kisky 
>> ---
>>  drivers/net/ethernet/freescale/fec_main.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> [note: missing changelog?]
> 
>> diff --git a/drivers/net/ethernet/freescale/fec_main.c 
>> b/drivers/net/ethernet/freescale/fec_main.c
>> index b87f80d..15a93f90 100644
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -476,6 +476,8 @@ static int fec_enet_txq_submit_skb(struct 
>> fec_enet_priv_tx_q *txq,
>>  estatus |= BD_ENET_TX_TS;
>>  }
>>  }
>> +bdp->cbd_bufaddr = addr;
>> +bdp->cbd_datlen = buflen;
>>  
>>  if (fep->bufdesc_ex) {
>>  
>> @@ -499,8 +501,6 @@ static int fec_enet_txq_submit_skb(struct 
>> fec_enet_priv_tx_q *txq,
>>  /* Save skb pointer */
>>  txq->tx_skbuff[index] = skb;
>>  
>> -bdp->cbd_datlen = buflen;
>> -bdp->cbd_bufaddr = addr;
>>  /* Make sure the updates to rest of the descriptor are performed before
>>   * transferring ownership.
>>   */
>>
> 
> This patch and others in the series conflicts with the bugfix "net: fec: make
> driver endian-safe" that Johannes sent this week. Can you include his fix
> in your series and ensure that all descriptor accesses are done in
> an endian-safe way?
> 
>   Arnd

Sure, I'll rebase and send the first few when net-next is updated.

Thanks
Troy



Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Tom Herbert
On Fri, Jan 29, 2016 at 3:58 PM, Sowmini Varadhan
 wrote:
> On (01/29/16 15:31), Tom Herbert wrote:
>> But even within flow dissector, to be completely correct, we need to
>> replace all 32-bit accesses with the mempy (flow_label, mpls label,
>> keyid) and be vigilant about new ones coming in. For that matter, ..
>
> well, one question that came to my mind when I was looking at this is:
> why does eth_get_headlen try to compute the flow hash even before the
> driver has had a chance to pull things into an aligned buffer? Isn't
> that just risking even more unaligned accesses?
>
Hmm, thanks for pointing that out. It looks like this is called by a
couple drivers as part of pulling up the data to get alignment. I am
actually surprised they are doing this. Flow dissector is not the
cheapest function in the world and it seems like a shame to be calling
it this early and not even getting a hash for the skb. Also, this is
another instance of touching the data so if we do get to the point of
trying steering packets without cache miss (another thread); this
undermines that. Seems like it might be just as well to pull up a
fixed number of bytes (ixgbe want min of 60 bytes), or don't pull up
anything avoid the cache miss here and let the stack pull up as
needed.

Anyway, I think you do have a point that flow_dissector does not
assume alignment or pull up data like the rest of the stack. Seems
like we need a utility function like align_access (maybe there is one
already) to get 32-bit fields and probably do need the sanity check
for odd alignment.

Tom


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread David Miller
From: Tom Herbert 
Date: Fri, 29 Jan 2016 16:47:46 -0800

> Seems like it might be just as well to pull up a fixed number of
> bytes (ixgbe want min of 60 bytes),

This is precisely what we were trying to move away from, please
don't regress back to that.


Re: [PATCH] vxlan: fix a out of bounds access in __vxlan_find_mac

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 13:41 -0800, Cong Wang wrote:
> On Thu, Jan 28, 2016 at 5:43 PM,   wrote:
> > From: Li RongQing 
> >
> > The size of all_zeros_mac is 6 byte, but eth_hash() will access the
> > 8 byte, and KASan reported the below bug:
> 
> 
> Sounds like we should fix eth_hash() (macvlan has a same function),
> it should not read beyond 6 bytes.

Why ? We always have at least 2 bytes following ethernet address in a
packet.

Better add these 2 bytes in all_zeros_mac[] and not slow down the hash
function.

We use same tricks in eth_type_trans()





Re: [PATCH] vxlan: fix a out of bounds access in __vxlan_find_mac

2016-01-29 Thread Cong Wang
On Fri, Jan 29, 2016 at 2:01 PM, Eric Dumazet  wrote:
> On Fri, 2016-01-29 at 13:41 -0800, Cong Wang wrote:
>> On Thu, Jan 28, 2016 at 5:43 PM,   wrote:
>> > From: Li RongQing 
>> >
>> > The size of all_zeros_mac is 6 byte, but eth_hash() will access the
>> > 8 byte, and KASan reported the below bug:
>>
>>
>> Sounds like we should fix eth_hash() (macvlan has a same function),
>> it should not read beyond 6 bytes.
>
> Why ? We always have at least 2 bytes following ethernet address in a
> packet.

Because we have to fix all the cases that don't.

>
> Better add these 2 bytes in all_zeros_mac[] and not slow down the hash
> function.
>
> We use same tricks in eth_type_trans()
>

I never doubt it works, as long as you want to audit all the cases
like this one,
sure go for it.


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Sowmini Varadhan
On (01/29/16 15:00), Tom Herbert wrote:

> The sparc documentation is pretty clear
> http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
> like unaligned accesses are not allowed in the architecture.

yes, but looks like you  can paper over some of this with
memcpy (as was happening with the saddr ref in skb_flow_dissect
that puzzled me and Eric because it did not generate any traps).

I suppose one could sprinkele a few WARN_ON's for !IS_ALIGNED
but that's not a fool-proof detection method either (in addition
to all the ugly shouting in the code).

--Sowmini


Re: i40e: Kernel unaligned access due to 'struct i40e_dma_mem' being 'packed'

2016-01-29 Thread tndave



On 01/27/2016 10:56 PM, David Miller wrote:

From: tndave 
Date: Wed, 27 Jan 2016 17:50:14 -0800


Hi,

i40e driver has 'struct i40e_dma_mem' defined with 'packed' directive
causing kernel unaligned errors on sparc (when
40e_allocate_dma_mem_d()
is being called)

log_unaligned: 1031 callbacks suppressed
Kernel unaligned access at TPC[448ae8]
dma_4v_alloc_coherent+0x188/0x2e0
Kernel unaligned access at TPC[448ae8]
dma_4v_alloc_coherent+0x188/0x2e0
Kernel unaligned access at TPC[448ae8]
dma_4v_alloc_coherent+0x188/0x2e0
Kernel unaligned access at TPC[448ae8]
dma_4v_alloc_coherent+0x188/0x2e0

This can be fixed with get_unaligned/put_unaligned(). However I don't
see 'struct i40e_dma_mem' is being directly shoved into NIC hardware.
But instead fields of the struct are being read and used for hardware
(e.g. dma_addr_t pa). For the test, I remove __packed, and i40e driver
and HW works fine. (of course kernel unaligned errors are gone too).
My question is, does 'struct i40e_dma_mem' required to be __packed?


People get overzealoud with __packed.

And even if it doesn't cause unaligned accesses like this, it generates
terrible code (byte at a time accesses to words) on several architectures.
True. For the same reason I want to clarify if __packed is actually 
needed? instead of fixing it with get_unaligned/put_unaligned()!


-Tushar




[Intel-wired-lan] [next] igb: allow setting MAC address on i211 using a device tree blob

2016-01-29 Thread John Holland
The Intel i211 LOM pcie ethernet controllers' iNVM operates as an OTP 
and has no externel EEPROM interface [1]. The following allows the 
driver to pickup the MAC address from a device tree blob when CONFIG_OF 
has been enabled.


[1] 
http://www.intel.com/content/www/us/en/embedded/products/networking/i211-ethernet-controller-datasheet.html


Signed-off-by: John Holland 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 30 
++

 1 file changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c

index 31e5f39..9c92443 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -56,6 +56,11 @@
 #include 
 #include "igb.h"

+#ifdef defined(CONFIG_OF)
+#include 
+#include 
+#endif
+
 #define MAJ 5
 #define MIN 3
 #define BUILD 0
@@ -2217,6 +,26 @@ static s32 igb_init_i2c(struct igb_adapter *adapter)
 }

 /**
+ *  igb_read_mac_addr_dts - Read mac addres from the device tree blob.
+ *  @hw: pointer to the e1000 hardware structure
+ **/
+#ifdef defined(CONFIG_OF)
+static void igb_read_mac_addr_dts(struct e1000_hw *hw)
+{
+   const u8 *mac;
+   struct device_node *dn;
+
+   dn = of_find_compatible_node(NULL, NULL, "intel,i211");
+   if (!dn)
+   return;
+
+   mac = of_get_mac_address(dn);
+   if (mac)
+   ether_addr_copy(hw->mac.addr, mac);
+}
+#endif
+
+/**
  *  igb_probe - Device Initialization Routine
  *  @pdev: PCI device information struct
  *  @ent: entry in igb_pci_tbl
@@ -2420,6 +2445,11 @@ static int igb_probe(struct pci_dev *pdev, const 
struct pci_device_id *ent)

if (hw->mac.ops.read_mac_addr(hw))
dev_err(>dev, "NVM Read Error\n");

+#ifdef defined(CONFIG_OF)
+   if (!is_valid_ether_addr(hw->mac.addr))
+   igb_read_mac_addr_dts(hw);
+#endif
+
memcpy(netdev->dev_addr, hw->mac.addr, netdev->addr_len);

if (!is_valid_ether_addr(netdev->dev_addr)) {


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:

> It also means DMA becomes dramatically slower as it introduces a
> partial write access for the start of every frame.  It is why we had
> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
> when unaligned then reading IP unaligned headers.

Well, I guess that if you have an arch where DMA accesses are slow and
NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
better than others.


> 
> The gain on recvmsg would probably be minimal.  The only time I have
> seen any significant speed-up for copying is if you can get both ends
> aligned to something like 16B.

On modern intel cpus, this does not matter at all, sure. It took a while
before "rep movsb" finally did the right thing.

memcpy() and friends implementations are much slower on some older
arches (when dealing with unaligned src/dst)

arch/mips/lib/memcpy.S is a gem ;)






Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Tom Herbert
On Fri, Jan 29, 2016 at 1:09 PM, Sowmini Varadhan
 wrote:
> On (01/29/16 13:00), Tom Herbert wrote:
>> Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
>> those create alignment issues (like when stack looks at addresses)?
>
> They do.
>
> http://comments.gmane.org/gmane.linux.network/370672
>
> some of it was fixed in https://lkml.org/lkml/2015/7/20/63.
> There's still some NET_IP_ALIGN missing. IIRC, I was seeing
> this for mldv3, but the fix has to be done carefully, by
> someone who knows how to fully regression test it.
>
> I dont know if other tunneling methods manage to get the
> NET_IP_ALIGN correct in every case.
>
> Also, while sparc complains about unaligned access
> in most cases, some sins may pass under the radar, and other
> platforms dont even generate traps, so it's easy to not know
> that there's a problem, a lot of the time.
>
The sparc documentation is pretty clear
http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
like unaligned accesses are not allowed in the architecture.

Tom

> --Sowmini
>


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Alexander Duyck
On Fri, Jan 29, 2016 at 2:28 PM, Eric Dumazet  wrote:
> On Fri, 2016-01-29 at 14:08 -0800, Alexander Duyck wrote:
>
>> It also means DMA becomes dramatically slower as it introduces a
>> partial write access for the start of every frame.  It is why we had
>> set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
>> when unaligned then reading IP unaligned headers.
>
> Well, I guess that if you have an arch where DMA accesses are slow and
> NET_IP_ALIGN = 2, you are out of luck. This is why some platforms are
> better than others.

The other bit you forgot to mention was an IOMMU.  That is another
per-architecture thing that can really slow us down.  Back when I
rewrote the receive path I was dealing with a number of performance
complaints on PowerPC.  The approach I took with the Intel drivers was
supposed to be the best compromise for IOMMU, DMA alignment, and IP
header alignment.

>>
>> The gain on recvmsg would probably be minimal.  The only time I have
>> seen any significant speed-up for copying is if you can get both ends
>> aligned to something like 16B.
>
> On modern intel cpus, this does not matter at all, sure. It took a while
> before "rep movsb" finally did the right thing.
>
> memcpy() and friends implementations are much slower on some older
> arches (when dealing with unaligned src/dst)
>
> arch/mips/lib/memcpy.S is a gem ;)

Yeah.  I can imagine.  The fact is you can't may everybody happy so I
am good with just trying to support the majority architectures as best
as possible if a few have to take a performance hit for an unaligned
memcpy then so be it.

- Alex


Re: [PATCH] net_sched: drr: check for NULL pointer in drr_dequeue

2016-01-29 Thread David Miller
From: Bernie Harris 
Date: Thu, 28 Jan 2016 16:30:51 +1300

> There are cases where qdisc_dequeue_peeked can return NULL, and the result
> is dereferenced later on in the function.
> 
> Similarly to the other qdisc dequeue functions, check whether the skb
> pointer is NULL and if it is, goto out.
> 
> Signed-off-by: Bernie Harris 

Applied, thanks.


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Sowmini Varadhan
On (01/29/16 15:31), Tom Herbert wrote:
> But even within flow dissector, to be completely correct, we need to
> replace all 32-bit accesses with the mempy (flow_label, mpls label,
> keyid) and be vigilant about new ones coming in. For that matter, ..

well, one question that came to my mind when I was looking at this is:
why does eth_get_headlen try to compute the flow hash even before the
driver has had a chance to pull things into an aligned buffer? Isn't
that just risking even more unaligned accesses?




Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2016-01-29 Thread David Miller
From: Michael Chan 
Date: Thu, 28 Jan 2016 03:11:19 -0500

> 3 small bug fix patches for net.

Series applied, thanks Michael.


Re: [PATCH v2 net] nfc: use GFP_USER for user-controlled kmalloc

2016-01-29 Thread Julian Calaby
Hi Cong

On Sat, Jan 30, 2016 at 6:46 AM, Eric Dumazet  wrote:
> On Fri, 2016-01-29 at 11:24 -0800, Cong Wang wrote:
>> These two functions are called in sendmsg path, and the
>> 'len' is passed from user-space, so we should not allow
>> malicious users to OOM kernel on purpose.
>>
>> Reported-by: Dmitry Vyukov 
>> Cc: Lauro Ramos Venancio 
>> Cc: Aloisio Almeida Jr 
>> Cc: Samuel Ortiz 
>> Signed-off-by: Cong Wang 
>> ---
>
> Note that the issue is not OOM the kernel (as the allocation is
> attempted even after your patch), but having a way to
> spill stack traces in the syslog.
>
> Acked-by: Eric Dumazet 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [PATCH] vxlan: fix a out of bounds access in __vxlan_find_mac

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 14:17 -0800, Cong Wang wrote:

> Because we have to fix all the cases that don't

Sure, lets do that without removing the optimizations.

macvlan control path seems to need a fix, as you pointed out.





Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Alexander Duyck
On Fri, Jan 29, 2016 at 1:33 PM, Eric Dumazet  wrote:
> On Fri, 2016-01-29 at 13:16 -0800, Alexander Duyck wrote:
>
>> It has to be something recent.  I know back when I wrote the code I
>> tested it on a few different architectures and had to add a few bits
>> in __skb_get_poff so that it would read doff as a u8 instead of
>> bitfield in a u32.
>>
>> Looking at the code it seems like this should be an easy fix.  Just
>> swap the two lines that acquire and test the flow label with the check
>> for dissector_uses_key(... _KEY_FLOW_LABEL).  Then ixgbe will stop
>> trying to grab flow labels.
>
> Note that if we properly align headers, we also align TCP/UDP payload.
>
> Meaning that copying data in recvmsg() is likely faster in some cpus.
> (The ones having NET_IP_ALIGN set to 2 presumably)

It also means DMA becomes dramatically slower as it introduces a
partial write access for the start of every frame.  It is why we had
set NET_IP_ALIGN to 0 on x86 since DMA was becoming more expensive
when unaligned then reading IP unaligned headers.

The gain on recvmsg would probably be minimal.  The only time I have
seen any significant speed-up for copying is if you can get both ends
aligned to something like 16B.

- Alex


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Tom Herbert
On Fri, Jan 29, 2016 at 3:04 PM, Sowmini Varadhan
 wrote:
> On (01/29/16 15:00), Tom Herbert wrote:
>
>> The sparc documentation is pretty clear
>> http://docs.oracle.com/cd/E19253-01/816-4854/hwovr-2/index.html, seems
>> like unaligned accesses are not allowed in the architecture.
>
> yes, but looks like you  can paper over some of this with
> memcpy (as was happening with the saddr ref in skb_flow_dissect
> that puzzled me and Eric because it did not generate any traps).
>
Well, I am more worried about what happens when an unaligned
encapsulated TPC/IP packet gets into the stack. It seems unlikely we'd
want to replace a bunch of address, seq numbers, etc. with memcpy all
over the stack. ;-)

But even within flow dissector, to be completely correct, we need to
replace all 32-bit accesses with the mempy (flow_label, mpls label,
keyid) and be vigilant about new ones coming in. For that matter, if
we really want to be pedantic, nothing prevents a driver from giving
us a packet with 1 byte alignment in which case we need to consider
access to 16-bit values also! Considering that this is a very narrow
use case (requires encapsulated Ethernet and architecture that is
alignment sensitive), I wonder if we should just punt on flow
dissection when we see non 4-byte alignment and
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set.

Tom


[PATCH] bgmac: add helper checking for BCM4707 / BCM53018 chip id

2016-01-29 Thread Rafał Miłecki
Chipsets with BCM4707 / BCM53018 ID require special handling at a few
places in the code. It's likely there will be more IDs to check in the
future. To simplify it add this trivial helper.

Signed-off-by: Rafał Miłecki 
---
 drivers/net/ethernet/broadcom/bgmac.c | 30 --
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac.c 
b/drivers/net/ethernet/broadcom/bgmac.c
index 06f6cff..230f8e6 100644
--- a/drivers/net/ethernet/broadcom/bgmac.c
+++ b/drivers/net/ethernet/broadcom/bgmac.c
@@ -26,6 +26,17 @@ static const struct bcma_device_id bgmac_bcma_tbl[] = {
 };
 MODULE_DEVICE_TABLE(bcma, bgmac_bcma_tbl);
 
+static inline bool bgmac_is_bcm4707_family(struct bgmac *bgmac)
+{
+   switch (bgmac->core->bus->chipinfo.id) {
+   case BCMA_CHIP_ID_BCM4707:
+   case BCMA_CHIP_ID_BCM53018:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static bool bgmac_wait_value(struct bcma_device *core, u16 reg, u32 mask,
 u32 value, int timeout)
 {
@@ -987,11 +998,9 @@ static void bgmac_mac_speed(struct bgmac *bgmac)
 static void bgmac_miiconfig(struct bgmac *bgmac)
 {
struct bcma_device *core = bgmac->core;
-   struct bcma_chipinfo *ci = >bus->chipinfo;
u8 imode;
 
-   if (ci->id == BCMA_CHIP_ID_BCM4707 ||
-   ci->id == BCMA_CHIP_ID_BCM53018) {
+   if (bgmac_is_bcm4707_family(bgmac)) {
bcma_awrite32(core, BCMA_IOCTL,
  bcma_aread32(core, BCMA_IOCTL) | 0x40 |
  BGMAC_BCMA_IOCTL_SW_CLKEN);
@@ -1055,9 +1064,7 @@ static void bgmac_chip_reset(struct bgmac *bgmac)
}
 
/* Request Misc PLL for corerev > 2 */
-   if (core->id.rev > 2 &&
-   ci->id != BCMA_CHIP_ID_BCM4707 &&
-   ci->id != BCMA_CHIP_ID_BCM53018) {
+   if (core->id.rev > 2 && !bgmac_is_bcm4707_family(bgmac)) {
bgmac_set(bgmac, BCMA_CLKCTLST,
  BGMAC_BCMA_CLKCTLST_MISC_PLL_REQ);
bgmac_wait_value(bgmac->core, BCMA_CLKCTLST,
@@ -1193,8 +1200,7 @@ static void bgmac_enable(struct bgmac *bgmac)
break;
}
 
-   if (ci->id != BCMA_CHIP_ID_BCM4707 &&
-   ci->id != BCMA_CHIP_ID_BCM53018) {
+   if (!bgmac_is_bcm4707_family(bgmac)) {
rxq_ctl = bgmac_read(bgmac, BGMAC_RXQ_CTL);
rxq_ctl &= ~BGMAC_RXQ_CTL_MDP_MASK;
bp_clk = bcma_pmu_get_bus_clock(>core->bus->drv_cc) /
@@ -1472,14 +1478,12 @@ static int bgmac_fixed_phy_register(struct bgmac *bgmac)
 
 static int bgmac_mii_register(struct bgmac *bgmac)
 {
-   struct bcma_chipinfo *ci = >core->bus->chipinfo;
struct mii_bus *mii_bus;
struct phy_device *phy_dev;
char bus_id[MII_BUS_ID_SIZE + 3];
int err = 0;
 
-   if (ci->id == BCMA_CHIP_ID_BCM4707 ||
-   ci->id == BCMA_CHIP_ID_BCM53018)
+   if (bgmac_is_bcm4707_family(bgmac))
return bgmac_fixed_phy_register(bgmac);
 
mii_bus = mdiobus_alloc();
@@ -1539,7 +1543,6 @@ static void bgmac_mii_unregister(struct bgmac *bgmac)
 /* http://bcm-v4.sipsolutions.net/mac-gbit/gmac/chipattach */
 static int bgmac_probe(struct bcma_device *core)
 {
-   struct bcma_chipinfo *ci = >bus->chipinfo;
struct net_device *net_dev;
struct bgmac *bgmac;
struct ssb_sprom *sprom = >bus->sprom;
@@ -1620,8 +1623,7 @@ static int bgmac_probe(struct bcma_device *core)
bgmac_chip_reset(bgmac);
 
/* For Northstar, we have to take all GMAC core out of reset */
-   if (ci->id == BCMA_CHIP_ID_BCM4707 ||
-   ci->id == BCMA_CHIP_ID_BCM53018) {
+   if (bgmac_is_bcm4707_family(bgmac)) {
struct bcma_device *ns_core;
int ns_gmac;
 
-- 
1.8.4.5



Re: [PATCH] ipv6: release idev lock before calling ipv6_ifa_notify()

2016-01-29 Thread David Miller
From: Gregory Herrero 
Date: Thu, 28 Jan 2016 09:34:52 +0100

> @@ -2302,8 +2302,11 @@ static void manage_tempaddrs(struct inet6_dev *idev,
>   ift->flags &= ~IFA_F_DEPRECATED;
>  
>   spin_unlock(>lock);
> - if (!(flags_F_TENTATIVE))
> + if (!(flags & IFA_F_TENTATIVE)) {
> + read_unlock_bh(>lock);
>   ipv6_ifa_notify(0, ift);
> + read_lock_bh(>lock);
> + }
>   }
>  
>   if ((create || list_empty(>tempaddr_list)) &&

You can't do this, the idev->lock has to be held to keep the list we
are iterating over from changing.


Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

2016-01-29 Thread David Miller
From: Eric Dumazet 
Date: Fri, 29 Jan 2016 19:23:54 -0800

> On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
>> This patch fixes an issue with unaligned accesses when using
>> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
>> The fact is when trying to check the length we don't need to be looking at
>> the flow label so we can reorder the checks to first check if we are
>> supposed to gather the flow label and then make the call to actually get
>> it.
>> 
>> Reported-by: Sowmini Varadhan 
>> Signed-off-by: Alexander Duyck 
>> ---
> 
> 
> You did not quite follow the discussion Alexander, we did not exactly
> took a decision about this bug.
> 
> As we mentioned, there are other parts that need care.
> 
> key_keyid->keyid = *keyid;
> 
> Please address them, instead of having to 'wait' for the next crash.

Indeed, this is a more fundamental issue.

This change in and of itself isn't bad, and is probably a suitable
micro-optimization for net-next, but it doesn't fix the bug in
question.


Re: xfrm: UFO + ESP = double fragmentation

2016-01-29 Thread Herbert Xu
On Sat, Jan 30, 2016 at 12:44:24AM +0100, Jiri Bohac wrote:
>
> Is there a situation when xfrm_output_gso() does the right thing?

Yes because you've just broken TSO over IPsec.

In fact you're remarkably close to the right solution which is
to avoid xfrm_output_gso for SKB_GSO_UDP packets.

You should also work through all the other types (e.g., tunnels)
one-by-one and determine which ones should be fragmented and
which ones shouldn't.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: bonding (IEEE 802.3ad) not working with qemu/virtio

2016-01-29 Thread David Miller
From: Nikolay Aleksandrov 
Date: Fri, 29 Jan 2016 22:48:26 +0100

> On 01/29/2016 10:45 PM, Jay Vosburgh wrote:
>> Nikolay Aleksandrov  wrote:
>> 
>>> On 01/25/2016 05:24 PM, Bjørnar Ness wrote:
 As subject says, 802.3ad bonding is not working with virtio network model.

 The only errors I see is:

 No 802.3ad response from the link partner for any adapters in the bond.

 Dumping the network traffic shows that no LACP packets are sent from the
 host running with virtio driver, changing to for example e1000 solves
 this problem
 with no configuration changes.

 Is this a known problem?

>>> [Including bonding maintainers for comments]
>>>
>>> Hi,
>>> Here's a workaround patch for virtio_net devices that "cheats" the
>>> duplex test (which is the actual problem). I've tested this locally
>>> and it works for me.
>>> I'd let the others comment on the implementation, there're other signs
>>> that can be used to distinguish a virtio_net device so I'm open to 
>>> suggestions.
>>> Also feedback if this is at all acceptable would be appreciated.
>> 
>>  Should virtio instead provide an arbitrary speed and full duplex
>> to ethtool, as veth does?
>> 
>>  Creating a magic whitelist of devices deep inside the 802.3ad
>> implementation seems less desirable.
>> 
> TBH, I absolutely agree. In fact here's what we've been doing:
> add set_settings which allows the user to set any speed/duplex
> and get_settings of course to retrieve that. This is also useful
> for testing other stuff that requires speed and duplex, not only
> for the bonding case.

I also agree.  Having a whitelist is just rediculous.

There should be a default speed/duplex setting for such devices as well.
We can pick one that will be use universally for these kinds of devices.


Re: [PATCH net] irda: fix a potential use-after-free in ircomm_param_request

2016-01-29 Thread David Miller
From: Cong Wang 
Date: Fri, 29 Jan 2016 11:58:03 -0800

> self->ctrl_skb is protected by self->spinlock, we should not
> access it out of the lock. Move the debugging printk inside.
> 
> Reported-by: Dmitry Vyukov 
> Cc: Samuel Ortiz 
> Signed-off-by: Cong Wang 

Applied, thanks.


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 16:47 -0800, Tom Herbert wrote:

> Hmm, thanks for pointing that out. It looks like this is called by a
> couple drivers as part of pulling up the data to get alignment. I am
> actually surprised they are doing this. Flow dissector is not the
> cheapest function in the world and it seems like a shame to be calling
> it this early and not even getting a hash for the skb. Also, this is
> another instance of touching the data so if we do get to the point of
> trying steering packets without cache miss (another thread); this
> undermines that. Seems like it might be just as well to pull up a
> fixed number of bytes (ixgbe want min of 60 bytes), or don't pull up
> anything avoid the cache miss here and let the stack pull up as
> needed.

Except that when for example mlx4 is able to pull only the headers
instead of a fixed amount of bytes, we increased the performance,
because GRO could cook packets twice as big for tunneled traffic.

( commit cfecec56ae7c7c40f23fbdac04acee027ca3bd66 )

1) Increasing performance by prefetching headers is a separate matter,
completely orthogonal to this.

2) Taking the opportunity to also give back the l4 hash as a bonus
has been considered, but tests showed no clear benefit.

  For typical cases where l4 hash is not used (RPS and RFS being not
enabled by default on linux), computing it brings nothing but added
complexity.





Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
> This patch fixes an issue with unaligned accesses when using
> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
> The fact is when trying to check the length we don't need to be looking at
> the flow label so we can reorder the checks to first check if we are
> supposed to gather the flow label and then make the call to actually get
> it.
> 
> Reported-by: Sowmini Varadhan 
> Signed-off-by: Alexander Duyck 
> ---


You did not quite follow the discussion Alexander, we did not exactly
took a decision about this bug.

As we mentioned, there are other parts that need care.

key_keyid->keyid = *keyid;

Please address them, instead of having to 'wait' for the next crash.

BTW, even a memcpy(_addrs->v4addrs, >saddr, 8) could crash, as
the compiler can certainly assume src and dst are 4 bytes aligned, and
could use word accesses when inlining memcpy() even on Sparc.

Apparently the compiler used by Sowmini is gentle.

Thanks.




Re: [PATCHv2] ipv4: ipconfig: avoid unused ic_proto_used symbol

2016-01-29 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 28 Jan 2016 17:39:24 +0100

> When CONFIG_PROC_FS, CONFIG_IP_PNP_BOOTP, CONFIG_IP_PNP_DHCP and
> CONFIG_IP_PNP_RARP are all disabled, we get a warning about the
> ic_proto_used variable being unused:
> 
> net/ipv4/ipconfig.c:146:12: error: 'ic_proto_used' defined but not used 
> [-Werror=unused-variable]
> 
> This avoids the warning, by making the definition conditional on
> whether a dynamic IP configuration protocol is configured. If not,
> we know that the value is always zero, so we can optimize away the
> variable and all code that depends on it.
> 
> Signed-off-by: Arnd Bergmann 

Applied.


Re: [PATCHv2] net: moxart: use correct accessors for DMA memory

2016-01-29 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 28 Jan 2016 17:54:33 +0100

> The moxart ethernet driver confuses coherent DMA buffers with
> MMIO registers.
> 
> moxart_ether.c: In function 'moxart_mac_setup_desc_ring':
> moxart_ether.c:146:428: error: passing argument 1 of '__fswab32' makes 
> integer from pointer without a cast [-Werror=int-conversion]
> moxart_ether.c:74:39: warning: incorrect type in argument 3 (different 
> address spaces)
> moxart_ether.c:74:39:expected void *cpu_addr
> moxart_ether.c:74:39:got void [noderef] *tx_desc_base
> 
> This leaves the basic logic alone and uses normal pointers for
> the virtual address of the descriptor. As we cannot use readl/writel
> to access them, we also introduce our own moxart_desc_read
> moxart_desc_write helpers that perform the same endianess swap
> as the original code, but without the address space conversion.
> 
> The barriers are made explicit here where needed: Even in the worst-case
> scenario, we just have to use a rmb() after checking ownership so
> we don't read any input data before we are sure it is value, and we
> use wmb() before transferring ownership back to the device.
> 
> Signed-off-by: Arnd Bergmann 

Applied, thanks.


Re: [PATCH net 5/6] net: mvneta: The mvneta_percpu_elect function should be atomic

2016-01-29 Thread David Miller
From: Gregory CLEMENT 
Date: Fri, 29 Jan 2016 17:26:06 +0100

> @@ -370,6 +370,8 @@ struct mvneta_port {
>   struct net_device *dev;
>   struct notifier_block cpu_notifier;
>   int rxq_def;
> + /* protect  */
> + spinlock_t lock;
>  
>   /* Core clock */
>   struct clk *clk;

Protect what?  This comment needs a lot of improvement.

Everyone knows a spinlock "protects" things, so if you aren't going
to actually describe what this lock protects, and in what contexts
the lock is used, you might as well not say anything at all.


Re: [PATCH net v2 0/4] net: add and use rx_nohandler stat counter

2016-01-29 Thread David Miller
From: Jarod Wilson 
Date: Thu, 28 Jan 2016 10:49:44 -0500

> The network core tries to keep track of dropped packets, but some packets
> you wouldn't really call dropped, so much as intentionally ignored, under
> certain circumstances. One such case is that of bonding and team device
> slaves that are currently inactive. Their respective rx_handler functions
> return RX_HANDLER_EXACT (the only places in the kernel that return that),
> which ends up tracking into the network core's __netif_receive_skb_core()
> function's drop path, with no pt_prev set. On a noisy network, this can
> result in a very rapidly incrementing rx_dropped counter, not only on the
> inactive slave(s), but also on the master device, such as the following:
 ...

Both my inbox and patchwork only show patch 2, 3, and 4.  Where is #1?

Thanks.


Re: [PATCH net v2 0/2] pv6: fix sticky pktinfo behaviour

2016-01-29 Thread David Miller
From: Paolo Abeni 
Date: Fri, 29 Jan 2016 12:30:18 +0100

> Currently:
> 
> ip addr add dev eth0 2001:0010::1/64
> ip addr add dev eth1 2001:0020::1/64
> ping6 -I eth0 2001:0020::2
> 
> do not lead to the expected results, i.e. eth1 is used as the
> egress interface.
> 
> This is due to two related issues in handling sticky pktinfo,
> used by ping6 to enforce the device binding:
> 
> - ip6_dst_lookup_flow()/ip6_dst_lookup_tail() do not really enforce
> flowi6_oif match
> - ipv6 udp connect() just ignore flowi6_oif
> 
> These patches address each issue individually.
> 
> The kernel has never enforced the egress interface specified
> via the sticky pktinfo, except briefly between the commits
> 741a11d9e410 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set")
> and
> d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE flag if saddr set"),
> but the ping6 tools was unaffected up to iputils-20100214,
> since before it used SO_BINDTODEVICE to enforce the egress
> interface.

Series applied, thanks.


Re: [PATCH net] net: dsa: mv88e6xxx: fix port VLAN maps

2016-01-29 Thread David Miller
From: Vivien Didelot 
Date: Thu, 28 Jan 2016 16:54:37 -0500

> Currently the port based VLAN maps should be configured to allow every
> port to egress frames on all other ports, except themselves.
> 
> The debugfs interface shows that they are misconfigured. For instance, a
> 7-port switch has the following content in the related register 0x06:
> 
>GLOBAL GLOBAL2 SERDES   0123456
> ...
> 6:  1fa41f0f   4   7f   7e   7d   7c   7b   7a   79
> ...
> 
> This means that port 3 is allowed to talk to port 2-6, but cannot talk
> to ports 0 and 1. With this fix, port 3 can correctly talk to all ports
> except 3 itself:
> 
>GLOBAL GLOBAL2 SERDES   0123456
> ...
> 6:  1fa41f0f   4   7e   7d   7b   77   6f   5f   3f
> ...
> 
> Fixes: ede8098d0fef ("net: dsa: mv88e6xxx: bridges do not need an FID")
> Reported-by: Kevin Smith 
> Signed-off-by: Vivien Didelot 

Applied.


Re: [net PATCH] fib_trie: Fix shift by 32 in fib_table_lookup

2016-01-29 Thread David Miller
From: Alexander Duyck 
Date: Thu, 28 Jan 2016 13:42:24 -0800

> The fib_table_lookup function had a shift by 32 that triggered a UBSAN
> warning.  This was due to the fact that I had placed the shift first and
> then followed it with the check for the suffix length to ignore the
> undefined behavior.  If we reorder this so that we verify the suffix is
> less than 32 before shifting the value we can avoid the issue.
> 
> Reported-by: Toralf Förster 
> Signed-off-by: Alexander Duyck 

Applied.


Re: [PATCH] vxlan: fix a out of bounds access in __vxlan_find_mac

2016-01-29 Thread David Miller
From: roy.qing...@gmail.com
Date: Fri, 29 Jan 2016 09:43:47 +0800

> From: Li RongQing 
> 
> The size of all_zeros_mac is 6 byte, but eth_hash() will access the
> 8 byte, and KASan reported the below bug:
 ...
> it can be fixed by enlarging the all_zeros_mac to 8 byte, although it is
> harmless; eth_hash() will be called in other place with the memory which
> is larger and equal to 8 byte.
> 
> Signed-off-by: Li RongQing 

Applied, thanks.


Re: [PATCH] bgmac: add helper checking for BCM4707 / BCM53018 chip id

2016-01-29 Thread David Miller
From: Rafał Miłecki 
Date: Sat, 30 Jan 2016 00:41:07 +0100

> Chipsets with BCM4707 / BCM53018 ID require special handling at a few
> places in the code. It's likely there will be more IDs to check in the
> future. To simplify it add this trivial helper.
> 
> Signed-off-by: Rafał Miłecki 

The net-next tree is not open, therefore submitting this kind of cleanup
is not appropriate.


Re: [PATCH net] netlink: not trim skb for mmaped socket when dump

2016-01-29 Thread David Miller
From: Ken-ichirou MATSUZAWA 
Date: Fri, 29 Jan 2016 10:45:50 +0900

> We should not trim skb for mmaped socket since its buf size is fixed
> and userspace will read as frame which data equals head. mmaped
> socket will not call recvmsg, means max_recvmsg_len is 0,
> skb_reserve was not called before commit: db65a3aaf29e.
> 
> Fixes: db65a3aaf29e (netlink: Trim skb to alloc size to avoid MSG_TRUNC)
> Signed-off-by: Ken-ichirou MATSUZAWA 

Applied, thank you.


Re: [PATCH v2 0/7] network driver fixes

2016-01-29 Thread David Miller
From: Arnd Bergmann 
Date: Fri, 29 Jan 2016 12:39:08 +0100

> This is an updated series of fixes for the network device drivers
> that showed warnings in ARM randconfig.
> 
> Changes since v1 are:
> 
> dropped "net: macb: avoid uninitialized variables", already fixed in net-next
> 
> dropped "net: fddi/defxx: avoid warning about uninitialized variable
>   use", already fixed in net-next
> 
> added missing barriers in "net: moxart: use correct accessors for
>   DMA memory"
> 
> clarified "net: bgmac: clarify CONFIG_BCMA dependency" changelog

Series applied, thanks.


[net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

2016-01-29 Thread Alexander Duyck
This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.

Reported-by: Sowmini Varadhan 
Signed-off-by: Alexander Duyck 
---

 net/core/flow_dissector.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index d79699c9d1b9..f09114f8ee33 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -208,7 +208,6 @@ ip:
case htons(ETH_P_IPV6): {
const struct ipv6hdr *iph;
struct ipv6hdr _iph;
-   __be32 flow_label;
 
 ipv6:
iph = __skb_header_pointer(skb, nhoff, sizeof(_iph), data, 
hlen, &_iph);
@@ -230,17 +229,19 @@ ipv6:
key_control->addr_type = FLOW_DISSECTOR_KEY_IPV6_ADDRS;
}
 
-   flow_label = ip6_flowlabel(iph);
-   if (flow_label) {
-   if (dissector_uses_key(flow_dissector,
-  FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
+   if (dissector_uses_key(flow_dissector,
+  FLOW_DISSECTOR_KEY_FLOW_LABEL)) {
+   __be32 flow_label = ip6_flowlabel(iph);
+
+   if (flow_label) {
key_tags = 
skb_flow_dissector_target(flow_dissector,
 
FLOW_DISSECTOR_KEY_FLOW_LABEL,
 
target_container);
key_tags->flow_label = ntohl(flow_label);
+
+   if (flags & FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
+   goto out_good;
}
-   if (flags & FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL)
-   goto out_good;
}
 
if (flags & FLOW_DISSECTOR_F_STOP_AT_L3)



Re: pull-request: wireless-drivers 2016-01-29

2016-01-29 Thread David Miller
From: Kalle Valo 
Date: Fri, 29 Jan 2016 10:47:19 +0200

> few fixes for 4.5. Nothing really standing out, see the tag for
> more info. Please let me know if you have any problems.

Pulled, thanks Kalle.



Re: [net PATCH] flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

2016-01-29 Thread Alexander Duyck
On Fri, Jan 29, 2016 at 7:35 PM, David Miller  wrote:
> From: Eric Dumazet 
> Date: Fri, 29 Jan 2016 19:23:54 -0800
>
>> On Fri, 2016-01-29 at 18:49 -0800, Alexander Duyck wrote:
>>> This patch fixes an issue with unaligned accesses when using
>>> eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
>>> The fact is when trying to check the length we don't need to be looking at
>>> the flow label so we can reorder the checks to first check if we are
>>> supposed to gather the flow label and then make the call to actually get
>>> it.
>>>
>>> Reported-by: Sowmini Varadhan 
>>> Signed-off-by: Alexander Duyck 
>>> ---
>>
>>
>> You did not quite follow the discussion Alexander, we did not exactly
>> took a decision about this bug.
>>
>> As we mentioned, there are other parts that need care.
>>
>> key_keyid->keyid = *keyid;
>>
>> Please address them, instead of having to 'wait' for the next crash.

The key_id bit is not part of the "basic" keys.  The basic keys only
require a 2 byte alignment, it is when you start parsing to populate a
hash that you get into the wider fields.  As far as I know there
aren't any length fields that are wider than 2 bytes.

> Indeed, this is a more fundamental issue.
>
> This change in and of itself isn't bad, and is probably a suitable
> micro-optimization for net-next, but it doesn't fix the bug in
> question.

Actually it does fix the bug as reported.  The problem as reported was
for ixgbe using the function for parsing the length.  In the case of
just getting the length my patch resolves the issue as there are no
other accesses wider than 2 bytes used when *just* computing the
length.  This is something that will have to get addressed sooner or
later anyway, and at least with this patch in the basic case of IPv6
over ixgbe won't cause any issues.

The key_id bit that Eric is pointing out is called in a path that is
only executed if you are actually attempting to compute a hash.  Odds
are the fix for that is going to be much more complicated and extends
well outside this function since it is a fundamental issue with VXLAN
and GRE TEB tunnels as either the inner or outer headers end up being
unaligned unless you want to break them into two buffers so that they
can both be aligned at the same time.

Extending beyond this has anyone taken a look at the transmit path for
these tunnels?  I would suspect we probably also have a number of
issues there since either the inner or outer IP headers have to be
unaligned when we are setting those up.

- Alex


Re: [PATCH net] tcp: avoid cwnd undo after receiving ECN

2016-01-29 Thread David Miller
From: Yuchung Cheng 
Date: Fri, 29 Jan 2016 15:11:50 -0800

> RFC 4015 section 3.4 says the TCP sender MUST refrain from
> reversing the congestion control state when the ACK signals
> congestion through the ECN-Echo flag. Currently we may not
> always do that when prior_ssthresh is reset upon receiving
> ACKs with ECE marks. This patch fixes that.
> 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Eric Dumazet 

Applied, thanks.


[PATCH net] tcp: avoid cwnd undo after receiving ECN

2016-01-29 Thread Yuchung Cheng
RFC 4015 section 3.4 says the TCP sender MUST refrain from
reversing the congestion control state when the ACK signals
congestion through the ECN-Echo flag. Currently we may not
always do that when prior_ssthresh is reset upon receiving
ACKs with ECE marks. This patch fixes that.

Signed-off-by: Yuchung Cheng 
Signed-off-by: Neal Cardwell 
Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_input.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d2ad433..1c2a734 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2366,8 +2366,6 @@ static void tcp_undo_cwnd_reduction(struct sock *sk, bool 
unmark_loss)
tp->snd_ssthresh = tp->prior_ssthresh;
tcp_ecn_withdraw_cwr(tp);
}
-   } else {
-   tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh);
}
tp->snd_cwnd_stamp = tcp_time_stamp;
tp->undo_marker = 0;
-- 
2.7.0.rc3.207.g0ac5344



xfrm: UFO + ESP = double fragmentation

2016-01-29 Thread Jiri Bohac
Hi,

I'm seeing wrong fragmentation on locally generated UDPv6 packets
going out over ESP (transport mode):

UFO is turned on on the outgoing interface and MTU is 1500.
When 8 kB is written to a UDP socket, udpv6_sendmsg() calls
ip_append_data() which generates a single 8 kB GSO skb.

Through ip6_send_skb() it reaches xfrm_output(). Since
skb_is_gso(skb) is nonzero, xfrm_output_gso() is called.
It immediatelly segments the skb via skb_gso_segment() and then
calls xfrm_output2() on each individual segment.

This is wrong. RFC4303 says:
3.3.4.  Fragmentation
   If necessary, fragmentation is performed after ESP
   processing within an IPsec implementation.  Thus,
   transport mode ESP is applied only to whole IP
   datagrams (not to IP fragments).

Instead, xfrm_output_gso() applies the transform to each segment.
Since both the fragmentation header _and_ the ESP headers now
don't fit in the MTU and the ESP-encapsulated segments
are fragmented for a second time in ip6_finish_output().

The outcome is:
- the original 8k UDP packet is split into 6 ESP fragments
- the first 5 ESP fragments are 1508 bytes each, thus fragmented
  again into two fragments 

The destination host replies with ICMP parameter problem.

How is this supposed to work?
This hack fixes this specific case:

--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -198,7 +198,7 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
int err;
 
if (skb_is_gso(skb))
-   return xfrm_output_gso(net, sk, skb);
+   return xfrm_output2(net, sk, skb);
 
if (skb->ip_summed == CHECKSUM_PARTIAL) {
err = skb_checksum_help(skb);


Is there a situation when xfrm_output_gso() does the right thing?

Thanks,

-- 
Jiri Bohac 
SUSE Labs, SUSE CZ



Re: bonding (IEEE 802.3ad) not working with qemu/virtio

2016-01-29 Thread Nikolay Aleksandrov
On 01/25/2016 05:24 PM, Bjørnar Ness wrote:
> As subject says, 802.3ad bonding is not working with virtio network model.
> 
> The only errors I see is:
> 
> No 802.3ad response from the link partner for any adapters in the bond.
> 
> Dumping the network traffic shows that no LACP packets are sent from the
> host running with virtio driver, changing to for example e1000 solves
> this problem
> with no configuration changes.
> 
> Is this a known problem?
> 
[Including bonding maintainers for comments]

Hi,
Here's a workaround patch for virtio_net devices that "cheats" the
duplex test (which is the actual problem). I've tested this locally
and it works for me.
I'd let the others comment on the implementation, there're other signs
that can be used to distinguish a virtio_net device so I'm open to suggestions.
Also feedback if this is at all acceptable would be appreciated.

Could you give it a try ?

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 4cbb8b27a891..0578a95e3ade 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -334,6 +334,7 @@ static u16 __get_link_speed(struct port *port)
 static u8 __get_duplex(struct port *port)
 {
struct slave *slave = port->slave;
+   const struct net_device *slave_dev = slave->dev;
u8 retval = 0x0;
 
/* handling a special case: when the configuration starts with
@@ -354,6 +355,15 @@ static u8 __get_duplex(struct port *port)
break;
}
}
+   if (!retval && slave_dev->dev.parent &&
+   !strcmp(dev_driver_string(slave_dev->dev.parent),
+   "virtio_net") &&
+   netif_carrier_ok(slave_dev)) {
+   netdev_dbg(slave->bond->dev, "port %d: activating virtio_net 
workaround\n",
+  port->actor_port_number);
+   retval = 1;
+   }
+
return retval;
 }
 


Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Eric Dumazet
On Fri, 2016-01-29 at 13:16 -0800, Alexander Duyck wrote:

> It has to be something recent.  I know back when I wrote the code I
> tested it on a few different architectures and had to add a few bits
> in __skb_get_poff so that it would read doff as a u8 instead of
> bitfield in a u32.
> 
> Looking at the code it seems like this should be an easy fix.  Just
> swap the two lines that acquire and test the flow label with the check
> for dissector_uses_key(... _KEY_FLOW_LABEL).  Then ixgbe will stop
> trying to grab flow labels.

Note that if we properly align headers, we also align TCP/UDP payload.

Meaning that copying data in recvmsg() is likely faster in some cpus.
(The ones having NET_IP_ALIGN set to 2 presumably)




Re: [RFC] Kernel unaligned access at __skb_flow_dissect

2016-01-29 Thread Tom Herbert
On Fri, Jan 29, 2016 at 1:09 PM, Sowmini Varadhan
 wrote:
> On (01/29/16 13:00), Tom Herbert wrote:
>> Doesn't every IP/VXLAN packet contains unaligned headers? Why don't
>> those create alignment issues (like when stack looks at addresses)?
>
> They do.
>
> http://comments.gmane.org/gmane.linux.network/370672
>
> some of it was fixed in https://lkml.org/lkml/2015/7/20/63.
> There's still some NET_IP_ALIGN missing. IIRC, I was seeing
> this for mldv3, but the fix has to be done carefully, by
> someone who knows how to fully regression test it.
>
> I dont know if other tunneling methods manage to get the
> NET_IP_ALIGN correct in every case.
>
Fortunately, most encapsulations are based on 4-byte alignment.
ETHER_IP even specifies a two byte header to ensure that whatever
follows the encapsulated Ethernet header is 4-byte aligned. The
encapsulation protocols that carry Ethernet without any inserted bytes
for alignment are the potential problem (GRE, VXLAN, etc.).
NET_IP_ALIGN can't help because the outer and inner (IP) headers have
different alignment, so no one value can work.

> Also, while sparc complains about unaligned access
> in most cases, some sins may pass under the radar, and other
> platforms dont even generate traps, so it's easy to not know
> that there's a problem, a lot of the time.
>
That is worrisome. The only way I can think of to be completely
protected would be to copy encapsulated Ethernet frames to
NET_IP_ALIGN buffer in the driver :-( It doesn't look like VXLAN does
anything like that.

Tom

> --Sowmini
>


Re: [PATCH] vxlan: fix a out of bounds access in __vxlan_find_mac

2016-01-29 Thread Cong Wang
On Thu, Jan 28, 2016 at 5:43 PM,   wrote:
> From: Li RongQing 
>
> The size of all_zeros_mac is 6 byte, but eth_hash() will access the
> 8 byte, and KASan reported the below bug:


Sounds like we should fix eth_hash() (macvlan has a same function),
it should not read beyond 6 bytes.


Re: bonding (IEEE 802.3ad) not working with qemu/virtio

2016-01-29 Thread Jay Vosburgh
Nikolay Aleksandrov  wrote:

>On 01/25/2016 05:24 PM, Bjørnar Ness wrote:
>> As subject says, 802.3ad bonding is not working with virtio network model.
>> 
>> The only errors I see is:
>> 
>> No 802.3ad response from the link partner for any adapters in the bond.
>> 
>> Dumping the network traffic shows that no LACP packets are sent from the
>> host running with virtio driver, changing to for example e1000 solves
>> this problem
>> with no configuration changes.
>> 
>> Is this a known problem?
>> 
>[Including bonding maintainers for comments]
>
>Hi,
>Here's a workaround patch for virtio_net devices that "cheats" the
>duplex test (which is the actual problem). I've tested this locally
>and it works for me.
>I'd let the others comment on the implementation, there're other signs
>that can be used to distinguish a virtio_net device so I'm open to suggestions.
>Also feedback if this is at all acceptable would be appreciated.

Should virtio instead provide an arbitrary speed and full duplex
to ethtool, as veth does?

Creating a magic whitelist of devices deep inside the 802.3ad
implementation seems less desirable.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: bonding (IEEE 802.3ad) not working with qemu/virtio

2016-01-29 Thread Nikolay Aleksandrov
On 01/29/2016 10:45 PM, Jay Vosburgh wrote:
> Nikolay Aleksandrov  wrote:
> 
>> On 01/25/2016 05:24 PM, Bjørnar Ness wrote:
>>> As subject says, 802.3ad bonding is not working with virtio network model.
>>>
>>> The only errors I see is:
>>>
>>> No 802.3ad response from the link partner for any adapters in the bond.
>>>
>>> Dumping the network traffic shows that no LACP packets are sent from the
>>> host running with virtio driver, changing to for example e1000 solves
>>> this problem
>>> with no configuration changes.
>>>
>>> Is this a known problem?
>>>
>> [Including bonding maintainers for comments]
>>
>> Hi,
>> Here's a workaround patch for virtio_net devices that "cheats" the
>> duplex test (which is the actual problem). I've tested this locally
>> and it works for me.
>> I'd let the others comment on the implementation, there're other signs
>> that can be used to distinguish a virtio_net device so I'm open to 
>> suggestions.
>> Also feedback if this is at all acceptable would be appreciated.
> 
>   Should virtio instead provide an arbitrary speed and full duplex
> to ethtool, as veth does?
> 
>   Creating a magic whitelist of devices deep inside the 802.3ad
> implementation seems less desirable.
> 
TBH, I absolutely agree. In fact here's what we've been doing:
add set_settings which allows the user to set any speed/duplex
and get_settings of course to retrieve that. This is also useful
for testing other stuff that requires speed and duplex, not only
for the bonding case.

I'll add the virtio_net maintainers to the discussion, see if it's
okay with everyone and I'll move to send patches once net-next opens up.

Thanks!


>   -J
>   
> ---
>   -Jay Vosburgh, jay.vosbu...@canonical.com
> 



Re: WARNING at net/mac80211/rate.c:513 ieee80211_get_tx_rates [mac80211]

2016-01-29 Thread Linus Torvalds
On Fri, Jan 29, 2016 at 9:54 AM, Larry Finger  wrote:
>
> The test patch that Johannes sent earlier was close. The section needed to
> add VHT rates is:

Hmm. This looks pretty much exactly like what I already tried (I had
fixed Johannes' patch to use "vht_cap" already, since it didn't
compile otherwise).

So the only difference is that it only checks WIRELESS_MODE_AC_5G.

But it worked for me this time. I have no idea why.

Maybe Johannes' patch actually always worked for me, but I just had a
transient problem that made me think it didn't. I think I only booted
it once, and went "oh, ok, no network, that didn't work".

  Linus


  1   2   >