date:20160711

Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs

2016-07-11 Thread Shmulik Ladkani

On Sat, 9 Jul 2016 15:22:30 +0200 Florian Westphal  wrote:
> Shmulik Ladkani  wrote:
> > I'd appreciate any suggestion how to determine traffic is local OTHER
> > THAN testing IPSKB_FORWARDED; If we have such a way, there wouldn't be an
> > impact on local traffic.
> >   
> > > What about setting IPCB FORWARD flag in iptunnel_xmit if
> > > skb->skb_iif != 0... instead?  

I've come up with a suggestion that does not abuse IPSKB_FORWARDED,
while properly addressing the use case (and similar ones), without
introducing the cost of entering 'skb_gso_validate_mtu' in the local
case.

How about:

@@ -220,12 +220,15 @@ static int ip_finish_output_gso(struct net *net, struct sock *sk,
struct sk_buff *skb, unsigned int mtu)
 {
netdev_features_t features;
+   int local_trusted_gso;
struct sk_buff *segs;
int ret = 0;

-   /* common case: locally created skb or seglen is <= mtu */
-   if (((IPCB(skb)->flags & IPSKB_FORWARDED) == 0) ||
- skb_gso_validate_mtu(skb, mtu))
+   local_trusted_gso = (IPCB(skb)->flags & IPSKB_FORWARDED) == 0 &&
+   !(skb_shinfo(skb)->gso_type & SKB_GSO_DODGY);
+   /* common case: locally created skb from a trusted gso source or
+* seglen is <= mtu */
+   if (local_trusted_gso || skb_gso_validate_mtu(skb, mtu))
return ip_finish_output2(net, sk, skb);

/* Slowpath -  GSO segment length is exceeding the dst MTU.

This properly addresses the use case where a gso skb arrives from an
untrusted source, so its gso_size is out of our control (e.g. tun/tap,
macvtap, af_packet, xen-netfront...).

Locally "gso trusted" skbs (the common case) will NOT suffer the
additional (possibly costly) call to 'skb_gso_validate_mtu'.

Also, if IPSKB_FORWARDED is true, behavior stays exactly the same.
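To make the proposed fast-path condition concrete, here is a minimal userspace model of it. It is a sketch under stated assumptions, not kernel code: `struct model_skb`, `MODEL_IPSKB_FORWARDED`, `MODEL_SKB_GSO_DODGY` and `model_validate_mtu()` are hypothetical stand-ins for the IPCB flags, the `gso_type` bits and `skb_gso_validate_mtu()`, and the flag values are arbitrary. The counter shows that, because `||` short-circuits, a locally created skb from a trusted GSO source never reaches the MTU check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IPSKB_FORWARDED and SKB_GSO_DODGY. */
#define MODEL_IPSKB_FORWARDED 0x1u
#define MODEL_SKB_GSO_DODGY   0x2u

struct model_skb {
	unsigned int ipcb_flags; /* models IPCB(skb)->flags */
	unsigned int gso_type;   /* models skb_shinfo(skb)->gso_type */
	unsigned int seglen;     /* models the per-segment network length */
};

static int validate_calls; /* how often the "costly" check ran */

/* Models skb_gso_validate_mtu(): true if every segment fits the MTU. */
static bool model_validate_mtu(const struct model_skb *skb, unsigned int mtu)
{
	validate_calls++;
	return skb->seglen <= mtu;
}

/* True: skb can go straight to ip_finish_output2().
 * False: slow path, software segmentation. */
static bool model_fast_path(const struct model_skb *skb, unsigned int mtu)
{
	bool local_trusted_gso =
		(skb->ipcb_flags & MODEL_IPSKB_FORWARDED) == 0 &&
		!(skb->gso_type & MODEL_SKB_GSO_DODGY);

	/* || short-circuits: trusted local skbs skip the MTU check. */
	return local_trusted_gso || model_validate_mtu(skb, mtu);
}
```

In this model an untrusted (DODGY) skb with oversized segments falls to the slow path even though it is not IPSKB_FORWARDED, which is the case the proposal targets, while forwarded skbs are checked exactly as before.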

Regards,
Shmulik

Re: [PATCH -next] ipv4: af_inet: make it explicitly non-modular

2016-07-11 Thread David Miller

From: Paul Gortmaker 
Date: Mon, 11 Jul 2016 16:37:51 -0400

> The Makefile controlling compilation of this file is obj-y,
> meaning that it currently is never being built as a module.
> 
> Since MODULE_ALIAS is a no-op for non-modular code, we can simply
> remove the MODULE_ALIAS_NETPROTO variant used here.
> 
> We replace module.h with kmod.h since the file does make use of
> request_module() in order to load other modules from here.
> 
> We don't have to worry about init.h coming in via the removed
> module.h since the file explicitly includes init.h already.
> 
> Cc: "David S. Miller" 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Paul Gortmaker 

Applied to net-next, thanks.

Re: [PATCH net 0/3] tipc: three small fixes

2016-07-11 Thread David Miller

From: Jon Maloy 
Date: Mon, 11 Jul 2016 16:08:34 -0400

> Fixes for some broadcast link problems that may occur in large systems.

Series applied, thanks Jon.

Re: [PATCH 1/2] Bluetooth: Add LED triggers for HCI frames tx and rx

2016-07-11 Thread Guodong Xu

Dear maintainers,

Could you please review this?

-Guodong

On 23 June 2016 at 12:58, Guodong Xu  wrote:
> Two LED triggers are defined: tx_led and rx_led. When frames pass
> through the HCI core layer, for tx or for rx, the associated LED
> can blink.
>
> Verified on HiKey, 96boards. It uses hi6220 SoC and TI WL1835 combo
> chip.
>
> Signed-off-by: Guodong Xu 
> ---
>  include/net/bluetooth/hci_core.h |  1 +
>  net/bluetooth/hci_core.c |  3 +++
>  net/bluetooth/leds.c | 15 +++
>  net/bluetooth/leds.h |  2 ++
>  4 files changed, 21 insertions(+)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index dc71473..37b8dd9 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -398,6 +398,7 @@ struct hci_dev {
> bdaddr_t rpa;
>
> struct led_trigger  *power_led;
> +   struct led_trigger  *tx_led, *rx_led;
>
> int (*open)(struct hci_dev *hdev);
> int (*close)(struct hci_dev *hdev);
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 45a9fc6..c6e1210 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -3248,6 +3248,7 @@ int hci_recv_frame(struct hci_dev *hdev, struct sk_buff *skb)
> skb_queue_tail(&hdev->rx_q, skb);
> queue_work(hdev->workqueue, &hdev->rx_work);
>
> +   hci_leds_blink_oneshot(hdev->rx_led);
> return 0;
>  }
>  EXPORT_SYMBOL(hci_recv_frame);
> @@ -3325,6 +3326,8 @@ static void hci_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
> BT_ERR("%s sending frame failed (%d)", hdev->name, err);
> kfree_skb(skb);
> }
> +
> +   hci_leds_blink_oneshot(hdev->tx_led);
>  }
>
>  /* Send HCI command */
> diff --git a/net/bluetooth/leds.c b/net/bluetooth/leds.c
> index 8319c84..c4825d5 100644
> --- a/net/bluetooth/leds.c
> +++ b/net/bluetooth/leds.c
> @@ -19,6 +19,8 @@ struct hci_basic_led_trigger {
>  #define to_hci_basic_led_trigger(arg) container_of(arg, \
> struct hci_basic_led_trigger, led_trigger)
>
> +#define BLUETOOTH_BLINK_DELAY  50 /* ms */
> +
>  void hci_leds_update_powered(struct hci_dev *hdev, bool enabled)
>  {
> if (hdev->power_led)
> @@ -37,6 +39,15 @@ static void power_activate(struct led_classdev *led_cdev)
> led_trigger_event(led_cdev->trigger, powered ? LED_FULL : LED_OFF);
>  }
>
> +void hci_leds_blink_oneshot(struct led_trigger *trig)
> +{
> +   unsigned long led_delay = BLUETOOTH_BLINK_DELAY;
> +
> +   if (!trig)
> +   return;
> +   led_trigger_blink_oneshot(trig, &led_delay, &led_delay, 0);
> +}
> +
>  static struct led_trigger *led_allocate_basic(struct hci_dev *hdev,
> void (*activate)(struct led_classdev *led_cdev),
> const char *name)
> @@ -71,4 +82,8 @@ void hci_leds_init(struct hci_dev *hdev)
>  {
> /* initialize power_led */
> hdev->power_led = led_allocate_basic(hdev, power_activate, "power");
> +   /* initialize tx_led */
> +   hdev->tx_led = led_allocate_basic(hdev, NULL, "tx");
> +   /* initialize rx_led */
> +   hdev->rx_led = led_allocate_basic(hdev, NULL, "rx");
>  }
> diff --git a/net/bluetooth/leds.h b/net/bluetooth/leds.h
> index a9c4d6e..9b1cccd 100644
> --- a/net/bluetooth/leds.h
> +++ b/net/bluetooth/leds.h
> @@ -9,8 +9,10 @@
>  #if IS_ENABLED(CONFIG_BT_LEDS)
>  void hci_leds_update_powered(struct hci_dev *hdev, bool enabled);
>  void hci_leds_init(struct hci_dev *hdev);
> +void hci_leds_blink_oneshot(struct led_trigger *trig);
>  #else
>  static inline void hci_leds_update_powered(struct hci_dev *hdev,
>bool enabled) {}
>  static inline void hci_leds_init(struct hci_dev *hdev) {}
> +static inline void hci_leds_blink_oneshot(struct led_trigger *trig) {}
>  #endif
> --
> 1.9.1
>
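The call pattern in hci_leds_blink_oneshot() can be modeled in userspace as below. This is only a sketch: `struct model_trigger` and `model_blink_oneshot()` are hypothetical stand-ins for `struct led_trigger` and `led_trigger_blink_oneshot()`, and the model records the delays instead of driving an LED. It shows the two details of the helper: the NULL guard (so the callers in hci_recv_frame() and hci_send_frame() need no check of their own), and the same 50 ms value passed by pointer for both the on and the off delay.

```c
#include <assert.h>
#include <stddef.h>

#define BLUETOOTH_BLINK_DELAY 50 /* ms, as in the patch */

/* Hypothetical stand-in for struct led_trigger that records blinks. */
struct model_trigger {
	int blinks;
	unsigned long last_on;
	unsigned long last_off;
};

/* Models led_trigger_blink_oneshot(trig, &on, &off, invert); here it
 * only records the delays it was handed. */
static void model_blink_oneshot(struct model_trigger *trig,
				unsigned long *delay_on,
				unsigned long *delay_off, int invert)
{
	(void)invert;
	trig->blinks++;
	trig->last_on = *delay_on;
	trig->last_off = *delay_off;
}

/* Mirrors hci_leds_blink_oneshot() from the patch: a NULL trigger
 * (e.g. allocation failed) is ignored, and the delays are passed by
 * pointer from a local variable. */
static void model_hci_leds_blink_oneshot(struct model_trigger *trig)
{
	unsigned long led_delay = BLUETOOTH_BLINK_DELAY;

	if (!trig)
		return;
	model_blink_oneshot(trig, &led_delay, &led_delay, 0);
}
```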

Re: [PATCH 3/3] crypto: Added Chelsio Menu to the Kconfig file

2016-07-11 Thread kbuild test robot

Hi,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.7-rc7 next-20160711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Yeshaswi-M-R-Gowda/crypto-chcr-Add-Chelsio-Crypto-Driver/20160712-023513
config: sh-allyesconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

All warnings (new ones prefixed by >>):

   In file included from include/linux/swab.h:4:0,
from include/uapi/linux/byteorder/little_endian.h:12,
from include/linux/byteorder/little_endian.h:4,
from arch/sh/include/uapi/asm/byteorder.h:5,
from arch/sh/include/asm/bitops.h:11,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from drivers/crypto/chelsio/chcr_algo.c:42:
   drivers/crypto/chelsio/chcr_algo.c: In function 'create_wreq':
   drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:129:32: note: in definition of macro '__swab64'
 (__builtin_constant_p((__u64)(x)) ? \
   ^
   include/linux/byteorder/generic.h:91:21: note: in expansion of macro '__cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
   drivers/crypto/chelsio/chcr_algo.c:454:17: note: in expansion of macro 'cpu_to_be64'
 wreq->cookie = cpu_to_be64((u64)req);
^
   drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:23:12: note: in definition of macro '___constant_swab64'
 (((__u64)(x) & (__u64)0x00ffULL) << 56) | \
   ^
>> include/uapi/linux/byteorder/little_endian.h:36:43: note: in expansion of macro '__swab64'
#define __cpu_to_be64(x) ((__force __be64)__swab64((x)))
  ^
   include/linux/byteorder/generic.h:91:21: note: in expansion of macro '__cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
   drivers/crypto/chelsio/chcr_algo.c:454:17: note: in expansion of macro 'cpu_to_be64'
 wreq->cookie = cpu_to_be64((u64)req);
^
   drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:24:12: note: in definition of macro '___constant_swab64'
 (((__u64)(x) & (__u64)0xff00ULL) << 40) | \
   ^
>> include/uapi/linux/byteorder/little_endian.h:36:43: note: in expansion of macro '__swab64'
#define __cpu_to_be64(x) ((__force __be64)__swab64((x)))
  ^
   include/linux/byteorder/generic.h:91:21: note: in expansion of macro '__cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
   drivers/crypto/chelsio/chcr_algo.c:454:17: note: in expansion of macro 'cpu_to_be64'
 wreq->cookie = cpu_to_be64((u64)req);
^
   drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:25:12: note: in definition of macro '___constant_swab64'
 (((__u64)(x) & (__u64)0x00ffULL) << 24) | \
   ^
>> include/uapi/linux/byteorder/little_endian.h:36:43: note: in expansion of macro '__swab64'
#define __cpu_to_be64(x) ((__force __be64)__swab64((x)))
  ^
   include/linux/byteorder/generic.h:91:21: note: in expansion of macro '__cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
   drivers/crypto/chelsio/chcr_algo.c:454:17: note: in expansion of macro 'cpu_to_be64'
 wreq->cookie = cpu_to_be64((u64)req);
^
   drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:26:12: note: in definition of macro '___constant_swab64'
 (((__
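The warning comes from casting a pointer straight to u64: on a 32-bit target such as sh4, pointers are 32 bits wide, so GCC flags the size change. A common way to silence it is to cast through an integer type of pointer width first. The sketch below is a userspace illustration under that assumption, not the chcr fix: it uses `uintptr_t` where kernel code typically uses `unsigned long`, `encode_cookie()`, `decode_cookie()` and `struct fake_req` are hypothetical, and the byte loops stand in for cpu_to_be64()/be64_to_cpu().

```c
#include <assert.h>
#include <stdint.h>

struct fake_req { int id; }; /* hypothetical request type */

/* Store a pointer as a big-endian 64-bit cookie. Casting to uintptr_t
 * first makes the widening explicit, so 32-bit builds do not warn. */
static void encode_cookie(const void *req, unsigned char out[8])
{
	uint64_t v = (uint64_t)(uintptr_t)req;

	for (int i = 7; i >= 0; i--) {
		out[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Recover the pointer; narrowing to uintptr_t before the pointer
 * conversion avoids -Wint-to-pointer-cast. */
static void *decode_cookie(const unsigned char in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return (void *)(uintptr_t)v;
}
```

The same idea applies in both directions: the producer side widens via a pointer-sized integer, and the completion side narrows back through one before converting to a pointer.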

Re: [PATCH v2 net-next] rtnl: Add GFP flag argument to rtnl_unicast()

2016-07-11 Thread Masashi Honma


On 2016-07-12 05:01, David Miller wrote:


The code is correct and optimal as-is.  There is no gain to your
changes.  gfp_any() will do the right thing.

In fact, your change makes the code more error prone because if any
of these code paths get moved into an atomic context they will break
unless someone remembers to also fix up the GFP flags.

Meanwhile with the existing use of gfp_any() it will work
transparently in such a situation.

I'm not applying this.


I see. Thank you for reviewing.
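David's argument can be illustrated with a small userspace model of gfp_any(), which in the kernel (include/net/sock.h) returns GFP_ATOMIC when running in softirq context and GFP_KERNEL otherwise. Everything in the sketch is a hypothetical stand-in: `model_in_softirq` models in_softirq(), and `model_alloc_ok()` encodes the rule that a sleeping allocation is not allowed in atomic context. A caller that hard-codes the sleeping flag breaks once its code path moves into softirq; a caller using gfp_any() adapts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GFP_KERNEL and GFP_ATOMIC. */
enum model_gfp { MODEL_GFP_KERNEL, MODEL_GFP_ATOMIC };

static bool model_in_softirq; /* models in_softirq() */

/* Models gfp_any(): picks the allocation mode from the current context. */
static enum model_gfp model_gfp_any(void)
{
	return model_in_softirq ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}

/* A sleeping (GFP_KERNEL) allocation is a bug in atomic context. */
static bool model_alloc_ok(enum model_gfp gfp)
{
	return !(model_in_softirq && gfp == MODEL_GFP_KERNEL);
}
```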

Re: eBPF tunable max instructions or max tail call?

2016-07-11 Thread Alexei Starovoitov

On Mon, Jul 11, 2016 at 05:56:07PM -0700, Sargun Dhillon wrote:
> It would be nice to have eBPF programs that are longer than 4096
> instructions. I'm trying to implement XSalsa20 in eBPF, and
> unfortunately, it doesn't fit into 4096 instructions since I'm
> unrolling all of the loops. Further than that, doing tail calls to
> process each block results in me hitting the tail call limit.

a cipher in bpf? wow. that's pushing it :)
we've been discussing various ways of adding a 'bounded loop' instruction
to avoid manual unrolling, but it will still be limited to 4k
instructions per program, so it probably won't help this use case.
Are you trying to do it in the networking context?

> I don't think that it makes much sense to expose the crypto API as
> BPF helpers, as I'm not sure if we can ensure safety, and timely
> execution with it. I may be wrong here, and if there is a sane, safe
> way to expose the crypto API, I'm all ears.

we had patches to connect the crypto api with bpf, but they were
too hacky to upstream. Since then we have redesigned the approach,
and the latest version should be much cleaner. The keys will be managed
through the normal xfrm api, and bpf will call into crypto with a
mechanism similar to tail-call. The program will specify the
offset/length within the packet to encrypt/decrypt and the next
program to execute when the crypto operation completes.
Root only, and for xdp and tc only.

> Other than that, it would be nice to make the max instructions a knob,
> and I don't think that it has much downside, given it's only checked
> at load time. It would be nice to make the tail call limit a tunable
> as well, but I'm unsure of the performance impact it might have given
> that it's checked at runtime.
> 
> What do y'all think is reasonable? Make them both tunable? Just one? None?

It is preferred to achieve the goal without introducing a knob.
Also, it sounds like increasing 4k to 8k won't really solve it anyway.

[added to the 3.18 stable tree] VSOCK: do not disconnect socket when peer has shutdown SEND only

2016-07-11 Thread Sasha Levin

From: Ian Campbell 

This patch has been added to the 3.18 stable tree. If you have any
objections, please let us know.

===

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket families' stream_recvmsg callbacks, doing a shutdown
here does not seem to be the norm, and removing it does not seem to have
had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches; I'm unsure of the impact
on the vmci transport.

Signed-off-by: Ian Campbell 
Cc: "David S. Miller" 
Cc: Stefan Hajnoczi 
Cc: Claudio Imbrenda 
Cc: Andy King 
Cc: Dmitry Torokhov 
Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 net/vmw_vsock/af_vsock.c | 21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 85d232b..e8d3313 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1796,27 +1796,8 @@ vsock_stream_recvmsg(struct kiocb *kiocb,
else if (sk->sk_shutdown & RCV_SHUTDOWN)
err = 0;
 
-   if (copied > 0) {
-   /* We only do these additional bookkeeping/notification steps
-* if we actually copied something out of the queue pair
-* instead of just peeking ahead.
-*/
-
-   if (!(flags & MSG_PEEK)) {
-   /* If the other side has shutdown for sending and there
-* is nothing more to read, then modify the socket
-* state.
-*/
-   if (vsk->peer_shutdown & SEND_SHUTDOWN) {
-   if (vsock_stream_has_data(vsk) <= 0) {
-   sk->sk_state = SS_UNCONNECTED;
-   sock_set_flag(sk, SOCK_DONE);
-   sk->sk_state_change(sk);
-   }
-   }
-   }
+   if (copied > 0)
err = copied;
-   }
 
 out_wait:
finish_wait(sk_sleep(sk), &wait);
-- 
2.5.0
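The failure mode the commit message describes (a peer that sends a request, calls shutdown(SHUT_WR), and still expects a reply) is ordinary stream-socket half-close. The sketch below demonstrates the expected behaviour in userspace. It uses an AF_UNIX socketpair as a stand-in, since AF_VSOCK needs a hypervisor transport, and `half_close_demo()` is a hypothetical helper. The receiver sees EOF after the request but must still be able to send its reply, which the removed code path in vsock_stream_recvmsg() broke by marking the socket SS_UNCONNECTED.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: client sends a request, half-closes, and still
 * receives the reply. Returns 0 on success, -1 on any failure. */
static int half_close_demo(char *reply_out, size_t reply_len)
{
	int sv[2];
	char buf[32];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;

	/* Client (sv[0]): send the request, then shut down writing only. */
	if (write(sv[0], "req", 3) != 3 || shutdown(sv[0], SHUT_WR) < 0)
		goto fail;

	/* Server (sv[1]): read the request, then observe EOF ... */
	if (read(sv[1], buf, sizeof(buf)) != 3)
		goto fail;
	if (read(sv[1], buf, sizeof(buf)) != 0)
		goto fail;

	/* ... but the connection is still usable in the other direction. */
	if (write(sv[1], "reply", 5) != 5)
		goto fail;

	/* Client: its receive side was never shut down. */
	n = read(sv[0], reply_out, reply_len - 1);
	if (n < 0)
		goto fail;
	reply_out[n] = '\0';

	close(sv[0]);
	close(sv[1]);
	return 0;
fail:
	close(sv[0]);
	close(sv[1]);
	return -1;
}
```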

[added to the 4.1 stable tree] VSOCK: do not disconnect socket when peer has shutdown SEND only

2016-07-11 Thread Sasha Levin

From: Ian Campbell 

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===

[ Upstream commit dedc58e067d8c379a15a8a183c5db318201295bb ]

The peer may be expecting a reply having sent a request and then done a
shutdown(SHUT_WR), so tearing down the whole socket at this point seems
wrong and breaks for me with a client which does a SHUT_WR.

Looking at other socket families' stream_recvmsg callbacks, doing a shutdown
here does not seem to be the norm, and removing it does not seem to have
had any adverse effects that I can see.

I'm using Stefan's RFC virtio transport patches; I'm unsure of the impact
on the vmci transport.

Signed-off-by: Ian Campbell 
Cc: "David S. Miller" 
Cc: Stefan Hajnoczi 
Cc: Claudio Imbrenda 
Cc: Andy King 
Cc: Dmitry Torokhov 
Cc: Jorgen Hansen 
Cc: Adit Ranadive 
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller 
Signed-off-by: Sasha Levin 
---
 net/vmw_vsock/af_vsock.c | 21 +
 1 file changed, 1 insertion(+), 20 deletions(-)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2ec86e6..e1c69b2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -1794,27 +1794,8 @@ vsock_stream_recvmsg(struct socket *sock, struct msghdr *msg, size_t len,
else if (sk->sk_shutdown & RCV_SHUTDOWN)
err = 0;
 
-   if (copied > 0) {
-   /* We only do these additional bookkeeping/notification steps
-* if we actually copied something out of the queue pair
-* instead of just peeking ahead.
-*/
-
-   if (!(flags & MSG_PEEK)) {
-   /* If the other side has shutdown for sending and there
-* is nothing more to read, then modify the socket
-* state.
-*/
-   if (vsk->peer_shutdown & SEND_SHUTDOWN) {
-   if (vsock_stream_has_data(vsk) <= 0) {
-   sk->sk_state = SS_UNCONNECTED;
-   sock_set_flag(sk, SOCK_DONE);
-   sk->sk_state_change(sk);
-   }
-   }
-   }
+   if (copied > 0)
err = copied;
-   }
 
 out_wait:
finish_wait(sk_sleep(sk), &wait);
-- 
2.5.0

Re: [PATCH -next] net: ethernet: bgmac: Fix return value check in bgmac_probe()

2016-07-11 Thread David Miller

From: weiyj...@163.com
Date: Tue, 12 Jul 2016 00:17:28 +

> From: Wei Yongjun 
> 
> In case of error, the function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should be
> replaced with IS_ERR().
> 
> Signed-off-by: Wei Yongjun 

Applied, thanks.

Re: [PATCH net v2 0/2] net: ethoc: Error path and transmit fixes

2016-07-11 Thread David Miller

From: Florian Fainelli 
Date: Mon, 11 Jul 2016 16:35:53 -0700

> I don't have access to any other platform using an ethoc interface, so
> it would be good to do some testing on Xtensa, for instance.

So I'll wait until someone does such testing before applying this
series.

Re: [PATCH v6 RESEND] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-07-11 Thread David Miller

From: Mario Limonciello 
Date: Mon, 11 Jul 2016 19:58:04 -0500

> The RTL8153-AD supports a persistent system specific MAC address.
> This means a device plugged into two different systems with host side
> support will show different (but persistent) MAC addresses.
> 
> This information for the system's persistent MAC address is burned in when
> the system HW is built and available under \_SB.AMAC in the DSDT at runtime.
> 
> This technology is currently implemented in the Dell TB15 and WD15 Type-C
> docks.  More information is available here:
> http://www.dell.com/support/article/us/en/04/SLN301147
> 
> Signed-off-by: Mario Limonciello 

Applied.

Re: XDP seeking input from NIC hardware vendors

2016-07-11 Thread Alexei Starovoitov

On Sat, Jul 09, 2016 at 01:27:26PM +0200, Jesper Dangaard Brouer wrote:
> On Fri, 8 Jul 2016 18:51:07 +0100
> Jakub Kicinski  wrote:
> 
> > On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> > > The only distinction between VFs and queue groupings on my side is VFs
> > > provide RSS whereas queue groupings have to be selected explicitly.
> > > In a programmable NIC world the distinction might be lost if a "RSS"
> > > program can be loaded into the NIC to select queues but for existing
> > > hardware the distinction is there.  
> > 
> > To do BPF RSS we need a way to select the queue which I think is all
> > Jesper wanted.  So we will have to tackle the queue selection at some
> > point.  The main obstacle with it for me is to define what queue
> > selection means when program is not offloaded to HW...  Implementing
> > queue selection on HW side is trivial.
> 
> Yes, I do see the problem of fallback, when the programs "filter" demux
> cannot be offloaded to hardware.
> 
> First I thought it was a good idea to keep the "demux-filter" part of
> the eBPF program, as software fallback can still apply this filter in
> SW, and just mark the packets as not-zero-copy-safe.  But when HW
> offloading is not possible, then packets can be delivered to every RX
> queue, and SW would need to handle that, which is hard to keep transparent.
> 
> 
> > > If you demux using an eBPF program or via a filter model like
> > > flow_director or cls_{u32|flower} I think we can support both. And this
> > > just depends on the programmability of the hardware. Note flow_director
> > > and cls_{u32|flower} steering to VFs is already in place.  
> 
> Maybe we should keep HW demuxing as a separate setup step.
> 
> Today I can almost do what I want: by setting up ntuple filters, and (if
> Alexei allows it) assign an application specific XDP eBPF program to a
> specific RX queue.
> 
>  ethtool -K eth2 ntuple on
>  ethtool -N eth2 flow-type udp4 dst-ip 192.168.254.1 dst-port 53 action 42
> 
> Then the XDP program can be attached to RX queue 42, and
> promise/guarantee that it will consume all packets.  And then the
> backing page-pool can allow zero-copy RX (and enable scrubbing when
> refilling pool).

so such an ntuple rule will send udp4 traffic for a specific ip and port
into a queue, and then it will somehow get zero-copied to the vm?
It looks like a lot of other pieces around zero-copy and qemu need to be
implemented (or at least architected) for this scheme to be conceivable.
And when all that happens, what is the vm going to do with this very specific
traffic? The vm won't have any tcp or even ping?

the network virtualization traffic is typically encapsulated,
so if xdp is used to steer the traffic, the program would need
to figure out vm id based on headers, strip tunnel, apply policy before
forwarding the packet further. Clearly hw ntuple is not going to suffice.

If there is no network virtualization and VMs are operating in a
flat network, then there is no policy, no ip filter, no vm migration;
only a mac per vm, and sriov handles this case just fine.
When hw becomes more programmable we'll be able to load an xdp program
into hw that does tunneling and policy and forwards into a vf; then sriov
will actually become usable for cloud providers.
hw xdp into a vf is more interesting than into a queue, since there is
more than one queue/interrupt per vf and a network-heavy vm can actually
consume a large amount of traffic.
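The ntuple rule Jesper quotes acts as an exact-match classifier that is consulted before RSS. A minimal userspace model of that demux step follows; `struct flow`, `struct ntuple_rule` and `select_rx_queue()` are hypothetical, and real NICs implement this in hardware with their own rule formats. In the model, as on real hardware, the steered queue is only dedicated if RSS is not also spreading other traffic onto it, so the RSS range (`nqueues_rss`) is kept separate from the steered queue number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: only the fields the ethtool rule matches. */
struct flow {
	uint8_t  proto;     /* 17 = UDP, 6 = TCP */
	uint32_t dst_ip;    /* host byte order */
	uint16_t dst_port;
	uint32_t rss_hash;  /* hash the NIC would compute otherwise */
};

struct ntuple_rule {
	uint8_t  proto;
	uint32_t dst_ip;
	uint16_t dst_port;
	unsigned int queue; /* "action" queue index */
};

/* Exact-match rules win; everything else is spread by RSS over
 * queues 0 .. nqueues_rss - 1. */
static unsigned int select_rx_queue(const struct flow *f,
				    const struct ntuple_rule *rules,
				    size_t nrules, unsigned int nqueues_rss)
{
	for (size_t i = 0; i < nrules; i++) {
		const struct ntuple_rule *r = &rules[i];

		if (f->proto == r->proto && f->dst_ip == r->dst_ip &&
		    f->dst_port == r->dst_port)
			return r->queue;
	}
	return f->rss_hash % nqueues_rss;
}
```

As Alexei points out, this only covers steering on fixed header fields; encapsulated traffic that needs tunnel stripping and per-VM policy does not fit this model.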

[PATCH v6 RESEND] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-07-11 Thread Mario Limonciello

The RTL8153-AD supports a persistent system specific MAC address.
This means a device plugged into two different systems with host side
support will show different (but persistent) MAC addresses.

This information for the system's persistent MAC address is burned in when
the system HW is built and available under \_SB.AMAC in the DSDT at runtime.

This technology is currently implemented in the Dell TB15 and WD15 Type-C
docks.  More information is available here:
http://www.dell.com/support/article/us/en/04/SLN301147

Signed-off-by: Mario Limonciello 
---
 drivers/net/usb/r8152.c | 76 +++--
 1 file changed, 74 insertions(+), 2 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 0da72d3..2298f26 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Information for net-next */
 #define NETNEXT_VERSION  "08"
@@ -460,6 +461,11 @@
 /* SRAM_IMPEDANCE */
 #define RX_DRIVING_MASK  0x6000
 
+/* MAC PASSTHRU */
+#define AD_MASK  0xfee0
+#define EFUSE  0xcfdb
+#define PASS_THRU_MASK 0x1
+
 enum rtl_register_content {
_1000bps= 0x10,
_100bps = 0x08,
@@ -1036,6 +1042,65 @@ out1:
return ret;
 }
 
+/* Devices containing RTL8153-AD can support a persistent
+ * host system provided MAC address.
+ * Examples of this are Dell TB15 and Dell WD15 docks
+ */
+static int vendor_mac_passthru_addr_read(struct r8152 *tp, struct sockaddr *sa)
+{
+   acpi_status status;
+   struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+   union acpi_object *obj;
+   int ret = -EINVAL;
+   u32 ocp_data;
+   unsigned char buf[6];
+
+   /* test for -AD variant of RTL8153 */
+   ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_MISC_0);
+   if ((ocp_data & AD_MASK) != 0x1000)
+   return -ENODEV;
+
+   /* test for MAC address pass-through bit */
+   ocp_data = ocp_read_byte(tp, MCU_TYPE_USB, EFUSE);
+   if ((ocp_data & PASS_THRU_MASK) != 1)
+   return -ENODEV;
+
+   /* returns _AUXMAC_#AABBCCDDEEFF# */
+   status = acpi_evaluate_object(NULL, "\\_SB.AMAC", NULL, &buffer);
+   obj = (union acpi_object *)buffer.pointer;
+   if (!ACPI_SUCCESS(status))
+   return -ENODEV;
+   if (obj->type != ACPI_TYPE_BUFFER || obj->string.length != 0x17) {
+   netif_warn(tp, probe, tp->netdev,
+  "Invalid buffer when reading pass-thru MAC addr: "
+  "(%d, %d)\n",
+  obj->type, obj->string.length);
+   goto amacout;
+   }
+   if (strncmp(obj->string.pointer, "_AUXMAC_#", 9) != 0 ||
+   strncmp(obj->string.pointer + 0x15, "#", 1) != 0) {
+   netif_warn(tp, probe, tp->netdev,
+  "Invalid header when reading pass-thru MAC addr\n");
+   goto amacout;
+   }
+   ret = hex2bin(buf, obj->string.pointer + 9, 6);
+   if (!(ret == 0 && is_valid_ether_addr(buf))) {
+   netif_warn(tp, probe, tp->netdev,
+  "Invalid MAC when reading pass-thru MAC addr: "
+  "%d, %pM\n", ret, buf);
+   ret = -EINVAL;
+   goto amacout;
+   }
+   memcpy(sa->sa_data, buf, 6);
+   ether_addr_copy(tp->netdev->dev_addr, sa->sa_data);
+   netif_info(tp, probe, tp->netdev,
+  "Using pass-thru MAC addr %pM\n", sa->sa_data);
+
+amacout:
+   kfree(obj);
+   return ret;
+}
+
 static int set_ethernet_addr(struct r8152 *tp)
 {
struct net_device *dev = tp->netdev;
@@ -1044,8 +1109,15 @@ static int set_ethernet_addr(struct r8152 *tp)
 
if (tp->version == RTL_VER_01)
ret = pla_ocp_read(tp, PLA_IDR, 8, sa.sa_data);
-   else
-   ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
+   else {
+   /* if this is not an RTL8153-AD, no eFuse mac pass thru set,
+* or system doesn't provide valid _SB.AMAC this will be
+* expected to be non-zero
+*/
+   ret = vendor_mac_passthru_addr_read(tp, &sa);
+   if (ret < 0)
+   ret = pla_ocp_read(tp, PLA_BACKUP, 8, sa.sa_data);
+   }
 
if (ret < 0) {
netif_err(tp, probe, dev, "Get ether addr fail\n");
-- 
2.7.4
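The \_SB.AMAC format checked by vendor_mac_passthru_addr_read() can be exercised outside the kernel. The sketch below is a hypothetical userspace re-implementation of just the string checks: a 0x17-byte buffer, the "_AUXMAC_#" prefix, a '#' at offset 0x15, twelve hex digits decoded in place of hex2bin(), and the same validity rule as is_valid_ether_addr() (not multicast, not all zeros). Treating the 23rd byte as a terminating NUL is an assumption; the patch only checks the length. `parse_amac()`, `hexval()` and `valid_ether()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define AMAC_LEN 0x17 /* length the patch requires; assumed to include a NUL */

static int hexval(char c)
{
	if (c >= '0' && c <= '9') return c - '0';
	if (c >= 'a' && c <= 'f') return c - 'a' + 10;
	if (c >= 'A' && c <= 'F') return c - 'A' + 10;
	return -1;
}

/* Same rule as is_valid_ether_addr(): not multicast, not all zeros. */
static bool valid_ether(const unsigned char a[6])
{
	static const unsigned char zero[6];

	return !(a[0] & 0x01) && memcmp(a, zero, 6) != 0;
}

/* Parses "_AUXMAC_#AABBCCDDEEFF#"; returns 0 on success, -1 otherwise. */
static int parse_amac(const char *s, size_t len, unsigned char mac[6])
{
	if (len != AMAC_LEN)
		return -1;
	if (strncmp(s, "_AUXMAC_#", 9) != 0 || s[0x15] != '#')
		return -1;
	for (int i = 0; i < 6; i++) {
		int hi = hexval(s[9 + 2 * i]);
		int lo = hexval(s[10 + 2 * i]);

		if (hi < 0 || lo < 0)
			return -1;
		mac[i] = (unsigned char)(hi << 4 | lo);
	}
	return valid_ether(mac) ? 0 : -1;
}
```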

eBPF tunable max instructions or max tail call?

2016-07-11 Thread Sargun Dhillon

It would be nice to have eBPF programs that are longer than 4096
instructions. I'm trying to implement XSalsa20 in eBPF, and
unfortunately, it doesn't fit into 4096 instructions since I'm
unrolling all of the loops. Further than that, doing tail calls to
process each block results in me hitting the tail call limit.

I don't think that it makes much sense to expose the crypto API as
BPF helpers, as I'm not sure if we can ensure safety, and timely
execution with it. I may be wrong here, and if there is a sane, safe
way to expose the crypto API, I'm all ears.

Other than that, it would be nice to make the max instructions a knob,
and I don't think that it has much downside, given it's only checked
at load time. It would be nice to make the tail call limit a tunable
as well, but I'm unsure of the performance impact it might have given
that it's checked at runtime.

What do y'all think is reasonable? Make them both tunable? Just one? None?
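A rough model of why one-block-per-tail-call runs out: each program in the chain processes one 64-byte XSalsa20 block, and the runtime refuses further tail calls once a fixed budget is used up, so in this model the payload that can be covered is capped at (budget + 1) blocks. The function and its `max_tail_calls` parameter are hypothetical; the kernel's actual counter (MAX_TAIL_CALL_CNT) and its exact comparison are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_SIZE 64 /* XSalsa20 keystream block size in bytes */

/* Hypothetical model: each program in the chain handles one block, then
 * tail-calls the next; after max_tail_calls the tail call is refused.
 * Returns the number of payload bytes that actually got processed. */
static size_t bytes_processed(size_t payload, unsigned int max_tail_calls)
{
	size_t done = 0;
	unsigned int tail_calls = 0;

	for (;;) {
		size_t left = payload - done;

		done += left < BLOCK_SIZE ? left : BLOCK_SIZE;
		if (done == payload)
			break;		/* whole payload covered */
		if (tail_calls == max_tail_calls)
			break;		/* tail call refused, rest untouched */
		tail_calls++;
	}
	return done;
}
```

With a budget of 32 the model stops at 33 * 64 = 2112 bytes, so any larger payload is left partly unprocessed, which matches the shape of the problem described above.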

RE: [Intel-wired-lan] [BUG] Panic on boot in ixgbe with Xeon D-1531

2016-07-11 Thread Skidmore, Donald C

The igbvf is a bit of a red herring, since it is not the VF driver for ixgbe,
and sriov isn't enabled on this boot anyway, as we can tell by the number of
queues allocated.

As for the panic, I believe we already have a patch, but if you could supply
the DevID we could tell you for sure.

Thanks,
-Don Skidmore 



> -Original Message-
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
> Behalf Of Patrick McLean
> Sent: Thursday, June 30, 2016 9:12 PM
> To: Kirsher, Jeffrey T 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
> Subject: [Intel-wired-lan] [BUG] Panic on boot in ixgbe with Xeon D-1531
> 
> Hi,
> 
> We are getting a panic on boot with Linus git as of this morning. I have
> attached the boot log, it looks like the panic is in igbvf/ixgbe.
> The machine is being netbooted via legacy PXE.
> 
> I have attached the full boot log from a kernel with igbvf enabled, and one log
> of just the panic message with igbvf disabled. Please let me know if you need
> any more information.

[PATCH -next] net: ethernet: bgmac: Fix return value check in bgmac_probe()

2016-07-11 Thread weiyj_lk

From: Wei Yongjun 

In case of error, the function devm_ioremap_resource() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should be
replaced with IS_ERR().

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/broadcom/bgmac-platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bgmac-platform.c b/drivers/net/ethernet/broadcom/bgmac-platform.c
index 7a8f7ef..1a2d841 100644
--- a/drivers/net/ethernet/broadcom/bgmac-platform.c
+++ b/drivers/net/ethernet/broadcom/bgmac-platform.c
@@ -141,7 +141,7 @@ static int bgmac_probe(struct platform_device *pdev)
}
 
bgmac->plat.idm_base = devm_ioremap_resource(&pdev->dev, regs);
-   if (!bgmac->plat.idm_base) {
+   if (IS_ERR(bgmac->plat.idm_base)) {
dev_err(&pdev->dev, "Unable to map idm resource\n");
return PTR_ERR(bgmac->plat.idm_base);
}
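Why the NULL test never fired: devm_ioremap_resource() reports failure by encoding a negative errno in the returned pointer itself. The sketch below re-implements the idea in userspace; `model_err_ptr()`, `model_is_err()` and `model_ptr_err()` are hypothetical copies that mimic ERR_PTR(), IS_ERR() and PTR_ERR(), including the convention that the top MAX_ERRNO (4095) addresses are reserved for error codes. An error pointer is non-NULL, so `!ptr` misses it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno as a pointer, like ERR_PTR(). */
static void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode it again, like PTR_ERR(). */
static long model_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Like IS_ERR(): the last MODEL_MAX_ERRNO addresses are error codes. */
static bool model_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}
```

With the original `!bgmac->plat.idm_base` test, a value such as `model_err_ptr(-ENOMEM)` would have been treated as a valid mapping and later dereferenced.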

Re: [PATCH 3/3] crypto: Added Chelsio Menu to the Kconfig file

2016-07-11 Thread kbuild test robot

Hi,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.7-rc7 next-20160711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Yeshaswi-M-R-Gowda/crypto-chcr-Add-Chelsio-Crypto-Driver/20160712-023513
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/crypto/chelsio/chcr_core.c: In function 'cpl_fw6_pld_handler':
>> drivers/crypto/chelsio/chcr_core.c:134:8: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
 req = (struct crypto_async_request *)cookie;
   ^
--
   In file included from include/linux/swab.h:4:0,
from include/uapi/linux/byteorder/little_endian.h:12,
from include/linux/byteorder/little_endian.h:4,
from arch/x86/include/uapi/asm/byteorder.h:4,
from include/asm-generic/bitops/le.h:5,
from arch/x86/include/asm/bitops.h:504,
from include/linux/bitops.h:36,
from include/linux/kernel.h:10,
from drivers/crypto/chelsio/chcr_algo.c:42:
   drivers/crypto/chelsio/chcr_algo.c: In function 'create_wreq':
>> drivers/crypto/chelsio/chcr_algo.c:454:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
 wreq->cookie = cpu_to_be64((u64)req);
^
   include/uapi/linux/swab.h:126:54: note: in definition of macro '__swab64'
#define __swab64(x) (__u64)__builtin_bswap64((__u64)(x))
 ^
>> include/linux/byteorder/generic.h:91:21: note: in expansion of macro '__cpu_to_be64'
#define cpu_to_be64 __cpu_to_be64
^
>> drivers/crypto/chelsio/chcr_algo.c:454:17: note: in expansion of macro 'cpu_to_be64'
 wreq->cookie = cpu_to_be64((u64)req);
^~~
   drivers/crypto/chelsio/chcr_algo.c: In function 'chcr_register_alg':
>> drivers/crypto/chelsio/chcr_algo.c:1471:48: warning: operation on 'driver_algs[i].alg.hash.halg.base.cra_init' may be undefined [-Wsequence-point]
driver_algs[i].alg.hash.halg.base.cra_init =
~~~^
driver_algs[i].alg.hash.halg.base.cra_init =

 chcr_hmac_cra_init;
 ~~ 

vim +134 drivers/crypto/chelsio/chcr_core.c

5c923415 Yeshaswi M R Gowda 2016-07-11  118 u_ctx->dev = NULL;
5c923415 Yeshaswi M R Gowda 2016-07-11  119 atomic_dec(&dev_count);
5c923415 Yeshaswi M R Gowda 2016-07-11  120 return 0;
5c923415 Yeshaswi M R Gowda 2016-07-11  121  }
5c923415 Yeshaswi M R Gowda 2016-07-11  122  
5c923415 Yeshaswi M R Gowda 2016-07-11  123  static int cpl_fw6_pld_handler(struct chcr_dev *dev,
5c923415 Yeshaswi M R Gowda 2016-07-11  124 unsigned char *input)
5c923415 Yeshaswi M R Gowda 2016-07-11  125  {
5c923415 Yeshaswi M R Gowda 2016-07-11  126 struct crypto_async_request *req;
5c923415 Yeshaswi M R Gowda 2016-07-11  127 struct cpl_fw6_pld *fw6_pld;
5c923415 Yeshaswi M R Gowda 2016-07-11  128 u64 cookie;
5c923415 Yeshaswi M R Gowda 2016-07-11  129 u32 ack_err_status = 0;
5c923415 Yeshaswi M R Gowda 2016-07-11  130 int error_status = 0;
5c923415 Yeshaswi M R Gowda 2016-07-11  131  
5c923415 Yeshaswi M R Gowda 2016-07-11  132 fw6_pld = (struct cpl_fw6_pld *)input;
5c923415 Yeshaswi M R Gowda 2016-07-11  133 cookie = be64_to_cpu(fw6_pld->data[1]);
5c923415 Yeshaswi M R Gowda 2016-07-11 @134 req = (struct crypto_async_request *)cookie;
5c923415 Yeshaswi M R Gowda 2016-07-11  135  
5c923415 Yeshaswi M R Gowda 2016-07-11  136 ack_err_status =
5c923415 Yeshaswi M R Gowda 2016-07-11  137 ntohl(*(__be32 *)((unsigned char *)&fw6_pld->data[0] + 4));
5c923415 Yeshaswi M R Gowda 2016-07-11  138 if (ack_err_status) {
5c923415 Yeshaswi M R Gowda 2016-07-11  139 if (CHK_MAC_ERR_BIT(ack_err_status) ||
5c923415 Yeshaswi M R Gowda 2016-07-11  140 CHK_PAD_ERR_BIT(ack_err_status))
5c923415 Yeshaswi M R Gowda 2016-07-11  141 error_status = -EINVAL;
5c923415 Yeshaswi M R Gowda 2016-07-11  142 }

:: The code at line 134 was first introduced by commit
:: 5c9234157776103907606c9f4c93a311467e246f chcr: Support for Chelsio's Crypto Hardware

:: TO: Yeshaswi M R Gowda <yesha...@chelsio.com>
:: CC: 0day robot <fengguang...@intel.com>

---
0-DAY kernel test infrastructureOpen Sou
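Both warnings come from converting directly between a 64-bit cookie and a pointer on a 32-bit (i386) build. A minimal sketch of the usual portable idiom, assuming the goal is only a lossless round trip, is to go through `uintptr_t` in both directions. `struct fake_request`, `encode_cookie()` and `decode_cookie()` are hypothetical names, and `uint64_t` stands in for the kernel's `u64`; the driver's `cpu_to_be64()`/`be64_to_cpu()` byte swapping is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct crypto_async_request. */
struct fake_request {
	int id;
};

/* Pointer -> 64-bit cookie: convert to an integer of pointer width first,
 * then widen, so no -Wpointer-to-int-cast is emitted on 32-bit builds. */
static uint64_t encode_cookie(struct fake_request *req)
{
	return (uint64_t)(uintptr_t)req;
}

/* 64-bit cookie -> pointer: narrow to pointer width first, then cast,
 * so no -Wint-to-pointer-cast is emitted on 32-bit builds. */
static struct fake_request *decode_cookie(uint64_t cookie)
{
	return (struct fake_request *)(uintptr_t)cookie;
}
```

On a 32-bit build the upper 32 bits of the cookie are simply zero, so the round trip is still exact.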

[PATCH net v2 2/2] net: ethoc: Correctly pad short packets

2016-07-11 Thread Florian Fainelli

Even though the hardware can perform zero padding, we want the SKB to
go out on the wire with the appropriate size. This fixes packet
truncation observed with, e.g., ARP packets.

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/ethoc.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 06ae14a8e946..ca678d46c322 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -860,6 +860,11 @@ static netdev_tx_t ethoc_start_xmit(struct sk_buff *skb, struct net_device *dev)
unsigned int entry;
void *dest;
 
+   if (skb_put_padto(skb, ETHOC_ZLEN)) {
+   dev->stats.tx_errors++;
+   goto out;
+   }
+
if (unlikely(skb->len > ETHOC_BUFSIZ)) {
dev->stats.tx_errors++;
goto out;
-- 
2.7.4
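The effect of `skb_put_padto()` on a short frame can be illustrated with a small userspace sketch. `pad_frame()` is a hypothetical helper, and `MIN_FRAME_LEN` assumes Ethernet's 60-byte minimum (excluding FCS); ethoc's own `ETHOC_ZLEN` may be a different value. Unlike the kernel helper, this sketch does not free anything on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed minimum frame length (Ethernet minimum without FCS). */
#define MIN_FRAME_LEN 60

/* Zero-fills a short frame in place up to min_len. Returns the new
 * length, or -1 if the buffer is too small to hold the padded frame. */
static int pad_frame(unsigned char *buf, size_t len, size_t cap,
		     size_t min_len)
{
	if (len >= min_len)
		return (int)len;          /* already long enough */
	if (cap < min_len)
		return -1;                /* cannot pad in place */
	memset(buf + len, 0, min_len - len);
	return (int)min_len;
}
```

An ARP request is 42 bytes (a 14-byte Ethernet header plus a 28-byte ARP payload), so it is one of the frames that falls below the minimum and needs the explicit padding.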

[PATCH net v2 0/2] net: ethoc: Error path and transmit fixes

2016-07-11 Thread Florian Fainelli

Hi all,

This patch series contains two fixes for the ethoc driver, found while testing on a
TS-7300 board where ethoc is provided by an on-board FPGA.

The first patch was cooked up after chasing crashes with invalid resources passed to
the driver.

The second patch was cooked up after seeing that an interface configured with IP
192.168.2.2 was sending ARP packets for 192.168.0.0; no wonder it could not
work.

I don't have access to any other platform using an ethoc interface, so
it would be good to get some testing on Xtensa, for instance.

Changes in v2: fixed the first commit message

Florian Fainelli (2):
  net: ethoc: Fix early error paths
  net: ethoc: Correctly pad short packets

 drivers/net/ethernet/ethoc.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

-- 
2.7.4

[PATCH net v2 1/2] net: ethoc: Fix early error paths

2016-07-11 Thread Florian Fainelli

In case any operation fails before we can successfully get to the point
where we would register an MDIO bus, we would jump to an error label
which involves unregistering and then freeing this yet-to-be-created MDIO
bus. Update all error paths to go to the label free, which is the only one
valid until either the clock is enabled or the MDIO bus is allocated
and registered. This fixes a kernel oops observed while trying to
dereference the MDIO bus structure, which is not yet allocated.

Fixes: a1702857724f ("net: Add support for the OpenCores 10/100 Mbps Ethernet MAC.")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/ethoc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 4edb98c3c6c7..06ae14a8e946 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1086,7 +1086,7 @@ static int ethoc_probe(struct platform_device *pdev)
if (!priv->iobase) {
dev_err(&pdev->dev, "cannot remap I/O memory space\n");
ret = -ENXIO;
-   goto error;
+   goto free;
}
 
if (netdev->mem_end) {
@@ -1095,7 +1095,7 @@ static int ethoc_probe(struct platform_device *pdev)
if (!priv->membase) {
dev_err(&pdev->dev, "cannot remap memory space\n");
ret = -ENXIO;
-   goto error;
+   goto free;
}
} else {
/* Allocate buffer memory */
@@ -1106,7 +1106,7 @@ static int ethoc_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "cannot allocate %dB buffer\n",
buffer_size);
ret = -ENOMEM;
-   goto error;
+   goto free;
}
netdev->mem_end = netdev->mem_start + buffer_size;
priv->dma_alloc = buffer_size;
@@ -1120,7 +1120,7 @@ static int ethoc_probe(struct platform_device *pdev)
128, (netdev->mem_end - netdev->mem_start + 1) / ETHOC_BUFSIZ);
if (num_bd < 4) {
ret = -ENODEV;
-   goto error;
+   goto free;
}
priv->num_bd = num_bd;
/* num_tx must be a power of two */
@@ -1133,7 +1133,7 @@ static int ethoc_probe(struct platform_device *pdev)
priv->vma = devm_kzalloc(&pdev->dev, num_bd*sizeof(void *), GFP_KERNEL);
if (!priv->vma) {
ret = -ENOMEM;
-   goto error;
+   goto free;
}
 
/* Allow the platform setup code to pass in a MAC address. */
-- 
2.7.4
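The rule this patch applies (jump only to a label that undoes what has actually been set up) can be sketched in userspace C. Everything here is hypothetical: `struct fake_priv` and `fake_probe()` stand in for ethoc's private data and probe routine, `malloc()` stands in for remapping and MDIO bus allocation, and `fail_at` only exists to exercise each error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_priv {
	void *membase;        /* stands in for remapped memory/buffers */
	void *bus;            /* stands in for the MDIO bus */
	int bus_registered;
};

/* fail_at selects which step fails (0 = none). */
static int fake_probe(struct fake_priv *p, int fail_at)
{
	int ret;

	p->membase = NULL;
	p->bus = NULL;
	p->bus_registered = 0;

	p->membase = (fail_at == 1) ? NULL : malloc(64);
	if (!p->membase) {
		ret = -ENOMEM;
		goto free;        /* nothing else exists yet */
	}

	p->bus = (fail_at == 2) ? NULL : malloc(16);
	if (!p->bus) {
		ret = -ENOMEM;
		goto free;        /* bus never created: do not unregister it */
	}
	p->bus_registered = 1;

	if (fail_at == 3) {
		ret = -ENODEV;
		goto error;       /* bus exists: unregister and free it */
	}
	return 0;

error:
	p->bus_registered = 0;    /* "unregister" the bus */
	free(p->bus);
	p->bus = NULL;
free:
	free(p->membase);
	p->membase = NULL;
	return ret;
}
```

Labels live in their own namespace in C, so a label named `free` (as in ethoc) does not clash with the `free()` function.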

Re: [PATCH 0/2] ARM: dts: NSP: Add built-in Ethernet switch nodes

2016-07-11 Thread Florian Fainelli

On 07/08/2016 01:07 PM, Andrew Lunn wrote:
> On Fri, Jul 08, 2016 at 11:49:27AM -0700, Florian Fainelli wrote:
>> This patch series is based on Broadcom/stblinux/devicetree/next which
>> contains proper support for the BCM958625HR board. To get working
>> Ethernet switch and CPU Ethernet support, the following dependencies
>> based on David Miller's net-next tree are required:
> 
> Reviewed-by: Andrew Lunn 

Both applied, thanks!
-- 
Florian

Re: [PATCH v7 00/11] Add driver bpf hook for early packet drop and forwarding

2016-07-11 Thread Tom Herbert

On Mon, Jul 11, 2016 at 2:53 PM, Or Gerlitz  wrote:
> On Tue, Jul 12, 2016 at 12:29 AM, Brenden Blanco  wrote:
>
>> v7:
> [...]
>>  TODOs:
>>  Add ethtool per-ring stats for aborted, default cases, maybe even drop
>>  and tx as well.
>
> please no... lets stop and think if we can have something better vs
> every XDP enabled driver to have bunch of new ethtool based stats, was
> this somehow discussed over the threads so far?

We'll also need to consider what statistics a device keeps
and how they are exposed when the device offloads XDP, and also the
possibility that the BPF program itself might keep its own stats.

Tom

Re: [net-next PATCH RFC] mlx4: RX prefetch loop

2016-07-11 Thread Alexei Starovoitov

On Mon, Jul 11, 2016 at 01:09:22PM +0200, Jesper Dangaard Brouer wrote:
> > -   /* Process all completed CQEs */
> > +   /* Extract and prefetch completed CQEs */
> > while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
> > cq->mcq.cons_index & cq->size)) {
> > +   void *data;
> >  
> > frags = ring->rx_info + (index << priv->log_rx_info);
> > rx_desc = ring->buf + (index << ring->log_stride);
> > +   prefetch(rx_desc);
> >  
> > /*
> >  * make sure we read the CQE after we read the ownership bit
> >  */
> > dma_rmb();
> >  
> > +   cqe_array[cqe_idx++] = cqe;
> > +
> > +   /* Base error handling here, free handled in next loop */
> > +   if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
> > +MLX4_CQE_OPCODE_ERROR))
> > +   goto skip;
> > +
> > +   data = page_address(frags[0].page) + frags[0].page_offset;
> > +   prefetch(data);

that's probably not correct in all cases, since prefetching an address
that is going to be evicted soon may hurt performance.
We need to dma_sync_single_for_cpu() before doing a prefetch, or
somehow figure out that dma_sync is a nop, so we can omit it altogether
and do whatever prefetches we like.
Also, unconditionally doing a batch of 8 may hurt, depending on what
happens afterwards in the stack or in bpf, or even on the cpu version.
Doing a single prefetch of the Nth packet is probably ok most of the time,
but asking the cpu to prefetch 8 packets at once is unnecessary, especially
since a single prefetch gives the same performance.
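The "single prefetch of the Nth packet" alternative can be sketched as a loop that prefetches only one packet ahead of the one being processed. This is a userspace illustration under stated assumptions: `struct fake_pkt`, `process_ring()` and `PREFETCH_AHEAD` are hypothetical, `__builtin_prefetch` is the GCC/Clang builtin, and the DMA sync point discussed above is not modelled (in a driver the prefetch would have to come after any required `dma_sync_single_for_cpu()`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in packets. */
#define PREFETCH_AHEAD 1

struct fake_pkt {
	unsigned char data[64];
};

/* Sums the first byte of every packet while prefetching only the packet
 * PREFETCH_AHEAD slots ahead, instead of issuing a batch up front.
 * The prefetch is a hint only and never changes the result. */
static unsigned long process_ring(const struct fake_pkt *pkts, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (i + PREFETCH_AHEAD < n)
			__builtin_prefetch(pkts[i + PREFETCH_AHEAD].data);
		sum += pkts[i].data[0];
	}
	return sum;
}
```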

[PATCH net-next,v3] tools: hv: Add a script to help bonding synthetic and VF NICs

2016-07-11 Thread Haiyang Zhang

From: Haiyang Zhang 

This script helps to create bonding network devices based on a synthetic NIC
(the virtual network adapter usually provided by Hyper-V) and the matching
VF NIC (SRIOV virtual function), so that the synthetic NIC and VF NIC can
function as one network device and fail over to the synthetic NIC if the VF is
down.

Major distros (RHEL, Ubuntu, SLES) supported by Hyper-V are supported by
this script.
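The script pairs NICs by comparing the MAC addresses read from /sys/class/net in two nested loops. The same pairing rule can be restated in C as a sketch; `pair_by_mac()` is a hypothetical helper, and unlike the script it explicitly records -1 for a NIC with no later partner instead of leaving the entry unset.

```c
#include <assert.h>
#include <string.h>

/* For each NIC i, match[i] becomes the index of the first later NIC j
 * with the same MAC address, or -1 if there is none. */
static void pair_by_mac(const char *const macs[], int n, int match[])
{
	for (int i = 0; i < n; i++) {
		match[i] = -1;
		for (int j = i + 1; j < n; j++) {
			if (strcmp(macs[i], macs[j]) == 0) {
				match[i] = j;
				break;
			}
		}
	}
}
```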

Signed-off-by: Haiyang Zhang 
Reviewed-by: K. Y. Srinivasan 
---
 tools/hv/bondvf.sh |  194 
 1 files changed, 194 insertions(+), 0 deletions(-)
 create mode 100755 tools/hv/bondvf.sh

diff --git a/tools/hv/bondvf.sh b/tools/hv/bondvf.sh
new file mode 100755
index 000..d6eb257
--- /dev/null
+++ b/tools/hv/bondvf.sh
@@ -0,0 +1,194 @@
+#!/bin/bash
+
+# This example script creates bonding network devices based on a synthetic NIC
+# (the virtual network adapter usually provided by Hyper-V) and the matching
+# VF NIC (SRIOV virtual function), so that the synthetic NIC and VF NIC can
+# function as one network device and fail over to the synthetic NIC if the VF is
+# down.
+#
+# Usage:
+# - After configuring the vSwitch and vNIC with SRIOV, start the Linux virtual
+#   machine (VM).
+# - Run this script on the VM. It will create configuration files in the
+#   distro-specific directory.
+# - Reboot the VM so that the bonding config is enabled.
+#
+# The config files use DHCP by default. You may edit them if you need to switch
+# to a static IP or change other settings.
+#
+
+sysdir=/sys/class/net
+netvsc_cls={f8615163-df3e-46c5-913f-f2d2f965ed0e}
+bondcnt=0
+
+# Detect Distro
+if [ -f /etc/redhat-release ];
+then
+   cfgdir=/etc/sysconfig/network-scripts
+   distro=redhat
+elif grep -q 'Ubuntu' /etc/issue
+then
+   cfgdir=/etc/network
+   distro=ubuntu
+elif grep -q 'SUSE' /etc/issue
+then
+   cfgdir=/etc/sysconfig/network
+   distro=suse
+else
+   echo "Unsupported Distro"
+   exit 1
+fi
+
+echo Detected Distro: $distro, or compatible
+
+# Get a list of ethernet names
+list_eth=(`cd $sysdir && ls -d */ | cut -d/ -f1 | grep -v bond`)
+eth_cnt=${#list_eth[@]}
+
+echo List of net devices:
+
+# Get the MAC addresses
+for (( i=0; i < $eth_cnt; i++ ))
+do
+   list_mac[$i]=`cat $sysdir/${list_eth[$i]}/address`
+   echo ${list_eth[$i]}, ${list_mac[$i]}
+done
+
+# Find NIC with matching MAC
+for (( i=0; i < $eth_cnt-1; i++ ))
+do
+   for (( j=i+1; j < $eth_cnt; j++ ))
+   do
+   if [ "${list_mac[$i]}" = "${list_mac[$j]}" ]
+   then
+   list_match[$i]=${list_eth[$j]}
+   break
+   fi
+   done
+done
+
+function create_eth_cfg_redhat {
+   local fn=$cfgdir/ifcfg-$1
+
+   rm -f $fn
+   echo DEVICE=$1 >>$fn
+   echo TYPE=Ethernet >>$fn
+   echo BOOTPROTO=none >>$fn
+   echo ONBOOT=yes >>$fn
+   echo NM_CONTROLLED=no >>$fn
+   echo PEERDNS=yes >>$fn
+   echo IPV6INIT=yes >>$fn
+   echo MASTER=$2 >>$fn
+   echo SLAVE=yes >>$fn
+}
+
+function create_eth_cfg_pri_redhat {
+   create_eth_cfg_redhat $1 $2
+}
+
+function create_bond_cfg_redhat {
+   local fn=$cfgdir/ifcfg-$1
+
+   rm -f $fn
+   echo DEVICE=$1 >>$fn
+   echo TYPE=Bond >>$fn
+   echo BOOTPROTO=dhcp >>$fn
+   echo ONBOOT=yes >>$fn
+   echo NM_CONTROLLED=no >>$fn
+   echo PEERDNS=yes >>$fn
+   echo IPV6INIT=yes >>$fn
+   echo BONDING_MASTER=yes >>$fn
+   echo BONDING_OPTS=\"mode=active-backup miimon=100 primary=$2\" >>$fn
+}
+
+function create_eth_cfg_ubuntu {
+   local fn=$cfgdir/interfaces
+
+   echo $'\n'auto $1 >>$fn
+   echo iface $1 inet manual >>$fn
+   echo bond-master $2 >>$fn
+}
+
+function create_eth_cfg_pri_ubuntu {
+   local fn=$cfgdir/interfaces
+
+   create_eth_cfg_ubuntu $1 $2
+   echo bond-primary $1 >>$fn
+}
+
+function create_bond_cfg_ubuntu {
+   local fn=$cfgdir/interfaces
+
+   echo $'\n'auto $1 >>$fn
+   echo iface $1 inet dhcp >>$fn
+   echo bond-mode active-backup >>$fn
+   echo bond-miimon 100 >>$fn
+   echo bond-slaves none >>$fn
+}
+
+function create_eth_cfg_suse {
+   local fn=$cfgdir/ifcfg-$1
+
+   rm -f $fn
+   echo BOOTPROTO=none >>$fn
+   echo STARTMODE=auto >>$fn
+}
+
+function create_eth_cfg_pri_suse {
+   create_eth_cfg_suse $1
+}
+
+function create_bond_cfg_suse {
+   local fn=$cfgdir/ifcfg-$1
+
+   rm -f $fn
+   echo BOOTPROTO=dhcp >>$fn
+   echo STARTMODE=auto >>$fn
+   echo BONDING_MASTER=yes >>$fn
+   echo BONDING_SLAVE_0=$2 >>$fn
+   echo BONDING_SLAVE_1=$3 >>$fn
+   echo BONDING_MODULE_OPTS=\'mode=active-backup miimon=100 primary=$2\' >>$fn
+}
+
+function create_bond {
+   local bondname=bond$bondcnt
+   local primary
+   local secondary
+
+   local class_id1=`cat

Re: [PATCH v7 00/11] Add driver bpf hook for early packet drop and forwarding

2016-07-11 Thread Brenden Blanco

On Tue, Jul 12, 2016 at 12:53:34AM +0300, Or Gerlitz wrote:
> On Tue, Jul 12, 2016 at 12:29 AM, Brenden Blanco  wrote:
> 
> > v7:
> [...]
> >  TODOs:
> >  Add ethtool per-ring stats for aborted, default cases, maybe even drop
> >  and tx as well.
> 
> please no... lets stop and think if we can have something better vs
> every XDP enabled driver to have bunch of new ethtool based stats, was
> this somehow discussed over the threads so far?
This was in the context of the discussion in [01/12] relating to the
return codes and debuggability.

Re: [PATCH v6] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-07-11 Thread David Miller

From: 
Date: Mon, 11 Jul 2016 21:54:07 +

> Please let me know what else can be done for this patch to make it
> acceptable so we can have parity for Linux.

Just resubmit it and I'll apply it, I'm so tired of hearing about this...

RE: [PATCH v6] r8152: Add support for setting pass through MAC address on RTL8153-AD

2016-07-11 Thread Mario_Limonciello

> David,
> 
> Did you have any more thoughts about this?  I'm happy to make some other
> adjustments to the patch, if you have some recommendations.

Hi,

I just wanted to share that the maintenance BIOSes released this past week
for the Dell platforms with Type-C enable the MAC address pass-through
feature in UEFI, so any network-booted machine will offer
the auxiliary MAC to the DHCP server.  In Windows, a network-booted
machine will now always use the auxiliary MAC.

XPS 9350 (BIOS 1.4.4): http://goo.gl/7Sw2DZ

Please let me know what else can be done for this patch to make it
acceptable so we can have parity for Linux.

Thanks,

Re: [PATCH v7 00/11] Add driver bpf hook for early packet drop and forwarding

2016-07-11 Thread Or Gerlitz

On Tue, Jul 12, 2016 at 12:29 AM, Brenden Blanco  wrote:

> v7:
[...]
>  TODOs:
>  Add ethtool per-ring stats for aborted, default cases, maybe even drop
>  and tx as well.

please no... let's stop and think whether we can have something better than
every XDP-enabled driver having a bunch of new ethtool-based stats; was
this somehow discussed over the threads so far?

Re: [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program

2016-07-11 Thread Brenden Blanco

On Mon, Jul 11, 2016 at 02:48:17PM +0300, Saeed Mahameed wrote:
[...]
> 
> yes, we need something like:
> 
> +static inline void
> +mlx4_en_sync_dma(struct mlx4_en_priv *priv,
> +struct mlx4_en_rx_desc *rx_desc,
> +int length)
> +{
> +   dma_addr_t dma;
> +   int nr;
> +
> +   /* Sync dma addresses from HW descriptor */
> +   for (nr = 0; nr < priv->num_frags; nr++) {
> +   struct mlx4_en_frag_info *frag_info = &priv->frag_info[nr];
> +
> +   if (length <= frag_info->frag_prefix_size)
> +   break;
> +
> +   dma = be64_to_cpu(rx_desc->data[nr].addr);
> +   dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
> +   DMA_FROM_DEVICE);
> +   }
> +}
> 
> 
> @@ -790,6 +808,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
> goto next;
> }
> 
> +   length = be32_to_cpu(cqe->byte_cnt);
> +   length -= ring->fcs_del;
> +
> +   mlx4_en_sync_dma(priv, rx_desc, length);
>  /* data is available continue processing the packet */
> 
> and make sure to remove all explicit dma_sync_single_for_cpu calls.

I see. At first glance, this may work, but it introduces some changes in
the driver that may be unwanted. For instance, the dma sync cost is now
being paid even in the case where no skb will be allocated. So, under
memory pressure, it might cause extra work which would slow down your
ability to recover from the stress.

Let's keep discussing it, but in the context of a standalone cleanup.
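The fragment loop in the suggested `mlx4_en_sync_dma()` stops at the first fragment whose prefix (the bytes held by all earlier fragments) already covers the received length, so only the fragments that actually hold packet data get synced. A userspace sketch of just that counting rule, with a hypothetical `struct fake_frag_info` mirroring `frag_prefix_size`/`frag_size` and made-up fragment sizes:

```c
#include <assert.h>

/* Hypothetical per-fragment layout: prefix = bytes in earlier fragments. */
struct fake_frag_info {
	int prefix;
	int size;
};

/* Returns how many fragments would be synced for a packet of `length`
 * bytes under the quoted loop's stopping rule. */
static int frags_to_sync(const struct fake_frag_info *info, int num_frags,
			 int length)
{
	int nr;

	for (nr = 0; nr < num_frags; nr++) {
		if (length <= info[nr].prefix)
			break;
	}
	return nr;
}
```

With fragments of 1536, 4096 and 4096 bytes, a 60-byte packet syncs one fragment, while a 9000-byte packet syncs all three.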

Re: [PATCH v2 net] sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send

2016-07-11 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Mon, 11 Jul 2016 16:51:26 -0400

> From: Soheil Hassas Yeganeh 
> 
> Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
> as a control message to TCP. Since __sock_cmsg_send does not
> support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
> hence breaks pulse audio over TCP.
> 
> SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
> but they semantically belong to SOL_UNIX. Since all
> cmsg-processing functions including sock_cmsg_send ignore control
> messages of other layers, it is best to ignore SCM_RIGHTS
> and SCM_CREDENTIALS for consistency (and also for fixing pulse
> audio over TCP).
> 
> Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
> Signed-off-by: Soheil Hassas Yeganeh 
> Reported-by: Sergei Trofimovich 
> Tested-by: Sergei Trofimovich 

Applied and queued up for -stable, thanks.
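The rule described in the patch (skip control messages of other levels, and also skip `SCM_RIGHTS`/`SCM_CREDENTIALS` even though they arrive at `SOL_SOCKET`) can be modelled in userspace with the standard `CMSG_*` macros. This is a hypothetical model, not the kernel code: `model_cmsg_send()` and `demo()` are illustrative names, and `SO_MARK` merely stands in for the control message types the kernel actually handles.

```c
#define _GNU_SOURCE               /* for SCM_CREDENTIALS and struct ucred */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 0 and counts handled SOL_SOCKET messages, or -EINVAL for an
 * unknown SOL_SOCKET type. */
static int model_cmsg_send(struct msghdr *msg, int *handled)
{
	struct cmsghdr *cmsg;

	*handled = 0;
	for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_SOCKET)
			continue;         /* other layers: ignored */
		switch (cmsg->cmsg_type) {
		case SO_MARK:
			(*handled)++;
			break;
		case SCM_RIGHTS:
		case SCM_CREDENTIALS:
			break;            /* AF_UNIX semantics: ignored */
		default:
			return -EINVAL;
		}
	}
	return 0;
}

/* Builds SCM_CREDENTIALS followed by SO_MARK, the PulseAudio-like case,
 * and returns the number of handled messages (or -1 on error). */
static int demo(void)
{
	union {
		char buf[CMSG_SPACE(sizeof(struct ucred)) +
			 CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *c;
	int handled;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	c = CMSG_FIRSTHDR(&msg);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SCM_CREDENTIALS;
	c->cmsg_len = CMSG_LEN(sizeof(struct ucred));

	c = CMSG_NXTHDR(&msg, c);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = SO_MARK;
	c->cmsg_len = CMSG_LEN(sizeof(int));

	if (model_cmsg_send(&msg, &handled) != 0)
		return -1;
	return handled;
}
```

Before the fix, the equivalent of the `SCM_CREDENTIALS` case fell into the error branch, which is what broke PulseAudio over TCP.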

Re: [net-next:master 1118/1134] drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size

2016-07-11 Thread Florian Fainelli

On 07/11/2016 02:31 PM, David Miller wrote:
> From: kbuild test robot 
> Date: Tue, 12 Jul 2016 05:19:57 +0800
> 
>> All warnings (new ones prefixed by >>):
>>
>>drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
 drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>>   pdata->chip_id = (u32)of_id->data;
>>^
>>
> 
> Fixed as follows:
> 
> 
> [PATCH] b53: Fix build warning.
> 
>drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>   pdata->chip_id = (u32)of_id->data;
>^
> 
> Reported-by: kbuild test robot 
> Signed-off-by: David S. Miller 

Acked-by: Florian Fainelli 

You are fast, thanks David!
-- 
Florian
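The warning comes from the pattern of storing a small integer (a chip ID) in the `const void *data` field of a match table and casting it straight back to `u32`. A sketch of that pattern, assuming a hypothetical `struct fake_match` table with made-up IDs, converts through `uintptr_t` so the pointer-to-integer step is always pointer-sized. It is not the exact change David applied, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical match table: each entry stashes a chip ID in a pointer. */
struct fake_match {
	const char *compatible;
	const void *data;
};

static const struct fake_match fake_table[] = {
	{ "vendor,chip-a", (const void *)(uintptr_t)0x1234 },
	{ "vendor,chip-b", (const void *)(uintptr_t)0x5678 },
	{ NULL, NULL },
};

/* Looks up the ID; pointer -> uintptr_t -> uint32_t avoids the
 * different-size cast warning on 64-bit builds. Returns 0 if unknown. */
static uint32_t lookup_chip_id(const char *compatible)
{
	for (const struct fake_match *m = fake_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return (uint32_t)(uintptr_t)m->data;
	return 0;
}
```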

[PATCH v7 04/11] net/mlx4_en: add support for fast rx drop bpf program

2016-07-11 Thread Brenden Blanco

Add support for the BPF_PROG_TYPE_XDP hook in mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data. However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.

Therefore, disallow MTU settings that would produce a multi-fragment
packet that XDP programs would fail to access. Future enhancements could
be done to increase the allowable MTU.
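In the patch, `mlx4_xdp_set()` publishes the new program with `xchg()` and drops the reference on the old one, while the rx path reads `priv->prog` once per poll with `READ_ONCE()`. A userspace model of that pairing using C11 atomics follows; all names are hypothetical, and unlike the kernel this model frees the old program immediately, so it is only safe when no reader can still be using it.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical refcounted program (stand-in for struct bpf_prog). */
struct fake_prog {
	atomic_int refcnt;
	int id;
};

static struct fake_prog *fake_prog_new(int id)
{
	struct fake_prog *p = malloc(sizeof(*p));

	if (p) {
		atomic_init(&p->refcnt, 1);
		p->id = id;
	}
	return p;
}

static void fake_prog_put(struct fake_prog *p)
{
	if (p && atomic_fetch_sub(&p->refcnt, 1) == 1)
		free(p);
}

struct fake_priv {
	_Atomic(struct fake_prog *) prog;
	int num_frags;
};

/* Control path: the caller's reference on prog moves to the device. */
static int fake_xdp_set(struct fake_priv *priv, struct fake_prog *prog)
{
	struct fake_prog *old;

	if (priv->num_frags > 1)
		return -EOPNOTSUPP;
	old = atomic_exchange(&priv->prog, prog);   /* like xchg() */
	fake_prog_put(old);
	return 0;
}

/* Fast path: one load per poll, like READ_ONCE(priv->prog). */
static int fake_rx_poll(struct fake_priv *priv)
{
	struct fake_prog *prog = atomic_load_explicit(&priv->prog,
						       memory_order_acquire);

	return prog ? prog->id : -1;                /* -1: none attached */
}
```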

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 49 ++
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 37 ---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h   |  5 +++
 3 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 6083775..31070f9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -31,6 +31,7 @@
  *
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -2084,6 +2085,9 @@ void mlx4_en_destroy_netdev(struct net_device *dev)
if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
mlx4_en_remove_timestamp(mdev);
 
+   if (priv->prog)
+   bpf_prog_put(priv->prog);
+
/* Detach the netdev so tasks would not attempt to access it */
mutex_lock(&mdev->state_lock);
mdev->pndev[priv->port] = NULL;
@@ -2112,6 +2116,11 @@ static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
en_err(priv, "Bad MTU size:%d.\n", new_mtu);
return -EPERM;
}
+   if (priv->prog && MLX4_EN_EFF_MTU(new_mtu) > FRAG_SZ0) {
+   en_err(priv, "MTU size:%d requires frags but bpf prog running",
+  new_mtu);
+   return -EOPNOTSUPP;
+   }
dev->mtu = new_mtu;
 
if (netif_running(dev)) {
@@ -2520,6 +2529,44 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
return err;
 }
 
+static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+{
+   struct mlx4_en_priv *priv = netdev_priv(dev);
+   struct bpf_prog *old_prog;
+
+   if (priv->num_frags > 1)
+   return -EOPNOTSUPP;
+
+   /* This xchg is paired with READ_ONCE in the fast path, but is
+* also protected from itself via rtnl lock
+*/
+   old_prog = xchg(&priv->prog, prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+
+   return 0;
+}
+
+static bool mlx4_xdp_attached(struct net_device *dev)
+{
+   struct mlx4_en_priv *priv = netdev_priv(dev);
+
+   return !!READ_ONCE(priv->prog);
+}
+
+static int mlx4_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+   switch (xdp->command) {
+   case XDP_SETUP_PROG:
+   return mlx4_xdp_set(dev, xdp->prog);
+   case XDP_QUERY_PROG:
+   xdp->prog_attached = mlx4_xdp_attached(dev);
+   return 0;
+   default:
+   return -EINVAL;
+   }
+}
+
 static const struct net_device_ops mlx4_netdev_ops = {
.ndo_open   = mlx4_en_open,
.ndo_stop   = mlx4_en_close,
@@ -2548,6 +2595,7 @@ static const struct net_device_ops mlx4_netdev_ops = {
.ndo_udp_tunnel_del = mlx4_en_del_vxlan_port,
.ndo_features_check = mlx4_en_features_check,
.ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate,
+   .ndo_xdp= mlx4_xdp,
 };
 
 static const struct net_device_ops mlx4_netdev_ops_master = {
@@ -2584,6 +2632,7 @@ static const struct net_device_ops mlx4_netdev_ops_master = {
.ndo_udp_tunnel_del = mlx4_en_del_vxlan_port,
.ndo_features_check = mlx4_en_features_check,
.ndo_set_tx_maxrate = mlx4_en_set_tx_maxrate,
+   .ndo_xdp= mlx4_xdp,
 };
 
 struct mlx4_en_bond {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c1b3a9c..adfa123 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -743,6 +743,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
struct mlx4_en_rx_alloc *frags;
struct mlx4_en_rx_desc *rx_desc;
+   struct bpf_prog *prog;
struct sk_buff *skb;
int index;
int nr;
@@ -759,6 +760,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
if (budget <= 0)
return polled;
 
+   prog = READ_ONCE(priv->prog);
+
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
 * descriptor offset can be deduced from the CQE index instead of
 * reading 'cqe->index' */
@@ -835,6 +838,35 @@ int

[PATCH v7 02/11] net: add ndo to setup/query xdp prog in adapter rx

2016-07-11 Thread Brenden Blanco

Add one new netdev op for drivers implementing the BPF_PROG_TYPE_XDP
filter. The single op is used for both setup/query of the xdp program,
modelled after ndo_setup_tc.

Signed-off-by: Brenden Blanco 
---
 include/linux/netdevice.h | 32 
 net/core/dev.c| 33 +
 2 files changed, 65 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 49736a3..2f04746 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -63,6 +63,7 @@ struct wpan_dev;
 struct mpls_dev;
 /* UDP Tunnel offloads */
 struct udp_tunnel_info;
+struct bpf_prog;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
const struct ethtool_ops *ops);
@@ -799,6 +800,31 @@ struct tc_to_netdev {
};
 };
 
+/* These structures hold the attributes of xdp state that are being passed
+ * to the netdevice through the xdp op.
+ */
+enum xdp_netdev_command {
+   /* Set or clear a bpf program used in the earliest stages of packet
+* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
+* is responsible for calling bpf_prog_put on any old progs that are
+* stored, but not on the passed in prog.
+*/
+   XDP_SETUP_PROG,
+   /* Check if a bpf program is set on the device.  The callee should
+* return true if a program is currently attached and running.
+*/
+   XDP_QUERY_PROG,
+};
+
+struct netdev_xdp {
+   enum xdp_netdev_command command;
+   union {
+   /* XDP_SETUP_PROG */
+   struct bpf_prog *prog;
+   /* XDP_QUERY_PROG */
+   bool prog_attached;
+   };
+};
 
 /*
  * This structure defines the management hooks for network devices.
@@ -1087,6 +1113,9 @@ struct tc_to_netdev {
  * appropriate rx headroom value allows avoiding skb head copy on
  * forward. Setting a negative value resets the rx headroom to the
  * default value.
+ * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
+ * This function is used to set or query state related to XDP on the
+ * netdevice. See definition of enum xdp_netdev_command for details.
  *
  */
 struct net_device_ops {
@@ -1271,6 +1300,8 @@ struct net_device_ops {
   struct sk_buff *skb);
void(*ndo_set_rx_headroom)(struct net_device *dev,
   int needed_headroom);
+   int (*ndo_xdp)(struct net_device *dev,
+  struct netdev_xdp *xdp);
 };
 
 /**
@@ -3257,6 +3288,7 @@ int dev_get_phys_port_id(struct net_device *dev,
 int dev_get_phys_port_name(struct net_device *dev,
   char *name, size_t len);
 int dev_change_proto_down(struct net_device *dev, bool proto_down);
+int dev_change_xdp_fd(struct net_device *dev, int fd);
 struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev);
 struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
struct netdev_queue *txq, int *ret);
diff --git a/net/core/dev.c b/net/core/dev.c
index 7894e40..2a9c39f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -94,6 +94,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -6615,6 +6616,38 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 EXPORT_SYMBOL(dev_change_proto_down);
 
 /**
+ * dev_change_xdp_fd - set or clear a bpf program for a device rx path
+ * @dev: device
+ * @fd: new program fd or negative value to clear
+ *
+ * Set or clear a bpf program for a device
+ */
+int dev_change_xdp_fd(struct net_device *dev, int fd)
+{
+   const struct net_device_ops *ops = dev->netdev_ops;
+   struct bpf_prog *prog = NULL;
+   struct netdev_xdp xdp = {};
+   int err;
+
+   if (!ops->ndo_xdp)
+   return -EOPNOTSUPP;
+   if (fd >= 0) {
+   prog = bpf_prog_get_type(fd, BPF_PROG_TYPE_XDP);
+   if (IS_ERR(prog))
+   return PTR_ERR(prog);
+   }
+
+   xdp.command = XDP_SETUP_PROG;
+   xdp.prog = prog;
+   err = ops->ndo_xdp(dev, &xdp);
+   if (err < 0 && prog)
+   bpf_prog_put(prog);
+
+   return err;
+}
+EXPORT_SYMBOL(dev_change_xdp_fd);
+
+/**
  * dev_new_index   -   allocate an ifindex
  * @net: the applicable net namespace
  *
-- 
2.8.2
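The ownership rule spelled out in the `XDP_SETUP_PROG` comment (the callee puts old programs, never the one passed in) leaves the caller responsible for the new program's reference when the driver refuses it, which is what the `err < 0 && prog` branch in `dev_change_xdp_fd()` does. A minimal model with a plain counter; `struct fake_prog`, the two driver callbacks and `fake_change_xdp()` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_prog {
	int refcnt;
};

typedef int (*fake_ndo_xdp_t)(struct fake_prog *prog);

static int driver_accepts(struct fake_prog *prog) { (void)prog; return 0; }
static int driver_rejects(struct fake_prog *prog) { (void)prog; return -EOPNOTSUPP; }

/* Take a reference for the driver (bpf_prog_get_type() in the patch),
 * hand the program over, and drop the reference again only if the
 * driver did not keep it. */
static int fake_change_xdp(fake_ndo_xdp_t ndo_xdp, struct fake_prog *prog)
{
	int err;

	if (!ndo_xdp)
		return -EOPNOTSUPP;   /* checked before any reference is taken */
	if (prog)
		prog->refcnt++;
	err = ndo_xdp(prog);
	if (err < 0 && prog)
		prog->refcnt--;
	return err;
}
```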

[PATCH] virtio-net: Remove more stack DMA

2016-07-11 Thread Andy Lutomirski

VLAN and MQ control was doing DMA from the stack.  Fix it.

Cc: Michael S. Tsirkin 
Cc: "netdev@vger.kernel.org" 
Signed-off-by: Andy Lutomirski 
---

I tested VLAN addition and removal with CONFIG_VMAP_STACK=y,
CONFIG_DEBUG_SG=y and it got rid of the warnings I saw.  I haven't
tested the MQ part because I don't know how to enable it in the first
place (I'm guessing it needs me to enable some QEMU feature I don't
know about.)

Michael, can you double-check this? DaveM, is it okay for this to go
in via -tip?

 drivers/net/virtio_net.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index e0638e556fe7..5044ca37d725 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -144,8 +144,10 @@ struct virtnet_info {
/* Control VQ buffers: protected by the rtnl lock */
struct virtio_net_ctrl_hdr ctrl_hdr;
virtio_net_ctrl_ack ctrl_status;
+   struct virtio_net_ctrl_mq ctrl_mq;
u8 ctrl_promisc;
u8 ctrl_allmulti;
+   u16 ctrl_vid;
 
/* Ethtool settings */
u8 duplex;
@@ -1116,14 +1118,13 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
 static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
struct scatterlist sg;
-   struct virtio_net_ctrl_mq s;
struct net_device *dev = vi->dev;
 
if (!vi->has_cvq || !virtio_has_feature(vi->vdev, VIRTIO_NET_F_MQ))
return 0;
 
-   s.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
-   sg_init_one(&sg, &s, sizeof(s));
+   vi->ctrl_mq.virtqueue_pairs = cpu_to_virtio16(vi->vdev, queue_pairs);
+   sg_init_one(&sg, &vi->ctrl_mq, sizeof(vi->ctrl_mq));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MQ,
  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET, &sg)) {
@@ -1230,7 +1231,8 @@ static int virtnet_vlan_rx_add_vid(struct net_device *dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
 
-   sg_init_one(&sg, &vid, sizeof(vid));
+   vi->ctrl_vid = vid;
+   sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
  VIRTIO_NET_CTRL_VLAN_ADD, &sg))
@@ -1244,7 +1246,8 @@ static int virtnet_vlan_rx_kill_vid(struct net_device *dev,
struct virtnet_info *vi = netdev_priv(dev);
struct scatterlist sg;
 
-   sg_init_one(&sg, &vid, sizeof(vid));
+   vi->ctrl_vid = vid;
+   sg_init_one(&sg, &vi->ctrl_vid, sizeof(vi->ctrl_vid));
 
if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_VLAN,
  VIRTIO_NET_CTRL_VLAN_DEL, &sg))
-- 
2.7.4

[PATCH v7 03/11] rtnl: add option for setting link xdp prog

2016-07-11 Thread Brenden Blanco

Sets the bpf program represented by fd as an early filter in the rx path
of the netdev. The fd must have been created as BPF_PROG_TYPE_XDP.
Providing a negative value as fd clears the program. Getting the fd back
via rtnl is not possible; therefore, reading this attribute merely
reports whether a program is attached to the link or not.
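`IFLA_XDP` is a nested attribute: `rtnl_xdp_fill()` opens it with `nla_nest_start()` and places `IFLA_XDP_ATTACHED` inside, and the fd travels in `IFLA_XDP_FD`. A userspace sketch of how such a nest is laid out, assuming only the uapi `<linux/netlink.h>` definitions; `total_size()`, `put_attr()` and `put_xdp_fd_nest()` are hypothetical helpers (the first restates the kernel's `nla_total_size()`), and attribute type numbers are passed in rather than taken from `<linux/if_link.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Header plus payload, rounded up to NLA_ALIGNTO (4) bytes. */
static size_t total_size(size_t payload)
{
	return NLA_ALIGN(NLA_HDRLEN + payload);
}

/* Appends one attribute at buf + off; returns the next free offset. */
static size_t put_attr(unsigned char *buf, size_t off, uint16_t type,
		       const void *data, uint16_t len)
{
	struct nlattr *nla = (struct nlattr *)(buf + off);

	nla->nla_type = type;
	nla->nla_len = (uint16_t)(NLA_HDRLEN + len);
	memcpy(buf + off + NLA_HDRLEN, data, len);
	memset(buf + off + NLA_HDRLEN + len, 0,
	       NLA_ALIGN(nla->nla_len) - nla->nla_len);   /* padding */
	return off + NLA_ALIGN(nla->nla_len);
}

/* Writes a nest of nest_type holding one s32 fd attribute of fd_type.
 * buf must be 4-byte aligned. Returns the bytes written. */
static size_t put_xdp_fd_nest(unsigned char *buf, uint16_t nest_type,
			      uint16_t fd_type, int32_t fd)
{
	struct nlattr *nest = (struct nlattr *)buf;
	size_t off = NLA_HDRLEN;                  /* room for nest header */

	off = put_attr(buf, off, fd_type, &fd, sizeof(fd));
	nest->nla_type = nest_type;
	nest->nla_len = (uint16_t)off;            /* header + inner attrs */
	return off;
}
```

Because the nest header is itself an attribute header, a size estimate for a nest needs `nla_total_size(0)` on top of the attributes inside it: a nest carrying one 4-byte fd occupies 4 + 8 = 12 bytes.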

Signed-off-by: Brenden Blanco 
---
 include/uapi/linux/if_link.h | 12 +
 net/core/rtnetlink.c | 64 
 2 files changed, 76 insertions(+)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 4285ac3..a1b5202 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -156,6 +156,7 @@ enum {
IFLA_GSO_MAX_SEGS,
IFLA_GSO_MAX_SIZE,
IFLA_PAD,
+   IFLA_XDP,
__IFLA_MAX
 };
 
@@ -843,4 +844,15 @@ enum {
 };
 #define LINK_XSTATS_TYPE_MAX (__LINK_XSTATS_TYPE_MAX - 1)
 
+/* XDP section */
+
+enum {
+   IFLA_XDP_UNSPEC,
+   IFLA_XDP_FD,
+   IFLA_XDP_ATTACHED,
+   __IFLA_XDP_MAX,
+};
+
+#define IFLA_XDP_MAX (__IFLA_XDP_MAX - 1)
+
 #endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a9e3805..eba2b82 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -891,6 +891,16 @@ static size_t rtnl_port_size(const struct net_device *dev,
return port_self_size;
 }
 
+static size_t rtnl_xdp_size(const struct net_device *dev)
+{
+   size_t xdp_size = nla_total_size(0) + nla_total_size(1); /* IFLA_XDP nest + XDP_ATTACHED */
+
+   if (!dev->netdev_ops->ndo_xdp)
+   return 0;
+   else
+   return xdp_size;
+}
+
 static noinline size_t if_nlmsg_size(const struct net_device *dev,
 u32 ext_filter_mask)
 {
@@ -927,6 +937,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
   + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
   + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
+  + rtnl_xdp_size(dev) /* IFLA_XDP */
   + nla_total_size(1); /* IFLA_PROTO_DOWN */
 
 }
@@ -1211,6 +1222,33 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
return 0;
 }
 
+static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
+{
+   struct netdev_xdp xdp_op = {};
+   struct nlattr *xdp;
+   int err;
+
+   if (!dev->netdev_ops->ndo_xdp)
+   return 0;
+   xdp = nla_nest_start(skb, IFLA_XDP);
+   if (!xdp)
+   return -EMSGSIZE;
+   xdp_op.command = XDP_QUERY_PROG;
+   err = dev->netdev_ops->ndo_xdp(dev, &xdp_op);
+   if (err)
+   goto err_cancel;
+   err = nla_put_u8(skb, IFLA_XDP_ATTACHED, xdp_op.prog_attached);
+   if (err)
+   goto err_cancel;
+
+   nla_nest_end(skb, xdp);
+   return 0;
+
+err_cancel:
+   nla_nest_cancel(skb, xdp);
+   return err;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
int type, u32 pid, u32 seq, u32 change,
unsigned int flags, u32 ext_filter_mask)
@@ -1307,6 +1345,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
if (rtnl_port_fill(skb, dev, ext_filter_mask))
goto nla_put_failure;
 
+   if (rtnl_xdp_fill(skb, dev))
+   goto nla_put_failure;
+
if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
if (rtnl_link_fill(skb, dev) < 0)
goto nla_put_failure;
@@ -1392,6 +1433,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_PHYS_SWITCH_ID]   = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
[IFLA_LINK_NETNSID] = { .type = NLA_S32 },
[IFLA_PROTO_DOWN]   = { .type = NLA_U8 },
+   [IFLA_XDP]  = { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1429,6 +1471,11 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
[IFLA_PORT_RESPONSE]= { .type = NLA_U16, },
 };
 
+static const struct nla_policy ifla_xdp_policy[IFLA_XDP_MAX + 1] = {
+   [IFLA_XDP_FD]   = { .type = NLA_S32 },
+   [IFLA_XDP_ATTACHED] = { .type = NLA_U8 },
+};
+
 static const struct rtnl_link_ops *linkinfo_to_kind_ops(const struct nlattr *nla)
 {
const struct rtnl_link_ops *ops = NULL;
@@ -2054,6 +2101,23 @@ static int do_setlink(const struct sk_buff *skb,
status |= DO_SETLINK_NOTIFY;
}
 
+   if (tb[IFLA_XDP]) {
+   struct nlattr *xdp[IFLA_XDP_MAX + 1];
+
+   err = nla_parse_nested(xdp, IFLA_XDP_MAX, tb[IFLA_XDP],
+  ifla_xdp_policy);
+   if (err < 0)
+

[PATCH v7 06/11] net/mlx4_en: add page recycle to prepare rx ring for tx support

2016-07-11 Thread Brenden Blanco

The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments. When the device has an xdp program, this
behavior will prevent tx actions since the page must be re-mapped in
TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring. Pages in the cache are left dma-mapped, and in drop-only
stress tests the page allocator is eliminated from the perf report.
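A minimal sketch of the kind of per-ring page cache described above, assuming a fixed-capacity LIFO of pages that stay DMA-mapped while cached. The names (`page_cache_sketch`, `cache_put`, `cache_get`), the capacity, and the use of `void *` for the page are hypothetical; the real mlx4 structure lives in the en_rx.c part of the patch, which is truncated here.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SIZE 128  /* hypothetical capacity */

/* One cached entry: a page plus the DMA address it is still mapped at. */
struct cached_page {
	void *page;
	unsigned long long dma;
};

struct page_cache_sketch {
	size_t count;
	struct cached_page buf[CACHE_SIZE];
};

/* Return a still-mapped page to the cache. false means the cache is
 * full and the caller must unmap and free the page instead. */
static bool cache_put(struct page_cache_sketch *c, void *page,
		      unsigned long long dma)
{
	if (c->count == CACHE_SIZE)
		return false;
	c->buf[c->count].page = page;
	c->buf[c->count].dma = dma;
	c->count++;
	return true;
}

/* Take the most recently returned page (LIFO, so likely cache-hot).
 * false means fall back to the page allocator and a fresh mapping. */
static bool cache_get(struct page_cache_sketch *c, struct cached_page *out)
{
	if (c->count == 0)
		return false;
	*out = c->buf[--c->count];
	return true;
}
```

Because both ends run on the ring's own NAPI context in this design, a plain array with no locking is assumed to be sufficient.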

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en] [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper  [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper  [mlx4_en] [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper  [kernel.vmlinux]   [k] intel_idle
  8.79%  swapper  [mlx4_en]  [k] mlx4_en_process_rx_cq
  7.54%  swapper  [kernel.vmlinux]   [k] poll_idle
  6.36%  swapper  [mlx4_core][k] mlx4_eq_int
  4.21%  swapper  [kernel.vmlinux]   [k] tasklet_action
  4.03%  swapper  [kernel.vmlinux]   [k] cpuidle_enter_state
  3.43%  swapper  [mlx4_en]  [k] mlx4_en_prepare_rx_desc
  2.18%  swapper  [kernel.vmlinux]   [k] native_irq_return_iret
  1.37%  swapper  [kernel.vmlinux]   [k] menu_select
  1.09%  swapper  [kernel.vmlinux]   [k] bpf_map_lookup_elem

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  | 46 +++--
 drivers/net/ethernet/mellanox/mlx4/en_rx.c  | 69 ++---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h| 12 -
 4 files changed, 115 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index 51a2e82..d3d51fa 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -47,7 +47,7 @@
 #define EN_ETHTOOL_SHORT_MASK cpu_to_be16(0xffff)
 #define EN_ETHTOOL_WORD_MASK  cpu_to_be32(0xffffffff)
 
-static int mlx4_en_moderation_update(struct mlx4_en_priv *priv)
+int mlx4_en_moderation_update(struct mlx4_en_priv *priv)
 {
int i;
int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 31070f9..0417023 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2532,19 +2532,57 @@ static int mlx4_en_set_tx_maxrate(struct net_device *dev, int queue_index, u32 m
 static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
struct mlx4_en_priv *priv = netdev_priv(dev);
+   struct mlx4_en_dev *mdev = priv->mdev;
struct bpf_prog *old_prog;
+   int port_up = 0;
+   int err;
+
+   /* No need to reconfigure buffers when simply swapping the
+* program for a new one.
+*/
+   if (READ_ONCE(priv->prog) && prog) {
+   /* This xchg is paired with READ_ONCE in the fast path, but is
+* also protected from itself via rtnl lock
+*/
+   old_prog = xchg(&priv->prog, prog);
+   if (old_prog)
+   bpf_prog_put(old_prog);
+   return 0;
+   }
 
if (priv->num_frags > 1)
return -EOPNOTSUPP;
 
-   /* This xchg is paired with READ_ONCE in the fast path, but is
-* also protected from itself via rtnl lock
-*/
+   mutex_lock(&mdev->state_lock);
+   if (priv->port_up) {
+   port_up = 1;
+   mlx4_en_stop_port(dev, 1);
+   }
+
+   mlx4_en_free_resources(priv);
+
old_prog = xchg(&priv->prog, prog);
if (old_prog)
bpf_prog_put(old_prog);
 
-   return 0;
+   err = mlx4_en_alloc_resources(priv);
+   if (err) {
+   en_err(priv, "Failed reallocating port resources\n");
+   goto out;
+   }
+   if (port_up) {
+   err = mlx4_en_start_port(dev);
+   if (err)
+   en_err(priv, "Failed starting port\n");
+   }
+
+   err = mlx4_en_moderation_update(priv);
+
+out:
+   if (err)
+   priv->prog = NULL;
+   mutex_unlock(&mdev->state_lock);
+   return err;
 }
 
 static bool mlx4_xdp_attached(struct net_device *dev)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c

[PATCH v7 10/11] bpf: enable direct packet data write for xdp progs

2016-07-11 Thread Brenden Blanco

For forwarding to be effective, XDP programs should be allowed to
rewrite packet data.

This requires that all drivers supporting XDP map the packet memory as
TODEVICE or BIDIRECTIONAL before invoking the program.
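The check added to check_mem_access() in the diff comes down to two rules: a write through a PTR_TO_PACKET register is accepted only for program types that may write packet data (XDP here), and even then the stored value must not be a pointer, so kernel addresses cannot leak into the packet. The standalone model below restates that decision. The enum and function names are hypothetical simplifications rather than verifier API, and bounds checking (check_packet_access) is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-ins for the verifier's types. */
enum prog_type_sketch { PROG_SOCKET_FILTER, PROG_SCHED_CLS, PROG_XDP };
enum access_sketch { ACCESS_READ, ACCESS_WRITE };

/* Mirrors may_write_pkt_data(): only XDP may write packet data. */
static bool may_write_pkt(enum prog_type_sketch t)
{
	return t == PROG_XDP;
}

/*
 * Decide a packet access. value_is_ptr says whether the register being
 * stored holds a pointer (only meaningful for writes).
 * Returns 0 if allowed, -EACCES otherwise.
 */
static int check_pkt_access_sketch(enum prog_type_sketch t,
				   enum access_sketch a, bool value_is_ptr)
{
	if (a == ACCESS_WRITE && !may_write_pkt(t))
		return -EACCES;  /* "cannot write into packet" */
	if (a == ACCESS_WRITE && value_is_ptr)
		return -EACCES;  /* "R%d leaks addr into packet" */
	return 0;
}
```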

Signed-off-by: Brenden Blanco 
---
 kernel/bpf/verifier.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a8d67d0..f72f23b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -653,6 +653,16 @@ static int check_map_access(struct verifier_env *env, u32 regno, int off,
 
 #define MAX_PACKET_OFF 0xffff
 
+static bool may_write_pkt_data(enum bpf_prog_type type)
+{
+   switch (type) {
+   case BPF_PROG_TYPE_XDP:
+   return true;
+   default:
+   return false;
+   }
+}
+
 static int check_packet_access(struct verifier_env *env, u32 regno, int off,
   int size)
 {
@@ -806,10 +816,15 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
err = check_stack_read(state, off, size, value_regno);
}
} else if (state->regs[regno].type == PTR_TO_PACKET) {
-   if (t == BPF_WRITE) {
+   if (t == BPF_WRITE && !may_write_pkt_data(env->prog->type)) {
verbose("cannot write into packet\n");
return -EACCES;
}
+   if (t == BPF_WRITE && value_regno >= 0 &&
+   is_pointer_value(env, value_regno)) {
+   verbose("R%d leaks addr into packet\n", value_regno);
+   return -EACCES;
+   }
err = check_packet_access(env, regno, off, size);
if (!err && t == BPF_READ && value_regno >= 0)
mark_reg_unknown_value(state->regs, value_regno);
-- 
2.8.2

[PATCH v7 07/11] bpf: add XDP_TX xdp_action for direct forwarding

2016-07-11 Thread Brenden Blanco

XDP enabled drivers must transmit received packets back out on the same
port they were received on when a program returns this action.
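To make the driver-side contract concrete, here is a hedged sketch of how an rx path might act on the verdict. The enum values mirror the `xdp_action` ordering in bpf.h after this patch; `handle_verdict` and the `rx_outcome` enum are hypothetical. Following the v7 notes in the cover letter, XDP_ABORTED and any unknown value are treated like XDP_DROP.

```c
/* Mirrors enum xdp_action in include/uapi/linux/bpf.h after this patch. */
enum xdp_action_sketch {
	SK_XDP_ABORTED = 0,
	SK_XDP_DROP,
	SK_XDP_PASS,
	SK_XDP_TX,
};

/* Hypothetical outcomes of the driver's rx handling. */
enum rx_outcome { RX_TO_STACK, RX_RECYCLE, RX_XMIT_SAME_PORT };

static enum rx_outcome handle_verdict(unsigned int act)
{
	switch (act) {
	case SK_XDP_PASS:
		return RX_TO_STACK;        /* build an skb, continue normally */
	case SK_XDP_TX:
		return RX_XMIT_SAME_PORT;  /* send back out the receiving port */
	default:                           /* unknown codes behave like ABORTED */
	case SK_XDP_ABORTED:
	case SK_XDP_DROP:
		return RX_RECYCLE;         /* drop; the buffer can be reused */
	}
}
```

Placing `default` directly above the drop cases lets every "not PASS, not TX" value share one code path, which matches the reordering mentioned in the v6 notes.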

Signed-off-by: Brenden Blanco 
---
 include/uapi/linux/bpf.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 4282d44..a8f1ea1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -447,6 +447,7 @@ enum xdp_action {
XDP_ABORTED = 0,
XDP_DROP,
XDP_PASS,
+   XDP_TX,
 };
 
 /* user accessible metadata for XDP packet hook
-- 
2.8.2

[PATCH v7 11/11] bpf: add sample for xdp forwarding and rewrite

2016-07-11 Thread Brenden Blanco

Add a sample that rewrites and forwards packets out on the same
interface. Observed single core forwarding performance of ~10Mpps.

Since the mlx4 driver under test recycles every single packet page, the
perf output shows almost exclusively just the ring management and bpf
program work. Slowdowns are likely occurring due to cache misses.
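The core rewrite in xdp2_kern.c is swap_src_dst_mac(), which exchanges the 6-byte destination and source MAC addresses at the start of the Ethernet header before the packet is returned with XDP_TX. The plain-C version below performs the same swap on an ordinary byte buffer so it can be checked outside the kernel. It uses memcpy instead of the sample's 16-bit loads, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Swap destination (bytes 0-5) and source (bytes 6-11) MAC addresses.
 * The caller must guarantee at least 12 bytes are present, as the XDP
 * sample does with its data_end check on the Ethernet header. */
static void swap_macs(uint8_t *eth)
{
	uint8_t tmp[ETH_ALEN_SKETCH];

	memcpy(tmp, eth, ETH_ALEN_SKETCH);
	memcpy(eth, eth + ETH_ALEN_SKETCH, ETH_ALEN_SKETCH);
	memcpy(eth + ETH_ALEN_SKETCH, tmp, ETH_ALEN_SKETCH);
}
```

Only the L2 addresses change, so no IP or L4 checksum needs to be updated for this kind of reflection.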

Signed-off-by: Brenden Blanco 
---
 samples/bpf/Makefile|   5 +++
 samples/bpf/xdp2_kern.c | 114 
 2 files changed, 119 insertions(+)
 create mode 100644 samples/bpf/xdp2_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 0e4ab3a..d2d2b35 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -22,6 +22,7 @@ hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
 hostprogs-y += xdp1
+hostprogs-y += xdp2
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -44,6 +45,8 @@ map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
 xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
+# reuse xdp1 source intentionally
+xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -67,6 +70,7 @@ always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
 always += xdp1_kern.o
+always += xdp2_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -88,6 +92,7 @@ HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
 HOSTLOADLIBES_xdp1 += -lelf
+HOSTLOADLIBES_xdp2 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/xdp2_kern.c b/samples/bpf/xdp2_kern.c
new file mode 100644
index 000..38fe7e1
--- /dev/null
+++ b/samples/bpf/xdp2_kern.c
@@ -0,0 +1,114 @@
+/* Copyright (c) 2016 PLUMgrid
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") dropcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = 256,
+};
+
+static void swap_src_dst_mac(void *data)
+{
+   unsigned short *p = data;
+   unsigned short dst[3];
+
+   dst[0] = p[0];
+   dst[1] = p[1];
+   dst[2] = p[2];
+   p[0] = p[3];
+   p[1] = p[4];
+   p[2] = p[5];
+   p[3] = dst[0];
+   p[4] = dst[1];
+   p[5] = dst[2];
+}
+
+static int parse_ipv4(void *data, u64 nh_off, void *data_end)
+{
+   struct iphdr *iph = data + nh_off;
+
+   if (iph + 1 > data_end)
+   return 0;
+   return iph->protocol;
+}
+
+static int parse_ipv6(void *data, u64 nh_off, void *data_end)
+{
+   struct ipv6hdr *ip6h = data + nh_off;
+
+   if (ip6h + 1 > data_end)
+   return 0;
+   return ip6h->nexthdr;
+}
+
+SEC("xdp1")
+int xdp_prog1(struct xdp_md *ctx)
+{
+   void *data_end = (void *)(long)ctx->data_end;
+   void *data = (void *)(long)ctx->data;
+   struct ethhdr *eth = data;
+   int rc = XDP_DROP;
+   long *value;
+   u16 h_proto;
+   u64 nh_off;
+   u32 index;
+
+   nh_off = sizeof(*eth);
+   if (data + nh_off > data_end)
+   return rc;
+
+   h_proto = eth->h_proto;
+
+   if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
+   struct vlan_hdr *vhdr;
+
+   vhdr = data + nh_off;
+   nh_off += sizeof(struct vlan_hdr);
+   if (data + nh_off > data_end)
+   return rc;
+   h_proto = vhdr->h_vlan_encapsulated_proto;
+   }
+   if (h_proto == htons(ETH_P_8021Q) || h_proto == htons(ETH_P_8021AD)) {
+   struct vlan_hdr *vhdr;
+
+   vhdr = data + nh_off;
+   nh_off += sizeof(struct vlan_hdr);
+   if (data + nh_off > data_end)
+   return rc;
+   h_proto = vhdr->h_vlan_encapsulated_proto;
+   }
+
+   if (h_proto == htons(ETH_P_IP))
+   index = parse_ipv4(data, nh_off, data_end);
+   else if (h_proto == htons(ETH_P_IPV6))
+   index = parse_ipv6(data, nh_off, data_end);
+   else
+   index = 0;
+
+   value = bpf_map_lookup_elem(&dropcnt, &index);
+   if (value)
+   *value += 1;
+
+   if (index == 17) {
+

Re: [net-next:master 1118/1134] drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size

2016-07-11 Thread David Miller

From: kbuild test robot 
Date: Tue, 12 Jul 2016 05:19:57 +0800

> All warnings (new ones prefixed by >>):
> 
>drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>   pdata->chip_id = (u32)of_id->data;
>^
> 

Fixed as follows:


[PATCH] b53: Fix build warning.

   drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  pdata->chip_id = (u32)of_id->data;
   ^

Reported-by: kbuild test robot 
Signed-off-by: David S. Miller 
---
 drivers/net/dsa/b53/b53_srab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index 2b304ea..3e2d4a5 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -393,7 +393,7 @@ static int b53_srab_probe(struct platform_device *pdev)
if (!pdata)
return -ENOMEM;
 
-   pdata->chip_id = (u32)of_id->data;
+   pdata->chip_id = (u32)(unsigned long)of_id->data;
}
 
priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
-- 
2.1.0
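The b53 fix relies on a common idiom: an integer stored in a `const void *` match-table field is recovered by casting through a pointer-sized integer first, so 64-bit builds see no truncating pointer-to-int cast. A userspace sketch follows, using `uintptr_t` where the kernel patch uses `unsigned long`; the struct and function names are hypothetical stand-ins for `struct of_device_id` and the probe code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a device-tree match entry. */
struct match_sketch {
	const char *compatible;
	const void *data;   /* integer chip id smuggled through a pointer */
};

static uint32_t chip_id_from_match(const struct match_sketch *m)
{
	/* (uint32_t)m->data would warn on LP64 (-Wpointer-to-int-cast);
	 * converting to uintptr_t first keeps the pointer cast full-width,
	 * and the narrowing to 32 bits is then an explicit integer cast. */
	return (uint32_t)(uintptr_t)m->data;
}
```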

[PATCH v7 08/11] net/mlx4_en: break out tx_desc write into separate function

2016-07-11 Thread Brenden Blanco

In preparation for writing the tx descriptor from multiple functions,
create a helper for both normal and blueflame access.

Signed-off-by: Brenden Blanco 
---
 drivers/infiniband/hw/mlx4/qp.c|  11 +--
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 127 +
 include/linux/mlx4/qp.h|  18 ++--
 3 files changed, 90 insertions(+), 66 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 8db8405..768085f 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -232,7 +232,7 @@ static void stamp_send_wqe(struct mlx4_ib_qp *qp, int n, int size)
}
} else {
ctrl = buf = get_send_wqe(qp, n & (qp->sq.wqe_cnt - 1));
-   s = (ctrl->fence_size & 0x3f) << 4;
+   s = (ctrl->qpn_vlan.fence_size & 0x3f) << 4;
for (i = 64; i < s; i += 64) {
wqe = buf + i;
*wqe = cpu_to_be32(0xffffffff);
@@ -264,7 +264,7 @@ static void post_nop_wqe(struct mlx4_ib_qp *qp, int n, int size)
inl->byte_count = cpu_to_be32(1 << 31 | (size - s - sizeof *inl));
}
ctrl->srcrb_flags = 0;
-   ctrl->fence_size = size / 16;
+   ctrl->qpn_vlan.fence_size = size / 16;
/*
 * Make sure descriptor is fully written before setting ownership bit
 * (because HW can start executing as soon as we do).
@@ -1992,7 +1992,8 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
ctrl = get_send_wqe(qp, i);
ctrl->owner_opcode = cpu_to_be32(1 << 31);
if (qp->sq_max_wqes_per_wr == 1)
-   ctrl->fence_size = 1 << (qp->sq.wqe_shift - 4);
+   ctrl->qpn_vlan.fence_size =
+   1 << (qp->sq.wqe_shift - 4);
 
stamp_send_wqe(qp, i, 1 << qp->sq.wqe_shift);
}
@@ -3169,8 +3170,8 @@ int mlx4_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
wmb();
*lso_wqe = lso_hdr_sz;
 
-   ctrl->fence_size = (wr->send_flags & IB_SEND_FENCE ?
-   MLX4_WQE_CTRL_FENCE : 0) | size;
+   ctrl->qpn_vlan.fence_size = (wr->send_flags & IB_SEND_FENCE ?
+MLX4_WQE_CTRL_FENCE : 0) | size;
 
/*
 * Make sure descriptor is fully written before
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 76aa4d2..c29191e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -700,10 +700,66 @@ static void mlx4_bf_copy(void __iomem *dst, const void *src,
__iowrite64_copy(dst, src, bytecnt / 8);
 }
 
+void mlx4_en_xmit_doorbell(struct mlx4_en_tx_ring *ring)
+{
+   wmb();
+   /* Since there is no iowrite*_native() that writes the
+* value as is, without byteswapping - using the one
+* that doesn't do byteswapping in the relevant arch
+* endianness.
+*/
+#if defined(__LITTLE_ENDIAN)
+   iowrite32(
+#else
+   iowrite32be(
+#endif
+ ring->doorbell_qpn,
+ ring->bf.uar->map + MLX4_SEND_DOORBELL);
+}
+
+static void mlx4_en_tx_write_desc(struct mlx4_en_tx_ring *ring,
+ struct mlx4_en_tx_desc *tx_desc,
+ union mlx4_wqe_qpn_vlan qpn_vlan,
+ int desc_size, int bf_index,
+ __be32 op_own, bool bf_ok,
+ bool send_doorbell)
+{
+   tx_desc->ctrl.qpn_vlan = qpn_vlan;
+
+   if (bf_ok) {
+   op_own |= htonl((bf_index & 0xffff) << 8);
+   /* Ensure new descriptor hits memory
+* before setting ownership of this descriptor to HW
+*/
+   dma_wmb();
+   tx_desc->ctrl.owner_opcode = op_own;
+
+   wmb();
+
+   mlx4_bf_copy(ring->bf.reg + ring->bf.offset, &tx_desc->ctrl,
+desc_size);
+
+   wmb();
+
+   ring->bf.offset ^= ring->bf.buf_size;
+   } else {
+   /* Ensure new descriptor hits memory
+* before setting ownership of this descriptor to HW
+*/
+   dma_wmb();
+   tx_desc->ctrl.owner_opcode = op_own;
+   if (send_doorbell)
+   mlx4_en_xmit_doorbell(ring);
+   else
+   ring->xmit_more++;
+   }
+}
+
 netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct skb_shared_info *shinfo = skb_shinfo(skb);
struct mlx4_en_priv *priv = netdev_priv(dev);
+

[PATCH v7 00/11] Add driver bpf hook for early packet drop and forwarding

2016-07-11 Thread Brenden Blanco

This patch set introduces new infrastructure for programmatically
processing packets in the earliest stages of rx, as part of an effort
others are calling eXpress Data Path (XDP) [1]. Start this effort by
introducing a new bpf program type for early packet filtering, before
even an skb has been allocated.

Extend this with the ability to modify packet data and send it back out
on the same port.

Patch 1 introduces the new prog type and helpers for validating the bpf
  program. A new userspace struct is defined containing only data and
  data_end as fields, with others to follow in the future.
In patch 2, create a new ndo to pass the fd to supported drivers.
In patch 3, expose a new rtnl option to userspace.
In patch 4, enable support in mlx4 driver.
In patch 5, create a sample drop and count program. With a single core,
  achieved ~20 Mpps drop rate on a 40G ConnectX3-Pro. This includes
  packet data access, bpf array lookup, and increment.
In patch 6, add a page recycle facility to mlx4 rx, enabled when xdp is
  active.
In patch 7, add the XDP_TX type to bpf.h
In patch 8, add a helper in the tx path for writing tx_desc
In patch 9, add support in mlx4 for packet data write and forwarding
In patch 10, turn on packet write support in the bpf verifier
In patch 11, add a sample program for packet write and forwarding. With
  a single core, achieved ~10 Mpps rewrite and forwarding.
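For scale, the single-core rates quoted above translate into a per-packet time budget: 1000 / 20 = 50 ns at ~20 Mpps and 1000 / 10 = 100 ns at ~10 Mpps. The helpers below do that arithmetic and also compute the theoretical packet rate of a 40G link carrying minimum-size 64-byte frames, assuming the usual 20 bytes of per-frame wire overhead (8-byte preamble/SFD plus 12-byte inter-frame gap). The function names are illustrative only.

```c
/* Nanoseconds available per packet at a given rate in Mpps. */
static double ns_per_packet(double mpps)
{
	return 1000.0 / mpps;
}

/* Packets per second at line rate for a given frame size in bytes,
 * adding 20 bytes of preamble/SFD and inter-frame gap per frame. */
static double line_rate_pps(double bits_per_sec, unsigned int frame_bytes)
{
	return bits_per_sec / ((frame_bytes + 20u) * 8.0);
}
```

Under these assumptions, line_rate_pps(40e9, 64) comes to roughly 59.5 million packets per second.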

[1] https://github.com/iovisor/bpf-docs/blob/master/Express_Data_Path.pdf

v7:
 Addressing two of the major discussion points: return codes and ndo.
 The rest will be taken as todo items for separate patches.

 Add an XDP_ABORTED type, which explicitly falls through to DROP. The
 default case must produce the same result, since this is now
 well-defined API behavior.

 Merge ndo_xdp_* into a single ndo. The style is similar to
 ndo_setup_tc, but with less unidirectional naming convention. The IFLA
 parameter names are unchanged.

 TODOs:
 Add ethtool per-ring stats for aborted, default cases, maybe even drop
 and tx as well.
 Avoid duplicate dma sync operation in XDP_PASS case as mentioned by
 Saeed.

  1/12: Add XDP_ABORTED enum, reword API comment, and update commit
   message.
  2/12: Rewrite ndo_xdp_*() into single ndo_xdp() with type/union style
calling convention.
  3/12: Switch to ndo_xdp callback.
  4/12: Add XDP_ABORTED case as a fall-through to XDP_DROP. Implement
ndo_xdp.
 12/12: Dropped, this will need some more work.

v6:
  2/12: drop unnecessary netif_device_present check
  4/12, 6/12, 9/12: Reorder default case statement above drop case to
remove some copy/paste.

v5:
  0/12: Rebase and remove previous 1/13 patch
  1/12: Fix nits from Daniel. Left the (void *) cast as-is, to be fixed
in future. Add bpf_warn_invalid_xdp_action() helper, to be used when
out of bounds action is returned by the program. Add a comment to
bpf.h denoting the undefined nature of out of bounds returns.
  2/12: Switch to using bpf_prog_get_type(). Rename ndo_xdp_get() to
ndo_xdp_attached().
  3/12: Add IFLA_XDP as a nested type, and add the associated nla_policy
for the new subtypes IFLA_XDP_FD and IFLA_XDP_ATTACHED.
  4/12: Fixup the use of READ_ONCE in the ndos. Add a user of
bpf_warn_invalid_xdp_action helper.
  5/12: Adjust to using the nested netlink options.
  6/12: kbuild was complaining about overflow of u16 on the tile
architecture; bump frag_stride to u32. The page_offset member that
is computed from this was already u32.

v4:
  2/12: Add inline helper for calling xdp bpf prog under rcu
  3/12: Add detail to ndo comments
  5/12: Remove mlx4_call_xdp and use inline helper instead.
  6/12: Fix checkpatch complaints
  9/12: Introduce new patch 9/12 with common helper for tx_desc write
Refactor to use common tx_desc write helper
 11/12: Fix checkpatch complaints

v3:
  Rewrite from v2 trying to incorporate feedback from multiple sources.
  Specifically, add ability to forward packets out the same port and
allow packet modification.
  For packet forwarding, the driver reserves a dedicated set of tx rings
for exclusive use by xdp. Upon completion, the pages on this ring are
recycled directly back to a small per-rx-ring page cache without
being dma unmapped.
  Use of the percpu skb is dropped in favor of a lightweight struct
xdp_buff. The direct packet access feature is leveraged to remove
dependence on the skb.
  The mlx4 driver implementation allocates a page-per-packet and maps it
in PCI_DMA_BIDIRECTIONAL mode when the bpf program is activated.
  Naming is converted to use "xdp" instead of "phys_dev".

v2:
  1/5: Drop xdp from types, instead consistently use bpf_phys_dev_
Introduce enum for return values from phys_dev hook
  2/5: Move prog->type check to just before invoking ndo
Change ndo to take a bpf_prog * instead of fd
Add ndo_bpf_get rather than keeping a bool in the netdev struct
  3/5: Use ndo_bpf_get to fetch bool
  4/5: Enforce that only 1 frag

[PATCH v7 05/11] Add sample for adding simple drop program to link

2016-07-11 Thread Brenden Blanco

Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP
hook of a link. With the drop-only program, the observed single-core rate
is ~20Mpps.

Other tests were run, for instance without the dropcnt increment or
without reading from the packet header; in those cases the packet rate
was mostly unchanged.
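The kernel side of these samples (visible in xdp2_kern.c above) walks the Ethernet header, skips up to two VLAN tags, and uses the L4 protocol number as the key into the 256-entry dropcnt array, checking every access against data_end. The userspace sketch below reproduces that walk on a plain buffer so the bounds logic can be exercised. `l4_proto_index` is a hypothetical name, header offsets are hard-coded instead of using kernel structs, and a loop replaces the sample's two unrolled VLAN checks.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN_SK   14
#define VLAN_HLEN_SK  4
#define P_8021Q  0x8100
#define P_8021AD 0x88A8
#define P_IPV4   0x0800
#define P_IPV6   0x86DD

static uint16_t be16_at(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return the L4 protocol number (0-255) used as the dropcnt key,
 * 0 for non-IP or truncated IP headers, or -1 if the Ethernet/VLAN
 * headers themselves are truncated (the samples drop without counting).
 */
static int l4_proto_index(const uint8_t *data, size_t len)
{
	size_t off = ETH_HLEN_SK;
	uint16_t proto;
	int i;

	if (len < off)
		return -1;
	proto = be16_at(data + 12);  /* h_proto */

	for (i = 0; i < 2; i++) {    /* at most two stacked tags */
		if (proto != P_8021Q && proto != P_8021AD)
			break;
		if (len < off + VLAN_HLEN_SK)
			return -1;
		proto = be16_at(data + off + 2);  /* encapsulated proto */
		off += VLAN_HLEN_SK;
	}

	if (proto == P_IPV4)
		return len < off + 20 ? 0 : data[off + 9];  /* iphdr->protocol */
	if (proto == P_IPV6)
		return len < off + 40 ? 0 : data[off + 6];  /* ipv6hdr->nexthdr */
	return 0;
}
```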

$ perf record -a samples/bpf/xdp1 $(
---
 samples/bpf/Makefile|   4 ++
 samples/bpf/bpf_load.c  |   8 +++
 samples/bpf/xdp1_kern.c |  93 +
 samples/bpf/xdp1_user.c | 181 
 4 files changed, 286 insertions(+)
 create mode 100644 samples/bpf/xdp1_kern.c
 create mode 100644 samples/bpf/xdp1_user.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a98b780..0e4ab3a 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -21,6 +21,7 @@ hostprogs-y += spintest
 hostprogs-y += map_perf_test
 hostprogs-y += test_overhead
 hostprogs-y += test_cgrp2_array_pin
+hostprogs-y += xdp1
 
 test_verifier-objs := test_verifier.o libbpf.o
 test_maps-objs := test_maps.o libbpf.o
@@ -42,6 +43,7 @@ spintest-objs := bpf_load.o libbpf.o spintest_user.o
 map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
 test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
 test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
+xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
 
 # Tell kbuild to always build the programs
 always := $(hostprogs-y)
@@ -64,6 +66,7 @@ always += test_overhead_tp_kern.o
 always += test_overhead_kprobe_kern.o
 always += parse_varlen.o parse_simple.o parse_ldabs.o
 always += test_cgrp2_tc_kern.o
+always += xdp1_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
@@ -84,6 +87,7 @@ HOSTLOADLIBES_offwaketime += -lelf
 HOSTLOADLIBES_spintest += -lelf
 HOSTLOADLIBES_map_perf_test += -lelf -lrt
 HOSTLOADLIBES_test_overhead += -lelf -lrt
+HOSTLOADLIBES_xdp1 += -lelf
 
 # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
 #  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 022af71..0cfda23 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -50,6 +50,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_kprobe = strncmp(event, "kprobe/", 7) == 0;
bool is_kretprobe = strncmp(event, "kretprobe/", 10) == 0;
bool is_tracepoint = strncmp(event, "tracepoint/", 11) == 0;
+   bool is_xdp = strncmp(event, "xdp", 3) == 0;
enum bpf_prog_type prog_type;
char buf[256];
int fd, efd, err, id;
@@ -66,6 +67,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_KPROBE;
} else if (is_tracepoint) {
prog_type = BPF_PROG_TYPE_TRACEPOINT;
+   } else if (is_xdp) {
+   prog_type = BPF_PROG_TYPE_XDP;
} else {
printf("Unknown event '%s'\n", event);
return -1;
@@ -79,6 +82,9 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
 
prog_fd[prog_cnt++] = fd;
 
+   if (is_xdp)
+   return 0;
+
if (is_socket) {
event += 6;
if (*event != '/')
@@ -319,6 +325,7 @@ int load_bpf_file(char *path)
if (memcmp(shname_prog, "kprobe/", 7) == 0 ||
memcmp(shname_prog, "kretprobe/", 10) == 0 ||
memcmp(shname_prog, "tracepoint/", 11) == 0 ||
+   memcmp(shname_prog, "xdp", 3) == 0 ||
memcmp(shname_prog, "socket", 6) == 0)
load_and_attach(shname_prog, insns, data_prog->d_size);
}
@@ -336,6 +343,7 @@ int load_bpf_file(char *path)
if (memcmp(shname, "kprobe/", 7) == 0 ||
memcmp(shname, "kretprobe/", 10) == 0 ||
memcmp(shname, "tracepoint/", 11) == 0 ||
+   memcmp(shname, "xdp", 3) == 0 ||
memcmp(shname, "socket", 6) == 0)
load_and_attach(shname, data->d_buf, data->d_size);
}
diff --git a/samples/bpf/xdp1_kern.c b/samples/bpf/xdp1_kern.c
new file mode 100644
index 000..e7dd8ac
--- /dev/null
+++ b/samples/bpf/xdp1_kern.c
@@ -0,0 +1,93 @@
+/* Copyright (c) 2016 PLUMgrid
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#define KBUILD_MODNAME "foo"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") dropcnt = {
+   .type = BPF_MAP_TYPE_PERCPU_ARRAY,
+   .key_size = sizeof(u32),
+   .value_size = sizeof(long),
+   .max_entries = 256,
+};
+

[PATCH v7 09/11] net/mlx4_en: add xdp forwarding and data write support

2016-07-11 Thread Brenden Blanco

A user will now be able to loop packets back out of the same port using
a bpf program attached to the xdp hook. Updates to the packet contents
from the bpf program are also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated. This occurs only when the xdp
hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid any locking entirely. This requires
one tx ring to be allocated for each rx ring, and the tx completion to
run in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages directly back
to the original rx ring without unmapping them. In a steady-state tx/drop
workload, effectively zero page allocations/frees will occur.
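The ring bookkeeping in this patch can be summarized arithmetically: the reserved XDP tx rings sit after the regular ones, XDP tx ring i is paired with rx ring i, and the number of regular tx channels is capped by what remains of MAX_TX_RINGS. The sketch below restates mlx4_en_max_tx_channels(), the tx_ring_num computation from mlx4_en_set_channels(), and the index computation from mlx4_en_alloc_resources(). The struct and helper names are hypothetical, the constants are parameters rather than the driver's real values, and rsv_tx_rings is assumed to equal the rx ring count while XDP is active (0 otherwise).

```c
/* Hypothetical snapshot of the fields the patch uses. */
struct ring_layout {
	int max_tx_rings;   /* MAX_TX_RINGS */
	int num_up;         /* MLX4_EN_NUM_UP */
	int tx_per_up;      /* num_tx_rings_p_up (regular channels) */
	int rsv_tx_rings;   /* reserved XDP tx rings, 0 when XDP is off */
};

/* Total tx rings, regular plus reserved. */
static int tx_ring_num(const struct ring_layout *l)
{
	return l->tx_per_up * l->num_up + l->rsv_tx_rings;
}

/* Upper bound on regular tx channels per user priority. */
static int max_tx_channels(const struct ring_layout *l)
{
	return (l->max_tx_rings - l->rsv_tx_rings) / l->num_up;
}

/* Index of the XDP tx ring that recycles pages back to rx ring i. */
static int xdp_tx_ring_for_rx(const struct ring_layout *l, int i)
{
	return (tx_ring_num(l) - l->rsv_tx_rings) + i;
}
```

Only the regular rings (tx_ring_num minus rsv_tx_rings) are exposed to the stack via netif_set_real_num_tx_queues(), so the reserved rings never receive skbs.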

Signed-off-by: Brenden Blanco 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  15 ++-
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  |  19 +++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c  |  14 +++
 drivers/net/ethernet/mellanox/mlx4/en_tx.c  | 126 +++-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h|  14 ++-
 5 files changed, 181 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index d3d51fa..10642b1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -1694,6 +1694,11 @@ static int mlx4_en_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
return err;
 }
 
+static int mlx4_en_max_tx_channels(struct mlx4_en_priv *priv)
+{
+   return (MAX_TX_RINGS - priv->rsv_tx_rings) / MLX4_EN_NUM_UP;
+}
+
 static void mlx4_en_get_channels(struct net_device *dev,
 struct ethtool_channels *channel)
 {
@@ -1705,7 +1710,7 @@ static void mlx4_en_get_channels(struct net_device *dev,
channel->max_tx = MLX4_EN_MAX_TX_RING_P_UP;
 
channel->rx_count = priv->rx_ring_num;
-   channel->tx_count = priv->tx_ring_num / MLX4_EN_NUM_UP;
+   channel->tx_count = priv->num_tx_rings_p_up;
 }
 
 static int mlx4_en_set_channels(struct net_device *dev,
@@ -1717,7 +1722,7 @@ static int mlx4_en_set_channels(struct net_device *dev,
int err = 0;
 
if (channel->other_count || channel->combined_count ||
-   channel->tx_count > MLX4_EN_MAX_TX_RING_P_UP ||
+   channel->tx_count > mlx4_en_max_tx_channels(priv) ||
channel->rx_count > MAX_RX_RINGS ||
!channel->tx_count || !channel->rx_count)
return -EINVAL;
@@ -1731,7 +1736,8 @@ static int mlx4_en_set_channels(struct net_device *dev,
mlx4_en_free_resources(priv);
 
priv->num_tx_rings_p_up = channel->tx_count;
-   priv->tx_ring_num = channel->tx_count * MLX4_EN_NUM_UP;
+   priv->tx_ring_num = channel->tx_count * MLX4_EN_NUM_UP +
+   priv->rsv_tx_rings;
priv->rx_ring_num = channel->rx_count;
 
err = mlx4_en_alloc_resources(priv);
@@ -1740,7 +1746,8 @@ static int mlx4_en_set_channels(struct net_device *dev,
goto out;
}
 
-   netif_set_real_num_tx_queues(dev, priv->tx_ring_num);
+   netif_set_real_num_tx_queues(dev, priv->tx_ring_num -
+   priv->rsv_tx_rings);
netif_set_real_num_rx_queues(dev, priv->rx_ring_num);
 
if (dev->num_tc)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 0417023..3257db7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1636,7 +1636,7 @@ int mlx4_en_start_port(struct net_device *dev)
/* Configure ring */
tx_ring = priv->tx_ring[i];
err = mlx4_en_activate_tx_ring(priv, tx_ring, cq->mcq.cqn,
-   i / priv->num_tx_rings_p_up);
+   i / (priv->tx_ring_num / MLX4_EN_NUM_UP));
if (err) {
en_err(priv, "Failed allocating Tx ring\n");
mlx4_en_deactivate_cq(priv, cq);
@@ -2022,6 +2022,16 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
goto err;
}
 
+   /* When rsv_tx_rings is non-zero, each rx ring will have a
+* corresponding tx ring, with the tx completion event for that ring
+* recycling buffers into the cache.
+*/
+   for (i = 0; i < priv->rsv_tx_rings; i++) {
+   int j = (priv->tx_ring_num - priv->rsv_tx_rings) + i;
+
+   priv->tx_ring[j]->recycle_ring = priv->rx_ring[i];
+   }
+
 #ifdef CONFIG_RFS_ACCEL
priv->dev->rx_cpu_rmap = mlx4_get_cpu_rmap(priv->mdev->dev, priv->port);
 #endif
@@ -2534,9 +2544,12 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
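
The ring bookkeeping in the mlx4 hunks above (reserving rsv_tx_rings at the end of the TX ring array and hiding them from the stack) can be followed more easily with a small userspace sketch of the same arithmetic. MAX_TX_RINGS = 128 and MLX4_EN_NUM_UP = 8 are assumed illustrative values, and the helper names are hypothetical; only the formulas are taken from the diff.

```c
#include <assert.h>

/* Illustrative values only; the driver's real constants live in mlx4_en.h. */
#define MAX_TX_RINGS   128
#define MLX4_EN_NUM_UP 8

struct ring_layout {
	int tx_ring_num;  /* all TX rings, reserved ones included */
	int rsv_tx_rings; /* rings set aside for XDP, one per recycled RX ring */
};

/* Same arithmetic as mlx4_en_max_tx_channels() in the hunk above. */
static int max_tx_channels(int rsv_tx_rings)
{
	return (MAX_TX_RINGS - rsv_tx_rings) / MLX4_EN_NUM_UP;
}

/* Same arithmetic as the tx_ring_num update in mlx4_en_set_channels(). */
static struct ring_layout layout_for(int tx_count, int rsv_tx_rings)
{
	struct ring_layout l = {
		.tx_ring_num = tx_count * MLX4_EN_NUM_UP + rsv_tx_rings,
		.rsv_tx_rings = rsv_tx_rings,
	};
	return l;
}

/* Value passed to netif_set_real_num_tx_queues(): reserved rings are hidden. */
static int stack_visible_tx_queues(struct ring_layout l)
{
	return l.tx_ring_num - l.rsv_tx_rings;
}

/* TX ring whose completions recycle into RX ring rx (reserved rings sit last). */
static int recycle_tx_index(struct ring_layout l, int rx)
{
	assert(rx >= 0 && rx < l.rsv_tx_rings);
	return (l.tx_ring_num - l.rsv_tx_rings) + rx;
}
```

With 4 channels per UP and 8 reserved rings, this gives 40 rings in total, of which the stack sees 32, and RX ring i recycles through TX ring 32 + i.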

[PATCH v7 01/11] bpf: add XDP prog type for early driver filter

2016-07-11 Thread Brenden Blanco

Add a new bpf prog type that is intended to run in early stages of the
packet rx path. Only minimal packet metadata will be available, hence a
new context type, struct xdp_md, is exposed to userspace. So far, only
the packet start and end pointers are exposed, and only in read mode.

An XDP program must return one of the well-known enum values; all other
return codes are reserved for future use. Unfortunately, this
restriction is hard to enforce at verification time, so take the
approach of warning at runtime when such programs are encountered. Out
of bounds return codes should alias to XDP_ABORTED.

Signed-off-by: Brenden Blanco 
---
 include/linux/filter.h   | 18 ++
 include/uapi/linux/bpf.h | 20 +++
 kernel/bpf/verifier.c|  1 +
 net/core/filter.c| 91 
 4 files changed, 130 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 6fc31ef..522dbc9 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -368,6 +368,11 @@ struct bpf_skb_data_end {
void *data_end;
 };
 
+struct xdp_buff {
+   void *data;
+   void *data_end;
+};
+
 /* compute the linear packet data range [data, data_end) which
  * will be accessed by cls_bpf and act_bpf programs
  */
@@ -429,6 +434,18 @@ static inline u32 bpf_prog_run_clear_cb(const struct bpf_prog *prog,
return BPF_PROG_RUN(prog, skb);
 }
 
+static inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
+  struct xdp_buff *xdp)
+{
+   u32 ret;
+
+   rcu_read_lock();
+   ret = BPF_PROG_RUN(prog, (void *)xdp);
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
return max(sizeof(struct bpf_prog),
@@ -509,6 +526,7 @@ bool bpf_helper_changes_skb_data(void *func);
 
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
   const struct bpf_insn *patch, u32 len);
+void bpf_warn_invalid_xdp_action(int act);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 262a7e8..4282d44 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -94,6 +94,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SCHED_CLS,
BPF_PROG_TYPE_SCHED_ACT,
BPF_PROG_TYPE_TRACEPOINT,
+   BPF_PROG_TYPE_XDP,
 };
 
 #define BPF_PSEUDO_MAP_FD  1
@@ -437,4 +438,23 @@ struct bpf_tunnel_key {
__u32 tunnel_label;
 };
 
+/* User return codes for XDP prog type.
+ * A valid XDP program must return one of these defined values. All other
+ * return codes are reserved for future use. Unknown return codes will result
+ * in packet drop.
+ */
+enum xdp_action {
+   XDP_ABORTED = 0,
+   XDP_DROP,
+   XDP_PASS,
+};
+
+/* user accessible metadata for XDP packet hook
+ * new fields must be added to the end of this structure
+ */
+struct xdp_md {
+   __u32 data;
+   __u32 data_end;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e206c21..a8d67d0 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -713,6 +713,7 @@ static int check_ptr_alignment(struct verifier_env *env, struct reg_state *reg,
switch (env->prog->type) {
case BPF_PROG_TYPE_SCHED_CLS:
case BPF_PROG_TYPE_SCHED_ACT:
+   case BPF_PROG_TYPE_XDP:
break;
default:
verbose("verifier is misconfigured\n");
diff --git a/net/core/filter.c b/net/core/filter.c
index 10c4a2f..3c993ac 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2369,6 +2369,12 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
}
 }
 
+static const struct bpf_func_proto *
+xdp_func_proto(enum bpf_func_id func_id)
+{
+   return sk_filter_func_proto(func_id);
+}
+
 static bool __is_valid_access(int off, int size, enum bpf_access_type type)
 {
if (off < 0 || off >= sizeof(struct __sk_buff))
@@ -2436,6 +2442,56 @@ static bool tc_cls_act_is_valid_access(int off, int size,
return __is_valid_access(off, size, type);
 }
 
+static bool __is_valid_xdp_access(int off, int size,
+ enum bpf_access_type type)
+{
+   if (off < 0 || off >= sizeof(struct xdp_md))
+   return false;
+   if (off % size != 0)
+   return false;
+   if (size != 4)
+   return false;
+
+   return true;
+}
+
+static bool xdp_is_valid_access(int off, int size,
+   enum bpf_access_type type,
+   enum bpf_reg_type *reg_type)
+{
+   if (type == BPF_WRITE)
+   return false;
+
+   switch (off) {
+   case offsetof(struct xdp_md, data):
+   *reg_type = PTR_TO_PACKET;
+   break;
+   case offsetof(struct xdp_md, data_end):
+   *reg_type
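
The access rules and return-code policy described in this patch can be exercised outside the kernel with the following userspace sketch. Everything is mirrored locally: struct xdp_md_v7, the access/register enums and normalize_xdp_action() are hypothetical names. The quoted hunk is cut off before the data_end case finishes, so marking that offset as a packet-end pointer is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the two-field struct xdp_md defined in this patch. */
struct xdp_md_v7 {
	uint32_t data;
	uint32_t data_end;
};

/* Return codes as defined in this patch's enum xdp_action. */
enum { XDP_ABORTED = 0, XDP_DROP, XDP_PASS };

/* Stand-ins for the verifier's access and register types (hypothetical). */
enum access_type { ACCESS_READ, ACCESS_WRITE };
enum reg_type { REG_UNKNOWN, REG_PTR_TO_PACKET, REG_PTR_TO_PACKET_END };

/* Bounds, size and alignment checks; size is tested before the modulo so
 * this sketch stays safe even if called with size == 0. */
static bool is_valid_xdp_access(int off, int size)
{
	if (off < 0 || off >= (int)sizeof(struct xdp_md_v7))
		return false;
	if (size != 4)
		return false;
	return off % size == 0;
}

static bool xdp_access_ok(int off, int size, enum access_type type,
			  enum reg_type *reg_type)
{
	if (type == ACCESS_WRITE)
		return false;	/* context is read-only in this version */

	switch (off) {
	case offsetof(struct xdp_md_v7, data):
		*reg_type = REG_PTR_TO_PACKET;
		break;
	case offsetof(struct xdp_md_v7, data_end):
		*reg_type = REG_PTR_TO_PACKET_END;	/* assumed, see lead-in */
		break;
	}
	return is_valid_xdp_access(off, size);
}

/* Any out-of-range return code is treated as XDP_ABORTED. */
static uint32_t normalize_xdp_action(uint32_t act, bool *warned)
{
	*warned = false;
	if (act > XDP_PASS) {
		*warned = true;	/* where bpf_warn_invalid_xdp_action() would fire */
		return XDP_ABORTED;
	}
	return act;
}
```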

[net-next:master 1118/1134] drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size

2016-07-11 Thread kbuild test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   a536a6e13ecd0d6eb0ffc36c5d56555896617282
commit: fefae6909ead1798c39bee4d94e7e8f1f2752ef6 [1118/1134] net: dsa: b53: Allow SRAB driver to specify platform data
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 4.9.0
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout fefae6909ead1798c39bee4d94e7e8f1f2752ef6
# save the attached .config to linux build tree
make.cross ARCH=ia64 

All warnings (new ones prefixed by >>):

   drivers/net/dsa/b53/b53_srab.c: In function 'b53_srab_probe':
>> drivers/net/dsa/b53/b53_srab.c:388:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  pdata->chip_id = (u32)of_id->data;
   ^

vim +388 drivers/net/dsa/b53/b53_srab.c

   372  {
   373  struct b53_platform_data *pdata = pdev->dev.platform_data;
   374  struct device_node *dn = pdev->dev.of_node;
   375  const struct of_device_id *of_id = NULL;
   376  struct b53_srab_priv *priv;
   377  struct b53_device *dev;
   378  struct resource *r;
   379  
   380  if (dn)
   381  of_id = of_match_node(b53_srab_of_match, dn);
   382  
   383  if (of_id) {
   384  pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
   385  if (!pdata)
   386  return -ENOMEM;
   387  
 > 388  pdata->chip_id = (u32)of_id->data;
   389  }
   390  
   391  priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
   392  if (!priv)
   393  return -ENOMEM;
   394  
   395  r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
   396  priv->regs = devm_ioremap_resource(&pdev->dev, r);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: Binary data
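
On a 64-bit target such as ia64, u32 is narrower than a pointer, hence the warning on `(u32)of_id->data`. One common way to write such a conversion without the warning is to go through a pointer-sized integer first (the kernel also frequently uses `unsigned long` for this). The sketch below shows the idea in userspace with a simplified stand-in for struct of_device_id; the compatible string and chip ID are hypothetical, and this is not necessarily the fix that was applied to b53_srab.c.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct of_device_id: only the fields used here. */
struct fake_of_device_id {
	const char *compatible;
	const void *data;
};

/* A small integer chip ID stored in the pointer-sized .data field
 * (hypothetical compatible string and ID). */
static const struct fake_of_device_id fake_match[] = {
	{ .compatible = "vendor,example-switch",
	  .data = (const void *)(uintptr_t)0x5301 },
};

static uint32_t chip_id_from_match(const struct fake_of_device_id *id)
{
	/* Pointer -> uintptr_t is a same-width conversion, so
	 * -Wpointer-to-int-cast stays quiet; the narrowing to 32 bits is then
	 * an ordinary integer conversion. */
	return (uint32_t)(uintptr_t)id->data;
}
```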

Re: [PATCH v6 01/12] bpf: add XDP prog type for early driver filter

2016-07-11 Thread Daniel Borkmann


On 07/11/2016 06:51 PM, Brenden Blanco wrote:
> On Sun, Jul 10, 2016 at 03:56:02PM -0500, Tom Herbert wrote:
>> On Thu, Jul 7, 2016 at 9:15 PM, Brenden Blanco wrote:
>>> [...]
>>> +static bool __is_valid_xdp_access(int off, int size,
>>> + enum bpf_access_type type)
>>> +{
>>> +   if (off < 0 || off >= sizeof(struct xdp_md))
>>> +   return false;
>>> +   if (off % size != 0)
>>
>> off & 3 != 0
>
> Feasible, but was intending to keep with the surrounding style. What do
> the other bpf maintainers think?
>
>>> +   return false;
>>> +   if (size != 4)
>>> +   return false;
>>
>> If size must always be 4 why is it even an argument?
>
> Because this is the first time that the verifier has a chance to check
> it, and size == 4 could potentially be a prog_type-specific requirement.

Yep and wrt above, I think it's more important that all is_valid_*_access()
functions are consistent to each other and easily reviewable than adding
optimizations to some of them, which is slow-path anyway. If we find a nice
simplification, then we should apply it also to others obviously.
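
A side note on the suggested `off & 3 != 0`: in C, `!=` binds more tightly than `&`, so that expression parses as `off & (3 != 0)`, i.e. `off & 1`, which only catches odd offsets. The intended mask test needs parentheses, and for size == 4 and non-negative offsets it agrees with the `off % size != 0` form that the v7 posting keeps. The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* How C parses "off & 3 != 0": != binds tighter than &, so it becomes
 * off & (3 != 0), i.e. off & 1, which only flags odd offsets. */
static bool misaligned_as_parsed(int off)
{
	return off & (3 != 0);
}

/* Intended test: true for any offset that is not a multiple of 4. */
static bool misaligned_mask(int off)
{
	return (off & 3) != 0;
}

/* Form kept in the patch; equals the mask test when size == 4, off >= 0. */
static bool misaligned_mod(int off, int size)
{
	return off % size != 0;
}
```

For example, offset 2 is misaligned for a 4-byte access, but the unparenthesized form reports it as aligned.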

Re: Multi-thread udp 4.7 regression, bisected to 71d8c47fc653

2016-07-11 Thread Marc Dionne

On Mon, Jul 11, 2016 at 1:26 PM, Pablo Neira Ayuso  wrote:
> On Sun, Jul 10, 2016 at 04:48:26PM -0300, Marc Dionne wrote:
>> An update here since I've had some interactions with Pablo off list.
>>
>> Further testing shows that the underlying cause of the different test
>> results is a udp packet that has a bogus source port number.  In the
>> test the server process tries to send an ack to the bogus port and the
>> flow is disrupted.
>>
>> Notes:
>> - The packet with the bad source port is from a sendmsg() call that
>> has hit the connection tracker clash code introduced by 71d8c47fc653
>> - Packets are successfully sent after the bad one, from the same
>> socket, with the correct source port number
>> - The problem does not reproduce with 71d8c47fc653 reverted, or
>> without nf_conntrack loaded
>> - The bogus port numbers start at 1024, bumping up by 1 every few
>> times the problem occurs (1025, 1026, etc.)
>> - The patch above does not change the behaviour
>> - Enabling lockdep does not show anything
>>
>> Our workaround for the original race was to retry sendmsg() once on
>> EPERM errors, and that had been effective.
>> I can trigger the insertion clash easily with some simple test code,
>> but I have not been able so far to reproduce the packets with bad
>> source port numbers with some simpler code that I could share.
>
> The NAT null binding is triggering the source port mangling, even if
> there are no NAT rules in place. The following patch skips the clash
> resolution for NAT for now, since we don't see a simple solution for
> this case at the moment.
>
> Could you give this patch a try in these two cases?
>
> 1) No NAT in place: Make sure the iptable_nat module is not there. Or if
>    you're using nf_tables, just make sure you have no nat chains at
>    all.
>
> 2) With NAT in place, you should hit the EPERM errors that you've
>    observed so far again.
>
> Please, test both scenarios and report back. Thanks.

Hi Pablo,

Testing out your patch:

1) With no NAT in place, the clash resolution happens, with no side
effects.  No EPERM errors are seen.

2) With ip(6)table_nat loaded, the clash resolution fails and I get
some EPERM errors from sendmsg(), same as before 71d8c47fc653.

Turns out that even though I have no NAT rules in my iptables config,
the system also had firewalld active and that caused the modules to be
loaded.

So the bottom line is that the patch looks good to me.

Thanks,
Marc
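
The thread describes the application-side workaround ("retry sendmsg() once on EPERM errors") only in words. The following is a minimal sketch of such a wrapper, not Marc's actual code; the function name is hypothetical. The test exercises it on a Unix datagram socketpair, which never goes through conntrack, so it only checks the normal path.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one datagram; if sendmsg() fails with EPERM (the symptom seen when
 * the conntrack insertion race is lost), try exactly once more. */
static ssize_t sendmsg_retry_eperm(int fd, const struct msghdr *msg, int flags)
{
	ssize_t n = sendmsg(fd, msg, flags);

	if (n < 0 && errno == EPERM)
		n = sendmsg(fd, msg, flags);
	return n;
}
```

Retrying only once keeps a persistent EPERM (for example, one caused by a firewall rule) from turning into a busy loop.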

[PATCH v5 01/32] bluetooth: Switch SMP to crypto_cipher_encrypt_one()

2016-07-11 Thread Andy Lutomirski

SMP does ECB crypto on stack buffers.  This is complicated and
fragile, and it will not work if the stack is virtually allocated.

Switch to the crypto_cipher interface, which is simpler and safer.

Cc: Marcel Holtmann 
Cc: Gustavo Padovan 
Cc: Johan Hedberg 
Cc: "David S. Miller" 
Cc: linux-blueto...@vger.kernel.org
Cc: netdev@vger.kernel.org
Acked-by: Herbert Xu 
Acked-and-tested-by: Johan Hedberg 
Signed-off-by: Andy Lutomirski 
---
 net/bluetooth/smp.c | 67 ++---
 1 file changed, 28 insertions(+), 39 deletions(-)

diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c
index 50976a6481f3..4c1a16a96ae5 100644
--- a/net/bluetooth/smp.c
+++ b/net/bluetooth/smp.c
@@ -22,9 +22,9 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -88,7 +88,7 @@ struct smp_dev {
u8  min_key_size;
u8  max_key_size;
 
-   struct crypto_skcipher  *tfm_aes;
+   struct crypto_cipher*tfm_aes;
struct crypto_shash *tfm_cmac;
 };
 
@@ -127,7 +127,7 @@ struct smp_chan {
u8  dhkey[32];
u8  mackey[16];
 
-   struct crypto_skcipher  *tfm_aes;
+   struct crypto_cipher*tfm_aes;
struct crypto_shash *tfm_cmac;
 };
 
@@ -361,10 +361,8 @@ static int smp_h6(struct crypto_shash *tfm_cmac, const u8 w[16],
  * s1 and ah.
  */
 
-static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
+static int smp_e(struct crypto_cipher *tfm, const u8 *k, u8 *r)
 {
-   SKCIPHER_REQUEST_ON_STACK(req, tfm);
-   struct scatterlist sg;
uint8_t tmp[16], data[16];
int err;
 
@@ -378,7 +376,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
/* The most significant octet of key corresponds to k[0] */
swap_buf(k, tmp, 16);
 
-   err = crypto_skcipher_setkey(tfm, tmp, 16);
+   err = crypto_cipher_setkey(tfm, tmp, 16);
if (err) {
BT_ERR("cipher setkey failed: %d", err);
return err;
@@ -387,16 +385,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
/* Most significant octet of plaintextData corresponds to data[0] */
swap_buf(r, data, 16);
 
-   sg_init_one(&sg, data, 16);
-
-   skcipher_request_set_tfm(req, tfm);
-   skcipher_request_set_callback(req, 0, NULL, NULL);
-   skcipher_request_set_crypt(req, &sg, &sg, 16, NULL);
-
-   err = crypto_skcipher_encrypt(req);
-   skcipher_request_zero(req);
-   if (err)
-   BT_ERR("Encrypt data error %d", err);
+   crypto_cipher_encrypt_one(tfm, data, data);
 
/* Most significant octet of encryptedData corresponds to data[0] */
swap_buf(data, r, 16);
@@ -406,7 +395,7 @@ static int smp_e(struct crypto_skcipher *tfm, const u8 *k, u8 *r)
return err;
 }
 
-static int smp_c1(struct crypto_skcipher *tfm_aes, const u8 k[16],
+static int smp_c1(struct crypto_cipher *tfm_aes, const u8 k[16],
  const u8 r[16], const u8 preq[7], const u8 pres[7], u8 _iat,
  const bdaddr_t *ia, u8 _rat, const bdaddr_t *ra, u8 res[16])
 {
@@ -455,7 +444,7 @@ static int smp_c1(struct crypto_skcipher *tfm_aes, const u8 k[16],
return err;
 }
 
-static int smp_s1(struct crypto_skcipher *tfm_aes, const u8 k[16],
+static int smp_s1(struct crypto_cipher *tfm_aes, const u8 k[16],
  const u8 r1[16], const u8 r2[16], u8 _r[16])
 {
int err;
@@ -471,7 +460,7 @@ static int smp_s1(struct crypto_skcipher *tfm_aes, const u8 k[16],
return err;
 }
 
-static int smp_ah(struct crypto_skcipher *tfm, const u8 irk[16],
+static int smp_ah(struct crypto_cipher *tfm, const u8 irk[16],
  const u8 r[3], u8 res[3])
 {
u8 _res[16];
@@ -759,7 +748,7 @@ static void smp_chan_destroy(struct l2cap_conn *conn)
kzfree(smp->slave_csrk);
kzfree(smp->link_key);
 
-   crypto_free_skcipher(smp->tfm_aes);
+   crypto_free_cipher(smp->tfm_aes);
crypto_free_shash(smp->tfm_cmac);
 
/* Ensure that we don't leave any debug key around if debug key
@@ -1359,9 +1348,9 @@ static struct smp_chan *smp_chan_create(struct l2cap_conn *conn)
if (!smp)
return NULL;
 
-   smp->tfm_aes = crypto_alloc_skcipher("ecb(aes)", 0, CRYPTO_ALG_ASYNC);
+   smp->tfm_aes = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(smp->tfm_aes)) {
-   BT_ERR("Unable to create ECB crypto context");
+   BT_ERR("Unable to create AES crypto context");
kzfree(smp);
return NULL;
}
@@ -1369,7 +1358,7 @@ static struct smp_chan *smp_chan_create(struct l2cap_conn *conn)
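
The rewritten smp_e() keeps the swap_buf() calls around the single-block encryption: key and plaintext are byte-reversed so the most significant octet sits at index 0, and the result is reversed back afterwards. swap_buf() itself is not shown in these hunks; the version below is a userspace sketch of a byte-reversing copy with the same (src, dst, len) call shape, and should be read as an assumption about its behavior rather than the smp.c source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy src into dst in reverse byte order. src and dst must not overlap:
 * smp_e() always swaps into a separate buffer (tmp or data). */
static void swap_buf(const uint8_t *src, uint8_t *dst, size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[len - 1 - i] = src[i];
}
```

Because the reversal is its own inverse, swapping in, encrypting data in place with crypto_cipher_encrypt_one(tfm, data, data), and swapping out returns the result in the caller's byte order.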

Re: [patch net-next 1/2] devlink: add hardware messages tracing facility

2016-07-11 Thread Steven Rostedt

On Mon, 11 Jul 2016 15:18:47 +0200
Jiri Pirko  wrote:

> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index c99ffe8..865ade6 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -115,6 +115,8 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
>  int devlink_register(struct devlink *devlink, struct device *dev);
>  void devlink_unregister(struct devlink *devlink);
>  void devlink_free(struct devlink *devlink);
> +void devlink_trace_hwmsg(const struct devlink *devlink, bool incoming,
> +  unsigned long type, const u8 *buf, size_t len);
>  int devlink_port_register(struct devlink *devlink,
> struct devlink_port *devlink_port,
> unsigned int port_index);
> @@ -154,6 +156,12 @@ static inline void devlink_free(struct devlink *devlink)
>   kfree(devlink);
>  }
>  
> +static inline void devlink_trace_hwmsg(const struct devlink *devlink,
> +bool incoming, unsigned long type,
> +const u8 *buf, size_t len);
> +{
> +}
> +

I'm assuming the !CONFIG_NET_DEVLINK case was never tested, because the
above probably won't build, and if it did, it would be wrong.

-- Steve
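
For reference, the usual shape of a header stub for the disabled-config case is an empty static inline body with no semicolon after the parameter list. The sketch below is compile-checkable in userspace; SKETCH_NET_DEVLINK is a hypothetical stand-in for CONFIG_NET_DEVLINK, and u8 is replaced by uint8_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct devlink;	/* opaque: the stub only passes pointers around */

/* SKETCH_NET_DEVLINK is a hypothetical stand-in for CONFIG_NET_DEVLINK. */
#ifdef SKETCH_NET_DEVLINK
void devlink_trace_hwmsg(const struct devlink *devlink, bool incoming,
			 unsigned long type, const uint8_t *buf, size_t len);
#else
/* No ';' before the body: with one, the line becomes a prototype and the
 * following braces are a stray block at file scope, a syntax error. */
static inline void devlink_trace_hwmsg(const struct devlink *devlink,
				       bool incoming, unsigned long type,
				       const uint8_t *buf, size_t len)
{
}
#endif
```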

Re: [PATCH net] sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send

2016-07-11 Thread Soheil Hassas Yeganeh

On Mon, Jul 11, 2016 at 4:39 PM, David Miller  wrote:
>
> From: Soheil Hassas Yeganeh 
> Date: Sun, 10 Jul 2016 12:51:46 -0400
>
> > From: Soheil Hassas Yeganeh 
> >
> > Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
> > as a control message to TCP. Since __sock_cmsg_send does not
> > support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
> > hence breaks pulse audio over TCP.
> >
> > SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
> > but they semantically belong to SOL_UNIX. Since all
> > cmsg-processing functions including sock_cmsg_send ignore control
> > messages of other layers, it is best to ignore SCM_RIGHTS
> > and SCM_CREDENTIALS for consistency (and also for fixing pulse
> > audio over TCP).
> >
> > Signed-off-by: Soheil Hassas Yeganeh 
> > Reported-by: Sergei Trofimovich 
> > Tested-by: Sergei Trofimovich 
>
> Please resubmit this with a proper "Fixes: " tag which tells which
> commit introduced this regression.

Sorry David that I forgot the Fixes tag. I just updated the patch and
resubmitted.

Thanks!
Soheil

Re: [PATCH -next] bpf: make inode code explicitly non-modular

2016-07-11 Thread David Miller

From: Paul Gortmaker 
Date: Mon, 11 Jul 2016 12:51:01 -0400

> The Kconfig currently controlling compilation of this code is:
> 
> init/Kconfig:config BPF_SYSCALL
> init/Kconfig:   bool "Enable bpf() system call"
> 
> ...meaning that it currently is not being built as a module by anyone.
> 
> Let's remove the couple of traces of modular infrastructure use, so that
> when reading the driver there is no doubt it is builtin-only.
> 
> Note that MODULE_ALIAS is a no-op for non-modular code.
> 
> We replace module.h with init.h since the file does use __init.
> 
> Cc: Alexei Starovoitov 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Paul Gortmaker 

Applied.

Re: [PATCH][V2] nfp: check idx is -ENOSPC before using it is an index

2016-07-11 Thread David Miller

From: Colin King 
Date: Mon, 11 Jul 2016 16:54:20 +0100

> From: Colin Ian King 
> 
> idx can be returned as -ENOSPC, so we should check for this first
> before using it as an index into nn->vxlan_usecnt[] to avoid an
> out of bounds array offset read.
> 
> Signed-off-by: Colin Ian King 

Applied to net-next.
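
The pattern behind this fix is to test a helper's negative error return before using the value as an array index. The nfp code itself is not quoted here, so the following is a hypothetical userspace sketch: the struct, table size and helper names are invented for illustration, and only the "check -ENOSPC before indexing vxlan_usecnt[]" idea comes from the description.

```c
#include <assert.h>
#include <errno.h>

#define N_VXLAN_PORTS 4	/* hypothetical table size */

/* Hypothetical, trimmed-down stand-in for the driver's private struct. */
struct fake_nn {
	unsigned int vxlan_port[N_VXLAN_PORTS];
	int vxlan_usecnt[N_VXLAN_PORTS];
};

/* Slot already holding port, else the first free slot, else -ENOSPC. */
static int find_vxlan_idx(const struct fake_nn *nn, unsigned int port)
{
	int free_idx = -ENOSPC;

	for (int i = 0; i < N_VXLAN_PORTS; i++) {
		if (nn->vxlan_usecnt[i] && nn->vxlan_port[i] == port)
			return i;
		if (!nn->vxlan_usecnt[i] && free_idx == -ENOSPC)
			free_idx = i;
	}
	return free_idx;
}

static int add_vxlan_port(struct fake_nn *nn, unsigned int port)
{
	int idx = find_vxlan_idx(nn, port);

	/* Test for the error first; only then is idx a valid array index. */
	if (idx == -ENOSPC)
		return idx;

	nn->vxlan_port[idx] = port;
	nn->vxlan_usecnt[idx]++;
	return idx;
}
```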

[PATCH v2 net] sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send

2016-07-11 Thread Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh 

Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
as a control message to TCP. Since __sock_cmsg_send does not
support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
hence breaks pulse audio over TCP.

SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
but they semantically belong to SOL_UNIX. Since all
cmsg-processing functions including sock_cmsg_send ignore control
messages of other layers, it is best to ignore SCM_RIGHTS
and SCM_CREDENTIALS for consistency (and also for fixing pulse
audio over TCP).

Fixes: c14ac9451c34 ("sock: enable timestamping using control messages")
Signed-off-by: Soheil Hassas Yeganeh 
Reported-by: Sergei Trofimovich 
Tested-by: Sergei Trofimovich 
Cc: Eric Dumazet 
Cc: Willem de Bruijn 
---
 net/core/sock.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..b7f1263 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1938,6 +1938,10 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
sockc->tsflags |= tsflags;
break;
+   /* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
+   case SCM_RIGHTS:
+   case SCM_CREDENTIALS:
+   break;
default:
return -EINVAL;
}
-- 
2.8.0.rc3.226.g39d4020
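
The effect of the new case labels can be modeled in userspace. The sketch below is a simplified model of the SOL_SOCKET switch after this patch, not the kernel function: the real __sock_cmsg_send() has further cases (such as the timestamping case visible in the hunk) that are omitted, and the function name is hypothetical.

```c
#define _GNU_SOURCE	/* glibc exposes SCM_CREDENTIALS only with this */
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Simplified model of the SOL_SOCKET switch in __sock_cmsg_send() after
 * this patch; the real function's other cases are omitted. */
static int model_sock_cmsg_send(const struct cmsghdr *cmsg)
{
	if (cmsg->cmsg_level != SOL_SOCKET)
		return 0;	/* not a SOL_SOCKET message: ignored here */

	switch (cmsg->cmsg_type) {
	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
	case SCM_RIGHTS:
	case SCM_CREDENTIALS:
		break;
	default:
		return -EINVAL;
	}
	return 0;
}
```

With the two new labels, a PulseAudio-style SCM_CREDENTIALS message on a TCP socket is ignored instead of failing the whole sendmsg() with -EINVAL.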

Re: [PATCH v2] net: smc91x: ACPI Enable lan91x adapters

2016-07-11 Thread David Miller

From: Jeremy Linton 
Date: Mon, 11 Jul 2016 10:28:40 -0500

> Enable lan91x adapters in some ARM machines and models
> when booted with an ACPI kernel.
> 
> Signed-off-by: Jeremy Linton 

Applied to net-next, thanks.

Re: [PATCH v2 net] ipv6: addrconf: fix Juniper SSL VPN client regression

2016-07-11 Thread David Miller

From: Bjørn Mork 
Date: Mon, 11 Jul 2016 16:43:50 +0200

> The Juniper SSL VPN client uses a "tun" interface and seems to
> be picky about visible changes to it. Commit cc9da6cc4f56
> ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
> made such interfaces get an auto-generated IPv6 link local address
> by default, similar to most other interface types. This made the
> Juniper SSL VPN client fail for unknown reasons.
> 
> Fix this regression by adding a new private netdevice flag
> which disables automatic IPv6 link local address generation, and
> by making the flag the default for "tun" devices.
> 
> Setting an explicit addrgenmode will disable the flag, so userspace
> can choose to enable automatic LL generation by selecting a suitable
> addrgenmode.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=121131
> Fixes: cc9da6cc4f56 ("ipv6: addrconf: use stable address generator for ARPHRD_NONE")
> Reported-by: Valdis Kletnieks 
> Reported-by: Jonas Lippuner 
> Suggested-by: Hannes Frederic Sowa 
> Cc: 吉藤英明 
> Signed-off-by: Bjørn Mork 

What really irks me is that we are "fixing" something without knowing
what the problem actually is.

Someone needs to figure out exactly what is making the Juniper thing
unhappy.  It really shouldn't care if a link local address is assigned
to the tun device, this is fundamental ipv6 stuff.

[PATCH] iwlwifi: add missing type declaration

2016-07-11 Thread Arnd Bergmann

The iwl-debug.h header relies on implicit inclusion of linux/device.h and
we get a lot of warnings without that:

drivers/net/wireless/intel/iwlwifi/iwl-debug.h:44:23: error: 'struct device' declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 void __iwl_err(struct device *dev, bool rfkill_prefix, bool only_trace,
                       ^~
In file included from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.h:66:0,
                 from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c:68:
drivers/net/wireless/intel/iwlwifi/iwl-trans.h: In function 'iwl_trans_tx':
drivers/net/wireless/intel/iwlwifi/iwl-trans.h:1030:348: error: passing argument 1 of '__iwl_err' from incompatible pointer type [-Werror=incompatible-pointer-types]
   IWL_ERR(trans, "%s bad state = %d\n", __func__, trans->state);
   ^
In file included from drivers/net/wireless/intel/iwlwifi/iwl-eeprom-read.c:67:0:
drivers/net/wireless/intel/iwlwifi/iwl-debug.h:44:6: note: expected 'struct device *' but argument is of type 'struct device *'
 void __iwl_err(struct device *dev, bool rfkill_prefix, bool only_trace,
      ^

The easiest workaround is to just declare 'struct device' before its first use,
rather than including the entire header file.

Signed-off-by: Arnd Bergmann 
Fixes: 21cb3222fe56 ("iwlwifi: decouple PCIe transport from mac80211")
---
 drivers/net/wireless/intel/iwlwifi/iwl-debug.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-debug.h b/drivers/net/wireless/intel/iwlwifi/iwl-debug.h
index 110333208450..cd77c6971753 100644
--- a/drivers/net/wireless/intel/iwlwifi/iwl-debug.h
+++ b/drivers/net/wireless/intel/iwlwifi/iwl-debug.h
@@ -41,6 +41,7 @@ static inline bool iwl_have_debug_level(u32 level)
 #endif
 }
 
+struct device;
 void __iwl_err(struct device *dev, bool rfkill_prefix, bool only_trace,
const char *fmt, ...) __printf(4, 5);
 void __iwl_warn(struct device *dev, const char *fmt, ...) __printf(2, 3);
-- 
2.9.0
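
Why a one-line forward declaration is enough: a struct tag first mentioned inside a prototype's parameter list has prototype scope, so it names a different type from any later file-scope `struct device`, which is exactly the "expected 'struct device *' but argument is of type 'struct device *'" note above. Declaring the tag at file scope first makes every later mention refer to the same incomplete type. The toy struct and describe_device() below are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdio.h>

/* File-scope forward declaration: the prototype below now refers to this
 * tag instead of creating a new one with prototype scope. */
struct device;

int describe_device(struct device *dev, char *out, size_t len);

/* Toy definition; in the kernel the real one comes from linux/device.h. */
struct device {
	const char *name;
};

int describe_device(struct device *dev, char *out, size_t len)
{
	return snprintf(out, len, "dev=%s", dev->name);
}
```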

Re: [patch net-next 1/2] devlink: add hardware messages tracing facility

2016-07-11 Thread David Miller

From: Jiri Pirko 
Date: Mon, 11 Jul 2016 15:18:47 +0200

> From: Jiri Pirko 
> 
> Define a tracepoint and allow user to trace messages going to and from
> hardware associated with devlink instance.
> 
> Signed-off-by: Jiri Pirko 

Jiri, I don't think "having a devlink_ prefix" is a strong enough argument
for doing things specially here.

Just use the tracing facilities and export them the way everyone else
does, and the way that two people already have suggested.

Thanks.

Re: [PATCH net-next] drivers/net: fixup comments after "Future-proof tunnel offload handlers"

2016-07-11 Thread David Miller

From: Sabrina Dubroca 
Date: Mon, 11 Jul 2016 13:12:28 +0200

> Some comments weren't updated to reflect the renaming of ndo's and the
> change of arguments.
> 
> Signed-off-by: Sabrina Dubroca 

Applied, thanks Sabrina.

Re: [PATCHv2 net] ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space

2016-07-11 Thread David Miller

From: Julian Anastasov 
Date: Sun, 10 Jul 2016 21:11:55 +0300

> Vegard Nossum is reporting a crash in fib_dump_info
> when nh_dev = NULL and fib_nhs == 1:
 ...
> $ addr2line -e vmlinux -i 0x602b3d18
> include/linux/inetdevice.h:222
> net/ipv4/fib_semantics.c:1264
> 
> Problem happens when RTNH_F_LINKDOWN is provided from user space
> when creating routes that do not use the flag, catched with
> netlink fuzzer.
> 
> Currently, the kernel allows user space to set both flags
> to nh_flags and fib_flags but this is not intentional, the
> assumption was that they are not set. Fix this by rejecting
> both flags with EINVAL.
> 
> Reported-by: Vegard Nossum 
> Fixes: 0eeb075fad73 ("net: ipv4 sysctl option to ignore routes when nexthop link is down")
> Signed-off-by: Julian Anastasov 

Applied and queued up for -stable, thanks Julian.
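
The kernel-side change is not quoted here, only its description ("rejecting both flags with EINVAL"). The sketch below shows that kind of check in userspace using the uapi flag constants from <linux/rtnetlink.h>; the function name is hypothetical and this is not the applied patch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* RTNH_F_DEAD and RTNH_F_LINKDOWN are meant to be set by the kernel from
 * link state, not supplied by user space, so a request carrying either
 * one is refused. */
static int check_user_nh_flags(unsigned int nh_flags)
{
	if (nh_flags & (RTNH_F_DEAD | RTNH_F_LINKDOWN))
		return -EINVAL;
	return 0;
}
```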

[PATCH -next] ipv4: af_inet: make it explicitly non-modular

2016-07-11 Thread Paul Gortmaker

The Makefile controlling compilation of this file is obj-y,
meaning that it currently is never being built as a module.

Since MODULE_ALIAS is a no-op for non-modular code, we can simply
remove the MODULE_ALIAS_NETPROTO variant used here.

We replace module.h with kmod.h since the file does make use of
request_module() in order to load other modules from here.

We don't have to worry about init.h coming in via the removed
module.h since the file explicitly includes init.h already.

Cc: "David S. Miller" 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker 
---
 net/ipv4/af_inet.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index d39e9e47a26e..55513e654d79 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -73,7 +73,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -1916,6 +1916,3 @@ static int __init ipv4_proc_init(void)
return 0;
 }
 #endif /* CONFIG_PROC_FS */
-
-MODULE_ALIAS_NETPROTO(PF_INET);
-
-- 
2.8.4

Re: [PATCH net] sock: ignore SCM_RIGHTS and SCM_CREDENTIALS in __sock_cmsg_send

2016-07-11 Thread David Miller

From: Soheil Hassas Yeganeh 
Date: Sun, 10 Jul 2016 12:51:46 -0400

> From: Soheil Hassas Yeganeh 
> 
> Sergei Trofimovich reported that pulse audio sends SCM_CREDENTIALS
> as a control message to TCP. Since __sock_cmsg_send does not
> support SCM_RIGHTS and SCM_CREDENTIALS, it returns an error and
> hence breaks pulse audio over TCP.
> 
> SCM_RIGHTS and SCM_CREDENTIALS are sent on the SOL_SOCKET layer
> but they semantically belong to SOL_UNIX. Since all
> cmsg-processing functions including sock_cmsg_send ignore control
> messages of other layers, it is best to ignore SCM_RIGHTS
> and SCM_CREDENTIALS for consistency (and also for fixing pulse
> audio over TCP).
> 
> Signed-off-by: Soheil Hassas Yeganeh 
> Reported-by: Sergei Trofimovich 
> Tested-by: Sergei Trofimovich 

Please resubmit this with a proper "Fixes: " tag which tells which
commit introduced this regression.

Re: [PATCH v2 net] tcp: make challenge acks less predictable

2016-07-11 Thread David Miller

From: Eric Dumazet 
Date: Sun, 10 Jul 2016 10:04:02 +0200

> From: Eric Dumazet 
> 
> Yue Cao claims that current host rate limiting of challenge ACKS
> (RFC 5961) could leak enough information to allow a patient attacker
> to hijack TCP sessions. He will soon provide details in an academic
> paper.
> 
> This patch increases the default limit from 100 to 1000, and adds
> some randomization so that the attacker can no longer hijack
> sessions without spending a considerable amount of probes.
> 
> Based on initial analysis and patch from Linus.
> 
> Note that we also have per socket rate limiting, so it is tempting
> to remove the host limit in the future.
> 
> v2: randomize the count of challenge acks per second, not the period.
> 
> Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
> Reported-by: Yue Cao 
> Signed-off-by: Eric Dumazet 
> Suggested-by: Linus Torvalds 

Applied and queued up for -stable, thanks Eric.
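
The v2 note ("randomize the count of challenge acks per second, not the period") can be illustrated with the following userspace sketch. It is not the patch itself: the structure, names and the exact randomization formula (half the limit plus a random value below the limit) are assumptions chosen to match the description, and the random value is passed in so the behavior is testable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-wide budget of challenge ACKs, refilled once per second. */
struct challenge_limiter {
	uint32_t limit;     /* configured per-second limit, e.g. 1000 */
	uint64_t second;    /* second in which the budget was last refilled */
	uint32_t remaining; /* challenge ACKs still allowed in that second */
};

/* rnd is a caller-supplied random value. On the first call in a new second
 * the budget becomes half + rnd % limit, i.e. somewhere between about
 * limit/2 and 3*limit/2, so the exact count per second is unpredictable
 * while the one-second period stays fixed. */
static bool challenge_ack_allowed(struct challenge_limiter *l,
				  uint64_t now_sec, uint32_t rnd)
{
	if (now_sec != l->second) {
		uint32_t half = (l->limit + 1) / 2;

		l->second = now_sec;
		l->remaining = l->limit ? half + rnd % l->limit : 0;
	}
	if (l->remaining == 0)
		return false;
	l->remaining--;
	return true;
}
```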

Re: [PATCH RESEND] iwlwifi, Do not implement thermal zone unless ucode is loaded

2016-07-11 Thread Prarit Bhargava



On 07/11/2016 02:27 PM, Grumbach, Emmanuel wrote:
> On Mon, 2016-07-11 at 14:19 -0400, Prarit Bhargava wrote:
>>
>> On 07/11/2016 02:00 PM, Emmanuel Grumbach wrote:
>>> On Mon, Jul 11, 2016 at 6:18 PM, Prarit Bhargava >>
>>> This change is obviously completely broken. It simply disables the
>>> registration to thermal zone core.
>>
>> No it is not broken, and yes, that is exactly what should happen IMO.
>>
>> The problem is that the iwlwifi driver implements the thermal zone
>> even when the device doesn't support it.
> 
> We implement thermal zone because we do support it, but the problem is
> that we need the firmware to be loaded for that. So you can argue that
> we should register *later* when the firmware is loaded. But this is
> really not helping all that much because the firmware can also be
> stopped at any time. So you'd want us to register / unregister the
> thermal zone anytime the firmware is loaded / unloaded?

You might have to do that.  I think that if the firmware enables a feature then
the act of loading the firmware should run the code that enables the feature.
IMO of course.

> I guess that works, but it seems wrong to me. Usually, registration
> should happen only upon INIT, and yes, at that time the firmware is not
> ready to provide the information yet.
> Maybe returning -EBUSY would help lm-sensors not to get confused?

I'll give that a shot, but I expect that won't work either as an error message
will still be displayed.

> 
>>
>> As can be seen in the current code base, iwl_mvm_tzone_get_temp()
>> will return -EIO 100% of the time when the firmware doesn't support
>> reading the temperature[1].  In this case a read of sysfs will result
>> in a return of -EIO, and this breaks existing userspace programs such
>> as lm-sensors (which by all accounts is bad to do).
> 
> Right, but I don't understand why the userspace is broken because of
> that? 

Before the iwlwifi change, sensors successfully returned.  Now, because of the
error, it doesn't.

> Unless we register / unregister anytime the firmware is loaded, I
> don't see any proper way to fix this. And yes, I'd expect userspace
> to handle failures in its requests gracefully.

I agree with you in principle *and there's a great many things I wish userspace
would do gracefully* but updating the kernel shouldn't result in userspace
programs failing.

> 
>>
>> Note that in my patch I have removed the -EIO return in favor of not
>> registering the non-existent thermal zone.  I'm not removing any
>> functionality by changing this, nor am I adding functionality.  In both
>> cases the thermal zone is not functional, and with my patch userspace
>> continues to work.
> 
> You are removing the thermal zone functionality since even when the
> firmware will be loaded (which typically happens fairly quickly),
> thermal zone won't work.

Then I agree with your suggestion above that you need to enable the thermal zone
on a successful load of the firmware.  [Aside: I wonder what other drivers do in
this situation?  While this does seem like an odd case, I can't believe that the
iwlwifi driver is the only driver to enable features based on firmware.]
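To make the "register when the firmware loads, unregister when it stops" idea concrete, here is a small userspace model. Every name in it (`fake_dev`, `fake_fw_loaded()`, `fake_fw_stopped()`, `fake_get_temp()`) is hypothetical; it does not call the real thermal core or iwlwifi code, and the -EBUSY return only mirrors the alternative Emmanuel floats above. It is a sketch of the lifecycle under discussion, not a proposed patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; these are not iwlwifi or thermal-core functions. */
struct fake_dev {
	bool fw_running;  /* firmware currently loaded and alive */
	bool tz_visible;  /* thermal zone currently registered */
};

/* Register the zone only once the firmware is up ... */
static void fake_fw_loaded(struct fake_dev *d)
{
	d->fw_running = true;
	d->tz_visible = true;
}

/* ... and unregister it again when the firmware stops. */
static void fake_fw_stopped(struct fake_dev *d)
{
	d->fw_running = false;
	d->tz_visible = false;
}

/* get_temp callback; with the scheme above it is only reachable while the
 * zone is registered, but a race with firmware stop is still handled. */
static int fake_get_temp(const struct fake_dev *d, int *millicelsius)
{
	if (!d->fw_running)
		return -EBUSY;
	*millicelsius = 40000;	/* placeholder value, not a real reading */
	return 0;
}
```

The trade-off Emmanuel raises still applies: the zone appears and disappears from sysfs whenever the firmware restarts.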

P.

Re: [patch net-next] MAINTAINERS: release Scott from being a rocker maintainer

2016-07-11 Thread David Miller

From: Jiri Pirko 
Date: Sun, 10 Jul 2016 09:42:44 +0200

> From: Jiri Pirko 
> 
> As requested by Scott, removing him.
> 
> Signed-off-by: Scott Feldman 
> Signed-off-by: Jiri Pirko 

Applied.

Re: [PATCH net-next v2] tunnels: correct conditional build of MPLS and IPv6

2016-07-11 Thread David Miller

From: Simon Horman 
Date: Sun, 10 Jul 2016 10:20:11 +0900

> Using a combination of #if conditionals and goto labels to unwind
> tunnel4_init seems unwieldy. This patch takes a simpler approach of
> directly unregistering previously registered protocols when an error
> occurs.
> 
> This fixes a number of problems with the current implementation
> including the potential presence of labels when they are unused
> and the potential absence of unregister code when it is needed.
> 
> Fixes: 8afe97e5d416 ("tunnels: support MPLS over IPv4 tunnels")
> Signed-off-by: Simon Horman 

Applied, thanks Simon.
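The unwinding style described in the commit message (each failure site directly unregisters what was registered before it, instead of jumping through a chain of labels that may be unused under some configs) can be sketched in userspace as follows. `fake_register()`, `fake_unregister()`, the `FAKE_HAVE_*` switches and the `fail_at` knob are all hypothetical stand-ins for the real protocol handlers and Kconfig options in tunnel4_init().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical config switches standing in for IS_ENABLED(CONFIG_...). */
#define FAKE_HAVE_IPV6 1
#define FAKE_HAVE_MPLS 1

static bool reg_ipip, reg_ipv6, reg_mpls;
static int fail_at;	/* which registration step should fail (0 = none) */

static int fake_register(bool *slot, int step)
{
	if (step == fail_at)
		return -EAGAIN;
	*slot = true;
	return 0;
}

static void fake_unregister(bool *slot)
{
	*slot = false;
}

/* No goto labels: every error branch undoes the earlier steps itself,
 * guarded by the same conditionals as the matching registration. */
static int fake_tunnel_init(void)
{
	int err;

	err = fake_register(&reg_ipip, 1);
	if (err)
		return err;
#if FAKE_HAVE_IPV6
	err = fake_register(&reg_ipv6, 2);
	if (err) {
		fake_unregister(&reg_ipip);
		return err;
	}
#endif
#if FAKE_HAVE_MPLS
	err = fake_register(&reg_mpls, 3);
	if (err) {
#if FAKE_HAVE_IPV6
		fake_unregister(&reg_ipv6);
#endif
		fake_unregister(&reg_ipip);
		return err;
	}
#endif
	return 0;
}
```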

Re: [PATCH net-next 0/6] sctp: implement rfc7496 in sctp

2016-07-11 Thread David Miller

From: Marcelo Ricardo Leitner 
Date: Mon, 11 Jul 2016 13:15:29 -0300

> On Sat, Jul 09, 2016 at 07:47:39PM +0800, Xin Long wrote:
>> This patchset implements "Additional Policies for the Partially Reliable
>> Stream Control Transmission Protocol Extension" described in RFC 7496.
>> 
>> The Partially Reliable SCTP (PR-SCTP) extension defined in [RFC3758]
>> provides a generic method for senders to abandon user messages. The
>> decision to abandon a user message is sender side only, and the exact
>> condition is called a "PR-SCTP policy". This patchset implements 3
>> policies:
>> 
>>  1. Timed Reliability:  This allows the sender to specify a timeout for
>> a user message after which the SCTP stack abandons the user message.
>> 
>>  2. Limited Retransmission Policy:  Allows limitation of the number of
>> retransmissions.
>> 
>>  3. Priority Policy:  Allows removal of lower-priority messages if space
>> for higher-priority messages is needed in the send buffer.
>> 
>> Patches 1-3 add some sockopts in sctp to set/get pr_sctp policy status.
>> Patches 4-6 implement these 3 policies one by one.
>> 
>> Xin Long (6):
>>   sctp: add SCTP_PR_SUPPORTED on sctp sockopt
>>   sctp: add SCTP_DEFAULT_PRINFO into sctp sockopt
>>   sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt
>>   sctp: implement prsctp TTL policy
>>   sctp: implement prsctp RTX policy
>>   sctp: implement prsctp PRIO policy
> 
> Acked-by: Marcelo Ricardo Leitner 

Series applied.

[PATCH net 3/3] tipc: reset all unicast links when broadcast send link fails

2016-07-11 Thread Jon Maloy

In test situations with many nodes and a heavily stressed system we have
observed that the broadcast transmission link may fail due to an
excessive number of retransmissions of the same packet. In such
situations we need to reset all unicast links to all peers, in order to
reset and re-synchronize the broadcast link.

In this commit, we add a new function tipc_bearer_reset_all() to be used
in such situations. The function scans across all bearers and resets all
their pertaining links.

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/bearer.c | 15 +++
 net/tipc/bearer.h |  1 +
 net/tipc/node.c   | 15 +++
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index bf8f05c..a597708 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -330,6 +330,21 @@ static int tipc_reset_bearer(struct net *net, struct tipc_bearer *b)
return 0;
 }
 
+/* tipc_bearer_reset_all - reset all links on all bearers
+ */
+void tipc_bearer_reset_all(struct net *net)
+{
+   struct tipc_net *tn = tipc_net(net);
+   struct tipc_bearer *b;
+   int i;
+
+   for (i = 0; i < MAX_BEARERS; i++) {
+   b = rcu_dereference_rtnl(tn->bearer_list[i]);
+   if (b)
+   tipc_reset_bearer(net, b);
+   }
+}
+
 /**
  * bearer_disable
  *
diff --git a/net/tipc/bearer.h b/net/tipc/bearer.h
index f686e41..60e49c3 100644
--- a/net/tipc/bearer.h
+++ b/net/tipc/bearer.h
@@ -198,6 +198,7 @@ void tipc_bearer_add_dest(struct net *net, u32 bearer_id, u32 dest);
 void tipc_bearer_remove_dest(struct net *net, u32 bearer_id, u32 dest);
 struct tipc_bearer *tipc_bearer_find(struct net *net, const char *name);
 struct tipc_media *tipc_media_find(const char *name);
+void tipc_bearer_reset_all(struct net *net);
 int tipc_bearer_setup(void);
 void tipc_bearer_cleanup(void);
 void tipc_bearer_stop(struct net *net);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index e01e2c71..23d4761 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1297,10 +1297,6 @@ static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id
 
rc = tipc_bcast_rcv(net, be->link, skb);
 
-   /* Broadcast link reset may happen at reassembly failure */
-   if (rc & TIPC_LINK_DOWN_EVT)
-   tipc_node_reset_links(n);
-
/* Broadcast ACKs are sent on a unicast link */
if (rc & TIPC_LINK_SND_BC_ACK) {
tipc_node_read_lock(n);
@@ -1320,6 +1316,17 @@ static void tipc_node_bc_rcv(struct net *net, struct sk_buff *skb, int bearer_id
spin_unlock_bh(&be->inputq2.lock);
tipc_sk_mcast_rcv(net, &be->arrvq, &be->inputq2);
}
+
+   if (rc & TIPC_LINK_DOWN_EVT) {
+   /* Reception reassembly failure => reset all links to peer */
+   if (!tipc_link_is_up(be->link))
+   tipc_node_reset_links(n);
+
+   /* Retransmission failure => reset all links to all peers */
+   if (!tipc_link_is_up(tipc_bc_sndlink(net)))
+   tipc_bearer_reset_all(net);
+   }
+
tipc_node_put(n);
 }
 
-- 
1.9.1

[PATCH net 2/3] tipc: ensure correct broadcast send buffer release when peer is lost

2016-07-11 Thread Jon Maloy

After a new receiver peer has been added to the broadcast transmission
link, we allow immediate transmission of new broadcast packets, trusting
that the new peer will not accept the packets until it has received the
previously sent unicast broadcast initialization message. In the same
way, the sender must not accept any acknowledgments until it has itself
received the broadcast initialization from the peer, as well as
confirmation of the reception of its own initialization message.

Furthermore, when a receiver peer goes down, the sender has to produce
the missing acknowledgments from the lost peer locally, in order to ensure
correct release of the buffers that were expected to be acknowledged by
that peer.

In a highly stressed system we have observed that contact with a peer
may come up and be lost before the above-mentioned broadcast
initialization and confirmation have been received. This leads to the
locally produced acknowledgments being rejected, and the unacknowledged
buffers lingering in the broadcast link transmission queue until it fills
up and the link goes into permanent congestion.

In this commit, we remedy this by temporarily setting the corresponding
broadcast receive link state to ESTABLISHED and the 'bc_peer_is_up'
state to true before we issue the local acknowledgments. This ensures that
those acknowledgments will always be accepted. The mentioned state values
are restored immediately afterwards when the link is reset.

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/link.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 6483dc4..7d89f87 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -349,6 +349,8 @@ void tipc_link_remove_bc_peer(struct tipc_link *snd_l,
u16 ack = snd_l->snd_nxt - 1;
 
snd_l->ackers--;
+   rcv_l->bc_peer_is_up = true;
+   rcv_l->state = LINK_ESTABLISHED;
tipc_link_bc_ack_rcv(rcv_l, ack, xmitq);
tipc_link_reset(rcv_l);
rcv_l->state = LINK_RESET;
-- 
1.9.1

[PATCH net 0/3] tipc: three small fixes

2016-07-11 Thread Jon Maloy

Fixes for some broadcast link problems that may occur in large systems.

Jon Maloy (3):
  tipc: extend broadcast link initialization criteria
  tipc: ensure correct broadcast send buffer release when peer is lost
  tipc: reset all unicast links when broadcast send link fails

 net/tipc/bearer.c | 15 +++
 net/tipc/bearer.h |  1 +
 net/tipc/link.c   |  9 -
 net/tipc/node.c   | 15 +++
 4 files changed, 35 insertions(+), 5 deletions(-)

-- 
1.9.1

[PATCH net 1/3] tipc: extend broadcast link initialization criteria

2016-07-11 Thread Jon Maloy

At first contact between two nodes, an endpoint might sometimes have
time to send out a LINK_PROTOCOL/STATE packet before it has received
the broadcast initialization packet from the peer, i.e., before it has
received a valid broadcast packet number to add to the 'bc_ack' field
of the protocol message.

This means that the peer endpoint will receive a protocol packet with an
invalid broadcast acknowledge value of 0. Under unlucky circumstances
this may lead to the original, already received acknowledge value being
overwritten, so that the whole broadcast link goes stale after a while.

We fix this by delaying the setting of the link field 'bc_peer_is_up'
until we know that the peer really has received our own broadcast
initialization message. The latter is always sent out as the first
unicast message on a link, and always with sequence number 1. Because
of this, we only need to look for a non-zero unicast acknowledge value
in the arriving STATE messages, and once that is confirmed we know we
are safe and can set the mentioned field. Before this moment, we must
ignore all broadcast acknowledges from the peer.
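The gating rule above can be modelled in a few lines of userspace C. `struct fake_link` and `fake_bc_sync_rcv()` are hypothetical and far smaller than the real `struct tipc_link` and `tipc_link_bc_sync_rcv()`; the sketch only shows that a broadcast ack (including a bogus 0) is ignored until a non-zero unicast ack has been seen.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified link state; not the real struct tipc_link. */
struct fake_link {
	bool bc_peer_is_up;
	uint16_t acked;		/* last broadcast ack accepted from the peer */
};

/* Handle the unicast ack and broadcast ack carried in a STATE message.
 * Returns true if the broadcast ack was accepted. */
static bool fake_bc_sync_rcv(struct fake_link *l, uint16_t unicast_ack,
			     uint16_t bc_ack)
{
	/* A non-zero unicast ack means the peer has received packet #1,
	 * i.e. our broadcast init message: open the gate. */
	if (unicast_ack)
		l->bc_peer_is_up = true;

	/* Until then, do not let a possibly invalid bc ack overwrite state. */
	if (!l->bc_peer_is_up)
		return false;

	l->acked = bc_ack;
	return true;
}
```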

Acked-by: Ying Xue 
Signed-off-by: Jon Maloy 
---
 net/tipc/link.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 67b6ab9..6483dc4 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1559,7 +1559,12 @@ void tipc_link_bc_sync_rcv(struct tipc_link *l, struct tipc_msg *hdr,
if (!msg_peer_node_is_up(hdr))
return;
 
-   l->bc_peer_is_up = true;
+   /* Open when peer acknowledges our bcast init msg (pkt #1) */
+   if (msg_ack(hdr))
+   l->bc_peer_is_up = true;
+
+   if (!l->bc_peer_is_up)
+   return;
 
/* Ignore if peers_snd_nxt goes beyond receive window */
if (more(peers_snd_nxt, l->rcv_nxt + l->window))
-- 
1.9.1

Re: [PATCH v2 net-next] rtnl: Add GFP flag argument to rtnl_unicast()

2016-07-11 Thread David Miller

From: Masashi Honma 
Date: Sat,  9 Jul 2016 12:59:04 +0900

> This commit extends rtnl_unicast() to specify GFP flags.
> 
> This commit depends on Eric Dumazet's commits below.
>   ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf()
>   ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf()
> 
> Signed-off-by: Masashi Honma 

The code is correct and optimal as-is.  There is no gain to your
changes.  gfp_any() will do the right thing.

In fact, your change makes the code more error prone because if any
of these code paths get moved into an atomic context they will break
unless someone remembers to also fix up the GFP flags.

Meanwhile with the existing use of gfp_any() it will work
transparently in such a situation.
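For readers unfamiliar with it: `gfp_any()` (in include/net/sock.h) picks `GFP_ATOMIC` when called in softirq context and `GFP_KERNEL` otherwise, so the choice follows the caller at run time. The userspace model below mocks the kernel types and `in_softirq()`; every `fake_` name is hypothetical and the flag values are arbitrary.

```c
#include <stdbool.h>

/* Userspace stand-ins for gfp_t, GFP_KERNEL, GFP_ATOMIC and in_softirq(). */
typedef unsigned int fake_gfp_t;
#define FAKE_GFP_KERNEL 0x1u
#define FAKE_GFP_ATOMIC 0x2u

static bool fake_softirq_ctx;	/* toggled to simulate the calling context */

static bool fake_in_softirq(void)
{
	return fake_softirq_ctx;
}

/* Same selection idea as gfp_any(): decided per call, not hard-coded by
 * each caller, so moving a call site into atomic context stays safe. */
static fake_gfp_t fake_gfp_any(void)
{
	return fake_in_softirq() ? FAKE_GFP_ATOMIC : FAKE_GFP_KERNEL;
}
```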

I'm not applying this.

Re: [PATCH net] tcp_timer.c: Add kernel-doc function descriptions

2016-07-11 Thread Edward Cree

On 08/07/16 21:58, Richard Sailer wrote:
> This adds kernel-doc style descriptions for 6 functions and
> fixes 1 typo.
>
> Signed-off-by: Richard Sailer 
> ---
>  net/ipv4/tcp_timer.c | 66 +---
>  1 file changed, 57 insertions(+), 9 deletions(-)
>
> diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
> index debdd8b..bdccd67 100644
> --- a/net/ipv4/tcp_timer.c
> +++ b/net/ipv4/tcp_timer.c
> @@ -350,10 +389,18 @@ static void tcp_fastopen_synack_timer(struct sock *sk)
> TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX);
>  }
>  
> -/*
> - *   The TCP retransmit timer.
> - */
>  
> +/**
> + * tcp_retransmit_timer() - The TCP retransmit timout handler.
"timeout"
-Ed

Re: [PATCH] Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"

2016-07-11 Thread David Miller

From: Philippe Reynes 
Date: Sat,  9 Jul 2016 00:54:47 +0200

> This reverts commit 4386f5662e63 ("net: ethernet: bcmgenet: use
> phy_ethtool_{get|set}_link_ksettings")
> 
> This patch is wrong: the functions phy_ethtool_{get|set}_link_ksettings
> don't check if the device is running, but the bcmgenet driver needs this
> check.
> 
> The {get|set}_settings functions need to access the MDIO bus, and this
> bus may only be used when the device is running. Otherwise, the clock
> is disabled and an MDIO access will fail.
> 
> Signed-off-by: Philippe Reynes 

Applied.
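The check the revert restores can be modelled as "refuse early if the interface is down, otherwise go to the PHY". The structures and functions below are hypothetical userspace stand-ins, not bcmgenet code; `running` plays the role of netif_running(), and returning -EINVAL for a down interface is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: the MDIO clock is only enabled while running. */
struct fake_netdev {
	bool running;		/* stands in for netif_running(dev) */
	int link_speed;		/* what a PHY read over MDIO would return */
};

static int fake_mdio_read_speed(const struct fake_netdev *dev, int *speed)
{
	if (!dev->running)
		return -EIO;	/* clock gated: the MDIO access would fail */
	*speed = dev->link_speed;
	return 0;
}

/* ethtool-style getter that bails out before touching the MDIO bus. */
static int fake_get_link_settings(const struct fake_netdev *dev, int *speed)
{
	if (!dev->running)
		return -EINVAL;
	return fake_mdio_read_speed(dev, speed);
}
```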

Re: [PATCH net] tcp_timer.c: Add kernel-doc function descriptions

2016-07-11 Thread David Miller

From: Richard Sailer 
Date: Fri,  8 Jul 2016 22:58:26 +0200

>  
> +/**
> + *   tcp_write_err() - close socket and save error info.
> + *   @sk:  The socket the error has appeared on.
> + *
> + *   Returns: Nothing (void)
> + */
> +
...
> +/**
> + * tcp_out_of_resources() - Close socket if out of resources
> + * @sk:pointer to current socket
> + * @do_reset:  send a last packet with reset flag
> + *
> + * Do not allow orphaned sockets to eat all our resources.
>   * This is direct violation of TCP specs, but it is required
>   * to prevent DoS attacks. It is called when a retransmission timeout
>   * or zero probe timeout occurs on orphaned socket.

Please indent your comments consistently.
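For reference, one consistently indented kernel-doc layout looks like this: a single space after every leading asterisk, parameter lines aligned, and a `Return:` section. The function itself is hypothetical and exists only to carry the comment.

```c
#include <stdbool.h>

/**
 * fake_should_abort() - Decide whether to give up on a connection.
 * @retries:	number of retransmissions performed so far
 * @limit:	maximum number of retransmissions allowed
 *
 * Hypothetical helper used only to show a consistent kernel-doc layout.
 *
 * Return: true if @retries has reached @limit, false otherwise.
 */
static bool fake_should_abort(unsigned int retries, unsigned int limit)
{
	return retries >= limit;
}
```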

RE: [net-next 0/6] common library for Chelsio drivers

2016-07-11 Thread Steve Wise

> > Hi,
> >
> >  This patch series adds a common library module (libcxgb.ko)
> > for Chelsio drivers to remove duplicate code.
> >
> > This series moves common iSCSI DDP Page Pod manager
> > code from cxgb4.ko to libcxgb.ko; earlier this code
> > was used only by cxgbit.ko, now it is used by
> > three Chelsio iSCSI drivers: cxgb3i, cxgb4i and cxgbit.
> >
> > In future this module will have common connection
> > management and hardware specific code that can
> > be shared by multiple Chelsio drivers (cxgb4,
> > csiostor, iw_cxgb4, cxgb4i, cxgbit).
> >
> > Please review.
> 
> As currently implemented the user is prompted for the Kconfig symbol
> for the library.  That really needs to be hidden from the user and
> they should be able to select these drivers without having to know
> about this implementation detail at all.

Maybe adding something like:

select CHELSIO_LIB

to the Kconfig files for the drivers that are dependent on CHELSIO_LIB?

Re: [PATCH net-next 0/2] net: dsa: b53: Add Broadcom NSP switch support

2016-07-11 Thread David Miller

From: Florian Fainelli 
Date: Fri,  8 Jul 2016 11:39:11 -0700

> This patch series updates the B53 driver to support Broadcom's Northstar Plus
> SoC integrated switch.
> 
> Unlike the version of the core present in BCM5301x/Northstar, we cannot read
> the full chip id of the switch, so we need to get the information about our
> switch id from Device Tree.
> 
> Other than that, this is a regular Broadcom Ethernet switch which is
> register-compatible for all practical purposes with the existing switch driver.
> 
> Since DSA requires a working CPU Ethernet MAC driver this depends on Jon
> Mason's AMAC/BGMAC driver changes to support NSP. Board specific changes
> depend on patches present in Broadcom's ARM SoC branches and will be posted
> in a short while.

Series applied, thanks.

Re: [net-next 0/6] common library for Chelsio drivers

2016-07-11 Thread David Miller

From: Varun Prakash 
Date: Fri,  8 Jul 2016 23:03:53 +0530

> Hi,
> 
>  This patch series adds a common library module (libcxgb.ko)
> for Chelsio drivers to remove duplicate code.
> 
> This series moves common iSCSI DDP Page Pod manager
> code from cxgb4.ko to libcxgb.ko; earlier this code
> was used only by cxgbit.ko, now it is used by
> three Chelsio iSCSI drivers: cxgb3i, cxgb4i and cxgbit.
> 
> In future this module will have common connection
> management and hardware specific code that can
> be shared by multiple Chelsio drivers (cxgb4,
> csiostor, iw_cxgb4, cxgb4i, cxgbit).
> 
> Please review.

As currently implemented the user is prompted for the Kconfig symbol
for the library.  That really needs to be hidden from the user and
they should be able to select these drivers without having to know
about this implementation detail at all.

4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start_xmit

2016-07-11 Thread nuclearcat


Hi

On the latest kernel I noticed a kernel panic happening 1-2 times per day.
It also happens on older kernels (at least 4.5.3).


Panic message received over netconsole:

[42916.416307] skbuff: skb_under_panic: text:a00e8ce5 len:581 put:2 head:8800b0bf2800 data:ffa00500b0bf284c tail:0x291 end:0x6c0 dev:ppp2828
[42916.416677] [ cut here ]
[42916.416876] kernel BUG at net/core/skbuff.c:104!
[42916.417075] invalid opcode:  [#1] SMP
[42916.417388] Modules linked in: cls_fw act_police cls_u32 sch_ingress sch_sfq sch_htb netconsole configfs coretemp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre pppoe pppox ppp_generic slhc tun xt_REDIRECT nf_nat_redirect xt_TCPMSS ipt_REJECT nf_reject_ipv4 xt_set ts_bm xt_string xt_connmark xt_DSCP xt_mark xt_tcpudp ip_set_hash_net ip_set_hash_ip ip_set nfnetlink iptable_mangle iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables 8021q garp mrp stp llc
[42916.421443] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.3-build-0105 #4
[42916.421643] Hardware name: HP ProLiant DL320e Gen8 v2, BIOS P80 04/02/2015
[42916.421842] task: 8200b500 ti: 8200 task.ti: 8200
[42916.422178] RIP: 0010:[]  [] skb_panic+0x49/0x4b
[42916.422574] RSP: 0018:880447403da8  EFLAGS: 00010296
[42916.422773] RAX: 0089 RBX: 880422c13900 RCX:
[42916.422974] RDX: 88044740df50 RSI: 88044740c908 RDI: 88044740c908
[42916.423175] RBP: 880447403dc8 R08: 0001 R09:
[42916.423439] R10: 820050c0 R11: 88041c7ee900 R12: 880423037000
[42916.423640] R13:  R14: 880423037000 R15:
[42916.423841] FS:  () GS:88044740() knlGS:
[42916.424179] CS:  0010 DS:  ES:  CR0: 80050033
[42916.424379] CR2: 7effd0814b00 CR3: 000430ab2000 CR4: 001406f0
[42916.424577] Stack:
[42916.424772]  ffa00500b0bf284c 0291 06c0 880423037000
[42916.425333]  880447403dd8 81843786 880447403e00 a00e8ce5
[42916.425898]  880422c13900 8800ae7c6c00 820b3210 880447403e68
[42916.426463] Call Trace:
[42916.426658]
[42916.426719]  [] skb_push+0x36/0x37
[42916.427111]  [] ppp_start_xmit+0x10f/0x150 [ppp_generic]
[42916.427314]  [] dev_hard_start_xmit+0x25a/0x2d3
[42916.427516]  [] ? validate_xmit_skb.isra.107.part.108+0x11d/0x238
[42916.427858]  [] sch_direct_xmit+0x89/0x1b5
[42916.428060]  [] __qdisc_run+0x133/0x170
[42916.428261]  [] net_tx_action+0xe3/0x148
[42916.428462]  [] __do_softirq+0xb9/0x1a9
[42916.428663]  [] irq_exit+0x37/0x7c
[42916.428862]  [] smp_apic_timer_interrupt+0x3d/0x48
[42916.429063]  [] apic_timer_interrupt+0x7c/0x90
[42916.429263]
[42916.429324]  [] ? mwait_idle+0x68/0x7e
[42916.429719]  [] ? atomic_notifier_call_chain+0x13/0x15
[42916.429921]  [] arch_cpu_idle+0xa/0xc
[42916.430121]  [] default_idle_call+0x27/0x29
[42916.430323]  [] cpu_startup_entry+0x115/0x1bf
[42916.430526]  [] rest_init+0x72/0x74
[42916.430727]  [] start_kernel+0x3b7/0x3c4
[42916.430929]  [] x86_64_start_reservations+0x2a/0x2c
[42916.431130]  [] x86_64_start_kernel+0xbb/0xbe
[42916.431332] Code: 78 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 31 c0 4c 8b 8f c8 00 00 00 48 c7 c7 49 10 e1 81 e8 0e 60 8e ff 0b 48 8b 97 d0 00 00 00 89 f0 01 77 78 48 29 c2 48 3b 97 c8
[42916.435514] RIP  [] skb_panic+0x49/0x4b
[42916.439115]  RSP
[42916.439336] ---[ end trace d7bfed0177be96d1 ]---
[42916.445801] Kernel panic - not syncing: Fatal exception in interrupt
[42916.446005] Kernel Offset: disabled
[42916.477266] Rebooting in 5 seconds..
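For context on where the BUG fires: skb_push() moves skb->data back by the requested length (here put:2) and calls skb_under_panic() if data ends up in front of head, i.e. if there was not enough headroom. The userspace model below uses a hypothetical, much smaller struct and checks the headroom before moving the pointer (the kernel decrements first and compares afterwards); it illustrates the condition only and says nothing about the root cause of this report.

```c
#include <stddef.h>

/* Hypothetical, heavily trimmed model of struct sk_buff. */
struct fake_skb {
	unsigned char *head;	/* start of the buffer */
	unsigned char *data;	/* start of valid data */
	unsigned int len;	/* bytes of valid data */
};

/* Returns the new data pointer, or NULL where the kernel would instead
 * call skb_under_panic() because there is not enough headroom. */
static unsigned char *fake_skb_push(struct fake_skb *skb, unsigned int len)
{
	if ((size_t)(skb->data - skb->head) < len)
		return NULL;
	skb->data -= len;
	skb->len += len;
	return skb->data;
}
```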

Re: [PATCH net] udp: prevent bugcheck if filter truncates packet too much

2016-07-11 Thread David Miller

From: Michal Kubecek 
Date: Fri,  8 Jul 2016 17:52:33 +0200 (CEST)

> If a socket filter truncates a UDP packet below the length of the UDP
> header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
> BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
> the kernel is configured that way) can easily be triggered by an
> unprivileged user; this was reported as CVE-2016-6162. For a reproducer,
> see http://seclists.org/oss-sec/2016/q3/8
> 
> Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
> Reported-by: Marco Grassi 
> Signed-off-by: Michal Kubecek 

Applied and queued up for -stable, thanks.
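The essence of the problem is "check that the header is still there before stripping it". A hedged userspace sketch of that ordering follows; `struct fake_pkt` and `fake_pull_udp_header()` are hypothetical and this is not the code of the applied patch, only the length check it is about.

```c
#include <stddef.h>
#include <stdint.h>

#define FAKE_UDP_HDR_LEN 8u	/* source port, dest port, length, checksum */

/* Hypothetical datagram after a socket filter ran; the filter may have
 * trimmed len to any value, including below the UDP header size. */
struct fake_pkt {
	const uint8_t *data;
	size_t len;
};

/* Strip the UDP header only if it is actually present. Returns 0 on
 * success, -1 if the packet should be dropped instead of pulled. */
static int fake_pull_udp_header(struct fake_pkt *p)
{
	if (p->len < FAKE_UDP_HDR_LEN)
		return -1;	/* drop, rather than hit a BUG_ON in the pull */
	p->data += FAKE_UDP_HDR_LEN;
	p->len -= FAKE_UDP_HDR_LEN;
	return 0;
}
```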

Re: [PATCH] bnxt_en: initialize rc to zero to avoid returning garbage

2016-07-11 Thread David Miller

From: Colin King 
Date: Fri,  8 Jul 2016 16:42:48 +0100

> From: Colin Ian King 
> 
> rc is not initialized so it can contain garbage if it is not
> set by the call to bnxt_read_sfp_module_eeprom_info. Ensure
> garbage is not returned by initializing rc to 0.
> 
> Signed-off-by: Colin Ian King 

Applied, thanks.
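The bug class fixed here is a return code that is only assigned on some paths. A minimal hypothetical illustration (these are not the bnxt_en functions):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an EEPROM read that may or may not run. */
static int fake_read_eeprom(bool present)
{
	return present ? 0 : -EIO;
}

static int fake_get_module_eeprom(bool module_present, bool need_read)
{
	int rc = 0;	/* without "= 0", rc is indeterminate when !need_read */

	if (need_read)
		rc = fake_read_eeprom(module_present);
	return rc;
}
```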

Re: [PATCH 3/3] crypto: Added Chelsio Menu to the Kconfig file

2016-07-11 Thread kbuild test robot

Hi,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.7-rc7 next-20160711]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Yeshaswi-M-R-Gowda/crypto-chcr-Add-Chelsio-Crypto-Driver/20160712-023513
config: sh-allmodconfig (attached as .config)
compiler: sh4-linux-gnu-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sh 

All warnings (new ones prefixed by >>):

warning: (ISCSI_TARGET_CXGB4) selects CHELSIO_T4_UWIRE which has unmet direct dependencies (NETDEVICES && ETHERNET && NET_VENDOR_CHELSIO && CHELSIO_T4)
warning: (SCSI_CXGB4_ISCSI && CRYPTO_DEV_CHELSIO) selects CHELSIO_T4 which has unmet direct dependencies (NETDEVICES && ETHERNET && NET_VENDOR_CHELSIO && PCI && (IPV6 || IPV6=n))

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


.config.gz
Description: Binary data

Re: [PATCH 0/7] pull request for net: batman-adv 2016-07-08

2016-07-11 Thread David Miller

From: Simon Wunderlich 
Date: Fri,  8 Jul 2016 11:49:15 +0200

> here are some more bugfix patches which we would like to have integrated
> into net, if that is still possible!
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.

RE: [Intel-wired-lan] [PATCH] net: ethernet: intel: fm10k: Remove create_workqueue

2016-07-11 Thread Singh, Krishneil K

-Original Message-
From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On Behalf Of Bhaktipriya Shridhar
Sent: Wednesday, June 1, 2016 8:40 AM
To: Kirsher, Jeffrey T 
Cc: Tejun Heo ; netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-ker...@vger.kernel.org
Subject: [Intel-wired-lan] [PATCH] net: ethernet: intel: fm10k: Remove create_workqueue

alloc_workqueue replaces deprecated create_workqueue().

A dedicated workqueue has been used since the workitem (viz fm10k_service_task, 
which manages and runs other subtasks) is involved in normal device operation 
and requires forward progress under memory pressure.

create_workqueue has been replaced with alloc_workqueue with max_active as 0 
since there is no need for throttling the number of active work items.

Since network devices may be used in memory reclaim path, WQ_MEM_RECLAIM has 
been set to guarantee forward progress.

flush_workqueue is unnecessary since destroy_workqueue() itself calls
drain_workqueue() which flushes repeatedly till the workqueue becomes empty. 
Hence the call to flush_workqueue() has been dropped.

Signed-off-by: Bhaktipriya Shridhar 
---

Tested-by: Krishneil Singh

Re: [PATCH v2 net] ipv6: addrconf: fix Juniper SSL VPN client regression

2016-07-11 Thread Bjørn Mork

valdis.kletni...@vt.edu writes:

> Tested against next-20160708, and the Juniper code works fine. Feel free
> to stick a Tested-By: on the V2 patch...

Thanks to both of you for verifying that it solves the Juniper problem.

A tip: Patchwork is nice enough to automatically pick up tags from
review comments, as long as the tags start at column 0.  I haven't tried
multiple tags, but let's see if this works:

Tested-by: Jonas Lippuner 
Tested-by: Valdis Kletnieks 

Bjørn

Re: [PATCH 2/3] chcr: Support for Chelsio's Crypto Hardware

2016-07-11 Thread Joe Perches

On Mon, 2016-07-11 at 11:28 -0700, Yeshaswi M R Gowda wrote:
> The Chelsio's Crypto Hardware can perform the following operations:
> SHA1, SHA224, SHA256, SHA384 and SHA512, HMAC(SHA1), HMAC(SHA224),
> HMAC(SHA256), HMAC(SHA384), HMAC(SHA512), AES-128-CBC, AES-192-CBC,
> AES-256-CBC, AES-128-XTS, AES-256-XTS
> 
> This patch implements the driver for above mentioned features.

trivial notes:

> diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
[]
> +int chcr_handle_resp(struct crypto_async_request *req, unsigned char *input,
> +  int error_status)
> +{
[]
> + case CRYPTO_ALG_TYPE_BLKCIPHER:
> + ctx_req.req.ablk_req = (struct ablkcipher_request *)req;
> + ctx_req.ctx.ablk_ctx =
> + ablkcipher_request_ctx(ctx_req.req.ablk_req);
> + if (error_status)
> + goto dma_unmap_blkcipher;
> + fw6_pld = (struct cpl_fw6_pld *)input;
> + memcpy(ctx_req.req.ablk_req->info, &fw6_pld->data[2],
> +    AES_BLOCK_SIZE);
> +dma_unmap_blkcipher:
> + dma_unmap_sg(_ctx->lldi.pdev->dev, ctx_req.req.ablk_req->dst,
> +  ABLK_CTX(ctx)->dst_nents, DMA_FROM_DEVICE);
> + if (ctx_req.ctx.ablk_ctx->skb) {
> + kfree_skb(ctx_req.ctx.ablk_ctx->skb);
> + ctx_req.ctx.ablk_ctx->skb = NULL;
> + }
> + break;

This case label is only used here, right?

This would be better without the goto
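Roughly, the goto-free shape would copy the IV only on success and then run the shared cleanup unconditionally. The sketch below is a userspace model with hypothetical names (`fake_req`, `fake_unmap()`, `fake_handle_blkcipher()`); it is not the chcr driver's code and leaves out the real DMA and skb APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_BLOCK_SIZE 16

/* Hypothetical request context; not the chcr driver's structures. */
struct fake_req {
	unsigned char iv[FAKE_BLOCK_SIZE];
	void *skb;		/* buffer to release once the request is done */
	int unmapped;		/* counts calls to the cleanup step */
};

static void fake_unmap(struct fake_req *r)
{
	r->unmapped++;
}

/* Same steps as the quoted case, but with a plain "if" instead of a jump
 * over the memcpy: cleanup runs on both the error and success paths. */
static void fake_handle_blkcipher(struct fake_req *r, const unsigned char *fw_iv,
				  int error_status)
{
	if (!error_status)
		memcpy(r->iv, fw_iv, FAKE_BLOCK_SIZE);
	fake_unmap(r);
	if (r->skb) {
		free(r->skb);
		r->skb = NULL;
	}
}
```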

[]

> + if (IS_ERR(base_hash)) {
> + pr_err("Can not allocate sha-generic algo.\n");
> + return (void *)base_hash;
> + }

Please add
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
before any #include to prefix any pr_ uses.
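A userspace illustration of what the `pr_fmt` define buys: every format string passed through the `pr_*` helpers gets the module-name prefix by string-literal concatenation. In the kernel, KBUILD_MODNAME comes from the build system and printk.h only supplies a default `pr_fmt` if none is defined yet, which is why it should come before the includes. Here KBUILD_MODNAME is hard-coded to "chcr" as an assumption, `pr_err` is a simplified stand-in, and `fake_format_alloc_err()` is a hypothetical helper that formats into a buffer so the prefix is easy to inspect.

```c
#include <stdio.h>
#include <string.h>

/* Normally provided by kbuild; assumed here for the sketch. */
#define KBUILD_MODNAME "chcr"
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Simplified stand-in for the kernel's pr_err(). */
#define pr_err(fmt, ...) fprintf(stderr, pr_fmt(fmt), ##__VA_ARGS__)

static int fake_format_alloc_err(char *buf, size_t n, const char *algo)
{
	return snprintf(buf, n, pr_fmt("Can not allocate %s algo.\n"), algo);
}
```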

[]
> +/*
> + *   chcr_register_alg - Register crypto algorithms with kernel framework.
> + */
> +static int chcr_register_alg(void)
> +{
> + struct crypto_alg ai;
> + int err = 0, i;
> + char *name = NULL;
> +
> + for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
> + if (driver_algs[i].is_registered)
> + continue;
> + switch (driver_algs[i].type & CRYPTO_ALG_TYPE_MASK) {
> + case CRYPTO_ALG_TYPE_ABLKCIPHER:
> + err = crypto_register_alg(&driver_algs[i].alg.crypto);
> + name = driver_algs[i].alg.crypto.cra_driver_name;
> + break;
> + case CRYPTO_ALG_TYPE_AHASH:

This could be clearer with a temporary for
driver_algs[i].alg.hash

 hash = &driver_algs[i].alg.hash;

> + driver_algs[i].alg.hash.update = chcr_ahash_update;

hash->update = chcr_ahash_update;
etc...

> + driver_algs[i].alg.hash.final = chcr_ahash_final;
> + driver_algs[i].alg.hash.finup = chcr_ahash_finup;
> + driver_algs[i].alg.hash.digest = chcr_ahash_digest;
> + driver_algs[i].alg.hash.export = chcr_ahash_export;
> + driver_algs[i].alg.hash.import = chcr_ahash_import;
> + driver_algs[i].alg.hash.halg.statesize =
> + sizeof(struct chcr_ahash_req_ctx);

Even with this sort of change, a lot of barely >80 column lines
are split making the code a bit less readable.

It might be better to avoid splitting these long lines and
ignore the >80 column limits occasionally.
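The temporary-pointer suggestion, sketched with hypothetical trimmed-down types (`fake_hash_alg`, `fake_driver_alg`, `fake_algs`) rather than the real crypto API structures:

```c
#include <stddef.h>

/* Hypothetical, trimmed-down versions of the driver's tables. */
struct fake_hash_alg {
	int (*update)(void);
	int (*final)(void);
	size_t statesize;
};

struct fake_driver_alg {
	struct fake_hash_alg hash;
};

struct fake_req_ctx {
	unsigned char buf[64];
};

static int fake_update(void) { return 1; }
static int fake_final(void) { return 2; }

static struct fake_driver_alg fake_algs[2];

static void fake_fill_hash_ops(void)
{
	size_t i;

	for (i = 0; i < sizeof(fake_algs) / sizeof(fake_algs[0]); i++) {
		/* One temporary keeps every assignment on a single line. */
		struct fake_hash_alg *hash = &fake_algs[i].hash;

		hash->update = fake_update;
		hash->final = fake_final;
		hash->statesize = sizeof(struct fake_req_ctx);
	}
}
```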

Re: [PATCH nf-next 3/3] netfilter: replace list_head with single linked list

2016-07-11 Thread Aaron Conole

Thanks for this;  I will send a v2 in the next two days.

-Aaron

Florian Westphal  writes:

> Aaron Conole  wrote:
>> --- a/net/netfilter/core.c
>> +++ b/net/netfilter/core
> [..]
>> +#define nf_entry_dereference(e) \
>> +rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
>>  
>> -static struct list_head *nf_find_hook_list(struct net *net,
>> -   const struct nf_hook_ops *reg)
>> +static struct nf_hook_entry *nf_find_hook_list(struct net *net,
>> +   const struct nf_hook_ops *reg)
>>  {
>> -struct list_head *hook_list = NULL;
>> +struct nf_hook_entry *hook_list = NULL;
>>  
>>  if (reg->pf != NFPROTO_NETDEV)
>> -hook_list = &net->nf.hooks[reg->pf][reg->hooknum];
>> +hook_list = rcu_dereference(net->nf.hooks[reg->pf]
>> +[reg->hooknum]);
>>  else if (reg->hooknum == NF_NETDEV_INGRESS) {
>>  #ifdef CONFIG_NETFILTER_INGRESS
>>  if (reg->dev && dev_net(reg->dev) == net)
>> -hook_list = &reg->dev->nf_hooks_ingress;
>> +hook_list =
>> +rcu_dereference(reg->dev->nf_hooks_ingress);
>
> Both of these should use nf_entry_dereference() to avoid the lockdep
> splat reported by kbuild robot:
>
> net/netfilter/core.c:75 suspicious rcu_dereference_check() usage!
> 2 locks held by swapper/1:
> #0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
> #1: (nf_hook_mutex){+.+...}, at: []
> nf_register_net_hook+0xcb/0x240
>

RE: [Intel-wired-lan] [PATCH] ixgbe: always initialize setup_fc

2016-07-11 Thread Tantilov, Emil S

>-Original Message-
>From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org] On
>Behalf Of Patrick McLean
>Sent: Friday, July 01, 2016 6:31 PM
>To: Kirsher, Jeffrey T 
>Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org
>Subject: [Intel-wired-lan] [PATCH] ixgbe: always initialize setup_fc
>
>In ixgbe_init_mac_link_ops_X550em, the code has a special case for
>backplane media type, but does not fall through to the default case, so the
>setup_fc never gets initialized. This causes a panic when it later tries to
>set up the card, and the kernel dereferences the null pointer.
>
>This patch lets the function fall through, which initializes setup_fc
>properly.

Why are you resending this patch? I have already submitted a patch to handle this properly:
http://patchwork.ozlabs.org/patch/646228/

Thanks,
Emil
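The failure mode described in the quoted patch (a special case in a switch that never reaches the default assignment, leaving a function pointer NULL) can be modelled as below. All names are hypothetical and this is not the ixgbe code; as Emil notes, the actual fix is handled by his separate patch.

```c
#include <stddef.h>

enum fake_media { FAKE_MEDIA_BACKPLANE, FAKE_MEDIA_FIBER };

/* Hypothetical mac ops table. */
struct fake_mac_ops {
	int (*setup_link)(void);
	int (*setup_fc)(void);
};

static int fake_setup_link_kr(void) { return 10; }
static int fake_setup_fc_generic(void) { return 20; }

static void fake_init_mac_link_ops(struct fake_mac_ops *ops, enum fake_media m)
{
	switch (m) {
	case FAKE_MEDIA_BACKPLANE:
		ops->setup_link = fake_setup_link_kr;
		/* fall through: a "break" here would leave setup_fc NULL */
	default:
		ops->setup_fc = fake_setup_fc_generic;
		break;
	}
}
```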

Re: [patch net-next 1/2] devlink: add hardware messages tracing facility

2016-07-11 Thread Jiri Pirko

Mon, Jul 11, 2016 at 06:08:14PM CEST, rost...@goodmis.org wrote:
>On Mon, 11 Jul 2016 15:18:47 +0200
>Jiri Pirko  wrote:
>
>> From: Jiri Pirko 
>> 
>> Define a tracepoint and allow user to trace messages going to and from
>> hardware associated with devlink instance.
>> 
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/net/devlink.h  |  8 +++
>>  include/trace/events/devlink.h | 49 ++
>>  net/core/devlink.c |  9 
>>  3 files changed, 66 insertions(+)
>>  create mode 100644 include/trace/events/devlink.h
>> 
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>> index c99ffe8..865ade6 100644
>> --- a/include/net/devlink.h
>> +++ b/include/net/devlink.h
>> @@ -115,6 +115,8 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size);
>>  int devlink_register(struct devlink *devlink, struct device *dev);
>>  void devlink_unregister(struct devlink *devlink);
>>  void devlink_free(struct devlink *devlink);
>> +void devlink_trace_hwmsg(const struct devlink *devlink, bool incoming,
>> + unsigned long type, const u8 *buf, size_t len);
>>  int devlink_port_register(struct devlink *devlink,
>>struct devlink_port *devlink_port,
>>unsigned int port_index);
>> @@ -154,6 +156,12 @@ static inline void devlink_free(struct devlink *devlink)
>>  kfree(devlink);
>>  }
>>  
>> +static inline void devlink_trace_hwmsg(const struct devlink *devlink,
>> +   bool incoming, unsigned long type,
>> +   const u8 *buf, size_t len)
>> +{
>> +}
>> +
>>  static inline int devlink_port_register(struct devlink *devlink,
>>  struct devlink_port *devlink_port,
>>  unsigned int port_index)
>> diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
>> new file mode 100644
>> index 000..7918c57
>> --- /dev/null
>> +++ b/include/trace/events/devlink.h
>> @@ -0,0 +1,49 @@
>> +#undef TRACE_SYSTEM
>> +#define TRACE_SYSTEM devlink
>> +
>> +#if !defined(_TRACE_DEVLINK_H) || defined(TRACE_HEADER_MULTI_READ)
>> +#define _TRACE_DEVLINK_H
>> +
>> +#include 
>> +#include 
>> +#include 
>> +
>> +/*
>> + * Tracepoint for devlink hardware message:
>> + */
>> +TRACE_EVENT(devlink_hwmsg,
>> +TP_PROTO(const struct devlink *devlink, bool incoming,
>> + unsigned long type, const u8 *buf, size_t len),
>> +
>> +TP_ARGS(devlink, incoming, type, buf, len),
>> +
>> +TP_STRUCT__entry(
>> +__string(bus_name, devlink->dev->bus->name)
>> +__string(dev_name, dev_name(devlink->dev))
>> +__string(owner_name, devlink->dev->driver->owner->name)
>> +__field(bool, incoming)
>> +__field(unsigned long, type)
>> +__dynamic_array(u8, buf, len)
>> +__field(size_t, len)
>> +),
>> +
>> +TP_fast_assign(
>> +__assign_str(bus_name, devlink->dev->bus->name);
>> +__assign_str(dev_name, dev_name(devlink->dev));
>> +__assign_str(owner_name, devlink->dev->driver->owner->name);
>> +__entry->incoming = incoming;
>> +__entry->type = type;
>> +memcpy(__get_dynamic_array(buf), buf, len);
>> +__entry->len = len;
>> +),
>> +
>> +TP_printk("bus_name=%s dev_name=%s owner_name=%s incoming=%d type=%lu buf=0x[%*phD] len=%lu",
>> +  __get_str(bus_name), __get_str(dev_name),
>> +  __get_str(owner_name), __entry->incoming, __entry->type,
>> +  (int) __entry->len, __get_dynamic_array(buf), __entry->len)
>> +);
>> +
>> +#endif /* _TRACE_DEVLINK_H */
>> +
>> +/* This part must be outside protection */
>> +#include 
>> diff --git a/net/core/devlink.c b/net/core/devlink.c
>> index b2e592a..8cfa3b0 100644
>> --- a/net/core/devlink.c
>> +++ b/net/core/devlink.c
>> @@ -26,6 +26,8 @@
>>  #include 
>>  #include 
>>  #include 
>> +#define CREATE_TRACE_POINTS
>> +#include 
>>  
>>  static LIST_HEAD(devlink_list);
>>  
>> @@ -1679,6 +1681,13 @@ void devlink_free(struct devlink *devlink)
>>  }
>>  EXPORT_SYMBOL_GPL(devlink_free);
>>  
>> +void devlink_trace_hwmsg(const struct devlink *devlink, bool incoming,
>> + unsigned long type, const u8 *buf, size_t len)
>> +{
>> +trace_devlink_hwmsg(devlink, incoming, type, buf, len);
>> +}
>> +EXPORT_SYMBOL_GPL(devlink_trace_hwmsg);
>> +
>
>Instead of having a function that always gets called even when tracing
>isn't enabled, why not have the caller call the trace_devlink_hwmsg()
>directly?

That's what David already pointed out. I like to have a simple wrapper
function with a "devlink_" prefix.
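The header under review uses a common kernel pattern: a real prototype when the feature is built in, and an empty static inline stub otherwise. A stub definition must not have a ';' between its parameter list and its body; otherwise it becomes a prototype followed by a stray block, and the build breaks when the feature is disabled. The userspace sketch below shows the pattern, assuming a hypothetical FEATURE_ENABLED macro in place of IS_ENABLED(CONFIG_NET_DEVLINK) and a hypothetical byte counter in place of the real tracepoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch standing in for IS_ENABLED(CONFIG_NET_DEVLINK). */
#ifndef FEATURE_ENABLED
#define FEATURE_ENABLED 1
#endif

/* Hypothetical counter standing in for the tracepoint's side effect. */
static unsigned long traced_bytes;

#if FEATURE_ENABLED
/* Real implementation: only compiled when the feature is built in. */
void trace_hwmsg(bool incoming, unsigned long type,
		 const unsigned char *buf, size_t len)
{
	(void)incoming;
	(void)type;
	(void)buf;
	traced_bytes += len;
}
#else
/* Stub: no ';' before the body, so this is a definition, not a
 * prototype followed by a stray block. */
static inline void trace_hwmsg(bool incoming, unsigned long type,
			       const unsigned char *buf, size_t len)
{
	(void)incoming;
	(void)type;
	(void)buf;
	(void)len;
}
#endif

unsigned long get_traced_bytes(void)
{
	return traced_bytes;
}
```

Callers use the same trace_hwmsg() call in both configurations; only the header decides whether it does any work.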


>
>In the trace/devlink.h file you could encapsulate it with:
>
>#if IS_ENABLED(CONFIG_NET_DEVLINK)
>
>[...]
>
>#else
>static

[PATCH 3/3] crypto: Added Chelsio Menu to the Kconfig file

2016-07-11 Thread Yeshaswi M R Gowda

Adds the config entry for the Chelsio Crypto Driver and the
corresponding Makefile changes.

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/crypto/Kconfig  |2 ++
 drivers/crypto/Makefile |1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index d77ba2f..b44faf0 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -537,4 +537,6 @@ config CRYPTO_DEV_ROCKCHIP
  This driver interfaces with the hardware crypto accelerator.
  Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode.
 
+source "drivers/crypto/chelsio/Kconfig"
+
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 3c6432d..ad7250f 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -31,3 +31,4 @@ obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
 obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
 obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
 obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
+obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chelsio/
-- 
1.7.10.1

[PATCH 2/3] chcr: Support for Chelsio's Crypto Hardware

2016-07-11 Thread Yeshaswi M R Gowda

Chelsio's Crypto Hardware can perform the following operations:
SHA1, SHA224, SHA256, SHA384 and SHA512, HMAC(SHA1), HMAC(SHA224),
HMAC(SHA256), HMAC(SHA384), HMAC(SHA512), AES-128-CBC, AES-192-CBC,
AES-256-CBC, AES-128-XTS, AES-256-XTS

This patch implements the driver for the above-mentioned features. This
driver is an Upper Layer Driver which is attached to Chelsio's LLD
(cxgb4) and uses the queue allocated by the LLD for sending the crypto
requests to the hardware and receiving the responses from it.

The crypto operations can be performed by Chelsio's hardware from
userspace applications and/or from within the kernel using the
kernel's crypto API.

The above-mentioned crypto features have been tested using the kernel's
tests in testmgr.h. They have also been tested from user
space using libkcapi and OpenSSL.

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/crypto/chelsio/Kconfig   |   19 +
 drivers/crypto/chelsio/Makefile  |4 +
 drivers/crypto/chelsio/chcr_algo.c   | 1531 ++
 drivers/crypto/chelsio/chcr_algo.h   |  502 +++
 drivers/crypto/chelsio/chcr_core.c   |  273 ++
 drivers/crypto/chelsio/chcr_core.h   |   85 ++
 drivers/crypto/chelsio/chcr_crypto.h |  255 ++
 7 files changed, 2669 insertions(+)
 create mode 100644 drivers/crypto/chelsio/Kconfig
 create mode 100644 drivers/crypto/chelsio/Makefile
 create mode 100644 drivers/crypto/chelsio/chcr_algo.c
 create mode 100644 drivers/crypto/chelsio/chcr_algo.h
 create mode 100644 drivers/crypto/chelsio/chcr_core.c
 create mode 100644 drivers/crypto/chelsio/chcr_core.h
 create mode 100644 drivers/crypto/chelsio/chcr_crypto.h

diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig
new file mode 100644
index 000..5684a55
--- /dev/null
+++ b/drivers/crypto/chelsio/Kconfig
@@ -0,0 +1,19 @@
+config CRYPTO_DEV_CHELSIO
+   tristate "Chelsio Crypto Co-processor Driver"
+   select CHELSIO_T4
+   select CRYPTO_SHA1
+   select CRYPTO_SHA256
+   select CRYPTO_SHA512
+   ---help---
+ The Chelsio Crypto Co-processor driver for T6 adapters.
+
+ For general information about Chelsio and our products, visit
+ our website at .
+
+ For customer support, please visit our customer support page at
+ .
+
+ Please send feedback to .
+
+ To compile this driver as a module, choose M here: the module
+ will be called chcr.
diff --git a/drivers/crypto/chelsio/Makefile b/drivers/crypto/chelsio/Makefile
new file mode 100644
index 000..7e4fda5
--- /dev/null
+++ b/drivers/crypto/chelsio/Makefile
@@ -0,0 +1,4 @@
+ ccflags-y := -Idrivers/net/ethernet/chelsio/cxgb4
+
+ obj-$(CONFIG_CRYPTO_DEV_CHELSIO) += chcr.o
+ chcr-objs :=  chcr_core.o chcr_algo.o
\ No newline at end of file
diff --git a/drivers/crypto/chelsio/chcr_algo.c b/drivers/crypto/chelsio/chcr_algo.c
new file mode 100644
index 000..b070486
--- /dev/null
+++ b/drivers/crypto/chelsio/chcr_algo.c
@@ -0,0 +1,1531 @@
+/*
+ * This file is part of the Chelsio T6 Crypto driver for Linux.
+ *
+ * Copyright (c) 2003-2016 Chelsio Communications, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Written and Maintained by:
+ * Manoj Malviya (manojmalv...@chelsio.com)
+ * Atul Gupta (atul.gu...@chelsio.com)
+ * Jitendra Lulla (jlu...@chelsio.com)
+ * Yeshaswi M R Gowda (yesha...@chelsio.com)
+ * Harsh Jain (ha...@chelsio.com)
+ */
+
+#include 
+#include 
+#include 
+#include

[PATCH 1/3] cxgb4: Add Chelsio LLD support Chelsio Crypto ULD

2016-07-11 Thread Yeshaswi M R Gowda

The Chelsio crypto driver is an Upper Layer Driver (ULD) that makes use
of the Chelsio Lower Layer Driver (LLD - cxgb4). The LLD provides
the basic infrastructure services for the ULD. These services include
queue allocation, deallocation and registration with the LLD. The queues
are used for sending the crypto requests to Chelsio's hardware
and for receiving the responses from the hardware.

This patch enables these services for Chelsio's crypto
driver.
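Two mechanisms added by the diff below can be illustrated in isolation: the ULP_CRYPTO_LOOKASIDE capability bit stored in adapter_params.ulp_crypto_lookaside, and the for_each_cryptorxq() iterator bounded by sge->ncryptoq. This is a simplified sketch with cut-down, hypothetical versions of the structures, not the cxgb4 code; how the driver actually fills ncryptoq and crypto_rxq[] is not shown in this excerpt, and sum_crypto_rxq_ids() is a made-up stand-in for per-queue work.

```c
#include <assert.h>

#define MAX_CRYPTO_QUEUES 32

enum {
	ULP_CRYPTO_LOOKASIDE = 1 << 0,
};

/* Cut-down, hypothetical versions of the cxgb4 structures. */
struct adapter_params {
	unsigned char ulp_crypto_lookaside;   /* crypto lookaside support */
};

struct sge {
	unsigned short ncryptoq;              /* # of active crypto queues */
	unsigned short crypto_rxq[MAX_CRYPTO_QUEUES];
};

/* Same shape as the iterator added in cxgb4.h. */
#define for_each_cryptorxq(sge, i) for (i = 0; i < (sge)->ncryptoq; i++)

int crypto_lookaside_supported(const struct adapter_params *p)
{
	return (p->ulp_crypto_lookaside & ULP_CRYPTO_LOOKASIDE) != 0;
}

/* Hypothetical per-queue work: sum the ingress queue ids of the
 * active crypto Rx queues. */
unsigned int sum_crypto_rxq_ids(const struct sge *s)
{
	unsigned int sum = 0;
	int i;

	assert(s->ncryptoq <= MAX_CRYPTO_QUEUES);
	for_each_cryptorxq(s, i)
		sum += s->crypto_rxq[i];
	return sum;
}
```

Only the first ncryptoq entries are visited, so unused slots of the fixed-size array are ignored.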

Signed-off-by: Yeshaswi M R Gowda 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |   22 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   71 +++-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |   10 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c|   64 
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h |  437 +++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  125 +++
 6 files changed, 721 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index b4fceb9..14b26dd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -346,6 +346,8 @@ struct adapter_params {
 
unsigned int max_ordird_qp;   /* Max read depth per RDMA QP */
unsigned int max_ird_adapter; /* Max read depth per adapter */
+
+   unsigned char ulp_crypto_lookaside; /* crypto lookaside support */
 };
 
 /* State needed to monitor the forward progress of SGE Ingress DMA activities
@@ -435,7 +437,7 @@ enum {
MAX_CTRL_QUEUES = NCHAN,  /* # of control Tx queues */
MAX_RDMA_QUEUES = NCHAN,  /* # of streaming RDMA Rx queues */
MAX_RDMA_CIQS = 32,/* # of  RDMA concentrator IQs */
-
+   MAX_CRYPTO_QUEUES = 32,   /* # of crypto queues */
/* # of streaming iSCSIT Rx queues */
MAX_ISCSIT_QUEUES = MAX_OFLD_QSETS,
 };
@@ -455,7 +457,8 @@ enum {
INGQ_EXTRAS = 2,/* firmware event queue and */
/*   forwarded interrupts */
MAX_INGQ = MAX_ETH_QSETS + MAX_OFLD_QSETS + MAX_RDMA_QUEUES +
-  MAX_RDMA_CIQS + MAX_ISCSIT_QUEUES + INGQ_EXTRAS,
+  MAX_RDMA_CIQS + MAX_ISCSIT_QUEUES + INGQ_EXTRAS +
+  MAX_CRYPTO_QUEUES,
 };
 
 struct adapter;
@@ -509,6 +512,10 @@ enum { /* adapter flags */
FW_OFLD_CONN   = (1 << 9),
 };
 
+enum {
+   ULP_CRYPTO_LOOKASIDE = 1 << 0,
+};
+
 struct rx_sw_desc;
 
 struct sge_fl { /* SGE free-buffer queue state */
@@ -682,10 +689,12 @@ struct sge_ctrl_txq {   /* state for an SGE control Tx queue */
 struct sge {
struct sge_eth_txq ethtxq[MAX_ETH_QSETS];
struct sge_ofld_txq ofldtxq[MAX_OFLD_QSETS];
+   struct sge_ofld_txq cryptotxq[MAX_CRYPTO_QUEUES];
struct sge_ctrl_txq ctrlq[MAX_CTRL_QUEUES];
 
struct sge_eth_rxq ethrxq[MAX_ETH_QSETS];
struct sge_ofld_rxq iscsirxq[MAX_OFLD_QSETS];
+   struct sge_ofld_rxq cryptorxq[MAX_CRYPTO_QUEUES];
struct sge_ofld_rxq iscsitrxq[MAX_ISCSIT_QUEUES];
struct sge_ofld_rxq rdmarxq[MAX_RDMA_QUEUES];
struct sge_ofld_rxq rdmaciq[MAX_RDMA_CIQS];
@@ -699,10 +708,12 @@ struct sge {
u16 ethtxq_rover;   /* Tx queue to clean up next */
u16 iscsiqsets;  /* # of active iSCSI queue sets */
u16 niscsitq;   /* # of available iSCST Rx queues */
+   u16 ncryptoq;   /* # of available lookaside crypto queues */
u16 rdmaqs; /* # of available RDMA Rx queues */
u16 rdmaciqs;   /* # of available RDMA concentrator IQs */
u16 iscsi_rxq[MAX_OFLD_QSETS];
u16 iscsit_rxq[MAX_ISCSIT_QUEUES];
+   u16 crypto_rxq[MAX_CRYPTO_QUEUES];
u16 rdma_rxq[MAX_RDMA_QUEUES];
u16 rdma_ciq[MAX_RDMA_CIQS];
u16 timer_val[SGE_NTIMERS];
@@ -732,6 +743,7 @@ struct sge {
 #define for_each_iscsitrxq(sge, i) for (i = 0; i < (sge)->niscsitq; i++)
 #define for_each_rdmarxq(sge, i) for (i = 0; i < (sge)->rdmaqs; i++)
 #define for_each_rdmaciq(sge, i) for (i = 0; i < (sge)->rdmaciqs; i++)
+#define for_each_cryptorxq(sge, i) for (i = 0; i < (sge)->ncryptoq; i++)
 
 struct l2t_data;
 
@@ -1441,7 +1453,7 @@ int t4_fw_bye(struct adapter *adap, unsigned int mbox);
 int t4_early_init(struct adapter *adap, unsigned int mbox);
 int t4_fw_reset(struct adapter *adap, unsigned int mbox, int reset);
 int t4_fixup_host_params(struct adapter *adap, unsigned int page_size,
- unsigned int cache_line_size);
+unsigned int cache_line_size);
 int t4_fw_initialize(struct adapter *adap, unsigned int mbox);
 int t4_query_params(struct adapter *adap, unsigned int mbox, unsigned int pf,
unsigned int vf, unsigned int nparams, const u32 *params,
@@ -1468,8 +1480,8 @@

[PATCH 0/3] crypto/chcr: Add Chelsio Crypto Driver

2016-07-11 Thread Yeshaswi M R Gowda

Hi Herbert,

This patch series contains 3 patches that add support for Chelsio's
Crypto Hardware.

The patch series has been created against Herbert Xu's tree (crypto-2.6).
It includes patches for the Chelsio Low Level Driver (cxgb4) and adds the new
crypto Upper Layer Driver (chcr) under a new directory, drivers/crypto/chelsio.

The first patch of the series implements the necessary changes in the Chelsio
LLD for queue allocation, deallocation and registration of the ULD.

The second patch implements the Chelsio crypto driver.

The third patch contains the changes to drivers/crypto/Kconfig and
drivers/crypto/Makefile to enable the Chelsio Crypto driver.

We have included all the maintainers of the respective drivers. Kindly
review the changes and provide feedback.

Yeshaswi M R Gowda (3):
  cxgb4: Add Chelsio LLD support Chelsio Crypto ULD
  chcr: Support for Chelsio's Crypto Hardware
  crypto: Added Chelsio Menu to the Kconfig file

 drivers/crypto/Kconfig  |2 +
 drivers/crypto/Makefile |1 +
 drivers/crypto/chelsio/Kconfig  |   19 +
 drivers/crypto/chelsio/Makefile |4 +
 drivers/crypto/chelsio/chcr_algo.c  | 1531 +++
 drivers/crypto/chelsio/chcr_algo.h  |  502 
 drivers/crypto/chelsio/chcr_core.c  |  273 
 drivers/crypto/chelsio/chcr_core.h  |   85 ++
 drivers/crypto/chelsio/chcr_crypto.h|  255 
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h  |   22 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   71 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.h  |   10 +
 drivers/net/ethernet/chelsio/cxgb4/sge.c|   64 +
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h |  437 +++
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h   |  125 ++
 15 files changed, 3393 insertions(+), 8 deletions(-)
 create mode 100644 drivers/crypto/chelsio/Kconfig
 create mode 100644 drivers/crypto/chelsio/Makefile
 create mode 100644 drivers/crypto/chelsio/chcr_algo.c
 create mode 100644 drivers/crypto/chelsio/chcr_algo.h
 create mode 100644 drivers/crypto/chelsio/chcr_core.c
 create mode 100644 drivers/crypto/chelsio/chcr_core.h
 create mode 100644 drivers/crypto/chelsio/chcr_crypto.h

-- 
1.7.10.1

Re: [PATCH RESEND] iwlwifi, Do not implement thermal zone unless ucode is loaded

2016-07-11 Thread Grumbach, Emmanuel

On Mon, 2016-07-11 at 14:19 -0400, Prarit Bhargava wrote:
> 
> On 07/11/2016 02:00 PM, Emmanuel Grumbach wrote:
> > On Mon, Jul 11, 2016 at 6:18 PM, Prarit Bhargava  wrote:
> > > 
> > > Didn't get any feedback or review comments on this patch.  Resending ...
> > > 
> > > P.
> > 
> > This change is obviously completely broken. It simply disables the
> > registration to thermal zone core.
> 
> No it is not broken, and yes, that is exactly what should happen IMO.
> 
> The problem is that the iwlwifi driver implements the thermal zone even when the
> device doesn't support it.

We implement the thermal zone because we do support it, but the problem is
that we need the firmware to be loaded for that. So you can argue that
we should register *later* when the firmware is loaded. But this is
really not helping all that much because the firmware can also be
stopped at any time. So you'd want us to register / unregister the
thermal zone anytime the firmware is loaded / unloaded?
I guess that works, but it seems wrong to me. Usually, registration
should happen only upon INIT, and yes, at that time the firmware is not
ready to provide the information yet.
Maybe returning -EBUSY would help lm-sensors not to get confused?
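The two positions in this thread differ in where the firmware-state check sits: in the get_temp callback (the current code returns -EIO; -EBUSY is suggested above), or at registration time (the patch). A simplified sketch, using a hypothetical cut-down struct dev_state in place of struct iwl_mvm and a plain field in place of the firmware temperature query:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cut-down device state (not struct iwl_mvm). */
struct dev_state {
	bool ucode_loaded;
	bool regular_ucode;    /* stands in for cur_ucode == IWL_UCODE_REGULAR */
	int temp_celsius;      /* stands in for the firmware temperature query */
	bool tzone_registered;
};

static bool fw_ready(const struct dev_state *d)
{
	return d->ucode_loaded && d->regular_ucode;
}

/* Option A (current code): always register, fail reads until the
 * firmware is up. 'not_ready_err' is -EIO today; -EBUSY was suggested. */
int get_temp_checked(const struct dev_state *d, int not_ready_err,
		     int *millideg)
{
	if (!fw_ready(d))
		return not_ready_err;
	*millideg = d->temp_celsius * 1000;
	return 0;
}

/* Option B (the patch): register only if the firmware is already up at
 * registration time; nothing in this sketch re-registers it later. */
void register_tzone_if_ready(struct dev_state *d)
{
	if (!fw_ready(d))
		return;
	d->tzone_registered = true;
}
```

With option B, a device whose firmware finishes loading after registration never gets a thermal zone, which is the objection raised in this thread; with option A, userspace sees an error until the firmware is ready.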

> 
> As can be seen in the current code base, iwl_mvm_tzone_get_temp() will return
> -EIO 100% of the time when the firmware doesn't support reading the
> temperature[1].  In this case a read of sysfs will result in a return of -EIO,
> and this breaks existing userspace programs such as lm-sensors (which by all
> accounts is bad to do).

Right, but I don't understand why userspace is broken because of
that. Unless we register / unregister any time the firmware is loaded, I
don't see any proper way to fix this. And yes, I'd expect userspace
to handle failures in its requests gracefully.

> 
> Note that in my patch I have removed the -EIO return in favor of not registering
> the non-existent thermal zone.  I'm not removing any functionality by changing
> this, nor am I adding functionality.  In both cases the thermal zone is not
> functional, and with my patch userspace continues to work.

You are removing the thermal zone functionality, since even once the
firmware is loaded (which typically happens fairly quickly), the
thermal zone won't work.

> 
> P.
> 
> [1] iwl_mvm_tzone_set_trip_temp() also returns -EIO, so setting and getting of
> the temperature is non-functional.
> 
> 
> > 
> > > 
> > > ---8<---
> > > 
> > > The iwlwifi driver implements a thermal zone and hwmon device, but
> > > returns -EIO on temperature reads if the firmware isn't loaded.  This
> > > results in the error
> > > 
> > > iwlwifi-virtual-0
> > > Adapter: Virtual device
> > > ERROR: Can't get value of subfeature temp1_input: I/O error
> > > temp1:N/A
> > > 
> > > being output when using sensors from the lm-sensors package.  Since
> > > the temperature cannot be read unless the ucode is loaded there is no
> > > reason to add the interface only to have it return an error 100% of
> > > the time.
> > > 
> > > This patch moves the firmware check to iwl_mvm_thermal_zone_register() and
> > > stops the thermal zone from being created if the ucode hasn't been loaded.
> > > 
> > > Signed-off-by: Prarit Bhargava 
> > > Cc: Johannes Berg 
> > > Cc: Emmanuel Grumbach 
> > > Cc: Luca Coelho 
> > > Cc: Intel Linux Wireless 
> > > Cc: Kalle Valo 
> > > Cc: Chaya Rachel Ivgi 
> > > Cc: Sara Sharon 
> > > Cc: linux-wirel...@vger.kernel.org
> > > Cc: netdev@vger.kernel.org
> > > ---
> > >  drivers/net/wireless/intel/iwlwifi/mvm/tt.c |   13 +++--
> > >  1 file changed, 3 insertions(+), 10 deletions(-)
> > > 
> > > diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > > index 58fc7b3c711c..64802659711f 100644
> > > --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> > > @@ -634,11 +634,6 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
> > > 
> > > mutex_lock(>mutex);
> > > 
> > > -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
> > > -   ret = -EIO;
> > > -   goto out;
> > > -   }
> > > -
> > > ret = iwl_mvm_get_temp(mvm, );
> > > if (ret)
> > > goto out;
> > > @@ -684,11 +679,6 @@ static int iwl_mvm_tzone_set_trip_temp(struct thermal_zone_device *device,
> > > 
> > > mutex_lock(>mutex);
> > > 
> > > -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
> > > -   ret = -EIO;
> > > -   goto out;
> > > -   }
> > > -
> > > if

Re: [PATCH RESEND] iwlwifi, Do not implement thermal zone unless ucode is loaded

2016-07-11 Thread Prarit Bhargava



On 07/11/2016 02:00 PM, Emmanuel Grumbach wrote:
> On Mon, Jul 11, 2016 at 6:18 PM, Prarit Bhargava  wrote:
>>
>> Didn't get any feedback or review comments on this patch.  Resending ...
>>
>> P.
> 
> This change is obviously completely broken. It simply disables the
> registration to thermal zone core.

No it is not broken, and yes, that is exactly what should happen IMO.

The problem is that the iwlwifi driver implements the thermal zone even when the
device doesn't support it.

As can be seen in the current code base, iwl_mvm_tzone_get_temp() will return
-EIO 100% of the time when the firmware doesn't support reading the
temperature[1].  In this case a read of sysfs will result in a return of -EIO,
and this breaks existing userspace programs such as lm-sensors (which by all
accounts is bad to do).

Note that in my patch I have removed the -EIO return in favor of not registering
the non-existent thermal zone.  I'm not removing any functionality by changing
this, nor am I adding functionality.  In both cases the thermal zone is not
functional, and with my patch userspace continues to work.

P.

[1] iwl_mvm_tzone_set_trip_temp() also returns -EIO, so setting and getting of
the temperature is non-functional.


> 
>>
>> ---8<---
>>
>> The iwlwifi driver implements a thermal zone and hwmon device, but
>> returns -EIO on temperature reads if the firmware isn't loaded.  This
>> results in the error
>>
>> iwlwifi-virtual-0
>> Adapter: Virtual device
>> ERROR: Can't get value of subfeature temp1_input: I/O error
>> temp1:N/A
>>
>> being output when using sensors from the lm-sensors package.  Since
>> the temperature cannot be read unless the ucode is loaded there is no
>> reason to add the interface only to have it return an error 100% of
>> the time.
>>
>> This patch moves the firmware check to iwl_mvm_thermal_zone_register() and
>> stops the thermal zone from being created if the ucode hasn't been loaded.
>>
>> Signed-off-by: Prarit Bhargava 
>> Cc: Johannes Berg 
>> Cc: Emmanuel Grumbach 
>> Cc: Luca Coelho 
>> Cc: Intel Linux Wireless 
>> Cc: Kalle Valo 
>> Cc: Chaya Rachel Ivgi 
>> Cc: Sara Sharon 
>> Cc: linux-wirel...@vger.kernel.org
>> Cc: netdev@vger.kernel.org
>> ---
>>  drivers/net/wireless/intel/iwlwifi/mvm/tt.c |   13 +++--
>>  1 file changed, 3 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
>> index 58fc7b3c711c..64802659711f 100644
>> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
>> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
>> @@ -634,11 +634,6 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
>>
>> mutex_lock(>mutex);
>>
>> -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
>> -   ret = -EIO;
>> -   goto out;
>> -   }
>> -
>> ret = iwl_mvm_get_temp(mvm, );
>> if (ret)
>> goto out;
>> @@ -684,11 +679,6 @@ static int iwl_mvm_tzone_set_trip_temp(struct thermal_zone_device *device,
>>
>> mutex_lock(>mutex);
>>
>> -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
>> -   ret = -EIO;
>> -   goto out;
>> -   }
>> -
>> if (trip < 0 || trip >= IWL_MAX_DTS_TRIPS) {
>> ret = -EINVAL;
>> goto out;
>> @@ -750,6 +740,9 @@ static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
>> return;
>> }
>>
>> +   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR))
>> +   return;
>> +
>> BUILD_BUG_ON(ARRAY_SIZE(name) >= THERMAL_NAME_LENGTH);
>>
>> mvm->tz_device.tzone = thermal_zone_device_register(name,
>> --
>> 1.7.9.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH RESEND] iwlwifi, Do not implement thermal zone unless ucode is loaded

2016-07-11 Thread Emmanuel Grumbach

On Mon, Jul 11, 2016 at 6:18 PM, Prarit Bhargava  wrote:
>
> Didn't get any feedback or review comments on this patch.  Resending ...
>
> P.

This change is obviously completely broken. It simply disables the
registration to thermal zone core.

>
> ---8<---
>
> The iwlwifi driver implements a thermal zone and hwmon device, but
> returns -EIO on temperature reads if the firmware isn't loaded.  This
> results in the error
>
> iwlwifi-virtual-0
> Adapter: Virtual device
> ERROR: Can't get value of subfeature temp1_input: I/O error
> temp1:N/A
>
> being output when using sensors from the lm-sensors package.  Since
> the temperature cannot be read unless the ucode is loaded there is no
> reason to add the interface only to have it return an error 100% of
> the time.
>
> This patch moves the firmware check to iwl_mvm_thermal_zone_register() and
> stops the thermal zone from being created if the ucode hasn't been loaded.
>
> Signed-off-by: Prarit Bhargava 
> Cc: Johannes Berg 
> Cc: Emmanuel Grumbach 
> Cc: Luca Coelho 
> Cc: Intel Linux Wireless 
> Cc: Kalle Valo 
> Cc: Chaya Rachel Ivgi 
> Cc: Sara Sharon 
> Cc: linux-wirel...@vger.kernel.org
> Cc: netdev@vger.kernel.org
> ---
>  drivers/net/wireless/intel/iwlwifi/mvm/tt.c |   13 +++--
>  1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> index 58fc7b3c711c..64802659711f 100644
> --- a/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tt.c
> @@ -634,11 +634,6 @@ static int iwl_mvm_tzone_get_temp(struct thermal_zone_device *device,
>
> mutex_lock(>mutex);
>
> -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
> -   ret = -EIO;
> -   goto out;
> -   }
> -
> ret = iwl_mvm_get_temp(mvm, );
> if (ret)
> goto out;
> @@ -684,11 +679,6 @@ static int iwl_mvm_tzone_set_trip_temp(struct thermal_zone_device *device,
>
> mutex_lock(>mutex);
>
> -   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR)) {
> -   ret = -EIO;
> -   goto out;
> -   }
> -
> if (trip < 0 || trip >= IWL_MAX_DTS_TRIPS) {
> ret = -EINVAL;
> goto out;
> @@ -750,6 +740,9 @@ static void iwl_mvm_thermal_zone_register(struct iwl_mvm *mvm)
> return;
> }
>
> +   if (!mvm->ucode_loaded || !(mvm->cur_ucode == IWL_UCODE_REGULAR))
> +   return;
> +
> BUILD_BUG_ON(ARRAY_SIZE(name) >= THERMAL_NAME_LENGTH);
>
> mvm->tz_device.tzone = thermal_zone_device_register(name,
> --
> 1.7.9.3
>

Re: [PATCH -next] bpf: make inode code explicitly non-modular

2016-07-11 Thread Daniel Borkmann


On 07/11/2016 06:51 PM, Paul Gortmaker wrote:

The Kconfig currently controlling compilation of this code is:

init/Kconfig:config BPF_SYSCALL
init/Kconfig:   bool "Enable bpf() system call"

...meaning that it currently is not being built as a module by anyone.

Let's remove the couple of traces of modular infrastructure use, so that
when reading the driver there is no doubt it is builtin-only.

Note that MODULE_ALIAS is a no-op for non-modular code.

We replace module.h with init.h since the file does use __init.

Cc: Alexei Starovoitov 
Cc: netdev@vger.kernel.org
Signed-off-by: Paul Gortmaker 


(Patch is for net-next tree then.)

Acked-by: Daniel Borkmann

Re: [PATCH net] i40e: use valid online CPU on q_vector initialization

2016-07-11 Thread Guilherme G. Piccoli


On 06/27/2016 12:16 PM, Guilherme G. Piccoli wrote:

Currently, the q_vector initialization routine sets the affinity_mask
of a q_vector based on the v_idx value: a loop iterates on v_idx,
which is an incremental value, and the cpumask is created based on
this value.

This is a problem in systems with multiple logical CPUs per core (like in
SMT scenarios). If we disable some logical CPUs, by turning SMT off for
example, we will end up with a sparse cpu_online_mask, i.e., only the first
CPU in a core is online, and incremental filling in q_vector cpumask might
lead to multiple offline CPUs being assigned to q_vectors.

Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.

In the general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., (C-1)*N  where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.

This patch changes the way q_vector's affinity_mask is created: it iterates
on v_idx, but consumes the CPU index from the cpu_online_mask instead of
just using the v_idx incremental value.
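The round-robin assignment over a sparse online mask can be simulated in userspace. This sketch uses a 64-bit word as a hypothetical stand-in for cpu_online_mask, and hand-written helpers mask_first()/mask_next() that mimic cpumask_first()/cpumask_next() (returning 64 when no further bit is set, much as the kernel helpers return a value >= nr_cpu_ids). It assumes at least one CPU is online.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SIM 64

/* First set bit strictly after 'cpu', or NR_CPUS_SIM if none
 * (mimics cpumask_next(); pass -1 to get the first set bit). */
int mask_next(int cpu, uint64_t mask)
{
	for (int i = cpu + 1; i < NR_CPUS_SIM; i++)
		if (mask & ((uint64_t)1 << i))
			return i;
	return NR_CPUS_SIM;
}

int mask_first(uint64_t mask)
{
	return mask_next(-1, mask);
}

/* Assign one online CPU to each of 'nvec' vectors, wrapping around to
 * the first online CPU at the end of the mask, as in the patch's loop. */
void assign_cpus(uint64_t online, int nvec, int *out)
{
	int cpu;

	assert(online != 0);
	cpu = mask_first(online);
	for (int v = 0; v < nvec; v++) {
		out[v] = cpu;
		cpu = mask_next(cpu, online);
		if (cpu >= NR_CPUS_SIM)
			cpu = mask_first(online);
	}
}

/* Online mask for 'cores' cores of 'threads' logical CPUs each with SMT
 * off: bits 0, N, 2N, ..., (C-1)*N. Requires cores * threads <= 64. */
uint64_t smt_off_mask(int cores, int threads)
{
	uint64_t m = 0;

	for (int c = 0; c < cores; c++)
		m |= (uint64_t)1 << (c * threads);
	return m;
}
```

For the 8-core, SMT == 8 example, ten vectors land on CPUs 0, 8, ..., 56 and then wrap to 0 and 8. With the old cpumask_set_cpu(v_idx, ...) scheme, vectors 1 through 7 would instead be pinned to CPUs 1 through 7, all of which are offline in that configuration.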

No functional changes were introduced.

Signed-off-by: Guilherme G. Piccoli 
---
  drivers/net/ethernet/intel/i40e/i40e_main.c | 16 +++-
  1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 5ea2200..a89bddd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -7726,10 +7726,11 @@ static int i40e_init_msix(struct i40e_pf *pf)
   * i40e_vsi_alloc_q_vector - Allocate memory for a single interrupt vector
   * @vsi: the VSI being configured
   * @v_idx: index of the vector in the vsi struct
+ * @cpu: cpu to be used on affinity_mask
   *
   * We allocate one q_vector.  If allocation fails we return -ENOMEM.
   **/
-static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
+static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx, int cpu)
  {
struct i40e_q_vector *q_vector;

@@ -7740,7 +7741,8 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)

q_vector->vsi = vsi;
q_vector->v_idx = v_idx;
-   cpumask_set_cpu(v_idx, _vector->affinity_mask);
+   cpumask_set_cpu(cpu, _vector->affinity_mask);
+
if (vsi->netdev)
netif_napi_add(vsi->netdev, _vector->napi,
   i40e_napi_poll, NAPI_POLL_WEIGHT);
@@ -7764,8 +7766,7 @@ static int i40e_vsi_alloc_q_vector(struct i40e_vsi *vsi, int v_idx)
  static int i40e_vsi_alloc_q_vectors(struct i40e_vsi *vsi)
  {
struct i40e_pf *pf = vsi->back;
-   int v_idx, num_q_vectors;
-   int err;
+   int err, v_idx, num_q_vectors, current_cpu;

/* if not MSIX, give the one vector only to the LAN VSI */
if (pf->flags & I40E_FLAG_MSIX_ENABLED)
@@ -7775,10 +7776,15 @@ static int i40e_vsi_alloc_q_vectors(struct i40e_vsi *vsi)
else
return -EINVAL;

+   current_cpu = cpumask_first(cpu_online_mask);
+
for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
-   err = i40e_vsi_alloc_q_vector(vsi, v_idx);
+   err = i40e_vsi_alloc_q_vector(vsi, v_idx, current_cpu);
if (err)
goto err_out;
+   current_cpu = cpumask_next(current_cpu, cpu_online_mask);
+   if (unlikely(current_cpu >= nr_cpu_ids))
+   current_cpu = cpumask_first(cpu_online_mask);
}

return 0;




Ping?

Sorry to bother you; if you think I need to improve something here, let me 
know :)


I'm adding some other Intel people to this thread, based on their patches to i40e.

Thanks in advance,



Guilherme
