date:20160705

[Patch net] ppp: defer netns reference release for ppp channel

2016-07-05 Thread Cong Wang

Matt reported that we have a NULL pointer dereference
in ppp_pernet() from ppp_connect_channel(),
i.e. pch->chan_net is NULL.

This happens because a parallel ppp_unregister_channel()
can run while we are in ppp_connect_channel(), during
which pch->chan_net is set to NULL. Since we need a reference
to net per channel, it makes sense to sync the refcnt
with the lifetime of the channel; therefore we should
release this reference when we destroy the channel.

Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
Reported-by: Matt Bennett 
Cc: Paul Mackerras 
Cc: linux-...@vger.kernel.org
Cc: Guillaume Nault 
Cc: Cyrill Gorcunov 
Signed-off-by: Cong Wang 
---
 drivers/net/ppp/ppp_generic.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 8dedafa..a30ee42 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2601,8 +2601,6 @@ ppp_unregister_channel(struct ppp_channel *chan)
    spin_lock_bh(&pn->all_channels_lock);
    list_del(&pch->list);
    spin_unlock_bh(&pn->all_channels_lock);
-   put_net(pch->chan_net);
-   pch->chan_net = NULL;
 
    pch->file.dead = 1;
    wake_up_interruptible(&pch->file.rwait);
@@ -3136,6 +3134,9 @@ ppp_disconnect_channel(struct channel *pch)
  */
 static void ppp_destroy_channel(struct channel *pch)
 {
+   put_net(pch->chan_net);
+   pch->chan_net = NULL;
+
    atomic_dec(&channel_count);
 
    if (!pch->file.dead) {
-- 
1.8.4.5
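
The idea in the patch above, tying the netns reference to the channel's
lifetime instead of to its registration, can be illustrated with a small
user-space sketch. This is not the kernel code: the demo_* names and the
plain integer refcount are hypothetical stand-ins for struct net,
struct channel, get_net() and put_net(), and the sketch is single-threaded,
so it shows only the ownership rule, not the race itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: only a reference count. */
struct demo_net {
    int refcnt;
};

/* Hypothetical stand-in for struct channel. */
struct demo_channel {
    struct demo_net *chan_net;
    int registered;
};

static struct demo_net *demo_get_net(struct demo_net *net)
{
    net->refcnt++;
    return net;
}

static void demo_put_net(struct demo_net *net)
{
    assert(net->refcnt > 0);
    net->refcnt--;
}

/* Allocation takes the reference... */
static struct demo_channel *demo_channel_alloc(struct demo_net *net)
{
    struct demo_channel *pch = calloc(1, sizeof(*pch));

    if (!pch)
        return NULL;
    pch->chan_net = demo_get_net(net);
    pch->registered = 1;
    return pch;
}

/* ...unregistering leaves chan_net alone, so a concurrent user of the
 * channel never observes a NULL pointer here... */
static void demo_channel_unregister(struct demo_channel *pch)
{
    pch->registered = 0;
}

/* ...and only destruction drops the reference. */
static void demo_channel_destroy(struct demo_channel *pch)
{
    demo_put_net(pch->chan_net);
    pch->chan_net = NULL;
    free(pch);
}
```

With this ordering, chan_net stays valid for as long as the channel object
itself exists, which is the invariant the commit message describes.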

Re: [PATCH net] net: mvneta: set real interrupt per packet for tx_done

2016-07-05 Thread Willy Tarreau

Hi,

On Wed, Jul 06, 2016 at 04:18:58AM +0200, Marcin Wojtas wrote:
> From: Dmitri Epshtein 
> 
> Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
> set the coalescing threshold to a value guaranteeing an interrupt for
> each sent packet, so that buffers can be released with no delay.
> 
> In fact, setting the threshold to '1' was wrong, because it causes an
> interrupt every two packets. According to the documentation, the reason
> is that an interrupt occurs once the sent-buffers counter reaches a
> value higher than the one specified in MVNETA_TXQ_SIZE_REG(q). This
> behavior was confirmed during tests. Also, when testing the SoC working
> as a NAS device, better performance was observed with int-per-packet,
> as it strongly depends on all transmitted packets being released
> immediately.
> 
> This commit enables the NETA controller to work in interrupt-per-sent-packet
> mode by setting the coalescing threshold to 0.

We had a discussion about this in January 2015 and I thought I sent a patch
to change it but I can't find it, so I think it was only proposed to some
users for testing. I also remember that on more recent kernels by then
(>=3.13) we observed a slightly better performance with this value set to
zero.

Acked-by: Willy Tarreau 

Willy

Re: [PATCH stable] sfc: report supported link speeds on SFP connections

2016-07-05 Thread David Miller

From: Bert Kenward 
Date: Tue, 28 Jun 2016 11:11:11 +0100

> commit 1974282ab547df7437276c8d4ec47f3d2300f339
> Author: Bert Kenward 
> Date:   Mon Jun 6 17:29:30 2016 +0100
> 
> sfc: report supported link speeds on SFP connections

[davem@localhost linux]$ git describe 1974282ab547df7437276c8d4ec47f3d2300f339
fatal: 1974282ab547df7437276c8d4ec47f3d2300f339 is not a valid 'commit' object

If you want me to backport something, don't show me a commit from one
of your private trees.

Thanks.

Re: Backport bpf: try harder on clones when writing into skb? [Commit: 3697649ff29e0f647565eed04b27a7779c646a22]

2016-07-05 Thread David Miller

From: Alexei Starovoitov 
Date: Tue, 5 Jul 2016 19:16:51 -0700

> On Tue, Jul 05, 2016 at 08:35:18AM -0700, Sargun Dhillon wrote:
>> Does it make sense to backport
>> 3697649ff29e0f647565eed04b27a7779c646a22 from 4.6 to the longterm
>> (4.4) release? I can trivially recreate the issue represented by
>> 3697649ff29e0f647565eed04b27a7779c646a22 by attaching a eBPF filter
>> that clones an ingress ICMP packet, and then tries to set the
>> destination MAC address.
>> 
>> It seems like the patch applies cleanly to 4.4. I cherry-picked it,
>> and rebuilt my kernel, and at least in the trivial test case passes.
> 
> Makes sense to me, especially since it's lts.
> Daniel, thoughts?

I'll queue this up for 4.4 -stable.

Re: [RFC 0/7] netlink: Add allocation flag to netlink_unicast()

2016-07-05 Thread David Miller

From: Masashi Honma 
Date: Wed,  6 Jul 2016 09:28:29 +0900

> Though currently such a use case was not found, to solve potential
> issue we will add an allocation flag to netlink_unicast().

We don't solve potential issues, we solve real issues.

There is no reason to add the GFP parameter until it is actually
needed.

[PATCH] batman-adv: fix boolreturn.cocci warnings

2016-07-05 Thread kbuild test robot

net/batman-adv/bridge_loop_avoidance.c:1105:9-10: WARNING: return of 0/1 in function 'batadv_bla_process_claim' with return type bool

 Return statements in functions returning bool should use
 true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci

CC: Sven Eckelmann 
Signed-off-by: Fengguang Wu 
---

 bridge_loop_avoidance.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/batman-adv/bridge_loop_avoidance.c
+++ b/net/batman-adv/bridge_loop_avoidance.c
@@ -1102,7 +1102,7 @@ static bool batadv_bla_process_claim(str
 
/* Let the loopdetect frames on the mesh in any case. */
if (bla_dst->type == BATADV_CLAIM_TYPE_LOOPDETECT)
-   return 0;
+   return false;
 
/* check if it is a claim frame. */
ret = batadv_check_claim_group(bat_priv, primary_if, hw_src, hw_dst,

Re: [PATCH] xfrm: fix crash in XFRM_MSG_GETSA netlink handler

2016-07-05 Thread Herbert Xu

On Tue, Jul 05, 2016 at 12:13:03PM -0700, David Miller wrote:
> From: Vegard Nossum 
> Date: Tue,  5 Jul 2016 10:18:08 +0200
> 
> > If we hit any of the error conditions inside xfrm_dump_sa(), then
> > xfrm_state_walk_init() never gets called. However, we still call
> > xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash
> > because the state walk was never initialized properly.
> > 
> > We can fix this by setting cb->args[0] only after we've processed the
> > first element and checking this before calling xfrm_state_walk_done().
> > 
> > Fixes: d3623099d3 ("ipsec: add support of limited SA dump")
> > Cc: Nicolas Dichtel 
> > Cc: Steffen Klassert 
> > Signed-off-by: Vegard Nossum 
> 
> I assume Steffen will pick this up.

I think Steffen said that he is going to be on vacation for two
weeks starting this week.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
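
The fix described in the quoted commit message (mark the dump as started
only once the walk has been initialized, and check that mark in the done
callback) follows a common pattern. The sketch below is a user-space
illustration under stated assumptions: the demo_* types and functions are
hypothetical, and the two comments naming xfrm_state_walk_init() and
xfrm_state_walk_done() only indicate which real calls the placeholders
stand in for.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the walk state kept across dump calls. */
struct demo_walk {
    bool initialized;
    int done_calls;
};

/* Hypothetical stand-in for struct netlink_callback. */
struct demo_cb {
    long args[1];
    struct demo_walk walk;
};

/* Dump callback: any early error returns before the walk is set up. */
static int demo_dump(struct demo_cb *cb, bool fail_early)
{
    if (!cb->args[0]) {
        if (fail_early)
            return -1;                /* walk was never initialized */
        cb->walk.initialized = true;  /* stands in for xfrm_state_walk_init() */
        cb->args[0] = 1;              /* set the marker only after init */
    }
    return 0;
}

/* Done callback: tear down only what was actually set up. */
static void demo_dump_done(struct demo_cb *cb)
{
    if (cb->args[0]) {
        assert(cb->walk.initialized);
        cb->walk.done_calls++;        /* stands in for xfrm_state_walk_done() */
    }
}
```

Because the marker and the initialization are set together, the done
callback can never run teardown on a walk that was not initialized.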

Re: Backport bpf: try harder on clones when writing into skb? [Commit: 3697649ff29e0f647565eed04b27a7779c646a22]

2016-07-05 Thread Alexei Starovoitov

On Tue, Jul 05, 2016 at 08:35:18AM -0700, Sargun Dhillon wrote:
> Does it make sense to backport
> 3697649ff29e0f647565eed04b27a7779c646a22 from 4.6 to the longterm
> (4.4) release? I can trivially recreate the issue represented by
> 3697649ff29e0f647565eed04b27a7779c646a22 by attaching a eBPF filter
> that clones an ingress ICMP packet, and then tries to set the
> destination MAC address.
> 
> It seems like the patch applies cleanly to 4.4. I cherry-picked it,
> and rebuilt my kernel, and at least in the trivial test case passes.

Makes sense to me, especially since it's lts.
Daniel, thoughts?

[PATCH net] net: mvneta: set real interrupt per packet for tx_done

2016-07-05 Thread Marcin Wojtas

From: Dmitri Epshtein 

Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
set the coalescing threshold to a value guaranteeing an interrupt for
each sent packet, so that buffers can be released with no delay.

In fact, setting the threshold to '1' was wrong, because it causes an
interrupt every two packets. According to the documentation, the reason
is that an interrupt occurs once the sent-buffers counter reaches a
value higher than the one specified in MVNETA_TXQ_SIZE_REG(q). This
behavior was confirmed during tests. Also, when testing the SoC working
as a NAS device, better performance was observed with int-per-packet,
as it strongly depends on all transmitted packets being released
immediately.

This commit enables the NETA controller to work in interrupt-per-sent-packet
mode by setting the coalescing threshold to 0.

Signed-off-by: Dmitri Epshtein 
Signed-off-by: Marcin Wojtas 
Cc:  # v3.10+
Fixes: aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay")
---
 drivers/net/ethernet/marvell/mvneta.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index cf04c97..0ad2fa3 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -244,7 +244,7 @@
 /* Various constants */
 
 /* Coalescing */
-#define MVNETA_TXDONE_COAL_PKTS    1
+#define MVNETA_TXDONE_COAL_PKTS    0   /* interrupt per packet */
 #define MVNETA_RX_COAL_PKTS        32
 #define MVNETA_RX_COAL_USEC        100
 
-- 
1.8.3.1
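
The off-by-one described in the commit message can be checked with a toy
model. The sketch below is not driver code: demo_txdone_interrupts() is a
hypothetical helper, and it assumes that the sent-buffers counter restarts
from zero after each interrupt, which the message does not state. It models
only the rule quoted from the documentation, namely that the interrupt
fires once the counter is higher than the configured threshold.

```c
#include <assert.h>

/*
 * Toy model of Tx-done coalescing: count how many interrupts are raised
 * while sending `packets` buffers with the given threshold.
 * Assumption: the counter resets to zero after every interrupt.
 */
static unsigned int demo_txdone_interrupts(unsigned int packets,
                                           unsigned int threshold)
{
    unsigned int sent = 0;
    unsigned int irqs = 0;

    for (unsigned int i = 0; i < packets; i++) {
        sent++;
        if (sent > threshold) {   /* "higher than", not "equal to" */
            irqs++;
            sent = 0;
        }
    }

    assert(irqs <= packets);      /* at most one interrupt per packet */
    return irqs;
}
```

Under these assumptions a threshold of 1 yields one interrupt per two
packets, while a threshold of 0 yields one interrupt per packet, matching
the behavior the patch describes.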

Re: [PATCH net-next 00/10] NCSI Support

2016-07-05 Thread Benjamin Herrenschmidt

On Tue, 2016-07-05 at 19:07 -0700, Alexei Starovoitov wrote:
> 
> Great! Thanks for clarifying.
> So then future netlink api is mandatory to drive this kernel patches?
> How one can use this set without it?

The netlink API is to tweak things, it works reasonably well
autonomously without it.

> What is the main reason for this infra to be in the kernel instead of
> userspace raw socket? Some interaction with the driver, right?
> but it's not obvious from the patches.

There are a few reasons. One it means we can use kernel level
autoconfiguration like DHCP and NFS root which are quite handy when
developing BMC stacks :-)

Another one is that we haven't completely given up on reflecting the
state of the remote NC-SI link into the "carrier status" of the local
interface.

We can't yet do it because the link monitor would stop the driver
queues, but we could possibly invent a flag we set on the device that
prevents this from happening and causes the queues to remain up even
when the link appears down.

This will be useful as some BMCs have multiple NICs that can all do
NC-SI and thus we could have automatic fail over.

Cheers,
Ben.

Re: [PATCH net-next 00/10] NCSI Support

2016-07-05 Thread Alexei Starovoitov

On Wed, Jul 06, 2016 at 07:42:39AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2016-07-05 at 10:44 -0700, Alexei Starovoitov wrote:
> > 
> 
>  .../...
> 
> > > > The design for the patchset is highlighted as below:
> > > > 
> > > > >    * The NCSI interface is abstracted with "struct ncsi_dev". It's registered
> > > > >  when net_device is created, started to work by calling ncsi_start_dev()
> > > > >  when net_device is opened (ndo_open()). For the first time, NCSI packets
> > > > >  are sent and received to/from the far end (host in above figure) to probe
> > > > >  available NCSI packages and channels. After that, one channel is chosen as
> > > > >  active one to provide service.
> > > > >    * The NCSI stack is driven by workqueue and state machine internally.
> > > > >    * AEN (Asynchronous Event Notification) might be received from the far end
> > > > >  (host). The currently active NCSI channel fails over to another available
> > > > >  one if possible. Otherwise, the NCSI channel is out of service.
> > > > >    * NCSI stack should be configurable through netlink or another mechanism,
> > > > >  but it's not implemented in this patchset. It's something TBD.
> > 
> > Gavin,
> > what configurations do you have in mind?
> > For ncsi itself or to control the nic on the host?
> > This set of patches is for BMC side, right?
> > What needs to be done on the host?
> 
> I'll respond for Gavin since I'm awake first ;-)
> 
> We use that stack today on OpenBMC on some OpenPOWER machines.
> 
> The configuration is thus for the above stack to run on the BMC in
> order to control the host NIC.
> 
> NC-SI capable host NICs operate autonomously, so there is nothing to be
> done on the host OS itself, at least not with the BCM NICs that we use
> today, but of course the host NIC firmware needs to have the other side
> of the stack.

Great! Thanks for clarifying.
So then future netlink api is mandatory to drive this kernel patches?
How one can use this set without it?
What is the main reason for this infra to be in the kernel instead of
userspace raw socket? Some interaction with the driver, right?
but it's not obvious from the patches.

Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC

2016-07-05 Thread Cong Wang

On Tue, Jul 5, 2016 at 5:05 PM, Matt Bennett
 wrote:
> On 07/06/2016 08:37 AM, Cong Wang wrote:
>> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang  wrote:
>>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>>>  wrote:
>>>> Using printk I have confirmed that ppp_pernet() is called from
>>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>>
>>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>>> take reference on channels netns").
>>>
>>> We have some race condition here, where a parallel ppp_unregister_channel()
>>> could happen while we are in ppp_connect_channel().
>>>
>>> We need some synchronization for them. I am not sure what is the right lock
>>> here since ppp locking looks crazy.
>>
>> Matt, could you try if the attached patch helps?
>>
>> Thanks!
>>
> I have given that patch a good amount of testing and the BUG_ON() no
> longer is hit. Whether that is the best fix or not I am unsure?

At least my patch makes the net refcnt sync with pch life-time:
we grab a net refcnt when we allocate a pch, and release it when
we are going to destroy a pch. Makes sense to you?

>
> Either way, the following comment in ppp_unregister_channel() seems
> incorrect to me and should probably be deleted unless it is fixed?
>
> /*
>   * This ensures that we have returned from any calls into the
>   * the channel's start_xmit or ioctl routine before we proceed.
>   */

This comment is pretty old, I think it refers to the pch->ppp
check in ppp_connect_channel().

Thanks.

Re: [PATCH net-next v2 3/4] net: dsa: Suffix function manipulating device_node with _dn

2016-07-05 Thread Vivien Didelot

Hi,

Florian Fainelli  writes:

> On 07/05/2016 03:36 PM, Andrew Lunn wrote:
>> On Tue, Jul 05, 2016 at 03:07:12PM -0700, Florian Fainelli wrote:
>>> Make it clear that these functions take a device_node structure pointer
>> 
>> Hi Florian
>> 
>> Didn't we agree that we would only support a single device via a C
>> coded platform data structure?
>
> That is true for the devices I know about, both in and out of tree,
> however, while discussing offline with Vivien it seemed like there was a
> potential need for having a x86-based platform which could need that,
> Vivien do you think this platform could be in-tree one day (if not already)?

This customer platform is not mainlined yet and I cannot say today if it
will be. However it is likely to get a new revision soon with 3
interconnected 6352 hanging off the x86 Baytrail.

DT on x86 is possible, but not straight-forward, and thanks to Florian's
work the pdata support is almost there for free.

>> All the functions you are renaming will never be called in that
>> case. So I think they can retain their names. You have no need to add
>> non-device-node equivalents.
>> 
>> So lets drop this patch.

The patch is not big and I think it doesn't hurt to add that explicit
suffix, I'd keep the patch in the series.

Thanks,

Vivien

Re: Network hang after c3f1010b30f7fc611139cfb702a8685741aa6827 with CIPSO & Smack

2016-07-05 Thread Casey Schaufler

On 7/5/2016 5:49 PM, David Ahern wrote:
> On 7/5/16 5:38 PM, Casey Schaufler wrote:
>> I have encountered a system hang with my Smack
>> networking tests that bisects to the change below.
>> I can't say that I have any idea why the change
>> would impact the Smack processing, but there appears
>> to be some serious packet processing going on. The
>> Smack code is using CIPSO on the loopback interface.
>> The test is supposed to verify that labels can be
>> set on the packets using CIPSO. Unlabeled packets
>> do not appear to be impacted. I do not know if SELinux
>> is affected, and if not, why not. Smack and SELinux
>> use CIPSO differently.
>
> What are the commands to repeat the test?
>
There is a tar file attached with the tests.
Put the etc/smack/user file into /etc/smack/user.
In the tools-2012 directory run make to build
the tools. The test in question is called
testnetworking.sh and needs to be run as root.
You will need to configure Smack in the kernel,
of course.



smack-tests.tar
Description: Binary data

Where is a git tree for submitting patch

2016-07-05 Thread Masashi Honma

I made a patch for netdev, but the kbuild test failed.

The kbuild failed on the file "net/netlabel/netlabel_calipso.c".

I looked at the two trees below, but neither contains that file.

http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/tree/net/netlabel
http://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/net/netlabel

Which tree is the correct one?

Regards,
Masashi Honma.

linux-next: manual merge of the net-next tree with the net tree

2016-07-05 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  drivers/net/usb/r8152.c

between commit:

  2609af19362d ("r8152: fix runtime function for RTL8152")

from the net tree and commit:

  a028a9e003f2 ("r8152: move the settings of PHY to a work queue")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/net/usb/r8152.c
index 0da72d39b4f9,24d367280ecf..
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@@ -624,7 -624,7 +624,8 @@@ struct r8152 
                  int (*eee_get)(struct r8152 *, struct ethtool_eee *);
                  int (*eee_set)(struct r8152 *, struct ethtool_eee *);
                  bool (*in_nway)(struct r8152 *);
 +                void (*autosuspend_en)(struct r8152 *tp, bool enable);
+                 void (*hw_phy_cfg)(struct r8152 *);
          } rtl_ops;
  
          int intr_interval;
@@@ -4156,7 -4157,7 +4176,8 @@@ static int rtl_ops_init(struct r8152 *t
                  ops->eee_get            = r8152_get_eee;
                  ops->eee_set            = r8152_set_eee;
                  ops->in_nway            = rtl8152_in_nway;
 +                ops->autosuspend_en     = rtl_runtime_suspend_enable;
+                 ops->hw_phy_cfg         = r8152b_hw_phy_cfg;
                  break;
  
          case RTL_VER_03:
@@@ -4172,7 -4173,7 +4193,8 @@@
                  ops->eee_get            = r8153_get_eee;
                  ops->eee_set            = r8153_set_eee;
                  ops->in_nway            = rtl8153_in_nway;
 +                ops->autosuspend_en     = rtl8153_runtime_enable;
+                 ops->hw_phy_cfg         = r8153_hw_phy_cfg;
                  break;
  
          default:

Re: Network hang after c3f1010b30f7fc611139cfb702a8685741aa6827 with CIPSO & Smack

2016-07-05 Thread David Ahern


On 7/5/16 5:38 PM, Casey Schaufler wrote:
> I have encountered a system hang with my Smack
> networking tests that bisects to the change below.
> I can't say that I have any idea why the change
> would impact the Smack processing, but there appears
> to be some serious packet processing going on. The
> Smack code is using CIPSO on the loopback interface.
> The test is supposed to verify that labels can be
> set on the packets using CIPSO. Unlabeled packets
> do not appear to be impacted. I do not know if SELinux
> is affected, and if not, why not. Smack and SELinux
> use CIPSO differently.

What are the commands to repeat the test?

Network hang after c3f1010b30f7fc611139cfb702a8685741aa6827 with CIPSO & Smack

2016-07-05 Thread Casey Schaufler

I have encountered a system hang with my Smack
networking tests that bisects to the change below.
I can't say that I have any idea why the change
would impact the Smack processing, but there appears
to be some serious packet processing going on. The
Smack code is using CIPSO on the loopback interface.
The test is supposed to verify that labels can be
set on the packets using CIPSO. Unlabeled packets
do not appear to be impacted. I do not know if SELinux
is affected, and if not, why not. Smack and SELinux
use CIPSO differently.


c3f1010b30f7fc611139cfb702a8685741aa6827

commit c3f1010b30f7fc611139cfb702a8685741aa6827
Merge: ca4aa97 0b922b7
Author: David S. Miller 
Date:   Wed May 11 19:31:40 2016 -0400

Merge branch 'vrf-pktinfo'

David Ahern says:


net: vrf: Fixup PKTINFO to return enslaved device index

Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
move the packet intercept from an rx handler that is invoked by
__netif_receive_skb_core to the ipv4 and ipv6 receive processing.

IPv6 already saves the skb_iif to the control buffer in ipv6_rcv. Since
the skb->dev has not been switched the cb has the enslaved device. Make
the same happen for IPv4 by adding the skb_iif to inet_skb_parm and set
it in ipv4 code after clearing the skb control buffer similar to IPv6.
From there the pktinfo can just pull it from cb with the PKTINFO_SKB_CB
cast.


Signed-off-by: David S. Miller

[RFC 6/7] genetlink: Add allocation flag to genlmsg_unicast()

2016-07-05 Thread Masashi Honma

Signed-off-by: Masashi Honma 
---
 drivers/net/gtp.c | 3 ++-
 drivers/net/team/team.c   | 5 +++--
 drivers/net/wireless/mac80211_hwsim.c | 2 +-
 fs/dlm/netlink.c  | 2 +-
 include/net/genetlink.h   | 8 +---
 kernel/taskstats.c| 2 +-
 net/hsr/hsr_netlink.c | 6 --
 net/l2tp/l2tp_netlink.c   | 8 +---
 net/openvswitch/datapath.c| 3 ++-
 net/tipc/netlink_compat.c | 2 +-
 net/wireless/nl80211.c| 9 +
 11 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 97e0cbc..0156abb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -1210,7 +1210,8 @@ static int gtp_genl_get_pdp(struct sk_buff *skb, struct genl_info *info)
goto err_unlock_free;
 
rcu_read_unlock();
-   return genlmsg_unicast(genl_info_net(info), skb2, info->snd_portid);
+   return genlmsg_unicast(genl_info_net(info), skb2, info->snd_portid,
+  GFP_ATOMIC);
 
 err_unlock_free:
kfree_skb(skb2);
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index f9eebea..3d40b55 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2194,7 +2194,8 @@ static int team_nl_cmd_noop(struct sk_buff *skb, struct genl_info *info)
 
genlmsg_end(msg, hdr);
 
-   return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid);
+   return genlmsg_unicast(genl_info_net(info), msg, info->snd_portid,
+  GFP_KERNEL);
 
 err_msg_put:
nlmsg_free(msg);
@@ -2240,7 +2241,7 @@ typedef int team_nl_send_func_t(struct sk_buff *skb,
 
 static int team_nl_send_unicast(struct sk_buff *skb, struct team *team, u32 portid)
 {
-   return genlmsg_unicast(dev_net(team->dev), skb, portid);
+   return genlmsg_unicast(dev_net(team->dev), skb, portid, gfp_any());
 }
 
 static int team_nl_fill_one_option_get(struct sk_buff *skb, struct team *team,
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 382109bb..5c7bf77 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -1008,7 +1008,7 @@ static int hwsim_unicast_netgroup(struct mac80211_hwsim_data *data,
rcu_read_lock();
for_each_net_rcu(net) {
if (data->netgroup == hwsim_net_get_netgroup(net)) {
-   res = genlmsg_unicast(net, skb, portid);
+   res = genlmsg_unicast(net, skb, portid, GFP_ATOMIC);
found = true;
break;
}
diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c
index 1e6e227..c498616 100644
--- a/fs/dlm/netlink.c
+++ b/fs/dlm/netlink.c
@@ -59,7 +59,7 @@ static int send_data(struct sk_buff *skb)
 
genlmsg_end(skb, data);
 
-   return genlmsg_unicast(&init_net, skb, listener_nlportid);
+   return genlmsg_unicast(&init_net, skb, listener_nlportid, GFP_NOFS);
 }
 
 static int user_cmd(struct sk_buff *skb, struct genl_info *info)
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index b107a35..5f0f2ff 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -331,10 +331,12 @@ int genlmsg_multicast_allns(struct genl_family *family,
  * genlmsg_unicast - unicast a netlink message
  * @skb: netlink message as socket buffer
  * @portid: netlink portid of the destination socket
+ * @flags: allocation flags
  */
-static inline int genlmsg_unicast(struct net *net, struct sk_buff *skb, u32 portid)
+static inline int genlmsg_unicast(struct net *net, struct sk_buff *skb,
+ u32 portid, gfp_t flags)
 {
-   return nlmsg_unicast(net->genl_sock, skb, portid, 0);
+   return nlmsg_unicast(net->genl_sock, skb, portid, flags);
 }
 
 /**
@@ -344,7 +346,7 @@ static inline int genlmsg_unicast(struct net *net, struct sk_buff *skb, u32 port
  */
 static inline int genlmsg_reply(struct sk_buff *skb, struct genl_info *info)
 {
-   return genlmsg_unicast(genl_info_net(info), skb, info->snd_portid);
+   return genlmsg_unicast(genl_info_net(info), skb, info->snd_portid, 0);
 }
 
 /**
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index b3f05ee..ecfcaff 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -140,7 +140,7 @@ static void send_cpu_listeners(struct sk_buff *skb,
if (!skb_next)
break;
}
-   rc = genlmsg_unicast(&init_net, skb_cur, s->pid);
+   rc = genlmsg_unicast(&init_net, skb_cur, s->pid, GFP_KERNEL);
if (rc == -ECONNREFUSED) {
s->valid = 0;
delcount++;
diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c
index d4d1617..dcc674f 100644
--- a/net/hsr/hsr_netlink.c
+++ b/net/hsr/hsr_netlink.c
@@

[RFC 7/7] genetlink: Add allocation flag to genlmsg_reply()

2016-07-05 Thread Masashi Honma

Add an allocation flag to genlmsg_reply() and remove the temporary
netlink_unicast() functionality that was kept for stepwise modification.

Signed-off-by: Masashi Honma 
---
 drivers/block/drbd/drbd_nl.c  |  2 +-
 drivers/net/wireless/mac80211_hwsim.c |  2 +-
 include/net/genetlink.h   |  7 +--
 kernel/taskstats.c|  2 +-
 net/core/devlink.c| 12 ++--
 net/ieee802154/ieee802154.h   |  3 ++-
 net/ieee802154/netlink.c  |  5 +++--
 net/ieee802154/nl-mac.c   |  4 ++--
 net/ieee802154/nl-phy.c   |  6 +++---
 net/ieee802154/nl802154.c |  4 ++--
 net/ipv4/fou.c|  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/ipv6/ila/ila_xlat.c   |  2 +-
 net/irda/irnetlink.c  |  2 +-
 net/netfilter/ipvs/ip_vs_ctl.c|  2 +-
 net/netlabel/netlabel_cipso_v4.c  |  2 +-
 net/netlabel/netlabel_mgmt.c  |  4 ++--
 net/netlabel/netlabel_unlabeled.c |  2 +-
 net/netlink/af_netlink.c  |  2 +-
 net/netlink/genetlink.c   |  2 +-
 net/nfc/netlink.c |  6 +++---
 net/openvswitch/datapath.c|  6 +++---
 net/tipc/bearer.c |  4 ++--
 net/tipc/node.c   |  2 +-
 net/wireless/nl80211.c| 34 +-
 25 files changed, 63 insertions(+), 58 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 0bac9c8..3162608 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -100,7 +100,7 @@ static char *drbd_m_holder = "Hands off! this is DRBD's meta data device.";
 static void drbd_adm_send_reply(struct sk_buff *skb, struct genl_info *info)
 {
    genlmsg_end(skb, genlmsg_data(nlmsg_data(nlmsg_hdr(skb))));
-   if (genlmsg_reply(skb, info))
+   if (genlmsg_reply(skb, info, GFP_KERNEL))
pr_err("error sending genl reply\n");
 }
 
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 5c7bf77..5319cd1 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -3160,7 +3160,7 @@ static int hwsim_get_radio_nl(struct sk_buff *msg, struct genl_info *info)
goto out_err;
}
 
-   genlmsg_reply(skb, info);
+   genlmsg_reply(skb, info, GFP_KERNEL);
break;
}
 
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 5f0f2ff..99c9c39 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -343,10 +343,13 @@ static inline int genlmsg_unicast(struct net *net, struct sk_buff *skb,
  * genlmsg_reply - reply to a request
  * @skb: netlink message to be sent back
  * @info: receiver information
+ * @flags: allocation flags
  */
-static inline int genlmsg_reply(struct sk_buff *skb, struct genl_info *info)
+static inline int genlmsg_reply(struct sk_buff *skb, struct genl_info *info,
+   gfp_t flags)
 {
-   return genlmsg_unicast(genl_info_net(info), skb, info->snd_portid, 0);
+   return genlmsg_unicast(genl_info_net(info), skb, info->snd_portid,
+  flags);
 }
 
 /**
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index ecfcaff..894d0da 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -114,7 +114,7 @@ static int send_reply(struct sk_buff *skb, struct genl_info *info)
 
genlmsg_end(skb, reply);
 
-   return genlmsg_reply(skb, info);
+   return genlmsg_reply(skb, info, GFP_KERNEL);
 }
 
 /*
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 933e8d4..61a1c8a 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -501,7 +501,7 @@ static int devlink_nl_cmd_get_doit(struct sk_buff *skb, struct genl_info *info)
return err;
}
 
-   return genlmsg_reply(msg, info);
+   return genlmsg_reply(msg, info, GFP_KERNEL);
 }
 
 static int devlink_nl_cmd_get_dumpit(struct sk_buff *msg,
@@ -554,7 +554,7 @@ static int devlink_nl_cmd_port_get_doit(struct sk_buff *skb,
return err;
}
 
-   return genlmsg_reply(msg, info);
+   return genlmsg_reply(msg, info, GFP_KERNEL);
 }
 
 static int devlink_nl_cmd_port_get_dumpit(struct sk_buff *msg,
@@ -736,7 +736,7 @@ static int devlink_nl_cmd_sb_get_doit(struct sk_buff *skb,
return err;
}
 
-   return genlmsg_reply(msg, info);
+   return genlmsg_reply(msg, info, GFP_KERNEL);
 }
 
 static int devlink_nl_cmd_sb_get_dumpit(struct sk_buff *msg,
@@ -843,7 +843,7 @@ static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb,
return err;
}
 
-   return genlmsg_reply(msg, info);
+   return genlmsg_reply(msg, info, GFP_KERNEL);
 }
 
 static int __sb_pool_get_dumpit(struct sk_buff *msg, int start, int

[RFC 2/7] netfilter: Add allocation flag to nfnetlink_unicast()

2016-07-05 Thread Masashi Honma

Signed-off-by: Masashi Honma 
---
 include/linux/netfilter/nfnetlink.h | 2 +-
 net/netfilter/nfnetlink.c   | 4 ++--
 net/netfilter/nfnetlink_log.c   | 4 ++--
 net/netfilter/nfnetlink_queue.c | 3 ++-
 4 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/netfilter/nfnetlink.h b/include/linux/netfilter/nfnetlink.h
index 1d82dd5..a1c7808 100644
--- a/include/linux/netfilter/nfnetlink.h
+++ b/include/linux/netfilter/nfnetlink.h
@@ -38,7 +38,7 @@ int nfnetlink_send(struct sk_buff *skb, struct net *net, u32 portid,
   unsigned int group, int echo, gfp_t flags);
 int nfnetlink_set_err(struct net *net, u32 portid, u32 group, int error);
 int nfnetlink_unicast(struct sk_buff *skb, struct net *net, u32 portid,
- int flags);
+ int flags, gfp_t allocation);
 
 void nfnl_lock(__u8 subsys_id);
 void nfnl_unlock(__u8 subsys_id);
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
index f6193e7..b0910c7 100644
--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -141,9 +141,9 @@ int nfnetlink_set_err(struct net *net, u32 portid, u32 group, int error)
 EXPORT_SYMBOL_GPL(nfnetlink_set_err);
 
 int nfnetlink_unicast(struct sk_buff *skb, struct net *net, u32 portid,
- int flags)
+ int flags, gfp_t allocation)
 {
-   return netlink_unicast(net->nfnl, skb, portid, flags, 0);
+   return netlink_unicast(net->nfnl, skb, portid, flags, allocation);
 }
 EXPORT_SYMBOL_GPL(nfnetlink_unicast);
 
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 11f81c8..c834306 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -357,8 +357,8 @@ __nfulnl_send(struct nfulnl_instance *inst)
goto out;
}
}
-   nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid,
- MSG_DONTWAIT);
+   nfnetlink_unicast(inst->skb, inst->net, inst->peer_portid, MSG_DONTWAIT,
+ gfp_any());
 out:
inst->qlen = 0;
inst->skb = NULL;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 5d36a09..8d7b6ff 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -638,7 +638,8 @@ __nfqnl_enqueue_packet(struct net *net, struct 
nfqnl_instance *queue,
*packet_id_ptr = htonl(entry->id);
 
/* nfnetlink_unicast will either free the nskb or add it to a socket */
-   err = nfnetlink_unicast(nskb, net, queue->peer_portid, MSG_DONTWAIT);
+   err = nfnetlink_unicast(nskb, net, queue->peer_portid, MSG_DONTWAIT,
+   GFP_ATOMIC);
if (err < 0) {
if (queue->flags & NFQA_CFG_F_FAIL_OPEN) {
failopen = 1;
-- 
2.7.4

[RFC 4/7] infiniband: Add allocation flag to ibnl_unicast()

2016-07-05 Thread Masashi Honma

Signed-off-by: Masashi Honma 
---
 drivers/infiniband/core/iwpm_msg.c  | 6 +++---
 drivers/infiniband/core/iwpm_util.c | 5 +++--
 drivers/infiniband/core/iwpm_util.h | 1 +
 drivers/infiniband/core/netlink.c   | 4 ++--
 include/rdma/rdma_netlink.h | 3 ++-
 5 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/iwpm_msg.c 
b/drivers/infiniband/core/iwpm_msg.c
index 1c41b95..4307eab 100644
--- a/drivers/infiniband/core/iwpm_msg.c
+++ b/drivers/infiniband/core/iwpm_msg.c
@@ -174,7 +174,7 @@ int iwpm_add_mapping(struct iwpm_sa_data *pm_msg, u8 
nl_client)
goto add_mapping_error;
nlmsg_request->req_buffer = pm_msg;
 
-   ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
+   ret = ibnl_unicast(skb, nlh, iwpm_user_pid, GFP_ATOMIC);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
iwpm_user_pid = IWPM_PID_UNDEFINED;
@@ -251,7 +251,7 @@ int iwpm_add_and_query_mapping(struct iwpm_sa_data *pm_msg, 
u8 nl_client)
goto query_mapping_error;
nlmsg_request->req_buffer = pm_msg;
 
-   ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
+   ret = ibnl_unicast(skb, nlh, iwpm_user_pid, GFP_ATOMIC);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
err_str = "Unable to send a nlmsg";
@@ -312,7 +312,7 @@ int iwpm_remove_mapping(struct sockaddr_storage 
*local_addr, u8 nl_client)
if (ret)
goto remove_mapping_error;
 
-   ret = ibnl_unicast(skb, nlh, iwpm_user_pid);
+   ret = ibnl_unicast(skb, nlh, iwpm_user_pid, GFP_ATOMIC);
if (ret) {
skb = NULL; /* skb is freed in the netlink send-op handling */
iwpm_user_pid = IWPM_PID_UNDEFINED;
diff --git a/drivers/infiniband/core/iwpm_util.c 
b/drivers/infiniband/core/iwpm_util.c
index b65e06c..6dcbb2d 100644
--- a/drivers/infiniband/core/iwpm_util.c
+++ b/drivers/infiniband/core/iwpm_util.c
@@ -609,7 +609,7 @@ static int send_mapinfo_num(u32 mapping_num, u8 nl_client, 
int iwpm_pid)
_num, IWPM_NLA_MAPINFO_SEND_NUM);
if (ret)
goto mapinfo_num_error;
-   ret = ibnl_unicast(skb, nlh, iwpm_pid);
+   ret = ibnl_unicast(skb, nlh, iwpm_pid, GFP_ATOMIC);
if (ret) {
skb = NULL;
err_str = "Unable to send a nlmsg";
@@ -638,7 +638,8 @@ static int send_nlmsg_done(struct sk_buff *skb, u8 
nl_client, int iwpm_pid)
return -ENOMEM;
}
nlh->nlmsg_type = NLMSG_DONE;
-   ret = ibnl_unicast(skb, (struct nlmsghdr *)skb->data, iwpm_pid);
+   ret = ibnl_unicast(skb, (struct nlmsghdr *)skb->data, iwpm_pid,
+  GFP_ATOMIC);
if (ret)
pr_warn("%s Unable to send a nlmsg\n", __func__);
return ret;
diff --git a/drivers/infiniband/core/iwpm_util.h 
b/drivers/infiniband/core/iwpm_util.h
index af1fc14..0ced7f4 100644
--- a/drivers/infiniband/core/iwpm_util.h
+++ b/drivers/infiniband/core/iwpm_util.h
@@ -46,6 +46,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/drivers/infiniband/core/netlink.c 
b/drivers/infiniband/core/netlink.c
index 09037a9..1451238 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -227,9 +227,9 @@ static void ibnl_rcv(struct sk_buff *skb)
 }
 
 int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
-   __u32 pid)
+   __u32 pid, gfp_t flags)
 {
-   return nlmsg_unicast(nls, skb, pid, gfp_any());
+   return nlmsg_unicast(nls, skb, pid, flags);
 }
 EXPORT_SYMBOL(ibnl_unicast);
 
diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
index 5852661..0bb3010 100644
--- a/include/rdma/rdma_netlink.h
+++ b/include/rdma/rdma_netlink.h
@@ -61,10 +61,11 @@ int ibnl_put_attr(struct sk_buff *skb, struct nlmsghdr *nlh,
  * @skb: The netlink skb
  * @nlh: Header of the netlink message to send
  * @pid: Userspace netlink process ID
+ * @flags: allocation flags
  * Returns 0 on success or a negative error code.
  */
 int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
-   __u32 pid);
+   __u32 pid, gfp_t flags);
 
 /**
  * Send the supplied skb to a netlink group.
-- 
2.7.4

[RFC 3/7] netlink: Add allocation flag to nlmsg_unicast()

2016-07-05 Thread Masashi Honma

Signed-off-by: Masashi Honma 
---
 crypto/crypto_user.c  |  3 ++-
 drivers/infiniband/core/netlink.c |  2 +-
 include/net/genetlink.h   |  2 +-
 include/net/netlink.h |  6 --
 net/core/rtnetlink.c  |  2 +-
 net/netfilter/nf_tables_api.c | 10 +-
 net/netlink/af_netlink.c  |  2 +-
 net/xfrm/xfrm_user.c  | 15 +--
 8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 7097a33..f379b74 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -249,7 +249,8 @@ drop_alg:
if (err)
return err;
 
-   return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
+   return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid,
+GFP_ATOMIC);
 }
 
 static int crypto_dump_report(struct sk_buff *skb, struct netlink_callback *cb)
diff --git a/drivers/infiniband/core/netlink.c 
b/drivers/infiniband/core/netlink.c
index 9b8c20c..09037a9 100644
--- a/drivers/infiniband/core/netlink.c
+++ b/drivers/infiniband/core/netlink.c
@@ -229,7 +229,7 @@ static void ibnl_rcv(struct sk_buff *skb)
 int ibnl_unicast(struct sk_buff *skb, struct nlmsghdr *nlh,
__u32 pid)
 {
-   return nlmsg_unicast(nls, skb, pid);
+   return nlmsg_unicast(nls, skb, pid, gfp_any());
 }
 EXPORT_SYMBOL(ibnl_unicast);
 
diff --git a/include/net/genetlink.h b/include/net/genetlink.h
index 8d4608c..b107a35 100644
--- a/include/net/genetlink.h
+++ b/include/net/genetlink.h
@@ -334,7 +334,7 @@ int genlmsg_multicast_allns(struct genl_family *family,
  */
 static inline int genlmsg_unicast(struct net *net, struct sk_buff *skb, u32 
portid)
 {
-   return nlmsg_unicast(net->genl_sock, skb, portid);
+   return nlmsg_unicast(net->genl_sock, skb, portid, 0);
 }
 
 /**
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 898e449..df5b533 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -585,12 +585,14 @@ static inline int nlmsg_multicast(struct sock *sk, struct 
sk_buff *skb,
  * @sk: netlink socket to spread message to
  * @skb: netlink message as socket buffer
  * @portid: netlink portid of the destination socket
+ * @flags: allocation flags
  */
-static inline int nlmsg_unicast(struct sock *sk, struct sk_buff *skb, u32 
portid)
+static inline int nlmsg_unicast(struct sock *sk, struct sk_buff *skb,
+   u32 portid, gfp_t flags)
 {
int err;
 
-   err = netlink_unicast(sk, skb, portid, MSG_DONTWAIT, 0);
+   err = netlink_unicast(sk, skb, portid, MSG_DONTWAIT, flags);
if (err > 0)
err = 0;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3433633f..7f7927f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -657,7 +657,7 @@ int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 
pid)
 {
struct sock *rtnl = net->rtnl;
 
-   return nlmsg_unicast(rtnl, skb, pid);
+   return nlmsg_unicast(rtnl, skb, pid, gfp_any());
 }
 EXPORT_SYMBOL(rtnl_unicast);
 
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 2c88187..4afb751 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -581,7 +581,7 @@ static int nf_tables_gettable(struct net *net, struct sock 
*nlsk,
if (err < 0)
goto err;
 
-   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
+   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid, GFP_KERNEL);
 
 err:
kfree_skb(skb2);
@@ -1144,7 +1144,7 @@ static int nf_tables_getchain(struct net *net, struct 
sock *nlsk,
if (err < 0)
goto err;
 
-   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
+   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid, GFP_KERNEL);
 
 err:
kfree_skb(skb2);
@@ -1976,7 +1976,7 @@ static int nf_tables_getrule(struct net *net, struct sock 
*nlsk,
if (err < 0)
goto err;
 
-   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
+   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid, GFP_KERNEL);
 
 err:
kfree_skb(skb2);
@@ -2664,7 +2664,7 @@ static int nf_tables_getset(struct net *net, struct sock 
*nlsk,
if (err < 0)
goto err;
 
-   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
+   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid, GFP_KERNEL);
 
 err:
kfree_skb(skb2);
@@ -3798,7 +3798,7 @@ static int nf_tables_getgen(struct net *net, struct sock 
*nlsk,
if (err < 0)
goto err;
 
-   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid);
+   return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid, GFP_KERNEL);
 err:
kfree_skb(skb2);
return err;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index

[RFC 5/7] net: Add allocation flag to rtnl_unicast()

2016-07-05 Thread Masashi Honma

Signed-off-by: Masashi Honma 
---
 include/linux/rtnetlink.h |  3 ++-
 net/core/net_namespace.c  |  2 +-
 net/core/rtnetlink.c  | 10 ++
 net/dcb/dcbnl.c   |  2 +-
 net/decnet/dn_route.c |  3 ++-
 net/ipv4/devinet.c|  2 +-
 net/ipv4/ipmr.c   |  6 --
 net/ipv4/route.c  |  2 +-
 net/ipv6/addrconf.c   |  4 ++--
 net/ipv6/addrlabel.c  |  2 +-
 net/ipv6/ip6mr.c  |  6 --
 net/ipv6/route.c  |  2 +-
 net/sched/act_api.c   |  2 +-
 13 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2daece8..132730f 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -8,7 +8,8 @@
 #include 
 
 extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 
group, int echo);
-extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid);
+extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid,
+   gfp_t flags);
 extern void rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid,
u32 group, struct nlmsghdr *nlh, gfp_t flags);
 extern void rtnl_set_sk_err(struct net *net, u32 group, int error);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..28eed58 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -646,7 +646,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct 
nlmsghdr *nlh)
if (err < 0)
goto err_out;
 
-   err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid, GFP_KERNEL);
goto out;
 
 err_out:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 7f7927f..89fd826 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,11 +653,11 @@ int rtnetlink_send(struct sk_buff *skb, struct net *net, 
u32 pid, unsigned int g
return err;
 }
 
-int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid)
+int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid, gfp_t flags)
 {
struct sock *rtnl = net->rtnl;
 
-   return nlmsg_unicast(rtnl, skb, pid, gfp_any());
+   return nlmsg_unicast(rtnl, skb, pid, flags);
 }
 EXPORT_SYMBOL(rtnl_unicast);
 
@@ -2565,7 +2565,8 @@ static int rtnl_getlink(struct sk_buff *skb, struct 
nlmsghdr* nlh)
WARN_ON(err == -EMSGSIZE);
kfree_skb(nskb);
} else
-   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid,
+  GFP_KERNEL);
 
return err;
 }
@@ -3601,7 +3602,8 @@ static int rtnl_stats_get(struct sk_buff *skb, struct 
nlmsghdr *nlh)
WARN_ON(err == -EMSGSIZE);
kfree_skb(nskb);
} else {
-   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid,
+  GFP_KERNEL);
}
 
return err;
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 4f6c186..e4de9fe 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1749,7 +1749,7 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr 
*nlh)
 
nlmsg_end(reply_skb, reply_nlh);
 
-   ret = rtnl_unicast(reply_skb, net, portid);
+   ret = rtnl_unicast(reply_skb, net, portid, GFP_KERNEL);
 out:
return ret;
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index b1dc096..6fe02bb 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1714,7 +1714,8 @@ static int dn_cache_getroute(struct sk_buff *in_skb, 
struct nlmsghdr *nlh)
goto out_free;
}
 
-   return rtnl_unicast(skb, _net, NETLINK_CB(in_skb).portid);
+   return rtnl_unicast(skb, _net, NETLINK_CB(in_skb).portid,
+   GFP_KERNEL);
 
 out_free:
kfree_skb(skb);
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index e333bc8..5e969e5 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -1917,7 +1917,7 @@ static int inet_netconf_get_devconf(struct sk_buff 
*in_skb,
kfree_skb(skb);
goto errout;
}
-   err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid);
+   err = rtnl_unicast(skb, net, NETLINK_CB(in_skb).portid, GFP_ATOMIC);
 errout:
return err;
 }
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 5ad48ec..c704a2a 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -654,7 +654,8 @@ static void ipmr_destroy_unres(struct mr_table *mrt, struct 
mfc_cache *c)
e->error = -ETIMEDOUT;
memset(>msg, 0, sizeof(e->msg));
 
-   rtnl_unicast(skb, net, NETLINK_CB(skb).portid);
+   rtnl_unicast(skb, net, NETLINK_CB(skb).portid,
+

[RFC 0/7] netlink: Add allocation flag to netlink_unicast()

2016-07-05 Thread Masashi Honma

netlink_broadcast() takes an allocation flag that specifies the memory
allocation type (e.g. GFP_KERNEL or GFP_ATOMIC), but netlink_unicast()
does not. On a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP, this can
trigger "BUG: sleeping function called from invalid context at" when
netlink_unicast() is called inside an RCU read-side section but not in
IRQ context.

No such caller has been found so far, but to head off the potential
issue this series adds an allocation flag to netlink_unicast().
Previously netlink_unicast() used gfp_any() as the flag. Each call site
now passes GFP_KERNEL, GFP_ATOMIC, or similar, chosen by guessing from
the calling context. Where the right value could not be determined, the
call site keeps gfp_any(). Comments such as "this gfp_any() should be
GFP_KERNEL" are welcome, as are any other comments.

This patch series has not been tested.
It is an RFC because it does not fix an existing issue.
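
To illustrate the motivation, here is a minimal userspace sketch, not
kernel code. It assumes gfp_any() chooses between GFP_ATOMIC and
GFP_KERNEL from softirq state alone; the model_* names and the
two-field context struct are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: the model_* names are hypothetical, not kernel API. */
enum model_gfp { MODEL_GFP_ATOMIC, MODEL_GFP_KERNEL };

struct model_ctx {
	bool in_softirq;       /* state the gfp_any()-style guess can see */
	bool in_rcu_read_side; /* state the guess cannot see */
};

/* Guess the allocation type from softirq state alone. */
static enum model_gfp model_gfp_any(const struct model_ctx *ctx)
{
	assert(ctx != NULL);
	return ctx->in_softirq ? MODEL_GFP_ATOMIC : MODEL_GFP_KERNEL;
}

/* A sleeping allocation is a bug whenever the caller must not sleep. */
static bool model_would_trigger_bug(const struct model_ctx *ctx,
				    enum model_gfp flags)
{
	bool must_not_sleep = ctx->in_softirq || ctx->in_rcu_read_side;

	return must_not_sleep && flags == MODEL_GFP_KERNEL;
}
```

With in_rcu_read_side set and in_softirq clear, the guess yields the
sleeping type. An explicit flag chosen by the caller avoids that case.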

Masashi Honma (7):
  netlink: Add allocation flag to netlink_unicast()
  netfilter: Add allocation flag to nfnetlink_unicast()
  netlink: Add allocation flag to nlmsg_unicast()
  infiniband: Add allocation flag to ibnl_unicast()
  net: Add allocation flag to rtnl_unicast()
  genetlink: Add allocation flag to genlmsg_unicast()
  genetlink: Add allocation flag to genlmsg_reply()

 crypto/crypto_user.c  |  3 ++-
 drivers/block/drbd/drbd_nl.c  |  2 +-
 drivers/connector/connector.c |  2 +-
 drivers/infiniband/core/iwpm_msg.c|  6 ++---
 drivers/infiniband/core/iwpm_util.c   |  5 ++--
 drivers/infiniband/core/iwpm_util.h   |  1 +
 drivers/infiniband/core/netlink.c |  4 ++--
 drivers/net/gtp.c |  3 ++-
 drivers/net/team/team.c   |  5 ++--
 drivers/net/wireless/mac80211_hwsim.c |  4 ++--
 fs/dlm/netlink.c  |  2 +-
 include/linux/netfilter/nfnetlink.h   |  2 +-
 include/linux/netlink.h   |  3 ++-
 include/linux/rtnetlink.h |  3 ++-
 include/net/genetlink.h   | 13 +++
 include/net/netlink.h |  6 +++--
 include/rdma/rdma_netlink.h   |  3 ++-
 kernel/audit.c|  9 
 kernel/taskstats.c|  4 ++--
 net/core/devlink.c| 12 +-
 net/core/net_namespace.c  |  2 +-
 net/core/rtnetlink.c  | 12 ++
 net/dcb/dcbnl.c   |  2 +-
 net/decnet/dn_route.c |  3 ++-
 net/hsr/hsr_netlink.c |  6 +++--
 net/ieee802154/ieee802154.h   |  3 ++-
 net/ieee802154/netlink.c  |  5 ++--
 net/ieee802154/nl-mac.c   |  4 ++--
 net/ieee802154/nl-phy.c   |  6 ++---
 net/ieee802154/nl802154.c |  4 ++--
 net/ipv4/devinet.c|  2 +-
 net/ipv4/fib_frontend.c   |  2 +-
 net/ipv4/fou.c|  2 +-
 net/ipv4/inet_diag.c  |  2 +-
 net/ipv4/ipmr.c   |  6 +++--
 net/ipv4/route.c  |  2 +-
 net/ipv4/tcp_metrics.c|  2 +-
 net/ipv4/udp_diag.c   |  2 +-
 net/ipv6/addrconf.c   |  4 ++--
 net/ipv6/addrlabel.c  |  2 +-
 net/ipv6/ila/ila_xlat.c   |  2 +-
 net/ipv6/ip6mr.c  |  6 +++--
 net/ipv6/route.c  |  2 +-
 net/irda/irnetlink.c  |  2 +-
 net/l2tp/l2tp_netlink.c   |  8 ---
 net/netfilter/ipset/ip_set_core.c | 11 +
 net/netfilter/ipvs/ip_vs_ctl.c|  2 +-
 net/netfilter/nf_conntrack_netlink.c  |  9 +---
 net/netfilter/nf_tables_api.c | 10 
 net/netfilter/nfnetlink.c |  4 ++--
 net/netfilter/nfnetlink_acct.c|  2 +-
 net/netfilter/nfnetlink_cthelper.c|  2 +-
 net/netfilter/nfnetlink_cttimeout.c   |  5 ++--
 net/netfilter/nfnetlink_log.c |  4 ++--
 net/netfilter/nfnetlink_queue.c   |  3 ++-
 net/netfilter/nft_compat.c|  4 ++--
 net/netlabel/netlabel_cipso_v4.c  |  2 +-
 net/netlabel/netlabel_mgmt.c  |  4 ++--
 net/netlabel/netlabel_unlabeled.c |  2 +-
 net/netlink/af_netlink.c  | 14 +++-
 net/netlink/genetlink.c   |  2 +-
 net/nfc/netlink.c |  6 ++---
 net/openvswitch/datapath.c|  9 
 net/sched/act_api.c   |  2 +-
 net/sctp/sctp_diag.c  |  2 +-
 net/tipc/bearer.c |  4 ++--
 net/tipc/netlink_compat.c |  2 +-
 net/tipc/node.c   |  2 +-
 net/unix/diag.c   |  2 +-
 net/wireless/nl80211.c| 43 ++-
 net/xfrm/xfrm_user.c  | 15 +++-
 samples/connector/cn_test.c   |  2 +-
 72 files changed, 199 insertions(+), 155 deletions(-)

-- 
2.7.4

[RFC 1/7] netlink: Add allocation flag to netlink_unicast()

2016-07-05 Thread Masashi Honma

netlink_broadcast() takes an allocation flag that specifies the memory
allocation type (e.g. GFP_KERNEL or GFP_ATOMIC), but netlink_unicast()
does not. On a kernel built with CONFIG_DEBUG_ATOMIC_SLEEP, this can
trigger "BUG: sleeping function called from invalid context at" when
netlink_unicast() is called inside an RCU read-side section but not in
IRQ context.

This patch adds an allocation flag to netlink_unicast().

For now, the allocation flag may be zero, which implies gfp_any().
This is a temporary measure that allows stepwise conversion; it is
removed at the end of the series.
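
The transitional behaviour could look roughly like the following
userspace sketch. The af_netlink.c hunk is not reproduced in this
message, so this is an assumption about its shape rather than the
patch's actual code; the sketch_* names and flag values are
hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for gfp_t values; 0 is reserved for "not given". */
typedef unsigned int sketch_gfp_t;
#define SKETCH_GFP_ATOMIC 0x1u
#define SKETCH_GFP_KERNEL 0x2u

/* Stand-in for gfp_any(): decide from softirq state alone. */
static sketch_gfp_t sketch_gfp_any(int in_softirq)
{
	return in_softirq ? SKETCH_GFP_ATOMIC : SKETCH_GFP_KERNEL;
}

/*
 * Resolve the flag a converted unicast path would use: an explicit flag
 * wins, zero falls back to the old guess.
 */
static sketch_gfp_t sketch_resolve_allocation(sketch_gfp_t allocation,
					      int in_softirq)
{
	sketch_gfp_t resolved = allocation ? allocation
					   : sketch_gfp_any(in_softirq);

	assert(resolved != 0); /* the placeholder never reaches the allocator */
	return resolved;
}
```

Callers that have not been converted yet keep passing 0 and see no
change in behaviour, which is what makes the stepwise conversion
possible.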

Signed-off-by: Masashi Honma 
---
 drivers/connector/connector.c|  2 +-
 include/linux/netlink.h  |  3 ++-
 include/net/netlink.h|  2 +-
 kernel/audit.c   |  9 +
 net/core/rtnetlink.c |  2 +-
 net/ipv4/fib_frontend.c  |  2 +-
 net/ipv4/inet_diag.c |  2 +-
 net/ipv4/udp_diag.c  |  2 +-
 net/netfilter/ipset/ip_set_core.c| 11 +++
 net/netfilter/nf_conntrack_netlink.c |  9 ++---
 net/netfilter/nfnetlink.c|  2 +-
 net/netfilter/nfnetlink_acct.c   |  2 +-
 net/netfilter/nfnetlink_cthelper.c   |  2 +-
 net/netfilter/nfnetlink_cttimeout.c  |  5 +++--
 net/netfilter/nft_compat.c   |  4 ++--
 net/netlink/af_netlink.c | 12 +++-
 net/sctp/sctp_diag.c |  2 +-
 net/unix/diag.c  |  2 +-
 samples/connector/cn_test.c  |  2 +-
 19 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/drivers/connector/connector.c b/drivers/connector/connector.c
index 25693b0..44470e6 100644
--- a/drivers/connector/connector.c
+++ b/drivers/connector/connector.c
@@ -125,7 +125,7 @@ int cn_netlink_send_mult(struct cn_msg *msg, u16 len, u32 
portid, u32 __group,
return netlink_broadcast(dev->nls, skb, portid, group,
 gfp_mask);
return netlink_unicast(dev->nls, skb, portid,
-   !gfpflags_allow_blocking(gfp_mask));
+  !gfpflags_allow_blocking(gfp_mask), gfp_mask);
 }
 EXPORT_SYMBOL_GPL(cn_netlink_send_mult);
 
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index da14ab6..f90d24a 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -69,7 +69,8 @@ extern void __netlink_clear_multicast_users(struct sock *sk, 
unsigned int group)
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
 extern int netlink_has_listeners(struct sock *sk, unsigned int group);
 
-extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 
portid, int nonblock);
+extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 portid,
+  int nonblock, gfp_t allocation);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 
portid,
 __u32 group, gfp_t allocation);
 extern int netlink_broadcast_filtered(struct sock *ssk, struct sk_buff *skb,
diff --git a/include/net/netlink.h b/include/net/netlink.h
index 254a0fc..898e449 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -590,7 +590,7 @@ static inline int nlmsg_unicast(struct sock *sk, struct 
sk_buff *skb, u32 portid
 {
int err;
 
-   err = netlink_unicast(sk, skb, portid, MSG_DONTWAIT);
+   err = netlink_unicast(sk, skb, portid, MSG_DONTWAIT, 0);
if (err > 0)
err = 0;
 
diff --git a/kernel/audit.c b/kernel/audit.c
index 8d528f9..131577d 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -411,7 +411,7 @@ static void kauditd_send_skb(struct sk_buff *skb)
 restart:
/* take a reference in case we can't send it and we want to hold it */
skb_get(skb);
-   err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0);
+   err = netlink_unicast(audit_sock, skb, audit_nlk_portid, 0, gfp_any());
if (err < 0) {
pr_err("netlink_unicast sending to audit_pid=%d returned error: 
%d\n",
   audit_pid, err);
@@ -547,7 +547,7 @@ int audit_send_list(void *_dest)
mutex_unlock(_cmd_mutex);
 
while ((skb = __skb_dequeue(>q)) != NULL)
-   netlink_unicast(aunet->nlsk, skb, dest->portid, 0);
+   netlink_unicast(aunet->nlsk, skb, dest->portid, 0, gfp_any());
 
put_net(net);
kfree(dest);
@@ -591,7 +591,7 @@ static int audit_send_reply_thread(void *arg)
 
/* Ignore failure. It'll only happen if the sender goes away,
   because our timeout is set to infinite. */
-   netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0);
+   netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0, gfp_any());
put_net(net);
kfree(reply);
return 0;
@@ -814,7 +814,8 @@ static int audit_replace(pid_t pid)
 
if (!skb)
return

Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC

2016-07-05 Thread Matt Bennett

On 07/06/2016 08:37 AM, Cong Wang wrote:
> On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang  wrote:
>> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>>  wrote:
>>> Using printk I have confirmed that ppp_pernet() is called from
>>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>>
>>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>>> take reference on channels netns").
>>
>> We have some race condition here, where a parallel ppp_unregister_channel()
>> could happen while we are in ppp_connect_channel().
>>
>> We need some synchronization for them. I am not sure what is the right lock
>> here since ppp locking looks crazy.
>
> Matt, could you try if the attached patch helps?
>
> Thanks!
>
I have given that patch a good amount of testing and the BUG_ON() is no
longer hit. Whether that is the best fix, I am unsure.

Either way, the following comment in ppp_unregister_channel() seems
incorrect to me and should probably be deleted unless it is fixed.

/*
  * This ensures that we have returned from any calls into the
  * the channel's start_xmit or ioctl routine before we proceed.
  */

It appears that mutex_lock(_mutex) is what locks ppp_ioctl. ppp_xmit uses
ppp_xmit_lock(ppp) in ppp_xmit_process.
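
For readers following the race, here is a small userspace model of the
lifetime rule under discussion. It assumes the patch tested above is
the one that defers the netns reference release from
ppp_unregister_channel() to ppp_destroy_channel(); all toy_* names are
hypothetical, and no real locking or refcounting API is used.

```c
#include <assert.h>
#include <stddef.h>

struct toy_net { int refcnt; };

struct toy_channel {
	struct toy_net *chan_net; /* netns reference held by the channel */
	int dead;                 /* set once the channel is unregistered */
};

/* Registration takes a netns reference on behalf of the channel. */
static void toy_register(struct toy_channel *pch, struct toy_net *net)
{
	net->refcnt++;
	pch->chan_net = net;
	pch->dead = 0;
}

/* Unregistration marks the channel dead but keeps chan_net valid. */
static void toy_unregister(struct toy_channel *pch)
{
	pch->dead = 1;
}

/* A late connect attempt can still look up the netns safely. */
static struct toy_net *toy_connect_lookup(const struct toy_channel *pch)
{
	return pch->chan_net;
}

/* Only destruction drops the reference and clears the pointer. */
static void toy_destroy(struct toy_channel *pch)
{
	assert(pch->chan_net != NULL);
	pch->chan_net->refcnt--;
	pch->chan_net = NULL;
}
```

In this model a connect that runs after unregister still sees a
non-NULL chan_net, because the pointer lives exactly as long as the
channel does.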



--
To unsubscribe from this list: send the line "unsubscribe linux-ppp" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: DCB Auto turn off when link become up with Intel 82599ES ixgbe driver

2016-07-05 Thread John Fastabend

On 16-07-05 08:38 AM, Alexander Duyck wrote:
> On Tue, Jul 5, 2016 at 8:17 AM, ayuj  wrote:
>> I'm trying to configure DCB in a back-to-back scenario.
>> System details
>>
>> OS :- CentOS 7.2
>> kernel 3.10.0-327.el7.x86_64
>> lldpad:- lldpad v0.9.46
>> dcbtool:- v0.9.46
>> ixgbe :- ixgbe-4.3.15
>>
>> steps followed:-
>>
>> # modprobe ixgbe
>> # service lldpad start
>> Redirecting to /bin/systemctl start  lldpad.service
>>
>>
>> # service lldpad status
>> Redirecting to /bin/systemctl status  lldpad.service
>> ● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
>>Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor
>> preset: disabled)
>>Active: active (running) since Tue 2016-07-05 05:49:12 EDT; 1s ago
>>  Main PID: 133737 (lldpad)
>>CGroup: /system.slice/lldpad.service
>>└─133737 /usr/sbin/lldpad -t
>>
>> Jul 05 05:49:12 localhost.localdomain systemd[1]: Started Link Layer
>> Discovery Protocol Agent Daemon..
>> Jul 05 05:49:12 localhost.localdomain systemd[1]: Starting Link Layer
>> Discovery Protocol Agent Daemon
>>
>>
>> # dcbtool gc p3p2 dcb
>> Command:Get Config
>> Feature:DCB State
>> Port:   p3p2
>> Status: Successful
>> DCB State:  off
>>
>> # dcbtool sc p3p2 dcb on
>> Command:Set Config
>> Feature:DCB State
>> Port:   p3p2
>> Status: Successful
>>
>> # dcbtool gc p3p2 dcb
>> Command:Get Config
>> Feature:DCB State
>> Port:   p3p2
>> Status: Successful
>> DCB State:  on
>>
>>
>> Similar configuration on another system.
>>
>> But as soon as I do "ifconfig p3p2 up" and check dcb:
>>
>> # dcbtool gc p3p2 dcb
>> Command:Get Config
>> Feature:DCB State
>> Port:   p3p2
>> Status: Successful
>> DCB State:  off
>>
>> Initially I suspected an issue with the lldpad daemon or the ixgbe driver,
>> so I replaced them with the latest packages (downloaded from repo).
>> The same behaviour was observed.
>>
>> - Tried disabling other dcb features like fcoe. Same issue.
>> - Disabled selinux etc
>>
>>
>> Can anyone help me with this?
> 
> I'm adding e1000-devel as that is the mailing list for the ixgbe maintainers.
> 
> You might want to go through and also check the number of TCs and how
> you have them mapped to your priority groups.  It is possible that one
> end or the other may only have 1 TC configured and if I am not
> mistaken that is interpreted as having DCB disabled by the hardware.
> 
> Hope that helps.
> 
> - Alex
> 

Hmm, those are older versions of the tools, but that shouldn't
necessarily be an issue.

The output from the following would help (note that if the peer is not
configured correctly as well, DCB will be disabled):

#dcbtool go p3p2 pg
#dcbtool go p3p2 pfc

#lldptool -t -i p3p2
#lldptool -t -i p3p2 -n


The first two dcbtool commands give the dcb status and the last two
lldptool commands print the sent/received lldp TLVs.

.John

[GIT] Networking

2016-07-05 Thread David Miller


1) All users of AF_PACKET's fanout feature want a symmetric packet
   header hash for load balancing purposes, so give it to them.

2) Fix vlan state synchronization in e1000e, from Jarod Wilson.

3) Use correct socket pointer in ip_skb_dst_mtu(), from Shmulik
   Ladkani.

4) mlx5 bug fixes from Mohamad Haj Yahia, Daniel Jurgens, Matthew
   Finlay, Rana Shahout, and Shaker Daibes.  Mostly to do with
   operation timeouts and PCI error handling.

5) Fix checksum handling in mirred packet action, from WANG Cong.

6) Set skb->dev correctly when transmitting in !protect_frames case
   of macsec driver, from Daniel Borkmann.

7) Fix MTU calculation in geneve driver, from Haishuang Yan.

8) Missing netif_napi_del() in unregister path of qeth driver, from
   Ursula Braun.

9) Handle malformed route netlink messages in decnet properly, from
   Vegard Nossum.

10) Memory leak of percpu data in ipv6 routing code, from Martin
KaFai Lau.

Please pull, thanks a lot!

The following changes since commit e7bdea7750eb2a64aea4a08fa5c0a31719c8155d:

  Merge tag 'nfs-for-4.7-2' of git://git.linux-nfs.org/projects/anna/linux-nfs 
(2016-06-29 15:30:26 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 903ce4abdf374e3365d93bcb3df56c62008835ba:

  ipv6: Fix mem leak in rt6i_pcpu (2016-07-05 14:09:23 -0700)


Aviv Heller (1):
  bonding: fix enslavement slave link notifications

Bjørn Mork (1):
  cdc_ncm: workaround for EM7455 "silent" data interface

Christophe Jaillet (1):
  fsl/fman: fix error handling

Daniel Borkmann (1):
  macsec: set actual real device for xmit when !protect_frames

Daniel Jurgens (5):
  net/mlx5: Fix incorrect page count when in internal error
  net/mlx5: Fix wait_vital for VFs and remove fixed sleep
  net/mlx5e: Timeout if SQ doesn't flush during close
  net/mlx5e: Implement ndo_tx_timeout callback
  net/mlx5e: Handle RQ flush in error cases

David S. Miller (4):
  Merge branch 'master' of git://git.kernel.org/.../jkirsher/net-queue
  Merge branch 'mlx5-fixes'
  packet: Use symmetric hash for PACKET_FANOUT_HASH.
  Revert "fsl/fman: fix error handling"

Eric Dumazet (1):
  bonding: prevent out of bound accesses

Florian Fainelli (1):
  net: bcmsysport: Device stats are unsigned long

Ganesh Goudar (1):
  cxgb4: update latest firmware version supported

Haishuang Yan (1):
  geneve: fix max_mtu setting

Jarod Wilson (1):
  e1000e: keep Rx/Tx HW_VLAN_CTAG in sync

Martin KaFai Lau (1):
  ipv6: Fix mem leak in rt6i_pcpu

Matt Corallo (1):
  net: stmmac: Fix null-function call in ISR on stmmac1000

Matthew Finlay (1):
  net/mlx5e: Copy all L2 headers into inline segment

Mohamad Haj Yahia (4):
  net/mlx5: Fix teardown errors that happen in pci error handler
  net/mlx5: Avoid calling sleeping function by the health poll thread
  net/mlx5: Fix potential deadlock in command mode change
  net/mlx5: Add timeout handle to commands with callback

Or Gerlitz (1):
  net/mlx5: Avoid setting unused var when modifying vport node GUID

Rana Shahout (2):
  net/mlx5e: Fix select queue callback
  net/mlx5e: Validate BW weight values of ETS

Richard Alpe (1):
  tipc: fix nl compat regression for link statistics

Russell King - ARM Linux (1):
  net: mvneta: fix open() error cleanup

Sergio Valverde (1):
  enc28j60: Fix race condition in enc28j60 driver

Shaker Daibes (1):
  net/mlx5e: Log link state changes

Shmulik Ladkani (1):
  ipv4: Fix ip_skb_dst_mtu to use the sk passed by ip_finish_output

Sony Chacko (1):
  qlcnic: add wmb() call in transmit data path.

Soohoon Lee (1):
  usbnet: Stop RX Q on MTU change

Stefan Hauser (1):
  net: phy: dp83867: Fix initialization of PHYCR register

Ursula Braun (1):
  qeth: delete napi struct when removing a qeth device

Vegard Nossum (2):
  RDS: fix rds_tcp_init() error path
  net: fix decnet rtnexthop parsing

WANG Cong (1):
  net_sched: fix mirrored packets checksum

Xin Long (1):
  ixgbevf: ixgbevf_write/read_posted_mbx should use IXGBE_ERR_MBX to 
initialize ret_val

hayeswang (2):
  r8152: clear LINK_OFF_WAKE_EN after autoresume
  r8152: fix runtime function for RTL8152

 drivers/net/bonding/bond_3ad.c  |  11 ---
 drivers/net/bonding/bond_alb.c  |   7 ++---
 drivers/net/bonding/bond_main.c |   1 +
 drivers/net/ethernet/broadcom/bcmsysport.c  |   2 +-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_version.h   |  12 +++
 drivers/net/ethernet/intel/e1000e/netdev.c  |  21 ++---
 drivers/net/ethernet/intel/ixgbevf/mbx.c|   4 +--
 drivers/net/ethernet/marvell/mvneta.c   |   2 ++
 drivers/net/ethernet/mellanox/mlx5/core/cmd.c   | 129

Re: [PATCH 6/7] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-07-05 Thread Jon Mason

On Tue, Jul 5, 2016 at 9:37 AM, Arnd Bergmann  wrote:
> On Monday, July 4, 2016 9:34:35 AM CEST Ray Jui wrote:
>> On 7/1/2016 8:42 AM, Arnd Bergmann wrote:
>> > On Friday, July 1, 2016 11:17:25 AM CEST Jon Mason wrote:
>> >> On Fri, Jul 1, 2016 at 5:46 AM, Arnd Bergmann  wrote:
>> >>> On Thursday, June 30, 2016 6:59:13 PM CEST Jon Mason wrote:
>> >>>> +
>> >>>> +Required properties:
>> >>>> + - compatible: "brcm,bgmac-nsp"
>> >>>> + - reg:        Address and length of the GMAC registers,
>> >>>> +               Address and length of the GMAC IDM registers
>> >>>> + - reg-names:  Names of the registers.  Must have both "gmac_base" and
>> >>>> +               "idm_base"
>> >>>> + - interrupts: Interrupt number
>> >>>> +
>> >>>
>> >>>
>> >>> "brcm,bgmac-nsp" sounds a bit too general. As I understand, this is a family
>> >>> of SoCs that might not all have the exact same implementation of this
>> >>> ethernet device, as we can see from the long lookup table in bgmac_probe().
>> >>
>> >> The Broadcom iProc family of SoCs contains:
>> >> Northstar
>> >> Northstar Plus
>> >> Cygnus
>> >> Northstar 2
>> >> a few SoCs that are under development
>> >> and a number of ethernet switches (which might never be officially supported)
>> >>
>> >> Each one of these SoCs could have a different revision of the gmac IP
>> >> block, but they should be uniform within each SoC (though there might
>> >> be a A0/B0 change necessary).  The Northstar Plus product family has a
>> >> number of different implementations, but the SoC is unchanged.  So, I
>> >> think this might be too specific, when we really need a general compat
>> >> string.
>> >
>> > Ok, thanks for the clarification, that sounds good enough.
>> >
>> >> Broadcom has a history of sharing IP blocks amongst the different
>> >> divisions.  So, this driver might be used on other SoC families (as it
>> >> apparently has been done in the past, based on the code you
>> >> reference).  I do not know of any way to know what legacy, non-iProc
>> >> chips have used this IP block.  I can make this "brcm,iproc-bgmac",
>> >> and add "brcm,iproc-nsp-bgmac" as an alternative compatible string in
>> >> this file (which I believe you are suggesting), but there might be
>> >> non-iProc SoCs that use this driver.  Is this acceptable?
>> >
>> > If it is also used outside of iProc, then I see no need for the
>> > extra compatible string, although it would not do any harm either.
>> >
>> > Ideally we should name it whatever the name for this IP block is
>> > inside of the company, with "nsp" as the designation for the variant
>> > in Northstar Plus. A lot of Broadcom IP blocks themselves seem to have
>> > some four-digit or five-digit number, maybe this one does too?
>> >
>> >   Arnd
>> >
>>
>> Note this IP block has an official IP controller name of "amac" from the
>> ASIC team.
>
> Ok, then I'd suggest making the compatible string here
>
> compatible = "brcm,nsp-amac", "brcm,amac";

It is called GMAC in the NS and NSP documentation, but AMAC is fine
with me (that is what the NS2 documentation calls it).  I'll make
the necessary change and repush.

Thanks for all of the input.

> or even better if you have a version number associated with it, make that
>
> compatible = "brcm,nsp-amac", "brcm,amac-1.234", "brcm,amac";
>
> replacing 1.234 with the actual version of course.
>
> Arnd
>
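
As a rough illustration of why the order of the compatible strings matters,
here is a minimal userspace sketch. It assumes the usual devicetree
convention that the list runs from most specific to most generic;
first_compatible_match() is a hypothetical helper written for this note, not
a kernel API, and the strings are the ones suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index, within the node's "compatible" list, of the first
 * string that the driver also lists, or -1 if none matches. A lower index
 * means a more specific match, because the node's list is assumed to be
 * ordered from most specific to most generic.
 */
static int first_compatible_match(const char *const *node, size_t n_node,
				  const char *const *drv, size_t n_drv)
{
	size_t i, j;

	assert(node && drv);

	for (i = 0; i < n_node; i++)
		for (j = 0; j < n_drv; j++)
			if (strcmp(node[i], drv[j]) == 0)
				return (int)i;
	return -1;
}
```

With a node listing "brcm,nsp-amac", "brcm,amac", a driver that only knows
"brcm,amac" still binds through the generic entry, while a driver that knows
the NSP variant can match the more specific string and apply any quirks.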

Re: [PATCH net-next v2 3/4] net: dsa: Suffix function manipulating device_node with _dn

2016-07-05 Thread Andrew Lunn

> That is true for the devices I know about, both in and out of tree.
> However, while discussing offline with Vivien it seemed like there was a
> potential need for an x86-based platform which could need that.
> Vivien, do you think this platform could be in-tree one day (if not already)?

x86 can do device tree. And if it is a complex board, it probably
needs more of the device tree features like fixed-phys, phy-mode,
devlink when it lands, etc.

If this board really does come to mainline, we should consider support
for it, one way or the other, but until then, I prefer KISS.

   Andrew

Re: [PATCH net-next 3/9] net: r6040: Utilize skb_put_padto()

2016-07-05 Thread Francois Romieu

Florian Fainelli  :
[...]
> diff --git a/drivers/net/ethernet/rdc/r6040.c b/drivers/net/ethernet/rdc/r6040.c
> index 75776eee36f9..46ed093348da 100644
> --- a/drivers/net/ethernet/rdc/r6040.c
> +++ b/drivers/net/ethernet/rdc/r6040.c
> @@ -815,6 +815,9 @@ static netdev_tx_t r6040_start_xmit(struct sk_buff *skb,
>   void __iomem *ioaddr = lp->base;
>   unsigned long flags;
>  
> + if (skb_put_padto(skb, ETH_ZLEN) < 0)
> + return NETDEV_TX_OK;
> +

Missing dev->stats.tx_dropped++.
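
For illustration only, a minimal userspace sketch of the accounting being
asked for. Everything named sketch_* is a hypothetical stand-in and not the
r6040 driver or the skb API. The only behaviour carried over is that a failed
pad consumes the frame, so the transmit routine still reports the frame as
handled but should count the drop.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ETH_ZLEN 60	/* minimum frame length without FCS */

struct sketch_stats {
	unsigned long tx_packets;
	unsigned long tx_dropped;
};

struct sketch_frame {
	unsigned char *data;
	size_t len;
};

/* Set to non-zero to simulate an allocation failure while padding. */
static int sketch_fail_alloc;

/*
 * Zero-pad the frame to min_len. On failure the frame is freed and -1 is
 * returned, mirroring how skb_put_padto() consumes the skb on error.
 */
static int sketch_put_padto(struct sketch_frame *f, size_t min_len)
{
	unsigned char *p;

	if (f->len >= min_len)
		return 0;

	p = sketch_fail_alloc ? NULL : realloc(f->data, min_len);
	if (!p) {
		free(f->data);
		f->data = NULL;
		f->len = 0;
		return -1;
	}
	memset(p + f->len, 0, min_len - f->len);
	f->data = p;
	f->len = min_len;
	return 0;
}

/* Always reports the frame as consumed (0), like NETDEV_TX_OK. */
static int sketch_start_xmit(struct sketch_frame *f, struct sketch_stats *stats)
{
	assert(f && stats);

	if (sketch_put_padto(f, SKETCH_ETH_ZLEN) < 0) {
		stats->tx_dropped++;	/* account for the dropped frame */
		return 0;
	}
	stats->tx_packets++;	/* stand-in for handing the frame to hardware */
	return 0;
}
```

Without the tx_dropped increment the failure path would be invisible to
anyone reading the interface statistics.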

-- 
Ueimor

Re: [PATCH net-next v2 3/4] net: dsa: Suffix function manipulating device_node with _dn

2016-07-05 Thread Florian Fainelli

On 07/05/2016 03:36 PM, Andrew Lunn wrote:
> On Tue, Jul 05, 2016 at 03:07:12PM -0700, Florian Fainelli wrote:
>> Make it clear that these functions take a device_node structure pointer
> 
> Hi Florian
> 
> Didn't we agree that we would only support a single device via a C
> coded platform data structure?

That is true for the devices I know about, both in and out of tree.
However, while discussing offline with Vivien it seemed like there was a
potential need for an x86-based platform which could need that.
Vivien, do you think this platform could be in-tree one day (if not already)?

> 
> All the functions you are renaming will never be called in that
> case. So I think they can retain their names. You have no need to add
> non device node equivalents.
> 
> So let's drop this patch.
> 
>Andrew
> 

-- 
Florian

Re: [PATCH 2/3] netfilter: Create revision 2 of xt_hashlimit to support higher pps rates

2016-07-05 Thread Vishwanath Pai

On 06/23/2016 07:16 AM, Pablo Neira Ayuso wrote:
> On Wed, Jun 01, 2016 at 08:11:38PM -0400, Vishwanath Pai wrote:
>> +static void
>> +cfg_copy(struct hashlimit_cfg2 *to, void *from, int revision)
>> +{
>> +	if (revision == 1) {
>> +		struct hashlimit_cfg1 *cfg = (struct hashlimit_cfg1 *)from;
>> +
>> +		to->mode = cfg->mode;
>> +		to->avg = cfg->avg;
>> +		to->burst = cfg->burst;
>> +		to->size = cfg->size;
>> +		to->max = cfg->max;
>> +		to->gc_interval = cfg->gc_interval;
>> +		to->expire = cfg->expire;
>> +		to->srcmask = cfg->srcmask;
>> +		to->dstmask = cfg->dstmask;
>> +	} else if (revision == 2) {
>> +		memcpy(to, from, sizeof(struct hashlimit_cfg2));
>> +	} else {
>> +		BUG();
> 
> BUG here is probably too much, this halts the system. I can see we
> only use this somewhere else in this code. Instead, I'd suggest you
> propagate an error back to userspace if this ever happens.
> 
> I would like to see if this spots any problem with our test
> infrastructure under iptables/.
> 
> Thanks.
> 

cfg_copy is only used internally by the kernel module, and the value for
revision is passed to the function by the module itself, not from
userspace. I will remove BUG() and propagate the error back to the
caller, and will send a v2.
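
A minimal sketch of what that change could look like, assuming the function
simply returns -EINVAL for an unknown revision. The structs below are
trimmed-down, hypothetical stand-ins that compile in userspace; the real
hashlimit_cfg1/hashlimit_cfg2 carry more fields, and the widths chosen for
avg and burst here are an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, shortened stand-ins for the xt_hashlimit config structs. */
struct hashlimit_cfg1 {
	uint32_t mode;
	uint32_t avg;
	uint32_t burst;
};

struct hashlimit_cfg2 {
	uint32_t mode;
	uint64_t avg;	/* assumed wider than in revision 1 */
	uint64_t burst;
};

/* Returns 0 on success, or -EINVAL for an unknown revision instead of BUG(). */
static int cfg_copy(struct hashlimit_cfg2 *to, const void *from, int revision)
{
	assert(to && from);

	if (revision == 1) {
		const struct hashlimit_cfg1 *cfg = from;

		to->mode = cfg->mode;
		to->avg = cfg->avg;
		to->burst = cfg->burst;
	} else if (revision == 2) {
		memcpy(to, from, sizeof(*to));
	} else {
		return -EINVAL;
	}

	return 0;
}
```

Each caller then checks the return value and passes the error up, so an
unexpected revision fails the operation rather than halting the machine.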

Re: [PATCH net-next v2 4/4] net: dsa: Move ports assignment closer to error checking

2016-07-05 Thread Andrew Lunn

On Tue, Jul 05, 2016 at 03:07:13PM -0700, Florian Fainelli wrote:
> Move the assignment of ports in _dsa_register_switch() closer to where
> it is checked, no functional change. Re-order declarations to
> preserve the inverted christmas tree style.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 3/4] net: dsa: Suffix function manipulating device_node with _dn

2016-07-05 Thread Andrew Lunn

On Tue, Jul 05, 2016 at 03:07:12PM -0700, Florian Fainelli wrote:
> Make it clear that these functions take a device_node structure pointer

Hi Florian

Didn't we agree that we would only support a single device via a C
coded platform data structure?

All the functions you are renaming will never be called in that
case. So I think they can retain their names. You have no need to add
non device node equivalents.

So let's drop this patch.

   Andrew

Re: [PATCH net-next v2 2/4] net: dsa: Make most functions take a dsa_port argument

2016-07-05 Thread Andrew Lunn

On Tue, Jul 05, 2016 at 03:07:11PM -0700, Florian Fainelli wrote:
> In preparation for allowing platform data, and therefore no valid
> device_node pointer, make most DSA functions take a pointer to a
> dsa_port structure whenever possible. While at it, introduce a
> dsa_port_is_valid() helper function which checks whether port->dn is
> NULL or not at the moment.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next v2 1/4] net: dsa: Pass device pointer to dsa_register_switch

2016-07-05 Thread Andrew Lunn

On Tue, Jul 05, 2016 at 03:07:10PM -0700, Florian Fainelli wrote:
> In preparation for allowing dsa_register_switch() to be supplied with
> device/platform data, pass down a struct device pointer instead of a
> struct device_node.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

[PATCH RESEND net-next] netvsc: Use the new in-place consumption APIs in the rx path

2016-07-05 Thread kys

From: K. Y. Srinivasan 

Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.

The patch is being resent to address earlier email issues.

Signed-off-by: K. Y. Srinivasan 
---
 drivers/net/hyperv/netvsc.c |   88 +--
 1 files changed, 59 insertions(+), 29 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 719cb35..8cd4c19 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -1141,6 +1141,39 @@ static inline void netvsc_receive_inband(struct hv_device *hdev,
}
 }
 
+static void netvsc_process_raw_pkt(struct hv_device *device,
+  struct vmbus_channel *channel,
+  struct netvsc_device *net_device,
+  struct net_device *ndev,
+  u64 request_id,
+  struct vmpacket_descriptor *desc)
+{
+   struct nvsp_message *nvmsg;
+
+   nvmsg = (struct nvsp_message *)((unsigned long)
+   desc + (desc->offset8 << 3));
+
+   switch (desc->type) {
+   case VM_PKT_COMP:
+   netvsc_send_completion(net_device, channel, device, desc);
+   break;
+
+   case VM_PKT_DATA_USING_XFER_PAGES:
+   netvsc_receive(net_device, channel, device, desc);
+   break;
+
+   case VM_PKT_DATA_INBAND:
+   netvsc_receive_inband(device, net_device, nvmsg);
+   break;
+
+   default:
+   netdev_err(ndev, "unhandled packet type %d, tid %llx\n",
+  desc->type, request_id);
+   break;
+   }
+}
+
+
 void netvsc_channel_cb(void *context)
 {
int ret;
@@ -1153,7 +1186,7 @@ void netvsc_channel_cb(void *context)
unsigned char *buffer;
int bufferlen = NETVSC_PACKET_SIZE;
struct net_device *ndev;
-   struct nvsp_message *nvmsg;
+   bool need_to_commit = false;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1167,39 +1200,36 @@ void netvsc_channel_cb(void *context)
buffer = get_per_channel_state(channel);
 
do {
+   desc = get_next_pkt_raw(channel);
+   if (desc != NULL) {
+   netvsc_process_raw_pkt(device,
+  channel,
+  net_device,
+  ndev,
+  desc->trans_id,
+  desc);
+
+   put_pkt_raw(channel, desc);
+   need_to_commit = true;
+   continue;
+   }
+   if (need_to_commit) {
+   need_to_commit = false;
+   commit_rd_index(channel);
+   }
+
ret = vmbus_recvpacket_raw(channel, buffer, bufferlen,
   &bytes_recvd, &request_id);
if (ret == 0) {
if (bytes_recvd > 0) {
desc = (struct vmpacket_descriptor *)buffer;
-   nvmsg = (struct nvsp_message *)((unsigned long)
-desc + (desc->offset8 << 3));
-   switch (desc->type) {
-   case VM_PKT_COMP:
-   netvsc_send_completion(net_device,
-   channel,
-   device, desc);
-   break;
-
-   case VM_PKT_DATA_USING_XFER_PAGES:
-   netvsc_receive(net_device, channel,
-  device, desc);
-   break;
-
-   case VM_PKT_DATA_INBAND:
-   netvsc_receive_inband(device,
- net_device,
- nvmsg);
-   break;
-
-   default:
-   netdev_err(ndev,
-  "unhandled packet type %d, "
-  "tid %llx len %d\n",
-  desc->type, request_id,
-  bytes_recvd);

[PATCH net-next v2 2/4] net: dsa: Make most functions take a dsa_port argument

2016-07-05 Thread Florian Fainelli

In preparation for allowing platform data, and therefore no valid
device_node pointer, make most DSA functions take a pointer to a
dsa_port structure whenever possible. While at it, introduce a
dsa_port_is_valid() helper function which checks whether port->dn is
NULL or not at the moment.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa.c  | 14 +++--
 net/dsa/dsa2.c | 61 +-
 net/dsa/dsa_priv.h |  4 ++--
 3 files changed, 43 insertions(+), 36 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 766d2a525ada..d117580a78b6 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -208,8 +208,9 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 
 /* basic switch operations **/
 int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
- struct device_node *port_dn, int port)
+ struct dsa_port *dport, int port)
 {
+   struct device_node *port_dn = dport->dn;
struct phy_device *phydev;
int ret, mode;
 
@@ -237,15 +238,15 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
 
 static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct device *dev)
 {
-   struct device_node *port_dn;
+   struct dsa_port *dport;
int ret, port;
 
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
 
-   port_dn = ds->ports[port].dn;
-   ret = dsa_cpu_dsa_setup(ds, dev, port_dn, port);
+   dport = &ds->ports[port];
+   ret = dsa_cpu_dsa_setup(ds, dev, dport, port);
if (ret)
return ret;
}
@@ -494,8 +495,9 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
return ds;
 }
 
-void dsa_cpu_dsa_destroy(struct device_node *port_dn)
+void dsa_cpu_dsa_destroy(struct dsa_port *port)
 {
+   struct device_node *port_dn = port->dn;
struct phy_device *phydev;
 
if (of_phy_is_fixed_link(port_dn)) {
@@ -531,7 +533,7 @@ static void dsa_switch_destroy(struct dsa_switch *ds)
for (port = 0; port < DSA_MAX_PORTS; port++) {
if (!(dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port)))
continue;
-   dsa_cpu_dsa_destroy(ds->ports[port].dn);
+   dsa_cpu_dsa_destroy(&ds->ports[port]);
 
/* Clearing a bit which is not set does no harm */
ds->cpu_port_mask |= ~(1 << port);
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 0940a0ec83e6..3a782ceef716 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -77,11 +77,16 @@ static void dsa_dst_del_ds(struct dsa_switch_tree *dst,
kref_put(&dst->refcount, dsa_free_dst);
 }
 
-static bool dsa_port_is_dsa(struct device_node *port)
+static bool dsa_port_is_valid(struct dsa_port *port)
+{
+   return !!port->dn;
+}
+
+static bool dsa_port_is_dsa(struct dsa_port *port)
 {
const char *name;
 
-   name = of_get_property(port, "label", NULL);
+   name = of_get_property(port->dn, "label", NULL);
if (!name)
return false;
 
@@ -91,11 +96,11 @@ static bool dsa_port_is_dsa(struct device_node *port)
return false;
 }
 
-static bool dsa_port_is_cpu(struct device_node *port)
+static bool dsa_port_is_cpu(struct dsa_port *port)
 {
const char *name;
 
-   name = of_get_property(port, "label", NULL);
+   name = of_get_property(port->dn, "label", NULL);
if (!name)
return false;
 
@@ -136,7 +141,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
 
 static int dsa_port_complete(struct dsa_switch_tree *dst,
 struct dsa_switch *src_ds,
-struct device_node *port,
+struct dsa_port *port,
 u32 src_port)
 {
struct device_node *link;
@@ -144,7 +149,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
struct dsa_switch *dst_ds;
 
for (index = 0;; index++) {
-   link = of_parse_phandle(port, "link", index);
+   link = of_parse_phandle(port->dn, "link", index);
if (!link)
break;
 
@@ -167,13 +172,13 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
  */
 static int dsa_ds_complete(struct dsa_switch_tree *dst, struct dsa_switch *ds)
 {
-   struct device_node *port;
+   struct dsa_port *port;
u32 index;
int err;
 
for (index = 0; index < DSA_MAX_PORTS; index++) {
-   port = ds->ports[index].dn;
-   if (!port)
+   port = &ds->ports[index];
+   if (!dsa_port_is_valid(port))
continue;
 
if (!dsa_port_is_dsa(port))
@@ -213,7 +218,7 @@ static int

[PATCH net-next v2 3/4] net: dsa: Suffix function manipulating device_node with _dn

2016-07-05 Thread Florian Fainelli

Make it clear that these functions take a device_node structure pointer

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 3a782ceef716..bdde5d217326 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -110,8 +110,8 @@ static bool dsa_port_is_cpu(struct dsa_port *port)
return false;
 }
 
-static bool dsa_ds_find_port(struct dsa_switch *ds,
-struct device_node *port)
+static bool dsa_ds_find_port_dn(struct dsa_switch *ds,
+   struct device_node *port)
 {
u32 index;
 
@@ -121,8 +121,8 @@ static bool dsa_ds_find_port(struct dsa_switch *ds,
return false;
 }
 
-static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
-   struct device_node *port)
+static struct dsa_switch *dsa_dst_find_port_dn(struct dsa_switch_tree *dst,
+  struct device_node *port)
 {
struct dsa_switch *ds;
u32 index;
@@ -132,7 +132,7 @@ static struct dsa_switch *dsa_dst_find_port(struct dsa_switch_tree *dst,
if (!ds)
continue;
 
-   if (dsa_ds_find_port(ds, port))
+   if (dsa_ds_find_port_dn(ds, port))
return ds;
}
 
@@ -153,7 +153,7 @@ static int dsa_port_complete(struct dsa_switch_tree *dst,
if (!link)
break;
 
-   dst_ds = dsa_dst_find_port(dst, link);
+   dst_ds = dsa_dst_find_port_dn(dst, link);
of_node_put(link);
 
if (!dst_ds)
@@ -557,7 +557,7 @@ static int dsa_parse_ports_dn(struct device_node *ports, struct dsa_switch *ds)
return 0;
 }
 
-static int dsa_parse_member(struct device_node *np, u32 *tree, u32 *index)
+static int dsa_parse_member_dn(struct device_node *np, u32 *tree, u32 *index)
 {
int err;
 
@@ -603,7 +603,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
u32 tree, index;
int err;
 
-   err = dsa_parse_member(np, &tree, &index);
+   err = dsa_parse_member_dn(np, &tree, &index);
if (err)
return err;
 
-- 
2.7.4

[PATCH net-next v2 0/4] net: dsa: Preparatory patches for pdata

2016-07-05 Thread Florian Fainelli

Hi all,

This is a resend of the patches that just clean up and prepare net/dsa/dsa2.c
to support platform data in the future.

Florian Fainelli (4):
  net: dsa: Pass device pointer to dsa_register_switch
  net: dsa: Make most functions take a dsa_port argument
  net: dsa: Suffix function manipulating device_node with _dn
  net: dsa: Move ports assignment closer to error checking

 drivers/net/dsa/b53/b53_common.c |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c |  7 ++--
 include/net/dsa.h|  2 +-
 net/dsa/dsa.c| 14 ---
 net/dsa/dsa2.c   | 87 ++--
 net/dsa/dsa_priv.h   |  4 +-
 6 files changed, 62 insertions(+), 54 deletions(-)

-- 
2.7.4

[PATCH net-next v2 4/4] net: dsa: Move ports assignment closer to error checking

2016-07-05 Thread Florian Fainelli

Move the assignment of ports in _dsa_register_switch() closer to where
it is checked, no functional change. Re-order declarations to
preserve the inverted christmas tree style.

Signed-off-by: Florian Fainelli 
---
 net/dsa/dsa2.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index bdde5d217326..a565bd919aa3 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -598,8 +598,8 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
 static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
struct device_node *np = dev->of_node;
-   struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
+   struct device_node *ports;
u32 tree, index;
int err;
 
@@ -607,6 +607,7 @@ static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
if (err)
return err;
 
+   ports = dsa_get_ports(ds, np);
if (IS_ERR(ports))
return PTR_ERR(ports);
 
-- 
2.7.4

[PATCH net-next v2 1/4] net: dsa: Pass device pointer to dsa_register_switch

2016-07-05 Thread Florian Fainelli

In preparation for allowing dsa_register_switch() to be supplied with
device/platform data, pass down a struct device pointer instead of a
struct device_node.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 2 +-
 drivers/net/dsa/mv88e6xxx/chip.c | 7 +++
 include/net/dsa.h| 2 +-
 net/dsa/dsa2.c   | 7 ---
 4 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 444de7b9..e5799a68cfc8 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1778,7 +1778,7 @@ int b53_switch_register(struct b53_device *dev)
 
pr_info("found switch: %s, rev %i\n", dev->name, dev->core_rev);
 
-   return dsa_register_switch(dev->ds, dev->ds->dev->of_node);
+   return dsa_register_switch(dev->ds, dev->ds->dev);
 }
 EXPORT_SYMBOL(b53_switch_register);
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 5cb06f7673af..11617c04cd33 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3848,8 +3848,7 @@ static struct dsa_switch_driver mv88e6xxx_switch_driver = {
.port_fdb_dump  = mv88e6xxx_port_fdb_dump,
 };
 
-static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
-struct device_node *np)
+static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip)
 {
struct device *dev = chip->dev;
struct dsa_switch *ds;
@@ -3864,7 +3863,7 @@ static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
 
dev_set_drvdata(dev, ds);
 
-   return dsa_register_switch(ds, np);
+   return dsa_register_switch(ds, dev);
 }
 
 static void mv88e6xxx_unregister_switch(struct mv88e6xxx_chip *chip)
@@ -3911,7 +3910,7 @@ static int mv88e6xxx_probe(struct mdio_device *mdiodev)
if (err)
return err;
 
-   err = mv88e6xxx_register_switch(chip, np);
+   err = mv88e6xxx_register_switch(chip);
if (err) {
mv88e6xxx_mdio_unregister(chip);
return err;
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 20b3087ad193..6de162c8283e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -381,5 +381,5 @@ static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
 }
 
 void dsa_unregister_switch(struct dsa_switch *ds);
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np);
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev);
 #endif
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 83b95fc4cede..0940a0ec83e6 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -590,8 +590,9 @@ static struct device_node *dsa_get_ports(struct dsa_switch *ds,
return ports;
 }
 
-static int _dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+static int _dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
+   struct device_node *np = dev->of_node;
struct device_node *ports = dsa_get_ports(ds, np);
struct dsa_switch_tree *dst;
u32 tree, index;
@@ -660,12 +661,12 @@ out:
return err;
 }
 
-int dsa_register_switch(struct dsa_switch *ds, struct device_node *np)
+int dsa_register_switch(struct dsa_switch *ds, struct device *dev)
 {
int err;
 
mutex_lock(_mutex);
-   err = _dsa_register_switch(ds, np);
+   err = _dsa_register_switch(ds, dev);
mutex_unlock(_mutex);
 
return err;
-- 
2.7.4

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Florian Fainelli

On 07/05/2016 02:51 PM, Mason wrote:
>>> Therefore, loss of context cannot possibly explain the
>>> warning I am seeing.
>>
>> No, but if you go all the way down to trying to suspend and the last
>> step is the firmware failing, anything you have suspended needs to be
>> unwound. For your ethernet driver that means that you went through a
>> successful suspend then resume cycle even if it failed later when
>> the platform attempted to suspend.
> 
> So it is the driver's responsibility to "shut down" on resume?

It is the driver's responsibility to know how to suspend and resume a
device it manages, and it does that by implementing appropriate
suspend/resume callbacks.

> (I had the vague impression that the suspend framework would
> "disable" the device through the appropriate callback.)

The suspend framework knows which drivers implement suspend/resume and
calls them appropriately (based on parenting/bus hierarchy), but it
won't automagically do anything else, because there is no such thing as
magic when it comes to suspending hardware; this needs to be a controlled
sequence.
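
To make the failed-suspend case concrete, here is a small userspace model of
the ordering. It is only a sketch: fake_dev and fake_system_suspend() are
hypothetical names and none of this is the kernel PM core. It shows that each
device sees a suspend followed by a resume even when the firmware refuses the
final step.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev {
	const char *name;
	int suspended;		/* 1 while the device is quiesced */
	int suspend_calls;
	int resume_calls;
};

static void fake_dev_suspend(struct fake_dev *d)
{
	d->suspended = 1;
	d->suspend_calls++;
}

static void fake_dev_resume(struct fake_dev *d)
{
	d->suspended = 0;
	d->resume_calls++;
}

/*
 * Suspend every device in order, attempt the platform step, then resume
 * every device in reverse order. Returns 0 if the platform step worked
 * and -1 if the (simulated) firmware refused it.
 */
static int fake_system_suspend(struct fake_dev *devs, size_t n,
			       int firmware_refuses)
{
	int ret = firmware_refuses ? -1 : 0;
	size_t i;

	assert(devs || n == 0);

	for (i = 0; i < n; i++)
		fake_dev_suspend(&devs[i]);

	/* The platform would enter its low-power state here, or fail to. */

	for (i = n; i > 0; i--)
		fake_dev_resume(&devs[i - 1]);

	return ret;
}
```

A driver with no callbacks at all never gets the chance to quiesce or restart
its queues in either pass, which is one place to look when traffic misbehaves
after an aborted suspend.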
-- 
Florian

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Mason

On 05/07/2016 23:22, Florian Fainelli wrote:
> On 07/05/2016 01:26 PM, Mason wrote:
>> On 05/07/2016 18:20, Florian Fainelli wrote:
>>> On 07/05/2016 08:56 AM, Mason wrote:
>>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>>
>>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>>> transmit queues, etc. and do the opposite upon resumption?
>>>>
>>>> I am currently testing the error path for my suspend routine.
>>>> Firmware is, in fact, denying the suspend request, and immediately
>>>> returns control to Linux, without having powered anything down.
>>>>
>>>> I expected not having to save any context in that situation.
>>>> Am I mistaken?
>>>
>>> It depends what power state you are going to and resuming from, and how
>>> much of this is platform dependent, on the platforms I work with S2
>>> preserves register states for our On/Off domain, while S3 only keeps an
>>> always-on power island and shuts off the On/Off domain, you therefore
>>> need to have your drivers in the On/Off domain suspend any activity and
>>> preserve important register states, or re-initialize them from scratch
>>> whichever is the most convenient.
>>
>> Thanks for bringing these details to my attention, they will
>> definitely prove useful when I test an actual suspend/resume
>> sequence. However, I must stress that the platform did NOT
>> power down in my test case, because the firmware currently
>> denies all suspend requests.
>>
>> Therefore, loss of context cannot possibly explain the
>> warning I am seeing.
> 
> No, but if you go all the way down to trying to suspend and the last
> step is the firmware failing, anything you have suspended needs to be
> unwound. For your ethernet driver that means that you went through a
> successful suspend then resume cycle even if it failed later when
> the platform attempted to suspend.

So it is the driver's responsibility to "shut down" on resume?
(I had the vague impression that the suspend framework would
"disable" the device through the appropriate callback.)

>>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
>>
>> Thanks. I will look into it.
>>
>> If I understand correctly, something is missing in the
>> network interface code? (My system is using an NFS root
>> filesystem, so network is an important subsystem.)
> 
> The typical things are detaching the network device and stopping
> transmit queues, but without knowing what changes you have done to
> nb8800.c, hard to tell what else is needed.

I'm using the driver unaltered. So I guess I need to figure out
the exact steps required for suspending a network device.
(I'll look at bcmgenet.c tomorrow.)

>>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>>> through a firmware in-between could it be changing the counter values
>>>>> and make Linux think that more time has elapsed than it really happened?
>>>>
>>>> Thanks for pointing this out, I was not aware I was supposed to save
>>>> and restore the tick counter on suspend/resume. (This is not an issue
>>>> in this specific situation, as the platform is NOT suspended.)
>>>
>>> You don't have to save and restore the clocksource counter, although if
>>> you want proper time accounting to be done across suspend states, you
>>> would want to use a clocksource which is persistent across these suspend
>>> states.
>>
>> The clocksource is a 27 MHz 32-bit tick counter. In other words,
>> the counter wraps around every 159 seconds. If Linux suspends
>> for several hours, how can it determine how much time went by?
> 
> Well, that's unfortunate, then you are pretty much either doomed to
> accepting to lose time in between and rely on e.g: NTP to resync your
> time upon resumption, or, if you had smarter hardware you could have a
> prescaler or something that makes this counter wrap far ahead (like
> years or days after).

Maybe the hardware devs thought of that problem, because they
"widened" the counter to 64 bits on newer platforms.

Regards.
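
P.S. For reference, the 159-second figure quoted above follows directly from
2^32 ticks at 27 MHz. The sketch below just redoes that arithmetic and shows
why a delta read from such a counter is only meaningful within one wrap; the
function names are made up for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds until a free-running counter of the given width wraps at hz. */
static double wrap_period_seconds(unsigned int bits, double hz)
{
	assert(bits < 64 && hz > 0.0);

	return (double)((uint64_t)1 << bits) / hz;
}

/*
 * Ticks elapsed between two reads of a 32-bit counter. Unsigned
 * subtraction is modulo 2^32, so a single wrap between the reads is
 * handled, but a suspend longer than one wrap period cannot be detected.
 */
static uint32_t ticks_elapsed32(uint32_t earlier, uint32_t later)
{
	return (uint32_t)(later - earlier);
}
```

wrap_period_seconds(32, 27e6) comes out at roughly 159.07 seconds, so any
suspend longer than that is indistinguishable from a shorter one unless a
wider or persistent time source is available.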

Re: [PATCH net-next 00/10] NCSI Support

2016-07-05 Thread Benjamin Herrenschmidt

On Tue, 2016-07-05 at 10:44 -0700, Alexei Starovoitov wrote:
> 

 .../...

> > > The design for the patchset is highlighted as below:
> > > 
> > >    * The NCSI interface is abstracted with "struct ncsi_dev". It's registered
> > >      when net_device is created, started to work by calling ncsi_start_dev()
> > >      when net_device is opened (ndo_open()). For the first time, NCSI packets
> > >      are sent and received to/from the far end (host in above figure) to probe
> > >      available NCSI packages and channels. After that, one channel is chosen as
> > >      active one to provide service.
> > >    * The NCSI stack is driven by workqueue and state machine internally.
> > >    * AEN (Asynchronous Event Notification) might be received from the far end
> > >      (host). The currently active NCSI channel fails over to another available
> > >      one if possible. Otherwise, the NCSI channel is out of service.
> > >    * NCSI stack should be configurable through netlink or another mechanism,
> > >      but it's not implemented in this patchset. It's something TBD.
> 
> Gavin,
> what configurations do you have in mind?
> For ncsi itself or to control the nic on the host?
> This set of patches is for BMC side, right?
> What needs to be done on the host?

I'll respond for Gavin since I'm awake first ;-)

We use that stack today on OpenBMC on some OpenPOWER machines.

The configuration is thus for the above stack to run on the BMC in
order to control the host NIC.

NC-SI capable host NICs operate autonomously, so there is nothing to be
done on the host OS itself, at least not with the BCM NICs that we use
today, but of course the host NIC firmware needs to have the other side
of the stack.

> > >    * The first NIC driver that is aware of NCSI: drivers/net/ethernet/faraday/ftgmac100.c
> > 
> > 
> > FWIW, talking to a colleague, he made a comment that some of the text
> > above is wrong:
> > 
> > AENs are sent from NIC to BMC. Not from Host to BMC.
> > 
> > The traffic between a BMC and a NIC is over RBT if it is formatted as
> > NC-SI packets. This is not over network traffic
> > 
> > Or.
> 
> Or,
> since cx4 has ncsi as well, could you do a thorough review of this
> to make sure that it fits mellanox nics as well?

Cheers,
Ben.

Re: [PATCH 2/2] net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings

2016-07-05 Thread Ben Hutchings

On Tue, 2016-07-05 at 14:15 -0700, Florian Fainelli wrote:
> On 07/05/2016 02:07 PM, Philippe Reynes wrote:
> > Hi Florian,
> > 
> > On 05/07/16 06:30, Florian Fainelli wrote:
> > > Le 04/07/2016 16:03, David Miller a écrit :
> > > > From: Philippe Reynes
> > > > Date: Sun,  3 Jul 2016 17:33:57 +0200
> > > > 
> > > > > There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> > > > > so we can use them instead of defining the same code in the driver.
> > > > > 
> > > > > Signed-off-by: Philippe Reynes
> > > > 
> > > > Applied.
> > > > 
> > > 
> > > The transformation is not equivalent, we lost the checks on
> > > netif_running() in the process, and those are here for a reason, if the
> > > interface is down and therefore clock gated, MDIO accesses to the PHY
> > > will simply fail outright and cause bus errors.
> > 
> > Oh, I see, I've missed this. Sorry for this mistake.
> > We should revert this patch.
> 
> Well, maybe better than that, actually put the check in the generic
> functions, because if the link is down, aka netif_running() returns
> false, link parameters cannot be reliably queried and they are invalid.

Either the hardware or the driver needs to remember:

- Is auto-negotiation enabled
- If so, which modes are advertised
- If not, which mode is forced

And it should still be possible to get or set that information when the
interface is down.
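
To illustrate the point above, here is a minimal userspace sketch, assuming a driver that keeps a software copy of the link configuration. Every name in it (struct link_cfg, struct fake_dev, hw_apply() and so on) is hypothetical and not taken from phylib or bcmgenet; the only idea it shows is that get/set operate on the cached copy and the hardware is touched only while the interface is running.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached link configuration; not a kernel structure. */
struct link_cfg {
	bool autoneg;           /* is auto-negotiation enabled? */
	uint32_t advertising;   /* advertised modes, used when autoneg is on */
	uint32_t forced_speed;  /* forced speed in Mb/s, used when autoneg is off */
};

/* Hypothetical device; "running" stands in for netif_running(). */
struct fake_dev {
	bool running;
	struct link_cfg cfg;    /* software copy, always readable */
	unsigned int hw_writes; /* counts simulated PHY accesses */
};

/* Simulated PHY access: only legal while the device is clocked. */
static void hw_apply(struct fake_dev *dev)
{
	assert(dev->running);
	dev->hw_writes++;
}

/* Always remember the request; program the hardware only if it is up. */
static void set_link_cfg(struct fake_dev *dev, const struct link_cfg *cfg)
{
	dev->cfg = *cfg;
	if (dev->running)
		hw_apply(dev);
}

/* Reading never touches the hardware, so it works while down. */
static struct link_cfg get_link_cfg(const struct fake_dev *dev)
{
	return dev->cfg;
}

/* Bringing the interface up replays the cached configuration. */
static void dev_open(struct fake_dev *dev)
{
	dev->running = true;
	hw_apply(dev);
}
```

With this shape, a set issued while the interface is down is stored and only reaches the (simulated) hardware in dev_open(), which is one possible way to avoid MDIO accesses on a clock-gated block.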

Ben.

-- 

Ben Hutchings
Life is what happens to you while you're busy making other plans.
   - John Lennon



Re: [PATCH net-next 22/24] rcu: Suppress sparse warnings for rcu_dereference_raw()

2016-07-05 Thread Paul E. McKenney

On Tue, Jul 05, 2016 at 02:14:49PM +0100, David Howells wrote:
> From: Paul E. McKenney 
> 
> Data structures that are used both with and without RCU protection
> are difficult to write in a sparse-clean manner.  If you mark the
> relevant pointers with __rcu, sparse will complain about all non-RCU
> uses, but if you don't mark those pointers, sparse will complain about
> all RCU uses.
> 
> This commit therefore suppresses sparse warnings for rcu_dereference_raw(),
> allowing mixed-protection data structures to avoid these warnings.
> 
> Reported-by: David Howells 
> Signed-off-by: Paul E. McKenney 
> Signed-off-by: David Howells 

This would normally be my cue to give an Acked-by to an RCU patch, but
it already has my Signed-off-by.  So this is just to confirm that I agree
that keeping this patch with the other patches that depend on it is the
right thing to do.  ;-)

Thanx, Paul

> ---
> 
>  include/linux/rcupdate.h |8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 5f1533e3d032..85830e6c797b 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -611,6 +611,12 @@ static inline void rcu_preempt_sleep_check(void)
>   rcu_dereference_sparse(p, space); \
>   ((typeof(*p) __force __kernel *)(p)); \
>  })
> +#define rcu_dereference_raw(p) \
> +({ \
> + /* Dependency order vs. p above. */ \
> + typeof(p) p1 = lockless_dereference(p); \
> + ((typeof(*p) __force __kernel *)(p1)); \
> +})
> 
>  /**
>   * RCU_INITIALIZER() - statically initialize an RCU-protected global variable
> @@ -729,8 +735,6 @@ static inline void rcu_preempt_sleep_check(void)
>   __rcu_dereference_check((p), (c) || rcu_read_lock_sched_held(), \
>   __rcu)
> 
> -#define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/
> -
>  /*
>   * The tracing infrastructure traces RCU (we want that), but unfortunately
>   * some of the RCU checks causes tracing to lock up the system.
>

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Florian Fainelli

On 07/05/2016 01:26 PM, Mason wrote:
> On 05/07/2016 18:20, Florian Fainelli wrote:
>> On 07/05/2016 08:56 AM, Mason wrote:
>>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>>
>>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>>> you positive that when you suspend, you properly tear down all HW, stop
>>>> transmit queues, etc. and do the opposite upon resumption?
>>>
>>> I am currently testing the error path for my suspend routine.
>>> Firmware is, in fact, denying the suspend request, and immediately
>>> returns control to Linux, without having powered anything down.
>>>
>>> I expected not having to save any context in that situation.
>>> Am I mistaken?
>>
>> It depends what power state you are going to and resuming from, and how
>> much of this is platform dependent, on the platforms I work with S2
>> preserves register states for our On/Off domain, while S3 only keeps an
>> always-on power island and shuts off the On/Off domain, you therefore
>> need to have your drivers in the On/Off domain suspend any activity and
>> preserve important register states, or re-initialize them from scratch
>> whichever is the most convenient.
> 
> Thanks for bringing these details to my attention, they will
> definitely prove useful when I test an actual suspend/resume
> sequence. However, I must stress that the platform did NOT
> power down in my test case, because the firmware currently
> denies all suspend requests.
> 
> Therefore, loss of context cannot possibly explain the
> warning I am seeing.

No, but if you go all the way down to trying to suspend and the last
step is the firmware failing, anything you have suspended needs to be
unwound. For your Ethernet driver, that means you went through a
successful suspend then resume cycle, even if the platform failed later
when it attempted to suspend.

> 
>>> You mention "stop transmit queues". Can you say more about this?
>>
>> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
>> that takes care of that for instance, look for bcmgenet_{suspend,resume}
> 
> Thanks. I will look into it.
> 
> If I understand correctly, something is missing in the
> network interface code? (My system is using an NFS root
> filesystem, so network is an important subsystem.)

The typical things are detaching the network device and stopping
transmit queues, but without knowing what changes you have done to
nb8800.c, hard to tell what else is needed.

> 
>>>> Is your system clocksource also correctly saved/restored, or if you go
>>>> through a firmware in-between could it be changing the counter values
>>>> and make Linux think that more time as elapsed than it really happened?
>>>
>>> Thanks for pointing this out, I was not aware I was supposed to save
>>> and restore the tick counter on suspend/resume. (This is not an issue
>>> in this specific situation, as the platform is NOT suspended.)
>>
>> You don't have to save and restore the clocksource counter, although if
>> you want proper time accounting to be done across suspend states, you
>> would want to use a clocksource which is persistent across these suspend
>> states.
> 
> The clocksource is a 27 MHz 32-bit tick counter. In other words,
> the counter wraps around every 159 seconds. If Linux suspends
> for several hours, how can it determine how much time went by?

Well, that's unfortunate. You are then pretty much doomed either to
accept losing time in between and rely on e.g. NTP to resync your time
upon resumption, or, if you had smarter hardware, to use a prescaler or
something similar that makes this counter wrap much later (days or
years later).
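
For reference, the 159-second figure and the effect of a prescaler can be checked with a small helper. This is only an arithmetic sketch: the function name is made up here, and the prescaler value in the usage note is an arbitrary assumption, not a property of this hardware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Seconds until a free-running counter wraps, given its input clock,
 * its width in bits and an integer prescaler (1 = no prescaling).
 */
static double wrap_period_seconds(double freq_hz, unsigned int bits,
				  uint32_t prescaler)
{
	double counts = 1.0;
	unsigned int i;

	assert(freq_hz > 0.0 && bits >= 1 && bits <= 64 && prescaler >= 1);

	/* 2^bits counter states, computed without libm */
	for (i = 0; i < bits; i++)
		counts *= 2.0;

	/* each count lasts prescaler / freq_hz seconds */
	return counts * (double)prescaler / freq_hz;
}
```

wrap_period_seconds(27e6, 32, 1) gives 2^32 / 27,000,000, which is just over 159 seconds and matches the number quoted above. A hypothetical divide-by-1024 prescaler would stretch that to about 45 hours, at the cost of coarser resolution.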
-- 
Florian

Re: [PATCH net] bonding: fix 802.3ad aggregator reselection

2016-07-05 Thread Jay Vosburgh

Veli-Matti Lintu  wrote:

>2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu :
>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh :
>>> Veli-Matti Lintu  wrote:
>
>>> I tried this locally, but don't see any failure (at the end, the
>>> "Switch A" agg is still active with the single port).  I am starting
>>> with just two ports in each aggregator (instead of three), so that may
>>> be relevant.
>>
>> When the connection problem occurs, /proc/net/bonding/bond0 always
>> shows the aggregator that has a link up active. Dumpcap sees at least
>> broadcast traffic on the port, but I haven't done extensive analysis
>> on that yet. All TCP connections are cut until the bond is up again
>> when more ports are enabled on the switch. ping doesn't work either
>> way.
>
>I did some further testing on this and it looks like I can get this
>working by enabling the ports in the new aggregator the same way as
>the ports in old aggregator are disabled in ad_agg_selection_logic().

I tested with this some as well, using 6 ports total across two
switches, and was still not able to reproduce the issue.  How are you
configuring the bond in the first place?  It may be that there is some
dependency on the ordering of the slaves within the bond and how they
are disabled.

Also, I am taking the ports down by physically unplugging the
cable from the switch.  If you're doing it differently, that might be
relevant.

>Normally the ports seem to get enabled from ad_mux_machine() in "case
>AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
>as the port does get enabled, but no traffic passes through. So far I
>haven't been able to figure out what happens. When the connection is
>lost, dumpcap sees traffic on the only active port in the bond, but it
>seems like nothing catches it. If I disable and re-enable the same
>port, traffic start flowing again normally.

Looking at the debug log you provided, the step that fails
appears to correspond to this portion:

[  367.811419] bond0: link status definitely down for interface enp5s0f1, disabling it
[  367.811425] bond0: best Agg=2; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811429] bond0: best ports 8830113f6638 slave 8830113cfe00 enp5s0f0
[  367.811432] bond0: Agg=1; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811434] bond0: Agg=2; P=3; a k=9; p k=57; Ind=0; Act=0
[  367.811437] bond0: Agg=3; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811439] bond0: Agg=4; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811441] bond0: Agg=5; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811444] bond0: Agg=6; P=0; a k=0; p k=0; Ind=0; Act=0
[  367.811446] bond0: LAG 2 chosen as the active LAG
[  367.811448] bond0: Agg=2; P=3; a k=9; p k=57; Ind=0; Act=1
[  367.811451] bond0: Port 1 changed link status to DOWN
[  367.811455] bond0: first active interface up!
[  367.811461] Rx Machine: Port=1 (enp5s0f1), Last State=6, Curr State=2
[  367.811495] ixgbe :05:00.0 enp5s0f0: event: 1b
[  367.811497] ixgbe :05:00.0 enp5s0f0: IFF_SLAVE
[  367.811519] ixgbe :05:00.1 enp5s0f1: event: 19
[  367.811522] ixgbe :05:00.1 enp5s0f1: IFF_SLAVE
[  367.811525] ixgbe :05:00.1 enp5s0f1: event: 19
[  367.811528] ixgbe :05:00.1 enp5s0f1: IFF_SLAVE
[  386.809542] Periodic Machine: Port=2, Last State=3, Curr State=4
[  386.909543] Periodic Machine: Port=2, Last State=4, Curr State=3
[  387.009530] update lacpdu: enp5s0f0, actor port state 3d
[  387.009541] Sent LACPDU on port 2
[  387.571372] bond0: Received LACPDU on port 2 slave enp5s0f0
[  387.571379] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[  387.571381] enp5s0f0 partner sync=1
[  416.810767] Periodic Machine: Port=2, Last State=3, Curr State=4
[  416.910786] Periodic Machine: Port=2, Last State=4, Curr State=3
[  417.010749] update lacpdu: enp5s0f0, actor port state 3d
[  417.010761] Sent LACPDU on port 2
[  417.569137] bond0: Received LACPDU on port 2 slave enp5s0f0
[  417.569156] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[... repeats this cycle for a while ]
[  537.614050] enp5s0f0 partner sync=1
[  566.816851] Periodic Machine: Port=2, Last State=3, Curr State=4
[  566.916843] Periodic Machine: Port=2, Last State=4, Curr State=3
[  567.016829] update lacpdu: enp5s0f0, actor port state 3d
[  567.016839] Sent LACPDU on port 2
[  567.558379] bond0: Received LACPDU on port 2 slave enp5s0f0
[  567.558399] Rx Machine: Port=2 (enp5s0f0), Last State=6, Curr State=6
[  567.558403] enp5s0f0 partner sync=1
[  572.925434] igb :81:00.0 ens5f0: igb: ens5f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[  572.925862] igb :81:00.0 ens5f0: event: 4
[  572.925865] igb :81:00.0 ens5f0: IFF_SLAVE
[  572.925890] bond0: Port 6 Received link speed 0 update from adapter

The "Periodic Machine" 3->4 then 4->3 then "Sent LACPDU" looks
normal (3->4 is the periodic timer expiring,

Re: [PATCH 2/2] net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings

2016-07-05 Thread Florian Fainelli

On 07/05/2016 02:07 PM, Philippe Reynes wrote:
> Hi Florian,
> 
> On 05/07/16 06:30, Florian Fainelli wrote:
>> Le 04/07/2016 16:03, David Miller a écrit :
>>> From: Philippe Reynes
>>> Date: Sun,  3 Jul 2016 17:33:57 +0200
>>>
>>>> There are two generics functions phy_ethtool_{get|set}_link_ksettings,
>>>> so we can use them instead of defining the same code in the driver.
>>>>
>>>> Signed-off-by: Philippe Reynes
>>>
>>> Applied.
>>>
>>
>> The transformation is not equivalent, we lost the checks on
>> netif_running() in the process, and those are here for a reason, if the
>> interface is down and therefore clock gated, MDIO accesses to the PHY
>> will simply fail outright and cause bus errors.
> 
> Oh, I see, I've missed this. Sorry for this mistake.
> We should revert this path.

Well, maybe better than that, actually put the check in the generic
functions, because if the link is down, aka netif_running() returns
false, link parameters cannot be reliably queried and they are invalid.
-- 
Florian

Re: [PATCH net] ipv6: Fix mem leak in rt6i_pcpu

2016-07-05 Thread David Miller

From: Martin KaFai Lau 
Date: Tue, 5 Jul 2016 12:10:23 -0700

> It was first reported and reproduced by Petr (thanks!) in
> https://bugzilla.kernel.org/show_bug.cgi?id=119581
> 
> free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().
> 
> However, after fixing a deadlock bug in
> commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
> free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.
> 
> It is worth to note that rt6i_pcpu is protected by table->tb6_lock.
> 
> kmemleak somehow did not report it.  We nailed it down by
> observing the pcpu entries in /proc/vmallocinfo (first suggested
> by Hannes, thanks!).
> 
> Signed-off-by: Martin KaFai Lau 
> Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
> Reported-by: Petr Novopashenniy 
> Tested-by: Petr Novopashenniy 
> Acked-by: Hannes Frederic Sowa 
> Cc: Hannes Frederic Sowa 
> Cc: Petr Novopashenniy 

Applied and queued up for -stable.

Re: [PATCH] net: fix decnet rtnexthop parsing

2016-07-05 Thread David Miller

From: Vegard Nossum 
Date: Tue,  5 Jul 2016 21:12:53 +0200

> dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
> (i.e. if userspace passes a malformed netlink message).
> 
> Let's use the helpers from net/nexthop.h which take care of all this
> stuff. We can do exactly the same as e.g. fib_count_nexthops() and
> fib_get_nhs() from net/ipv4/fib_semantics.c.
> 
> This fixes the softlockup for me.
> 
> Cc: Thomas Graf 
> Signed-off-by: Vegard Nossum 

Applied, thanks.
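
As an illustration of why a zero rtnh_len loops forever and how a bounds check avoids it, here is a userspace sketch. The struct and function names are hypothetical stand-ins, not the kernel's struct rtnexthop or the net/nexthop.h helpers, and alignment padding is left out for brevity. The check mirrors the idea of validating each record's length against both the header size and the bytes remaining before advancing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a nexthop record header. */
struct nh_hdr {
	unsigned short len;   /* total length of this record in bytes */
	unsigned char flags;
	unsigned char hops;
	int ifindex;
};

/* A record is usable only if its length is sane for the space left. */
static int nh_ok(const struct nh_hdr *nh, size_t remaining)
{
	return remaining >= sizeof(*nh) &&
	       nh->len >= sizeof(*nh) &&
	       nh->len <= remaining;
}

/*
 * Count records in buf. Returns -1 on a malformed record instead of
 * spinning: without the nh_ok() check, len == 0 would never advance.
 */
static int count_nhs(const unsigned char *buf, size_t remaining)
{
	int n = 0;

	assert(buf != NULL);
	while (remaining > 0) {
		struct nh_hdr nh;

		if (remaining < sizeof(nh))
			return -1;
		memcpy(&nh, buf, sizeof(nh)); /* avoid unaligned access */
		if (!nh_ok(&nh, remaining))
			return -1;
		buf += nh.len;
		remaining -= nh.len;
		n++;
	}
	return n;
}
```

The important property is that every loop iteration either consumes at least sizeof(struct nh_hdr) bytes or returns an error, so a malformed message from userspace cannot stall the walk.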

Re: [PATCH 2/2] net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings

2016-07-05 Thread Philippe Reynes

Hi Florian,

On 05/07/16 06:30, Florian Fainelli wrote:
> Le 04/07/2016 16:03, David Miller a écrit :
>> From: Philippe Reynes
>> Date: Sun,  3 Jul 2016 17:33:57 +0200
>>
>>> There are two generics functions phy_ethtool_{get|set}_link_ksettings,
>>> so we can use them instead of defining the same code in the driver.
>>>
>>> Signed-off-by: Philippe Reynes
>>
>> Applied.
>
> The transformation is not equivalent, we lost the checks on
> netif_running() in the process, and those are here for a reason, if the
> interface is down and therefore clock gated, MDIO accesses to the PHY
> will simply fail outright and cause bus errors.

Oh, I see, I've missed this. Sorry for this mistake.
We should revert this path.

I think that a lot of hardware had the same behaviour.
I'm going to look for a generic solution for this behaviour.
If someone has an idea ...

> Philippe, have you tested this?

I haven't tested, I don't have the hardware.

Philippe

Re: [PATCH net] bonding: fix 802.3ad aggregator reselection

2016-07-05 Thread Veli-Matti Lintu

2016-07-05 17:01 GMT+03:00 Veli-Matti Lintu :
> 2016-06-30 14:15 GMT+03:00 Veli-Matti Lintu :
>> 2016-06-29 18:59 GMT+03:00 Jay Vosburgh :
>>> Veli-Matti Lintu  wrote:
>
>>> I tried this locally, but don't see any failure (at the end, the
>>> "Switch A" agg is still active with the single port).  I am starting
>>> with just two ports in each aggregator (instead of three), so that may
>>> be relevant.
>>
>> When the connection problem occurs, /proc/net/bonding/bond0 always
>> shows the aggregator that has a link up active. Dumpcap sees at least
>> broadcast traffic on the port, but I haven't done extensive analysis
>> on that yet. All TCP connections are cut until the bond is up again
>> when more ports are enabled on the switch. ping doesn't work either
>> way.
>
> I did some further testing on this and it looks like I can get this
> working by enabling the ports in the new aggregator the same way as
> the ports in old aggregator are disabled in ad_agg_selection_logic().
>
> Normally the ports seem to get enabled from ad_mux_machine() in "case
> AD_MUX_COLLECTING_DISTRIBUTING", but something different happens there
> as the port does get enabled, but no traffic passes through. So far I
> haven't been able to figure out what happens. When the connection is
> lost, dumpcap sees traffic on the only active port in the bond, but it
> seems like nothing catches it. If I disable and re-enable the same
> port, traffic start flowing again normally.

One more thing to add here - I have tested the following
bond/bridge/vlan configurations:

1. bond0 has IP address, no bridges/vlans
2. bond0 belongs to a bridge that has the IP address, no vlans
3. bond0 belongs to a bridge that has the IP address + there are
bond0.X VLANs that belong to separate bridges

All configurations behave the same way.

It is also possible to reproduce this with two aggregators with two
links each. The steps are:

   Agg 1   Agg 2
   P1 P2   P3 P4
   X   X   X   X   OK (Agg 2 active)
   X   X   X   -   OK (Agg 1 active)
   X   -   X   -   OK (Agg 1 active)
   -   -   X   -   Fail (Agg 2 active)

The first disabled port needs to be in active aggregator so that the
aggregator is reselected and changed.

Veli-Matti


> Here's the patch I used for testing on top of 4.7.0-rc6. I haven't
> tested this with other modes or h/w setups yet.
>
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index ca81f46..45c06c4 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -1706,6 +1706,25 @@ static void ad_agg_selection_logic(struct aggregator *agg,
> __disable_port(port);
> }
> }
> +
> +   /* Enable ports in the new aggregator */
> +if (best) {
> +   netdev_dbg(bond->dev, "Enable ports\n");
> +
> +for (port = best->lag_ports; port;
> + port = port->next_port_in_aggregator) {
> +netdev_dbg(bond->dev, "Agg: %d, P=%d: Port: %s; Enabled=%d\n",
> +best->aggregator_identifier,
> +best->num_of_ports,
> +port->slave->dev->name,
> +__port_is_enabled(port));
> +
> +if (!__port_is_enabled(port))
> +__enable_port(port);
> +}
> +}
> +
> +
> /* Slave array needs update. */
> *update_slave_arr = true;
> }
>
> Veli-Matti

Re: [iproute PATCH 0/2] Netns performance improvements

2016-07-05 Thread Phil Sutter

Hi Eric,

Thanks for your quick and insightful reply rightfully pointing out the
lack of rationale behind this change. So let me try to catch up:

On Tue, Jul 05, 2016 at 09:44:00AM -0500, Eric W. Biederman wrote:
> Phil Sutter  writes:
> 
> > Stress-testing OpenStack Neutron revealed poor performance of 'ip netns'
> > when dealing with a high amount of namespaces. The cause of this lies in
> > the combination of how iproute2 mounts NETNS_RUN_DIR and the netns files
> > therein and the fact that systemd makes all mount points of the system
> > shared.
> 
> So please tell me.  Given that it was clearly a deliberate choice in the
> code to make these directories shared, and that this is not a result
> of a systemd making all directories shared by default.  Why is it
> better to these directories non-shared?

NETNS_RUN_DIR itself is kept shared as it was intended by you (I hope).
The only difference is that we should avoid it being in the same group
as the parent mount point. Otherwise, all netns mount points will occur
twice.

Regarding the shared state of the netns mount points, I have actually no
idea what's the benefit, as there won't be any child mount points and
therefore no propagation should occur. Or am I missing something?

> This may be the appropriate change but saying you stress testing things
> and have a problem but do not describe how large a scale you had a
> problem, or anything else to make your problem reproducible by anyone
> else makes it difficult to consider the merits of this change.
> 
> Sometimes things are a good default policy but have imperfect scaling on
> extreme workloads.
> 
> My experience with the current situtation with ip netns is that it
> prevents a whole lot of confusion by making the network namespace names
> visible whichever mount namespace your processes are running in.

The only functional difference I noticed was the no longer twice
appearing netns mount points. They are still visible in all namespaces
though, just as before.

Here's the script I wrote to benchmark 'ip netns':

| #!/bin/bash
| 
| IP=${IP:-/usr/sbin/ip}
| echo "using ip at $IP"
| 
| # make sure we start at a clean state
| for netns in $(ls /run/netns/* 2>/dev/null); do
| $IP netns del ${netns##*/}
| done
| umount /run/netns
| 
| echo "creating 100 mount ns"
| touch /tmp/stay_alive
| for ((i = 0; i < 100; i++)); do
| unshare -m --propagation unchanged bash -c \
|   "while [[ -e /tmp/stay_alive ]]; do sleep 1; done" &
| done
| # give a little time for unshare to complete
| sleep 3
| 
| nscount=1000
| 
| echo -en "\ncreating $nscount netns"
| time (for ((i = 0; i < $nscount; i++)); do $IP netns add test$i; done)
| 
| echo -en "\ndeleting $nscount netns"
| time (for ((i = 0; i < $nscount; i++)); do $IP netns del test$i; done)
| 
| echo "removing mount ns again"
| rm /tmp/stay_alive
| wait

So basically it creates 100 idle mount namespaces, then times
adding/removing 1000 network namespaces. I called it three times:
without any patch, with just patch 1 and with both patches applied. Here
are the results:

| # IP=/tmp/base/ip /vmshare/reproducer/ip_netns_bench.sh
| using ip at /tmp/base/ip
| creating 100 mount ns
| 
| creating 1000 netns
| real  0m8.110s
| user  0m1.143s
| sys   0m6.235s
| 
| deleting 1000 netns
| real  0m15.347s
| user  0m0.957s
| sys   0m11.359s
| removing mount ns again

| # IP=/tmp/p1/ip /vmshare/reproducer/ip_netns_bench.sh
| using ip at /tmp/p1/ip
| creating 100 mount ns
| 
| creating 1000 netns
| real  0m7.956s
| user  0m0.987s
| sys   0m4.896s
| 
| deleting 1000 netns
| real  0m7.407s
| user  0m1.165s
| sys   0m3.418s
| removing mount ns again

| # IP=/tmp/p2/ip /vmshare/reproducer/ip_netns_bench.sh
| using ip at /tmp/p2/ip
| creating 100 mount ns
| 
| creating 1000 netns
| real  0m7.843s
| user  0m0.977s
| sys   0m4.915s
| 
| deleting 1000 netns
| real  0m6.407s
| user  0m1.006s
| sys   0m3.057s
| removing mount ns again

As you can see, the biggest improvement is in deletion and comes from
patch 1. The second patch still lowers the total deletion time by
another second, which is a relatively large gain given how low the
total already is.
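
To put the deletion numbers in relative terms, a tiny helper like the following can be used. It is only a convenience sketch and the function name is made up for this mail.

```c
#include <assert.h>

/* Percentage by which `after` is lower than `before`. */
static double reduction_percent(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

Fed with the wall-clock deletion times from the runs above (15.347s unpatched, 7.407s with patch 1, 6.407s with both patches), it gives roughly a 52% reduction for patch 1 alone and roughly 58% with both patches applied.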

Cheers, Phil

Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC

2016-07-05 Thread Cong Wang

On Tue, Jul 5, 2016 at 10:59 AM, Cong Wang  wrote:
> On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
>  wrote:
>> Using printk I have confirmed that ppp_pernet() is called from
>> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>>
>> This behavior appears to have been introduced in commit 1f461dc ("ppp:
>> take reference on channels netns").
>
> We have some race condition here, where a parallel ppp_unregister_channel()
> could happen while we are in ppp_connect_channel().
>
> We need some synchronization for them. I am not sure what is the right lock
> here since ppp locking looks crazy.

Matt, could you try if the attached patch helps?

Thanks!
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 8dedafa..07f0e49 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2601,8 +2601,6 @@ ppp_unregister_channel(struct ppp_channel *chan)
spin_lock_bh(&pn->all_channels_lock);
list_del(&pch->list);
spin_unlock_bh(&pn->all_channels_lock);
-   put_net(pch->chan_net);
-   pch->chan_net = NULL;
 
pch->file.dead = 1;
wake_up_interruptible(&pch->file.rwait);
@@ -3136,6 +3134,11 @@ ppp_disconnect_channel(struct channel *pch)
  */
 static void ppp_destroy_channel(struct channel *pch)
 {
+   if (pch->chan_net) {
+   put_net(pch->chan_net);
+   pch->chan_net = NULL;
+   }
+
atomic_dec(&channel_count);
 
if (!pch->file.dead) {

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Mason

On 05/07/2016 18:20, Florian Fainelli wrote:
> On 07/05/2016 08:56 AM, Mason wrote:
>> On 05/07/2016 17:28, Florian Fainelli wrote:
>>
>>> nb8800.c does not currently show suspend/resume hooks implemented, are
>>> you positive that when you suspend, you properly tear down all HW, stop
>>> transmit queues, etc. and do the opposite upon resumption?
>>
>> I am currently testing the error path for my suspend routine.
>> Firmware is, in fact, denying the suspend request, and immediately
>> returns control to Linux, without having powered anything down.
>>
>> I expected not having to save any context in that situation.
>> Am I mistaken?
> 
> It depends what power state you are going to and resuming from, and how
> much of this is platform dependent, on the platforms I work with S2
> preserves register states for our On/Off domain, while S3 only keeps an
> always-on power island and shuts off the On/Off domain, you therefore
> need to have your drivers in the On/Off domain suspend any activity and
> preserve important register states, or re-initialize them from scratch
> whichever is the most convenient.

Thanks for bringing these details to my attention, they will
definitely prove useful when I test an actual suspend/resume
sequence. However, I must stress that the platform did NOT
power down in my test case, because the firmware currently
denies all suspend requests.

Therefore, loss of context cannot possibly explain the
warning I am seeing.

>> You mention "stop transmit queues". Can you say more about this?
> 
> See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
> that takes care of that for instance, look for bcmgenet_{suspend,resume}

Thanks. I will look into it.

If I understand correctly, something is missing in the
network interface code? (My system is using an NFS root
filesystem, so network is an important subsystem.)

>>> Is your system clocksource also correctly saved/restored, or if you go
>>> through a firmware in-between could it be changing the counter values
>>> and make Linux think that more time as elapsed than it really happened?
>>
>> Thanks for pointing this out, I was not aware I was supposed to save
>> and restore the tick counter on suspend/resume. (This is not an issue
>> in this specific situation, as the platform is NOT suspended.)
> 
> You don't have to save and restore the clocksource counter, although if
> you want proper time accounting to be done across suspend states, you
> would want to use a clocksource which is persistent across these suspend
> states.

The clocksource is a 27 MHz 32-bit tick counter. In other words,
the counter wraps around every 159 seconds. If Linux suspends
for several hours, how can it determine how much time went by?

Regards.

Re: [PATCH iptables 3/3] libxt_hashlimit: iptables-restore does not work as expected with xt_hashlimit

2016-07-05 Thread Vishwanath Pai

On 06/25/2016 05:39 AM, Pablo Neira Ayuso wrote:
> I see, but I'm not convinced about this /proc rename feature.
> 
> I think the main point of this, as well as other entries in bugzilla
> related to this, is ability to update an existing hashlimit state.
> 
> So, I'm not proposing to rename --enhanced-procfs to something else,
> I think that a different approach consisting on adding a new option
> like --hashlimit-update that will update the internal state of an
> existing hashlimit object is just fine for your usecase, right?
> 
>> > Other than that, we are doing exactly what you said, but creating a new
>> > entry in the hashtable instead of updating it. The previous entry will
>> > automatically be removed when the old rule is flushed/deleted.
> What I'm missing is why we need this /proc rename at all.

The reason we need the procfs rename feature is because it is broken at
the moment. Let us assume someone adds two rules with the same name
(intentionally or otherwise). We cannot prevent them from doing this or
error out when someone does this because all of this is done in
hashlimit_mt_check which is called for every iptables rule change, even
an entirely different rule. I'll demonstrate two scenarios here. I have
added debug printk statements that print every time hashlimit_mt_check
is called.

1) Add two rules with the same name but in different chains

$ iptables -A chain1 -m hashlimit --hashlimit-above 200/sec \
  --hashlimit-mode srcip --hashlimit-name hashlimit1 -j DROP

$ dmesg -c
[  103.965578] hashlimit_mt_check for rule hashlimit1

$ iptables -A chain2 -m hashlimit --hashlimit-above 300/sec \
   --hashlimit-mode srcip --hashlimit-name hashlimit1 -j DROP

$ dmesg -c
[  114.613758] hashlimit_mt_check for rule hashlimit1
[  114.621360] hashlimit_mt_check for rule hashlimit1
[  114.627411] hashlimit_mt_destroy on hashlimit1

2) Replace an iptables rule with iptables-restore

$ iptables -A chain1 -m hashlimit --hashlimit-above 200/sec \
  --hashlimit-mode srcip --hashlimit-name hashlimit1 -j DROP

$ iptables-save > /tmp/hashlimit

$ vi /tmp/hashlimit (edit 200/s to 300/s)

$ iptables-restore < /tmp/hashlimit

$ dmesg -c
[ 1585.411093] hashlimit_mt_check for rule hashlimit1
[ 1585.418948] hashlimit_mt_destroy on hashlimit1

In case 1 there exist two rules with the same name but we cannot have
procfs files for both of the rules since they have to exist in the same
directory. In case 2 there will be only one rule but there is a small
window where two rules with same name exist. We cannot differentiate
this from case 1. In both the cases we get the call for
hashlimit_mt_check for the new rule before the old rule is deleted.

Without the rename feature I do not know how to correctly handle the
scenario where two rules with different parameters but the same name exist.

I believe the rest of the patch handles the --hashlimit-update feature
you mentioned, but instead of updating an existing object it creates a
new one and the old object is deleted by the call to destroy. The
hashtable match function is modified to include all parameters of the
object and not just the name so that we can reuse objects that have the
exact same features.
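
A rough userspace sketch of the lookup idea described above, with made-up types (struct hl_cfg, struct hl_table) and a flat array instead of the real xt_hashlimit structures: an existing object is reused only when both the name and every parameter match, so a rule with the same name but a different rate gets a fresh object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the rule parameters. */
struct hl_cfg {
	unsigned int rate_per_sec;
	unsigned int mode;
};

/* Hypothetical table object; `use` counts the rules sharing it. */
struct hl_table {
	const char *name;
	struct hl_cfg cfg;
	unsigned int use;
};

/* Match on name AND parameters, not on name alone. */
static struct hl_table *hl_find(struct hl_table *tabs, size_t n,
				const char *name, const struct hl_cfg *cfg)
{
	size_t i;

	assert(name != NULL && cfg != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(tabs[i].name, name) == 0 &&
		    tabs[i].cfg.rate_per_sec == cfg->rate_per_sec &&
		    tabs[i].cfg.mode == cfg->mode)
			return &tabs[i];
	}
	return NULL; /* caller would create a new object here */
}
```

In the iptables-restore case this means the 300/sec rule does not find the 200/sec object named hashlimit1, a new object is created for it, and the old one goes away when the old rule is destroyed.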

Re: [PATCH net] bonding: fix 802.3ad aggregator reselection

2016-07-05 Thread Jay Vosburgh

Veli-Matti Lintu  wrote:
[...]
>I have understood that only miimon is available with 802.3ad:
>
>bonding.txt:
>
>802.3ad:
>...
>Finally, the 802.3ad mode mandates the use of the MII monitor,
>therefore, the ARP monitor is not available in this mode.
>
>Is there a way to enable ARP monitor somehow?

No, there isn't.  ARP monitor is not available in 802.3ad mode.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH net-next 06/24] rxrpc: Dup the main conn list for the proc interface

2016-07-05 Thread David Howells

David Miller  wrote:

> Wouldn't it be better to just code the proc stuff to walk whatever
> table the rest of the stack uses to hold all of the connections
> as TCP et al. do?

I should also mention that this is an intermediate step.  Splitting out the
procfs list makes it easier to fix the reap lists.  I can always make the proc
code do something different once the reap fixes have settled down.

David

Re: [PATCH net-next 05/24] rxrpc: Provide more refcount helper functions

2016-07-05 Thread David Howells

David Miller  wrote:

> I don't see anything in this patch dealing with refcount helper functions.

I'm amending the patch description to:

rxrpc: Provide queuing helper functions

Provide queueing helper functions so that the queueing of local and
connection objects can be fixed later.

The issue is that a ref on the object needs to be passed to the work queue,
but the act of queueing the object may fail because the object is already
queued.  Testing the queuedness of an object before hand doesn't work
because there can be a race with someone else trying to queue it.  What
will have to be done is to adjust the refcount depending on the result of
the queue operation.

Signed-off-by: David Howells 

but not changing the patch.

David
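
As a rough illustration of the refcount adjustment described in the amended commit text, here is a minimal userspace sketch. It is not code from the patch series: struct obj, model_queue_work() and obj_queue() are hypothetical names, a plain int stands in for the kernel's atomic refcount, and the model is single-threaded. The point is only the ordering: take the ref first, then drop it again if the queue operation reports that the object was already queued.

```c
#include <stdbool.h>

/* Hypothetical object; stands in for an rxrpc local or connection object. */
struct obj {
	int  usage;   /* reference count (atomic in real kernel code) */
	bool queued;  /* true while the object sits on the work queue */
};

/* Models a queue operation that fails if the object is already queued. */
static bool model_queue_work(struct obj *o)
{
	if (o->queued)
		return false;
	o->queued = true;
	return true;
}

/*
 * Take a ref on behalf of the work queue, then give it back if the
 * object turned out to be queued already.  No separate "is it queued?"
 * test is made beforehand, so there is no window for a race between
 * the test and the queue operation.
 */
static void obj_queue(struct obj *o)
{
	o->usage++;
	if (!model_queue_work(o))
		o->usage--;
}
```

Queueing the same object twice in this model leaves exactly one extra reference outstanding, which is the one the work item is expected to drop when it runs.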

Re: [PATCH net-next 01/24] rxrpc: Fix processing of authenticated/encrypted jumbo packets

2016-07-05 Thread David Howells

Sergei Shtylyov  wrote:

> > When commit 0d12f8a4027d021c9cc942f09f38d28288020c5d moved to keeping the
> 
>scripts/checkpatch.pl now enforces the common commit citing style as for
> the Fixes: tag and the patch description, you need to specify the summary too.

I've now added a "Fixes:" line for the commit, but checkpatch erroneously
complains that I'm mentioning a commit ID in the commit text.

David

Re: [PATCH net-next 06/24] rxrpc: Dup the main conn list for the proc interface

2016-07-05 Thread David Howells

David Miller  wrote:

> Wouldn't it be better to just code the proc stuff to walk whatever
> table the rest of the stack uses to hold all of the connections
> as TCP et al. do?

There won't be "a table" that the rest of the stack uses.  There will be more
than one.  Service conns and client conns will be handled separately, they
will have different lifecycle strategies, different lifetimes and separate
object handling code.  It's almost worth actually having separate structs for
them, but there's sufficient common ground that it doesn't actually make
sense, I think.

David

Re: [PATCH net-next 05/24] rxrpc: Provide more refcount helper functions

2016-07-05 Thread David Howells

David Miller  wrote:

> I don't see anything in this patch dealing with refcount helper functions.

Good point.  I split that part out and you took it already.  Will amend.

David

Re: [PATCH] cxgb4: update latest firmware version supported

2016-07-05 Thread David Miller

From: Ganesh Goudar 
Date: Tue, 5 Jul 2016 18:07:24 +0530

> Change t4fw_version.h to update latest firmware version number
> 
> Signed-off-by: Ganesh Goudar 

Applied.

Re: [PATCH] net/mlx5: Avoid setting unused var when modifying vport node GUID

2016-07-05 Thread David Miller

From: Or Gerlitz 
Date: Tue,  5 Jul 2016 12:17:12 +0300

> GCC complains on unused-but-set-variable, clean this up.
> 
> Fixes: 23898c763f4a ('net/mlx5: E-Switch, Modify node guid on vf set MAC')
> Signed-off-by: Or Gerlitz 

Applied.

[PATCH] net: fix decnet rtnexthop parsing

2016-07-05 Thread Vegard Nossum

dn_fib_count_nhs() could enter an infinite loop if nhp->rtnh_len == 0
(i.e. if userspace passes a malformed netlink message).

Let's use the helpers from net/nexthop.h which take care of all this
stuff. We can do exactly the same as e.g. fib_count_nexthops() and
fib_get_nhs() from net/ipv4/fib_semantics.c.

This fixes the softlockup for me.

Cc: Thomas Graf 
Signed-off-by: Vegard Nossum 
---
 net/decnet/dn_fib.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/decnet/dn_fib.c b/net/decnet/dn_fib.c
index df48034..a796fc7 100644
--- a/net/decnet/dn_fib.c
+++ b/net/decnet/dn_fib.c
@@ -41,6 +41,7 @@
 #include 
 #include 
 #include 
+#include <net/nexthop.h>
 
 #define RT_MIN_TABLE 1
 
@@ -150,14 +151,13 @@ static int dn_fib_count_nhs(const struct nlattr *attr)
struct rtnexthop *nhp = nla_data(attr);
int nhs = 0, nhlen = nla_len(attr);
 
-   while(nhlen >= (int)sizeof(struct rtnexthop)) {
-   if ((nhlen -= nhp->rtnh_len) < 0)
-   return 0;
+   while (rtnh_ok(nhp, nhlen)) {
nhs++;
-   nhp = RTNH_NEXT(nhp);
+   nhp = rtnh_next(nhp, &nhlen);
}
 
-   return nhs;
+   /* leftover implies invalid nexthop configuration, discard it */
+   return nhlen > 0 ? 0 : nhs;
 }
 
 static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct nlattr *attr,
@@ -167,21 +167,24 @@ static int dn_fib_get_nhs(struct dn_fib_info *fi, const struct nlattr *attr,
int nhlen = nla_len(attr);
 
change_nexthops(fi) {
-   int attrlen = nhlen - sizeof(struct rtnexthop);
-   if (attrlen < 0 || (nhlen -= nhp->rtnh_len) < 0)
+   int attrlen;
+
+   if (!rtnh_ok(nhp, nhlen))
return -EINVAL;
 
nh->nh_flags  = (r->rtm_flags&~0xFF) | nhp->rtnh_flags;
nh->nh_oif= nhp->rtnh_ifindex;
nh->nh_weight = nhp->rtnh_hops + 1;
 
-   if (attrlen) {
+   attrlen = rtnh_attrlen(nhp);
+   if (attrlen > 0) {
struct nlattr *gw_attr;
 
gw_attr = nla_find((struct nlattr *) (nhp + 1), attrlen, RTA_GATEWAY);
nh->nh_gw = gw_attr ? nla_get_le16(gw_attr) : 0;
}
-   nhp = RTNH_NEXT(nhp);
+
+   nhp = rtnh_next(nhp, &nhlen);
} endfor_nexthops(fi);
 
return 0;
-- 
1.9.1
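
To illustrate why the helper-based loop terminates where the old one did not, here is a small userspace model. It is a sketch only: struct model_rtnexthop, model_rtnh_ok(), model_rtnh_next() and count_nhs() are hypothetical stand-ins written for this example, assumed to behave like the helpers in net/nexthop.h, and they are not the kernel implementation. An entry with rtnh_len == 0 fails the validity check, so the loop stops instead of advancing by zero bytes forever.

```c
#include <stddef.h>

/* Simplified stand-in for struct rtnexthop. */
struct model_rtnexthop {
	unsigned short rtnh_len;
	unsigned char  rtnh_flags;
	unsigned char  rtnh_hops;
	int            rtnh_ifindex;
};

#define MODEL_ALIGN(len) (((len) + 3U) & ~3U)

/* An entry is usable only if it claims at least a full header and fits. */
static int model_rtnh_ok(const struct model_rtnexthop *nhp, int remaining)
{
	return remaining >= (int)sizeof(*nhp) &&
	       (size_t)nhp->rtnh_len >= sizeof(*nhp) &&
	       (int)nhp->rtnh_len <= remaining;
}

/* Advance to the next entry and account for the bytes consumed. */
static const struct model_rtnexthop *
model_rtnh_next(const struct model_rtnexthop *nhp, int *remaining)
{
	int totlen = (int)MODEL_ALIGN(nhp->rtnh_len);

	*remaining -= totlen;
	return (const struct model_rtnexthop *)((const char *)nhp + totlen);
}

/* Same shape as the patched dn_fib_count_nhs(): leftover bytes mean 0. */
static int count_nhs(const void *buf, int len)
{
	const struct model_rtnexthop *nhp = buf;
	int nhs = 0;

	while (model_rtnh_ok(nhp, len)) {
		nhs++;
		nhp = model_rtnh_next(nhp, &len);
	}
	return len > 0 ? 0 : nhs;
}
```

With two well-formed entries the model counts two nexthops; if the first entry has a zero length it returns 0 immediately, because the check rejects the entry before any pointer arithmetic happens.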

Re: [PATCH] xfrm: fix crash in XFRM_MSG_GETSA netlink handler

2016-07-05 Thread David Miller

From: Vegard Nossum 
Date: Tue,  5 Jul 2016 10:18:08 +0200

> If we hit any of the error conditions inside xfrm_dump_sa(), then
> xfrm_state_walk_init() never gets called. However, we still call
> xfrm_state_walk_done() from xfrm_dump_sa_done(), which will crash
> because the state walk was never initialized properly.
> 
> We can fix this by setting cb->args[0] only after we've processed the
> first element and checking this before calling xfrm_state_walk_done().
> 
> Fixes: d3623099d3 ("ipsec: add support of limited SA dump")
> Cc: Nicolas Dichtel 
> Cc: Steffen Klassert 
> Signed-off-by: Vegard Nossum 

I assume Steffen will pick this up.

Re: [PATCH net] bonding: fix enslavement slave link notifications

2016-07-05 Thread David Miller

From: Saeed Mahameed 
Date: Tue,  5 Jul 2016 12:09:47 +0300

> From: Aviv Heller 
> 
> Currently, link notifications are not sent by
> bond_set_slave_link_state() upon enslavement if
> the slave is enslaved when up.
> 
> This happens because slave->link default init value
> is 0, which is the same as BOND_LINK_UP, resulting
> in bond_set_slave_link_state() ignoring this transition.
> 
> This patch sets the default value of slave->link to
> BOND_LINK_NOCHANGE, assuring it will count as a state
> transition and thus trigger notification logic.
> 
> Signed-off-by: Aviv Heller 
> Reviewed-by: Jiri Pirko 
> Signed-off-by: Saeed Mahameed 

Applied.
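
The effect described in the quoted changelog can be modelled in a few lines of userspace C. This is a sketch under assumptions, not the bonding driver's code: the MODEL_* constants and model_set_link_state() are hypothetical, with MODEL_LINK_UP == 0 taken from the changelog and the -1 value for the "no change" state assumed for illustration.

```c
#include <stdbool.h>

/* Assumed to mirror the driver's link states; UP == 0 per the changelog. */
enum { MODEL_LINK_UP = 0, MODEL_LINK_FAIL, MODEL_LINK_DOWN, MODEL_LINK_BACK };
#define MODEL_LINK_NOCHANGE (-1)  /* assumed value, never a real link state */

struct model_slave {
	int link;
};

/* Returns true when a link notification would be sent. */
static bool model_set_link_state(struct model_slave *s, int state)
{
	if (s->link == state)
		return false;  /* looks like no transition: nothing is sent */
	s->link = state;
	return true;
}
```

A zero-initialised slave that is enslaved while up sees UP -> UP and sends nothing, whereas a slave initialised to MODEL_LINK_NOCHANGE sees a real transition and triggers the notification.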

[PATCH net] ipv6: Fix mem leak in rt6i_pcpu

2016-07-05 Thread Martin KaFai Lau

It was first reported and reproduced by Petr (thanks!) in
https://bugzilla.kernel.org/show_bug.cgi?id=119581

free_percpu(rt->rt6i_pcpu) used to always happen in ip6_dst_destroy().

However, after fixing a deadlock bug in
commit 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt"),
free_percpu() is not called before setting non_pcpu_rt->rt6i_pcpu to NULL.

It is worth noting that rt6i_pcpu is protected by table->tb6_lock.

kmemleak somehow did not report it.  We nailed it down by
observing the pcpu entries in /proc/vmallocinfo (first suggested
by Hannes, thanks!).

Signed-off-by: Martin KaFai Lau 
Fixes: 9c7370a166b4 ("ipv6: Fix a potential deadlock when creating pcpu rt")
Reported-by: Petr Novopashenniy 
Tested-by: Petr Novopashenniy 
Acked-by: Hannes Frederic Sowa 
Cc: Hannes Frederic Sowa 
Cc: Petr Novopashenniy 
---
 net/ipv6/ip6_fib.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 1bcef23..771be1f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -177,6 +177,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
}
}
 
+   free_percpu(non_pcpu_rt->rt6i_pcpu);
non_pcpu_rt->rt6i_pcpu = NULL;
 }
 
-- 
2.5.1

[RFC PATCH 0/2] strp: Stream parser for messages

2016-07-05 Thread Tom Herbert

This patch set introduces a utility for parsing application layer
protocol messages in a TCP stream. This is a generalization of the
mechanism implemented in the Kernel Connection Multiplexor.

This patch set adapts KCM to use the strparser. We expect that kTLS
can use this mechanism also. RDS would probably be another candidate
to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function. The callbacks include
a parse_msg function that is called to perform parsing (e.g.
BPF parsing in case of KCM), and a rcv_msg function that is called
when a full message has been completed.

For strparser we specify the return codes from the parser to allow
the backend to indicate that control of the socket should be
transferred back to userspace to handle some exceptions in the
stream. The return values are:

   >0 : indicates length of successfully parsed message
   0  : indicates more data must be received to parse the message
   -ESTRPIPE : current message should not be processed by the
      kernel, return control of the socket to userspace which
      can proceed to read the messages itself
   other < 0 : error in parsing, give control back to userspace
      assuming that synchronization is lost and the stream
      is unrecoverable (application expected to close TCP socket)

There is one issue I haven't been able to fully resolve. If
parse_msg returns -ESTRPIPE (wants control back to userspace)
the parser may have already read some bytes of the message.
There is no way to put bytes back into the TCP receive queue and
tcp_read_sock does not allow an easy way to peek messages. In
lieu of a better solution, we return ENODATA on the socket to
indicate that the data stream is unrecoverable (application needs
to close socket). This condition should only happen if an application
layer message header is split across two skbuffs and parsing just
the first skbuff wasn't sufficient to determine that transfer
to userspace is needed.

TBD: Need to document API.

Tom Herbert (2):
  strparser: Stream parser for messages
  kcm: Use stream parser

 include/net/kcm.h |  36 +---
 include/net/strparser.h   | 146 ++
 net/Kconfig   |   1 +
 net/Makefile  |   1 +
 net/kcm/Kconfig   |   1 +
 net/kcm/kcmproc.c |  41 ++--
 net/kcm/kcmsock.c | 435 ++
 net/strparser/Kconfig |   4 +
 net/strparser/Makefile|   1 +
 net/strparser/strparser.c | 472 ++
 10 files changed, 711 insertions(+), 427 deletions(-)
 create mode 100644 include/net/strparser.h
 create mode 100644 net/strparser/Kconfig
 create mode 100644 net/strparser/Makefile
 create mode 100644 net/strparser/strparser.c

-- 
2.8.0.rc2

[RFC PATCH 1/2] strparser: Stream parser for messages

2016-07-05 Thread Tom Herbert

This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented in the Kernel Connection Multiplexor.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.

A stream parser instance is defined by a strparser structure that
is bound to a TCP socket. The function to initialize the structure
is:

int strp_init(struct strparser *strp, struct sock *csk,
  struct strp_callbacks *cb);

csk is the TCP socket being bound to and cb are the parser callbacks.

A parser is bound to a TCP socket by setting the data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This assumes that sk_user_data is set to
the strparser structure.

There are four callbacks:
 - parse_msg is called to parse the message (returns length or error)
 - rcv_msg is called when a complete message has been received
 - read_sock_done is called when the data_ready function exits
 - abort_parser is called to abort the parser

The input to parse_msg is an skbuff which contains next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:

   >0 : indicates length of successfully parsed message
   0  : indicates more data must be received to parse the message
   -ESTRPIPE : current message should not be processed by the
  kernel, return control of the socket to userspace which
  can proceed to read the messages itself
   other < 0 : error in parsing, give control back to userspace
  assuming that synchronization is lost and the stream
  is unrecoverable (application expected to close TCP socket)
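
The return-value convention listed above can be sketched for a made-up wire format. Everything here is an assumption for illustration: the real callback receives an skbuff, while demo_parse_msg() works on a flat byte buffer; the 16-bit big-endian length prefix, the DEMO_ESCAPE marker and the DEMO_* limits are hypothetical and not part of the strparser API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ESTRPIPE
#define ESTRPIPE 86  /* Linux value; fallback for libcs that lack it */
#endif

#define DEMO_HDR_LEN 2       /* hypothetical: 16-bit big-endian total length */
#define DEMO_MAX_MSG 4096    /* hypothetical upper bound on message size */
#define DEMO_ESCAPE  0xFFFF  /* hypothetical "hand the stream to userspace" */

/*
 * Follows the convention described above:
 *   >0         length of the message at the head of the stream
 *    0         header incomplete, more data must be received
 *   -ESTRPIPE  userspace should take over at this message
 *   other <0   stream is out of sync and unrecoverable
 */
static int demo_parse_msg(const uint8_t *data, size_t avail)
{
	unsigned int len;

	if (avail < DEMO_HDR_LEN)
		return 0;

	len = ((unsigned int)data[0] << 8) | data[1];
	if (len == DEMO_ESCAPE)
		return -ESTRPIPE;
	if (len < DEMO_HDR_LEN || len > DEMO_MAX_MSG)
		return -EMSGSIZE;

	return (int)len;
}
```

Note that the function only needs the header to decide; returning 0 when the header itself is incomplete is the case that can later lead to the ENODATA situation the cover letter describes.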

In the case of an error return (< 0), strparser will stop the parser
and report an error to userspace. The application must deal
with the error. To handle the error, the strparser is unbound
from the TCP socket. If the error indicates that the TCP stream
is at a recoverable point (ESTRPIPE), then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.

Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).

strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.

Return values from the rcv_msg callback are:
   0  : message was consumed and strparser can continue to parse the
        next message
   !0 : message is held by upper layer, strparser cannot proceed to
        parse the next message. The upper layer should also do
        strp_pause to prevent strparser from proceeding until
        the held message is cleared
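
The hold-and-pause behaviour can be modelled as follows. This is a sketch only: struct demo_upper, demo_rcv_msg() and demo_release() are hypothetical, and the paused flag merely stands in for the effect of strp_pause and strp_unpause, whose real signatures are not shown in this posting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical upper-layer state. */
struct demo_upper {
	const void *held;        /* message parked by the upper layer, or NULL */
	bool        paused;      /* models the parser being paused */
	bool        can_consume; /* whether a consumer is ready right now */
};

/* Returns 0 if the message was consumed, non-zero if it is being held. */
static int demo_rcv_msg(struct demo_upper *up, const void *msg)
{
	if (up->can_consume)
		return 0;

	up->held = msg;
	up->paused = true;  /* stands in for pausing the parser */
	return 1;
}

/* Called once the held message has been handed to a consumer. */
static void demo_release(struct demo_upper *up)
{
	up->held = NULL;
	up->paused = false; /* stands in for unpausing the parser */
}
```

The model keeps the two actions together on purpose: returning non-zero without also pausing would leave the parser free to run again while a message is still parked.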

Signed-off-by: Tom Herbert 
---
 include/net/strparser.h   | 146 ++
 net/Kconfig   |   1 +
 net/Makefile  |   1 +
 net/strparser/Kconfig |   4 +
 net/strparser/Makefile|   1 +
 net/strparser/strparser.c | 472 ++
 6 files changed, 625 insertions(+)
 create mode 100644 include/net/strparser.h
 create mode 100644 net/strparser/Kconfig
 create mode 100644 net/strparser/Makefile
 create mode 100644 net/strparser/strparser.c

diff --git a/include/net/strparser.h b/include/net/strparser.h
new file mode 100644
index 000..ef372d5
--- /dev/null
+++ b/include/net/strparser.h
@@ -0,0 +1,146 @@
+/*
+ * Stream Parser
+ *
+ * Copyright (c) 2016 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_STRPARSER_H_
+#define __NET_STRPARSER_H_
+
+#include 
+#include 
+
+#define STRP_STATS_ADD(stat, count) ((stat) += (count))
+#define STRP_STATS_INCR(stat) ((stat)++)
+
+struct strp_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned int rx_mem_fail;
+   unsigned int rx_need_more_hdr;
+   unsigned int rx_msg_too_big;
+   unsigned int rx_msg_timeouts;
+   unsigned int rx_bad_hdr_len;
+};
+
+struct strp_aggr_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned int rx_mem_fail;
+   unsigned int rx_need_more_hdr;
+   unsigned int rx_msg_too_big;
+   unsigned int rx_msg_timeouts;
+   unsigned int rx_bad_hdr_len;
+   unsigned int rx_aborts;
+   unsigned int rx_interrupted;
+   unsigned int rx_unrecov_intr;
+};
+
+struct

[RFC PATCH 2/2] kcm: Use stream parser

2016-07-05 Thread Tom Herbert

Adapt KCM to use the stream parser. This mostly involves removing
the RX handling and setting up the strparser using the interface.

Signed-off-by: Tom Herbert 
---
 include/net/kcm.h |  36 +
 net/kcm/Kconfig   |   1 +
 net/kcm/kcmproc.c |  41 +++--
 net/kcm/kcmsock.c | 435 +++---
 4 files changed, 86 insertions(+), 427 deletions(-)

diff --git a/include/net/kcm.h b/include/net/kcm.h
index 2840b58..4acfa31 100644
--- a/include/net/kcm.h
+++ b/include/net/kcm.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 extern unsigned int kcm_net_id;
@@ -21,16 +22,8 @@ extern unsigned int kcm_net_id;
 #define KCM_STATS_INCR(stat) ((stat)++)
 
 struct kcm_psock_stats {
-   unsigned long long rx_msgs;
-   unsigned long long rx_bytes;
unsigned long long tx_msgs;
unsigned long long tx_bytes;
-   unsigned int rx_aborts;
-   unsigned int rx_mem_fail;
-   unsigned int rx_need_more_hdr;
-   unsigned int rx_msg_too_big;
-   unsigned int rx_msg_timeouts;
-   unsigned int rx_bad_hdr_len;
unsigned long long reserved;
unsigned long long unreserved;
unsigned int tx_aborts;
@@ -64,13 +57,6 @@ struct kcm_tx_msg {
struct sk_buff *last_skb;
 };
 
-struct kcm_rx_msg {
-   int full_len;
-   int accum_len;
-   int offset;
-   int early_eaten;
-};
-
 /* Socket structure for KCM client sockets */
 struct kcm_sock {
struct sock sk;
@@ -104,11 +90,11 @@ struct bpf_prog;
 /* Structure for an attached lower socket */
 struct kcm_psock {
struct sock *sk;
+   struct strparser strp;
struct kcm_mux *mux;
int index;
 
u32 tx_stopped : 1;
-   u32 rx_stopped : 1;
u32 done : 1;
u32 unattaching : 1;
 
@@ -121,18 +107,12 @@ struct kcm_psock {
struct kcm_psock_stats stats;
 
/* Receive */
-   struct sk_buff *rx_skb_head;
-   struct sk_buff **rx_skb_nextp;
-   struct sk_buff *ready_rx_msg;
struct list_head psock_ready_list;
-   struct work_struct rx_work;
-   struct delayed_work rx_delayed_work;
struct bpf_prog *bpf_prog;
struct kcm_sock *rx_kcm;
unsigned long long saved_rx_bytes;
unsigned long long saved_rx_msgs;
-   struct timer_list rx_msg_timer;
-   unsigned int rx_need_bytes;
+   struct sk_buff *ready_rx_msg;
 
/* Transmit */
struct kcm_sock *tx_kcm;
@@ -146,6 +126,7 @@ struct kcm_net {
struct mutex mutex;
struct kcm_psock_stats aggregate_psock_stats;
struct kcm_mux_stats aggregate_mux_stats;
+   struct strp_aggr_stats aggregate_strp_stats;
struct list_head mux_list;
int count;
 };
@@ -163,6 +144,7 @@ struct kcm_mux {
 
struct kcm_mux_stats stats;
struct kcm_psock_stats aggregate_psock_stats;
+   struct strp_aggr_stats aggregate_strp_stats;
 
/* Receive */
spinlock_t rx_lock cacheline_aligned_in_smp;
@@ -190,14 +172,6 @@ static inline void aggregate_psock_stats(struct kcm_psock_stats *stats,
/* Save psock statistics in the mux when psock is being unattached. */
 
 #define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += stats->_stat)
-   SAVE_PSOCK_STATS(rx_msgs);
-   SAVE_PSOCK_STATS(rx_bytes);
-   SAVE_PSOCK_STATS(rx_aborts);
-   SAVE_PSOCK_STATS(rx_mem_fail);
-   SAVE_PSOCK_STATS(rx_need_more_hdr);
-   SAVE_PSOCK_STATS(rx_msg_too_big);
-   SAVE_PSOCK_STATS(rx_msg_timeouts);
-   SAVE_PSOCK_STATS(rx_bad_hdr_len);
SAVE_PSOCK_STATS(tx_msgs);
SAVE_PSOCK_STATS(tx_bytes);
SAVE_PSOCK_STATS(reserved);
diff --git a/net/kcm/Kconfig b/net/kcm/Kconfig
index 5db94d9..87fca36 100644
--- a/net/kcm/Kconfig
+++ b/net/kcm/Kconfig
@@ -3,6 +3,7 @@ config AF_KCM
tristate "KCM sockets"
depends on INET
select BPF_SYSCALL
+   select STREAM_PARSER
---help---
  KCM (Kernel Connection Multiplexor) sockets provide a method
  for multiplexing messages of a message based application
diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index fda7f47..9d0cf0d 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -159,8 +159,8 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
seq_printf(seq,
   "   psock-%-5u %-10llu %-16llu %-10llu %-16llu %-8d %-8d %-8d %-8d ",
   psock->index,
-  psock->stats.rx_msgs,
-  psock->stats.rx_bytes,
+  psock->strp.stats.rx_msgs,
+  psock->strp.stats.rx_bytes,
   psock->stats.tx_msgs,
   psock->stats.tx_bytes,
   psock->sk->sk_receive_queue.qlen,
@@ -174,7 +174,7 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq,
if (psock->tx_stopped)
seq_puts(seq, "TxStop ");
 
-   if

Re: [RESEND/BUG PATCH v3] net: smsc911x: Fix bug where PHY interrupts are overwritten by 0

2016-07-05 Thread Jeremy Linton


On 07/05/2016 01:45 PM, Sergei Shtylyov wrote:
> The patch has been merged to 4.7-rc6, why resend it?

Sorry, I must have missed the merge.

Thanks,

Re: [net-next PATCH] net: tracepoint napi:napi_poll add work and budget

2016-07-05 Thread David Miller

From: Jesper Dangaard Brouer 
Date: Tue, 05 Jul 2016 19:35:27 +0200

> Can this patch go through the net-next tree?

Sure... when it actually compiles.

If you're changing a fundamental signature like this, it is in your
best interest to do an allmodconfig build... before I do.

net/core/drop_monitor.c: In function ‘set_all_monitor_traces’:
net/core/drop_monitor.c:241:34: warning: passing argument 1 of ‘register_trace_napi_poll’ from incompatible pointer type
   rc |= register_trace_napi_poll(trace_napi_poll_hit, NULL);
  ^
In file included from include/trace/events/skb.h:9:0,
 from net/core/drop_monitor.c:31:
include/linux/tracepoint.h:199:2: note: expected ‘void (*)(void *, struct napi_struct *, int,  int)’ but argument is of type ‘void (*)(void *, struct napi_struct *)’
  register_trace_##name(void (*probe)(data_proto), void *data) \
  ^
include/linux/tracepoint.h:348:2: note: in expansion of macro ‘__DECLARE_TRACE’
  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
  ^
include/linux/tracepoint.h:484:2: note: in expansion of macro ‘DECLARE_TRACE’
  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
  ^
include/trace/events/napi.h:13:1: note: in expansion of macro ‘TRACE_EVENT’
 TRACE_EVENT(napi_poll,
 ^
net/core/drop_monitor.c:246:36: warning: passing argument 1 of ‘unregister_trace_napi_poll’ from incompatible pointer type
   rc |= unregister_trace_napi_poll(trace_napi_poll_hit, NULL);
^
In file included from include/trace/events/skb.h:9:0,
 from net/core/drop_monitor.c:31:
include/linux/tracepoint.h:212:2: note: expected ‘void (*)(void *, struct napi_struct *, int,  int)’ but argument is of type ‘void (*)(void *, struct napi_struct *)’
  unregister_trace_##name(void (*probe)(data_proto), void *data) \
  ^
include/linux/tracepoint.h:348:2: note: in expansion of macro ‘__DECLARE_TRACE’
  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
  ^
include/linux/tracepoint.h:484:2: note: in expansion of macro ‘DECLARE_TRACE’
  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
  ^
include/trace/events/napi.h:13:1: note: in expansion of macro ‘TRACE_EVENT’
 TRACE_EVENT(napi_poll,
 ^
  C-c C-cscripts/Makefile.build:440: recipe for target 'drivers/net/ethernet/cavium/liquidio' failed
scripts/Makefile.build:440: recipe for target 'drivers/net/ethernet/cisco' failed
make[4]: *** [drivers/net/ethernet/cavium/liquidio] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/net/fddi' failed
make[2]: *** [drivers/net/fddi] Interrupt
make[3]: *** [drivers/net/ethernet/cisco] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/net/ethernet/chelsio' failed
make[3]: *** [drivers/net/ethernet/chelsio] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/net/ethernet/cavium' failed
make[3]: *** [drivers/net/ethernet/cavium] Interrupt
scripts/Makefile.build:440: recipe for target 'net/wimax' failed
make[1]: *** [net/wimax] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/pci' failed
make[1]: *** [drivers/pci] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/net/ethernet' failed
make[2]: *** [drivers/net/ethernet] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/media/usb/tm6000' failed
make[3]: *** [drivers/media/usb/tm6000] Interrupt
scripts/Makefile.build:295: recipe for target 'net/netfilter/xt_dscp.o' failed
make[2]: *** [net/netfilter/xt_dscp.o] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/net' failed
make[1]: *** [drivers/net] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/media/usb' failed
make[2]: *** [drivers/media/usb] Interrupt
scripts/Makefile.build:440: recipe for target 'drivers/media' failed
make[1]: *** [drivers/media] Interrupt
Makefile:987: recipe for target 'drivers' failed
make: *** [drivers] Interrupt
scripts/Makefile.build:440: recipe for target 'net/netfilter' failed
make[1]: *** [net/netfilter] Interrupt
Makefile:987: recipe for target 'net' failed
make: *** [net] Interrupt
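
The warnings above come down to a probe that still has the old two-argument shape being registered against the new four-argument tracepoint. A minimal userspace model of the matching signatures is sketched below. None of this is kernel code: the demo_* names are hypothetical, and the real registration function is generated by the TRACE_EVENT machinery rather than written by hand. The sketch only shows that a probe must accept the same trailing arguments as TP_PROTO, even if it ignores them.

```c
#include <stddef.h>

struct napi_struct;  /* opaque in this model */

/* Probe type matching the new prototype: (data, napi, work, budget). */
typedef void (*demo_napi_poll_probe_t)(void *data, struct napi_struct *napi,
				       int work, int budget);

static demo_napi_poll_probe_t demo_probe;
static void *demo_probe_data;
static int demo_last_work;
static int demo_last_budget;

/* Stand-in for the generated register_trace_napi_poll(). */
static int demo_register_trace_napi_poll(demo_napi_poll_probe_t probe,
					 void *data)
{
	demo_probe = probe;
	demo_probe_data = data;
	return 0;
}

/* A probe updated to the four-argument form. */
static void demo_trace_napi_poll_hit(void *ignore, struct napi_struct *napi,
				     int work, int budget)
{
	(void)ignore;
	(void)napi;
	demo_last_work = work;
	demo_last_budget = budget;
}

/* Stand-in for the trace_napi_poll() call site. */
static void demo_trace_napi_poll(struct napi_struct *napi, int work,
				 int budget)
{
	if (demo_probe)
		demo_probe(demo_probe_data, napi, work, budget);
}
```

In this model, registering a probe declared with only (void *, struct napi_struct *) would produce the same kind of incompatible-pointer-type diagnostic as the build log shows, so every existing probe user has to be updated together with the tracepoint.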

Re: [RESEND/BUG PATCH v3] net: smsc911x: Fix bug where PHY interrupts are overwritten by 0

2016-07-05 Thread Sergei Shtylyov


The patch has been merged to 4.7-rc6, why resend it?

Re: [PATCH -next] connector: make cn_proc explicitly non-modular

2016-07-05 Thread David Miller

From: Paul Gortmaker 
Date: Mon, 4 Jul 2016 17:50:58 -0400

> The Kconfig controlling build of this code is currently:
> 
> drivers/connector/Kconfig:config PROC_EVENTS
> drivers/connector/Kconfig:  bool "Report process events to userspace"
> 
> ...meaning that it currently is not being built as a module by anyone.
> Lets remove the two modular references, so that when reading the driver
> there is no doubt it is builtin-only.
> 
> Since module_init translates to device_initcall in the non-modular
> case, the init ordering remains unchanged with this commit.
> 
> Cc: Evgeniy Polyakov 
> Cc: netdev@vger.kernel.org
> Signed-off-by: Paul Gortmaker 

Applied, thanks.

Re: [PATCH v2 net-next 1/1] net sched actions: mirred add support for setting Dst MAC address

2016-07-05 Thread Cong Wang

On Tue, Jul 5, 2016 at 4:04 AM, Jamal Hadi Salim  wrote:
> Second argument: usability, from:
>
> sudo $TC filter add dev $ETH parent : pref 11 protocol ip u32 \
> match ip protocol 1 0xff flowid 1:2 \
> action pedit munge offset -14 u8 set 0x02 \
> munge offset -13 u8 set 0x15 \
> munge offset -12 u8 set 0x15 \
> munge offset -11 u8 set 0x15 \
> munge offset -10 u16 set 0x1515 \
> pipe \
> action mirred egress redirect dev $SPANPORT
>
> to:
> $TC filter add dev $ETH parent 1: protocol ip prio 10 \
> u32 match ip protocol 1 0xff flowid 1:2 \
> action mirred egress redirect dev $SPANPORT dst 02:15:15:15:15:15
>
> given that I have to do this many many times in scripts and the
> second policy is better eye candy.

How about adding a "wrapper" in iproute2 pedit action to make
it accept "dst mac xx:xx:xx:xx:xx:xx"? Like what we did for u32
filters.

Re: Problem: BUG_ON hit in ppp_pernet() when re-connect after changing shared key on LAC

2016-07-05 Thread Cong Wang

On Mon, Jul 4, 2016 at 7:50 PM, Matt Bennett
 wrote:
> Using printk I have confirmed that ppp_pernet() is called from
> ppp_connect_channel() when the BUG occurs (i.e. pch->chan_net is NULL).
>
> This behavior appears to have been introduced in commit 1f461dc ("ppp:
> take reference on channels netns").

We have some race condition here, where a parallel ppp_unregister_channel()
could happen while we are in ppp_connect_channel().

We need some synchronization for them. I am not sure what is the right lock
here since ppp locking looks crazy.

Re: [PATCH net-next 01/24] rxrpc: Fix processing of authenticated/encrypted jumbo packets

2016-07-05 Thread Sergei Shtylyov


Hello.

On 07/05/2016 04:12 PM, David Howells wrote:


> When a jumbo packet is being split up and processed, the crypto checksum
> for each split-out packet is in the jumbo header and needs placing in the
> reconstructed packet header.
>
> When commit 0d12f8a4027d021c9cc942f09f38d28288020c5d moved to keeping the

   scripts/checkpatch.pl now enforces the common commit citing style as for
the Fixes: tag and the patch description, you need to specify the summary too.

> stored copy of the packet header in host byte order, this reconstruction
> was missed.
>
> Found with sparse with CF=-D__CHECK_ENDIAN__:
>
> ../net/rxrpc/input.c:479:33: warning: incorrect type in assignment (different base types)
> ../net/rxrpc/input.c:479:33:expected unsigned short [unsigned] [usertype] _rsvd
> ../net/rxrpc/input.c:479:33:got restricted __be16 [addressable] [usertype] _rsvd
>
> Signed-off-by: David Howells

[...]

MBR, Sergei

[net-next PATCH] net: tracepoint napi:napi_poll add work and budget

2016-07-05 Thread Jesper Dangaard Brouer

An important piece of information for the napi_poll tracepoint is
the work done (packets processed) by the napi_poll() call. Add
both the work done and budget, as they are related.

Signed-off-by: Jesper Dangaard Brouer 
---
Can this patch go through the net-next tree?

 include/trace/events/napi.h |   13 +
 net/core/dev.c  |4 ++--
 net/core/netpoll.c  |2 +-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/napi.h b/include/trace/events/napi.h
index 8fe1e93f531d..118ed7767639 100644
--- a/include/trace/events/napi.h
+++ b/include/trace/events/napi.h
@@ -12,22 +12,27 @@
 
 TRACE_EVENT(napi_poll,
 
-   TP_PROTO(struct napi_struct *napi),
+   TP_PROTO(struct napi_struct *napi, int work, int budget),
 
-   TP_ARGS(napi),
+   TP_ARGS(napi, work, budget),
 
TP_STRUCT__entry(
__field(struct napi_struct *,   napi)
+   __field(int,work)
+   __field(int,budget)
__string(   dev_name, napi->dev ? napi->dev->name : NO_DEV)
),
 
TP_fast_assign(
__entry->napi = napi;
+   __entry->work = work;
+   __entry->budget = budget;
__assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
),
 
-   TP_printk("napi poll on napi struct %p for device %s",
-   __entry->napi, __get_str(dev_name))
+   TP_printk("napi poll on napi struct %p for device %s work %d budget %d",
+ __entry->napi, __get_str(dev_name),
+ __entry->work, __entry->budget)
 );
 
 #undef NO_DEV
diff --git a/net/core/dev.c b/net/core/dev.c
index b8cc5e979168..7a4e9a714fb3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4973,7 +4973,7 @@ bool sk_busy_loop(struct sock *sk, int nonblock)
 
if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
rc = napi->poll(napi, BUSY_POLL_BUDGET);
-   trace_napi_poll(napi);
+   trace_napi_poll(napi, rc, BUSY_POLL_BUDGET);
if (rc == BUSY_POLL_BUDGET) {
napi_complete_done(napi, rc);
napi_schedule(napi);
@@ -5129,7 +5129,7 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
work = 0;
if (test_bit(NAPI_STATE_SCHED, &n->state)) {
work = n->poll(n, weight);
-   trace_napi_poll(n);
+   trace_napi_poll(n, work, weight);
}
 
WARN_ON_ONCE(work > weight);
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 94acfc89ad97..53599bd0c82d 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -163,7 +163,7 @@ static void poll_one_napi(struct napi_struct *napi)
 */
work = napi->poll(napi, 0);
WARN_ONCE(work, "%pF exceeded budget in poll\n", napi->poll);
-   trace_napi_poll(napi);
+   trace_napi_poll(napi, work, 0);
 
clear_bit(NAPI_STATE_NPSVC, &napi->state);
 }

Re: [PATCH net-next 00/10] NCSI Support

2016-07-05 Thread Alexei Starovoitov

On Mon, Jul 04, 2016 at 01:03:06AM +0300, Or Gerlitz wrote:
> On Sun, Jul 3, 2016 at 8:32 AM, Gavin Shan  wrote:
> > This series rebases on David's linux-net git repo ("master" branch). It's
> > to support NCSI stack on net/faraday/ftgmac100.c
> >
> > The following figure gives an example about how NCSI is deployed: The NCSI is
> > specified by DSP0222, which can be downloaded from the following link here
> > (http://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.0.0.pdf).
> >
> >  * The NC-SI (aka NCSI) is defined as the interface between a (Base) Management
> >    Controller (BMC) and one or multiple Network Controllers (NC) on host side.
> >    The interface is responsible for providing external network connectivity
> >    for BMC.
> >  * Each BMC can connect to multiple packages, up to 8. Each package can have
> >    multiple channels, up to 32. Every package and channel are identified by
> >    3-bits and 5-bits in NCSI packet. At one moment, one channel is active to
> >    provide service.
> >  * NCSI packet, encapsulated in ethernet frame, has 0x88F8 in the protocol
> >    field. The destination MAC address should be 0xFF's while the source MAC
> >    address can be arbitrary one.
> >  * NCSI packets are classified to command, response, AEN (Asynchronous Event
> >    Notification). Commands are sent from BMC to host for configuration and
> >    information retrieval. Responses, corresponding to commands, are sent from
> >    host to BMC for confirmation and requested information. One command should
> >    have one and only one response. AEN is sent from host to BMC for notification
> >    (e.g. link down on active channel) so that BMC can take appropriate action.
> >
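> >    +------------------+        +----------------------------------------------+
> >    |                  |        |                     Host                     |
> >    |        BMC       |        |                                              |
> >    |                  |        | +-------------------+  +-------------------+ |
> >    |    +---------+   |        | |     Package-A     |  |     Package-B     | |
> >    |    |         |   |        | +---------+---------+  +-------------------+ |
> >    |    |   NIC   |   |        | | Channel | Channel |  | Channel | Channel | |
> >    +----+----+----+---+        +-+---------+---------+--+---------+---------+-+
> >              |                             |                      |
> >              |                             |                      |
> >              +-----------------------------+----------------------+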
> >
> > The design for the patchset is highlighted as below:
> >
> >  * The NCSI interface is abstracted with "struct ncsi_dev". It's registered
> >    when net_device is created, started to work by calling ncsi_start_dev()
> >    when net_device is opened (ndo_open()). For the first time, NCSI packets
> >    are sent and received to/from the far end (host in above figure) to probe
> >    available NCSI packages and channels. After that, one channel is chosen as
> >    active one to provide service.
> >  * The NCSI stack is driven by workqueue and state machine internally.
> >  * AEN (Asynchronous Event Notification) might be received from the far end
> >    (host). The currently active NCSI channel fails over to another available
> >    one if possible. Otherwise, the NCSI channel is out of service.
> >  * NCSI stack should be configurable through netlink or another mechanism,
> >    but it's not implemented in this patchset. It's something TBD.

Gavin,
what configurations do you have in mind?
For ncsi itself or to control the nic on the host?
This set of patches is for BMC side, right?
What needs to be done on the host?

> >* The first NIC driver that is aware of NCSI: 
> > drivers/net/ethernet/faraday/ftgmac100.c
> 
> 
> FWIW, talking to a colleague, he made a comment that some of the text
> above is wrong:
> 
> AENs are sent from NIC to BMC. Not from Host to BMC.
> 
> The traffic between a BMC and a NIC is over RBT if it is formatted as
> NC-SI packets. This is not over network traffic
> 
> Or.

Or,
since cx4 has ncsi as well, could you do a thorough review of this
to make sure that it fits mellanox nics as well?
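
For readers following the addressing scheme in the quoted cover letter (3 bits of package ID, 5 bits of channel ID), here is a minimal userspace sketch of how the two could be packed into one byte. It assumes the package ID sits in the upper 3 bits and the channel ID in the lower 5 bits; the quoted text only gives the widths, so the bit order is an assumption. The helper and macro names are hypothetical and are not taken from the patchset.

```c
#include <stdint.h>

/* Hypothetical names; widths come from the quoted cover letter. */
#define NCSI_PKG_BITS   3
#define NCSI_CHAN_BITS  5
#define NCSI_ETHERTYPE  0x88F8  /* protocol field value quoted above */

/* Pack package/channel into one byte; returns -1 if either is out of range.
 * Assumption: package in the high bits, channel in the low bits. */
static inline int ncsi_make_id(unsigned int package, unsigned int channel)
{
	if (package >= (1u << NCSI_PKG_BITS) ||
	    channel >= (1u << NCSI_CHAN_BITS))
		return -1;
	return (int)((package << NCSI_CHAN_BITS) | channel);
}

/* Extract the package number from a packed ID. */
static inline unsigned int ncsi_id_package(uint8_t id)
{
	return id >> NCSI_CHAN_BITS;
}

/* Extract the channel number from a packed ID. */
static inline unsigned int ncsi_id_channel(uint8_t id)
{
	return id & ((1u << NCSI_CHAN_BITS) - 1);
}
```

With this layout, package 1 / channel 2 packs to 0x22, and anything beyond 8 packages or 32 channels is rejected rather than silently truncated.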

Re: [PATCH net] r8152: fix runtime function for RTL8152

2016-07-05 Thread David Miller

From: Hayes Wang 
Date: Tue, 5 Jul 2016 16:11:46 +0800

> The RTL8152 doesn't have U1U2 and U2P3 features, so use different
> runtime functions for RTL8152 and RTL8153 by adding autosuspend_en()
> to rtl_ops.
> 
> Signed-off-by: Hayes Wang 

Applied, thanks.

Re: dccp: potential deadlock in dccp_v4_ctl_send_reset

2016-07-05 Thread Eric Dumazet

On Tue, 2016-07-05 at 10:17 -0700, Cong Wang wrote:
> On Tue, Jul 5, 2016 at 4:59 AM, Dmitry Vyukov  wrote:
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(slock-AF_INET);
> >   <Interrupt>
> >     lock(slock-AF_INET);
> >
> >  *** DEADLOCK ***
> >
> > 1 lock held by syz-executor/354:
> >  #0:  (sk_lock-AF_INET){+.+.+.}, at: [< inline >] lock_sock
> > include/net/sock.h:1388
> >  #0:  (sk_lock-AF_INET){+.+.+.}, at: []
> > inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:660
> >
> > stack backtrace:
> > CPU: 3 PID: 354 Comm: syz-executor Not tainted 4.7.0-rc5+ #28
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> >  880b58e0 8800361378c0 82cc01af 
> >  fbfff1016b1c 88003abfe840 899bb700 88003abff0a8
> >  86cae460 0001 880036137930 8147684d
> > Call Trace:
> >  [< inline >] __dump_stack lib/dump_stack.c:15
> >  [] dump_stack+0x12e/0x18f lib/dump_stack.c:51
> >  [] print_usage_bug+0x34d/0x3a0 
> > kernel/locking/lockdep.c:2383
> >  [< inline >] valid_state kernel/locking/lockdep.c:2396
> >  [< inline >] mark_lock_irq kernel/locking/lockdep.c:2594
> >  [] mark_lock+0xbec/0xe80 kernel/locking/lockdep.c:3057
> >  [< inline >] mark_irqflags kernel/locking/lockdep.c:2933
> >  [] __lock_acquire+0xd3e/0x2fb0 
> > kernel/locking/lockdep.c:3287
> >  [] lock_acquire+0x1e3/0x460 kernel/locking/lockdep.c:3741
> >  [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
> >  [] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
> >  [< inline >] spin_lock include/linux/spinlock.h:302
> >  [] dccp_v4_ctl_send_reset+0xac1/0x10d0 
> > net/dccp/ipv4.c:530
> >  [] dccp_v4_do_rcv+0xf9/0x190 net/dccp/ipv4.c:684
> >  [< inline >] sk_backlog_rcv include/net/sock.h:872
> >  [] __release_sock+0x127/0x3a0 net/core/sock.c:2058
> >  [] release_sock+0x59/0x1c0 net/core/sock.c:2516
> >  [] inet_stream_connect+0x78/0xa0 net/ipv4/af_inet.c:662
> >  [] SYSC_connect+0x23e/0x2e0 net/socket.c:1536
> >  [] SyS_connect+0x24/0x30 net/socket.c:1517
> >  [] entry_SYSCALL_64_fastpath+0x23/0xc1
> > arch/x86/entry/entry_64.S:207
> 
> This is probably a known deadlock for sk backlog recv path,
> at least the comments on tcp_v4_do_rcv() mentioned this:
> 
> 
>  * We have a potential double-lock case here, so even when
>  * doing backlog processing we use the BH locking scheme.
>  * This is because we cannot sleep with the original spinlock
>  * held.
> 
> the ->sk_backlog_rcv() is called in process context, which
> is not supposed to hold bh_lock_sock, but most of its
> implementations are called in BH context too... Interesting...


Very similar to a previous report in TCP stack, fixed
in commit 47dcc20a39d06585bf3cb9fb381f0e81c20002c3
("ipv4: tcp: ip_send_unicast_reply() is not BH safe")

I would try :

diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index 5c7e413a3ae407e67565b48a8bd6f43e3b02de4d..3cdf5b29ce451e7b7c3290ebb231cf5f4e43f202 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -527,10 +527,12 @@ static void dccp_v4_ctl_send_reset(const struct sock *sk, struct sk_buff *rxskb)
 rxiph->daddr);
skb_dst_set(skb, dst_clone(dst));
 
+   local_bh_disable();
bh_lock_sock(ctl_sk);
err = ip_build_and_send_pkt(skb, ctl_sk,
rxiph->daddr, rxiph->saddr, NULL);
bh_unlock_sock(ctl_sk);
+   local_bh_enable();
 
if (net_xmit_eval(err) == 0) {
DCCP_INC_STATS(DCCP_MIB_OUTSEGS);

Re: [PATCH -next v2] net: hns: fix return value check in hns_dsaf_get_cfg()

2016-07-05 Thread David Miller

From: weiyj...@163.com
Date: Tue,  5 Jul 2016 07:56:52 +

> From: Wei Yongjun 
> 
> In case of error, function devm_ioremap_resource() returns ERR_PTR()
> and never returns NULL. The NULL test in the return value check should
> be replaced with IS_ERR().
> 
> Signed-off-by: Wei Yongjun 

Applied, thanks.
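
For context on why a NULL test misses the failure here: error pointers encode a negative errno in the pointer value itself, so they are never NULL. The following is a userspace sketch of that convention, not the kernel's own headers; the lower-case helper names are hypothetical stand-ins for ERR_PTR(), PTR_ERR() and IS_ERR(), and the 4095 limit mirrors the kernel's MAX_ERRNO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (stand-in for ERR_PTR()). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer (stand-in for PTR_ERR()). */
static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top MAX_ERRNO bytes of the address space
 * (stand-in for IS_ERR()). Note that NULL is NOT an error pointer. */
static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

A caller that only checks `if (!p)` would treat `err_ptr(-ENOMEM)` as a valid mapping, which is the bug class the patch addresses.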

Re: [PATCH net-next 06/24] rxrpc: Dup the main conn list for the proc interface

2016-07-05 Thread David Miller

From: David Howells 
Date: Tue, 05 Jul 2016 14:12:54 +0100

> The main connection list is used for two independent purposes: primarily it
> is used to find connections to reap and secondarily it is used to list
> connections in procfs.
> 
> Split the procfs list out from the reap list.  This allows the reap list to
> be phased out in stages as the client conns and service conns acquire their
> own separate connection management strategies.
> 
> Whilst we're at it, use the address information stored in conn->proto when
> displaying through procfs rather than accessing the peer record.
> 
> Signed-off-by: David Howells 

Wouldn't it be better to just code the proc stuff to walk whatever
table the rest of the stack uses to hold all of the connections
as TCP et al. do?

Re: dccp: potential deadlock in dccp_v4_ctl_send_reset

2016-07-05 Thread Cong Wang

On Tue, Jul 5, 2016 at 4:59 AM, Dmitry Vyukov  wrote:
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(slock-AF_INET);
>   <Interrupt>
>     lock(slock-AF_INET);
>
>  *** DEADLOCK ***
>
> 1 lock held by syz-executor/354:
>  #0:  (sk_lock-AF_INET){+.+.+.}, at: [< inline >] lock_sock
> include/net/sock.h:1388
>  #0:  (sk_lock-AF_INET){+.+.+.}, at: []
> inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:660
>
> stack backtrace:
> CPU: 3 PID: 354 Comm: syz-executor Not tainted 4.7.0-rc5+ #28
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
>  880b58e0 8800361378c0 82cc01af 
>  fbfff1016b1c 88003abfe840 899bb700 88003abff0a8
>  86cae460 0001 880036137930 8147684d
> Call Trace:
>  [< inline >] __dump_stack lib/dump_stack.c:15
>  [] dump_stack+0x12e/0x18f lib/dump_stack.c:51
>  [] print_usage_bug+0x34d/0x3a0 
> kernel/locking/lockdep.c:2383
>  [< inline >] valid_state kernel/locking/lockdep.c:2396
>  [< inline >] mark_lock_irq kernel/locking/lockdep.c:2594
>  [] mark_lock+0xbec/0xe80 kernel/locking/lockdep.c:3057
>  [< inline >] mark_irqflags kernel/locking/lockdep.c:2933
>  [] __lock_acquire+0xd3e/0x2fb0 
> kernel/locking/lockdep.c:3287
>  [] lock_acquire+0x1e3/0x460 kernel/locking/lockdep.c:3741
>  [< inline >] __raw_spin_lock include/linux/spinlock_api_smp.h:144
>  [] _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
>  [< inline >] spin_lock include/linux/spinlock.h:302
>  [] dccp_v4_ctl_send_reset+0xac1/0x10d0 net/dccp/ipv4.c:530
>  [] dccp_v4_do_rcv+0xf9/0x190 net/dccp/ipv4.c:684
>  [< inline >] sk_backlog_rcv include/net/sock.h:872
>  [] __release_sock+0x127/0x3a0 net/core/sock.c:2058
>  [] release_sock+0x59/0x1c0 net/core/sock.c:2516
>  [] inet_stream_connect+0x78/0xa0 net/ipv4/af_inet.c:662
>  [] SYSC_connect+0x23e/0x2e0 net/socket.c:1536
>  [] SyS_connect+0x24/0x30 net/socket.c:1517
>  [] entry_SYSCALL_64_fastpath+0x23/0xc1
> arch/x86/entry/entry_64.S:207

This is probably a known deadlock for sk backlog recv path,
at least the comments on tcp_v4_do_rcv() mentioned this:


 * We have a potential double-lock case here, so even when
 * doing backlog processing we use the BH locking scheme.
 * This is because we cannot sleep with the original spinlock
 * held.

the ->sk_backlog_rcv() is called in process context, which
is not supposed to hold bh_lock_sock, but most of its
implementations are called in BH context too... Interesting...

Re: [PATCH net-next 05/24] rxrpc: Provide more refcount helper functions

2016-07-05 Thread David Miller

From: David Howells 
Date: Tue, 05 Jul 2016 14:12:47 +0100

> Provide refcount helper functions for connections so that the code doesn't
> touch conn->usage directly.
> 
> Also provide queueing helper functions so that the queueing of local and
> connection objects can be fixed later.
> 
> Signed-off-by: David Howells 

I don't see anything in this patch dealing with refcount helper functions.

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Florian Fainelli

On 07/05/2016 08:56 AM, Mason wrote:
> On 05/07/2016 17:28, Florian Fainelli wrote:
> 
>> nb8800.c does not currently show suspend/resume hooks implemented, are
>> you positive that when you suspend, you properly tear down all HW, stop
>> transmit queues, etc. and do the opposite upon resumption?
> 
> I am currently testing the error path for my suspend routine.
> Firmware is, in fact, denying the suspend request, and immediately
> returns control to Linux, without having powered anything down.
> 
> I expected not having to save any context in that situation.
> Am I mistaken?

It depends on what power state you are going to and resuming from, and
much of this is platform dependent. On the platforms I work with, S2
preserves register states for our On/Off domain, while S3 only keeps an
always-on power island and shuts off the On/Off domain. You therefore
need to have your drivers in the On/Off domain suspend any activity and
either preserve important register states or re-initialize them from
scratch, whichever is most convenient.


> 
> You mention "stop transmit queues". Can you say more about this?

See drivers/net/ethernet/broadcom/genet/bcmgenet.c which is a driver
that takes care of that for instance, look for bcmgenet_{suspend,resume}

> 
>> Is your system clocksource also correctly saved/restored, or if you go
>> through a firmware in-between could it be changing the counter values
>> and make Linux think that more time has elapsed than actually did?
> 
> Thanks for pointing this out, I was not aware I was supposed to save
> and restore the tick counter on suspend/resume. (This is not an issue
> in this specific situation, as the platform is NOT suspended.)

You don't have to save and restore the clocksource counter, although if
you want proper time accounting to be done across suspend states, you
would want to use a clocksource which is persistent across these suspend
states.

> 
> However, your remark has brought some more confusion to my mind.
> Linux is expecting time to stand still when it suspends?
> What if the tick counter is in an always-on power domain, and other
> processors depend on the counter? I can't just overwrite the reg
> when Linux resumes...

The point is more that if the firmware initializes the timer, or even
re-initializes it, Linux could think that events expired because the
timebase has a big offset compared to where it was. Just pointing out
that this *could* be a problem. If your timer is in the always on domain
and your firmware does not touch it, that should be fine without
anything specific (except adding an "always-on" boolean property to the
timer nodes in DT maybe).
-- 
Florian

Re: [patch net-next 00/16] mlxsw: Implement IPV4 unicast routing

2016-07-05 Thread David Miller

From: Jiri Pirko 
Date: Tue,  5 Jul 2016 11:27:36 +0200

> This patchset enables IPv4 unicast routing in the Mellanox Spectrum ASIC
> switch driver. This builds upon the work that was done by a couple of
> previous patchsets.
> 
> Patches 1,2,6 add a couple of dependencies outside the driver. Namely, the
> ability to propagate ndo_neigh_construct()/destroy() through stacked devices
> and a notification whenever DELAY_PROBE_TIME changes. When propagated down,
> the ndos allow drivers to add and remove neighbour entries from their private
> neighbour table. The DELAY_PROBE_TIME notification gives drivers the ability
> to correctly configure their polling interval for neighbour activity, so that
> active neighbours won't be marked as STALE.
> 
> Patches 3-5,7-8 add the neighbour offloading infrastructure, where patch 7
> uses the DELAY_PROBE_TIME notification in order to correctly configure the
> device's polling interval. Patch 8 finally programs neighbours to the device's
> table based on NEIGH_UPDATE notifications, so that directly connected routes
> can be used.
> 
> Patches 9-16 build upon the previous patches and extend the router with
> remote routes (nexthop) support.

Looks good to me, series applied, thanks Jiri!

Re: ethtool needs a new maintainer

2016-07-05 Thread John W. Linville

On Mon, Jul 04, 2016 at 07:18:48PM +0200, Ben Hutchings wrote:
> On Mon, 2016-06-27 at 09:51 -0400, John W. Linville wrote:
> > On Sun, Jun 26, 2016 at 06:11:41PM +0200, Ben Hutchings wrote:
> > > I've become steadily less enthusiastic and less responsive as a
> > > maintainer over the past year or so.  I no longer work on networking
> > > regularly, so it takes a lot more time to get into the right state of
> > > mind to think about ethtool code, while I have other demands on my time
> > > that tend to take priority.
> > > 
> > > So, I would like to find a new maintainer to take over as soon as
> > > possible.  Ideally the new maintainer would have previous contributions
> > > to ethtool and an existing account on kernel.org so that they can push
> > > to the git repository and the home page.  But neither of those is
> > > essential.  Please reply if you're interested.
> > 
> > I would like to take this responsibility. My previous contributions
> > to ethtool are meager, but I think my skills and interests are suited
> > to the task.  Plus, I already have a kernel.org account... :-)
> [...]
> 
> It was a tough choice, but I'm pleased to accept your offer. :-)
> 
> I'll ask for the permissions to be updated accordingly.
> 
> Ben.

Awesome, thanks.  And, of course, thank you for your service to the
community as ethtool maintainer for the past several (5+) years!

John
-- 
John W. Linville		Someday the world will need a hero, and you
linvi...@tuxdriver.com			might be all we have.  Be ready.

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Mason

On 05/07/2016 17:28, Florian Fainelli wrote:

> nb8800.c does not currently show suspend/resume hooks implemented, are
> you positive that when you suspend, you properly tear down all HW, stop
> transmit queues, etc. and do the opposite upon resumption?

I am currently testing the error path for my suspend routine.
Firmware is, in fact, denying the suspend request, and immediately
returns control to Linux, without having powered anything down.

I expected not having to save any context in that situation.
Am I mistaken?

You mention "stop transmit queues". Can you say more about this?

> Is your system clocksource also correctly saved/restored, or if you go
> through a firmware in-between could it be changing the counter values
> and make Linux think that more time has elapsed than actually did?

Thanks for pointing this out, I was not aware I was supposed to save
and restore the tick counter on suspend/resume. (This is not an issue
in this specific situation, as the platform is NOT suspended.)

However, your remark has brought some more confusion to my mind.
Linux is expecting time to stand still when it suspends?
What if the tick counter is in an always-on power domain, and other
processors depend on the counter? I can't just overwrite the reg
when Linux resumes...

Regards.

[PATCH] Add support for configuring Infiniband GUIDs

2016-07-05 Thread Eli Cohen

Add two NLA's that allow configuration of Infiniband node or port GUIDs
by referencing the IPoIB net device set over the physical function. The
format to be used is as follows:

ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78

Signed-off-by: Eli Cohen 
---

Changes from V1:
Removed internal issue number and gerrit ID

 ip/iplink.c   | 40 
 man/man8/ip-link.8.in | 12 +++-
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index b1f8a37922f5..0d2d750f2887 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -267,6 +267,30 @@ static int nl_get_ll_addr_len(unsigned int dev_index)
return RTA_PAYLOAD(tb[IFLA_ADDRESS]);
 }
 
+static int extract_guid(__u64 *guid, char *arg)
+{
+   __u64 ret;
+   int g[8];
+   int err;
+
+   err = sscanf(arg, "%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x",
+g, g + 1, g + 2, g + 3, g + 4, g + 5, g + 6, g + 7);
+   if (err != 8)
+   return -1;
+
+   ret = ((__u64)(g[0]) << 56) |
+ ((__u64)(g[1]) << 48) |
+ ((__u64)(g[2]) << 40) |
+ ((__u64)(g[3]) << 32) |
+ ((__u64)(g[4]) << 24) |
+ ((__u64)(g[5]) << 16) |
+ ((__u64)(g[6]) << 8) |
+ ((__u64)(g[7]));
+   *guid = ret;
+
+   return 0;
+}
+
 static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
   struct iplink_req *req, int dev_index)
 {
@@ -420,6 +444,22 @@ static int iplink_parse_vf(int vf, int *argcp, char ***argvp,
 				invarg("Invalid \"state\" value\n", *argv);
 			ivl.vf = vf;
 			addattr_l(&req->n, sizeof(*req), IFLA_VF_LINK_STATE, &ivl, sizeof(ivl));
+		} else if (matches(*argv, "node_guid") == 0) {
+			struct ifla_vf_guid ivg;
+
+			NEXT_ARG();
+			ivg.vf = vf;
+			if (extract_guid(&ivg.guid, *argv))
+				return -1;
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_IB_NODE_GUID, &ivg, sizeof(ivg));
+		} else if (matches(*argv, "port_guid") == 0) {
+			struct ifla_vf_guid ivg;
+
+			NEXT_ARG();
+			ivg.vf = vf;
+			if (extract_guid(&ivg.guid, *argv))
+				return -1;
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_IB_PORT_GUID, &ivg, sizeof(ivg));
 		} else {
 			/* rewind arg */
 			PREV_ARG();
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index 375b4d081408..07f0a94a289a 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -146,7 +146,11 @@ ip-link \- network device configuration
 .br
 .RB "[ " state " { " auto " | " enable " | " disable " } ]"
 .br
-.RB "[ " trust " { " on " | " off " } ] ]"
+.RB "[ " trust " { " on " | " off " } ]"
+.br
+.RB "[ " node_guid " eui64 ]"
+.br
+.RB "[ " port_guid " eui64 ] ]"
 .br
 .in -9
 .RB "[ " master
@@ -1191,6 +1195,12 @@ sent by the VF.
 .BI trust " on|off"
 - trust the specified VF user. This enables that VF user can set a specific feature
 which may impact security and/or performance. (e.g. VF multicast promiscuous mode)
+.sp
+.BI node_guid " eui64"
+- configure node GUID for the VF.
+.sp
+.BI port_guid " eui64"
+- configure port GUID for the VF.
 .in -8
 
 .TP
-- 
2.8.1
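
As a side note on the GUID format used in the commands above (eight colon-separated hex bytes, most significant first), here is a standalone userspace sketch of the same parsing idea. It is not the patch's code: the function name parse_guid is hypothetical, it scans into unsigned int (which is what the %x conversion expects), and it additionally rejects trailing characters, which is a design choice of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Parse "xx:xx:xx:xx:xx:xx:xx:xx" into a 64-bit value, first byte most
 * significant. Returns 0 on success, -1 on malformed input. */
static int parse_guid(uint64_t *guid, const char *arg)
{
	unsigned int g[8];
	uint64_t ret = 0;
	char trailing;
	int i;

	assert(guid != NULL && arg != NULL);

	/* Exactly 8 fields must convert; a 9th conversion means junk followed. */
	if (sscanf(arg, "%2x:%2x:%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &g[0], &g[1], &g[2], &g[3],
		   &g[4], &g[5], &g[6], &g[7], &trailing) != 8)
		return -1;

	/* Fold the bytes together, most significant byte first. */
	for (i = 0; i < 8; i++)
		ret = (ret << 8) | (g[i] & 0xff);

	*guid = ret;
	return 0;
}
```

For example, the node GUID from the commit message, 00:02:c9:03:00:21:6e:70, parses to 0x0002c90300216e70, while a short or over-long string is rejected.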

Re: DCB Auto turn off when link become up with Intel 82599ES ixgbe driver

2016-07-05 Thread Alexander Duyck

On Tue, Jul 5, 2016 at 8:17 AM, ayuj  wrote:
> I'm trying to configure DCB in back-to-back scenario.
> System details
>
> OS :- CentOS 7.2
> kernel 3.10.0-327.el7.x86_64
> lldpad:- lldpad v0.9.46
> dcbtool:- v0.9.46
> ixgbe :- ixgbe-4.3.15
>
> steps followed:-
>
> # modprobe ixgbe
> # service lldpad start
> Redirecting to /bin/systemctl start  lldpad.service
>
>
> # service lldpad status
> Redirecting to /bin/systemctl status  lldpad.service
> ● lldpad.service - Link Layer Discovery Protocol Agent Daemon.
>Loaded: loaded (/usr/lib/systemd/system/lldpad.service; disabled; vendor
> preset: disabled)
>Active: active (running) since Tue 2016-07-05 05:49:12 EDT; 1s ago
>  Main PID: 133737 (lldpad)
>CGroup: /system.slice/lldpad.service
>└─133737 /usr/sbin/lldpad -t
>
> Jul 05 05:49:12 localhost.localdomain systemd[1]: Started Link Layer
> Discovery Protocol Agent Daemon..
> Jul 05 05:49:12 localhost.localdomain systemd[1]: Starting Link Layer
> Discovery Protocol Agent Daemon
>
>
> # dcbtool gc p3p2 dcb
> Command:Get Config
> Feature:DCB State
> Port:   p3p2
> Status: Successful
> DCB State:  off
>
> # dcbtool sc p3p2 dcb on
> Command:Set Config
> Feature:DCB State
> Port:   p3p2
> Status: Successful
>
> # dcbtool gc p3p2 dcb
> Command:Get Config
> Feature:DCB State
> Port:   p3p2
> Status: Successful
> DCB State:  on
>
>
> Similar configuration on another system.
>
> But as soon as I do "ifconfig p3p2 up" and check dcb
>
> # dcbtool gc p3p2 dcb
> Command:Get Config
> Feature:DCB State
> Port:   p3p2
> Status: Successful
> DCB State:  off
>
> Initially I suspected an issue with the lldpad daemon/ixgbe driver, so I
> replaced them with the latest packages (downloaded from repo).
> The same behaviour was observed.
>
> - Tried disabling other dcb features like fcoe. Same issue.
> - Disabled selinux etc
>
>
> Can anyone help me with this?

I'm adding e1000-devel as that is the mailing list for the ixgbe maintainers.

You might want to go through and also check the number of TCs and how
you have them mapped to your priority groups.  It is possible that one
end or the other may only have 1 TC configured and if I am not
mistaken that is interpreted as having DCB disabled by the hardware.

Hope that helps.

- Alex

ethtool TODO list - additional info

2016-07-05 Thread Jorge Alberto Garcia

Hi !

Some days ago, Jiri Pirko was talking about some next steps to
implement for ethtool.

I haven't seen any follow-up since ethtool's maintainer change. Can we have
additional details about these?

- libethtool - API
- generic netlink
- sub commands syntax
- TODO/bugzilla


On Mon, Jul 4, 2016 at 9:24 AM, Jiri Pirko  wrote:

>>> I was thinking of adding a TODO file to the repository, but it's really
>>> for the new maintainer to decide what to do.  So here's my list as a
>>> suggestion:
>>>
>>> * Add regression test coverage for all sub-commands with complex logic
>>>
>>> * Internationalise output and error messages
>>>
>>> * Build a libethtool that handles all the API quirks and fallbacks for
>>>   old kernel versions.  This might help people writing language
>>>   bindings or other utilities that use the ethtool API.
>>>
>>> * Provide a 'cleaned up' ethtool (under some other name) that has:
>>>   - More conventional sub-command syntax, i.e. no '-'/'--' prefix
>>>   - More consistent output formatting
>>
>>That seems like a reasonable start for a TODO list. I'll bet there
>>are a few people out there with other suggestions as well...?
>
> Before that, I would like to see ethtool migrate to use generic
> netlink. Then, the new tool would be needed anyway, should exist within
> iproute2 package and have similar command line syntax.
>
> I have some ideas about the gennetlink ethtool, have to find some time
> to implement some initial part of it.

Backport bpf: try harder on clones when writing into skb? [Commit: 3697649ff29e0f647565eed04b27a7779c646a22]

2016-07-05 Thread Sargun Dhillon

Does it make sense to backport
3697649ff29e0f647565eed04b27a7779c646a22 from 4.6 to the longterm
(4.4) release? I can trivially recreate the issue addressed by
3697649ff29e0f647565eed04b27a7779c646a22 by attaching an eBPF filter
that clones an ingress ICMP packet, and then tries to set the
destination MAC address.

It seems like the patch applies cleanly to 4.4. I cherry-picked it
and rebuilt my kernel, and at least the trivial test case passes.

[RESEND/BUG PATCH v3] net: smsc911x: Fix bug where PHY interrupts are overwritten by 0

2016-07-05 Thread Jeremy Linton

By default, mdiobus_alloc() sets the PHYs to polling mode, but a
pointer size memcpy means that a couple IRQs end up being overwritten
with a value of 0. This means that PHY_POLL is disabled and results
in unpredictable behavior depending on the PHY's location on the
MDIO bus. Remove that memcpy and the now unused phy_irq member to
force the SMSC911x PHYs into polling mode 100% of the time.

Fixes: e7f4dc3536a4 ("mdio: Move allocation of interrupts into core")

Signed-off-by: Jeremy Linton 
Reviewed-by: Andrew Lunn 
Acked-by: Sergei Shtylyov 
---
 drivers/net/ethernet/smsc/smsc911x.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/smsc/smsc911x.c b/drivers/net/ethernet/smsc/smsc911x.c
index 8af2556..b5ab5e1 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -116,7 +116,6 @@ struct smsc911x_data {
 
struct phy_device *phy_dev;
struct mii_bus *mii_bus;
-   int phy_irq[PHY_MAX_ADDR];
unsigned int using_extphy;
int last_duplex;
int last_carrier;
@@ -1073,7 +1072,6 @@ static int smsc911x_mii_init(struct platform_device *pdev,
pdata->mii_bus->priv = pdata;
pdata->mii_bus->read = smsc911x_mii_read;
pdata->mii_bus->write = smsc911x_mii_write;
-   memcpy(pdata->mii_bus->irq, pdata->phy_irq, sizeof(pdata->mii_bus));
 
 	pdata->mii_bus->parent = &pdev->dev;
 
-- 
2.5.5
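
To illustrate the "pointer size memcpy" described in the commit message, here is a small userspace sketch of the pattern. The struct fake_bus and the function count_clobbered() are hypothetical stand-ins, not driver code; PHY_MAX_ADDR and PHY_POLL are given the values I believe the kernel uses (32 and -1), which should be treated as assumptions here.

```c
#include <stddef.h>
#include <string.h>

#define PHY_MAX_ADDR 32   /* assumed to match the kernel's value */
#define PHY_POLL     (-1) /* assumed to match the kernel's value */

/* Hypothetical stand-in for the bus structure holding per-PHY IRQs. */
struct fake_bus {
	int irq[PHY_MAX_ADDR];
};

/* Reproduce the bug pattern and return how many entries lose PHY_POLL. */
static size_t count_clobbered(void)
{
	struct fake_bus storage;
	struct fake_bus *bus = &storage;
	int zero_irqs[PHY_MAX_ADDR] = { 0 };
	size_t n = sizeof(bus); /* size of the POINTER, not of bus->irq */
	size_t i, clobbered = 0;

	/* The core has already set every PHY to polling mode. */
	for (i = 0; i < PHY_MAX_ADDR; i++)
		bus->irq[i] = PHY_POLL;

	/* Buggy copy: only the first sizeof(void *) bytes are overwritten. */
	memcpy(bus->irq, zero_irqs, n);

	for (i = 0; i < PHY_MAX_ADDR; i++)
		if (bus->irq[i] != PHY_POLL)
			clobbered++;

	return clobbered;
}
```

The number of clobbered entries is sizeof(void *) / sizeof(int), so only PHYs at the lowest addresses are affected, which fits the "unpredictable behavior depending on the PHY's location on the MDIO bus" noted above.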

Re: WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 inet_sock_destruct+0x1c4/0x1dc

2016-07-05 Thread Florian Fainelli

On 05/07/2016 07:50, Mason wrote:
> On 05/07/2016 15:33, Mason wrote:
> 
>> I was testing suspend/resume sequences where the suspend operation
>> fails and returns without having suspended the platform.
>>
>> # echo mem > /sys/power/state
>> [   90.322264] PM: Syncing filesystems ... done.
>> [   90.328758] Freezing user space processes ... (elapsed 0.001 seconds) 
>> done.
>> [   90.337092] Double checking all user space processes after OOM killer 
>> disable... (elapsed 0.000 seconds)
>> [   90.346765] Freezing remaining freezable tasks ... (elapsed 0.001 
>> seconds) done.
>> [   90.355357] Suspending console(s) (use no_console_suspend to debug)
>> [   90.364590] PM: suspend of devices complete after 2.068 msecs
>> [   90.365554] PM: late suspend of devices complete after 0.954 msecs
>> [   90.366223] PM: noirq suspend of devices complete after 0.662 msecs
>> [   90.366227] Disabling non-boot CPUs ...
>> [   90.379004] CPU1: shutdown
>> [   90.412661] Enabling non-boot CPUs ...
>> [   90.450385] CPU1 is up
>> [   90.450979] PM: noirq resume of devices complete after 0.584 msecs
>> [   90.451672] PM: early resume of devices complete after 0.667 msecs
>> [   90.453149] nb8800 26000.ethernet eth0: Link is Down
>> [   90.453264] PM: resume of devices complete after 1.583 msecs
>> [   90.508180] Restarting tasks ... done.
>> -sh: echo: write error: Input/output error
>> [   93.860411] nb8800 26000.ethernet eth0: Link is Up - 1Gbps/Full - flow 
>> control rx/tx
>>
>> (The error message is expected, as my suspend routine returns -EIO
>> on failure.)
>>
>> I left the system to idle at the prompt; then 5 minutes later,
>> the system printed the following trace.
>>
>> [  400.718491] [ cut here ]
>> [  400.723175] WARNING: CPU: 0 PID: 0 at net/ipv4/af_inet.c:155 
>> inet_sock_destruct+0x1c4/0x1dc
>> [  400.731582] Modules linked in:
>> [  400.734689] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
>> 4.7.0-rc6-00010-gd07031bdc433 #1
>> [  400.742646] Hardware name: Sigma Tango DT
>> [  400.746671] Backtrace: 
>> [  400.749141] [] (dump_backtrace) from [] 
>> (show_stack+0x18/0x1c)
>> [  400.756747]  r7:6113 r6:c080ea84 r5: r4:c080ea84
>> [  400.762454] [] (show_stack) from [] 
>> (dump_stack+0x80/0x94)
>> [  400.769722] [] (dump_stack) from [] 
>> (__warn+0xec/0x104)
>> [  400.776717]  r7:0009 r6:c05e3fbc r5: r4:
>> [  400.782417] [] (__warn) from [] 
>> (warn_slowpath_null+0x28/0x30)
>> [  400.790022]  r9:dfbdd4e0 r8:000a r7:c0801de8 r6:df6f9514 r5:df5df144 
>> r4:df5df040
>> [  400.797825] [] (warn_slowpath_null) from [] 
>> (inet_sock_destruct+0x1c4/0x1dc)
>> [  400.806661] [] (inet_sock_destruct) from [] 
>> (__sk_destruct+0x28/0xe0)
>> [  400.814878]  r7:c0801de8 r6:df6f9514 r5:df5df040 r4:df5df1ec
>> [  400.820584] [] (__sk_destruct) from [] 
>> (rcu_process_callbacks+0x488/0x59c)
>> [  400.829237]  r5: r4:
>> [  400.832836] [] (rcu_process_callbacks) from [] 
>> (__do_softirq+0x138/0x264)
>> [  400.841402]  r10:c08020a0 r9:4001 r8:0101 r7:c080 r6:c08020a4 
>> r5:0009
>> [  400.849285]  r4:
>> [  400.851829] [] (__do_softirq) from [] 
>> (irq_exit+0xc8/0x104)
>> [  400.859172]  r10:c0801f10 r9:df402400 r8:0001 r7: r6:0013 
>> r5:
>> [  400.867053]  r4:c0735428
>> [  400.869601] [] (irq_exit) from [] 
>> (__handle_domain_irq+0x88/0xf4)
>> [  400.877473] [] (__handle_domain_irq) from [] 
>> (gic_handle_irq+0x50/0x94)
>> [  400.885865]  r10:dfffcdc0 r9:e0803100 r8:e0802100 r7:c0801f10 r6:e080210c 
>> r5:c080277c
>> [  400.893747]  r4:c080eca0 r3:c0801f10
>> [  400.897342] [] (gic_handle_irq) from [] 
>> (__irq_svc+0x54/0x90)
>> [  400.904861] Exception stack(0xc0801f10 to 0xc0801f58)
>> [  400.909936] 1f00:   
>> 826a c0117c80
>> [  400.918156] 1f20: c080 c08024f8 c0802494 c081e2d6 c05b954c c07268c0 
>> dfffcdc0 c0801f6c
>> [  400.926376] 1f40: c0801f70 c0801f60 c01086b0 c01086b4 6013 
>> [  400.933020]  r9:c07268c0 r8:c05b954c r7:c0801f44 r6: r5:6013 
>> r4:c01086b4
>> [  400.940826] [] (arch_cpu_idle) from [] 
>> (default_idle_call+0x28/0x34)
>> [  400.948960] [] (default_idle_call) from [] 
>> (cpu_startup_entry+0x128/0x17c)
>> [  400.957620] [] (cpu_startup_entry) from [] 
>> (rest_init+0x8c/0x90)
>> [  400.965400]  r7: r4:0002
>> [  400.969005] [] (rest_init) from [] 
>> (start_kernel+0x310/0x31c)
>> [  400.976522]  r5:c081e4c0 r4:0001
>> [  400.980121] [] (start_kernel) from [<8000807c>] (0x8000807c)
>> [  400.986716] ---[ end trace f8deb50d1b3d3c7a ]---
> 
> 
> NB: The warning shows up 310 seconds after the suspend attempt.
> 
> I rebooted, tried the same operation, and hit the same warning
> still 310 seconds later:
> 
> # echo mem > /sys/power/state
> [   25.665905] PM: Syncing filesystems ... done.
> [   25.672102] Freezing user space processes ... (elapsed 0.001 seconds) done.
> [
