Re: [RFC PATCH] net: flush the softnet backlog in process context

2016-07-27 Thread Eric Dumazet

On Wed, 2016-07-27 at 12:37 +0200, Paolo Abeni wrote:
> Currently in process_backlog(), the process_queue dequeuing is
> performed with local IRQ disabled, to protect against
> flush_backlog(), which runs in hard IRQ context.

Thanks guys, patch looks fine to me.




Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote:
> On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> > Could you try this ?
> > 
> > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > index
> > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> > 938b3820b 100644
> > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> > *netdev)
> >  {
> > struct e1000_adapter *adapter = netdev_priv(netdev);
> >  
> > -   disable_irq(adapter->pdev->irq);
> > -   e1000_intr(adapter->pdev->irq, netdev);
> > -   enable_irq(adapter->pdev->irq);
> > +   if (napi_schedule_prep(>napi)) {
> > +   adapter->total_tx_bytes = 0;
> > +   adapter->total_tx_packets = 0;
> > +   adapter->total_rx_bytes = 0;
> > +   adapter->total_rx_packets = 0;
> > +   __napi_schedule(>napi);
> > +   }
> >  }
> >  #endif
> >  
> 
> Since this fixes the issue Fengguang saw, will you be submitting a formal
> patch Eric? (please) I can get this queued up for Dave's net tree as soon
> as I receive the formal patch.

I would prefer having a definitive advice from Thomas Gleixner and/or
others if disable_irq() is forbidden from IRQ path.

As I said, about all netpoll() methods in net drivers use disable_irq()
so a lot of patches would be needed.

disable_irq() should then test this condition earlier, so that we can
detect potential bug, even if the IRQ is not (yet) threaded.

Thanks.




Re: [PATCH RFC 1/1] xfrm: dst lookup doesn't account for fwmark

2016-07-27 Thread Steffen Klassert
On Fri, Jul 22, 2016 at 03:50:43PM -0600, Doug Applegate wrote:
> If route table includes routing based on fwmark, xfrm will not take it
> into account when routing ipsec traffic. We address this issue by adding
> fwmark information before calling route lookup.
> 
> Signed-off-by: Doug Applegate 
> ---
>  include/net/xfrm.h  |  3 ++-
>  net/ipv4/xfrm4_policy.c | 15 +++
>  net/ipv6/xfrm6_policy.c | 11 +--
>  net/xfrm/xfrm_policy.c  | 28 +++-
>  4 files changed, 45 insertions(+), 12 deletions(-)
> 
> diff --git a/include/net/xfrm.h b/include/net/xfrm.h
> index adfebd6..13e80d1 100644
> --- a/include/net/xfrm.h
> +++ b/include/net/xfrm.h
> @@ -282,7 +282,7 @@ struct xfrm_policy_afinfo {
>   struct dst_ops *dst_ops;
>   void (*garbage_collect)(struct net *net);
>   struct dst_entry *(*dst_lookup)(struct net *net,
> -   int tos, int oif,
> +   int tos, int oif, int mark,
> const xfrm_address_t *saddr,
> const xfrm_address_t *daddr);
>   int (*get_saddr)(struct net *net, int oif,
> @@ -292,6 +292,7 @@ struct xfrm_policy_afinfo {
>struct flowi *fl,
>int reverse);
>   int (*get_tos)(const struct flowi *fl);
> + int (*get_mark)(const struct flowi *fl);
>   int (*init_path)(struct xfrm_dst *path,
>   struct dst_entry *dst,
>   int nfheader_len);
> diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
> index 7b0edb3..5dcd411 100644
> --- a/net/ipv4/xfrm4_policy.c
> +++ b/net/ipv4/xfrm4_policy.c
> @@ -20,7 +20,7 @@
>  static struct xfrm_policy_afinfo xfrm4_policy_afinfo;
> 
>  static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct
> flowi4 *fl4,
> -int tos, int oif,
> +int tos, int oif, int mark,
>  const xfrm_address_t *saddr,
>  const xfrm_address_t *daddr)
>  {
> @@ -30,6 +30,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct
> net *net, struct flowi4 *fl4,
>   fl4->daddr = daddr->a4;
>   fl4->flowi4_tos = tos;
>   fl4->flowi4_oif = oif;
> + fl4->flowi4_mark = mark;
>   if (saddr)
>   fl4->saddr = saddr->a4;
> 
> @@ -42,13 +43,13 @@ static struct dst_entry *__xfrm4_dst_lookup(struct
> net *net, struct flowi4 *fl4,
>   return ERR_CAST(rt);
>  }
> 
> -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif,
> +static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos,
> int oif, int mark,
>const xfrm_address_t *saddr,
>const xfrm_address_t *daddr)

We may consider to setup the flowi earlier instead of adding yet another
parameter to this function. This would have also the advantage that we
don't need to change that function whenever we find out that we need
more flow informations.



Re: [PATCH RFC 0/1] xfrm: dst lookup doesn't account for fwmark

2016-07-27 Thread Steffen Klassert
On Fri, Jul 22, 2016 at 03:50:30PM -0600, Doug Applegate wrote:
> I ran into an issue trying to route outgoing ipsec traffic from an
> ipsec responder hub that uses fwmark to route out a specific
> interface. The fwmark points to a route table that contains a default
> route out a specific interface.  The fwmark is applied based on
> incoming interface of incoming traffic. I noticed that the mark was
> not being used when doing a route lookup. Is there any reason why this
> is? If not does this patch look reasonable? I just followed similar
> logic to how TOS was being used before route-lookup. I've only tested
> on an earlier kernel (3.14) and it works. Thanks!
> 
> Here's an example:
> 
> The ipsec policy is simple, just a tunnel from 192.168.0.0/24 <-->
> 192.168.1.0/24 the ipsec hub is at 35.0.0.1 but when traffic is being
> responded too, instead of the expected interface (eth2) we end it out
> a different interface (eth0)
> 
> route table:
> # ip rule list
> 0: from all lookup local
> 32766: from all lookup main
> 32767: from all lookup default
> 4: from all fwmark 0x2 lookup 3
> 4: from all fwmark 0x4 lookup 4
> 5: from all lookup 3
> 
> # ip route
> 25.0.0.0/30 dev eth0  src 25.0.0.1
> 35.0.0.0/30 dev eth2  src 35.0.0.1
> 192.168.1.0/24 dev lan1  src 192.168.1.1
> 
> # ip route list table 4
> default via 35.0.0.2 dev eth2 onlink
> # ip route list table 3
> default via 25.0.0.2 dev eth0 onlink

Can you provide some more information about your usecase?
Mapping between interfaces and IP addresses, policies
and states would be good.

Btw. why do you need to insert the routes as 'onlink'?
This disables some checks if this route is valid.


Re: [RFC PATCH] xfrm: Add option to reset oif in xfrm lookup

2016-07-27 Thread Steffen Klassert
On Mon, Jul 25, 2016 at 06:34:32PM -0600, Subash Abhinov Kasiviswanathan wrote:
> We are seeing incorrect routing when tunneling packets over an
> interface and sending it over another interface. This scenario
> worked on 3.18 (and earlier) and failed on 4.4 kernel. The rules
> / routes / policies were the same across kernels.
> 
> Commit 42a7b32b73d6 ("xfrm: Add oif to dst lookups") allowed
> preservation of the oif from a raw packet to a transformed packet.
> This causes issues with forwarding scenarios where the
> existing oif causes an incorrect route lookup.
> 
> Create a new sysctl which resets oif in xfrm policy. Default value
> is 0 which means that oif is preserved on transform.

Please don't try to workaround a bug with a sysctl.
If we have a bug here, we should fix it. Choosing
between bug A and bug B with a sysctl is not what
we are doing ;)


Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting

2016-07-27 Thread Steffen Klassert
On Wed, Jul 27, 2016 at 08:44:15AM +0200, Vegard Nossum wrote:
> On 07/27/2016 08:31 AM, Herbert Xu wrote:
> >On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote:
> >>
> >>Here's another patch to remove that too.
> >>
> >>I don't actually *use* this code myself and I feel the justification
> >>I've given for removing the WARN to be a bit weak, so if you don't take
> >>the patch I'll just keep it in my local tree to keep it from showing up
> >>again during fuzzing.
> >
> >Please just kill the whole else clause.  For soft policy expires
> >we simply need to relay a message to the KM and nothing more.
> 
> Try #2 :-)
> 
> 
> Vegard

> >From e5111e4dcd0e0c0990d3f4bba0ba0bc9d0b40bae Mon Sep 17 00:00:00 2001
> From: Vegard Nossum 
> Date: Wed, 27 Jul 2016 08:13:14 +0200
> Subject: [PATCH] xfrm: get rid of another incorrect WARN
> 
> During fuzzing I regularly run into this WARN(). According to Herbert Xu,
> this "certainly shouldn't be a WARN, it probably shouldn't print anything
> either".
> 
> Cc: Stephen Hemminger 
> Cc: Steffen Klassert 
> Cc: Herbert Xu 
> Signed-off-by: Vegard Nossum 

Also applied to the ipsec tree, thanks a lot!


Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting

2016-07-27 Thread Steffen Klassert
On Wed, Jul 27, 2016 at 08:03:18AM +0200, Vegard Nossum wrote:
> Subject: [PATCH] xfrm: get rid of incorrect WARN
> 
> AFAICT this message is just printed whenever input validation fails.
> This is a normal failure and we shouldn't be dumping the stack over it.
> 
> Looks like it was originally a printk that was maybe incorrectly
> upgraded to a WARN:
> 
> commit 62db5cfd70b1ef53aa21f144a806fe3b78c84fab
> Author: stephen hemminger 
> Date:   Wed May 12 06:37:06 2010 +
> 
> xfrm: add severity to printk
> 
> Cc: Stephen Hemminger 
> Cc: Steffen Klassert 
> Signed-off-by: Vegard Nossum 

Applied to the ipsec tree, thanks!


[PATCH -next] drivers: net: phy: xgene: Remove redundant dev_err call in xgene_mdio_probe()

2016-07-27 Thread Wei Yongjun
There is a error message within devm_ioremap_resource
already, so remove the dev_err call to avoid redundant
error message.

Signed-off-by: Wei Yongjun 
---
 drivers/net/phy/mdio-xgene.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c
index d94a978..7756748 100644
--- a/drivers/net/phy/mdio-xgene.c
+++ b/drivers/net/phy/mdio-xgene.c
@@ -345,10 +345,8 @@ static int xgene_mdio_probe(struct platform_device *pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
csr_base = devm_ioremap_resource(dev, res);
-   if (IS_ERR(csr_base)) {
-   dev_err(dev, "Unable to retrieve mac CSR region\n");
+   if (IS_ERR(csr_base))
return PTR_ERR(csr_base);
-   }
pdata->mac_csr_addr = csr_base;
pdata->mdio_csr_addr = csr_base + BLOCK_XG_MDIO_CSR_OFFSET;
pdata->diag_csr_addr = csr_base + BLOCK_DIAG_CSR_OFFSET;





[PATCH -next] tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor()

2016-07-27 Thread Wei Yongjun
In the error handling case of nla_nest_start() failed read_unlock_bh()
is called  to unlock a lock that had not been taken yet. sparse warns
about the context imbalance as the following:

net/tipc/monitor.c:799:23: warning:
 context imbalance in '__tipc_nl_add_monitor' - different lock contexts for 
basic block

Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes')
Signed-off-by: Wei Yongjun 
---
 net/tipc/monitor.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c
index be70a57..b62caa1 100644
--- a/net/tipc/monitor.c
+++ b/net/tipc/monitor.c
@@ -794,10 +794,10 @@ int __tipc_nl_add_monitor(struct net *net, struct 
tipc_nl_msg *msg,
return 0;
 
 attr_msg_full:
+   read_unlock_bh(>lock);
nla_nest_cancel(msg->skb, attrs);
 msg_full:
genlmsg_cancel(msg->skb, hdr);
-   read_unlock_bh(>lock);
 
return -EMSGSIZE;
 }



Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread Al Viro
On Wed, Jul 27, 2016 at 08:26:48PM -0400, alexmcwhir...@triadic.us wrote:

> I'm going to go ahead and say this is where my issue and the op's issue
> begin to branch apart from one another. He's seeing this on all incoming
> data, whereas i am only seeing it on ssl data and not on sun4v.
> 
> At this point i would say data from my issue is only going to cloud this
> issue as they seem to be two completely different issues revolving around
> the same commit. If i come across any relevant data for x86_64 ill be sure
> to post it if this isn't resolved by then, but for now i'm going to refrain
> from submitting anything sparc related.

Which just might mean that we have *three* issues here -
(1) buggered __copy_to_user_inatomic() (and friends) on some sparcs
(2) your ssl-only corruption
(3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc
anywhere, and no multi-segment recvmsg().  Which would strongly argue in
favour of some kind of copy_page_to_iter() breakage triggered when handling
a fragmented skb, as in (1).  Except that I don't see anything similar in
x86_64 uaccess primitives...


Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-27 Thread Philp Prindeville

Inline...


On 07/27/2016 06:04 PM, Tom Herbert wrote:

On Wed, Jul 27, 2016 at 4:42 PM,   wrote:

From: Gao Feng 

The PPTP is encapsulated by GRE header with that GRE_VERSION bits
must contain one. But current GRE RPS needs the GRE_VERSION must be
zero. So RPS does not work for PPTP traffic.

In my test environment, there are four MIPS cores, and all traffic
are passed through by PPTP. As a result, only one core is 100% busy
while other three cores are very idle. After this patch, the usage
of four cores are balanced well.

Signed-off-by: Gao Feng 
---
  v1: Initial patch

  include/uapi/linux/if_tunnel.h |   5 +-
  net/core/flow_dissector.c  | 138 ++---
  2 files changed, 92 insertions(+), 51 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1046f55..dda4e4b 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -24,9 +24,12 @@
  #define GRE_SEQ__cpu_to_be16(0x1000)
  #define GRE_STRICT __cpu_to_be16(0x0800)
  #define GRE_REC__cpu_to_be16(0x0700)
-#define GRE_FLAGS  __cpu_to_be16(0x00F8)
+#define GRE_ACK__cpu_to_be16(0x0080)
+#define GRE_FLAGS  __cpu_to_be16(0x0078)
  #define GRE_VERSION__cpu_to_be16(0x0007)

+#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
+
  struct ip_tunnel_parm {
 charname[IFNAMSIZ];
 int link;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 61ad43f..d95e060 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -346,63 +346,101 @@ ip_proto_again:
 hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, 
&_hdr);
 if (!hdr)
 goto out_bad;
-   /*
-* Only look inside GRE if version zero and no
-* routing
-*/
-   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
-   break;

-   proto = hdr->proto;
-   nhoff += 4;
-   if (hdr->flags & GRE_CSUM)
+   /*
+   * Only look inside GRE if version zero and no
+   * routing

This comment is no longer true, GRE version 1 is being handled.


+   */
+   if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) {
+   proto = hdr->proto;
 nhoff += 4;
-   if (hdr->flags & GRE_KEY) {
-   const __be32 *keyid;
-   __be32 _keyid;
+   if (hdr->flags & GRE_CSUM)
+   nhoff += 4;
+   if (hdr->flags & GRE_KEY) {
+   const __be32 *keyid;
+   __be32 _keyid;
+
+   keyid = __skb_header_pointer(skb, nhoff, 
sizeof(_keyid),
+data, hlen, 
&_keyid);
+
+   if (!keyid)
+   goto out_bad;
+
+   if (dissector_uses_key(flow_dissector,
+  
FLOW_DISSECTOR_KEY_GRE_KEYID)) {
+   key_keyid = 
skb_flow_dissector_target(flow_dissector,
+ 
FLOW_DISSECTOR_KEY_GRE_KEYID,
+ 
target_container);
+   key_keyid->keyid = *keyid;
+   }
+   nhoff += 4;
+   }
+   if (hdr->flags & GRE_SEQ)
+   nhoff += 4;
+   if (proto == htons(ETH_P_TEB)) {
+   const struct ethhdr *eth;
+   struct ethhdr _eth;
+
+   eth = __skb_header_pointer(skb, nhoff,
+  sizeof(_eth),
+  data, hlen, &_eth);
+   if (!eth)
+   goto out_bad;
+   proto = eth->h_proto;
+   nhoff += sizeof(*eth);
+
+   /* Cap headers that we access via pointers at 
the
+* end of the Ethernet header as our maximum 
alignment
+* at that point is only 2 bytes.
+*/
+   if (NET_IP_ALIGN)
+   hlen = nhoff;
+   }

-   keyid = __skb_header_pointer(skb, 

Re: [PATCH 06/15] ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle

2016-07-27 Thread Yisen Zhuang


在 2016/7/27 10:20, Peter Chen 写道:
> of_node_put needs to be called when the device node which is got
> from of_parse_phandle has finished using.
> 
> Signed-off-by: Peter Chen 
> ---
>  drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c 
> b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> index 3fb87e2..18d72ea 100644
> --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
> @@ -786,6 +786,7 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
>   np = of_parse_phandle(mac_cb->dev->of_node, "phy-handle",
> mac_cb->mac_id);
>   mac_cb->phy_dev = of_phy_find_device(np);
> + of_node_put(np);
>   if (mac_cb->phy_dev) {
>   /* refcount is held by of_phy_find_device()
>* if the phy_dev is found

np is accessed in case of of_phy_find_device() returns a no null value, so it 
has to be
moved after the dev_dbg log.

> @@ -804,6 +805,7 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
>   np = of_parse_phandle(to_of_node(mac_cb->fw_port),
> "phy-handle", 0);
>   mac_cb->phy_dev = of_phy_find_device(np);
> + of_node_put(np);
>   if (mac_cb->phy_dev) {
>   /* refcount is held by of_phy_find_device()
>* if the phy_dev is found

np is accessed in case of of_phy_find_device() returns a no null value, so it 
has to be
moved after the dev_dbg log.

Thanks,

Yisen

> @@ -813,9 +815,10 @@ static int  hns_mac_get_info(struct hns_mac_cb *mac_cb)
>   mac_cb->mac_id, np->name);
>   }
>  
> - syscon = syscon_node_to_regmap(
> - of_parse_phandle(to_of_node(mac_cb->fw_port),
> -  "serdes-syscon", 0));
> + np = of_parse_phandle(to_of_node(mac_cb->fw_port),
> + "serdes-syscon", 0);
> + syscon = syscon_node_to_regmap(np);
> + of_node_put(np);
>   if (IS_ERR_OR_NULL(syscon)) {
>   dev_err(mac_cb->dev, "serdes-syscon is needed!\n");
>   return -EINVAL;
> 



Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread alexmcwhirter

On 2016-07-27 20:31, Al Viro wrote:

On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:


> I highly expect both my issue and OP's issue to revolve not around
> commit e5a4b0bb803b specifically, but around other code that no longer
> behaves as expected because of it.

Indeed, and that fault address rounding bug occurs two other times
in arch/sparc/lib/user_fixup.c

The mentioned patchwork patch should fix the bug and I'll get that
into my sparc tree, merged, and queued up for -stable ASAP.


Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
bugs in case of x86_64...


I'm going to go ahead and say this is where my issue and the op's issue 
begin to branch apart from one another. He's seeing this on all incoming 
data, whereas i am only seeing it on ssl data and not on sun4v.


At this point i would say data from my issue is only going to cloud this 
issue as they seem to be two completely different issues revolving 
around the same commit. If i come across any relevant data for x86_64 
ill be sure to post it if this isn't resolved by then, but for now i'm 
going to refrain from submitting anything sparc related.


Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-27 Thread Feng Gao
Hi Tom,

I think maybe it is not good idea to consolidate these codes with the
version 0 path.

Because I need to check the flags strictly like this "!(hdr->flags &
(GRE_CSUM | GRE_ROUTING | GRE_STRICT | GRE_REC | GRE_FLAGS))".
These flag must be zero with PPTP GRE according to the RFC, but
original GRE only ask the "GRE_VERSION|GRE_ROUTING" must be zero.
And the inner proto structures are different between with the original
GRE and PPTP GRE.

So I think it would be more awkful if consolidate these codes, unless
we don't check the flags strictly.

On Thu, Jul 28, 2016 at 8:24 AM, Feng Gao  wrote:
> Thanks Tom, I append some inline comments.
>
> BTW, forgive me reply the email by gmail because the original mail
> server doesn't support kernel email format.
> I will send the update later.
>
> Best Regards
> Feng
>
> On Thu, Jul 28, 2016 at 8:04 AM, Tom Herbert  wrote:
>>
>> On Wed, Jul 27, 2016 at 4:42 PM,   wrote:
>> > From: Gao Feng 
>> >
>> > The PPTP is encapsulated by GRE header with that GRE_VERSION bits
>> > must contain one. But current GRE RPS needs the GRE_VERSION must be
>> > zero. So RPS does not work for PPTP traffic.
>> >
>> > In my test environment, there are four MIPS cores, and all traffic
>> > are passed through by PPTP. As a result, only one core is 100% busy
>> > while other three cores are very idle. After this patch, the usage
>> > of four cores are balanced well.
>> >
>> > Signed-off-by: Gao Feng 
>> > ---
>> >  v1: Initial patch
>> >
>> >  include/uapi/linux/if_tunnel.h |   5 +-
>> >  net/core/flow_dissector.c  | 138 
>> > ++---
>> >  2 files changed, 92 insertions(+), 51 deletions(-)
>> >
>> > diff --git a/include/uapi/linux/if_tunnel.h 
>> > b/include/uapi/linux/if_tunnel.h
>> > index 1046f55..dda4e4b 100644
>> > --- a/include/uapi/linux/if_tunnel.h
>> > +++ b/include/uapi/linux/if_tunnel.h
>> > @@ -24,9 +24,12 @@
>> >  #define GRE_SEQ__cpu_to_be16(0x1000)
>> >  #define GRE_STRICT __cpu_to_be16(0x0800)
>> >  #define GRE_REC__cpu_to_be16(0x0700)
>> > -#define GRE_FLAGS  __cpu_to_be16(0x00F8)
>> > +#define GRE_ACK__cpu_to_be16(0x0080)
>> > +#define GRE_FLAGS  __cpu_to_be16(0x0078)
>> >  #define GRE_VERSION__cpu_to_be16(0x0007)
>> >
>> > +#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
>> > +
>> >  struct ip_tunnel_parm {
>> > charname[IFNAMSIZ];
>> > int link;
>> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> > index 61ad43f..d95e060 100644
>> > --- a/net/core/flow_dissector.c
>> > +++ b/net/core/flow_dissector.c
>> > @@ -346,63 +346,101 @@ ip_proto_again:
>> > hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, 
>> > hlen, &_hdr);
>> > if (!hdr)
>> > goto out_bad;
>> > -   /*
>> > -* Only look inside GRE if version zero and no
>> > -* routing
>> > -*/
>> > -   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
>> > -   break;
>> >
>> > -   proto = hdr->proto;
>> > -   nhoff += 4;
>> > -   if (hdr->flags & GRE_CSUM)
>> > +   /*
>> > +   * Only look inside GRE if version zero and no
>> > +   * routing
>>
>> This comment is no longer true, GRE version 1 is being handled.
>
> Ok, I get it. Thanks.
>
>>
>> > +   */
>> > +   if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) {
>> > +   proto = hdr->proto;
>> > nhoff += 4;
>> > -   if (hdr->flags & GRE_KEY) {
>> > -   const __be32 *keyid;
>> > -   __be32 _keyid;
>> > +   if (hdr->flags & GRE_CSUM)
>> > +   nhoff += 4;
>> > +   if (hdr->flags & GRE_KEY) {
>> > +   const __be32 *keyid;
>> > +   __be32 _keyid;
>> > +
>> > +   keyid = __skb_header_pointer(skb, nhoff, 
>> > sizeof(_keyid),
>> > +data, hlen, 
>> > &_keyid);
>> > +
>> > +   if (!keyid)
>> > +   goto out_bad;
>> > +
>> > +   if (dissector_uses_key(flow_dissector,
>> > +  
>> > FLOW_DISSECTOR_KEY_GRE_KEYID)) {
>> > +   key_keyid = 
>> > skb_flow_dissector_target(flow_dissector,
>> > +  
>> >FLOW_DISSECTOR_KEY_GRE_KEYID,
>> > +  
>> >

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread Al Viro
On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote:

> > I highly expect both my issue and OP's issue to revolve not around
> > commit e5a4b0bb803b specifically, but around other code that no longer
> > behaves as expected because of it.
> 
> Indeed, and that fault address rounding bug occurs two other times
> in arch/sparc/lib/user_fixup.c
> 
> The mentioned patchwork patch should fix the bug and I'll get that
> into my sparc tree, merged, and queued up for -stable ASAP.

Plausible for sparc, but I don't see similar __copy_to_user_inatomic()
bugs in case of x86_64...


Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-27 Thread Feng Gao
Thanks Tom, I append some inline comments.

BTW, forgive me reply the email by gmail because the original mail
server doesn't support kernel email format.
I will send the update later.

Best Regards
Feng

On Thu, Jul 28, 2016 at 8:04 AM, Tom Herbert  wrote:
>
> On Wed, Jul 27, 2016 at 4:42 PM,   wrote:
> > From: Gao Feng 
> >
> > The PPTP is encapsulated by GRE header with that GRE_VERSION bits
> > must contain one. But current GRE RPS needs the GRE_VERSION must be
> > zero. So RPS does not work for PPTP traffic.
> >
> > In my test environment, there are four MIPS cores, and all traffic
> > are passed through by PPTP. As a result, only one core is 100% busy
> > while other three cores are very idle. After this patch, the usage
> > of four cores are balanced well.
> >
> > Signed-off-by: Gao Feng 
> > ---
> >  v1: Initial patch
> >
> >  include/uapi/linux/if_tunnel.h |   5 +-
> >  net/core/flow_dissector.c  | 138 
> > ++---
> >  2 files changed, 92 insertions(+), 51 deletions(-)
> >
> > diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> > index 1046f55..dda4e4b 100644
> > --- a/include/uapi/linux/if_tunnel.h
> > +++ b/include/uapi/linux/if_tunnel.h
> > @@ -24,9 +24,12 @@
> >  #define GRE_SEQ__cpu_to_be16(0x1000)
> >  #define GRE_STRICT __cpu_to_be16(0x0800)
> >  #define GRE_REC__cpu_to_be16(0x0700)
> > -#define GRE_FLAGS  __cpu_to_be16(0x00F8)
> > +#define GRE_ACK__cpu_to_be16(0x0080)
> > +#define GRE_FLAGS  __cpu_to_be16(0x0078)
> >  #define GRE_VERSION__cpu_to_be16(0x0007)
> >
> > +#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
> > +
> >  struct ip_tunnel_parm {
> > charname[IFNAMSIZ];
> > int link;
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 61ad43f..d95e060 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -346,63 +346,101 @@ ip_proto_again:
> > hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, 
> > hlen, &_hdr);
> > if (!hdr)
> > goto out_bad;
> > -   /*
> > -* Only look inside GRE if version zero and no
> > -* routing
> > -*/
> > -   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
> > -   break;
> >
> > -   proto = hdr->proto;
> > -   nhoff += 4;
> > -   if (hdr->flags & GRE_CSUM)
> > +   /*
> > +   * Only look inside GRE if version zero and no
> > +   * routing
>
> This comment is no longer true, GRE version 1 is being handled.

Ok, I get it. Thanks.

>
> > +   */
> > +   if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) {
> > +   proto = hdr->proto;
> > nhoff += 4;
> > -   if (hdr->flags & GRE_KEY) {
> > -   const __be32 *keyid;
> > -   __be32 _keyid;
> > +   if (hdr->flags & GRE_CSUM)
> > +   nhoff += 4;
> > +   if (hdr->flags & GRE_KEY) {
> > +   const __be32 *keyid;
> > +   __be32 _keyid;
> > +
> > +   keyid = __skb_header_pointer(skb, nhoff, 
> > sizeof(_keyid),
> > +data, hlen, 
> > &_keyid);
> > +
> > +   if (!keyid)
> > +   goto out_bad;
> > +
> > +   if (dissector_uses_key(flow_dissector,
> > +  
> > FLOW_DISSECTOR_KEY_GRE_KEYID)) {
> > +   key_keyid = 
> > skb_flow_dissector_target(flow_dissector,
> > +   
> >   FLOW_DISSECTOR_KEY_GRE_KEYID,
> > +   
> >   target_container);
> > +   key_keyid->keyid = *keyid;
> > +   }
> > +   nhoff += 4;
> > +   }
> > +   if (hdr->flags & GRE_SEQ)
> > +   nhoff += 4;
> > +   if (proto == htons(ETH_P_TEB)) {
> > +   const struct ethhdr *eth;
> > +   struct ethhdr _eth;
> > +
> > +   eth = __skb_header_pointer(skb, nhoff,
> > +  sizeof(_eth),
> > +  data, hlen, 
> > &_eth);
> > + 

Darlehen anbieten

2016-07-27 Thread Cabot Financial (UK) Ltd


Guter Tag zu Ihnen, Sie haben um finanzielle Unterstützung benötigen? ein 
berechtigtes Kredit für Zinsen nötig? Sie benötigen einen Business-Darlehen? 
Brauchen Sie noch einen Kredit, um ein Haus, Auto zu kaufen, zahlen Sie Ihre 
Rechnungen und Schulden? Brauchen Sie Geld Probleme zu lösen? Wenn so 
freundlich für ein Darlehen bei uns bewerben, bewerben Sie sich jetzt durch die 
folgenden Details und Rückkehr zu uns per E-Mail: cabo...@hotmail.com
Die Namen sind: ...
Adresse: ...
Telefon: ...
Darlehensbetrag erforderlich: ...
Dauer: ...
Beruf: ...
Monatliches Einkommen Level: ...
Nicht ...
Geburtsdatum: 
Status: ..
Land: .
Postleitzahl: .
Zweck: ..

KONTAKTIERE UNS.
cabo...@hotmail.com



Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-27 Thread Tom Herbert
On Wed, Jul 27, 2016 at 4:42 PM,   wrote:
> From: Gao Feng 
>
> The PPTP is encapsulated by GRE header with that GRE_VERSION bits
> must contain one. But current GRE RPS needs the GRE_VERSION must be
> zero. So RPS does not work for PPTP traffic.
>
> In my test environment, there are four MIPS cores, and all traffic
> are passed through by PPTP. As a result, only one core is 100% busy
> while other three cores are very idle. After this patch, the usage
> of four cores are balanced well.
>
> Signed-off-by: Gao Feng 
> ---
>  v1: Initial patch
>
>  include/uapi/linux/if_tunnel.h |   5 +-
>  net/core/flow_dissector.c  | 138 
> ++---
>  2 files changed, 92 insertions(+), 51 deletions(-)
>
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 1046f55..dda4e4b 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -24,9 +24,12 @@
>  #define GRE_SEQ__cpu_to_be16(0x1000)
>  #define GRE_STRICT __cpu_to_be16(0x0800)
>  #define GRE_REC__cpu_to_be16(0x0700)
> -#define GRE_FLAGS  __cpu_to_be16(0x00F8)
> +#define GRE_ACK__cpu_to_be16(0x0080)
> +#define GRE_FLAGS  __cpu_to_be16(0x0078)
>  #define GRE_VERSION__cpu_to_be16(0x0007)
>
> +#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
> +
>  struct ip_tunnel_parm {
> charname[IFNAMSIZ];
> int link;
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 61ad43f..d95e060 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -346,63 +346,101 @@ ip_proto_again:
> hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, 
> hlen, &_hdr);
> if (!hdr)
> goto out_bad;
> -   /*
> -* Only look inside GRE if version zero and no
> -* routing
> -*/
> -   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
> -   break;
>
> -   proto = hdr->proto;
> -   nhoff += 4;
> -   if (hdr->flags & GRE_CSUM)
> +   /*
> +   * Only look inside GRE if version zero and no
> +   * routing

This comment is no longer true, GRE version 1 is being handled.

> +   */
> +   if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) {
> +   proto = hdr->proto;
> nhoff += 4;
> -   if (hdr->flags & GRE_KEY) {
> -   const __be32 *keyid;
> -   __be32 _keyid;
> +   if (hdr->flags & GRE_CSUM)
> +   nhoff += 4;
> +   if (hdr->flags & GRE_KEY) {
> +   const __be32 *keyid;
> +   __be32 _keyid;
> +
> +   keyid = __skb_header_pointer(skb, nhoff, 
> sizeof(_keyid),
> +data, hlen, 
> &_keyid);
> +
> +   if (!keyid)
> +   goto out_bad;
> +
> +   if (dissector_uses_key(flow_dissector,
> +  
> FLOW_DISSECTOR_KEY_GRE_KEYID)) {
> +   key_keyid = 
> skb_flow_dissector_target(flow_dissector,
> + 
> FLOW_DISSECTOR_KEY_GRE_KEYID,
> + 
> target_container);
> +   key_keyid->keyid = *keyid;
> +   }
> +   nhoff += 4;
> +   }
> +   if (hdr->flags & GRE_SEQ)
> +   nhoff += 4;
> +   if (proto == htons(ETH_P_TEB)) {
> +   const struct ethhdr *eth;
> +   struct ethhdr _eth;
> +
> +   eth = __skb_header_pointer(skb, nhoff,
> +  sizeof(_eth),
> +  data, hlen, &_eth);
> +   if (!eth)
> +   goto out_bad;
> +   proto = eth->h_proto;
> +   nhoff += sizeof(*eth);
> +
> +   /* Cap headers that we access via pointers at 
> the
> +* end of the Ethernet header as our maximum 
> alignment
> +* at that point is only 2 bytes.
> +*/
> +   if 

[PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash

2016-07-27 Thread fgao
From: Gao Feng 

The PPTP is encapsulated by GRE header with that GRE_VERSION bits
must contain one. But current GRE RPS needs the GRE_VERSION must be
zero. So RPS does not work for PPTP traffic.

In my test environment, there are four MIPS cores, and all traffic
are passed through by PPTP. As a result, only one core is 100% busy
while other three cores are very idle. After this patch, the usage
of four cores are balanced well.

Signed-off-by: Gao Feng 
---
 v1: Initial patch

 include/uapi/linux/if_tunnel.h |   5 +-
 net/core/flow_dissector.c  | 138 ++---
 2 files changed, 92 insertions(+), 51 deletions(-)

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index 1046f55..dda4e4b 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -24,9 +24,12 @@
 #define GRE_SEQ__cpu_to_be16(0x1000)
 #define GRE_STRICT __cpu_to_be16(0x0800)
 #define GRE_REC__cpu_to_be16(0x0700)
-#define GRE_FLAGS  __cpu_to_be16(0x00F8)
+#define GRE_ACK__cpu_to_be16(0x0080)
+#define GRE_FLAGS  __cpu_to_be16(0x0078)
 #define GRE_VERSION__cpu_to_be16(0x0007)
 
+#define GRE_PROTO_PPP  __cpu_to_be16(0x880b)
+
 struct ip_tunnel_parm {
charname[IFNAMSIZ];
int link;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 61ad43f..d95e060 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -346,63 +346,101 @@ ip_proto_again:
hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, 
hlen, &_hdr);
if (!hdr)
goto out_bad;
-   /*
-* Only look inside GRE if version zero and no
-* routing
-*/
-   if (hdr->flags & (GRE_VERSION | GRE_ROUTING))
-   break;
 
-   proto = hdr->proto;
-   nhoff += 4;
-   if (hdr->flags & GRE_CSUM)
+   /*
+   * Only look inside GRE if version zero and no
+   * routing
+   */
+   if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) {
+   proto = hdr->proto;
nhoff += 4;
-   if (hdr->flags & GRE_KEY) {
-   const __be32 *keyid;
-   __be32 _keyid;
+   if (hdr->flags & GRE_CSUM)
+   nhoff += 4;
+   if (hdr->flags & GRE_KEY) {
+   const __be32 *keyid;
+   __be32 _keyid;
+
+   keyid = __skb_header_pointer(skb, nhoff, 
sizeof(_keyid),
+data, hlen, 
&_keyid);
+
+   if (!keyid)
+   goto out_bad;
+
+   if (dissector_uses_key(flow_dissector,
+  
FLOW_DISSECTOR_KEY_GRE_KEYID)) {
+   key_keyid = 
skb_flow_dissector_target(flow_dissector,
+ 
FLOW_DISSECTOR_KEY_GRE_KEYID,
+ 
target_container);
+   key_keyid->keyid = *keyid;
+   }
+   nhoff += 4;
+   }
+   if (hdr->flags & GRE_SEQ)
+   nhoff += 4;
+   if (proto == htons(ETH_P_TEB)) {
+   const struct ethhdr *eth;
+   struct ethhdr _eth;
+
+   eth = __skb_header_pointer(skb, nhoff,
+  sizeof(_eth),
+  data, hlen, &_eth);
+   if (!eth)
+   goto out_bad;
+   proto = eth->h_proto;
+   nhoff += sizeof(*eth);
+
+   /* Cap headers that we access via pointers at 
the
+* end of the Ethernet header as our maximum 
alignment
+* at that point is only 2 bytes.
+*/
+   if (NET_IP_ALIGN)
+   hlen = nhoff;
+   }
 
-   keyid = __skb_header_pointer(skb, nhoff, sizeof(_keyid),
-data, hlen, &_keyid);
+   key_control->flags |= FLOW_DIS_ENCAPSULATION;
+   if (flags 

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread David Miller
From: alexmcwhir...@triadic.us
Date: Wed, 27 Jul 2016 19:02:40 -0400

> On 2016-07-27 14:04, alexmcwhir...@triadic.us wrote:
>> Just to add some more information to this, the corruption seems to
>> effect ssh as well.
>> Using a sun hme interface, occasionally upon an ssh connection it will
>> refuse to authenticate a client with either password or cert
>> authentication. Using wireshark to capture and decrypt the packets
>> between the two machines, the data coming from the server seems good,
>> but the data received by the server from the client is essentially
>> garbage. Note that the client is sending valid data, but the server is
>> corrupting it upon receipt. Closing the connection and starting a new
>> one will remedy the login issue, but you do occasionally see
>> corruption on the server side sporadically.
>> So far this only seems to occur on incoming data, outgoing data seems
>> fine.
> 
> Also, there is another patch the references this commit on sparc64 at
> least.
> 
> https://patchwork.kernel.org/patch/9221895/
> 
> I highly expect both my issue and OP's issue to revolve not around
> commit e5a4b0bb803b specifically, but around other code that no longer
> behaves as expected because of it.

Indeed, and that fault address rounding bug occurs two other times
in arch/sparc/lib/user_fixup.c

The mentioned patchwork patch should fix the bug and I'll get that
into my sparc tree, merged, and queued up for -stable ASAP.


Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread alexmcwhirter

On 2016-07-27 14:04, alexmcwhir...@triadic.us wrote:

Just to add some more information to this, the corruption seems to
effect ssh as well.

Using a sun hme interface, occasionally upon an ssh connection it will
refuse to authenticate a client with either password or cert
authentication. Using wireshark to capture and decrypt the packets
between the two machines, the data coming from the server seems good,
but the data received by the server from the client is essentially
garbage. Note that the client is sending valid data, but the server is
corrupting it upon receipt. Closing the connection and starting a new
one will remedy the login issue, but you do occasionally see
corruption on the server side sporadically.

So far this only seems to occur on incoming data, outgoing data seems 
fine.


Also, there is another patch the references this commit on sparc64 at 
least.


https://patchwork.kernel.org/patch/9221895/

I highly expect both my issue and OP's issue to revolve not around 
commit e5a4b0bb803b specifically, but around other code that no longer 
behaves as expected because of it.


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private 
> *tp)
>  static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo 
> *wol)
>  {
>   struct rtl8169_private *tp = netdev_priv(dev);
> + struct pci_dev *pdev = tp->pci_dev;

Nit: you may directly use "struct device *d = >pci_dev->dev;"

-- 
Ueimor


Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Francois Romieu
Chunhao Lin  :
[...]
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 0e62d74..f07604f 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
[...]
> @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, 
> struct ethtool_wolinfo *wol)
>   tp->features |= RTL_FEATURE_WOL;
>   else
>   tp->features &= ~RTL_FEATURE_WOL;
> - __rtl8169_set_wol(tp, wol->wolopts);
> + if (pm_runtime_active(>dev))
> + __rtl8169_set_wol(tp, wol->wolopts);
> + else
> + tp->saved_wolopts = wol->wolopts;
>  
>   rtl_unlock_work(tp);
>  
>   device_set_wakeup_enable(>pci_dev->dev, wol->wolopts);
>  
> + pm_runtime_put_noidle(>dev);
> +
>   return 0;

Either the driver resumes the device so that it can perform requested
operation or it signals .set_wol failure when the device is suspended.

If the driver does something else, "spam removal" translates to
"silent failure".

-- 
Ueimor


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Jeff Kirsher
On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote:
> Could you try this ?
> 
> diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
> b/drivers/net/ethernet/intel/e1000/e1000_main.c
> index
> f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a
> 938b3820b 100644
> --- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device
> *netdev)
>  {
> struct e1000_adapter *adapter = netdev_priv(netdev);
>  
> -   disable_irq(adapter->pdev->irq);
> -   e1000_intr(adapter->pdev->irq, netdev);
> -   enable_irq(adapter->pdev->irq);
> +   if (napi_schedule_prep(>napi)) {
> +   adapter->total_tx_bytes = 0;
> +   adapter->total_tx_packets = 0;
> +   adapter->total_rx_bytes = 0;
> +   adapter->total_rx_packets = 0;
> +   __napi_schedule(>napi);
> +   }
>  }
>  #endif
>  

Since this fixes the issue Fengguang saw, will you be submitting a formal
patch Eric? (please) I can get this queued up for Dave's net tree as soon
as I receive the formal patch.

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 05/15] ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle

2016-07-27 Thread David Daney

On 07/26/2016 07:20 PM, Peter Chen wrote:

of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.

Signed-off-by: Peter Chen 


NAK.



---
  drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c 
b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
index e8bc15b..5eb9d8c 100644
--- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
+++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
@@ -960,6 +960,7 @@ static int octeon_mgmt_init_phy(struct net_device *netdev)
phydev = of_phy_connect(netdev, p->phy_np,
octeon_mgmt_adjust_link, 0,
PHY_INTERFACE_MODE_MII);
+   of_node_put(p->phy_np);


I don't think you can do this here.  octeon_mgmt_init_phy() may be 
called multiple times in the life of the driver, so p->phy_np must 
remain valid.


It may be appropriate to do the  of_node_put() in the 
octeon_mgmt_remove() function.





if (!phydev)
return -ENODEV;





RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Woodford, Timothy W.
>> This is prepatory work for an expanding list of adapter families that have 
>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>> sanitization function and convert to using a feature (bug) flag, per 
>> suggestion from Jesse Brandeburg.
>> 
>> Littering functional code with device-specific checks is much messier than 
>> simply checking a flag, and having device-specific init set flags as needed.
>> There are probably a number of other cases in the e1000e code that 
>> could/should be converted similarly.
> 
> Looks ok to me.
> Adding Chris who asked what happens if we reach the max retry counter 
> (E1000_MAX_82574_SYSTIM_REREAD)?
> This counter is set to 50. 
> Can you, for testing purposes, decreased this value (or even set it to 0) and 
> see what happens?

I have one of the affected NICs.  I inserted some test code into the module to 
check the value of i after the for loop inside the sanitize function exits to 
check how many iterations of the loop were being run.  I ran a PTP test setup 
for a few hours with E1000_MAX_82574_SYSTIM_REREAD=50, and the maximum number 
was 5 iterations.
It looks like setting E1000_MAX_82574_SYSTIM_REREAD to 0 will just disable the 
return value checking and will have the same results as disabling the 
sanitization function.  I'll set the max retry counter to 1 and run an 
overnight test to see what happens.

Tim Woodford


[PATCH net-next 2/3] kcm: Use stream parser

2016-07-27 Thread Tom Herbert
Adapt KCM to use the stream parser. This mostly involves removing
the RX handling and setting up the strparser using the interface.

Signed-off-by: Tom Herbert 
---
 include/net/kcm.h |  37 +
 net/kcm/Kconfig   |   1 +
 net/kcm/kcmproc.c |  41 +++--
 net/kcm/kcmsock.c | 454 --
 4 files changed, 102 insertions(+), 431 deletions(-)

diff --git a/include/net/kcm.h b/include/net/kcm.h
index 2840b58..2a89658 100644
--- a/include/net/kcm.h
+++ b/include/net/kcm.h
@@ -13,6 +13,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 extern unsigned int kcm_net_id;
@@ -21,16 +22,8 @@ extern unsigned int kcm_net_id;
 #define KCM_STATS_INCR(stat) ((stat)++)
 
 struct kcm_psock_stats {
-   unsigned long long rx_msgs;
-   unsigned long long rx_bytes;
unsigned long long tx_msgs;
unsigned long long tx_bytes;
-   unsigned int rx_aborts;
-   unsigned int rx_mem_fail;
-   unsigned int rx_need_more_hdr;
-   unsigned int rx_msg_too_big;
-   unsigned int rx_msg_timeouts;
-   unsigned int rx_bad_hdr_len;
unsigned long long reserved;
unsigned long long unreserved;
unsigned int tx_aborts;
@@ -64,13 +57,6 @@ struct kcm_tx_msg {
struct sk_buff *last_skb;
 };
 
-struct kcm_rx_msg {
-   int full_len;
-   int accum_len;
-   int offset;
-   int early_eaten;
-};
-
 /* Socket structure for KCM client sockets */
 struct kcm_sock {
struct sock sk;
@@ -87,6 +73,7 @@ struct kcm_sock {
struct work_struct tx_work;
struct list_head wait_psock_list;
struct sk_buff *seq_skb;
+   u32 tx_stopped : 1;
 
/* Don't use bit fields here, these are set under different locks */
bool tx_wait;
@@ -104,11 +91,11 @@ struct bpf_prog;
 /* Structure for an attached lower socket */
 struct kcm_psock {
struct sock *sk;
+   struct strparser strp;
struct kcm_mux *mux;
int index;
 
u32 tx_stopped : 1;
-   u32 rx_stopped : 1;
u32 done : 1;
u32 unattaching : 1;
 
@@ -121,18 +108,12 @@ struct kcm_psock {
struct kcm_psock_stats stats;
 
/* Receive */
-   struct sk_buff *rx_skb_head;
-   struct sk_buff **rx_skb_nextp;
-   struct sk_buff *ready_rx_msg;
struct list_head psock_ready_list;
-   struct work_struct rx_work;
-   struct delayed_work rx_delayed_work;
struct bpf_prog *bpf_prog;
struct kcm_sock *rx_kcm;
unsigned long long saved_rx_bytes;
unsigned long long saved_rx_msgs;
-   struct timer_list rx_msg_timer;
-   unsigned int rx_need_bytes;
+   struct sk_buff *ready_rx_msg;
 
/* Transmit */
struct kcm_sock *tx_kcm;
@@ -146,6 +127,7 @@ struct kcm_net {
struct mutex mutex;
struct kcm_psock_stats aggregate_psock_stats;
struct kcm_mux_stats aggregate_mux_stats;
+   struct strp_aggr_stats aggregate_strp_stats;
struct list_head mux_list;
int count;
 };
@@ -163,6 +145,7 @@ struct kcm_mux {
 
struct kcm_mux_stats stats;
struct kcm_psock_stats aggregate_psock_stats;
+   struct strp_aggr_stats aggregate_strp_stats;
 
/* Receive */
spinlock_t rx_lock cacheline_aligned_in_smp;
@@ -190,14 +173,6 @@ static inline void aggregate_psock_stats(struct 
kcm_psock_stats *stats,
/* Save psock statistics in the mux when psock is being unattached. */
 
 #define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += stats->_stat)
-   SAVE_PSOCK_STATS(rx_msgs);
-   SAVE_PSOCK_STATS(rx_bytes);
-   SAVE_PSOCK_STATS(rx_aborts);
-   SAVE_PSOCK_STATS(rx_mem_fail);
-   SAVE_PSOCK_STATS(rx_need_more_hdr);
-   SAVE_PSOCK_STATS(rx_msg_too_big);
-   SAVE_PSOCK_STATS(rx_msg_timeouts);
-   SAVE_PSOCK_STATS(rx_bad_hdr_len);
SAVE_PSOCK_STATS(tx_msgs);
SAVE_PSOCK_STATS(tx_bytes);
SAVE_PSOCK_STATS(reserved);
diff --git a/net/kcm/Kconfig b/net/kcm/Kconfig
index 5db94d9..87fca36 100644
--- a/net/kcm/Kconfig
+++ b/net/kcm/Kconfig
@@ -3,6 +3,7 @@ config AF_KCM
tristate "KCM sockets"
depends on INET
select BPF_SYSCALL
+   select STREAM_PARSER
---help---
  KCM (Kernel Connection Multiplexor) sockets provide a method
  for multiplexing messages of a message based application
diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c
index 16c2e03..c23f065 100644
--- a/net/kcm/kcmproc.c
+++ b/net/kcm/kcmproc.c
@@ -155,8 +155,8 @@ static void kcm_format_psock(struct kcm_psock *psock, 
struct seq_file *seq,
seq_printf(seq,
   "   psock-%-5u %-10llu %-16llu %-10llu %-16llu %-8d %-8d 
%-8d %-8d ",
   psock->index,
-  psock->stats.rx_msgs,
-  psock->stats.rx_bytes,
+  psock->strp.stats.rx_msgs,
+  psock->strp.stats.rx_bytes,
   psock->stats.tx_msgs,
  

[PATCH net-next 1/3] strparser: Stream parser for messages

2016-07-27 Thread Tom Herbert
This patch introduces a utility for parsing application layer protocol
messages in a TCP stream. This is a generalization of the mechanism
implemented of Kernel Connection Multiplexor.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function.

A stream parser instance is defined by a strparse structure that
is bound to a TCP socket. The function to initialize the structure
is:

int strp_init(struct strparser *strp, struct sock *csk,
  struct strp_callbacks *cb);

csk is the TCP socket being bound to and cb are the parser callbacks.

A parser is bound to a TCP socket by setting data_ready function to
strp_tcp_data_ready so that all receive indications on the socket
go through the parser. This is assumes that sk_user_data is set to
the strparser structure.

There are four callbacks.
 - parse_msg is called to parse the message (returns length or error).
 - rcv_msg is called when a complete message has been received
 - read_sock_done is called when data_ready function exits
 - abort_parser is called to abort the parser

The input to parse_msg is an skbuff which contains next message under
construction. The backend processing of parse_msg will parse the
application layer protocol headers to determine the length of
the message in the stream. The possible return values are:

   >0 : indicates length of successfully parsed message
   0  : indicates more data must be received to parse the message
   -ESTRPIPE : current message should not be processed by the
  kernel, return control of the socket to userspace which
  can proceed to read the messages itself
   other < 0 : Error is parsing, give control back to userspace
  assuming that synchronzation is lost and the stream
  is unrecoverable (application expected to close TCP socket)

In the case of error return (< 0) strparse will stop the parser
and report and error to userspace. The application must deal
with the error. To handle the error the strparser is unbound
from the TCP socket. If the error indicates that the stream
TCP socket is at recoverable point (ESTRPIPE) then the application
can read the TCP socket to process the stream. Once the application
has dealt with the exceptions in the stream, it may again bind the
socket to a strparser to continue data operations.

Note that ENODATA may be returned to the application. In this case
parse_msg returned -ESTRPIPE, however strparser was unable to maintain
synchronization of the stream (i.e. some of the message in question
was already read by the parser).

strp_pause and strp_unpause are used to provide flow control. For
instance, if rcv_msg is called but the upper layer can't immediately
consume the message it can hold the message and pause strparser.

Signed-off-by: Tom Herbert 
---
 include/net/strparser.h   | 147 ++
 net/Kconfig   |   1 +
 net/Makefile  |   1 +
 net/strparser/Kconfig |   4 +
 net/strparser/Makefile|   1 +
 net/strparser/strparser.c | 492 ++
 6 files changed, 646 insertions(+)
 create mode 100644 include/net/strparser.h
 create mode 100644 net/strparser/Kconfig
 create mode 100644 net/strparser/Makefile
 create mode 100644 net/strparser/strparser.c

diff --git a/include/net/strparser.h b/include/net/strparser.h
new file mode 100644
index 000..42b964d
--- /dev/null
+++ b/include/net/strparser.h
@@ -0,0 +1,147 @@
+/*
+ * Stream Parser
+ *
+ * Copyright (c) 2016 Tom Herbert 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation.
+ */
+
+#ifndef __NET_STRPARSER_H_
+#define __NET_STRPARSER_H_
+
+#include 
+#include 
+
+#define STRP_STATS_ADD(stat, count) ((stat) += (count))
+#define STRP_STATS_INCR(stat) ((stat)++)
+
+struct strp_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned int rx_mem_fail;
+   unsigned int rx_need_more_hdr;
+   unsigned int rx_msg_too_big;
+   unsigned int rx_msg_timeouts;
+   unsigned int rx_bad_hdr_len;
+};
+
+struct strp_aggr_stats {
+   unsigned long long rx_msgs;
+   unsigned long long rx_bytes;
+   unsigned int rx_mem_fail;
+   unsigned int rx_need_more_hdr;
+   unsigned int rx_msg_too_big;
+   unsigned int rx_msg_timeouts;
+   unsigned int rx_bad_hdr_len;
+   unsigned int rx_aborts;
+   unsigned int rx_interrupted;
+   unsigned int rx_unrecov_intr;
+};
+
+struct strparser;
+
+/* Callbacks are called with lock held for the attached socket */
+struct strp_callbacks {
+   int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
+   void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+   int (*read_sock_done)(struct strparser *strp, int err);
+   void (*abort_parser)(struct strparser *strp, int 

[PATCH net-next 0/3] strp: Stream parser for messages

2016-07-27 Thread Tom Herbert
This patch set introduces a utility for parsing application layer
protocol messages in a TCP stream. This is a generalization of the
mechanism implemented of Kernel Connection Multiplexor.

This patch set adapts KCM to use the strparser. We expect that kTLS
can use this mechanism also. RDS would probably be another candidate
to use a common stream parsing mechanism.

The API includes a context structure, a set of callbacks, utility
functions, and a data ready function. The callbacks include
a parse_msg function that is called to perform parsing (e.g.
BPF parsing in case of KCM), and a rcv_msg function that is called
when a full message has been completed.

For strparser we specify the return codes from the parser to allow
the backend to indicate that control of the socket should be
transferred back to userspace to handle some exceptions in the
stream: The return values are:

  >0 : indicates length of successfully parsed message
   0  : indicates more data must be received to parse the message
   -ESTRPIPE : current message should not be processed by the
  kernel, return control of the socket to userspace which
  can proceed to read the messages itself
   other < 0 : Error is parsing, give control back to userspace
  assuming that synchronization is lost and the stream
  is unrecoverable (application expected to close TCP socket)

There is one issue I haven't been able to fully resolve. If parse_msg
returns ESTRPIPE (wants control back to userspace) the parser may
already have consumed some bytes of the message. There is no way to
put bytes back into the TCP receive queue and tcp_read_sock does not
allow an easy way to peek messages. In lieu of a better solution, we
return ENODATA on the socket to indicate that the data stream is
unrecoverable (application needs to close socket). This condition
should only happen if an application layer message header is split
across two skbuffs and parsing just the first skbuff wasn't sufficient
to determine the that transfer to userspace is needed.

This patch set contains:

  - strparser implementation
  - changes to kcm to use strparser
  - strparser.txt documentation

Tested:
  - Ran a KCM thrash test for 24 hours. No behavioral or performance
differences observed.


Tom Herbert (3):
  strparser: Stream parser for messages
  kcm: Use stream parser
  strparser: Documentation

 Documentation/networking/strparser.txt | 147 ++
 include/net/kcm.h  |  37 +--
 include/net/strparser.h| 147 ++
 net/Kconfig|   1 +
 net/Makefile   |   1 +
 net/kcm/Kconfig|   1 +
 net/kcm/kcmproc.c  |  41 ++-
 net/kcm/kcmsock.c  | 454 +-
 net/strparser/Kconfig  |   4 +
 net/strparser/Makefile |   1 +
 net/strparser/strparser.c  | 492 +
 11 files changed, 895 insertions(+), 431 deletions(-)
 create mode 100644 Documentation/networking/strparser.txt
 create mode 100644 include/net/strparser.h
 create mode 100644 net/strparser/Kconfig
 create mode 100644 net/strparser/Makefile
 create mode 100644 net/strparser/strparser.c

-- 
2.8.0.rc2



[PATCH net-next 3/3] strparser: Documentation

2016-07-27 Thread Tom Herbert
Signed-off-by: Tom Herbert 
---
 Documentation/networking/strparser.txt | 147 +
 1 file changed, 147 insertions(+)
 create mode 100644 Documentation/networking/strparser.txt

diff --git a/Documentation/networking/strparser.txt 
b/Documentation/networking/strparser.txt
new file mode 100644
index 000..abfef72
--- /dev/null
+++ b/Documentation/networking/strparser.txt
@@ -0,0 +1,147 @@
+Stream Parser
+-
+
+The stream parser (strparser) is a utility that parses messages of an
+application layer protocol running over a TCP connection. The stream
+parser works in conjunction with an upper layer in the kernel to provide
+kernel support for application layer messages. For instance, Kernel
+Connection Multiplexor (KCM) uses the Stream Parser to parse messages
+using a BPF program.
+
+Interface
+-
+
+The API includes a context structure, a set of callbacks, utility
+functions, and a data_ready function. The callbacks include
+a parse_msg function that is called to perform parsing (e.g.
+BPF parsing in case of KCM), and a rcv_msg function that is called
+when a full message has been completed.
+
+A stream parser can be instantiated for a TCP connection. This is done
+by:
+
+strp_init(struct strparser *strp, struct sock *csk,
+ struct strp_callbacks *cb)
+
+strp is a struct of type strparser that is allocated by the upper layer.
+csk is the TCP socket associated with the stream parser. Callbacks are
+called by the stream parser.
+
+Callbacks
+-
+
+There are four callbacks:
+
+int (*parse_msg)(struct strparser *strp, struct sk_buff *skb);
+
+parse_msg is called to determine the length of the next message
+in the stream. The upper layer must implement this function. It
+should parse the sk_buff as containing the headers for the
+next application layer messages in the stream.
+
+The skb->cb in the input skb is a struct strp_rx_msg. Only
+the offset field is relevant in parse_msg and gives the offset
+where the message starts in the skb.
+
+The return values of this function are:
+
+>0 : indicates length of successfully parsed message
+0  : indicates more data must be received to parse the message
+-ESTRPIPE : current message should not be processed by the
+  kernel, return control of the socket to userspace which
+  can proceed to read the messages itself
+other < 0 : Error is parsing, give control back to userspace
+  assuming that synchronization is lost and the stream
+  is unrecoverable (application expected to close TCP socket)
+
+In the case that an error is returned (return value is less than
+zero) the stream parser will set the error on TCP socket and wake
+it up. If parse_msg returned -ESTRPIPE and the stream parser had
+previously read some bytes for the current message, then the error
+set on the attached socket is ENODATA since the stream is
+unrecoverable in that case.
+
+void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb);
+
+rcv_msg is called when a full message has been received and
+is queued. The callee must consume the sk_buff; it can
+call strp_pause to prevent any further messages from being
+received in rcv_msg (see strp_pause below). This callback
+must be set.
+
+The skb->cb in the input skb is a struct strp_rx_msg. This
+struct contains two fields: offset and full_len. Offset is
+where the message starts in the skb, and full_len is the
+the length of the message. skb->len - offset may be greater
+then full_len since strparser does not trim the skb.
+
+int (*read_sock_done)(struct strparser *strp, int err);
+
+ read_sock_done is called when the stream parser is done reading
+ the TCP socket. The stream parser may read multiple messages
+ in a loop and this function allows cleanup to occur when existing
+ the loop. If the callback is not set (NULL in strp_init) a
+ default function is used.
+
+void (*abort_parser)(struct strparser *strp, int err);
+
+ This function is called when stream parser encounters an error
+ in parsing. The default function stops the stream parser for the
+ TCP socket and sets the error in the socket. The default function
+ can be changed by setting the callback to non-NULL in strp_init.
+
+data_ready
+--
+
+strp_tcp_data_ready is the TCP data ready function that can be used with
+the stream parser. When the upper layer attaches the TCP socket it
+can set sk_data_ready to this function (after strp_init). sk_user_data
+must be set to the stream parser structure to use this (container_of
+can be used in the other socket callbacks to get the upper layer
+structure from the stream parser structure).
+
+Functions
+-
+
+The upper layer uses strp_pause and strp_unpause to flow control the
+stream parser. This is needed for instance when the upper layer can't
+immediately process a 

Re: linux-next: manual merge of the net-next tree with the arm-soc tree

2016-07-27 Thread Duc Dang
On Tue, Jul 26, 2016 at 7:53 PM, Stephen Rothwell  wrote:
> Hi all,
>
> Today's linux-next merge of the net-next tree got a conflict in:
>
>   arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
>
> between commit:
>
>   cafc4cd0c8b8 ("arm64: dts: apm: Use lowercase consistently for hex 
> constants")
>
> from the arm-soc tree and commit:
>
>   8e694cd2762c ("dtb: xgene: Add MDIO node")
>
> from the net-next tree.
>
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

Thanks for taking care of this, Stephen. The fix looks fine.

>
> --
> Cheers,
> Stephen Rothwell
>
> diff --cc arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
> index 21028b145d91,2e1e5daa1dc7..
> --- a/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
> +++ b/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi
> @@@ -628,9 -636,9 +636,9 @@@
> sgenet0: ethernet@1f61 {
> compatible = "apm,xgene2-sgenet";
> status = "disabled";
> -   reg = <0x0 0x1f61 0x0 0x1>,
> +   reg = <0x0 0x1f61 0x0 0xd100>,
>  -<0x0 0x1f60 0x0 0Xd100>,
>  -<0x0 0x2000 0x0 0X2>;
>  +<0x0 0x1f60 0x0 0xd100>,
>  +<0x0 0x2000 0x0 0x2>;
> interrupts = <0 96 4>,
>  <0 97 4>;
> dma-coherent;
Regards,
Duc Dang.


Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV

2016-07-27 Thread Trond Myklebust
Hi Eric,

> On Jul 27, 2016, at 14:59, Eric Dumazet  wrote:
> 
> On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote:
>> On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote:
>>> 
 On Jul 26, 2016, at 11:43, J. Bruce Fields  wrote:
 
 On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote:
> We're seeing traces of the following form:
> 
> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2
> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80
> [10952.396362] nfsd: connect from 10.2.6.1, port=187
> [10952.396364] svc: svc_setup_socket 8800b99bcf00
> [10952.396368] setting up TCP socket for reading
> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet 
> 88042b75b800)
> [10952.396373] svc: transport 8803eb10a000 put into queue
> [10952.396375] svc: transport 88042ba4a000 put into queue
> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = 
> 360)
> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2
> [10952.396381] svc_recv: found XPT_CLOSE
> [10952.396397] svc: svc_delete_xprt(8803eb10a000)
> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000)
> [10952.396399] svc: svc_sock_detach(8803eb10a000)
> [10952.396412] svc: svc_sock_free(8803eb10a000)
> 
> i.e. an immediate close of the socket after initialisation.
 
 Interesting, thanks!
 
 So the one thing I don't understand is why this is correct behavior for
 accept--I thought it wasn't supposed to return a socket until it was
 fully established.
>>> 
>>> inet_accept() appears to allow SYN_RECV:
>> 
>> OK.  Cc'ing netdev just to make sure we didn't overlook anything.
>> 
> 
> SYN_RECV after accept() is a TCP Fast Open property I think.
> 
> Maybe you are playing with some global TCP Fast Open settings ?
> 

The Linux kernel client should not be using TCP fast open, but it is possible 
that some of the other NFSv3 clients we’re using are.
Would a standard knfsd listener respond to a TCP fast open request, or would 
the default behaviour be to ignore it?

If the default behaviour for the server is to allow fast open, then we do need 
these patches, IMO.

> 
>> (Also: what were user-visible symptoms?  Mounts failing, or unexpected
>> delays?)
>> 

Connection retry storms on the server.

>> --b.
>> 
>>> 
>>> int inet_accept(struct socket *sock, struct socket *newsock, int flags)
>>> {
>>>struct sock *sk1 = sock->sk;
>>>int err = -EINVAL;
>>>struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, );
>>> 
>>>if (!sk2)
>>>goto do_err;
>>> 
>>>lock_sock(sk2);
>>> 
>>>sock_rps_record_flow(sk2);
>>>WARN_ON(!((1 << sk2->sk_state) &
>>>  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>>>  TCPF_CLOSE_WAIT | TCPF_CLOSE)));
>>> 
>>>sock_graft(sk2, newsock);
>>> 
>>>newsock->state = SS_CONNECTED;
>>>err = 0;
>>>release_sock(sk2);
>>> do_err:
>>>return err;
>>> }
> 
> 



Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 18:19 +, Brandon Cazander wrote:
> [1.] One line summary of the problem:
> Using TPROXY together with a DNAT rule (working on older kernels) fails to 
> work on newer kernels as of commit 079096f103fa
> 
> [2.] Full description of the problem/report:
> I performed a git bisect using a qemu image to test my example below, and the 
> bisect ended at this commit:
> 
> > commit 079096f103faca2dd87342cca6f23d4b34da8871
> > Author: Eric Dumazet 
> > Date:   Fri Oct 2 11:43:32 2015 -0700
> > 
> > tcp/dccp: install syn_recv requests into ehash table
> 
> [3.] Keywords: networking
> 
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Everything as of commit 079096f103fa (tested up to 4.5.0)
> 
> [4.2.] Kernel .config file:
> When performing the bisect, I built with make oldconfig. Let me know if you 
> want the whole .config file.
> 
> [5.] Most recent kernel version which did not have the bug:
> Any kernel that I built prior to commit 
> 079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue.
> 
> [6.] no Oops
> 
> [7.] A small shell script or example program which triggers the
>  problem (if possible)
> 
> I have produced what I hope is a minimal example, using the instructions for 
> TPROXY from 
> http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an 
> example transparent TCP proxy written in C that I found at 
> https://github.com/kristrev/tproxy-example.
> 
> * I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and 
> 192.168.30.2/24 on eth1. This is running the tproxy-example program, with the 
> following rules:
> iptables -t mangle -N DIVERT
> iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
> iptables -t mangle -A DIVERT -j MARK --set-mark 1
> iptables -t mangle -A DIVERT -j ACCEPT
> iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY 
> --tproxy-mark 0x1/0x1 --on-port 9876
> iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest 
> 192.168.30.1
> ip rule add fwmark 1 lookup 100
> ip route add local 0.0.0.0/0 dev lo table 100
> 
> * There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on 
> port 8080.
> 
> * My workstation is at 10.100.0.206, and I have a static route for both 
> 192.168.30.2 and 42.0.1.1 via 10.100.0.164.
> 
> * Making a curl request to 192.168.30.2:8080 hits the transparent proxy and 
> works in both GOOD (before the aforementioned commit) kernel, and BAD (at the 
> commit or later) kernel.
> 
> * Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works 
> in GOOD kernel but in BAD kernel I get:
> "curl: (56) Recv failure: Connection reset by peer"
> 
> * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel 
> shows:
> root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
> bytes
> 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 
> 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 
> 0,nop,wscale 7], length 0
> 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, 
> win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
> 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 
> 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
> 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
> 4042636217, win 0, length 0
> 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0
> 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
> 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 
> ecr 632068656,nop,wscale 7], length 0
> 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
> 3793582217, win 0, length 0
> 
> * A tcpdump on a GOOD kernel shows:
> root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
> bytes
> 16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq 
> 3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr 

Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote:
> On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote:
> > 
> > > On Jul 26, 2016, at 11:43, J. Bruce Fields  wrote:
> > > 
> > > On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote:
> > >> We're seeing traces of the following form:
> > >> 
> > >> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2
> > >> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80
> > >> [10952.396362] nfsd: connect from 10.2.6.1, port=187
> > >> [10952.396364] svc: svc_setup_socket 8800b99bcf00
> > >> [10952.396368] setting up TCP socket for reading
> > >> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet 
> > >> 88042b75b800)
> > >> [10952.396373] svc: transport 8803eb10a000 put into queue
> > >> [10952.396375] svc: transport 88042ba4a000 put into queue
> > >> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = 
> > >> 360)
> > >> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2
> > >> [10952.396381] svc_recv: found XPT_CLOSE
> > >> [10952.396397] svc: svc_delete_xprt(8803eb10a000)
> > >> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000)
> > >> [10952.396399] svc: svc_sock_detach(8803eb10a000)
> > >> [10952.396412] svc: svc_sock_free(8803eb10a000)
> > >> 
> > >> i.e. an immediate close of the socket after initialisation.
> > > 
> > > Interesting, thanks!
> > > 
> > > So the one thing I don't understand is why this is correct behavior for
> > > accept--I thought it wasn't supposed to return a socket until it was
> > > fully established.
> > 
> > inet_accept() appears to allow SYN_RECV:
> 
> OK.  Cc'ing netdev just to make sure we didn't overlook anything.
> 

SYN_RECV after accept() is a TCP Fast Open property I think.

Maybe you are playing with some global TCP Fast Open settings ?


> (Also: what were user-visible symptoms?  Mounts failing, or unexpected
> delays?)
> 
> --b.
> 
> > 
> > int inet_accept(struct socket *sock, struct socket *newsock, int flags)
> > {
> > struct sock *sk1 = sock->sk;
> > int err = -EINVAL;
> > struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, );
> > 
> > if (!sk2)
> > goto do_err;
> > 
> > lock_sock(sk2);
> > 
> > sock_rps_record_flow(sk2);
> > WARN_ON(!((1 << sk2->sk_state) &
> >   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
> >   TCPF_CLOSE_WAIT | TCPF_CLOSE)));
> > 
> > sock_graft(sk2, newsock);
> > 
> > newsock->state = SS_CONNECTED;
> > err = 0;
> > release_sock(sk2);
> > do_err:
> > return err;
> > }




PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)

2016-07-27 Thread Brandon Cazander
[1.] One line summary of the problem:
Using TPROXY together with a DNAT rule (working on older kernels) fails to work 
on newer kernels as of commit 079096f103fa

[2.] Full description of the problem/report:
I performed a git bisect using a qemu image to test my example below, and the 
bisect ended at this commit:

> commit 079096f103faca2dd87342cca6f23d4b34da8871
> Author: Eric Dumazet 
> Date:   Fri Oct 2 11:43:32 2015 -0700
> 
> tcp/dccp: install syn_recv requests into ehash table

[3.] Keywords: networking

[4.] Kernel information
[4.1.] Kernel version (from /proc/version):
Everything as of commit 079096f103fa (tested up to 4.5.0)

[4.2.] Kernel .config file:
When performing the bisect, I built with make oldconfig. Let me know if you 
want the whole .config file.

[5.] Most recent kernel version which did not have the bug:
Any kernel that I built prior to commit 
079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue.

[6.] no Oops

[7.] A small shell script or example program which triggers the
 problem (if possible)

I have produced what I hope is a minimal example, using the instructions for 
TPROXY from 
http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an 
example transparent TCP proxy written in C that I found at 
https://github.com/kristrev/tproxy-example.

* I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and 192.168.30.2/24 
on eth1. This is running the tproxy-example program, with the following rules:
iptables -t mangle -N DIVERT
iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT
iptables -t mangle -A DIVERT -j MARK --set-mark 1
iptables -t mangle -A DIVERT -j ACCEPT
iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY 
--tproxy-mark 0x1/0x1 --on-port 9876
iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest 
192.168.30.1
ip rule add fwmark 1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

* There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on 
port 8080.

* My workstation is at 10.100.0.206, and I have a static route for both 
192.168.30.2 and 42.0.1.1 via 10.100.0.164.

* Making a curl request to 192.168.30.2:8080 hits the transparent proxy and 
works in both GOOD (before the aforementioned commit) kernel, and BAD (at the 
commit or later) kernel.

* Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works 
in GOOD kernel but in BAD kernel I get:
"curl: (56) Recv failure: Connection reset by peer"

* When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel 
shows:
root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
bytes
16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 
3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 
0,nop,wscale 7], length 0
16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 
ecr 632068656,nop,wscale 7], length 0
16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, 
win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0
16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
4042636217, win 0, length 0
16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 
1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77
16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 
4042636217, win 0, length 0
16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 
ecr 632068656,nop,wscale 7], length 0
16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
3793582217, win 0, length 0
16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 
4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 
ecr 632068656,nop,wscale 7], length 0
16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 
3793582217, win 0, length 0

* A tcpdump on a GOOD kernel shows:
root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 
bytes
16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq 
3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr 
0,nop,wscale 7], length 0
16:44:18.364571 IP 42.0.1.1.8080 > 10.100.0.206.35996: Flags [S.], seq 
4117262662, ack 3963646693, win 14480, options [mss 1460,sackOK,TS val 
4294903654 ecr 632175966,nop,wscale 7], length 0
16:44:18.364819 IP 10.100.0.206.35996 > 

Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Eric Dumazet
On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote:
> On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:
> >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:
> >> Hi Eric,
> >>
> >> It works!
> >>
> >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
> >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
> >> >> Greetings,
> >> >>
> >> >> This BUG message can be found in recent kernels as well as v4.4 and
> >> >> linux-stable. It happens when running
> >> >>
> >> >> modprobe netconsole netconsole=@/,$port@$server/
> >> >>
> >> >> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 
> >> >> 192.168.1.1 offset -673.833841 sec
> >> >> [   39.943285] netpoll: netconsole: local port 6665
> >> >> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
> >> >> [   39.943609] netpoll: netconsole: interface 'eth0'
> >> >> [   39.943756] netpoll: netconsole: remote port 6672
> >> >> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
> >> >> [   39.944099] netpoll: netconsole: remote ethernet address 
> >> >> ff:ff:ff:ff:ff:ff
> >> >> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
> >> >> [   39.944514] BUG: sleeping function called from invalid context at 
> >> >> kernel/irq/manage.c:110
> >> >> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: 
> >> >> modprobe
> >> >> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
> >> >> 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
> >> >> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
> >> >> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
> >> >> [   39.944522]   c90001f2f9e8 813417d9 
> >> >> 88007faba5c0
> >> >> [   39.944524]  006e c90001f2fa00 810aec03 
> >> >> 81a25948
> >> >> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
> >> >> 8803e50fbd68
> >> >> [   39.944526] Call Trace:
> >> >> [   39.944533]  [] dump_stack+0x63/0x8a
> >> >> [   39.944536]  [] ___might_sleep+0xd3/0x120
> >> >> [   39.944537]  [] __might_sleep+0x4a/0x80
> >> >> [   39.944541]  [] synchronize_irq+0x38/0xa0
> >> >> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
> >> >> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
> >> >> [   39.944547]  [] disable_irq+0x1c/0x20
> >> >> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
> >> >> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
> >> >> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
> >> >> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
> >> >> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
> >> >> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
> >> >> [   39.944578]  [] call_console_drivers+0x115/0x120
> >> >> [   39.944580]  [] console_unlock+0x333/0x5c0
> >> >> [   39.944583]  [] register_console+0x1c4/0x380
> >> >> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
> >> >> [netconsole]
> >> >> [   39.944588]  [] ? 0xa004f000
> >> >> [   39.944591]  [] do_one_initcall+0x3d/0x150
> >> >> [   39.944592]  [] ? __might_sleep+0x4a/0x80
> >> >> [   39.944596]  [] ? 
> >> >> kmem_cache_alloc_trace+0x188/0x1e0
> >> >> [   39.944598]  [] do_init_module+0x5f/0x1d8
> >> >> [   39.944602]  [] load_module+0x1429/0x1b40
> >> >> [   39.944604]  [] ? __symbol_put+0x40/0x40
> >> >> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
> >> >> [   39.944608]  [] ? 
> >> >> kernel_read_file_from_fd+0x49/0x80
> >> >> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
> >> >> [   39.944614]  [] SyS_finit_module+0xe/0x10
> >> >> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
> >> >> [   39.946384] console [netcon0] enabled
> >> >> [   39.946514] netconsole: network logging started
> >> >>
> >> >> Can this be possibly fixed?
> >> >
> >> >Could you try this ?
> >> >
> >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
> >> >b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >index 
> >> >f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b
> >> > 100644
> >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
> >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device 
> >> >*netdev)
> >> > {
> >> >  struct e1000_adapter *adapter = netdev_priv(netdev);
> >> >
> >> >- disable_irq(adapter->pdev->irq);
> >> >- e1000_intr(adapter->pdev->irq, netdev);
> >> >- enable_irq(adapter->pdev->irq);
> >> >+ if (napi_schedule_prep(>napi)) {
> >> >+ adapter->total_tx_bytes = 0;
> >> >+ adapter->total_tx_packets = 0;
> >> >+ adapter->total_rx_bytes = 0;
> >> >+ adapter->total_rx_packets = 0;
> >> >+ __napi_schedule(>napi);
> >> >+ }
> >>
> >> The machines are actually running e1000e driver, so I copied your
> >> approach to e1000e and it works:
> >>
> >> kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
> >> kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 

Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV

2016-07-27 Thread Fields Bruce James
On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote:
> 
> > On Jul 26, 2016, at 11:43, J. Bruce Fields  wrote:
> > 
> > On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote:
> >> We're seeing traces of the following form:
> >> 
> >> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2
> >> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80
> >> [10952.396362] nfsd: connect from 10.2.6.1, port=187
> >> [10952.396364] svc: svc_setup_socket 8800b99bcf00
> >> [10952.396368] setting up TCP socket for reading
> >> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet 
> >> 88042b75b800)
> >> [10952.396373] svc: transport 8803eb10a000 put into queue
> >> [10952.396375] svc: transport 88042ba4a000 put into queue
> >> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = 360)
> >> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2
> >> [10952.396381] svc_recv: found XPT_CLOSE
> >> [10952.396397] svc: svc_delete_xprt(8803eb10a000)
> >> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000)
> >> [10952.396399] svc: svc_sock_detach(8803eb10a000)
> >> [10952.396412] svc: svc_sock_free(8803eb10a000)
> >> 
> >> i.e. an immediate close of the socket after initialisation.
> > 
> > Interesting, thanks!
> > 
> > So the one thing I don't understand is why this is correct behavior for
> > accept--I thought it wasn't supposed to return a socket until it was
> > fully established.
> 
> inet_accept() appears to allow SYN_RECV:

OK.  Cc'ing netdev just to make sure we didn't overlook anything.

(Also: what were user-visible symptoms?  Mounts failing, or unexpected
delays?)

--b.

> 
> int inet_accept(struct socket *sock, struct socket *newsock, int flags)
> {
> struct sock *sk1 = sock->sk;
> int err = -EINVAL;
> struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, );
> 
> if (!sk2)
> goto do_err;
> 
> lock_sock(sk2);
> 
> sock_rps_record_flow(sk2);
> WARN_ON(!((1 << sk2->sk_state) &
>   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>   TCPF_CLOSE_WAIT | TCPF_CLOSE)));
> 
> sock_graft(sk2, newsock);
> 
> newsock->state = SS_CONNECTED;
> err = 0;
> release_sock(sk2);
> do_err:
> return err;
> }


[PATCH net] 8139too:fix system hang when there is a tx timeout event.

2016-07-27 Thread Chunhao Lin
If tx timeout event occur, kernel will call rtl8139_tx_timeout_task() to reset
hardware. But in this function, driver does not stop tx and rx function before
reset hardware, that will cause system hang.

In this patch, add stop tx and rx function before reset hardware.

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/8139too.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/realtek/8139too.c 
b/drivers/net/ethernet/realtek/8139too.c
index ef668d3..0d77f3c 100644
--- a/drivers/net/ethernet/realtek/8139too.c
+++ b/drivers/net/ethernet/realtek/8139too.c
@@ -1667,6 +1667,10 @@ static void rtl8139_tx_timeout_task (struct work_struct 
*work)
int i;
u8 tmp8;
 
+   napi_disable(>napi);
+   netif_stop_queue(dev);
+   synchronize_sched();
+
netdev_dbg(dev, "Transmit timeout, status %02x %04x %04x media %02x\n",
   RTL_R8(ChipCmd), RTL_R16(IntrStatus),
   RTL_R16(IntrMask), RTL_R8(MediaStatus));
@@ -1696,10 +1700,10 @@ static void rtl8139_tx_timeout_task (struct work_struct 
*work)
spin_unlock_irq(>lock);
 
/* ...and finally, reset everything */
-   if (netif_running(dev)) {
-   rtl8139_hw_start (dev);
-   netif_wake_queue (dev);
-   }
+   napi_enable(>napi);
+   rtl8139_hw_start (dev);
+   netif_wake_queue (dev);
+
spin_unlock_bh(>rx_lock);
 }
 
-- 
1.9.1



Re: [iproute PATCH 0/3] improve MACsec support

2016-07-27 Thread Stephen Hemminger
On Tue, 26 Jul 2016 15:44:39 +0200
Sabrina Dubroca  wrote:

> 2016-07-26, 11:03:17 +0200, Davide Caratti wrote:
> > parsing of 'cipher' and 'icvlen' arguments has been improved; while at it,
> > a couple of missing printouts have been added to usage() functions in 
> > "ip addr help" and "ip link help". Finally, some errors in the man pages
> > have been fixed.
> > 
> > Signed-off-by: Davide Caratti 
> > 
> > Davide Caratti (3):
> >   man: macsec: fix macsec related typos
> >   ip {link,address}: add 'macsec' item to TYPE list
> >   macsec: cipher and icvlen can be set separately
> > 
> >  ip/ipaddress.c   |  2 +-
> >  ip/iplink.c  |  2 +-
> >  ip/ipmacsec.c| 52 
> > 
> >  man/man8/ip-address.8.in |  3 ++-
> >  man/man8/ip-link.8.in| 17 +++-
> >  man/man8/ip-macsec.8 | 13 
> >  6 files changed, 42 insertions(+), 47 deletions(-)  
> 
> Acked-by: Sabrina Dubroca 
> 
> Stephen, can this go in before the 4.7 release?  This fixes parsing
> bugs and docs, it would be nice to have this.
> 
> 
> Thanks,
> 

Will try, still have a couple open things for 4.7


Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit

2016-07-27 Thread Ivan Khoronzhuk



On 27.07.16 17:08, Grygorii Strashko wrote:

On 07/27/2016 10:12 AM, Uwe Kleine-König wrote:

Hello,

On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:

On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:

Allocating and preparing a dma descriptor doesn't need to happen under
the channel's lock. So do this before taking the channel's lock. The only
down side is that the dma descriptor might be allocated even though the
channel is about to be stopped. This is unlikely though.

Signed-off-by: Uwe Kleine-König 
---
 drivers/net/ethernet/ti/davinci_cpdma.c | 38 +
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
b/drivers/net/ethernet/ti/davinci_cpdma.c
index 5ffa04a306c6..ba3462707ae3 100644
--- a/drivers/net/ethernet/ti/davinci_cpdma.c
+++ b/drivers/net/ethernet/ti/davinci_cpdma.c
@@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void 
*token, void *data,
u32 mode;
int ret = 0;

-   spin_lock_irqsave(>lock, flags);
-
-   if (chan->state == CPDMA_STATE_TEARDOWN) {
-   ret = -EINVAL;
-   goto unlock_ret;
-   }
-
-   if (chan->count >= chan->desc_num) {
-   chan->stats.desc_alloc_fail++;
-   ret = -ENOMEM;
-   goto unlock_ret;
-   }


I'm not sure this is right thing to do. This check is expected to be strict
and means "channel has exhausted the available descriptors, so further descs 
allocation does not allowed".


I developed this patch basing on a 4.4 kernel which doesn't have
742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
my patch is more obviously correct. As currently chan->count is
protected by chan->lock we must hold the lock for this check. If a
failing check means we must not call cpdma_desc_alloc in the first
place, that's bad.

The chan->count is not only case where this lock is needed unfortunately.
I like the idea to remove a bunch of locks from here (I was wondering why it 
needs to have
so much locks when using h/w queues, but this is the style driver is written 
though)
This lock is also needed to cover stats counters at least.
In case of cpsw driver, that uses cpdam_chan, the same channel can be shared
between two emacs (in dual emac mode) then the lock is needed for every chan 
var.
So that's not rare case. In general, the optimization of cpdma is good idea,
but seems it can require much more changes.



Yes. That's intention of this check :(
Now it'll work as following for two (rx/tx) channels, as example
RX desc_num = 16 (max allowed number of descriptors)
TX desc_num = 16 (max allowed number of descriptors)
and with current code number of allocated descriptors will never exceed 16.

with your change, in corner case when TX channel's already utilized 16 
descriptors the
following will happen:
cpdma_chan_submit()
 - cpdma_desc_alloc() -  will allocate 17th desc
 - lock
 - check for chan->count - fail
 - unlock
 - cpdma_desc_free()

so your patch will add additional desc_alloc/desc_free in the above corner case
and that's what i'm worry about (TEARDOWN seems ok) especially taking into 
account
further multi-queue feature development.

Above corner case seems might happen very rare, because of the guard check in 
cpsw_ndo_start_xmit(),
but it could.



But I'm not sure this is the case here. After all cpdma_desc_alloc
doesn't do anything relevant for the hardware, right?


Right.

Thanks. I'd try to do some measurement also.



--
Regards,
Ivan Khoronzhuk


Re: [PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite

2016-07-27 Thread Jesper Dangaard Brouer
On Tue, 19 Jul 2016 15:05:37 -0700
Alexei Starovoitov  wrote:

> On Tue, Jul 19, 2016 at 12:16:57PM -0700, Brenden Blanco wrote:
> > Add a sample that rewrites and forwards packets out on the same
> > interface. Observed single core forwarding performance of ~10Mpps.
> > 
> > Since the mlx4 driver under test recycles every single packet page, the
> > perf output shows almost exclusively just the ring management and bpf
> > program work. Slowdowns are likely occurring due to cache misses.  
> 
> long term we need to resurrect your prefetch patch. sux to leave
> so much performance on the table.

I will do some (more) attempts at getting prefetching working, also in
a way that benefit the normal stack (when I'm back from vacation, and
when net-next opens again).

I think Rana is also looking into this for the mlx5 driver (based on
some patches I send to her).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread alexmcwhirter
Just to add some more information to this, the corruption seems to 
effect ssh as well.


Using a sun hme interface, occasionally upon an ssh connection it will 
refuse to authenticate a client with either password or cert 
authentication. Using wireshark to capture and decrypt the packets 
between the two machines, the data coming from the server seems good, 
but the data received by the server from the client is essentially 
garbage. Note that the client is sending valid data, but the server is 
corrupting it upon receipt. Closing the connection and starting a new 
one will remedy the login issue, but you do occasionally see corruption 
on the server side sporadically.


So far this only seems to occur on incoming data, outgoing data seems 
fine.


Re: [PATCH net-next V2 5/5] net/mlx4: Add VST QinQ support

2016-07-27 Thread John Fastabend
On 16-07-26 07:23 AM, Tariq Toukan wrote:
> Hi John,
> 
> Currently we support one tag set to a VF by the administrator, with the
> new interface can be either C-VLAN or S-VLAN.
> Once S-VLAN is set by administrator, the VM can set the C-VLANs it wants
> to use.
> 
> S-VLAN + C-VLAN per vf currently is not supported, but can be a future
> use case.
> Command line in such case should add more tuples of {vlan, qos, proto}
> such as:
> ip link set dev ens2 vf 0 vlan 100  qos 2 proto 802.1ad vlan 50 qos 4
> proto 802.1q
> 

yep looks good to me.

> As we currently don't support it, we will change the new
> IFLA_VF_VLAN_INFO struct to include a list of such tuples and currently
> use only the first one.
> Is that what you meant?
> 

Yep that was exactly my suggestion. Because I know hardware that
supports the double tag mode and as soon as someone sees they can
change the proto they will ask me to support double tags.

There are two approaches to this either make the struct understand
a list of tags or build it into the netlink message using netlink
TLV constructs. I'm OK with either approach take a look at the code
and see how it looks.

Thanks for working on this.

> Regards,
> Tariq and Moshe
> 



Forwarded Message

2016-07-27 Thread Internet Network
USA Internet Command Protocol 2016 has selected your e-mail ID as lucky winner 
of US$800,000.00 out of US$350,000.000.00 shared worldwide for 100.000 e-mail 
subscribers.
Forward your lucky number USAL77720 to the mandated payment center in REP'Du 
Benin by E-mail : me...@yahoo.com ,CC: 
q112233324-+1+2+800+1+2+800+1+2+800+$$$u.s.a-corporate.age...@yahoo.com   for 
payment.

Authorized Delivery


Re: [v3] UCC_GETH/UCC_FAST: Use IS_ERR_VALUE_U32 API to avoid IS_ERR_VALUE abuses.

2016-07-27 Thread arvind Yadav

I am also agree with Arnd Bergmann. We should use 'static inline function'
instead of macro to deal with error check.

On Tuesday 26 July 2016 05:09 PM, Arnd Bergmann wrote:

On Saturday, July 23, 2016 11:35:51 PM CEST Arvind Yadav wrote:

diff --git a/include/linux/err.h b/include/linux/err.h
index 1e35588..a42f942 100644
--- a/include/linux/err.h
+++ b/include/linux/err.h
@@ -19,6 +19,7 @@
  #ifndef __ASSEMBLY__
  
  #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

+#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(x) >= (unsigned 
int)-MAX_ERRNO)
  
  static inline void * __must_check ERR_PTR(long error)

  {

This doesn't really look like something we want to have as a generic
interface. The IS_ERR_VALUE() API is rather awkward already, and your
use seems specific to the cpu_muram_alloc() function.

How about something like

int cpm_muram_error(unsigned long addr)
{
if (addr >= (unsigned long)-MAX_ERRNO)
return addr;
else
return 0;
}

and then use that to check the value returned by the allocation
that is still an 'unsigned long', before assigning it to a 'u32'.

Arnd




[PATCH net 0/3] r8169:fix 3 runtime pm related issues.

2016-07-27 Thread Chunhao Lin
This series of patches fix 3 runtime pm related issues that are listed below.

Chunhao Lin (3):
  r8169:fix kernel log spam when set or get hardware wol setting.
  r8169:add checking driver's runtime pm status in rtl8169_get_ethtool_stats()
  r8169:fix nic may not work after changing the mac address.

 drivers/net/ethernet/realtek/r8169.c | 37 
 1 file changed, 33 insertions(+), 4 deletions(-)

-- 
1.9.1



[PATCH net 3/3] r8169:fix nic may not work after changing the mac address.

2016-07-27 Thread Chunhao Lin
When there is no AC power, NIC may not work after changing the mac address.
Please refer to following link.
http://www.spinics.net/lists/netdev/msg356572.html

This issue is caused by runtime power management. When there is no AC power,
if we put NIC down (ifconfig down), the driver will in runtime suspend state
and hardware will be put into D3 state. During this time, driver cannot access
hardware regisers. So if you set new mac address during this time, it will not
work. And then, after resume, the NIC will keep using the old mac address and
the network will not work normally.

In this patch I add detecting runtime pm status when setting mac address.
If driver is in runtime suspend state, it will skip setting mac address and
set the new mac address during runtime resume.

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 4a17342..ff1611e 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -4480,6 +4480,7 @@ static void rtl_rar_set(struct rtl8169_private *tp, u8 
*addr)
 static int rtl_set_mac_address(struct net_device *dev, void *p)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct pci_dev *pdev = tp->pci_dev;
struct sockaddr *addr = p;
 
if (!is_valid_ether_addr(addr->sa_data))
@@ -4487,7 +4488,12 @@ static int rtl_set_mac_address(struct net_device *dev, 
void *p)
 
memcpy(dev->dev_addr, addr->sa_data, dev->addr_len);
 
-   rtl_rar_set(tp, dev->dev_addr);
+   pm_runtime_get_noresume(>dev);
+
+   if (pm_runtime_active(>dev))
+   rtl_rar_set(tp, dev->dev_addr);
+
+   pm_runtime_put_noidle(>dev);
 
return 0;
 }
@@ -7890,6 +7896,7 @@ static int rtl8169_runtime_resume(struct device *device)
struct pci_dev *pdev = to_pci_dev(device);
struct net_device *dev = pci_get_drvdata(pdev);
struct rtl8169_private *tp = netdev_priv(dev);
+   rtl_rar_set(tp, dev->dev_addr);
 
if (!tp->TxDescArray)
return 0;
-- 
1.9.1



[PATCH net 2/3] r8169:add checking driver's runtime pm status in rtl8169_get_ethtool_stats()

2016-07-27 Thread Chunhao Lin
Not to call rtl8169_update_counters() to dump tally counter when driver is in
runtime suspend state.

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index f07604f..4a17342 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -2308,11 +2308,17 @@ static void rtl8169_get_ethtool_stats(struct net_device 
*dev,
  struct ethtool_stats *stats, u64 *data)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct pci_dev *pdev = tp->pci_dev;
struct rtl8169_counters *counters = tp->counters;
 
ASSERT_RTNL();
 
-   rtl8169_update_counters(dev);
+   pm_runtime_get_noresume(>dev);
+
+   if (pm_runtime_active(>dev))
+   rtl8169_update_counters(dev);
+
+   pm_runtime_put_noidle(>dev);
 
data[0] = le64_to_cpu(counters->tx_packets);
data[1] = le64_to_cpu(counters->rx_packets);
-- 
1.9.1



[PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.

2016-07-27 Thread Chunhao Lin
NIC will be put into D3 state during runtime suspend state.
When set or get hardware wol setting, driver will write or read hardware
register. If we set or get hardware wol setting in runtime suspend state,
because NIC will in D3 state, the register value read by driver will return all
0xff. That will let driver thinking register flag is not toggled and then prints
the warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)" to kernel
log.

For fixing this issue, add checking driver's pm runtime status in
rtl8169_get_wol() and rtl8169_set_wol().

Signed-off-by: Chunhao Lin 
---
 drivers/net/ethernet/realtek/r8169.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 0e62d74..f07604f 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private *tp)
 static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo 
*wol)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct pci_dev *pdev = tp->pci_dev;
+
+   pm_runtime_get_noresume(>dev);
 
rtl_lock_work(tp);
 
wol->supported = WAKE_ANY;
-   wol->wolopts = __rtl8169_get_wol(tp);
+   if (pm_runtime_active(>dev))
+   wol->wolopts = __rtl8169_get_wol(tp);
+   else
+   wol->wolopts = tp->saved_wolopts;
 
rtl_unlock_work(tp);
+
+   pm_runtime_put_noidle(>dev);
 }
 
 static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts)
@@ -1845,6 +1853,9 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, 
u32 wolopts)
 static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
 {
struct rtl8169_private *tp = netdev_priv(dev);
+   struct pci_dev *pdev = tp->pci_dev;
+
+   pm_runtime_get_noresume(>dev);
 
rtl_lock_work(tp);
 
@@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, 
struct ethtool_wolinfo *wol)
tp->features |= RTL_FEATURE_WOL;
else
tp->features &= ~RTL_FEATURE_WOL;
-   __rtl8169_set_wol(tp, wol->wolopts);
+   if (pm_runtime_active(>dev))
+   __rtl8169_set_wol(tp, wol->wolopts);
+   else
+   tp->saved_wolopts = wol->wolopts;
 
rtl_unlock_work(tp);
 
device_set_wakeup_enable(>pci_dev->dev, wol->wolopts);
 
+   pm_runtime_put_noidle(>dev);
+
return 0;
 }
 
-- 
1.9.1



Re: [PATCH net] sit: Correctly return -ENOMEM from SIOCGETPRL ioctl.

2016-07-27 Thread Cong Wang
On Wed, Jul 27, 2016 at 6:57 AM, Phil Turnbull  wrote:
> -ENOMEM is never returned because the 'out' path unconditionally sets
> 'ret' to zero. Remove the 'out' path and return directly when the
> allocation fails.
>
> Fixes: 300aaeeaab5f ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.")
> Signed-off-by: Phil Turnbull 
> ---
>  net/ipv6/sit.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 0619ac70836d..a71514040bb4 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -320,8 +320,8 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
>  */
> kp = kcalloc(ca, sizeof(*kp), GFP_ATOMIC);
> if (!kp) {
> -   ret = -ENOMEM;
> -   goto out;
> +   rcu_read_unlock();
> +   return -ENOMEM;
> }

Or even better: we don't have to do the second kcalloc()
within rcu read lock, we can move it before rcu_read_lock()?


CPU utilization with kvm / vhost, differences 3.14 / 4.4 / 4.6

2016-07-27 Thread Patrick Schaaf
Hi,

I'm stumped by a weird development in measured CPU utilization when testing an 
upgrade path from 3.14.70 to 4.4.14.

I'm running, on identical hardware (2 4-core Xeon E5420), a HA 
(active/standby) pair of firewall/loadbalancer VMs. The OS on the host and the 
VM is identical - openSUSE 13.1 userlevel, qemu 1.6.2 KVM, and kernels self-
built from vanilla sources. Inside the VM I make pretty heavy use of ipset, 
iptables, and ipvs. Traffic level is around 100 mbit/s, mostly ordinary web 
traffic, translating to around 10 kpps.

For the last X months I have been running this on 3.14.x kernels, currently 
3.14.70. As that's nearing its end of support, I aim for an upgrade to 4.4.x, 
testing with 4.4.14.

For testing, I keep the kernel _within_ the VM stable - i.e. 3.14.70, and 
upgrade only the host kernel of one of the two machines, to 4.4.14, and due to 
the weirdness I'll describe next, to 4.6.4.

What I see, and what is totally unexpected, is a severe variation in the 
system and irq time measured on the host system, and less so inside the VM.

The 3.14.70 running host shows 0.6 cores system and 0.4 cores IRQ time.

The 4.4.14 running host shows 2.3 cores system and 0.4 cores IRQ time.

The same host on 4.6.4, is again back at 0.6 cores system and 0.4 cores IRQ, 
while the guest (showing as user outside) is down from the 1 core on the 
previous to kernels, to about 0.6 cores (which I wouldn't complain about)

But my desired target kernel, 4.4.14, clearly uses about 1 1/2 cores more on 
the same load... (all other indicators and measurements I have show that the 
load served is pretty much stable over the situations I tested).

Some details on the networking setup (invariant over the tested kernels):
* host bonds 4 NICs, half on on-board BNX2 BCM5708, other half on PCIe card 
intel 82571EB hardware. The bond mode is LACP.
* host lacp bond is then member of an ordinary software bridge interface,
  which then also has the tap interface to the VM added. There is vlan
  filtering active on the bridge.
* two bridge vlans are separately broken out and member of a second layer 
bridge with an extra tap interface to my VM. Don't ask why :) but one of these 
carries about half of the traffic
* within the VM, I have another bridge with the VLANs on top and macvlan 
sprinkled in (keepalived VRRP setup on several legs)
* host/vm network is virtio, of course
* I had to disable (already some time ago, identical in all tests described 
here) TSO / GSO / UFO on the tap interfaces to my VM, to alleviate severe 
performance regressions. Different story, mentioning it just for completeness.

Regarding the host hardware, I actually have a third system, software 
identical, but with some more cores and purely on BNX2 BCM5719. The 4.4.14-
needs-lots-more-systemtime symptoms were practically the same there.

To end this tale, let me note that I have NO operational problems with the 
test using the 4.4.14 kernel, as far as one can know that within some hours of 
testing. All production metrics (and I have lots of them) are fine - except 
for that system time usage on the host system...

Anybody got a clue what may be happening?

I'm a bit reluctant to jump to 4.6.x or newer kernels, as I like the concept 
of long term stable kernels somehow... :)


best regards
  Patrick




Patch: fix rotting packet OKI PCH_GBE switch

2016-07-27 Thread Tom Walsh
This is against torvalds 4.7.0 repo
This resolves a rotting SYN-ACK packet issue with
the OKI PCH_GBE switch.  The problem showed up
intermittently where a client would attempt to
establish a TCP connection to the host on an
otherwise quiet network.  No other activity had
occurred for a significant time.

This was a race condition that was finally duplicated
using a controlled set of timings between the SYN /
ACK / SYN-ACK sequence.  We found that by adjusting
the timing between the host ACK and the client
transmission of the SYN-ACK that we had the SYN-ACK
"stuck" in the switch data register.

The problem is that pch_gbe_irq_enable() is called
when the transmit of the ACK has completed, then
pch_gbe_irq_enable() will enable the interrupts of
the switch, followed by a read of the INT_ST which
results in clearing the status bits.

While the device would still issue an interrupt,
the pch_gbe_intr() handler would not see any pending
requests as the status register had been cleared.
Thus, the packet remains "stuck".

Signed-off-by: Tom Walsh 

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 3cd87a4..baba31d 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -809,13 +809,16 @@ static void pch_gbe_irq_disable(struct
pch_gbe_adapter *adapter)
  * pch_gbe_irq_enable - Enable default interrupt generation settings
  * @adapter:  Board private structure
  */
-static void pch_gbe_irq_enable(struct pch_gbe_adapter *adapter)
+static void pch_gbe_irq_enable(struct pch_gbe_adapter *adapter,
+   bool reset_status_register)
 {
  struct pch_gbe_hw *hw = >hw;

- if (likely(atomic_dec_and_test(>irq_sem)))
+ if (likely(atomic_dec_and_test(>irq_sem))) {
  iowrite32(PCH_GBE_INT_ENABLE_MASK, >reg->INT_EN);
- ioread32(>reg->INT_ST);
+ if (reset_status_register)
+ ioread32(>reg->INT_ST);
+ }
  netdev_dbg(adapter->netdev, "INT_EN reg : 0x%08x\n",
ioread32(>reg->INT_EN));
 }
@@ -1976,7 +1979,7 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter)
  mod_timer(>watchdog_timer, jiffies);

  napi_enable(>napi);
- pch_gbe_irq_enable(adapter);
+ pch_gbe_irq_enable(adapter, true);
  netif_start_queue(adapter->netdev);

  return 0;
@@ -2392,7 +2395,7 @@ static int pch_gbe_napi_poll(struct napi_struct
*napi, int budget)

  if (poll_end_flag) {
  napi_complete(napi);
- pch_gbe_irq_enable(adapter);
+ pch_gbe_irq_enable(adapter, false);
  }

  if (adapter->rx_stop_flag) {


RE: [PATCH 15/15] ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle

2016-07-27 Thread Peter Chen
 
>On Wednesday 27 July 2016 07:50 AM, Peter Chen wrote:
>> of_node_put needs to be called when the device node which is got from
>> of_parse_phandle has finished using.
>>
>> Signed-off-by: Peter Chen 
>> ---
>>  drivers/net/ethernet/ti/davinci_emac.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/net/ethernet/ti/davinci_emac.c
>> b/drivers/net/ethernet/ti/davinci_emac.c
>> index c6c5465..d8cb9d0 100644
>> --- a/drivers/net/ethernet/ti/davinci_emac.c
>> +++ b/drivers/net/ethernet/ti/davinci_emac.c
>> @@ -1571,6 +1571,7 @@ static int emac_dev_open(struct net_device *ndev)
>>  if (priv->phy_node) {
>>  phydev = of_phy_connect(ndev, priv->phy_node,
>>  _adjust_link, 0, 0);
>> +of_node_put(priv->phy_node);
>>  if (!phydev) {
>>  dev_err(emac_dev, "could not connect to phy %s\n",
>>  priv->phy_node->full_name);
>>
>
>phy_node is accessed in case of of_phy_connect() returns error, so it has to be
>moved after the dev_err log
>

Yeah, you are right. I will change it, thanks.

Peter


Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110

2016-07-27 Thread Fengguang Wu

On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote:

On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote:

Hi Eric,

It works!

On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote:
>On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote:
>> Greetings,
>>
>> This BUG message can be found in recent kernels as well as v4.4 and
>> linux-stable. It happens when running
>>
>> modprobe netconsole netconsole=@/,$port@$server/
>>
>> [   39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 
offset -673.833841 sec
>> [   39.943285] netpoll: netconsole: local port 6665
>> [   39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0
>> [   39.943609] netpoll: netconsole: interface 'eth0'
>> [   39.943756] netpoll: netconsole: remote port 6672
>> [   39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1
>> [   39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
>> [   39.944311] netpoll: netconsole: local IP 192.168.1.193
>> [   39.944514] BUG: sleeping function called from invalid context at 
kernel/irq/manage.c:110
>> [   39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe
>> [   39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 
4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102
>> [   39.944518] Hardware name:  /DZ77BH-55K, BIOS 
BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
>> [   39.944522]   c90001f2f9e8 813417d9 
88007faba5c0
>> [   39.944524]  006e c90001f2fa00 810aec03 
81a25948
>> [   39.944525]  c90001f2fa28 810aec9a 8803e5bd9400 
8803e50fbd68
>> [   39.944526] Call Trace:
>> [   39.944533]  [] dump_stack+0x63/0x8a
>> [   39.944536]  [] ___might_sleep+0xd3/0x120
>> [   39.944537]  [] __might_sleep+0x4a/0x80
>> [   39.944541]  [] synchronize_irq+0x38/0xa0
>> [   39.944543]  [] ? __irq_put_desc_unlock+0x1e/0x40
>> [   39.944545]  [] ? __disable_irq_nosync+0x43/0x60
>> [   39.944547]  [] disable_irq+0x1c/0x20
>> [   39.944559]  [] e1000_netpoll+0xf2/0x120 [e1000e]
>> [   39.944563]  [] netpoll_poll_dev+0x5c/0x1a0
>> [   39.944567]  [] ? __kmalloc_reserve+0x31/0x90
>> [   39.944569]  [] netpoll_send_skb_on_dev+0x16b/0x250
>> [   39.944572]  [] netpoll_send_udp+0x2ec/0x450
>> [   39.944576]  [] write_msg+0xb2/0xf0 [netconsole]
>> [   39.944578]  [] call_console_drivers+0x115/0x120
>> [   39.944580]  [] console_unlock+0x333/0x5c0
>> [   39.944583]  [] register_console+0x1c4/0x380
>> [   39.944586]  [] init_netconsole+0x1c5/0x1000 
[netconsole]
>> [   39.944588]  [] ? 0xa004f000
>> [   39.944591]  [] do_one_initcall+0x3d/0x150
>> [   39.944592]  [] ? __might_sleep+0x4a/0x80
>> [   39.944596]  [] ? kmem_cache_alloc_trace+0x188/0x1e0
>> [   39.944598]  [] do_init_module+0x5f/0x1d8
>> [   39.944602]  [] load_module+0x1429/0x1b40
>> [   39.944604]  [] ? __symbol_put+0x40/0x40
>> [   39.944607]  [] ? kernel_read_file+0x178/0x1a0
>> [   39.944608]  [] ? kernel_read_file_from_fd+0x49/0x80
>> [   39.944611]  [] SYSC_finit_module+0xc3/0xf0
>> [   39.944614]  [] SyS_finit_module+0xe/0x10
>> [   39.944617]  [] entry_SYSCALL_64_fastpath+0x1a/0xa9
>> [   39.946384] console [netcon0] enabled
>> [   39.946514] netconsole: network logging started
>>
>> Can this be possibly fixed?
>
>Could you try this ?
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c 
b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index 
f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 
100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev)
> {
>struct e1000_adapter *adapter = netdev_priv(netdev);
>
>-   disable_irq(adapter->pdev->irq);
>-   e1000_intr(adapter->pdev->irq, netdev);
>-   enable_irq(adapter->pdev->irq);
>+   if (napi_schedule_prep(>napi)) {
>+   adapter->total_tx_bytes = 0;
>+   adapter->total_tx_packets = 0;
>+   adapter->total_rx_bytes = 0;
>+   adapter->total_rx_packets = 0;
>+   __napi_schedule(>napi);
>+   }

The machines are actually running e1000e driver, so I copied your
approach to e1000e and it works:

kern  :info  : [   16.109647] netpoll: netconsole: local port 6665
kern  :info  : [   16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0
kern  :info  : [   16.110346] netpoll: netconsole: interface 'eth0'
kern  :info  : [   16.110672] netpoll: netconsole: remote port 6676
kern  :info  : [   16.110991] netpoll: netconsole: remote IPv4 address 
192.168.2.1
kern  :info  : [   16.111398] netpoll: netconsole: remote ethernet address 
ff:ff:ff:ff:ff:ff
kern  :info  : [   16.111845] netpoll: netconsole: local IP 192.168.2.3
kern  :info  : [   16.114284] console [netcon0] enabled
kern  :info  : [   16.114550] netconsole: network logging started

However I'm not sure if it'll have side effects, because this
effectively 

Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Jarod Wilson
On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote:
>> This is prepatory work for an expanding list of adapter families that have 
>> occasional ~10 hour clock jumps when being used for PTP. Factor out the 
>> sanitization function and convert to using a feature (bug) flag, per 
>> suggestion from Jesse Brandeburg.
>> 
>> Littering functional code with device-specific checks is much messier than 
>> simply checking a flag, and having device-specific init set flags as needed.
>> There are probably a number of other cases in the e1000e code that 
>> could/should be converted similarly.
> 
> Looks ok to me.
> Adding Chris who asked what happens if we reach the max retry counter 
> (E1000_MAX_82574_SYSTIM_REREAD)?
> This counter is set to 50. 
> Can you, for testing purposes, decreased this value (or even set it to 0) and 
> see what happens?

Unfortunately, I don't have direct access to the affected hardware myself,
so I'd have to prep a test build, hand it off to someone and play relay. I
could do that, but it'd have some lag and possible multiple round-trips...
Anyone inside Intel have hardware handy to test on? :p

-- 
Jarod Wilson
ja...@redhat.com



Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt

2016-07-27 Thread Uwe Kleine-König
Hello,

On Wed, Jul 27, 2016 at 05:11:54PM +0300, Grygorii Strashko wrote:
> On 07/27/2016 10:03 AM, Uwe Kleine-König wrote:
> > On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote:
> >> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
> >>> Hello,
> >>>
> >>> these patches are based on next-20160726. I didn't check yet how latency
> >>> improves by using these patches, but even if the improvment is small,
> >>> it's still a good idea to have them.
> >>
> >> Sry, but how this will affect on -RT? This is not a raw locks, so
> >> they will be converted to rt-mutexes which are sleepable.
> >> Or I've missed smth?
> > 
> > They are still locks after all. On -rt I saw for the relevant
> > application:
> > 
> >   send package |
> > take lock  |
> > write pckt to hw   |
> >| rcv irq
> >|   take lock
> >| schedule
> > drop lock  | 
> >   schedule |
> >|   get pckt from hw
> >|   drop lock
> > 
> > So reducing the time a lock is taken reduces the chances that the lock
> > is contended for another thread which results in extra context switches.
> > 
> Thanks a lot for explanation. So, this is not exactly rt-latency reduction,
> but it might improve net performance on -RT. correct?

Well, it's not really rt related, but if you hit a locked lock on rt it
hurts more than on !rt. And this results in increased latency.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt

2016-07-27 Thread Grygorii Strashko
On 07/27/2016 10:03 AM, Uwe Kleine-König wrote:
> On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote:
>> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
>>> Hello,
>>>
>>> these patches are based on next-20160726. I didn't check yet how latency
>>> improves by using these patches, but even if the improvment is small,
>>> it's still a good idea to have them.
>>
>> Sry, but how this will affect on -RT? This is not a raw locks, so
>> they will be converted to rt-mutexes which are sleepable.
>> Or I've missed smth?
> 
> They are still locks after all. On -rt I saw for the relevant
> application:
> 
>   send package |
> take lock  |
> write pckt to hw   |
>| rcv irq
>  |   take lock
>  | schedule
> drop lock| 
>   schedule |
>|   get pckt from hw
>  |   drop lock
> 
> So reducing the time a lock is taken reduces the chances that the lock
> is contended for another thread which results in extra context switches.
> 
Thanks a lot for explanation. So, this is not exactly rt-latency reduction,
but it might improve net performance on -RT. correct?

Thanks.
-- 
regards,
-grygorii


RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization

2016-07-27 Thread Avargil, Raanan
This is prepatory work for an expanding list of adapter families that have 
occasional ~10 hour clock jumps when being used for PTP. Factor out the 
sanitization function and convert to using a feature (bug) flag, per suggestion 
from Jesse Brandeburg.

Littering functional code with device-specific checks is much messier than 
simply checking a flag, and having device-specific init set flags as needed.
There are probably a number of other cases in the e1000e code that could/should 
be converted similarly.

Suggested-by: Jesse Brandeburg 
CC: Jesse Brandeburg 
CC: Jeff Kirsher 
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson 
---
 drivers/net/ethernet/intel/e1000e/82571.c  |  6 ++-  
drivers/net/ethernet/intel/e1000e/e1000.h  |  1 +  
drivers/net/ethernet/intel/e1000e/netdev.c | 66 ++
 3 files changed, 44 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/82571.c 
b/drivers/net/ethernet/intel/e1000e/82571.c
index 7fd4d54..6b03c85 100644
--- a/drivers/net/ethernet/intel/e1000e/82571.c
+++ b/drivers/net/ethernet/intel/e1000e/82571.c
@@ -2032,7 +2032,8 @@ const struct e1000_info e1000_82574_info = {
  | FLAG2_DISABLE_ASPM_L0S
  | FLAG2_DISABLE_ASPM_L1
  | FLAG2_NO_DISABLE_RX
- | FLAG2_DMA_BURST,
+ | FLAG2_DMA_BURST
+ | FLAG2_CHECK_SYSTIM_OVERFLOW,
.pba= 32,
.max_hw_frame_size  = DEFAULT_JUMBO,
.get_variants   = e1000_get_variants_82571,
@@ -2053,7 +2054,8 @@ const struct e1000_info e1000_82583_info = {
  | FLAG_HAS_CTRLEXT_ON_LOAD,
.flags2 = FLAG2_DISABLE_ASPM_L0S
  | FLAG2_DISABLE_ASPM_L1
- | FLAG2_NO_DISABLE_RX,
+ | FLAG2_NO_DISABLE_RX
+ | FLAG2_CHECK_SYSTIM_OVERFLOW,
.pba= 32,
.max_hw_frame_size  = DEFAULT_JUMBO,
.get_variants   = e1000_get_variants_82571,
diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h 
b/drivers/net/ethernet/intel/e1000e/e1000.h
index ef96cd1..879cca4 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -452,6 +452,7 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, 
u32 *timinca);
 #define FLAG2_PCIM2PCI_ARBITER_WA BIT(11)
 #define FLAG2_DFLT_CRC_STRIPPING  BIT(12)
 #define FLAG2_CHECK_RX_HWTSTAMP   BIT(13)
+#define FLAG2_CHECK_SYSTIM_OVERFLOW   BIT(14)
 
 #define E1000_RX_DESC_PS(R, i) \
(&(((union e1000_rx_desc_packet_split *)((R).desc))[i])) diff --git 
a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 41f32c0..1b2df11 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4303,6 +4303,42 @@ void e1000e_reinit_locked(struct e1000_adapter *adapter) 
 }
 
 /**
+ * e1000e_sanitize_systim - sanitize raw cycle counter reads
+ * @hw: pointer to the HW structure
+ * @systim: cycle_t value read, sanitized and returned
+ *
+ * Errata for 82574/82583 possible bad bits read from SYSTIMH/L:
+ * check to see that the time is incrementing at a reasonable
+ * rate and is a multiple of incvalue.
+ **/
+static cycle_t e1000e_sanitize_systim(struct e1000_hw *hw, cycle_t 
+systim) {
+   u64 time_delta, rem, temp;
+   cycle_t systim_next;
+   u32 incvalue;
+   int i;
+
+   incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
+   for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
+   /* latch SYSTIMH on read of SYSTIML */
+   systim_next = (cycle_t)er32(SYSTIML);
+   systim_next |= (cycle_t)er32(SYSTIMH) << 32;
+
+   time_delta = systim_next - systim;
+   temp = time_delta;
+   /* VMWare users have seen incvalue of zero, don't div / 0 */
+   rem = incvalue ? do_div(temp, incvalue) : (time_delta != 0);
+
+   systim = systim_next;
+
+   if ((time_delta < E1000_82574_SYSTIM_EPSILON) && (rem == 0))
+   break;
+   }
+
+   return systim;
+}
+
+/**
  * e1000e_cyclecounter_read - read raw cycle counter (used by time counter)
  * @cc: cyclecounter structure
  **/
@@ -4312,7 +4348,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 cc);
struct e1000_hw *hw = >hw;
u32 systimel, systimeh;
-   cycle_t systim, systim_next;
+   cycle_t systim;
/* SYSTIMH 

Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit

2016-07-27 Thread Grygorii Strashko
On 07/27/2016 10:12 AM, Uwe Kleine-König wrote:
> Hello,
> 
> On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:
>> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
>>> Allocating and preparing a dma descriptor doesn't need to happen under
>>> the channel's lock. So do this before taking the channel's lock. The only
>>> down side is that the dma descriptor might be allocated even though the
>>> channel is about to be stopped. This is unlikely though.
>>>
>>> Signed-off-by: Uwe Kleine-König 
>>> ---
>>>  drivers/net/ethernet/ti/davinci_cpdma.c | 38 
>>> +
>>>  1 file changed, 20 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
>>> b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> index 5ffa04a306c6..ba3462707ae3 100644
>>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c
>>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
>>> @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void 
>>> *token, void *data,
>>> u32 mode;
>>> int ret = 0;
>>>  
>>> -   spin_lock_irqsave(>lock, flags);
>>> -
>>> -   if (chan->state == CPDMA_STATE_TEARDOWN) {
>>> -   ret = -EINVAL;
>>> -   goto unlock_ret;
>>> -   }
>>> -
>>> -   if (chan->count >= chan->desc_num)  {
>>> -   chan->stats.desc_alloc_fail++;
>>> -   ret = -ENOMEM;
>>> -   goto unlock_ret;
>>> -   }
>>
>> I'm not sure this is right thing to do. This check is expected to be strict
>> and means "channel has exhausted the available descriptors, so further descs 
>> allocation does not allowed".
> 
> I developed this patch basing on a 4.4 kernel which doesn't have
> 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
> my patch is more obviously correct. As currently chan->count is
> protected by chan->lock we must hold the lock for this check. If a
> failing check means we must not call cpdma_desc_alloc in the first
> place, that's bad.

Yes. That's intention of this check :(
Now it'll work as following for two (rx/tx) channels, as example
RX desc_num = 16 (max allowed number of descriptors)
TX desc_num = 16 (max allowed number of descriptors)
and with current code number of allocated descriptors will never exceed 16.

with your change, in corner case when TX channel's already utilized 16 
descriptors the
following will happen:
cpdma_chan_submit()
 - cpdma_desc_alloc() -  will allocate 17th desc
 - lock
 - check for chan->count - fail
 - unlock
 - cpdma_desc_free() 

so your patch will add additional desc_alloc/desc_free in the above corner case
and that's what i'm worry about (TEARDOWN seems ok) especially taking into 
account
further multi-queue feature development.

Above corner case seems might happen very rare, because of the guard check in 
cpsw_ndo_start_xmit(), 
but it could.

> 
> But I'm not sure this is the case here. After all cpdma_desc_alloc
> doesn't do anything relevant for the hardware, right?

Right.

Thanks. I'd try to do some measurement also.
-- 
regards,
-grygorii


[no subject]

2016-07-27 Thread Grace Bunyan
Greeting, Did you receive the fund letter i send you last? from Grace Bunyan


[PATCH net] sit: Correctly return -ENOMEM from SIOCGETPRL ioctl.

2016-07-27 Thread Phil Turnbull
-ENOMEM is never returned because the 'out' path unconditionally sets
'ret' to zero. Remove the 'out' path and return directly when the
allocation fails.

Fixes: 300aaeeaab5f ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.")
Signed-off-by: Phil Turnbull 
---
 net/ipv6/sit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0619ac70836d..a71514040bb4 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -320,8 +320,8 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
 */
kp = kcalloc(ca, sizeof(*kp), GFP_ATOMIC);
if (!kp) {
-   ret = -ENOMEM;
-   goto out;
+   rcu_read_unlock();
+   return -ENOMEM;
}
}
 
@@ -337,7 +337,7 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t,
if (kprl.addr != htonl(INADDR_ANY))
break;
}
-out:
+
rcu_read_unlock();
 
len = sizeof(*kp) * c;
-- 
2.9.0.rc2



Re: [net-next] neigh: allow admin to set NUD_STALE

2016-07-27 Thread Chunhui He

On Wed, 27 Jul 2016 09:56:50 +0300, Julian Anastasov  wrote:
> Admin should be able to set any state. Currently, this fails
> when lladdr is not changed and state is changed from
> NUD_CONNECTED to NUD_STALE:
> 
> ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
> ip neigh show to 192.168.8.1
> 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
> ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
> ip neigh show to 192.168.8.1
> 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
> 
> Problem may be from 2.1.X days.
> 
> Signed-off-by: Julian Anastasov 
> ---
>  net/core/neighbour.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index cf26e04c4..2ae929f 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 
> *lladdr, u8 new,
>   } else
>   goto out;
>   } else {
> - if (lladdr == neigh->ha && new == NUD_STALE)
> + if (lladdr == neigh->ha && new == NUD_STALE &&
> + !(flags & NEIGH_UPDATE_F_ADMIN))
>   new = old;
>   }
>   }

Reviewed-by: Chunhui He 



[PATCH net 6/6] qed: Prevent over-usage of vlan credits by PF

2016-07-27 Thread Yuval Mintz
Each PF/VF has a limited number of vlan filters for
configuration purposes; This information is passed to qede
and is used to prevent over-usage - once a vlan is to be
configured and no filter credit is available, the driver
would switch into working in vlan-promisc mode.

Problem is the credit pool is shared by both PFs and VFs,
and currently PFs aren't deducting the filters that are
reserved for their VFs from their quota, which may lead
to some vlan filters failing unknowingly due to lack of credit.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index ff388208..4c46d14 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -1656,6 +1656,8 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev,
info->num_tc = 1;
 
if (IS_PF(cdev)) {
+   int max_vf_vlan_filters = 0;
+
if (cdev->int_params.out.int_mode == QED_INT_MODE_MSIX) {
for_each_hwfn(cdev, i)
info->num_queues +=
@@ -1668,7 +1670,12 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev,
info->num_queues = cdev->num_hwfns;
}
 
-   info->num_vlan_filters = RESC_NUM(>hwfns[0], QED_VLAN);
+   if (IS_QED_SRIOV(cdev))
+   max_vf_vlan_filters = cdev->p_iov_info->total_vfs *
+ QED_ETH_VF_NUM_VLAN_FILTERS;
+   info->num_vlan_filters = RESC_NUM(>hwfns[0], QED_VLAN) -
+max_vf_vlan_filters;
+
ether_addr_copy(info->port_mac,
cdev->hwfns[0].hw_info.hw_mac_addr);
} else {
-- 
1.9.3



[PATCH net 5/6] qed: Correct min bandwidth for 100g

2016-07-27 Thread Yuval Mintz
Driver uses reverse logic when checking if minimum
bandwidth configuration applied, causing it to
configure the guarantee only on the first hw-function.

Fixes: a0d26d5a4fc8 ("qed*: Don't reset statistics on inner reload")
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c 
b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index 2d89e8c..9e305d5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -2089,7 +2089,7 @@ int qed_configure_vport_wfq(struct qed_dev *cdev, u16 
vp_id, u32 rate)
 
rc = __qed_configure_vport_wfq(p_hwfn, p_ptt, vp_id, rate);
 
-   if (!rc) {
+   if (rc) {
qed_ptt_release(p_hwfn, p_ptt);
return rc;
}
-- 
1.9.3



[PATCH net 4/6] qede: Reset statistics on explicit down

2016-07-27 Thread Yuval Mintz
Adding the necessary logic to prevet statistics reset
on inner-reload introduced a bug, and now statistics
are reset only when re-probing the driver.

Fixes: a0d26d5a4fc8e ("qed*: Don't reset statistics on inner reload")
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 8fe3cc1..14f79c3 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -3256,6 +3256,7 @@ static int qede_start_queues(struct qede_dev *edev, bool 
clear_stats)
start.vport_id = 0;
start.drop_ttl0 = true;
start.remove_inner_vlan = vlan_removal_en;
+   start.clear_stats = clear_stats;
 
rc = edev->ops->vport_start(cdev, );
 
-- 
1.9.3



[PATCH net 0/6] qed*: Small fixes series

2016-07-27 Thread Yuval Mintz
Hi Dave,

This contains several small [and straight-forward] fixes to qed*
drivers.

Please consider applying this to `net'.

Thanks,
Yuval

Yuval Mintz (6):
  qede: Don't try removing unconfigured vlans
  qed: Fix removal of spoof checking for VFs
  qed: Don't over-do producer cleanup for Rx
  qede: Reset statistics on explicit down
  qed: Correct min bandwidth for 100g
  qed: Prevent over-usage of vlan credits by PF

 drivers/net/ethernet/qlogic/qed/qed_dev.c|  2 +-
 drivers/net/ethernet/qlogic/qed/qed_l2.c | 13 ++---
 drivers/net/ethernet/qlogic/qed/qed_sriov.c  |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_vf.c |  4 ++--
 drivers/net/ethernet/qlogic/qede/qede_main.c | 12 
 5 files changed, 22 insertions(+), 11 deletions(-)

-- 
1.9.3



[PATCH net 3/6] qed: Don't over-do producer cleanup for Rx

2016-07-27 Thread Yuval Mintz
Before requesting the firmware to start Rx queues,
driver goes and sets the queue producer in the device to 0.
But while the producer is 32-bit, the driver currently clears 64 bits,
effectively zeroing an additional CID's producer as well.

Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_l2.c | 4 ++--
 drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c 
b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index aada4c7..ff388208 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -587,7 +587,7 @@ qed_sp_eth_rx_queue_start(struct qed_hwfn *p_hwfn,
  u16 cqe_pbl_size, void __iomem **pp_prod)
 {
struct qed_hw_cid_data *p_rx_cid;
-   u64 init_prod_val = 0;
+   u32 init_prod_val = 0;
u16 abs_l2_queue = 0;
u8 abs_stats_id = 0;
int rc;
@@ -615,7 +615,7 @@ qed_sp_eth_rx_queue_start(struct qed_hwfn *p_hwfn,
 MSTORM_PRODS_OFFSET(abs_l2_queue);
 
/* Init the rcq, rx bd and rx sge (if valid) producers to 0 */
-   __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u64),
+   __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32),
  (u32 *)(_prod_val));
 
/* Allocate a CID for the queue */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c 
b/drivers/net/ethernet/qlogic/qed/qed_vf.c
index 72e69c0..c39e26a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_vf.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c
@@ -353,7 +353,7 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn,
 
/* Learn the address of the producer from the response */
if (pp_prod) {
-   u64 init_prod_val = 0;
+   u32 init_prod_val = 0;
 
*pp_prod = (u8 __iomem *)p_hwfn->regview + resp->offset;
DP_VERBOSE(p_hwfn, QED_MSG_IOV,
@@ -361,7 +361,7 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn,
   rx_qid, *pp_prod, resp->offset);
 
/* Init the rcq, rx bd and rx sge (if valid) producers to 0 */
-   __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u64),
+   __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32),
  (u32 *)_prod_val);
}
 
-- 
1.9.3



[PATCH net 1/6] qede: Don't try removing unconfigured vlans

2016-07-27 Thread Yuval Mintz
As part of ndo_vlan_rx_kill_vid() implementation,
qede is requesting firmware to remove the vlan filter.
This currently happens even if the vlan wasn't previously
added [In case device ran out of vlan credits].

For PFs this doesn't cause any issues as the firmware
would simply ignore the removal request. But for VFs their
parent PF is holding an accounting of the configured vlans,
and such a request would cause the PF to fail the VF's
removal request.

Simply fix this for both PFs & VFs and don't remove filters
that were not previously added.

Fixes: 7c1bfcad9f3c8 ("qede: Add vlan filtering offload support")
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c 
b/drivers/net/ethernet/qlogic/qede/qede_main.c
index f8e11f9..8fe3cc1 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -2050,10 +2050,13 @@ static int qede_vlan_rx_kill_vid(struct net_device 
*dev, __be16 proto, u16 vid)
}
 
/* Remove vlan */
-   rc = qede_set_ucast_rx_vlan(edev, QED_FILTER_XCAST_TYPE_DEL, vid);
-   if (rc) {
-   DP_ERR(edev, "Failed to remove VLAN %d\n", vid);
-   return -EINVAL;
+   if (vlan->configured) {
+   rc = qede_set_ucast_rx_vlan(edev, QED_FILTER_XCAST_TYPE_DEL,
+   vid);
+   if (rc) {
+   DP_ERR(edev, "Failed to remove VLAN %d\n", vid);
+   return -EINVAL;
+   }
}
 
qede_del_vlan_from_list(edev, vlan);
-- 
1.9.3



[PATCH net 2/6] qed: Fix removal of spoof checking for VFs

2016-07-27 Thread Yuval Mintz
Driver has reverse logic for checking the result of the
spoof-checking configuration. As a result, it would log that
the configuration failed [even though it succeeded], and will
no longer do anything when requested to remove the configuration,
as it's accounting of the feature will be incorrect.

Fixes: 6ddc7608258d5 ("qed*: IOV support spoof-checking")
Signed-off-by: Yuval Mintz 
---
 drivers/net/ethernet/qlogic/qed/qed_sriov.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c 
b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
index c325ee8..875c105 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c
@@ -1296,7 +1296,7 @@ static int __qed_iov_spoofchk_set(struct qed_hwfn *p_hwfn,
params.anti_spoofing_en = val;
 
rc = qed_sp_vport_update(p_hwfn, , QED_SPQ_MODE_EBLOCK, NULL);
-   if (rc) {
+   if (!rc) {
p_vf->spoof_chk = val;
p_vf->req_spoofchk_val = p_vf->spoof_chk;
DP_VERBOSE(p_hwfn, QED_MSG_IOV,
-- 
1.9.3



Re: [PATCH 12/15] ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle

2016-07-27 Thread Sergei Shtylyov

On 7/27/2016 5:20 AM, Peter Chen wrote:


of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.

Signed-off-by: Peter Chen 


Acked-by: Sergei Shtylyov 

MBR, Sergei




Re: [PATCH 11/15] ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle

2016-07-27 Thread Sergei Shtylyov

On 7/27/2016 5:20 AM, Peter Chen wrote:


of_node_put needs to be called when the device node which is got
from of_parse_phandle has finished using.

Signed-off-by: Peter Chen 


Acked-by: Sergei Shtylyov 

MBR, Sergei



[RFC PATCH] net: flush the softnet backlog in process context

2016-07-27 Thread Paolo Abeni
Currently in process_backlog(), the process_queue dequeuing is
performed with local IRQ disabled, to protect against
flush_backlog(), which runs in hard IRQ context.

This patch moves the flush operation to a work queue and runs the
callback with bottom half disabled to protect the process_queue
against dequeuing.
Since process_queue is now always manipulated in bottom half context,
the irq disable/enable pair around the dequeue operation are removed.

To keep the flush time as low as possible, the flush
works are scheduled on all online cpu simultaneously, using the
high priority work-queue and statically allocated, per cpu,
work structs.

Overall this change increases the time required to destroy a device
to improve slightly the packets reinjection performances.

Acked-by: Hannes Frederic Sowa 
Signed-off-by: Paolo Abeni 
---
 net/core/dev.c | 72 --
 1 file changed, 50 insertions(+), 22 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 2a9c39f..9b8cfab 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4292,15 +4292,25 @@ int netif_receive_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(netif_receive_skb);
 
-/* Network device is going away, flush any packets still pending
- * Called with irqs disabled.
- */
-static void flush_backlog(void *arg)
+struct flush_work {
+   struct net_device *dev;
+   struct work_struct work;
+};
+
+DEFINE_PER_CPU(struct flush_work, flush_works);
+
+/* Network device is going away, flush any packets still pending */
+static void flush_backlog(struct work_struct *work)
 {
-   struct net_device *dev = arg;
-   struct softnet_data *sd = this_cpu_ptr(_data);
+   struct flush_work *flush = container_of(work, typeof(*flush), work);
+   struct net_device *dev = flush->dev;
struct sk_buff *skb, *tmp;
+   struct softnet_data *sd;
+
+   local_bh_disable();
+   sd = this_cpu_ptr(_data);
 
+   local_irq_disable();
rps_lock(sd);
skb_queue_walk_safe(>input_pkt_queue, skb, tmp) {
if (skb->dev == dev) {
@@ -4310,6 +4320,7 @@ static void flush_backlog(void *arg)
}
}
rps_unlock(sd);
+   local_irq_enable();
 
skb_queue_walk_safe(>process_queue, skb, tmp) {
if (skb->dev == dev) {
@@ -4318,6 +4329,27 @@ static void flush_backlog(void *arg)
input_queue_head_incr(sd);
}
}
+   local_bh_enable();
+}
+
+static void flush_all_backlogs(struct net_device *dev)
+{
+   unsigned int cpu;
+
+   get_online_cpus();
+
+   for_each_online_cpu(cpu) {
+   struct flush_work *flush = per_cpu_ptr(_works, cpu);
+
+   INIT_WORK(>work, flush_backlog);
+   flush->dev = dev;
+   queue_work_on(cpu, system_highpri_wq, >work);
+   }
+
+   for_each_online_cpu(cpu)
+   flush_work(_cpu_ptr(_works, cpu)->work);
+
+   put_online_cpus();
 }
 
 static int napi_gro_complete(struct sk_buff *skb)
@@ -4805,8 +4837,9 @@ static bool sd_has_rps_ipi_waiting(struct softnet_data 
*sd)
 
 static int process_backlog(struct napi_struct *napi, int quota)
 {
-   int work = 0;
struct softnet_data *sd = container_of(napi, struct softnet_data, 
backlog);
+   bool again = true;
+   int work = 0;
 
/* Check if we have pending ipi, its better to send them now,
 * not waiting net_rx_action() end.
@@ -4817,23 +4850,20 @@ static int process_backlog(struct napi_struct *napi, 
int quota)
}
 
napi->weight = weight_p;
-   local_irq_disable();
-   while (1) {
+   while (again) {
struct sk_buff *skb;
 
while ((skb = __skb_dequeue(>process_queue))) {
rcu_read_lock();
-   local_irq_enable();
__netif_receive_skb(skb);
rcu_read_unlock();
-   local_irq_disable();
input_queue_head_incr(sd);
-   if (++work >= quota) {
-   local_irq_enable();
+   if (++work >= quota)
return work;
-   }
+
}
 
+   local_irq_disable();
rps_lock(sd);
if (skb_queue_empty(>input_pkt_queue)) {
/*
@@ -4845,16 +4875,14 @@ static int process_backlog(struct napi_struct *napi, 
int quota)
 * and we dont need an smp_mb() memory barrier.
 */
napi->state = 0;
-   rps_unlock(sd);
-
-   break;
+   again = false;
+   } else {
+   skb_queue_splice_tail_init(>input_pkt_queue,
+  

Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread Alan Curry
Al Viro wrote:
> 
> Another thing (and if that works, it's *NOT* a proper fix - it would be
> papering over the problem, but at least it would show where to look for
> it) - try (on top of mainline) the following delta:

I tried it on top of v4.6.4 and on top of the very recent v4.7-2509-g59ebc44
from Linus, and still got corruption.

> 
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index b7de71f..0ee5995 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb,
>   if (!chunk)
>   return 0;
>  
> - if (msg_data_left(msg) < chunk) {
> + if (iov_iter_single_seg_count(>msg_iter) < chunk) {
>   if (__skb_checksum_complete(skb))
>   goto csum_error;
>   if (skb_copy_datagram_msg(skb, hlen, msg, chunk))
> 

-- 
Alan Curry


[net-next PATCH 3/4] be2net: Avoid unnecessary firmware updates of multicast list

2016-07-27 Thread Sathya Perla
From: Sriharsha Basavapatna 

Eachtime the ndo_set_rx_mode() routine is called, the driver programs the
multicast list in the adapter without checking if there are any changes to
the list. This leads to a flood of RX_FILTER cmds when a number of vlan
interfaces are configured over the device, as the ndo_ gets
called for each vlan interface. To avoid this, we now use __dev_mc_sync()
and __dev_uc_sync() API, but only to detect if there is a change in the
mc/uc lists. Now that we use this API, the code has to be-designed to
issue these API calls for each invocation of the be_set_rx_mode() call.

Signed-off-by: Sriharsha Basavapatna 
Signed-off-by: Sathya Perla 
---
 drivers/net/ethernet/emulex/benet/be.h  |   2 +
 drivers/net/ethernet/emulex/benet/be_main.c | 169 +---
 2 files changed, 130 insertions(+), 41 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 4555e04..6868712 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -573,6 +573,8 @@ struct be_adapter {
u32 uc_macs;/* Count of secondary UC MAC programmed */
unsigned long vids[BITS_TO_LONGS(VLAN_N_VID)];
u16 vlans_added;
+   bool update_uc_list;
+   bool update_mc_list;
 
u32 beacon_state;   /* for set_phys_id */
 
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 27fc1f0..4cf4997 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1420,8 +1420,8 @@ static int be_vid_config(struct be_adapter *adapter)
u16 num = 0, i = 0;
int status = 0;
 
-   /* No need to further configure vids if in promiscuous mode */
-   if (be_in_all_promisc(adapter))
+   /* No need to change the VLAN state if the I/F is in promiscuous */
+   if (adapter->netdev->flags & IFF_PROMISC)
return 0;
 
if (adapter->vlans_added > be_max_vlans(adapter))
@@ -1483,12 +1483,6 @@ static int be_vlan_rem_vid(struct net_device *netdev, 
__be16 proto, u16 vid)
return be_vid_config(adapter);
 }
 
-static void be_clear_all_promisc(struct be_adapter *adapter)
-{
-   be_cmd_rx_filter(adapter, BE_IF_FLAGS_ALL_PROMISCUOUS, OFF);
-   adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS;
-}
-
 static void be_set_all_promisc(struct be_adapter *adapter)
 {
be_cmd_rx_filter(adapter, BE_IF_FLAGS_ALL_PROMISCUOUS, ON);
@@ -1507,42 +1501,144 @@ static void be_set_mc_promisc(struct be_adapter 
*adapter)
adapter->if_flags |= BE_IF_FLAGS_MCAST_PROMISCUOUS;
 }
 
-static void be_set_mc_list(struct be_adapter *adapter)
+static void be_set_uc_promisc(struct be_adapter *adapter)
 {
int status;
 
-   status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_MULTICAST, ON);
+   if (adapter->if_flags & BE_IF_FLAGS_PROMISCUOUS)
+   return;
+
+   status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_PROMISCUOUS, ON);
if (!status)
-   adapter->if_flags &= ~BE_IF_FLAGS_MCAST_PROMISCUOUS;
-   else
+   adapter->if_flags |= BE_IF_FLAGS_PROMISCUOUS;
+}
+
+static void be_clear_uc_promisc(struct be_adapter *adapter)
+{
+   int status;
+
+   if (!(adapter->if_flags & BE_IF_FLAGS_PROMISCUOUS))
+   return;
+
+   status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_PROMISCUOUS, OFF);
+   if (!status)
+   adapter->if_flags &= ~BE_IF_FLAGS_PROMISCUOUS;
+}
+
+/* The below 2 functions are the callback args for __dev_mc_sync/dev_uc_sync().
+ * We use a single callback function for both sync and unsync. We really don't
+ * add/remove addresses through this callback. But, we use it to detect changes
+ * to the uc/mc lists. The entire uc/mc list is programmed in be_set_rx_mode().
+ */
+static int be_uc_list_update(struct net_device *netdev,
+const unsigned char *addr)
+{
+   struct be_adapter *adapter = netdev_priv(netdev);
+
+   adapter->update_uc_list = true;
+   return 0;
+}
+
+static int be_mc_list_update(struct net_device *netdev,
+const unsigned char *addr)
+{
+   struct be_adapter *adapter = netdev_priv(netdev);
+
+   adapter->update_mc_list = true;
+   return 0;
+}
+
+static void be_set_mc_list(struct be_adapter *adapter)
+{
+   struct net_device *netdev = adapter->netdev;
+   bool mc_promisc = false;
+   int status;
+
+   __dev_mc_sync(netdev, be_mc_list_update, be_mc_list_update);
+
+   if (netdev->flags & IFF_PROMISC) {
+   adapter->update_mc_list = false;
+   } else if (netdev->flags & IFF_ALLMULTI ||
+  netdev_mc_count(netdev) > be_max_mc(adapter)) {
+   /* Enable multicast promisc if num configured exceeds
+   

[net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path

2016-07-27 Thread Sathya Perla
The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with
BHs disabled. The driver currently issues the required cmds to the FW in
these contexts and polls on completions from the FW, while BHs remain
disabled.  This can cause either packet loss or packet reception to be
delayed on that CPU.

This patch defers processing of the above cmds to a separate workqueue.
With this change, FW cmds are now issued only in process context.
Now that the FW cmds are issued only in process context, they can sleep
waiting for a completion instead of polling. All the spin_lock_bh(mcc_lock)
calls are now replaced with mutex calls.

Also a new rx_filter_lock is now needed to protect the RX filtering fields
like vids[] between be_vlan_add/rem_vid() and __be_set_rx_mode() contexts.

Signed-off-by: Sathya Perla 
---
 drivers/net/ethernet/emulex/benet/be.h  |  19 +-
 drivers/net/ethernet/emulex/benet/be_cmds.c | 202 ++---
 drivers/net/ethernet/emulex/benet/be_main.c | 264 ++--
 3 files changed, 327 insertions(+), 158 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h 
b/drivers/net/ethernet/emulex/benet/be.h
index 6868712..86780b5 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -508,6 +508,10 @@ struct be_wrb_params {
u16 lso_mss;/* MSS for LSO */
 };
 
+struct be_eth_addr {
+   unsigned char mac[ETH_ALEN];
+};
+
 struct be_adapter {
struct pci_dev *pdev;
struct net_device *netdev;
@@ -523,7 +527,7 @@ struct be_adapter {
struct be_dma_mem mbox_mem_alloced;
 
struct be_mcc_obj mcc_obj;
-   spinlock_t mcc_lock;/* For serializing mcc cmds to BE card */
+   struct mutex mcc_lock;  /* For serializing mcc cmds to BE card */
spinlock_t mcc_cq_lock;
 
u16 cfg_num_rx_irqs;/* configured via set-channels */
@@ -570,11 +574,15 @@ struct be_adapter {
int if_handle;  /* Used to configure filtering */
u32 if_flags;   /* Interface filtering flags */
u32 *pmac_id;   /* MAC addr handle used by BE card */
+   struct be_eth_addr *uc_list;/* list of uc-addrs programmed (not perm) */
u32 uc_macs;/* Count of secondary UC MAC programmed */
+   struct be_eth_addr *mc_list;/* list of mcast addrs programmed */
+   u32 mc_count;
unsigned long vids[BITS_TO_LONGS(VLAN_N_VID)];
u16 vlans_added;
bool update_uc_list;
bool update_mc_list;
+   struct mutex rx_filter_lock;/* For protecting vids[] & mc/uc_list[] */
 
u32 beacon_state;   /* for set_phys_id */
 
@@ -628,6 +636,15 @@ struct be_adapter {
u8 phy_state; /* state of sfp optics (functional, faulted, etc.,) */
 };
 
+/* Used for defered FW config cmds. Add fields to this struct as reqd */
+struct be_cmd_work {
+   struct work_struct work;
+   struct be_adapter *adapter;
+   union {
+   __be16 vxlan_port;
+   } info;
+};
+
 #define be_physfn(adapter) (!adapter->virtfn)
 #define be_virtfn(adapter) (adapter->virtfn)
 #define sriov_enabled(adapter) (adapter->flags &   \
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c 
b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 2cc1175..fa11a5a 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -571,7 +571,7 @@ int be_process_mcc(struct be_adapter *adapter)
 /* Wait till no more pending mcc requests are present */
 static int be_mcc_wait_compl(struct be_adapter *adapter)
 {
-#define mcc_timeout12 /* 12s timeout */
+#define mcc_timeout12000 /* 12s timeout */
int i, status = 0;
struct be_mcc_obj *mcc_obj = >mcc_obj;
 
@@ -585,7 +585,7 @@ static int be_mcc_wait_compl(struct be_adapter *adapter)
 
if (atomic_read(_obj->q.used) == 0)
break;
-   udelay(100);
+   usleep_range(500, 1000);
}
if (i == mcc_timeout) {
dev_err(>pdev->dev, "FW not responding\n");
@@ -863,7 +863,7 @@ static bool use_mcc(struct be_adapter *adapter)
 static int be_cmd_lock(struct be_adapter *adapter)
 {
if (use_mcc(adapter)) {
-   spin_lock_bh(>mcc_lock);
+   mutex_lock(>mcc_lock);
return 0;
} else {
return mutex_lock_interruptible(>mbox_lock);
@@ -874,7 +874,7 @@ static int be_cmd_lock(struct be_adapter *adapter)
 static void be_cmd_unlock(struct be_adapter *adapter)
 {
if (use_mcc(adapter))
-   spin_unlock_bh(>mcc_lock);
+   return mutex_unlock(>mcc_lock);
else
return mutex_unlock(>mbox_lock);
 }
@@ -1044,7 +1044,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 
*mac_addr,
struct be_cmd_req_mac_query *req;

[net-next PATCH 0/4] be2net: patch set

2016-07-27 Thread Sathya Perla
Hi David, pls consider applying this patch-set to the net-next tree.

Patch 1 fixes the driver to workaournd a bug in the Lancer FW in the
vlan-config cmd processing. The FW in some cases clears the vlan-promisc
setting even if it cannot apply the vlan filter. The driver has no means
of knowing if the vlan-promisc setting has been cleared or not. This
patch now explicitly clears the vlan-promisc setting via the RX-Filter cmd
and then tries to program the vlan-list.

Patch 2 fixes the failure path in the vlan vid add code. 
The driver currently removes a new vid from the adapter->vids[] array if
be_vid_config() returns an error, which occurs when there is an error in
HW/FW. This is wrong. After the HW/FW error is recovered from, we need the
complete vids[] array to re-program the vlan list.

Patch 3 fixes the ndo_set_rx_mode() path to avoid unnecessary multicast
list updates to the FW. Each time the ndo_set_rx_mode() routine is called,
the driver programs the multicast list in the adapter without checking
if there are any changes to the list. This leads to a flood of RX_FILTER
cmds when a number of vlan interfaces are configured over the device,
as the ndo_ gets called for each vlan interface. To avoid this, we now
use __dev_mc_sync() and __dev_uc_sync() API, but only to detect if there
is a change in the mc/uc lists. Now that we use this API, the code has to
be-designed to issue these API calls for each invocation of the ndo_ call.

Patch 4 replaces polling with sleeping in the FW completion path.
The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with
BHs disabled. The driver currently issues the required cmds to the FW in
these contexts and polls on completions from the FW, while BHs remain
disabled.  This can cause either packet loss or packet reception to be
delayed on that CPU.  This patch defers processing of the above cmds to a
separate workqueue. With this change, FW cmds are now issued only in process
context. Now that the FW cmds are issued only in process context, they can
sleep waiting for a completion instead of polling.

Sathya Perla (2):
  be2net: do not remove vids from driver table if be_vid_config() fails.
  be2net: replace polling with sleeping in the FW completion path

Somnath Kotur (1):
  be2net: clear vlan-promisc setting before programming the vlan list

Sriharsha Basavapatna (1):
  be2net: Avoid unnecessary firmware updates of multicast list

 drivers/net/ethernet/emulex/benet/be.h  |  21 +-
 drivers/net/ethernet/emulex/benet/be_cmds.c | 202 ++---
 drivers/net/ethernet/emulex/benet/be_main.c | 422 ++--
 3 files changed, 450 insertions(+), 195 deletions(-)

-- 
2.4.1



[net-next PATCH 2/4] be2net: do not remove vids from driver table if be_vid_config() fails.

2016-07-27 Thread Sathya Perla
The driver currently removes a new vid from the adapter->vids[] array if
be_vid_config() returns an error, which occurs when there is an error in
HW/FW. This is wrong. After the HW/FW error is recovered from, we need the
complete vids[] array to re-program the vlan list.

Signed-off-by: Sathya Perla 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 2782299..27fc1f0 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1463,13 +1463,7 @@ static int be_vlan_add_vid(struct net_device *netdev, 
__be16 proto, u16 vid)
set_bit(vid, adapter->vids);
adapter->vlans_added++;
 
-   status = be_vid_config(adapter);
-   if (status) {
-   adapter->vlans_added--;
-   clear_bit(vid, adapter->vids);
-   }
-
-   return status;
+   return be_vid_config(adapter);
 }
 
 static int be_vlan_rem_vid(struct net_device *netdev, __be16 proto, u16 vid)
-- 
2.4.1



[net-next PATCH 1/4] be2net: clear vlan-promisc setting before programming the vlan list

2016-07-27 Thread Sathya Perla
From: Somnath Kotur 

The Lancer FW has a bug due to which in some cases vlan-promisc setting
is cleared eventhough the vlan-list programming did not succeed (via
VLAN_CONFIG) cmd. The driver has no way of knowing if the vlan-promisc
mode was cleared or not when this cmd fails. To work around this issue,
this patch first explicitly clears the vlan-promisc mode via RX_FILTER
cmd and then tries to program the vlan list.
Signed-off-by: Somnath Kotur 

Signed-off-by: Sathya Perla 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c 
b/drivers/net/ethernet/emulex/benet/be_main.c
index 1f16e73..2782299 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -1427,6 +1427,11 @@ static int be_vid_config(struct be_adapter *adapter)
if (adapter->vlans_added > be_max_vlans(adapter))
return be_set_vlan_promisc(adapter);
 
+   if (adapter->if_flags & BE_IF_FLAGS_VLAN_PROMISCUOUS) {
+   status = be_clear_vlan_promisc(adapter);
+   if (status)
+   return status;
+   }
/* Construct VLAN Table to give to HW */
for_each_set_bit(i, adapter->vids, VLAN_N_VID)
vids[num++] = cpu_to_le16(i);
@@ -1439,8 +1444,6 @@ static int be_vid_config(struct be_adapter *adapter)
addl_status(status) ==
MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES)
return be_set_vlan_promisc(adapter);
-   } else if (adapter->if_flags & BE_IF_FLAGS_VLAN_PROMISCUOUS) {
-   status = be_clear_vlan_promisc(adapter);
}
return status;
 }
-- 
2.4.1



RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-27 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, July 27, 2016 1:45
> To: Dexuan Cui 
> 
> From: Dexuan Cui 
> Date: Tue, 26 Jul 2016 07:09:41 +
> 
> > I googled "S390 hypervisor socket" but didn't find anything related (I 
> > think).
> 
> That would be net/iucv/
Thanks for the info! I'll look into this.
 
> There's also VMWare's stuff under net/vmw_vsock
> 
> It's just absolutely rediculous to make a new hypervisor socket
> interface over and over again, so much code duplication and
> replication.
I agree on this principle of avoiding duplication.
However my feeling is: IMHO different hypervisor sockets were developed
independently without coordination and the implementation details could be
so different that an enough generic framework/infrastructure is difficult,
e.g., at first glance, it looks AF_IUCV is quite different from AF_VSOCK and
this might explain why AF_VSOCK wasn't built on AF_IUCV(?).

I'll dig more into AF_IUCV, AF_VSOCK and AF_HYPERV and figure out what
is the best direction I should go.

Thanks,
-- Dexuan


gpf in p9_client_read

2016-07-27 Thread Vlastimil Babka

Hi,

I was fuzzing with trinity in qemu via the virtme script, i.e. using 
these opts:


virtme-run --kdir /home/vbabka/wrk/linus.git --rwdir 
/home/vbabka/tmp/trinity/ --dry-run --show-command --qemu-opts --smp 8 
-drive file=/home/vbabka/tmp/trinity.img,if=ide,format=raw,media=disk
/usr/bin/qemu-system-x86_64 -fsdev 
local,id=virtfs1,path=/,security_model=none,readonly -device 
virtio-9p-pci,fsdev=virtfs1,mount_tag=/dev/root -fsdev 
local,id=virtfs5,path=/usr/local/share/virtme-guest-0,security_model=none,readonly 
-device virtio-9p-pci,fsdev=virtfs5,mount_tag=virtme.guesttools -fsdev 
local,id=virtfs9,path=/home/vbabka/tmp/trinity/,security_model=none 
-device virtio-9p-pci,fsdev=virtfs9,mount_tag=virtme.initmount0 -machine 
accel=kvm:tcg -watchdog i6300esb -cpu host -parallel none -net none 
-echr 1 -serial none -chardev stdio,id=console,signal=off,mux=on -serial 
chardev:console -mon chardev=console -vga none -display none -kernel 
/home/vbabka/wrk/linus.git/arch/x86/boot/bzImage -append 
'virtme_initmount0=home/vbabka/tmp/trinity 
earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps 
"virtme_stty_con=rows 53 cols 189 iutf8" TERM=xterm rootfstype=9p 
rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro 
init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p 
/run/virtme/guesttools;/bin/mount -n -t 9p -o 
ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools 
/run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"' --smp 8 
-drive file=/home/vbabka/tmp/trinity.img,if=ide,format=raw,media=disk


So the fuzzer's working directory is on ext4 created on disk backed by 
trinity.img, but the rest is provided from host via virtio-9p, as virtme 
does.


Very deterministically after few minutes of fuzzing, I get OOMs followed 
by this:


general protection fault:  [#1] SMP
CPU: 2 PID: 17712 Comm: trinity-c10 Not tainted 4.7.0 #45
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
0[m )  178.898319] task: 880003a67000 ti: 880001984000 task.ti: 
880001984000
RIP: 0010:[]  [] 
p9_client_read+0x70/0x270

RSP: :880001987c48  EFLAGS: 00010246
RAX: 559b48bf06527000 RBX: 1000 RCX: 880001987cdc
RDX: 1000 RSI: c000 RDI: 880005e6b7c0
RBP: 880001987cc8 R08: 1000 R09: 0020
R10: 880001987cd0 R11: 0040 R12: 880005e6b780
R13: c000 R14: 880005e6b240 R15: 024201ca
FS:  7fcdb1a74700() GS:880007c8() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0040c840 CR3: 019fb000 CR4: 001406e0
DR0: 7fcdb09b3000 DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 00050602
Stack:
 880001987c90 81130f6f 880001987c58 880001987c58
 880007c91840 880005e6b7c0 880001987cdc ea00
 880001987cf0 11987ce8 81131621 ea10abc0
Call Trace:
 [] v9fs_fid_readpage+0x73/0x180 fs/9p/vfs_addr.c:69
 [] v9fs_vfs_readpage+0x10/0x20 fs/9p/vfs_addr.c:98
 [< inline >] page_cache_read mm/filemap.c:1908
 [] filemap_fault+0x335/0x4b0 mm/filemap.c:2087
 [] __do_fault+0x61/0xe0 mm/memory.c:2839
 [< inline >] do_read_fault mm/memory.c:3030
 [< inline >] do_fault mm/memory.c:3189
 [< inline >] handle_pte_fault mm/memory.c:3364
 [< inline >] __handle_mm_fault mm/memory.c:3488
 [] handle_mm_fault+0xd41/0x13d0 mm/memory.c:3517
 [] __do_page_fault+0x1c1/0x490 arch/x86/mm/fault.c:1356
 [] do_page_fault+0xc/0x10 arch/x86/mm/fault.c:1419
 [] page_fault+0x22/0x30 arch/x86/entry/entry_64.S:920
Code: 89 c2 89 45 cc 41 8b 5e 20 41 8b 44 24 04 83 e8 18 85 db 0f 84 f8 
00 00 00 39 c3 0f 87 f0 00 00 00 49 8b 44 24 10 39 d3 0f 4f da <48> 83 
78 50 00 74 0c 81 fb 00 04 00 00 0f 8f ef 00 00 00 41 8b

RIP  [] p9_client_read+0x70/0x270 net/9p/client.c:1562
 RSP 
---[ end trace 0192b39ea4ee4e9b ]---
Kernel panic - not syncing: Fatal exception
Kernel Offset: disabled
---[ end Kernel panic - not syncing: Fatal exception

All code

   0:   89 c2   mov%eax,%edx
   2:   89 45 ccmov%eax,-0x34(%rbp)
   5:   41 8b 5e 20 mov0x20(%r14),%ebx
   9:   41 8b 44 24 04  mov0x4(%r12),%eax
   e:   83 e8 18sub$0x18,%eax
  11:   85 db   test   %ebx,%ebx
  13:   0f 84 f8 00 00 00   je 0x111
  19:   39 c3   cmp%eax,%ebx
  1b:   0f 87 f0 00 00 00   ja 0x111
  21:   49 8b 44 24 10  mov0x10(%r12),%rax
  26:   39 d3   cmp%edx,%ebx
  28:   0f 4f dacmovg  %edx,%ebx
  2b:*  48 83 78 50 00  cmpq   $0x0,0x50(%rax)  <-- 
trapping instruction

  30:   74 0c   je 0x3e
  32:   81 fb 00 04 00 00   cmp$0x400,%ebx
 

RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-27 Thread Dexuan Cui
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Dexuan Cui
> Sent: Tuesday, July 26, 2016 21:22
> ...
> This is because, the design of AF_HYPERV in the Hyper-V host side is
> suboptimal IMHO (the current host side design requires the least
> change in the host side, but it makes my life difficult. :-(  It may
> change in the future, but luckily we have to live with it at present):
BTW, sorry for my typo: "luckily" should be "unluckily".

Thanks,
-- Dexuan


Re: [net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path

2016-07-27 Thread David Miller
From: Sathya Perla 
Date: Wed, 27 Jul 2016 13:33:19 +0530

>> -Original Message-
>> From: Sathya Perla [mailto:sathya.pe...@broadcom.com]
>> > >
>> > > > @@ -4477,6 +4551,22 @@ static int be_if_create(struct be_adapter
>> > *adapter)
>> > > >u32 cap_flags = be_if_cap_flags(adapter);
>> > > >int status;
>> > > >
>> > > > +  /* alloc required memory for other filtering fields */
>> > > > +  adapter->pmac_id = kcalloc(be_max_uc(adapter),
>> > > > + sizeof(*adapter->pmac_id), 
>> > > > GFP_KERNEL);
>> > > > +  if (!adapter->pmac_id)
>> > > > +  return -ENOMEM;
>> > > > +
>> > > > +  adapter->mc_list = kcalloc(be_max_mc(adapter),
>> > > > + sizeof(*adapter->mc_list), 
>> > > > GFP_KERNEL);
>> > > > +  if (!adapter->mc_list)
>> > > > +  return -ENOMEM;
>> > > > +
>> > > > +  adapter->uc_list = kcalloc(be_max_uc(adapter),
>> > > > + sizeof(*adapter->uc_list), 
>> > > > GFP_KERNEL);
>> > > > +  if (!adapter->uc_list)
>> > > > +  return -ENOMEM;
>> > >
>> > > These error paths are leaking memory, please audit this in the rest
>> > > of
>> > your change as well.
>> >
>> > David, thanks for catching this; will fix this (and others if any) and
>> > send out a v2.
>>
>> David, actually, when be_if_create() returns an error, it falls back into
>> the error path of
>> be_setup() (the caller) and be_clear() is called. be_clear() calls
>> be_if_destroy() which frees the allocated memory.
>> So, this code will not leak memory in the error path.
> 
> David, can you accept the v1 patch set as-is, as it doesn't have the memory
> leaks that you suspected initially?

Please resubmit, thanks.


Re: [RFC PATCH v2 1/4] Documentation: DT: net: Add Xilinx gmiitorgmii converter device tree binding documentation

2016-07-27 Thread Andrew Lunn
Hi Appana

Here is roughly what i was thinking:

struct priv {
   phy_device *master;
   phy_device *slave;
   struct phy_driver *slave_drv;
};

phy_status_clone(phy_device *master, phy_device *slave)
{
master->speed = slave->speed;
master->duplex = slave->duplex;
master->pause = slave->pause;
}

read_status(struct phy_device *phydev)
{
struct priv *priv = phydev->priv;

/* Get the status from the slave, and duplicate in into the
 * master */
 slave_drv->read_status(priv->slave);
 phy_status_clone(priv->master, priv->slave);

 /* Update the gmiitorgmii with the current link parameters */
 update_link(master);
}

config_init(struct phy_device *phydev)
{
struct priv *priv = phydev->priv;

/* Configure the slave, and duplicate in into the master */
slave_drv->config_init(priv->slave);
phy_status_clone(priv->master, priv->slave);
}

struct phy_driver master_drv = {
   .read_status = read_status,
   .config_init = config_init,
   .soft_reset  = ...
   .suspend = ...
};

probe(mdio_device *mdio)
{
struct priv *priv = devm_alloc();

/* Use the phy-handle property to find the slave phy */
node_phy = of_parse_phandle(mdio->of_node, "phy", 0);
priv->slave = of_phy_find_device(node_phy);

/* Create the master phy on the control address. Use the phy
   ID from the slave. */
priv->master = phy_device_create(mdio->bus, mdio->addr,
 phy->slave->phy_id,
 phy->slave->is_c45,
 phy->slave->c45_ids);

slave_dev_drv = phydev->mdio.dev.driver;
priv->slave_drv = to_phy_driver(slave_dev_drv);
priv->master->mdio.dev.driver = master_drv;
}

It would however be nice to only have one phydev structure, so you are
not copying status and settings backwards and forwards from one to the
other all the time, and need a wrapper for every function in
phy_driver. Studying the structures a bit, that might be possible. You
would then only need to wrap the read_status(), so that when the link
speed/duplex changes, you can configure the converter as appropriate.

 Andrew


Re: [PATCH net-next v2 2/3] bnxt_en: Log a message, if enabling NTUPLE filtering fails.

2016-07-27 Thread Michael Chan
On Wed, Jul 27, 2016 at 12:31 AM, zhuyj  wrote:
>  +   if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) {
><-Does this happen very rarely? If so,
>
> if (unlikely(vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { is 
> better?

This is not in the fast path, so it doesn't matter.

>
> +   netdev_warn(bp->dev,
> +   "Not enough resources to support NTUPLE
> filters, enough resources for up to %d rx rings\n",
> +   min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1));
> return false;
> +   }
>


RE: [net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path

2016-07-27 Thread Sathya Perla
> -Original Message-
> From: Sathya Perla [mailto:sathya.pe...@broadcom.com]
> > >
> > > > @@ -4477,6 +4551,22 @@ static int be_if_create(struct be_adapter
> > *adapter)
> > > > u32 cap_flags = be_if_cap_flags(adapter);
> > > > int status;
> > > >
> > > > +   /* alloc required memory for other filtering fields */
> > > > +   adapter->pmac_id = kcalloc(be_max_uc(adapter),
> > > > +  sizeof(*adapter->pmac_id), 
> > > > GFP_KERNEL);
> > > > +   if (!adapter->pmac_id)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   adapter->mc_list = kcalloc(be_max_mc(adapter),
> > > > +  sizeof(*adapter->mc_list), 
> > > > GFP_KERNEL);
> > > > +   if (!adapter->mc_list)
> > > > +   return -ENOMEM;
> > > > +
> > > > +   adapter->uc_list = kcalloc(be_max_uc(adapter),
> > > > +  sizeof(*adapter->uc_list), 
> > > > GFP_KERNEL);
> > > > +   if (!adapter->uc_list)
> > > > +   return -ENOMEM;
> > >
> > > These error paths are leaking memory, please audit this in the rest
> > > of
> > your change as well.
> >
> > David, thanks for catching this; will fix this (and others if any) and
> > send out a v2.
>
> David, actually, when be_if_create() returns an error, it falls back into
> the error path of
> be_setup() (the caller) and be_clear() is called. be_clear() calls
> be_if_destroy() which frees the allocated memory.
> So, this code will not leak memory in the error path.

David, can you accept the v1 patch set as-is, as it doesn't have the memory
leaks that you suspected initially?

thanks,
-Sathya


RE: [PATCH] qed: Add and use specific logging functions to reduce object size

2016-07-27 Thread Yuval Mintz
> Current DP_ macros generate a lot of code.
> Using functions with vsprintf extension %pV helps reduce that size.
> 
>  drivers/net/ethernet/qlogic/qed/Makefile   |  2 +-
>  drivers/net/ethernet/qlogic/qed/qed_util.c | 82
> ++
>  include/linux/qed/qed_if.h | 60 +-
>  3 files changed, 106 insertions(+), 38 deletions(-)  create mode 100644
> drivers/net/ethernet/qlogic/qed/qed_util.c

This won't compile when CONFIG_QED*=m, as qede can't link to
the new qed_{err, verbose, info, notice} functions.
That was the original reason for putting the macros in the interface
header.

Other alternatives:
- We can EXPORT_SYMBOL() the functions, although we've taken a
strain not adding such to the interface.
 - Code duplication between qed/qede [ugly].
 - Implementing this in qede via the interface functions with qed; 
But the notion of defining an interface between 2 modules passing
va_args seems [to me] like a bad one.

If you have cleaner solutions, I'd be happy to hear those.


RE: [PATCH] qed: Add and use specific logging functions to reduce object size

2016-07-27 Thread Yuval Mintz
> > Current DP_ macros generate a lot of code.
> > Using functions with vsprintf extension %pV helps reduce that size.
> 
> Yuval, I used the same KERN_ output types, but it is unusual
> that DP_INFO outputs at KERN_NOTICE.
> 
> Was that a copy/paste defect or should it be emitted at KERN_INFO and
> DP_VERBOSE be emitted at KERN_DEBUG?

I agree it's a bit odd, but it's actually by design - 
as none of these prints would ever be reached unless user explicitly enable
them [ethtool/module param], the assumption is that NOTICE is good
enough, i.e., it will prevent the need for doing additional configuration of
logging levels by the user.


Re: [PATCH net-next v2 2/3] bnxt_en: Log a message, if enabling NTUPLE filtering fails.

2016-07-27 Thread zhuyj
 +   if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) {
   <-Does this happen very rarely? If so,

if (unlikely(vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { is better?

+   netdev_warn(bp->dev,
+   "Not enough resources to support NTUPLE
filters, enough resources for up to %d rx rings\n",
+   min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1));
return false;
+   }

On Tue, Jul 26, 2016 at 12:33 AM, Michael Chan
 wrote:
> From: Vasundhara Volam 
>
> If there are not enough resources to enable ntuple filtering,
> log a warning message.
>
> v2: Use single message and add missing newline.
>
> Signed-off-by: Vasundhara Volam 
> Signed-off-by: Michael Chan 
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
> b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 7de7d7a..eac0f2b 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -5790,8 +5790,12 @@ static bool bnxt_rfs_capable(struct bnxt *bp)
> return false;
>
> vnics = 1 + bp->rx_nr_rings;
> -   if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics)
> +   if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) {
> +   netdev_warn(bp->dev,
> +   "Not enough resources to support NTUPLE filters, 
> enough resources for up to %d rx rings\n",
> +   min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1));
> return false;
> +   }
>
> return true;
>  #else
> @@ -5804,7 +5808,7 @@ static netdev_features_t bnxt_fix_features(struct 
> net_device *dev,
>  {
> struct bnxt *bp = netdev_priv(dev);
>
> -   if (!bnxt_rfs_capable(bp))
> +   if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp))
> features &= ~NETIF_F_NTUPLE;
>
> /* Both CTAG and STAG VLAN accelaration on the RX side have to be
> --
> 1.8.3.1
>


RE: [PATCH v4 5/7] thunderbolt: Networking state machine

2016-07-27 Thread Levy, Amir (Jer)
On Mon, Jul 25 2016, 01:36 AM, Lukas Wunner wrote:
> On Mon, Jul 18, 2016 at 01:00:38PM +0300, Amir Levy wrote:
> > +   const unique_id_be proto_uuid =
> APPLE_THUNDERBOLT_IP_PROTOCOL_UUID;
> > +
> > +   if (memcmp(proto_uuid, hdr->apple_tbt_ip_proto_uuid,
> > +  sizeof(proto_uuid)) != 0) {
> 
> You may want to use the uuid_be data type provided by 
> instead of rolling your own, as well as the helper uuid_be_cmp() defined
> ibidem.
> 

All the messages in Thunderbolt consist BE DWORDs.
I didn't find uuid definition in the kernel that accurately describes the uuid 
structure in the messages, which is 4 BE DWORDs.
But on the other hand, all the driver does is copy/compare of uuids, and it can 
treat uuids as byte array (i.e. uuid_be).
I'll change it in the next patches.


Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit

2016-07-27 Thread Uwe Kleine-König
Hello,

On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote:
> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
> > Allocating and preparing a dma descriptor doesn't need to happen under
> > the channel's lock. So do this before taking the channel's lock. The only
> > down side is that the dma descriptor might be allocated even though the
> > channel is about to be stopped. This is unlikely though.
> > 
> > Signed-off-by: Uwe Kleine-König 
> > ---
> >  drivers/net/ethernet/ti/davinci_cpdma.c | 38 
> > +
> >  1 file changed, 20 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c 
> > b/drivers/net/ethernet/ti/davinci_cpdma.c
> > index 5ffa04a306c6..ba3462707ae3 100644
> > --- a/drivers/net/ethernet/ti/davinci_cpdma.c
> > +++ b/drivers/net/ethernet/ti/davinci_cpdma.c
> > @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void 
> > *token, void *data,
> > u32 mode;
> > int ret = 0;
> >  
> > -   spin_lock_irqsave(>lock, flags);
> > -
> > -   if (chan->state == CPDMA_STATE_TEARDOWN) {
> > -   ret = -EINVAL;
> > -   goto unlock_ret;
> > -   }
> > -
> > -   if (chan->count >= chan->desc_num)  {
> > -   chan->stats.desc_alloc_fail++;
> > -   ret = -ENOMEM;
> > -   goto unlock_ret;
> > -   }
> 
> I'm not sure this is right thing to do. This check is expected to be strict
> and means "channel has exhausted the available descriptors, so further descs 
> allocation does not allowed".

I developed this patch basing on a 4.4 kernel which doesn't have
742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There
my patch is more obviously correct. As currently chan->count is
protected by chan->lock we must hold the lock for this check. If a
failing check means we must not call cpdma_desc_alloc in the first
place, that's bad.

But I'm not sure this is the case here. After all cpdma_desc_alloc
doesn't do anything relevant for the hardware, right?

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: problem with MPLS and TSO/GSO

2016-07-27 Thread Lennert Buytenhek
On Wed, Jul 27, 2016 at 03:02:24PM +0800, zhuyj wrote:

> On ubuntu16.04 server 64 bit
> The attached script is run, the following will appear.
> 
> Error: either "to" is duplicate, or "encap" is a garbage.

Looks like your installed iproute2 package doesn't grok MPLS.


Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt

2016-07-27 Thread Uwe Kleine-König
On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote:
> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote:
> >Hello,
> >
> >these patches are based on next-20160726. I didn't check yet how latency
> >improves by using these patches, but even if the improvment is small,
> >it's still a good idea to have them.
> 
> Sry, but how this will affect on -RT? This is not a raw locks, so
> they will be converted to rt-mutexes which are sleepable.
> Or I've missed smth?

They are still locks after all. On -rt I saw for the relevant
application:

  send package |
take lock  |
write pckt to hw   |
   | rcv irq
   |   take lock
   | schedule
drop lock  | 
  schedule |
   |   get pckt from hw
   |   drop lock

So reducing the time a lock is taken reduces the chances that the lock
is contended for another thread which results in extra context switches.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH net-next] neigh: allow admin to set NUD_STALE

2016-07-27 Thread Julian Anastasov
Admin should be able to set any state. Currently, this fails
when lladdr is not changed and state is changed from
NUD_CONNECTED to NUD_STALE:

ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0
ip neigh show to 192.168.8.1
192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT
ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0
ip neigh show to 192.168.8.1
192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT

Problem may be from 2.1.X days.

Signed-off-by: Julian Anastasov 
---
 net/core/neighbour.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index cf26e04c4..2ae929f 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 
*lladdr, u8 new,
} else
goto out;
} else {
-   if (lladdr == neigh->ha && new == NUD_STALE)
+   if (lladdr == neigh->ha && new == NUD_STALE &&
+   !(flags & NEIGH_UPDATE_F_ADMIN))
new = old;
}
}
-- 
1.9.3



Re: linux-next: build warning after merge of the net-next tree

2016-07-27 Thread Iyappan Subramanian
On Tue, Jul 26, 2016 at 11:35 PM, Stephen Rothwell  
wrote:
> Hi David,
>
> On Tue, 26 Jul 2016 23:19:59 -0700 (PDT) David Miller  
> wrote:
>>
>> Fixed thusly:
>>
>> 
>> From 36232012344b8db67052432742deaf17f82e70e6 Mon Sep 17 00:00:00 2001
>> From: "David S. Miller" 
>> Date: Tue, 26 Jul 2016 23:19:29 -0700
>> Subject: [PATCH] xgene: Fix build warning with ACPI disabled.
>>
>> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 
>> 'xgene_enet_phy_connect':
>> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:759:22: warning: unused 
>> variable 'adev' [-Wunused-variable]
>>
>> Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility")
>> Reported-by: Stephen Rothwell 
>> Signed-off-by: David S. Miller 
>> ---
>>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
>> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> index 8a2a221..7714b7d 100644
>> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
>> @@ -756,7 +756,6 @@ int xgene_enet_phy_connect(struct net_device *ndev)
>>   struct device_node *np;
>>   struct phy_device *phy_dev;
>>   struct device *dev = >pdev->dev;
>> - struct acpi_device *adev;
>>   int i;
>>
>>   if (dev->of_node) {
>> @@ -781,7 +780,7 @@ int xgene_enet_phy_connect(struct net_device *ndev)
>>   pdata->phy_dev = phy_dev;
>>   } else {
>>  #ifdef CONFIG_ACPI
>> - adev = acpi_phy_find_device(dev);
>> + struct acpi_device *adev = acpi_phy_find_device(dev);
>>   if (adev)
>>   pdata->phy_dev =  adev->driver_data;
>
> Looks good to me, thanks.

Thanks David, for the quick fix.

>
> --
> Cheers,
> Stephen Rothwell


Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting

2016-07-27 Thread Vegard Nossum

On 07/27/2016 08:31 AM, Herbert Xu wrote:

On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote:


Here's another patch to remove that too.

I don't actually *use* this code myself and I feel the justification
I've given for removing the WARN to be a bit weak, so if you don't take
the patch I'll just keep it in my local tree to keep it from showing up
again during fuzzing.


Please just kill the whole else clause.  For soft policy expires
we simply need to relay a message to the KM and nothing more.


Try #2 :-)


Vegard
>From e5111e4dcd0e0c0990d3f4bba0ba0bc9d0b40bae Mon Sep 17 00:00:00 2001
From: Vegard Nossum 
Date: Wed, 27 Jul 2016 08:13:14 +0200
Subject: [PATCH] xfrm: get rid of another incorrect WARN

During fuzzing I regularly run into this WARN(). According to Herbert Xu,
this "certainly shouldn't be a WARN, it probably shouldn't print anything
either".

Cc: Stephen Hemminger 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Signed-off-by: Vegard Nossum 
---
 net/xfrm/xfrm_user.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2477b24..10e4e26 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2051,9 +2051,6 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 	if (up->hard) {
 		xfrm_policy_delete(xp, p->dir);
 		xfrm_audit_policy_delete(xp, 1, true);
-	} else {
-		// reset the timers here?
-		WARN(1, "Don't know what to do with soft policy expire\n");
 	}
 	km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid);
 
-- 
1.9.1



Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)

2016-07-27 Thread Kalle Valo
alexmcwhir...@triadic.us writes:

> On 2016-07-26 09:59, Christian Lamparter wrote:
>> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2
>> devices.
>> I did not see any corruptions in any of the tests though. Can you
>> tell me
>> something about your wireless network too? I would like to know what
>> router
>> and firmware are you using? Also important: what's your wireless
>> configuration?
>> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...)
>>
>> Probably the quickest and easiest way to get that information is by
>> running
>> the following commands as root, when you are connected to your wifi
>> network
>> and post the results:
>> # iw dev wlan0 link
>> # iw dev wlan0 scan dump
>>
>> (You can of course remove your device's MACs, but please do it
>> consistently).
>>
>> Regards,
>> Christian
>
> I wonder if this is something that is only affecting certain NIC's?
> For example, it see this issue on sun4u boxes with tigon3 and hme
> interfaces. But on my sun4v boxes that have e1000 interfaces
> everything works fine.

Kernel configuration might also make a difference. Trying to find an
Kconfig option which affects this could be useful.

-- 
Kalle Valo


Re: linux-next: build warning after merge of the net-next tree

2016-07-27 Thread Stephen Rothwell
Hi David,

On Tue, 26 Jul 2016 23:19:59 -0700 (PDT) David Miller  
wrote:
>
> Fixed thusly:
> 
> 
> From 36232012344b8db67052432742deaf17f82e70e6 Mon Sep 17 00:00:00 2001
> From: "David S. Miller" 
> Date: Tue, 26 Jul 2016 23:19:29 -0700
> Subject: [PATCH] xgene: Fix build warning with ACPI disabled.
> 
> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function 
> 'xgene_enet_phy_connect':
> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:759:22: warning: unused 
> variable 'adev' [-Wunused-variable]
> 
> Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility")
> Reported-by: Stephen Rothwell 
> Signed-off-by: David S. Miller 
> ---
>  drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> index 8a2a221..7714b7d 100644
> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
> @@ -756,7 +756,6 @@ int xgene_enet_phy_connect(struct net_device *ndev)
>   struct device_node *np;
>   struct phy_device *phy_dev;
>   struct device *dev = >pdev->dev;
> - struct acpi_device *adev;
>   int i;
>  
>   if (dev->of_node) {
> @@ -781,7 +780,7 @@ int xgene_enet_phy_connect(struct net_device *ndev)
>   pdata->phy_dev = phy_dev;
>   } else {
>  #ifdef CONFIG_ACPI
> - adev = acpi_phy_find_device(dev);
> + struct acpi_device *adev = acpi_phy_find_device(dev);
>   if (adev)
>   pdata->phy_dev =  adev->driver_data;

Looks good to me, thanks.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting

2016-07-27 Thread Herbert Xu
On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote:
>
> Here's another patch to remove that too.
> 
> I don't actually *use* this code myself and I feel the justification
> I've given for removing the WARN to be a bit weak, so if you don't take
> the patch I'll just keep it in my local tree to keep it from showing up
> again during fuzzing.

Please just kill the whole else clause.  For soft policy expires
we simply need to relay a message to the KM and nothing more.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting

2016-07-27 Thread Vegard Nossum

On 07/27/2016 05:01 AM, Herbert Xu wrote:

On Wed, Jul 20, 2016 at 01:53:12PM +0200, Vegard Nossum wrote:

Just FYI I'm also running into the

// reset the timers here?
WARN(1, "Don't know what to do with soft policy expire\n");

in xfrm_add_pol_expire() from the same commit, but that looks
potentially somewhat more serious (or at least it looks like we might
want to do some sort of cleaning up), so I won't touch it for now.


It certainly shouldn't be a WARN, it probably shouldn't print
anything either.


Here's another patch to remove that too.

I don't actually *use* this code myself and I feel the justification
I've given for removing the WARN to be a bit weak, so if you don't take
the patch I'll just keep it in my local tree to keep it from showing up
again during fuzzing.

Thanks,


Vegard
>From 5b302eb4c188064a69176a901c2bec3e19440c03 Mon Sep 17 00:00:00 2001
From: Vegard Nossum 
Date: Wed, 27 Jul 2016 08:13:14 +0200
Subject: [PATCH] xfrm: get rid of another incorrect WARN

During fuzzing I regularly run into this WARN(). According to Herbert Xu,
this "certainly shouldn't be a WARN, it probably shouldn't print anything
either".

Cc: Stephen Hemminger 
Cc: Steffen Klassert 
Cc: Herbert Xu 
Signed-off-by: Vegard Nossum 
---
 net/xfrm/xfrm_user.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 2477b24..a4e44f7 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -2053,7 +2053,6 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh,
 		xfrm_audit_policy_delete(xp, 1, true);
 	} else {
 		// reset the timers here?
-		WARN(1, "Don't know what to do with soft policy expire\n");
 	}
 	km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid);
 
-- 
1.9.1



Re: [PATCH net] be2net: perform temperature query in adapter regardless of its interface state

2016-07-27 Thread David Miller
From: "Guilherme G. Piccoli" 
Date: Tue, 26 Jul 2016 17:39:42 -0300

> The be2net driver performs fw temperature queries on be_worker() routine,
> which is executed each second for each be_adapter. There is a frequency
> threshold to avoid fw query to happens at each call to be_worker();
> instead, currently a fw query occurs once in 64 runs of the procedure.
> 
> Nevertheless, this fw temperature query is invoked only for adapters which
> interface is up, so we can see I/O errors on read of hwmon counters from
> userspace (from tools like lm-sensors) in case we have adapters' functions
> which interface is down.
> 
> This patch moves the fw query code to be invoked even if interface is down.
> No functional changes were introduced.
> 
> Signed-off-by: Guilherme G. Piccoli 

Applied.


  1   2   >