Re: [RFC PATCH] net: flush the softnet backlog in process context
On Wed, 2016-07-27 at 12:37 +0200, Paolo Abeni wrote: > Currently in process_backlog(), the process_queue dequeuing is > performed with local IRQ disabled, to protect against > flush_backlog(), which runs in hard IRQ context. Thanks guys, patch looks fine to me.
Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
On Wed, 2016-07-27 at 14:38 -0700, Jeff Kirsher wrote: > On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote: > > Could you try this ? > > > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c > > b/drivers/net/ethernet/intel/e1000/e1000_main.c > > index > > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a > > 938b3820b 100644 > > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c > > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c > > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device > > *netdev) > > { > > struct e1000_adapter *adapter = netdev_priv(netdev); > > > > - disable_irq(adapter->pdev->irq); > > - e1000_intr(adapter->pdev->irq, netdev); > > - enable_irq(adapter->pdev->irq); > > + if (napi_schedule_prep(>napi)) { > > + adapter->total_tx_bytes = 0; > > + adapter->total_tx_packets = 0; > > + adapter->total_rx_bytes = 0; > > + adapter->total_rx_packets = 0; > > + __napi_schedule(>napi); > > + } > > } > > #endif > > > > Since this fixes the issue Fengguang saw, will you be submitting a formal > patch Eric? (please) I can get this queued up for Dave's net tree as soon > as I receive the formal patch. I would prefer having a definitive advice from Thomas Gleixner and/or others if disable_irq() is forbidden from IRQ path. As I said, about all netpoll() methods in net drivers use disable_irq() so a lot of patches would be needed. disable_irq() should then test this condition earlier, so that we can detect potential bug, even if the IRQ is not (yet) threaded. Thanks.
Re: [PATCH RFC 1/1] xfrm: dst lookup doesn't account for fwmark
On Fri, Jul 22, 2016 at 03:50:43PM -0600, Doug Applegate wrote: > If route table includes routing based on fwmark, xfrm will not take it > into account when routing ipsec traffic. We address this issue by adding > fwmark information before calling route lookup. > > Signed-off-by: Doug Applegate> --- > include/net/xfrm.h | 3 ++- > net/ipv4/xfrm4_policy.c | 15 +++ > net/ipv6/xfrm6_policy.c | 11 +-- > net/xfrm/xfrm_policy.c | 28 +++- > 4 files changed, 45 insertions(+), 12 deletions(-) > > diff --git a/include/net/xfrm.h b/include/net/xfrm.h > index adfebd6..13e80d1 100644 > --- a/include/net/xfrm.h > +++ b/include/net/xfrm.h > @@ -282,7 +282,7 @@ struct xfrm_policy_afinfo { > struct dst_ops *dst_ops; > void (*garbage_collect)(struct net *net); > struct dst_entry *(*dst_lookup)(struct net *net, > - int tos, int oif, > + int tos, int oif, int mark, > const xfrm_address_t *saddr, > const xfrm_address_t *daddr); > int (*get_saddr)(struct net *net, int oif, > @@ -292,6 +292,7 @@ struct xfrm_policy_afinfo { >struct flowi *fl, >int reverse); > int (*get_tos)(const struct flowi *fl); > + int (*get_mark)(const struct flowi *fl); > int (*init_path)(struct xfrm_dst *path, > struct dst_entry *dst, > int nfheader_len); > diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c > index 7b0edb3..5dcd411 100644 > --- a/net/ipv4/xfrm4_policy.c > +++ b/net/ipv4/xfrm4_policy.c > @@ -20,7 +20,7 @@ > static struct xfrm_policy_afinfo xfrm4_policy_afinfo; > > static struct dst_entry *__xfrm4_dst_lookup(struct net *net, struct > flowi4 *fl4, > -int tos, int oif, > +int tos, int oif, int mark, > const xfrm_address_t *saddr, > const xfrm_address_t *daddr) > { > @@ -30,6 +30,7 @@ static struct dst_entry *__xfrm4_dst_lookup(struct > net *net, struct flowi4 *fl4, > fl4->daddr = daddr->a4; > fl4->flowi4_tos = tos; > fl4->flowi4_oif = oif; > + fl4->flowi4_mark = mark; > if (saddr) > fl4->saddr = saddr->a4; > > @@ -42,13 +43,13 @@ static struct dst_entry *__xfrm4_dst_lookup(struct > net *net, struct flowi4 *fl4, > return ERR_CAST(rt); > } > > -static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, int oif, > +static struct dst_entry *xfrm4_dst_lookup(struct net *net, int tos, > int oif, int mark, >const xfrm_address_t *saddr, >const xfrm_address_t *daddr) We may consider to setup the flowi earlier instead of adding yet another parameter to this function. This would have also the advantage that we don't need to change that function whenever we find out that we need more flow informations.
Re: [PATCH RFC 0/1] xfrm: dst lookup doesn't account for fwmark
On Fri, Jul 22, 2016 at 03:50:30PM -0600, Doug Applegate wrote: > I ran into an issue trying to route outgoing ipsec traffic from an > ipsec responder hub that uses fwmark to route out a specific > interface. The fwmark points to a route table that contains a default > route out a specific interface. The fwmark is applied based on > incoming interface of incoming traffic. I noticed that the mark was > not being used when doing a route lookup. Is there any reason why this > is? If not does this patch look reasonable? I just followed similar > logic to how TOS was being used before route-lookup. I've only tested > on an earlier kernel (3.14) and it works. Thanks! > > Here's an example: > > The ipsec policy is simple, just a tunnel from 192.168.0.0/24 <--> > 192.168.1.0/24 the ipsec hub is at 35.0.0.1 but when traffic is being > responded too, instead of the expected interface (eth2) we end it out > a different interface (eth0) > > route table: > # ip rule list > 0: from all lookup local > 32766: from all lookup main > 32767: from all lookup default > 4: from all fwmark 0x2 lookup 3 > 4: from all fwmark 0x4 lookup 4 > 5: from all lookup 3 > > # ip route > 25.0.0.0/30 dev eth0 src 25.0.0.1 > 35.0.0.0/30 dev eth2 src 35.0.0.1 > 192.168.1.0/24 dev lan1 src 192.168.1.1 > > # ip route list table 4 > default via 35.0.0.2 dev eth2 onlink > # ip route list table 3 > default via 25.0.0.2 dev eth0 onlink Can you provide some more information about your usecase? Mapping between interfaces and IP addresses, policies and states would be good. Btw. why do you need to insert the routes as 'onlink'? This disables some checks if this route is valid.
Re: [RFC PATCH] xfrm: Add option to reset oif in xfrm lookup
On Mon, Jul 25, 2016 at 06:34:32PM -0600, Subash Abhinov Kasiviswanathan wrote: > We are seeing incorrect routing when tunneling packets over an > interface and sending it over another interface. This scenario > worked on 3.18 (and earlier) and failed on 4.4 kernel. The rules > / routes / policies were the same across kernels. > > Commit 42a7b32b73d6 ("xfrm: Add oif to dst lookups") allowed > preservation of the oif from a raw packet to a transformed packet. > This causes issues with forwarding scenarios where the > existing oif causes an incorrect route lookup. > > Create a new sysctl which resets oif in xfrm policy. Default value > is 0 which means that oif is preserved on transform. Please don't try to workaround a bug with a sysctl. If we have a bug here, we should fix it. Choosing between bug A and bug B with a sysctl is not what we are doing ;)
Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting
On Wed, Jul 27, 2016 at 08:44:15AM +0200, Vegard Nossum wrote: > On 07/27/2016 08:31 AM, Herbert Xu wrote: > >On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote: > >> > >>Here's another patch to remove that too. > >> > >>I don't actually *use* this code myself and I feel the justification > >>I've given for removing the WARN to be a bit weak, so if you don't take > >>the patch I'll just keep it in my local tree to keep it from showing up > >>again during fuzzing. > > > >Please just kill the whole else clause. For soft policy expires > >we simply need to relay a message to the KM and nothing more. > > Try #2 :-) > > > Vegard > >From e5111e4dcd0e0c0990d3f4bba0ba0bc9d0b40bae Mon Sep 17 00:00:00 2001 > From: Vegard Nossum> Date: Wed, 27 Jul 2016 08:13:14 +0200 > Subject: [PATCH] xfrm: get rid of another incorrect WARN > > During fuzzing I regularly run into this WARN(). According to Herbert Xu, > this "certainly shouldn't be a WARN, it probably shouldn't print anything > either". > > Cc: Stephen Hemminger > Cc: Steffen Klassert > Cc: Herbert Xu > Signed-off-by: Vegard Nossum Also applied to the ipsec tree, thanks a lot!
Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting
On Wed, Jul 27, 2016 at 08:03:18AM +0200, Vegard Nossum wrote: > Subject: [PATCH] xfrm: get rid of incorrect WARN > > AFAICT this message is just printed whenever input validation fails. > This is a normal failure and we shouldn't be dumping the stack over it. > > Looks like it was originally a printk that was maybe incorrectly > upgraded to a WARN: > > commit 62db5cfd70b1ef53aa21f144a806fe3b78c84fab > Author: stephen hemminger> Date: Wed May 12 06:37:06 2010 + > > xfrm: add severity to printk > > Cc: Stephen Hemminger > Cc: Steffen Klassert > Signed-off-by: Vegard Nossum Applied to the ipsec tree, thanks!
[PATCH -next] drivers: net: phy: xgene: Remove redundant dev_err call in xgene_mdio_probe()
There is a error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundant error message. Signed-off-by: Wei Yongjun--- drivers/net/phy/mdio-xgene.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/net/phy/mdio-xgene.c b/drivers/net/phy/mdio-xgene.c index d94a978..7756748 100644 --- a/drivers/net/phy/mdio-xgene.c +++ b/drivers/net/phy/mdio-xgene.c @@ -345,10 +345,8 @@ static int xgene_mdio_probe(struct platform_device *pdev) res = platform_get_resource(pdev, IORESOURCE_MEM, 0); csr_base = devm_ioremap_resource(dev, res); - if (IS_ERR(csr_base)) { - dev_err(dev, "Unable to retrieve mac CSR region\n"); + if (IS_ERR(csr_base)) return PTR_ERR(csr_base); - } pdata->mac_csr_addr = csr_base; pdata->mdio_csr_addr = csr_base + BLOCK_XG_MDIO_CSR_OFFSET; pdata->diag_csr_addr = csr_base + BLOCK_DIAG_CSR_OFFSET;
[PATCH -next] tipc: fix imbalance read_unlock_bh in __tipc_nl_add_monitor()
In the error handling case of nla_nest_start() failed read_unlock_bh() is called to unlock a lock that had not been taken yet. sparse warns about the context imbalance as the following: net/tipc/monitor.c:799:23: warning: context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block Fixes: cf6f7e1d5109 ('tipc: dump monitor attributes') Signed-off-by: Wei Yongjun--- net/tipc/monitor.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/monitor.c b/net/tipc/monitor.c index be70a57..b62caa1 100644 --- a/net/tipc/monitor.c +++ b/net/tipc/monitor.c @@ -794,10 +794,10 @@ int __tipc_nl_add_monitor(struct net *net, struct tipc_nl_msg *msg, return 0; attr_msg_full: + read_unlock_bh(>lock); nla_nest_cancel(msg->skb, attrs); msg_full: genlmsg_cancel(msg->skb, hdr); - read_unlock_bh(>lock); return -EMSGSIZE; }
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
On Wed, Jul 27, 2016 at 08:26:48PM -0400, alexmcwhir...@triadic.us wrote: > I'm going to go ahead and say this is where my issue and the op's issue > begin to branch apart from one another. He's seeing this on all incoming > data, whereas i am only seeing it on ssl data and not on sun4v. > > At this point i would say data from my issue is only going to cloud this > issue as they seem to be two completely different issues revolving around > the same commit. If i come across any relevant data for x86_64 ill be sure > to post it if this isn't resolved by then, but for now i'm going to refrain > from submitting anything sparc related. Which just might mean that we have *three* issues here - (1) buggered __copy_to_user_inatomic() (and friends) on some sparcs (2) your ssl-only corruption (3) Alan's x86_64 corruption on plain TCP read - no ssl *or* sparc anywhere, and no multi-segment recvmsg(). Which would strongly argue in favour of some kind of copy_page_to_iter() breakage triggered when handling a fragmented skb, as in (1). Except that I don't see anything similar in x86_64 uaccess primitives...
Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
Inline... On 07/27/2016 06:04 PM, Tom Herbert wrote: On Wed, Jul 27, 2016 at 4:42 PM,wrote: From: Gao Feng The PPTP is encapsulated by GRE header with that GRE_VERSION bits must contain one. But current GRE RPS needs the GRE_VERSION must be zero. So RPS does not work for PPTP traffic. In my test environment, there are four MIPS cores, and all traffic are passed through by PPTP. As a result, only one core is 100% busy while other three cores are very idle. After this patch, the usage of four cores are balanced well. Signed-off-by: Gao Feng --- v1: Initial patch include/uapi/linux/if_tunnel.h | 5 +- net/core/flow_dissector.c | 138 ++--- 2 files changed, 92 insertions(+), 51 deletions(-) diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h index 1046f55..dda4e4b 100644 --- a/include/uapi/linux/if_tunnel.h +++ b/include/uapi/linux/if_tunnel.h @@ -24,9 +24,12 @@ #define GRE_SEQ__cpu_to_be16(0x1000) #define GRE_STRICT __cpu_to_be16(0x0800) #define GRE_REC__cpu_to_be16(0x0700) -#define GRE_FLAGS __cpu_to_be16(0x00F8) +#define GRE_ACK__cpu_to_be16(0x0080) +#define GRE_FLAGS __cpu_to_be16(0x0078) #define GRE_VERSION__cpu_to_be16(0x0007) +#define GRE_PROTO_PPP __cpu_to_be16(0x880b) + struct ip_tunnel_parm { charname[IFNAMSIZ]; int link; diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 61ad43f..d95e060 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -346,63 +346,101 @@ ip_proto_again: hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr); if (!hdr) goto out_bad; - /* -* Only look inside GRE if version zero and no -* routing -*/ - if (hdr->flags & (GRE_VERSION | GRE_ROUTING)) - break; - proto = hdr->proto; - nhoff += 4; - if (hdr->flags & GRE_CSUM) + /* + * Only look inside GRE if version zero and no + * routing This comment is no longer true, GRE version 1 is being handled. + */ + if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) { + proto = hdr->proto; nhoff += 4; - if (hdr->flags & GRE_KEY) { - const __be32 *keyid; - __be32 _keyid; + if (hdr->flags & GRE_CSUM) + nhoff += 4; + if (hdr->flags & GRE_KEY) { + const __be32 *keyid; + __be32 _keyid; + + keyid = __skb_header_pointer(skb, nhoff, sizeof(_keyid), +data, hlen, &_keyid); + + if (!keyid) + goto out_bad; + + if (dissector_uses_key(flow_dissector, + FLOW_DISSECTOR_KEY_GRE_KEYID)) { + key_keyid = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_GRE_KEYID, + target_container); + key_keyid->keyid = *keyid; + } + nhoff += 4; + } + if (hdr->flags & GRE_SEQ) + nhoff += 4; + if (proto == htons(ETH_P_TEB)) { + const struct ethhdr *eth; + struct ethhdr _eth; + + eth = __skb_header_pointer(skb, nhoff, + sizeof(_eth), + data, hlen, &_eth); + if (!eth) + goto out_bad; + proto = eth->h_proto; + nhoff += sizeof(*eth); + + /* Cap headers that we access via pointers at the +* end of the Ethernet header as our maximum alignment +* at that point is only 2 bytes. +*/ + if (NET_IP_ALIGN) + hlen = nhoff; + } - keyid = __skb_header_pointer(skb,
Re: [PATCH 06/15] ethernet: hisilicon: hns: hns_dsaf_mac: add missing of_node_put after calling of_parse_phandle
在 2016/7/27 10:20, Peter Chen 写道: > of_node_put needs to be called when the device node which is got > from of_parse_phandle has finished using. > > Signed-off-by: Peter Chen> --- > drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c > b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c > index 3fb87e2..18d72ea 100644 > --- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c > +++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c > @@ -786,6 +786,7 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) > np = of_parse_phandle(mac_cb->dev->of_node, "phy-handle", > mac_cb->mac_id); > mac_cb->phy_dev = of_phy_find_device(np); > + of_node_put(np); > if (mac_cb->phy_dev) { > /* refcount is held by of_phy_find_device() >* if the phy_dev is found np is accessed in case of of_phy_find_device() returns a no null value, so it has to be moved after the dev_dbg log. > @@ -804,6 +805,7 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) > np = of_parse_phandle(to_of_node(mac_cb->fw_port), > "phy-handle", 0); > mac_cb->phy_dev = of_phy_find_device(np); > + of_node_put(np); > if (mac_cb->phy_dev) { > /* refcount is held by of_phy_find_device() >* if the phy_dev is found np is accessed in case of of_phy_find_device() returns a no null value, so it has to be moved after the dev_dbg log. Thanks, Yisen > @@ -813,9 +815,10 @@ static int hns_mac_get_info(struct hns_mac_cb *mac_cb) > mac_cb->mac_id, np->name); > } > > - syscon = syscon_node_to_regmap( > - of_parse_phandle(to_of_node(mac_cb->fw_port), > - "serdes-syscon", 0)); > + np = of_parse_phandle(to_of_node(mac_cb->fw_port), > + "serdes-syscon", 0); > + syscon = syscon_node_to_regmap(np); > + of_node_put(np); > if (IS_ERR_OR_NULL(syscon)) { > dev_err(mac_cb->dev, "serdes-syscon is needed!\n"); > return -EINVAL; >
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
On 2016-07-27 20:31, Al Viro wrote: On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote: > I highly expect both my issue and OP's issue to revolve not around > commit e5a4b0bb803b specifically, but around other code that no longer > behaves as expected because of it. Indeed, and that fault address rounding bug occurs two other times in arch/sparc/lib/user_fixup.c The mentioned patchwork patch should fix the bug and I'll get that into my sparc tree, merged, and queued up for -stable ASAP. Plausible for sparc, but I don't see similar __copy_to_user_inatomic() bugs in case of x86_64... I'm going to go ahead and say this is where my issue and the op's issue begin to branch apart from one another. He's seeing this on all incoming data, whereas i am only seeing it on ssl data and not on sun4v. At this point i would say data from my issue is only going to cloud this issue as they seem to be two completely different issues revolving around the same commit. If i come across any relevant data for x86_64 ill be sure to post it if this isn't resolved by then, but for now i'm going to refrain from submitting anything sparc related.
Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
Hi Tom, I think maybe it is not good idea to consolidate these codes with the version 0 path. Because I need to check the flags strictly like this "!(hdr->flags & (GRE_CSUM | GRE_ROUTING | GRE_STRICT | GRE_REC | GRE_FLAGS))". These flag must be zero with PPTP GRE according to the RFC, but original GRE only ask the "GRE_VERSION|GRE_ROUTING" must be zero. And the inner proto structures are different between with the original GRE and PPTP GRE. So I think it would be more awkful if consolidate these codes, unless we don't check the flags strictly. On Thu, Jul 28, 2016 at 8:24 AM, Feng Gaowrote: > Thanks Tom, I append some inline comments. > > BTW, forgive me reply the email by gmail because the original mail > server doesn't support kernel email format. > I will send the update later. > > Best Regards > Feng > > On Thu, Jul 28, 2016 at 8:04 AM, Tom Herbert wrote: >> >> On Wed, Jul 27, 2016 at 4:42 PM, wrote: >> > From: Gao Feng >> > >> > The PPTP is encapsulated by GRE header with that GRE_VERSION bits >> > must contain one. But current GRE RPS needs the GRE_VERSION must be >> > zero. So RPS does not work for PPTP traffic. >> > >> > In my test environment, there are four MIPS cores, and all traffic >> > are passed through by PPTP. As a result, only one core is 100% busy >> > while other three cores are very idle. After this patch, the usage >> > of four cores are balanced well. >> > >> > Signed-off-by: Gao Feng >> > --- >> > v1: Initial patch >> > >> > include/uapi/linux/if_tunnel.h | 5 +- >> > net/core/flow_dissector.c | 138 >> > ++--- >> > 2 files changed, 92 insertions(+), 51 deletions(-) >> > >> > diff --git a/include/uapi/linux/if_tunnel.h >> > b/include/uapi/linux/if_tunnel.h >> > index 1046f55..dda4e4b 100644 >> > --- a/include/uapi/linux/if_tunnel.h >> > +++ b/include/uapi/linux/if_tunnel.h >> > @@ -24,9 +24,12 @@ >> > #define GRE_SEQ__cpu_to_be16(0x1000) >> > #define GRE_STRICT __cpu_to_be16(0x0800) >> > #define GRE_REC__cpu_to_be16(0x0700) >> > -#define GRE_FLAGS __cpu_to_be16(0x00F8) >> > +#define GRE_ACK__cpu_to_be16(0x0080) >> > +#define GRE_FLAGS __cpu_to_be16(0x0078) >> > #define GRE_VERSION__cpu_to_be16(0x0007) >> > >> > +#define GRE_PROTO_PPP __cpu_to_be16(0x880b) >> > + >> > struct ip_tunnel_parm { >> > charname[IFNAMSIZ]; >> > int link; >> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c >> > index 61ad43f..d95e060 100644 >> > --- a/net/core/flow_dissector.c >> > +++ b/net/core/flow_dissector.c >> > @@ -346,63 +346,101 @@ ip_proto_again: >> > hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, >> > hlen, &_hdr); >> > if (!hdr) >> > goto out_bad; >> > - /* >> > -* Only look inside GRE if version zero and no >> > -* routing >> > -*/ >> > - if (hdr->flags & (GRE_VERSION | GRE_ROUTING)) >> > - break; >> > >> > - proto = hdr->proto; >> > - nhoff += 4; >> > - if (hdr->flags & GRE_CSUM) >> > + /* >> > + * Only look inside GRE if version zero and no >> > + * routing >> >> This comment is no longer true, GRE version 1 is being handled. > > Ok, I get it. Thanks. > >> >> > + */ >> > + if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) { >> > + proto = hdr->proto; >> > nhoff += 4; >> > - if (hdr->flags & GRE_KEY) { >> > - const __be32 *keyid; >> > - __be32 _keyid; >> > + if (hdr->flags & GRE_CSUM) >> > + nhoff += 4; >> > + if (hdr->flags & GRE_KEY) { >> > + const __be32 *keyid; >> > + __be32 _keyid; >> > + >> > + keyid = __skb_header_pointer(skb, nhoff, >> > sizeof(_keyid), >> > +data, hlen, >> > &_keyid); >> > + >> > + if (!keyid) >> > + goto out_bad; >> > + >> > + if (dissector_uses_key(flow_dissector, >> > + >> > FLOW_DISSECTOR_KEY_GRE_KEYID)) { >> > + key_keyid = >> > skb_flow_dissector_target(flow_dissector, >> > + >> >FLOW_DISSECTOR_KEY_GRE_KEYID, >> > + >> >
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
On Wed, Jul 27, 2016 at 04:45:43PM -0700, David Miller wrote: > > I highly expect both my issue and OP's issue to revolve not around > > commit e5a4b0bb803b specifically, but around other code that no longer > > behaves as expected because of it. > > Indeed, and that fault address rounding bug occurs two other times > in arch/sparc/lib/user_fixup.c > > The mentioned patchwork patch should fix the bug and I'll get that > into my sparc tree, merged, and queued up for -stable ASAP. Plausible for sparc, but I don't see similar __copy_to_user_inatomic() bugs in case of x86_64...
Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
Thanks Tom, I append some inline comments. BTW, forgive me reply the email by gmail because the original mail server doesn't support kernel email format. I will send the update later. Best Regards Feng On Thu, Jul 28, 2016 at 8:04 AM, Tom Herbertwrote: > > On Wed, Jul 27, 2016 at 4:42 PM, wrote: > > From: Gao Feng > > > > The PPTP is encapsulated by GRE header with that GRE_VERSION bits > > must contain one. But current GRE RPS needs the GRE_VERSION must be > > zero. So RPS does not work for PPTP traffic. > > > > In my test environment, there are four MIPS cores, and all traffic > > are passed through by PPTP. As a result, only one core is 100% busy > > while other three cores are very idle. After this patch, the usage > > of four cores are balanced well. > > > > Signed-off-by: Gao Feng > > --- > > v1: Initial patch > > > > include/uapi/linux/if_tunnel.h | 5 +- > > net/core/flow_dissector.c | 138 > > ++--- > > 2 files changed, 92 insertions(+), 51 deletions(-) > > > > diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h > > index 1046f55..dda4e4b 100644 > > --- a/include/uapi/linux/if_tunnel.h > > +++ b/include/uapi/linux/if_tunnel.h > > @@ -24,9 +24,12 @@ > > #define GRE_SEQ__cpu_to_be16(0x1000) > > #define GRE_STRICT __cpu_to_be16(0x0800) > > #define GRE_REC__cpu_to_be16(0x0700) > > -#define GRE_FLAGS __cpu_to_be16(0x00F8) > > +#define GRE_ACK__cpu_to_be16(0x0080) > > +#define GRE_FLAGS __cpu_to_be16(0x0078) > > #define GRE_VERSION__cpu_to_be16(0x0007) > > > > +#define GRE_PROTO_PPP __cpu_to_be16(0x880b) > > + > > struct ip_tunnel_parm { > > charname[IFNAMSIZ]; > > int link; > > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c > > index 61ad43f..d95e060 100644 > > --- a/net/core/flow_dissector.c > > +++ b/net/core/flow_dissector.c > > @@ -346,63 +346,101 @@ ip_proto_again: > > hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, > > hlen, &_hdr); > > if (!hdr) > > goto out_bad; > > - /* > > -* Only look inside GRE if version zero and no > > -* routing > > -*/ > > - if (hdr->flags & (GRE_VERSION | GRE_ROUTING)) > > - break; > > > > - proto = hdr->proto; > > - nhoff += 4; > > - if (hdr->flags & GRE_CSUM) > > + /* > > + * Only look inside GRE if version zero and no > > + * routing > > This comment is no longer true, GRE version 1 is being handled. Ok, I get it. Thanks. > > > + */ > > + if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) { > > + proto = hdr->proto; > > nhoff += 4; > > - if (hdr->flags & GRE_KEY) { > > - const __be32 *keyid; > > - __be32 _keyid; > > + if (hdr->flags & GRE_CSUM) > > + nhoff += 4; > > + if (hdr->flags & GRE_KEY) { > > + const __be32 *keyid; > > + __be32 _keyid; > > + > > + keyid = __skb_header_pointer(skb, nhoff, > > sizeof(_keyid), > > +data, hlen, > > &_keyid); > > + > > + if (!keyid) > > + goto out_bad; > > + > > + if (dissector_uses_key(flow_dissector, > > + > > FLOW_DISSECTOR_KEY_GRE_KEYID)) { > > + key_keyid = > > skb_flow_dissector_target(flow_dissector, > > + > > FLOW_DISSECTOR_KEY_GRE_KEYID, > > + > > target_container); > > + key_keyid->keyid = *keyid; > > + } > > + nhoff += 4; > > + } > > + if (hdr->flags & GRE_SEQ) > > + nhoff += 4; > > + if (proto == htons(ETH_P_TEB)) { > > + const struct ethhdr *eth; > > + struct ethhdr _eth; > > + > > + eth = __skb_header_pointer(skb, nhoff, > > + sizeof(_eth), > > + data, hlen, > > &_eth); > > +
Darlehen anbieten
Guter Tag zu Ihnen, Sie haben um finanzielle Unterstützung benötigen? ein berechtigtes Kredit für Zinsen nötig? Sie benötigen einen Business-Darlehen? Brauchen Sie noch einen Kredit, um ein Haus, Auto zu kaufen, zahlen Sie Ihre Rechnungen und Schulden? Brauchen Sie Geld Probleme zu lösen? Wenn so freundlich für ein Darlehen bei uns bewerben, bewerben Sie sich jetzt durch die folgenden Details und Rückkehr zu uns per E-Mail: cabo...@hotmail.com Die Namen sind: ... Adresse: ... Telefon: ... Darlehensbetrag erforderlich: ... Dauer: ... Beruf: ... Monatliches Einkommen Level: ... Nicht ... Geburtsdatum: Status: .. Land: . Postleitzahl: . Zweck: .. KONTAKTIERE UNS. cabo...@hotmail.com
Re: [PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
On Wed, Jul 27, 2016 at 4:42 PM,wrote: > From: Gao Feng > > The PPTP is encapsulated by GRE header with that GRE_VERSION bits > must contain one. But current GRE RPS needs the GRE_VERSION must be > zero. So RPS does not work for PPTP traffic. > > In my test environment, there are four MIPS cores, and all traffic > are passed through by PPTP. As a result, only one core is 100% busy > while other three cores are very idle. After this patch, the usage > of four cores are balanced well. > > Signed-off-by: Gao Feng > --- > v1: Initial patch > > include/uapi/linux/if_tunnel.h | 5 +- > net/core/flow_dissector.c | 138 > ++--- > 2 files changed, 92 insertions(+), 51 deletions(-) > > diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h > index 1046f55..dda4e4b 100644 > --- a/include/uapi/linux/if_tunnel.h > +++ b/include/uapi/linux/if_tunnel.h > @@ -24,9 +24,12 @@ > #define GRE_SEQ__cpu_to_be16(0x1000) > #define GRE_STRICT __cpu_to_be16(0x0800) > #define GRE_REC__cpu_to_be16(0x0700) > -#define GRE_FLAGS __cpu_to_be16(0x00F8) > +#define GRE_ACK__cpu_to_be16(0x0080) > +#define GRE_FLAGS __cpu_to_be16(0x0078) > #define GRE_VERSION__cpu_to_be16(0x0007) > > +#define GRE_PROTO_PPP __cpu_to_be16(0x880b) > + > struct ip_tunnel_parm { > charname[IFNAMSIZ]; > int link; > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c > index 61ad43f..d95e060 100644 > --- a/net/core/flow_dissector.c > +++ b/net/core/flow_dissector.c > @@ -346,63 +346,101 @@ ip_proto_again: > hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, > hlen, &_hdr); > if (!hdr) > goto out_bad; > - /* > -* Only look inside GRE if version zero and no > -* routing > -*/ > - if (hdr->flags & (GRE_VERSION | GRE_ROUTING)) > - break; > > - proto = hdr->proto; > - nhoff += 4; > - if (hdr->flags & GRE_CSUM) > + /* > + * Only look inside GRE if version zero and no > + * routing This comment is no longer true, GRE version 1 is being handled. > + */ > + if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) { > + proto = hdr->proto; > nhoff += 4; > - if (hdr->flags & GRE_KEY) { > - const __be32 *keyid; > - __be32 _keyid; > + if (hdr->flags & GRE_CSUM) > + nhoff += 4; > + if (hdr->flags & GRE_KEY) { > + const __be32 *keyid; > + __be32 _keyid; > + > + keyid = __skb_header_pointer(skb, nhoff, > sizeof(_keyid), > +data, hlen, > &_keyid); > + > + if (!keyid) > + goto out_bad; > + > + if (dissector_uses_key(flow_dissector, > + > FLOW_DISSECTOR_KEY_GRE_KEYID)) { > + key_keyid = > skb_flow_dissector_target(flow_dissector, > + > FLOW_DISSECTOR_KEY_GRE_KEYID, > + > target_container); > + key_keyid->keyid = *keyid; > + } > + nhoff += 4; > + } > + if (hdr->flags & GRE_SEQ) > + nhoff += 4; > + if (proto == htons(ETH_P_TEB)) { > + const struct ethhdr *eth; > + struct ethhdr _eth; > + > + eth = __skb_header_pointer(skb, nhoff, > + sizeof(_eth), > + data, hlen, &_eth); > + if (!eth) > + goto out_bad; > + proto = eth->h_proto; > + nhoff += sizeof(*eth); > + > + /* Cap headers that we access via pointers at > the > +* end of the Ethernet header as our maximum > alignment > +* at that point is only 2 bytes. > +*/ > + if
[PATCH 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
From: Gao FengThe PPTP is encapsulated by GRE header with that GRE_VERSION bits must contain one. But current GRE RPS needs the GRE_VERSION must be zero. So RPS does not work for PPTP traffic. In my test environment, there are four MIPS cores, and all traffic are passed through by PPTP. As a result, only one core is 100% busy while other three cores are very idle. After this patch, the usage of four cores are balanced well. Signed-off-by: Gao Feng --- v1: Initial patch include/uapi/linux/if_tunnel.h | 5 +- net/core/flow_dissector.c | 138 ++--- 2 files changed, 92 insertions(+), 51 deletions(-) diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h index 1046f55..dda4e4b 100644 --- a/include/uapi/linux/if_tunnel.h +++ b/include/uapi/linux/if_tunnel.h @@ -24,9 +24,12 @@ #define GRE_SEQ__cpu_to_be16(0x1000) #define GRE_STRICT __cpu_to_be16(0x0800) #define GRE_REC__cpu_to_be16(0x0700) -#define GRE_FLAGS __cpu_to_be16(0x00F8) +#define GRE_ACK__cpu_to_be16(0x0080) +#define GRE_FLAGS __cpu_to_be16(0x0078) #define GRE_VERSION__cpu_to_be16(0x0007) +#define GRE_PROTO_PPP __cpu_to_be16(0x880b) + struct ip_tunnel_parm { charname[IFNAMSIZ]; int link; diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c index 61ad43f..d95e060 100644 --- a/net/core/flow_dissector.c +++ b/net/core/flow_dissector.c @@ -346,63 +346,101 @@ ip_proto_again: hdr = __skb_header_pointer(skb, nhoff, sizeof(_hdr), data, hlen, &_hdr); if (!hdr) goto out_bad; - /* -* Only look inside GRE if version zero and no -* routing -*/ - if (hdr->flags & (GRE_VERSION | GRE_ROUTING)) - break; - proto = hdr->proto; - nhoff += 4; - if (hdr->flags & GRE_CSUM) + /* + * Only look inside GRE if version zero and no + * routing + */ + if (!(hdr->flags & (GRE_VERSION | GRE_ROUTING))) { + proto = hdr->proto; nhoff += 4; - if (hdr->flags & GRE_KEY) { - const __be32 *keyid; - __be32 _keyid; + if (hdr->flags & GRE_CSUM) + nhoff += 4; + if (hdr->flags & GRE_KEY) { + const __be32 *keyid; + __be32 _keyid; + + keyid = __skb_header_pointer(skb, nhoff, sizeof(_keyid), +data, hlen, &_keyid); + + if (!keyid) + goto out_bad; + + if (dissector_uses_key(flow_dissector, + FLOW_DISSECTOR_KEY_GRE_KEYID)) { + key_keyid = skb_flow_dissector_target(flow_dissector, + FLOW_DISSECTOR_KEY_GRE_KEYID, + target_container); + key_keyid->keyid = *keyid; + } + nhoff += 4; + } + if (hdr->flags & GRE_SEQ) + nhoff += 4; + if (proto == htons(ETH_P_TEB)) { + const struct ethhdr *eth; + struct ethhdr _eth; + + eth = __skb_header_pointer(skb, nhoff, + sizeof(_eth), + data, hlen, &_eth); + if (!eth) + goto out_bad; + proto = eth->h_proto; + nhoff += sizeof(*eth); + + /* Cap headers that we access via pointers at the +* end of the Ethernet header as our maximum alignment +* at that point is only 2 bytes. +*/ + if (NET_IP_ALIGN) + hlen = nhoff; + } - keyid = __skb_header_pointer(skb, nhoff, sizeof(_keyid), -data, hlen, &_keyid); + key_control->flags |= FLOW_DIS_ENCAPSULATION; + if (flags
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
From: alexmcwhir...@triadic.us Date: Wed, 27 Jul 2016 19:02:40 -0400 > On 2016-07-27 14:04, alexmcwhir...@triadic.us wrote: >> Just to add some more information to this, the corruption seems to >> effect ssh as well. >> Using a sun hme interface, occasionally upon an ssh connection it will >> refuse to authenticate a client with either password or cert >> authentication. Using wireshark to capture and decrypt the packets >> between the two machines, the data coming from the server seems good, >> but the data received by the server from the client is essentially >> garbage. Note that the client is sending valid data, but the server is >> corrupting it upon receipt. Closing the connection and starting a new >> one will remedy the login issue, but you do occasionally see >> corruption on the server side sporadically. >> So far this only seems to occur on incoming data, outgoing data seems >> fine. > > Also, there is another patch the references this commit on sparc64 at > least. > > https://patchwork.kernel.org/patch/9221895/ > > I highly expect both my issue and OP's issue to revolve not around > commit e5a4b0bb803b specifically, but around other code that no longer > behaves as expected because of it. Indeed, and that fault address rounding bug occurs two other times in arch/sparc/lib/user_fixup.c The mentioned patchwork patch should fix the bug and I'll get that into my sparc tree, merged, and queued up for -stable ASAP.
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
On 2016-07-27 14:04, alexmcwhir...@triadic.us wrote: Just to add some more information to this, the corruption seems to effect ssh as well. Using a sun hme interface, occasionally upon an ssh connection it will refuse to authenticate a client with either password or cert authentication. Using wireshark to capture and decrypt the packets between the two machines, the data coming from the server seems good, but the data received by the server from the client is essentially garbage. Note that the client is sending valid data, but the server is corrupting it upon receipt. Closing the connection and starting a new one will remedy the login issue, but you do occasionally see corruption on the server side sporadically. So far this only seems to occur on incoming data, outgoing data seems fine. Also, there is another patch the references this commit on sparc64 at least. https://patchwork.kernel.org/patch/9221895/ I highly expect both my issue and OP's issue to revolve not around commit e5a4b0bb803b specifically, but around other code that no longer behaves as expected because of it.
Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.
Chunhao Lin: [...] > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index 0e62d74..f07604f 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c > @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private > *tp) > static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo > *wol) > { > struct rtl8169_private *tp = netdev_priv(dev); > + struct pci_dev *pdev = tp->pci_dev; Nit: you may directly use "struct device *d = >pci_dev->dev;" -- Ueimor
Re: [PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.
Chunhao Lin: [...] > diff --git a/drivers/net/ethernet/realtek/r8169.c > b/drivers/net/ethernet/realtek/r8169.c > index 0e62d74..f07604f 100644 > --- a/drivers/net/ethernet/realtek/r8169.c > +++ b/drivers/net/ethernet/realtek/r8169.c [...] > @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, > struct ethtool_wolinfo *wol) > tp->features |= RTL_FEATURE_WOL; > else > tp->features &= ~RTL_FEATURE_WOL; > - __rtl8169_set_wol(tp, wol->wolopts); > + if (pm_runtime_active(>dev)) > + __rtl8169_set_wol(tp, wol->wolopts); > + else > + tp->saved_wolopts = wol->wolopts; > > rtl_unlock_work(tp); > > device_set_wakeup_enable(>pci_dev->dev, wol->wolopts); > > + pm_runtime_put_noidle(>dev); > + > return 0; Either the driver resumes the device so that it can perform requested operation or it signals .set_wol failure when the device is suspended. If the driver does something else, "spam removal" translates to "silent failure". -- Ueimor
Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
On Tue, 2016-07-26 at 11:14 +0200, Eric Dumazet wrote: > Could you try this ? > > diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c > b/drivers/net/ethernet/intel/e1000/e1000_main.c > index > f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a > 938b3820b 100644 > --- a/drivers/net/ethernet/intel/e1000/e1000_main.c > +++ b/drivers/net/ethernet/intel/e1000/e1000_main.c > @@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device > *netdev) > { > struct e1000_adapter *adapter = netdev_priv(netdev); > > - disable_irq(adapter->pdev->irq); > - e1000_intr(adapter->pdev->irq, netdev); > - enable_irq(adapter->pdev->irq); > + if (napi_schedule_prep(>napi)) { > + adapter->total_tx_bytes = 0; > + adapter->total_tx_packets = 0; > + adapter->total_rx_bytes = 0; > + adapter->total_rx_packets = 0; > + __napi_schedule(>napi); > + } > } > #endif > Since this fixes the issue Fengguang saw, will you be submitting a formal patch Eric? (please) I can get this queued up for Dave's net tree as soon as I receive the formal patch. signature.asc Description: This is a digitally signed message part
Re: [PATCH 05/15] ethernet: cavium: octeon: add missing of_node_put after calling of_parse_phandle
On 07/26/2016 07:20 PM, Peter Chen wrote: of_node_put needs to be called when the device node which is got from of_parse_phandle has finished using. Signed-off-by: Peter ChenNAK. --- drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c index e8bc15b..5eb9d8c 100644 --- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c +++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c @@ -960,6 +960,7 @@ static int octeon_mgmt_init_phy(struct net_device *netdev) phydev = of_phy_connect(netdev, p->phy_np, octeon_mgmt_adjust_link, 0, PHY_INTERFACE_MODE_MII); + of_node_put(p->phy_np); I don't think you can do this here. octeon_mgmt_init_phy() may be called multiple times in the life of the driver, so p->phy_np must remain valid. It may be appropriate to do the of_node_put() in the octeon_mgmt_remove() function. if (!phydev) return -ENODEV;
RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization
>> This is prepatory work for an expanding list of adapter families that have >> occasional ~10 hour clock jumps when being used for PTP. Factor out the >> sanitization function and convert to using a feature (bug) flag, per >> suggestion from Jesse Brandeburg. >> >> Littering functional code with device-specific checks is much messier than >> simply checking a flag, and having device-specific init set flags as needed. >> There are probably a number of other cases in the e1000e code that >> could/should be converted similarly. > > Looks ok to me. > Adding Chris who asked what happens if we reach the max retry counter > (E1000_MAX_82574_SYSTIM_REREAD)? > This counter is set to 50. > Can you, for testing purposes, decreased this value (or even set it to 0) and > see what happens? I have one of the affected NICs. I inserted some test code into the module to check the value of i after the for loop inside the sanitize function exits to check how many iterations of the loop were being run. I ran a PTP test setup for a few hours with E1000_MAX_82574_SYSTIM_REREAD=50, and the maximum number was 5 iterations. It looks like setting E1000_MAX_82574_SYSTIM_REREAD to 0 will just disable the return value checking and will have the same results as disabling the sanitization function. I'll set the max retry counter to 1 and run an overnight test to see what happens. Tim Woodford
[PATCH net-next 2/3] kcm: Use stream parser
Adapt KCM to use the stream parser. This mostly involves removing the RX handling and setting up the strparser using the interface. Signed-off-by: Tom Herbert--- include/net/kcm.h | 37 + net/kcm/Kconfig | 1 + net/kcm/kcmproc.c | 41 +++-- net/kcm/kcmsock.c | 454 -- 4 files changed, 102 insertions(+), 431 deletions(-) diff --git a/include/net/kcm.h b/include/net/kcm.h index 2840b58..2a89658 100644 --- a/include/net/kcm.h +++ b/include/net/kcm.h @@ -13,6 +13,7 @@ #include #include +#include #include extern unsigned int kcm_net_id; @@ -21,16 +22,8 @@ extern unsigned int kcm_net_id; #define KCM_STATS_INCR(stat) ((stat)++) struct kcm_psock_stats { - unsigned long long rx_msgs; - unsigned long long rx_bytes; unsigned long long tx_msgs; unsigned long long tx_bytes; - unsigned int rx_aborts; - unsigned int rx_mem_fail; - unsigned int rx_need_more_hdr; - unsigned int rx_msg_too_big; - unsigned int rx_msg_timeouts; - unsigned int rx_bad_hdr_len; unsigned long long reserved; unsigned long long unreserved; unsigned int tx_aborts; @@ -64,13 +57,6 @@ struct kcm_tx_msg { struct sk_buff *last_skb; }; -struct kcm_rx_msg { - int full_len; - int accum_len; - int offset; - int early_eaten; -}; - /* Socket structure for KCM client sockets */ struct kcm_sock { struct sock sk; @@ -87,6 +73,7 @@ struct kcm_sock { struct work_struct tx_work; struct list_head wait_psock_list; struct sk_buff *seq_skb; + u32 tx_stopped : 1; /* Don't use bit fields here, these are set under different locks */ bool tx_wait; @@ -104,11 +91,11 @@ struct bpf_prog; /* Structure for an attached lower socket */ struct kcm_psock { struct sock *sk; + struct strparser strp; struct kcm_mux *mux; int index; u32 tx_stopped : 1; - u32 rx_stopped : 1; u32 done : 1; u32 unattaching : 1; @@ -121,18 +108,12 @@ struct kcm_psock { struct kcm_psock_stats stats; /* Receive */ - struct sk_buff *rx_skb_head; - struct sk_buff **rx_skb_nextp; - struct sk_buff *ready_rx_msg; struct list_head psock_ready_list; - struct work_struct rx_work; - struct delayed_work rx_delayed_work; struct bpf_prog *bpf_prog; struct kcm_sock *rx_kcm; unsigned long long saved_rx_bytes; unsigned long long saved_rx_msgs; - struct timer_list rx_msg_timer; - unsigned int rx_need_bytes; + struct sk_buff *ready_rx_msg; /* Transmit */ struct kcm_sock *tx_kcm; @@ -146,6 +127,7 @@ struct kcm_net { struct mutex mutex; struct kcm_psock_stats aggregate_psock_stats; struct kcm_mux_stats aggregate_mux_stats; + struct strp_aggr_stats aggregate_strp_stats; struct list_head mux_list; int count; }; @@ -163,6 +145,7 @@ struct kcm_mux { struct kcm_mux_stats stats; struct kcm_psock_stats aggregate_psock_stats; + struct strp_aggr_stats aggregate_strp_stats; /* Receive */ spinlock_t rx_lock cacheline_aligned_in_smp; @@ -190,14 +173,6 @@ static inline void aggregate_psock_stats(struct kcm_psock_stats *stats, /* Save psock statistics in the mux when psock is being unattached. */ #define SAVE_PSOCK_STATS(_stat) (agg_stats->_stat += stats->_stat) - SAVE_PSOCK_STATS(rx_msgs); - SAVE_PSOCK_STATS(rx_bytes); - SAVE_PSOCK_STATS(rx_aborts); - SAVE_PSOCK_STATS(rx_mem_fail); - SAVE_PSOCK_STATS(rx_need_more_hdr); - SAVE_PSOCK_STATS(rx_msg_too_big); - SAVE_PSOCK_STATS(rx_msg_timeouts); - SAVE_PSOCK_STATS(rx_bad_hdr_len); SAVE_PSOCK_STATS(tx_msgs); SAVE_PSOCK_STATS(tx_bytes); SAVE_PSOCK_STATS(reserved); diff --git a/net/kcm/Kconfig b/net/kcm/Kconfig index 5db94d9..87fca36 100644 --- a/net/kcm/Kconfig +++ b/net/kcm/Kconfig @@ -3,6 +3,7 @@ config AF_KCM tristate "KCM sockets" depends on INET select BPF_SYSCALL + select STREAM_PARSER ---help--- KCM (Kernel Connection Multiplexor) sockets provide a method for multiplexing messages of a message based application diff --git a/net/kcm/kcmproc.c b/net/kcm/kcmproc.c index 16c2e03..c23f065 100644 --- a/net/kcm/kcmproc.c +++ b/net/kcm/kcmproc.c @@ -155,8 +155,8 @@ static void kcm_format_psock(struct kcm_psock *psock, struct seq_file *seq, seq_printf(seq, " psock-%-5u %-10llu %-16llu %-10llu %-16llu %-8d %-8d %-8d %-8d ", psock->index, - psock->stats.rx_msgs, - psock->stats.rx_bytes, + psock->strp.stats.rx_msgs, + psock->strp.stats.rx_bytes, psock->stats.tx_msgs,
[PATCH net-next 1/3] strparser: Stream parser for messages
This patch introduces a utility for parsing application layer protocol messages in a TCP stream. This is a generalization of the mechanism implemented of Kernel Connection Multiplexor. The API includes a context structure, a set of callbacks, utility functions, and a data ready function. A stream parser instance is defined by a strparse structure that is bound to a TCP socket. The function to initialize the structure is: int strp_init(struct strparser *strp, struct sock *csk, struct strp_callbacks *cb); csk is the TCP socket being bound to and cb are the parser callbacks. A parser is bound to a TCP socket by setting data_ready function to strp_tcp_data_ready so that all receive indications on the socket go through the parser. This is assumes that sk_user_data is set to the strparser structure. There are four callbacks. - parse_msg is called to parse the message (returns length or error). - rcv_msg is called when a complete message has been received - read_sock_done is called when data_ready function exits - abort_parser is called to abort the parser The input to parse_msg is an skbuff which contains next message under construction. The backend processing of parse_msg will parse the application layer protocol headers to determine the length of the message in the stream. The possible return values are: >0 : indicates length of successfully parsed message 0 : indicates more data must be received to parse the message -ESTRPIPE : current message should not be processed by the kernel, return control of the socket to userspace which can proceed to read the messages itself other < 0 : Error is parsing, give control back to userspace assuming that synchronzation is lost and the stream is unrecoverable (application expected to close TCP socket) In the case of error return (< 0) strparse will stop the parser and report and error to userspace. The application must deal with the error. To handle the error the strparser is unbound from the TCP socket. If the error indicates that the stream TCP socket is at recoverable point (ESTRPIPE) then the application can read the TCP socket to process the stream. Once the application has dealt with the exceptions in the stream, it may again bind the socket to a strparser to continue data operations. Note that ENODATA may be returned to the application. In this case parse_msg returned -ESTRPIPE, however strparser was unable to maintain synchronization of the stream (i.e. some of the message in question was already read by the parser). strp_pause and strp_unpause are used to provide flow control. For instance, if rcv_msg is called but the upper layer can't immediately consume the message it can hold the message and pause strparser. Signed-off-by: Tom Herbert--- include/net/strparser.h | 147 ++ net/Kconfig | 1 + net/Makefile | 1 + net/strparser/Kconfig | 4 + net/strparser/Makefile| 1 + net/strparser/strparser.c | 492 ++ 6 files changed, 646 insertions(+) create mode 100644 include/net/strparser.h create mode 100644 net/strparser/Kconfig create mode 100644 net/strparser/Makefile create mode 100644 net/strparser/strparser.c diff --git a/include/net/strparser.h b/include/net/strparser.h new file mode 100644 index 000..42b964d --- /dev/null +++ b/include/net/strparser.h @@ -0,0 +1,147 @@ +/* + * Stream Parser + * + * Copyright (c) 2016 Tom Herbert + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 + * as published by the Free Software Foundation. + */ + +#ifndef __NET_STRPARSER_H_ +#define __NET_STRPARSER_H_ + +#include +#include + +#define STRP_STATS_ADD(stat, count) ((stat) += (count)) +#define STRP_STATS_INCR(stat) ((stat)++) + +struct strp_stats { + unsigned long long rx_msgs; + unsigned long long rx_bytes; + unsigned int rx_mem_fail; + unsigned int rx_need_more_hdr; + unsigned int rx_msg_too_big; + unsigned int rx_msg_timeouts; + unsigned int rx_bad_hdr_len; +}; + +struct strp_aggr_stats { + unsigned long long rx_msgs; + unsigned long long rx_bytes; + unsigned int rx_mem_fail; + unsigned int rx_need_more_hdr; + unsigned int rx_msg_too_big; + unsigned int rx_msg_timeouts; + unsigned int rx_bad_hdr_len; + unsigned int rx_aborts; + unsigned int rx_interrupted; + unsigned int rx_unrecov_intr; +}; + +struct strparser; + +/* Callbacks are called with lock held for the attached socket */ +struct strp_callbacks { + int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); + void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); + int (*read_sock_done)(struct strparser *strp, int err); + void (*abort_parser)(struct strparser *strp, int
[PATCH net-next 0/3] strp: Stream parser for messages
This patch set introduces a utility for parsing application layer protocol messages in a TCP stream. This is a generalization of the mechanism implemented of Kernel Connection Multiplexor. This patch set adapts KCM to use the strparser. We expect that kTLS can use this mechanism also. RDS would probably be another candidate to use a common stream parsing mechanism. The API includes a context structure, a set of callbacks, utility functions, and a data ready function. The callbacks include a parse_msg function that is called to perform parsing (e.g. BPF parsing in case of KCM), and a rcv_msg function that is called when a full message has been completed. For strparser we specify the return codes from the parser to allow the backend to indicate that control of the socket should be transferred back to userspace to handle some exceptions in the stream: The return values are: >0 : indicates length of successfully parsed message 0 : indicates more data must be received to parse the message -ESTRPIPE : current message should not be processed by the kernel, return control of the socket to userspace which can proceed to read the messages itself other < 0 : Error is parsing, give control back to userspace assuming that synchronization is lost and the stream is unrecoverable (application expected to close TCP socket) There is one issue I haven't been able to fully resolve. If parse_msg returns ESTRPIPE (wants control back to userspace) the parser may already have consumed some bytes of the message. There is no way to put bytes back into the TCP receive queue and tcp_read_sock does not allow an easy way to peek messages. In lieu of a better solution, we return ENODATA on the socket to indicate that the data stream is unrecoverable (application needs to close socket). This condition should only happen if an application layer message header is split across two skbuffs and parsing just the first skbuff wasn't sufficient to determine the that transfer to userspace is needed. This patch set contains: - strparser implementation - changes to kcm to use strparser - strparser.txt documentation Tested: - Ran a KCM thrash test for 24 hours. No behavioral or performance differences observed. Tom Herbert (3): strparser: Stream parser for messages kcm: Use stream parser strparser: Documentation Documentation/networking/strparser.txt | 147 ++ include/net/kcm.h | 37 +-- include/net/strparser.h| 147 ++ net/Kconfig| 1 + net/Makefile | 1 + net/kcm/Kconfig| 1 + net/kcm/kcmproc.c | 41 ++- net/kcm/kcmsock.c | 454 +- net/strparser/Kconfig | 4 + net/strparser/Makefile | 1 + net/strparser/strparser.c | 492 + 11 files changed, 895 insertions(+), 431 deletions(-) create mode 100644 Documentation/networking/strparser.txt create mode 100644 include/net/strparser.h create mode 100644 net/strparser/Kconfig create mode 100644 net/strparser/Makefile create mode 100644 net/strparser/strparser.c -- 2.8.0.rc2
[PATCH net-next 3/3] strparser: Documentation
Signed-off-by: Tom Herbert--- Documentation/networking/strparser.txt | 147 + 1 file changed, 147 insertions(+) create mode 100644 Documentation/networking/strparser.txt diff --git a/Documentation/networking/strparser.txt b/Documentation/networking/strparser.txt new file mode 100644 index 000..abfef72 --- /dev/null +++ b/Documentation/networking/strparser.txt @@ -0,0 +1,147 @@ +Stream Parser +- + +The stream parser (strparser) is a utility that parses messages of an +application layer protocol running over a TCP connection. The stream +parser works in conjunction with an upper layer in the kernel to provide +kernel support for application layer messages. For instance, Kernel +Connection Multiplexor (KCM) uses the Stream Parser to parse messages +using a BPF program. + +Interface +- + +The API includes a context structure, a set of callbacks, utility +functions, and a data_ready function. The callbacks include +a parse_msg function that is called to perform parsing (e.g. +BPF parsing in case of KCM), and a rcv_msg function that is called +when a full message has been completed. + +A stream parser can be instantiated for a TCP connection. This is done +by: + +strp_init(struct strparser *strp, struct sock *csk, + struct strp_callbacks *cb) + +strp is a struct of type strparser that is allocated by the upper layer. +csk is the TCP socket associated with the stream parser. Callbacks are +called by the stream parser. + +Callbacks +- + +There are four callbacks: + +int (*parse_msg)(struct strparser *strp, struct sk_buff *skb); + +parse_msg is called to determine the length of the next message +in the stream. The upper layer must implement this function. It +should parse the sk_buff as containing the headers for the +next application layer messages in the stream. + +The skb->cb in the input skb is a struct strp_rx_msg. Only +the offset field is relevant in parse_msg and gives the offset +where the message starts in the skb. + +The return values of this function are: + +>0 : indicates length of successfully parsed message +0 : indicates more data must be received to parse the message +-ESTRPIPE : current message should not be processed by the + kernel, return control of the socket to userspace which + can proceed to read the messages itself +other < 0 : Error is parsing, give control back to userspace + assuming that synchronization is lost and the stream + is unrecoverable (application expected to close TCP socket) + +In the case that an error is returned (return value is less than +zero) the stream parser will set the error on TCP socket and wake +it up. If parse_msg returned -ESTRPIPE and the stream parser had +previously read some bytes for the current message, then the error +set on the attached socket is ENODATA since the stream is +unrecoverable in that case. + +void (*rcv_msg)(struct strparser *strp, struct sk_buff *skb); + +rcv_msg is called when a full message has been received and +is queued. The callee must consume the sk_buff; it can +call strp_pause to prevent any further messages from being +received in rcv_msg (see strp_pause below). This callback +must be set. + +The skb->cb in the input skb is a struct strp_rx_msg. This +struct contains two fields: offset and full_len. Offset is +where the message starts in the skb, and full_len is the +the length of the message. skb->len - offset may be greater +then full_len since strparser does not trim the skb. + +int (*read_sock_done)(struct strparser *strp, int err); + + read_sock_done is called when the stream parser is done reading + the TCP socket. The stream parser may read multiple messages + in a loop and this function allows cleanup to occur when existing + the loop. If the callback is not set (NULL in strp_init) a + default function is used. + +void (*abort_parser)(struct strparser *strp, int err); + + This function is called when stream parser encounters an error + in parsing. The default function stops the stream parser for the + TCP socket and sets the error in the socket. The default function + can be changed by setting the callback to non-NULL in strp_init. + +data_ready +-- + +strp_tcp_data_ready is the TCP data ready function that can be used with +the stream parser. When the upper layer attaches the TCP socket it +can set sk_data_ready to this function (after strp_init). sk_user_data +must be set to the stream parser structure to use this (container_of +can be used in the other socket callbacks to get the upper layer +structure from the stream parser structure). + +Functions +- + +The upper layer uses strp_pause and strp_unpause to flow control the +stream parser. This is needed for instance when the upper layer can't +immediately process a
Re: linux-next: manual merge of the net-next tree with the arm-soc tree
On Tue, Jul 26, 2016 at 7:53 PM, Stephen Rothwellwrote: > Hi all, > > Today's linux-next merge of the net-next tree got a conflict in: > > arch/arm64/boot/dts/apm/apm-shadowcat.dtsi > > between commit: > > cafc4cd0c8b8 ("arm64: dts: apm: Use lowercase consistently for hex > constants") > > from the arm-soc tree and commit: > > 8e694cd2762c ("dtb: xgene: Add MDIO node") > > from the net-next tree. > > I fixed it up (see below) and can carry the fix as necessary. This > is now fixed as far as linux-next is concerned, but any non trivial > conflicts should be mentioned to your upstream maintainer when your tree > is submitted for merging. You may also want to consider cooperating > with the maintainer of the conflicting tree to minimise any particularly > complex conflicts. Thanks for taking care of this, Stephen. The fix looks fine. > > -- > Cheers, > Stephen Rothwell > > diff --cc arch/arm64/boot/dts/apm/apm-shadowcat.dtsi > index 21028b145d91,2e1e5daa1dc7.. > --- a/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi > +++ b/arch/arm64/boot/dts/apm/apm-shadowcat.dtsi > @@@ -628,9 -636,9 +636,9 @@@ > sgenet0: ethernet@1f61 { > compatible = "apm,xgene2-sgenet"; > status = "disabled"; > - reg = <0x0 0x1f61 0x0 0x1>, > + reg = <0x0 0x1f61 0x0 0xd100>, > -<0x0 0x1f60 0x0 0Xd100>, > -<0x0 0x2000 0x0 0X2>; > +<0x0 0x1f60 0x0 0xd100>, > +<0x0 0x2000 0x0 0x2>; > interrupts = <0 96 4>, > <0 97 4>; > dma-coherent; Regards, Duc Dang.
Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV
Hi Eric, > On Jul 27, 2016, at 14:59, Eric Dumazetwrote: > > On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote: >> On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote: >>> On Jul 26, 2016, at 11:43, J. Bruce Fields wrote: On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote: > We're seeing traces of the following form: > > [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2 > [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80 > [10952.396362] nfsd: connect from 10.2.6.1, port=187 > [10952.396364] svc: svc_setup_socket 8800b99bcf00 > [10952.396368] setting up TCP socket for reading > [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet > 88042b75b800) > [10952.396373] svc: transport 8803eb10a000 put into queue > [10952.396375] svc: transport 88042ba4a000 put into queue > [10952.396377] svc: server 8800bb0ec000 waiting for data (to = > 360) > [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2 > [10952.396381] svc_recv: found XPT_CLOSE > [10952.396397] svc: svc_delete_xprt(8803eb10a000) > [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000) > [10952.396399] svc: svc_sock_detach(8803eb10a000) > [10952.396412] svc: svc_sock_free(8803eb10a000) > > i.e. an immediate close of the socket after initialisation. Interesting, thanks! So the one thing I don't understand is why this is correct behavior for accept--I thought it wasn't supposed to return a socket until it was fully established. >>> >>> inet_accept() appears to allow SYN_RECV: >> >> OK. Cc'ing netdev just to make sure we didn't overlook anything. >> > > SYN_RECV after accept() is a TCP Fast Open property I think. > > Maybe you are playing with some global TCP Fast Open settings ? > The Linux kernel client should not be using TCP fast open, but it is possible that some of the other NFSv3 clients we’re using are. Would a standard knfsd listener respond to a TCP fast open request, or would the default behaviour be to ignore it? If the default behaviour for the server is to allow fast open, then we do need these patches, IMO. > >> (Also: what were user-visible symptoms? Mounts failing, or unexpected >> delays?) >> Connection retry storms on the server. >> --b. >> >>> >>> int inet_accept(struct socket *sock, struct socket *newsock, int flags) >>> { >>>struct sock *sk1 = sock->sk; >>>int err = -EINVAL; >>>struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, ); >>> >>>if (!sk2) >>>goto do_err; >>> >>>lock_sock(sk2); >>> >>>sock_rps_record_flow(sk2); >>>WARN_ON(!((1 << sk2->sk_state) & >>> (TCPF_ESTABLISHED | TCPF_SYN_RECV | >>> TCPF_CLOSE_WAIT | TCPF_CLOSE))); >>> >>>sock_graft(sk2, newsock); >>> >>>newsock->state = SS_CONNECTED; >>>err = 0; >>>release_sock(sk2); >>> do_err: >>>return err; >>> } > >
Re: PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)
On Wed, 2016-07-27 at 18:19 +, Brandon Cazander wrote: > [1.] One line summary of the problem: > Using TPROXY together with a DNAT rule (working on older kernels) fails to > work on newer kernels as of commit 079096f103fa > > [2.] Full description of the problem/report: > I performed a git bisect using a qemu image to test my example below, and the > bisect ended at this commit: > > > commit 079096f103faca2dd87342cca6f23d4b34da8871 > > Author: Eric Dumazet> > Date: Fri Oct 2 11:43:32 2015 -0700 > > > > tcp/dccp: install syn_recv requests into ehash table > > [3.] Keywords: networking > > [4.] Kernel information > [4.1.] Kernel version (from /proc/version): > Everything as of commit 079096f103fa (tested up to 4.5.0) > > [4.2.] Kernel .config file: > When performing the bisect, I built with make oldconfig. Let me know if you > want the whole .config file. > > [5.] Most recent kernel version which did not have the bug: > Any kernel that I built prior to commit > 079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue. > > [6.] no Oops > > [7.] A small shell script or example program which triggers the > problem (if possible) > > I have produced what I hope is a minimal example, using the instructions for > TPROXY from > http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an > example transparent TCP proxy written in C that I found at > https://github.com/kristrev/tproxy-example. > > * I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and > 192.168.30.2/24 on eth1. This is running the tproxy-example program, with the > following rules: > iptables -t mangle -N DIVERT > iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT > iptables -t mangle -A DIVERT -j MARK --set-mark 1 > iptables -t mangle -A DIVERT -j ACCEPT > iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY > --tproxy-mark 0x1/0x1 --on-port 9876 > iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest > 192.168.30.1 > ip rule add fwmark 1 lookup 100 > ip route add local 0.0.0.0/0 dev lo table 100 > > * There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on > port 8080. > > * My workstation is at 10.100.0.206, and I have a static route for both > 192.168.30.2 and 42.0.1.1 via 10.100.0.164. > > * Making a curl request to 192.168.30.2:8080 hits the transparent proxy and > works in both GOOD (before the aforementioned commit) kernel, and BAD (at the > commit or later) kernel. > > * Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works > in GOOD kernel but in BAD kernel I get: > "curl: (56) Recv failure: Connection reset by peer" > > * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel > shows: > root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 > bytes > 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq > 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr > 0,nop,wscale 7], length 0 > 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq > 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 > ecr 632068656,nop,wscale 7], length 0 > 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, > win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0 > 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq > 4042636217, win 0, length 0 > 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq > 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77 > 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq > 4042636217, win 0, length 0 > 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq > 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 > ecr 632068656,nop,wscale 7], length 0 > 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq > 3793582217, win 0, length 0 > 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq > 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 > ecr 632068656,nop,wscale 7], length 0 > 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq > 3793582217, win 0, length 0 > > * A tcpdump on a GOOD kernel shows: > root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080 > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 > bytes > 16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq > 3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr
Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV
On Wed, 2016-07-27 at 14:48 -0400, Fields Bruce James wrote: > On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote: > > > > > On Jul 26, 2016, at 11:43, J. Bruce Fieldswrote: > > > > > > On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote: > > >> We're seeing traces of the following form: > > >> > > >> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2 > > >> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80 > > >> [10952.396362] nfsd: connect from 10.2.6.1, port=187 > > >> [10952.396364] svc: svc_setup_socket 8800b99bcf00 > > >> [10952.396368] setting up TCP socket for reading > > >> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet > > >> 88042b75b800) > > >> [10952.396373] svc: transport 8803eb10a000 put into queue > > >> [10952.396375] svc: transport 88042ba4a000 put into queue > > >> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = > > >> 360) > > >> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2 > > >> [10952.396381] svc_recv: found XPT_CLOSE > > >> [10952.396397] svc: svc_delete_xprt(8803eb10a000) > > >> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000) > > >> [10952.396399] svc: svc_sock_detach(8803eb10a000) > > >> [10952.396412] svc: svc_sock_free(8803eb10a000) > > >> > > >> i.e. an immediate close of the socket after initialisation. > > > > > > Interesting, thanks! > > > > > > So the one thing I don't understand is why this is correct behavior for > > > accept--I thought it wasn't supposed to return a socket until it was > > > fully established. > > > > inet_accept() appears to allow SYN_RECV: > > OK. Cc'ing netdev just to make sure we didn't overlook anything. > SYN_RECV after accept() is a TCP Fast Open property I think. Maybe you are playing with some global TCP Fast Open settings ? > (Also: what were user-visible symptoms? Mounts failing, or unexpected > delays?) > > --b. > > > > > int inet_accept(struct socket *sock, struct socket *newsock, int flags) > > { > > struct sock *sk1 = sock->sk; > > int err = -EINVAL; > > struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, ); > > > > if (!sk2) > > goto do_err; > > > > lock_sock(sk2); > > > > sock_rps_record_flow(sk2); > > WARN_ON(!((1 << sk2->sk_state) & > > (TCPF_ESTABLISHED | TCPF_SYN_RECV | > > TCPF_CLOSE_WAIT | TCPF_CLOSE))); > > > > sock_graft(sk2, newsock); > > > > newsock->state = SS_CONNECTED; > > err = 0; > > release_sock(sk2); > > do_err: > > return err; > > }
PROBLEM: TPROXY and DNAT broken (bisected to 079096f103fa)
[1.] One line summary of the problem: Using TPROXY together with a DNAT rule (working on older kernels) fails to work on newer kernels as of commit 079096f103fa [2.] Full description of the problem/report: I performed a git bisect using a qemu image to test my example below, and the bisect ended at this commit: > commit 079096f103faca2dd87342cca6f23d4b34da8871 > Author: Eric Dumazet> Date: Fri Oct 2 11:43:32 2015 -0700 > > tcp/dccp: install syn_recv requests into ehash table [3.] Keywords: networking [4.] Kernel information [4.1.] Kernel version (from /proc/version): Everything as of commit 079096f103fa (tested up to 4.5.0) [4.2.] Kernel .config file: When performing the bisect, I built with make oldconfig. Let me know if you want the whole .config file. [5.] Most recent kernel version which did not have the bug: Any kernel that I built prior to commit 079096f103faca2dd87342cca6f23d4b34da8871 did not have this issue. [6.] no Oops [7.] A small shell script or example program which triggers the problem (if possible) I have produced what I hope is a minimal example, using the instructions for TPROXY from http://lxr.linux.no/#linux+v3.10/Documentation/networking/tproxy.txt and an example transparent TCP proxy written in C that I found at https://github.com/kristrev/tproxy-example. * I have a machine ("ROUTER") with 10.100.0.164/24 on eth0, and 192.168.30.2/24 on eth1. This is running the tproxy-example program, with the following rules: iptables -t mangle -N DIVERT iptables -t mangle -A PREROUTING -p tcp -m socket -j DIVERT iptables -t mangle -A DIVERT -j MARK --set-mark 1 iptables -t mangle -A DIVERT -j ACCEPT iptables -t mangle -A PREROUTING -p tcp --dport 8080 -j TPROXY --tproxy-mark 0x1/0x1 --on-port 9876 iptables -t nat -I PREROUTING -i eth0 -d 42.0.1.1 -j DNAT --to-dest 192.168.30.1 ip rule add fwmark 1 lookup 100 ip route add local 0.0.0.0/0 dev lo table 100 * There is a machine ("WEBSERVER") at 192.168.30.1/24 hosting a webserver on port 8080. * My workstation is at 10.100.0.206, and I have a static route for both 192.168.30.2 and 42.0.1.1 via 10.100.0.164. * Making a curl request to 192.168.30.2:8080 hits the transparent proxy and works in both GOOD (before the aforementioned commit) kernel, and BAD (at the commit or later) kernel. * Making a curl request to 42.0.1.1:8080 hits the transparent proxy and works in GOOD kernel but in BAD kernel I get: "curl: (56) Recv failure: Connection reset by peer" * When it fails, no traffic hits the WEBSERVER. A tcpdump on the bad kernel shows: root@dons-qemu-new-kernel:~# tcpdump -niany tcp and port 8080 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 16:42:31.551952 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [S], seq 3793582216, win 29200, options [mss 1460,sackOK,TS val 632068656 ecr 0,nop,wscale 7], length 0 16:42:31.551988 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745382 ecr 632068656,nop,wscale 7], length 0 16:42:31.55 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [.], ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 0 16:42:31.552238 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0 16:42:31.552246 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [P.], seq 1:78, ack 1, win 229, options [nop,nop,TS val 632068657 ecr 745382], length 77 16:42:31.552251 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [R], seq 4042636217, win 0, length 0 16:42:32.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 745632 ecr 632068656,nop,wscale 7], length 0 16:42:32.551925 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0 16:42:34.551668 IP 42.0.1.1.8080 > 10.100.0.206.35562: Flags [S.], seq 4042636216, ack 3793582217, win 28960, options [mss 1460,sackOK,TS val 746132 ecr 632068656,nop,wscale 7], length 0 16:42:34.551995 IP 10.100.0.206.35562 > 42.0.1.1.8080: Flags [R], seq 3793582217, win 0, length 0 * A tcpdump on a GOOD kernel shows: root@dons-qemu-old-kernel:~# tcpdump -niany tcp and port 8080 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes 16:44:18.364537 IP 10.100.0.206.35996 > 42.0.1.1.8080: Flags [S], seq 3963646692, win 29200, options [mss 1460,sackOK,TS val 632175966 ecr 0,nop,wscale 7], length 0 16:44:18.364571 IP 42.0.1.1.8080 > 10.100.0.206.35996: Flags [S.], seq 4117262662, ack 3963646693, win 14480, options [mss 1460,sackOK,TS val 4294903654 ecr 632175966,nop,wscale 7], length 0 16:44:18.364819 IP 10.100.0.206.35996 >
Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
On Wed, 2016-07-27 at 23:01 +0800, Fengguang Wu wrote: > On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote: > >On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote: > >> Hi Eric, > >> > >> It works! > >> > >> On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote: > >> >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote: > >> >> Greetings, > >> >> > >> >> This BUG message can be found in recent kernels as well as v4.4 and > >> >> linux-stable. It happens when running > >> >> > >> >> modprobe netconsole netconsole=@/,$port@$server/ > >> >> > >> >> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server > >> >> 192.168.1.1 offset -673.833841 sec > >> >> [ 39.943285] netpoll: netconsole: local port 6665 > >> >> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0 > >> >> [ 39.943609] netpoll: netconsole: interface 'eth0' > >> >> [ 39.943756] netpoll: netconsole: remote port 6672 > >> >> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1 > >> >> [ 39.944099] netpoll: netconsole: remote ethernet address > >> >> ff:ff:ff:ff:ff:ff > >> >> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193 > >> >> [ 39.944514] BUG: sleeping function called from invalid context at > >> >> kernel/irq/manage.c:110 > >> >> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: > >> >> modprobe > >> >> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted > >> >> 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102 > >> >> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS > >> >> BHZ7710H.86A.0097.2012.1228.1346 12/28/2012 > >> >> [ 39.944522] c90001f2f9e8 813417d9 > >> >> 88007faba5c0 > >> >> [ 39.944524] 006e c90001f2fa00 810aec03 > >> >> 81a25948 > >> >> [ 39.944525] c90001f2fa28 810aec9a 8803e5bd9400 > >> >> 8803e50fbd68 > >> >> [ 39.944526] Call Trace: > >> >> [ 39.944533] [] dump_stack+0x63/0x8a > >> >> [ 39.944536] [] ___might_sleep+0xd3/0x120 > >> >> [ 39.944537] [] __might_sleep+0x4a/0x80 > >> >> [ 39.944541] [] synchronize_irq+0x38/0xa0 > >> >> [ 39.944543] [] ? __irq_put_desc_unlock+0x1e/0x40 > >> >> [ 39.944545] [] ? __disable_irq_nosync+0x43/0x60 > >> >> [ 39.944547] [] disable_irq+0x1c/0x20 > >> >> [ 39.944559] [] e1000_netpoll+0xf2/0x120 [e1000e] > >> >> [ 39.944563] [] netpoll_poll_dev+0x5c/0x1a0 > >> >> [ 39.944567] [] ? __kmalloc_reserve+0x31/0x90 > >> >> [ 39.944569] [] netpoll_send_skb_on_dev+0x16b/0x250 > >> >> [ 39.944572] [] netpoll_send_udp+0x2ec/0x450 > >> >> [ 39.944576] [] write_msg+0xb2/0xf0 [netconsole] > >> >> [ 39.944578] [] call_console_drivers+0x115/0x120 > >> >> [ 39.944580] [] console_unlock+0x333/0x5c0 > >> >> [ 39.944583] [] register_console+0x1c4/0x380 > >> >> [ 39.944586] [] init_netconsole+0x1c5/0x1000 > >> >> [netconsole] > >> >> [ 39.944588] [] ? 0xa004f000 > >> >> [ 39.944591] [] do_one_initcall+0x3d/0x150 > >> >> [ 39.944592] [] ? __might_sleep+0x4a/0x80 > >> >> [ 39.944596] [] ? > >> >> kmem_cache_alloc_trace+0x188/0x1e0 > >> >> [ 39.944598] [] do_init_module+0x5f/0x1d8 > >> >> [ 39.944602] [] load_module+0x1429/0x1b40 > >> >> [ 39.944604] [] ? __symbol_put+0x40/0x40 > >> >> [ 39.944607] [] ? kernel_read_file+0x178/0x1a0 > >> >> [ 39.944608] [] ? > >> >> kernel_read_file_from_fd+0x49/0x80 > >> >> [ 39.944611] [] SYSC_finit_module+0xc3/0xf0 > >> >> [ 39.944614] [] SyS_finit_module+0xe/0x10 > >> >> [ 39.944617] [] entry_SYSCALL_64_fastpath+0x1a/0xa9 > >> >> [ 39.946384] console [netcon0] enabled > >> >> [ 39.946514] netconsole: network logging started > >> >> > >> >> Can this be possibly fixed? > >> > > >> >Could you try this ? > >> > > >> >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c > >> >b/drivers/net/ethernet/intel/e1000/e1000_main.c > >> >index > >> >f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b > >> > 100644 > >> >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c > >> >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c > >> >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device > >> >*netdev) > >> > { > >> > struct e1000_adapter *adapter = netdev_priv(netdev); > >> > > >> >- disable_irq(adapter->pdev->irq); > >> >- e1000_intr(adapter->pdev->irq, netdev); > >> >- enable_irq(adapter->pdev->irq); > >> >+ if (napi_schedule_prep(>napi)) { > >> >+ adapter->total_tx_bytes = 0; > >> >+ adapter->total_tx_packets = 0; > >> >+ adapter->total_rx_bytes = 0; > >> >+ adapter->total_rx_packets = 0; > >> >+ __napi_schedule(>napi); > >> >+ } > >> > >> The machines are actually running e1000e driver, so I copied your > >> approach to e1000e and it works: > >> > >> kern :info : [ 16.109647] netpoll: netconsole: local port 6665 > >> kern :info : [ 16.109961] netpoll: netconsole: local IPv4
Re: [PATCH 1/2] SUNRPC: accept() may return sockets that are still in SYN_RECV
On Tue, Jul 26, 2016 at 04:08:29PM +, Trond Myklebust wrote: > > > On Jul 26, 2016, at 11:43, J. Bruce Fieldswrote: > > > > On Tue, Jul 26, 2016 at 09:51:19AM -0400, Trond Myklebust wrote: > >> We're seeing traces of the following form: > >> > >> [10952.396347] svc: transport 88042ba4a 000 dequeued, inuse=2 > >> [10952.396351] svc: tcp_accept 88042ba4 a000 sock 88042a6e4c80 > >> [10952.396362] nfsd: connect from 10.2.6.1, port=187 > >> [10952.396364] svc: svc_setup_socket 8800b99bcf00 > >> [10952.396368] setting up TCP socket for reading > >> [10952.396370] svc: svc_setup_socket created 8803eb10a000 (inet > >> 88042b75b800) > >> [10952.396373] svc: transport 8803eb10a000 put into queue > >> [10952.396375] svc: transport 88042ba4a000 put into queue > >> [10952.396377] svc: server 8800bb0ec000 waiting for data (to = 360) > >> [10952.396380] svc: transport 8803eb10a000 dequeued, inuse=2 > >> [10952.396381] svc_recv: found XPT_CLOSE > >> [10952.396397] svc: svc_delete_xprt(8803eb10a000) > >> [10952.396398] svc: svc_tcp_sock_detach(8803eb10a000) > >> [10952.396399] svc: svc_sock_detach(8803eb10a000) > >> [10952.396412] svc: svc_sock_free(8803eb10a000) > >> > >> i.e. an immediate close of the socket after initialisation. > > > > Interesting, thanks! > > > > So the one thing I don't understand is why this is correct behavior for > > accept--I thought it wasn't supposed to return a socket until it was > > fully established. > > inet_accept() appears to allow SYN_RECV: OK. Cc'ing netdev just to make sure we didn't overlook anything. (Also: what were user-visible symptoms? Mounts failing, or unexpected delays?) --b. > > int inet_accept(struct socket *sock, struct socket *newsock, int flags) > { > struct sock *sk1 = sock->sk; > int err = -EINVAL; > struct sock *sk2 = sk1->sk_prot->accept(sk1, flags, ); > > if (!sk2) > goto do_err; > > lock_sock(sk2); > > sock_rps_record_flow(sk2); > WARN_ON(!((1 << sk2->sk_state) & > (TCPF_ESTABLISHED | TCPF_SYN_RECV | > TCPF_CLOSE_WAIT | TCPF_CLOSE))); > > sock_graft(sk2, newsock); > > newsock->state = SS_CONNECTED; > err = 0; > release_sock(sk2); > do_err: > return err; > }
[PATCH net] 8139too:fix system hang when there is a tx timeout event.
If tx timeout event occur, kernel will call rtl8139_tx_timeout_task() to reset hardware. But in this function, driver does not stop tx and rx function before reset hardware, that will cause system hang. In this patch, add stop tx and rx function before reset hardware. Signed-off-by: Chunhao Lin--- drivers/net/ethernet/realtek/8139too.c | 12 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c index ef668d3..0d77f3c 100644 --- a/drivers/net/ethernet/realtek/8139too.c +++ b/drivers/net/ethernet/realtek/8139too.c @@ -1667,6 +1667,10 @@ static void rtl8139_tx_timeout_task (struct work_struct *work) int i; u8 tmp8; + napi_disable(>napi); + netif_stop_queue(dev); + synchronize_sched(); + netdev_dbg(dev, "Transmit timeout, status %02x %04x %04x media %02x\n", RTL_R8(ChipCmd), RTL_R16(IntrStatus), RTL_R16(IntrMask), RTL_R8(MediaStatus)); @@ -1696,10 +1700,10 @@ static void rtl8139_tx_timeout_task (struct work_struct *work) spin_unlock_irq(>lock); /* ...and finally, reset everything */ - if (netif_running(dev)) { - rtl8139_hw_start (dev); - netif_wake_queue (dev); - } + napi_enable(>napi); + rtl8139_hw_start (dev); + netif_wake_queue (dev); + spin_unlock_bh(>rx_lock); } -- 1.9.1
Re: [iproute PATCH 0/3] improve MACsec support
On Tue, 26 Jul 2016 15:44:39 +0200 Sabrina Dubrocawrote: > 2016-07-26, 11:03:17 +0200, Davide Caratti wrote: > > parsing of 'cipher' and 'icvlen' arguments has been improved; while at it, > > a couple of missing printouts have been added to usage() functions in > > "ip addr help" and "ip link help". Finally, some errors in the man pages > > have been fixed. > > > > Signed-off-by: Davide Caratti > > > > Davide Caratti (3): > > man: macsec: fix macsec related typos > > ip {link,address}: add 'macsec' item to TYPE list > > macsec: cipher and icvlen can be set separately > > > > ip/ipaddress.c | 2 +- > > ip/iplink.c | 2 +- > > ip/ipmacsec.c| 52 > > > > man/man8/ip-address.8.in | 3 ++- > > man/man8/ip-link.8.in| 17 +++- > > man/man8/ip-macsec.8 | 13 > > 6 files changed, 42 insertions(+), 47 deletions(-) > > Acked-by: Sabrina Dubroca > > Stephen, can this go in before the 4.7 release? This fixes parsing > bugs and docs, it would be nice to have this. > > > Thanks, > Will try, still have a couple open things for 4.7
Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit
On 27.07.16 17:08, Grygorii Strashko wrote: On 07/27/2016 10:12 AM, Uwe Kleine-König wrote: Hello, On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote: On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: Allocating and preparing a dma descriptor doesn't need to happen under the channel's lock. So do this before taking the channel's lock. The only down side is that the dma descriptor might be allocated even though the channel is about to be stopped. This is unlikely though. Signed-off-by: Uwe Kleine-König--- drivers/net/ethernet/ti/davinci_cpdma.c | 38 + 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c b/drivers/net/ethernet/ti/davinci_cpdma.c index 5ffa04a306c6..ba3462707ae3 100644 --- a/drivers/net/ethernet/ti/davinci_cpdma.c +++ b/drivers/net/ethernet/ti/davinci_cpdma.c @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void *token, void *data, u32 mode; int ret = 0; - spin_lock_irqsave(>lock, flags); - - if (chan->state == CPDMA_STATE_TEARDOWN) { - ret = -EINVAL; - goto unlock_ret; - } - - if (chan->count >= chan->desc_num) { - chan->stats.desc_alloc_fail++; - ret = -ENOMEM; - goto unlock_ret; - } I'm not sure this is right thing to do. This check is expected to be strict and means "channel has exhausted the available descriptors, so further descs allocation does not allowed". I developed this patch basing on a 4.4 kernel which doesn't have 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There my patch is more obviously correct. As currently chan->count is protected by chan->lock we must hold the lock for this check. If a failing check means we must not call cpdma_desc_alloc in the first place, that's bad. The chan->count is not only case where this lock is needed unfortunately. I like the idea to remove a bunch of locks from here (I was wondering why it needs to have so much locks when using h/w queues, but this is the style driver is written though) This lock is also needed to cover stats counters at least. In case of cpsw driver, that uses cpdam_chan, the same channel can be shared between two emacs (in dual emac mode) then the lock is needed for every chan var. So that's not rare case. In general, the optimization of cpdma is good idea, but seems it can require much more changes. Yes. That's intention of this check :( Now it'll work as following for two (rx/tx) channels, as example RX desc_num = 16 (max allowed number of descriptors) TX desc_num = 16 (max allowed number of descriptors) and with current code number of allocated descriptors will never exceed 16. with your change, in corner case when TX channel's already utilized 16 descriptors the following will happen: cpdma_chan_submit() - cpdma_desc_alloc() - will allocate 17th desc - lock - check for chan->count - fail - unlock - cpdma_desc_free() so your patch will add additional desc_alloc/desc_free in the above corner case and that's what i'm worry about (TEARDOWN seems ok) especially taking into account further multi-queue feature development. Above corner case seems might happen very rare, because of the guard check in cpsw_ndo_start_xmit(), but it could. But I'm not sure this is the case here. After all cpdma_desc_alloc doesn't do anything relevant for the hardware, right? Right. Thanks. I'd try to do some measurement also. -- Regards, Ivan Khoronzhuk
Re: [PATCH v10 12/12] bpf: add sample for xdp forwarding and rewrite
On Tue, 19 Jul 2016 15:05:37 -0700 Alexei Starovoitovwrote: > On Tue, Jul 19, 2016 at 12:16:57PM -0700, Brenden Blanco wrote: > > Add a sample that rewrites and forwards packets out on the same > > interface. Observed single core forwarding performance of ~10Mpps. > > > > Since the mlx4 driver under test recycles every single packet page, the > > perf output shows almost exclusively just the ring management and bpf > > program work. Slowdowns are likely occurring due to cache misses. > > long term we need to resurrect your prefetch patch. sux to leave > so much performance on the table. I will do some (more) attempts at getting prefetching working, also in a way that benefit the normal stack (when I'm back from vacation, and when net-next opens again). I think Rana is also looking into this for the mlx5 driver (based on some patches I send to her). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
Just to add some more information to this, the corruption seems to effect ssh as well. Using a sun hme interface, occasionally upon an ssh connection it will refuse to authenticate a client with either password or cert authentication. Using wireshark to capture and decrypt the packets between the two machines, the data coming from the server seems good, but the data received by the server from the client is essentially garbage. Note that the client is sending valid data, but the server is corrupting it upon receipt. Closing the connection and starting a new one will remedy the login issue, but you do occasionally see corruption on the server side sporadically. So far this only seems to occur on incoming data, outgoing data seems fine.
Re: [PATCH net-next V2 5/5] net/mlx4: Add VST QinQ support
On 16-07-26 07:23 AM, Tariq Toukan wrote: > Hi John, > > Currently we support one tag set to a VF by the administrator, with the > new interface can be either C-VLAN or S-VLAN. > Once S-VLAN is set by administrator, the VM can set the C-VLANs it wants > to use. > > S-VLAN + C-VLAN per vf currently is not supported, but can be a future > use case. > Command line in such case should add more tuples of {vlan, qos, proto} > such as: > ip link set dev ens2 vf 0 vlan 100 qos 2 proto 802.1ad vlan 50 qos 4 > proto 802.1q > yep looks good to me. > As we currently don't support it, we will change the new > IFLA_VF_VLAN_INFO struct to include a list of such tuples and currently > use only the first one. > Is that what you meant? > Yep that was exactly my suggestion. Because I know hardware that supports the double tag mode and as soon as someone sees they can change the proto they will ask me to support double tags. There are two approaches to this either make the struct understand a list of tags or build it into the netlink message using netlink TLV constructs. I'm OK with either approach take a look at the code and see how it looks. Thanks for working on this. > Regards, > Tariq and Moshe >
Forwarded Message
USA Internet Command Protocol 2016 has selected your e-mail ID as lucky winner of US$800,000.00 out of US$350,000.000.00 shared worldwide for 100.000 e-mail subscribers. Forward your lucky number USAL77720 to the mandated payment center in REP'Du Benin by E-mail : me...@yahoo.com ,CC: q112233324-+1+2+800+1+2+800+1+2+800+$$$u.s.a-corporate.age...@yahoo.com for payment. Authorized Delivery
Re: [v3] UCC_GETH/UCC_FAST: Use IS_ERR_VALUE_U32 API to avoid IS_ERR_VALUE abuses.
I am also agree with Arnd Bergmann. We should use 'static inline function' instead of macro to deal with error check. On Tuesday 26 July 2016 05:09 PM, Arnd Bergmann wrote: On Saturday, July 23, 2016 11:35:51 PM CEST Arvind Yadav wrote: diff --git a/include/linux/err.h b/include/linux/err.h index 1e35588..a42f942 100644 --- a/include/linux/err.h +++ b/include/linux/err.h @@ -19,6 +19,7 @@ #ifndef __ASSEMBLY__ #define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO) +#define IS_ERR_VALUE_U32(x) unlikely((unsigned int)(x) >= (unsigned int)-MAX_ERRNO) static inline void * __must_check ERR_PTR(long error) { This doesn't really look like something we want to have as a generic interface. The IS_ERR_VALUE() API is rather awkward already, and your use seems specific to the cpu_muram_alloc() function. How about something like int cpm_muram_error(unsigned long addr) { if (addr >= (unsigned long)-MAX_ERRNO) return addr; else return 0; } and then use that to check the value returned by the allocation that is still an 'unsigned long', before assigning it to a 'u32'. Arnd
[PATCH net 0/3] r8169:fix 3 runtime pm related issues.
This series of patches fix 3 runtime pm related issues that are listed below. Chunhao Lin (3): r8169:fix kernel log spam when set or get hardware wol setting. r8169:add checking driver's runtime pm status in rtl8169_get_ethtool_stats() r8169:fix nic may not work after changing the mac address. drivers/net/ethernet/realtek/r8169.c | 37 1 file changed, 33 insertions(+), 4 deletions(-) -- 1.9.1
[PATCH net 3/3] r8169:fix nic may not work after changing the mac address.
When there is no AC power, NIC may not work after changing the mac address. Please refer to following link. http://www.spinics.net/lists/netdev/msg356572.html This issue is caused by runtime power management. When there is no AC power, if we put NIC down (ifconfig down), the driver will in runtime suspend state and hardware will be put into D3 state. During this time, driver cannot access hardware regisers. So if you set new mac address during this time, it will not work. And then, after resume, the NIC will keep using the old mac address and the network will not work normally. In this patch I add detecting runtime pm status when setting mac address. If driver is in runtime suspend state, it will skip setting mac address and set the new mac address during runtime resume. Signed-off-by: Chunhao Lin--- drivers/net/ethernet/realtek/r8169.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 4a17342..ff1611e 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -4480,6 +4480,7 @@ static void rtl_rar_set(struct rtl8169_private *tp, u8 *addr) static int rtl_set_mac_address(struct net_device *dev, void *p) { struct rtl8169_private *tp = netdev_priv(dev); + struct pci_dev *pdev = tp->pci_dev; struct sockaddr *addr = p; if (!is_valid_ether_addr(addr->sa_data)) @@ -4487,7 +4488,12 @@ static int rtl_set_mac_address(struct net_device *dev, void *p) memcpy(dev->dev_addr, addr->sa_data, dev->addr_len); - rtl_rar_set(tp, dev->dev_addr); + pm_runtime_get_noresume(>dev); + + if (pm_runtime_active(>dev)) + rtl_rar_set(tp, dev->dev_addr); + + pm_runtime_put_noidle(>dev); return 0; } @@ -7890,6 +7896,7 @@ static int rtl8169_runtime_resume(struct device *device) struct pci_dev *pdev = to_pci_dev(device); struct net_device *dev = pci_get_drvdata(pdev); struct rtl8169_private *tp = netdev_priv(dev); + rtl_rar_set(tp, dev->dev_addr); if (!tp->TxDescArray) return 0; -- 1.9.1
[PATCH net 2/3] r8169:add checking driver's runtime pm status in rtl8169_get_ethtool_stats()
Not to call rtl8169_update_counters() to dump tally counter when driver is in runtime suspend state. Signed-off-by: Chunhao Lin--- drivers/net/ethernet/realtek/r8169.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index f07604f..4a17342 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2308,11 +2308,17 @@ static void rtl8169_get_ethtool_stats(struct net_device *dev, struct ethtool_stats *stats, u64 *data) { struct rtl8169_private *tp = netdev_priv(dev); + struct pci_dev *pdev = tp->pci_dev; struct rtl8169_counters *counters = tp->counters; ASSERT_RTNL(); - rtl8169_update_counters(dev); + pm_runtime_get_noresume(>dev); + + if (pm_runtime_active(>dev)) + rtl8169_update_counters(dev); + + pm_runtime_put_noidle(>dev); data[0] = le64_to_cpu(counters->tx_packets); data[1] = le64_to_cpu(counters->rx_packets); -- 1.9.1
[PATCH net 1/3] r8169:fix kernel log spam when set or get hardware wol setting.
NIC will be put into D3 state during runtime suspend state. When set or get hardware wol setting, driver will write or read hardware register. If we set or get hardware wol setting in runtime suspend state, because NIC will in D3 state, the register value read by driver will return all 0xff. That will let driver thinking register flag is not toggled and then prints the warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)" to kernel log. For fixing this issue, add checking driver's pm runtime status in rtl8169_get_wol() and rtl8169_set_wol(). Signed-off-by: Chunhao Lin--- drivers/net/ethernet/realtek/r8169.c | 20 ++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 0e62d74..f07604f 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1749,13 +1749,21 @@ static u32 __rtl8169_get_wol(struct rtl8169_private *tp) static void rtl8169_get_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { struct rtl8169_private *tp = netdev_priv(dev); + struct pci_dev *pdev = tp->pci_dev; + + pm_runtime_get_noresume(>dev); rtl_lock_work(tp); wol->supported = WAKE_ANY; - wol->wolopts = __rtl8169_get_wol(tp); + if (pm_runtime_active(>dev)) + wol->wolopts = __rtl8169_get_wol(tp); + else + wol->wolopts = tp->saved_wolopts; rtl_unlock_work(tp); + + pm_runtime_put_noidle(>dev); } static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts) @@ -1845,6 +1853,9 @@ static void __rtl8169_set_wol(struct rtl8169_private *tp, u32 wolopts) static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) { struct rtl8169_private *tp = netdev_priv(dev); + struct pci_dev *pdev = tp->pci_dev; + + pm_runtime_get_noresume(>dev); rtl_lock_work(tp); @@ -1852,12 +1863,17 @@ static int rtl8169_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol) tp->features |= RTL_FEATURE_WOL; else tp->features &= ~RTL_FEATURE_WOL; - __rtl8169_set_wol(tp, wol->wolopts); + if (pm_runtime_active(>dev)) + __rtl8169_set_wol(tp, wol->wolopts); + else + tp->saved_wolopts = wol->wolopts; rtl_unlock_work(tp); device_set_wakeup_enable(>pci_dev->dev, wol->wolopts); + pm_runtime_put_noidle(>dev); + return 0; } -- 1.9.1
Re: [PATCH net] sit: Correctly return -ENOMEM from SIOCGETPRL ioctl.
On Wed, Jul 27, 2016 at 6:57 AM, Phil Turnbullwrote: > -ENOMEM is never returned because the 'out' path unconditionally sets > 'ret' to zero. Remove the 'out' path and return directly when the > allocation fails. > > Fixes: 300aaeeaab5f ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.") > Signed-off-by: Phil Turnbull > --- > net/ipv6/sit.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c > index 0619ac70836d..a71514040bb4 100644 > --- a/net/ipv6/sit.c > +++ b/net/ipv6/sit.c > @@ -320,8 +320,8 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t, > */ > kp = kcalloc(ca, sizeof(*kp), GFP_ATOMIC); > if (!kp) { > - ret = -ENOMEM; > - goto out; > + rcu_read_unlock(); > + return -ENOMEM; > } Or even better: we don't have to do the second kcalloc() within rcu read lock, we can move it before rcu_read_lock()?
CPU utilization with kvm / vhost, differences 3.14 / 4.4 / 4.6
Hi, I'm stumped by a weird development in measured CPU utilization when testing an upgrade path from 3.14.70 to 4.4.14. I'm running, on identical hardware (2 4-core Xeon E5420), a HA (active/standby) pair of firewall/loadbalancer VMs. The OS on the host and the VM is identical - openSUSE 13.1 userlevel, qemu 1.6.2 KVM, and kernels self- built from vanilla sources. Inside the VM I make pretty heavy use of ipset, iptables, and ipvs. Traffic level is around 100 mbit/s, mostly ordinary web traffic, translating to around 10 kpps. For the last X months I have been running this on 3.14.x kernels, currently 3.14.70. As that's nearing its end of support, I aim for an upgrade to 4.4.x, testing with 4.4.14. For testing, I keep the kernel _within_ the VM stable - i.e. 3.14.70, and upgrade only the host kernel of one of the two machines, to 4.4.14, and due to the weirdness I'll describe next, to 4.6.4. What I see, and what is totally unexpected, is a severe variation in the system and irq time measured on the host system, and less so inside the VM. The 3.14.70 running host shows 0.6 cores system and 0.4 cores IRQ time. The 4.4.14 running host shows 2.3 cores system and 0.4 cores IRQ time. The same host on 4.6.4, is again back at 0.6 cores system and 0.4 cores IRQ, while the guest (showing as user outside) is down from the 1 core on the previous to kernels, to about 0.6 cores (which I wouldn't complain about) But my desired target kernel, 4.4.14, clearly uses about 1 1/2 cores more on the same load... (all other indicators and measurements I have show that the load served is pretty much stable over the situations I tested). Some details on the networking setup (invariant over the tested kernels): * host bonds 4 NICs, half on on-board BNX2 BCM5708, other half on PCIe card intel 82571EB hardware. The bond mode is LACP. * host lacp bond is then member of an ordinary software bridge interface, which then also has the tap interface to the VM added. There is vlan filtering active on the bridge. * two bridge vlans are separately broken out and member of a second layer bridge with an extra tap interface to my VM. Don't ask why :) but one of these carries about half of the traffic * within the VM, I have another bridge with the VLANs on top and macvlan sprinkled in (keepalived VRRP setup on several legs) * host/vm network is virtio, of course * I had to disable (already some time ago, identical in all tests described here) TSO / GSO / UFO on the tap interfaces to my VM, to alleviate severe performance regressions. Different story, mentioning it just for completeness. Regarding the host hardware, I actually have a third system, software identical, but with some more cores and purely on BNX2 BCM5719. The 4.4.14- needs-lots-more-systemtime symptoms were practically the same there. To end this tale, let me note that I have NO operational problems with the test using the 4.4.14 kernel, as far as one can know that within some hours of testing. All production metrics (and I have lots of them) are fine - except for that system time usage on the host system... Anybody got a clue what may be happening? I'm a bit reluctant to jump to 4.6.x or newer kernels, as I like the concept of long term stable kernels somehow... :) best regards Patrick
Patch: fix rotting packet OKI PCH_GBE switch
This is against torvalds 4.7.0 repo This resolves a rotting SYN-ACK packet issue with the OKI PCH_GBE switch. The problem showed up intermittently where a client would attempt to establish a TCP connection to the host on an otherwise quiet network. No other activity had occurred for a significant time. This was a race condition that was finally duplicated using a controlled set of timings between the SYN / ACK / SYN-ACK sequence. We found that by adjusting the timing between the host ACK and the client transmission of the SYN-ACK that we had the SYN-ACK "stuck" in the switch data register. The problem is that pch_gbe_irq_enable() is called when the transmit of the ACK has completed, then pch_gbe_irq_enable() will enable the interrupts of the switch, followed by a read of the INT_ST which results in clearing the status bits. While the device would still issue an interrupt, the pch_gbe_intr() handler would not see any pending requests as the status register had been cleared. Thus, the packet remains "stuck". Signed-off-by: Tom Walshdiff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c index 3cd87a4..baba31d 100644 --- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c +++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c @@ -809,13 +809,16 @@ static void pch_gbe_irq_disable(struct pch_gbe_adapter *adapter) * pch_gbe_irq_enable - Enable default interrupt generation settings * @adapter: Board private structure */ -static void pch_gbe_irq_enable(struct pch_gbe_adapter *adapter) +static void pch_gbe_irq_enable(struct pch_gbe_adapter *adapter, + bool reset_status_register) { struct pch_gbe_hw *hw = >hw; - if (likely(atomic_dec_and_test(>irq_sem))) + if (likely(atomic_dec_and_test(>irq_sem))) { iowrite32(PCH_GBE_INT_ENABLE_MASK, >reg->INT_EN); - ioread32(>reg->INT_ST); + if (reset_status_register) + ioread32(>reg->INT_ST); + } netdev_dbg(adapter->netdev, "INT_EN reg : 0x%08x\n", ioread32(>reg->INT_EN)); } @@ -1976,7 +1979,7 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter) mod_timer(>watchdog_timer, jiffies); napi_enable(>napi); - pch_gbe_irq_enable(adapter); + pch_gbe_irq_enable(adapter, true); netif_start_queue(adapter->netdev); return 0; @@ -2392,7 +2395,7 @@ static int pch_gbe_napi_poll(struct napi_struct *napi, int budget) if (poll_end_flag) { napi_complete(napi); - pch_gbe_irq_enable(adapter); + pch_gbe_irq_enable(adapter, false); } if (adapter->rx_stop_flag) {
RE: [PATCH 15/15] ethernet: ti: davinci_emac: add missing of_node_put after calling of_parse_phandle
>On Wednesday 27 July 2016 07:50 AM, Peter Chen wrote: >> of_node_put needs to be called when the device node which is got from >> of_parse_phandle has finished using. >> >> Signed-off-by: Peter Chen>> --- >> drivers/net/ethernet/ti/davinci_emac.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/drivers/net/ethernet/ti/davinci_emac.c >> b/drivers/net/ethernet/ti/davinci_emac.c >> index c6c5465..d8cb9d0 100644 >> --- a/drivers/net/ethernet/ti/davinci_emac.c >> +++ b/drivers/net/ethernet/ti/davinci_emac.c >> @@ -1571,6 +1571,7 @@ static int emac_dev_open(struct net_device *ndev) >> if (priv->phy_node) { >> phydev = of_phy_connect(ndev, priv->phy_node, >> _adjust_link, 0, 0); >> +of_node_put(priv->phy_node); >> if (!phydev) { >> dev_err(emac_dev, "could not connect to phy %s\n", >> priv->phy_node->full_name); >> > >phy_node is accessed in case of of_phy_connect() returns error, so it has to be >moved after the dev_err log > Yeah, you are right. I will change it, thanks. Peter
Re: [e1000_netpoll] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110
On Tue, Jul 26, 2016 at 06:28:33PM +0200, Eric Dumazet wrote: On Tue, 2016-07-26 at 23:32 +0800, Fengguang Wu wrote: Hi Eric, It works! On Tue, Jul 26, 2016 at 11:14:52AM +0200, Eric Dumazet wrote: >On Tue, 2016-07-26 at 11:50 +0800, Fengguang Wu wrote: >> Greetings, >> >> This BUG message can be found in recent kernels as well as v4.4 and >> linux-stable. It happens when running >> >> modprobe netconsole netconsole=@/,$port@$server/ >> >> [ 39.937534] 22 Jul 13:30:40 ntpdate[440]: step time server 192.168.1.1 offset -673.833841 sec >> [ 39.943285] netpoll: netconsole: local port 6665 >> [ 39.943436] netpoll: netconsole: local IPv4 address 0.0.0.0 >> [ 39.943609] netpoll: netconsole: interface 'eth0' >> [ 39.943756] netpoll: netconsole: remote port 6672 >> [ 39.943913] netpoll: netconsole: remote IPv4 address 192.168.1.1 >> [ 39.944099] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff >> [ 39.944311] netpoll: netconsole: local IP 192.168.1.193 >> [ 39.944514] BUG: sleeping function called from invalid context at kernel/irq/manage.c:110 >> [ 39.944515] in_atomic(): 1, irqs_disabled(): 1, pid: 448, name: modprobe >> [ 39.944517] CPU: 6 PID: 448 Comm: modprobe Not tainted 4.7.0-rc7-wt-ath-10122-gf9b5ec2 #102 >> [ 39.944518] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012 >> [ 39.944522] c90001f2f9e8 813417d9 88007faba5c0 >> [ 39.944524] 006e c90001f2fa00 810aec03 81a25948 >> [ 39.944525] c90001f2fa28 810aec9a 8803e5bd9400 8803e50fbd68 >> [ 39.944526] Call Trace: >> [ 39.944533] [] dump_stack+0x63/0x8a >> [ 39.944536] [] ___might_sleep+0xd3/0x120 >> [ 39.944537] [] __might_sleep+0x4a/0x80 >> [ 39.944541] [] synchronize_irq+0x38/0xa0 >> [ 39.944543] [] ? __irq_put_desc_unlock+0x1e/0x40 >> [ 39.944545] [] ? __disable_irq_nosync+0x43/0x60 >> [ 39.944547] [] disable_irq+0x1c/0x20 >> [ 39.944559] [] e1000_netpoll+0xf2/0x120 [e1000e] >> [ 39.944563] [] netpoll_poll_dev+0x5c/0x1a0 >> [ 39.944567] [] ? __kmalloc_reserve+0x31/0x90 >> [ 39.944569] [] netpoll_send_skb_on_dev+0x16b/0x250 >> [ 39.944572] [] netpoll_send_udp+0x2ec/0x450 >> [ 39.944576] [] write_msg+0xb2/0xf0 [netconsole] >> [ 39.944578] [] call_console_drivers+0x115/0x120 >> [ 39.944580] [] console_unlock+0x333/0x5c0 >> [ 39.944583] [] register_console+0x1c4/0x380 >> [ 39.944586] [] init_netconsole+0x1c5/0x1000 [netconsole] >> [ 39.944588] [] ? 0xa004f000 >> [ 39.944591] [] do_one_initcall+0x3d/0x150 >> [ 39.944592] [] ? __might_sleep+0x4a/0x80 >> [ 39.944596] [] ? kmem_cache_alloc_trace+0x188/0x1e0 >> [ 39.944598] [] do_init_module+0x5f/0x1d8 >> [ 39.944602] [] load_module+0x1429/0x1b40 >> [ 39.944604] [] ? __symbol_put+0x40/0x40 >> [ 39.944607] [] ? kernel_read_file+0x178/0x1a0 >> [ 39.944608] [] ? kernel_read_file_from_fd+0x49/0x80 >> [ 39.944611] [] SYSC_finit_module+0xc3/0xf0 >> [ 39.944614] [] SyS_finit_module+0xe/0x10 >> [ 39.944617] [] entry_SYSCALL_64_fastpath+0x1a/0xa9 >> [ 39.946384] console [netcon0] enabled >> [ 39.946514] netconsole: network logging started >> >> Can this be possibly fixed? > >Could you try this ? > >diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c >index f42129d09e2c23ba9fdb5cde890d50ecb7166a42..a53c41c4c4f7d1fe52f95a2cab8784a938b3820b 100644 >--- a/drivers/net/ethernet/intel/e1000/e1000_main.c >+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c >@@ -5257,9 +5257,13 @@ static void e1000_netpoll(struct net_device *netdev) > { >struct e1000_adapter *adapter = netdev_priv(netdev); > >- disable_irq(adapter->pdev->irq); >- e1000_intr(adapter->pdev->irq, netdev); >- enable_irq(adapter->pdev->irq); >+ if (napi_schedule_prep(>napi)) { >+ adapter->total_tx_bytes = 0; >+ adapter->total_tx_packets = 0; >+ adapter->total_rx_bytes = 0; >+ adapter->total_rx_packets = 0; >+ __napi_schedule(>napi); >+ } The machines are actually running e1000e driver, so I copied your approach to e1000e and it works: kern :info : [ 16.109647] netpoll: netconsole: local port 6665 kern :info : [ 16.109961] netpoll: netconsole: local IPv4 address 0.0.0.0 kern :info : [ 16.110346] netpoll: netconsole: interface 'eth0' kern :info : [ 16.110672] netpoll: netconsole: remote port 6676 kern :info : [ 16.110991] netpoll: netconsole: remote IPv4 address 192.168.2.1 kern :info : [ 16.111398] netpoll: netconsole: remote ethernet address ff:ff:ff:ff:ff:ff kern :info : [ 16.111845] netpoll: netconsole: local IP 192.168.2.3 kern :info : [ 16.114284] console [netcon0] enabled kern :info : [ 16.114550] netconsole: network logging started However I'm not sure if it'll have side effects, because this effectively
Re: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization
On Wed, Jul 27, 2016 at 02:09:13PM +, Avargil, Raanan wrote: >> This is prepatory work for an expanding list of adapter families that have >> occasional ~10 hour clock jumps when being used for PTP. Factor out the >> sanitization function and convert to using a feature (bug) flag, per >> suggestion from Jesse Brandeburg. >> >> Littering functional code with device-specific checks is much messier than >> simply checking a flag, and having device-specific init set flags as needed. >> There are probably a number of other cases in the e1000e code that >> could/should be converted similarly. > > Looks ok to me. > Adding Chris who asked what happens if we reach the max retry counter > (E1000_MAX_82574_SYSTIM_REREAD)? > This counter is set to 50. > Can you, for testing purposes, decreased this value (or even set it to 0) and > see what happens? Unfortunately, I don't have direct access to the affected hardware myself, so I'd have to prep a test build, hand it off to someone and play relay. I could do that, but it'd have some lag and possible multiple round-trips... Anyone inside Intel have hardware handy to test on? :p -- Jarod Wilson ja...@redhat.com
Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt
Hello, On Wed, Jul 27, 2016 at 05:11:54PM +0300, Grygorii Strashko wrote: > On 07/27/2016 10:03 AM, Uwe Kleine-König wrote: > > On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote: > >> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: > >>> Hello, > >>> > >>> these patches are based on next-20160726. I didn't check yet how latency > >>> improves by using these patches, but even if the improvment is small, > >>> it's still a good idea to have them. > >> > >> Sry, but how this will affect on -RT? This is not a raw locks, so > >> they will be converted to rt-mutexes which are sleepable. > >> Or I've missed smth? > > > > They are still locks after all. On -rt I saw for the relevant > > application: > > > > send package | > > take lock | > > write pckt to hw | > >| rcv irq > >| take lock > >| schedule > > drop lock | > > schedule | > >| get pckt from hw > >| drop lock > > > > So reducing the time a lock is taken reduces the chances that the lock > > is contended for another thread which results in extra context switches. > > > Thanks a lot for explanation. So, this is not exactly rt-latency reduction, > but it might improve net performance on -RT. correct? Well, it's not really rt related, but if you hit a locked lock on rt it hurts more than on !rt. And this results in increased latency. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt
On 07/27/2016 10:03 AM, Uwe Kleine-König wrote: > On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote: >> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: >>> Hello, >>> >>> these patches are based on next-20160726. I didn't check yet how latency >>> improves by using these patches, but even if the improvment is small, >>> it's still a good idea to have them. >> >> Sry, but how this will affect on -RT? This is not a raw locks, so >> they will be converted to rt-mutexes which are sleepable. >> Or I've missed smth? > > They are still locks after all. On -rt I saw for the relevant > application: > > send package | > take lock | > write pckt to hw | >| rcv irq > | take lock > | schedule > drop lock| > schedule | >| get pckt from hw > | drop lock > > So reducing the time a lock is taken reduces the chances that the lock > is contended for another thread which results in extra context switches. > Thanks a lot for explanation. So, this is not exactly rt-latency reduction, but it might improve net performance on -RT. correct? Thanks. -- regards, -grygorii
RE: [Intel-wired-lan] [PATCH net-next v3 1/2] e1000e: factor out systim sanitization
This is prepatory work for an expanding list of adapter families that have occasional ~10 hour clock jumps when being used for PTP. Factor out the sanitization function and convert to using a feature (bug) flag, per suggestion from Jesse Brandeburg. Littering functional code with device-specific checks is much messier than simply checking a flag, and having device-specific init set flags as needed. There are probably a number of other cases in the e1000e code that could/should be converted similarly. Suggested-by: Jesse BrandeburgCC: Jesse Brandeburg CC: Jeff Kirsher CC: intel-wired-...@lists.osuosl.org CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson --- drivers/net/ethernet/intel/e1000e/82571.c | 6 ++- drivers/net/ethernet/intel/e1000e/e1000.h | 1 + drivers/net/ethernet/intel/e1000e/netdev.c | 66 ++ 3 files changed, 44 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/82571.c b/drivers/net/ethernet/intel/e1000e/82571.c index 7fd4d54..6b03c85 100644 --- a/drivers/net/ethernet/intel/e1000e/82571.c +++ b/drivers/net/ethernet/intel/e1000e/82571.c @@ -2032,7 +2032,8 @@ const struct e1000_info e1000_82574_info = { | FLAG2_DISABLE_ASPM_L0S | FLAG2_DISABLE_ASPM_L1 | FLAG2_NO_DISABLE_RX - | FLAG2_DMA_BURST, + | FLAG2_DMA_BURST + | FLAG2_CHECK_SYSTIM_OVERFLOW, .pba= 32, .max_hw_frame_size = DEFAULT_JUMBO, .get_variants = e1000_get_variants_82571, @@ -2053,7 +2054,8 @@ const struct e1000_info e1000_82583_info = { | FLAG_HAS_CTRLEXT_ON_LOAD, .flags2 = FLAG2_DISABLE_ASPM_L0S | FLAG2_DISABLE_ASPM_L1 - | FLAG2_NO_DISABLE_RX, + | FLAG2_NO_DISABLE_RX + | FLAG2_CHECK_SYSTIM_OVERFLOW, .pba= 32, .max_hw_frame_size = DEFAULT_JUMBO, .get_variants = e1000_get_variants_82571, diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h index ef96cd1..879cca4 100644 --- a/drivers/net/ethernet/intel/e1000e/e1000.h +++ b/drivers/net/ethernet/intel/e1000e/e1000.h @@ -452,6 +452,7 @@ s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca); #define FLAG2_PCIM2PCI_ARBITER_WA BIT(11) #define FLAG2_DFLT_CRC_STRIPPING BIT(12) #define FLAG2_CHECK_RX_HWTSTAMP BIT(13) +#define FLAG2_CHECK_SYSTIM_OVERFLOW BIT(14) #define E1000_RX_DESC_PS(R, i) \ (&(((union e1000_rx_desc_packet_split *)((R).desc))[i])) diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index 41f32c0..1b2df11 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -4303,6 +4303,42 @@ void e1000e_reinit_locked(struct e1000_adapter *adapter) } /** + * e1000e_sanitize_systim - sanitize raw cycle counter reads + * @hw: pointer to the HW structure + * @systim: cycle_t value read, sanitized and returned + * + * Errata for 82574/82583 possible bad bits read from SYSTIMH/L: + * check to see that the time is incrementing at a reasonable + * rate and is a multiple of incvalue. + **/ +static cycle_t e1000e_sanitize_systim(struct e1000_hw *hw, cycle_t +systim) { + u64 time_delta, rem, temp; + cycle_t systim_next; + u32 incvalue; + int i; + + incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK; + for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) { + /* latch SYSTIMH on read of SYSTIML */ + systim_next = (cycle_t)er32(SYSTIML); + systim_next |= (cycle_t)er32(SYSTIMH) << 32; + + time_delta = systim_next - systim; + temp = time_delta; + /* VMWare users have seen incvalue of zero, don't div / 0 */ + rem = incvalue ? do_div(temp, incvalue) : (time_delta != 0); + + systim = systim_next; + + if ((time_delta < E1000_82574_SYSTIM_EPSILON) && (rem == 0)) + break; + } + + return systim; +} + +/** * e1000e_cyclecounter_read - read raw cycle counter (used by time counter) * @cc: cyclecounter structure **/ @@ -4312,7 +4348,7 @@ static cycle_t e1000e_cyclecounter_read(const struct cyclecounter *cc) cc); struct e1000_hw *hw = >hw; u32 systimel, systimeh; - cycle_t systim, systim_next; + cycle_t systim; /* SYSTIMH
Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit
On 07/27/2016 10:12 AM, Uwe Kleine-König wrote: > Hello, > > On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote: >> On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: >>> Allocating and preparing a dma descriptor doesn't need to happen under >>> the channel's lock. So do this before taking the channel's lock. The only >>> down side is that the dma descriptor might be allocated even though the >>> channel is about to be stopped. This is unlikely though. >>> >>> Signed-off-by: Uwe Kleine-König>>> --- >>> drivers/net/ethernet/ti/davinci_cpdma.c | 38 >>> + >>> 1 file changed, 20 insertions(+), 18 deletions(-) >>> >>> diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c >>> b/drivers/net/ethernet/ti/davinci_cpdma.c >>> index 5ffa04a306c6..ba3462707ae3 100644 >>> --- a/drivers/net/ethernet/ti/davinci_cpdma.c >>> +++ b/drivers/net/ethernet/ti/davinci_cpdma.c >>> @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void >>> *token, void *data, >>> u32 mode; >>> int ret = 0; >>> >>> - spin_lock_irqsave(>lock, flags); >>> - >>> - if (chan->state == CPDMA_STATE_TEARDOWN) { >>> - ret = -EINVAL; >>> - goto unlock_ret; >>> - } >>> - >>> - if (chan->count >= chan->desc_num) { >>> - chan->stats.desc_alloc_fail++; >>> - ret = -ENOMEM; >>> - goto unlock_ret; >>> - } >> >> I'm not sure this is right thing to do. This check is expected to be strict >> and means "channel has exhausted the available descriptors, so further descs >> allocation does not allowed". > > I developed this patch basing on a 4.4 kernel which doesn't have > 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There > my patch is more obviously correct. As currently chan->count is > protected by chan->lock we must hold the lock for this check. If a > failing check means we must not call cpdma_desc_alloc in the first > place, that's bad. Yes. That's intention of this check :( Now it'll work as following for two (rx/tx) channels, as example RX desc_num = 16 (max allowed number of descriptors) TX desc_num = 16 (max allowed number of descriptors) and with current code number of allocated descriptors will never exceed 16. with your change, in corner case when TX channel's already utilized 16 descriptors the following will happen: cpdma_chan_submit() - cpdma_desc_alloc() - will allocate 17th desc - lock - check for chan->count - fail - unlock - cpdma_desc_free() so your patch will add additional desc_alloc/desc_free in the above corner case and that's what i'm worry about (TEARDOWN seems ok) especially taking into account further multi-queue feature development. Above corner case seems might happen very rare, because of the guard check in cpsw_ndo_start_xmit(), but it could. > > But I'm not sure this is the case here. After all cpdma_desc_alloc > doesn't do anything relevant for the hardware, right? Right. Thanks. I'd try to do some measurement also. -- regards, -grygorii
[no subject]
Greeting, Did you receive the fund letter i send you last? from Grace Bunyan
[PATCH net] sit: Correctly return -ENOMEM from SIOCGETPRL ioctl.
-ENOMEM is never returned because the 'out' path unconditionally sets 'ret' to zero. Remove the 'out' path and return directly when the allocation fails. Fixes: 300aaeeaab5f ("[IPV6] SIT: Add SIOCGETPRL ioctl to get/dump PRL.") Signed-off-by: Phil Turnbull--- net/ipv6/sit.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index 0619ac70836d..a71514040bb4 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -320,8 +320,8 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t, */ kp = kcalloc(ca, sizeof(*kp), GFP_ATOMIC); if (!kp) { - ret = -ENOMEM; - goto out; + rcu_read_unlock(); + return -ENOMEM; } } @@ -337,7 +337,7 @@ static int ipip6_tunnel_get_prl(struct ip_tunnel *t, if (kprl.addr != htonl(INADDR_ANY)) break; } -out: + rcu_read_unlock(); len = sizeof(*kp) * c; -- 2.9.0.rc2
Re: [net-next] neigh: allow admin to set NUD_STALE
On Wed, 27 Jul 2016 09:56:50 +0300, Julian Anastasovwrote: > Admin should be able to set any state. Currently, this fails > when lladdr is not changed and state is changed from > NUD_CONNECTED to NUD_STALE: > > ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0 > ip neigh show to 192.168.8.1 > 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT > ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0 > ip neigh show to 192.168.8.1 > 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT > > Problem may be from 2.1.X days. > > Signed-off-by: Julian Anastasov > --- > net/core/neighbour.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/net/core/neighbour.c b/net/core/neighbour.c > index cf26e04c4..2ae929f 100644 > --- a/net/core/neighbour.c > +++ b/net/core/neighbour.c > @@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 > *lladdr, u8 new, > } else > goto out; > } else { > - if (lladdr == neigh->ha && new == NUD_STALE) > + if (lladdr == neigh->ha && new == NUD_STALE && > + !(flags & NEIGH_UPDATE_F_ADMIN)) > new = old; > } > } Reviewed-by: Chunhui He
[PATCH net 6/6] qed: Prevent over-usage of vlan credits by PF
Each PF/VF has a limited number of vlan filters for configuration purposes; This information is passed to qede and is used to prevent over-usage - once a vlan is to be configured and no filter credit is available, the driver would switch into working in vlan-promisc mode. Problem is the credit pool is shared by both PFs and VFs, and currently PFs aren't deducting the filters that are reserved for their VFs from their quota, which may lead to some vlan filters failing unknowingly due to lack of credit. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_l2.c | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index ff388208..4c46d14 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -1656,6 +1656,8 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev, info->num_tc = 1; if (IS_PF(cdev)) { + int max_vf_vlan_filters = 0; + if (cdev->int_params.out.int_mode == QED_INT_MODE_MSIX) { for_each_hwfn(cdev, i) info->num_queues += @@ -1668,7 +1670,12 @@ static int qed_fill_eth_dev_info(struct qed_dev *cdev, info->num_queues = cdev->num_hwfns; } - info->num_vlan_filters = RESC_NUM(>hwfns[0], QED_VLAN); + if (IS_QED_SRIOV(cdev)) + max_vf_vlan_filters = cdev->p_iov_info->total_vfs * + QED_ETH_VF_NUM_VLAN_FILTERS; + info->num_vlan_filters = RESC_NUM(>hwfns[0], QED_VLAN) - +max_vf_vlan_filters; + ether_addr_copy(info->port_mac, cdev->hwfns[0].hw_info.hw_mac_addr); } else { -- 1.9.3
[PATCH net 5/6] qed: Correct min bandwidth for 100g
Driver uses reverse logic when checking if minimum bandwidth configuration applied, causing it to configure the guarantee only on the first hw-function. Fixes: a0d26d5a4fc8 ("qed*: Don't reset statistics on inner reload") Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c index 2d89e8c..9e305d5 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_dev.c +++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c @@ -2089,7 +2089,7 @@ int qed_configure_vport_wfq(struct qed_dev *cdev, u16 vp_id, u32 rate) rc = __qed_configure_vport_wfq(p_hwfn, p_ptt, vp_id, rate); - if (!rc) { + if (rc) { qed_ptt_release(p_hwfn, p_ptt); return rc; } -- 1.9.3
[PATCH net 4/6] qede: Reset statistics on explicit down
Adding the necessary logic to prevet statistics reset on inner-reload introduced a bug, and now statistics are reset only when re-probing the driver. Fixes: a0d26d5a4fc8e ("qed*: Don't reset statistics on inner reload") Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qede/qede_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index 8fe3cc1..14f79c3 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -3256,6 +3256,7 @@ static int qede_start_queues(struct qede_dev *edev, bool clear_stats) start.vport_id = 0; start.drop_ttl0 = true; start.remove_inner_vlan = vlan_removal_en; + start.clear_stats = clear_stats; rc = edev->ops->vport_start(cdev, ); -- 1.9.3
[PATCH net 0/6] qed*: Small fixes series
Hi Dave, This contains several small [and straight-forward] fixes to qed* drivers. Please consider applying this to `net'. Thanks, Yuval Yuval Mintz (6): qede: Don't try removing unconfigured vlans qed: Fix removal of spoof checking for VFs qed: Don't over-do producer cleanup for Rx qede: Reset statistics on explicit down qed: Correct min bandwidth for 100g qed: Prevent over-usage of vlan credits by PF drivers/net/ethernet/qlogic/qed/qed_dev.c| 2 +- drivers/net/ethernet/qlogic/qed/qed_l2.c | 13 ++--- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 2 +- drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 ++-- drivers/net/ethernet/qlogic/qede/qede_main.c | 12 5 files changed, 22 insertions(+), 11 deletions(-) -- 1.9.3
[PATCH net 3/6] qed: Don't over-do producer cleanup for Rx
Before requesting the firmware to start Rx queues, driver goes and sets the queue producer in the device to 0. But while the producer is 32-bit, the driver currently clears 64 bits, effectively zeroing an additional CID's producer as well. Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_l2.c | 4 ++-- drivers/net/ethernet/qlogic/qed/qed_vf.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c index aada4c7..ff388208 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_l2.c +++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c @@ -587,7 +587,7 @@ qed_sp_eth_rx_queue_start(struct qed_hwfn *p_hwfn, u16 cqe_pbl_size, void __iomem **pp_prod) { struct qed_hw_cid_data *p_rx_cid; - u64 init_prod_val = 0; + u32 init_prod_val = 0; u16 abs_l2_queue = 0; u8 abs_stats_id = 0; int rc; @@ -615,7 +615,7 @@ qed_sp_eth_rx_queue_start(struct qed_hwfn *p_hwfn, MSTORM_PRODS_OFFSET(abs_l2_queue); /* Init the rcq, rx bd and rx sge (if valid) producers to 0 */ - __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u64), + __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32), (u32 *)(_prod_val)); /* Allocate a CID for the queue */ diff --git a/drivers/net/ethernet/qlogic/qed/qed_vf.c b/drivers/net/ethernet/qlogic/qed/qed_vf.c index 72e69c0..c39e26a 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_vf.c +++ b/drivers/net/ethernet/qlogic/qed/qed_vf.c @@ -353,7 +353,7 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn, /* Learn the address of the producer from the response */ if (pp_prod) { - u64 init_prod_val = 0; + u32 init_prod_val = 0; *pp_prod = (u8 __iomem *)p_hwfn->regview + resp->offset; DP_VERBOSE(p_hwfn, QED_MSG_IOV, @@ -361,7 +361,7 @@ int qed_vf_pf_rxq_start(struct qed_hwfn *p_hwfn, rx_qid, *pp_prod, resp->offset); /* Init the rcq, rx bd and rx sge (if valid) producers to 0 */ - __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u64), + __internal_ram_wr(p_hwfn, *pp_prod, sizeof(u32), (u32 *)_prod_val); } -- 1.9.3
[PATCH net 1/6] qede: Don't try removing unconfigured vlans
As part of ndo_vlan_rx_kill_vid() implementation, qede is requesting firmware to remove the vlan filter. This currently happens even if the vlan wasn't previously added [In case device ran out of vlan credits]. For PFs this doesn't cause any issues as the firmware would simply ignore the removal request. But for VFs their parent PF is holding an accounting of the configured vlans, and such a request would cause the PF to fail the VF's removal request. Simply fix this for both PFs & VFs and don't remove filters that were not previously added. Fixes: 7c1bfcad9f3c8 ("qede: Add vlan filtering offload support") Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c index f8e11f9..8fe3cc1 100644 --- a/drivers/net/ethernet/qlogic/qede/qede_main.c +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c @@ -2050,10 +2050,13 @@ static int qede_vlan_rx_kill_vid(struct net_device *dev, __be16 proto, u16 vid) } /* Remove vlan */ - rc = qede_set_ucast_rx_vlan(edev, QED_FILTER_XCAST_TYPE_DEL, vid); - if (rc) { - DP_ERR(edev, "Failed to remove VLAN %d\n", vid); - return -EINVAL; + if (vlan->configured) { + rc = qede_set_ucast_rx_vlan(edev, QED_FILTER_XCAST_TYPE_DEL, + vid); + if (rc) { + DP_ERR(edev, "Failed to remove VLAN %d\n", vid); + return -EINVAL; + } } qede_del_vlan_from_list(edev, vlan); -- 1.9.3
[PATCH net 2/6] qed: Fix removal of spoof checking for VFs
Driver has reverse logic for checking the result of the spoof-checking configuration. As a result, it would log that the configuration failed [even though it succeeded], and will no longer do anything when requested to remove the configuration, as it's accounting of the feature will be incorrect. Fixes: 6ddc7608258d5 ("qed*: IOV support spoof-checking") Signed-off-by: Yuval Mintz--- drivers/net/ethernet/qlogic/qed/qed_sriov.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.c b/drivers/net/ethernet/qlogic/qed/qed_sriov.c index c325ee8..875c105 100644 --- a/drivers/net/ethernet/qlogic/qed/qed_sriov.c +++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.c @@ -1296,7 +1296,7 @@ static int __qed_iov_spoofchk_set(struct qed_hwfn *p_hwfn, params.anti_spoofing_en = val; rc = qed_sp_vport_update(p_hwfn, , QED_SPQ_MODE_EBLOCK, NULL); - if (rc) { + if (!rc) { p_vf->spoof_chk = val; p_vf->req_spoofchk_val = p_vf->spoof_chk; DP_VERBOSE(p_hwfn, QED_MSG_IOV, -- 1.9.3
Re: [PATCH 12/15] ethernet: renesas: sh_eth: add missing of_node_put after calling of_parse_phandle
On 7/27/2016 5:20 AM, Peter Chen wrote: of_node_put needs to be called when the device node which is got from of_parse_phandle has finished using. Signed-off-by: Peter ChenAcked-by: Sergei Shtylyov MBR, Sergei
Re: [PATCH 11/15] ethernet: renesas: ravb_main: add missing of_node_put after calling of_parse_phandle
On 7/27/2016 5:20 AM, Peter Chen wrote: of_node_put needs to be called when the device node which is got from of_parse_phandle has finished using. Signed-off-by: Peter ChenAcked-by: Sergei Shtylyov MBR, Sergei
[RFC PATCH] net: flush the softnet backlog in process context
Currently in process_backlog(), the process_queue dequeuing is performed with local IRQ disabled, to protect against flush_backlog(), which runs in hard IRQ context. This patch moves the flush operation to a work queue and runs the callback with bottom half disabled to protect the process_queue against dequeuing. Since process_queue is now always manipulated in bottom half context, the irq disable/enable pair around the dequeue operation are removed. To keep the flush time as low as possible, the flush works are scheduled on all online cpu simultaneously, using the high priority work-queue and statically allocated, per cpu, work structs. Overall this change increases the time required to destroy a device to improve slightly the packets reinjection performances. Acked-by: Hannes Frederic SowaSigned-off-by: Paolo Abeni --- net/core/dev.c | 72 -- 1 file changed, 50 insertions(+), 22 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 2a9c39f..9b8cfab 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4292,15 +4292,25 @@ int netif_receive_skb(struct sk_buff *skb) } EXPORT_SYMBOL(netif_receive_skb); -/* Network device is going away, flush any packets still pending - * Called with irqs disabled. - */ -static void flush_backlog(void *arg) +struct flush_work { + struct net_device *dev; + struct work_struct work; +}; + +DEFINE_PER_CPU(struct flush_work, flush_works); + +/* Network device is going away, flush any packets still pending */ +static void flush_backlog(struct work_struct *work) { - struct net_device *dev = arg; - struct softnet_data *sd = this_cpu_ptr(_data); + struct flush_work *flush = container_of(work, typeof(*flush), work); + struct net_device *dev = flush->dev; struct sk_buff *skb, *tmp; + struct softnet_data *sd; + + local_bh_disable(); + sd = this_cpu_ptr(_data); + local_irq_disable(); rps_lock(sd); skb_queue_walk_safe(>input_pkt_queue, skb, tmp) { if (skb->dev == dev) { @@ -4310,6 +4320,7 @@ static void flush_backlog(void *arg) } } rps_unlock(sd); + local_irq_enable(); skb_queue_walk_safe(>process_queue, skb, tmp) { if (skb->dev == dev) { @@ -4318,6 +4329,27 @@ static void flush_backlog(void *arg) input_queue_head_incr(sd); } } + local_bh_enable(); +} + +static void flush_all_backlogs(struct net_device *dev) +{ + unsigned int cpu; + + get_online_cpus(); + + for_each_online_cpu(cpu) { + struct flush_work *flush = per_cpu_ptr(_works, cpu); + + INIT_WORK(>work, flush_backlog); + flush->dev = dev; + queue_work_on(cpu, system_highpri_wq, >work); + } + + for_each_online_cpu(cpu) + flush_work(_cpu_ptr(_works, cpu)->work); + + put_online_cpus(); } static int napi_gro_complete(struct sk_buff *skb) @@ -4805,8 +4837,9 @@ static bool sd_has_rps_ipi_waiting(struct softnet_data *sd) static int process_backlog(struct napi_struct *napi, int quota) { - int work = 0; struct softnet_data *sd = container_of(napi, struct softnet_data, backlog); + bool again = true; + int work = 0; /* Check if we have pending ipi, its better to send them now, * not waiting net_rx_action() end. @@ -4817,23 +4850,20 @@ static int process_backlog(struct napi_struct *napi, int quota) } napi->weight = weight_p; - local_irq_disable(); - while (1) { + while (again) { struct sk_buff *skb; while ((skb = __skb_dequeue(>process_queue))) { rcu_read_lock(); - local_irq_enable(); __netif_receive_skb(skb); rcu_read_unlock(); - local_irq_disable(); input_queue_head_incr(sd); - if (++work >= quota) { - local_irq_enable(); + if (++work >= quota) return work; - } + } + local_irq_disable(); rps_lock(sd); if (skb_queue_empty(>input_pkt_queue)) { /* @@ -4845,16 +4875,14 @@ static int process_backlog(struct napi_struct *napi, int quota) * and we dont need an smp_mb() memory barrier. */ napi->state = 0; - rps_unlock(sd); - - break; + again = false; + } else { + skb_queue_splice_tail_init(>input_pkt_queue, +
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
Al Viro wrote: > > Another thing (and if that works, it's *NOT* a proper fix - it would be > papering over the problem, but at least it would show where to look for > it) - try (on top of mainline) the following delta: I tried it on top of v4.6.4 and on top of the very recent v4.7-2509-g59ebc44 from Linus, and still got corruption. > > diff --git a/net/core/datagram.c b/net/core/datagram.c > index b7de71f..0ee5995 100644 > --- a/net/core/datagram.c > +++ b/net/core/datagram.c > @@ -734,7 +734,7 @@ int skb_copy_and_csum_datagram_msg(struct sk_buff *skb, > if (!chunk) > return 0; > > - if (msg_data_left(msg) < chunk) { > + if (iov_iter_single_seg_count(>msg_iter) < chunk) { > if (__skb_checksum_complete(skb)) > goto csum_error; > if (skb_copy_datagram_msg(skb, hlen, msg, chunk)) > -- Alan Curry
[net-next PATCH 3/4] be2net: Avoid unnecessary firmware updates of multicast list
From: Sriharsha BasavapatnaEachtime the ndo_set_rx_mode() routine is called, the driver programs the multicast list in the adapter without checking if there are any changes to the list. This leads to a flood of RX_FILTER cmds when a number of vlan interfaces are configured over the device, as the ndo_ gets called for each vlan interface. To avoid this, we now use __dev_mc_sync() and __dev_uc_sync() API, but only to detect if there is a change in the mc/uc lists. Now that we use this API, the code has to be-designed to issue these API calls for each invocation of the be_set_rx_mode() call. Signed-off-by: Sriharsha Basavapatna Signed-off-by: Sathya Perla --- drivers/net/ethernet/emulex/benet/be.h | 2 + drivers/net/ethernet/emulex/benet/be_main.c | 169 +--- 2 files changed, 130 insertions(+), 41 deletions(-) diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h index 4555e04..6868712 100644 --- a/drivers/net/ethernet/emulex/benet/be.h +++ b/drivers/net/ethernet/emulex/benet/be.h @@ -573,6 +573,8 @@ struct be_adapter { u32 uc_macs;/* Count of secondary UC MAC programmed */ unsigned long vids[BITS_TO_LONGS(VLAN_N_VID)]; u16 vlans_added; + bool update_uc_list; + bool update_mc_list; u32 beacon_state; /* for set_phys_id */ diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 27fc1f0..4cf4997 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -1420,8 +1420,8 @@ static int be_vid_config(struct be_adapter *adapter) u16 num = 0, i = 0; int status = 0; - /* No need to further configure vids if in promiscuous mode */ - if (be_in_all_promisc(adapter)) + /* No need to change the VLAN state if the I/F is in promiscuous */ + if (adapter->netdev->flags & IFF_PROMISC) return 0; if (adapter->vlans_added > be_max_vlans(adapter)) @@ -1483,12 +1483,6 @@ static int be_vlan_rem_vid(struct net_device *netdev, __be16 proto, u16 vid) return be_vid_config(adapter); } -static void be_clear_all_promisc(struct be_adapter *adapter) -{ - be_cmd_rx_filter(adapter, BE_IF_FLAGS_ALL_PROMISCUOUS, OFF); - adapter->if_flags &= ~BE_IF_FLAGS_ALL_PROMISCUOUS; -} - static void be_set_all_promisc(struct be_adapter *adapter) { be_cmd_rx_filter(adapter, BE_IF_FLAGS_ALL_PROMISCUOUS, ON); @@ -1507,42 +1501,144 @@ static void be_set_mc_promisc(struct be_adapter *adapter) adapter->if_flags |= BE_IF_FLAGS_MCAST_PROMISCUOUS; } -static void be_set_mc_list(struct be_adapter *adapter) +static void be_set_uc_promisc(struct be_adapter *adapter) { int status; - status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_MULTICAST, ON); + if (adapter->if_flags & BE_IF_FLAGS_PROMISCUOUS) + return; + + status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_PROMISCUOUS, ON); if (!status) - adapter->if_flags &= ~BE_IF_FLAGS_MCAST_PROMISCUOUS; - else + adapter->if_flags |= BE_IF_FLAGS_PROMISCUOUS; +} + +static void be_clear_uc_promisc(struct be_adapter *adapter) +{ + int status; + + if (!(adapter->if_flags & BE_IF_FLAGS_PROMISCUOUS)) + return; + + status = be_cmd_rx_filter(adapter, BE_IF_FLAGS_PROMISCUOUS, OFF); + if (!status) + adapter->if_flags &= ~BE_IF_FLAGS_PROMISCUOUS; +} + +/* The below 2 functions are the callback args for __dev_mc_sync/dev_uc_sync(). + * We use a single callback function for both sync and unsync. We really don't + * add/remove addresses through this callback. But, we use it to detect changes + * to the uc/mc lists. The entire uc/mc list is programmed in be_set_rx_mode(). + */ +static int be_uc_list_update(struct net_device *netdev, +const unsigned char *addr) +{ + struct be_adapter *adapter = netdev_priv(netdev); + + adapter->update_uc_list = true; + return 0; +} + +static int be_mc_list_update(struct net_device *netdev, +const unsigned char *addr) +{ + struct be_adapter *adapter = netdev_priv(netdev); + + adapter->update_mc_list = true; + return 0; +} + +static void be_set_mc_list(struct be_adapter *adapter) +{ + struct net_device *netdev = adapter->netdev; + bool mc_promisc = false; + int status; + + __dev_mc_sync(netdev, be_mc_list_update, be_mc_list_update); + + if (netdev->flags & IFF_PROMISC) { + adapter->update_mc_list = false; + } else if (netdev->flags & IFF_ALLMULTI || + netdev_mc_count(netdev) > be_max_mc(adapter)) { + /* Enable multicast promisc if num configured exceeds +
[net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path
The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with BHs disabled. The driver currently issues the required cmds to the FW in these contexts and polls on completions from the FW, while BHs remain disabled. This can cause either packet loss or packet reception to be delayed on that CPU. This patch defers processing of the above cmds to a separate workqueue. With this change, FW cmds are now issued only in process context. Now that the FW cmds are issued only in process context, they can sleep waiting for a completion instead of polling. All the spin_lock_bh(mcc_lock) calls are now replaced with mutex calls. Also a new rx_filter_lock is now needed to protect the RX filtering fields like vids[] between be_vlan_add/rem_vid() and __be_set_rx_mode() contexts. Signed-off-by: Sathya Perla--- drivers/net/ethernet/emulex/benet/be.h | 19 +- drivers/net/ethernet/emulex/benet/be_cmds.c | 202 ++--- drivers/net/ethernet/emulex/benet/be_main.c | 264 ++-- 3 files changed, 327 insertions(+), 158 deletions(-) diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h index 6868712..86780b5 100644 --- a/drivers/net/ethernet/emulex/benet/be.h +++ b/drivers/net/ethernet/emulex/benet/be.h @@ -508,6 +508,10 @@ struct be_wrb_params { u16 lso_mss;/* MSS for LSO */ }; +struct be_eth_addr { + unsigned char mac[ETH_ALEN]; +}; + struct be_adapter { struct pci_dev *pdev; struct net_device *netdev; @@ -523,7 +527,7 @@ struct be_adapter { struct be_dma_mem mbox_mem_alloced; struct be_mcc_obj mcc_obj; - spinlock_t mcc_lock;/* For serializing mcc cmds to BE card */ + struct mutex mcc_lock; /* For serializing mcc cmds to BE card */ spinlock_t mcc_cq_lock; u16 cfg_num_rx_irqs;/* configured via set-channels */ @@ -570,11 +574,15 @@ struct be_adapter { int if_handle; /* Used to configure filtering */ u32 if_flags; /* Interface filtering flags */ u32 *pmac_id; /* MAC addr handle used by BE card */ + struct be_eth_addr *uc_list;/* list of uc-addrs programmed (not perm) */ u32 uc_macs;/* Count of secondary UC MAC programmed */ + struct be_eth_addr *mc_list;/* list of mcast addrs programmed */ + u32 mc_count; unsigned long vids[BITS_TO_LONGS(VLAN_N_VID)]; u16 vlans_added; bool update_uc_list; bool update_mc_list; + struct mutex rx_filter_lock;/* For protecting vids[] & mc/uc_list[] */ u32 beacon_state; /* for set_phys_id */ @@ -628,6 +636,15 @@ struct be_adapter { u8 phy_state; /* state of sfp optics (functional, faulted, etc.,) */ }; +/* Used for defered FW config cmds. Add fields to this struct as reqd */ +struct be_cmd_work { + struct work_struct work; + struct be_adapter *adapter; + union { + __be16 vxlan_port; + } info; +}; + #define be_physfn(adapter) (!adapter->virtfn) #define be_virtfn(adapter) (adapter->virtfn) #define sriov_enabled(adapter) (adapter->flags & \ diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c index 2cc1175..fa11a5a 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.c +++ b/drivers/net/ethernet/emulex/benet/be_cmds.c @@ -571,7 +571,7 @@ int be_process_mcc(struct be_adapter *adapter) /* Wait till no more pending mcc requests are present */ static int be_mcc_wait_compl(struct be_adapter *adapter) { -#define mcc_timeout12 /* 12s timeout */ +#define mcc_timeout12000 /* 12s timeout */ int i, status = 0; struct be_mcc_obj *mcc_obj = >mcc_obj; @@ -585,7 +585,7 @@ static int be_mcc_wait_compl(struct be_adapter *adapter) if (atomic_read(_obj->q.used) == 0) break; - udelay(100); + usleep_range(500, 1000); } if (i == mcc_timeout) { dev_err(>pdev->dev, "FW not responding\n"); @@ -863,7 +863,7 @@ static bool use_mcc(struct be_adapter *adapter) static int be_cmd_lock(struct be_adapter *adapter) { if (use_mcc(adapter)) { - spin_lock_bh(>mcc_lock); + mutex_lock(>mcc_lock); return 0; } else { return mutex_lock_interruptible(>mbox_lock); @@ -874,7 +874,7 @@ static int be_cmd_lock(struct be_adapter *adapter) static void be_cmd_unlock(struct be_adapter *adapter) { if (use_mcc(adapter)) - spin_unlock_bh(>mcc_lock); + return mutex_unlock(>mcc_lock); else return mutex_unlock(>mbox_lock); } @@ -1044,7 +1044,7 @@ int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr, struct be_cmd_req_mac_query *req;
[net-next PATCH 0/4] be2net: patch set
Hi David, pls consider applying this patch-set to the net-next tree. Patch 1 fixes the driver to workaournd a bug in the Lancer FW in the vlan-config cmd processing. The FW in some cases clears the vlan-promisc setting even if it cannot apply the vlan filter. The driver has no means of knowing if the vlan-promisc setting has been cleared or not. This patch now explicitly clears the vlan-promisc setting via the RX-Filter cmd and then tries to program the vlan-list. Patch 2 fixes the failure path in the vlan vid add code. The driver currently removes a new vid from the adapter->vids[] array if be_vid_config() returns an error, which occurs when there is an error in HW/FW. This is wrong. After the HW/FW error is recovered from, we need the complete vids[] array to re-program the vlan list. Patch 3 fixes the ndo_set_rx_mode() path to avoid unnecessary multicast list updates to the FW. Each time the ndo_set_rx_mode() routine is called, the driver programs the multicast list in the adapter without checking if there are any changes to the list. This leads to a flood of RX_FILTER cmds when a number of vlan interfaces are configured over the device, as the ndo_ gets called for each vlan interface. To avoid this, we now use __dev_mc_sync() and __dev_uc_sync() API, but only to detect if there is a change in the mc/uc lists. Now that we use this API, the code has to be-designed to issue these API calls for each invocation of the ndo_ call. Patch 4 replaces polling with sleeping in the FW completion path. The ndo_set_rx_mode() and ndo_add/del_vxlan_port() calls may be called with BHs disabled. The driver currently issues the required cmds to the FW in these contexts and polls on completions from the FW, while BHs remain disabled. This can cause either packet loss or packet reception to be delayed on that CPU. This patch defers processing of the above cmds to a separate workqueue. With this change, FW cmds are now issued only in process context. Now that the FW cmds are issued only in process context, they can sleep waiting for a completion instead of polling. Sathya Perla (2): be2net: do not remove vids from driver table if be_vid_config() fails. be2net: replace polling with sleeping in the FW completion path Somnath Kotur (1): be2net: clear vlan-promisc setting before programming the vlan list Sriharsha Basavapatna (1): be2net: Avoid unnecessary firmware updates of multicast list drivers/net/ethernet/emulex/benet/be.h | 21 +- drivers/net/ethernet/emulex/benet/be_cmds.c | 202 ++--- drivers/net/ethernet/emulex/benet/be_main.c | 422 ++-- 3 files changed, 450 insertions(+), 195 deletions(-) -- 2.4.1
[net-next PATCH 2/4] be2net: do not remove vids from driver table if be_vid_config() fails.
The driver currently removes a new vid from the adapter->vids[] array if be_vid_config() returns an error, which occurs when there is an error in HW/FW. This is wrong. After the HW/FW error is recovered from, we need the complete vids[] array to re-program the vlan list. Signed-off-by: Sathya Perla--- drivers/net/ethernet/emulex/benet/be_main.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 2782299..27fc1f0 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -1463,13 +1463,7 @@ static int be_vlan_add_vid(struct net_device *netdev, __be16 proto, u16 vid) set_bit(vid, adapter->vids); adapter->vlans_added++; - status = be_vid_config(adapter); - if (status) { - adapter->vlans_added--; - clear_bit(vid, adapter->vids); - } - - return status; + return be_vid_config(adapter); } static int be_vlan_rem_vid(struct net_device *netdev, __be16 proto, u16 vid) -- 2.4.1
[net-next PATCH 1/4] be2net: clear vlan-promisc setting before programming the vlan list
From: Somnath KoturThe Lancer FW has a bug due to which in some cases vlan-promisc setting is cleared eventhough the vlan-list programming did not succeed (via VLAN_CONFIG) cmd. The driver has no way of knowing if the vlan-promisc mode was cleared or not when this cmd fails. To work around this issue, this patch first explicitly clears the vlan-promisc mode via RX_FILTER cmd and then tries to program the vlan list. Signed-off-by: Somnath Kotur Signed-off-by: Sathya Perla --- drivers/net/ethernet/emulex/benet/be_main.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c index 1f16e73..2782299 100644 --- a/drivers/net/ethernet/emulex/benet/be_main.c +++ b/drivers/net/ethernet/emulex/benet/be_main.c @@ -1427,6 +1427,11 @@ static int be_vid_config(struct be_adapter *adapter) if (adapter->vlans_added > be_max_vlans(adapter)) return be_set_vlan_promisc(adapter); + if (adapter->if_flags & BE_IF_FLAGS_VLAN_PROMISCUOUS) { + status = be_clear_vlan_promisc(adapter); + if (status) + return status; + } /* Construct VLAN Table to give to HW */ for_each_set_bit(i, adapter->vids, VLAN_N_VID) vids[num++] = cpu_to_le16(i); @@ -1439,8 +1444,6 @@ static int be_vid_config(struct be_adapter *adapter) addl_status(status) == MCC_ADDL_STATUS_INSUFFICIENT_RESOURCES) return be_set_vlan_promisc(adapter); - } else if (adapter->if_flags & BE_IF_FLAGS_VLAN_PROMISCUOUS) { - status = be_clear_vlan_promisc(adapter); } return status; } -- 2.4.1
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, July 27, 2016 1:45 > To: Dexuan Cui> > From: Dexuan Cui > Date: Tue, 26 Jul 2016 07:09:41 + > > > I googled "S390 hypervisor socket" but didn't find anything related (I > > think). > > That would be net/iucv/ Thanks for the info! I'll look into this. > There's also VMWare's stuff under net/vmw_vsock > > It's just absolutely rediculous to make a new hypervisor socket > interface over and over again, so much code duplication and > replication. I agree on this principle of avoiding duplication. However my feeling is: IMHO different hypervisor sockets were developed independently without coordination and the implementation details could be so different that an enough generic framework/infrastructure is difficult, e.g., at first glance, it looks AF_IUCV is quite different from AF_VSOCK and this might explain why AF_VSOCK wasn't built on AF_IUCV(?). I'll dig more into AF_IUCV, AF_VSOCK and AF_HYPERV and figure out what is the best direction I should go. Thanks, -- Dexuan
gpf in p9_client_read
Hi, I was fuzzing with trinity in qemu via the virtme script, i.e. using these opts: virtme-run --kdir /home/vbabka/wrk/linus.git --rwdir /home/vbabka/tmp/trinity/ --dry-run --show-command --qemu-opts --smp 8 -drive file=/home/vbabka/tmp/trinity.img,if=ide,format=raw,media=disk /usr/bin/qemu-system-x86_64 -fsdev local,id=virtfs1,path=/,security_model=none,readonly -device virtio-9p-pci,fsdev=virtfs1,mount_tag=/dev/root -fsdev local,id=virtfs5,path=/usr/local/share/virtme-guest-0,security_model=none,readonly -device virtio-9p-pci,fsdev=virtfs5,mount_tag=virtme.guesttools -fsdev local,id=virtfs9,path=/home/vbabka/tmp/trinity/,security_model=none -device virtio-9p-pci,fsdev=virtfs9,mount_tag=virtme.initmount0 -machine accel=kvm:tcg -watchdog i6300esb -cpu host -parallel none -net none -echr 1 -serial none -chardev stdio,id=console,signal=off,mux=on -serial chardev:console -mon chardev=console -vga none -display none -kernel /home/vbabka/wrk/linus.git/arch/x86/boot/bzImage -append 'virtme_initmount0=home/vbabka/tmp/trinity earlyprintk=serial,ttyS0,115200 console=ttyS0 psmouse.proto=exps "virtme_stty_con=rows 53 cols 189 iutf8" TERM=xterm rootfstype=9p rootflags=version=9p2000.L,trans=virtio,access=any raid=noautodetect ro init=/bin/sh -- -c "mount -t tmpfs run /run;mkdir -p /run/virtme/guesttools;/bin/mount -n -t 9p -o ro,version=9p2000.L,trans=virtio,access=any virtme.guesttools /run/virtme/guesttools;exec /run/virtme/guesttools/virtme-init"' --smp 8 -drive file=/home/vbabka/tmp/trinity.img,if=ide,format=raw,media=disk So the fuzzer's working directory is on ext4 created on disk backed by trinity.img, but the rest is provided from host via virtio-9p, as virtme does. Very deterministically after few minutes of fuzzing, I get OOMs followed by this: general protection fault: [#1] SMP CPU: 2 PID: 17712 Comm: trinity-c10 Not tainted 4.7.0 #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 0[m ) 178.898319] task: 880003a67000 ti: 880001984000 task.ti: 880001984000 RIP: 0010:[] [] p9_client_read+0x70/0x270 RSP: :880001987c48 EFLAGS: 00010246 RAX: 559b48bf06527000 RBX: 1000 RCX: 880001987cdc RDX: 1000 RSI: c000 RDI: 880005e6b7c0 RBP: 880001987cc8 R08: 1000 R09: 0020 R10: 880001987cd0 R11: 0040 R12: 880005e6b780 R13: c000 R14: 880005e6b240 R15: 024201ca FS: 7fcdb1a74700() GS:880007c8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0040c840 CR3: 019fb000 CR4: 001406e0 DR0: 7fcdb09b3000 DR1: DR2: DR3: DR6: 0ff0 DR7: 00050602 Stack: 880001987c90 81130f6f 880001987c58 880001987c58 880007c91840 880005e6b7c0 880001987cdc ea00 880001987cf0 11987ce8 81131621 ea10abc0 Call Trace: [] v9fs_fid_readpage+0x73/0x180 fs/9p/vfs_addr.c:69 [] v9fs_vfs_readpage+0x10/0x20 fs/9p/vfs_addr.c:98 [< inline >] page_cache_read mm/filemap.c:1908 [] filemap_fault+0x335/0x4b0 mm/filemap.c:2087 [] __do_fault+0x61/0xe0 mm/memory.c:2839 [< inline >] do_read_fault mm/memory.c:3030 [< inline >] do_fault mm/memory.c:3189 [< inline >] handle_pte_fault mm/memory.c:3364 [< inline >] __handle_mm_fault mm/memory.c:3488 [] handle_mm_fault+0xd41/0x13d0 mm/memory.c:3517 [] __do_page_fault+0x1c1/0x490 arch/x86/mm/fault.c:1356 [] do_page_fault+0xc/0x10 arch/x86/mm/fault.c:1419 [] page_fault+0x22/0x30 arch/x86/entry/entry_64.S:920 Code: 89 c2 89 45 cc 41 8b 5e 20 41 8b 44 24 04 83 e8 18 85 db 0f 84 f8 00 00 00 39 c3 0f 87 f0 00 00 00 49 8b 44 24 10 39 d3 0f 4f da <48> 83 78 50 00 74 0c 81 fb 00 04 00 00 0f 8f ef 00 00 00 41 8b RIP [] p9_client_read+0x70/0x270 net/9p/client.c:1562 RSP ---[ end trace 0192b39ea4ee4e9b ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled ---[ end Kernel panic - not syncing: Fatal exception All code 0: 89 c2 mov%eax,%edx 2: 89 45 ccmov%eax,-0x34(%rbp) 5: 41 8b 5e 20 mov0x20(%r14),%ebx 9: 41 8b 44 24 04 mov0x4(%r12),%eax e: 83 e8 18sub$0x18,%eax 11: 85 db test %ebx,%ebx 13: 0f 84 f8 00 00 00 je 0x111 19: 39 c3 cmp%eax,%ebx 1b: 0f 87 f0 00 00 00 ja 0x111 21: 49 8b 44 24 10 mov0x10(%r12),%rax 26: 39 d3 cmp%edx,%ebx 28: 0f 4f dacmovg %edx,%ebx 2b:* 48 83 78 50 00 cmpq $0x0,0x50(%rax) <-- trapping instruction 30: 74 0c je 0x3e 32: 81 fb 00 04 00 00 cmp$0x400,%ebx
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: netdev-ow...@vger.kernel.org [mailto:netdev- > ow...@vger.kernel.org] On Behalf Of Dexuan Cui > Sent: Tuesday, July 26, 2016 21:22 > ... > This is because, the design of AF_HYPERV in the Hyper-V host side is > suboptimal IMHO (the current host side design requires the least > change in the host side, but it makes my life difficult. :-( It may > change in the future, but luckily we have to live with it at present): BTW, sorry for my typo: "luckily" should be "unluckily". Thanks, -- Dexuan
Re: [net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path
From: Sathya PerlaDate: Wed, 27 Jul 2016 13:33:19 +0530 >> -Original Message- >> From: Sathya Perla [mailto:sathya.pe...@broadcom.com] >> > > >> > > > @@ -4477,6 +4551,22 @@ static int be_if_create(struct be_adapter >> > *adapter) >> > > >u32 cap_flags = be_if_cap_flags(adapter); >> > > >int status; >> > > > >> > > > + /* alloc required memory for other filtering fields */ >> > > > + adapter->pmac_id = kcalloc(be_max_uc(adapter), >> > > > + sizeof(*adapter->pmac_id), >> > > > GFP_KERNEL); >> > > > + if (!adapter->pmac_id) >> > > > + return -ENOMEM; >> > > > + >> > > > + adapter->mc_list = kcalloc(be_max_mc(adapter), >> > > > + sizeof(*adapter->mc_list), >> > > > GFP_KERNEL); >> > > > + if (!adapter->mc_list) >> > > > + return -ENOMEM; >> > > > + >> > > > + adapter->uc_list = kcalloc(be_max_uc(adapter), >> > > > + sizeof(*adapter->uc_list), >> > > > GFP_KERNEL); >> > > > + if (!adapter->uc_list) >> > > > + return -ENOMEM; >> > > >> > > These error paths are leaking memory, please audit this in the rest >> > > of >> > your change as well. >> > >> > David, thanks for catching this; will fix this (and others if any) and >> > send out a v2. >> >> David, actually, when be_if_create() returns an error, it falls back into >> the error path of >> be_setup() (the caller) and be_clear() is called. be_clear() calls >> be_if_destroy() which frees the allocated memory. >> So, this code will not leak memory in the error path. > > David, can you accept the v1 patch set as-is, as it doesn't have the memory > leaks that you suspected initially? Please resubmit, thanks.
Re: [RFC PATCH v2 1/4] Documentation: DT: net: Add Xilinx gmiitorgmii converter device tree binding documentation
Hi Appana Here is roughly what i was thinking: struct priv { phy_device *master; phy_device *slave; struct phy_driver *slave_drv; }; phy_status_clone(phy_device *master, phy_device *slave) { master->speed = slave->speed; master->duplex = slave->duplex; master->pause = slave->pause; } read_status(struct phy_device *phydev) { struct priv *priv = phydev->priv; /* Get the status from the slave, and duplicate in into the * master */ slave_drv->read_status(priv->slave); phy_status_clone(priv->master, priv->slave); /* Update the gmiitorgmii with the current link parameters */ update_link(master); } config_init(struct phy_device *phydev) { struct priv *priv = phydev->priv; /* Configure the slave, and duplicate in into the master */ slave_drv->config_init(priv->slave); phy_status_clone(priv->master, priv->slave); } struct phy_driver master_drv = { .read_status = read_status, .config_init = config_init, .soft_reset = ... .suspend = ... }; probe(mdio_device *mdio) { struct priv *priv = devm_alloc(); /* Use the phy-handle property to find the slave phy */ node_phy = of_parse_phandle(mdio->of_node, "phy", 0); priv->slave = of_phy_find_device(node_phy); /* Create the master phy on the control address. Use the phy ID from the slave. */ priv->master = phy_device_create(mdio->bus, mdio->addr, phy->slave->phy_id, phy->slave->is_c45, phy->slave->c45_ids); slave_dev_drv = phydev->mdio.dev.driver; priv->slave_drv = to_phy_driver(slave_dev_drv); priv->master->mdio.dev.driver = master_drv; } It would however be nice to only have one phydev structure, so you are not copying status and settings backwards and forwards from one to the other all the time, and need a wrapper for every function in phy_driver. Studying the structures a bit, that might be possible. You would then only need to wrap the read_status(), so that when the link speed/duplex changes, you can configure the converter as appropriate. Andrew
Re: [PATCH net-next v2 2/3] bnxt_en: Log a message, if enabling NTUPLE filtering fails.
On Wed, Jul 27, 2016 at 12:31 AM, zhuyjwrote: > + if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { ><-Does this happen very rarely? If so, > > if (unlikely(vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { is > better? This is not in the fast path, so it doesn't matter. > > + netdev_warn(bp->dev, > + "Not enough resources to support NTUPLE > filters, enough resources for up to %d rx rings\n", > + min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1)); > return false; > + } >
RE: [net-next PATCH 4/4] be2net: replace polling with sleeping in the FW completion path
> -Original Message- > From: Sathya Perla [mailto:sathya.pe...@broadcom.com] > > > > > > > @@ -4477,6 +4551,22 @@ static int be_if_create(struct be_adapter > > *adapter) > > > > u32 cap_flags = be_if_cap_flags(adapter); > > > > int status; > > > > > > > > + /* alloc required memory for other filtering fields */ > > > > + adapter->pmac_id = kcalloc(be_max_uc(adapter), > > > > + sizeof(*adapter->pmac_id), > > > > GFP_KERNEL); > > > > + if (!adapter->pmac_id) > > > > + return -ENOMEM; > > > > + > > > > + adapter->mc_list = kcalloc(be_max_mc(adapter), > > > > + sizeof(*adapter->mc_list), > > > > GFP_KERNEL); > > > > + if (!adapter->mc_list) > > > > + return -ENOMEM; > > > > + > > > > + adapter->uc_list = kcalloc(be_max_uc(adapter), > > > > + sizeof(*adapter->uc_list), > > > > GFP_KERNEL); > > > > + if (!adapter->uc_list) > > > > + return -ENOMEM; > > > > > > These error paths are leaking memory, please audit this in the rest > > > of > > your change as well. > > > > David, thanks for catching this; will fix this (and others if any) and > > send out a v2. > > David, actually, when be_if_create() returns an error, it falls back into > the error path of > be_setup() (the caller) and be_clear() is called. be_clear() calls > be_if_destroy() which frees the allocated memory. > So, this code will not leak memory in the error path. David, can you accept the v1 patch set as-is, as it doesn't have the memory leaks that you suspected initially? thanks, -Sathya
RE: [PATCH] qed: Add and use specific logging functions to reduce object size
> Current DP_ macros generate a lot of code. > Using functions with vsprintf extension %pV helps reduce that size. > > drivers/net/ethernet/qlogic/qed/Makefile | 2 +- > drivers/net/ethernet/qlogic/qed/qed_util.c | 82 > ++ > include/linux/qed/qed_if.h | 60 +- > 3 files changed, 106 insertions(+), 38 deletions(-) create mode 100644 > drivers/net/ethernet/qlogic/qed/qed_util.c This won't compile when CONFIG_QED*=m, as qede can't link to the new qed_{err, verbose, info, notice} functions. That was the original reason for putting the macros in the interface header. Other alternatives: - We can EXPORT_SYMBOL() the functions, although we've taken a strain not adding such to the interface. - Code duplication between qed/qede [ugly]. - Implementing this in qede via the interface functions with qed; But the notion of defining an interface between 2 modules passing va_args seems [to me] like a bad one. If you have cleaner solutions, I'd be happy to hear those.
RE: [PATCH] qed: Add and use specific logging functions to reduce object size
> > Current DP_ macros generate a lot of code. > > Using functions with vsprintf extension %pV helps reduce that size. > > Yuval, I used the same KERN_ output types, but it is unusual > that DP_INFO outputs at KERN_NOTICE. > > Was that a copy/paste defect or should it be emitted at KERN_INFO and > DP_VERBOSE be emitted at KERN_DEBUG? I agree it's a bit odd, but it's actually by design - as none of these prints would ever be reached unless user explicitly enable them [ethtool/module param], the assumption is that NOTICE is good enough, i.e., it will prevent the need for doing additional configuration of logging levels by the user.
Re: [PATCH net-next v2 2/3] bnxt_en: Log a message, if enabling NTUPLE filtering fails.
+ if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { <-Does this happen very rarely? If so, if (unlikely(vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { is better? + netdev_warn(bp->dev, + "Not enough resources to support NTUPLE filters, enough resources for up to %d rx rings\n", + min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1)); return false; + } On Tue, Jul 26, 2016 at 12:33 AM, Michael Chanwrote: > From: Vasundhara Volam > > If there are not enough resources to enable ntuple filtering, > log a warning message. > > v2: Use single message and add missing newline. > > Signed-off-by: Vasundhara Volam > Signed-off-by: Michael Chan > --- > drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c > b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > index 7de7d7a..eac0f2b 100644 > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c > @@ -5790,8 +5790,12 @@ static bool bnxt_rfs_capable(struct bnxt *bp) > return false; > > vnics = 1 + bp->rx_nr_rings; > - if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) > + if (vnics > pf->max_rsscos_ctxs || vnics > pf->max_vnics) { > + netdev_warn(bp->dev, > + "Not enough resources to support NTUPLE filters, > enough resources for up to %d rx rings\n", > + min(pf->max_rsscos_ctxs - 1, pf->max_vnics - 1)); > return false; > + } > > return true; > #else > @@ -5804,7 +5808,7 @@ static netdev_features_t bnxt_fix_features(struct > net_device *dev, > { > struct bnxt *bp = netdev_priv(dev); > > - if (!bnxt_rfs_capable(bp)) > + if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp)) > features &= ~NETIF_F_NTUPLE; > > /* Both CTAG and STAG VLAN accelaration on the RX side have to be > -- > 1.8.3.1 >
RE: [PATCH v4 5/7] thunderbolt: Networking state machine
On Mon, Jul 25 2016, 01:36 AM, Lukas Wunner wrote: > On Mon, Jul 18, 2016 at 01:00:38PM +0300, Amir Levy wrote: > > + const unique_id_be proto_uuid = > APPLE_THUNDERBOLT_IP_PROTOCOL_UUID; > > + > > + if (memcmp(proto_uuid, hdr->apple_tbt_ip_proto_uuid, > > + sizeof(proto_uuid)) != 0) { > > You may want to use the uuid_be data type provided by > instead of rolling your own, as well as the helper uuid_be_cmp() defined > ibidem. > All the messages in Thunderbolt consist BE DWORDs. I didn't find uuid definition in the kernel that accurately describes the uuid structure in the messages, which is 4 BE DWORDs. But on the other hand, all the driver does is copy/compare of uuids, and it can treat uuids as byte array (i.e. uuid_be). I'll change it in the next patches.
Re: [PATCH 2/2] net: davinci_cpdma: reduce time holding chan->lock in cpdma_chan_submit
Hello, On Tue, Jul 26, 2016 at 05:25:58PM +0300, Grygorii Strashko wrote: > On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: > > Allocating and preparing a dma descriptor doesn't need to happen under > > the channel's lock. So do this before taking the channel's lock. The only > > down side is that the dma descriptor might be allocated even though the > > channel is about to be stopped. This is unlikely though. > > > > Signed-off-by: Uwe Kleine-König> > --- > > drivers/net/ethernet/ti/davinci_cpdma.c | 38 > > + > > 1 file changed, 20 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/net/ethernet/ti/davinci_cpdma.c > > b/drivers/net/ethernet/ti/davinci_cpdma.c > > index 5ffa04a306c6..ba3462707ae3 100644 > > --- a/drivers/net/ethernet/ti/davinci_cpdma.c > > +++ b/drivers/net/ethernet/ti/davinci_cpdma.c > > @@ -542,24 +542,10 @@ int cpdma_chan_submit(struct cpdma_chan *chan, void > > *token, void *data, > > u32 mode; > > int ret = 0; > > > > - spin_lock_irqsave(>lock, flags); > > - > > - if (chan->state == CPDMA_STATE_TEARDOWN) { > > - ret = -EINVAL; > > - goto unlock_ret; > > - } > > - > > - if (chan->count >= chan->desc_num) { > > - chan->stats.desc_alloc_fail++; > > - ret = -ENOMEM; > > - goto unlock_ret; > > - } > > I'm not sure this is right thing to do. This check is expected to be strict > and means "channel has exhausted the available descriptors, so further descs > allocation does not allowed". I developed this patch basing on a 4.4 kernel which doesn't have 742fb20fd4c7 ("net: ethernet: ti: cpdma: switch to use genalloc"). There my patch is more obviously correct. As currently chan->count is protected by chan->lock we must hold the lock for this check. If a failing check means we must not call cpdma_desc_alloc in the first place, that's bad. But I'm not sure this is the case here. After all cpdma_desc_alloc doesn't do anything relevant for the hardware, right? Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
Re: problem with MPLS and TSO/GSO
On Wed, Jul 27, 2016 at 03:02:24PM +0800, zhuyj wrote: > On ubuntu16.04 server 64 bit > The attached script is run, the following will appear. > > Error: either "to" is duplicate, or "encap" is a garbage. Looks like your installed iproute2 package doesn't grok MPLS.
Re: [PATCH 0/2] net: davinci_cpdma: reduce latency on -rt
On Tue, Jul 26, 2016 at 05:36:49PM +0300, Grygorii Strashko wrote: > On 07/26/2016 03:02 PM, Uwe Kleine-König wrote: > >Hello, > > > >these patches are based on next-20160726. I didn't check yet how latency > >improves by using these patches, but even if the improvment is small, > >it's still a good idea to have them. > > Sry, but how this will affect on -RT? This is not a raw locks, so > they will be converted to rt-mutexes which are sleepable. > Or I've missed smth? They are still locks after all. On -rt I saw for the relevant application: send package | take lock | write pckt to hw | | rcv irq | take lock | schedule drop lock | schedule | | get pckt from hw | drop lock So reducing the time a lock is taken reduces the chances that the lock is contended for another thread which results in extra context switches. Best regards Uwe -- Pengutronix e.K. | Uwe Kleine-König| Industrial Linux Solutions | http://www.pengutronix.de/ |
[PATCH net-next] neigh: allow admin to set NUD_STALE
Admin should be able to set any state. Currently, this fails when lladdr is not changed and state is changed from NUD_CONNECTED to NUD_STALE: ip neigh add 192.168.8.1 lladdr 00:11:22:33:44:55 nud perm dev wlan0 ip neigh show to 192.168.8.1 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT ip neigh change 192.168.8.1 lladdr 00:11:22:33:44:55 nud stale dev wlan0 ip neigh show to 192.168.8.1 192.168.8.1 dev wlan0 lladdr 00:11:22:33:44:55 PERMANENT Problem may be from 2.1.X days. Signed-off-by: Julian Anastasov--- net/core/neighbour.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/neighbour.c b/net/core/neighbour.c index cf26e04c4..2ae929f 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -1148,7 +1148,8 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new, } else goto out; } else { - if (lladdr == neigh->ha && new == NUD_STALE) + if (lladdr == neigh->ha && new == NUD_STALE && + !(flags & NEIGH_UPDATE_F_ADMIN)) new = old; } } -- 1.9.3
Re: linux-next: build warning after merge of the net-next tree
On Tue, Jul 26, 2016 at 11:35 PM, Stephen Rothwellwrote: > Hi David, > > On Tue, 26 Jul 2016 23:19:59 -0700 (PDT) David Miller > wrote: >> >> Fixed thusly: >> >> >> From 36232012344b8db67052432742deaf17f82e70e6 Mon Sep 17 00:00:00 2001 >> From: "David S. Miller" >> Date: Tue, 26 Jul 2016 23:19:29 -0700 >> Subject: [PATCH] xgene: Fix build warning with ACPI disabled. >> >> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function >> 'xgene_enet_phy_connect': >> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:759:22: warning: unused >> variable 'adev' [-Wunused-variable] >> >> Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility") >> Reported-by: Stephen Rothwell >> Signed-off-by: David S. Miller >> --- >> drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 3 +-- >> 1 file changed, 1 insertion(+), 2 deletions(-) >> >> diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> index 8a2a221..7714b7d 100644 >> --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c >> @@ -756,7 +756,6 @@ int xgene_enet_phy_connect(struct net_device *ndev) >> struct device_node *np; >> struct phy_device *phy_dev; >> struct device *dev = >pdev->dev; >> - struct acpi_device *adev; >> int i; >> >> if (dev->of_node) { >> @@ -781,7 +780,7 @@ int xgene_enet_phy_connect(struct net_device *ndev) >> pdata->phy_dev = phy_dev; >> } else { >> #ifdef CONFIG_ACPI >> - adev = acpi_phy_find_device(dev); >> + struct acpi_device *adev = acpi_phy_find_device(dev); >> if (adev) >> pdata->phy_dev = adev->driver_data; > > Looks good to me, thanks. Thanks David, for the quick fix. > > -- > Cheers, > Stephen Rothwell
Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting
On 07/27/2016 08:31 AM, Herbert Xu wrote: On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote: Here's another patch to remove that too. I don't actually *use* this code myself and I feel the justification I've given for removing the WARN to be a bit weak, so if you don't take the patch I'll just keep it in my local tree to keep it from showing up again during fuzzing. Please just kill the whole else clause. For soft policy expires we simply need to relay a message to the KM and nothing more. Try #2 :-) Vegard >From e5111e4dcd0e0c0990d3f4bba0ba0bc9d0b40bae Mon Sep 17 00:00:00 2001 From: Vegard NossumDate: Wed, 27 Jul 2016 08:13:14 +0200 Subject: [PATCH] xfrm: get rid of another incorrect WARN During fuzzing I regularly run into this WARN(). According to Herbert Xu, this "certainly shouldn't be a WARN, it probably shouldn't print anything either". Cc: Stephen Hemminger Cc: Steffen Klassert Cc: Herbert Xu Signed-off-by: Vegard Nossum --- net/xfrm/xfrm_user.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 2477b24..10e4e26 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2051,9 +2051,6 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, if (up->hard) { xfrm_policy_delete(xp, p->dir); xfrm_audit_policy_delete(xp, 1, true); - } else { - // reset the timers here? - WARN(1, "Don't know what to do with soft policy expire\n"); } km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid); -- 1.9.1
Re: PROBLEM: network data corruption (bisected to e5a4b0bb803b)
alexmcwhir...@triadic.us writes: > On 2016-07-26 09:59, Christian Lamparter wrote: >> Thanks, I gave the program a try with my WNDA3100 and a WN821N v2 >> devices. >> I did not see any corruptions in any of the tests though. Can you >> tell me >> something about your wireless network too? I would like to know what >> router >> and firmware are you using? Also important: what's your wireless >> configuration? >> (WPA?, CCMP or TKIP? HT40, HT20 or Legacy rates? ...) >> >> Probably the quickest and easiest way to get that information is by >> running >> the following commands as root, when you are connected to your wifi >> network >> and post the results: >> # iw dev wlan0 link >> # iw dev wlan0 scan dump >> >> (You can of course remove your device's MACs, but please do it >> consistently). >> >> Regards, >> Christian > > I wonder if this is something that is only affecting certain NIC's? > For example, it see this issue on sun4u boxes with tigon3 and hme > interfaces. But on my sun4v boxes that have e1000 interfaces > everything works fine. Kernel configuration might also make a difference. Trying to find an Kconfig option which affects this could be useful. -- Kalle Valo
Re: linux-next: build warning after merge of the net-next tree
Hi David, On Tue, 26 Jul 2016 23:19:59 -0700 (PDT) David Millerwrote: > > Fixed thusly: > > > From 36232012344b8db67052432742deaf17f82e70e6 Mon Sep 17 00:00:00 2001 > From: "David S. Miller" > Date: Tue, 26 Jul 2016 23:19:29 -0700 > Subject: [PATCH] xgene: Fix build warning with ACPI disabled. > > drivers/net/ethernet/apm/xgene/xgene_enet_hw.c: In function > 'xgene_enet_phy_connect': > drivers/net/ethernet/apm/xgene/xgene_enet_hw.c:759:22: warning: unused > variable 'adev' [-Wunused-variable] > > Fixes: 8089a96f601b ("drivers: net: xgene: Add backward compatibility") > Reported-by: Stephen Rothwell > Signed-off-by: David S. Miller > --- > drivers/net/ethernet/apm/xgene/xgene_enet_hw.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > index 8a2a221..7714b7d 100644 > --- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > +++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c > @@ -756,7 +756,6 @@ int xgene_enet_phy_connect(struct net_device *ndev) > struct device_node *np; > struct phy_device *phy_dev; > struct device *dev = >pdev->dev; > - struct acpi_device *adev; > int i; > > if (dev->of_node) { > @@ -781,7 +780,7 @@ int xgene_enet_phy_connect(struct net_device *ndev) > pdata->phy_dev = phy_dev; > } else { > #ifdef CONFIG_ACPI > - adev = acpi_phy_find_device(dev); > + struct acpi_device *adev = acpi_phy_find_device(dev); > if (adev) > pdata->phy_dev = adev->driver_data; Looks good to me, thanks. -- Cheers, Stephen Rothwell
Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting
On Wed, Jul 27, 2016 at 08:20:57AM +0200, Vegard Nossum wrote: > > Here's another patch to remove that too. > > I don't actually *use* this code myself and I feel the justification > I've given for removing the WARN to be a bit weak, so if you don't take > the patch I'll just keep it in my local tree to keep it from showing up > again during fuzzing. Please just kill the whole else clause. For soft policy expires we simply need to relay a message to the KM and nothing more. Thanks, -- Email: Herbert XuHome Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] xfrm: use printk instead of WARN for bad policy reporting
On 07/27/2016 05:01 AM, Herbert Xu wrote: On Wed, Jul 20, 2016 at 01:53:12PM +0200, Vegard Nossum wrote: Just FYI I'm also running into the // reset the timers here? WARN(1, "Don't know what to do with soft policy expire\n"); in xfrm_add_pol_expire() from the same commit, but that looks potentially somewhat more serious (or at least it looks like we might want to do some sort of cleaning up), so I won't touch it for now. It certainly shouldn't be a WARN, it probably shouldn't print anything either. Here's another patch to remove that too. I don't actually *use* this code myself and I feel the justification I've given for removing the WARN to be a bit weak, so if you don't take the patch I'll just keep it in my local tree to keep it from showing up again during fuzzing. Thanks, Vegard >From 5b302eb4c188064a69176a901c2bec3e19440c03 Mon Sep 17 00:00:00 2001 From: Vegard NossumDate: Wed, 27 Jul 2016 08:13:14 +0200 Subject: [PATCH] xfrm: get rid of another incorrect WARN During fuzzing I regularly run into this WARN(). According to Herbert Xu, this "certainly shouldn't be a WARN, it probably shouldn't print anything either". Cc: Stephen Hemminger Cc: Steffen Klassert Cc: Herbert Xu Signed-off-by: Vegard Nossum --- net/xfrm/xfrm_user.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index 2477b24..a4e44f7 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2053,7 +2053,6 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, xfrm_audit_policy_delete(xp, 1, true); } else { // reset the timers here? - WARN(1, "Don't know what to do with soft policy expire\n"); } km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid); -- 1.9.1
Re: [PATCH net] be2net: perform temperature query in adapter regardless of its interface state
From: "Guilherme G. Piccoli"Date: Tue, 26 Jul 2016 17:39:42 -0300 > The be2net driver performs fw temperature queries on be_worker() routine, > which is executed each second for each be_adapter. There is a frequency > threshold to avoid fw query to happens at each call to be_worker(); > instead, currently a fw query occurs once in 64 runs of the procedure. > > Nevertheless, this fw temperature query is invoked only for adapters which > interface is up, so we can see I/O errors on read of hwmon counters from > userspace (from tools like lm-sensors) in case we have adapters' functions > which interface is down. > > This patch moves the fw query code to be invoked even if interface is down. > No functional changes were introduced. > > Signed-off-by: Guilherme G. Piccoli Applied.