date:20171014

RE: [PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-14 Thread Yuval Mintz

> Hi, Yuval
> 
> On 2017/10/13 4:21, Yuval Mintz wrote:
> >> This patchset adds a new hardware offload type in mqprio before adding
> >> mqprio hardware offload support in hns3 driver.
> >
> > I think one of the biggest issues in tying this to DCB configuration is the
> > non-immediate [and possibly non persistent] configuration.
> >
> > Scenario #1:
> > User is configuring mqprio offloaded with 3 TCs while device is in willing
> > mode.
> > Would you expect the driver to immediately respond with a success or
> > instead delay the return until the DCBx negotiation is complete and the
> > operational num of TCs is actually 3?
> 
> Well, when user requests the mqprio offloaded by a hardware shared by DCB,
> I expect the user is not using the dcb tool.
> If user is still using dcb tool, then result is undefined.
> 
> The scenario you mention maybe can be enforced by setting willing to zero
> when user is requesting the mqprio offload, and restore the willing bit
> when unloaded the mqprio offload.

Sounds a bit harsh but would probably work.
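
[Illustration] The "clear willing while mqprio is offloaded, restore it afterwards" idea could look roughly like the sketch below. It is only a model of the proposal: `struct port_dcb_state`, `mqprio_offload_enable()` and `mqprio_offload_disable()` are hypothetical names, not hns3 or dcbnl interfaces.

```c
#include <stdbool.h>

/* Hypothetical per-port state; not a real hns3 or dcbnl structure. */
struct port_dcb_state {
    bool willing;        /* current DCBX willing bit */
    bool saved_willing;  /* value to restore once offload is removed */
    bool mqprio_offload; /* true while mqprio hardware offload is active */
    int num_tc;          /* TC count requested through mqprio */
};

/* Enable offload: remember the willing bit once, then clear it so the
 * peer can no longer renegotiate the TC count underneath the user.
 */
static void mqprio_offload_enable(struct port_dcb_state *st, int num_tc)
{
    if (!st->mqprio_offload) {
        st->saved_willing = st->willing;
        st->mqprio_offload = true;
    }
    st->willing = false;
    st->num_tc = num_tc;
}

/* Disable offload: restore the willing bit saved at enable time. */
static void mqprio_offload_disable(struct port_dcb_state *st)
{
    if (!st->mqprio_offload)
        return;
    st->willing = st->saved_willing;
    st->mqprio_offload = false;
    st->num_tc = 0;
}
```

Saving the bit only on the first enable means a repeated mqprio reconfiguration does not overwrite the value that should come back when the offload is removed.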

> But I think the real issue is that dcb and mqprio share the tc system in the
> stack; the problem may be better fixed in the stack rather than in the
> driver, as you suggested in the DCB patchset. What do you think?

What did you have in mind?

> 
> >
> > Scenario #2:
> > Assume user explicitly offloaded mqprio with 3 TCs, but now DCB
> > configuration has changed on the peer side and 4 TCs is the new negotiated
> > operational value.
> > Your current driver logic would change the number of TCs underneath the
> > user configuration [and it would actually probably work due to mqprio
> > being a crappy qdisc]. But was that the user actual intention?
> > [I think the likely answer in this scenario is 'yes' since the alternative
> > is no better.
> > But I still thought it was worth mentioning]
> 
> You are right, the problem also has something to do with mqprio and dcb
> sharing the tc in the stack.
> 
> During testing, when user explicitly offloaded mqprio with 3 TCs, all
> queue has a default pfifo mqprio attached, after DCB changes the tc num to
> 4, using tc qdisc shows some queue does not have a default pfifo mqprio
> attached.

Really? Then what did it show? 
[I assume it has some pfifo attached, and it's an mqprio dump kind of an issue]

> 
> Maybe we can add a callback to notify mqprio the configuration has changed.
> 

Which would do what?
You already have the notifications available for monitoring using dcbnl logic
if the configuration changes [for user]; so user can re-configure whatever it
wants.
But other than dropping all the qdisc configurations and going back to the
default qdiscs, what default action would mqprio be able to do when
configuration changes that actually makes sense?

> Thanks
> Yunsheng Lin
> 
> >
> > Cheers,
> > Yuval
> >
> >>
> >> Yunsheng Lin (2):
> >>   mqprio: Add a new hardware offload type in mqprio
> >>   net: hns3: Add mqprio hardware offload support in hns3 driver
> >>
> >>  drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
> >>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
> >>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46
> ++-
> >> ---
> >>  include/uapi/linux/pkt_sched.h |  1 +
> >>  4 files changed, 55 insertions(+), 16 deletions(-)
> >>
> >> --
> >> 1.9.1
> >
> >
> >

Re: [net-next 3/3] tcp: keep tcp_collapse controllable even after processing starts

2017-10-14 Thread Koichiro Den

On Sat, 2017-10-14 at 08:46 -0700, Eric Dumazet wrote:
> On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> > Combining actual collapsing with reasoning for deciding the starting
> > point, we can apply its logic in a consistent manner such that we can
> > avoid costly yet not much useful collapsing. When collapsing to be
> > triggered, it's not rare that most of the skbs in the receive or ooo
> > queue are large ones without much metadata overhead. This also
> > simplifies code and makes it easier to apply logic in a fair manner.
> > 
> > Subtle subsidiary changes included:
> > - When the end_seq of the skb we are trying to collapse was larger than
> >   the 'end' argument provided, we would end up copying to the 'end'
> >   even though we couldn't collapse the original one. Current users of
> >   tcp_collapse does not require such reserves so redefines it as the
> >   point over which skbs whose seq passes guaranteed not to be collapsed.
> > - Naturally tcp_collapse_ofo_queue shapes up and we no longer need
> >   'tail' argument.
> 
> 
> I am not inclined to review such a large change, without you providing
> actual numbers.
> 
> We have a problem in TCP right now, that receiver announces a too big
> window, and that is the main reason we trigger collapsing.
> 
> I would rather fix the root cause.
> 
> 
Ok I got it, thank you.

Re: [net-next 2/3] tcp: do not tcp_collapse once SYN or FIN found

2017-10-14 Thread Koichiro Den

On Sat, 2017-10-14 at 08:43 -0700, Eric Dumazet wrote:
> On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> > Since 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
> > applied, we no longer need to continue to search for the starting
> > point once we encounter FIN packet. Same reasoning for SYN packet
> > since commit 9d691539eea2d ("tcp: do not enqueue skb with SYN flag"),
> > that would help us with error message when actual receiving.
> 
> Very confusing changelog or patch.
> 
> What exact problem do you want to solve ?
> 
> 
What I thought was an unnecessary search for the starting point. I am going to
re-read everything to correct my misunderstanding.
Thank you.

Re: [net-next 1/3] tcp: avoid useless copying and collapsing of just one skb

2017-10-14 Thread Koichiro Den

On Sat, 2017-10-14 at 08:42 -0700, Eric Dumazet wrote:
> On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> > On the starting point chosen, it could be possible that just one skb
> > remains in between the range provided, leading to copying and re-insertion
> > of rb node, which is useless with respect to the rcv buf measurement.
> > This is rather probable in ooo queue case, in which non-contiguous bloated
> > packets have been queued up.
> > 
> > Signed-off-by: Koichiro Den 
> > ---
> >  net/ipv4/tcp_input.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> > index d0682ce2a5d6..1d785b5bf62d 100644
> > --- a/net/ipv4/tcp_input.c
> > +++ b/net/ipv4/tcp_input.c
> > @@ -4807,7 +4807,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head
> > *list, struct rb_root *root,
> >     start = TCP_SKB_CB(skb)->end_seq;
> >     }
> >     if (end_of_skbs ||
> > -   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
> > +   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) ||
> > +   (TCP_SKB_CB(skb)->seq == start && TCP_SKB_CB(skb)->end_seq ==
> > end))
> >     return;
> >  
> >     __skb_queue_head_init();
> 
> 
> What do you mean by useless ?
> 
> Surely if this skb contains 17 segments (some USB drivers allocate 8KB
> per frame), we want to collapse them to save memory.
> 
> So I do not agree with this patch.
> 
> 
I missed that; sorry for bothering you with all of these, I totally
misunderstood it.
Thank you for all the reviews.

[PATCH] rtlwifi: Fix typo in if ... else if ... else construct

2017-10-14 Thread Larry Finger

The kbuild test robot reports two conditions with no effect (if == else).
These are the result of copy and paste typographical errors.

Signed-off-by: Larry Finger 
Cc: Ping-Ke Shih 
Cc: Yan-Hsuan Chuang 
Cc: Birming Chiu 
Cc: Shaofu 
Cc: Steven Ting 
Cc: kbuild-...@01.org
Cc: Julia Lawall 
---
 drivers/staging/rtlwifi/base.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/rtlwifi/base.c b/drivers/staging/rtlwifi/base.c
index b88b0e8edd3d..1a0331cf63ee 100644
--- a/drivers/staging/rtlwifi/base.c
+++ b/drivers/staging/rtlwifi/base.c
@@ -920,7 +920,7 @@ static u8 _rtl_get_vht_highest_n_rate(struct ieee80211_hw *hw,
else if ((tx_mcs_map  & 0x000c) >> 2 ==
IEEE80211_VHT_MCS_SUPPORT_0_8)
hw_rate =
-   rtlpriv->cfg->maps[RTL_RC_VHT_RATE_2SS_MCS9];
+   rtlpriv->cfg->maps[RTL_RC_VHT_RATE_2SS_MCS8];
else
hw_rate =
rtlpriv->cfg->maps[RTL_RC_VHT_RATE_2SS_MCS9];
@@ -932,7 +932,7 @@ static u8 _rtl_get_vht_highest_n_rate(struct ieee80211_hw *hw,
else if ((tx_mcs_map  & 0x0003) ==
IEEE80211_VHT_MCS_SUPPORT_0_8)
hw_rate =
-   rtlpriv->cfg->maps[RTL_RC_VHT_RATE_1SS_MCS9];
+   rtlpriv->cfg->maps[RTL_RC_VHT_RATE_1SS_MCS8];
else
hw_rate =
rtlpriv->cfg->maps[RTL_RC_VHT_RATE_1SS_MCS9];
-- 
2.12.3
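
[Illustration] The patch above corrects the branch that maps the "MCS 0-8" capability to an MCS8 rate instead of MCS9. A minimal standalone sketch of the same decode follows. The two-bit per-stream field values match the IEEE80211_VHT_MCS_* constants in linux/ieee80211.h; the enum names and `vht_highest_mcs()` are hypothetical and are not part of rtlwifi.

```c
#include <stdint.h>

/* Per-stream values of the VHT MCS map (two bits per spatial stream). */
enum {
    VHT_MCS_SUPPORT_0_7 = 0,
    VHT_MCS_SUPPORT_0_8 = 1,
    VHT_MCS_SUPPORT_0_9 = 2,
    VHT_MCS_NOT_SUPPORTED = 3,
};

/* Return the highest MCS index (7, 8 or 9) for spatial stream nss
 * (1-based, 1..8), or -1 if that stream is unsupported or nss is invalid.
 */
static int vht_highest_mcs(uint16_t tx_mcs_map, int nss)
{
    unsigned int field;

    if (nss < 1 || nss > 8)
        return -1;

    /* Stream 1 uses bits 1:0, stream 2 uses bits 3:2, and so on. */
    field = (tx_mcs_map >> (2 * (nss - 1))) & 0x3;

    switch (field) {
    case VHT_MCS_SUPPORT_0_7:
        return 7;
    case VHT_MCS_SUPPORT_0_8:
        return 8; /* the branch the typo had mapped to MCS9 */
    case VHT_MCS_SUPPORT_0_9:
        return 9;
    default:
        return -1;
    }
}
```

Decoding the field once and switching on it gives each capability value its own distinct result, which makes an "if == else" duplicate easier to spot than in a chain of masked comparisons.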

Re: [PATCH net 0/6] bnxt_en: bug fixes.

2017-10-14 Thread David Miller

From: Michael Chan 
Date: Fri, 13 Oct 2017 21:09:28 -0400

> Various bug fixes for the VF/PF link change logic, VF resource checking,
> potential firmware response corruption on NVRAM and DCB parameters,
> and reading the wrong register for PCIe link speed on the VF.

Series applied, thanks.

Re: [net-next 0/9][pull request] 40GbE Intel Wired LAN Driver Updates 2017-10-13

2017-10-14 Thread David Miller

From: Jeff Kirsher 
Date: Fri, 13 Oct 2017 14:52:40 -0700

> This series contains updates to mqprio and i40e.

Pulled, thanks Jeff.

Re: [PATCH net-next v2 0/4] tc-testing: Test suite updates

2017-10-14 Thread David Miller

From: Lucas Bates 
Date: Fri, 13 Oct 2017 17:51:21 -0400

> This patch series is a roundup of changes to the tc-testing
> suite:
> 
>  - Add test cases for police and mirred modules and some coverage
>in already-submitted test categories
>  - Break the test case files down into more user-friendly sizes
>  - Bug fix to the tdc.py script's handling of the -l argument
> 
> v2: fix the lack of final newlines in two new files (thanks David)

Series applied, thanks for fixing that up.

Re: [net-next PATCH 0/2] Minor macvlan source mode cleanups

2017-10-14 Thread David Miller

From: Alexander Duyck 
Date: Fri, 13 Oct 2017 13:40:18 -0700

> So this patch series is just a few minor cleanups for macvlan source mode.
> The first patch addresses double receives when a packet is being routed to
> the macvlan destination address, and the other addresses the pkt_type being
> updated in cases where it most likely should not be.

Series applied, thanks.

Re: [Patch net-next v3] tcp: add a tracepoint for tcp retransmission

2017-10-14 Thread David Miller

From: Cong Wang 
Date: Fri, 13 Oct 2017 13:03:16 -0700

> We need a real-time notification for tcp retransmission
> for monitoring.
> 
> Of course we could use ftrace to dynamically instrument this
> kernel function too, however we can't retrieve the connection
> information at the same time, for example perf-tools [1] reads
> /proc/net/tcp for socket details, which is slow when we have
> a lot of connections.
> 
> Therefore, this patch adds a tracepoint for __tcp_retransmit_skb()
> and exposes src/dst IP addresses and ports of the connection.
> This also makes it easier to integrate into perf.
> 
> Note, I expose both IPv4 and IPv6 addresses at the same time:
> for an IPv4 socket, the v4-mapped address is used as the IPv6 address,
> for an IPv6 socket, LOOPBACK4_IPV6 is already filled by kernel.
> Also, add sk and skb pointers as they are useful for BPF.
> 
> 1. https://github.com/brendangregg/perf-tools/blob/master/net/tcpretrans
> 
> Cc: Eric Dumazet 
> Cc: Alexei Starovoitov 
> Cc: Hannes Frederic Sowa 
> Cc: Brendan Gregg 
> Cc: Neal Cardwell 
> Signed-off-by: Cong Wang 

Applied, thank you.
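
[Illustration] The changelog mentions exposing a v4-mapped address in the IPv6 fields for IPv4 sockets. A minimal sketch of that mapping (::ffff:a.b.c.d) is shown below, using plain byte arrays. `v4_to_mapped_v6()` and `is_v4_mapped()` are hypothetical helper names, not the kernel functions the patch uses.

```c
#include <stdint.h>
#include <string.h>

/* Build the IPv4-mapped IPv6 form ::ffff:a.b.c.d of an IPv4 address.
 * v4 holds the four address bytes in network order; v6 receives 16 bytes.
 */
static void v4_to_mapped_v6(const uint8_t v4[4], uint8_t v6[16])
{
    memset(v6, 0, 10);      /* the first 80 bits are zero */
    v6[10] = 0xff;          /* followed by 16 one bits */
    v6[11] = 0xff;
    memcpy(&v6[12], v4, 4); /* and the IPv4 address in the last 32 bits */
}

/* Return nonzero if v6 carries the ::ffff:0:0/96 prefix. */
static int is_v4_mapped(const uint8_t v6[16])
{
    static const uint8_t prefix[12] = {
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
    };

    return memcmp(v6, prefix, sizeof(prefix)) == 0;
}
```

A consumer of the tracepoint can then treat every record as IPv6 and recover the IPv4 address from the last four bytes when the prefix test succeeds.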

Re: [Patch net-next] net_sched: fix a compile warning in act_ife

2017-10-14 Thread David Miller

From: Cong Wang 
Date: Fri, 13 Oct 2017 12:58:13 -0700

> Apparently ife_meta_id2name() is only called when
> CONFIG_MODULES is defined.
> 
> This fixes:
> 
> net/sched/act_ife.c:251:20: warning: ‘ife_meta_id2name’ defined but not used [-Wunused-function]
>  static const char *ife_meta_id2name(u32 metaid)
> ^~~~
> 
> Fixes: d3f24ba895f0 ("net sched actions: fix module auto-loading")
> Cc: Roman Mashak 
> Signed-off-by: Cong Wang 

Applied.

Re: [PATCH net-next,0/3] Add init of send table and var renames

2017-10-14 Thread David Miller

From: Haiyang Zhang 
Date: Fri, 13 Oct 2017 12:28:02 -0700

> From: Haiyang Zhang 
> 
> Add initialization of send indirection table. Otherwise it may contain
> old info of previous device with different number of channels.
> 
> Also, did some variable renaming for easier reading.

Series applied, thank you.
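
[Illustration] The series is not quoted in this thread, so the following is only a sketch of the general idea: filling an indirection table with a known default so that it cannot hold entries left over from a device with a different channel count. The table size, `send_table_init()` and the round-robin default are assumptions made for this example.

```c
#include <stdint.h>

#define SEND_TABLE_SIZE 16 /* hypothetical table size */

/* Fill every slot with a valid channel index, spreading slots
 * round-robin over the channels currently in use.
 */
static void send_table_init(uint16_t table[SEND_TABLE_SIZE], uint32_t num_chn)
{
    uint32_t i;

    if (num_chn == 0)
        num_chn = 1; /* avoid a modulo by zero */

    for (i = 0; i < SEND_TABLE_SIZE; i++)
        table[i] = (uint16_t)(i % num_chn);
}
```

With such a default, a transmit path that indexes the table before the host sends a new table still lands on an existing channel.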

Re: [PATCH net] l2tp: check ps->sock before running pppol2tp_session_ioctl()

2017-10-14 Thread David Miller

From: Guillaume Nault 
Date: Fri, 13 Oct 2017 19:22:35 +0200

> When pppol2tp_session_ioctl() is called by pppol2tp_tunnel_ioctl(),
> the session may be unconnected. That is, it was created by
> pppol2tp_session_create() and hasn't been connected with
> pppol2tp_connect(). In this case, ps->sock is NULL, so we need to check
> for this case in order to avoid dereferencing a NULL pointer.
> 
> Fixes: 309795f4bec2 ("l2tp: Add netlink control API for L2TP")
> Signed-off-by: Guillaume Nault 

Applied and queued up for -stable, thanks.
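
[Illustration] The fix described above boils down to checking a pointer that stays NULL until the session is connected. A minimal sketch with stand-in types follows. `struct sock_stub`, `struct session_stub`, `session_get_mtu()` and the choice of -ENOTCONN as the error code are assumptions for the example, not the code of the actual patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed for the illustration are present.
 */
struct sock_stub {
    int mtu;
};

struct session_stub {
    struct sock_stub *sock; /* NULL until the session is connected */
};

/* Return the socket's MTU, or a negative errno if the session was
 * created but never connected, so there is no socket to dereference.
 */
static int session_get_mtu(const struct session_stub *ps)
{
    if (!ps->sock)
        return -ENOTCONN;

    return ps->sock->mtu;
}
```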

Re: [PATCH][net-next] cxgb4: fix missing break in switch and indent return statements

2017-10-14 Thread David Miller

From: Colin King 
Date: Fri, 13 Oct 2017 17:29:00 +0100

> From: Colin Ian King 
> 
> The break statement for the Macronix case is missing and will
> fall through to the Winbond case and re-assign the size setting.
> Fix this by adding the missing break statement.  Also correctly
> indent the return statements.
> 
> Detected by CoverityScan, CID#1458020 ("Missing break in switch")
> 
> Fixes: 96ac18f14a5a ("cxgb4: Add support for new flash parts")
> Signed-off-by: Colin Ian King 

Applied.
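
[Illustration] The bug class fixed here is easy to reproduce in isolation. In the sketch below the manufacturer IDs, the sizes and `flash_size()` are illustrative values chosen for the example, not the ones used in cxgb4.

```c
#include <stdint.h>

/* Illustrative manufacturer IDs; not taken from the cxgb4 driver. */
enum {
    FLASH_MFR_A = 0xc2,
    FLASH_MFR_B = 0xef,
};

/* Return the flash size in bytes for a manufacturer ID, or 0 if unknown. */
static uint32_t flash_size(unsigned int mfr)
{
    uint32_t size;

    switch (mfr) {
    case FLASH_MFR_A:
        size = 1u << 23;
        /* Without this break, control falls into the next case and
         * size is silently overwritten with that part's value.
         */
        break;
    case FLASH_MFR_B:
        size = 1u << 24;
        break;
    default:
        size = 0;
        break;
    }

    return size;
}
```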

Re: pull-request: mac80211-next 2017-10-13

2017-10-14 Thread David Miller

From: Johannes Berg 
Date: Fri, 13 Oct 2017 17:53:31 +0200

> Sorry for the quick succession - there were a few issues with
> the last pull request that only got noticed now, so I'm fixing
> those here.
> 
> Please pull and let me know if there's any problem.

No worries, pulled, thanks Johannes.

Re: [PATCH net-next v2 0/8] cxgb4: add support to get hardware debug logs via ethtool

2017-10-14 Thread David Miller

From: Rahul Lakkireddy 
Date: Fri, 13 Oct 2017 18:48:12 +0530

> This series of patches adds support to collect hardware debug logs
> via ethtool --get-dump facility.

Series applied, thank you.

Re: [PATCH] nfp: Explicitly include linux/bug.h

2017-10-14 Thread David Miller

From: Mark Brown 
Date: Fri, 13 Oct 2017 03:50:35 +0100

> Today's -next build encountered an error due to a missing definition of
> WARN_ON(), caused by some header reorganization removing an implicit
> inclusion of linux/bug.h.  Fix this with an explicit inclusion.
> 
> Signed-off-by: Mark Brown 

Applied, thanks Mark.

Re: [PATCH net-next v3 0/5] net: dsa: remove .set_addr

2017-10-14 Thread David Miller

From: Vivien Didelot 
Date: Fri, 13 Oct 2017 14:18:04 -0400

> An Ethernet switch may support having a MAC address, which can be used
> as the switch's source address in transmitted full-duplex Pause frames.
> 
> If a DSA switch supports the related .set_addr operation, the DSA core
> sets the master's MAC address on the switch.
> 
> This won't make sense anymore in a multi-CPU ports system, because there
> won't be a unique master device assigned to a switch tree.
> 
> Moreover this operation is confusing because it makes the user think
> that it could be used to program the switch with the MAC address of the
> CPU/management port such that MAC address learning can be disabled on
> said port, but in fact, that's not how it is currently used.
> 
> To fix this, assign a random MAC address at setup time in the mv88e6060
> and mv88e6xxx drivers before removing .set_addr completely from DSA.
> 
> Changes in v3:
>   - include fix for mv88e6060 switch MAC address setter.
> 
> Changes in v2:
>   - remove .set_addr implementation from drivers and use a random MAC.

Series applied, thanks Vivien.

Re: [PATCH] atm: fore200e: mark expected switch fall-throughs

2017-10-14 Thread David Miller

From: "Gustavo A. R. Silva" 
Date: Thu, 12 Oct 2017 16:11:32 -0500

> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Signed-off-by: Gustavo A. R. Silva 

Applied to net-next

Re: [PATCH] fix typo in skbuff.c

2017-10-14 Thread David Miller

From: Wenhua Shi 
Date: Sat, 14 Oct 2017 18:51:36 +0200

> ---
>  net/core/skbuff.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied, thanks.

Re: Linux 4.12+ memory leak on router with i40e NICs

2017-10-14 Thread Alexander Duyck

Hi Pawel,

To clarify is that Dave Miller's tree or Linus's that you are talking
about? If it is Dave's tree how long ago was it you pulled it since I
think the fix was just pushed by Jeff Kirsher a few days ago.

The issue should be fixed in the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972

Thanks.

- Alex

On Sat, Oct 14, 2017 at 3:03 PM, Paweł Staszewski  wrote:
> Forgot to add - these graphs are from tests with Kernel 4.14-rc4-next
>
>
> On 2017-10-15 at 00:00, Paweł Staszewski wrote:
>
> Same problem here
>
> The only difference is the change from Intel 82599 to X710, and now there is
> a memleak
>
> mem with ixgbe driver over time - same config, same kernel
>
>
>
> changed NIC's to x710 i40e driver (this is the only change)
>
> And mem over time:
>
>
>
> There is no process that is eating memory - looks like there is some problem
> with i40e driver - but it is not a surprise :) this driver is really buggy -
> with many things - most tickets on e1000e sourceforge that I opened have no
> reply for a year or more - or if somebody replies after a year they are
> closing the ticket after 1 day with info about no activity :)
>
>
>
> On 2017-10-05 at 07:19, Anders K. Pedersen | Cohaesio wrote:
>
> On Wed, 2017-10-04 at 08:32 -0700, Alexander Duyck wrote:
>
> On Wed, Oct 4, 2017 at 5:56 AM, Anders K. Pedersen | Cohaesio wrote:
>
> Hello,
>
> After updating one of our Linux based routers to kernel 4.13 it began
> leaking memory quite fast (about 1 GB every half hour). To narrow we
> tried various kernel versions and found that 4.11.12 is okay, while
> 4.12 also leaks, so we did a bisection between 4.11 and 4.12.
>
> The first bisection ended at
> "[6964e53f55837b0c49ed60d36656d2e0ee4fc27b] i40e: fix handling of HW
> ATR eviction", which fixes some flag handling that was broken by
> 47994c119a36 "i40e: remove hw_disabled_flags in favor of using separate
> flag bits", so I did a second bisection, where I added 6964e53f5583
> "i40e: fix handling of HW ATR eviction" to the steps that had
> 47994c119a36 "i40e: remove hw_disabled_flags in favor of using separate
> flag bits" in them.
>
> The second bisection ended at
> "[0e626ff7ccbfc43c6cc4aeea611c40b899682382] i40e: Fix support for flow
> director programming status", where I don't see any obvious problems,
> so I'm hoping for some assistance.
>
> The router is a PowerEdge R730 server (Haswell based) with three Intel
> NICs (all using the i40e driver):
>
> X710 quad port 10 GbE SFP+: eth0 eth1 eth2 eth3
> X710 quad port 10 GbE SFP+: eth4 eth5 eth6 eth7
> XL710 dual port 40 GbE QSFP+: eth8 eth9
>
> The NICs are aggregated with LACP with the team driver:
>
> team0: eth9 (40 GbE selected primary), and eth3, eth7 (10 GbE
> non-selected backups)
> team1: eth0, eth1, eth4, eth5 (all 10 GbE selected)
>
> team0 is used for internal networks and has one untagged and four
> tagged VLAN interfaces, while team1 has an external uplink connection
> without any VLANs.
>
> The router runs an eBGP session on team1 to one of our uplinks, and
> iBGP via team0 to our other border routers. It also runs OSPF on the
> internal VLANs on team0. One thing I've noticed is that when OSPF is
> not announcing a default gateway to the internal networks, so there is
> almost no traffic coming in on team0 and out on team1, but still plenty
> of traffic coming in on team1 and out via team0, there's no memory leak
> (or at least it is so small that we haven't detected it). But as soon
> as we configure OSPF to announce a default gateway to the internal
> VLANs, so we get traffic from team0 to team1 the leaking begins.
> Stopping the OSPF default gateway announcement again also stops the
> leaking, but does not release already leaked memory.
>
> So this leads me to suspect that the leaking is related to RX on team0
> (where XL710 eth9 is normally the only active interface) or TX on team1
> (X710 eth0, eth1, eth4, eth5). The first bad commit is related to RX
> cleaning, which suggests RX on team0. Since we're only seeing the leak
> for our outbound traffic, I suspect either a difference between the
> X710 vs. XL710 NICs, or that the inbound traffic is for relatively few
> destination addresses (only our own systems) while the outbound traffic
> is for many different addresses on the internet. But I'm just guessing
> here.
>
> I've tried kmemleak, but it only found a few kB of suspected memory
> leaks (several of which disappeared again after a while).
>
> Below I've included more details - git bisect logs, ethtool -i, dmesg,
> Kernel .config, and various memory related /proc files. Any help or
> suggestions would be much appreciated, and please let me know if more
> information is needed or there's something I should try.
>
> Regards,
> Anders K. Pedersen
>
> Hi Anders,
>
> I think I see the problem and should

Re: [PATCH net-next] virtio_net: implement VIRTIO_CONFIG_S_NEEDS_RESET

2017-10-14 Thread Michael S. Tsirkin

On Fri, Oct 13, 2017 at 11:51:40AM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Implement the reset communication request defined in the VIRTIO 1.0
> specification and introduced in Linux in commit c00bbcf862896 ("virtio:
> add VIRTIO_CONFIG_S_NEEDS_RESET device status bit").
> 
> Use the virtnet_reset function introduced in commit 2de2f7f40ef9
> ("virtio_net: XDP support for adjust_head"). That was removed in
> commit 4941d472bf95 ("virtio-net: do not reset during XDP set"),
> because no longer used. Bring it back, minus the xdp specific code.
> 
> Before tearing down any state, virtnet_freeze_down quiesces the
> device with netif_tx_disable. virtnet_reset also ensures that no
> other config operations can run concurrently.
> 
> On successful reset, the host can observe that the flag has been
> cleared. There is no need for the explicit control flag introduced
> in the previous RFC of this patch.
> 
> Changes
>   RFC -> v1
>   - drop VIRTIO_NET_CTRL_RESET_ACK message
>   - drop VIRTIO_F_CAN_RESET flag to notify guest support
> 
> Signed-off-by: Willem de Bruijn 
> ---
>  drivers/net/virtio_net.c | 48 
> 
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index fc059f193e7d..8e768b54844f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1903,13 +1903,14 @@ static const struct ethtool_ops virtnet_ethtool_ops = {
>   .set_link_ksettings = virtnet_set_link_ksettings,
>  };
>  
> -static void virtnet_freeze_down(struct virtio_device *vdev)
> +static void virtnet_freeze_down(struct virtio_device *vdev, bool in_config)
>  {
>   struct virtnet_info *vi = vdev->priv;
>   int i;
>  
> - /* Make sure no work handler is accessing the device */
> - flush_work(&vi->config_work);
> + /* Make sure no other work handler is accessing the device */
> + if (!in_config)
> + flush_work(&vi->config_work);
>  
>   netif_device_detach(vi->dev);
>   netif_tx_disable(vi->dev);
> @@ -1924,6 +1925,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  }
>  
>  static int init_vqs(struct virtnet_info *vi);
> +static void remove_vq_common(struct virtnet_info *vi);
>  
>  static int virtnet_restore_up(struct virtio_device *vdev)
>  {
> @@ -1952,6 +1954,40 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>   return err;
>  }
>  
> +static int virtnet_reset(struct virtnet_info *vi)
> +{
> + struct virtio_device *dev = vi->vdev;
> + int ret;
> +
> + virtio_config_disable(dev);
> + dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> + virtnet_freeze_down(dev, true);
> + remove_vq_common(vi);
> +
> + virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> + virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +
> + ret = virtio_finalize_features(dev);
> + if (ret)
> + goto err;
> +
> + ret = virtnet_restore_up(dev);
> + if (ret)
> + goto err;
> +
> + ret = virtnet_set_queues(vi, vi->curr_queue_pairs);
> + if (ret)
> + goto err;
> +
> + virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> + virtio_config_enable(dev);
> + return 0;
> +
> +err:
> + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> + return ret;
> +}
> +
>  static int virtnet_set_guest_offloads(struct virtnet_info *vi, u64 offloads)
>  {
>   struct scatterlist sg;

I have a question here though. How do things like MAC address
get restored?

What about the rx mode?

vlans?

Also, it seems that LINK_ANNOUNCE requests will get ignored
even if they got set before the reset, leading to downtime.

> @@ -2136,6 +2172,10 @@ static void virtnet_config_changed_work(struct work_struct *work)
>   virtnet_ack_link_announce(vi);
>   }
>  
> + if (vi->vdev->config->get_status(vi->vdev) &
> + VIRTIO_CONFIG_S_NEEDS_RESET)
> + virtnet_reset(vi);
> +
>   /* Ignore unknown (future) status bits */
>   v &= VIRTIO_NET_S_LINK_UP;
>  
> @@ -2756,7 +2796,7 @@ static __maybe_unused int virtnet_freeze(struct virtio_device *vdev)
>   struct virtnet_info *vi = vdev->priv;
>  
>   virtnet_cpu_notif_remove(vi);
> - virtnet_freeze_down(vdev);
> + virtnet_freeze_down(vdev, false);
>   remove_vq_common(vi);
>  
>   return 0;
> -- 
> 2.15.0.rc0.271.g36b669edcc-goog
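
[Illustration] The status handshake that the quoted virtnet_reset() walks through can be modelled on a plain status byte. The bit values below are the device status bits from the VIRTIO 1.0 specification; `struct dev_model`, `add_status()` and `reset_if_needed()` are hypothetical and only simulate the sequence. The comment marks the point where the state asked about in the review (MAC address, rx mode, VLANs) would have to be replayed.

```c
#include <stdint.h>

/* Device status bits as defined by the VIRTIO 1.0 specification. */
#define S_ACKNOWLEDGE 0x01
#define S_DRIVER      0x02
#define S_DRIVER_OK   0x04
#define S_FEATURES_OK 0x08
#define S_NEEDS_RESET 0x40
#define S_FAILED      0x80

/* Hypothetical in-memory model of a device; not a kernel structure. */
struct dev_model {
    uint8_t status;
};

static void add_status(struct dev_model *d, uint8_t bit)
{
    d->status |= bit;
}

/* Return 1 if a reset was performed, 0 if none was requested. */
static int reset_if_needed(struct dev_model *d)
{
    if (!(d->status & S_NEEDS_RESET))
        return 0;

    d->status = 0; /* a reset clears every status bit */
    add_status(d, S_ACKNOWLEDGE);
    add_status(d, S_DRIVER);
    add_status(d, S_FEATURES_OK);
    /* A real driver would recreate its queues and replay configuration
     * here (MAC address, rx mode, VLAN filters) before going live.
     */
    add_status(d, S_DRIVER_OK);
    return 1;
}
```

Because the reset clears NEEDS_RESET along with everything else, the host can observe completion from the status byte alone, which is the point the changelog makes about not needing an explicit ack message.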

Re: [patch net-next 06/34] net: core: use dev->ingress_queue instead of tp->q

2017-10-14 Thread Daniel Borkmann


On 10/13/2017 08:30 AM, Jiri Pirko wrote:
> Thu, Oct 12, 2017 at 11:45:43PM CEST, dan...@iogearbox.net wrote:
>> On 10/12/2017 07:17 PM, Jiri Pirko wrote:
>>> From: Jiri Pirko
>>>
>>> In sch_handle_egress and sch_handle_ingress, don't use tp->q and use
>>> dev->ingress_queue which stores the same pointer instead.
>>>
>>> Signed-off-by: Jiri Pirko
>>> ---
>>>   net/core/dev.c | 21 +++--
>>>   1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index fcddccb..cb9e5e5 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -3273,14 +3273,18 @@ EXPORT_SYMBOL(dev_loopback_xmit);
>>>   static struct sk_buff *
>>>   sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>>>   {
>>> +   struct netdev_queue *netdev_queue =
>>> +   rcu_dereference_bh(dev->ingress_queue);
>>> struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
>>> struct tcf_result cl_res;
>>> +   struct Qdisc *q;
>>>
>>> -   if (!cl)
>>> +   if (!cl || !netdev_queue)
>>> return skb;
>>> +   q = netdev_queue->qdisc;
>>
>> NAK, no additional overhead in the software fast-path of
>> sch_handle_{ingress,egress}() like this. There are users out there
>> that use tc in software only, so performance is critical here.
>
> Okay, how else do you suggest I can avoid the need to use tp->q?
> I was thinking about storing q directly to net_device, which would save
> one dereference, resulting in the same amount as current cl->q.


Sorry for late reply, mostly off for few days. netdev struct has different
cachelines which are hot on rx and tx (see also the location of the two
lists, ingress_cl_list and egress_cl_list), if you add only one qdisc
pointer there, then you'd at least penalize one of the two w/ potential
cache miss. Can we leave it in cl?
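
[Illustration] The cache-line argument can be checked with offsetof on a toy structure. The layout below is a hypothetical model, not the real struct net_device, and the 64-byte line size is an assumption. It only shows that when rx-hot and tx-hot fields sit on separate cache lines, a single shared pointer has to live on one of the two.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical device structure with separate rx and tx hot areas. */
struct dev_model {
    /* fields touched on the receive path */
    alignas(CACHE_LINE) void *ingress_cl_list;
    unsigned long rx_packets;

    /* fields touched on the transmit path */
    alignas(CACHE_LINE) void *egress_cl_list;
    unsigned long tx_packets;
};

/* Index of the cache line, relative to the start of the structure,
 * that contains the given byte offset.
 */
static size_t cache_line_of(size_t offset)
{
    return offset / CACHE_LINE;
}
```

Wherever a lone qdisc pointer is placed in such a layout, the other path has to pull in an extra cache line to reach it, which is the penalty described above.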

Re: [PATCH] Add -target to clang switch while cross compiling.

2017-10-14 Thread Daniel Borkmann


On 10/13/2017 09:24 PM, Abhijit Ayarekar wrote:
> Update to llvm excludes assembly instructions.
> llvm git revision is below
>
> commit 65fad7c26569 ("bpf: add inline-asm support")
>
> This change will be part of llvm release 6.0
>
> __ASM_SYSREG_H define is not required for native compile.
> -target switch includes appropriate target specific files
> while cross compiling
>
> Tested on x86 and arm64.
>
> Signed-off-by: Abhijit Ayarekar


Acked-by: Daniel Borkmann

Re: [pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11

2017-10-14 Thread Parav Pandit

On Sat, Oct 14, 2017 at 1:48 PM, Saeed Mahameed  wrote:
> Hi Dave and Doug,
>
> This series includes updates for mlx5 IPoIB offloading driver from Alex
> and Feras to add the support for Muli Pkey in the mlx5i ipoib offloading 
> netdev,

In description and cover letter subject, here
s/Muli/Multi

Re: [OpenWrt-Devel] [PATCH v3 0/6] staging: Introduce DPAA2 Ethernet Switch driver

2017-10-14 Thread Florian Fainelli

On October 14, 2017 2:59:22 PM PDT, Linus Walleij wrote:
>On Sat, Oct 14, 2017 at 8:52 PM, Florian Fainelli wrote:
>
>> The most deployed switch device drivers have been converted to DSA
>> already: b53, qca8k (ar83xx in OpenWrt/LEDE) and mtk7530 are all in
>> tree, and now we are getting new submissions from Microchip to support
>> their pretty large KSZ series. Converting from swconfig to DSA is
>> actually quite simple, but like anything requires time and testing, and
>> access to hardware and ideally datasheet.
>
>Hm, I have a Realtek RB8366RB in this router on my desk.
>
>I guess that means I should just take the old switchdev-based
>SMI-driver and convert it to DSA.
>
>I bet I can do that :D

Yes, it really should not be too hard. The OpenWrt/LEDE driver had mostly the 
same semantics as what is needed for being a proper DSA driver. You should of 
course start simple: get basic switching working, then add statistics, VLAN, 
FDB, etc. OpenWrt/LEDE models switches as PHY device objects which would not
work upstream so you should have the driver be probed as an MDIO/SPI/I2C
device (see b53 for example) and set up fixed-link properties between the CPU
and the switch.

>
>Well, I will try. Because it's blocking me to work on the Gemini
>ethernet driver.

Well usually the boot loader may leave the switch in a good enough state that 
you can work on the CPU controller mostly independently from dealing with the 
switch. This is not universally true, and a properly working bootloader should 
actually quiesce/reset both blocks prior to OS control.

Don't hesitate if you have questions.

Cheers.

-- 
Florian

Re: [OpenWrt-Devel] [PATCH v3 0/6] staging: Introduce DPAA2 Ethernet Switch driver

2017-10-14 Thread Linus Walleij

On Sat, Oct 14, 2017 at 8:52 PM, Florian Fainelli  wrote:

> The most deployed switch device drivers have been converted to DSA
> already: b53, qca8k (ar83xx in OpenWrt/LEDE) and mtk7530 are all in
> tree, and now we are getting new submissions from Microchip to support
> their pretty large KSZ series. Converting from swconfig to DSA is
> actually quite simple, but like anything requires time and testing, and
> access to hardware and ideally datasheet.

Hm, I have a Realtek RB8366RB in this router on my desk.

I guess that means I should just take the old switchdev-based
SMI-driver and convert it to DSA.

I bet I can do that :D

Well, I will try. Because it's blocking me to work on the Gemini
ethernet driver.

Yours,
Linus Walleij

Re: [pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11

2017-10-14 Thread Doug Ledford

On 10/14/2017 2:48 PM, Saeed Mahameed wrote:
> Hi Dave and Doug,
> 
> This series includes updates for mlx5 IPoIB offloading driver from Alex
> and Feras to add the support for Muli Pkey in the mlx5i ipoib offloading 
> netdev,
> to be merged into net-next and rdma-next trees.

As far as the two IPoIB patches are concerned, they're fine.

> Doug, I am sorry I couldn't base this on rc2 since the series needs and
> conflicts with a fix that was submitted to rc3, so to keep things simple I
> based it on rc4, I hope this is ok with you..

No worries, it just means I have to submit it under another branch.  But
I'm already holding one patch series in a stand alone branch, so no big
deal.  And, actually, the IPoIB changes are so small they can simply go
through Dave's tree if you don't have any dependent code in the IPoIB
driver to submit after this but still in this devel cycle.

> Please pull and let me know if there's any problem.

Once I hear that Dave is OK with the net changes, I'm ready to pull (if
I need to).

-- 
Doug Ledford 
GPG Key ID: B826A3330E572FDD
Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


Re: [PATCH v1 RFC 7/7] Modify tag_ksz.c so that tail tag code can be used by other KSZ switch drivers

2017-10-14 Thread Andrew Lunn

On Fri, Oct 06, 2017 at 01:33:05PM -0700, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Modify tag_ksz.c so that tail tag code can be used by other KSZ switch
> drivers.

There are multiple things going on in this patch. Please split the
special_mult_addr change into a separate patch, with an explanation
why it is needed. Same for the PTP indication.

It is always better to have lots of small patches, one logical change
per patch.

Andrew

Re: [PATCH v1 RFC 6/7] Add MIB counter reading support

2017-10-14 Thread Andrew Lunn

> +static void ksz9477_phy_setup(struct ksz_device *dev, int port,
> +   struct phy_device *phy)
> +{
> + if (port < dev->phy_port_cnt) {
> + /* SUPPORTED_Asym_Pause and SUPPORTED_Pause can be removed to
> +  * disable flow control when rate limiting is used.
> +  */
> + phy->advertising = phy->supported;
> + }
> +}
> +

This has nothing to do with MIBs. It does not belong in this patch.

>  static void ksz9477_port_setup(struct ksz_device *dev, int port, bool 
> cpu_port)
>  {
>   u8 data8;
> @@ -1159,6 +1198,8 @@ static int ksz9477_setup(struct dsa_switch *ds)
>   /* start switch */
>   ksz_cfg(dev, REG_SW_OPERATION, SW_START, true);
>  
> + ksz_init_mib_timer(dev);
> +
>   return 0;
>  }
>  
> @@ -1168,6 +1209,7 @@ static int ksz9477_setup(struct dsa_switch *ds)
>   .set_addr   = ksz9477_set_addr,
>   .phy_read   = ksz9477_phy_read16,
>   .phy_write  = ksz9477_phy_write16,
> + .adjust_link= ksz_adjust_link,

Please move the adjust_link changes into a separate patch.

I need to come back and look at the mutexes and the freeze code. It is
not obviously correct, and I don't have the time at the moment.

   Andrew

Re: [PATCH v1 RFC 5/7] Break KSZ9477 DSA driver into two files

2017-10-14 Thread Andrew Lunn

> diff --git a/drivers/net/dsa/microchip/ksz9477.c 
> b/drivers/net/dsa/microchip/ksz9477.c
> new file mode 100644
> index 000..214d380
> --- /dev/null
> +++ b/drivers/net/dsa/microchip/ksz9477.c
> @@ -0,0 +1,1328 @@
> +/*
> + * Microchip KSZ9477 switch driver main logic
> + *
> + * Copyright (C) 2017 Microchip Technology Inc.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "ksz_priv.h"
> +#include "ksz_common.h"
> +#include "ksz_9477_reg.h"
> +
> +static const struct {
> + int index;
> + char string[ETH_GSTRING_LEN];
> +} mib_names[TOTAL_SWITCH_COUNTER_NUM] = {
> + { 0x00, "rx_hi" },
> + { 0x01, "rx_undersize" },
> + { 0x02, "rx_fragments" },
> + { 0x03, "rx_oversize" },
> + { 0x04, "rx_jabbers" },
> + { 0x05, "rx_symbol_err" },
> + { 0x06, "rx_crc_err" },
> + { 0x07, "rx_align_err" },
> + { 0x08, "rx_mac_ctrl" },
> + { 0x09, "rx_pause" },
> + { 0x0A, "rx_bcast" },
> + { 0x0B, "rx_mcast" },
> + { 0x0C, "rx_ucast" },
> + { 0x0D, "rx_64_or_less" },
> + { 0x0E, "rx_65_127" },
> + { 0x0F, "rx_128_255" },
> + { 0x10, "rx_256_511" },
> + { 0x11, "rx_512_1023" },
> + { 0x12, "rx_1024_1522" },
> + { 0x13, "rx_1523_2000" },
> + { 0x14, "rx_2001" },
> + { 0x15, "tx_hi" },
> + { 0x16, "tx_late_col" },
> + { 0x17, "tx_pause" },
> + { 0x18, "tx_bcast" },
> + { 0x19, "tx_mcast" },
> + { 0x1A, "tx_ucast" },
> + { 0x1B, "tx_deferred" },
> + { 0x1C, "tx_total_col" },
> + { 0x1D, "tx_exc_col" },
> + { 0x1E, "tx_single_col" },
> + { 0x1F, "tx_mult_col" },
> + { 0x80, "rx_total" },
> + { 0x81, "tx_total" },
> + { 0x82, "rx_discards" },
> + { 0x83, "tx_discards" },
> +};
> +
> +static void ksz_cfg32(struct ksz_device *dev, u32 addr, u32 bits, bool set)
> +{
> + u32 data;
> +
> + ksz_read32(dev, addr, &data);
> + if (set)
> + data |= bits;
> + else
> + data &= ~bits;
> + ksz_write32(dev, addr, data);
> +}

In a follow-up patch, it would be good to fix up the naming. All
functions should use the ksz9477_ prefix.

But this is O.K. for now.

Reviewed-by: Andrew Lunn 

Andrew
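
For readers following the thread, the ksz_cfg32() helper quoted above is a
plain read-modify-write of one register. Below is a minimal userspace sketch
of that pattern. Everything in it is an assumption made for illustration:
fake_reg, reg_read32(), reg_write32() and cfg32() are hypothetical stand-ins
and not the driver's real accessors, which take a device and an address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a single 32-bit switch register. */
static uint32_t fake_reg;

static void reg_read32(uint32_t *val) { *val = fake_reg; }
static void reg_write32(uint32_t val) { fake_reg = val; }

/* Set or clear 'bits' in the register, leaving all other bits untouched. */
static void cfg32(uint32_t bits, bool set)
{
	uint32_t data;

	assert(bits != 0);	/* an empty mask would be a caller bug */

	reg_read32(&data);
	if (set)
		data |= bits;
	else
		data &= ~bits;
	reg_write32(data);
}
```

The read has to pass the address of the local variable so that the helper can
fill it in before the mask is applied.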

Re: [PATCH v1 RFC 4/7] Rename ksz_spi.c to ksz9477_spi.c

2017-10-14 Thread Andrew Lunn

On Fri, Oct 06, 2017 at 01:33:02PM -0700, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Rename ksz_spi.c to ksz9477_spi.c and update Kconfig in preparation to add
> more KSZ switch drivers.
> 
> Signed-off-by: Tristram Ha 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v1 RFC 3/7] Rename some functions with ksz9477 prefix

2017-10-14 Thread Andrew Lunn

On Fri, Oct 06, 2017 at 01:33:01PM -0700, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Rename some functions with ksz9477 prefix to separate chip specific code
> from common code.
> 
> Signed-off-by: Tristram Ha 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v1 RFC 1/7] Replace license with GPL

2017-10-14 Thread Andrew Lunn

On Mon, Oct 09, 2017 at 09:18:16AM +, David Laight wrote:
> From: tristram...@microchip.com
> > Sent: 06 October 2017 21:33
> > Replace license with GPL.
> 
> Don't you need permission from all the people who have updated
> the files in order to make this change?

Hi David

Interesting question.

ksz_9477_reg.h and ksz_spi.c are not a problem, since Woojung Huh is
the only contributor.

ksz_common.c has a MODULE_LICENSE("GPL"), which indicates it was probably
intended to be GPL.

However, getting an acked-by from Arkadi Sharshevsky would be good.

   Andrew

Re: [PATCH v1 RFC 2/7] Clean up code according to patch check suggestions

2017-10-14 Thread Andrew Lunn

On Fri, Oct 06, 2017 at 01:33:00PM -0700, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Clean up code according to patch check suggestions.
> 
> Signed-off-by: Tristram Ha 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH] netconsole: make config_item_type const

2017-10-14 Thread kbuild test robot

Hi Bhumika,

[auto build test WARNING on net-next/master]
[also build test WARNING on v4.14-rc4 next-20171013]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Bhumika-Goyal/netconsole-make-config_item_type-const/20171015-014833
config: x86_64-randconfig-x005-201742 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/netconsole.c: In function 'make_netconsole_target':
>> drivers/net/netconsole.c:650:46: warning: passing argument 3 of 
>> 'config_item_init_type_name' discards 'const' qualifier from pointer target 
>> type [-Wdiscarded-qualifiers]
 config_item_init_type_name(>item, name, _target_type);
 ^
   In file included from drivers/net/netconsole.c:49:0:
   include/linux/configfs.h:73:13: note: expected 'struct config_item_type *' 
but argument is of type 'const struct config_item_type *'
extern void config_item_init_type_name(struct config_item *item,
^~
   drivers/net/netconsole.c: At top level:
>> drivers/net/netconsole.c:695:15: warning: initialization discards 'const' 
>> qualifier from pointer target type [-Wdiscarded-qualifiers]
   .ci_type = _subsys_type,
  ^

vim +650 drivers/net/netconsole.c

0bcc1816 Satyam Sharma 2007-08-10  624  
0bcc1816 Satyam Sharma 2007-08-10  625  /*
0bcc1816 Satyam Sharma 2007-08-10  626   * Group operations and type for 
netconsole_subsys.
0bcc1816 Satyam Sharma 2007-08-10  627   */
0bcc1816 Satyam Sharma 2007-08-10  628  
f89ab861 Joel Becker   2008-07-17  629  static struct config_item 
*make_netconsole_target(struct config_group *group,
f89ab861 Joel Becker   2008-07-17  630  
  const char *name)
0bcc1816 Satyam Sharma 2007-08-10  631  {
0bcc1816 Satyam Sharma 2007-08-10  632  unsigned long flags;
0bcc1816 Satyam Sharma 2007-08-10  633  struct netconsole_target *nt;
0bcc1816 Satyam Sharma 2007-08-10  634  
0bcc1816 Satyam Sharma 2007-08-10  635  /*
0bcc1816 Satyam Sharma 2007-08-10  636   * Allocate and initialize with 
defaults.
698cf1c6 Tejun Heo 2015-06-25  637   * Target is disabled at 
creation (!enabled).
0bcc1816 Satyam Sharma 2007-08-10  638   */
0bcc1816 Satyam Sharma 2007-08-10  639  nt = kzalloc(sizeof(*nt), 
GFP_KERNEL);
e404decb Joe Perches   2012-01-29  640  if (!nt)
a6795e9e Joel Becker   2008-07-17  641  return ERR_PTR(-ENOMEM);
0bcc1816 Satyam Sharma 2007-08-10  642  
0bcc1816 Satyam Sharma 2007-08-10  643  nt->np.name = "netconsole";
0bcc1816 Satyam Sharma 2007-08-10  644  strlcpy(nt->np.dev_name, 
"eth0", IFNAMSIZ);
0bcc1816 Satyam Sharma 2007-08-10  645  nt->np.local_port = 6665;
0bcc1816 Satyam Sharma 2007-08-10  646  nt->np.remote_port = ;
1667c942 Joe Perches   2015-03-02  647  
eth_broadcast_addr(nt->np.remote_mac);
0bcc1816 Satyam Sharma 2007-08-10  648  
0bcc1816 Satyam Sharma 2007-08-10  649  /* Initialize the config_item 
member */
0bcc1816 Satyam Sharma 2007-08-10 @650  
config_item_init_type_name(>item, name, _target_type);
0bcc1816 Satyam Sharma 2007-08-10  651  
0bcc1816 Satyam Sharma 2007-08-10  652  /* Adding, but it is disabled */
0bcc1816 Satyam Sharma 2007-08-10  653  
spin_lock_irqsave(_list_lock, flags);
0bcc1816 Satyam Sharma 2007-08-10  654  list_add(>list, 
_list);
0bcc1816 Satyam Sharma 2007-08-10  655  
spin_unlock_irqrestore(_list_lock, flags);
0bcc1816 Satyam Sharma 2007-08-10  656  
f89ab861 Joel Becker   2008-07-17  657  return >item;
0bcc1816 Satyam Sharma 2007-08-10  658  }
0bcc1816 Satyam Sharma 2007-08-10  659  
0bcc1816 Satyam Sharma 2007-08-10  660  static void 
drop_netconsole_target(struct config_group *group,
0bcc1816 Satyam Sharma 2007-08-10  661 
struct config_item *item)
0bcc1816 Satyam Sharma 2007-08-10  662  {
0bcc1816 Satyam Sharma 2007-08-10  663  unsigned long flags;
0bcc1816 Satyam Sharma 2007-08-10  664  struct netconsole_target *nt = 
to_target(item);
0bcc1816 Satyam Sharma 2007-08-10  665  
0bcc1816 Satyam Sharma 2007-08-10  666  
spin_lock_irqsave(_list_lock, flags);
0bcc1816 Satyam Sharma 2007-08-10  667  list_del(>list);
0bcc1816 Satyam Sharma 2007-08-10  668  
spin_unlock_irqrestore(_list_lock, flags);
0bcc1816 Satyam Sharma 2007-08-10  669  
0bcc1816 Satyam Sharma 2007-08-10  670  /*
0bcc1816 Satyam Sharma 2007-08-10  671   * The target may have never 
been enabled, or was manually disabled
0bcc1816 Satyam Sharma 2007-08-10  672   * before being removed so 
netpoll may have already
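
The two warnings above come from passing a pointer to a const object where
the callee's prototype still takes a non-const pointer. A minimal sketch of
the const-correct shape follows. The names item_type, item, item_init() and
target_type are hypothetical stand-ins, not the configfs API; the sketch only
assumes that the callee never writes through the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a type descriptor that is never modified. */
struct item_type {
	const char *name;
};

struct item {
	const struct item_type *type;	/* stored as pointer-to-const */
};

/*
 * Because the parameter is 'const struct item_type *', a caller may pass
 * the address of a const object without discarding the qualifier.
 */
static void item_init(struct item *it, const struct item_type *type)
{
	assert(type != NULL);
	it->type = type;
}

/* The descriptor itself can then live in read-only data. */
static const struct item_type target_type = { .name = "target" };
```

With a non-const parameter instead, the same call would trigger
-Wdiscarded-qualifiers, which is what the robot reports here.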

[PATCH] net/rose: Delete an error message for a failed memory allocation in rose_proto_init()

2017-10-14 Thread SF Markus Elfring

From: Markus Elfring 
Date: Sat, 14 Oct 2017 20:57:28 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 net/rose/af_rose.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 4a9729257023..ce37be0027ca 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1529,7 +1529,6 @@ static int __init rose_proto_init(void)
 
dev_rose = kzalloc(rose_ndevs * sizeof(struct net_device *), 
GFP_KERNEL);
if (dev_rose == NULL) {
-   printk(KERN_ERR "ROSE: rose_proto_init - unable to allocate 
device structure\n");
rc = -ENOMEM;
goto out_proto_unregister;
}
-- 
2.14.2

Re: [OpenWrt-Devel] [PATCH v3 0/6] staging: Introduce DPAA2 Ethernet Switch driver

2017-10-14 Thread Florian Fainelli

Hi,

On 10/14/2017 04:09 AM, Linus Walleij wrote:
> Top posting and resending since netdev@vger.kernel.org
> is the right mail address for this. Mea culpa.
> 
> Linus Walleij
> 
> On Sat, Oct 14, 2017 at 11:35 AM, Linus Walleij
>  wrote:
>> On Thu, Oct 5, 2017 at 11:16 AM, Razvan Stefanescu
>>  wrote:
>>
>>> This patchset introduces the Ethernet Switch Driver for Freescale/NXP SoCs
>>> with DPAA2 (DataPath Acceleration Architecture v2). The driver manages
>>> switch objects discovered on the fsl-mc bus. A description of the driver
>>> can be found in the associated README file.
>>>
>>> The patchset consists of:
>>> * A set of libraries containing APIs for configuring and controlling
>>>   Management Complex (MC) switch objects
>>> * The DPAA2 Ethernet Switch driver
>>> * Patch adding ethtool support
>>
>> So it appears that ethernet switches are a class of device that needs its own
>> subsystem in Linux, before this driver can move out of staging.

FWIW, I think the biggest dependency on this driver is not a switch
driver model because that exists but it's actually the specific bus (MC
AFAICT) that it depends on. More on the Ethernet switch device model below.

We do actually have a pretty good model for Ethernet switches now, in
fact, we have several options:

- Distributed Switching Architecture (DSA) should be used when the
CPU/management Ethernet controller is a traditional Ethernet MAC that is
either internally or externally attached to a switch. This usually comes
with the Ethernet switch capable of providing per-packet metadata (tag)
to indicate to the management interface why the packet is transmitted.
For older/dumber switches, using no management tag, but separating with
802.1Q tags is definitely an option that brings in the same
requirements for DSA. DSA does make use of switchdev to get notified
from the networking stack when there is an opportunity to offload
objects: VLAN, FDB, MDB, etc. DSA is both a device driver model and a
switch device abstraction model.

- switchdev should be used when the management interface is tightly
coupled with the switching hardware, such that, per-packet information
is obtained via DMA/PIO descriptors for instance. switchdev is not a
device driver model so the switch driver is responsible for creating its
own net_device instances and feeding the appropriate netdev_ops,
ethtool_ops and switchdev_ops, this is what is being done here, and this
is also perfectly fine.

>>
>> I ran into the problem in OpenWRT that has these out-of-tree patches for
>> off-chip ethernet switches, conveniently placed under net/phy:
>> https://github.com/openwrt/openwrt/tree/master/target/linux/generic/files/drivers/net/phy
>>
>> These are some 12 different ethernet switches. It is used in more or
>> less every home router out there.

The most deployed switch device drivers have been converted to DSA
already: b53, qca8k (ar83xx in OpenWrt/LEDE) and mtk7530 are all in
tree, and now we are getting new submissions from Microchip to support
their pretty large KSZ series. Converting from swconfig to DSA is
actually quite simple, but like anything requires time and testing, and
access to hardware and ideally datasheet.

>>
>> It's not really working to have all of this out-of-tree, there must have been
>> discussions about the requirements for a proper ethernet switch subsystem.
>>
>> I'm not a good net developer, just a grumpy user having to deal with all
>> of this out-of-tree code that's not helpful with changing interfaces like
>> device tree and so on.
>>
>> Can you people who worked on this over the years pitch in with your
>> requirements for an ethernet switch subsystem so we can house these
>> drivers in a proper way?
>>
>> What we need AFAICT:
>>
>> - Consensus on userspace ABI
>> - Consensus on ethtool extensions
>> - Consensus on where in drivers/net this goes
>>
>> You can kick me for not knowing what I'm talking about and how complex the
>> problem is now.

Kicking you would not be fair, but you are about 3 years late ;) We had
such discussions in 2014 after a failed attempt at submitting swconfig
as a possible model. 3 years later we have 1 major switchdev driver:
mlxsw and quite a few active DSA drivers. The paradigms that apply are:

- normal Linux tools keep working: bridge, iproute2, ethtool
- every user-visible port has a corresponding network device, in order
to meet the first paradigm
- for every other part of the switch that does not have a net_device
representor, devlink can/should be used.
-- 
Florian

Re: [PATCH net v2 1/2] ARM: dts: imx: name the interrupts for the fec ethernet driver

2017-10-14 Thread Andrew Lunn

On Fri, Oct 13, 2017 at 07:09:39PM -0700, Troy Kisky wrote:
> imx7s/imx7d has the ptp interrupt newly added as well.

Hi Troy
 
> For imx7, "int0" is the interrupt for queue 0 and ENET_MII
> "int1" is for queue 1
> "int2" is for queue 2

Thanks for adding this explanation. Please also add it to

Documentation/devicetree/bindings/net/fsl-fec.txt 

>   fec: ethernet@02188000 {
>   compatible = "fsl,imx6q-fec";
>   reg = <0x02188000 0x4000>;
> + interrupt-names = "int0","ptp";

It is normal to have a space after the ,

   Andrew

[for-next 04/12] net/mlx5: Support for attaching multiple underlay QPs to root flow table

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Previous support allowed connecting only a single QPN to the FT.
Now, using a linked list, multiple QPNs can be attached to the same FT.

Supporting attaching multiple underlay QPs is required for PKEY
support in which child and parent share the same FT.

The actual attaching/detaching FW commands will be called inside the
function symmetrically.

This change requires a change in IPoIB open and close functions, the
attaching/detaching to/from the FT is done each time we open/close.

Signed-off-by: Alex Vesker 
Reviewed-by: Maor Gottlieb 
Signed-off-by: Saeed Mahameed 
Signed-off-by: Leon Romanovsky 
---
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  13 ++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 123 ++---
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   7 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  78 +++--
 5 files changed, 171 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
index 36ecc2b2e187..881e2e55840c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
@@ -40,7 +40,8 @@
 #include "eswitch.h"
 
 int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
-   struct mlx5_flow_table *ft, u32 underlay_qpn)
+   struct mlx5_flow_table *ft, u32 underlay_qpn,
+   bool disconnect)
 {
u32 in[MLX5_ST_SZ_DW(set_flow_table_root_in)]   = {0};
u32 out[MLX5_ST_SZ_DW(set_flow_table_root_out)] = {0};
@@ -52,7 +53,15 @@ int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
MLX5_SET(set_flow_table_root_in, in, opcode,
 MLX5_CMD_OP_SET_FLOW_TABLE_ROOT);
MLX5_SET(set_flow_table_root_in, in, table_type, ft->type);
-   MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
+
+   if (disconnect) {
+   MLX5_SET(set_flow_table_root_in, in, op_mod, 1);
+   MLX5_SET(set_flow_table_root_in, in, table_id, 0);
+   } else {
+   MLX5_SET(set_flow_table_root_in, in, op_mod, 0);
+   MLX5_SET(set_flow_table_root_in, in, table_id, ft->id);
+   }
+
MLX5_SET(set_flow_table_root_in, in, underlay_qpn, underlay_qpn);
if (ft->vport) {
MLX5_SET(set_flow_table_root_in, in, vport_number, ft->vport);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
index c6d7bdf255b6..71e2d0f37ad9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h
@@ -71,8 +71,8 @@ int mlx5_cmd_delete_fte(struct mlx5_core_dev *dev,
unsigned int index);
 
 int mlx5_cmd_update_root_ft(struct mlx5_core_dev *dev,
-   struct mlx5_flow_table *ft,
-   u32 underlay_qpn);
+   struct mlx5_flow_table *ft, u32 underlay_qpn,
+   bool disconnect);
 
 int mlx5_cmd_fc_alloc(struct mlx5_core_dev *dev, u32 *id);
 int mlx5_cmd_fc_free(struct mlx5_core_dev *dev, u32 id);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c 
b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 5a7bea688ec8..8a1a7ba9fe53 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -693,8 +693,10 @@ static int update_root_ft_create(struct mlx5_flow_table 
*ft, struct fs_prio
 *prio)
 {
struct mlx5_flow_root_namespace *root = find_root(>node);
+   struct mlx5_ft_underlay_qp *uqp;
int min_level = INT_MAX;
int err;
+   u32 qpn;
 
if (root->root_ft)
min_level = root->root_ft->level;
@@ -702,10 +704,24 @@ static int update_root_ft_create(struct mlx5_flow_table 
*ft, struct fs_prio
if (ft->level >= min_level)
return 0;
 
-   err = mlx5_cmd_update_root_ft(root->dev, ft, root->underlay_qpn);
+   if (list_empty(>underlay_qpns)) {
+   /* Don't set any QPN (zero) in case QPN list is empty */
+   qpn = 0;
+   err = mlx5_cmd_update_root_ft(root->dev, ft, qpn, false);
+   } else {
+   list_for_each_entry(uqp, >underlay_qpns, list) {
+   qpn = uqp->qpn;
+   err = mlx5_cmd_update_root_ft(root->dev, ft, qpn,
+ false);
+   if (err)
+   break;
+   }
+   }
+
if (err)
-   mlx5_core_warn(root->dev, "Update root flow table of id=%u 
failed\n",
-
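
The list walk in update_root_ft_create() above can be sketched in userspace
as follows. This is only an illustration under stated assumptions: the node
type underlay_qp, attach_qp(), update_root() and update_root_for_all() are
hypothetical names, the kernel code uses struct list_head rather than a
hand-rolled list, and the fake command here fails for one magic QPN so the
early exit can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node holding one underlay QP number. */
struct underlay_qp {
	uint32_t qpn;
	struct underlay_qp *next;
};

/* Push a node onto the front of the list. */
static void attach_qp(struct underlay_qp **head, struct underlay_qp *node)
{
	assert(node != NULL);
	node->next = *head;
	*head = node;
}

/* Hypothetical stand-in for the firmware command; fails for QPN 0xdead. */
static int calls;
static int update_root(uint32_t qpn)
{
	calls++;
	return qpn == 0xdead ? -1 : 0;
}

/* One command per attached QPN; with an empty list, one command with QPN 0. */
static int update_root_for_all(const struct underlay_qp *head)
{
	int err = 0;

	if (!head)
		return update_root(0);

	for (const struct underlay_qp *q = head; q; q = q->next) {
		err = update_root(q->qpn);
		if (err)
			break;	/* stop at the first failure */
	}
	return err;
}
```

Note that stopping at the first failure leaves the earlier QPNs updated; the
sketch does not try to roll them back.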

[for-next 07/12] net/mlx5e: IPoIB, Support for setting PKEY index to underlay QP

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Added a function to set PKEY index to IPoIB device driver using the
already present set_id function. PKEY index is attached to the QP
during state modification.

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 9 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 00f0e6a038bb..679c1f9af642 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -123,6 +123,7 @@ static int mlx5i_init_underlay_qp(struct mlx5e_priv *priv)
 
context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
context->pri_path.port = 1;
+   context->pri_path.pkey_index = cpu_to_be16(ipriv->pkey_index);
context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
 
ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RST2INIT_QP, 0, context, 
qp);
@@ -529,6 +530,13 @@ static int mlx5i_xmit(struct net_device *dev, struct 
sk_buff *skb,
return mlx5i_sq_xmit(sq, skb, >av, dqpn, ipriv->qkey);
 }
 
+static void mlx5i_set_pkey_index(struct net_device *netdev, int id)
+{
+   struct mlx5i_priv *ipriv = netdev_priv(netdev);
+
+   ipriv->pkey_index = (u16)id;
+}
+
 static int mlx5i_check_required_hca_cap(struct mlx5_core_dev *mdev)
 {
if (MLX5_CAP_GEN(mdev, port_type) != MLX5_CAP_PORT_TYPE_IB)
@@ -593,6 +601,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
rn->send = mlx5i_xmit;
rn->attach_mcast = mlx5i_attach_mcast;
rn->detach_mcast = mlx5i_detach_mcast;
+   rn->set_id = mlx5i_set_pkey_index;
 
return netdev;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
index a0f405f520f7..9a729883c3b3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
@@ -50,6 +50,7 @@ struct mlx5i_priv {
struct rdma_netdev rn; /* keep this first */
struct mlx5_core_qp qp;
u32qkey;
+   u16pkey_index;
char  *mlx5e_priv[0];
 };
 
-- 
2.14.2

[for-next 03/12] net/mlx5e: IPoIB, Move underlay QP init/uninit to separate functions

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

During the creation of the underlay QP the PKEY index is unknown; it
becomes known only when ndo_open is called.
The PKEY index is attached to the QP during state modification.

Splitting the functions will also make the code symmetric and more
readable. This split is also required for later PKEY support to be
called with the PKEY index during ndo_open.

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  | 108 +
 1 file changed, 70 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 14dfb577691b..feb94db6b921 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -108,11 +108,68 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv)
/* Do nothing .. */
 }
 
+static int mlx5i_init_underlay_qp(struct mlx5e_priv *priv)
+{
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5i_priv *ipriv = priv->ppriv;
+   struct mlx5_core_qp *qp = >qp;
+   struct mlx5_qp_context *context;
+   int ret;
+
+   /* QP states */
+   context = kzalloc(sizeof(*context), GFP_KERNEL);
+   if (!context)
+   return -ENOMEM;
+
+   context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
+   context->pri_path.port = 1;
+   context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RST2INIT_QP, 0, context, 
qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp RST2INIT, err: %d\n", 
ret);
+   goto err_qp_modify_to_err;
+   }
+   memset(context, 0, sizeof(*context));
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_INIT2RTR_QP, 0, context, 
qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp INIT2RTR, err: %d\n", 
ret);
+   goto err_qp_modify_to_err;
+   }
+
+   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RTR2RTS_QP, 0, context, qp);
+   if (ret) {
+   mlx5_core_err(mdev, "Failed to modify qp RTR2RTS, err: %d\n", 
ret);
+   goto err_qp_modify_to_err;
+   }
+
+   kfree(context);
+   return 0;
+
+err_qp_modify_to_err:
+   mlx5_core_qp_modify(mdev, MLX5_CMD_OP_2ERR_QP, 0, , qp);
+   kfree(context);
+   return ret;
+}
+
+static void mlx5i_uninit_underlay_qp(struct mlx5e_priv *priv)
+{
+   struct mlx5i_priv *ipriv = priv->ppriv;
+   struct mlx5_core_dev *mdev = priv->mdev;
+   struct mlx5_qp_context context;
+   int err;
+
+   err = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_2RST_QP, 0, ,
+ >qp);
+   if (err)
+   mlx5_core_err(mdev, "Failed to modify qp 2RST, err: %d\n", err);
+}
+
 #define MLX5_QP_ENHANCED_ULP_STATELESS_MODE 2
 
 static int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct 
mlx5_core_qp *qp)
 {
-   struct mlx5_qp_context *context = NULL;
u32 *in = NULL;
void *addr_path;
int ret = 0;
@@ -140,38 +197,7 @@ static int mlx5i_create_underlay_qp(struct mlx5_core_dev 
*mdev, struct mlx5_core
goto out;
}
 
-   /* QP states */
-   context = kzalloc(sizeof(*context), GFP_KERNEL);
-   if (!context) {
-   ret = -ENOMEM;
-   goto out;
-   }
-
-   context->flags = cpu_to_be32(MLX5_QP_PM_MIGRATED << 11);
-   context->pri_path.port = 1;
-   context->qkey = cpu_to_be32(IB_DEFAULT_Q_KEY);
-
-   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RST2INIT_QP, 0, context, 
qp);
-   if (ret) {
-   mlx5_core_err(mdev, "Failed to modify qp RST2INIT, err: %d\n", 
ret);
-   goto out;
-   }
-   memset(context, 0, sizeof(*context));
-
-   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_INIT2RTR_QP, 0, context, 
qp);
-   if (ret) {
-   mlx5_core_err(mdev, "Failed to modify qp INIT2RTR, err: %d\n", 
ret);
-   goto out;
-   }
-
-   ret = mlx5_core_qp_modify(mdev, MLX5_CMD_OP_RTR2RTS_QP, 0, context, qp);
-   if (ret) {
-   mlx5_core_err(mdev, "Failed to modify qp RTR2RTS, err: %d\n", 
ret);
-   goto out;
-   }
-
 out:
-   kfree(context);
kvfree(in);
return ret;
 }
@@ -192,13 +218,23 @@ static int mlx5i_init_tx(struct mlx5e_priv *priv)
return err;
}
 
+   err = mlx5i_init_underlay_qp(priv);
+   if (err) {
+   mlx5_core_warn(priv->mdev, "intilize underlay QP failed, %d\n", 
err);
+   goto err_destroy_underlay_qp;
+   }
+
err = mlx5e_create_tis(priv->mdev, 0 /* tc */, ipriv->qp.qpn, 
>tisn[0]);
if
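
The init path in this patch walks the QP through RST, INIT, RTR and RTS, and
moves it to the error state if any step fails. A minimal userspace sketch of
that sequence is below. It is an assumption-laden illustration: qp_state,
qp_modify(), init_underlay_qp() and the fail_at knob are hypothetical and
stand in for the firmware commands issued through mlx5_core_qp_modify().

```c
#include <assert.h>
#include <stddef.h>

enum qp_state { QP_RST, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/* Hypothetical knob: the target state whose transition fails, or -1. */
static int fail_at = -1;

/* Stand-in for one state-modify command. */
static int qp_modify(enum qp_state *cur, enum qp_state next)
{
	assert(cur != NULL);
	if ((int)next == fail_at)
		return -1;
	*cur = next;
	return 0;
}

/* Walk RST -> INIT -> RTR -> RTS; on any failure, park the QP in ERR. */
static int init_underlay_qp(enum qp_state *qp)
{
	static const enum qp_state steps[] = { QP_INIT, QP_RTR, QP_RTS };
	int ret = 0;

	for (size_t i = 0; i < sizeof(steps) / sizeof(steps[0]); i++) {
		ret = qp_modify(qp, steps[i]);
		if (ret)
			goto err_to_err;
	}
	return 0;

err_to_err:
	qp_modify(qp, QP_ERR);
	return ret;
}
```

Keeping a single error label means every failing step takes the same cleanup
path, which is the symmetry the commit message is after.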

[for-next 06/12] IB/ipoib: Add ability to set PKEY index to lower device driver

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

To support passing child interfaces to the lower device a new
rdma_netdev function was used, set_id. This will allow us to
attach the PKEY index to lower device resources such as TIS/QP.
For devices that do not support offloads in IPoIB the same logic
will be used, setting the PKEY index to priv struct.

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index c97384c914a4..fe690f82af29 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -893,13 +893,17 @@ int ipoib_ib_dev_open(struct net_device *dev)
 void ipoib_pkey_dev_check_presence(struct net_device *dev)
 {
struct ipoib_dev_priv *priv = ipoib_priv(dev);
+   struct rdma_netdev *rn = netdev_priv(dev);
 
if (!(priv->pkey & 0x7fff) ||
ib_find_pkey(priv->ca, priv->port, priv->pkey,
->pkey_index))
+>pkey_index)) {
clear_bit(IPOIB_PKEY_ASSIGNED, >flags);
-   else
+   } else {
+   if (rn->set_id)
+   rn->set_id(dev, priv->pkey_index);
set_bit(IPOIB_PKEY_ASSIGNED, >flags);
+   }
 }
 
 void ipoib_ib_dev_up(struct net_device *dev)
-- 
2.14.2
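
The "if (rn->set_id)" check in the patch above is the usual optional-callback
pattern: the upper layer forwards the PKEY index only when the lower driver
installed a hook. A minimal sketch, assuming hypothetical names throughout
(lower_dev, store_pkey_index() and notify_pkey_index() are not the real
rdma_netdev definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the lower device with an optional hook. */
struct lower_dev {
	uint16_t pkey_index;
	void (*set_id)(struct lower_dev *dev, int id);	/* may be NULL */
};

/* What an offloading lower driver might install as its hook. */
static void store_pkey_index(struct lower_dev *dev, int id)
{
	dev->pkey_index = (uint16_t)id;
}

/* Forward the index only when the hook exists; report whether it ran. */
static int notify_pkey_index(struct lower_dev *dev, int id)
{
	assert(dev != NULL);
	if (!dev->set_id)
		return 0;	/* no offload: nothing to forward */
	dev->set_id(dev, id);
	return 1;
}
```

A device without the hook is simply skipped, so non-offloading drivers need
no changes.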

[for-next 05/12] IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

When ndo_open and ndo_stop are called, the RTNL lock should be held.
In this specific case ipoib_ib_dev_open calls the offloaded ndo_open,
which resets the number of TX queues assuming the RTNL lock is held.
Since the RTNL lock is not held, the RTNL assert will fail.

Signed-off-by: Alex Vesker 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c 
b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 6cd61638b441..c97384c914a4 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -1203,10 +1203,15 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv 
*priv,
ipoib_ib_dev_down(dev);
 
if (level == IPOIB_FLUSH_HEAVY) {
+   rtnl_lock();
if (test_bit(IPOIB_FLAG_INITIALIZED, >flags))
ipoib_ib_dev_stop(dev);
-   if (ipoib_ib_dev_open(dev) != 0)
+
+   result = ipoib_ib_dev_open(dev);
+   rtnl_unlock();
+   if (result)
return;
+
if (netif_queue_stopped(dev))
netif_start_queue(dev);
}
-- 
2.14.2
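
The shape of the fix above is: take the lock, stop and reopen, drop the lock,
and only then act on the result, so the early return cannot leak the lock.
Below is a minimal userspace sketch of that ordering. All names are
hypothetical: big_lock()/big_unlock() model the RTNL lock with a flag,
dev_stop()/dev_open() stand in for the ndo callbacks, and the asserts play
the role of the RTNL assert mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the RTNL lock state. */
static bool big_lock_held;

static void big_lock(void)
{
	assert(!big_lock_held);
	big_lock_held = true;
}

static void big_unlock(void)
{
	assert(big_lock_held);
	big_lock_held = false;
}

static int open_result;		/* lets a caller simulate an open failure */
static bool queue_started;

/* Both callbacks insist that the lock is held, like the real ones. */
static void dev_stop(void) { assert(big_lock_held); }
static int dev_open(void)  { assert(big_lock_held); return open_result; }

/* Heavy flush: stop and reopen under the lock, test the result after. */
static void heavy_flush(bool initialized)
{
	int result;

	big_lock();
	if (initialized)
		dev_stop();
	result = dev_open();
	big_unlock();

	if (result)
		return;		/* lock already released on this path */

	queue_started = true;
}
```

Saving the return value in a local before unlocking is what allows the error
check to move outside the locked region.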

[for-next 09/12] net/mlx5e: IPoIB, Add PKEY child interface nic profile

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Child interface profile will be called to support child interface
specific behaviour. The child code is sparse compared to the parent
since the RX channels are shared between the interfaces.
Creating a separate profile for child and parent will make for smoother
code with a better ability for future expansion.
The profile struct is exposed to the parent using a getter function.

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  | 12 ++--
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  | 13 
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c | 83 ++
 3 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index c479fe54a6ca..196771cc599e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -70,10 +70,10 @@ static void mlx5i_build_nic_params(struct mlx5_core_dev 
*mdev,
 }
 
 /* Called directly after IPoIB netdevice was created to initialize SW structs 
*/
-static void mlx5i_init(struct mlx5_core_dev *mdev,
-  struct net_device *netdev,
-  const struct mlx5e_profile *profile,
-  void *ppriv)
+void mlx5i_init(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   const struct mlx5e_profile *profile,
+   void *ppriv)
 {
struct mlx5e_priv *priv  = mlx5i_epriv(netdev);
 
@@ -169,7 +169,7 @@ static void mlx5i_uninit_underlay_qp(struct mlx5e_priv 
*priv)
 
 #define MLX5_QP_ENHANCED_ULP_STATELESS_MODE 2
 
-static int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct 
mlx5_core_qp *qp)
+int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp)
 {
u32 *in = NULL;
void *addr_path;
@@ -203,7 +203,7 @@ static int mlx5i_create_underlay_qp(struct mlx5_core_dev 
*mdev, struct mlx5_core
return ret;
 }
 
-static void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct 
mlx5_core_qp *qp)
+void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp)
 {
mlx5_core_destroy_qp(mdev, qp);
 }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
index e313f6d90729..c9895f7a2358 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
@@ -55,6 +55,10 @@ struct mlx5i_priv {
char  *mlx5e_priv[0];
 };
 
+/* Underlay QP create/destroy functions */
+int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp);
+void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp);
+
 /* Allocate/Free underlay QPN to net-device hash table */
 int mlx5i_pkey_qpn_ht_init(struct net_device *netdev);
 void mlx5i_pkey_qpn_ht_cleanup(struct net_device *netdev);
@@ -66,6 +70,15 @@ int mlx5i_pkey_del_qpn(struct net_device *netdev, u32 qpn);
 /* Get the net-device corresponding to the given underlay QPN */
 struct net_device *mlx5i_pkey_get_netdev(struct net_device *netdev, u32 qpn);
 
+/* Parent profile functions */
+void mlx5i_init(struct mlx5_core_dev *mdev,
+   struct net_device *netdev,
+   const struct mlx5e_profile *profile,
+   void *ppriv);
+
+/* Get child interface nic profile */
+const struct mlx5e_profile *mlx5i_pkey_get_profile(void);
+
 /* Extract mlx5e_priv from IPoIB netdev */
 #define mlx5i_epriv(netdev) ((void *)(((struct mlx5i_priv 
*)netdev_priv(netdev))->mlx5e_priv))
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
index e4d39aa1f552..17c508d98dbb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
@@ -134,3 +134,86 @@ struct net_device *mlx5i_pkey_get_netdev(struct net_device 
*netdev, u32 qpn)
 
return node->netdev;
 }
+
+/* Called directly after IPoIB netdevice was created to initialize SW structs 
*/
+static void mlx5i_pkey_init(struct mlx5_core_dev *mdev,
+struct net_device *netdev,
+const struct mlx5e_profile *profile,
+void *ppriv)
+{
+   struct mlx5e_priv *priv  = mlx5i_epriv(netdev);
+
+   mlx5i_init(mdev, netdev, profile, ppriv);
+
+   /* Override parent ndo */
+   netdev->netdev_ops = NULL;
+
+   /* Currently no ethtool support */
+   netdev->ethtool_ops = NULL;
+
+   /* Use dummy rqs */
+   priv->channels.params.log_rq_size = MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE;
+}
+
+/* Called directly before

[for-next 10/12] net/mlx5e: IPoIB, Add PKEY child interface ndos

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Child interface ndos will be called to support child interface
specific behaviour.

ndo_init flow:
-Acquire shared QPN to net-device HT from parent
-Continue with the same flow as parent interface

ndo_open flow:
-Initialize child underlay QP and connect to shared FT
-Create child send TIS
-Open child send channels

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  10 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  |   8 ++
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c | 133 -
 3 files changed, 144 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 196771cc599e..70706eb70d3e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -40,8 +40,6 @@
 
 static int mlx5i_open(struct net_device *netdev);
 static int mlx5i_close(struct net_device *netdev);
-static int  mlx5i_dev_init(struct net_device *dev);
-static void mlx5i_dev_cleanup(struct net_device *dev);
 static int mlx5i_change_mtu(struct net_device *netdev, int new_mtu);
 static int mlx5i_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
 
@@ -108,7 +106,7 @@ static void mlx5i_cleanup(struct mlx5e_priv *priv)
/* Do nothing .. */
 }
 
-static int mlx5i_init_underlay_qp(struct mlx5e_priv *priv)
+int mlx5i_init_underlay_qp(struct mlx5e_priv *priv)
 {
struct mlx5_core_dev *mdev = priv->mdev;
struct mlx5i_priv *ipriv = priv->ppriv;
@@ -154,7 +152,7 @@ static int mlx5i_init_underlay_qp(struct mlx5e_priv *priv)
return ret;
 }
 
-static void mlx5i_uninit_underlay_qp(struct mlx5e_priv *priv)
+void mlx5i_uninit_underlay_qp(struct mlx5e_priv *priv)
 {
struct mlx5i_priv *ipriv = priv->ppriv;
struct mlx5_core_dev *mdev = priv->mdev;
@@ -372,7 +370,7 @@ static int mlx5i_change_mtu(struct net_device *netdev, int 
new_mtu)
return err;
 }
 
-static int mlx5i_dev_init(struct net_device *dev)
+int mlx5i_dev_init(struct net_device *dev)
 {
struct mlx5e_priv*priv   = mlx5i_epriv(dev);
struct mlx5i_priv*ipriv  = priv->ppriv;
@@ -402,7 +400,7 @@ static int mlx5i_ioctl(struct net_device *dev, struct ifreq 
*ifr, int cmd)
}
 }
 
-static void mlx5i_dev_cleanup(struct net_device *dev)
+void mlx5i_dev_cleanup(struct net_device *dev)
 {
struct mlx5e_priv*priv   = mlx5i_epriv(dev);
struct mlx5i_priv*ipriv = priv->ppriv;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
index c9895f7a2358..80c0cfee7164 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
@@ -59,6 +59,10 @@ struct mlx5i_priv {
 int mlx5i_create_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp);
 void mlx5i_destroy_underlay_qp(struct mlx5_core_dev *mdev, struct mlx5_core_qp 
*qp);
 
+/* Underlay QP state modification init/uninit functions */
+int mlx5i_init_underlay_qp(struct mlx5e_priv *priv);
+void mlx5i_uninit_underlay_qp(struct mlx5e_priv *priv);
+
 /* Allocate/Free underlay QPN to net-device hash table */
 int mlx5i_pkey_qpn_ht_init(struct net_device *netdev);
 void mlx5i_pkey_qpn_ht_cleanup(struct net_device *netdev);
@@ -70,6 +74,10 @@ int mlx5i_pkey_del_qpn(struct net_device *netdev, u32 qpn);
 /* Get the net-device corresponding to the given underlay QPN */
 struct net_device *mlx5i_pkey_get_netdev(struct net_device *netdev, u32 qpn);
 
+/* Shared ndo functions */
+int mlx5i_dev_init(struct net_device *dev);
+void mlx5i_dev_cleanup(struct net_device *dev);
+
 /* Parent profile functions */
 void mlx5i_init(struct mlx5_core_dev *mdev,
struct net_device *netdev,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
index 17c508d98dbb..d99bec6855de 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
@@ -135,6 +135,137 @@ struct net_device *mlx5i_pkey_get_netdev(struct 
net_device *netdev, u32 qpn)
return node->netdev;
 }
 
+static int mlx5i_pkey_open(struct net_device *netdev);
+static int mlx5i_pkey_close(struct net_device *netdev);
+static int mlx5i_pkey_dev_init(struct net_device *dev);
+static void mlx5i_pkey_dev_cleanup(struct net_device *netdev);
+static int mlx5i_pkey_change_mtu(struct net_device *netdev, int new_mtu);
+
+static const struct net_device_ops mlx5i_pkey_netdev_ops = {
+   .ndo_open= mlx5i_pkey_open,
+   .ndo_stop= mlx5i_pkey_close,
+   .ndo_init=

[for-next 08/12] net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

This change is needed for PKEY support, since the RQs are shared
between the child interface and the parent. The parent is responsible
for NAPI and the processing of RX completions. Using the dqpn in the
completion descriptor we set the corresponding child IPoIB netdevice
on the SKB.
The mapping between the dqpn and the netdevice is done using a HT,
each mlx5 IPoIB interface registers its mapping on creation.

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  20 ++-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |  16 +++
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  |  12 ++
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c | 136 +
 5 files changed, 184 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index d9621b2152d3..100fe4ecad9b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -22,7 +22,7 @@ mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o 
eswitch_offloads.o en_rep.o en_tc.
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) +=  en_dcbnl.o
 
-mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o
+mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o 
ipoib/ipoib_vlan.o
 
 mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \
en_accel/ipsec_stats.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 7e3bfe62ef6e..2c3f2e9b6983 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1163,11 +1163,25 @@ static inline void mlx5i_complete_rx_cqe(struct 
mlx5e_rq *rq,
 u32 cqe_bcnt,
 struct sk_buff *skb)
 {
-   struct net_device *netdev = rq->netdev;
+   struct net_device *netdev;
char *pseudo_header;
+   u32 qpn;
u8 *dgid;
u8 g;
 
+   qpn = be32_to_cpu(cqe->sop_drop_qpn) & 0xffffff;
+   netdev = mlx5i_pkey_get_netdev(rq->netdev, qpn);
+
+   /* No mapping present, cannot process SKB. This might happen if a child
+* interface is going down while having unprocessed CQEs on parent RQ
+*/
+   if (unlikely(!netdev)) {
+   /* TODO: add drop counters support */
+   skb->dev = NULL;
+   pr_warn_once("Unable to map QPN %u to dev - dropping skb\n", 
qpn);
+   return;
+   }
+
g = (be32_to_cpu(cqe->flags_rqpn) >> 28) & 3;
dgid = skb->data + MLX5_IB_GRH_DGID_OFFSET;
if ((!g) || dgid[0] != 0xff)
@@ -1230,6 +1244,10 @@ void mlx5i_handle_rx_cqe(struct mlx5e_rq *rq, struct 
mlx5_cqe64 *cqe)
goto wq_free_wqe;
 
mlx5i_complete_rx_cqe(rq, cqe, cqe_bcnt, skb);
+   if (unlikely(!skb->dev)) {
+   dev_kfree_skb_any(skb);
+   goto wq_free_wqe;
+   }
napi_gro_receive(rq->cq.napi, skb);
 
 wq_free_wqe:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 679c1f9af642..c479fe54a6ca 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -382,6 +382,9 @@ static int mlx5i_dev_init(struct net_device *dev)
dev->dev_addr[2] = (ipriv->qp.qpn >>  8) & 0xff;
dev->dev_addr[3] = (ipriv->qp.qpn) & 0xff;
 
+   /* Add QPN to net-device mapping to HT */
+   mlx5i_pkey_add_qpn(dev ,ipriv->qp.qpn);
+
return 0;
 }
 
@@ -402,8 +405,12 @@ static int mlx5i_ioctl(struct net_device *dev, struct 
ifreq *ifr, int cmd)
 static void mlx5i_dev_cleanup(struct net_device *dev)
 {
struct mlx5e_priv*priv   = mlx5i_epriv(dev);
+   struct mlx5i_priv*ipriv = priv->ppriv;
 
mlx5i_uninit_underlay_qp(priv);
+
+   /* Delete QPN to net-device mapping from HT */
+   mlx5i_pkey_del_qpn(dev, ipriv->qp.qpn);
 }
 
 static int mlx5i_open(struct net_device *netdev)
@@ -590,6 +597,12 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
if (!epriv->wq)
goto err_free_netdev;
 
+   err = mlx5i_pkey_qpn_ht_init(netdev);
+   if (err) {
+   mlx5_core_warn(mdev, "allocate qpn_to_netdev ht failed\n");
+   goto destroy_wq;
+   }
+
profile->init(mdev, netdev, profile, ipriv);
 
mlx5e_attach_netdev(epriv);
@@ -605,6 +618,8 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev

[for-next 12/12] net/mlx5e: IPoIB, Modify rdma netdev allocate and free to support PKEY

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Resources such as the FT, the QPN HT and the mdev resources should be
allocated only by the parent netdev. Shared resources are allocated and
freed by the parent interface, since the parent is always present and is
created before the IPoIB PKEY sub-interface.
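
The ownership rule above can be sketched as a small userspace model. It is
only an illustration under stated assumptions: the types and helpers
(toy_core_dev, toy_netdev, toy_netdev_alloc, toy_netdev_free, the fail_res
switch) are hypothetical and are not the driver's API. As in the patch, a
non-zero pdn is taken to mean that the parent already exists, and destroying
the shared resources zeroes them so that the next allocation is treated as a
parent again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct toy_shared_res { unsigned int pdn; };  /* pdn == 0: not created yet */
struct toy_core_dev   { struct toy_shared_res res; bool fail_res; };
struct toy_netdev {
	struct toy_core_dev *mdev;
	bool sub_interface;   /* true for a child interface */
	bool has_ht;          /* only the parent owns the hash table */
};

static int toy_create_shared(struct toy_core_dev *mdev)
{
	if (mdev->fail_res)       /* test hook to force the error path */
		return -1;
	mdev->res.pdn = 7;        /* any non-zero value marks "created" */
	return 0;
}

static void toy_destroy_shared(struct toy_core_dev *mdev)
{
	/* Zero the state so a later allocation is seen as a parent. */
	memset(&mdev->res, 0, sizeof(mdev->res));
}

static struct toy_netdev *toy_netdev_alloc(struct toy_core_dev *mdev)
{
	bool sub_interface = (mdev->res.pdn != 0);
	struct toy_netdev *nd = calloc(1, sizeof(*nd));

	if (!nd)
		return NULL;
	nd->mdev = mdev;
	nd->sub_interface = sub_interface;

	if (!sub_interface) {
		nd->has_ht = true;               /* parent-only hash table */
		if (toy_create_shared(mdev))     /* parent-only resources */
			goto destroy_ht;
	}
	return nd;

destroy_ht:
	nd->has_ht = false;
	free(nd);
	return NULL;
}

static void toy_netdev_free(struct toy_netdev *nd)
{
	if (!nd->sub_interface)
		toy_destroy_shared(nd->mdev);    /* the child never frees these */
	free(nd);
}
```

In this model, freeing a child leaves pdn untouched and only freeing the
parent resets it. The error path unwinds in the reverse order of setup, in
the same goto style as the driver.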

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 .../net/ethernet/mellanox/mlx5/core/en_common.c|  1 +
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  | 52 ++
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  |  1 +
 3 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
index ece3fb147e3e..157d02917237 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c
@@ -134,6 +134,7 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev 
*mdev)
mlx5_core_destroy_mkey(mdev, &res->mkey);
mlx5_core_dealloc_transport_domain(mdev, res->td.tdn);
mlx5_core_dealloc_pd(mdev, res->pdn);
+   memset(res, 0, sizeof(*res));
 }
 
 int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 70706eb70d3e..abf270d7f556 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -560,12 +560,13 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
  const char *name,
  void (*setup)(struct net_device *))
 {
-   const struct mlx5e_profile *profile = &mlx5i_nic_profile;
-   int nch = profile->max_nch(mdev);
+   const struct mlx5e_profile *profile;
struct net_device *netdev;
struct mlx5i_priv *ipriv;
struct mlx5e_priv *epriv;
struct rdma_netdev *rn;
+   bool sub_interface;
+   int nch;
int err;
 
if (mlx5i_check_required_hca_cap(mdev)) {
@@ -573,10 +574,15 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
return ERR_PTR(-EOPNOTSUPP);
}
 
-   /* This function should only be called once per mdev */
-   err = mlx5e_create_mdev_resources(mdev);
-   if (err)
-   return NULL;
+   /* TODO: Need to find a better way to check if child device*/
+   sub_interface = (mdev->mlx5e_res.pdn != 0);
+
+   if (sub_interface)
+   profile = mlx5i_pkey_get_profile();
+   else
+   profile = &mlx5i_nic_profile;
+
+   nch = profile->max_nch(mdev);
 
netdev = alloc_netdev_mqs(sizeof(struct mlx5i_priv) + sizeof(struct 
mlx5e_priv),
  name, NET_NAME_UNKNOWN,
@@ -585,7 +591,7 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
  nch);
if (!netdev) {
mlx5_core_warn(mdev, "alloc_netdev_mqs failed\n");
-   goto free_mdev_resources;
+   return NULL;
}
 
ipriv = netdev_priv(netdev);
@@ -595,10 +601,18 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
if (!epriv->wq)
goto err_free_netdev;
 
-   err = mlx5i_pkey_qpn_ht_init(netdev);
-   if (err) {
-   mlx5_core_warn(mdev, "allocate qpn_to_netdev ht failed\n");
-   goto destroy_wq;
+   ipriv->sub_interface = sub_interface;
+   if (!ipriv->sub_interface) {
+   err = mlx5i_pkey_qpn_ht_init(netdev);
+   if (err) {
+   mlx5_core_warn(mdev, "allocate qpn_to_netdev ht 
failed\n");
+   goto destroy_wq;
+   }
+
+   /* This should only be called once per mdev */
+   err = mlx5e_create_mdev_resources(mdev);
+   if (err)
+   goto destroy_ht;
}
 
profile->init(mdev, netdev, profile, ipriv);
@@ -616,12 +630,12 @@ struct net_device *mlx5_rdma_netdev_alloc(struct 
mlx5_core_dev *mdev,
 
return netdev;
 
+destroy_ht:
+   mlx5i_pkey_qpn_ht_cleanup(netdev);
 destroy_wq:
destroy_workqueue(epriv->wq);
 err_free_netdev:
free_netdev(netdev);
-free_mdev_resources:
-   mlx5e_destroy_mdev_resources(mdev);
 
return NULL;
 }
@@ -629,16 +643,18 @@ EXPORT_SYMBOL(mlx5_rdma_netdev_alloc);
 
 void mlx5_rdma_netdev_free(struct net_device *netdev)
 {
-   struct mlx5e_priv  *priv= mlx5i_epriv(netdev);
+   struct mlx5e_priv *priv = mlx5i_epriv(netdev);
+   struct mlx5i_priv *ipriv = priv->ppriv;
const struct mlx5e_profile *profile = priv->profile;
-   struct mlx5_core_dev   *mdev= priv->mdev;

[for-next 11/12] net/mlx5e: IPoIB, Add PKEY child interface ethtool ops

2017-10-14 Thread Saeed Mahameed

From: Alex Vesker 

Similar to VLAN interfaces, child interfaces have limited ethtool
support. In the current code, the main limitation that prevents
child interface ethtool configuration is that the shared
resources are managed by the parent.
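
A minimal sketch of the "limited ops table" idea, assuming a simplified
callback table in place of struct ethtool_ops. All names (toy_dev,
toy_ethtool_ops, toy_do_set_ring_size and so on) are hypothetical. The child
table leaves the configuration callback unset, and the dispatcher in this
sketch reports a missing callback as unsupported.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_dev;

/* Simplified stand-in for an ethtool-style callback table. */
struct toy_ethtool_ops {
	int (*get_link)(const struct toy_dev *dev);
	int (*set_ring_size)(struct toy_dev *dev, int size);
};

struct toy_dev {
	const struct toy_ethtool_ops *ops;
	int link_up;
	int ring_size;
};

static int toy_get_link(const struct toy_dev *dev)
{
	return dev->link_up;
}

static int toy_set_ring_size(struct toy_dev *dev, int size)
{
	dev->ring_size = size;
	return 0;
}

/* The parent owns the rings, so it exposes the full set of callbacks. */
static const struct toy_ethtool_ops parent_ops = {
	.get_link      = toy_get_link,
	.set_ring_size = toy_set_ring_size,
};

/* The child only reports state; the configuration callback stays NULL. */
static const struct toy_ethtool_ops child_ops = {
	.get_link = toy_get_link,
};

static int toy_do_get_link(const struct toy_dev *dev)
{
	if (!dev->ops || !dev->ops->get_link)
		return -EOPNOTSUPP;
	return dev->ops->get_link(dev);
}

static int toy_do_set_ring_size(struct toy_dev *dev, int size)
{
	if (!dev->ops || !dev->ops->set_ring_size)
		return -EOPNOTSUPP;
	return dev->ops->set_ring_size(dev, size);
}
```

With these tables, resizing the ring through the parent succeeds, while the
same request on the child returns -EOPNOTSUPP and leaves its state unchanged.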

Signed-off-by: Alex Vesker 
Reviewed-by: Erez Shitrit 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c| 5 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c | 4 ++--
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
index 43c126c63955..6f338a9219c8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ethtool.c
@@ -250,3 +250,8 @@ const struct ethtool_ops mlx5i_ethtool_ops = {
.get_link_ksettings = mlx5i_get_link_ksettings,
.get_link   = ethtool_op_get_link,
 };
+
+const struct ethtool_ops mlx5i_pkey_ethtool_ops = {
+   .get_drvinfo= mlx5i_get_drvinfo,
+   .get_link   = ethtool_op_get_link,
+};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
index 80c0cfee7164..a50c1a19550e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h
@@ -39,6 +39,7 @@
 #define MLX5I_MAX_NUM_TC 1
 
 extern const struct ethtool_ops mlx5i_ethtool_ops;
+extern const struct ethtool_ops mlx5i_pkey_ethtool_ops;
 
 #define MLX5_IB_GRH_BYTES   40
 #define MLX5_IPOIB_ENCAP_LEN4
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c 
b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
index d99bec6855de..531b02cc979b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c
@@ -279,8 +279,8 @@ static void mlx5i_pkey_init(struct mlx5_core_dev *mdev,
/* Override parent ndo */
netdev->netdev_ops = &mlx5i_pkey_netdev_ops;
 
-   /* Currently no ethtool support */
-   netdev->ethtool_ops = NULL;
+   /* Set child limited ethtool support */
+   netdev->ethtool_ops = &mlx5i_pkey_ethtool_ops;
 
/* Use dummy rqs */
priv->channels.params.log_rq_size = MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE;
-- 
2.14.2

[for-next 02/12] net/mlx5: PTP code migration to driver core section

2017-10-14 Thread Saeed Mahameed

From: Feras Daoud 

The PTP code is moved to the core section of the mlx5 driver in order to
share it between Ethernet and InfiniBand. This move involves the following
changes:
- Change mlx5e_ prefix to be mlx5_
- Add clock structs to Core
- Add clock object to mlx5_core_dev
- Call Init/Uninit clock from core init/cleanup
- Rename mlx5e_tstamp to be mlx5_clock

Signed-off-by: Feras Daoud 
Signed-off-by: Eitan Rabin 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  39 +-
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  95 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  17 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   3 +-
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  |   3 +-
 .../net/ethernet/mellanox/mlx5/core/lib/clock.c| 548 +
 .../net/ethernet/mellanox/mlx5/core/lib/clock.h|  51 ++
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   4 +
 .../net/ethernet/mellanox/mlx5/core/mlx5_core.h|   1 +
 include/linux/mlx5/driver.h|  24 +
 12 files changed, 416 insertions(+), 382 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/lib/clock.h

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index cc13d3dbd366..2059122eb089 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -267,28 +267,6 @@ struct mlx5e_dcbx {
 };
 #endif
 
-#define MAX_PIN_NUM8
-struct mlx5e_pps {
-   u8 pin_caps[MAX_PIN_NUM];
-   struct work_struct out_work;
-   u64start[MAX_PIN_NUM];
-   u8 enabled;
-};
-
-struct mlx5e_tstamp {
-   rwlock_t   lock;
-   struct cyclecountercycles;
-   struct timecounter clock;
-   struct hwtstamp_config hwtstamp_config;
-   u32nominal_c_mult;
-   unsigned long  overflow_period;
-   struct delayed_workoverflow_work;
-   struct mlx5_core_dev  *mdev;
-   struct ptp_clock  *ptp;
-   struct ptp_clock_info  ptp_info;
-   struct mlx5e_pps   pps_info;
-};
-
 enum {
MLX5E_RQ_STATE_ENABLED,
MLX5E_RQ_STATE_AM,
@@ -375,9 +353,10 @@ struct mlx5e_txqsq {
u8 min_inline_mode;
u16edge;
struct device *pdev;
-   struct mlx5e_tstamp   *tstamp;
__be32 mkey_be;
unsigned long  state;
+   struct hwtstamp_config*tstamp;
+   struct mlx5_clock *clock;
 
/* control path */
struct mlx5_wq_ctrlwq_ctrl;
@@ -543,10 +522,11 @@ struct mlx5e_rq {
struct mlx5e_channel  *channel;
struct device *pdev;
struct net_device *netdev;
-   struct mlx5e_tstamp   *tstamp;
struct mlx5e_rq_stats  stats;
struct mlx5e_cqcq;
struct mlx5e_page_cache page_cache;
+   struct hwtstamp_config *tstamp;
+   struct mlx5_clock  *clock;
 
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
mlx5e_fp_post_rx_wqes  post_wqes;
@@ -588,7 +568,7 @@ struct mlx5e_channel {
/* control */
struct mlx5e_priv *priv;
struct mlx5_core_dev  *mdev;
-   struct mlx5e_tstamp   *tstamp;
+   struct hwtstamp_config*tstamp;
intix;
 };
 
@@ -789,7 +769,7 @@ struct mlx5e_priv {
struct mlx5_core_dev  *mdev;
struct net_device *netdev;
struct mlx5e_stats stats;
-   struct mlx5e_tstamptstamp;
+   struct hwtstamp_config tstamp;
u16 q_counter;
 #ifdef CONFIG_MLX5_CORE_EN_DCB
struct mlx5e_dcbx  dcbx;
@@ -873,12 +853,6 @@ void mlx5e_ethtool_init_steering(struct mlx5e_priv *priv);
 void mlx5e_ethtool_cleanup_steering(struct mlx5e_priv *priv);
 void mlx5e_set_rx_mode_work(struct work_struct *work);
 
-void mlx5e_fill_hwstamp(struct mlx5e_tstamp *clock, u64 timestamp,
-   struct skb_shared_hwtstamps *hwts);
-void mlx5e_timestamp_init(struct mlx5e_priv *priv);
-void mlx5e_timestamp_cleanup(struct mlx5e_priv *priv);
-void mlx5e_pps_event_handler(struct mlx5e_priv *priv,
-struct ptp_clock_event *event);
 int mlx5e_hwstamp_set(struct mlx5e_priv *priv, struct ifreq *ifr);
 int mlx5e_hwstamp_get(struct mlx5e_priv *priv, struct ifreq *ifr);
 int mlx5e_modify_rx_cqe_compression_locked(struct mlx5e_priv *priv, bool val);
@@ -889,6 +863,7 @@ int mlx5e_vlan_rx_kill_vid(struct net_device *dev, 
__always_unused

[pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11

2017-10-14 Thread Saeed Mahameed

Hi Dave and Doug,

This series includes updates for the mlx5 IPoIB offloading driver from Alex
and Feras that add support for Multi Pkey in the mlx5i IPoIB offloading netdev,
to be merged into the net-next and rdma-next trees.

Doug, I am sorry I couldn't base this on rc2: the series needs, and conflicts
with, a fix that was submitted to rc3, so to keep things simple I based it on
rc4. I hope this is OK with you.

Please pull and let me know if there's any problem.

Thanks,
Saeed.

---

The following changes since commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f:

  Linux 4.14-rc4 (2017-10-08 20:53:29 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git 
tags/mlx5-updates-2017-10-11

for you to fetch changes up to b5ae577741bec22b584fa704076ccd8221cad19d:

  net/mlx5e: IPoIB, Modify rdma netdev allocate and free to support PKEY 
(2017-10-14 11:22:12 -0700)


mlx5-updates-2017-10-11: IPoIB Multi Pkey support

This series provides the support for IPoIB Multi Pkey.
InfiniBand Pkeys are the equivalent of Ethernet vlans.
Currently the IPoIB device driver supports only the default Pkey, and IPoIB
Pkey child interfaces are not supported in IPoIB offloads mode. This series
adds that support by allowing the creation of multiple mlx5 IPoIB netdevices
with a non-default Pkey.

The mlx5 IPoIB Pkey child interface is a smaller version of the mlx5i IPoIB
interface and shares most of its resources with the parent IPoIB interface,
namely the RX steering and RX ring resources.

The only mlx5 resources a child Pkey interface creates are the TX rings,
since they must be assigned to a specific Pkey.

The mlx5i Pkey netdev is implemented via a new mlx5e netdev profile in
mlx5/core/ipoib/ipoib_vlan.c.

The series starts with a refactoring of the mlx5e PTP and mlx5 clock
implementation that moves the code from the mlx5e netdevice into mlx5 core.
This makes mlx5 clock and PTP registration part of the core, so that it can
be shared by the mlx5e master Ethernet netdev, the IPoIB parent netdev and,
in the near future, mlx5_ib.

Add the support for attaching multiple underlay QPs for the different Pkeys
in mlx5 core RX steering.

Add Pkey index to rdma_netdev to add the ability to set PKEY index to lower
IPoIB offload netdev.

Use hash-table to map between DQPN (Destination QP number) to child netdev
for the IPoIB parent netdev to forward RX packets to the corresponding
child Pkey netdev, since the RX rings are shared.
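
A minimal sketch of the DQPN-to-netdev lookup described above, written as
plain userspace C rather than with the kernel hashtable helpers. The names
(toy_netdev, qpn_ht, qpn_ht_add, qpn_ht_lookup, qpn_ht_del), the table size
and the hash function are assumptions made for illustration. The only detail
taken from InfiniBand itself is that QP numbers are 24 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QPN_HT_BITS 8                   /* assumed size: 256 buckets */
#define QPN_HT_SIZE (1u << QPN_HT_BITS)
#define QPN_MASK    0xffffffu           /* QP numbers are 24 bits wide */

struct toy_netdev { const char *name; };  /* stand-in for struct net_device */

struct qpn_node {
	struct qpn_node *next;
	uint32_t qpn;
	struct toy_netdev *dev;
};

struct qpn_ht { struct qpn_node *buckets[QPN_HT_SIZE]; };

/* Multiplicative hash that keeps the top QPN_HT_BITS bits. */
static unsigned int qpn_hash(uint32_t qpn)
{
	return (uint32_t)(qpn * 2654435761u) >> (32 - QPN_HT_BITS);
}

/* Register a QPN -> netdev mapping; returns 0 on success, -1 on failure. */
static int qpn_ht_add(struct qpn_ht *ht, uint32_t qpn, struct toy_netdev *dev)
{
	struct qpn_node *node = malloc(sizeof(*node));
	unsigned int b;

	if (!node)
		return -1;
	node->qpn = qpn & QPN_MASK;
	node->dev = dev;
	b = qpn_hash(node->qpn);
	node->next = ht->buckets[b];
	ht->buckets[b] = node;
	return 0;
}

/* Return the netdev registered for this QPN, or NULL if there is none. */
static struct toy_netdev *qpn_ht_lookup(const struct qpn_ht *ht, uint32_t qpn)
{
	const struct qpn_node *node;

	qpn &= QPN_MASK;
	for (node = ht->buckets[qpn_hash(qpn)]; node; node = node->next)
		if (node->qpn == qpn)
			return node->dev;
	return NULL;
}

/* Remove a mapping; returns 0 if it was found, -1 otherwise. */
static int qpn_ht_del(struct qpn_ht *ht, uint32_t qpn)
{
	struct qpn_node **pp;

	qpn &= QPN_MASK;
	for (pp = &ht->buckets[qpn_hash(qpn)]; *pp; pp = &(*pp)->next) {
		if ((*pp)->qpn == qpn) {
			struct qpn_node *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 0;
		}
	}
	return -1;
}
```

On the receive path, the parent would mask the QPN out of the completion,
call the lookup, and drop the packet if the result is NULL, for example
because the child is going down while completions are still pending.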

The rest of the series adds the IPoIB child Pkey support: the mlx5e netdev
profile, the netdev ndos implementation and a minimal set of ethtool callbacks.

Thanks,
Saeed.


Alex Vesker (10):
  net/mlx5e: IPoIB, Move underlay QP init/uninit to separate functions
  net/mlx5: Support for attaching multiple underlay QPs to root flow table
  IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
  IB/ipoib: Add ability to set PKEY index to lower device driver
  net/mlx5e: IPoIB, Support for setting PKEY index to underlay QP
  net/mlx5e: IPoIB, Use hash-table to map between QPN to child netdev
  net/mlx5e: IPoIB, Add PKEY child interface nic profile
  net/mlx5e: IPoIB, Add PKEY child interface ndos
  net/mlx5e: IPoIB, Add PKEY child interface ethtool ops
  net/mlx5e: IPoIB, Modify rdma netdev allocate and free to support PKEY

Feras Daoud (2):
  net/mlx5: File renaming towards ptp core implementation
  net/mlx5: PTP code migration to driver core section

 drivers/infiniband/ulp/ipoib/ipoib_ib.c|  15 +-
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig|   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile   |   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  39 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_clock.c | 619 -
 .../net/ethernet/mellanox/mlx5/core/en_common.c|   1 +
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   |   7 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |  95 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c|  37 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c|   6 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   3 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  13 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.h   |   4 +-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  | 123 +++-
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  |   7 +-
 .../ethernet/mellanox/mlx5/core/ipoib/ethtool.c|   5 +
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c  | 260 ++---
 .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.h  |  36 ++
 .../ethernet/mellanox/mlx5/core/ipoib/ipoib_vlan.c | 350 
 .../net/ethernet/mellanox/mlx5/core/lib/clock.c| 525 +
 .../net/ethernet/mellanox/mlx5/core/lib/clock.h|  51 ++

[for-next 01/12] net/mlx5: File renaming towards ptp core implementation

2017-10-14 Thread Saeed Mahameed

From: Feras Daoud 

en_clock.c is renamed to clock.c and moved to lib/ as a first step
towards relocating the code to the core part of the driver, to allow
sharing between Ethernet and InfiniBand.

Signed-off-by: Feras Daoud 
Signed-off-by: Eitan Rabin 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 2 +-
 drivers/net/ethernet/mellanox/mlx5/core/Makefile| 4 ++--
 drivers/net/ethernet/mellanox/mlx5/core/{en_clock.c => lib/clock.c} | 0
 3 files changed, 3 insertions(+), 3 deletions(-)
 rename drivers/net/ethernet/mellanox/mlx5/core/{en_clock.c => lib/clock.c} 
(100%)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig 
b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index fdaef00465d7..25deaa5a534c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -6,6 +6,7 @@ config MLX5_CORE
tristate "Mellanox Technologies ConnectX-4 and Connect-IB core driver"
depends on MAY_USE_DEVLINK
depends on PCI
+   imply PTP_1588_CLOCK
default n
---help---
  Core driver for low level functionality of the ConnectX-4 and
@@ -29,7 +30,6 @@ config MLX5_CORE_EN
bool "Mellanox Technologies ConnectX-4 Ethernet support"
depends on NETDEVICES && ETHERNET && INET && PCI && MLX5_CORE
depends on IPV6=y || IPV6=n || MLX5_CORE=m
-   imply PTP_1588_CLOCK
default n
---help---
  Ethernet support in Mellanox Technologies ConnectX-4 NIC.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile 
b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 87a3099808f3..d9621b2152d3 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -4,7 +4,7 @@ subdir-ccflags-y += -I$(src)
 mlx5_core-y := main.o cmd.o debugfs.o fw.o eq.o uar.o pagealloc.o \
health.o mcg.o cq.o srq.o alloc.o qp.o port.o mr.o pd.o \
mad.o transobj.o vport.o sriov.o fs_cmd.o fs_core.o \
-   fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o \
+   fs_counters.o rl.o lag.o dev.o wq.o lib/gid.o lib/clock.o \
diag/fs_tracepoint.o
 
 mlx5_core-$(CONFIG_MLX5_ACCEL) += accel/ipsec.o
@@ -13,7 +13,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o 
fpga/conn.o fpga/sdk.o \
fpga/ipsec.o
 
 mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o 
\
-   en_tx.o en_rx.o en_rx_am.o en_txrx.o en_clock.o vxlan.o \
+   en_tx.o en_rx.o en_rx_am.o en_txrx.o vxlan.o \
en_arfs.o en_fs_ethtool.o en_selftest.o
 
 mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_clock.c 
b/drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
similarity index 100%
rename from drivers/net/ethernet/mellanox/mlx5/core/en_clock.c
rename to drivers/net/ethernet/mellanox/mlx5/core/lib/clock.c
-- 
2.14.2

Re: [PATCH net-next v5 0/5] bpf: security: New file mode and LSM hooks for eBPF object permission control

2017-10-14 Thread David Miller


Hmmm, this doesn't build for me:

security/selinux/hooks.c: In function ‘bpf_fd_pass’:
security/selinux/hooks.c:6325:40: error: ‘SECCLASS_BPF_MAP’ undeclared (first 
use in this function); did you mean ‘SECCLASS_BPF’?
   ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_MAP,
^~~~
SECCLASS_BPF
security/selinux/hooks.c:6325:40: note: each undeclared identifier is reported 
only once for each function it appears in
security/selinux/hooks.c:6332:40: error: ‘SECCLASS_BPF_PROG’ undeclared (first 
use in this function); did you mean ‘SECCLASS_BPF_MAP’?
   ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF_PROG,
^
SECCLASS_BPF_MAP

Re: [PATCH net-next 00/12] nfp: bpf: support direct packet access

2017-10-14 Thread David Miller

From: Jakub Kicinski 
Date: Thu, 12 Oct 2017 10:34:06 -0700

> The core of this series is direct packet access support.  With a
> small change to the verifier, the offloaded code can now make
> use of DPA.  We need to be careful to use kernel (after initial
> translation) offsets in our JIT.  Direct packet access also brings
> us to the problem of eBPF endianness.  After considering the 
> changes necessary we decided to not support translation on both
> BE and LE hosts, for now.
> 
> This series contains two fixes - one for compare instructions and
> one for ineffective jne optimization.  I chose to include fixes
> in this set because the code in -net works only with unreleased
> PoC FW (ABI version 1) and therefore nobody outside of Netronome
> can exercise it anyway.

Series applied, thank you.

Re: [PATCH net-next v2 0/2] net: stmmac: Improvements for multi-queuing and for AVB

2017-10-14 Thread David Miller

From: Jose Abreu 
Date: Fri, 13 Oct 2017 10:58:35 +0100

> Two improvements for stmmac: First one corrects the available fifo
> size per queue, second one corrects enabling of AVB queues. More
> info in commit log.

Series applied, thanks.

Re: [PATCH net-next] icmp: don't fail on fragment reassembly time exceeded

2017-10-14 Thread David Miller

From: Matteo Croce 
Date: Thu, 12 Oct 2017 16:12:37 +0200

> The ICMP implementation currently replies to an ICMP time exceeded message
> (type 11) with an ICMP host unreachable message (type 3, code 1).
> 
> However, time exceeded messages can either represent "time to live exceeded
> in transit" (code 0) or "fragment reassembly time exceeded" (code 1).
> 
> Unconditionally replying to "fragment reassembly time exceeded" with
> host unreachable messages might cause unjustified connection resets
> which are now easily triggered as UFO has been removed, because, in turn,
> sending large buffers triggers IP fragmentation.
> 
> The issue can be easily reproduced by running a lot of UDP streams
> which is likely to trigger IP fragmentation:
> 
>   # start netserver in the test namespace
>   ip netns add test
>   ip netns exec test netserver
> 
>   # create a VETH pair
>   ip link add name veth0 type veth peer name veth0 netns test
>   ip link set veth0 up
>   ip -n test link set veth0 up
> 
>   for i in $(seq 20 29); do
>   # assign addresses to both ends
>   ip addr add dev veth0 192.168.$i.1/24
>   ip -n test addr add dev veth0 192.168.$i.2/24
> 
>   # start the traffic
>   netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 &
>   done
> 
>   # wait
>   send_data: data send error: No route to host (errno 113)
>   netperf: send_omni: send_data failed: No route to host
> 
> We need to differentiate instead: if fragment reassembly time exceeded
> is reported, we need to silently drop the packet,
> if time to live exceeded is reported, maintain the current behaviour.
> In both cases increment the related error count "icmpInTimeExcds".
> 
> While at it, fix a typo in a comment, and convert the if statement
> into a switch to make it more readable.
> 
> Signed-off-by: Matteo Croce 

Looks good, applied, thank you!
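
For readers following the thread, the differentiation described in the commit message can be sketched in plain userspace C. This is only an illustration, not the applied patch: `struct te_verdict`, `classify_time_exceeded()` and the macro names are hypothetical, and treating every code other than 1 like code 0 is an assumption of the sketch. Only the code values (0 and 1 of ICMP type 11) and the errno seen in the netperf log (113, EHOSTUNREACH on Linux) come from the message itself.

```c
#include <errno.h>

/* Codes of ICMP type 11 ("time exceeded"), as listed in the commit message. */
#define ICMP_CODE_EXC_TTL       0  /* time to live exceeded in transit */
#define ICMP_CODE_EXC_FRAGTIME  1  /* fragment reassembly time exceeded */

/* Hypothetical result type, used only by this sketch. */
struct te_verdict {
	int count_error; /* 1 if the "icmpInTimeExcds" counter should be bumped */
	int sock_err;    /* errno reported to the socket, 0 means silent drop */
};

/*
 * Decide what to do with a received "time exceeded" message.
 * Both codes are counted; only the TTL case is reported to the socket.
 * Assumption: unknown codes keep the pre-existing behaviour.
 */
static struct te_verdict classify_time_exceeded(int code)
{
	struct te_verdict v = { .count_error = 1, .sock_err = EHOSTUNREACH };

	switch (code) {
	case ICMP_CODE_EXC_FRAGTIME:
		v.sock_err = 0; /* silently drop, no error towards the sender */
		break;
	case ICMP_CODE_EXC_TTL:
	default:
		break;          /* maintain the current behaviour */
	}
	return v;
}
```

With this split, the UDP_STREAM reproducer above would no longer see "No route to host" merely because a peer timed out while reassembling fragments.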

[PATCH] netrom: Delete an error message for a failed memory allocation in nr_proto_init()

2017-10-14 Thread SF Markus Elfring

From: Markus Elfring 
Date: Sat, 14 Oct 2017 18:48:18 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 net/netrom/af_netrom.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/netrom/af_netrom.c b/net/netrom/af_netrom.c
index ebf16f7f9089..568d6a148bf2 100644
--- a/net/netrom/af_netrom.c
+++ b/net/netrom/af_netrom.c
@@ -1408,10 +1408,8 @@ static int __init nr_proto_init(void)
}
 
dev_nr = kzalloc(nr_ndevs * sizeof(struct net_device *), GFP_KERNEL);
-   if (dev_nr == NULL) {
-   printk(KERN_ERR "NET/ROM: nr_proto_init - unable to allocate 
device array\n");
+   if (!dev_nr)
return -1;
-   }
 
for (i = 0; i < nr_ndevs; i++) {
char name[IFNAMSIZ];
-- 
2.14.2

Re: [net-next V7 PATCH 4/5] bpf: cpumap add tracepoints

2017-10-14 Thread David Miller

From: Jesper Dangaard Brouer 
Date: Thu, 12 Oct 2017 14:27:05 +0200

> @@ -355,7 +360,10 @@ struct bpf_cpu_map_entry *__cpu_map_entry_alloc(u32 
> qsize, u32 cpu, int map_id)
>   err = ptr_ring_init(rcpu->queue, qsize, gfp);
>   if (err)
>   goto free_queue;
> - rcpu->qsize = qsize
> +
> + rcpu->cpu= cpu;
> + rcpu->map_id = map_id;
> + rcpu->qsize  = qsize;
>  
>   /* Setup kthread */
>   rcpu->kthread = kthread_create_on_node(cpu_map_kthread_run, rcpu, numa,

So this fixes a build failure (missing final semicolon) introduced by
an earlier patch in the series, please fix this up so that this series
is properly bisectable.

[PATCH] fix typo in skbuff.c

2017-10-14 Thread Wenhua Shi

---
 net/core/skbuff.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 16982de6..e62476be 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1896,7 +1896,7 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta)
}
 
/* If we need update frag list, we are in troubles.
-* Certainly, it possible to add an offset to skb data,
+* Certainly, it is possible to add an offset to skb data,
 * but taking into account that pulling is expected to
 * be very rare operation, it is worth to fight against
 * further bloating skb head and crucify ourselves here instead.
-- 
2.11.0

[PATCH] net/ncsi: Delete an error message for a failed memory allocation in ncsi_rsp_handler_gc()

2017-10-14 Thread SF Markus Elfring

From: Markus Elfring 
Date: Sat, 14 Oct 2017 18:03:11 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 net/ncsi/ncsi-rsp.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index 265b9a892d41..eb3611ffbb62 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -686,11 +686,8 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr)
 
size = sizeof(*ncf) + cnt * entry_size;
ncf = kzalloc(size, GFP_ATOMIC);
-   if (!ncf) {
-   pr_warn("%s: Cannot alloc filter table (%d)\n",
-   __func__, i);
+   if (!ncf)
return -ENOMEM;
-   }
 
ncf->index = i;
ncf->total = cnt;
-- 
2.14.2

Re: [net-next 3/3] tcp: keep tcp_collapse controllable even after processing starts

2017-10-14 Thread Eric Dumazet

On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> Combining actual collapsing with reasoning for deciding the starting
> point, we can apply its logic in a consistent manner such that we can
> avoid costly yet not much useful collapsing. When collapsing to be
> triggered, it's not rare that most of the skbs in the receive or ooo
> queue are large ones without much metadata overhead. This also
> simplifies code and makes it easier to apply logic in a fair manner.
> 
> Subtle subsidiary changes included:
> - When the end_seq of the skb we are trying to collapse was larger than
>   the 'end' argument provided, we would end up copying to the 'end'
>   even though we couldn't collapse the original one. Current users of
>   tcp_collapse do not require such reserves, so redefine 'end' as the
>   point past which skbs (by seq) are guaranteed not to be collapsed.
> - Naturally tcp_collapse_ofo_queue shapes up and we no longer need
>   'tail' argument.


I am not inclined to review such a large change, without you providing
actual numbers.

We have a problem in TCP right now, that receiver announces a too big
window, and that is the main reason we trigger collapsing.

I would rather fix the root cause.

Re: [net-next 2/3] tcp: do not tcp_collapse once SYN or FIN found

2017-10-14 Thread Eric Dumazet

On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> Since 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue")
> applied, we no longer need to continue to search for the starting
> point once we encounter FIN packet. Same reasoning for SYN packet
> since commit 9d691539eea2d ("tcp: do not enqueue skb with SYN flag"),
> that would help us with error message when actual receiving.

Very confusing changelog or patch.

What exact problem do you want to solve ?

Re: [net-next 1/3] tcp: avoid useless copying and collapsing of just one skb

2017-10-14 Thread Eric Dumazet

On Sat, 2017-10-14 at 16:27 +0900, Koichiro Den wrote:
> On the starting point chosen, it could be possible that just one skb
> remains in between the range provided, leading to copying and re-insertion
> of rb node, which is useless with respect to the rcv buf measurement.
> This is rather probable in ooo queue case, in which non-contiguous bloated
> packets have been queued up.
> 
> Signed-off-by: Koichiro Den 
> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d0682ce2a5d6..1d785b5bf62d 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4807,7 +4807,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head 
> *list, struct rb_root *root,
>   start = TCP_SKB_CB(skb)->end_seq;
>   }
>   if (end_of_skbs ||
> - (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
> + (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) ||
> + (TCP_SKB_CB(skb)->seq == start && TCP_SKB_CB(skb)->end_seq == end))
>   return;
>  
>   __skb_queue_head_init();


What do you mean by useless ?

Surely if this skb contains 17 segments (some USB drivers allocate 8KB
per frame), we want to collapse them to save memory.

So I do not agree with this patch.

Re: [PATCH 3/3] ARM: dts: gr-peach: Add ETHER pin group

2017-10-14 Thread Andrew Lunn

> > So your binding wants to look something like
> >
> > ether: ethernet@e8203000 {
> > compatible = "renesas,ether-r7s72100";
> > reg = <0xe8203000 0x800>,
> >   <0xe8204800 0x200>;
> > interrupts = ;
> > clocks = <_clks R7S72100_CLK_ETHER>;
> > power-domains = <_clocks>;
> > phy-mode = "mii";
> >   phy-handle = <>;
> > #address-cells = <1>;
> > #size-cells = <0>;
> >
> > mdio: bus-bus {
> >   #address-cells = <1>;
> >   #size-cells = <0>;
> >
> >   phy0: ethernet-phy@1 {
> > reg = <1>;
> 
> Why reg = <1> ?
> Shouldn't this be 0, or even better with no reg property at all?

This is the address of the PHY on the MDIO bus. There can be up to 32
devices on the bus. I have no idea what address your PHY is using, so
I just picked a value. 0 can be special, so I avoided it.

Andrew

Re: [PATCH v2] net: ftgmac100: Request clock and set speed

2017-10-14 Thread kbuild test robot

Hi Joel,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.14-rc4 next-20171013]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Joel-Stanley/net-ftgmac100-Request-clock-and-set-speed/20171014-195836
config: arm-allmodconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All error/warnings (new ones prefixed by >>):

>> drivers/net//ethernet/faraday/ftgmac100.c:1742:40: warning: 'struct 
>> ftgmac100_priv' declared inside parameter list will not be visible outside 
>> of this definition or declaration
static void ftgmac100_setup_clk(struct ftgmac100_priv *priv)
   ^~
   drivers/net//ethernet/faraday/ftgmac100.c: In function 'ftgmac100_setup_clk':
>> drivers/net//ethernet/faraday/ftgmac100.c:1744:6: error: dereferencing 
>> pointer to incomplete type 'struct ftgmac100_priv'
 priv->clk = devm_clk_get(>dev, NULL);
 ^~
>> drivers/net//ethernet/faraday/ftgmac100.c:1744:28: error: 'pdev' undeclared 
>> (first use in this function)
 priv->clk = devm_clk_get(>dev, NULL);
   ^~~~
   drivers/net//ethernet/faraday/ftgmac100.c:1744:28: note: each undeclared 
identifier is reported only once for each function it appears in
   drivers/net//ethernet/faraday/ftgmac100.c: In function 'ftgmac100_probe':
>> drivers/net//ethernet/faraday/ftgmac100.c:1855:23: error: passing argument 1 
>> of 'ftgmac100_setup_clk' from incompatible pointer type 
>> [-Werror=incompatible-pointer-types]
  ftgmac100_setup_clk(priv);
  ^~~~
   drivers/net//ethernet/faraday/ftgmac100.c:1742:13: note: expected 'struct 
ftgmac100_priv *' but argument is of type 'struct ftgmac100 *'
static void ftgmac100_setup_clk(struct ftgmac100_priv *priv)
^~~
   cc1: some warnings being treated as errors

vim +1744 drivers/net//ethernet/faraday/ftgmac100.c

  1741  
> 1742  static void ftgmac100_setup_clk(struct ftgmac100_priv *priv)
  1743  {
> 1744  priv->clk = devm_clk_get(>dev, NULL);
  1745  if (IS_ERR(priv->clk))
  1746  return;
  1747  
  1748  clk_prepare_enable(priv->clk);
  1749  
  1750  /* Aspeed specifies a 100MHz clock is required for up to
  1751   * 1000Mbit link speeds. As NCSI is limited to 100Mbit, 25MHz
  1752   * is sufficient
  1753   */
  1754  clk_set_rate(priv->clk, priv->is_ncsi ? FTGMAC_25MHZ :
  1755  FTGMAC_100MHZ);
  1756  }
  1757  
  1758  static int ftgmac100_probe(struct platform_device *pdev)
  1759  {
  1760  struct resource *res;
  1761  int irq;
  1762  struct net_device *netdev;
  1763  struct ftgmac100 *priv;
  1764  struct device_node *np;
  1765  int err = 0;
  1766  
  1767  if (!pdev)
  1768  return -ENODEV;
  1769  
  1770  res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
  1771  if (!res)
  1772  return -ENXIO;
  1773  
  1774  irq = platform_get_irq(pdev, 0);
  1775  if (irq < 0)
  1776  return irq;
  1777  
  1778  /* setup net_device */
  1779  netdev = alloc_etherdev(sizeof(*priv));
  1780  if (!netdev) {
  1781  err = -ENOMEM;
  1782  goto err_alloc_etherdev;
  1783  }
  1784  
  1785  SET_NETDEV_DEV(netdev, >dev);
  1786  
  1787  netdev->ethtool_ops = _ethtool_ops;
  1788  netdev->netdev_ops = _netdev_ops;
  1789  netdev->watchdog_timeo = 5 * HZ;
  1790  
  1791  platform_set_drvdata(pdev, netdev);
  1792  
  1793  /* setup private data */
  1794  priv = netdev_priv(netdev);
  1795  priv->netdev = netdev;
  1796  priv->dev = >dev;
  1797  INIT_WORK(>reset_task, ftgmac100_reset_task);
  1798  
  1799  /* map io memory */
  1800  priv->res = request_mem_region(res->start, resource_size(res),
  1801 dev_name(>dev));
  1802  if (!priv->res) {
  1803  dev_err(>dev, "Could not reserve memory 
region\n");
  1804  err = -ENOMEM;
  1805  goto err_req_mem;
  1806  }
  1807  
  1808  priv->base = ioremap(res->start, resource_size(res));
  1809  if (!priv->base) {
  1810

Re: usb/net/rt2x00: warning in rt2800_eeprom_word_index

2017-10-14 Thread Dmitry Vyukov

On Thu, Oct 12, 2017 at 9:25 AM, Stanislaw Gruszka  wrote:
> Hi
>
> On Mon, Oct 09, 2017 at 07:50:53PM +0200, Andrey Konovalov wrote:
>> I've got the following report while fuzzing the kernel with syzkaller.
>>
>> On commit 8a5776a5f49812d29fe4b2d0a2d71675c3facf3f (4.14-rc4).
>>
>> I'm not sure whether this is a bug in the driver, or just a way to
>> report misbehaving device. In the latter case this shouldn't be a
>> WARN() call, since WARN() means bug in the kernel.
>
> This is about wrong EEPROM, which reported 3 tx streams on
> non 3 antenna device. I think WARN() is justified and thanks
> to the call trace I was actually able to understand what
> happened.
>
> In general I do not think WARN() only means a kernel bug, it
> can be F/W or H/W bug too.

Hi Stanislaw,

Printing messages is fine. Printing stacks is fine. Just please make
them distinguishable from kernel bugs and don't kill the whole
possibility of automated Linux kernel testing. That's an important
capability.

Thanks

Re: Kernel Performance Tuning for High Volume SCTP traffic

2017-10-14 Thread Traiano Welcome

I've upped the value of the following sctp and udp related parameters,
in the hope that this would help:

sysctl -w net.core.rmem_max=9
sysctl -w net.core.wmem_max=9

sysctl -w net.sctp.sctp_mem="21 21 21"
sysctl -w net.sctp.sctp_rmem="21 21 21"
sysctl -w net.sctp.sctp_wmem="21 21 21"

sysctl -w net.ipv4.udp_mem="50 50 50"
sysctl -w net.ipv4.udp_mem="100 100 100"

However, I'm still seeing rapidly incrementing rx discards reported on the NIC:

:~# ethtool -S ens4f1 | egrep -i rx_discards
 [0]: rx_discards: 6390805462
 [1]: rx_discards: 6659315919
 [2]: rx_discards: 6542570026
 [3]: rx_discards: 6431513008
 [4]: rx_discards: 6436779078
 [5]: rx_discards: 6665897051
 [6]: rx_discards: 6167985560
 [7]: rx_discards: 11340068788
 rx_discards: 56634934892

Despite the fact that I've set the NIC ring buffer on the Netextreme
interface to the maximum:

:~# ethtool -g ens4f0
Ring parameters for ens4f0:
Pre-set maximums:
RX: 4078
RX Mini:0
RX Jumbo:   0
TX: 4078
Current hardware settings:
RX: 4078
RX Mini:0
RX Jumbo:   0
TX: 4078

I see no ip errors at the physical interface:

ethtool -S ens4f0 | egrep phy_ip_err_discard| tail -1
 rx_phy_ip_err_discards: 0


Could anyone suggest alternative approaches I might take to optimising
the system's handling of SCTP traffic?



On Sat, Oct 14, 2017 at 12:35 AM, David Laight  wrote:
> From: Traiano Welcome
>> Sent: 13 October 2017 17:04
>> On Fri, Oct 13, 2017 at 11:56 PM, David Laight  
>> wrote:
>> > From: Traiano Welcome
>> >
>> > (copied to netdev)
>> >> Sent: 13 October 2017 07:16
>> >> To: linux-s...@vger.kernel.org
>> >> Subject: Kernel Performance Tuning for High Volume SCTP traffic
>> >>
>> >> Hi List
>> >>
>> >> I'm running a linux server processing high volumes of SCTP traffic and
>> >> am seeing large numbers of packet overruns (ifconfig output).
>> >
>> > I'd guess that overruns indicate that the ethernet MAC is failing to
>> > copy the receive frames into kernel memory.
>> > It is probably running out of receive buffers, but might be
>> > suffering from a lack of bus bandwidth.
>> > MAC drivers usually discard receive frames if they can't get
>> > a replacement buffer - so you shouldn't run out of rx buffers.
>> >
>> > This means the errors are probably below SCTP - so changing SCTP parameters
>> > is unlikely to help.
>>
>> Does this mean that tuning UDP performance could help ? Or do you mean
>> hardware (NIC) performance could be the issue?
>
> I'd certainly check UDP performance.
>
> David
>

[PATCH v2] pch_gbe: Switch to new PCI IRQ allocation API

2017-10-14 Thread Andy Shevchenko

This removes custom flag handling.

Signed-off-by: Andy Shevchenko 
---
 drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h|  3 +-
 .../net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c   | 42 +-
 2 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
index 8d710a3b4db0..697e29dd4bd3 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe.h
@@ -613,7 +613,6 @@ struct pch_gbe_privdata {
  * @rx_ring:   Pointer of Rx descriptor ring structure
  * @rx_buffer_len: Receive buffer length
  * @tx_queue_len:  Transmit queue length
- * @have_msi:  PCI MSI mode flag
  * @pch_gbe_privdata:  PCI Device ID driver_data
  */
 
@@ -623,6 +622,7 @@ struct pch_gbe_adapter {
atomic_t irq_sem;
struct net_device *netdev;
struct pci_dev *pdev;
+   int irq;
struct net_device *polling_netdev;
struct napi_struct napi;
struct pch_gbe_hw hw;
@@ -637,7 +637,6 @@ struct pch_gbe_adapter {
struct pch_gbe_rx_ring *rx_ring;
unsigned long rx_buffer_len;
unsigned long tx_queue_len;
-   bool have_msi;
bool rx_stop_flag;
int hwts_tx_en;
int hwts_rx_en;
diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c 
b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index 5ae9681a2da7..457ee80307ea 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -781,11 +781,8 @@ static void pch_gbe_free_irq(struct pch_gbe_adapter 
*adapter)
 {
struct net_device *netdev = adapter->netdev;
 
-   free_irq(adapter->pdev->irq, netdev);
-   if (adapter->have_msi) {
-   pci_disable_msi(adapter->pdev);
-   netdev_dbg(netdev, "call pci_disable_msi\n");
-   }
+   free_irq(adapter->irq, netdev);
+   pci_free_irq_vectors(adapter->pdev);
 }
 
 /**
@@ -799,7 +796,7 @@ static void pch_gbe_irq_disable(struct pch_gbe_adapter 
*adapter)
atomic_inc(>irq_sem);
iowrite32(0, >reg->INT_EN);
ioread32(>reg->INT_ST);
-   synchronize_irq(adapter->pdev->irq);
+   synchronize_irq(adapter->irq);
 
netdev_dbg(adapter->netdev, "INT_EN reg : 0x%08x\n",
   ioread32(>reg->INT_EN));
@@ -1903,30 +1900,23 @@ static int pch_gbe_request_irq(struct pch_gbe_adapter 
*adapter)
 {
struct net_device *netdev = adapter->netdev;
int err;
-   int flags;
 
-   flags = IRQF_SHARED;
-   adapter->have_msi = false;
-   err = pci_enable_msi(adapter->pdev);
-   netdev_dbg(netdev, "call pci_enable_msi\n");
-   if (err) {
-   netdev_dbg(netdev, "call pci_enable_msi - Error: %d\n", err);
-   } else {
-   flags = 0;
-   adapter->have_msi = true;
-   }
-   err = request_irq(adapter->pdev->irq, _gbe_intr,
- flags, netdev->name, netdev);
+   err = pci_alloc_irq_vectors(adapter->pdev, 1, 1, PCI_IRQ_ALL_TYPES);
+   if (err < 0)
+   return err;
+
+   adapter->irq = pci_irq_vector(adapter->pdev, 0);
+
+   err = request_irq(adapter->irq, _gbe_intr, IRQF_SHARED,
+ netdev->name, netdev);
if (err)
netdev_err(netdev, "Unable to allocate interrupt Error: %d\n",
   err);
-   netdev_dbg(netdev,
-  "adapter->have_msi : %d  flags : 0x%04x  return : 0x%04x\n",
-  adapter->have_msi, flags, err);
+   netdev_dbg(netdev, "have_msi : %d  return : 0x%04x\n",
+  pci_dev_msi_enabled(adapter->pdev), err);
return err;
 }
 
-
 /**
  * pch_gbe_up - Up GbE network device
  * @adapter:  Board private structure
@@ -2399,9 +2389,9 @@ static void pch_gbe_netpoll(struct net_device *netdev)
 {
struct pch_gbe_adapter *adapter = netdev_priv(netdev);
 
-   disable_irq(adapter->pdev->irq);
-   pch_gbe_intr(adapter->pdev->irq, netdev);
-   enable_irq(adapter->pdev->irq);
+   disable_irq(adapter->irq);
+   pch_gbe_intr(adapter->irq, netdev);
+   enable_irq(adapter->irq);
 }
 #endif
 
-- 
2.14.2

Re: [PATCH v3 1/2] dt-bindings: add device tree binding for Allwinner XR819 SDIO Wi-Fi

2017-10-14 Thread icenowy

On 2017-10-05 14:58, Kalle Valo wrote:

Icenowy Zheng  writes:

On October 4, 2017, 6:11:45 PM GMT+08:00, Maxime Ripard wrote:

On Wed, Oct 04, 2017 at 10:02:48AM +, Arend van Spriel wrote:

On 10/4/2017 11:03 AM, Icenowy Zheng wrote:
>
>
> On October 4, 2017, 5:02:17 PM GMT+08:00, Kalle Valo wrote:
> > Icenowy Zheng  writes:
> >
> > > Allwinner XR819 is a SDIO Wi-Fi chip, which has the functionality to
> > > use an out-of-band interrupt pin instead of SDIO in-band interrupt.
> > >
> > > Add the device tree binding of this chip, in order to make it possible
> > > to add this interrupt pin to device trees.
> > >
> > > Signed-off-by: Icenowy Zheng 
> > > Acked-by: Rob Herring 
> > > ---
> > > Changes in v3:
> > > - Renames the node name.
> > > - Adds ACK from Rob.
> > > Changes in v2:
> > > - Removed status property in example.
> > > - Added required property reg.
> > >
> > >   .../bindings/net/wireless/allwinner,xr819.txt  | 38 ++
> > >   1 file changed, 38 insertions(+)
> > >   create mode 100644 Documentation/devicetree/bindings/net/wireless/allwinner,xr819.txt
> >
> > Like I asked already last time, AFAICS there is no upstream xr819
> > wireless driver in drivers/net/wireless directory. Do we still accept
> > bindings like this for out-of-tree drivers?
>
> See esp8089.
>
> There's also no in-tree driver for it.

The question is whether we should. The above might be a precedent, but it
may not necessarily be the way to go. The commit message for esp8089 seems
to hint that there is intent to have an in-tree driver:

"""
Note that at this point there only is an out of tree driver for this
hardware, there is no clear timeline / path for merging this. Still
I believe it would be good to specify the binding for this in tree
now, so that any future migration to an in tree driver will not cause
compatibility issues.

Cc: Icenowy Zheng 
Signed-off-by: Hans de Goede 
Signed-off-by: Rob Herring 
"""

Regardless the bindings are in principle independent of the kernel and just
describing hardware. I think there have been discussions to move the
bindings to their own repository, but apparently it was decided otherwise.

Yeah, I guess especially how it could be merged with the cw1200 driver
would be very relevant to that commit log.

The cw1200 driver seems to still have some legacy platform
data. Maybe they should also be converted to DT.
(Or maybe compatible = "allwinner,xr819" is enough, as
xr819 is a specified variant of cw1200 family)

Ah, so the upstream cw1200 driver supports xr819? Has anyone tested
that? Or does cw1200 need more changes than just adding the DT support?

The support of XR819 in CW1200 driver is far more difficult than I
imagined -- the codedrop used in the mainlined CW1200 driver seems to
be so old that it's before XR819 (which seems to be based on CW1160),
and there's a large number of problems to adapt it to a modern CW1200
variant.

P.S. could you apply this device tree binding patch now?

[RFC PATCH v2 2/5] tcp: implemented pacing_expired

2017-10-14 Thread Natale Patriciello

Inform the congestion control that the pacing timer, previously set,
has expired. The commit does not consider situations in which another
kind of timer has expired (e.g., a tail loss probe, a retransmission
timer...)

Signed-off-by: Natale Patriciello 
---
 include/net/tcp.h | 2 ++
 net/ipv4/tcp_output.c | 6 ++
 2 files changed, 8 insertions(+)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 42c7aa96c4cf..e817f0669d0e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1017,6 +1017,8 @@ struct tcp_congestion_ops {
   union tcp_cc_info *info);
/* get the expiration time for the pacing timer (optional) */
u64 (*get_pacing_time)(struct sock *sk);
+   /* the pacing timer is expired (optional) */
+   void (*pacing_timer_expired)(struct sock *sk);
 
charname[TCP_CA_NAME_MAX];
struct module   *owner;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ec5977156c26..25b4cf0802f2 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2241,6 +2241,7 @@ void tcp_chrono_stop(struct sock *sk, const enum 
tcp_chrono type)
 static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
   int push_one, gfp_t gfp)
 {
+   const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *skb;
unsigned int tso_segs, sent_pkts;
@@ -2263,6 +2264,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
 
max_segs = tcp_tso_segs(sk, mss_now);
tcp_mstamp_refresh(tp);
+
+   if (!tcp_pacing_timer_check(sk) &&
+   ca_ops && ca_ops->pacing_timer_expired)
+   ca_ops->pacing_timer_expired(sk);
+
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
 
-- 
2.14.2
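
The optional-callback pattern used by this patch can be illustrated outside the kernel with a minimal sketch. Everything below is a userspace stand-in: `struct sock` is reduced to a counter, `struct cc_ops` mimics only the one new member of `tcp_congestion_ops`, and `notify_pacing_expired()`, `wave_like_timer_expired()` and the `pacing_timer_pending` flag are hypothetical names. The flag stands in for `tcp_pacing_timer_check()`, which is assumed here to return true while a pacing timer is still pending (its definition is not part of this message).

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct sock; only what the sketch needs. */
struct sock {
	unsigned int expirations_seen;
};

/* Mimics the optional hook added to struct tcp_congestion_ops. */
struct cc_ops {
	void (*pacing_timer_expired)(struct sock *sk); /* may be NULL */
};

/* Example hook a congestion control module could provide (hypothetical). */
static void wave_like_timer_expired(struct sock *sk)
{
	sk->expirations_seen++;
}

/*
 * Mirrors the check in the patch: call the hook only when no pacing timer
 * is pending and the module actually implements the hook.
 * Returns true if the hook was invoked.
 */
static bool notify_pacing_expired(struct sock *sk, const struct cc_ops *ops,
				  bool pacing_timer_pending)
{
	if (!pacing_timer_pending && ops && ops->pacing_timer_expired) {
		ops->pacing_timer_expired(sk);
		return true;
	}
	return false;
}
```

Because the hook is tested for NULL before each call, existing congestion control modules that do not set it are unaffected, which is what "(optional)" in the struct comment implies.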

[RFC PATCH v2 4/5] tcp: added segment sent

2017-10-14 Thread Natale Patriciello

Inform the congestion control of the number of segments sent in normal
conditions, that is, segments that left the node without involving
recovery or retransmission procedures.

Signed-off-by: Natale Patriciello 
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp_output.c | 10 +-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3561eca5a61f..aebe225ab8b1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1021,6 +1021,8 @@ struct tcp_congestion_ops {
void (*pacing_timer_expired)(struct sock *sk);
/* get the # segs to send out when the timer expires (optional) */
u32 (*get_segs_per_round)(struct sock *sk);
+   /* the TCP has sent some segments (optional) */
+   void (*segments_sent)(struct sock *sk, u32 sent);
 
charname[TCP_CA_NAME_MAX];
struct module   *owner;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e37941e4328b..ef50202659da 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2250,6 +2250,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
bool is_cwnd_limited = false, is_rwnd_limited = false;
u32 max_segs;
u32 pacing_allowed_segs = 0;
+   bool notify = false;
 
sent_pkts = 0;
 
@@ -2268,8 +2269,12 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
 
if (!tcp_pacing_timer_check(sk)) {
pacing_allowed_segs = 1;
-   if (ca_ops && ca_ops->pacing_timer_expired)
+
+   if (ca_ops && ca_ops->pacing_timer_expired) {
ca_ops->pacing_timer_expired(sk);
+   notify = true;
+   }
+
if (ca_ops && ca_ops->get_segs_per_round)
pacing_allowed_segs = ca_ops->get_segs_per_round(sk);
}
@@ -2348,6 +2353,9 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
break;
}
 
+   if (ca_ops && notify && ca_ops->segments_sent)
+   ca_ops->segments_sent(sk, sent_pkts);
+
if (is_rwnd_limited)
tcp_chrono_start(sk, TCP_CHRONO_RWND_LIMITED);
else
-- 
2.14.2

[RFC PATCH v2 3/5] tcp: added get_segs_per_round

2017-10-14 Thread Natale Patriciello

Usually, the pacing time is provided per-segment. In some cases, this
time refers to the time between a group of segments. With this commit, add
the possibility for the congestion control module to tell the TCP socket
how many segments can be sent out before pausing and setting a pacing
timer.

Signed-off-by: Natale Patriciello 
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp_output.c | 13 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e817f0669d0e..3561eca5a61f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1019,6 +1019,8 @@ struct tcp_congestion_ops {
u64 (*get_pacing_time)(struct sock *sk);
/* the pacing timer is expired (optional) */
void (*pacing_timer_expired)(struct sock *sk);
+   /* get the # segs to send out when the timer expires (optional) */
+   u32 (*get_segs_per_round)(struct sock *sk);
 
charname[TCP_CA_NAME_MAX];
struct module   *owner;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 25b4cf0802f2..e37941e4328b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2249,6 +2249,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
int result;
bool is_cwnd_limited = false, is_rwnd_limited = false;
u32 max_segs;
+   u32 pacing_allowed_segs = 0;
 
sent_pkts = 0;
 
@@ -2265,14 +2266,18 @@ static bool tcp_write_xmit(struct sock *sk, unsigned 
int mss_now, int nonagle,
max_segs = tcp_tso_segs(sk, mss_now);
tcp_mstamp_refresh(tp);
 
-   if (!tcp_pacing_timer_check(sk) &&
-   ca_ops && ca_ops->pacing_timer_expired)
-   ca_ops->pacing_timer_expired(sk);
+   if (!tcp_pacing_timer_check(sk)) {
+   pacing_allowed_segs = 1;
+   if (ca_ops && ca_ops->pacing_timer_expired)
+   ca_ops->pacing_timer_expired(sk);
+   if (ca_ops && ca_ops->get_segs_per_round)
+   pacing_allowed_segs = ca_ops->get_segs_per_round(sk);
+   }
 
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
 
-   if (tcp_pacing_check(sk))
+   if (sent_pkts >= pacing_allowed_segs)
break;
 
tso_segs = tcp_init_tso_segs(skb, mss_now);
-- 
2.14.2

[RFC PATCH v2 0/5] TCP Wave

2017-10-14 Thread Natale Patriciello

Hello,

after the round of review on our v1 patch (you can find the relevant
thread here [1]) we have improved our code of TCP Wave, a new congestion
control algorithm.

Context: TCP Wave (TCPW) replaces the window-based transmission paradigm
of the standard TCP with a burst-based transmission, the ACK-clock
scheduling with a self-managed timer and the RTT-based congestion
control loop with an Ack-based Capacity and Congestion Estimation (ACCE)
module. In non-technical words, it sends data down the stack when a
timer expires, and the timing of the received ACKs contributes to
updating this timer regularly. We have left many debug messages to help
people understand what is going on inside the module. We plan to remove
almost all of them in the final submission.

We added this new sender paradigm without deeply touching existing code;
we re-used the existing infrastructure (TCP pacing timer, added with
commit 218af599fa635b107cfe10acf3249c4dfe5e4123), thanks to the
suggestion of Eric Dumazet. In fact, we only added four (optional) new
congestion control functions:

+ /* get the expiration time for the pacing timer (optional) */
+ u64 (*get_pacing_time)(struct sock *sk);
+ /* the pacing timer is expired (optional) */
+ void (*pacing_timer_expired)(struct sock *sk);
+ /* get the # segs to send out when the timer expires (optional) */
+ u32 (*get_segs_per_round)(struct sock *sk);
+ /* the TCP has sent some segments (optional) */
+ void (*segments_sent)(struct sock *sk, u32 sent);

to manage the previously mentioned pacing timer. With these functions, a
congestion control can set the pacing time, be informed when it expires,
indicate how many segments can leave when it expires, and know how many
segments really left the TCP layer after it has expired.
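
As a rough user-space illustration of how these four hooks interact, the
sketch below mimics the control flow of one transmit round. It is not
code from this series: struct toy_sock, struct toy_ca_ops, toy_wave_ops
and toy_write_xmit() are hypothetical stand-ins, and only the callback
names mirror the proposed ones. Calling segments_sent whenever at least
one segment left is an assumption made to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sock; not the kernel type. */
struct toy_sock {
	uint32_t queued;      /* segments waiting in the send queue */
	uint32_t burst;       /* segments the module allows per round */
	uint64_t interval_ns; /* pacing interval chosen by the module */
	uint32_t last_sent;   /* last value reported via segments_sent */
	int expirations;      /* how often pacing_timer_expired ran */
};

/* The same four optional hooks, declared on the toy socket type. */
struct toy_ca_ops {
	uint64_t (*get_pacing_time)(struct toy_sock *sk);
	void (*pacing_timer_expired)(struct toy_sock *sk);
	uint32_t (*get_segs_per_round)(struct toy_sock *sk);
	void (*segments_sent)(struct toy_sock *sk, uint32_t sent);
};

static uint64_t toy_get_pacing_time(struct toy_sock *sk)
{
	return sk->interval_ns; /* would be used to re-arm the timer */
}

static void toy_timer_expired(struct toy_sock *sk)
{
	sk->expirations++;
}

static uint32_t toy_segs_per_round(struct toy_sock *sk)
{
	return sk->burst;
}

static void toy_segments_sent(struct toy_sock *sk, uint32_t sent)
{
	sk->last_sent = sent;
}

static const struct toy_ca_ops toy_wave_ops = {
	.get_pacing_time      = toy_get_pacing_time,
	.pacing_timer_expired = toy_timer_expired,
	.get_segs_per_round   = toy_segs_per_round,
	.segments_sent        = toy_segments_sent,
};

/* One transmit round; timer_active says whether the pacing timer is
 * still pending. Returns the number of segments that left the queue.
 */
static uint32_t toy_write_xmit(struct toy_sock *sk,
			       const struct toy_ca_ops *ops,
			       int timer_active)
{
	uint32_t allowed = 0, sent = 0;

	if (!timer_active) {
		allowed = 1; /* default: one segment per expiry */
		if (ops && ops->pacing_timer_expired)
			ops->pacing_timer_expired(sk);
		if (ops && ops->get_segs_per_round)
			allowed = ops->get_segs_per_round(sk);
	}

	while (sk->queued > 0 && sent < allowed) {
		sk->queued--;
		sent++;
	}

	if (ops && sent && ops->segments_sent)
		ops->segments_sent(sk, sent);
	return sent;
}
```

With a queue of 10 segments and a burst of 4, a round that starts after
the timer has expired sends 4 segments; a round that starts while the
timer is still pending sends none.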

Thanks to the reviewers' suggestions we believe that the code has
improved in clarity and performance. David Laight, Stephen Hemminger,
David Miller, Neal Cardwell, Eric Dumazet, and all others that replied
privately, thank you.

Again, we would greatly appreciate any feedback, comments, suggestions,
corrections and so on. Thank you for your attention.

Cesare, Francesco, Ahmed, Natale

[1] http://lists.openwall.net/netdev/2017/07/28/219

---
Changes in v2:
 - Using TCP pacing timer instead of adding a new one
 - Using ktime_t instead of jiffies to measure the time
 - Avoided the use of custom debug facilities
 - Cleaned the variable declarations

Natale Patriciello (5):
  tcp: Added a function to retrieve pacing timer
  tcp: implemented pacing_expired
  tcp: added get_segs_per_round
  tcp: added segment sent
  wave: Added TCP Wave

 MAINTAINERS|6 +
 include/net/tcp.h  |8 +
 include/uapi/linux/inet_diag.h |   13 +
 net/ipv4/Kconfig   |   16 +
 net/ipv4/Makefile  |1 +
 net/ipv4/tcp_output.c  |   61 ++-
 net/ipv4/tcp_wave.c| 1035 
 7 files changed, 1127 insertions(+), 13 deletions(-)
 create mode 100644 net/ipv4/tcp_wave.c

-- 
2.14.2

[RFC PATCH v2 1/5] tcp: Added a function to retrieve pacing timer

2017-10-14 Thread Natale Patriciello

Allow congestion control modules to set a custom pacing time between
the transmission of segments.
Moreover, it is assumed that the time returned by the congestion module
in the past is firm, until the timer expires; therefore, do not re-start
the timer if it is already active.

Signed-off-by: Natale Patriciello 
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp_output.c | 36 +---
 2 files changed, 27 insertions(+), 11 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 89974c5286d8..42c7aa96c4cf 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1015,6 +1015,8 @@ struct tcp_congestion_ops {
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
+   /* get the expiration time for the pacing timer (optional) */
+   u64 (*get_pacing_time)(struct sock *sk);
 
char name[TCP_CA_NAME_MAX];
struct module   *owner;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0bc9e46a5369..ec5977156c26 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -950,22 +950,36 @@ static bool tcp_needs_internal_pacing(const struct sock *sk)
return smp_load_acquire(&sk->sk_pacing_status) == SK_PACING_NEEDED;
 }
 
+static bool tcp_pacing_timer_check(const struct sock *sk)
+{
+   return hrtimer_active(&tcp_sk(sk)->pacing_timer);
+}
+
 static void tcp_internal_pacing(struct sock *sk, const struct sk_buff *skb)
 {
+   const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
u64 len_ns;
-   u32 rate;
 
if (!tcp_needs_internal_pacing(sk))
return;
-   rate = sk->sk_pacing_rate;
-   if (!rate || rate == ~0U)
-   return;
-
-   /* Should account for header sizes as sch_fq does,
-* but lets make things simple.
-*/
-   len_ns = (u64)skb->len * NSEC_PER_SEC;
-   do_div(len_ns, rate);
+
+   if (ca_ops && ca_ops->get_pacing_time) {
+   if (tcp_pacing_timer_check(sk))
+   return;
+
+   len_ns = ca_ops->get_pacing_time(sk);
+   } else {
+   u32 rate = sk->sk_pacing_rate;
+
+   if (!rate || rate == ~0U)
+   return;
+
+   /* Should account for header sizes as sch_fq does,
+* but lets make things simple.
+*/
+   len_ns = (u64)skb->len * NSEC_PER_SEC;
+   do_div(len_ns, rate);
+   }
hrtimer_start(&tcp_sk(sk)->pacing_timer,
  ktime_add_ns(ktime_get(), len_ns),
  HRTIMER_MODE_ABS_PINNED);
@@ -2123,7 +2137,7 @@ static int tcp_mtu_probe(struct sock *sk)
 static bool tcp_pacing_check(const struct sock *sk)
 {
return tcp_needs_internal_pacing(sk) &&
-  hrtimer_active(&tcp_sk(sk)->pacing_timer);
+   tcp_pacing_timer_check(sk);
 }
 
 /* TCP Small Queues :
-- 
2.14.2
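
For readers following the arithmetic in the patch above, here is a
minimal user-space sketch of the two behaviours it describes: the
rate-based fallback interval (segment length times NSEC_PER_SEC divided
by the pacing rate, which the kernel expresses in bytes per second) and
the rule that a pending timer is not re-armed. The names
toy_pacing_len_ns(), struct toy_timer and toy_arm_if_idle() are
hypothetical, and in the patch the "do not re-arm" check applies only to
the get_pacing_time branch.

```c
#include <stdint.h>

#define TOY_NSEC_PER_SEC 1000000000ULL

/* Interval in nanoseconds for one segment of skb_len bytes at the
 * given rate in bytes per second. Returns 0 when pacing is disabled
 * (rate of 0 or ~0U), mirroring the early return in the patch.
 */
static uint64_t toy_pacing_len_ns(uint32_t skb_len, uint32_t rate)
{
	if (!rate || rate == ~0U)
		return 0;
	return (uint64_t)skb_len * TOY_NSEC_PER_SEC / rate;
}

/* Hypothetical stand-in for the hrtimer used for pacing. */
struct toy_timer {
	int active;
	uint64_t expires_ns;
};

/* Arm the timer only if it is idle, so an interval handed out earlier
 * stays firm until it expires. Returns 1 if the timer was armed.
 */
static int toy_arm_if_idle(struct toy_timer *t, uint64_t now_ns,
			   uint64_t len_ns)
{
	if (t->active)
		return 0;
	t->active = 1;
	t->expires_ns = now_ns + len_ns;
	return 1;
}
```

As a worked example, a 1500-byte segment at 1250000 bytes per second
(10 Mbit/s) gives 1500 * 10^9 / 1250000 = 1200000 ns, i.e. 1.2 ms.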

[RFC PATCH v2 5/5] wave: Added TCP Wave

2017-10-14 Thread Natale Patriciello

TCP Wave (TCPW) replaces the window-based transmission paradigm of the
standard TCP with a burst-based transmission, the ACK-clock scheduling
with a self-managed timer and the RTT-based congestion control loop
with an Ack-based Capacity and Congestion Estimation (ACCE) module. In
non-technical words, it sends data down the stack when its internal
timer expires, and the timing of the received ACKs contributes to
updating this timer regularly.

It is the first TCP congestion control that uses the timing constraint
developed in the Linux kernel.

Signed-off-by: Natale Patriciello 
Tested-by: Ahmed Said 
---
 MAINTAINERS|6 +
 include/uapi/linux/inet_diag.h |   13 +
 net/ipv4/Kconfig   |   16 +
 net/ipv4/Makefile  |1 +
 net/ipv4/tcp_output.c  |4 +-
 net/ipv4/tcp_wave.c| 1035 
 6 files changed, 1074 insertions(+), 1 deletion(-)
 create mode 100644 net/ipv4/tcp_wave.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2d3d750b19c0..b59815dcda67 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13024,6 +13024,12 @@ W: http://tcp-lp-mod.sourceforge.net/
 S: Maintained
 F: net/ipv4/tcp_lp.c
 
+TCP WAVE MODULE
+M: "Natale Patriciello" 
+W: http://tlcsat.uniroma2.it/tcpwave4linux/
+S: Maintained
+F: net/ipv4/tcp_wave.c
+
 TDA10071 MEDIA DRIVER
 M: Antti Palosaari 
 L: linux-me...@vger.kernel.org
diff --git a/include/uapi/linux/inet_diag.h b/include/uapi/linux/inet_diag.h
index f52ff62bfabe..2f204844e580 100644
--- a/include/uapi/linux/inet_diag.h
+++ b/include/uapi/linux/inet_diag.h
@@ -142,6 +142,7 @@ enum {
INET_DIAG_PAD,
INET_DIAG_MARK,
INET_DIAG_BBRINFO,
+   INET_DIAG_WAVEINFO,
INET_DIAG_CLASS_ID,
INET_DIAG_MD5SIG,
__INET_DIAG_MAX,
@@ -188,9 +189,21 @@ struct tcp_bbr_info {
__u32   bbr_cwnd_gain;  /* cwnd gain shifted left 8 bits */
 };
 
+/* INET_DIAG_WAVEINFO */
+
+struct tcp_wave_info {
+   __u32   tx_timer;
+   __u16   burst;
+   __u32   previous_ack_t_disp;
+   __u32   min_rtt;
+   __u32   avg_rtt;
+   __u32   max_rtt;
+};
+
 union tcp_cc_info {
struct tcpvegas_info vegas;
struct tcp_dctcp_info   dctcp;
struct tcp_bbr_info bbr;
+   struct tcp_wave_info wave;
 };
 #endif /* _UAPI_INET_DIAG_H_ */
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 91a2557942fa..de23b3a04b98 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -492,6 +492,18 @@ config TCP_CONG_BIC
increase provides TCP friendliness.
See http://www.csc.ncsu.edu/faculty/rhee/export/bitcp/
 
+config TCP_CONG_WAVE
+   tristate "Wave TCP"
+   default m
+   ---help---
+   TCP Wave (TCPW) replaces the window-based transmission paradigm of the
+   standard TCP with a burst-based transmission, the ACK-clock scheduling
+   with a self-managed timer and the RTT-based congestion control loop with
+   an Ack-based Capacity and Congestion Estimation (ACCE) module. In
+   non-technical words, it sends data down the stack when its internal
+   timer expires, and the timing of the received ACKs contribute to
+   updating this timer regularly.
+
 config TCP_CONG_CUBIC
tristate "CUBIC TCP"
default y
@@ -690,6 +702,9 @@ choice
config DEFAULT_CUBIC
bool "Cubic" if TCP_CONG_CUBIC=y
 
+   config DEFAULT_WAVE
+   bool "Wave" if TCP_CONG_WAVE=y
+
config DEFAULT_HTCP
bool "Htcp" if TCP_CONG_HTCP=y
 
@@ -729,6 +744,7 @@ config DEFAULT_TCP_CONG
string
default "bic" if DEFAULT_BIC
default "cubic" if DEFAULT_CUBIC
+   default "wave" if DEFAULT_WAVE
default "htcp" if DEFAULT_HTCP
default "hybla" if DEFAULT_HYBLA
default "vegas" if DEFAULT_VEGAS
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index afcb435adfbe..bdc8cd1a804a 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
 obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
+obj-$(CONFIG_TCP_CONG_WAVE) += tcp_wave.o
 obj-$(CONFIG_TCP_CONG_DCTCP) += tcp_dctcp.o
 obj-$(CONFIG_TCP_CONG_WESTWOOD) += tcp_westwood.o
 obj-$(CONFIG_TCP_CONG_HSTCP) += tcp_highspeed.o
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index ef50202659da..40ec467e5afd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2527,7 +2527,9 @@ void tcp_push_one(struct sock *sk, unsigned int mss_now)
 {
struct sk_buff *skb = tcp_send_head(sk);
 
-   BUG_ON(!skb || skb->len < mss_now);
+   /* Don't be forced to send not meaningful data */
+   if (!skb || skb->len < mss_now)
+   return;
 
tcp_write_xmit(sk,

Re: [PATCH v3 0/6] staging: Introduce DPAA2 Ethernet Switch driver

2017-10-14 Thread Linus Walleij

Top posting and resending since netdev@vger.kernel.org
is the right mail address for this. Mea culpa.

Linus Walleij

On Sat, Oct 14, 2017 at 11:35 AM, Linus Walleij
 wrote:
> On Thu, Oct 5, 2017 at 11:16 AM, Razvan Stefanescu
>  wrote:
>
>> This patchset introduces the Ethernet Switch Driver for Freescale/NXP SoCs
>> with DPAA2 (DataPath Acceleration Architecture v2). The driver manages
>> switch objects discovered on the fsl-mc bus. A description of the driver
>> can be found in the associated README file.
>>
>> The patchset consists of:
>> * A set of libraries containing APIs for configuring and controlling
>>   Management Complex (MC) switch objects
>> * The DPAA2 Ethernet Switch driver
>> * Patch adding ethtool support
>
> So it appears that ethernet switches are a class of device that needs its own
> subsystem in Linux before this driver can move out of staging.
>
> I ran into the problem in OpenWRT that has these out-of-tree patches for
> off-chip ethernet switches, conveniently placed under net/phy:
> https://github.com/openwrt/openwrt/tree/master/target/linux/generic/files/drivers/net/phy
>
> These are some 12 different ethernet switches. They are used in more or
> less every home router out there.
>
> It's not really working to have all of this out-of-tree; there must have been
> discussions about the requirements for a proper ethernet switch subsystem.
>
> I'm not a good net developer, just a grumpy user having to deal with all
> of this out-of-tree code that's not helpful with changing interfaces like
> device tree and so on.
>
> Can you people who worked on this over the years pitch in with your
> requirements for an ethernet switch subsystem so we can house these
> drivers in a proper way?
>
> What we need AFAICT:
>
> - Consensus on userspace ABI
> - Consensus on ethtool extensions
> - Consensus on where in drivers/net this goes
>
> You can kick me for not knowing what I'm talking about and how complex the
> problem is now.
>
> Yours,
> Linus Walleij

Re: [PATCH net-next v2 1/1] bridge: return error code when deleting Vlan

2017-10-14 Thread Nikolay Aleksandrov

On 13/10/17 19:00, Roman Mashak wrote:
> Nikolay Aleksandrov  writes:
> 
> 
> [...]
> 
>>>> Why do you want to return the error code here? Walking the code paths
>>>> seems like ENOENT or err from switchdev_port_obj_del are the 2 error
>>>> possibilities.
>>>
>>> For example, if you attempt to delete a non-existing vlan on a port,
>>> the current code succeeds and also sends an event:
>>>
>>> rtnetlink_rcv_msg
>>> rtnl_bridge_dellink
>>>br_dellink
>>>   br_afspec
>>>  br_vlan_info
>>>
>>> int br_dellink(..)
>>> {
>>>   ...
>>>   err = br_afspec()
>>>   if (err == 0)
>>>   br_ifinfo_notify(RTM_NEWLINK, p);
>>> }
>>>
>>> This is misleading, so a proper errcode has to be produced.
>>>
>>
>> True, but you also change the expected behaviour because now a user can
>> clear all vlans with one request (1 - 4094), and after the change that
>> will fail with a partial delete if some vlan was missing.
> 
> Nikolay, would you like to have a crack at fixing this?
> 

Sure, need to finish something and will cook up a patch next week.

Thanks,
 Nik
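
The behaviour change discussed in this thread can be sketched in plain
user-space C. This is only an illustration under stated assumptions:
struct toy_vlans and the toy_* helpers are hypothetical and are not the
bridge code; they merely contrast a range delete that stops at the first
missing VLAN with one that ignores missing entries.

```c
#include <errno.h>
#include <stdbool.h>

#define TOY_VLAN_MAX 4094

/* Hypothetical VLAN table: present[vid] is true when configured. */
struct toy_vlans {
	bool present[TOY_VLAN_MAX + 1];
};

/* Delete one VLAN; -ENOENT if it was not configured. */
static int toy_vlan_del(struct toy_vlans *v, int vid)
{
	if (!v->present[vid])
		return -ENOENT;
	v->present[vid] = false;
	return 0;
}

/* Strict variant: propagate the first error. VLANs before the missing
 * one are already gone, so the caller sees a partial delete.
 */
static int toy_del_range_strict(struct toy_vlans *v, int first, int last)
{
	for (int vid = first; vid <= last; vid++) {
		int err = toy_vlan_del(v, vid);

		if (err)
			return err;
	}
	return 0;
}

/* Lenient variant: missing VLANs are skipped, so a request such as
 * 1 - 4094 clears whatever happens to be configured.
 */
static int toy_del_range_lenient(struct toy_vlans *v, int first, int last)
{
	for (int vid = first; vid <= last; vid++)
		(void)toy_vlan_del(v, vid);
	return 0;
}
```

With VLANs 1 and 3 configured, the strict variant over 1 - 3 removes
VLAN 1, fails on VLAN 2 with -ENOENT and leaves VLAN 3 in place, while
the lenient variant removes both and reports success.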

[PATCH] ath9k: debug: Remove redundant check

2017-10-14 Thread Christos Gkekas

Variable val is unsigned, so checking whether it is less than zero is
redundant.

Signed-off-by: Christos Gkekas 
---
 drivers/net/wireless/ath/ath9k/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/debug.c 
b/drivers/net/wireless/ath/ath9k/debug.c
index 01fa301..4f5f141 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -1167,7 +1167,7 @@ static ssize_t write_file_tpc(struct file *file, const char __user *user_buf,
if (kstrtoul(buf, 0, &val))
return -EINVAL;
 
-   if (val < 0 || val > 1)
+   if (val > 1)
return -EINVAL;
 
tpc_enabled = !!val;
-- 
2.7.4
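
The reasoning behind this cleanup can be checked with a small
user-space sketch. toy_check_tpc() is a hypothetical helper that only
mirrors the range check left after the patch; because its argument is
an unsigned long, a "val < 0" test could never be true, and a value
produced by converting -1 simply shows up as a very large number that
the upper bound already rejects.

```c
#include <errno.h>

/* Range check as it stands after the patch: only the upper bound is
 * needed because val is unsigned.
 */
static int toy_check_tpc(unsigned long val)
{
	if (val > 1)
		return -EINVAL;
	return 0;
}
```

For instance, toy_check_tpc((unsigned long)-1) returns -EINVAL through
the "val > 1" branch, since the converted value equals ULONG_MAX.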

[PATCH] ath10k: spectral: Remove redundant check

2017-10-14 Thread Christos Gkekas

Variable val is unsigned, so checking whether it is less than zero is
redundant.

Signed-off-by: Christos Gkekas 
---
 drivers/net/wireless/ath/ath10k/spectral.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/spectral.c 
b/drivers/net/wireless/ath/ath10k/spectral.c
index dd9cc09..2048b1e 100644
--- a/drivers/net/wireless/ath/ath10k/spectral.c
+++ b/drivers/net/wireless/ath/ath10k/spectral.c
@@ -406,7 +406,7 @@ static ssize_t write_file_spectral_count(struct file *file,
if (kstrtoul(buf, 0, &val))
return -EINVAL;
 
-   if (val < 0 || val > 255)
+   if (val > 255)
return -EINVAL;
 
mutex_lock(>conf_mutex);
-- 
2.7.4

Re: [PATCH] ath9k: debug: Simplify error checking

2017-10-14 Thread Christos Gkekas

On 13/10/17 15:49:15 +0300, Kalle Valo wrote:
> Christos Gkekas  writes:
> 
> > Variable val is unsigned so checking whether it is less than zero is
> > redundant.
> >
> > Signed-off-by: Christos Gkekas 
> > ---
> >  drivers/net/wireless/ath/ath9k/debug.c | 5 +
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/ath/ath9k/debug.c 
> > b/drivers/net/wireless/ath/ath9k/debug.c
> > index 01fa301..3b93c23 100644
> > --- a/drivers/net/wireless/ath/ath9k/debug.c
> > +++ b/drivers/net/wireless/ath/ath9k/debug.c
> > @@ -1164,10 +1164,7 @@ static ssize_t write_file_tpc(struct file *file, 
> > const char __user *user_buf,
> > return -EFAULT;
> >  
> > buf[len] = '\0';
> > -   if (kstrtoul(buf, 0, &val))
> > -   return -EINVAL;
> > -
> > -   if (val < 0 || val > 1)
> > +   if (kstrtoul(buf, 0, &val) || val > 1)
> > return -EINVAL;
> 
> Same as with the ath10k patch, please keep the two if statements
> separate.
> 
> -- 
> Kalle Valo

Thanks, will submit a new, updated patch.

Christos Gkekas

Re: [PATCH] ath10k: spectral: Simplify error checking

2017-10-14 Thread Christos Gkekas

On 13/10/17 12:28:50 +, Kalle Valo wrote:
> Christos Gkekas  writes:
> 
> > Variable val is unsigned so checking whether it is less than zero is
> > redundant.
> >
> > Signed-off-by: Christos Gkekas 
> > ---
> >  drivers/net/wireless/ath/ath10k/spectral.c | 5 +
> >  1 file changed, 1 insertion(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/wireless/ath/ath10k/spectral.c 
> > b/drivers/net/wireless/ath/ath10k/spectral.c
> > index dd9cc09..1867937 100644
> > --- a/drivers/net/wireless/ath/ath10k/spectral.c
> > +++ b/drivers/net/wireless/ath/ath10k/spectral.c
> > @@ -403,10 +403,7 @@ static ssize_t write_file_spectral_count(struct file 
> > *file,
> > return -EFAULT;
> >  
> > buf[len] = '\0';
> > -   if (kstrtoul(buf, 0, &val))
> > -   return -EINVAL;
> > -
> > -   if (val < 0 || val > 255)
> > +   if (kstrtoul(buf, 0, &val) || val > 255)
> > return -EINVAL;
> 
> Removing the check for negative is correct but I don't think you are
> simplifying anything, on the contrary it's harder to read. Please keep
> the two if statements separate.
> 
> -- 
> Kalle Valo

You are right, will make the change and send a new patch.
Thanks for your time.

Christos Gkekas

[net-next 0/3] optimisations and reorganisations of tcp_collapse

2017-10-14 Thread Koichiro Den

This patch series avoids useless copying and collapsing in tcp_collapse
while still collapsing when it is worth the effort. It also reorganizes
the function and does some cleanups.

Koichiro Den (3):
  tcp: avoid useless copying and collapsing of just one skb
  tcp: do not tcp_collapse once SYN or FIN found
  tcp: keep tcp_collapse controllable even after processing starts

 net/ipv4/tcp_input.c | 193 ---
 1 file changed, 90 insertions(+), 103 deletions(-)

-- 
2.9.4

[net-next 1/3] tcp: avoid useless copying and collapsing of just one skb

2017-10-14 Thread Koichiro Den

Depending on the starting point chosen, it is possible that just one skb
remains within the range provided, leading to copying and re-insertion
of the rb node, which is useless with respect to the rcv buf measurement.
This is rather probable in the ooo queue case, in which non-contiguous
bloated packets have been queued up.

Signed-off-by: Koichiro Den 
---
 net/ipv4/tcp_input.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d0682ce2a5d6..1d785b5bf62d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4807,7 +4807,8 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root,
start = TCP_SKB_CB(skb)->end_seq;
}
if (end_of_skbs ||
-   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)))
+   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) ||
+   (TCP_SKB_CB(skb)->seq == start && TCP_SKB_CB(skb)->end_seq == end))
return;
 
__skb_queue_head_init(&tmp);
-- 
2.9.4

[net-next 3/3] tcp: keep tcp_collapse controllable even after processing starts

2017-10-14 Thread Koichiro Den

By combining the actual collapsing with the reasoning for deciding the
starting point, we can apply its logic in a consistent manner and
avoid costly yet not very useful collapsing. When collapsing is
triggered, it is not rare that most of the skbs in the receive or ooo
queue are large ones without much metadata overhead. This also
simplifies the code and makes it easier to apply the logic in a fair manner.

Subtle subsidiary changes included:
- When the end_seq of the skb we are trying to collapse was larger than
  the 'end' argument provided, we would end up copying up to 'end'
  even though we couldn't collapse the original one. Current users of
  tcp_collapse do not require such reserves, so redefine 'end' as the
  point past which skbs (by seq) are guaranteed not to be collapsed.
- Naturally tcp_collapse_ofo_queue shapes up and we no longer need the
  'tail' argument.

Signed-off-by: Koichiro Den 
---
 net/ipv4/tcp_input.c | 197 +++
 1 file changed, 90 insertions(+), 107 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1a74457db0f3..5fb90cc0ae95 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4756,108 +4756,117 @@ void tcp_rbtree_insert(struct rb_root *root, struct 
sk_buff *skb)
 
 /* Collapse contiguous sequence of skbs head..tail with
  * sequence numbers start..end.
- *
- * If tail is NULL, this means until the end of the queue.
- *
- * Segments with FIN/SYN are not collapsed (only because this
- * simplifies code)
  */
 static void
 tcp_collapse(struct sock *sk, struct sk_buff_head *list, struct rb_root *root,
-struct sk_buff *head, struct sk_buff *tail, u32 start, u32 end)
+struct sk_buff *head, u32 start, u32 *end)
 {
-   struct sk_buff *skb = head, *n;
+   struct sk_buff *skb = head, *n, *nskb = NULL;
+   int copy = 0, offset, size;
struct sk_buff_head tmp;
-   bool end_of_skbs;
 
-   /* First, check that queue is collapsible and find
-* the point where collapsing can be useful.
-*/
-restart:
-   for (end_of_skbs = true; skb != NULL && skb != tail; skb = n) {
-   /* If list is ooo queue, it will get purged when
-* this FIN will get moved to sk_receive_queue.
-* SYN packet is not expected here. We will get
-* error message when actual receiving.
-*/
-   if (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_FIN | TCPHDR_SYN))
-   return;
+   if (!list)
+   __skb_queue_head_init(&tmp); /* To defer rb tree insertion */
 
-   n = tcp_skb_next(skb, list);
-
-   /* No new bits? It is possible on ofo queue. */
+   while (skb) {
if (!before(start, TCP_SKB_CB(skb)->end_seq)) {
skb = tcp_collapse_one(sk, skb, list, root);
-   if (!skb)
-   break;
-   goto restart;
+   continue;
}
+   n = tcp_skb_next(skb, list);
 
-   /* The first skb to collapse is:
-* - bloated or contains data before "start" or
-*   overlaps to the next one.
-*/
-   if (tcp_win_from_space(skb->truesize) > skb->len ||
-   before(TCP_SKB_CB(skb)->seq, start)) {
-   end_of_skbs = false;
+examine:
+   /* Nothing beneficial to expect any more if SYN/FIN or last. */
+   if (!n ||
+   TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_FIN | TCPHDR_SYN))
break;
+
+   /* If hole found, skip to the next. */
+   if (after(TCP_SKB_CB(n)->seq, TCP_SKB_CB(skb)->end_seq)) {
+   skb = n;
+   continue;
}
 
-   if (n && n != tail &&
-   TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(n)->seq) {
-   end_of_skbs = false;
-   break;
+   /* If the 2nd skb has no newer bits than the current one,
+* just collapse and advance it to re-examine.
+*/
+   if (!after(TCP_SKB_CB(n)->end_seq, TCP_SKB_CB(skb)->end_seq)) {
+   n = tcp_collapse_one(sk, skb, list, root);
+   if (!n)
+   break;
+   goto examine;
}
 
-   /* Decided to skip this, advance start seq. */
-   start = TCP_SKB_CB(skb)->end_seq;
-   }
-   if (end_of_skbs ||
-   (TCP_SKB_CB(skb)->seq == start && TCP_SKB_CB(skb)->end_seq == end))
-   return;
+   /* If the next skb passes the end hint, finish. */
+   if (end && !before(TCP_SKB_CB(n)->seq, *end))
+   break;
 
-   __skb_queue_head_init(&tmp);
+

[net-next 2/3] tcp: do not tcp_collapse once SYN or FIN found

2017-10-14 Thread Koichiro Den

Since 9f5afeae5152 ("tcp: use an RB tree for ooo receive queue") was
applied, we no longer need to continue searching for the starting
point once we encounter a FIN packet. The same reasoning applies to SYN
packets since commit 9d691539eea2d ("tcp: do not enqueue skb with SYN flag"),
which helps us with the error message at actual receive time.

Signed-off-by: Koichiro Den 
---
 net/ipv4/tcp_input.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1d785b5bf62d..1a74457db0f3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4775,6 +4775,14 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, 
struct rb_root *root,
 */
 restart:
for (end_of_skbs = true; skb != NULL && skb != tail; skb = n) {
+   /* If list is ooo queue, it will get purged when
+* this FIN will get moved to sk_receive_queue.
+* SYN packet is not expected here. We will get
+* error message when actual receiving.
+*/
+   if (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_FIN | TCPHDR_SYN))
+   return;
+
n = tcp_skb_next(skb, list);
 
/* No new bits? It is possible on ofo queue. */
@@ -4786,13 +4794,11 @@ tcp_collapse(struct sock *sk, struct sk_buff_head 
*list, struct rb_root *root,
}
 
/* The first skb to collapse is:
-* - not SYN/FIN and
 * - bloated or contains data before "start" or
 *   overlaps to the next one.
 */
-   if (!(TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) &&
-   (tcp_win_from_space(skb->truesize) > skb->len ||
-before(TCP_SKB_CB(skb)->seq, start))) {
+   if (tcp_win_from_space(skb->truesize) > skb->len ||
+   before(TCP_SKB_CB(skb)->seq, start)) {
end_of_skbs = false;
break;
}
@@ -4807,7 +4813,6 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, 
struct rb_root *root,
start = TCP_SKB_CB(skb)->end_seq;
}
if (end_of_skbs ||
-   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | TCPHDR_FIN)) ||
(TCP_SKB_CB(skb)->seq == start && TCP_SKB_CB(skb)->end_seq == end))
return;
 
@@ -4845,9 +4850,7 @@ tcp_collapse(struct sock *sk, struct sk_buff_head *list, 
struct rb_root *root,
}
if (!before(start, TCP_SKB_CB(skb)->end_seq)) {
skb = tcp_collapse_one(sk, skb, list, root);
-   if (!skb ||
-   skb == tail ||
-   (TCP_SKB_CB(skb)->tcp_flags & (TCPHDR_SYN | 
TCPHDR_FIN)))
+   if (!skb || skb == tail)
goto end;
}
}
-- 
2.9.4
