Re: [PATCH][V2] mlxsw: spectrum: remove redundant check if err is zero

2016-09-24 Thread Ido Schimmel
On Sat, Sep 24, 2016 at 06:03:38PM -0700, Colin King wrote:
> From: Colin Ian King 
> 
> There is an earlier check and return if err is non-zero, so
> the check to see if it is zero is redundant in every iteration
> of the loop and hence the check can be removed.
> 
> Signed-off-by: Colin Ian King 

The subject and commit message are wrong. I think you copy-pasted them
from an earlier patch.

> ---
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c 
> b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> index 2a61617..1073673 100644
> --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
> @@ -117,11 +117,11 @@ static int validate_filter(struct net_device *dev,
>   return 0;
>  }
>  
> -static unsigned int get_filter_steerq(struct net_device *dev,
> -   struct ch_filter_specification *fs)
> +static int get_filter_steerq(struct net_device *dev,
> +  struct ch_filter_specification *fs)
>  {
>   struct adapter *adapter = netdev2adap(dev);
> - unsigned int iq;
> + int iq;
>  
>   /* If the user has requested steering matching Ingress Packets
>* to a specific Queue Set, we need to make sure it's in range
> @@ -443,10 +443,10 @@ int __cxgb4_set_filter(struct net_device *dev, int 
> filter_id,
>  struct filter_ctx *ctx)
>  {
>   struct adapter *adapter = netdev2adap(dev);
> - unsigned int max_fidx, fidx, iq;
> + unsigned int max_fidx, fidx;
>   struct filter_entry *f;
>   u32 iconf;
> - int ret;
> + int iq, ret;
>  
>   max_fidx = adapter->tids.nftids;
>   if (filter_id != (max_fidx + adapter->tids.nsftids - 1) &&
> -- 
> 2.9.3
> 


[PATCH net-next] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-24 Thread Florian Fainelli
This reverts commit 62469c76007e ("net: ethernet: bcmgenet: use phydev
from struct net_device") because it causes GENETv1/2/3 adapters to
exhibit the following behavior after an ifconfig down/up sequence:

PING fainelli-linux (10.112.156.244): 56 data bytes
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)

This was previously fixed by commit 5dbebbb44a6a ("net: bcmgenet:
Software reset EPHY after power on"), but the commit we are reverting
essentially voids that fix. Here is why.

Without commit 62469c76007e we would have the following scenario after
an ifconfig down then up sequence:

- bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is
  initialized *before* we get to initialize the UniMAC. This is
  critical to ensure the PHY is in a correct state. priv->phydev is
  valid, so this code executes fine

- the PHY is initialized a second time from bcmgenet_mii_probe(),
  through the normal phy_init_hw() call (which arguably could be
  optimized out)

Everything is fine in that case. With commit 62469c76007e, the
following happens after an ifconfig down then up sequence:

- bcmgenet_close() calls phy_disconnect() which makes dev->phydev become
  NULL

- when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from
  bcmgenet_power_up() to initialize the internal PHY, the NULL check
  becomes true, so we do not reset the PHY, yet we keep going on and
  initialize the UniMAC, causing MAC activity to occur

- we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is
  too late: the PHY is botched, which causes the bogus ping/packet
  transmission and reception shown above
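
The ordering problem described in the two scenarios can be modelled in a
few lines of user-space C. This is only an illustrative sketch: struct
netdev_model, model_mii_reset(), model_close() and model_open() are
hypothetical stand-ins, not the driver's real structures or functions. The
model assumes that the private PHY pointer survives a down/up cycle while
the net_device copy is cleared on disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PHY; it only counts software resets. */
struct phy {
	int resets;
};

/* Hypothetical model of the two places a PHY pointer can be kept. */
struct netdev_model {
	struct phy *dev_phydev;   /* models dev->phydev, cleared on disconnect */
	struct phy *priv_phydev;  /* models priv->phydev, assumed to persist */
	bool mac_started_without_phy_reset;
};

/* Models the NULL check in the reset path: no PHY pointer, no reset. */
static bool model_mii_reset(struct phy *phydev)
{
	if (!phydev)
		return false;
	phydev->resets++;
	return true;
}

/* Models close: disconnecting clears only the net_device copy. */
static void model_close(struct netdev_model *m)
{
	assert(m != NULL);
	m->dev_phydev = NULL;
}

/* Models open: try to reset the PHY, then start the MAC regardless. */
static void model_open(struct netdev_model *m, bool use_dev_pointer)
{
	struct phy *p = use_dev_pointer ? m->dev_phydev : m->priv_phydev;
	bool reset_done = model_mii_reset(p);

	/* The MAC is brought up whether or not the reset happened. */
	m->mac_started_without_phy_reset = !reset_done;
}
```

Running model_close() followed by model_open() with the private pointer
resets the PHY before the MAC starts. The same sequence with the
net_device pointer skips the reset, which is the failure mode the commit
message describes.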

Reported-by: Jaedon Shin 
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 45 ++
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  1 +
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 24 +++---
 3 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 2013474bfdbf..0140bc0cd508 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -453,25 +453,29 @@ static inline void bcmgenet_rdma_ring_writel(struct bcmgenet_priv *priv,
 static int bcmgenet_get_link_ksettings(struct net_device *dev,
   struct ethtool_link_ksettings *cmd)
 {
+   struct bcmgenet_priv *priv = netdev_priv(dev);
+
if (!netif_running(dev))
return -EINVAL;
 
-   if (!dev->phydev)
+   if (!priv->phydev)
return -ENODEV;
 
-   return phy_ethtool_ksettings_get(dev->phydev, cmd);
+   return phy_ethtool_ksettings_get(priv->phydev, cmd);
 }
 
 static int bcmgenet_set_link_ksettings(struct net_device *dev,
   const struct ethtool_link_ksettings *cmd)
 {
+   struct bcmgenet_priv *priv = netdev_priv(dev);
+
if (!netif_running(dev))
return -EINVAL;
 
-   if (!dev->phydev)
+   if (!priv->phydev)
return -ENODEV;
 
-   return phy_ethtool_ksettings_set(dev->phydev, cmd);
+   return phy_ethtool_ksettings_set(priv->phydev, cmd);
 }
 
 static int bcmgenet_set_rx_csum(struct net_device *dev,
@@ -937,7 +941,7 @@ static int bcmgenet_get_eee(struct net_device *dev, struct ethtool_eee *e)
e->eee_active = p->eee_active;
e->tx_lpi_timer = bcmgenet_umac_readl(priv, UMAC_EEE_LPI_TIMER);
 
-   return phy_ethtool_get_eee(dev->phydev, e);
+   return phy_ethtool_get_eee(priv->phydev, e);
 }
 
 static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
@@ -954,7 +958,7 @@ static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
if (!p->eee_enabled) {
bcmgenet_eee_enable_set(dev, false);
} else {
-   ret = phy_init_eee(dev->phydev, 0);
+   ret = phy_init_eee(priv->phydev, 0);
if (ret) {
netif_err(priv, hw, dev, "EEE initialization failed\n");
return ret;
@@ -964,12 +968,14 @@ static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
bcmgenet_eee_enable_set(dev, true);
}
 
-   return phy_ethtool_set_eee(dev->phydev, e);
+   return phy_ethtool_set_eee(priv->phydev, e);
 }
 
 static int bcmgenet_nway_reset(struct net_device *dev)
 {
-   return genphy_restart_aneg(dev->phydev);
+  

Re: [PATCH net] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-24 Thread Florian Fainelli


On 09/24/2016 07:51 PM, Florian Fainelli wrote:
> 
> 
> On 09/24/2016 05:10 PM, David Miller wrote:
>> From: Florian Fainelli 
>> Date: Sat, 24 Sep 2016 12:58:30 -0700
>>
>>> There is already a commit:
>>>
>>> Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"
>>>
>>> which should make this apply cleanly to "net" now.
>>
>> But look at net-next, it got re-added there.
>>
>> This is going to be a bit of a merge hassle, and this is why I pushed
>> back on the other attempt to revert this thing.
> 
> OK, so how about this:
> 
> - this patch applies to net which should be okay for now
> - to avoid future conflicts when you merge net into net-next, I submit a
> revert of "net: ethernet: bcmgenet: use
> phy_ethtool_{get|set}_link_ksettings" against net-next
> 
> Does that work for you?

Scratch that, seems like I need to submit another version of this revert
for net-next, let me submit that as well.
--
Florian


Re: [PATCH net] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-24 Thread Florian Fainelli


On 09/24/2016 05:10 PM, David Miller wrote:
> From: Florian Fainelli 
> Date: Sat, 24 Sep 2016 12:58:30 -0700
> 
>> There is already a commit:
>>
>> Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"
>>
>> which should make this apply cleanly to "net" now.
> 
> But look at net-next, it got re-added there.
> 
> This is going to be a bit of a merge hassle, and this is why I pushed
> back on the other attempt to revert this thing.

OK, so how about this:

- this patch applies to net which should be okay for now
- to avoid future conflicts when you merge net into net-next, I submit a
revert of "net: ethernet: bcmgenet: use
phy_ethtool_{get|set}_link_ksettings" against net-next

Does that work for you?
--
Florian


[PATCH] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

Signed-off-by: Maciej Żenczykowski 
---
 include/net/if_inet6.h |  1 +
 net/ipv6/addrconf.c| 31 +++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 1c8b6820b694..515352c6280a 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -201,6 +201,7 @@ struct inet6_dev {
struct ipv6_devstat stats;
 
struct timer_list   rs_timer;
+   __s32   rs_interval;/* in jiffies */
__u8rs_probes;
 
__u8addr_gen_mode;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 255be34cdbce..6384a1cde056 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -112,6 +112,24 @@ static inline u32 cstamp_delta(unsigned long cstamp)
return (cstamp - INITIAL_JIFFIES) * 100UL / HZ;
 }
 
+static inline s32 rfc3315_s14_backoff_init(s32 initial)
+{
+   u32 r = (9 << 20) / 10 + (prandom_u32() % ((2 << 20) / 10 + 1));
+   s32 v = initial * (u64)r >> 20;   /* ~ multiply by 0.9 .. 1.1 */
+   return v;
+}
+
+static inline s32 rfc3315_s14_backoff_update(s32 cur, s32 ceiling)
+{
+   u32 r = (19 << 20) / 10 + (prandom_u32() % ((2 << 20) / 10 + 1));
+   s32 v = cur * (u64)r >> 20;   /* ~ multiply by 1.9 .. 2.1 */
+   if (v > ceiling) {
+   r -= 1 << 20;
+   v = ceiling * (u64)r >> 20;   /* ~ multiply by 0.9 .. 1.1 */
+   }
+   return v;
+}
+
 #ifdef CONFIG_SYSCTL
 static int addrconf_sysctl_register(struct inet6_dev *idev);
 static void addrconf_sysctl_unregister(struct inet6_dev *idev);
@@ -3698,11 +3716,13 @@ static void addrconf_rs_timer(unsigned long data)
goto put;
 
write_lock(&idev->lock);
+   idev->rs_interval = rfc3315_s14_backoff_update(
+   idev->rs_interval, idev->cnf.rtr_solicit_max_interval);
/* The wait after the last probe can be shorter */
addrconf_mod_rs_timer(idev, (idev->rs_probes ==
 idev->cnf.rtr_solicits) ?
  idev->cnf.rtr_solicit_delay :
- idev->cnf.rtr_solicit_interval);
+ idev->rs_interval);
} else {
/*
 * Note: we do not support deprecated "all on-link"
@@ -3973,10 +3993,11 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 
write_lock_bh(&ifp->idev->lock);
spin_lock(&ifp->lock);
+   ifp->idev->rs_interval = rfc3315_s14_backoff_init(
+   ifp->idev->cnf.rtr_solicit_interval);
ifp->idev->rs_probes = 1;
ifp->idev->if_flags |= IF_RS_SENT;
-   addrconf_mod_rs_timer(ifp->idev,
- ifp->idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(ifp->idev, ifp->idev->rs_interval);
spin_unlock(&ifp->lock);
write_unlock_bh(&ifp->idev->lock);
}
@@ -5132,8 +5153,10 @@ update_lft:
 
if (update_rs) {
idev->if_flags |= IF_RS_SENT;
+   idev->rs_interval = rfc3315_s14_backoff_init(
+   idev->cnf.rtr_solicit_interval);
idev->rs_probes = 1;
-   addrconf_mod_rs_timer(idev, idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(idev, idev->rs_interval);
}
 
/* Well, that's kinda nasty ... */
-- 
2.8.0.rc3.226.g39d4020



Re: [PATCH 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-24 Thread Maciej Żenczykowski
OK, so that seems to produce all sorts of undefined __divdi3 or
__aeabi_ldivmod errors on 32-bit platforms (i.e. arm/m68k/i386).
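
For reference, RFC 3315 section 14 computes the first timeout as
RT = IRT + RAND*IRT, later ones as RT = 2*RTprev + RAND*RTprev, and caps
at RT = MRT + RAND*MRT, with RAND in [-0.1, 0.1]. Below is a user-space
sketch of that arithmetic in 20-bit fixed point, mirroring the helpers in
the "[PATCH] ipv6 addrconf" posting. The names fp_factor(), backoff_init()
and backoff_update() are hypothetical, and the random value is passed in
as a parameter (an assumption made so the result is reproducible). The
scaling uses only a 64-bit multiply and a shift. As a general note, a
plain / or % on a 64-bit operand in 32-bit kernel code is a common source
of undefined __divdi3-style symbols, and helpers such as div_u64() exist
for that case.

```c
#include <assert.h>
#include <stdint.h>

#define FP_SHIFT 20                          /* 20 fractional bits */
#define FP_ONE   (UINT32_C(1) << FP_SHIFT)   /* 1.0 in fixed point */

/* Fixed-point factor in roughly [base/10, (base + 2)/10], picked by rnd. */
static uint32_t fp_factor(uint32_t base_tenths, uint32_t rnd)
{
	uint32_t lo = (base_tenths << FP_SHIFT) / 10;
	uint32_t span = (UINT32_C(2) << FP_SHIFT) / 10 + 1;

	return lo + rnd % span;   /* 32-bit division only */
}

/* First interval: initial scaled by about 0.9 .. 1.1. */
static int32_t backoff_init(int32_t initial, uint32_t rnd)
{
	assert(initial >= 0);
	return (int32_t)(((uint64_t)initial * fp_factor(9, rnd)) >> FP_SHIFT);
}

/* Next interval: about 1.9 .. 2.1 times cur, capped near the ceiling. */
static int32_t backoff_update(int32_t cur, int32_t ceiling, uint32_t rnd)
{
	uint32_t r = fp_factor(19, rnd);
	uint64_t v;

	assert(cur >= 0 && ceiling >= 0);
	v = ((uint64_t)cur * r) >> FP_SHIFT;
	if (v > (uint64_t)ceiling) {
		/* Reuse the same random draw, shifted down to 0.9 .. 1.1. */
		v = ((uint64_t)ceiling * (r - FP_ONE)) >> FP_SHIFT;
	}
	return (int32_t)v;
}
```

With an initial interval of 1000 units, backoff_init() yields a value
just under 900 for the lowest random draw and just under 1100 for the
highest, because the shift truncates toward zero.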


[PATCH][V2] mlxsw: spectrum: remove redundant check if err is zero

2016-09-24 Thread Colin King
From: Colin Ian King 

There is an earlier check and return if err is non-zero, so
the check to see if it is zero is redundant in every iteration
of the loop and hence the check can be removed.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
index 2a61617..1073673 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c
@@ -117,11 +117,11 @@ static int validate_filter(struct net_device *dev,
return 0;
 }
 
-static unsigned int get_filter_steerq(struct net_device *dev,
- struct ch_filter_specification *fs)
+static int get_filter_steerq(struct net_device *dev,
+struct ch_filter_specification *fs)
 {
struct adapter *adapter = netdev2adap(dev);
-   unsigned int iq;
+   int iq;
 
/* If the user has requested steering matching Ingress Packets
 * to a specific Queue Set, we need to make sure it's in range
@@ -443,10 +443,10 @@ int __cxgb4_set_filter(struct net_device *dev, int filter_id,
   struct filter_ctx *ctx)
 {
struct adapter *adapter = netdev2adap(dev);
-   unsigned int max_fidx, fidx, iq;
+   unsigned int max_fidx, fidx;
struct filter_entry *f;
u32 iconf;
-   int ret;
+   int iq, ret;
 
max_fidx = adapter->tids.nftids;
if (filter_id != (max_fidx + adapter->tids.nsftids - 1) &&
-- 
2.9.3



Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang
On Fri, Sep 23, 2016 at 8:40 AM, Shmulik Ladkani
 wrote:
> On Fri, 23 Sep 2016 08:48:33 -0400 Jamal Hadi Salim  wrote:
>> > Even today, one may create loops using existing 'egress redirect',
>> > e.g. this rediculously errorneous construct:
>> >
>> >  # ip l add v0 type veth peer name v0p
>> >  # tc filter add dev v0p parent : basic \
>> > action mirred egress redirect dev v0
>>
>> I think we actually recover from this one by eventually
>> dropping (theres a ttl field).
>
> [off topic]
>
> Don't know about that :) cpu fan got very noisy, 3 of 4 cores at 100%,
> and after one second I got:
>
> # ip -s l show type veth
> 16: v0p@v0:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
> link/ether a2:64:ff:10:dd:85 brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 71660305923 469890864 0   0   0   0
> TX: bytes  packets  errors  dropped carrier collsns
> 3509   24   0   0   0   0
> 17: v0@v0p:  mtu 1500 qdisc noqueue state UP 
> mode DEFAULT group default qlen 1000
> link/ether 52:a2:34:f6:7c:ec brd ff:ff:ff:ff:ff:ff
> RX: bytes  packets  errors  dropped overrun mcast
> 3509   24   0   0   0   0
> TX: bytes  packets  errors  dropped carrier collsns
> 71660713017 469893555 0   0   0   0

These ghost packets never enter the IP stack, so I don't think TTL
helps.


Re: [PATCH net] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-24 Thread David Miller
From: Florian Fainelli 
Date: Sat, 24 Sep 2016 12:58:30 -0700

> There is already a commit:
> 
> Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"
> 
> which should make this apply cleanly to "net" now.

But look at net-next, it got re-added there.

This is going to be a bit of a merge hassle, and this is why I pushed
back on the other attempt to revert this thing.


Re: [PATCH net-next] gre: use nla_get_be32() to extract flowinfo

2016-09-24 Thread David Miller
From: Lance Richardson 
Date: Sat, 24 Sep 2016 14:01:04 -0400

> Eliminate a sparse endianness mismatch warning, use nla_get_be32() to
> extract a __be32 value instead of nla_get_u32().
> 
> Signed-off-by: Lance Richardson 

Applied.


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang
On Thu, Sep 22, 2016 at 10:11 PM, Shmulik Ladkani
 wrote:
> Was wondering why it's missing, googled a bit with no meaningful
> results, so speculated the following:
>
> Some time long ago, initial 'mirred' purpose was to facilitate ifb.
> Therefore 'egress redirect' was implemented. Jamal probably left the
> 'ingress' support for a later time :)
>
> One interesting usecase for 'ingress redirect' is creating "rx bouncing"
> construct (like macvlan/macvtap/ipvlan) but applied according to custom
> logic.

We have done this for our containers for a long time. We simply
redirect packets to the veth TX side, and they then flow to the veth RX
side.

One problem with using your code for us is that the RX side of the veth
is inside the container and not visible from outside. Perhaps we need an
extra parameter to specify the netns along with the device name/index?
Thoughts?

>
>> It may be around preventing loops maybe.
>
> Could be, but personally, I treat these constructs as (powerful)
> building blocks, and "with great power comes great responsibility".
>
> Even today, one may create loops using existing 'egress redirect',
> e.g. this rediculously errorneous construct:
>
>  # ip l add v0 type veth peer name v0p
>  # tc filter add dev v0p parent : basic \
> action mirred egress redirect dev v0

Detecting such loops should not be hard technically, like we do
for reclassification. We might need some bits in the skb to detect
this specific case. Anyway, I don't think it is a blocker; it just needs
more tests to catch some corner cases.

Thanks.


Re: [PATCH] Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

2016-09-24 Thread David Miller
From: Chris Roth 
Date: Sat, 24 Sep 2016 10:59:04 -0600

> Due to my lack of familiarity with the how git send-email works, I've
> unintentionally had my name listed as the first 'from' whereas I
> intended Allan Chou to be listed as the first 'from' in the patch. If
> anyone can correct this on my behalf, I would appreciate it.

Simply submit a 'v2' of this patch with this issue fixed.


Re: [PATCH net-next 4/4] net/sched: act_mirred: Implement ingress actions

2016-09-24 Thread Cong Wang
On Thu, Sep 22, 2016 at 6:21 AM, Shmulik Ladkani
 wrote:
> From: Shmulik Ladkani 
>
> Up until now, 'action mirred' supported only egress actions (either
> TCA_EGRESS_REDIR or TCA_EGRESS_MIRROR).
>
> This patch implements the corresponding ingress actions
> TCA_INGRESS_REDIR and TCA_INGRESS_MIRROR.
>
> This allows attaching filters whose target is to hand matching skbs into
> the rx processing of a specified device.

I like this idea. It actually came to my mind when we
tried to redirect packets from eth0 to a veth device in containers.
I remember I already brought this up with Jamal before (either personally
or publicly), but I forget why I stopped.

This would reduce some latency for our Mesos containers, as we
would skip one veth TX in our scenario.


[PATCH net-next 5/8] rxrpc: Delay the resend timer to allow for nsec->jiffies conv error

2016-09-24 Thread David Howells
When determining the resend timer value, we have a value in nsec but the
timer is in jiffies, which may be a million or more times coarser.
nsecs_to_jiffies() rounds down - which means that the resend timeout
expressed as jiffies is very likely earlier than the one expressed as
nanoseconds from which it was derived.

The problem is that rxrpc_resend() gets triggered by the timer, but can't
then find anything to resend yet.  It sets the timer again - but gets
kicked off immediately again and again until the nanosecond-based expiry
time is reached and we actually retransmit.

Fix this by adding 1 to the jiffies-based resend_at value to counteract the
rounding and make sure that the timer happens after the nanosecond-based
expiry is passed.

Alternatives would be to adjust the timestamp on the packets to align
with the jiffies scale or to switch back to using jiffies-based timestamps.
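
The rounding effect can be shown with a small user-space model. This is a
sketch under stated assumptions, not the kernel code: ns_to_ticks_floor()
is a hypothetical stand-in for a round-down nanosecond-to-tick conversion,
the tick rate hz is a free parameter, and the model ignores the fact that
the current tick count is itself quantized.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)

/* Round-down conversion from nanoseconds to timer ticks (hypothetical). */
static uint64_t ns_to_ticks_floor(uint64_t ns, unsigned int hz)
{
	assert(hz > 0);
	return ns / (NSEC_PER_SEC / hz);
}

/* Convert ticks back to nanoseconds to compare against the deadline. */
static uint64_t ticks_to_ns(uint64_t ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * (NSEC_PER_SEC / hz);
}

/*
 * Tick at which the timer fires for a delay given in nanoseconds.
 * With pad set, one extra tick is added so that the timer cannot
 * fire before the nanosecond deadline has passed.
 */
static uint64_t resend_tick(uint64_t now_tick, uint64_t delay_ns,
			    unsigned int hz, int pad)
{
	return now_tick + ns_to_ticks_floor(delay_ns, hz) + (pad ? 1 : 0);
}
```

For example, at 100 ticks per second a 25 ms delay rounds down to 2
ticks (20 ms), so an unpadded timer fires 5 ms early and finds nothing to
resend. With the extra tick it fires at 30 ms, after the deadline.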

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |   10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index a78a92fe5d77..d5bf9ce7ec6f 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -200,8 +200,14 @@ static void rxrpc_resend(struct rxrpc_call *call)
   ktime_to_ns(ktime_sub(skb->tstamp, 
max_age)));
}
 
-   resend_at = ktime_sub(ktime_add_ms(oldest, rxrpc_resend_timeout), now);
-   call->resend_at = jiffies + nsecs_to_jiffies(ktime_to_ns(resend_at));
+   resend_at = ktime_add_ms(oldest, rxrpc_resend_timeout);
+   call->resend_at = jiffies +
+   nsecs_to_jiffies(ktime_to_ns(ktime_sub(resend_at, now))) +
+   1; /* We have to make sure that the calculated jiffies value
+   * falls at or after the nsec value, or we shall loop
+   * ceaselessly because the timer times out, but we haven't
+   * reached the nsec timeout yet.
+   */
 
/* Now go through the Tx window and perform the retransmissions.  We
 * have to drop the lock for each send.  If an ACK comes in whilst the



[PATCH net-next 6/8] rxrpc: Generate a summary of the ACK state for later use

2016-09-24 Thread David Howells
Generate a summary of the Tx buffer packet state when an ACK is received
for use in a later patch that does congestion management.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |   14 ++
 net/rxrpc/input.c   |   45 ++---
 2 files changed, 48 insertions(+), 11 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index cdd35e2b40ba..1a700b6a998b 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -540,6 +540,20 @@ struct rxrpc_call {
 
/* transmission-phase ACK management */
rxrpc_serial_t  acks_latest;    /* serial number of latest ACK received */
+   rxrpc_seq_t acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */
+};
+
+/*
+ * Summary of a new ACK and the changes it made.
+ */
+struct rxrpc_ack_summary {
+   u8  ack_reason;
+   u8  nr_acks;            /* Number of ACKs in packet */
+   u8  nr_nacks;           /* Number of NACKs in packet */
+   u8  nr_new_acks;        /* Number of new ACKs in packet */
+   u8  nr_new_nacks;       /* Number of new NACKs in packet */
+   u8  nr_rot_new_acks;    /* Number of rotated new ACKs */
+   bool new_low_nack;      /* T if new low NACK found */
 };
 
 enum rxrpc_skb_trace {
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index bda11eb2ab2a..dd699667eeef 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -56,12 +56,20 @@ static void rxrpc_send_ping(struct rxrpc_call *call, struct sk_buff *skb,
 /*
  * Apply a hard ACK by advancing the Tx window.
  */
-static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
+static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to,
+  struct rxrpc_ack_summary *summary)
 {
struct sk_buff *skb, *list = NULL;
int ix;
u8 annotation;
 
+   if (call->acks_lowest_nak == call->tx_hard_ack) {
+   call->acks_lowest_nak = to;
+   } else if (before_eq(call->acks_lowest_nak, to)) {
+   summary->new_low_nack = true;
+   call->acks_lowest_nak = to;
+   }
+
spin_lock(&call->lock);
 
while (before(call->tx_hard_ack, to)) {
@@ -77,6 +85,8 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, rxrpc_seq_t to)
 
if (annotation & RXRPC_TX_ANNO_LAST)
set_bit(RXRPC_CALL_TX_LAST, &call->flags);
+   if ((annotation & RXRPC_TX_ANNO_MASK) != RXRPC_TX_ANNO_ACK)
+   summary->nr_rot_new_acks++;
}
 
spin_unlock(&call->lock);
@@ -147,6 +157,7 @@ bad_state:
  */
 static bool rxrpc_receiving_reply(struct rxrpc_call *call)
 {
+   struct rxrpc_ack_summary summary = { 0 };
rxrpc_seq_t top = READ_ONCE(call->tx_top);
 
if (call->ackr_reason) {
@@ -159,7 +170,7 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
}
 
if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags))
-   rxrpc_rotate_tx_window(call, top);
+   rxrpc_rotate_tx_window(call, top, &summary);
if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
rxrpc_proto_abort("TXL", call, top);
return false;
@@ -508,7 +519,8 @@ static void rxrpc_input_ackinfo(struct rxrpc_call *call, struct sk_buff *skb,
  * the time the ACK was sent.
  */
 static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
- rxrpc_seq_t seq, int nr_acks)
+ rxrpc_seq_t seq, int nr_acks,
+ struct rxrpc_ack_summary *summary)
 {
bool resend = false;
int ix;
@@ -521,14 +533,23 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, u8 *acks,
annotation &= ~RXRPC_TX_ANNO_MASK;
switch (*acks++) {
case RXRPC_ACK_TYPE_ACK:
+   summary->nr_acks++;
if (anno_type == RXRPC_TX_ANNO_ACK)
continue;
+   summary->nr_new_acks++;
call->rxtx_annotations[ix] =
RXRPC_TX_ANNO_ACK | annotation;
break;
case RXRPC_ACK_TYPE_NACK:
+   if (!summary->nr_nacks &&
+   call->acks_lowest_nak != seq) {
+   call->acks_lowest_nak = seq;
+   summary->new_low_nack = true;
+   }
+   summary->nr_nacks++;
if (anno_type == RXRPC_TX_ANNO_NAK)
continue;
+   summary->nr_new_nacks++;
if 

[PATCH net-next 8/8] rxrpc: Implement slow-start

2016-09-24 Thread David Howells
Implement RxRPC slow-start, which is similar to RFC 5681 for TCP.  A
tracepoint is added to log the state of the congestion management algorithm
and the decisions it makes.

Notes:

 (1) Since we send fixed-size DATA packets (apart from the final packet in
 each phase), counters and calculations are in terms of packets rather
 than bytes.

 (2) The ACK packet carries the equivalent of TCP SACK.

 (3) The FLIGHT_SIZE calculation in RFC 5681 doesn't seem particularly
 suited to SACK of a small number of packets.  It seems that, almost
 inevitably, by the time three 'duplicate' ACKs have been seen, we have
 narrowed the loss down to one or two missing packets, and the
 FLIGHT_SIZE calculation ends up as 2.

 (4) In rxrpc_resend(), if there was no data that apparently needed
 retransmission, we transmit a PING ACK to ask the peer to tell us what
 its Rx window state is.
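
As background for note (1), here is a minimal packet-counted sketch of
the RFC 5681 window rules: slow start adds one packet per ACK of new
data, congestion avoidance adds roughly one packet per window of ACKs,
and a retransmission timeout sets ssthresh to max(FlightSize/2, 2) and
cwnd to 1. This is not the algorithm in this patch. The struct
cong_state and the functions cong_on_ack() and cong_on_timeout() are
hypothetical names used for illustration only.

```c
#include <assert.h>

enum cong_mode { SLOW_START, CONGEST_AVOIDANCE };

/* Hypothetical per-call congestion state, counted in packets. */
struct cong_state {
	enum cong_mode mode;
	unsigned int cwnd;            /* congestion window */
	unsigned int ssthresh;        /* slow-start threshold */
	unsigned int acks_in_window;  /* ACKs seen since last cwnd bump */
};

/* Handle one ACK that acknowledges new data. */
static void cong_on_ack(struct cong_state *s)
{
	assert(s->cwnd > 0);

	if (s->mode == SLOW_START) {
		/* Slow start: one more packet per ACK. */
		s->cwnd++;
		if (s->cwnd >= s->ssthresh) {
			s->mode = CONGEST_AVOIDANCE;
			s->acks_in_window = 0;
		}
		return;
	}

	/* Congestion avoidance: one more packet per cwnd ACKs. */
	if (++s->acks_in_window >= s->cwnd) {
		s->acks_in_window = 0;
		s->cwnd++;
	}
}

/* Handle a retransmission timeout with in_flight packets outstanding. */
static void cong_on_timeout(struct cong_state *s, unsigned int in_flight)
{
	unsigned int half = in_flight / 2;

	s->ssthresh = half > 2 ? half : 2;
	s->cwnd = 1;
	s->acks_in_window = 0;
	s->mode = SLOW_START;
}
```

Starting from cwnd=1 and ssthresh=4, three ACKs grow the window to 4 and
switch to congestion avoidance; four more ACKs are then needed to reach 5.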

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   45 +++
 net/rxrpc/ar-internal.h  |   53 +
 net/rxrpc/call_event.c   |   36 -
 net/rxrpc/call_object.c  |   13 +++
 net/rxrpc/conn_event.c   |1 
 net/rxrpc/input.c|  169 +-
 net/rxrpc/misc.c |   19 +
 net/rxrpc/output.c   |9 ++
 net/rxrpc/sendmsg.c  |7 +-
 9 files changed, 339 insertions(+), 13 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 56475497043d..ada12d00118c 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -570,6 +570,51 @@ TRACE_EVENT(rxrpc_retransmit,
  __entry->expiry)
);
 
+TRACE_EVENT(rxrpc_congest,
+   TP_PROTO(struct rxrpc_call *call, struct rxrpc_ack_summary *summary,
+rxrpc_serial_t ack_serial, enum rxrpc_congest_change change),
+
+   TP_ARGS(call, summary, ack_serial, change),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,            call)
+   __field(enum rxrpc_congest_change,      change)
+   __field(rxrpc_seq_t,                    hard_ack)
+   __field(rxrpc_seq_t,                    top)
+   __field(rxrpc_seq_t,                    lowest_nak)
+   __field(rxrpc_serial_t,                 ack_serial)
+   __field_struct(struct rxrpc_ack_summary, sum)
+),
+
+   TP_fast_assign(
+   __entry->call   = call;
+   __entry->change = change;
+   __entry->hard_ack   = call->tx_hard_ack;
+   __entry->top= call->tx_top;
+   __entry->lowest_nak = call->acks_lowest_nak;
+   __entry->ack_serial = ack_serial;
+   memcpy(&__entry->sum, summary, sizeof(__entry->sum));
+  ),
+
+   TP_printk("c=%p %08x %s %08x %s cw=%u ss=%u nr=%u,%u nw=%u,%u r=%u b=%u u=%u d=%u l=%x%s%s%s",
+ __entry->call,
+ __entry->ack_serial,
+ rxrpc_ack_names[__entry->sum.ack_reason],
+ __entry->hard_ack,
+ rxrpc_congest_modes[__entry->sum.mode],
+ __entry->sum.cwnd,
+ __entry->sum.ssthresh,
+ __entry->sum.nr_acks, __entry->sum.nr_nacks,
+ __entry->sum.nr_new_acks, __entry->sum.nr_new_nacks,
+ __entry->sum.nr_rot_new_acks,
+ __entry->top - __entry->hard_ack,
+ __entry->sum.cumulative_acks,
+ __entry->sum.dup_acks,
+ __entry->lowest_nak, __entry->sum.new_low_nack ? "!" : "",
+ rxrpc_congest_changes[__entry->change],
+ __entry->sum.retrans_timeo ? " rTxTo" : "")
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index b1e697fc9ffb..ca96e547cb9a 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -402,6 +402,7 @@ enum rxrpc_call_flag {
RXRPC_CALL_RX_LAST, /* Received the last packet (at rxtx_top) */
RXRPC_CALL_TX_LAST, /* Last packet in Tx buffer (at rxtx_top) */
RXRPC_CALL_PINGING, /* Ping in process */
+   RXRPC_CALL_RETRANS_TIMEOUT, /* Retransmission due to timeout occurred */
 };
 
 /*
@@ -447,6 +448,17 @@ enum rxrpc_call_completion {
 };
 
 /*
+ * Call Tx congestion management modes.
+ */
+enum rxrpc_congest_mode {
+   RXRPC_CALL_SLOW_START,
+   RXRPC_CALL_CONGEST_AVOIDANCE,
+   RXRPC_CALL_PACKET_LOSS,
+   

[PATCH net-next 2/8] rxrpc: Send an immediate ACK if we fill in a hole

2016-09-24 Thread David Howells
Send an immediate ACK if we fill in a hole in the buffer left by an
out-of-sequence packet.  This may allow the congestion management in the peer
to avoid a retransmission if packets got reordered on the wire.
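
The hole-filling test relies on wrap-safe sequence comparison. The
following user-space sketch shows the idea. The helpers seq_before() and
seq_after() and the classify_rx() function are hypothetical
illustrations, not the kernel's definitions, and the sketch assumes that
duplicates have already been rejected before this point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Wrap-safe comparisons on 32-bit sequence numbers: the difference is
 * reinterpreted as signed (two's complement conversion assumed), so
 * ordering survives wrap-around as long as the two values are less
 * than 2^31 apart.
 */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) > 0;
}

enum rx_action {
	RX_ADVANCE_TOP,   /* new highest packet: move the top marker */
	RX_FILLED_HOLE,   /* packet below top: a gap was filled, ACK now */
	RX_AT_TOP         /* same sequence number as the current top */
};

/* Classify a newly stored packet relative to the highest one seen. */
static enum rx_action classify_rx(uint32_t seq, uint32_t *rx_top)
{
	assert(rx_top != 0);

	if (seq_after(seq, *rx_top)) {
		*rx_top = seq;
		return RX_ADVANCE_TOP;
	}
	if (seq_before(seq, *rx_top))
		return RX_FILLED_HOLE;
	return RX_AT_TOP;
}
```

If packets 10 and 12 arrive before 11, packet 12 advances the top and
packet 11 is then classified as filling a hole, which is the case where
an immediate ACK lets the sender skip a retransmission.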

Signed-off-by: David Howells 
---

 net/rxrpc/input.c |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 349698d87ad1..757c16f033a0 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -331,8 +331,16 @@ next_subpacket:
call->rxtx_annotations[ix] = annotation;
smp_wmb();
call->rxtx_buffer[ix] = skb;
-   if (after(seq, call->rx_top))
+   if (after(seq, call->rx_top)) {
smp_store_release(&call->rx_top, seq);
+   } else if (before(seq, call->rx_top)) {
+   /* Send an immediate ACK if we fill in a hole */
+   if (!ack) {
+   ack = RXRPC_ACK_DELAY;
+   ack_serial = serial;
+   }
+   immediate_ack = true;
+   }
if (flags & RXRPC_LAST_PACKET) {
set_bit(RXRPC_CALL_RX_LAST, &call->flags);
trace_rxrpc_receive(call, rxrpc_receive_queue_last, serial, seq);



[PATCH net-next 3/8] rxrpc: Include the last reply DATA serial number in the final ACK

2016-09-24 Thread David Howells
In a client call, include the serial number of the last DATA packet of the
reply in the final ACK.

Signed-off-by: David Howells 
---

 net/rxrpc/recvmsg.c |6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index a7458c398b9e..038ae62ddb4d 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -133,7 +133,7 @@ static int rxrpc_recvmsg_new_call(struct rxrpc_sock *rx,
 /*
  * End the packet reception phase.
  */
-static void rxrpc_end_rx_phase(struct rxrpc_call *call)
+static void rxrpc_end_rx_phase(struct rxrpc_call *call, rxrpc_serial_t serial)
 {
_enter("%d,%s", call->debug_id, rxrpc_call_states[call->state]);
 
@@ -141,7 +141,7 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call)
ASSERTCMP(call->rx_hard_ack, ==, call->rx_top);
 
if (call->state == RXRPC_CALL_CLIENT_RECV_REPLY) {
-   rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, true, false,
+   rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, serial, true, false,
  rxrpc_propose_ack_terminal_ack);
rxrpc_send_call_packet(call, RXRPC_PACKET_TYPE_ACK);
}
@@ -202,7 +202,7 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
_debug("%u,%u,%02x", hard_ack, top, flags);
trace_rxrpc_receive(call, rxrpc_receive_rotate, serial, hard_ack);
if (flags & RXRPC_LAST_PACKET) {
-   rxrpc_end_rx_phase(call);
+   rxrpc_end_rx_phase(call, serial);
} else {
/* Check to see if there's an ACK that needs sending. */
if (after_eq(hard_ack, call->ackr_consumed + 2) ||



[PATCH net-next 1/8] rxrpc: Send an ACK after every few DATA packets we receive

2016-09-24 Thread David Howells
Send an ACK if we haven't sent one for the last two packets we've received.
This keeps the other end apprised of where we've got to - which is
important if they're doing slow-start.

We do this in recvmsg so that we can dispatch a packet directly without the
need to wake up the background thread.

This should possibly be made configurable in future.
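
The "every two packets" rule can be sketched as a counter comparison.
This is an illustrative user-space model only: struct rx_ack_state,
seq_after_eq() and consume_packet() are hypothetical names, and the
threshold of 2 is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is at or after b" on 32-bit sequence numbers. */
static bool seq_after_eq(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Hypothetical receive-side state for deciding when to ACK. */
struct rx_ack_state {
	uint32_t hard_ack;        /* highest packet consumed so far */
	uint32_t acked_consumed;  /* highest consumed packet already ACKed */
};

/*
 * Consume one packet and report whether an ACK should be sent now,
 * i.e. whether two or more packets were consumed since the last ACK.
 */
static bool consume_packet(struct rx_ack_state *s)
{
	assert(s != 0);

	s->hard_ack++;
	if (seq_after_eq(s->hard_ack, s->acked_consumed + 2)) {
		s->acked_consumed = s->hard_ack;
		return true;
	}
	return false;
}
```

Calling consume_packet() repeatedly returns false, true, false, true and
so on, so the peer hears about progress after every second packet.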

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |3 +++
 net/rxrpc/misc.c|1 +
 net/rxrpc/output.c  |   25 +
 net/rxrpc/recvmsg.c |   13 -
 4 files changed, 33 insertions(+), 9 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 042dbcc52654..e3bf9c0e3ad1 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -533,6 +533,8 @@ struct rxrpc_call {
u16 ackr_skew;  /* skew on packet being ACK'd */
rxrpc_serial_t  ackr_serial; /* serial of packet being ACK'd */
rxrpc_seq_t ackr_prev_seq;  /* previous sequence number received */
+   rxrpc_seq_t ackr_consumed;  /* Highest packet shown consumed */
+   rxrpc_seq_t ackr_seen;  /* Highest packet shown seen */
rxrpc_serial_t  ackr_ping;  /* Last ping sent */
ktime_t ackr_ping_time; /* Time last ping sent */
 
@@ -695,6 +697,7 @@ enum rxrpc_propose_ack_trace {
rxrpc_propose_ack_respond_to_ack,
rxrpc_propose_ack_respond_to_ping,
rxrpc_propose_ack_retry_tx,
+   rxrpc_propose_ack_rotate_rx,
rxrpc_propose_ack_terminal_ack,
rxrpc_propose_ack__nr_trace
 };
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index 1ca14835d87f..a473fd7dabaa 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -202,6 +202,7 @@ const char rxrpc_propose_ack_traces[rxrpc_propose_ack__nr_trace][8] = {
[rxrpc_propose_ack_respond_to_ack]  = "Rsp2Ack",
[rxrpc_propose_ack_respond_to_ping] = "Rsp2Png",
[rxrpc_propose_ack_retry_tx]= "RetryTx",
+   [rxrpc_propose_ack_rotate_rx]   = "RxAck  ",
[rxrpc_propose_ack_terminal_ack]= "ClTerm ",
 };
 
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 0c563e325c9d..3eb01445e814 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -36,7 +36,9 @@ struct rxrpc_pkt_buffer {
  * Fill out an ACK packet.
  */
 static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
-struct rxrpc_pkt_buffer *pkt)
+struct rxrpc_pkt_buffer *pkt,
+rxrpc_seq_t *_hard_ack,
+rxrpc_seq_t *_top)
 {
rxrpc_serial_t serial;
rxrpc_seq_t hard_ack, top, seq;
@@ -48,6 +50,8 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
serial = call->ackr_serial;
hard_ack = READ_ONCE(call->rx_hard_ack);
top = smp_load_acquire(&call->rx_top);
+   *_hard_ack = hard_ack;
+   *_top = top;
 
pkt->ack.bufferSpace= htons(8);
pkt->ack.maxSkew= htons(call->ackr_skew);
@@ -96,6 +100,7 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
struct msghdr msg;
struct kvec iov[2];
rxrpc_serial_t serial;
+   rxrpc_seq_t hard_ack, top;
size_t len, n;
bool ping = false;
int ioc, ret;
@@ -146,7 +151,7 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
goto out;
}
ping = (call->ackr_reason == RXRPC_ACK_PING);
-   n = rxrpc_fill_out_ack(call, pkt);
+   n = rxrpc_fill_out_ack(call, pkt, &hard_ack, &top);
call->ackr_reason = 0;
 
spin_unlock_bh(&call->lock);
@@ -203,18 +208,22 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
if (ping)
call->ackr_ping_time = ktime_get_real();
 
-   if (ret < 0 && call->state < RXRPC_CALL_COMPLETE) {
-   switch (type) {
-   case RXRPC_PACKET_TYPE_ACK:
+   if (type == RXRPC_PACKET_TYPE_ACK &&
+   call->state < RXRPC_CALL_COMPLETE) {
+   if (ret < 0) {
clear_bit(RXRPC_CALL_PINGING, &call->flags);
rxrpc_propose_ACK(call, pkt->ack.reason,
  ntohs(pkt->ack.maxSkew),
  ntohl(pkt->ack.serial),
  true, true,
  rxrpc_propose_ack_retry_tx);
-   break;
-   case RXRPC_PACKET_TYPE_ABORT:
-   break;
+   } else {
+   spin_lock_bh(&call->lock);
+   if (after(hard_ack, call->ackr_consumed))
+   call->ackr_consumed = hard_ack;
+   if (after(top, 

[PATCH net-next 7/8] rxrpc: Schedule an ACK if the reply to a client call appears overdue

2016-09-24 Thread David Howells
If we've sent all the request data in a client call but haven't seen any
sign of the reply data yet, schedule an ACK to be sent to the server to
find out if the reply data got lost.

If the server hasn't yet hard-ACK'd the request data, we send a PING ACK to
demand a response to find out whether we need to retransmit.

If the server says it has received all of the data, we send an IDLE ACK to
tell the server that we haven't received anything in the receive phase as
yet.

To make this work, a non-immediate PING ACK must carry a delay.  I've chosen
the same as the IDLE ACK for the moment.
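
To make the decision above concrete, here is a simplified userspace sketch; it is an assumption-laden model, not the patch's code. The enum probe_ack and the functions choose_probe_ack() and probe_ack_expiry() are hypothetical names. The switch mirrors the call_event.c hunk, where the PING case falls through to the IDLE case so that both share the idle ACK delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reasons for the "is the reply lost?" probe. */
enum probe_ack {
	PROBE_ACK_PING,	/* request not fully hard-ACK'd: demand a response */
	PROBE_ACK_IDLE,	/* request fully received: report nothing seen yet */
};

/* Pick the probe type from what the server has acknowledged so far. */
static enum probe_ack choose_probe_ack(bool request_fully_hard_acked)
{
	return request_fully_hard_acked ? PROBE_ACK_IDLE : PROBE_ACK_PING;
}

/*
 * A non-immediate PING uses the same delay as IDLE: the expiry is
 * shortened to the idle delay if that is sooner, never lengthened.
 */
static unsigned long probe_ack_expiry(enum probe_ack reason,
				      unsigned long expiry,
				      unsigned long idle_delay)
{
	assert(reason == PROBE_ACK_PING || reason == PROBE_ACK_IDLE);

	switch (reason) {
	case PROBE_ACK_PING:
	case PROBE_ACK_IDLE:
		if (idle_delay < expiry)
			expiry = idle_delay;
		break;
	}
	return expiry;
}
```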

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |2 ++
 net/rxrpc/call_event.c  |1 +
 net/rxrpc/input.c   |8 
 net/rxrpc/misc.c|2 ++
 4 files changed, 13 insertions(+)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 1a700b6a998b..b1e697fc9ffb 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -707,7 +707,9 @@ enum rxrpc_timer_trace {
 extern const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8];
 
 enum rxrpc_propose_ack_trace {
+   rxrpc_propose_ack_client_tx_end,
rxrpc_propose_ack_input_data,
+   rxrpc_propose_ack_ping_for_lost_reply,
rxrpc_propose_ack_ping_for_params,
rxrpc_propose_ack_respond_to_ack,
rxrpc_propose_ack_respond_to_ping,
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index d5bf9ce7ec6f..05b94d1acf52 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -100,6 +100,7 @@ static void __rxrpc_propose_ACK(struct rxrpc_call *call, u8 ack_reason,
expiry = rxrpc_soft_ack_delay;
break;
 
+   case RXRPC_ACK_PING:
case RXRPC_ACK_IDLE:
if (rxrpc_idle_ack_delay < expiry)
expiry = rxrpc_idle_ack_delay;
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index dd699667eeef..0344f4494eb7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -138,6 +138,8 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, bool reply_begun,
 
write_unlock(&call->state_lock);
if (call->state == RXRPC_CALL_CLIENT_AWAIT_REPLY) {
+   rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, false, true,
+ rxrpc_propose_ack_client_tx_end);
trace_rxrpc_transmit(call, rxrpc_transmit_await_reply);
} else {
trace_rxrpc_transmit(call, rxrpc_transmit_end);
@@ -684,6 +686,12 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb,
return;
}
 
+   if (call->rxtx_annotations[call->tx_top & RXRPC_RXTX_BUFF_MASK] &
+   RXRPC_TX_ANNO_LAST &&
+   summary.nr_acks == call->tx_top - hard_ack)
+   rxrpc_propose_ACK(call, RXRPC_ACK_PING, skew, sp->hdr.serial,
+ false, true,
+ rxrpc_propose_ack_ping_for_lost_reply);
 }
 
 /*
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index 901c012a2700..a608769343e6 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -198,7 +198,9 @@ const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8] = {
 };
 
 const char rxrpc_propose_ack_traces[rxrpc_propose_ack__nr_trace][8] = {
+   [rxrpc_propose_ack_client_tx_end]   = "ClTxEnd",
[rxrpc_propose_ack_input_data]  = "DataIn ",
+   [rxrpc_propose_ack_ping_for_lost_reply] = "LostRpl",
[rxrpc_propose_ack_ping_for_params] = "Params ",
[rxrpc_propose_ack_respond_to_ack]  = "Rsp2Ack",
[rxrpc_propose_ack_respond_to_ping] = "Rsp2Png",



[PATCH net-next 4/8] rxrpc: Reinitialise the call ACK and timer state for client reply phase

2016-09-24 Thread David Howells
Clear the ACK reason, ACK timer and resend timer when entering the client
reply phase when the first DATA packet is received.  New ACKs will be
proposed once the data is queued.

The resend timer is no longer relevant and we need to cancel ACKs scheduled
to probe for a lost reply.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |1 +
 net/rxrpc/input.c   |9 +
 net/rxrpc/misc.c|1 +
 3 files changed, 11 insertions(+)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e3bf9c0e3ad1..cdd35e2b40ba 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -682,6 +682,7 @@ extern const char rxrpc_rtt_rx_traces[rxrpc_rtt_rx__nr_trace][5];
 
 enum rxrpc_timer_trace {
rxrpc_timer_begin,
+   rxrpc_timer_init_for_reply,
rxrpc_timer_expired,
rxrpc_timer_set_for_ack,
rxrpc_timer_set_for_resend,
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 757c16f033a0..bda11eb2ab2a 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -149,6 +149,15 @@ static bool rxrpc_receiving_reply(struct rxrpc_call *call)
 {
rxrpc_seq_t top = READ_ONCE(call->tx_top);
 
+   if (call->ackr_reason) {
+   spin_lock_bh(&call->lock);
+   call->ackr_reason = 0;
+   call->resend_at = call->expire_at;
+   call->ack_at = call->expire_at;
+   spin_unlock_bh(&call->lock);
+   rxrpc_set_timer(call, rxrpc_timer_init_for_reply);
+   }
+
if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags))
rxrpc_rotate_tx_window(call, top);
if (!test_bit(RXRPC_CALL_TX_LAST, &call->flags)) {
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index a473fd7dabaa..901c012a2700 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -191,6 +191,7 @@ const char rxrpc_rtt_rx_traces[rxrpc_rtt_rx__nr_trace][5] = {
 const char rxrpc_timer_traces[rxrpc_timer__nr_trace][8] = {
[rxrpc_timer_begin] = "Begin ",
[rxrpc_timer_expired]   = "*EXPR*",
+   [rxrpc_timer_init_for_reply]= "IniRpl",
[rxrpc_timer_set_for_ack]   = "SetAck",
[rxrpc_timer_set_for_send]  = "SetTx ",
[rxrpc_timer_set_for_resend]= "SetRTx",



[PATCH v5 07/16] IB/pvrdma: Add helper functions

2016-09-24 Thread Adit Ranadive
This patch adds helper functions to store guest page addresses in a page
directory structure. The page directory pointer is passed down to the
backend which then maps the entire memory for the RDMA object by
traversing the directory. We add some more helper functions for converting
to/from RDMA stack address handles from/to PVRDMA ones.
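
The two-level layout described above can be sketched in userspace as follows. This is only an illustration under stated assumptions: it assumes each page table is one 4096-byte page holding 512 64-bit entries, and the names ENTRIES_PER_TABLE, table_index(), slot_index() and tables_needed() are hypothetical, not the driver's macros. The tables_needed() arithmetic mirrors the "PVRDMA_PAGE_DIR_TABLE(npages - 1) + 1" expression in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 4096-byte table page holds 512 8-byte page addresses. */
#define ENTRIES_PER_TABLE 512u

/* Which page table holds the entry for page number 'page'. */
static uint64_t table_index(uint64_t page)
{
	return page / ENTRIES_PER_TABLE;
}

/* Which slot inside that table holds the entry. */
static uint64_t slot_index(uint64_t page)
{
	return page % ENTRIES_PER_TABLE;
}

/* Number of page tables needed to describe 'npages' pages. */
static uint64_t tables_needed(uint64_t npages)
{
	assert(npages > 0);	/* npages - 1 would wrap for zero */
	return table_index(npages - 1) + 1;
}
```

With these assumptions, 512 pages fit in one table and the 513th page starts a second one.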

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Correct var type passed to dma_alloc_coherent.

Changes v3->v4:
 - Updated conversion functions to func_name(dst, src) format.
 - Removed unneeded local variables.
---
 drivers/infiniband/hw/pvrdma/pvrdma_misc.c | 304 +
 1 file changed, 304 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_misc.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_misc.c b/drivers/infiniband/hw/pvrdma/pvrdma_misc.c
new file mode 100644
index 000..948b5cc
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_misc.c
@@ -0,0 +1,304 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+int pvrdma_page_dir_init(struct pvrdma_dev *dev, struct pvrdma_page_dir *pdir,
+u64 npages, bool alloc_pages)
+{
+   u64 i;
+
+   if (npages > PVRDMA_PAGE_DIR_MAX_PAGES)
+   return -EINVAL;
+
+   memset(pdir, 0, sizeof(*pdir));
+
+   pdir->dir = dma_alloc_coherent(&dev->pdev->dev, PAGE_SIZE,
+  &pdir->dir_dma, GFP_KERNEL);
+   if (!pdir->dir)
+   goto err;
+
+   pdir->ntables = PVRDMA_PAGE_DIR_TABLE(npages - 1) + 1;
+   pdir->tables = kcalloc(pdir->ntables, sizeof(*pdir->tables),
+  GFP_KERNEL);
+   if (!pdir->tables)
+   goto err;
+
+   for (i = 0; i < pdir->ntables; i++) {
+   pdir->tables[i] = dma_alloc_coherent(&dev->pdev->dev, PAGE_SIZE,
+   (dma_addr_t *)&pdir->dir[i],
+   GFP_KERNEL);
+   if (!pdir->tables[i])
+   goto err;
+   }
+
+   pdir->npages = npages;
+
+   if (alloc_pages) {
+   pdir->pages = kcalloc(npages, sizeof(*pdir->pages),
+ GFP_KERNEL);
+   if (!pdir->pages)
+   goto err;
+
+   for (i = 0; i < pdir->npages; i++) {
+   dma_addr_t page_dma;
+
+   pdir->pages[i] = dma_alloc_coherent(&dev->pdev->dev,
+

[PATCH v5 10/16] IB/pvrdma: Add UAR support

2016-09-24 Thread Adit Ranadive
This patch adds the UAR support for the paravirtual RDMA device. The UAR
pages are MMIO pages from the virtual PCI space. We define offsets within
this page to provide the fast data-path operations.
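
As a small illustration of how a UAR index maps to an MMIO page, the sketch below models the page frame number arithmetic in userspace. It assumes 4 KiB pages and a page-aligned BAR start address; SKETCH_PAGE_SHIFT and uar_pfn() are hypothetical names. The expression follows the "uar->pfn" computation in pvrdma_uar_alloc() in the diff below.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Page frame number of the UAR page with the given index, where
 * bar_start is the physical start address of the UAR PCI resource.
 */
static uint64_t uar_pfn(uint64_t bar_start, uint32_t index)
{
	/* The sketch assumes the BAR starts on a page boundary. */
	assert((bar_start & ((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1)) == 0);

	return (bar_start >> SKETCH_PAGE_SHIFT) + index;
}
```

For example, with a BAR at 0x10000000, UAR index 3 lands on page frame 0x10003.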

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v3->v4:
 - Removed an unnecessary comment.

Changes v2->v3:
 - Used is_power_of_2 function.
 - Simplify pvrdma_uar_alloc function.
---
 drivers/infiniband/hw/pvrdma/pvrdma_doorbell.c | 127 +
 1 file changed, 127 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_doorbell.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_doorbell.c b/drivers/infiniband/hw/pvrdma/pvrdma_doorbell.c
new file mode 100644
index 000..bf51357
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_doorbell.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+int pvrdma_uar_table_init(struct pvrdma_dev *dev)
+{
+   u32 num = dev->dsr->caps.max_uar;
+   u32 mask = num - 1;
+   struct pvrdma_id_table *tbl = &dev->uar_table.tbl;
+
+   if (!is_power_of_2(num))
+   return -EINVAL;
+
+   tbl->last = 0;
+   tbl->top = 0;
+   tbl->max = num;
+   tbl->mask = mask;
+   spin_lock_init(&tbl->lock);
+   tbl->table = kcalloc(BITS_TO_LONGS(num), sizeof(long), GFP_KERNEL);
+   if (!tbl->table)
+   return -ENOMEM;
+
+   /* 0th UAR is taken by the device. */
+   set_bit(0, tbl->table);
+
+   return 0;
+}
+
+void pvrdma_uar_table_cleanup(struct pvrdma_dev *dev)
+{
+   struct pvrdma_id_table *tbl = &dev->uar_table.tbl;
+
+   kfree(tbl->table);
+}
+
+int pvrdma_uar_alloc(struct pvrdma_dev *dev, struct pvrdma_uar_map *uar)
+{
+   struct pvrdma_id_table *tbl;
+   unsigned long flags;
+   u32 obj;
+
+   tbl = &dev->uar_table.tbl;
+
+   spin_lock_irqsave(&tbl->lock, flags);
+   obj = find_next_zero_bit(tbl->table, tbl->max, tbl->last);
+   if (obj >= tbl->max) {
+   tbl->top = (tbl->top + tbl->max) & tbl->mask;
+   obj = find_first_zero_bit(tbl->table, tbl->max);
+   }
+
+   if (obj >= tbl->max) {
+   spin_unlock_irqrestore(&tbl->lock, flags);
+   return -ENOMEM;
+   }
+
+   set_bit(obj, tbl->table);
+   obj |= tbl->top;
+
+   spin_unlock_irqrestore(&tbl->lock, flags);
+
+   uar->index = obj;
+   uar->pfn = (pci_resource_start(dev->pdev, PVRDMA_PCI_RESOURCE_UAR) >>
+   PAGE_SHIFT) + uar->index;
+
+   return 0;
+}
+
+void 

[PATCH v5 09/16] IB/pvrdma: Add support for Completion Queues

2016-09-24 Thread Adit Ranadive
This patch adds the support for creating and destroying completion queues
on the paravirtual RDMA device.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Updated include for headers in UAPI folder.
 - Changed from EINVAL to ENOMEM if atomic add fails.
 - Added error code if destroy cq command failed.
 - Update to pvrdma_cmd_post for creating/destroying CQ.

Changes v3->v4:
 - Added a pvrdma_destroy_cq in the error path.
 - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we need a lock to
 be held while calling this.
 - Updated to use wrapper for UAR write for CQ.
 - Ensure that poll_cq does not return error values.

Changes v2->v3:
 - Removed boolean from pvrdma_cmd_post.
 - Return -EAGAIN if qp retrieved from CQE is bogus.
 - Check for invalid index of ring.
---
 drivers/infiniband/hw/pvrdma/pvrdma_cq.c | 426 +++
 1 file changed, 426 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_cq.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/pvrdma/pvrdma_cq.c
new file mode 100644
index 000..f26d4bc
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_cq.c
@@ -0,0 +1,426 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+/**
+ * pvrdma_req_notify_cq - request notification for a completion queue
+ * @ibcq: the completion queue
+ * @notify_flags: notification flags
+ *
+ * @return: 0 for success.
+ */
+int pvrdma_req_notify_cq(struct ib_cq *ibcq,
+enum ib_cq_notify_flags notify_flags)
+{
+   struct pvrdma_dev *dev = to_vdev(ibcq->device);
+   struct pvrdma_cq *cq = to_vcq(ibcq);
+   u32 val = cq->cq_handle;
+
+   val |= (notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
+   PVRDMA_UAR_CQ_ARM_SOL : PVRDMA_UAR_CQ_ARM;
+
+   pvrdma_write_uar_cq(dev, val);
+
+   return 0;
+}
+
+/**
+ * pvrdma_create_cq - create completion queue
+ * @ibdev: the device
+ * @attr: completion queue attributes
+ * @context: user context
+ * @udata: user data
+ *
+ * @return: ib_cq completion queue pointer on success,
+ *  otherwise returns negative errno.
+ */
+struct ib_cq *pvrdma_create_cq(struct ib_device *ibdev,
+  const struct ib_cq_init_attr *attr,
+  struct ib_ucontext *context,
+  struct ib_udata *udata)
+{
+   int entries = attr->cqe;
+   struct pvrdma_dev *dev = 

[PATCH v5 13/16] IB/pvrdma: Add the main driver module for PVRDMA

2016-09-24 Thread Adit Ranadive
This patch adds support for registering an RDMA device with the kernel RDMA
stack and adds the kernel module itself. It also initializes the underlying
virtual PCI device.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Removed two unnecessary lines.
 - Updated include for headers in UAPI folder.
 - Update to pvrdma_cmd_post for add/delete GIDs.
 - Add error code in dev_warn if pvrdma_cmd_post failed.

Changes v3->v4:
 - Fixed some checkpatch warnings.
 - Added support for new get_dev_fw_str API.
 - Added event workqueue for netdevice events.
 - Restructured the pvrdma_pci_remove function a little bit.

Changes v2->v3:
 - Removed boolean in pvrdma_cmd_post.

Changes v1->v2:
 - Addressed 32-bit build errors
 - Cosmetic change to avoid if else in intr0_handler
 - Removed unnecessary return assignment.
---
 drivers/infiniband/hw/pvrdma/pvrdma_main.c | 1220 
 1 file changed, 1220 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_main.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_main.c b/drivers/infiniband/hw/pvrdma/pvrdma_main.c
new file mode 100644
index 000..94cbbb9
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_main.c
@@ -0,0 +1,1220 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+#define DRV_NAME   "pvrdma"
+#define DRV_VERSION    "1.0"
+#define DRV_RELDATE    "January 1, 2013"
+
+static const char pvrdma_version[] =
+   DRV_NAME ": PVRDMA InfiniBand driver v"
+   DRV_VERSION " (" DRV_RELDATE ")\n";
+
+static DEFINE_MUTEX(pvrdma_device_list_lock);
+static LIST_HEAD(pvrdma_device_list);
+static struct workqueue_struct *event_wq;
+
+static int pvrdma_add_gid(struct ib_device *ibdev,
+ u8 port_num,
+ unsigned int index,
+ const union ib_gid *gid,
+ const struct ib_gid_attr *attr,
+ void **context);
+static int pvrdma_del_gid(struct ib_device *ibdev,
+ u8 port_num,
+ unsigned int index,
+ void **context);
+
+
+static ssize_t show_hca(struct device *device, struct device_attribute *attr,
+   char *buf)
+{
+   return sprintf(buf, "PVRDMA%s\n", DRV_VERSION);
+}
+
+static ssize_t show_rev(struct device *device, struct device_attribute *attr,
+   char *buf)
+{
+   

[PATCH v5 05/16] IB/pvrdma: Add functions for Verbs support

2016-09-24 Thread Adit Ranadive
This patch implements the remaining Verbs functions registered with the
core RDMA stack.

Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Update include for headers in UAPI folder.
 - Removed setting any properties that are reported by device as 0.
 - Simplified modify_port.
 - PD should be allocated first in kernel then in device.
 - Update to pvrdma_cmd_post for creating/destroying PD, Query port/device.

Changes v3->v4:
 - Renamed priviledged -> privileged.
 - Added error numbers for command errors.
 - Removed unnecessary goto in modify_device.
 - Moved pd allocation to after command execution.
 - Removed an incorrect atomic_dec.
---
 drivers/infiniband/hw/pvrdma/pvrdma_verbs.c | 577 
 drivers/infiniband/hw/pvrdma/pvrdma_verbs.h | 108 ++
 2 files changed, 685 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_verbs.c
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_verbs.h

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_verbs.c b/drivers/infiniband/hw/pvrdma/pvrdma_verbs.c
new file mode 100644
index 000..a7aef93d
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_verbs.c
@@ -0,0 +1,577 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+/**
+ * pvrdma_query_device - query device
+ * @ibdev: the device to query
+ * @props: the device properties
+ * @uhw: user data
+ *
+ * @return: 0 on success, otherwise negative errno
+ */
+int pvrdma_query_device(struct ib_device *ibdev,
+   struct ib_device_attr *props,
+   struct ib_udata *uhw)
+{
+   struct pvrdma_dev *dev = to_vdev(ibdev);
+
+   if (uhw->inlen || uhw->outlen)
+   return -EINVAL;
+
+   memset(props, 0, sizeof(*props));
+
+   props->fw_ver = dev->dsr->caps.fw_ver;
+   props->sys_image_guid = dev->dsr->caps.sys_image_guid;
+   props->max_mr_size = dev->dsr->caps.max_mr_size;
+   props->page_size_cap = dev->dsr->caps.page_size_cap;
+   props->vendor_id = dev->dsr->caps.vendor_id;
+   props->vendor_part_id = dev->pdev->device;
+   props->hw_ver = dev->dsr->caps.hw_ver;
+   props->max_qp = dev->dsr->caps.max_qp;
+   props->max_qp_wr = dev->dsr->caps.max_qp_wr;
+   props->device_cap_flags = dev->dsr->caps.device_cap_flags;
+   props->max_sge = dev->dsr->caps.max_sge;
+   props->max_cq = dev->dsr->caps.max_cq;
+   props->max_cqe = dev->dsr->caps.max_cqe;
+   props->max_mr = dev->dsr->caps.max_mr;
+   

[PATCH v5 11/16] IB/pvrdma: Add support for memory regions

2016-09-24 Thread Adit Ranadive
This patch adds support for creating and destroying memory regions. The
PVRDMA device supports User MRs, DMA MRs (no Remote Read/Write support),
and Fast Register MRs.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Check the access flags correctly for DMA MR.
 - Update to pvrdma_cmd_post for creating/destroying MRs.

Changes v3->v4:
 - Changed access flag check for DMA MR to using bit operation.
 - Removed some local variables.

Changes v2->v3:
 - Removed boolean in pvrdma_cmd_post.
---
 drivers/infiniband/hw/pvrdma/pvrdma_mr.c | 334 +++
 1 file changed, 334 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_mr.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_mr.c b/drivers/infiniband/hw/pvrdma/pvrdma_mr.c
new file mode 100644
index 000..8519f32
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_mr.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+
+#include "pvrdma.h"
+
+/**
+ * pvrdma_get_dma_mr - get a DMA memory region
+ * @pd: protection domain
+ * @acc: access flags
+ *
+ * @return: ib_mr pointer on success, otherwise returns an errno.
+ */
+struct ib_mr *pvrdma_get_dma_mr(struct ib_pd *pd, int acc)
+{
+   struct pvrdma_dev *dev = to_vdev(pd->device);
+   struct pvrdma_user_mr *mr;
+   union pvrdma_cmd_req req;
+   union pvrdma_cmd_resp rsp;
+   struct pvrdma_cmd_create_mr *cmd = &req.create_mr;
+   struct pvrdma_cmd_create_mr_resp *resp = &rsp.create_mr_resp;
+   int ret;
+
+   /* Support only LOCAL_WRITE flag for DMA MRs */
+   if (acc & ~IB_ACCESS_LOCAL_WRITE) {
+   dev_warn(&dev->pdev->dev,
+"unsupported dma mr access flags %#x\n", acc);
+   return ERR_PTR(-EOPNOTSUPP);
+   }
+
+   mr = kzalloc(sizeof(*mr), GFP_KERNEL);
+   if (!mr)
+   return ERR_PTR(-ENOMEM);
+
+   memset(cmd, 0, sizeof(*cmd));
+   cmd->hdr.cmd = PVRDMA_CMD_CREATE_MR;
+   cmd->pd_handle = to_vpd(pd)->pd_handle;
+   cmd->access_flags = acc;
+   cmd->flags = PVRDMA_MR_FLAG_DMA;
+
+   ret = pvrdma_cmd_post(dev, &req, &rsp, PVRDMA_CMD_CREATE_MR_RESP);
+   if (ret < 0) {
+   dev_warn(&dev->pdev->dev,
+"could not get DMA mem region, error: %d\n", ret);
+   kfree(mr);
+   return ERR_PTR(ret);
+   }
+
+   mr->mmr.mr_handle = resp->mr_handle;
+   mr->ibmr.lkey = resp->lkey;
+   mr->ibmr.rkey = resp->rkey;
+
+   return &mr->ibmr;
+}
+
+/**
+ * pvrdma_reg_user_mr - 

[PATCH net-next 0/8] rxrpc: Implement slow-start and other bits

2016-09-24 Thread David Howells

This set of patches implements the RxRPC slow-start feature for AF_RXRPC to
improve performance and handling of occasional packet loss.  This is more or
less the same as TCP slow start [RFC 5681].  Firstly, there are some ACK
generation improvements:

 (1) Send ACKs regularly to apprise the peer of our state so that they can do
 congestion management of their own.

 (2) Send an ACK when we fill in a hole in the buffer so that the peer can
 find out that we did this thus forestalling retransmission.

 (3) Note the final DATA packet's serial number in the final ACK for
 correlation purposes.

and a couple of bug fixes:

 (4) Reinitialise the ACK state and clear the ACK and resend timers upon
 entering the client reply reception phase to kill off any pending probe
 ACKs.

 (5) Delay the resend timer to allow for nsec->jiffies conversion errors.

and then there's the slow-start pieces:

 (6) Summarise an ACK.

 (7) Schedule a PING or IDLE ACK if the reply to a client call is overdue to
 try and find out what happened to it.

 (8) Implement the slow start feature.
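
As a rough illustration of what item (8) refers to, the sketch below follows
the window growth rules of TCP slow start and congestion avoidance from
RFC 5681, counted in packets rather than bytes. It is not the AF_RXRPC
implementation; the struct, field and function names (cwnd_state,
cwnd_on_ack, cwnd_on_timeout) are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical congestion state, counted in whole packets. */
struct cwnd_state {
	uint32_t cwnd;     /* congestion window */
	uint32_t ssthresh; /* slow-start threshold */
	uint32_t acked;    /* packets ACKed since cwnd last grew (avoidance) */
};

static void cwnd_init(struct cwnd_state *s, uint32_t initial, uint32_t ssthresh)
{
	assert(initial > 0);
	s->cwnd = initial;
	s->ssthresh = ssthresh;
	s->acked = 0;
}

/* Called for each ACK that newly acknowledges nr_acked (>= 1) packets. */
static void cwnd_on_ack(struct cwnd_state *s, uint32_t nr_acked)
{
	if (s->cwnd < s->ssthresh) {
		/* Slow start: grow by one packet per ACK received. */
		s->cwnd++;
		return;
	}

	/* Congestion avoidance: grow by one packet per window's worth of ACKs. */
	s->acked += nr_acked;
	while (s->acked >= s->cwnd) {
		s->acked -= s->cwnd;
		s->cwnd++;
	}
}

/* Called on a retransmission timeout with in_flight packets outstanding. */
static void cwnd_on_timeout(struct cwnd_state *s, uint32_t in_flight)
{
	uint32_t half = in_flight / 2;

	s->ssthresh = half > 2 ? half : 2; /* max(FlightSize / 2, 2 packets) */
	s->cwnd = 1;                       /* restart from the loss window */
	s->acked = 0;
}
```

Starting from cwnd = 1 and ssthresh = 4, three ACKs take the window to 4;
after that it grows by one packet only once a full window has been
acknowledged.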

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160924

David
---
David Howells (8):
  rxrpc: Send an ACK after every few DATA packets we receive
  rxrpc: Send an immediate ACK if we fill in a hole
  rxrpc: Include the last reply DATA serial number in the final ACK
  rxrpc: Reinitialise the call ACK and timer state for client reply phase
  rxrpc: Delay the resend timer to allow for nsec->jiffies conv error
  rxrpc: Generate a summary of the ACK state for later use
  rxrpc: Schedule an ACK if the reply to a client call appears overdue
  rxrpc: Implement slow-start


 include/trace/events/rxrpc.h |   45 
 net/rxrpc/ar-internal.h  |   71 
 net/rxrpc/call_event.c   |   47 +++-
 net/rxrpc/call_object.c  |   13 ++
 net/rxrpc/conn_event.c   |1 
 net/rxrpc/input.c|  241 +++---
 net/rxrpc/misc.c |   23 
 net/rxrpc/output.c   |   34 --
 net/rxrpc/recvmsg.c  |   19 +++
 net/rxrpc/sendmsg.c  |7 +
 10 files changed, 463 insertions(+), 38 deletions(-)



[PATCH v5 03/16] IB/pvrdma: Add virtual device RDMA structures

2016-09-24 Thread Adit Ranadive
This patch adds the various Verbs structures that we support in the
virtual RDMA device. We have re-mapped the ones from the RDMA core stack
to make sure we can maintain compatibility with our backend.

Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Removed __ prefix for unsigned vars.

Changes v3->v4:
 - Moved the pvrdma_sge struct to pvrdma_uapi.h.
---
 drivers/infiniband/hw/pvrdma/pvrdma_ib_verbs.h | 444 +
 1 file changed, 444 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_ib_verbs.h

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_ib_verbs.h b/drivers/infiniband/hw/pvrdma/pvrdma_ib_verbs.h
new file mode 100644
index 000..290b6d8
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_ib_verbs.h
@@ -0,0 +1,444 @@
+/*
+ * [PLEASE NOTE:  VMWARE, INC. ELECTS TO USE AND DISTRIBUTE THIS COMPONENT
+ * UNDER THE TERMS OF THE OpenIB.org BSD license.  THE ORIGINAL LICENSE TERMS
+ * ARE REPRODUCED BELOW ONLY AS A REFERENCE.]
+ *
+ * Copyright (c) 2004 Mellanox Technologies Ltd.  All rights reserved.
+ * Copyright (c) 2004 Infinicon Corporation.  All rights reserved.
+ * Copyright (c) 2004 Intel Corporation.  All rights reserved.
+ * Copyright (c) 2004 Topspin Corporation.  All rights reserved.
+ * Copyright (c) 2004 Voltaire Corporation.  All rights reserved.
+ * Copyright (c) 2005 Sun Microsystems, Inc. All rights reserved.
+ * Copyright (c) 2005, 2006, 2007 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2015-2016 VMware, Inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __PVRDMA_IB_VERBS_H__
+#define __PVRDMA_IB_VERBS_H__
+
+#include 
+
+union pvrdma_gid {
+   u8  raw[16];
+   struct {
+   __be64  subnet_prefix;
+   __be64  interface_id;
+   } global;
+};
+
+enum pvrdma_link_layer {
+   PVRDMA_LINK_LAYER_UNSPECIFIED,
+   PVRDMA_LINK_LAYER_INFINIBAND,
+   PVRDMA_LINK_LAYER_ETHERNET,
+};
+
+enum pvrdma_mtu {
+   PVRDMA_MTU_256  = 1,
+   PVRDMA_MTU_512  = 2,
+   PVRDMA_MTU_1024 = 3,
+   PVRDMA_MTU_2048 = 4,
+   PVRDMA_MTU_4096 = 5,
+};
+
+static inline int pvrdma_mtu_enum_to_int(enum pvrdma_mtu mtu)
+{
+   switch (mtu) {
+   case PVRDMA_MTU_256:return  256;
+   case PVRDMA_MTU_512:return  512;
+   case PVRDMA_MTU_1024:   return 1024;
+   case PVRDMA_MTU_2048:   return 2048;
+   case PVRDMA_MTU_4096:   return 4096;
+   default:return   -1;
+   }
+}
+
+static inline enum pvrdma_mtu pvrdma_mtu_int_to_enum(int mtu)
+{
+   switch (mtu) {
+   case 256:   return PVRDMA_MTU_256;
+   case 512:   return PVRDMA_MTU_512;
+   case 1024:  return PVRDMA_MTU_1024;
+   case 2048:  return PVRDMA_MTU_2048;
+   case 4096:
+   default:return PVRDMA_MTU_4096;
+   }
+}
+
+enum pvrdma_port_state {
+   PVRDMA_PORT_NOP = 0,
+   PVRDMA_PORT_DOWN= 1,
+   PVRDMA_PORT_INIT= 2,
+   PVRDMA_PORT_ARMED   = 3,
+   PVRDMA_PORT_ACTIVE  = 4,
+   PVRDMA_PORT_ACTIVE_DEFER= 5,
+};
+
+enum pvrdma_port_cap_flags {
+   PVRDMA_PORT_SM  = 1 <<  1,
+   PVRDMA_PORT_NOTICE_SUP  = 1 <<  2,
+   PVRDMA_PORT_TRAP_SUP= 1 <<  3,
+   PVRDMA_PORT_OPT_IPD_SUP = 1 <<  4,

[PATCH v5 06/16] IB/pvrdma: Add paravirtual rdma device

2016-09-24 Thread Adit Ranadive
This patch adds the main device-level structures and functions to be used
to provide RDMA functionality. Also, we define conversion functions from
the IB core stack structures to the device-specific ones.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - pvrdma_cmd_post takes the response code.

Changes v3->v4:
 - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we hold a lock
 to call it.
 - Added wrapper functions for writing to UARs for CQ/QP.
 - The conversion functions are updated as func_name(dst, src) format.
 - Renamed max_gs to max_sg.
 - Added work struct for net device events.
 - priviledged -> privileged.

Changes v2->v3:
 - Removed VMware vendor id redefinition.
 - Removed the boolean in pvrdma_cmd_post.
---
 drivers/infiniband/hw/pvrdma/pvrdma.h | 473 ++
 1 file changed, 473 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma.h

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma.h b/drivers/infiniband/hw/pvrdma/pvrdma.h
new file mode 100644
index 000..8073ffc
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma.h
@@ -0,0 +1,473 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __PVRDMA_H__
+#define __PVRDMA_H__
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvrdma_defs.h"
+#include "pvrdma_dev_api.h"
+#include "pvrdma_verbs.h"
+
+/* NOT the same as BIT_MASK(). */
+#define PVRDMA_MASK(n) ((n << 1) - 1)
+
+/*
+ * VMware PVRDMA PCI device id.
+ */
+#define PCI_DEVICE_ID_VMWARE_PVRDMA0x0820
+
+struct pvrdma_dev;
+
+struct pvrdma_page_dir {
+   dma_addr_t dir_dma;
+   u64 *dir;
+   int ntables;
+   u64 **tables;
+   u64 npages;
+   void **pages;
+};
+
+struct pvrdma_cq {
+   struct ib_cq ibcq;
+   int offset;
+   spinlock_t cq_lock; /* Poll lock. */
+   struct pvrdma_uar_map *uar;
+   struct ib_umem *umem;
+   struct pvrdma_ring_state *ring_state;
+   struct pvrdma_page_dir pdir;
+   u32 cq_handle;
+   bool is_kernel;
+   atomic_t refcnt;
+   wait_queue_head_t wait;
+};
+
+struct pvrdma_id_table {
+   u32 last;
+   u32 top;
+   u32 max;
+   u32 mask;
+   spinlock_t lock; /* Table lock. */
+   unsigned long *table;
+};
+
+struct pvrdma_uar_map {
+   unsigned long pfn;
+   void __iomem *map;
+   int index;
+};
+
+struct pvrdma_uar_table {
+   struct pvrdma_id_table tbl;
+   int size;
+};
+
+struct pvrdma_ucontext {
+   struct ib_ucontext 

[PATCH v5 08/16] IB/pvrdma: Add device command support

2016-09-24 Thread Adit Ranadive
This patch enables posting Verb requests and receiving responses to/from
the backend PVRDMA emulation layer.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Moved the timeout to pvrdma_cmd_recv.
 - Added additional response code parameter to pvrdma_cmd_post.

Changes v3->v4:
 - Removed the min check and added a BUILD_BUG_ON check for size.

Changes v2->v3:
 - Converted pvrdma_cmd_recv to inline.
 - Added a min check in the memcpy to cmd_slot.
 - Removed the boolean from pvrdma_cmd_post.
---
 drivers/infiniband/hw/pvrdma/pvrdma_cmd.c | 117 ++
 1 file changed, 117 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_cmd.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_cmd.c b/drivers/infiniband/hw/pvrdma/pvrdma_cmd.c
new file mode 100644
index 000..21f1af8
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_cmd.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+
+#include "pvrdma.h"
+
+#define PVRDMA_CMD_TIMEOUT 1 /* ms */
+
+static inline int pvrdma_cmd_recv(struct pvrdma_dev *dev,
+ union pvrdma_cmd_resp *resp,
+ unsigned resp_code)
+{
+   int err;
+
+   dev_dbg(&dev->pdev->dev, "receive response from device\n");
+
+   err = wait_for_completion_interruptible_timeout(&dev->cmd_done,
+   msecs_to_jiffies(PVRDMA_CMD_TIMEOUT));
+   if (err == 0 || err == -ERESTARTSYS) {
+   dev_warn(&dev->pdev->dev,
+"completion timeout or interrupted\n");
+   return -ETIMEDOUT;
+   }
+
+   spin_lock(&dev->cmd_lock);
+   memcpy(resp, dev->resp_slot, sizeof(*resp));
+   spin_unlock(&dev->cmd_lock);
+
+   if (resp->hdr.ack != resp_code) {
+   dev_warn(&dev->pdev->dev,
+"unknown response %#x expected %#x\n",
+resp->hdr.ack, resp_code);
+   return -EFAULT;
+   }
+
+   return 0;
+}
+
+int
+pvrdma_cmd_post(struct pvrdma_dev *dev, union pvrdma_cmd_req *req,
+   union pvrdma_cmd_resp *resp, unsigned resp_code)
+{
+   int err;
+
+   dev_dbg(&dev->pdev->dev, "post request to device\n");
+
+   /* Serialization */
+   down(&dev->cmd_sema);
+
+   BUILD_BUG_ON(sizeof(union pvrdma_cmd_req) !=
+sizeof(struct pvrdma_cmd_modify_qp));
+
+   spin_lock(&dev->cmd_lock);
+   memcpy(dev->cmd_slot, req, sizeof(*req));
+   spin_unlock(&dev->cmd_lock);
+
+   init_completion(&dev->cmd_done);
+   

[PATCH v5 16/16] MAINTAINERS: Update for PVRDMA driver

2016-09-24 Thread Adit Ranadive
Add maintainer info for the PVRDMA driver.

Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Added pvrdma files to common UAPI folder.
---
 MAINTAINERS | 9 +
 1 file changed, 9 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 87e23cd..5023dc0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12615,6 +12615,15 @@ S: Maintained
 F: drivers/scsi/vmw_pvscsi.c
 F: drivers/scsi/vmw_pvscsi.h
 
+VMWARE PVRDMA DRIVER
+M: Adit Ranadive 
+M: VMware PV-Drivers 
+L: linux-r...@vger.kernel.org
+S: Maintained
+F: drivers/infiniband/hw/pvrdma/
+F: include/uapi/rdma/pvrdma-abi.h
+F: include/uapi/rdma/pvrdma-uapi.h
+
 VOLTAGE AND CURRENT REGULATOR FRAMEWORK
 M: Liam Girdwood 
 M: Mark Brown 
-- 
2.7.4



[PATCH v5 12/16] IB/pvrdma: Add Queue Pair support

2016-09-24 Thread Adit Ranadive
This patch adds the ability to create, modify, query and destroy QPs. The
PVRDMA device supports RC, UD and GSI QPs.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Updated include for headers in UAPI folder.
 - Update to pvrdma_cmd_post for creating/destroying/querying/modifying QPs.
 - Use the pvrdma_sge struct when posting WRs/allocating QP memory.
 - Removed two set but unused variables.

Changes v3->v4:
 - Removed an unnecessary switch case.
 - Unified the returns in pvrdma_create_qp to use one exit point.
 - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we need a lock to
 be held when calling this.
 - Updated to use wrapper for UAR write for QP.
 - Updated conversion function to func_name(dst, src) format.
 - Renamed max_gs to max_sg.
 - Renamed cap variable to req_cap in pvrdma_set_sq/rq_size.
 - Changed dev_warn to dev_warn_ratelimited in pvrdma_post_send/recv.
 - Added nesting locking for flushing CQs when destroying/resetting a QP.
 - Added missing ret value.

Changes v2->v3:
 - Removed boolean in pvrdma_cmd_post.
---
 drivers/infiniband/hw/pvrdma/pvrdma_qp.c | 973 +++
 1 file changed, 973 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_qp.c

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_qp.c b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
new file mode 100644
index 000..52c689d
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_qp.c
@@ -0,0 +1,973 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "pvrdma.h"
+
+static inline void get_cqs(struct pvrdma_qp *qp, struct pvrdma_cq **send_cq,
+  struct pvrdma_cq **recv_cq)
+{
+   *send_cq = to_vcq(qp->ibqp.send_cq);
+   *recv_cq = to_vcq(qp->ibqp.recv_cq);
+}
+
+static void pvrdma_lock_cqs(struct pvrdma_cq *scq, struct pvrdma_cq *rcq,
+   unsigned long *scq_flags,
+   unsigned long *rcq_flags)
+   __acquires(scq->cq_lock) __acquires(rcq->cq_lock)
+{
+   if (scq == rcq) {
+   spin_lock_irqsave(&scq->cq_lock, *scq_flags);
+   __acquire(rcq->cq_lock);
+   } else if (scq->cq_handle < rcq->cq_handle) {
+   spin_lock_irqsave(&scq->cq_lock, *scq_flags);
+   spin_lock_irqsave_nested(&rcq->cq_lock, *rcq_flags,
+SINGLE_DEPTH_NESTING);
+   } else {
+   spin_lock_irqsave(&rcq->cq_lock, *rcq_flags);
+   

[PATCH v5 15/16] IB: Add PVRDMA driver

2016-09-24 Thread Adit Ranadive
This patch updates the InfiniBand subsystem to build the PVRDMA driver.

Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
 drivers/infiniband/Kconfig | 1 +
 drivers/infiniband/hw/Makefile | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 19a418a..dff4bcf 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -88,5 +88,6 @@ source "drivers/infiniband/sw/rdmavt/Kconfig"
 source "drivers/infiniband/sw/rxe/Kconfig"
 
 source "drivers/infiniband/hw/hfi1/Kconfig"
+source "drivers/infiniband/hw/pvrdma/Kconfig"
 
 endif # INFINIBAND
diff --git a/drivers/infiniband/hw/Makefile b/drivers/infiniband/hw/Makefile
index 21fe401..c8a7a36 100644
--- a/drivers/infiniband/hw/Makefile
+++ b/drivers/infiniband/hw/Makefile
@@ -10,3 +10,4 @@ obj-$(CONFIG_INFINIBAND_OCRDMA)   += ocrdma/
 obj-$(CONFIG_INFINIBAND_USNIC) += usnic/
 obj-$(CONFIG_INFINIBAND_HFI1)  += hfi1/
 obj-$(CONFIG_INFINIBAND_HNS)   += hns/
+obj-$(CONFIG_INFINIBAND_PVRDMA)+= pvrdma/
-- 
2.7.4



[PATCH v5 14/16] IB/pvrdma: Add Kconfig and Makefile

2016-09-24 Thread Adit Ranadive
This patch adds a Kconfig and Makefile for the PVRDMA driver.

Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v3->v4:
 - Enforced dependency on VMXNet3
---
 drivers/infiniband/hw/pvrdma/Kconfig  | 7 +++
 drivers/infiniband/hw/pvrdma/Makefile | 3 +++
 2 files changed, 10 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/Kconfig
 create mode 100644 drivers/infiniband/hw/pvrdma/Makefile

diff --git a/drivers/infiniband/hw/pvrdma/Kconfig b/drivers/infiniband/hw/pvrdma/Kconfig
new file mode 100644
index 000..b345679
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/Kconfig
@@ -0,0 +1,7 @@
+config INFINIBAND_PVRDMA
+   tristate "VMware Paravirtualized RDMA Driver"
+   depends on NETDEVICES && ETHERNET && PCI && INET && VMXNET3
+   ---help---
+ This driver provides low-level support for VMware Paravirtual
+ RDMA adapter. It interacts with the VMXNet3 driver to provide
+ Ethernet capabilities.
diff --git a/drivers/infiniband/hw/pvrdma/Makefile b/drivers/infiniband/hw/pvrdma/Makefile
new file mode 100644
index 000..e6f078b
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_INFINIBAND_PVRDMA) += pvrdma.o
+
+pvrdma-y := pvrdma_cmd.o pvrdma_cq.o pvrdma_doorbell.o pvrdma_main.o pvrdma_misc.o pvrdma_mr.o pvrdma_qp.o pvrdma_verbs.o
-- 
2.7.4



[PATCH v5 00/16] Add Paravirtual RDMA Driver

2016-09-24 Thread Adit Ranadive
Hi Doug, others,

This patch series adds a driver for a paravirtual RDMA device. The device
is developed for VMware's Virtual Machines and allows existing RDMA
applications to continue to use existing Verbs API when deployed in VMs on
ESXi. We recently did a presentation in the OFA Workshop [1] regarding this
device.

Description and RDMA Support

The virtual device is exposed as a dual function PCIe device. One part is
a virtual network device (VMXNet3) which provides networking properties
like MAC, IP addresses to the RDMA part of the device. The networking
properties are used to register GIDs required by RDMA applications to
communicate.

These patches add support and all the required infrastructure for letting
applications use such a device. We support the mandatory Verbs API as well
as the base memory management extensions (Local Inv, Send with Inv and Fast
Register Work Requests). We currently support both Reliable Connected and
Unreliable Datagram QPs but do not support Shared Receive Queues (SRQs).
Also, we support the following types of Work Requests:
 o Send/Receive (with or without Immediate Data)
 o RDMA Write (with or without Immediate Data)
 o RDMA Read
 o Local Invalidate
 o Send with Invalidate
 o Fast Register Work Requests

This version only adds support for version 1 of RoCE. We will add RoCEv2
support in a future patch. We do support registration of both MAC-based and
IP-based GIDs. I have also created a git tree for our user-level driver [2].
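
For readers unfamiliar with the two GID flavours mentioned above, here is a
minimal sketch of how such GIDs are conventionally laid out for RoCE: a
MAC-based GID is the link-local prefix fe80::/64 followed by the modified
EUI-64 form of the MAC address, and an IPv4-based GID is the IPv4-mapped
IPv6 address ::ffff:a.b.c.d. This assumes the usual RoCE conventions and is
not taken from the driver; the helper names ex_gid_from_mac() and
ex_gid_from_ipv4() are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build a link-local, MAC-based GID: fe80::/64 + modified EUI-64. */
static void ex_gid_from_mac(uint8_t gid[16], const uint8_t mac[6])
{
	memset(gid, 0, 16);
	gid[0] = 0xfe;
	gid[1] = 0x80;
	gid[8] = mac[0] ^ 0x02; /* flip the universal/local bit */
	gid[9] = mac[1];
	gid[10] = mac[2];
	gid[11] = 0xff;         /* ff:fe is inserted between the MAC halves */
	gid[12] = 0xfe;
	gid[13] = mac[3];
	gid[14] = mac[4];
	gid[15] = mac[5];
}

/* Build an IPv4-based GID as the IPv4-mapped IPv6 address ::ffff:a.b.c.d. */
static void ex_gid_from_ipv4(uint8_t gid[16], const uint8_t ip[4])
{
	memset(gid, 0, 16);
	gid[10] = 0xff;
	gid[11] = 0xff;
	memcpy(&gid[12], ip, 4); /* address bytes in network order */
}
```

For example, the MAC 00:50:56:aa:bb:cc yields the interface identifier
02:50:56:ff:fe:aa:bb:cc in the lower eight bytes of the GID.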

Testing
===
We have tested this internally on various guest OSes (Red Hat, CentOS,
Ubuntu 12.04/14.04/16.04, Oracle Enterprise Linux, SLES 12) using
backported versions of this driver. The tests included several runs of
the performance tests (included with OFED), the Intel MPI PingPong
benchmark on OpenMPI, and krping for FRWRs. Mellanox has been kind enough
to test the backported version of the driver internally on their hardware
using a VMware-provided ESX build. I have also applied and tested this
with Doug's k.o/for-4.9 branch (commit 64278fe). Note that this patch
series should be applied all together. I split out the commits so that
they are easier to review.

PVRDMA Resources

[1] OFA Workshop Presentation - https://openfabrics.org/images/eventpresos/2016presentations/102parardma.pdf
[2] Libpvrdma User-level library - http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
---
Changes v4->v5:
 - PATCH [02/16]
 - Moved pvrdma_uapi.h and pvrdma_user.h into common UAPI folder.
 - Renamed to pvrdma-uapi.h and pvrdma-abi.h respectively.
 - Prefixed unsigned vars with __.
 - PATCH [03/16]
 - Removed __ prefix for unsigned vars.
 - PATCH [04/16]
 - Update include for headers moved to UAPI.
 - Removed __ prefix for unsigned vars.
 - PATCH [05/16]
 - Update include for headers in UAPI folder.
 - Removed setting any properties that are reported by device as 0.
 - Simplified modify_port.
 - PD should be allocated first in kernel then in device.
 - Update to pvrdma_cmd_post for creating/destroying PD, Query port/device.
 - PATCH [06/16]
 - pvrdma_cmd_post takes the response code.
 - PATCH [07/16]
 - Correct var type passed to dma_alloc_coherent.
 - PATCH [08/16]
 - Moved the timeout to pvrdma_cmd_recv.
 - Added additional response code parameter to pvrdma_cmd_post.
 - PATCH [09/16]
 - Updated include for headers in UAPI folder.
 - Changed from EINVAL to ENOMEM if atomic add fails.
 - Added error code if destroy cq command failed.
 - Update to pvrdma_cmd_post for creating/destroying CQ.
 - PATCH [11/16]
 - Check the access flags correctly for DMA MR.
 - Update to pvrdma_cmd_post for creating/destroying MRs.
 - PATCH [12/16]
 - Updated include for headers in UAPI folder.
 - Update to pvrdma_cmd_post for creating/destroying/querying/modifying QPs.
 - Use the pvrdma_sge struct when posting WRs/allocating QP memory.
 - Removed two set but unused variables.
 - PATCH [13/16]
 - Removed two unnecessary lines.
 - Updated include for headers in UAPI folder.
 - Update to pvrdma_cmd_post for add/delete GIDs.
 - Add error code in dev_warn if pvrdma_cmd_post failed.
 - PATCH [16/16]
 - Added pvrdma files to common UAPI folder.

Changes v3->v4:
 - Rebased on for-4.9 branch - commit 64278fe89b729
   ("Merge branch 'hns-roce' into k.o/for-4.9")
 - PATCH [01/16]
 - New in v4 - Moved vmxnet3 id to pci_ids.h
 - PATCH [02,03/16]
 - pvrdma_sge was moved into pvrdma_uapi.h
 - PATCH [04/16]
 - Removed explicit enum values.
 - PATCH [05/16]
 - Renamed priviledged -> privileged.
 - Added error numbers for command errors.
 - Removed unnecessary goto in modify_device.
 - Moved pd allocation to after command execution.
 - Removed an incorrect atomic_dec.
 - PATCH [06/16]
 - Renamed priviledged -> privileged.
 - Renamed pvrdma_flush_cqe to _pvrdma_flush_cqe since we 

[PATCH v5 01/16] vmxnet3: Move PCI Id to pci_ids.h

2016-09-24 Thread Adit Ranadive
The VMXNet3 PCI Id will be shared with our paravirtual RDMA driver.
Moved it to the shared location in pci_ids.h.

Suggested-by: Leon Romanovsky 
Acked-by: Bjorn Helgaas 
Reviewed-by: Yuval Shaia 
Signed-off-by: Adit Ranadive 
---
 drivers/net/vmxnet3/vmxnet3_int.h | 3 +--
 include/linux/pci_ids.h   | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_int.h 
b/drivers/net/vmxnet3/vmxnet3_int.h
index 74fc030..2bd6bf8 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -119,9 +119,8 @@ enum {
 };
 
 /*
- * PCI vendor and device IDs.
+ * Maximum devices supported.
  */
-#define PCI_DEVICE_ID_VMWARE_VMXNET3 0x07B0
 #define MAX_ETHERNET_CARDS 10
 #define MAX_PCI_PASSTHRU_DEVICE 6
 
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index c58752f..98bb455 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2251,6 +2251,7 @@
 #define PCI_DEVICE_ID_RASTEL_2PORT 0x2000
 
 #define PCI_VENDOR_ID_VMWARE   0x15ad
+#define PCI_DEVICE_ID_VMWARE_VMXNET3   0x07b0
 
 #define PCI_VENDOR_ID_ZOLTRIX  0x15b0
 #define PCI_DEVICE_ID_ZOLTRIX_2BD0 0x2bd0
-- 
2.7.4



[PATCH v5 02/16] IB/pvrdma: Add user-level shared functions

2016-09-24 Thread Adit Ranadive
We share some common structures with the user-level driver. This patch adds
those structures and shared functions to traverse the QP/CQ rings.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Moved pvrdma_uapi.h and pvrdma_user.h into common UAPI folder.
 - Renamed to pvrdma-uapi.h and pvrdma-abi.h respectively.
 - Prefixed unsigned vars with __.

Changes v3->v4:
 - Moved pvrdma_sge into pvrdma_uapi.h
---
 include/uapi/rdma/Kbuild|   2 +
 include/uapi/rdma/pvrdma-abi.h  |  99 
 include/uapi/rdma/pvrdma-uapi.h | 255 
 3 files changed, 356 insertions(+)
 create mode 100644 include/uapi/rdma/pvrdma-abi.h
 create mode 100644 include/uapi/rdma/pvrdma-uapi.h

diff --git a/include/uapi/rdma/Kbuild b/include/uapi/rdma/Kbuild
index 4edb0f2..fc2b285 100644
--- a/include/uapi/rdma/Kbuild
+++ b/include/uapi/rdma/Kbuild
@@ -7,3 +7,5 @@ header-y += rdma_netlink.h
 header-y += rdma_user_cm.h
 header-y += hfi/
 header-y += rdma_user_rxe.h
+header-y += pvrdma-abi.h
+header-y += pvrdma-uapi.h
diff --git a/include/uapi/rdma/pvrdma-abi.h b/include/uapi/rdma/pvrdma-abi.h
new file mode 100644
index 000..6fa0ab6
--- /dev/null
+++ b/include/uapi/rdma/pvrdma-abi.h
@@ -0,0 +1,99 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __PVRDMA_USER_H__
+#define __PVRDMA_USER_H__
+
+#include 
+
+#define PVRDMA_UVERBS_ABI_VERSION  3
+#define PVRDMA_BOARD_ID  1
+#define PVRDMA_REV_ID  1
+
+struct pvrdma_alloc_ucontext_resp {
+   __u32 qp_tab_size;
+   __u32 reserved;
+};
+
+struct pvrdma_alloc_pd_resp {
+   __u32 pdn;
+   __u32 reserved;
+};
+
+struct pvrdma_create_cq {
+   __u64 buf_addr;
+   __u32 buf_size;
+   __u32 reserved;
+};
+
+struct pvrdma_create_cq_resp {
+   __u32 cqn;
+   __u32 reserved;
+};
+
+struct pvrdma_resize_cq {
+   __u64 buf_addr;
+   __u32 buf_size;
+   __u32 reserved;
+};
+
+struct pvrdma_create_srq {
+   __u64 buf_addr;
+};
+
+struct pvrdma_create_srq_resp {
+   __u32 srqn;
+   __u32 reserved;
+};
+
+struct pvrdma_create_qp {
+   __u64 rbuf_addr;
+   __u64 sbuf_addr;
+   __u32 rbuf_size;
+   __u32 sbuf_size;
+   __u64 qp_addr;
+};
+
+#endif /* __PVRDMA_USER_H__ */
diff --git a/include/uapi/rdma/pvrdma-uapi.h b/include/uapi/rdma/pvrdma-uapi.h
new file mode 100644
index 000..430d8a5
--- /dev/null
+++ b/include/uapi/rdma/pvrdma-uapi.h
@@ -0,0 +1,255 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.

[PATCH v5 04/16] IB/pvrdma: Add the paravirtual RDMA device specification

2016-09-24 Thread Adit Ranadive
This patch describes the main specification of the underlying virtual RDMA
device. The pvrdma_dev_api header file defines the Verbs commands and
their parameters that can be issued to the device backend.

Reviewed-by: Yuval Shaia 
Reviewed-by: Jorgen Hansen 
Reviewed-by: George Zhang 
Reviewed-by: Aditya Sarwade 
Reviewed-by: Bryan Tan 
Signed-off-by: Adit Ranadive 
---
Changes v4->v5:
 - Update include for headers moved to UAPI.
 - Removed __ prefix for unsigned vars.

Changes v3->v4:
 - Removed explicit enum values.

Changes v2->v3:
 - Defined 9 and 18 for page directory.
 - Stripped spaces in some comments.
---
 drivers/infiniband/hw/pvrdma/pvrdma_defs.h| 301 +++
 drivers/infiniband/hw/pvrdma/pvrdma_dev_api.h | 342 ++
 2 files changed, 643 insertions(+)
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_defs.h
 create mode 100644 drivers/infiniband/hw/pvrdma/pvrdma_dev_api.h

diff --git a/drivers/infiniband/hw/pvrdma/pvrdma_defs.h 
b/drivers/infiniband/hw/pvrdma/pvrdma_defs.h
new file mode 100644
index 000..8105b01
--- /dev/null
+++ b/drivers/infiniband/hw/pvrdma/pvrdma_defs.h
@@ -0,0 +1,301 @@
+/*
+ * Copyright (c) 2012-2016 VMware, Inc.  All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of EITHER the GNU General Public License
+ * version 2 as published by the Free Software Foundation or the BSD
+ * 2-Clause License. This program is distributed in the hope that it
+ * will be useful, but WITHOUT ANY WARRANTY; WITHOUT EVEN THE IMPLIED
+ * WARRANTY OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
+ * See the GNU General Public License version 2 for more details at
+ * http://www.gnu.org/licenses/old-licenses/gpl-2.0.en.html.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program available in the file COPYING in the main
+ * directory of this source tree.
+ *
+ * The BSD 2-Clause License
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  - Redistributions of source code must retain the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer.
+ *
+ *  - Redistributions in binary form must reproduce the above
+ *copyright notice, this list of conditions and the following
+ *disclaimer in the documentation and/or other materials
+ *provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+ * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+ * COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
+ * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+ * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
+ * OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __PVRDMA_DEFS_H__
+#define __PVRDMA_DEFS_H__
+
+#include 
+#include 
+#include "pvrdma_ib_verbs.h"
+
+/*
+ * Masks and accessors for page directory, which is a two-level lookup:
+ * page directory -> page table -> page. Only one directory for now, but we
+ * could expand that easily. 9 bits for tables, 9 bits for pages, gives one
+ * gigabyte for memory regions and so forth.
+ */
+
+#define PVRDMA_PDIR_SHIFT  18
+#define PVRDMA_PTABLE_SHIFT  9
+#define PVRDMA_PAGE_DIR_DIR(x) (((x) >> PVRDMA_PDIR_SHIFT) & 0x1)
+#define PVRDMA_PAGE_DIR_TABLE(x)   (((x) >> PVRDMA_PTABLE_SHIFT) & 0x1ff)
+#define PVRDMA_PAGE_DIR_PAGE(x)  ((x) & 0x1ff)
+#define PVRDMA_PAGE_DIR_MAX_PAGES  (1 * 512 * 512)
+#define PVRDMA_MAX_FAST_REG_PAGES  128
+
+/*
+ * Max MSI-X vectors.
+ */
+
+#define PVRDMA_MAX_INTERRUPTS  3
+
+/* Register offsets within PCI resource on BAR1. */
+#define PVRDMA_REG_VERSION 0x00 /* R: Version of device. */
+#define PVRDMA_REG_DSRLOW  0x04 /* W: Device shared region low PA. */
+#define PVRDMA_REG_DSRHIGH 0x08 /* W: Device shared region high PA. */
+#define PVRDMA_REG_CTL 0x0c /* W: PVRDMA_DEVICE_CTL */
+#define PVRDMA_REG_REQUEST 0x10 /* W: Indicate device request. */
+#define PVRDMA_REG_ERR 0x14 /* R: Device error. */
+#define PVRDMA_REG_ICR 0x18 /* R: Interrupt cause. */
+#define PVRDMA_REG_IMR 0x1c /* R/W: 

[PATCH 2/2] brcmfmac: compile fws(ignal) code only with BCDC support enabled

2016-09-24 Thread Rafał Miłecki
From: Rafał Miłecki 

It's not needed by the other (msgbuf) protocol, so let's save some size
and compile it conditionally.

Signed-off-by: Rafał Miłecki 
---
 .../wireless/broadcom/brcm80211/brcmfmac/Makefile  |  4 +-
 .../broadcom/brcm80211/brcmfmac/fwsignal.h | 59 ++
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/Makefile 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/Makefile
index 9e4b505..ad3b06e 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/Makefile
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/Makefile
@@ -27,7 +27,6 @@ brcmfmac-objs += \
chip.o \
fwil.o \
fweh.o \
-   fwsignal.o \
p2p.o \
proto.o \
common.o \
@@ -37,7 +36,8 @@ brcmfmac-objs += \
btcoex.o \
vendor.o
 brcmfmac-$(CONFIG_BRCMFMAC_PROTO_BCDC) += \
-   bcdc.o
+   bcdc.o \
+   fwsignal.o
 brcmfmac-$(CONFIG_BRCMFMAC_PROTO_MSGBUF) += \
commonring.o \
flowring.o \
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
index 8f7c1d7..ba0c1bc 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
@@ -18,6 +18,7 @@
 #ifndef FWSIGNAL_H_
 #define FWSIGNAL_H_
 
+#ifdef CONFIG_BRCMFMAC_PROTO_BCDC
 int brcmf_fws_init(struct brcmf_pub *drvr);
 void brcmf_fws_deinit(struct brcmf_pub *drvr);
 bool brcmf_fws_skbs_queueing(struct brcmf_fws_info *fws);
@@ -31,5 +32,63 @@ void brcmf_fws_del_interface(struct brcmf_if *ifp);
 void brcmf_fws_bustxfail(struct brcmf_fws_info *fws, struct sk_buff *skb);
 void brcmf_fws_bus_blocked(struct brcmf_pub *drvr, bool flow_blocked);
 void brcmf_fws_rxreorder(struct brcmf_if *ifp, struct sk_buff *skb);
+#else
+static inline int brcmf_fws_init(struct brcmf_pub *drvr)
+{
+   return -ENOTSUPP;
+}
+
+static inline void brcmf_fws_deinit(struct brcmf_pub *drvr)
+{
+}
+
+static inline bool brcmf_fws_skbs_queueing(struct brcmf_fws_info *fws)
+{
+   return false;
+}
+
+static inline bool brcmf_fws_fc_active(struct brcmf_fws_info *fws)
+{
+   return false;
+}
+
+static inline void brcmf_fws_hdrpull(struct brcmf_if *ifp, s16 siglen,
+struct sk_buff *skb)
+{
+}
+
+static inline int brcmf_fws_process_skb(struct brcmf_if *ifp,
+   struct sk_buff *skb)
+{
+   return -ENOTSUPP;
+}
+
+static inline void brcmf_fws_reset_interface(struct brcmf_if *ifp)
+{
+}
+
+static inline void brcmf_fws_add_interface(struct brcmf_if *ifp)
+{
+}
+
+static inline void brcmf_fws_del_interface(struct brcmf_if *ifp)
+{
+}
+
+static inline void brcmf_fws_bustxfail(struct brcmf_fws_info *fws,
+  struct sk_buff *skb)
+{
+}
+
+static inline void brcmf_fws_bus_blocked(struct brcmf_pub *drvr,
+bool flow_blocked)
+{
+}
+
+static inline void brcmf_fws_rxreorder(struct brcmf_if *ifp,
+  struct sk_buff *skb)
+{
+}
+#endif
 
 #endif /* FWSIGNAL_H_ */
-- 
2.9.3



[PATCH 1/2] brcmfmac: initialize fws(ignal) for BCDC protocol only

2016-09-24 Thread Rafał Miłecki
From: Rafał Miłecki 

There are two protocols used by Broadcom FullMAC devices: BCDC and
msgbuf. They use different ways for (some part of) communication with
the firmware. Firmware Signaling is required for the first one only
(BCDC).

So far we were always initializing fws and always calling its skb
processing function. It was fws that was passing skb processing to the
protocol specific function. It was redundant for the msgbuf case.

Simply taking a few lines of code out of fws allows us to totally avoid
using it. This simplifies code flow, saves some memory & will allow
further optimizations like not compiling fwsignal.c.

Signed-off-by: Rafał Miłecki 
---
 .../wireless/broadcom/brcm80211/brcmfmac/core.c| 24 --
 .../broadcom/brcm80211/brcmfmac/fwsignal.c | 17 ++-
 .../broadcom/brcm80211/brcmfmac/fwsignal.h |  1 +
 3 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 27cd50a..bc3d8ab 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -250,7 +250,17 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
if (eh->h_proto == htons(ETH_P_PAE))
atomic_inc(&ifp->pend_8021x_cnt);
 
-   ret = brcmf_fws_process_skb(ifp, skb);
+   /* determine the priority */
+   if (skb->priority == 0 || skb->priority > 7)
+   skb->priority = cfg80211_classify8021d(skb, NULL);
+
+   if (drvr->fws && brcmf_fws_skbs_queueing(drvr->fws)) {
+   ret = brcmf_fws_process_skb(ifp, skb);
+   } else {
+   ret = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
+   if (ret < 0)
+   brcmf_txfinalize(ifp, skb, false);
+   }
 
 done:
if (ret) {
@@ -405,7 +415,7 @@ void brcmf_txcomplete(struct device *dev, struct sk_buff 
*txp, bool success)
struct brcmf_if *ifp;
 
/* await txstatus signal for firmware if active */
-   if (brcmf_fws_fc_active(drvr->fws)) {
+   if (drvr->fws && brcmf_fws_fc_active(drvr->fws)) {
if (!success)
brcmf_fws_bustxfail(drvr->fws, txp);
} else {
@@ -1006,11 +1016,13 @@ int brcmf_bus_start(struct device *dev)
}
brcmf_feat_attach(drvr);
 
-   ret = brcmf_fws_init(drvr);
-   if (ret < 0)
-   goto fail;
+   if (bus_if->proto_type == BRCMF_PROTO_BCDC) {
+   ret = brcmf_fws_init(drvr);
+   if (ret < 0)
+   goto fail;
 
-   brcmf_fws_add_interface(ifp);
+   brcmf_fws_add_interface(ifp);
+   }
 
drvr->config = brcmf_cfg80211_attach(drvr, bus_if->dev,
 drvr->settings->p2p_enable);
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
index a190f53..495eaf8 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.c
@@ -2100,16 +2100,6 @@ int brcmf_fws_process_skb(struct brcmf_if *ifp, struct 
sk_buff *skb)
int rc = 0;
 
brcmf_dbg(DATA, "tx proto=0x%X\n", ntohs(eh->h_proto));
-   /* determine the priority */
-   if ((skb->priority == 0) || (skb->priority > 7))
-   skb->priority = cfg80211_classify8021d(skb, NULL);
-
-   if (fws->avoid_queueing) {
-   rc = brcmf_proto_txdata(drvr, ifp->ifidx, 0, skb);
-   if (rc < 0)
-   brcmf_txfinalize(ifp, skb, false);
-   return rc;
-   }
 
/* set control buffer information */
skcb->if_flags = 0;
@@ -2155,7 +2145,7 @@ void brcmf_fws_add_interface(struct brcmf_if *ifp)
struct brcmf_fws_info *fws = ifp->drvr->fws;
struct brcmf_fws_mac_descriptor *entry;
 
-   if (!ifp->ndev)
+   if (!fws || !ifp->ndev)
return;
 
entry = &fws->desc.iface[ifp->ifidx];
@@ -2442,6 +2432,11 @@ void brcmf_fws_deinit(struct brcmf_pub *drvr)
kfree(fws);
 }
 
+bool brcmf_fws_skbs_queueing(struct brcmf_fws_info *fws)
+{
+   return !fws->avoid_queueing;
+}
+
 bool brcmf_fws_fc_active(struct brcmf_fws_info *fws)
 {
if (!fws->creditmap_received)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h 
b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
index ef0ad85..8f7c1d7 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/fwsignal.h
@@ -20,6 +20,7 @@
 
 int brcmf_fws_init(struct brcmf_pub *drvr);
 void brcmf_fws_deinit(struct brcmf_pub *drvr);
+bool brcmf_fws_skbs_queueing(struct brcmf_fws_info *fws);
 bool brcmf_fws_fc_active(struct 

Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Jes Sorensen
Joe Perches  writes:
> On Sat, 2016-09-24 at 14:06 -0500, Larry Finger wrote:
>> On 09/24/2016 12:32 PM, Joe Perches wrote:
> []
>> o Reindent all the switch/case blocks to a more normal
>>   kernel style (git diff -w would show no changes here)
>> That sounds like busy work to me, but if you want to do it, go ahead.
>
> It's really just to make the comparison case block reductions
> easier to verify for later steps done
>
>> > o cast, spacing and parenthesis reductions
>> >   Lots of odd and somewhat unique styles in various
>> >   drivers, looks like too many individual authors without
>> >   a style guide / code enforcer using slightly different
>> >   personalized code.  Glancing at the code, it looks to be
>> >   similar logic, just written in different styles.
>> Same comment.
>
> Same rationale
>
>> > o Logic changes like
>> >   from:
>> > if (foo) func(..., bar, ...); else func(..., baz, ...);
>> >   to:
>> > func(..., foo ? bar : baz, ...);
>> >   to make the case statement code blocks more consistent
>> >   and emit somewhat smaller object code.
>> I find if .. else constructs much easier to read than the cond ?  :  
>> form. I would reject any such patches.
>
>  I think object code reduction generally a good thing
> but then again, I'm not a maintainer here.

I missed this part, but I am with Larry here - 'foo ? bar : boo' are
just obfuscating the code and far less clear than if or switch
statements.

Jes
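
For readers following the thread, the two forms being debated can be sketched
as below. This is only an illustration: func(), foo, bar and baz are the
placeholder names from the pseudocode quoted above, and the body of func() and
the wrapper names are invented here so that the snippet compiles.

```c
/* Hypothetical stand-in for the helper called "func" in the thread. */
static int func(int reg, int value)
{
	return reg * 100 + value;
}

/* Form 1: explicit if/else with two call sites. */
static int call_if_else(int foo, int bar, int baz)
{
	if (foo)
		return func(1, bar);
	else
		return func(1, baz);
}

/* Form 2: conditional operator with a single call site. */
static int call_ternary(int foo, int bar, int baz)
{
	return func(1, foo ? bar : baz);
}
```

Both wrappers return the same value for the same inputs; whether the second
form actually yields smaller object code depends on the compiler and
optimization level, so that part remains a claim to be measured per driver.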


Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Joe Perches
On Sat, 2016-09-24 at 14:06 -0500, Larry Finger wrote:
> On 09/24/2016 12:32 PM, Joe Perches wrote:
[]
> o Reindent all the switch/case blocks to a more normal
>   kernel style (git diff -w would show no changes here)
> That sounds like busy work to me, but if you want to do it, go ahead.

It's really just to make the comparison case block reductions
easier to verify for later steps done

> > o cast, spacing and parenthesis reductions
> >   Lots of odd and somewhat unique styles in various
> >   drivers, looks like too many individual authors without
> >   a style guide / code enforcer using slightly different
> >   personalized code.  Glancing at the code, it looks to be
> >   similar logic, just written in different styles.
> Same comment.

Same rationale

> > o Logic changes like
> >   from:
> > if (foo) func(..., bar, ...); else func(..., baz, ...);
> >   to:
> > func(..., foo ? bar : baz, ...);
> >   to make the case statement code blocks more consistent
> >   and emit somewhat smaller object code.
> I find if .. else constructs much easier to read than the cond ?  :  
> form. I would reject any such patches.

 I think object code reduction generally a good thing
but then again, I'm not a maintainer here.

> > o Consolidation of equivalent function spanning drivers
> >   With the style only changes minimized, where possible
> >   make the drivers use common ops/callback functions.
> There is no question that there are similar routines in different drivers.
> I would like to place as much as possible into common routines, but I never
> seem to find the time. There are too many bugs in other things I support to
> consider these niceties.

Consolidation generally reduces defects and improves ease of
updating.
> 


Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Jes Sorensen
Larry Finger  writes:
> On 09/24/2016 12:32 PM, Joe Perches wrote:
>> Is there any value in that or is Jes' work going to make
>> doing any or all of this unnecessary and futile?
>
> That is not yet determined. The only driver that is to be replaced at
> this point is rtl8192cu. Jes only has USB I/O for his driver. We are
> looking at adding SDIO, and once that is done, PCI should be possible.

If someone else wants to address PCI then it could happen quite soon,
but at the current schedule I don't see PCI happen in my driver for at
least a year, probably more.

If you can reduce the size of rtlwifi in the mean time that probably
isn't going to upset a lot of people.

Jes


[PATCH net] Revert "net: ethernet: bcmgenet: use phydev from struct net_device"

2016-09-24 Thread Florian Fainelli
This reverts commit 62469c76007e ("net: ethernet: bcmgenet: use phydev
from struct net_device") because it causes GENETv1/2/3 adapters to
exhibit the following behavior after an ifconfig down/up sequence:

PING fainelli-linux (10.112.156.244): 56 data bytes
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.352 ms
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.472 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.496 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.517 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.536 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=1.557 ms (DUP!)
64 bytes from 10.112.156.244: seq=1 ttl=61 time=752.448 ms (DUP!)

This was previously fixed by commit 5dbebbb44a6a ("net: bcmgenet:
Software reset EPHY after power on") but the commit we are reverting was
essentially making this previous commit void. Here is why.

Without commit 62469c76007e we would have the following scenario after
an ifconfig down then up sequence:

- bcmgenet_open() calls bcmgenet_power_up() to make sure the PHY is
  initialized *before* we get to initialize the UniMAC, this is
  critical to ensure the PHY is in a correct state, priv->phydev is
  valid, this code executes fine

- second time from bcmgenet_mii_probe(), through the normal
  phy_init_hw() call (which arguably could be optimized out)

Everything is fine in that case. With commit 62469c76007e, the following
happens after an ifconfig down then up sequence:

- bcmgenet_close() calls phy_disconnect() which makes dev->phydev become
  NULL

- when bcmgenet_open() executes again and calls bcmgenet_mii_reset() from
  bcmgenet_power_up() to initialize the internal PHY, the NULL check
  becomes true, so we do not reset the PHY, yet we keep going on and
  initialize the UniMAC, causing MAC activity to occur

- we call bcmgenet_mii_reset() from bcmgenet_mii_probe(), but this is
  too late, the PHY is botched, and causes the above bogus pings/packets
  transmission/reception to occur

Reported-by: Jaedon Shin 
Signed-off-by: Florian Fainelli 
---
David,

There is already a commit:

Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"

which should make this apply cleanly to "net" now.

Thanks!
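
The failure mode described in the commit message can be sketched with a toy
model. Everything below is hypothetical (the toy_* types and helpers are not
the driver's code); it only assumes, as the description states, that the reset
is guarded by a NULL check and that the disconnect clears the net_device
pointer while the driver-private pointer stays valid.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are invented for illustration. */
struct toy_phy {
	bool reset_done;
};

struct toy_dev {
	struct toy_phy *phydev;     /* models dev->phydev, cleared on disconnect */
	struct toy_phy *priv_phydev; /* models priv->phydev, kept by the driver */
	struct toy_phy phy;
};

/* Reset guarded by a NULL check on whichever pointer the caller passes. */
static bool toy_mii_reset(struct toy_phy *phy)
{
	if (!phy)
		return false;       /* reset silently skipped */
	phy->reset_done = true;
	return true;
}

static void toy_probe(struct toy_dev *d)
{
	d->phydev = &d->phy;
	d->priv_phydev = &d->phy;
}

static void toy_close(struct toy_dev *d)
{
	d->phydev = NULL;           /* models the disconnect clearing dev->phydev */
	d->phy.reset_done = false;  /* PHY needs a fresh reset after power up */
}

/* Returns true if the PHY was reset before the MAC would be initialized. */
static bool toy_open(struct toy_dev *d, bool use_priv_pointer)
{
	return toy_mii_reset(use_priv_pointer ? d->priv_phydev : d->phydev);
}
```

In this model, a close followed by an open through the net_device pointer
skips the reset, while the same sequence through the private pointer performs
it, which is the difference the revert is meant to restore.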

 drivers/net/ethernet/broadcom/genet/bcmgenet.c | 45 ++
 drivers/net/ethernet/broadcom/genet/bcmgenet.h |  1 +
 drivers/net/ethernet/broadcom/genet/bcmmii.c   | 24 +++---
 3 files changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 8d4f8495dbb3..541456398dfb 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -453,25 +453,29 @@ static inline void bcmgenet_rdma_ring_writel(struct 
bcmgenet_priv *priv,
 static int bcmgenet_get_settings(struct net_device *dev,
 struct ethtool_cmd *cmd)
 {
+   struct bcmgenet_priv *priv = netdev_priv(dev);
+
if (!netif_running(dev))
return -EINVAL;
 
-   if (!dev->phydev)
+   if (!priv->phydev)
return -ENODEV;
 
-   return phy_ethtool_gset(dev->phydev, cmd);
+   return phy_ethtool_gset(priv->phydev, cmd);
 }
 
 static int bcmgenet_set_settings(struct net_device *dev,
 struct ethtool_cmd *cmd)
 {
+   struct bcmgenet_priv *priv = netdev_priv(dev);
+
if (!netif_running(dev))
return -EINVAL;
 
-   if (!dev->phydev)
+   if (!priv->phydev)
return -ENODEV;
 
-   return phy_ethtool_sset(dev->phydev, cmd);
+   return phy_ethtool_sset(priv->phydev, cmd);
 }
 
 static int bcmgenet_set_rx_csum(struct net_device *dev,
@@ -937,7 +941,7 @@ static int bcmgenet_get_eee(struct net_device *dev, struct 
ethtool_eee *e)
e->eee_active = p->eee_active;
e->tx_lpi_timer = bcmgenet_umac_readl(priv, UMAC_EEE_LPI_TIMER);
 
-   return phy_ethtool_get_eee(dev->phydev, e);
+   return phy_ethtool_get_eee(priv->phydev, e);
 }
 
 static int bcmgenet_set_eee(struct net_device *dev, struct ethtool_eee *e)
@@ -954,7 +958,7 @@ static int bcmgenet_set_eee(struct net_device *dev, struct 
ethtool_eee *e)
if (!p->eee_enabled) {
bcmgenet_eee_enable_set(dev, false);
} else {
-   ret = phy_init_eee(dev->phydev, 0);
+   ret = phy_init_eee(priv->phydev, 0);
if (ret) {
netif_err(priv, hw, dev, "EEE initialization failed\n");
return ret;
@@ -964,12 +968,14 @@ static int bcmgenet_set_eee(struct net_device *dev, 
struct ethtool_eee *e)
bcmgenet_eee_enable_set(dev, true);
}
 
-   return phy_ethtool_set_eee(dev->phydev, e);
+   return phy_ethtool_set_eee(priv->phydev, e);
 }
 
 static int 

Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Larry Finger

On 09/24/2016 12:32 PM, Joe Perches wrote:
> (adding Jes Sorensen to recipients)
>
> On Sat, 2016-09-24 at 11:35 -0500, Larry Finger wrote:
>> I have patches that makes HAL_DEF_WOWLAN be a no-op for the rest of the
>> drivers, and one that sets the enum values for that particular statement
>> to hex values. I also looked at the other large enums and decided that
>> they never need the human lookup.
>
> Hey Larry.
>
> There are many somewhat common realtek wireless drivers.
>
> Not to step on your toes, but what do you think of
> rationalizing the switch/case statements of all the
> realtek drivers in a few steps:
>
> o Reindent all the switch/case blocks to a more normal
>   kernel style (git diff -w would show no changes here)

That sounds like busy work to me, but if you want to do it, go ahead.

> o cast, spacing and parenthesis reductions
>   Lots of odd and somewhat unique styles in various
>   drivers, looks like too many individual authors without
>   a style guide / code enforcer using slightly different
>   personalized code.  Glancing at the code, it looks to be
>   similar logic, just written in different styles.

Same comment.

> o Logic changes like
>   from:
>     if (foo) func(..., bar, ...); else func(..., baz, ...);
>   to:
>     func(..., foo ? bar : baz, ...);
>   to make the case statement code blocks more consistent
>   and emit somewhat smaller object code.

I find if .. else constructs much easier to read than the cond ? : form. I
would reject any such patches.

> o Consolidation of equivalent function spanning drivers
>   With the style only changes minimized, where possible
>   make the drivers use common ops/callback functions.

There is no question that there are similar routines in different drivers. I
would like to place as much as possible into common routines, but I never
seem to find the time. There are too many bugs in other things I support to
consider these niceties.

> Is there any value in that or is Jes' work going to make
> doing any or all of this unnecessary and futile?

That is not yet determined. The only driver that is to be replaced at this
point is rtl8192cu. Jes only has USB I/O for his driver. We are looking at
adding SDIO, and once that is done, PCI should be possible.


Larry





[PATCH v3] bpf: Set register type according to is_valid_access()

2016-09-24 Thread Mickaël Salaün
This prevents future potential pointer leaks when an unprivileged eBPF
program reads a pointer value from its context. Even if
is_valid_access() returns a pointer type, the eBPF verifier replaces it
with UNKNOWN_VALUE. The register value that contains a kernel address is
then allowed to leak. Moreover, this fix allows unprivileged eBPF
programs to use functions with (legitimate) pointer arguments.

This is not an issue currently since reg_type is only set for PTR_TO_PACKET
or PTR_TO_PACKET_END in XDP and TC programs, which can only be loaded as
privileged. For now, the only unprivileged eBPF program allowed is for
socket filtering and all the types from its context are UNKNOWN_VALUE.
However, this fix is important for future unprivileged eBPF programs
which could use pointers in their context.

Signed-off-by: Mickaël Salaün 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index daea765d72e6..adbc7c161ba5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -795,9 +795,8 @@ static int check_mem_access(struct verifier_env *env, u32 regno, int off,
err = check_ctx_access(env, off, size, t, &reg_type);
if (!err && t == BPF_READ && value_regno >= 0) {
mark_reg_unknown_value(state->regs, value_regno);
-   if (env->allow_ptr_leaks)
-   /* note that reg.[id|off|range] == 0 */
-   state->regs[value_regno].type = reg_type;
+   /* note that reg.[id|off|range] == 0 */
+   state->regs[value_regno].type = reg_type;
}
 
} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
-- 
2.9.3



[PATCH net-next] gre: use nla_get_be32() to extract flowinfo

2016-09-24 Thread Lance Richardson
Eliminate a sparse endianness mismatch warning, use nla_get_be32() to
extract a __be32 value instead of nla_get_u32().

Signed-off-by: Lance Richardson 
---
 net/ipv6/ip6_gre.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 397e1ed..4ce74f8 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1239,7 +1239,7 @@ static void ip6gre_netlink_parms(struct nlattr *data[],
parms->encap_limit = nla_get_u8(data[IFLA_GRE_ENCAP_LIMIT]);
 
if (data[IFLA_GRE_FLOWINFO])
-   parms->flowinfo = nla_get_u32(data[IFLA_GRE_FLOWINFO]);
+   parms->flowinfo = nla_get_be32(data[IFLA_GRE_FLOWINFO]);
 
if (data[IFLA_GRE_FLAGS])
parms->flags = nla_get_u32(data[IFLA_GRE_FLAGS]);
-- 
2.5.5
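
The point of the change above can be illustrated outside the kernel. The sketch below is userspace C, not the netlink API: `be32_t`, `get_u32`, `get_be32` and `flowinfo_to_host` are hypothetical stand-ins. It shows that both accessors hand back the same four bytes, so the choice between them only changes the declared type that a checker such as sparse can compare against the destination field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's __be32: 32 bits whose bytes
 * are stored in network (big-endian) order. */
typedef uint32_t be32_t;

/* Copy the four payload bytes out unchanged, typed as a host-order value. */
static uint32_t get_u32(const unsigned char *payload)
{
	uint32_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}

/* Same bytes, but the return type records that they are big-endian. */
static be32_t get_be32(const unsigned char *payload)
{
	be32_t v;

	memcpy(&v, payload, sizeof(v));
	return v;
}

/* Decode a big-endian value into a host-order number, byte by byte,
 * so the result does not depend on the host's endianness. */
static uint32_t flowinfo_to_host(be32_t flowinfo)
{
	unsigned char b[4];

	memcpy(b, &flowinfo, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

In this model the two getters are interchangeable at run time; the annotated type only matters to static checking, which matches the commit message's description of the change as a warning fix.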



Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Joe Perches
(adding Jes Sorensen to recipients)

On Sat, 2016-09-24 at 11:35 -0500, Larry Finger wrote:
> I have patches that makes HAL_DEF_WOWLAN be a no-op for the rest of the
> drivers, and one that sets the enum values for that particular statement
> to hex values. I also looked at the other large enums and decided that
> they never need the human lookup.

Hey Larry.

There are many somewhat common realtek wireless drivers.

Not to step on your toes, but what do you think of
rationalizing the switch/case statements of all the
realtek drivers in a few steps:

o Reindent all the switch/case blocks to a more normal
  kernel style (git diff -w would show no changes here)

o cast, spacing and parenthesis reductions
  Lots of odd and somewhat unique styles in various
  drivers, looks like too many individual authors without
  a style guide / code enforcer using slightly different
  personalized code.  Glancing at the code, it looks to be
  similar logic, just written in different styles.

o Logic changes like
  from:
if (foo) func(..., bar, ...); else func(..., baz, ...);
  to:
func(..., foo ? bar : baz, ...);
  to make the case statement code blocks more consistent
  and emit somewhat smaller object code.

o Consolidation of equivalent function spanning drivers
  With the style only changes minimized, where possible
  make the drivers use common ops/callback functions.

Is there any value in that or is Jes' work going to make
doing any or all of this unnecessary and futile?
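
The "logic changes" item in the list above can be made concrete with a small sketch. Everything here is hypothetical (`write_reg`, `last_written` and the two `set_mode_*` functions are invented for illustration and do not come from any realtek driver); it only shows that the two shapes select the same argument. Whether the second form produces smaller object code depends on the compiler and is not demonstrated here.

```c
/* Hypothetical register-write helper: remembers the last value written. */
static int last_written;

static void write_reg(int value)
{
	last_written = value;
}

/* Original shape: one call per branch. */
static void set_mode_if_else(int enable, int on_val, int off_val)
{
	if (enable)
		write_reg(on_val);
	else
		write_reg(off_val);
}

/* Consolidated shape: a single call, the condition selects the argument. */
static void set_mode_ternary(int enable, int on_val, int off_val)
{
	write_reg(enable ? on_val : off_val);
}
```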


Re: [PATCH] Net Driver: Add Cypress GX3 VID=04b4 PID=3610.

2016-09-24 Thread Chris Roth
Due to my lack of familiarity with how git send-email works, I've
unintentionally had my name listed as the first 'from' whereas I
intended Allan Chou to be listed as the first 'from' in the patch. If
anyone can correct this on my behalf, I would appreciate it.

Regards,
Chris

On Sat, Sep 24, 2016 at 10:57 AM, Chris Roth  wrote:
>
> Due to my lack of familiarity with how git send-email works, I've 
> unintentionally had my name listed as the first 'from' whereas I intended 
> Allan Chou to be listed as the first 'from' in the patch. If anyone can 
> correct this on my behalf, I would appreciate it.
>
> Regards,
> Chris
>
> On Fri, Sep 23, 2016 at 4:24 PM,  wrote:
>>
>> From: Chris Roth 
>>
>> From: Allan Chou 
>>
>> Add support for Cypress GX3 SuperSpeed to Gigabit Ethernet
>> Bridge Controller (Vendor=04b4 ProdID=3610).
>>
>> Patch verified on x64 linux kernel 4.7.4 system with the
>> Kensington SD4600P USB-C Universal Dock with Power, which uses the
>> Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.
>>
>> A similar patch was signed-off and tested-by Allan Chou
>>  on 2015-12-01.
>>
>> Allan verified his similar patch on x86 Linux kernel 4.1.6 system
>> with Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller.
>>
>> Tested-by: Allan Chou 
>> Tested-by: Chris Roth 
>>
>> Signed-off-by: Allan Chou 
>> Signed-off-by: Chris Roth 
>> ---
>>  drivers/net/usb/ax88179_178a.c | 17 +
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
>> index e6338c1..8a6675d 100644
>> --- a/drivers/net/usb/ax88179_178a.c
>> +++ b/drivers/net/usb/ax88179_178a.c
>> @@ -1656,6 +1656,19 @@ static const struct driver_info ax88178a_info = {
>> .tx_fixup = ax88179_tx_fixup,
>>  };
>>
>> +static const struct driver_info cypress_GX3_info = {
>> +   .description = "Cypress GX3 SuperSpeed to Gigabit Ethernet Controller",
>> +   .bind = ax88179_bind,
>> +   .unbind = ax88179_unbind,
>> +   .status = ax88179_status,
>> +   .link_reset = ax88179_link_reset,
>> +   .reset = ax88179_reset,
>> +   .stop = ax88179_stop,
>> +   .flags = FLAG_ETHER | FLAG_FRAMING_AX,
>> +   .rx_fixup = ax88179_rx_fixup,
>> +   .tx_fixup = ax88179_tx_fixup,
>> +};
>> +
>>  static const struct driver_info dlink_dub1312_info = {
>> .description = "D-Link DUB-1312 USB 3.0 to Gigabit Ethernet Adapter",
>> .bind = ax88179_bind,
>> @@ -1718,6 +1731,10 @@ static const struct usb_device_id products[] = {
>> USB_DEVICE(0x0b95, 0x178a),
>> .driver_info = (unsigned long)&ax88178a_info,
>>  }, {
>> +   /* Cypress GX3 SuperSpeed to Gigabit Ethernet Bridge Controller */
>> +   USB_DEVICE(0x04b4, 0x3610),
>> +   .driver_info = (unsigned long)&cypress_GX3_info,
>> +}, {
>> /* D-Link DUB-1312 USB 3.0 to Gigabit Ethernet Adapter */
>> USB_DEVICE(0x2001, 0x4a00),
>> .driver_info = (unsigned long)&dlink_dub1312_info,
>> --
>> 2.7.4
>>
>


Re: [Intel-wired-lan] [PATCH net-next v2 1/2] i40e: remove superfluous I40E_DEBUG_USER statement

2016-09-24 Thread Alexander Duyck
On Sat, Sep 24, 2016 at 4:13 AM, Stefan Assmann  wrote:
> On 24.09.2016 04:48, Alexander Duyck wrote:
>> On Fri, Sep 23, 2016 at 6:30 AM, Stefan Assmann  wrote:
>>> This debug statement is confusing and never set in the code. Any debug
>>> output should be guarded by the proper I40E_DEBUG_* statement which can
>>> be enabled via the debug module parameter or ethtool.
>>> Remove or convert the I40E_DEBUG_USER cases to I40E_DEBUG_INIT.
>>>
>>> v2: re-add setting the debug_mask in i40e_set_msglevel() so that the
>>> debug level can still be altered via ethtool msglvl.
>>>
>>> Signed-off-by: Stefan Assmann 
>>> ---
>>>  drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 ---
>>>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 -
>>>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  3 +--
>>>  drivers/net/ethernet/intel/i40e/i40e_main.c| 35 
>>> +-
>>>  drivers/net/ethernet/intel/i40e/i40e_type.h|  2 --
>>>  5 files changed, 18 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
>>> b/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> index 2154a34..8ccb09c 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
>>> @@ -3207,9 +3207,6 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
>>> break;
>>> case I40E_AQ_CAP_ID_MSIX:
>>> p->num_msix_vectors = number;
>>> -   i40e_debug(hw, I40E_DEBUG_INIT,
>>> -  "HW Capability: MSIX vector count = 
>>> %d\n",
>>> -  p->num_msix_vectors);
>>> break;
>>> case I40E_AQ_CAP_ID_VF_MSIX:
>>> p->num_msix_vectors_vf = number;
>>
>> I'm assuming this is dropped because you considered it redundant with
>> the dump in i40e_get_capabilities.  If so it would have been nice to
>> see this called out in your patch description somewhere as it doesn't
>> jive with the rest of the patch since you are stripping something that
>> is using I40E_DEBUG_INIT.
>
> Hi Alex,
>
> agreed, it seemed redundant. I'll make a note about it in the next
> version when we have decided how to proceed.
>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
>>> b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> index 05cf9a7..e9c6f1c 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>>> @@ -1210,12 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
>>> u32 level;
>>> cnt = sscanf(&cmd_buf[10], "%i", &level);
>>> if (cnt) {
>>> -   if (I40E_DEBUG_USER & level) {
>>> -   pf->hw.debug_mask = level;
>>> -   dev_info(&pf->pdev->dev,
>>> -"set hw.debug_mask = 0x%08x\n",
>>> -pf->hw.debug_mask);
>>> -   }
>>> pf->msg_enable = level;
>>> dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
>>>  pf->msg_enable);
>>
>> From what I can tell the interface is completely redundant as ethtool
>> can already do this.  I'd say it is okay to just remove this command
>> and section entirely from the debugfs interface.
>
> Yes, I didn't want to stray too far from what the description said and
> just removed the I40E_DEBUG_USER related code.
>
>>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
>>> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>>> index 1835186..02f55ab 100644
>>> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>>> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>>> @@ -987,8 +987,7 @@ static void i40e_set_msglevel(struct net_device *netdev, u32 data)
>>> struct i40e_netdev_priv *np = netdev_priv(netdev);
>>> struct i40e_pf *pf = np->vsi->back;
>>>
>>> -   if (I40E_DEBUG_USER & data)
>>> -   pf->hw.debug_mask = data;
>>> +   pf->hw.debug_mask = data;
>>> pf->msg_enable = data;
>>>  }
>>>
>>
>> So the way I view this is that I40E_DEBUG_USER appears to be a flag
>> that is being used to differentiate between some proprietary flags and
>> the standard msg level.  The problem is that msg_enable and debug_mask
>> are playing off of two completely different bit definitions.  For
>> example how much sense does it make for NETIF_F_MSG_TX_DONE to map to
>> I40E_DEBUG_DCB.  If anything what should probably happen here is
>> instead of dropping the if there probably needs to be an else.
>
> As you said the flags don't match, which is part of the problem. What
> tipped me of starting to work on this is, that the 

Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Larry Finger

On 09/24/2016 11:15 AM, Joe Perches wrote:
> On Sat, 2016-09-24 at 17:55 +0200, Jean Delvare wrote:
>> Would it make sense to explicitly set the enum values, or add them as
>> comments, to make such look-ups easier?
>
> If you want to create enum->#ENUM structs and
> "const char *" lookup functions, please be my guest.
>
> otherwise, hex is at least a consistent way to display
> what should be infrequent output.


Displaying those values as hex is OK. As Joe says, they will not be shown very 
often.


I have patches that make HAL_DEF_WOWLAN be a no-op for the rest of the drivers, 
and one that sets the enum values for that particular statement to hex values. I 
also looked at the other large enums and decided that they never need the human 
lookup.


Larry




Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Joe Perches
On Sat, 2016-09-24 at 17:55 +0200, Jean Delvare wrote:
> Would it make sense to explicitly set the enum values, or add them as
> comments, to make such look-ups easier?

If you want to create enum->#ENUM structs and
"const char *" lookup functions, please be my guest.

otherwise, hex is at least a consistent way to display
what should be infrequent output.
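
For reference, the enum->#ENUM lookup mentioned above is often built with an X-macro, so that the enumerator list and the name table cannot drift apart. The following is only a sketch under that assumption: the `HAL_DEF_EXAMPLE_*` names are placeholders, not entries from the real rtlwifi enum, and `hal_def_name` is a hypothetical helper.

```c
/* Hypothetical subset of a HAL variable enum; the names are placeholders
 * chosen for illustration only. */
#define HAL_DEF_LIST(X) \
	X(HAL_DEF_EXAMPLE_A) \
	X(HAL_DEF_EXAMPLE_B) \
	X(HAL_DEF_EXAMPLE_C)

/* Expand the list once as enumerators... */
#define AS_ENUM(name) name,
enum hal_def { HAL_DEF_LIST(AS_ENUM) HAL_DEF_COUNT };

/* ...and once as a table of their spellings, indexed by value. */
#define AS_STRING(name) [name] = #name,
static const char *const hal_def_names[] = { HAL_DEF_LIST(AS_STRING) };

/* Return the enumerator's name, or "unknown" for out-of-range values. */
static const char *hal_def_name(unsigned int v)
{
	return v < HAL_DEF_COUNT ? hal_def_names[v] : "unknown";
}
```

The cost is one extra table per enum, which is the trade-off weighed above against simply printing the value in hex.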


[PATCH 7/7] ipv6 addrconf: change default MAX_RTR_SOLICITATIONS from 3 to -1 (unlimited)

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This changes:
  /proc/sys/net/ipv6/conf/all/router_solicitations
  /proc/sys/net/ipv6/conf/default/router_solicitations
from 3 to unlimited.

This is the https://tools.ietf.org/html/rfc7559 recommended default.

Signed-off-by: Maciej Żenczykowski 
---
 include/net/addrconf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 8f3677269f9a..f2d072787947 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -1,7 +1,7 @@
 #ifndef _ADDRCONF_H
 #define _ADDRCONF_H
 
-#define MAX_RTR_SOLICITATIONS  3
+#define MAX_RTR_SOLICITATIONS  -1  /* unlimited */
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
 #define RTR_SOLICITATION_MAX_INTERVAL  (3600*HZ)   /* 1 hour */
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH 4/7] ipv6 addrconf: add new sysctl 'router_solicitation_max_interval'

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Accessible via:
  /proc/sys/net/ipv6/conf/*/router_solicitation_max_interval

For now we default it to the same value as the normal interval.

Signed-off-by: Maciej Żenczykowski 
---
 include/linux/ipv6.h  |  1 +
 include/net/addrconf.h|  1 +
 include/uapi/linux/ipv6.h |  1 +
 net/ipv6/addrconf.c   | 11 +++
 4 files changed, 14 insertions(+)

diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index c6dbcd84a2c7..7e9a789be5e0 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -18,6 +18,7 @@ struct ipv6_devconf {
__s32   dad_transmits;
__s32   rtr_solicits;
__s32   rtr_solicit_interval;
+   __s32   rtr_solicit_max_interval;
__s32   rtr_solicit_delay;
__s32   force_mld_version;
__s32   mldv1_unsolicited_report_interval;
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 9826d3a9464c..275e5af4c2f4 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -3,6 +3,7 @@
 
 #define MAX_RTR_SOLICITATIONS  3
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
+#define RTR_SOLICITATION_MAX_INTERVAL  (4*HZ)
 
 #define MIN_VALID_LIFETIME (2*3600)/* 2 hours */
 
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index 395876060f50..8c2772340c3f 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -177,6 +177,7 @@ enum {
DEVCONF_DROP_UNICAST_IN_L2_MULTICAST,
DEVCONF_DROP_UNSOLICITED_NA,
DEVCONF_KEEP_ADDR_ON_DOWN,
+   DEVCONF_RTR_SOLICIT_MAX_INTERVAL,
DEVCONF_MAX
 };
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6c63bf06fbcf..255be34cdbce 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -187,6 +187,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
.dad_transmits  = 1,
.rtr_solicits   = MAX_RTR_SOLICITATIONS,
.rtr_solicit_interval   = RTR_SOLICITATION_INTERVAL,
+   .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL,
.rtr_solicit_delay  = MAX_RTR_SOLICITATION_DELAY,
.use_tempaddr   = 0,
.temp_valid_lft = TEMP_VALID_LIFETIME,
@@ -232,6 +233,7 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = {
.dad_transmits  = 1,
.rtr_solicits   = MAX_RTR_SOLICITATIONS,
.rtr_solicit_interval   = RTR_SOLICITATION_INTERVAL,
+   .rtr_solicit_max_interval = RTR_SOLICITATION_MAX_INTERVAL,
.rtr_solicit_delay  = MAX_RTR_SOLICITATION_DELAY,
.use_tempaddr   = 0,
.temp_valid_lft = TEMP_VALID_LIFETIME,
@@ -4891,6 +4893,8 @@ static inline void ipv6_store_devconf(struct ipv6_devconf *cnf,
array[DEVCONF_RTR_SOLICITS] = cnf->rtr_solicits;
array[DEVCONF_RTR_SOLICIT_INTERVAL] =
jiffies_to_msecs(cnf->rtr_solicit_interval);
+   array[DEVCONF_RTR_SOLICIT_MAX_INTERVAL] =
+   jiffies_to_msecs(cnf->rtr_solicit_max_interval);
array[DEVCONF_RTR_SOLICIT_DELAY] =
jiffies_to_msecs(cnf->rtr_solicit_delay);
array[DEVCONF_FORCE_MLD_VERSION] = cnf->force_mld_version;
@@ -5771,6 +5775,13 @@ static const struct ctl_table addrconf_sysctl[] = {
.proc_handler   = proc_dointvec_jiffies,
},
{
+   .procname   = "router_solicitation_max_interval",
+   .data   = &ipv6_devconf.rtr_solicit_max_interval,
+   .maxlen = sizeof(int),
+   .mode   = 0644,
+   .proc_handler   = proc_dointvec_jiffies,
+   },
+   {
.procname   = "router_solicitation_delay",
.data   = &ipv6_devconf.rtr_solicit_delay,
.maxlen = sizeof(int),
-- 
2.8.0.rc3.226.g39d4020



[PATCH 5/7] ipv6 addrconf: implement RFC7559 router solicitation backoff

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This implements:
  https://tools.ietf.org/html/rfc7559

Backoff is performed according to RFC3315 section 14:
  https://tools.ietf.org/html/rfc3315#section-14

Signed-off-by: Maciej Żenczykowski 
---
 include/net/if_inet6.h |  1 +
 net/ipv6/addrconf.c| 31 +++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 1c8b6820b694..515352c6280a 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -201,6 +201,7 @@ struct inet6_dev {
struct ipv6_devstat stats;
 
struct timer_list   rs_timer;
+   __s32   rs_interval;/* in jiffies */
__u8rs_probes;
 
__u8addr_gen_mode;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 255be34cdbce..f2147b3352b9 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -112,6 +112,24 @@ static inline u32 cstamp_delta(unsigned long cstamp)
return (cstamp - INITIAL_JIFFIES) * 100UL / HZ;
 }
 
+static inline s32 rfc3315_s14_backoff_init(s32 initial)
+{
+   s64 r = 90 + (prandom_u32() % 21); /* 0.9 .. 1.1 */
+   s32 v = initial * r / 100;
+   return v;
+}
+
+static inline s32 rfc3315_s14_backoff_update(s32 cur, s32 ceiling)
+{
+   s64 r = 190 + (prandom_u32() % 21); /* 1.9 .. 2.1 */
+   s32 v = cur * r / 100;
+   if (v > ceiling) {
+   r -= 100; /* 0.9 .. 1.1 */
+   v = ceiling * r / 100;
+   }
+   return v;
+}
+
 #ifdef CONFIG_SYSCTL
 static int addrconf_sysctl_register(struct inet6_dev *idev);
 static void addrconf_sysctl_unregister(struct inet6_dev *idev);
@@ -3698,11 +3716,13 @@ static void addrconf_rs_timer(unsigned long data)
goto put;
 
write_lock(&idev->lock);
+   idev->rs_interval = rfc3315_s14_backoff_update(
+   idev->rs_interval, idev->cnf.rtr_solicit_max_interval);
/* The wait after the last probe can be shorter */
addrconf_mod_rs_timer(idev, (idev->rs_probes ==
 idev->cnf.rtr_solicits) ?
  idev->cnf.rtr_solicit_delay :
- idev->cnf.rtr_solicit_interval);
+ idev->rs_interval);
} else {
/*
 * Note: we do not support deprecated "all on-link"
@@ -3973,10 +3993,11 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
 
write_lock_bh(&ifp->idev->lock);
spin_lock(&ifp->lock);
+   ifp->idev->rs_interval = rfc3315_s14_backoff_init(
+   ifp->idev->cnf.rtr_solicit_interval);
ifp->idev->rs_probes = 1;
ifp->idev->if_flags |= IF_RS_SENT;
-   addrconf_mod_rs_timer(ifp->idev,
- ifp->idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(ifp->idev, ifp->idev->rs_interval);
spin_unlock(&ifp->lock);
write_unlock_bh(&ifp->idev->lock);
}
@@ -5132,8 +5153,10 @@ update_lft:
 
if (update_rs) {
idev->if_flags |= IF_RS_SENT;
+   idev->rs_interval = rfc3315_s14_backoff_init(
+   idev->cnf.rtr_solicit_interval);
idev->rs_probes = 1;
-   addrconf_mod_rs_timer(idev, idev->cnf.rtr_solicit_interval);
+   addrconf_mod_rs_timer(idev, idev->rs_interval);
}
 
/* Well, that's kinda nasty ... */
-- 
2.8.0.rc3.226.g39d4020
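
The two helpers added by this patch can be tried outside the kernel with the random factor made explicit. The sketch below is a userspace approximation, not the patch itself: `jitter` is a caller-supplied value in 0..20 standing in for `prandom_u32() % 21`, the function names are shortened, and the intermediate value is kept in 64 bits until after the comparison with the ceiling (the patch narrows to s32 first).

```c
#include <stdint.h>

/* First interval: the initial value scaled by 0.90 .. 1.10.
 * jitter (0..20) stands in for prandom_u32() % 21. */
static int32_t backoff_init(int32_t initial, int32_t jitter)
{
	int64_t r = 90 + jitter;

	return (int32_t)(initial * r / 100);
}

/* Next interval: roughly double the current one (1.90 .. 2.10); once
 * that exceeds the ceiling, use the ceiling scaled by 0.90 .. 1.10. */
static int32_t backoff_update(int32_t cur, int32_t ceiling, int32_t jitter)
{
	int64_t r = 190 + jitter;
	int64_t v = cur * r / 100;

	if (v > ceiling) {
		r -= 100;
		v = ceiling * r / 100;
	}
	return (int32_t)v;
}
```

With a 4000 ms start, a 3600000 ms ceiling and jitter fixed at 10, successive calls give 4000, 8000, 16000 and so on. Near the ceiling the result is the ceiling scaled by the random factor, so it can land up to 10% above or below the configured maximum.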



[PATCH 6/7] ipv6 addrconf: change default RTR_SOLICITATION_MAX_INTERVAL from 4s to 1h

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This changes:
  /proc/sys/net/ipv6/conf/all/router_solicitation_max_interval
  /proc/sys/net/ipv6/conf/default/router_solicitation_max_interval
from 4 seconds to 1 hour.

This is the https://tools.ietf.org/html/rfc7559 recommended default.

Signed-off-by: Maciej Żenczykowski 
---
 include/net/addrconf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 275e5af4c2f4..8f3677269f9a 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -3,7 +3,7 @@
 
 #define MAX_RTR_SOLICITATIONS  3
 #define RTR_SOLICITATION_INTERVAL  (4*HZ)
-#define RTR_SOLICITATION_MAX_INTERVAL  (4*HZ)
+#define RTR_SOLICITATION_MAX_INTERVAL  (3600*HZ)   /* 1 hour */
 
 #define MIN_VALID_LIFETIME (2*3600)/* 2 hours */
 
-- 
2.8.0.rc3.226.g39d4020



[PATCH 2/7] ipv6 addrconf: remove addrconf_sysctl_hop_limit()

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

replace with extra1/2 magic

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 11fa1a5564d4..3a835495fb53 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5467,20 +5467,6 @@ int addrconf_sysctl_forward(struct ctl_table *ctl, int write,
 }
 
 static
-int addrconf_sysctl_hop_limit(struct ctl_table *ctl, int write,
-  void __user *buffer, size_t *lenp, loff_t *ppos)
-{
-   struct ctl_table lctl;
-   int min_hl = 1, max_hl = 255;
-
-   lctl = *ctl;
-   lctl.extra1 = &min_hl;
-   lctl.extra2 = &max_hl;
-
-   return proc_dointvec_minmax(&lctl, write, buffer, lenp, ppos);
-}
-
-static
 int addrconf_sysctl_mtu(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
 {
@@ -5713,6 +5699,9 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
ctl_table *ctl,
return ret;
 }
 
+static int one = 1;
+static int two_five_five = 255;
+
 static const struct ctl_table addrconf_sysctl[] = {
{
.procname   = "forwarding",
@@ -5726,7 +5715,9 @@ static const struct ctl_table addrconf_sysctl[] = {
.data   = &ipv6_devconf.hop_limit,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = addrconf_sysctl_hop_limit,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &one,
+   .extra2 = &two_five_five,
},
{
.procname   = "mtu",
-- 
2.8.0.rc3.226.g39d4020
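
The "extra1/2 magic" amounts to attaching optional lower and upper bounds to a table entry and letting a generic handler enforce them. A simplified userspace model of that idea follows; `struct int_setting` and `setting_write` are hypothetical, and the real proc_dointvec_minmax parses text from user space, which is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for a sysctl table entry. */
struct int_setting {
	int *data;        /* where the value is stored */
	const int *min;   /* lower bound, or NULL for none (cf. extra1) */
	const int *max;   /* upper bound, or NULL for none (cf. extra2) */
};

/* Store val if it lies within the optional bounds; otherwise leave the
 * stored value untouched and return -EINVAL. */
static int setting_write(const struct int_setting *s, int val)
{
	if (s->min && val < *s->min)
		return -EINVAL;
	if (s->max && val > *s->max)
		return -EINVAL;
	*s->data = val;
	return 0;
}
```

A hop_limit-like entry would point `min` at 1 and `max` at 255, while an entry that only needs a floor leaves `max` as NULL.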



[PATCH 3/7] ipv6 addrconf: rtr_solicits == -1 means unlimited

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

This allows setting /proc/sys/net/ipv6/conf/*/router_solicitations
to -1 meaning an unlimited number of retransmits.

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 3a835495fb53..6c63bf06fbcf 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3687,7 +3687,7 @@ static void addrconf_rs_timer(unsigned long data)
if (idev->if_flags & IF_RA_RCVD)
goto out;
 
-   if (idev->rs_probes++ < idev->cnf.rtr_solicits) {
+   if (idev->rs_probes++ < idev->cnf.rtr_solicits || idev->cnf.rtr_solicits == -1) {
write_unlock(&idev->lock);
if (!ipv6_get_lladdr(dev, &lladdr, IFA_F_TENTATIVE))
ndisc_send_rs(dev, &lladdr,
@@ -3949,7 +3949,7 @@ static void addrconf_dad_completed(struct inet6_ifaddr *ifp)
send_mld = ifp->scope == IFA_LINK && ipv6_lonely_lladdr(ifp);
send_rs = send_mld &&
  ipv6_accept_ra(ifp->idev) &&
- ifp->idev->cnf.rtr_solicits > 0 &&
+ ifp->idev->cnf.rtr_solicits != 0 &&
  (dev->flags&IFF_LOOPBACK) == 0;
read_unlock_bh(&ifp->idev->lock);
 
@@ -5099,7 +5099,7 @@ static int inet6_set_iftoken(struct inet6_dev *idev, struct in6_addr *token)
return -EINVAL;
if (!ipv6_accept_ra(idev))
return -EINVAL;
-   if (idev->cnf.rtr_solicits <= 0)
+   if (idev->cnf.rtr_solicits == 0)
return -EINVAL;
 
write_lock_bh(&idev->lock);
@@ -5699,6 +5699,7 @@ int addrconf_sysctl_ignore_routes_with_linkdown(struct 
ctl_table *ctl,
return ret;
 }
 
+static int minus_one = -1;
 static int one = 1;
 static int two_five_five = 255;
 
@@ -5759,7 +5760,8 @@ static const struct ctl_table addrconf_sysctl[] = {
.data   = &ipv6_devconf.rtr_solicits,
.maxlen = sizeof(int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = proc_dointvec_minmax,
+   .extra1 = &minus_one,
},
{
.procname   = "router_solicitation_interval",
-- 
2.8.0.rc3.226.g39d4020
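
The changed conditions in this patch reduce to one rule: 0 disables solicitation, a positive value is a cap, and -1 removes the cap. A minimal sketch of that rule, with a hypothetical helper name and plain ints in place of the kernel's fields:

```c
/* Decide whether another Router Solicitation should be sent.
 * probes_sent: solicitations already sent.
 * limit: configured maximum; -1 means no limit, 0 disables sending. */
static int should_send_rs(int probes_sent, int limit)
{
	return limit == -1 || probes_sent < limit;
}
```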



[PATCH 1/7] ipv6 addrconf: enable use of proc_dointvec_minmax in addrconf_sysctl

2016-09-24 Thread Maciej Żenczykowski
From: Maciej Żenczykowski 

Signed-off-by: Maciej Żenczykowski 
---
 net/ipv6/addrconf.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2f1f5d439788..11fa1a5564d4 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -6044,8 +6044,14 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 
for (i = 0; table[i].data; i++) {
table[i].data += (char *)p - (char *)&ipv6_devconf;
-   table[i].extra1 = idev; /* embedded; no ref */
-   table[i].extra2 = net;
+   /* If one of these is already set, then it is not safe to
+* overwrite either of them: this makes proc_dointvec_minmax
+* usable.
+*/
+   if (!table[i].extra1 && !table[i].extra2) {
+   table[i].extra1 = idev; /* embedded; no ref */
+   table[i].extra2 = net;
+   }
}
 
snprintf(path, sizeof(path), "net/ipv6/conf/%s", dev_name);
-- 
2.8.0.rc3.226.g39d4020



Implement rfc7559 ipv6 router solicitation backoff

2016-09-24 Thread Maciej Żenczykowski



Re: [PATCH] realtek: Add switch variable to 'switch case not processed' messages

2016-09-24 Thread Jean Delvare
Hi Joe, Larry,

On Fri, 23 Sep 2016 12:02:43 -0700, Joe Perches wrote:
> On Fri, 2016-09-23 at 13:59 -0500, Larry Finger wrote:
> > I'm not familiar with the %#x format. What does it do?
> 
> Outputs SPECIAL prefix, it's the same as "0x%x"
> 
> lib/vsprintf.c:
> #define SPECIAL   64  /* prefix hex with "0x", octal with "0" */

Is hexadecimal actually the best way to display these values? I guess it
depends how they are listed in the datasheets (if there's anything like
that for these chips?)

I found it a bit difficult to look up the meaning of the value.
HAL_DEF_WOWLAN is an enum value, the number is not set and there's no
comment. I had to count the line numbers, taking blank lines into
account... I ended up pasting the whole enum to a random C file and
printing the value of HAL_DEF_WOWLAN to make sure it was 92.

Would it make sense to explicitly set the enum values, or add them as
comments, to make such look-ups easier?

-- 
Jean Delvare
SUSE L3 Support
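
The `%#x` question quoted above can be checked quickly in hosted C. This sketch uses the C library's snprintf, not the kernel's lib/vsprintf.c, so it only shows the standard behaviour; the helper names are hypothetical. For the value 92 mentioned above, `%#x` and `0x%x` produce the same text.

```c
#include <stdio.h>
#include <string.h>

/* Format v using the alternate-form flag. */
static void format_hex_alt(char *buf, size_t len, unsigned int v)
{
	snprintf(buf, len, "%#x", v);
}

/* Return 1 if "%#x" and "0x%x" produce identical text for v. */
static int alt_form_matches_prefix(unsigned int v)
{
	char a[32], b[32];

	snprintf(a, sizeof(a), "%#x", v);
	snprintf(b, sizeof(b), "0x%x", v);
	return strcmp(a, b) == 0;
}
```

One caveat for hosted C: the standard only adds the prefix to nonzero results, so the two formats differ for the value 0. The kernel's formatter is a separate implementation and is not exercised by this example.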


Re: [PATCH 0/3] net: fec: updates to align IP header

2016-09-24 Thread Eric Nelson
On 09/24/2016 08:09 AM, Andy Duan wrote:
> From: Eric Nelson  Sent: Saturday, September 24, 2016 10:42 PM
>> To: netdev@vger.kernel.org
>> Cc: li...@arm.linux.org.uk; and...@lunn.ch; Andy Duan
>> ; ota...@ossystems.com.br;
>> eduma...@google.com; troy.ki...@boundarydevices.com;
>> da...@davemloft.net; u.kleine-koe...@pengutronix.de; Eric Nelson
>> 
>> Subject: [PATCH 0/3] net: fec: updates to align IP header
>>
>> This patch series is the outcome of investigation into very high numbers of
>> alignment faults on kernel 4.1.33 from the linux-fslc
>> tree:
>> https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx
>>
>> The first two patches remove support for the receive accelerator (RACC)
>> from the i.MX25 and i.MX27 SoCs which don't support the function.
>>
>> The third patch enables hardware alignment of the ethernet packet payload
>> (and especially the IP header) to prevent alignment faults in the IP stack.
>>
>> Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed on
>> the order of 70k alignment faults during a 100MiB transfer using wget.
>>
>> Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed a
>> much more modest improvement from 10's of faults, and it's not clear why
>> that's the case.
>>
>> Eric Nelson (3):
>>   net: fec: remove QUIRK_HAS_RACC from i.mx25
>>   net: fec: remove QUIRK_HAS_RACC from i.mx27
>>   net: fec: align IP header in hardware
>>
>>  drivers/net/ethernet/freescale/fec_main.c | 15 ---
>>  1 file changed, 12 insertions(+), 3 deletions(-)
>>
>> --
>> 2.7.4
> I will investigate the diff between 4.1 and 4.8. Thanks.
> 

Thanks. Note that I'm not sure if the difference is 4.1 vs. 4.8 or
i.MX6UL vs. i.MX6Q.

> Acked-by: Fugang Duan 
> 



RE: [PATCH 0/3] net: fec: updates to align IP header

2016-09-24 Thread Andy Duan
From: Eric Nelson  Sent: Saturday, September 24, 2016 10:42 PM
> To: netdev@vger.kernel.org
> Cc: li...@arm.linux.org.uk; and...@lunn.ch; Andy Duan
> ; ota...@ossystems.com.br;
> eduma...@google.com; troy.ki...@boundarydevices.com;
> da...@davemloft.net; u.kleine-koe...@pengutronix.de; Eric Nelson
> 
> Subject: [PATCH 0/3] net: fec: updates to align IP header
> 
> This patch series is the outcome of investigation into very high numbers of
> alignment faults on kernel 4.1.33 from the linux-fslc
> tree:
> https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx
> 
> The first two patches remove support for the receive accelerator (RACC)
> from the i.MX25 and i.MX27 SoCs which don't support the function.
> 
> The third patch enables hardware alignment of the ethernet packet payload
> (and especially the IP header) to prevent alignment faults in the IP stack.
> 
> Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed on
> the order of 70k alignment faults during a 100MiB transfer using wget.
> 
> Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed a
> much more modest improvement from 10's of faults, and it's not clear why
> that's the case.
> 
> Eric Nelson (3):
>   net: fec: remove QUIRK_HAS_RACC from i.mx25
>   net: fec: remove QUIRK_HAS_RACC from i.mx27
>   net: fec: align IP header in hardware
> 
>  drivers/net/ethernet/freescale/fec_main.c | 15 ---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> --
> 2.7.4
I will investigate the diff between 4.1 and 4.8. Thanks.

Acked-by: Fugang Duan 


[PATCH 3/3] net: fec: align IP header in hardware

2016-09-24 Thread Eric Nelson
The FEC receive accelerator (RACC) supports shifting the data payload of
received packets by 16-bits, which aligns the payload (IP header) on a
4-byte boundary, which is, if not required, at least strongly suggested
by the Linux networking layer.

Without this patch, a huge number of alignment faults will be taken by the
IP stack, as seen in /proc/cpu/alignment:

~/$ cat /proc/cpu/alignment
User:   0
System: 72645 (inet_gro_receive+0x104/0x27c)
Skipped:0
Half:   0
Word:   0
DWord:  0
Multi:  72645
User faults:3 (fixup+warn)

This patch was suggested by Andrew Lunn in this message to linux-netdev:
http://marc.info/?l=linux-arm-kernel&m=147465452108384&w=2

and adapted from a patch by Russell King from 2014:
http://git.arm.linux.org.uk/cgit/linux-arm.git/commit/?id=70d8a8a

Signed-off-by: Eric Nelson 
---
 drivers/net/ethernet/freescale/fec_main.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 0219e79..1fa2d87 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -180,6 +180,7 @@ MODULE_PARM_DESC(macaddr, "FEC Ethernet MAC address");
 /* FEC receive acceleration */
 #define FEC_RACC_IPDIS (1 << 1)
 #define FEC_RACC_PRODIS(1 << 2)
+#define FEC_RACC_SHIFT16   BIT(7)
 #define FEC_RACC_OPTIONS   (FEC_RACC_IPDIS | FEC_RACC_PRODIS)
 
 /*
@@ -945,9 +946,11 @@ fec_restart(struct net_device *ndev)
 
 #if !defined(CONFIG_M5272)
if (fep->quirks & FEC_QUIRK_HAS_RACC) {
-   /* set RX checksum */
val = readl(fep->hwp + FEC_RACC);
+   /* align IP header */
+   val |= FEC_RACC_SHIFT16;
if (fep->csum_flags & FLAG_RX_CSUM_ENABLED)
+   /* set RX checksum */
val |= FEC_RACC_OPTIONS;
else
val &= ~FEC_RACC_OPTIONS;
@@ -1428,6 +1431,12 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
prefetch(skb->data - NET_IP_ALIGN);
skb_put(skb, pkt_len - 4);
data = skb->data;
+
+#if !defined(CONFIG_M5272)
+   if (fep->quirks & FEC_QUIRK_HAS_RACC)
+   data = skb_pull_inline(skb, 2);
+#endif
+
if (!is_copybreak && need_swap)
swap_buffer(data, pkt_len);
 
-- 
2.7.4
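
The arithmetic behind the 16-bit shift is short enough to write down. The sketch below assumes an untagged Ethernet frame (14-byte header, no VLAN tag) placed in a receive buffer that is itself 4-byte aligned; the names are hypothetical and nothing here touches FEC registers.

```c
#include <stddef.h>

/* Destination MAC (6) + source MAC (6) + EtherType (2). */
#define ETH_HEADER_LEN 14u

/* Offset of the IP header from the start of the receive buffer when
 * the hardware inserts `pad` bytes in front of the frame. */
static size_t ip_header_offset(size_t pad)
{
	return pad + ETH_HEADER_LEN;
}

/* 1 if the IP header starts on a 4-byte boundary, assuming the buffer
 * itself is 4-byte aligned. */
static int ip_header_aligned(size_t pad)
{
	return ip_header_offset(pad) % 4 == 0;
}
```

Without padding the IP header sits at offset 14, which is only 2-byte aligned; with the 2-byte shift it sits at offset 16. The two pad bytes then have to be skipped by the driver, which is what the skb_pull_inline(skb, 2) call in the diff does.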



[PATCH 2/3] net: fec: remove QUIRK_HAS_RACC from i.mx27

2016-09-24 Thread Eric Nelson
According to the i.MX27 reference manual, this SoC does not have support
for the receive accelerator (RACC) register at offset 0x1C4.

http://cache.nxp.com/files/32bit/doc/ref_manual/MCIMX27RM.pdf

Signed-off-by: Eric Nelson 
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index d193406..0219e79 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -92,7 +92,7 @@ static struct platform_device_id fec_devtype[] = {
 		.driver_data = FEC_QUIRK_USE_GASKET,
 	}, {
 		.name = "imx27-fec",
-		.driver_data = FEC_QUIRK_HAS_RACC,
+		.driver_data = 0,
 	}, {
 		.name = "imx28-fec",
 		.driver_data = FEC_QUIRK_ENET_MAC | FEC_QUIRK_SWAP_FRAME |
-- 
2.7.4



[PATCH 1/3] net: fec: remove QUIRK_HAS_RACC from i.mx25

2016-09-24 Thread Eric Nelson
According to the i.MX25 reference manual, this SoC does not have support
for the receive accelerator (RACC) register at offset 0x1C4.

http://www.nxp.com/files/dsp/doc/ref_manual/IMX25RM.pdf

Signed-off-by: Eric Nelson 
---
 drivers/net/ethernet/freescale/fec_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index fb5c638..d193406 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -89,7 +89,7 @@ static struct platform_device_id fec_devtype[] = {
 		.driver_data = 0,
 	}, {
 		.name = "imx25-fec",
-		.driver_data = FEC_QUIRK_USE_GASKET | FEC_QUIRK_HAS_RACC,
+		.driver_data = FEC_QUIRK_USE_GASKET,
 	}, {
 		.name = "imx27-fec",
 		.driver_data = FEC_QUIRK_HAS_RACC,
-- 
2.7.4



[PATCH 0/3] net: fec: updates to align IP header

2016-09-24 Thread Eric Nelson
This patch series is the outcome of investigation into very high
numbers of alignment faults on kernel 4.1.33 from the linux-fslc
tree:
https://github.com/freescale/linux-fslc/tree/4.1-1.0.x-imx

The first two patches remove support for the receive accelerator (RACC) from
the i.MX25 and i.MX27 SoCs which don't support the function.

The third patch enables hardware alignment of the ethernet packet payload
(and especially the IP header) to prevent alignment faults in the IP stack.

Testing on i.MX6UL on the 4.1.33 kernel showed that this patch removed
on the order of 70k alignment faults during a 100MiB transfer using 
wget.

Testing on an i.MX6Q (SABRE Lite) board on net-next (4.8.0-rc7) showed
a much more modest improvement from 10's of faults, and it's not clear
why that's the case.

Eric Nelson (3):
  net: fec: remove QUIRK_HAS_RACC from i.mx25
  net: fec: remove QUIRK_HAS_RACC from i.mx27
  net: fec: align IP header in hardware

 drivers/net/ethernet/freescale/fec_main.c | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

-- 
2.7.4



Re: [PATCH 4/6] isdn/hisax: clean function declaration in hscx.c up

2016-09-24 Thread Sergei Shtylyov

Hello.

On 9/24/2016 8:24 AM, Baoyou Xie wrote:


> We get 1 warning when building kernel with W=1:
> drivers/isdn/hisax/hscx.c:175:1: warning: no previous prototype for 'open_hscxstate' [-Wmissing-prototypes]
>
> In fact, this function is declared in
> drivers/isdn/hisax/elsa_ser.c, but should be
> declard in a header file, thus can be recognized in other file.

   Declared.

> So this patch moves the declaration into drivers/isdn/hisax/hscx.h.
>
> Signed-off-by: Baoyou Xie 

[...]

MBR, Sergei



Re: [PATCH net v2] ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()

2016-09-24 Thread David Miller
From: Lance Richardson 
Date: Fri, 23 Sep 2016 15:50:29 -0400

> Similar to commit 3be07244b733 ("ip6_gre: fix flowi6_proto value in
> xmit path"), set flowi6_proto to IPPROTO_GRE for output route lookup.
> 
> Up until now, ip6gre_xmit_other() has set flowi6_proto to a bogus value.
> This affected output route lookup for packets sent on an ip6gretap device
> in cases where routing was dependent on the value of flowi6_proto.
> 
> Since the correct proto is already set in the tunnel flowi6 template via
> commit 252f3f5a1189 ("ip6_gre: Set flowi6_proto as IPPROTO_GRE in xmit
> path."), simply delete the line setting the incorrect flowi6_proto value.
> 
> Suggested-by: Jiri Benc 
> Fixes: commit c12b395a4664 ("gre: Support GRE over IPv6")
> Reviewed-by: Shmulik Ladkani 
> Signed-off-by: Lance Richardson 
> ---
> v2: expanded commit description as suggested by Shmulik Ladkani.

Applied and queued up for -stable with Fixes tag fixed up.

Thanks.


Re: [PATCH] hv_netvsc: fix comments

2016-09-24 Thread David Miller
From: sthem...@exchange.microsoft.com
Date: Fri, 23 Sep 2016 17:08:17 -0700

> From: Stephen Hemminger 
> 
> Typo's and spelling errors. Also remove old comment from staging era.
> 
> Signed-off-by: Stephen Hemminger 

Applied to net-next.

Please properly specify "[PATCH net-next]" or "[PATCH net]" in your
Subject lines in the future.  Don't make me guess.

Thank you.


Re: [PATCH v2 0/2] BQL support and fix for a regression issue

2016-09-24 Thread David Miller
From: sunil.kovv...@gmail.com
Date: Fri, 23 Sep 2016 14:42:26 +0530

> From: Sunil Goutham 
> 
> These patches add byte queue limit support and also fix a regression
> issue introduced by commit
> 'net: thunderx: Use netdev's name for naming VF's interrupts'
> 
> Changes from v1:
> - As suggested added 'Fixes' tag with commit id of previous commit 
>   which caused issue.
> - Also fixed the missing netdev_tx_reset_queue() function call in 
>   byte queue limits support patch.

Series applied to net-next, thanks.


Re: [PATCH] cxgb4: fix -ve error check on a signed iq

2016-09-24 Thread David Miller
From: Colin King 
Date: Fri, 23 Sep 2016 14:45:13 +0100

> -static unsigned int get_filter_steerq(struct net_device *dev,
> +static int get_filter_steerq(struct net_device *dev,
> struct ch_filter_specification *fs)

If you change the location of the opening parenthesis of the first
line, you must reindent the second line so that the arguments are
placed precisely at the column following that opening parenthesis.
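
As background on why the return type in the quoted hunk matters, here is a
small user-space sketch. It assumes a helper that returns either a queue index
or a negative errno. pick_queue(), store_unsigned() and store_signed() are
hypothetical names and are not the cxgb4 code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical range check: returns the queue index or -ERANGE. */
int pick_queue(int requested, int nqueues)
{
	assert(nqueues > 0);
	if (requested < 0 || requested >= nqueues)
		return -ERANGE;
	return requested;
}

/* Storing the result in an unsigned variable turns -ERANGE into a very
 * large positive number, so a later "iq < 0" test can never be true.
 */
unsigned int store_unsigned(int requested, int nqueues)
{
	unsigned int iq = pick_queue(requested, nqueues);

	return iq;
}

/* With a signed variable the negative error code survives. */
int store_signed(int requested, int nqueues)
{
	int iq = pick_queue(requested, nqueues);

	return iq;
}
```

Only the signed variant lets the caller detect the failure with an ordinary
"less than zero" check.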


Re: [PATCH] mlxsw: spectrum: remove redundant check if err is zero

2016-09-24 Thread David Miller
From: Colin King 
Date: Fri, 23 Sep 2016 12:02:45 +0100

> From: Colin Ian King 
> 
> There is an earlier check and return if err is non-zero, so
> the check to see if it is zero is redundant in every iteration
> of the loop and hence the check can be removed.
> 
> Signed-off-by: Colin Ian King 

Applied to net-next.
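
The pattern described in the commit message can be sketched in user space as
follows. apply_one() and apply_all() are hypothetical names and this is not
the mlxsw code; it only shows why a zero check after an early return on error
is redundant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-item operation: fails (non-zero) on negative input. */
int apply_one(int item)
{
	return item < 0 ? -1 : 0;
}

/* Returns the first error, or 0. *applied counts the successful items. */
int apply_all(const int *items, size_t n, size_t *applied)
{
	size_t i;
	int err;

	assert(items != NULL && applied != NULL);
	*applied = 0;
	for (i = 0; i < n; i++) {
		err = apply_one(items[i]);
		if (err)
			return err;
		/* err is known to be 0 here, so "if (!err)" adds nothing */
		(*applied)++;
	}
	return 0;
}
```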


Re: Alignment issues with freescale FEC driver

2016-09-24 Thread Eric Nelson
Hi David,

On 09/23/2016 07:43 PM, David Miller wrote:
> From: Eric Nelson 
> Date: Fri, 23 Sep 2016 10:33:29 -0700
> 
>> Since the hardware requires longword alignment for its DMA transfers,
>> aligning the IP header will require a memcpy, right?
> 
> I wish hardware designers didn't do this.
> 
> There is no conflict between DMA alignment and properly offsetting
> the packet data by two bytes.
> 
> All hardware designers have to do is allow 2 padding bytes to be
> emitted by the chip before the actual packet data.
> 

Andrew Lunn pointed out that the hardware does support this,
and I just pushed a patch for the vendor kernel to the meta-freescale
mailing list:

https://lists.yoctoproject.org/pipermail/meta-freescale/2016-September/019228.html

> Then the longword or whatever DMA transfer alignment is met
> whilst still giving the necessary flexibility for where the
> packet data lands.
> 

Right. A relatively small change fixes things right up.

Many thanks to Andrew for pointing this out and Russell for providing
the basis for my patch.

I'll re-work this for the up-stream kernel when I get out from
under a couple of unrelated things.
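
The arithmetic behind the two padding bytes can be written out as a short
sketch. It assumes a standard 14-byte Ethernet header without VLAN tags and a
DMA buffer that starts on a 4-byte boundary. ip_header_offset() and
ip_header_aligned() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HEADER_LEN 14	/* dst MAC + src MAC + EtherType */

/* Offset of the IP header from a 4-byte-aligned buffer start, given
 * 'pad' bytes emitted by the hardware before the frame.
 */
size_t ip_header_offset(size_t pad)
{
	assert(pad <= 3);	/* the sketch only models sub-word padding */
	return pad + ETH_HEADER_LEN;
}

/* Non-zero when the IP header would land on a 4-byte boundary. */
int ip_header_aligned(size_t pad)
{
	return ip_header_offset(pad) % 4 == 0;
}
```

With no padding the IP header starts at offset 14, which is not a multiple of
four; with two pad bytes it starts at offset 16, which is.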


Re: [PATCH net-next] Documentation: devicetree: fix typo in MediaTek ethernet device-tree binding

2016-09-24 Thread David Miller
From: 
Date: Fri, 23 Sep 2016 14:09:32 +0800

> From: Sean Wang 
> 
> fix typo in
> Documentation/devicetree/bindings/net/mediatek-net.txt
> 
> Cc: devicet...@vger.kernel.org
> Reported-by: Sergei Shtylyov 
> Signed-off-by: Sean Wang 

Applied.


Re: [PATCH net-next v2] Documentation: devicetree: revise ethernet device-tree binding about TRGMII

2016-09-24 Thread David Miller
From: 
Date: Fri, 23 Sep 2016 14:04:09 +0800

> From: Sean Wang 
> 
> add phy-mode "trgmii" to
> Documentation/devicetree/bindings/net/ethernet.txt
> 
> Cc: devicet...@vger.kernel.org
> Reported-by: Sergei Shtylyov 
> Signed-off-by: Sean Wang 

Applied.


Re: [PATCH net-next 00/15] rxrpc: Bug fixes and tracepoints

2016-09-24 Thread David Miller
From: David Howells 
Date: Fri, 23 Sep 2016 16:15:17 +0100

> Here are a bunch of bug fixes:
 ...
> Tagged thusly:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
>   rxrpc-rewrite-20160923

Pulled, thanks David.


Re: [net-next 00/10][pull request] 10GbE Intel Wired LAN Driver Updates 2016-09-23

2016-09-24 Thread David Miller
From: Jeff Kirsher 
Date: Fri, 23 Sep 2016 00:51:33 -0700

> This series contains updates to ixgbe and ixgbevf.

Pulled, thanks Jeff.


Re: pull request (net-next): ipsec-next 2016-09-23

2016-09-24 Thread David Miller
From: Steffen Klassert 
Date: Fri, 23 Sep 2016 09:14:40 +0200

> Only two patches this time:
> 
> 1) Fix a comment reference to struct xfrm_replay_state_esn.
>From Richard Guy Briggs.
> 
> 2) Convert xfrm_state_lookup to rcu, we don't need the
>xfrm_state_lock anymore in the input path.
>From Florian Westphal.
> 
> Please pull or let me know if there are problems.

Pulled, thanks Steffen.


Re: [net-next v2 00/10][pull request] 40GbE Intel Wired LAN Driver Updates 2016-09-22

2016-09-24 Thread David Miller
From: Jeff Kirsher 
Date: Thu, 22 Sep 2016 22:45:32 -0700

> This series contains updates to i40e and i40evf only.

Pulled, thanks Jeff.


Re: [PATCH net-next V3 0/5] mlx4 VF vlan protocol 802.1ad support

2016-09-24 Thread David Miller
From: Tariq Toukan 
Date: Thu, 22 Sep 2016 12:11:11 +0300

> This patchset adds VF VLAN protocol 802.1ad support to the
> mlx4 driver.
> We extended the VF VLAN API with an additional parameter
> for VLAN protocol, and kept 802.1Q as drivers' default.
> 
> We prepared a userspace support (ip link tool).
> The patch will be submitted to the iproute2 mailing list.
> 
> The ip link tool VF VLAN protocol parameter is optional (default: 802.1Q).
> A configuration command of VF VLAN that is used prior to this patchset
> will result in same functionality as today's (VST with VLAN protocol 802.1Q).
> 
> The series generated against net-next commit:
> 688dc5369a63 "Merge branch 'mlx4-next'"
> 
> All maintainers of the modified modules are in cc.

Series applied, thanks.


Re: [Intel-wired-lan] [PATCH net-next v2 1/2] i40e: remove superfluous I40E_DEBUG_USER statement

2016-09-24 Thread Stefan Assmann
On 24.09.2016 04:48, Alexander Duyck wrote:
> On Fri, Sep 23, 2016 at 6:30 AM, Stefan Assmann  wrote:
>> This debug statement is confusing and never set in the code. Any debug
>> output should be guarded by the proper I40E_DEBUG_* statement which can
>> be enabled via the debug module parameter or ethtool.
>> Remove or convert the I40E_DEBUG_USER cases to I40E_DEBUG_INIT.
>>
>> v2: re-add setting the debug_mask in i40e_set_msglevel() so that the
>> debug level can still be altered via ethtool msglvl.
>>
>> Signed-off-by: Stefan Assmann 
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 ---
>>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 -
>>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  3 +--
>>  drivers/net/ethernet/intel/i40e/i40e_main.c| 35 
>> +-
>>  drivers/net/ethernet/intel/i40e/i40e_type.h|  2 --
>>  5 files changed, 18 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_common.c
>> index 2154a34..8ccb09c 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_common.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
>> @@ -3207,9 +3207,6 @@ static void i40e_parse_discover_capabilities(struct 
>> i40e_hw *hw, void *buff,
>> break;
>> case I40E_AQ_CAP_ID_MSIX:
>> p->num_msix_vectors = number;
>> -   i40e_debug(hw, I40E_DEBUG_INIT,
>> -  "HW Capability: MSIX vector count = %d\n",
>> -  p->num_msix_vectors);
>> break;
>> case I40E_AQ_CAP_ID_VF_MSIX:
>> p->num_msix_vectors_vf = number;
> 
> I'm assuming this is dropped because you considered it redundant with
> the dump in i40e_get_capabilities.  If so it would have been nice to
> see this called out in your patch description somewhere as it doesn't
> jive with the rest of the patch since you are stripping something that
> is using I40E_DEBUG_INIT.

Hi Alex,

agreed, it seemed redundant. I'll make a note about it in the next
version when we have decided how to proceed.

>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>> index 05cf9a7..e9c6f1c 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
>> @@ -1210,12 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file 
>> *filp,
>> u32 level;
>> cnt = sscanf(&cmd_buf[10], "%i", &level);
>> if (cnt) {
>> -   if (I40E_DEBUG_USER & level) {
>> -   pf->hw.debug_mask = level;
>> -   dev_info(&pf->pdev->dev,
>> -"set hw.debug_mask = 0x%08x\n",
>> -pf->hw.debug_mask);
>> -   }
>> pf->msg_enable = level;
>> dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
>>  pf->msg_enable);
> 
> From what I can tell the interface is completely redundant as ethtool
> can already do this.  I'd say it is okay to just remove this command
> and section entirely from the debugfs interface.

Yes, I didn't want to stray too far from what the description said and
just removed the I40E_DEBUG_USER related code.

>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
>> b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>> index 1835186..02f55ab 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
>> @@ -987,8 +987,7 @@ static void i40e_set_msglevel(struct net_device *netdev, 
>> u32 data)
>> struct i40e_netdev_priv *np = netdev_priv(netdev);
>> struct i40e_pf *pf = np->vsi->back;
>>
>> -   if (I40E_DEBUG_USER & data)
>> -   pf->hw.debug_mask = data;
>> +   pf->hw.debug_mask = data;
>> pf->msg_enable = data;
>>  }
>>
> 
> So the way I view this is that I40E_DEBUG_USER appears to be a flag
> that is being used to differentiate between some proprietary flags and
> the standard msg level.  The problem is that msg_enable and debug_mask
> are playing off of two completely different bit definitions.  For
> example how much sense does it make for NETIF_F_MSG_TX_DONE to map to
> I40E_DEBUG_DCB.  If anything what should probably happen here is
> instead of dropping the if there probably needs to be an else.

As you said the flags don't match, which is part of the problem. What
tipped me off to start working on this is that the debug module
parameter doesn't do a thing atm, and I had to debug some stuff during
driver MSI-X initialization. So my main pain point here is to get the
debug parameter into a sane state.

> 

Re: [PATCH] mlx5: Add ndo_poll_controller() implementation

2016-09-24 Thread Saeed Mahameed
On Fri, Sep 23, 2016 at 11:13 PM, Calvin Owens  wrote:
> This implements ndo_poll_controller in net_device_ops for mlx5, which is
> necessary to use netconsole with this driver.
>
> Signed-off-by: Calvin Owens 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 2459c7f..439476f 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -2786,6 +2786,20 @@ static void mlx5e_tx_timeout(struct net_device *dev)
> schedule_work(&priv->tx_timeout_work);
>  }
>
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +/* Fake "interrupt" called by netpoll (eg netconsole) to send skbs without
> + * reenabling interrupts.
> + */
> +static void mlx5e_netpoll(struct net_device *dev)
> +{
> +   struct mlx5e_priv *priv = netdev_priv(dev);
> +   int i, nr_sq = priv->params.num_channels * priv->params.num_tc;
> +
> +   for (i = 0; i < nr_sq; i++)
> +   napi_schedule(priv->txq_to_sq_map[i]->cq.napi);

Hi Calvin,

Basically all CQs on the same channel are sharing the same napi, so
here you will end up calling napi_schedule more than once for each
napi (channel).
Iterating over the SQ map is irrelevant here; all you need to do is
iterate over the channels:

 for (i = 0; i < priv->params.num_channels; i++)
napi_schedule(priv->channel[i]->napi);


Thanks,
Saeed.

> +}
> +#endif
> +
>  static const struct net_device_ops mlx5e_netdev_ops_basic = {
> .ndo_open= mlx5e_open,
> .ndo_stop= mlx5e_close,
> @@ -2805,6 +2819,9 @@ static const struct net_device_ops 
> mlx5e_netdev_ops_basic = {
> .ndo_rx_flow_steer   = mlx5e_rx_flow_steer,
>  #endif
> .ndo_tx_timeout  = mlx5e_tx_timeout,
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +   .ndo_poll_controller = mlx5e_netpoll,
> +#endif
>  };
>
>  static const struct net_device_ops mlx5e_netdev_ops_sriov = {
> @@ -2836,6 +2853,9 @@ static const struct net_device_ops 
> mlx5e_netdev_ops_sriov = {
> .ndo_set_vf_link_state   = mlx5e_set_vf_link_state,
> .ndo_get_vf_stats= mlx5e_get_vf_stats,
> .ndo_tx_timeout  = mlx5e_tx_timeout,
> +#ifdef CONFIG_NET_POLL_CONTROLLER
> +   .ndo_poll_controller = mlx5e_netpoll,
> +#endif
>  };
>
>  static int mlx5e_check_required_hca_cap(struct mlx5_core_dev *mdev)
> --
> 2.9.3
>
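
To make the review comment above concrete, here is a small user-space model
that counts how often each channel's NAPI context would be scheduled. struct
model and both poll functions are hypothetical. The mapping from SQ index to
channel (index modulo the number of channels) is an assumption of the sketch
and not a statement about the mlx5 data structures.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHANNELS 8

/* Hypothetical model: one NAPI context per channel, shared by that
 * channel's send queues (one per traffic class).
 */
struct model {
	int num_channels;
	int num_tc;
	int scheduled[MAX_CHANNELS];	/* schedule calls seen per channel */
};

/* Loop over every send queue, as in the patch under review. */
void poll_per_sq(struct model *m)
{
	int i, nr_sq;

	assert(m != NULL && m->num_channels > 0 &&
	       m->num_channels <= MAX_CHANNELS);
	nr_sq = m->num_channels * m->num_tc;
	for (i = 0; i < nr_sq; i++)
		m->scheduled[i % m->num_channels]++;	/* assumed mapping */
}

/* Loop over the channels only, as suggested in the reply. */
void poll_per_channel(struct model *m)
{
	int i;

	assert(m != NULL && m->num_channels > 0 &&
	       m->num_channels <= MAX_CHANNELS);
	for (i = 0; i < m->num_channels; i++)
		m->scheduled[i]++;
}
```

With 4 channels and 2 traffic classes, the per-SQ loop schedules each channel
twice, while the per-channel loop schedules each exactly once.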


Re: [PATCH] brcmfmac: fix memory leak in brcmf_fill_bss_param

2016-09-24 Thread Kalle Valo
Rafał Miłecki  writes:

> From: Rafał Miłecki 
>
> This function is called from get_station callback which means that every
> time user space was getting/dumping station(s) we were leaking 2 KiB.
>
> Signed-off-by: Rafał Miłecki 
> Fixes: 1f0dc59a6de ("brcmfmac: rework .get_station() callback")
> Cc: sta...@vger.kernel.org # 4.2+
> ---
> Kalle, ideally this should go as 4.8 fix, but I'm aware it's quite late.
> If you are not planning to send another pull request, just get it for
> the next release and let's let stable guys backport it.

An old memory leak is not severe enough for 4.8 at this stage, so I'll
queue this to 4.9.
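
For readers unfamiliar with this class of bug, here is a generic user-space
sketch of the usual fix pattern: a single exit path that frees the temporary
buffer on success and on error. It is not the brcmfmac patch. fill_param(),
query_info() and the live_buffers counter are hypothetical and exist only to
make the behaviour observable.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INFO_BUF_LEN 2048	/* matches the 2 KiB mentioned above */

int live_buffers;	/* outstanding allocations, for the sketch only */

/* Hypothetical query; fails when 'ok' is zero. */
int query_info(unsigned char *buf, size_t len, int ok)
{
	memset(buf, 0, len);
	return ok ? 0 : -1;
}

int fill_param(int ok, unsigned char *first_byte)
{
	unsigned char *buf;
	int err;

	assert(first_byte != NULL);
	buf = malloc(INFO_BUF_LEN);
	if (!buf)
		return -1;
	live_buffers++;

	err = query_info(buf, INFO_BUF_LEN, ok);
	if (err)
		goto out;
	*first_byte = buf[0];
out:
	/* single exit: the buffer is released on success and on error */
	free(buf);
	live_buffers--;
	return err;
}
```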

BTW, either my Gnus or my SMTP server (I haven't bothered to check yet
why exactly) doesn't like the names with style of "(open list:NETWORKING
DRIVERS)" in the CC list, I have to edit them away every time I reply.
Does anyone have any ideas why that's happening just to me?

-- 
Kalle Valo


Re: [PATCH 1/6] isdn/eicon: add function declarations

2016-09-24 Thread Arnd Bergmann
On Saturday, September 24, 2016 1:16:44 PM CEST Baoyou Xie wrote:
> We get a few warnings when building kernel with W=1:
> drivers/isdn/hardware/eicon/diddfunc.c:95:12: warning: no previous prototype for 'diddfunc_init' [-Wmissing-prototypes]
> drivers/isdn/hardware/eicon/s_4bri.c:128:6: warning: no previous prototype for 'start_qBri_hardware' [-Wmissing-prototypes]
> drivers/isdn/hardware/eicon/idifunc.c:243:12: warning: no previous prototype for 'idifunc_init' [-Wmissing-prototypes]
> drivers/isdn/hardware/eicon/capifunc.c:217:6: warning: no previous prototype for 'api_remove_complete' [-Wmissing-prototypes]
> 
> 
> In fact, these functions need to be declared in some header files.
> 
> So this patch adds function declarations in
> drivers/isdn/hardware/eicon/di_defs.h,
> drivers/isdn/hardware/eicon/capifunc.h,
> drivers/isdn/hardware/eicon/xdi_adapter.h.
> 
> Signed-off-by: Baoyou Xie 

Nice cleanup!

> 
> diff --git a/drivers/isdn/hardware/eicon/capifunc.c 
> b/drivers/isdn/hardware/eicon/capifunc.c
> index 7a0bdbd..869b98e 100644
> --- a/drivers/isdn/hardware/eicon/capifunc.c
> +++ b/drivers/isdn/hardware/eicon/capifunc.c
> @@ -55,9 +55,6 @@ static void diva_release_appl(struct capi_ctr *, __u16);
>  static char *diva_procinfo(struct capi_ctr *);
>  static u16 diva_send_message(struct capi_ctr *,
>diva_os_message_buffer_s *);
> -extern void diva_os_set_controller_struct(struct capi_ctr *);
> -
> -extern void DIVA_DIDD_Read(DESCRIPTOR *, int);
>  
>  /*
>   * debug

There are a couple of other 'extern' declarations in this file,
please do them all at once.

Note that there are also some extern declarations for variables in
the .c files of this driver, so it makes sense to do the variables
and the function declarations at the same time.


> diff --git a/drivers/isdn/hardware/eicon/diva.c 
> b/drivers/isdn/hardware/eicon/diva.c
> index d91dd58..9693add 100644
> --- a/drivers/isdn/hardware/eicon/diva.c
> +++ b/drivers/isdn/hardware/eicon/diva.c
> @@ -28,8 +28,6 @@
>  
>  PISDN_ADAPTER IoAdapters[MAX_ADAPTER];
>  extern IDI_CALL Requests[MAX_ADAPTER];
> -extern int create_adapter_proc(diva_os_xdi_adapter_t *a);
> -extern void remove_adapter_proc(diva_os_xdi_adapter_t *a);

Requests[] is another such example. This is particularly bad,
because the name is extremely generic, and can cause conflicts
when another driver uses the same identifier for a global symbol.

Ideally it should be renamed to 'diva_requests'.

> --- a/drivers/isdn/hardware/eicon/divasproc.c
> +++ b/drivers/isdn/hardware/eicon/divasproc.c
> @@ -34,8 +34,6 @@
>  
>  
>  extern PISDN_ADAPTER IoAdapters[MAX_ADAPTER];
> -extern void divas_get_version(char *);
> -extern void diva_get_vserial_number(PISDN_ADAPTER IoAdapter, char *buf);
>  

same for IoAdapters.

>  static void diva_get_extended_adapter_features(DIVA_CAPI_ADAPTER *a);
> @@ -224,20 +223,10 @@ static void diva_free_dma_descriptor(PLCI *plci, int 
> nr);
>  /* external function prototypes */
>  /*--*/
>  
> -extern byte MapController(byte);
>  extern byte UnMapController(byte);


The comment "external function prototypes" should be removed along with the
actual prototypes.

>  #define MapId(Id)(((Id) & 0xff00L) | MapController((byte)(Id)))
>  #define UnMapId(Id)(((Id) & 0xff00L) | UnMapController((byte)(Id)))

and probably the macros can get moved as well for consistency.

> -extern int diva_card_read_xlog(diva_os_xdi_adapter_t *a);
> -
>  /*
>  **  IMPORTS
>  */
> -extern void prepare_pri_functions(PISDN_ADAPTER IoAdapter);
> -extern void prepare_pri2_functions(PISDN_ADAPTER IoAdapter);
> -extern void diva_xdi_display_adapter_features(int card);
> -

Another comment that should go.

Arnd




Re: [PATCH 4/6] isdn/hisax: clean function declaration in hscx.c up

2016-09-24 Thread Arnd Bergmann
On Saturday, September 24, 2016 1:24:22 PM CEST Baoyou Xie wrote:
>  }
>  
> -extern int open_hscxstate(struct IsdnCardState *cs, struct BCState *bcs);
>  extern void modehscx(struct BCState *bcs, int mode, int bc);
>  extern void hscx_l2l1(struct PStack *st, int pr, void *arg);
>  

The change makes sense, but I would remove the other two declarations
as well, as extern declarations don't belong into .c files.

As far as I can tell, modehscx() already has a declaration in hscx.h,
while hscx_l2l1() doesn't, and the declaration here should be
moved as well.

> diff --git a/drivers/isdn/hisax/hscx.h b/drivers/isdn/hisax/hscx.h
> index 1148b4b..fa7bf16 100644
> --- a/drivers/isdn/hisax/hscx.h
> +++ b/drivers/isdn/hisax/hscx.h
> @@ -39,3 +39,4 @@ extern void modehscx(struct BCState *bcs, int mode, int bc);
>  extern void clear_pending_hscx_ints(struct IsdnCardState *cs);
>  extern void inithscx(struct IsdnCardState *cs);
>  extern void inithscxisac(struct IsdnCardState *cs, int part);
> +int open_hscxstate(struct IsdnCardState *cs, struct BCState *bcs);

For consistency, I would add 'extern' here. We normally leave that out,
but I think if there are lots of declarations in a header file that all
have it, it's better if they are all the same.

Arnd


Re: [PATCH 3/6] isdn/hisax: add function declarations

2016-09-24 Thread Arnd Bergmann
On Saturday, September 24, 2016 1:21:47 PM CEST Baoyou Xie wrote:
> --- a/drivers/isdn/hisax/config.c
> +++ b/drivers/isdn/hisax/config.c
> @@ -460,42 +460,14 @@ __setup("hisax=", HiSax_setup);
>  extern int setup_teles0(struct IsdnCard *card);
>  #endif
>  
> -#if CARD_TELES3
> -extern int setup_teles3(struct IsdnCard *card);
> -#endif
...
> -#if CARD_TELESPCI
> -extern int setup_telespci(struct IsdnCard *card);
> -#endif
> -
>  #if CARD_AVM_A1
>  extern int setup_avm_a1(struct IsdnCard *card);
>  #endif

It seems odd that you remove some but not all declarations
here. Please do all of them at once.

> @@ -1350,3 +1350,63 @@ static inline struct pci_dev 
> *hisax_find_pci_device(unsigned int vendor,
>  }
>  
>  #endif
> +
> +#if CARD_TELES3
> +int setup_teles3(struct IsdnCard *card);
> +#endif
> +
> +#if CARD_TELESPCI
> +int setup_telespci(struct IsdnCard *card);
> +#endif
> +

When you add the declarations here, just leave out the #if guards,
and put all the declarations here unconditionally, as we normally
do in the kernel.

Arnd
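
A compact sketch of the convention described above: prototypes are declared
unconditionally, and only the call site depends on the configuration. The
CARD_* values, struct card and probe() are hypothetical. A function that is
declared but never defined or called does not cause a build or link error.

```c
#include <assert.h>
#include <stddef.h>

#define CARD_TELES3	1	/* built in */
#define CARD_TELESPCI	0	/* not built in this configuration */

struct card {
	int type;
};

/* "Header" part: every prototype is declared unconditionally. */
int setup_teles3(struct card *card);
int setup_telespci(struct card *card);	/* declared, never defined here */

/* "Driver" part: only the configured card has a definition. */
int setup_teles3(struct card *card)
{
	assert(card != NULL);
	card->type = 3;
	return 1;
}

/* Callers still select at compile time which setup routine to use. */
int probe(struct card *card)
{
#if CARD_TELES3
	return setup_teles3(card);
#else
	(void)card;
	return 0;
#endif
}
```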



Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-09-24 Thread Pavel Machek
On Tue 2016-09-20 19:08:23, Mickaël Salaün wrote:
> 
> On 15/09/2016 11:19, Pavel Machek wrote:
> > Hi!
> > 
> >> This series is a proof of concept to fill some missing parts of seccomp, such
> >> as the ability to check syscall argument pointers or to create more dynamic
> >> security policies. The goal of this new stackable Linux Security Module (LSM)
> >> called Landlock is to allow any process, including unprivileged ones, to
> >> create powerful security sandboxes comparable to the Seatbelt/XNU Sandbox or
> >> the OpenBSD Pledge. This kind of sandbox helps to mitigate the security
> >> impact of bugs or unexpected/malicious behaviors in userland applications.
> >>
> >> The first RFC [1] was focused on extending seccomp while staying at the
> >> syscall level. This brought a working PoC but with some (mitigated) ToCToU
> >> race conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> >> syscall argument evaluation (hence the LSM hooks).
> > 
> > Long and nice description follows. Should it go to Documentation/
> > somewhere?
> > 
> > Because some documentation would be useful...
> 
> Right, but I was looking for feedback before investing in documentation. :)

Heh. And I was hoping to learn what I'm reviewing. Too bad :-).

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH 2/3] bpf powerpc: implement support for tail calls

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 12:33:54AM +0200, Daniel Borkmann wrote:
> On 09/23/2016 10:35 PM, Naveen N. Rao wrote:
> >Tail calls allow JIT'ed eBPF programs to call into other JIT'ed eBPF
> >programs. This can be achieved either by:
> >(1) retaining the stack setup by the first eBPF program and having all
> >subsequent eBPF programs re-using it, or,
> >(2) by unwinding/tearing down the stack and having each eBPF program
> >deal with its own stack as it sees fit.
> >
> >To ensure that this does not create loops, there is a limit to how many
> >tail calls can be done (currently 32). This requires the JIT'ed code to
> >maintain a count of the number of tail calls done so far.
> >
> >Approach (1) is simple, but requires every eBPF program to have (almost)
> >the same prologue/epilogue, regardless of whether they need it. This is
> >inefficient for small eBPF programs which may not sometimes need a
> >prologue at all. As such, to minimize impact of tail call
> >implementation, we use approach (2) here which needs each eBPF program
> >in the chain to use its own prologue/epilogue. This is not ideal when
> >many tail calls are involved and when all the eBPF programs in the chain
> >have similar prologue/epilogue. However, the impact is restricted to
> >programs that do tail calls. Individual eBPF programs are not affected.
> >
> >We maintain the tail call count in a fixed location on the stack and
> >updated tail call count values are passed in through this. The very
> >first eBPF program in a chain sets this up to 0 (the first 2
> >instructions). Subsequent tail calls skip the first two eBPF JIT
> >instructions to maintain the count. For programs that don't do tail
> >calls themselves, the first two instructions are NOPs.
> >
> >Signed-off-by: Naveen N. Rao 
> 
> Thanks for adding support, Naveen, that's really great! I think 2) seems
> fine as well in this context as prologue size can vary quite a bit here,
> and depending on program types likelihood of tail call usage as well (but
> I wouldn't expect deep nesting). Thanks a lot!

Great stuff. In these circumstances approach 2 makes sense to me as well.
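
The tail call limit described in the quoted commit message can be modelled in
plain C as below. This is a simplified sketch, not JIT output: a real tail call
replaces the current program rather than nesting a function call, and the exact
boundary check in the kernel may differ. struct ctx, tail_call() and
run_chain() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAIL_CALLS 32	/* limit quoted in the commit message */

struct ctx {
	int tail_calls;	/* count carried from program to program */
	int runs;	/* programs actually entered */
};

typedef void (*prog_fn)(struct ctx *);

/* Jump to 'next' unless the chain has used up its budget. */
void tail_call(struct ctx *c, prog_fn next)
{
	assert(c != NULL && next != NULL);
	if (c->tail_calls >= MAX_TAIL_CALLS)
		return;		/* limit hit: the caller simply continues */
	c->tail_calls++;
	next(c);
}

/* A program that always tail-calls itself; unbounded, it would loop. */
void self_calling_prog(struct ctx *c)
{
	c->runs++;
	tail_call(c, self_calling_prog);
}

/* Entry point: the first program in a chain starts the count at 0. */
int run_chain(prog_fn first)
{
	struct ctx c = { 0, 0 };

	first(&c);
	return c.runs;
}
```

In this model a self-referencing chain stops after the entry program plus 32
tail calls, which is the loop protection the counter is there to provide.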



Re: [PATCH 2/2] bpf samples: update tracex5 sample to use __seccomp_filter

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 02:10:05AM +0530, Naveen N. Rao wrote:
> seccomp_phase1() does not exist anymore. Instead, update sample to use
> __seccomp_filter(). While at it, set max locked memory to unlimited.
> 
> Signed-off-by: Naveen N. Rao 

Acked-by: Alexei Starovoitov 



Re: [PATCH 1/2] bpf samples: fix compiler errors with sockex2 and sockex3

2016-09-24 Thread Alexei Starovoitov
On Sat, Sep 24, 2016 at 02:10:04AM +0530, Naveen N. Rao wrote:
> These samples fail to compile as 'struct flow_keys' conflicts with
> definition in net/flow_dissector.h. Fix the same by renaming the
> structure used in the sample.
> 
> Signed-off-by: Naveen N. Rao 

Thanks for the fix.
Acked-by: Alexei Starovoitov