Re: [PATCH net 0/4] macvlan: Fix some issues with changing mac addresses

2017-06-16 Thread Girish Moodalbail

On 6/16/17 6:36 AM, Vladislav Yasevich wrote:

There are some issues in macvlan with respect to changing its mac address.
* An error is returned if the specified address is the same as an already
  assigned address.
* In passthru mode, the mac address of the macvlan device doesn't change.
* After changing the mac address of a passthru macvlan and then removing it,
  the mac address of the physical device remains changed.

This patch series attempts to resolve these issues.

Thanks
-vlad

Vladislav Yasevich (4):
  macvlan: Do not return error when setting the same mac address
  macvlan: Fix passthru macvlan mac address inheritance
  macvlan: convert port passthru to flags.


The above 3 patches look good to me, so

Reviewed-by: Girish Moodalbail 



  macvlan: Let passthru macvlan correctly restore lower mac address


However, I have a few questions/comments on the above patch.

thanks,
~Girish



 drivers/net/macvlan.c | 85 ++-
 1 file changed, 71 insertions(+), 14 deletions(-)





Re: [PATCH net 4/4] macvlan: Let passthru macvlan correctly restore lower mac address

2017-06-16 Thread Girish Moodalbail
Sorry, it took some time to wrap my head around this patch series, since the patches 
all change one file and at times the same function :).



On 6/16/17 6:36 AM, Vladislav Yasevich wrote:

Passthru macvlans directly change the mac address of the lower
level device.  That's OK, but after the macvlan is deleted,
the lower device is left with changed address and one needs to
reboot to bring back the origina HW addresses.


s/origina/original/



This scenario is actually quite common with passthru macvtap devices.

This patch attempts to solve this by storing the mac address
of the lower device in the macvlan_port structure and keeping track of
it through the changes.

After this patch, any changes to the lower device mac address
done through the macvlan device will be reverted.  Any
changes done directly to the lower device mac address will be kept.

Signed-off-by: Vladislav Yasevich 
---
 drivers/net/macvlan.c | 47 ---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c
index eb956ff..c551165 100644
--- a/drivers/net/macvlan.c
+++ b/drivers/net/macvlan.c
@@ -40,6 +40,7 @@
 #define MACVLAN_BC_QUEUE_LEN   1000

 #define MACVLAN_F_PASSTHRU 1
+#define MACVLAN_F_ADDRCHANGE   2

 struct macvlan_port {
struct net_device   *dev;
@@ -51,6 +52,7 @@ struct macvlan_port {
int count;
struct hlist_head   vlan_source_hash[MACVLAN_HASH_SIZE];
DECLARE_BITMAP(mc_filter, MACVLAN_MC_FILTER_SZ);
+   unsigned char   perm_addr[ETH_ALEN];
 };

 struct macvlan_source_entry {
@@ -78,6 +80,21 @@ static inline void macvlan_set_passthru(struct macvlan_port *port)
port->flags |= MACVLAN_F_PASSTHRU;
 }

+static inline bool macvlan_addr_change(const struct macvlan_port *port)
+{
+   return port->flags & MACVLAN_F_ADDRCHANGE;
+}
+
+static inline void macvlan_set_addr_change(struct macvlan_port *port)
+{
+   port->flags |= MACVLAN_F_ADDRCHANGE;
+}
+
+static inline void macvlan_clear_addr_change(struct macvlan_port *port)
+{
+   port->flags &= ~MACVLAN_F_ADDRCHANGE;
+}
+
 /* Hash Ethernet address */
 static u32 macvlan_eth_hash(const unsigned char *addr)
 {
@@ -193,11 +210,11 @@ static void macvlan_hash_change_addr(struct macvlan_dev *vlan,
 static bool macvlan_addr_busy(const struct macvlan_port *port,
  const unsigned char *addr)
 {
-   /* Test to see if the specified multicast address is
+   /* Test to see if the specified address is
 * currently in use by the underlying device or
 * another macvlan.
 */
-   if (!macvlan_passthru(port) &&
+   if (!macvlan_passthru(port) && !macvlan_addr_change(port) &&
ether_addr_equal_64bits(port->dev->dev_addr, addr))
return true;

@@ -685,6 +702,7 @@ static int macvlan_sync_address(struct net_device *dev, unsigned char *addr)
 {
struct macvlan_dev *vlan = netdev_priv(dev);
struct net_device *lowerdev = vlan->lowerdev;
+   struct macvlan_port *port = vlan->port;
int err;

if (!(dev->flags & IFF_UP)) {
@@ -695,7 +713,7 @@ static int macvlan_sync_address(struct net_device *dev, unsigned char *addr)
if (macvlan_addr_busy(vlan->port, addr))
return -EBUSY;

-   if (!macvlan_passthru(vlan->port)) {
+   if (!macvlan_passthru(port)) {
err = dev_uc_add(lowerdev, addr);
if (err)
return err;
@@ -705,6 +723,15 @@ static int macvlan_sync_address(struct net_device *dev, unsigned char *addr)

macvlan_hash_change_addr(vlan, addr);
}
+   if (macvlan_passthru(port) && !macvlan_addr_change(port)) {
+   /* Since addr_change isn't set, we are here due to lower
+* device change.  Save the lower-dev address so we can
+* restore it later.
+*/
+   ether_addr_copy(vlan->port->perm_addr,
+   dev->dev_addr);


Did you mean to copy `addr' here? dev->dev_addr is that of the macvlan 
device, whilst `addr' is from the lower parent device.



Thanks,
~Girish




[PATCH v3] ip6_tunnel: Correct tos value in collect_md mode

2017-06-16 Thread Haishuang Yan
Same as ip_gre, geneve and vxlan, use key->tos as the traffic class value.

CC: Peter Dawson 
Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
encapsulated packets”)
Signed-off-by: Haishuang Yan 

---
Changes since v2:
  * Add fixes information
  * Remove obsoleted RT_TOS mask
---
 net/ipv6/ip6_tunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index ef99d59..9d65918 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1249,7 +1249,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
fl6.flowi6_proto = IPPROTO_IPIP;
fl6.daddr = key->u.ipv6.dst;
fl6.flowlabel = key->label;
-   dsfield = ip6_tclass(key->label);
+   dsfield =  key->tos;
} else {
if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
encap_limit = t->parms.encap_limit;
@@ -1320,7 +1320,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
fl6.flowi6_proto = IPPROTO_IPV6;
fl6.daddr = key->u.ipv6.dst;
fl6.flowlabel = key->label;
-   dsfield = ip6_tclass(key->label);
+   dsfield = key->tos;
} else {
offset = ip6_tnl_parse_tlv_enc_lim(skb, 
skb_network_header(skb));
/* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
-- 
1.8.3.1





[PATCH v2 2/2] ip6_tunnel: fix ip6 tunnel lookup in collect_md mode

2017-06-16 Thread Haishuang Yan
In collect_md mode, if the tun dev is down, it can still call
__ip6_tnl_rcv to receive packets, and the rx statistics are
increased improperly.

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov 
Signed-off-by: Haishuang Yan 

---
Changes since v1:
  * Fix wrong recipient address
---
 net/ipv6/ip6_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 6400726..25961c7 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -171,7 +171,7 @@ static struct net_device_stats *ip6_get_stats(struct net_device *dev)
}
 
t = rcu_dereference(ip6n->collect_md_tun);
-   if (t)
+   if (t && (t->dev->flags & IFF_UP))
return t;
 
t = rcu_dereference(ip6n->tnls_wc[0]);
-- 
1.8.3.1





[PATCH v2 1/2] ip_tunnel: fix ip tunnel lookup in collect_md mode

2017-06-16 Thread Haishuang Yan
In collect_md mode, if the tun dev is down, it can still call
ip_tunnel_rcv to receive packets, and the rx statistics are
increased improperly.

Fixes: 2e15ea390e6f ("ip_gre: Add support to collect tunnel metadata.")
Cc: Pravin B Shelar 
Signed-off-by: Haishuang Yan 

---
Changes since v1:
  * Fix wrong recipient address
---
 net/ipv4/ip_tunnel.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 0f1d876..a3caba1 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -176,7 +176,7 @@ struct ip_tunnel *ip_tunnel_lookup(struct ip_tunnel_net *itn,
return cand;
 
t = rcu_dereference(itn->collect_md_tun);
-   if (t)
+   if (t && (t->dev->flags & IFF_UP))
return t;
 
if (itn->fb_tunnel_dev && itn->fb_tunnel_dev->flags & IFF_UP)
-- 
1.8.3.1





Re: [PATCH v2] ip6_tunnel: Correct tos value in collect_md mode

2017-06-16 Thread 严海双


> On 16 Jun 2017, at 10:44 PM, Daniel Borkmann  wrote:
> 
> On 06/15/2017 05:54 AM, Peter Dawson wrote:
>> On Thu, 15 Jun 2017 10:30:29 +0800
>> Haishuang Yan  wrote:
>> 
>>> Same as ip_gre, geneve and vxlan, use key->tos as tos value.
>>> 
>>> CC: Peter Dawson 
>>> Fixes: 0e9a709560db ("ip6_tunnel, ip6_gre: fix setting of DSCP on
>>> encapsulated packets”)
>>> Suggested-by: Daniel Borkmann 
>>> Signed-off-by: Haishuang Yan 
>>> 
>>> ---
>>> Changes since v2:
>>>   * Add fixes information
>>>   * mask key->tos with RT_TOS() suggested by Daniel
>>> ---
>>>  net/ipv6/ip6_tunnel.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
>>> index ef99d59..6400726 100644
>>> --- a/net/ipv6/ip6_tunnel.c
>>> +++ b/net/ipv6/ip6_tunnel.c
>>> @@ -1249,7 +1249,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
>>> fl6.flowi6_proto = IPPROTO_IPIP;
>>> fl6.daddr = key->u.ipv6.dst;
>>> fl6.flowlabel = key->label;
>>> -   dsfield = ip6_tclass(key->label);
>>> +   dsfield =  RT_TOS(key->tos);
>>> } else {
>>> if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
>>> encap_limit = t->parms.encap_limit;
>>> @@ -1320,7 +1320,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
>>> fl6.flowi6_proto = IPPROTO_IPV6;
>>> fl6.daddr = key->u.ipv6.dst;
>>> fl6.flowlabel = key->label;
>>> -   dsfield = ip6_tclass(key->label);
>>> +   dsfield = RT_TOS(key->tos);
>>> } else {
>>> offset = ip6_tnl_parse_tlv_enc_lim(skb, 
>>> skb_network_header(skb));
>> /* ip6_tnl_parse_tlv_enc_lim() might have reallocated skb->head */
>> 
>> I don't think it is correct to apply RT_TOS
>> 
>> Here is my understanding based on the RFCs.
>> 
>> IPv4/6 Header:0 |0 1 2 3 |0 1 2 3 |0 1 2 3 |0 1 2 3 |
>> RFC2460(IPv6)   |Version | Traffic Class   ||
>> RFC2474(IPv6)   |Version | DSCP|ECN||
>> RFC2474(IPv4)   |Version |  IHL   |DSCP |ECN|
>> RFC1349(IPv4)   |Version |  IHL   | PREC |  TOS   |X|
>> RFC791 (IPv4)   |Version |  IHL   |  TOS|
>> 
>> u8 key->tos stores the full 8bits of Traffic class from an IPv6 header and;
>> u8 key->tos stores the full 8bits of TOS(RFC791) from an IPv4 header
>> u8 ip6_tclass will return the full 8bits of Traffic Class from an IPv6 
>> flowlabel
>> 
>> RT_TOS will return the RFC1349 4bit TOS field.
>> 
>> Applying RT_TOS to a key->tos will result in lost information and the 
>> inclusion of 1 bit of ECN if the original field was a DSCP+ECN.
>> 
>> Based on this understanding of the RFCs (but not years of experience) and 
>> since RFC1349 has been obsoleted by RFC2474 I think the use of RT_TOS should 
>> be deprecated.
>> 
>> This being said, dsfield = ip6_tclass(key->label) = key->tos isn't fully 
>> correct either because the result will contain the ECN bits as well as the 
>> DSCP.
>> 
>> I agree that code should be consistent, but not where there is a potential 
>> issue.
> 
> Yeah, you're right. Looks like initial dsfield = key->tos diff was
> the better choice then, sorry for my confusing comment.
> 
> For example, bpf_skb_set_tunnel_key() helper that populates the collect
> metadata as one user of this infra masks the key->label so that it really
> only holds the label meaning previous dsfield = ip6_tclass(key->label)
> will always be 0 in that case unlike key->tos that actually gets populated
> and would propagate it.
> 
Okay, I will change the commit back to the initial version, thanks everyone.





Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-06-16 Thread Dave Watson
On 06/16/17 01:58 PM, Stephen Hemminger wrote:
> On Wed, 14 Jun 2017 11:37:39 -0700
> Dave Watson  wrote:
> 
> > --- /dev/null
> > +++ b/net/tls/Kconfig
> > @@ -0,0 +1,12 @@
> > +#
> > +# TLS configuration
> > +#
> > +config TLS
> > +   tristate "Transport Layer Security support"
> > +   depends on NET
> > +   default m
> > +   ---help---
> > +   Enable kernel support for TLS protocol. This allows symmetric
> > +   encryption handling of the TLS protocol to be done in-kernel.
> > +
> > +   If unsure, say M.
> 
> I understand that this will be useful to lots of people and most distributions
> will enable it. But the de facto policy in kernel configuration has been that
> new features in the kernel default to being disabled.

Sure, will send a patch to switch to default n.


Re: [PATCH v3 net-next 1/4] tcp: ULP infrastructure

2017-06-16 Thread Christoph Paasch
Hello,

On 14/06/17 - 11:37:14, Dave Watson wrote:
> Add the infrastructure for attaching Upper Layer Protocols (ULPs) over TCP
> sockets. Based on a similar infrastructure in tcp_cong.  The idea is that any
> ULP can add its own logic by changing the TCP proto_ops structure to its own
> methods.
> 
> Example usage:
> 
> setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
> 
> modules will call:
> tcp_register_ulp(&tcp_tls_ulp_ops);
> 
> to register/unregister their ulp, with an init function and name.
> 
> A list of registered ulps will be returned by tcp_get_available_ulp, which is
> hooked up to /proc.  Example:
> 
> $ cat /proc/sys/net/ipv4/tcp_available_ulp
> tls
> 
> There is currently no functionality to remove or chain ULPs, but
> it should be possible to add these in the future if needed.
> 
> Signed-off-by: Boris Pismenny 
> Signed-off-by: Dave Watson 
> ---
>  include/net/inet_connection_sock.h |   4 ++
>  include/net/tcp.h  |  25 +++
>  include/uapi/linux/tcp.h   |   1 +
>  net/ipv4/Makefile  |   2 +-
>  net/ipv4/sysctl_net_ipv4.c |  25 +++
>  net/ipv4/tcp.c |  28 
>  net/ipv4/tcp_ipv4.c|   2 +
>  net/ipv4/tcp_ulp.c | 134 +
>  8 files changed, 220 insertions(+), 1 deletion(-)
>  create mode 100644 net/ipv4/tcp_ulp.c

I know I'm pretty late to the game (and maybe this has already been
discussed, but I couldn't find anything in the archives), but I am wondering
what the take is on potential races between setsockopt() and other system calls.

For example, one might race the setsockopt() with a sendmsg(), and the sendmsg
might end up blocking on the lock_sock in tcp_sendmsg, waiting for
tcp_set_ulp() to finish changing sk_prot. When the setsockopt() finishes, we
are then inside tcp_sendmsg() coming directly from sendmsg(), while we
should have been in the ULP's sendmsg.

It seems like TLS-ULP is resilient to this (or at least won't cause a panic),
but there might be more exotic users of ULP in the future that change other
callbacks, and then things might go wrong.


Thoughts?


Thanks,
Christoph



[RFC net-next 3/8] nfp: xdp: move driver XDP setup into a separate function

2017-06-16 Thread Jakub Kicinski
In preparation for the XDP offload flags, move the driver setup into
a separate function.  Otherwise the number of conditions in one function
would make it slightly hard to follow.  The offload handler may
now be called with a NULL prog, even if no offload is currently
active, but that's fine; the offload code can handle that.

Signed-off-by: Jakub Kicinski 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 23 +-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 2b1ae666..f2188b9c3628 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3274,10 +3274,11 @@ static void nfp_net_del_vxlan_port(struct net_device *netdev,
nfp_net_set_vxlan_port(nn, idx, 0);
 }
 
-static int nfp_net_xdp_setup(struct nfp_net *nn, struct netdev_xdp *xdp)
+static int
+nfp_net_xdp_setup_drv(struct nfp_net *nn, struct bpf_prog *prog,
+ struct netlink_ext_ack *extack)
 {
struct bpf_prog *old_prog = nn->dp.xdp_prog;
-   struct bpf_prog *prog = xdp->prog;
struct nfp_net_dp *dp;
int err;
 
@@ -3286,7 +3287,6 @@ static int nfp_net_xdp_setup(struct nfp_net *nn, struct netdev_xdp *xdp)
if (prog && nn->dp.xdp_prog) {
prog = xchg(&nn->dp.xdp_prog, prog);
bpf_prog_put(prog);
-   nfp_app_xdp_offload(nn->app, nn, nn->dp.xdp_prog);
return 0;
}
 
@@ -3300,13 +3300,26 @@ static int nfp_net_xdp_setup(struct nfp_net *nn, struct netdev_xdp *xdp)
dp->rx_dma_off = prog ? XDP_PACKET_HEADROOM - nn->dp.rx_offset : 0;
 
/* We need RX reconfig to remap the buffers (BIDIR vs FROM_DEV) */
-   err = nfp_net_ring_reconfig(nn, dp, xdp->extack);
+   err = nfp_net_ring_reconfig(nn, dp, extack);
if (err)
return err;
 
if (old_prog)
bpf_prog_put(old_prog);
 
+   return 0;
+}
+
+static int
+nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog,
+ struct netlink_ext_ack *extack)
+{
+   int err;
+
+   err = nfp_net_xdp_setup_drv(nn, prog, extack);
+   if (err)
+   return err;
+
nfp_app_xdp_offload(nn->app, nn, nn->dp.xdp_prog);
 
return 0;
@@ -3318,7 +3331,7 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
 
switch (xdp->command) {
case XDP_SETUP_PROG:
-   return nfp_net_xdp_setup(nn, xdp);
+   return nfp_net_xdp_setup(nn, xdp->prog, xdp->extack);
case XDP_QUERY_PROG:
xdp->prog_attached = !!nn->dp.xdp_prog;
xdp->prog_id = nn->dp.xdp_prog ? nn->dp.xdp_prog->aux->id : 0;
-- 
2.11.0



[RFC net-next 5/8] nfp: bpf: take a reference on offloaded programs

2017-06-16 Thread Jakub Kicinski
The xdp_prog member of the adapter's data path structure is used
for XDP in driver mode.  In case an XDP program is loaded in
HW-only mode, we need to store it somewhere else.  Add a new XDP
prog pointer in the main structure and use that when we need to
know whether any XDP program is loaded, not only a driver mode
one.  Only release our reference on adapter free instead of
immediately after netdev unregister to allow offload to be disabled
first.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h|  2 ++
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 20 ++--
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 7952fbfb94d6..b7446793106d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -542,6 +542,7 @@ struct nfp_net_dp {
  * @rss_key:RSS secret key
  * @rss_itbl:   RSS indirection table
  * @xdp_flags: Flags with which XDP prog was loaded
+ * @xdp_prog:  XDP prog (for ctrl path, both DRV and HW modes)
  * @max_r_vecs:Number of allocated interrupt vectors for RX/TX
  * @max_tx_rings:   Maximum number of TX rings supported by the Firmware
  * @max_rx_rings:   Maximum number of RX rings supported by the Firmware
@@ -592,6 +593,7 @@ struct nfp_net {
u8 rss_itbl[NFP_NET_CFG_RSS_ITBL_SZ];
 
u32 xdp_flags;
+   struct bpf_prog *xdp_prog;
 
unsigned int max_tx_rings;
unsigned int max_rx_rings;
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9563615cf4b7..68648e312129 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3278,7 +3278,6 @@ static int
 nfp_net_xdp_setup_drv(struct nfp_net *nn, struct bpf_prog *prog,
  struct netlink_ext_ack *extack)
 {
-   struct bpf_prog *old_prog = nn->dp.xdp_prog;
struct nfp_net_dp *dp;
int err;
 
@@ -3304,9 +3303,6 @@ nfp_net_xdp_setup_drv(struct nfp_net *nn, struct bpf_prog *prog,
if (err)
return err;
 
-   if (old_prog)
-   bpf_prog_put(old_prog);
-
return 0;
 }
 
@@ -3317,7 +3313,7 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags,
struct bpf_prog *offload_prog;
int err;
 
-   if (nn->dp.xdp_prog && (flags ^ nn->xdp_flags) & XDP_FLAGS_MODES)
+   if (nn->xdp_prog && (flags ^ nn->xdp_flags) & XDP_FLAGS_MODES)
return -EBUSY;
 
offload_prog = flags & XDP_FLAGS_DRV_MODE ? NULL : prog;
@@ -3327,6 +3323,10 @@ nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags,
return err;
 
nfp_app_xdp_offload(nn->app, nn, offload_prog);
+
+   if (nn->xdp_prog)
+   bpf_prog_put(nn->xdp_prog);
+   nn->xdp_prog = prog;
nn->xdp_flags = flags;
 
return 0;
@@ -3341,8 +3341,8 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
return nfp_net_xdp_setup(nn, xdp->prog, xdp->flags,
 xdp->extack);
case XDP_QUERY_PROG:
-   xdp->prog_attached = !!nn->dp.xdp_prog;
-   xdp->prog_id = nn->dp.xdp_prog ? nn->dp.xdp_prog->aux->id : 0;
+   xdp->prog_attached = !!nn->xdp_prog;
+   xdp->prog_id = nn->xdp_prog ? nn->xdp_prog->aux->id : 0;
return 0;
default:
return -EINVAL;
@@ -3500,6 +3500,9 @@ struct nfp_net *nfp_net_alloc(struct pci_dev *pdev, bool needs_netdev,
  */
 void nfp_net_free(struct nfp_net *nn)
 {
+   if (nn->xdp_prog)
+   bpf_prog_put(nn->xdp_prog);
+
if (nn->dp.netdev)
free_netdev(nn->dp.netdev);
else
@@ -3757,7 +3760,4 @@ void nfp_net_clean(struct nfp_net *nn)
return;
 
unregister_netdev(nn->dp.netdev);
-
-   if (nn->dp.xdp_prog)
-   bpf_prog_put(nn->dp.xdp_prog);
 }
-- 
2.11.0



[RFC net-next 4/8] nfp: bpf: don't offload XDP programs in DRV_MODE

2017-06-16 Thread Jakub Kicinski
DRV_MODE means that user space wants the program to be run in
the driver.  Do not try to offload.  Only offload if no mode
flags have been specified.

Remember what the mode is when the program is installed and refuse
new setup requests if there is already a program loaded in a
different mode.  This should leave it open for us to implement
simultaneous loading of two programs - one in the drv path and
another to the NIC later.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h|  3 +++
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 14 +++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 02fd8d4e253c..7952fbfb94d6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -541,6 +541,7 @@ struct nfp_net_dp {
  * @rss_cfg:RSS configuration
  * @rss_key:RSS secret key
  * @rss_itbl:   RSS indirection table
+ * @xdp_flags: Flags with which XDP prog was loaded
  * @max_r_vecs:Number of allocated interrupt vectors for RX/TX
  * @max_tx_rings:   Maximum number of TX rings supported by the Firmware
  * @max_rx_rings:   Maximum number of RX rings supported by the Firmware
@@ -590,6 +591,8 @@ struct nfp_net {
u8 rss_key[NFP_NET_CFG_RSS_KEY_SZ];
u8 rss_itbl[NFP_NET_CFG_RSS_ITBL_SZ];
 
+   u32 xdp_flags;
+
unsigned int max_tx_rings;
unsigned int max_rx_rings;
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index f2188b9c3628..9563615cf4b7 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3311,16 +3311,23 @@ nfp_net_xdp_setup_drv(struct nfp_net *nn, struct bpf_prog *prog,
 }
 
 static int
-nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog,
+nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags,
  struct netlink_ext_ack *extack)
 {
+   struct bpf_prog *offload_prog;
int err;
 
+   if (nn->dp.xdp_prog && (flags ^ nn->xdp_flags) & XDP_FLAGS_MODES)
+   return -EBUSY;
+
+   offload_prog = flags & XDP_FLAGS_DRV_MODE ? NULL : prog;
+
err = nfp_net_xdp_setup_drv(nn, prog, extack);
if (err)
return err;
 
-   nfp_app_xdp_offload(nn->app, nn, nn->dp.xdp_prog);
+   nfp_app_xdp_offload(nn->app, nn, offload_prog);
+   nn->xdp_flags = flags;
 
return 0;
 }
@@ -3331,7 +3338,8 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
 
switch (xdp->command) {
case XDP_SETUP_PROG:
-   return nfp_net_xdp_setup(nn, xdp->prog, xdp->extack);
+   return nfp_net_xdp_setup(nn, xdp->prog, xdp->flags,
+xdp->extack);
case XDP_QUERY_PROG:
xdp->prog_attached = !!nn->dp.xdp_prog;
xdp->prog_id = nn->dp.xdp_prog ? nn->dp.xdp_prog->aux->id : 0;
-- 
2.11.0



[RFC net-next 2/8] xdp: add HW offload mode flag for installing programs

2017-06-16 Thread Jakub Kicinski
Add an installation-time flag for requesting that the program
be installed only if it can be offloaded to HW.

Internally, a new command for ndo_xdp is added; this way we avoid
putting checks into drivers, since they all return -EINVAL on
an unknown command.

Signed-off-by: Jakub Kicinski 
---
 include/linux/netdevice.h| 1 +
 include/uapi/linux/if_link.h | 7 +--
 net/core/dev.c   | 7 +--
 net/core/rtnetlink.c | 4 ++--
 4 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b194817631de..a838591aad28 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -807,6 +807,7 @@ enum xdp_netdev_command {
 * when it is no longer used.
 */
XDP_SETUP_PROG,
+   XDP_SETUP_PROG_HW,
/* Check if a bpf program is set on the device.  The callee should
 * return true if a program is currently attached and running.
 */
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index dd88375a6580..ce777ec88e1e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -891,9 +891,12 @@ enum {
 #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0)
 #define XDP_FLAGS_SKB_MODE (1U << 1)
 #define XDP_FLAGS_DRV_MODE (1U << 2)
+#define XDP_FLAGS_HW_MODE  (1U << 3)
+#define XDP_FLAGS_MODES(XDP_FLAGS_SKB_MODE | \
+XDP_FLAGS_DRV_MODE | \
+XDP_FLAGS_HW_MODE)
 #define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \
-XDP_FLAGS_SKB_MODE | \
-XDP_FLAGS_DRV_MODE)
+XDP_FLAGS_MODES)
 
 /* These are stored into IFLA_XDP_ATTACHED on dump. */
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index a04db264aa1c..05cec8e2cd82 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6959,7 +6959,10 @@ static int dev_xdp_install(struct net_device *dev, xdp_op_t xdp_op,
struct netdev_xdp xdp;
 
memset(&xdp, 0, sizeof(xdp));
-   xdp.command = XDP_SETUP_PROG;
+   if (flags & XDP_FLAGS_HW_MODE)
+   xdp.command = XDP_SETUP_PROG_HW;
+   else
+   xdp.command = XDP_SETUP_PROG;
xdp.extack = extack;
xdp.flags = flags;
xdp.prog = prog;
@@ -6987,7 +6990,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
ASSERT_RTNL();
 
xdp_op = xdp_chk = ops->ndo_xdp;
-   if (!xdp_op && (flags & XDP_FLAGS_DRV_MODE))
+   if (!xdp_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE)))
return -EOPNOTSUPP;
if (!xdp_op || (flags & XDP_FLAGS_SKB_MODE))
xdp_op = generic_xdp_install;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3aa57848a895..daf3b39be649 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -16,6 +16,7 @@
  * Vitaly E. LavrovRTA_OK arithmetics was wrong.
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -2251,8 +2252,7 @@ static int do_setlink(const struct sk_buff *skb,
err = -EINVAL;
goto errout;
}
-   if ((xdp_flags & XDP_FLAGS_SKB_MODE) &&
-   (xdp_flags & XDP_FLAGS_DRV_MODE)) {
+   if (hweight32(xdp_flags & XDP_FLAGS_MODES) > 1) {
err = -EINVAL;
goto errout;
}
-- 
2.11.0



[RFC net-next 8/8] nfp: xdp: report if program is offloaded

2017-06-16 Thread Jakub Kicinski
Make use of just added XDP_ATTACHED_HW.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index c5903b6e58c5..cabd117303e1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3346,6 +3346,8 @@ static int nfp_net_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
 xdp->extack);
case XDP_QUERY_PROG:
xdp->prog_attached = !!nn->xdp_prog;
+   if (nn->dp.bpf_offload_xdp)
+   xdp->prog_attached = XDP_ATTACHED_HW;
xdp->prog_id = nn->xdp_prog ? nn->xdp_prog->aux->id : 0;
return 0;
default:
-- 
2.11.0



[RFC net-next 7/8] xdp: add reporting of offload mode

2017-06-16 Thread Jakub Kicinski
Extend the XDP_ATTACHED_* values to include offloaded mode.
Let drivers report whether the program is installed in the driver
or the HW by changing the prog_attached field from bool to
u8 (the type of the netlink attribute).

Exploit the fact that the value of XDP_ATTACHED_DRV is 1;
since all drivers currently assign the mode with
double negation:
   mode = !!xdp_prog;
no drivers have to be modified.

Signed-off-by: Jakub Kicinski 
---
 include/linux/netdevice.h| 7 ---
 include/uapi/linux/if_link.h | 1 +
 net/core/dev.c   | 3 +--
 net/core/rtnetlink.c | 6 +++---
 4 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index a838591aad28..68f5d899d1e6 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -809,7 +809,8 @@ enum xdp_netdev_command {
XDP_SETUP_PROG,
XDP_SETUP_PROG_HW,
/* Check if a bpf program is set on the device.  The callee should
-* return true if a program is currently attached and running.
+* set @prog_attached to one of XDP_ATTACHED_* values, note that "true"
+* is equivalent to XDP_ATTACHED_DRV.
 */
XDP_QUERY_PROG,
 };
@@ -827,7 +828,7 @@ struct netdev_xdp {
};
/* XDP_QUERY_PROG */
struct {
-   bool prog_attached;
+   u8 prog_attached;
u32 prog_id;
};
};
@@ -3307,7 +3308,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 typedef int (*xdp_op_t)(struct net_device *dev, struct netdev_xdp *xdp);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
  int fd, u32 flags);
-bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op, u32 *prog_id);
+u8 __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op, u32 *prog_id);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ce777ec88e1e..8d062c58d5cb 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -903,6 +903,7 @@ enum {
XDP_ATTACHED_NONE = 0,
XDP_ATTACHED_DRV,
XDP_ATTACHED_SKB,
+   XDP_ATTACHED_HW,
 };
 
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 05cec8e2cd82..ad1ecb26e9fa 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6936,8 +6936,7 @@ int dev_change_proto_down(struct net_device *dev, bool proto_down)
 }
 EXPORT_SYMBOL(dev_change_proto_down);
 
-bool __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op,
-   u32 *prog_id)
+u8 __dev_xdp_attached(struct net_device *dev, xdp_op_t xdp_op, u32 *prog_id)
 {
struct netdev_xdp xdp;
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index daf3b39be649..f0f5b418e52d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1264,10 +1264,10 @@ static u8 rtnl_xdp_attached_mode(struct net_device *dev, u32 *prog_id)
*prog_id = generic_xdp_prog->aux->id;
return XDP_ATTACHED_SKB;
}
-   if (ops->ndo_xdp && __dev_xdp_attached(dev, ops->ndo_xdp, prog_id))
-   return XDP_ATTACHED_DRV;
+   if (!ops->ndo_xdp)
+   return XDP_ATTACHED_NONE;
 
-   return XDP_ATTACHED_NONE;
+   return __dev_xdp_attached(dev, ops->ndo_xdp, prog_id);
 }
 
 static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
-- 
2.11.0



[RFC net-next 0/8] xdp: offload mode

2017-06-16 Thread Jakub Kicinski
Hi!

This set adds an XDP flag for forcing offload and an attachment mode
for reporting to user space that a program has been offloaded.  The
nfp driver is modified to make use of the new flags, but also to
adhere to the DRV_MODE flag, which should disable the HW offload.

Note that the NFP driver currently claims XDP offload support but
lacks the most basic features like direct packet access.

Jakub Kicinski (8):
  xdp: pass XDP flags into install handlers
  xdp: add HW offload mode flag for installing programs
  nfp: xdp: move driver XDP setup into a separate function
  nfp: bpf: don't offload XDP programs in DRV_MODE
  nfp: bpf: take a reference on offloaded programs
  nfp: bpf: add support for XDP_FLAGS_HW_MODE
  xdp: add reporting of offload mode
  nfp: xdp: report if program is offloaded

 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  5 ++
 .../net/ethernet/netronome/nfp/nfp_net_common.c| 53 --
 include/linux/netdevice.h  |  8 ++--
 include/uapi/linux/if_link.h   |  8 +++-
 net/core/dev.c | 10 ++--
 net/core/rtnetlink.c   | 10 ++--
 6 files changed, 66 insertions(+), 28 deletions(-)

-- 
2.11.0



[RFC net-next 1/8] xdp: pass XDP flags into install handlers

2017-06-16 Thread Jakub Kicinski
Pass XDP flags to the xdp ndo.  This will allow drivers to look
at the mode flags and make decisions about offload.

Signed-off-by: Jakub Kicinski 
---
 include/linux/netdevice.h | 1 +
 net/core/dev.c| 5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7c7118b3bd69..b194817631de 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -820,6 +820,7 @@ struct netdev_xdp {
union {
/* XDP_SETUP_PROG */
struct {
+   u32 flags;
struct bpf_prog *prog;
struct netlink_ext_ack *extack;
};
diff --git a/net/core/dev.c b/net/core/dev.c
index b8d6dd9e8b5c..a04db264aa1c 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6953,7 +6953,7 @@ bool __dev_xdp_attached(struct net_device *dev, xdp_op_t 
xdp_op,
 }
 
 static int dev_xdp_install(struct net_device *dev, xdp_op_t xdp_op,
-  struct netlink_ext_ack *extack,
+  struct netlink_ext_ack *extack, u32 flags,
   struct bpf_prog *prog)
 {
struct netdev_xdp xdp;
@@ -6961,6 +6961,7 @@ static int dev_xdp_install(struct net_device *dev, 
xdp_op_t xdp_op,
	memset(&xdp, 0, sizeof(xdp));
xdp.command = XDP_SETUP_PROG;
xdp.extack = extack;
+   xdp.flags = flags;
xdp.prog = prog;
 
	return xdp_op(dev, &xdp);
@@ -7005,7 +7006,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct 
netlink_ext_ack *extack,
return PTR_ERR(prog);
}
 
-   err = dev_xdp_install(dev, xdp_op, extack, prog);
+   err = dev_xdp_install(dev, xdp_op, extack, flags, prog);
if (err < 0 && prog)
bpf_prog_put(prog);
 
-- 
2.11.0



[RFC net-next 6/8] nfp: bpf: add support for XDP_FLAGS_HW_MODE

2017-06-16 Thread Jakub Kicinski
Respect the XDP_FLAGS_HW_MODE.  When it's set install the program
on the NIC and skip enabling XDP in the driver.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 68648e312129..c5903b6e58c5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3310,19 +3310,22 @@ static int
 nfp_net_xdp_setup(struct nfp_net *nn, struct bpf_prog *prog, u32 flags,
  struct netlink_ext_ack *extack)
 {
-   struct bpf_prog *offload_prog;
+   struct bpf_prog *drv_prog, *offload_prog;
int err;
 
if (nn->xdp_prog && (flags ^ nn->xdp_flags) & XDP_FLAGS_MODES)
return -EBUSY;
 
+   drv_prog = flags & XDP_FLAGS_HW_MODE  ? NULL : prog;
offload_prog = flags & XDP_FLAGS_DRV_MODE ? NULL : prog;
 
-   err = nfp_net_xdp_setup_drv(nn, prog, extack);
+   err = nfp_net_xdp_setup_drv(nn, drv_prog, extack);
if (err)
return err;
 
-   nfp_app_xdp_offload(nn->app, nn, offload_prog);
+   err = nfp_app_xdp_offload(nn->app, nn, offload_prog);
+   if (err && flags & XDP_FLAGS_HW_MODE)
+   return err;
 
if (nn->xdp_prog)
bpf_prog_put(nn->xdp_prog);
@@ -3338,6 +3341,7 @@ static int nfp_net_xdp(struct net_device *netdev, struct 
netdev_xdp *xdp)
 
switch (xdp->command) {
case XDP_SETUP_PROG:
+   case XDP_SETUP_PROG_HW:
return nfp_net_xdp_setup(nn, xdp->prog, xdp->flags,
 xdp->extack);
case XDP_QUERY_PROG:
-- 
2.11.0
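The drv_prog/offload_prog selection above fits in two expressions. A hedged userspace model (flag values mirror include/uapi/linux/if_link.h; struct prog and split_progs() are invented stand-ins for struct bpf_prog and the nfp setup path):

```c
#include <assert.h>
#include <stddef.h>

#define XDP_FLAGS_DRV_MODE	(1U << 2)
#define XDP_FLAGS_HW_MODE	(1U << 3)

struct prog { int id; };	/* stand-in for struct bpf_prog */

/* HW_MODE skips the in-driver datapath, DRV_MODE skips the offload.
 * With neither flag set the program goes to both targets, and the
 * offload attempt becomes best-effort (its error is only fatal when
 * HW_MODE was explicitly requested). */
static void split_progs(unsigned int flags, struct prog *prog,
			struct prog **drv_prog, struct prog **offload_prog)
{
	*drv_prog     = (flags & XDP_FLAGS_HW_MODE)  ? NULL : prog;
	*offload_prog = (flags & XDP_FLAGS_DRV_MODE) ? NULL : prog;
}
```

This also makes the error handling in the patch readable: nfp_app_xdp_offload() failing is only propagated when the caller asked for XDP_FLAGS_HW_MODE.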



[PATCH net-next] net: dsa: Fix legacy probing

2017-06-16 Thread Florian Fainelli
After commit 6d3c8c0dd88a ("net: dsa: Remove master_netdev and
use dst->cpu_dp->netdev") and a29342e73911 ("net: dsa: Associate
slave network device with CPU port") we would be seeing NULL pointer
dereferences when accessing dst->cpu_dp->netdev too early. In the legacy
code, we actually know early in advance the master network device, so
pass it down to the relevant functions.

Fixes: 6d3c8c0dd88a ("net: dsa: Remove master_netdev and use 
dst->cpu_dp->netdev")
Fixes: a29342e73911 ("net: dsa: Associate slave network device with CPU port")
Reported-by: Jason Cobham 
Tested-by: Jason Cobham 
Signed-off-by: Florian Fainelli 
---
 net/dsa/legacy.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/net/dsa/legacy.c b/net/dsa/legacy.c
index e60906125375..1d7a3282f2a7 100644
--- a/net/dsa/legacy.c
+++ b/net/dsa/legacy.c
@@ -95,18 +95,16 @@ static int dsa_cpu_dsa_setups(struct dsa_switch *ds, struct 
device *dev)
return 0;
 }
 
-static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
+static int dsa_switch_setup_one(struct dsa_switch *ds, struct net_device 
*master,
+   struct device *parent)
 {
const struct dsa_switch_ops *ops = ds->ops;
struct dsa_switch_tree *dst = ds->dst;
struct dsa_chip_data *cd = ds->cd;
bool valid_name_found = false;
-   struct net_device *master;
int index = ds->index;
int i, ret;
 
-   master = dst->cpu_dp->netdev;
-
/*
 * Validate supplied switch configuration.
 */
@@ -124,12 +122,12 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
return -EINVAL;
}
		dst->cpu_dp = &ds->ports[i];
+   dst->cpu_dp->netdev = master;
ds->cpu_port_mask |= 1 << i;
} else if (!strcmp(name, "dsa")) {
ds->dsa_port_mask |= 1 << i;
} else {
ds->enabled_port_mask |= 1 << i;
-   ds->ports[i].cpu_dp = dst->cpu_dp;
}
valid_name_found = true;
}
@@ -193,6 +191,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
 */
for (i = 0; i < ds->num_ports; i++) {
ds->ports[i].dn = cd->port_dn[i];
+   ds->ports[i].cpu_dp = dst->cpu_dp;
 
if (!(ds->enabled_port_mask & (1 << i)))
continue;
@@ -217,11 +216,10 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
 }
 
 static struct dsa_switch *
-dsa_switch_setup(struct dsa_switch_tree *dst, int index,
-struct device *parent, struct device *host_dev)
+dsa_switch_setup(struct dsa_switch_tree *dst, struct net_device *master,
+int index, struct device *parent, struct device *host_dev)
 {
struct dsa_chip_data *cd = dst->pd->chip + index;
-   struct net_device *master = dst->cpu_dp->netdev;
const struct dsa_switch_ops *ops;
struct dsa_switch *ds;
int ret;
@@ -254,7 +252,7 @@ dsa_switch_setup(struct dsa_switch_tree *dst, int index,
ds->ops = ops;
ds->priv = priv;
 
-   ret = dsa_switch_setup_one(ds, parent);
+   ret = dsa_switch_setup_one(ds, master, parent);
if (ret)
return ERR_PTR(ret);
 
@@ -580,12 +578,11 @@ static int dsa_setup_dst(struct dsa_switch_tree *dst, 
struct net_device *dev,
unsigned configured = 0;
 
dst->pd = pd;
-   dst->cpu_dp->netdev = dev;
 
for (i = 0; i < pd->nr_chips; i++) {
struct dsa_switch *ds;
 
-   ds = dsa_switch_setup(dst, i, parent, pd->chip[i].host_dev);
+   ds = dsa_switch_setup(dst, dev, i, parent, 
pd->chip[i].host_dev);
if (IS_ERR(ds)) {
netdev_err(dev, "[%d]: couldn't create dsa switch 
instance (error %ld)\n",
   i, PTR_ERR(ds));
-- 
2.9.3



Re: [RFC PATCH net-next v2 01/15] bpf: BPF support for socket ops

2017-06-16 Thread Lawrence Brakmo
On 6/16/17, 5:07 AM, "Daniel Borkmann"  wrote:

On 06/15/2017 10:08 PM, Lawrence Brakmo wrote:
> Created a new BPF program type, BPF_PROG_TYPE_SOCKET_OPS, and a 
corresponding
> struct that allows BPF programs of this type to access some of the
> socket's fields (such as IP addresses, ports, etc.). Currently there is
> functionality to load one global BPF program of this type which can be
> called at appropriate times to set relevant connection parameters such
> as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection
> information such as IP addresses, port numbers, etc.
>
> Although there are already 3 mechanisms to set parameters (sysctls,
> route metrics and setsockopts), this new mechanism provides some
> distinct advantages. Unlike sysctls, it can set parameters per
> connection. In contrast to route metrics, it can also use port numbers
> and information provided by a user level program. In addition, it could
> set parameters probabilistically for evaluation purposes (i.e. do
> something different on 10% of the flows and compare results with the
> other 90% of the flows). Also, in cases where IPv6 addresses contain
> geographic information, the rules to make changes based on the distance
> (or RTT) between the hosts are much easier than route metric rules and
> can be global. Finally, unlike setsockopt, it does not require
> application changes and it can be updated easily at any time.
>
> I plan to add support for loading per cgroup socket ops BPF programs in
> the near future. One question is whether I should add this functionality
> into David Ahern's BPF_PROG_TYPE_CGROUP_SOCK or create a new cgroup bpf
> type. Whereas the current cgroup_sock type expects to be called only once
> during a connection's lifetime, the new socket_ops type could be called
> multiple times. For example, before sending SYN and SYN-ACKs to set an
> appropriate timeout, when the connection is established to set
> congestion control, etc. As a result it has "op" field to specify the
> type of operation requested.
>
> The purpose of this new program type is to simplify setting connection
> parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is
> easy to use facebook's internal IPv6 addresses to determine if both hosts
> of a connection are in the same datacenter. Therefore, it is easy to
> write a BPF program to choose a small SYN RTO value when both hosts are
> in the same datacenter.
>
> This patch only contains the framework to support the new BPF program
> type, following patches add the functionality to set various connection
> parameters.
>
> This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS
> and a new bpf syscall command to load a new program of this type:
> BPF_PROG_LOAD_SOCKET_OPS.
>
> Two new corresponding structs (one for the kernel one for the user/BPF
> program):
>
> /* kernel version */
> struct bpf_socket_ops_kern {
>  struct sock *sk;
>   __u32  is_req_sock:1;
>  __u32  op;
>  union {
>  __u32 reply;
>  __u32 replylong[4];
>  };
> };
>
> /* user version */
> struct bpf_socket_ops {
>  __u32 op;
>  union {
>  __u32 reply;
>  __u32 replylong[4];
>  };
>  __u32 family;
>  __u32 remote_ip4;
>  __u32 local_ip4;
>  __u32 remote_ip6[4];
>  __u32 local_ip6[4];
>  __u32 remote_port;
>  __u32 local_port;
> };

Above and ...

struct bpf_sock {
	__u32 bound_dev_if;
	__u32 family;
	__u32 type;
	__u32 protocol;
};

... would result in two BPF sock user versions. It's okayish, but
given struct bpf_sock is quite generic, couldn't we merge the members
from struct bpf_socket_ops into struct bpf_sock instead?

Idea would be that sock_filter_is_valid_access() for cgroups would
then check off < 0 || off + size > offsetofend(struct bpf_sock, protocol)
to disallow new members, and your socket_ops_is_valid_access() could
allow and xlate the full range. The family member is already duplicate
and the others could then be accessed from these kind of BPF progs as
well, plus we have a single user representation similar as with __sk_buff
that multiple types will use.

I see. You are saying have one struct in common but still keep the two
PROG_TYPES? That makes sense. Do we really need two different
is_valid_access functions? Both types should be able to see all
the fields (otherwise adding new fields becomes messy).
   
> Currently there are two types of ops. The first type expects the BPF
> program 

[PATCH net] igb: protect TX timestamping from API misuse

2017-06-16 Thread Cliff Spradlin
HW timestamping can only be requested for a packet if the NIC is first
setup via ioctl(SIOCSHWTSTAMP). If this step was skipped, then the igb
driver still allowed TX packets to request HW timestamping. In this
situation, the _IGB_PTP_TX_IN_PROGRESS flag was set and would never
clear. This prevented any future HW timestamping requests to succeed.

Fix this by checking that the NIC is configured for HW TX timestamping
before accepting a HW TX timestamping request.
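The failure mode is a latch that is taken but never released. A hedged userspace model of the fixed gate (claim_tx_tstamp() is an invented name; the atomic flag stands in for the __IGB_PTP_TX_IN_PROGRESS bit and HWTSTAMP_TX_ON mirrors enum hwtstamp_tx_types):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define HWTSTAMP_TX_ON	1	/* mirrors include/uapi/linux/net_tstamp.h */

static atomic_flag tx_in_progress = ATOMIC_FLAG_INIT;

/* Stand-in for the test_and_set_bit_lock() in igb_xmit_frame_ring().
 * Without the tx_type check, a request made before SIOCSHWTSTAMP would
 * take the latch, and with the NIC not timestamping anything no
 * completion would ever clear it, blocking all later requests. */
static bool claim_tx_tstamp(int configured_tx_type)
{
	if (!(configured_tx_type & HWTSTAMP_TX_ON))
		return false;	/* the fix: never take the latch unconfigured */
	return !atomic_flag_test_and_set(&tx_in_progress);
}

/* Stand-in for the timestamp completion releasing the latch. */
static void release_tx_tstamp(void)
{
	atomic_flag_clear(&tx_in_progress);
}
```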

Signed-off-by: Cliff Spradlin 
---
 drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c 
b/drivers/net/ethernet/intel/igb/igb_main.c
index 1cf74aa4ebd9..45c2a9c9fa03 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -5381,7 +5381,8 @@ netdev_tx_t igb_xmit_frame_ring(struct sk_buff *skb,
if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
struct igb_adapter *adapter = netdev_priv(tx_ring->netdev);
 
-   if (!test_and_set_bit_lock(__IGB_PTP_TX_IN_PROGRESS,
+   if (adapter->tstamp_config.tx_type & HWTSTAMP_TX_ON &&
+   !test_and_set_bit_lock(__IGB_PTP_TX_IN_PROGRESS,
   &adapter->state)) {
skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
tx_flags |= IGB_TX_FLAGS_TSTAMP;
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 1/1] selftests: Introduce tc testsuite

2017-06-16 Thread Lucas Bates
Add the beginnings of a testsuite for tc functionality in the kernel.
These are a series of unit tests that use the tc executable and verify
the success of those commands by checking both the exit codes and the
output from tc's 'show' operation.

To run the tests:
  # cd tools/testing/selftests/tc-testing
  # sudo ./tdc.py

You can specify the tc executable to use with the -p argument on the command
line or editing the 'TC' variable in tdc_config.py. Refer to the README for
full details on how to run.

The initial complement of test cases are limited mostly to tc actions. Test
cases are most welcome; see the creating-testcases subdirectory for help
in creating them.

Signed-off-by: Lucas Bates 
Signed-off-by: Jamal Hadi Salim 
---
 tools/testing/selftests/tc-testing/.gitignore  |1 +
 tools/testing/selftests/tc-testing/README  |  102 ++
 tools/testing/selftests/tc-testing/TODO.txt|   10 +
 .../creating-testcases/AddingTestCases.txt |   69 ++
 .../tc-testing/creating-testcases/template.json|   40 +
 .../tc-testing/tc-tests/actions/tests.json | 1115 
 .../tc-testing/tc-tests/filters/tests.json |   21 +
 tools/testing/selftests/tc-testing/tdc.py  |  413 
 tools/testing/selftests/tc-testing/tdc_config.py   |   17 +
 tools/testing/selftests/tc-testing/tdc_helper.py   |   75 ++
 10 files changed, 1863 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/.gitignore
 create mode 100644 tools/testing/selftests/tc-testing/README
 create mode 100644 tools/testing/selftests/tc-testing/TODO.txt
 create mode 100644 
tools/testing/selftests/tc-testing/creating-testcases/AddingTestCases.txt
 create mode 100644 
tools/testing/selftests/tc-testing/creating-testcases/template.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/tests.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
 create mode 100755 tools/testing/selftests/tc-testing/tdc.py
 create mode 100644 tools/testing/selftests/tc-testing/tdc_config.py
 create mode 100644 tools/testing/selftests/tc-testing/tdc_helper.py

diff --git a/tools/testing/selftests/tc-testing/.gitignore 
b/tools/testing/selftests/tc-testing/.gitignore
new file mode 100644
index 000..c18dd8d
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/.gitignore
@@ -0,0 +1 @@
+__pycache__/
diff --git a/tools/testing/selftests/tc-testing/README 
b/tools/testing/selftests/tc-testing/README
new file mode 100644
index 000..970ff29
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/README
@@ -0,0 +1,102 @@
+tdc - Linux Traffic Control (tc) unit testing suite
+
+Author: Lucas Bates - luc...@mojatatu.com
+
+tdc is a Python script to load tc unit tests from a separate JSON file and
+execute them inside a network namespace dedicated to the task.
+
+
+REQUIREMENTS
+
+
+*  Minimum Python version of 3.4. Earlier 3.X versions may work but are not
+   guaranteed.
+
+*  The kernel must have network namespace support
+
+*  The kernel must have veth support available, as a veth pair is created
+   prior to running the tests.
+
+*  All tc-related features must be built in or available as modules.
+   To check what is required in current setup run:
+   ./tdc.py -c
+
+   Note:
+   In the current release, tdc run will abort due to a failure in setup or
+   teardown commands - which includes not being able to run a test simply
+   because the kernel did not support a specific feature. (This will be
+   handled in a future version - the current workaround is to run the tests
+   on specific test categories that your kernel supports)
+
+
+BEFORE YOU RUN
+--
+
+The path to the tc executable that will be most commonly tested can be defined
+in the tdc_config.py file. Find the 'TC' entry in the NAMES dictionary and
+define the path.
+
+If you need to test a different tc executable on the fly, you can do so by
+using the -p option when running tdc:
+   ./tdc.py -p /path/to/tc
+
+
+RUNNING TDC
+---
+
+To use tdc, root privileges are required. tdc will not run otherwise.
+
+All tests are executed inside a network namespace to prevent conflicts
+within the host.
+
+Running tdc without any arguments will run all tests. Refer to the section
+on command line arguments for more information, or run:
+   ./tdc.py -h
+
+tdc will list the test names as they are being run, and print a summary in
+TAP (Test Anything Protocol) format when they are done. If tests fail,
+output captured from the failing test will be printed immediately following
+the failed test in the TAP output.
+
+
+USER-DEFINED CONSTANTS
+--
+
+The tdc_config.py file contains multiple values that can be altered to suit
+your needs. Any value in the NAMES dictionary can be altered without affecting
+the tests to be run. These values are used in the tc commands that will be
+executed as part of 

[PATCH net-next 0/1] Introduction of the tc tests

2017-06-16 Thread Lucas Bates
Apologies for sending this as one big patch. I've been sitting on this a little
too long, but it's ready and I wanted to get it out.

There are a limited number of tests to start - I plan to add more on a regular
basis.

Lucas Bates (1):
  selftests: Introduce tc testsuite

 tools/testing/selftests/tc-testing/.gitignore  |1 +
 tools/testing/selftests/tc-testing/README  |  102 ++
 tools/testing/selftests/tc-testing/TODO.txt|   10 +
 .../creating-testcases/AddingTestCases.txt |   69 ++
 .../tc-testing/creating-testcases/template.json|   40 +
 .../tc-testing/tc-tests/actions/tests.json | 1115 
 .../tc-testing/tc-tests/filters/tests.json |   21 +
 tools/testing/selftests/tc-testing/tdc.py  |  413 
 tools/testing/selftests/tc-testing/tdc_config.py   |   17 +
 tools/testing/selftests/tc-testing/tdc_helper.py   |   75 ++
 10 files changed, 1863 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/.gitignore
 create mode 100644 tools/testing/selftests/tc-testing/README
 create mode 100644 tools/testing/selftests/tc-testing/TODO.txt
 create mode 100644 
tools/testing/selftests/tc-testing/creating-testcases/AddingTestCases.txt
 create mode 100644 
tools/testing/selftests/tc-testing/creating-testcases/template.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/actions/tests.json
 create mode 100644 
tools/testing/selftests/tc-testing/tc-tests/filters/tests.json
 create mode 100755 tools/testing/selftests/tc-testing/tdc.py
 create mode 100644 tools/testing/selftests/tc-testing/tdc_config.py
 create mode 100644 tools/testing/selftests/tc-testing/tdc_helper.py

--
2.7.4



Re: [PATCH v2 03/11] tty: kbd: reduce stack size with KASAN

2017-06-16 Thread Dmitry Torokhov
On Fri, Jun 16, 2017 at 1:56 PM, Arnd Bergmann  wrote:
> On Fri, Jun 16, 2017 at 7:29 PM, Dmitry Torokhov
>  wrote:
>> On Fri, Jun 16, 2017 at 8:58 AM, Samuel Thibault
>>  wrote:
>>> I'm however afraid we'd have to mark a lot of static functions that way,
>>> depending on the aggressivity of gcc... I'd indeed really argue that gcc
>>> should consider stack usage when inlining.
>>>
>>> static int f(int foo) {
>>> char c[256];
>>> g(c, foo);
>>> }
>>>
>>> is really not something that I'd want to see the compiler to inline.
>>
>> Why would not we want it be inlined? What we do not want us several
>> calls having _separate_ instances of 'c' generated on the stack, all
>> inlined calls should share 'c'. And of course if we have f1, f2, and
>> f3 with c1, c2, and c3, GCC should not blow up the stack inlining and
>> allocating stack for all 3 of them beforehand.
>>
>> But this all seems to me issue that should be solved in toolchain, not
>> trying to play whack-a-mole with kernel sources.
>
> The problem for Samuel's example is that
>
> a) the "--param asan-stack=1" option in KASAN does blow up the
>stack, which is why the annotation is now called 'noinline_if_stackbloat'.
>
> b) The toolchain cannot solve the problem, as most instances of the
>problem (unlike kbd_put_queue) force the inlining unless you build
>with the x86-specific CONFIG_OPTIMIZE_INLINING.

If inlining is done right there should be no change in stack size,
because if calls are not inlined then stack storage is "shared"
between calls, and it should similarly be shared when calls are
inlined. And that is toolchain issue.

-- 
Dmitry


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-06-16 Thread Stephen Hemminger
On Wed, 14 Jun 2017 11:37:39 -0700
Dave Watson  wrote:

> --- /dev/null
> +++ b/net/tls/Kconfig
> @@ -0,0 +1,12 @@
> +#
> +# TLS configuration
> +#
> +config TLS
> + tristate "Transport Layer Security support"
> + depends on NET
> + default m
> + ---help---
> + Enable kernel support for TLS protocol. This allows symmetric
> + encryption handling of the TLS protocol to be done in-kernel.
> +
> + If unsure, say M.

I understand that this will be useful to lots of people and most distributions
will enable it. But the de facto policy in kernel configuration has been that
new features in the kernel default to being disabled.


Re: [PATCH v2 03/11] tty: kbd: reduce stack size with KASAN

2017-06-16 Thread Arnd Bergmann
On Fri, Jun 16, 2017 at 7:29 PM, Dmitry Torokhov
 wrote:
> On Fri, Jun 16, 2017 at 8:58 AM, Samuel Thibault
>  wrote:
>> Arnd Bergmann, on ven. 16 juin 2017 17:41:47 +0200, wrote:
>>> The problem are the 'ch' and 'flag' variables that are passed into
>>> tty_insert_flip_char by value, and from there into
>>> tty_insert_flip_string_flags by reference.  In this case, kasan tries
>>> to detect whether tty_insert_flip_string_flags() does any out-of-bounds
>>> access on the pointers and adds 64 bytes redzone around each of
>>> the two variables.
>>
>> Ouch.
>>
>>> gcc-6.3.1 happens to inline 16 calls of tty_insert_flip_char() into
>
> I wonder if we should stop marking tty_insert_flip_char() as inline.

That would be an easy solution, yes. tty_insert_flip_char() was
apparently meant to be optimized for the fast path to completely
avoid calling into another function, but that fast path got a bit more
complex with commit acc0f67f307f ("tty: Halve flip buffer
GFP_ATOMIC memory consumption").

If we move it out of line, the fast path optimization goes away and
we could just have a simple implementation like


int tty_insert_flip_char(struct tty_port *port, unsigned char ch, char flag)
{
struct tty_buffer *tb = port->buf.tail;
int flags = (flag == TTY_NORMAL) ? TTYB_NORMAL : 0;

if (!__tty_buffer_request_room(port, 1, flags))
return 0;

if (~tb->flags & TTYB_NORMAL)
*flag_buf_ptr(tb, tb->used) = flag;
*char_buf_ptr(tb, tb->used++) = ch;

return 1;
}

One rather simple change I found would actually avoid the warning
and would seem to actually give us better runtime behavior even
without KASAN:

diff --git a/include/linux/tty_flip.h b/include/linux/tty_flip.h
index c28dd523f96e..15d03a14ad0f 100644
--- a/include/linux/tty_flip.h
+++ b/include/linux/tty_flip.h
@@ -26,7 +26,7 @@ static inline int tty_insert_flip_char(struct tty_port *port,
*char_buf_ptr(tb, tb->used++) = ch;
return 1;
}
-   return tty_insert_flip_string_flags(port, &ch, &flag, 1);
+   return tty_insert_flip_string_fixed_flag(port, &ch, flag, 1);
 }

 static inline int tty_insert_flip_string(struct tty_port *port,

This reduces the stack frame size for kbd_event() to 1256 bytes,
which is well within the limit, and it lets us keep the flag-less
buffers across a 'tb->used >= tb->size' condition. Calling
into tty_insert_flip_string_flags() today will allocate a flag buffer
if there isn't already one, even when it is not needed.

>> I'm however afraid we'd have to mark a lot of static functions that way,
>> depending on the aggressivity of gcc... I'd indeed really argue that gcc
>> should consider stack usage when inlining.
>>
>> static int f(int foo) {
>> char c[256];
>> g(c, foo);
>> }
>>
>> is really not something that I'd want to see the compiler to inline.
>
> Why would not we want it be inlined? What we do not want us several
> calls having _separate_ instances of 'c' generated on the stack, all
> inlined calls should share 'c'. And of course if we have f1, f2, and
> f3 with c1, c2, and c3, GCC should not blow up the stack inlining and
> allocating stack for all 3 of them beforehand.
>
> But this all seems to me issue that should be solved in toolchain, not
> trying to play whack-a-mole with kernel sources.

The problem for Samuel's example is that

a) the "--param asan-stack=1" option in KASAN does blow up the
   stack, which is why the annotation is now called 'noinline_if_stackbloat'.

b) The toolchain cannot solve the problem, as most instances of the
   problem (unlike kbd_put_queue) force the inlining unless you build
   with the x86-specific CONFIG_OPTIMIZE_INLINING.

Arnd


Re: [PATCH v3 net-next 3/4] tls: kernel TLS support

2017-06-16 Thread Stephen Hemminger
On Wed, 14 Jun 2017 11:37:39 -0700
Dave Watson  wrote:

> +
> +static inline struct tls_context *tls_get_ctx(const struct sock *sk)
> +{
> + struct inet_connection_sock *icsk = inet_csk(sk);
> +
> + return icsk->icsk_ulp_data;
> +}
> +
> +static inline struct tls_sw_context *tls_sw_ctx(
> + const struct tls_context *tls_ctx)
> +{
> + return (struct tls_sw_context *)tls_ctx->priv_ctx;
> +}
> +
> +static inline struct tls_offload_context *tls_offload_ctx(
> + const struct tls_context *tls_ctx)
> +{
> + return (struct tls_offload_context *)tls_ctx->priv_ctx;
> +}
> +

Since priv_ctx is void *, casts here are unnecessary.


Re: [PATCH 03/44] dmaengine: ioat: don't use DMA_ERROR_CODE

2017-06-16 Thread Alexander Duyck
On Fri, Jun 16, 2017 at 11:10 AM, Christoph Hellwig  wrote:
> DMA_ERROR_CODE is not a public API and will go away.  Instead properly
> unwind based on the loop counter.
>
> Signed-off-by: Christoph Hellwig 
> Acked-by: Dave Jiang 
> Acked-By: Vinod Koul 
> ---
>  drivers/dma/ioat/init.c | 24 +++-
>  1 file changed, 7 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
> index 6ad4384b3fa8..ed8ed1192775 100644
> --- a/drivers/dma/ioat/init.c
> +++ b/drivers/dma/ioat/init.c
> @@ -839,8 +839,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
> goto free_resources;
> }
>
> -   for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
> -   dma_srcs[i] = DMA_ERROR_CODE;
> for (i = 0; i < IOAT_NUM_SRC_TEST; i++) {
> dma_srcs[i] = dma_map_page(dev, xor_srcs[i], 0, PAGE_SIZE,
>DMA_TO_DEVICE);
> @@ -910,8 +908,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
>
> xor_val_result = 1;
>
> -   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> -   dma_srcs[i] = DMA_ERROR_CODE;
> for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
> dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
>DMA_TO_DEVICE);
> @@ -965,8 +961,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device 
> *ioat_dma)
> op = IOAT_OP_XOR_VAL;
>
> xor_val_result = 0;
> -   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> -   dma_srcs[i] = DMA_ERROR_CODE;
> for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
> dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
>DMA_TO_DEVICE);
> @@ -1017,18 +1011,14 @@ static int ioat_xor_val_self_test(struct 
> ioatdma_device *ioat_dma)
> goto free_resources;
>  dma_unmap:
> if (op == IOAT_OP_XOR) {
> -   if (dest_dma != DMA_ERROR_CODE)
> -   dma_unmap_page(dev, dest_dma, PAGE_SIZE,
> -  DMA_FROM_DEVICE);
> -   for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
> -   if (dma_srcs[i] != DMA_ERROR_CODE)
> -   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> -  DMA_TO_DEVICE);
> +   while (--i >= 0)
> +   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> +  DMA_TO_DEVICE);
> +   dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
> } else if (op == IOAT_OP_XOR_VAL) {
> -   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
> -   if (dma_srcs[i] != DMA_ERROR_CODE)
> -   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> -  DMA_TO_DEVICE);
> +   while (--i >= 0)
> +   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
> +  DMA_TO_DEVICE);

Wouldn't it make more sense to pull out the while loop and just call
dma_unmap_page on dest_dma if "op == IOAT_OP_XOR"? Odds are it is what
the compiler is already generating and will save a few lines of code
so what you end up with is something like:
while (--i >= 0)
dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE, DMA_TO_DEVICE);
if (op == IOAT_OP_XOR)
dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
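The `while (--i >= 0)` unwind both versions rely on is the part that is easy to get wrong off by one, since i must point just past the last successfully mapped entry when the loop starts. A hedged userspace sketch (malloc()/free() stand in for dma_map_page()/dma_unmap_page(); map_all() and fail_at are invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>

#define NSRC 4

/* Map NSRC "pages"; on failure unwind exactly the entries mapped so
 * far (the pre-decrement first skips the index that failed) and return
 * -1 with every slot reset to NULL.  @fail_at forces a failure at that
 * index; pass -1 for full success. */
static int map_all(void *srcs[NSRC], int fail_at)
{
	int i;

	for (i = 0; i < NSRC; i++) {
		srcs[i] = (i == fail_at) ? NULL : malloc(16);
		if (!srcs[i])
			goto unmap;
	}
	return 0;
unmap:
	while (--i >= 0) {
		free(srcs[i]);
		srcs[i] = NULL;
	}
	return -1;
}
```

Because the counter alone records how far the setup got, no sentinel value like DMA_ERROR_CODE is needed to tell mapped entries from unmapped ones.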

> }
>  free_resources:
> dma->device_free_chan_resources(dma_chan);
> --
> 2.11.0
>


Re: [PATCH] rtlwifi: rtl8821ae: remove unused variable

2017-06-16 Thread Larry Finger

On 06/13/2017 03:42 PM, Gustavo A. R. Silva wrote:

Remove unused variable rtlhal.

Addresses-Coverity-ID: 1248810
Signed-off-by: Gustavo A. R. Silva 
---


NACK!! That variable is used in file core.c in driver rtlwifi, which is loaded 
and used by rtl8821ae.


Please do more than blindly follow Coverity outputs, or improve that tool!

Larry


  drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c | 3 ---
  1 file changed, 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c 
b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
index 2bc6bac..d158e34 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8821ae/hw.c
@@ -1360,7 +1360,6 @@ static bool _rtl8821ae_reset_pcie_interface_dma(struct 
ieee80211_hw *hw,
  static void _rtl8821ae_get_wakeup_reason(struct ieee80211_hw *hw)
  {
struct rtl_priv *rtlpriv = rtl_priv(hw);
-   struct rtl_hal *rtlhal = rtl_hal(rtl_priv(hw));
struct rtl_ps_ctl *ppsc = rtl_psc(rtlpriv);
u8 fw_reason = 0;
struct timeval ts;
@@ -1372,8 +1371,6 @@ static void _rtl8821ae_get_wakeup_reason(struct 
ieee80211_hw *hw)
  
  	ppsc->wakeup_reason = 0;
  
-	rtlhal->last_suspend_sec = ts.tv_sec;

-
switch (fw_reason) {
case FW_WOW_V2_PTK_UPDATE_EVENT:
ppsc->wakeup_reason = WOL_REASON_PTK_UPDATE;





[PATCH] net: introduce SO_PEERGROUPS getsockopt

2017-06-16 Thread David Herrmann
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.

While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.

Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to querying network databases (e.g., LDAP), using IPC or network
communication.

Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.

So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS option would allow us
to make dbus-daemon work without ever calling into NSS.
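For readers wanting to use the option from user space, here is a minimal sketch of the calling convention described above (the helper name is illustrative, and the numeric fallback is an assumption for headers that predate the option — this patch proposes 58, so check your system's asm-generic/socket.h):

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>

#ifndef SO_PEERGROUPS
#define SO_PEERGROUPS 59	/* assumption: adjust to your kernel headers */
#endif

/* Illustrative helper: fetch the peer's auxiliary groups, growing the
 * buffer on ERANGE as described above.  Returns the number of groups
 * (possibly 0) with a malloc'd array in *groups, or -1 on error. */
static int get_peer_groups(int fd, gid_t **groups)
{
	socklen_t len = 8 * sizeof(gid_t);	/* initial guess */
	gid_t *buf = NULL;

	for (;;) {
		gid_t *tmp = realloc(buf, len ? len : 1);

		if (!tmp) {
			free(buf);
			return -1;
		}
		buf = tmp;
		if (getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, buf, &len) == 0)
			break;
		if (errno != ERANGE) {
			free(buf);
			return -1;
		}
		/* the kernel updated @optlen (len) to the required size; retry */
	}
	*groups = buf;
	return len / sizeof(gid_t);
}
```

On kernels without the option, getsockopt() fails with ENOPROTOOPT and the helper returns -1.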

Cc: Michal Sekletar 
Cc: Simon McVittie 
Cc: Tom Gundersen 
Signed-off-by: David Herrmann 
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h |  2 ++
 arch/ia64/include/uapi/asm/socket.h|  2 ++
 arch/m32r/include/uapi/asm/socket.h|  2 ++
 arch/mips/include/uapi/asm/socket.h|  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h|  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/uapi/asm-generic/socket.h  |  2 ++
 net/core/sock.c| 33 +
 13 files changed, 57 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 148d7a32754e..975c5cbf9a86 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -105,4 +105,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index 1ccf45657472..8e53a149b216 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -98,5 +98,7 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 2c3f4b48042a..d122c30429ae 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -107,4 +107,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index ae6548d29a18..7e689cc14668 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -98,4 +98,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 3418ec9c1c50..5c0947d063cc 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -116,4 +116,6 @@
 
 #define SO_COOKIE  57
 
+#define SO_PEERGROUPS  58
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 4526e92301a6..219f516eb6ad 100644
--- 

Re: [PATCH 0/2] Replace driver's usage of hard-coded device IDs to #defines

2017-06-16 Thread Myron Stowe
On Fri, Jun 16, 2017 at 2:08 PM, Bjorn Helgaas  wrote:
> On Thu, May 25, 2017 at 09:56:55AM -0600, Myron Stowe wrote:
>> On Wed, 24 May 2017 20:02:49 -0400 (EDT)
>> David Miller  wrote:
>>
>> > From: Myron Stowe 
>> > Date: Wed, 24 May 2017 16:47:34 -0600
>> >
>> > > Noa Osherovich introduced a series of new Mellanox device ID
>> > > definitions to help differentiate specific controllers that needed
>> > > INTx masking quirks [1].
>> > >
>> > > Bjorn Helgaas followed on, using the device ID definitions Noa
>> > > provided to replace hard-coded values within the mxl4 ID table [2].
>> > >
>> > > This patch continues along similar lines, adding a few additional
>> > > Mellanox device ID definitions and converting the net/mlx5e
>> > > driver's mlx5 ID table to use the defines so tools like 'grep' and
>> > > 'cscope' can be used to help identify relationships with other
>> > > aspects (such as INTx masking).
>> >
>> > If you're adding pci_ids.h defines, it's only valid to do so if you
>> > actually use the defines in more than one location.
>> >
>> > This patch series is not doing that.
>>
>> Hi David,
>>
>> Yes, now that you mention that again I do vaguely remember past
>> conversations stating similar constraints which is a little odd as
>> Noa's series did exactly that.  It was Bjorn, in a separate patch, that
>> made the connection to the driver with commit c19e4b9037f
>> ("net/mlx4_core: Use device ID defines") [1] and even after such, some
>> of the introduced #defines are still currently singular in usage.
>>
>> Anyway, the part I'm interested in is creating a more transparent
>> association between the Mellanox controllers that need the INTx masking
>> quirk and their drivers, something that remains very opaque currently
>> for a few of the remaining instances (PCI_DEVICE_ID_MELLANOX_CONNECTIB,
>> PCI_DEVICE_ID_MELLANOX_CONNECTX4, and
>> PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX).
>
> I think what you want is the patch below (your patch 2, after removing
> CONNECTX5, CONNECTX5_EX, and CONNECTX6 since they're only used in one
> place).
>
> We added definitions for CONNECTIB, CONNECTX4, and CONNECTX4_LX and uses of
> them in a quirk via:
>
>   7254383341bc ("PCI: Add Mellanox device IDs")
>   d76d2fe05fd9 ("PCI: Convert Mellanox broken INTx quirks to be for listed
>   devices only")
>
> But somehow we missed using those in mlx5/core/main.c.

Yes, that's the downside to not adding pci_ids.h defines originally
within the driver when it's written (even if they are only used once):
it ends up putting the burden on subsequent patch generators to notice
the association and clean things up after the fact.  :)

>
> The patch below doesn't touch PCI, so it would be just for netdev.

Yes, without the pci_ids.h #defines, the resulting patch just ends up
being for netdev.  That's fine; the desire to make the association
between the quirk and the driver more transparent was the impetus for
this.

>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 0c123d571b4c..8a4e292f26b8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -1508,11 +1508,11 @@ static void shutdown(struct pci_dev *pdev)
>  }
>
>  static const struct pci_device_id mlx5_core_pci_table[] = {
> -   { PCI_VDEVICE(MELLANOX, 0x1011) },  /* Connect-IB */
> +   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTIB) },
> { PCI_VDEVICE(MELLANOX, 0x1012), MLX5_PCI_DEV_IS_VF},   /* Connect-IB VF */
> -   { PCI_VDEVICE(MELLANOX, 0x1013) },  /* ConnectX-4 */
> +   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4) },
> { PCI_VDEVICE(MELLANOX, 0x1014), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4 VF */
> -   { PCI_VDEVICE(MELLANOX, 0x1015) },  /* ConnectX-4LX */
> +   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX) },
> { PCI_VDEVICE(MELLANOX, 0x1016), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4LX VF */
> { PCI_VDEVICE(MELLANOX, 0x1017) },  /* ConnectX-5, PCIe 3.0 */
> { PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},   /* ConnectX-5 VF */


Re: [PATCH 0/2] Replace driver's usage of hard-coded device IDs to #defines

2017-06-16 Thread Bjorn Helgaas
On Thu, May 25, 2017 at 09:56:55AM -0600, Myron Stowe wrote:
> On Wed, 24 May 2017 20:02:49 -0400 (EDT)
> David Miller  wrote:
> 
> > From: Myron Stowe 
> > Date: Wed, 24 May 2017 16:47:34 -0600
> > 
> > > Noa Osherovich introduced a series of new Mellanox device ID
> > > definitions to help differentiate specific controllers that needed
> > > INTx masking quirks [1].
> > > 
> > > Bjorn Helgaas followed on, using the device ID definitions Noa
> > > provided to replace hard-coded values within the mxl4 ID table [2].
> > > 
> > > This patch continues along similar lines, adding a few additional
> > > Mellanox device ID definitions and converting the net/mlx5e
> > > driver's mlx5 ID table to use the defines so tools like 'grep' and
> > > 'cscope' can be used to help identify relationships with other
> > > aspects (such as INTx masking).  
> > 
> > If you're adding pci_ids.h defines, it's only valid to do so if you
> > actually use the defines in more than one location.
> > 
> > This patch series is not doing that.
> 
> Hi David,
> 
> Yes, now that you mention that again I do vaguely remember past
> conversations stating similar constraints which is a little odd as
> Noa's series did exactly that.  It was Bjorn, in a separate patch, that
> made the connection to the driver with commit c19e4b9037f
> ("net/mlx4_core: Use device ID defines") [1] and even after such, some
> of the introduced #defines are still currently singular in usage.
> 
> Anyway, the part I'm interested in is creating a more transparent
> association between the Mellanox controllers that need the INTx masking
> quirk and their drivers, something that remains very opaque currently
> for a few of the remaining instances (PCI_DEVICE_ID_MELLANOX_CONNECTIB,
> PCI_DEVICE_ID_MELLANOX_CONNECTX4, and
> PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX).

I think what you want is the patch below (your patch 2, after removing
CONNECTX5, CONNECTX5_EX, and CONNECTX6 since they're only used in one
place).

We added definitions for CONNECTIB, CONNECTX4, and CONNECTX4_LX and uses of
them in a quirk via:

  7254383341bc ("PCI: Add Mellanox device IDs")
  d76d2fe05fd9 ("PCI: Convert Mellanox broken INTx quirks to be for listed
  devices only")

But somehow we missed using those in mlx5/core/main.c.

The patch below doesn't touch PCI, so it would be just for netdev.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 0c123d571b4c..8a4e292f26b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1508,11 +1508,11 @@ static void shutdown(struct pci_dev *pdev)
 }
 
 static const struct pci_device_id mlx5_core_pci_table[] = {
-   { PCI_VDEVICE(MELLANOX, 0x1011) },  /* Connect-IB */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTIB) },
    { PCI_VDEVICE(MELLANOX, 0x1012), MLX5_PCI_DEV_IS_VF},   /* Connect-IB VF */
-   { PCI_VDEVICE(MELLANOX, 0x1013) },  /* ConnectX-4 */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4) },
    { PCI_VDEVICE(MELLANOX, 0x1014), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4 VF */
-   { PCI_VDEVICE(MELLANOX, 0x1015) },  /* ConnectX-4LX */
+   { PCI_VDEVICE(MELLANOX, PCI_DEVICE_ID_MELLANOX_CONNECTX4_LX) },
    { PCI_VDEVICE(MELLANOX, 0x1016), MLX5_PCI_DEV_IS_VF},   /* ConnectX-4LX VF */
    { PCI_VDEVICE(MELLANOX, 0x1017) },  /* ConnectX-5, PCIe 3.0 */
    { PCI_VDEVICE(MELLANOX, 0x1018), MLX5_PCI_DEV_IS_VF},   /* ConnectX-5 VF */


Re: [PATCH 0/5] skb data accessors cleanup

2017-06-16 Thread David Miller
From: Joe Perches 
Date: Fri, 16 Jun 2017 12:43:23 -0700

> On Fri, 2017-06-16 at 11:50 -0400, David Miller wrote:
>> From: Johannes Berg 
>> Date: Fri, 16 Jun 2017 09:07:42 +0200
>> 
>> > Over night, Fengguang's bot told me that it compiled all of its many
>> > various configurations successfully, and I had done allyesconfig on
>> > x86_64 myself yesterday to iron out the things I missed.
>> > 
>> > So now I think I'm happy with it.
> 
> Nice work Johannes. thanks.
> 
> Next up: skb_push?
>  
>> Series applied, thanks!
> 
> David, it seems you applied the V2 version of this patchset.  When
> you apply revised patches, can you please reply to the V2 submission?
> 
> It's a bit confusing otherwise.

Sorry, that happens from time to time :-(


Re: [PATCH 0/5] skb data accessors cleanup

2017-06-16 Thread Joe Perches
On Fri, 2017-06-16 at 11:50 -0400, David Miller wrote:
> From: Johannes Berg 
> Date: Fri, 16 Jun 2017 09:07:42 +0200
> 
> > Over night, Fengguang's bot told me that it compiled all of its many
> > various configurations successfully, and I had done allyesconfig on
> > x86_64 myself yesterday to iron out the things I missed.
> > 
> > So now I think I'm happy with it.

Nice work Johannes. thanks.

Next up: skb_push?
 
> Series applied, thanks!

David, it seems you applied the V2 version of this patchset.  When you
apply revised patches, can you please reply to the V2 submission?

It's a bit confusing otherwise.

> I tell ya, spatch appears to be the crack cocaine of Linux kernel
> development.  Once someone gets into some spatch scripting work,
> they can't stop!

True.

Sometimes I use cocci, sometimes I use regexes.
cocci is cool but it can still be pretty slow.




Re: [PATCH net-next 0/3] ipmr/ip6mr: add Netlink notifications on cache reports

2017-06-16 Thread Julien Gomes
On 06/15/2017 06:00 AM, Nikolay Aleksandrov wrote:
> On 15/06/17 14:44, Nikolay Aleksandrov wrote:
>> On 15/06/17 14:33, Nikolay Aleksandrov wrote:
>>> On 15/06/17 00:51, Julien Gomes wrote:
 Hi Nikolay,

 On 06/14/2017 05:04 AM, Nikolay Aleksandrov wrote:

> This has been on our todo list and I'm definitely interested in the 
> implementation.
> A few things that need careful consideration from my POV. First are the 
> security
> implications - this sends rtnl multicast messages but the rtnl socket has
> the NL_CFG_F_NONROOT_RECV flag thus allowing any user on the system to 
> listen in.
> This would allow them to see the full packets and all reports (granted 
> they can see
> the notifications even now), but the full packet is like giving them the 
> opportunity
> to tcpdump the PIM traffic.
 I definitely see how this can be an issue.
 From what I see, this means that either the packet should be
 transmitted another way, or another Netlink family should be used.

 NETLINK_ROUTE looks to be the logical family to choose though,
 but then I do not see a proper other way to handle this.
>>> Right, currently me neither, unless it provides a bind callback when 
>>> registering
>>> the kernel socket.
>>>
 However I may just not be looking into the right direction,
 maybe you currently have another approach in mind?
>>> I haven't gotten around to make (or even try) them but I was thinking about 
>>> 2 options
>>> ending up with a similar result:
>>>
>>> 1) genetlink
>>>  It also has the NONROOT_RECV flag, but it also allows for a callback - 
>>> mcast_bind()
>>>  which can be used to filter.
>>>
>>> or
>>>
>>> 2) Providing a bind callback to the NETLINK_ROUTE socket.
>>>
>> Ah nevermind, these cannot be used for filtering currently, so it seems
>> the netlink interface would need to be extended too if going down this road.
>>
> Sorry for the multiple emails, just to be thorough - again if going down this
> road all of these would obviously require a different group to bind to in 
> order
> to be able to filter on it, because users must keep receiving their 
> notifications
> for the ipmr one.

Actually, using a bind callback for NETLINK_ROUTE with a new group,
without netlink interface extension, could work.

I quickly tested something like this:
> static int rtnetlink_bind(struct net *net, int group)
> {
> 	switch (group) {
> 	case RTNLGRP_IPV4_MROUTE_R:
> 	case RTNLGRP_IPV6_MROUTE_R:
> 		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> 			return -EPERM;
> 		break;
> 	}
> 	return 0;
> }

With the addition of one/two groups this does restrict the reports'
potential listeners.
The group names here are just placeholders; I am not especially
attached to these ones.

It is not perfect as this would introduce groups with specific
requirements in NETLINK_ROUTE, but I think it can be decent.

What do you think about this?

-- 
Julien Gomes



Re: [pull request][net-next 00/15] Mellanox mlx5 updates and cleanups 2017-06-16

2017-06-16 Thread David Miller
From: Saeed Mahameed 
Date: Fri, 16 Jun 2017 00:42:37 +0300

> This series provides updates and cleanups to mlx5 driver.
> 
> For more details please see tag log below.
> 
> Please pull and let me know if there's any problem.
> *This series doesn't introduce any conflict with the ongoing net
> pull request.

Pulled, thank you.


Re: [PATCH net-next] net: dsa: add cross-chip multicast support

2017-06-16 Thread David Miller
From: Vivien Didelot 
Date: Thu, 15 Jun 2017 16:14:48 -0400

> Similarly to how cross-chip VLAN works, define a bitmap of multicast
> group members for a switch, now including its DSA ports, so that
> multicast traffic can be sent to all switches of the fabric.
> 
> A switch may drop the frames if no user port is a member.
> 
> This brings support for multicast in a multi-chip environment.
> As of now, all switches of the fabric must support the multicast
> operations in order to program a single fabric port.
> 
> Reported-by: Jason Cobham 
> Signed-off-by: Vivien Didelot 

Applied, thanks Vivien.


Re: [PATCH] ibmvnic: driver initialization for kdump/kexec

2017-06-16 Thread David Miller
From: Nathan Fontenot 
Date: Thu, 15 Jun 2017 14:48:09 -0400

> When booting into the kdump/kexec kernel, pHyp and vios
> are not prepared for the initialization crq request and
> a failover transport event is generated. This is not
> handled correctly.
> 
> At this point in initialization the driver is still in
> the 'probing' state and cannot handle a full reset of the
> driver as is normally done for a failover transport event.
> 
> To correct this we catch driver resets while still in the
> 'probing' state and return EAGAIN. This results in the
> driver tearing down the main crq and calling ibmvnic_init()
> again.
> 
> Signed-off-by: Nathan Fontenot 

Applied to net-next.


Re: [PATCH v2 0/6] skb data accessor cleanups

2017-06-16 Thread Joe Perches
On Fri, 2017-06-16 at 14:29 +0200, Johannes Berg wrote:
> Changes from v1:
>  * add skb_put_u8() as suggested by Joe and Bjørn
> 
> Again build-tested by myself and kbuild bot.

Now that I think about it, I did something similar in 2015
https://www.spinics.net/lists/netdev/msg327404.html

At that time Eric Dumazet objected to the API expansion.

And here's an FYI that was a bit surprising to me:

Adding separate EXPORT_SYMBOL functions for skb_put_zero
and skb_put_data that do the memset/memcpy does not
shrink an x86-64 defconfig kernel image much at all.

$ size vmlinux.new vmlinux.old
    text    data    bss      dec    hex filename
10779941 4707128 892928 16379997 f9f05d vmlinux.new
10780631 4707128 892928 16380687 f9f30f vmlinux.old

This was the patch I tested:
---
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 852feacf4bbf..9de7c642ee4f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1904,24 +1904,8 @@ static inline void *__skb_put(struct sk_buff *skb, unsigned int len)
    return tmp;
 }
 
-static inline void *skb_put_zero(struct sk_buff *skb, unsigned int len)
-{
-   void *tmp = skb_put(skb, len);
-
-   memset(tmp, 0, len);
-
-   return tmp;
-}
-
-static inline void *skb_put_data(struct sk_buff *skb, const void *data,
-    unsigned int len)
-{
-   void *tmp = skb_put(skb, len);
-
-   memcpy(tmp, data, len);
-
-   return tmp;
-}
+void *skb_put_zero(struct sk_buff *skb, unsigned int len);
+void *skb_put_data(struct sk_buff *skb, const void *data, unsigned int len);
 
 static inline void skb_put_u8(struct sk_buff *skb, u8 val)
 {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index f75897a33fa4..327f7cd2e0bb 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1431,6 +1431,19 @@ void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
 }
 EXPORT_SYMBOL_GPL(pskb_put);
 
+static __always_inline void *___skb_put(struct sk_buff *skb, unsigned int len)
+{
+   void *tmp = skb_tail_pointer(skb);
+
+   SKB_LINEAR_ASSERT(skb);
+   skb->tail += len;
+   skb->len  += len;
+   if (unlikely(skb->tail > skb->end))
+   skb_over_panic(skb, len, __builtin_return_address(0));
+
+   return tmp;
+}
+
 /**
  * skb_put - add data to a buffer
  * @skb: buffer to use
@@ -1442,17 +1455,42 @@ EXPORT_SYMBOL_GPL(pskb_put);
  */
 void *skb_put(struct sk_buff *skb, unsigned int len)
 {
-   void *tmp = skb_tail_pointer(skb);
-   SKB_LINEAR_ASSERT(skb);
-   skb->tail += len;
-   skb->len  += len;
-   if (unlikely(skb->tail > skb->end))
-   skb_over_panic(skb, len, __builtin_return_address(0));
-   return tmp;
+   return ___skb_put(skb, len);
 }
 EXPORT_SYMBOL(skb_put);
 
 /**
+ * skb_put_zero - add zeroed data to a buffer
+ * @skb: buffer to use
+ * @len: amount of zeroed data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+void *skb_put_zero(struct sk_buff *skb, unsigned int len)
+{
+   return memset(___skb_put(skb, len), 0, len);
+}
+EXPORT_SYMBOL(skb_put_zero);
+
+/**
+ * skb_put_data - Copy data to a buffer
+ * @skb: buffer to use
+ * @data: pointer to data to copy
+ * @len: amount of data to add
+ *
+ * This function extends the used data area of the buffer. If this would
+ * exceed the total buffer size the kernel will panic. A pointer to the
+ * first byte of the extra data is returned.
+ */
+void *skb_put_data(struct sk_buff *skb, const void *data, unsigned int len)
+{
+   return memcpy(___skb_put(skb, len), data, len);
+}
+EXPORT_SYMBOL(skb_put_data);
+
+/**
  * skb_push - add data to the start of a buffer
  * @skb: buffer to use
  * @len: amount of data to add
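As a rough user-space model of the contract these helpers wrap (a toy buffer, not the kernel's sk_buff; the names are illustrative):

```c
#include <assert.h>
#include <string.h>

/* Toy model of the tail-append contract behind skb_put()/skb_put_data():
 * reserve len bytes at the tail, trap on overrun (the kernel would call
 * skb_over_panic()), and return a pointer to the newly reserved region. */
struct buf {
	unsigned char data[256];
	unsigned int len;	/* bytes used so far */
};

static void *buf_put(struct buf *b, unsigned int len)
{
	void *tail = b->data + b->len;

	b->len += len;
	assert(b->len <= sizeof(b->data));
	return tail;
}

/* The cleanup replaces memcpy(skb_put(skb, len), data, len) with this: */
static void *buf_put_data(struct buf *b, const void *data, unsigned int len)
{
	return memcpy(buf_put(b, len), data, len);
}
```

With a helper of this shape, call sites that previously open-coded the memcpy-into-skb_put pattern shrink to a single call.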


Re: [PATCH net-next 00/21] remove dst garbage collector logic

2017-06-16 Thread David Miller
From: Wei Wang 
Date: Fri, 16 Jun 2017 10:47:23 -0700

> From: Wei Wang 
> 
> The current mechanism of dst release is a bit complicated. It is because
> the users of dst get divided into 2 situations:
>   1. Most users take the reference count when using a dst and release the
>  reference count when done.
>   2. Exceptional users like IPv4/IPv6/decnet/xfrm routing code do not take
>  reference count when referencing a dst due to some historic reasons.
> 
> Due to those exceptional use cases in 2, reference count being 0 is not an
> adequate evidence to indicate that no user is using this dst. So users in 1
> can't free the dst simply based on reference count being 0 because users in
> 2 might still hold reference to it.
> Instead, a dst garbage list is needed to hold the dst entries that already
> get removed by the users in 2 but are still held by users in 1. And a periodic
> garbage collector task is run to check all the dst entries in the list to see
> if the users in 1 have released the reference to those dst entries.
> If so, the dst is now ready to be freed.
> 
> This logic introduces unnecessary complications in the dst code which makes it
> hard to understand and to debug.
> 
> In order to get rid of the whole dst garbage collector (gc) and make the dst
> code more unified and simplified, we can make the users in 2 also take 
> reference
> count on the dst and release it properly when done.
> This way, dst can be safely freed once the refcount drops to 0 and no gc
> thread is needed anymore.
> 
> This patch series' target is to completely get rid of dst gc logic and free
> dst based on reference count only.
> Patch 1-3 are preparation patches to do some cleanup/improvement on the 
> existing
> code to make later work easier.
> Patch 4-21 are real implementations.
> In these patches, a temporary flag DST_NOGC is used to help transition
> those exceptional users one by one. Once every component is transitioned,
> this temporary flag is removed.
> By the end of this patch series, all dst are refcounted when being used
> and released when done. And dst will be freed when its refcount drops to 0.
> No dst gc task is running anymore.
> 
> Note: This patch series depends on the decnet fix that was sent right before:
>   "decnet: always not take dst->__refcnt when inserting dst into hash 
> table"

Other than the minor feedback I gave, this series looks great!

Indeed the code is a lot simpler afterwards and much easier to audit and
understand.

I'll wait a bit for some others to give feedback as well.
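The refcount-only lifetime the series moves to can be sketched as a small user-space model (toy types, not the kernel implementation; dst_hold_safe() here mimics the refuse-to-resurrect semantics of refcount_inc_not_zero()):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy dst: freed as soon as its refcount drops to 0, no gc list. */
struct dst {
	atomic_int refcnt;
};

/* Take a reference only if the entry is still live; a dying entry
 * (refcnt already 0) must not be resurrected. */
static bool dst_hold_safe(struct dst *d)
{
	int c = atomic_load(&d->refcnt);

	while (c > 0) {
		if (atomic_compare_exchange_weak(&d->refcnt, &c, c + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last user frees immediately. */
static void dst_release(struct dst *d)
{
	if (atomic_fetch_sub(&d->refcnt, 1) == 1)
		free(d);
}
```

Once every user, including the formerly noref ones, goes through this pair, a refcount of zero really does mean "no user left", which is what lets the periodic gc task disappear.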


Re: [PATCH net-next 02/21] udp: call dst_hold_safe() in udp_sk_rx_set_dst()

2017-06-16 Thread Wei Wang
On Fri, Jun 16, 2017 at 12:02 PM, David Miller  wrote:
> From: Wei Wang 
> Date: Fri, 16 Jun 2017 10:47:25 -0700
>
>> + if (dst)
>> + /* set noref for now.
>> +  * any place which wants to hold dst has to call
>> +  * dst_hold_safe()
>> +  */
>> + skb_dst_set_noref(skb, dst);
>
> You must enclose the code in curly braces if you want to put a comment
> in this one-line basic block of the 'if' statement.
>
> Otherwise it's hard to read.
>
> Likewise for the other similar change in this file.
>

Got it. Will update in the next version.


Re: [PATCH net-next 02/21] udp: call dst_hold_safe() in udp_sk_rx_set_dst()

2017-06-16 Thread David Miller
From: Wei Wang 
Date: Fri, 16 Jun 2017 10:47:25 -0700

> + if (dst)
> + /* set noref for now.
> +  * any place which wants to hold dst has to call
> +  * dst_hold_safe()
> +  */
> + skb_dst_set_noref(skb, dst);

You must enclose the code in curly braces if you want to put a comment
in this one-line basic block of the 'if' statement.

Otherwise it's hard to read.

Likewise for the other similar change in this file.

Thanks.


Re: [PATCH net] decnet: always not take dst->__refcnt when inserting dst into hash table

2017-06-16 Thread David Miller
From: Wei Wang 
Date: Fri, 16 Jun 2017 10:46:37 -0700

> From: Wei Wang 
> 
> In the existing dn_route.c code, dn_route_output_slow() takes
> dst->__refcnt before calling dn_insert_route() while dn_route_input_slow()
> does not take dst->__refcnt before calling dn_insert_route().
> This makes the whole routing code very buggy.
> In dn_dst_check_expire(), dnrt_free() is called when rt expires. This
> makes the routes inserted by dn_route_output_slow() not able to be
> freed as the refcnt is not released.
> In dn_dst_gc(), dnrt_drop() is called to release rt which could
> potentially cause the dst->__refcnt to be dropped to -1.
> In dn_run_flush(), dst_free() is called to release all the dst. Again,
> it makes the dst inserted by dn_route_output_slow() not able to be
> released and also, it does not wait on the rcu and could potentially
> cause crash in the path where other users still refer to this dst.
> 
> This patch makes sure both input and output path do not take
> dst->__refcnt before calling dn_insert_route() and also makes sure
> dnrt_free()/dst_free() is called when removing dst from the hash table.
> The only difference between those 2 calls is that dnrt_free() waits on
> the rcu while dst_free() does not.
> 
> Signed-off-by: Wei Wang 
> Acked-by: Martin KaFai Lau 

Applied and queued up for -stable, thanks.

I've also applied it to net-next for the sake of your dst gc removal
series.

Thanks again.



[PATCH 03/44] dmaengine: ioat: don't use DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is not a public API and will go away.  Instead properly
unwind based on the loop counter.

Signed-off-by: Christoph Hellwig 
Acked-by: Dave Jiang 
Acked-By: Vinod Koul 
---
 drivers/dma/ioat/init.c | 24 +++-
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/dma/ioat/init.c b/drivers/dma/ioat/init.c
index 6ad4384b3fa8..ed8ed1192775 100644
--- a/drivers/dma/ioat/init.c
+++ b/drivers/dma/ioat/init.c
@@ -839,8 +839,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device *ioat_dma)
goto free_resources;
}
 
-   for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
-   dma_srcs[i] = DMA_ERROR_CODE;
for (i = 0; i < IOAT_NUM_SRC_TEST; i++) {
dma_srcs[i] = dma_map_page(dev, xor_srcs[i], 0, PAGE_SIZE,
   DMA_TO_DEVICE);
@@ -910,8 +908,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device *ioat_dma)
 
xor_val_result = 1;
 
-   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
-   dma_srcs[i] = DMA_ERROR_CODE;
for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
   DMA_TO_DEVICE);
@@ -965,8 +961,6 @@ static int ioat_xor_val_self_test(struct ioatdma_device *ioat_dma)
op = IOAT_OP_XOR_VAL;
 
xor_val_result = 0;
-   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
-   dma_srcs[i] = DMA_ERROR_CODE;
for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++) {
dma_srcs[i] = dma_map_page(dev, xor_val_srcs[i], 0, PAGE_SIZE,
   DMA_TO_DEVICE);
@@ -1017,18 +1011,14 @@ static int ioat_xor_val_self_test(struct ioatdma_device *ioat_dma)
goto free_resources;
 dma_unmap:
if (op == IOAT_OP_XOR) {
-   if (dest_dma != DMA_ERROR_CODE)
-   dma_unmap_page(dev, dest_dma, PAGE_SIZE,
-  DMA_FROM_DEVICE);
-   for (i = 0; i < IOAT_NUM_SRC_TEST; i++)
-   if (dma_srcs[i] != DMA_ERROR_CODE)
-   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
-  DMA_TO_DEVICE);
+   while (--i >= 0)
+   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
+  DMA_TO_DEVICE);
+   dma_unmap_page(dev, dest_dma, PAGE_SIZE, DMA_FROM_DEVICE);
} else if (op == IOAT_OP_XOR_VAL) {
-   for (i = 0; i < IOAT_NUM_SRC_TEST + 1; i++)
-   if (dma_srcs[i] != DMA_ERROR_CODE)
-   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
-  DMA_TO_DEVICE);
+   while (--i >= 0)
+   dma_unmap_page(dev, dma_srcs[i], PAGE_SIZE,
+  DMA_TO_DEVICE);
}
 free_resources:
dma->device_free_chan_resources(dma_chan);
-- 
2.11.0
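The unwind idiom the patch adopts can be shown in isolation (a self-contained sketch with stand-in map/unmap functions, not the driver code): on a failure at index i, exactly the i resources already acquired are released, so no sentinel pre-fill is needed.

```c
#include <stdbool.h>

#define NSRC 8

static bool mapped[NSRC];	/* tracks which slots are live */
static int fail_at = -1;	/* inject a failure for demonstration */

static bool map_one(int i)
{
	if (i == fail_at)
		return false;
	mapped[i] = true;
	return true;
}

static void unmap_one(int i)
{
	mapped[i] = false;
}

/* The pattern the patch switches to: instead of pre-filling the array
 * with DMA_ERROR_CODE and testing each slot during cleanup, unwind
 * exactly the 0..i-1 mappings made before the failure. */
static int map_all(int n)
{
	int i;

	for (i = 0; i < n; i++) {
		if (!map_one(i))
			goto unwind;
	}
	return 0;

unwind:
	while (--i >= 0)
		unmap_one(i);
	return -1;
}
```

This is the same `while (--i >= 0)` shape the patch introduces in the dma_unmap error path.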



[PATCH 05/44] drm/armada: don't abuse DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
dev_addr isn't even a dma_addr_t, and DMA_ERROR_CODE has never been
a valid driver API.  Add a bool mapped flag instead.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/armada/armada_fb.c  | 2 +-
 drivers/gpu/drm/armada/armada_gem.c | 5 ++---
 drivers/gpu/drm/armada/armada_gem.h | 1 +
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/armada/armada_fb.c b/drivers/gpu/drm/armada/armada_fb.c
index 2a7eb6817c36..92e6b08ea64a 100644
--- a/drivers/gpu/drm/armada/armada_fb.c
+++ b/drivers/gpu/drm/armada/armada_fb.c
@@ -133,7 +133,7 @@ static struct drm_framebuffer *armada_fb_create(struct drm_device *dev,
}
 
/* Framebuffer objects must have a valid device address for scanout */
-   if (obj->dev_addr == DMA_ERROR_CODE) {
+   if (!obj->mapped) {
ret = -EINVAL;
goto err_unref;
}
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index d6c2a5d190eb..a76ca21d063b 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -175,6 +175,7 @@ armada_gem_linear_back(struct drm_device *dev, struct armada_gem_object *obj)
 
obj->phys_addr = obj->linear->start;
obj->dev_addr = obj->linear->start;
+   obj->mapped = true;
}
 
DRM_DEBUG_DRIVER("obj %p phys %#llx dev %#llx\n", obj,
@@ -205,7 +206,6 @@ armada_gem_alloc_private_object(struct drm_device *dev, size_t size)
return NULL;
 
 	drm_gem_private_object_init(dev, &obj->obj, size);
-   obj->dev_addr = DMA_ERROR_CODE;
 
DRM_DEBUG_DRIVER("alloc private obj %p size %zu\n", obj, size);
 
@@ -229,8 +229,6 @@ static struct armada_gem_object *armada_gem_alloc_object(struct drm_device *dev,
return NULL;
}
 
-   obj->dev_addr = DMA_ERROR_CODE;
-
mapping = obj->obj.filp->f_mapping;
mapping_set_gfp_mask(mapping, GFP_HIGHUSER | __GFP_RECLAIMABLE);
 
@@ -610,5 +608,6 @@ int armada_gem_map_import(struct armada_gem_object *dobj)
return -EINVAL;
}
dobj->dev_addr = sg_dma_address(dobj->sgt->sgl);
+   dobj->mapped = true;
return 0;
 }
diff --git a/drivers/gpu/drm/armada/armada_gem.h 
b/drivers/gpu/drm/armada/armada_gem.h
index b88d2b9853c7..6e524e0676bb 100644
--- a/drivers/gpu/drm/armada/armada_gem.h
+++ b/drivers/gpu/drm/armada/armada_gem.h
@@ -16,6 +16,7 @@ struct armada_gem_object {
void*addr;
phys_addr_t phys_addr;
resource_size_t dev_addr;
+   boolmapped;
struct drm_mm_node  *linear;/* for linear backed */
struct page *page;  /* for page backed */
struct sg_table *sgt;   /* for imported */
-- 
2.11.0



[PATCH 04/44] drm/exynos: don't use DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE already isn't a valid API for drivers to use and will
go away soon.  exynos_drm_fb_dma_addr uses it as an error return when
the passed-in index is invalid, but the callers never check for it
and instead pass the address straight to the hardware.

Add a WARN_ON instead and just return 0.
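The warn-and-return-0 pattern this patch adopts can be sketched as below. This is a hedged illustration with hypothetical `demo_` names, using a plain stderr message as a stand-in for the kernel's WARN_ON_ONCE; the real code returns 0 from `exynos_drm_fb_dma_addr` itself.

```c
/* Sketch: bounds-check an index, warn once conceptually, and return a
 * harmless 0 instead of a sentinel the callers never tested for.
 * demo_ names are illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define DEMO_MAX_FB_BUFFER 4

static uint64_t demo_addrs[DEMO_MAX_FB_BUFFER] = {
	0x1000, 0x2000, 0x3000, 0x4000
};

static uint64_t demo_fb_dma_addr(int index)
{
	if (index < 0 || index >= DEMO_MAX_FB_BUFFER) {
		/* stand-in for WARN_ON_ONCE(index >= MAX_FB_BUFFER) */
		fprintf(stderr, "demo: invalid fb index %d\n", index);
		return 0;
	}
	return demo_addrs[index];
}
```

Since the callers feed the result straight to hardware, returning 0 for a bad index is at least a predictable, non-random address, and the warning makes the programming error visible instead of silently handing DMA_ERROR_CODE to the device.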

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/exynos/exynos_drm_fb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_fb.c 
b/drivers/gpu/drm/exynos/exynos_drm_fb.c
index c77a5aced81a..d48fd7c918f8 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fb.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fb.c
@@ -181,8 +181,8 @@ dma_addr_t exynos_drm_fb_dma_addr(struct drm_framebuffer 
*fb, int index)
 {
struct exynos_drm_fb *exynos_fb = to_exynos_fb(fb);
 
-   if (index >= MAX_FB_BUFFER)
-   return DMA_ERROR_CODE;
+   if (WARN_ON_ONCE(index >= MAX_FB_BUFFER))
+   return 0;
 
return exynos_fb->dma_addr[index];
 }
-- 
2.11.0



[PATCH 07/44] xen-swiotlb: consolidate xen_swiotlb_dma_ops

2017-06-16 Thread Christoph Hellwig
ARM and x86 had duplicated versions of the dma_ops structure; the
only difference is that x86 hasn't wired up the set_dma_mask,
mmap, and get_sgtable ops yet.  On x86 all of them are identical
to the generic version, so they aren't needed, but are harmless.

All the symbols used only for xen_swiotlb_dma_ops can now be marked
static as well.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 arch/arm/xen/mm.c  | 17 
 arch/x86/xen/pci-swiotlb-xen.c | 14 ---
 drivers/xen/swiotlb-xen.c  | 93 ++
 include/xen/swiotlb-xen.h  | 62 +---
 4 files changed, 49 insertions(+), 137 deletions(-)

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index f0325d96b97a..785d2a562a23 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -185,23 +185,6 @@ EXPORT_SYMBOL_GPL(xen_destroy_contiguous_region);
 const struct dma_map_ops *xen_dma_ops;
 EXPORT_SYMBOL(xen_dma_ops);
 
-static const struct dma_map_ops xen_swiotlb_dma_ops = {
-   .alloc = xen_swiotlb_alloc_coherent,
-   .free = xen_swiotlb_free_coherent,
-   .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu,
-   .sync_single_for_device = xen_swiotlb_sync_single_for_device,
-   .sync_sg_for_cpu = xen_swiotlb_sync_sg_for_cpu,
-   .sync_sg_for_device = xen_swiotlb_sync_sg_for_device,
-   .map_sg = xen_swiotlb_map_sg_attrs,
-   .unmap_sg = xen_swiotlb_unmap_sg_attrs,
-   .map_page = xen_swiotlb_map_page,
-   .unmap_page = xen_swiotlb_unmap_page,
-   .dma_supported = xen_swiotlb_dma_supported,
-   .set_dma_mask = xen_swiotlb_set_dma_mask,
-   .mmap = xen_swiotlb_dma_mmap,
-   .get_sgtable = xen_swiotlb_get_sgtable,
-};
-
 int __init xen_mm_init(void)
 {
struct gnttab_cache_flush cflush;
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 42b08f8fc2ca..37c6056a7bba 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -18,20 +18,6 @@
 
 int xen_swiotlb __read_mostly;
 
-static const struct dma_map_ops xen_swiotlb_dma_ops = {
-   .alloc = xen_swiotlb_alloc_coherent,
-   .free = xen_swiotlb_free_coherent,
-   .sync_single_for_cpu = xen_swiotlb_sync_single_for_cpu,
-   .sync_single_for_device = xen_swiotlb_sync_single_for_device,
-   .sync_sg_for_cpu = xen_swiotlb_sync_sg_for_cpu,
-   .sync_sg_for_device = xen_swiotlb_sync_sg_for_device,
-   .map_sg = xen_swiotlb_map_sg_attrs,
-   .unmap_sg = xen_swiotlb_unmap_sg_attrs,
-   .map_page = xen_swiotlb_map_page,
-   .unmap_page = xen_swiotlb_unmap_page,
-   .dma_supported = xen_swiotlb_dma_supported,
-};
-
 /*
  * pci_xen_swiotlb_detect - set xen_swiotlb to 1 if necessary
  *
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 8dab0d3dc172..a0f006daab48 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -295,7 +295,8 @@ int __ref xen_swiotlb_init(int verbose, bool early)
free_pages((unsigned long)xen_io_tlb_start, order);
return rc;
 }
-void *
+
+static void *
 xen_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
   dma_addr_t *dma_handle, gfp_t flags,
   unsigned long attrs)
@@ -346,9 +347,8 @@ xen_swiotlb_alloc_coherent(struct device *hwdev, size_t 
size,
memset(ret, 0, size);
return ret;
 }
-EXPORT_SYMBOL_GPL(xen_swiotlb_alloc_coherent);
 
-void
+static void
 xen_swiotlb_free_coherent(struct device *hwdev, size_t size, void *vaddr,
  dma_addr_t dev_addr, unsigned long attrs)
 {
@@ -369,8 +369,6 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t 
size, void *vaddr,
 
xen_free_coherent_pages(hwdev, size, vaddr, (dma_addr_t)phys, attrs);
 }
-EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
-
 
 /*
  * Map a single buffer of the indicated size for DMA in streaming mode.  The
@@ -379,7 +377,7 @@ EXPORT_SYMBOL_GPL(xen_swiotlb_free_coherent);
  * Once the device is given the dma address, the device owns this memory until
  * either xen_swiotlb_unmap_page or xen_swiotlb_dma_sync_single is performed.
  */
-dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
+static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
unsigned long offset, size_t size,
enum dma_data_direction dir,
unsigned long attrs)
@@ -429,7 +427,6 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct 
page *page,
 
return DMA_ERROR_CODE;
 }
-EXPORT_SYMBOL_GPL(xen_swiotlb_map_page);
 
 /*
  * Unmap a single streaming mode DMA translation.  The dma_addr and size must
@@ -467,13 +464,12 @@ static void xen_unmap_single(struct device *hwdev, 
dma_addr_t dev_addr,
dma_mark_clean(phys_to_virt(paddr), size);
 }
 
-void 

[PATCH 10/44] ia64: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
All ia64 dma_mapping_ops instances already have a mapping_error member.

Signed-off-by: Christoph Hellwig 
---
 arch/ia64/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/ia64/include/asm/dma-mapping.h 
b/arch/ia64/include/asm/dma-mapping.h
index 73ec3c6f4cfe..3ce5ab4339f3 100644
--- a/arch/ia64/include/asm/dma-mapping.h
+++ b/arch/ia64/include/asm/dma-mapping.h
@@ -12,8 +12,6 @@
 
 #define ARCH_HAS_DMA_GET_REQUIRED_MASK
 
-#define DMA_ERROR_CODE 0
-
 extern const struct dma_map_ops *dma_ops;
 extern struct ia64_machine_vector ia64_mv;
 extern void set_iommu_machvec(void);
-- 
2.11.0



[PATCH 08/44] xen-swiotlb: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.
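The per-implementation sentinel plus ->mapping_error callback pattern used here (and in the later patches of this series) can be sketched as follows. This is a simplified model with hypothetical `demo_` names, not the kernel's actual dma_map_ops definition; it only shows how a private error value stays an implementation detail once callers go through a dispatch helper.

```c
/* Sketch: each dma_map_ops instance picks a private sentinel and
 * reports it via ->mapping_error(), instead of every implementation
 * sharing a global DMA_ERROR_CODE.  demo_ names are illustrative. */
#include <stdint.h>

typedef uint64_t demo_dma_addr_t;

/* Private to this one implementation, mirroring XEN_SWIOTLB_ERROR_CODE. */
#define DEMO_SWIOTLB_ERROR_CODE ((demo_dma_addr_t)~0x0)

struct demo_dma_map_ops {
	int (*mapping_error)(demo_dma_addr_t dma_addr);
};

static int demo_swiotlb_mapping_error(demo_dma_addr_t dma_addr)
{
	return dma_addr == DEMO_SWIOTLB_ERROR_CODE;
}

static const struct demo_dma_map_ops demo_swiotlb_ops = {
	.mapping_error = demo_swiotlb_mapping_error,
};

/* Analogue of dma_mapping_error(): consult the ops' callback when
 * present, otherwise assume the mapping succeeded. */
static int demo_dma_mapping_error(const struct demo_dma_map_ops *ops,
				  demo_dma_addr_t dma_addr)
{
	if (ops->mapping_error)
		return ops->mapping_error(dma_addr);
	return 0;
}
```

Callers only ever ask "did this mapping fail?" through the helper, so each backend is free to choose 0, ~0, or anything else as its sentinel without a cross-architecture DMA_ERROR_CODE definition.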

Signed-off-by: Christoph Hellwig 
Reviewed-by: Konrad Rzeszutek Wilk 
---
 drivers/xen/swiotlb-xen.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index a0f006daab48..c3a04b2d7532 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -67,6 +67,8 @@ static unsigned long dma_alloc_coherent_mask(struct device 
*dev,
 }
 #endif
 
+#define XEN_SWIOTLB_ERROR_CODE (~(dma_addr_t)0x0)
+
 static char *xen_io_tlb_start, *xen_io_tlb_end;
 static unsigned long xen_io_tlb_nslabs;
 /*
@@ -410,7 +412,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
map = swiotlb_tbl_map_single(dev, start_dma_addr, phys, size, dir,
 attrs);
if (map == SWIOTLB_MAP_ERROR)
-   return DMA_ERROR_CODE;
+   return XEN_SWIOTLB_ERROR_CODE;
 
dev_addr = xen_phys_to_bus(map);
xen_dma_map_page(dev, pfn_to_page(map >> PAGE_SHIFT),
@@ -425,7 +427,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, 
struct page *page,
attrs |= DMA_ATTR_SKIP_CPU_SYNC;
swiotlb_tbl_unmap_single(dev, map, size, dir, attrs);
 
-   return DMA_ERROR_CODE;
+   return XEN_SWIOTLB_ERROR_CODE;
 }
 
 /*
@@ -715,6 +717,11 @@ xen_swiotlb_get_sgtable(struct device *dev, struct 
sg_table *sgt,
return dma_common_get_sgtable(dev, sgt, cpu_addr, handle, size);
 }
 
+static int xen_swiotlb_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == XEN_SWIOTLB_ERROR_CODE;
+}
+
 const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc = xen_swiotlb_alloc_coherent,
.free = xen_swiotlb_free_coherent,
@@ -730,4 +737,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.set_dma_mask = xen_swiotlb_set_dma_mask,
.mmap = xen_swiotlb_dma_mmap,
.get_sgtable = xen_swiotlb_get_sgtable,
+   .mapping_error  = xen_swiotlb_mapping_error,
 };
-- 
2.11.0



[PATCH 11/44] m32r: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
dma-noop is the only dma_mapping_ops instance for m32r and does not return
errors.

Signed-off-by: Christoph Hellwig 
---
 arch/m32r/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/m32r/include/asm/dma-mapping.h 
b/arch/m32r/include/asm/dma-mapping.h
index c01d9f52d228..aff3ae8b62f7 100644
--- a/arch/m32r/include/asm/dma-mapping.h
+++ b/arch/m32r/include/asm/dma-mapping.h
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
	return &dma_noop_ops;
-- 
2.11.0



[PATCH 12/44] microblaze: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
microblaze does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/microblaze/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/microblaze/include/asm/dma-mapping.h 
b/arch/microblaze/include/asm/dma-mapping.h
index 3fad5e722a66..e15cd2f76e23 100644
--- a/arch/microblaze/include/asm/dma-mapping.h
+++ b/arch/microblaze/include/asm/dma-mapping.h
@@ -28,8 +28,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 #define __dma_alloc_coherent(dev, gfp, size, handle)   NULL
 #define __dma_free_coherent(size, addr)((void)0)
 
-- 
2.11.0



[PATCH 16/44] arm64: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
The dma alloc interface returns an error by returning NULL, and the
mapping interfaces rely on the mapping_error method, which the dummy
ops already implement correctly.

Thus remove the DMA_ERROR_CODE define.

Signed-off-by: Christoph Hellwig 
Reviewed-by: Robin Murphy 
---
 arch/arm64/include/asm/dma-mapping.h | 1 -
 arch/arm64/mm/dma-mapping.c  | 3 +--
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/dma-mapping.h 
b/arch/arm64/include/asm/dma-mapping.h
index 5392dbeffa45..cf8fc8f05580 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,7 +24,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0)
 extern const struct dma_map_ops dummy_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 3216e098c058..147fbb907a2f 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -184,7 +184,6 @@ static void *__dma_alloc(struct device *dev, size_t size,
 no_map:
__dma_free_coherent(dev, size, ptr, *dma_handle, attrs);
 no_mem:
-   *dma_handle = DMA_ERROR_CODE;
return NULL;
 }
 
@@ -487,7 +486,7 @@ static dma_addr_t __dummy_map_page(struct device *dev, 
struct page *page,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
-   return DMA_ERROR_CODE;
+   return 0;
 }
 
 static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-- 
2.11.0



[PATCH 18/44] iommu/amd: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/amd_iommu.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5d6cf2..d41280e869de 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -54,6 +54,8 @@
 #include "amd_iommu_types.h"
 #include "irq_remapping.h"
 
+#define AMD_IOMMU_MAPPING_ERROR0
+
 #define CMD_SET_TYPE(cmd, t) ((cmd)->data[1] |= ((t) << 28))
 
 #define LOOP_TIMEOUT   10
@@ -2394,7 +2396,7 @@ static dma_addr_t __map_single(struct device *dev,
paddr &= PAGE_MASK;
 
address = dma_ops_alloc_iova(dev, dma_dom, pages, dma_mask);
-   if (address == DMA_ERROR_CODE)
+   if (address == AMD_IOMMU_MAPPING_ERROR)
goto out;
 
prot = dir2prot(direction);
@@ -2431,7 +2433,7 @@ static dma_addr_t __map_single(struct device *dev,
 
dma_ops_free_iova(dma_dom, address, pages);
 
-   return DMA_ERROR_CODE;
+   return AMD_IOMMU_MAPPING_ERROR;
 }
 
 /*
@@ -2483,7 +2485,7 @@ static dma_addr_t map_page(struct device *dev, struct 
page *page,
if (PTR_ERR(domain) == -EINVAL)
return (dma_addr_t)paddr;
else if (IS_ERR(domain))
-   return DMA_ERROR_CODE;
+   return AMD_IOMMU_MAPPING_ERROR;
 
dma_mask = *dev->dma_mask;
dma_dom = to_dma_ops_domain(domain);
@@ -2560,7 +2562,7 @@ static int map_sg(struct device *dev, struct scatterlist 
*sglist,
npages = sg_num_pages(dev, sglist, nelems);
 
address = dma_ops_alloc_iova(dev, dma_dom, npages, dma_mask);
-   if (address == DMA_ERROR_CODE)
+   if (address == AMD_IOMMU_MAPPING_ERROR)
goto out_err;
 
prot = dir2prot(direction);
@@ -2683,7 +2685,7 @@ static void *alloc_coherent(struct device *dev, size_t 
size,
*dma_addr = __map_single(dev, dma_dom, page_to_phys(page),
 size, DMA_BIDIRECTIONAL, dma_mask);
 
-   if (*dma_addr == DMA_ERROR_CODE)
+   if (*dma_addr == AMD_IOMMU_MAPPING_ERROR)
goto out_free;
 
return page_address(page);
@@ -2732,6 +2734,11 @@ static int amd_iommu_dma_supported(struct device *dev, 
u64 mask)
return check_device(dev);
 }
 
+static int amd_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == AMD_IOMMU_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops amd_iommu_dma_ops = {
.alloc  = alloc_coherent,
.free   = free_coherent,
@@ -2740,6 +2747,7 @@ static const struct dma_map_ops amd_iommu_dma_ops = {
.map_sg = map_sg,
.unmap_sg   = unmap_sg,
.dma_supported  = amd_iommu_dma_supported,
+   .mapping_error  = amd_iommu_mapping_error,
 };
 
 static int init_reserved_iova_ranges(void)
-- 
2.11.0



[PATCH 20/44] sparc: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
Acked-by: David S. Miller 
---
 arch/sparc/include/asm/dma-mapping.h |  2 --
 arch/sparc/kernel/iommu.c| 12 +---
 arch/sparc/kernel/iommu_common.h |  2 ++
 arch/sparc/kernel/pci_sun4v.c| 14 ++
 4 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index 69cc627779f2..b8e8dfcd065d 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -5,8 +5,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 #define HAVE_ARCH_DMA_SUPPORTED 1
 int dma_supported(struct device *dev, u64 mask);
 
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index c63ba99ca551..dafa316d978d 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -314,7 +314,7 @@ static dma_addr_t dma_4u_map_page(struct device *dev, 
struct page *page,
 bad_no_ctx:
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 }
 
 static void strbuf_flush(struct strbuf *strbuf, struct iommu *iommu,
@@ -547,7 +547,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
 
if (outcount < incount) {
outs = sg_next(outs);
-   outs->dma_address = DMA_ERROR_CODE;
+   outs->dma_address = SPARC_MAPPING_ERROR;
outs->dma_length = 0;
}
 
@@ -573,7 +573,7 @@ static int dma_4u_map_sg(struct device *dev, struct 
scatterlist *sglist,
iommu_tbl_range_free(&iommu->tbl, vaddr, npages,
 IOMMU_ERROR_CODE);
 
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = SPARC_MAPPING_ERROR;
s->dma_length = 0;
}
if (s == outs)
@@ -741,6 +741,11 @@ static void dma_4u_sync_sg_for_cpu(struct device *dev,
	spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
+static int dma_4u_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == SPARC_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops sun4u_dma_ops = {
.alloc  = dma_4u_alloc_coherent,
.free   = dma_4u_free_coherent,
@@ -750,6 +755,7 @@ static const struct dma_map_ops sun4u_dma_ops = {
.unmap_sg   = dma_4u_unmap_sg,
.sync_single_for_cpu= dma_4u_sync_single_for_cpu,
.sync_sg_for_cpu= dma_4u_sync_sg_for_cpu,
+   .mapping_error  = dma_4u_mapping_error,
 };
 
 const struct dma_map_ops *dma_ops = &sun4u_dma_ops;
diff --git a/arch/sparc/kernel/iommu_common.h b/arch/sparc/kernel/iommu_common.h
index 828493329f68..5ea5c192b1d9 100644
--- a/arch/sparc/kernel/iommu_common.h
+++ b/arch/sparc/kernel/iommu_common.h
@@ -47,4 +47,6 @@ static inline int is_span_boundary(unsigned long entry,
return iommu_is_span_boundary(entry, nr, shift, boundary_size);
 }
 
+#define SPARC_MAPPING_ERROR(~(dma_addr_t)0x0)
+
 #endif /* _IOMMU_COMMON_H */
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 68bec7c97cb8..8e2a56f4c03a 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -412,12 +412,12 @@ static dma_addr_t dma_4v_map_page(struct device *dev, 
struct page *page,
 bad:
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 
 iommu_map_fail:
local_irq_restore(flags);
iommu_tbl_range_free(tbl, bus_addr, npages, IOMMU_ERROR_CODE);
-   return DMA_ERROR_CODE;
+   return SPARC_MAPPING_ERROR;
 }
 
 static void dma_4v_unmap_page(struct device *dev, dma_addr_t bus_addr,
@@ -590,7 +590,7 @@ static int dma_4v_map_sg(struct device *dev, struct 
scatterlist *sglist,
 
if (outcount < incount) {
outs = sg_next(outs);
-   outs->dma_address = DMA_ERROR_CODE;
+   outs->dma_address = SPARC_MAPPING_ERROR;
outs->dma_length = 0;
}
 
@@ -607,7 +607,7 @@ static int dma_4v_map_sg(struct device *dev, struct 
scatterlist *sglist,
iommu_tbl_range_free(tbl, vaddr, npages,
 IOMMU_ERROR_CODE);
/* XXX demap? XXX */
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = SPARC_MAPPING_ERROR;
s->dma_length = 0;
}
if (s == outs)
@@ -669,6 +669,11 @@ static void dma_4v_unmap_sg(struct device *dev, struct 
scatterlist *sglist,
local_irq_restore(flags);
 }
 
+static int dma_4v_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == 

[PATCH 17/44] hexagon: switch to use ->mapping_error for error reporting

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
Acked-by: Richard Kuo 
---
 arch/hexagon/include/asm/dma-mapping.h |  2 --
 arch/hexagon/kernel/dma.c  | 12 +---
 arch/hexagon/kernel/hexagon_ksyms.c|  1 -
 3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h 
b/arch/hexagon/include/asm/dma-mapping.h
index d3a87bd9b686..00e3f10113b0 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -29,8 +29,6 @@
 #include 
 
 struct device;
-extern int bad_dma_address;
-#define DMA_ERROR_CODE bad_dma_address
 
 extern const struct dma_map_ops *dma_ops;
 
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index e74b65009587..71269dc0f225 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -25,11 +25,11 @@
 #include 
 #include 
 
+#define HEXAGON_MAPPING_ERROR  0
+
 const struct dma_map_ops *dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-int bad_dma_address;  /*  globals are automatically initialized to zero  */
-
 static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
 {
return phys_to_virt((unsigned long) dma_addr);
@@ -181,7 +181,7 @@ static dma_addr_t hexagon_map_page(struct device *dev, 
struct page *page,
WARN_ON(size == 0);
 
if (!check_addr("map_single", dev, bus, size))
-   return bad_dma_address;
+   return HEXAGON_MAPPING_ERROR;
 
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_sync(dma_addr_to_virt(bus), size, dir);
@@ -203,6 +203,11 @@ static void hexagon_sync_single_for_device(struct device 
*dev,
dma_sync(dma_addr_to_virt(dma_handle), size, dir);
 }
 
+static int hexagon_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == HEXAGON_MAPPING_ERROR;
+}
+
 const struct dma_map_ops hexagon_dma_ops = {
.alloc  = hexagon_dma_alloc_coherent,
.free   = hexagon_free_coherent,
@@ -210,6 +215,7 @@ const struct dma_map_ops hexagon_dma_ops = {
.map_page   = hexagon_map_page,
.sync_single_for_cpu = hexagon_sync_single_for_cpu,
.sync_single_for_device = hexagon_sync_single_for_device,
+   .mapping_error  = hexagon_mapping_error,
.is_phys= 1,
 };
 
diff --git a/arch/hexagon/kernel/hexagon_ksyms.c 
b/arch/hexagon/kernel/hexagon_ksyms.c
index 00bcad9cbd8f..aa248f595431 100644
--- a/arch/hexagon/kernel/hexagon_ksyms.c
+++ b/arch/hexagon/kernel/hexagon_ksyms.c
@@ -40,7 +40,6 @@ EXPORT_SYMBOL(memset);
 /* Additional variables */
 EXPORT_SYMBOL(__phys_offset);
 EXPORT_SYMBOL(_dflt_cache_att);
-EXPORT_SYMBOL(bad_dma_address);
 
 #define DECLARE_EXPORT(name) \
extern void name(void); EXPORT_SYMBOL(name)
-- 
2.11.0



[PATCH 19/44] s390: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
s390 can also use noop_dma_ops, and while that currently does not return
errors, it will do so in the future.  Implementing the mapping_error method
is the proper way to have per-ops error conditions.

Signed-off-by: Christoph Hellwig 
Acked-by: Gerald Schaefer 
---
 arch/s390/include/asm/dma-mapping.h |  2 --
 arch/s390/pci/pci_dma.c | 18 +-
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/s390/include/asm/dma-mapping.h 
b/arch/s390/include/asm/dma-mapping.h
index 3108b8dbe266..512ad0eaa11a 100644
--- a/arch/s390/include/asm/dma-mapping.h
+++ b/arch/s390/include/asm/dma-mapping.h
@@ -8,8 +8,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t) 0x0)
-
 extern const struct dma_map_ops s390_pci_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
diff --git a/arch/s390/pci/pci_dma.c b/arch/s390/pci/pci_dma.c
index 9081a57fa340..ea623faab525 100644
--- a/arch/s390/pci/pci_dma.c
+++ b/arch/s390/pci/pci_dma.c
@@ -14,6 +14,8 @@
 #include 
 #include 
 
+#define S390_MAPPING_ERROR (~(dma_addr_t) 0x0)
+
 static struct kmem_cache *dma_region_table_cache;
 static struct kmem_cache *dma_page_table_cache;
 static int s390_iommu_strict;
@@ -281,7 +283,7 @@ static dma_addr_t dma_alloc_address(struct device *dev, int 
size)
 
 out_error:
	spin_unlock_irqrestore(&zdev->iommu_bitmap_lock, flags);
-   return DMA_ERROR_CODE;
+   return S390_MAPPING_ERROR;
 }
 
 static void dma_free_address(struct device *dev, dma_addr_t dma_addr, int size)
@@ -329,7 +331,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, 
struct page *page,
/* This rounds up number of pages based on size and offset */
nr_pages = iommu_num_pages(pa, size, PAGE_SIZE);
dma_addr = dma_alloc_address(dev, nr_pages);
-   if (dma_addr == DMA_ERROR_CODE) {
+   if (dma_addr == S390_MAPPING_ERROR) {
ret = -ENOSPC;
goto out_err;
}
@@ -352,7 +354,7 @@ static dma_addr_t s390_dma_map_pages(struct device *dev, 
struct page *page,
 out_err:
zpci_err("map error:\n");
zpci_err_dma(ret, pa);
-   return DMA_ERROR_CODE;
+   return S390_MAPPING_ERROR;
 }
 
 static void s390_dma_unmap_pages(struct device *dev, dma_addr_t dma_addr,
@@ -429,7 +431,7 @@ static int __s390_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
int ret;
 
dma_addr_base = dma_alloc_address(dev, nr_pages);
-   if (dma_addr_base == DMA_ERROR_CODE)
+   if (dma_addr_base == S390_MAPPING_ERROR)
return -ENOMEM;
 
dma_addr = dma_addr_base;
@@ -476,7 +478,7 @@ static int s390_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
for (i = 1; i < nr_elements; i++) {
s = sg_next(s);
 
-   s->dma_address = DMA_ERROR_CODE;
+   s->dma_address = S390_MAPPING_ERROR;
s->dma_length = 0;
 
if (s->offset || (size & ~PAGE_MASK) ||
@@ -525,6 +527,11 @@ static void s390_dma_unmap_sg(struct device *dev, struct 
scatterlist *sg,
s->dma_length = 0;
}
 }
+   
+static int s390_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == S390_MAPPING_ERROR;
+}
 
 int zpci_dma_init_device(struct zpci_dev *zdev)
 {
@@ -657,6 +664,7 @@ const struct dma_map_ops s390_pci_dma_ops = {
.unmap_sg   = s390_dma_unmap_sg,
.map_page   = s390_dma_map_pages,
.unmap_page = s390_dma_unmap_pages,
+   .mapping_error  = s390_mapping_error,
/* if we support direct DMA this must be conditional */
.is_phys= 0,
/* dma_supported is unconditionally true without a callback */
-- 
2.11.0



[PATCH 22/44] x86/pci-nommu: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/pci-nommu.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index a88952ef371c..085fe6ce4049 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+#define NOMMU_MAPPING_ERROR0
+
 static int
 check_addr(char *name, struct device *hwdev, dma_addr_t bus, size_t size)
 {
@@ -33,7 +35,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct 
page *page,
dma_addr_t bus = page_to_phys(page) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
-   return DMA_ERROR_CODE;
+   return NOMMU_MAPPING_ERROR;
flush_write_buffers();
return bus;
 }
@@ -88,6 +90,11 @@ static void nommu_sync_sg_for_device(struct device *dev,
flush_write_buffers();
 }
 
+static int nommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == NOMMU_MAPPING_ERROR;
+}
+
 const struct dma_map_ops nommu_dma_ops = {
.alloc  = dma_generic_alloc_coherent,
.free   = dma_generic_free_coherent,
@@ -96,4 +103,5 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_single_for_device = nommu_sync_single_for_device,
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys= 1,
+   .mapping_error  = nommu_mapping_error,
 };
-- 
2.11.0



[PATCH 23/44] x86/calgary: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/kernel/pci-calgary_64.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index fda7867046d0..e75b490f2b0b 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -50,6 +50,8 @@
 #include 
 #include 
 
+#define CALGARY_MAPPING_ERROR  0
+
 #ifdef CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT
 int use_calgary __read_mostly = 1;
 #else
@@ -252,7 +254,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (panic_on_overflow)
panic("Calgary: fix the allocator.\n");
else
-   return DMA_ERROR_CODE;
+   return CALGARY_MAPPING_ERROR;
}
}
 
@@ -272,10 +274,10 @@ static dma_addr_t iommu_alloc(struct device *dev, struct 
iommu_table *tbl,
 
entry = iommu_range_alloc(dev, tbl, npages);
 
-   if (unlikely(entry == DMA_ERROR_CODE)) {
+   if (unlikely(entry == CALGARY_MAPPING_ERROR)) {
pr_warn("failed to allocate %u pages in iommu %p\n",
npages, tbl);
-   return DMA_ERROR_CODE;
+   return CALGARY_MAPPING_ERROR;
}
 
/* set the return dma address */
@@ -295,7 +297,7 @@ static void iommu_free(struct iommu_table *tbl, dma_addr_t 
dma_addr,
unsigned long flags;
 
/* were we called with bad_dma_address? */
-   badend = DMA_ERROR_CODE + (EMERGENCY_PAGES * PAGE_SIZE);
+   badend = CALGARY_MAPPING_ERROR + (EMERGENCY_PAGES * PAGE_SIZE);
if (unlikely(dma_addr < badend)) {
WARN(1, KERN_ERR "Calgary: driver tried unmapping bad DMA "
   "address 0x%Lx\n", dma_addr);
@@ -380,7 +382,7 @@ static int calgary_map_sg(struct device *dev, struct 
scatterlist *sg,
npages = iommu_num_pages(vaddr, s->length, PAGE_SIZE);
 
entry = iommu_range_alloc(dev, tbl, npages);
-   if (entry == DMA_ERROR_CODE) {
+   if (entry == CALGARY_MAPPING_ERROR) {
/* makes sure unmap knows to stop */
s->dma_length = 0;
goto error;
@@ -398,7 +400,7 @@ static int calgary_map_sg(struct device *dev, struct 
scatterlist *sg,
 error:
calgary_unmap_sg(dev, sg, nelems, dir, 0);
for_each_sg(sg, s, nelems, i) {
-   sg->dma_address = DMA_ERROR_CODE;
+   sg->dma_address = CALGARY_MAPPING_ERROR;
sg->dma_length = 0;
}
return 0;
@@ -453,7 +455,7 @@ static void* calgary_alloc_coherent(struct device *dev, 
size_t size,
 
/* set up tces to cover the allocated range */
mapping = iommu_alloc(dev, tbl, ret, npages, DMA_BIDIRECTIONAL);
-   if (mapping == DMA_ERROR_CODE)
+   if (mapping == CALGARY_MAPPING_ERROR)
goto free;
*dma_handle = mapping;
return ret;
@@ -478,6 +480,11 @@ static void calgary_free_coherent(struct device *dev, 
size_t size,
free_pages((unsigned long)vaddr, get_order(size));
 }
 
+static int calgary_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == CALGARY_MAPPING_ERROR;
+}
+
 static const struct dma_map_ops calgary_dma_ops = {
.alloc = calgary_alloc_coherent,
.free = calgary_free_coherent,
@@ -485,6 +492,7 @@ static const struct dma_map_ops calgary_dma_ops = {
.unmap_sg = calgary_unmap_sg,
.map_page = calgary_map_page,
.unmap_page = calgary_unmap_page,
+   .mapping_error = calgary_mapping_error,
 };
 
 static inline void __iomem * busno_to_bbar(unsigned char num)
@@ -732,7 +740,7 @@ static void __init calgary_reserve_regions(struct pci_dev 
*dev)
struct iommu_table *tbl = pci_iommu(dev->bus);
 
/* reserve EMERGENCY_PAGES from bad_dma_address and up */
-   iommu_range_reserve(tbl, DMA_ERROR_CODE, EMERGENCY_PAGES);
+   iommu_range_reserve(tbl, CALGARY_MAPPING_ERROR, EMERGENCY_PAGES);
 
/* avoid the BIOS/VGA first 640KB-1MB region */
/* for CalIOC2 - avoid the entire first MB */
-- 
2.11.0



[PATCH 25/44] arm: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c| 13 +---
 arch/arm/include/asm/dma-iommu.h   |  2 ++
 arch/arm/include/asm/dma-mapping.h |  1 -
 arch/arm/mm/dma-mapping.c  | 41 --
 4 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 9b1b7be2ec0e..4060378e0f14 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -33,6 +33,7 @@
 #include 
 
 #include 
+#include 
 
 #undef STATS
 
@@ -256,7 +257,7 @@ static inline dma_addr_t map_single(struct device *dev, 
void *ptr, size_t size,
if (buf == NULL) {
dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
   __func__, ptr);
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
}
 
dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -326,7 +327,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, 
struct page *page,
 
ret = needs_bounce(dev, dma_addr, size);
if (ret < 0)
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
 
if (ret == 0) {
arm_dma_ops.sync_single_for_device(dev, dma_addr, size, dir);
@@ -335,7 +336,7 @@ static dma_addr_t dmabounce_map_page(struct device *dev, 
struct page *page,
 
if (PageHighMem(page)) {
dev_err(dev, "DMA buffer bouncing of HIGHMEM pages is not 
supported\n");
-   return DMA_ERROR_CODE;
+   return ARM_MAPPING_ERROR;
}
 
return map_single(dev, page_address(page) + offset, size, dir, attrs);
@@ -452,6 +453,11 @@ static int dmabounce_set_mask(struct device *dev, u64 
dma_mask)
return arm_dma_ops.set_dma_mask(dev, dma_mask);
 }
 
+static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return arm_dma_ops.mapping_error(dev, dma_addr);
+}
+
 static const struct dma_map_ops dmabounce_ops = {
.alloc  = arm_dma_alloc,
.free   = arm_dma_free,
@@ -466,6 +472,7 @@ static const struct dma_map_ops dmabounce_ops = {
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.set_dma_mask   = dmabounce_set_mask,
+   .mapping_error  = dmabounce_mapping_error,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 2ef282f96651..389a26a10ea3 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -9,6 +9,8 @@
 #include 
 #include 
 
+#define ARM_MAPPING_ERROR  (~(dma_addr_t)0x0)
+
 struct dma_iommu_mapping {
/* iommu specific data */
struct iommu_domain *domain;
diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 680d3f3889e7..52a8fd5a8edb 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -12,7 +12,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
 extern const struct dma_map_ops arm_dma_ops;
 extern const struct dma_map_ops arm_coherent_dma_ops;
 
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index bd83c531828a..8f2c5a8a98f0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -180,6 +180,11 @@ static void arm_dma_sync_single_for_device(struct device 
*dev,
__dma_page_cpu_to_dev(page, offset, size, dir);
 }
 
+static int arm_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == ARM_MAPPING_ERROR;
+}
+
 const struct dma_map_ops arm_dma_ops = {
.alloc  = arm_dma_alloc,
.free   = arm_dma_free,
@@ -193,6 +198,7 @@ const struct dma_map_ops arm_dma_ops = {
.sync_single_for_device = arm_dma_sync_single_for_device,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
+   .mapping_error  = arm_dma_mapping_error,
 };
 EXPORT_SYMBOL(arm_dma_ops);
 
@@ -211,6 +217,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.get_sgtable= arm_dma_get_sgtable,
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
+   .mapping_error  = arm_dma_mapping_error,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
 
@@ -799,7 +806,7 @@ static void *__dma_alloc(struct device *dev, size_t size, 
dma_addr_t *handle,
gfp &= ~(__GFP_COMP);
args.gfp = gfp;
 
-   *handle = DMA_ERROR_CODE;
+   *handle = ARM_MAPPING_ERROR;
allowblock = gfpflags_allow_blocking(gfp);
cma = allowblock ? 

[PATCH 26/44] dma-mapping: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
And update the documentation - dma_mapping_error has been supported
everywhere for a long time.

Signed-off-by: Christoph Hellwig 
---
 Documentation/DMA-API-HOWTO.txt | 31 +--
 include/linux/dma-mapping.h |  5 -
 2 files changed, 5 insertions(+), 31 deletions(-)

diff --git a/Documentation/DMA-API-HOWTO.txt b/Documentation/DMA-API-HOWTO.txt
index 979228bc9035..4ed388356898 100644
--- a/Documentation/DMA-API-HOWTO.txt
+++ b/Documentation/DMA-API-HOWTO.txt
@@ -550,32 +550,11 @@ and to unmap it:
dma_unmap_single(dev, dma_handle, size, direction);
 
 You should call dma_mapping_error() as dma_map_single() could fail and return
-error. Not all DMA implementations support the dma_mapping_error() interface.
-However, it is a good practice to call dma_mapping_error() interface, which
-will invoke the generic mapping error check interface. Doing so will ensure
-that the mapping code will work correctly on all DMA implementations without
-any dependency on the specifics of the underlying implementation. Using the
-returned address without checking for errors could result in failures ranging
-from panics to silent data corruption. A couple of examples of incorrect ways
-to check for errors that make assumptions about the underlying DMA
-implementation are as follows and these are applicable to dma_map_page() as
-well.
-
-Incorrect example 1:
-   dma_addr_t dma_handle;
-
-   dma_handle = dma_map_single(dev, addr, size, direction);
-   if ((dma_handle & 0xffff != 0) || (dma_handle >= 0x1000000)) {
-   goto map_error;
-   }
-
-Incorrect example 2:
-   dma_addr_t dma_handle;
-
-   dma_handle = dma_map_single(dev, addr, size, direction);
-   if (dma_handle == DMA_ERROR_CODE) {
-   goto map_error;
-   }
+error.  Doing so will ensure that the mapping code will work correctly on all
+DMA implementations without any dependency on the specifics of the underlying
+implementation. Using the returned address without checking for errors could
+result in failures ranging from panics to silent data corruption.  The same
+applies to dma_map_page() as well.
 
 You should call dma_unmap_single() when the DMA activity is finished, e.g.,
 from the interrupt which told you that the DMA transfer is done.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 4f3eecedca2d..a57875309bfd 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -546,12 +546,7 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
 
if (get_dma_ops(dev)->mapping_error)
return get_dma_ops(dev)->mapping_error(dev, dma_addr);
-
-#ifdef DMA_ERROR_CODE
-   return dma_addr == DMA_ERROR_CODE;
-#else
return 0;
-#endif
 }
 
 #ifndef HAVE_ARCH_DMA_SUPPORTED
-- 
2.11.0
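[Editor's note] The documentation change above boils down to one rule: always check the return of dma_map_single()/dma_map_page() through dma_mapping_error(), never against a magic constant. The dispatch logic this series converges on can be sketched outside the kernel; everything below (`example_ops`, `EXAMPLE_MAPPING_ERROR`, the stripped-down `struct dma_map_ops`) is a hypothetical stand-in, not a kernel symbol:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long dma_addr_t;

/* Cut-down stand-in for the kernel's struct dma_map_ops. */
struct dma_map_ops {
	int (*mapping_error)(void *dev, dma_addr_t dma_addr);
};

/* After this series, the check is delegated entirely to the ops
 * instance; an instance without ->mapping_error never fails. */
static int dma_mapping_error(const struct dma_map_ops *ops, dma_addr_t dma_addr)
{
	if (ops->mapping_error)
		return ops->mapping_error(NULL, dma_addr);
	return 0;
}

/* An ops instance in the style of arm_dma_ops: one sentinel value,
 * private to the implementation instead of a global DMA_ERROR_CODE. */
#define EXAMPLE_MAPPING_ERROR (~(dma_addr_t)0x0)

static int example_mapping_error(void *dev, dma_addr_t dma_addr)
{
	(void)dev;
	return dma_addr == EXAMPLE_MAPPING_ERROR;
}

static const struct dma_map_ops example_ops = {
	.mapping_error = example_mapping_error,
};

static const struct dma_map_ops noop_ops = { 0 };
```

Note how the same sentinel that used to leak out as DMA_ERROR_CODE is now an implementation detail of one ops instance.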



[PATCH 28/44] sparc: remove arch specific dma_supported implementations

2017-06-16 Thread Christoph Hellwig
Usually dma_supported decisions are done by the dma_map_ops instance.
Switch sparc to that model by providing a ->dma_supported instance for
sbus that always returns false, and implementations tailored to the sun4u
and sun4v cases for sparc64, and leave it unimplemented for PCI on
sparc32, which means always supported.

Signed-off-by: Christoph Hellwig 
Acked-by: David S. Miller 
---
 arch/sparc/include/asm/dma-mapping.h |  3 ---
 arch/sparc/kernel/iommu.c| 40 +++-
 arch/sparc/kernel/ioport.c   | 22 ++--
 arch/sparc/kernel/pci_sun4v.c| 17 +++
 4 files changed, 39 insertions(+), 43 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index 98da9f92c318..60bf1633d554 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -5,9 +5,6 @@
 #include 
 #include 
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-int dma_supported(struct device *dev, u64 mask);
-
 static inline void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
  enum dma_data_direction dir)
 {
diff --git a/arch/sparc/kernel/iommu.c b/arch/sparc/kernel/iommu.c
index dafa316d978d..fcbcc031f615 100644
--- a/arch/sparc/kernel/iommu.c
+++ b/arch/sparc/kernel/iommu.c
@@ -746,6 +746,21 @@ static int dma_4u_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return dma_addr == SPARC_MAPPING_ERROR;
 }
 
+static int dma_4u_supported(struct device *dev, u64 device_mask)
+{
+   struct iommu *iommu = dev->archdata.iommu;
+
+   if (device_mask > DMA_BIT_MASK(32))
+   return 0;
+   if ((device_mask & iommu->dma_addr_mask) == iommu->dma_addr_mask)
+   return 1;
+#ifdef CONFIG_PCI
+   if (dev_is_pci(dev))
+   return pci64_dma_supported(to_pci_dev(dev), device_mask);
+#endif
+   return 0;
+}
+
 static const struct dma_map_ops sun4u_dma_ops = {
.alloc  = dma_4u_alloc_coherent,
.free   = dma_4u_free_coherent,
@@ -755,32 +770,9 @@ static const struct dma_map_ops sun4u_dma_ops = {
.unmap_sg   = dma_4u_unmap_sg,
.sync_single_for_cpu= dma_4u_sync_single_for_cpu,
.sync_sg_for_cpu= dma_4u_sync_sg_for_cpu,
+   .dma_supported  = dma_4u_supported,
.mapping_error  = dma_4u_mapping_error,
 };
 
const struct dma_map_ops *dma_ops = &sun4u_dma_ops;
 EXPORT_SYMBOL(dma_ops);
-
-int dma_supported(struct device *dev, u64 device_mask)
-{
-   struct iommu *iommu = dev->archdata.iommu;
-   u64 dma_addr_mask = iommu->dma_addr_mask;
-
-   if (device_mask > DMA_BIT_MASK(32)) {
-   if (iommu->atu)
-   dma_addr_mask = iommu->atu->dma_addr_mask;
-   else
-   return 0;
-   }
-
-   if ((device_mask & dma_addr_mask) == dma_addr_mask)
-   return 1;
-
-#ifdef CONFIG_PCI
-   if (dev_is_pci(dev))
-   return pci64_dma_supported(to_pci_dev(dev), device_mask);
-#endif
-
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index dd081d557609..12894f259bea 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -401,6 +401,11 @@ static void sbus_sync_sg_for_device(struct device *dev, 
struct scatterlist *sg,
BUG();
 }
 
+static int sbus_dma_supported(struct device *dev, u64 mask)
+{
+   return 0;
+}
+
 static const struct dma_map_ops sbus_dma_ops = {
.alloc  = sbus_alloc_coherent,
.free   = sbus_free_coherent,
@@ -410,6 +415,7 @@ static const struct dma_map_ops sbus_dma_ops = {
.unmap_sg   = sbus_unmap_sg,
.sync_sg_for_cpu= sbus_sync_sg_for_cpu,
.sync_sg_for_device = sbus_sync_sg_for_device,
+   .dma_supported  = sbus_dma_supported,
 };
 
 static int __init sparc_register_ioport(void)
@@ -655,22 +661,6 @@ EXPORT_SYMBOL(pci32_dma_ops);
const struct dma_map_ops *dma_ops = &sbus_dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-
-/*
- * Return whether the given PCI device DMA address mask can be
- * supported properly.  For example, if your device can only drive the
- * low 24-bits during PCI bus mastering, then you would pass
- * 0x00ffffff as the mask to this function.
- */
-int dma_supported(struct device *dev, u64 mask)
-{
-   if (dev_is_pci(dev))
-   return 1;
-
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
-
 #ifdef CONFIG_PROC_FS
 
 static int sparc_io_proc_show(struct seq_file *m, void *v)
diff --git a/arch/sparc/kernel/pci_sun4v.c b/arch/sparc/kernel/pci_sun4v.c
index 8e2a56f4c03a..24f21c726dfa 100644
--- a/arch/sparc/kernel/pci_sun4v.c
+++ b/arch/sparc/kernel/pci_sun4v.c
@@ -24,6 +24,7 @@
 
 #include "pci_impl.h"
 #include "iommu_common.h"
+#include 

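[Editor's note] The dma_4u_supported() decision added above is worth sketching: the sun4u IOMMU only handles 32-bit masks, and beyond that the mask must cover the IOMMU's own address mask. The function below is a hypothetical simplification that takes the iommu mask as a plain parameter instead of reading dev->archdata, and omits the PCI fallback:

```c
#include <assert.h>
#include <stdint.h>

/* Same expansion as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Sketch of the dma_4u_supported() logic from the patch above. */
static int dma_4u_supported_sketch(uint64_t device_mask, uint64_t iommu_mask)
{
	if (device_mask > DMA_BIT_MASK(32))
		return 0;	/* sun4u IOMMU cannot address above 32 bits */
	if ((device_mask & iommu_mask) == iommu_mask)
		return 1;	/* device mask covers the IOMMU's range */
	return 0;		/* pci64_dma_supported() fallback omitted */
}
```

A device mask narrower than the IOMMU mask is rejected because the IOMMU may hand out addresses the device cannot drive.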
[PATCH 24/44] x86: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
All dma_map_ops instances now handle their errors through
->mapping_error.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h 
b/arch/x86/include/asm/dma-mapping.h
index 08a0838b83fb..c35d228aa381 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -19,8 +19,6 @@
 # define ISA_DMA_BIT_MASK DMA_BIT_MASK(32)
 #endif
 
-#define DMA_ERROR_CODE 0
-
 extern int iommu_merge;
 extern struct device x86_dma_fallback_dev;
 extern int panic_on_overflow;
-- 
2.11.0



[PATCH 29/44] dma-noop: remove dma_supported and mapping_error methods

2017-06-16 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 lib/dma-noop.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/lib/dma-noop.c b/lib/dma-noop.c
index de26c8b68f34..643a074f139d 100644
--- a/lib/dma-noop.c
+++ b/lib/dma-noop.c
@@ -54,23 +54,11 @@ static int dma_noop_map_sg(struct device *dev, struct 
scatterlist *sgl, int nent
return nents;
 }
 
-static int dma_noop_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static int dma_noop_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 const struct dma_map_ops dma_noop_ops = {
.alloc  = dma_noop_alloc,
.free   = dma_noop_free,
.map_page   = dma_noop_map_page,
.map_sg = dma_noop_map_sg,
-   .mapping_error  = dma_noop_mapping_error,
-   .dma_supported  = dma_noop_supported,
 };
 
 EXPORT_SYMBOL(dma_noop_ops);
-- 
2.11.0



[PATCH 27/44] sparc: remove leon_dma_ops

2017-06-16 Thread Christoph Hellwig
We can just use pci32_dma_ops directly.

Signed-off-by: Christoph Hellwig 
Acked-by: David S. Miller 
---
 arch/sparc/include/asm/dma-mapping.h | 3 +--
 arch/sparc/kernel/ioport.c   | 5 +
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/include/asm/dma-mapping.h 
b/arch/sparc/include/asm/dma-mapping.h
index b8e8dfcd065d..98da9f92c318 100644
--- a/arch/sparc/include/asm/dma-mapping.h
+++ b/arch/sparc/include/asm/dma-mapping.h
@@ -17,7 +17,6 @@ static inline void dma_cache_sync(struct device *dev, void 
*vaddr, size_t size,
 }
 
 extern const struct dma_map_ops *dma_ops;
-extern const struct dma_map_ops *leon_dma_ops;
 extern const struct dma_map_ops pci32_dma_ops;
 
 extern struct bus_type pci_bus_type;
@@ -26,7 +25,7 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_SPARC_LEON
if (sparc_cpu_model == sparc_leon)
-   return leon_dma_ops;
+   return &pci32_dma_ops;
 #endif
 #if defined(CONFIG_SPARC32) && defined(CONFIG_PCI)
if (bus == &pci_bus_type)
diff --git a/arch/sparc/kernel/ioport.c b/arch/sparc/kernel/ioport.c
index cf20033a1458..dd081d557609 100644
--- a/arch/sparc/kernel/ioport.c
+++ b/arch/sparc/kernel/ioport.c
@@ -637,6 +637,7 @@ static void pci32_sync_sg_for_device(struct device *device, 
struct scatterlist *
}
 }
 
+/* note: leon re-uses pci32_dma_ops */
 const struct dma_map_ops pci32_dma_ops = {
.alloc  = pci32_alloc_coherent,
.free   = pci32_free_coherent,
@@ -651,10 +652,6 @@ const struct dma_map_ops pci32_dma_ops = {
 };
 EXPORT_SYMBOL(pci32_dma_ops);
 
-/* leon re-uses pci32_dma_ops */
-const struct dma_map_ops *leon_dma_ops = &pci32_dma_ops;
-EXPORT_SYMBOL(leon_dma_ops);
-
const struct dma_map_ops *dma_ops = &sbus_dma_ops;
 EXPORT_SYMBOL(dma_ops);
 
-- 
2.11.0



[PATCH 30/44] dma-virt: remove dma_supported and mapping_error methods

2017-06-16 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 lib/dma-virt.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/lib/dma-virt.c b/lib/dma-virt.c
index dcd4df1f7174..5c4f11329721 100644
--- a/lib/dma-virt.c
+++ b/lib/dma-virt.c
@@ -51,22 +51,10 @@ static int dma_virt_map_sg(struct device *dev, struct 
scatterlist *sgl,
return nents;
 }
 
-static int dma_virt_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return false;
-}
-
-static int dma_virt_supported(struct device *dev, u64 mask)
-{
-   return true;
-}
-
 const struct dma_map_ops dma_virt_ops = {
.alloc  = dma_virt_alloc,
.free   = dma_virt_free,
.map_page   = dma_virt_map_page,
.map_sg = dma_virt_map_sg,
-   .mapping_error  = dma_virt_mapping_error,
-   .dma_supported  = dma_virt_supported,
 };
 EXPORT_SYMBOL(dma_virt_ops);
-- 
2.11.0



[PATCH 34/44] arm: remove arch specific dma_supported implementation

2017-06-16 Thread Christoph Hellwig
And instead wire it up as a method for all the dma_map_ops instances.

Note that the code seems a little fishy for dmabounce and iommu, but
for now I'd like to preserve the existing behavior 1:1.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c| 1 +
 arch/arm/include/asm/dma-iommu.h   | 2 ++
 arch/arm/include/asm/dma-mapping.h | 3 ---
 arch/arm/mm/dma-mapping.c  | 7 +--
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 4060378e0f14..6ecd5be5d37e 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -473,6 +473,7 @@ static const struct dma_map_ops dmabounce_ops = {
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.set_dma_mask   = dmabounce_set_mask,
.mapping_error  = dmabounce_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 389a26a10ea3..c090ec675eac 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -35,5 +35,7 @@ int arm_iommu_attach_device(struct device *dev,
struct dma_iommu_mapping *mapping);
 void arm_iommu_detach_device(struct device *dev);
 
+int arm_dma_supported(struct device *dev, u64 mask);
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/include/asm/dma-mapping.h 
b/arch/arm/include/asm/dma-mapping.h
index 52a8fd5a8edb..8dabcfdf4505 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -20,9 +20,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
return _dma_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *dev, u64 mask);
-
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 8f2c5a8a98f0..b9677ada421f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -199,6 +199,7 @@ const struct dma_map_ops arm_dma_ops = {
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 EXPORT_SYMBOL(arm_dma_ops);
 
@@ -218,6 +219,7 @@ const struct dma_map_ops arm_coherent_dma_ops = {
.map_page   = arm_coherent_dma_map_page,
.map_sg = arm_dma_map_sg,
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 EXPORT_SYMBOL(arm_coherent_dma_ops);
 
@@ -1184,11 +1186,10 @@ void arm_dma_sync_sg_for_device(struct device *dev, 
struct scatterlist *sg,
 * during bus mastering, then you would pass 0x00ffffff as the mask
  * to this function.
  */
-int dma_supported(struct device *dev, u64 mask)
+int arm_dma_supported(struct device *dev, u64 mask)
 {
return __dma_supported(dev, mask, false);
 }
-EXPORT_SYMBOL(dma_supported);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES 4096
 
@@ -2149,6 +2150,7 @@ const struct dma_map_ops iommu_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 const struct dma_map_ops iommu_coherent_ops = {
@@ -2167,6 +2169,7 @@ const struct dma_map_ops iommu_coherent_ops = {
.unmap_resource = arm_iommu_unmap_resource,
 
.mapping_error  = arm_dma_mapping_error,
+   .dma_supported  = arm_dma_supported,
 };
 
 /**
-- 
2.11.0



[PATCH 38/44] arm: implement ->dma_supported instead of ->set_dma_mask

2017-06-16 Thread Christoph Hellwig
Same behavior, less code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/common/dmabounce.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 6ecd5be5d37e..9a92de63426f 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -445,12 +445,12 @@ static void dmabounce_sync_for_device(struct device *dev,
arm_dma_ops.sync_single_for_device(dev, handle, size, dir);
 }
 
-static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
+static int dmabounce_dma_supported(struct device *dev, u64 dma_mask)
 {
if (dev->archdata.dmabounce)
return 0;
 
-   return arm_dma_ops.set_dma_mask(dev, dma_mask);
+   return arm_dma_ops.dma_supported(dev, dma_mask);
 }
 
 static int dmabounce_mapping_error(struct device *dev, dma_addr_t dma_addr)
@@ -471,9 +471,8 @@ static const struct dma_map_ops dmabounce_ops = {
.unmap_sg   = arm_dma_unmap_sg,
.sync_sg_for_cpu= arm_dma_sync_sg_for_cpu,
.sync_sg_for_device = arm_dma_sync_sg_for_device,
-   .set_dma_mask   = dmabounce_set_mask,
+   .dma_supported  = dmabounce_dma_supported,
.mapping_error  = dmabounce_mapping_error,
-   .dma_supported  = arm_dma_supported,
 };
 
 static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev,
-- 
2.11.0



[PATCH 36/44] dma-mapping: remove HAVE_ARCH_DMA_SUPPORTED

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index a57875309bfd..3e5908656226 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -549,7 +549,6 @@ static inline int dma_mapping_error(struct device *dev, 
dma_addr_t dma_addr)
return 0;
 }
 
-#ifndef HAVE_ARCH_DMA_SUPPORTED
 static inline int dma_supported(struct device *dev, u64 mask)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -560,7 +559,6 @@ static inline int dma_supported(struct device *dev, u64 
mask)
return 1;
return ops->dma_supported(dev, mask);
 }
-#endif
 
 #ifndef HAVE_ARCH_DMA_SET_MASK
 static inline int dma_set_mask(struct device *dev, u64 mask)
-- 
2.11.0
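[Editor's note] With HAVE_ARCH_DMA_SUPPORTED gone, the generic dma_supported() shown in the hunk above is the only path: no ops means no DMA, a missing ->dma_supported method means "everything is supported", otherwise the ops instance decides. A miniature of that dispatch, with hypothetical stand-in types (`strict_ops`, `only_32bit` are not kernel symbols):

```c
#include <assert.h>
#include <stdint.h>

/* Cut-down stand-in for the kernel's struct dma_map_ops. */
struct dma_map_ops {
	int (*dma_supported)(uint64_t mask);
};

/* Mirrors the generic dma_supported() after this patch. */
static int dma_supported(const struct dma_map_ops *ops, uint64_t mask)
{
	if (!ops)
		return 0;	/* no ops at all: no DMA */
	if (!ops->dma_supported)
		return 1;	/* no method: any mask is fine */
	return ops->dma_supported(mask);
}

/* An ops instance that only accepts 32-bit masks. */
static int only_32bit(uint64_t mask)
{
	return mask <= 0xffffffffULL;
}

static const struct dma_map_ops strict_ops = { .dma_supported = only_32bit };
static const struct dma_map_ops default_ops = { 0 };
```

This is exactly why patches 29, 30, and 40 can delete methods that return a constant 1: leaving the pointer NULL produces the same result.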



[PATCH 35/44] x86: remove arch specific dma_supported implementation

2017-06-16 Thread Christoph Hellwig
And instead wire it up as a method for all the dma_map_ops instances.

Note that this also means the arch specific check will be applied fully,
instead of only partially, in the AMD iommu driver.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/dma-mapping.h | 3 ---
 arch/x86/include/asm/iommu.h   | 2 ++
 arch/x86/kernel/amd_gart_64.c  | 1 +
 arch/x86/kernel/pci-calgary_64.c   | 1 +
 arch/x86/kernel/pci-dma.c  | 7 +--
 arch/x86/kernel/pci-nommu.c| 1 +
 arch/x86/pci/sta2x11-fixup.c   | 3 ++-
 drivers/iommu/amd_iommu.c  | 2 ++
 drivers/iommu/intel-iommu.c| 3 +++
 9 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h 
b/arch/x86/include/asm/dma-mapping.h
index c35d228aa381..398c79889f5c 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -33,9 +33,6 @@ static inline const struct dma_map_ops 
*get_arch_dma_ops(struct bus_type *bus)
 bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp);
 #define arch_dma_alloc_attrs arch_dma_alloc_attrs
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *hwdev, u64 mask);
-
 extern void *dma_generic_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_addr, gfp_t flag,
unsigned long attrs);
diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index 793869879464..fca144a104e4 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -6,6 +6,8 @@ extern int force_iommu, no_iommu;
 extern int iommu_detected;
 extern int iommu_pass_through;
 
+int x86_dma_supported(struct device *dev, u64 mask);
+
 /* 10 seconds */
 #define DMAR_OPERATION_TIMEOUT ((cycles_t) tsc_khz*10*1000)
 
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 815dd63f49d0..cc0e8bc0ea3f 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -704,6 +704,7 @@ static const struct dma_map_ops gart_dma_ops = {
.alloc  = gart_alloc_coherent,
.free   = gart_free_coherent,
.mapping_error  = gart_mapping_error,
+   .dma_supported  = x86_dma_supported,
 };
 
 static void gart_iommu_shutdown(void)
diff --git a/arch/x86/kernel/pci-calgary_64.c b/arch/x86/kernel/pci-calgary_64.c
index e75b490f2b0b..5286a4a92cf7 100644
--- a/arch/x86/kernel/pci-calgary_64.c
+++ b/arch/x86/kernel/pci-calgary_64.c
@@ -493,6 +493,7 @@ static const struct dma_map_ops calgary_dma_ops = {
.map_page = calgary_map_page,
.unmap_page = calgary_unmap_page,
.mapping_error = calgary_mapping_error,
+   .dma_supported = x86_dma_supported,
 };
 
 static inline void __iomem * busno_to_bbar(unsigned char num)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 3a216ec869cd..b6f5684be3b5 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -213,10 +213,8 @@ static __init int iommu_setup(char *p)
 }
 early_param("iommu", iommu_setup);
 
-int dma_supported(struct device *dev, u64 mask)
+int x86_dma_supported(struct device *dev, u64 mask)
 {
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
 #ifdef CONFIG_PCI
if (mask > 0xffffffff && forbid_dac > 0) {
dev_info(dev, "PCI: Disallowing DAC for device\n");
@@ -224,9 +222,6 @@ int dma_supported(struct device *dev, u64 mask)
}
 #endif
 
-   if (ops->dma_supported)
-   return ops->dma_supported(dev, mask);
-
/* Copied from i386. Doesn't make much sense, because it will
   only work for pci_alloc_coherent.
   The caller just has to use GFP_DMA in this case. */
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 085fe6ce4049..a6d404087fe3 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -104,4 +104,5 @@ const struct dma_map_ops nommu_dma_ops = {
.sync_sg_for_device = nommu_sync_sg_for_device,
.is_phys= 1,
.mapping_error  = nommu_mapping_error,
+   .dma_supported  = x86_dma_supported,
 };
diff --git a/arch/x86/pci/sta2x11-fixup.c b/arch/x86/pci/sta2x11-fixup.c
index ec008e800b45..53d600217973 100644
--- a/arch/x86/pci/sta2x11-fixup.c
+++ b/arch/x86/pci/sta2x11-fixup.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include <asm/iommu.h>
 
 #define STA2X11_SWIOTLB_SIZE (4*1024*1024)
 extern int swiotlb_late_init_with_default_size(size_t default_size);
@@ -191,7 +192,7 @@ static const struct dma_map_ops sta2x11_dma_ops = {
.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
.sync_sg_for_device = swiotlb_sync_sg_for_device,
.mapping_error = swiotlb_dma_mapping_error,
-   .dma_supported = NULL, /* FIXME: we should use this instead! */
+   .dma_supported = x86_dma_supported,
 };
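[Editor's note] The x86_dma_supported() core retained above is a single gate: masks wider than 32 bits (dual-address-cycle, DAC) are refused when the forbid_dac knob is set. A sketch of that check, with forbid_dac passed explicitly instead of being the module parameter it is in the real code, and with the remaining legacy i386 checks omitted:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the DAC gate in x86_dma_supported(). */
static int x86_dac_allowed(uint64_t mask, int forbid_dac)
{
	if (mask > 0xffffffffULL && forbid_dac > 0)
		return 0;	/* "PCI: Disallowing DAC for device" */
	return 1;		/* further legacy checks omitted */
}
```

Before this patch, that gate ran for every device via the arch override; after it, each ops instance opts in by pointing .dma_supported at x86_dma_supported.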
 

[PATCH 37/44] mips/loongson64: implement ->dma_supported instead of ->set_dma_mask

2017-06-16 Thread Christoph Hellwig
Same behavior, less code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/mips/loongson64/common/dma-swiotlb.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/mips/loongson64/common/dma-swiotlb.c 
b/arch/mips/loongson64/common/dma-swiotlb.c
index 178ca17a5667..34486c138206 100644
--- a/arch/mips/loongson64/common/dma-swiotlb.c
+++ b/arch/mips/loongson64/common/dma-swiotlb.c
@@ -75,19 +75,11 @@ static void loongson_dma_sync_sg_for_device(struct device 
*dev,
mb();
 }
 
-static int loongson_dma_set_mask(struct device *dev, u64 mask)
+static int loongson_dma_supported(struct device *dev, u64 mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, mask))
-   return -EIO;
-
-   if (mask > DMA_BIT_MASK(loongson_sysconf.dma_mask_bits)) {
-   *dev->dma_mask = DMA_BIT_MASK(loongson_sysconf.dma_mask_bits);
-   return -EIO;
-   }
-
-   *dev->dma_mask = mask;
-
-   return 0;
+   if (mask > DMA_BIT_MASK(loongson_sysconf.dma_mask_bits))
+   return 0;
+   return swiotlb_dma_supported(dev, mask);
 }
 
 dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
@@ -126,8 +118,7 @@ static const struct dma_map_ops loongson_dma_map_ops = {
.sync_sg_for_cpu = swiotlb_sync_sg_for_cpu,
.sync_sg_for_device = loongson_dma_sync_sg_for_device,
.mapping_error = swiotlb_dma_mapping_error,
-   .dma_supported = swiotlb_dma_supported,
-   .set_dma_mask = loongson_dma_set_mask
+   .dma_supported = loongson_dma_supported,
 };
 
 void __init plat_swiotlb_setup(void)
-- 
2.11.0



[PATCH 40/44] tile: remove dma_supported and mapping_error methods

2017-06-16 Thread Christoph Hellwig
These just duplicate the default behavior if no method is provided.

Signed-off-by: Christoph Hellwig 
---
 arch/tile/kernel/pci-dma.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/arch/tile/kernel/pci-dma.c b/arch/tile/kernel/pci-dma.c
index 569bb6dd154a..f2abedc8a080 100644
--- a/arch/tile/kernel/pci-dma.c
+++ b/arch/tile/kernel/pci-dma.c
@@ -317,18 +317,6 @@ static void tile_dma_sync_sg_for_device(struct device *dev,
}
 }
 
-static inline int
-tile_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static inline int
-tile_dma_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 static const struct dma_map_ops tile_default_dma_map_ops = {
.alloc = tile_dma_alloc_coherent,
.free = tile_dma_free_coherent,
@@ -340,8 +328,6 @@ static const struct dma_map_ops tile_default_dma_map_ops = {
.sync_single_for_device = tile_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_dma_sync_sg_for_device,
-   .mapping_error = tile_dma_mapping_error,
-   .dma_supported = tile_dma_supported
 };
 
const struct dma_map_ops *tile_dma_map_ops = &tile_default_dma_map_ops;
@@ -504,18 +490,6 @@ static void tile_pci_dma_sync_sg_for_device(struct device 
*dev,
}
 }
 
-static inline int
-tile_pci_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
-{
-   return 0;
-}
-
-static inline int
-tile_pci_dma_supported(struct device *dev, u64 mask)
-{
-   return 1;
-}
-
 static const struct dma_map_ops tile_pci_default_dma_map_ops = {
.alloc = tile_pci_dma_alloc_coherent,
.free = tile_pci_dma_free_coherent,
@@ -527,8 +501,6 @@ static const struct dma_map_ops 
tile_pci_default_dma_map_ops = {
.sync_single_for_device = tile_pci_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_pci_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_pci_dma_sync_sg_for_device,
-   .mapping_error = tile_pci_dma_mapping_error,
-   .dma_supported = tile_pci_dma_supported
 };
 
const struct dma_map_ops *gx_pci_dma_map_ops = &tile_pci_default_dma_map_ops;
@@ -578,8 +550,6 @@ static const struct dma_map_ops pci_hybrid_dma_ops = {
.sync_single_for_device = tile_pci_dma_sync_single_for_device,
.sync_sg_for_cpu = tile_pci_dma_sync_sg_for_cpu,
.sync_sg_for_device = tile_pci_dma_sync_sg_for_device,
-   .mapping_error = tile_pci_dma_mapping_error,
-   .dma_supported = tile_pci_dma_supported
 };
 
const struct dma_map_ops *gx_legacy_pci_dma_map_ops = &pci_swiotlb_dma_ops;
-- 
2.11.0



[PATCH 43/44] dma-mapping: remove the set_dma_mask method

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/dma.c   | 4 
 include/linux/dma-mapping.h | 6 --
 2 files changed, 10 deletions(-)

diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 41c749586bd2..466c9f07b288 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -316,10 +316,6 @@ EXPORT_SYMBOL(dma_set_coherent_mask);
 
 int __dma_set_mask(struct device *dev, u64 dma_mask)
 {
-   const struct dma_map_ops *dma_ops = get_dma_ops(dev);
-
-   if ((dma_ops != NULL) && (dma_ops->set_dma_mask != NULL))
-   return dma_ops->set_dma_mask(dev, dma_mask);
if (!dev->dma_mask || !dma_supported(dev, dma_mask))
return -EIO;
*dev->dma_mask = dma_mask;
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 3e5908656226..527f2ed8c645 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -127,7 +127,6 @@ struct dma_map_ops {
   enum dma_data_direction dir);
int (*mapping_error)(struct device *dev, dma_addr_t dma_addr);
int (*dma_supported)(struct device *dev, u64 mask);
-   int (*set_dma_mask)(struct device *dev, u64 mask);
 #ifdef ARCH_HAS_DMA_GET_REQUIRED_MASK
u64 (*get_required_mask)(struct device *dev);
 #endif
@@ -563,11 +562,6 @@ static inline int dma_supported(struct device *dev, u64 
mask)
 #ifndef HAVE_ARCH_DMA_SET_MASK
 static inline int dma_set_mask(struct device *dev, u64 mask)
 {
-   const struct dma_map_ops *ops = get_dma_ops(dev);
-
-   if (ops->set_dma_mask)
-   return ops->set_dma_mask(dev, mask);
-
if (!dev->dma_mask || !dma_supported(dev, mask))
return -EIO;
*dev->dma_mask = mask;
-- 
2.11.0
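[Editor's note] Once the ->set_dma_mask method is gone, every mask change goes through the two-step generic path shown in the hunk above: validate via dma_supported(), then store the mask. A miniature with illustrative stand-in types (`struct fake_dev` is not the kernel's struct device):

```c
#include <assert.h>
#include <stdint.h>
#include <errno.h>

/* Stand-in for struct device: a mask slot plus a validation hook that
 * plays the role of the ops instance's ->dma_supported. */
struct fake_dev {
	uint64_t *dma_mask;
	int (*supported)(uint64_t mask);
};

/* Mirrors the generic dma_set_mask() after this patch. */
static int dma_set_mask(struct fake_dev *dev, uint64_t mask)
{
	if (!dev->dma_mask || !dev->supported(mask))
		return -EIO;
	*dev->dma_mask = mask;
	return 0;
}

static int up_to_32bit(uint64_t mask)
{
	return mask <= 0xffffffffULL;
}
```

Implementations with side effects on mask changes (like the cell IOMMU ops switch in patch 42) now hang them off ->dma_supported, which this path calls anyway.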



Re: [PATCH 1/3] dev: Prevent creating network devices with negative ifindex

2017-06-16 Thread Serhey Popovych
>> What do you think?
>
> Passing -1 is an error, it doesn't make sense  to try and be
> helpful to buggy userland.

Here is the commit I am actually changing/fixing:

commit 9c7dafbfab15 ("net: Allow to create links with given ifindex")

That change did the opposite: it moved the check for ifm->ifi_index from
rtnl_newlink() to register_netdevice().

Let me make sure I understand: why should I reverse this and move the check
for dev->ifindex from register_netdevice() back to rtnl_newlink()? That
would partially revert the commit noted above.

-- 
Thanks,  Serhey


[PATCH 42/44] powerpc/cell: use the dma_supported method for ops switching

2017-06-16 Thread Christoph Hellwig
Besides removing the last instance of the set_dma_mask method, this also
reduces code duplication.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/iommu.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c 
b/arch/powerpc/platforms/cell/iommu.c
index 497bfbdbd967..29d4f96ed33e 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -644,20 +644,14 @@ static void dma_fixed_unmap_sg(struct device *dev, struct scatterlist *sg,
   direction, attrs);
 }
 
-static int dma_fixed_dma_supported(struct device *dev, u64 mask)
-{
-   return mask == DMA_BIT_MASK(64);
-}
-
-static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask);
+static int dma_suported_and_switch(struct device *dev, u64 dma_mask);
 
 static const struct dma_map_ops dma_iommu_fixed_ops = {
.alloc  = dma_fixed_alloc_coherent,
.free   = dma_fixed_free_coherent,
.map_sg = dma_fixed_map_sg,
.unmap_sg   = dma_fixed_unmap_sg,
-   .dma_supported  = dma_fixed_dma_supported,
-   .set_dma_mask   = dma_set_mask_and_switch,
+   .dma_supported  = dma_suported_and_switch,
.map_page   = dma_fixed_map_page,
.unmap_page = dma_fixed_unmap_page,
.mapping_error  = dma_iommu_mapping_error,
@@ -952,11 +946,8 @@ static u64 cell_iommu_get_fixed_address(struct device *dev)
return dev_addr;
 }
 
-static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask)
+static int dma_suported_and_switch(struct device *dev, u64 dma_mask)
 {
-   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-   return -EIO;
-
if (dma_mask == DMA_BIT_MASK(64) &&
cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
u64 addr = cell_iommu_get_fixed_address(dev) +
@@ -965,14 +956,16 @@ static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask)
dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
set_dma_ops(dev, &dma_iommu_fixed_ops);
set_dma_offset(dev, addr);
-   } else {
+   return 1;
+   }
+
+   if (dma_iommu_dma_supported(dev, dma_mask)) {
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
set_dma_ops(dev, get_pci_dma_ops());
cell_dma_dev_setup(dev);
+   return 1;
}
 
-   *dev->dma_mask = dma_mask;
-
return 0;
 }
 
@@ -1127,7 +1120,7 @@ static int __init cell_iommu_fixed_mapping_init(void)
cell_iommu_setup_window(iommu, np, dbase, dsize, 0);
}
 
-   dma_iommu_ops.set_dma_mask = dma_set_mask_and_switch;
+   dma_iommu_ops.dma_supported = dma_suported_and_switch;
set_pci_dma_ops(&dma_iommu_ops);
 
return 0;
-- 
2.11.0



[PATCH 44/44] powerpc: merge __dma_set_mask into dma_set_mask

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/dma-mapping.h |  1 -
 arch/powerpc/kernel/dma.c  | 13 -
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 73aedbe6c977..eaece3d3e225 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -112,7 +112,6 @@ static inline void set_dma_offset(struct device *dev, dma_addr_t off)
 #define HAVE_ARCH_DMA_SET_MASK 1
 extern int dma_set_mask(struct device *dev, u64 dma_mask);
 
-extern int __dma_set_mask(struct device *dev, u64 dma_mask);
 extern u64 __dma_get_required_mask(struct device *dev);
 
static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
index 466c9f07b288..4194db10 100644
--- a/arch/powerpc/kernel/dma.c
+++ b/arch/powerpc/kernel/dma.c
@@ -314,14 +314,6 @@ EXPORT_SYMBOL(dma_set_coherent_mask);
 
 #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
 
-int __dma_set_mask(struct device *dev, u64 dma_mask)
-{
-   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
-   return -EIO;
-   *dev->dma_mask = dma_mask;
-   return 0;
-}
-
 int dma_set_mask(struct device *dev, u64 dma_mask)
 {
if (ppc_md.dma_set_mask)
@@ -334,7 +326,10 @@ int dma_set_mask(struct device *dev, u64 dma_mask)
return phb->controller_ops.dma_set_mask(pdev, dma_mask);
}
 
-   return __dma_set_mask(dev, dma_mask);
+   if (!dev->dma_mask || !dma_supported(dev, dma_mask))
+   return -EIO;
+   *dev->dma_mask = dma_mask;
+   return 0;
 }
 EXPORT_SYMBOL(dma_set_mask);
 
-- 
2.11.0



[PATCH 41/44] powerpc/cell: clean up fixed mapping dma_ops initialization

2017-06-16 Thread Christoph Hellwig
By the time cell_pci_dma_dev_setup calls cell_dma_dev_setup, no device
can have the fixed map_ops set yet, as they are only set by the
set_dma_mask method.  So move the setup for the fixed case so that it is
only called in that place, instead of indirecting through
cell_dma_dev_setup.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/platforms/cell/iommu.c | 27 +++
 1 file changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/platforms/cell/iommu.c b/arch/powerpc/platforms/cell/iommu.c
index 948086e33a0c..497bfbdbd967 100644
--- a/arch/powerpc/platforms/cell/iommu.c
+++ b/arch/powerpc/platforms/cell/iommu.c
@@ -663,14 +663,9 @@ static const struct dma_map_ops dma_iommu_fixed_ops = {
.mapping_error  = dma_iommu_mapping_error,
 };
 
-static void cell_dma_dev_setup_fixed(struct device *dev);
-
 static void cell_dma_dev_setup(struct device *dev)
 {
-   /* Order is important here, these are not mutually exclusive */
-   if (get_dma_ops(dev) == &dma_iommu_fixed_ops)
-   cell_dma_dev_setup_fixed(dev);
-   else if (get_pci_dma_ops() == &dma_iommu_ops)
+   if (get_pci_dma_ops() == &dma_iommu_ops)
set_iommu_table_base(dev, cell_get_iommu_table(dev));
else if (get_pci_dma_ops() == &dma_direct_ops)
set_dma_offset(dev, cell_dma_direct_offset);
@@ -963,32 +958,24 @@ static int dma_set_mask_and_switch(struct device *dev, u64 dma_mask)
return -EIO;
 
if (dma_mask == DMA_BIT_MASK(64) &&
-   cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR)
-   {
+   cell_iommu_get_fixed_address(dev) != OF_BAD_ADDR) {
+   u64 addr = cell_iommu_get_fixed_address(dev) +
+   dma_iommu_fixed_base;
dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+   dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
set_dma_ops(dev, &dma_iommu_fixed_ops);
+   set_dma_offset(dev, addr);
} else {
dev_dbg(dev, "iommu: not 64-bit, using default ops\n");
set_dma_ops(dev, get_pci_dma_ops());
+   cell_dma_dev_setup(dev);
}
 
-   cell_dma_dev_setup(dev);
-
*dev->dma_mask = dma_mask;
 
return 0;
 }
 
-static void cell_dma_dev_setup_fixed(struct device *dev)
-{
-   u64 addr;
-
-   addr = cell_iommu_get_fixed_address(dev) + dma_iommu_fixed_base;
-   set_dma_offset(dev, addr);
-
-   dev_dbg(dev, "iommu: fixed addr = %llx\n", addr);
-}
-
 static void insert_16M_pte(unsigned long addr, unsigned long *ptab,
   unsigned long base_pte)
 {
-- 
2.11.0



[PATCH 39/44] xen-swiotlb: remove xen_swiotlb_set_dma_mask

2017-06-16 Thread Christoph Hellwig
This just duplicates the generic implementation.

Signed-off-by: Christoph Hellwig 
---
 drivers/xen/swiotlb-xen.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index c3a04b2d7532..82fc54f8eb77 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -661,17 +661,6 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask)
return xen_virt_to_bus(xen_io_tlb_end - 1) <= mask;
 }
 
-static int
-xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask)
-{
-   if (!dev->dma_mask || !xen_swiotlb_dma_supported(dev, dma_mask))
-   return -EIO;
-
-   *dev->dma_mask = dma_mask;
-
-   return 0;
-}
-
 /*
  * Create userspace mapping for the DMA-coherent memory.
  * This function should be called with the pages from the current domain only,
@@ -734,7 +723,6 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.map_page = xen_swiotlb_map_page,
.unmap_page = xen_swiotlb_unmap_page,
.dma_supported = xen_swiotlb_dma_supported,
-   .set_dma_mask = xen_swiotlb_set_dma_mask,
.mmap = xen_swiotlb_dma_mmap,
.get_sgtable = xen_swiotlb_get_sgtable,
.mapping_error  = xen_swiotlb_mapping_error,
-- 
2.11.0



[PATCH 33/44] openrisc: remove arch-specific dma_supported implementation

2017-06-16 Thread Christoph Hellwig
This implementation is simply bogus - openrisc only has a simple
direct mapped DMA implementation and thus doesn't care about the
address.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/include/asm/dma-mapping.h | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/arch/openrisc/include/asm/dma-mapping.h b/arch/openrisc/include/asm/dma-mapping.h
index a4ea139c2ef9..f41bd3cb76d9 100644
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ b/arch/openrisc/include/asm/dma-mapping.h
@@ -33,11 +33,4 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return &or1k_dma_map_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-static inline int dma_supported(struct device *dev, u64 dma_mask)
-{
-   /* Support 32 bit DMA mask exclusively */
-   return dma_mask == DMA_BIT_MASK(32);
-}
-
 #endif /* __ASM_OPENRISC_DMA_MAPPING_H */
-- 
2.11.0



[PATCH 32/44] hexagon: remove the unused dma_is_consistent prototype

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/hexagon/include/asm/dma-mapping.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h b/arch/hexagon/include/asm/dma-mapping.h
index 9c15cb5271a6..463dbc18f853 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -37,7 +37,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
   enum dma_data_direction direction);
 
-- 
2.11.0



[PATCH 31/44] hexagon: remove arch-specific dma_supported implementation

2017-06-16 Thread Christoph Hellwig
This implementation is simply bogus - hexagon only has a simple
direct mapped DMA implementation and thus doesn't care about the
address.

Signed-off-by: Christoph Hellwig 
Acked-by: Richard Kuo 
---
 arch/hexagon/include/asm/dma-mapping.h | 2 --
 arch/hexagon/kernel/dma.c  | 9 -
 2 files changed, 11 deletions(-)

diff --git a/arch/hexagon/include/asm/dma-mapping.h b/arch/hexagon/include/asm/dma-mapping.h
index 00e3f10113b0..9c15cb5271a6 100644
--- a/arch/hexagon/include/asm/dma-mapping.h
+++ b/arch/hexagon/include/asm/dma-mapping.h
@@ -37,8 +37,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-#define HAVE_ARCH_DMA_SUPPORTED 1
-extern int dma_supported(struct device *dev, u64 mask);
 extern int dma_is_consistent(struct device *dev, dma_addr_t dma_handle);
 extern void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
   enum dma_data_direction direction);
diff --git a/arch/hexagon/kernel/dma.c b/arch/hexagon/kernel/dma.c
index 71269dc0f225..9ff1b2041f85 100644
--- a/arch/hexagon/kernel/dma.c
+++ b/arch/hexagon/kernel/dma.c
@@ -35,15 +35,6 @@ static inline void *dma_addr_to_virt(dma_addr_t dma_addr)
return phys_to_virt((unsigned long) dma_addr);
 }
 
-int dma_supported(struct device *dev, u64 mask)
-{
-   if (mask == DMA_BIT_MASK(32))
-   return 1;
-   else
-   return 0;
-}
-EXPORT_SYMBOL(dma_supported);
-
 static struct gen_pool *coherent_pool;
 
 
-- 
2.11.0



[PATCH 13/44] openrisc: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
openrisc does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/openrisc/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/openrisc/include/asm/dma-mapping.h b/arch/openrisc/include/asm/dma-mapping.h
index 0c0075f17145..a4ea139c2ef9 100644
--- a/arch/openrisc/include/asm/dma-mapping.h
+++ b/arch/openrisc/include/asm/dma-mapping.h
@@ -26,8 +26,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 extern const struct dma_map_ops or1k_dma_map_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-- 
2.11.0



[PATCH 14/44] sh: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
sh does not return errors for dma_map_page.

Signed-off-by: Christoph Hellwig 
---
 arch/sh/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/sh/include/asm/dma-mapping.h b/arch/sh/include/asm/dma-mapping.h
index d99008af5f73..9b06be07db4d 100644
--- a/arch/sh/include/asm/dma-mapping.h
+++ b/arch/sh/include/asm/dma-mapping.h
@@ -9,8 +9,6 @@ static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
return dma_ops;
 }
 
-#define DMA_ERROR_CODE 0
-
 void dma_cache_sync(struct device *dev, void *vaddr, size_t size,
enum dma_data_direction dir);
 
-- 
2.11.0



[PATCH 21/44] powerpc: implement ->mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is going to go away, so don't rely on it.  Instead
define a ->mapping_error method for all IOMMU based dma operation
instances.  The direct ops don't ever return an error and don't
need a ->mapping_error method.

Signed-off-by: Christoph Hellwig 
Acked-by: Michael Ellerman 
---
 arch/powerpc/include/asm/dma-mapping.h |  4 
 arch/powerpc/include/asm/iommu.h   |  4 
 arch/powerpc/kernel/dma-iommu.c|  6 ++
 arch/powerpc/kernel/iommu.c| 28 ++--
 arch/powerpc/platforms/cell/iommu.c|  1 +
 arch/powerpc/platforms/pseries/vio.c   |  3 ++-
 6 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
index 181a095468e4..73aedbe6c977 100644
--- a/arch/powerpc/include/asm/dma-mapping.h
+++ b/arch/powerpc/include/asm/dma-mapping.h
@@ -17,10 +17,6 @@
 #include 
 #include 
 
-#ifdef CONFIG_PPC64
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-#endif
-
 /* Some dma direct funcs must be visible for use in other dma_ops */
 extern void *__dma_direct_alloc_coherent(struct device *dev, size_t size,
 dma_addr_t *dma_handle, gfp_t flag,
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 8a8ce220d7d0..20febe0b7f32 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -139,6 +139,8 @@ struct scatterlist;
 
 #ifdef CONFIG_PPC64
 
+#define IOMMU_MAPPING_ERROR	(~(dma_addr_t)0x0)
+
 static inline void set_iommu_table_base(struct device *dev,
struct iommu_table *base)
 {
@@ -238,6 +240,8 @@ static inline int __init tce_iommu_bus_notifier_init(void)
 }
 #endif /* !CONFIG_IOMMU_API */
 
+int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
 #else
 
 static inline void *get_iommu_table_base(struct device *dev)
diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index fb7cbaa37658..8f7abf9baa63 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -105,6 +105,11 @@ static u64 dma_iommu_get_required_mask(struct device *dev)
return mask;
 }
 
+int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+   return dma_addr == IOMMU_MAPPING_ERROR;
+}
+
 struct dma_map_ops dma_iommu_ops = {
.alloc  = dma_iommu_alloc_coherent,
.free   = dma_iommu_free_coherent,
@@ -115,5 +120,6 @@ struct dma_map_ops dma_iommu_ops = {
.map_page   = dma_iommu_map_page,
.unmap_page = dma_iommu_unmap_page,
.get_required_mask  = dma_iommu_get_required_mask,
+   .mapping_error  = dma_iommu_mapping_error,
 };
 EXPORT_SYMBOL(dma_iommu_ops);
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index f2b724cd9e64..233ca3fe4754 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -198,11 +198,11 @@ static unsigned long iommu_range_alloc(struct device *dev,
if (unlikely(npages == 0)) {
if (printk_ratelimit())
WARN_ON(1);
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
}
 
if (should_fail_iommu(dev))
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
 
/*
 * We don't need to disable preemption here because any CPU can
@@ -278,7 +278,7 @@ static unsigned long iommu_range_alloc(struct device *dev,
} else {
/* Give up */
spin_unlock_irqrestore(&(pool->lock), flags);
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
}
}
 
@@ -310,13 +310,13 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl,
  unsigned long attrs)
 {
unsigned long entry;
-   dma_addr_t ret = DMA_ERROR_CODE;
+   dma_addr_t ret = IOMMU_MAPPING_ERROR;
int build_fail;
 
entry = iommu_range_alloc(dev, tbl, npages, NULL, mask, align_order);
 
-   if (unlikely(entry == DMA_ERROR_CODE))
-   return DMA_ERROR_CODE;
+   if (unlikely(entry == IOMMU_MAPPING_ERROR))
+   return IOMMU_MAPPING_ERROR;
 
entry += tbl->it_offset;/* Offset into real TCE table */
ret = entry << tbl->it_page_shift;  /* Set the return dma address */
@@ -328,12 +328,12 @@ static dma_addr_t iommu_alloc(struct device *dev, struct iommu_table *tbl,
 
/* tbl->it_ops->set() only returns non-zero for transient errors.
 * Clean up the table bitmap in this case and return
-* DMA_ERROR_CODE. For all other errors the functionality is
+* IOMMU_MAPPING_ERROR. For all other errors the functionality 

[PATCH 15/44] xtensa: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
xtensa already implements the mapping_error method for its only
dma_map_ops instance.

Signed-off-by: Christoph Hellwig 
---
 arch/xtensa/include/asm/dma-mapping.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/xtensa/include/asm/dma-mapping.h b/arch/xtensa/include/asm/dma-mapping.h
index c6140fa8c0be..269738dc9d1d 100644
--- a/arch/xtensa/include/asm/dma-mapping.h
+++ b/arch/xtensa/include/asm/dma-mapping.h
@@ -16,8 +16,6 @@
 #include 
 #include 
 
-#define DMA_ERROR_CODE (~(dma_addr_t)0x0)
-
 extern const struct dma_map_ops xtensa_dma_map_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-- 
2.11.0



[PATCH 01/44] firmware/ivc: use dma_mapping_error

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is not supposed to be used by drivers.

Signed-off-by: Christoph Hellwig 
Acked-by: Thierry Reding 
---
 drivers/firmware/tegra/ivc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/tegra/ivc.c b/drivers/firmware/tegra/ivc.c
index 29ecfd815320..a01461d63f68 100644
--- a/drivers/firmware/tegra/ivc.c
+++ b/drivers/firmware/tegra/ivc.c
@@ -646,12 +646,12 @@ int tegra_ivc_init(struct tegra_ivc *ivc, struct device *peer, void *rx,
if (peer) {
ivc->rx.phys = dma_map_single(peer, rx, queue_size,
  DMA_BIDIRECTIONAL);
-   if (ivc->rx.phys == DMA_ERROR_CODE)
+   if (dma_mapping_error(peer, ivc->rx.phys))
return -ENOMEM;
 
ivc->tx.phys = dma_map_single(peer, tx, queue_size,
  DMA_BIDIRECTIONAL);
-   if (ivc->tx.phys == DMA_ERROR_CODE) {
+   if (dma_mapping_error(peer, ivc->tx.phys)) {
dma_unmap_single(peer, ivc->rx.phys, queue_size,
 DMA_BIDIRECTIONAL);
return -ENOMEM;
-- 
2.11.0



[PATCH 02/44] ibmveth: properly unwind on init errors

2017-06-16 Thread Christoph Hellwig
That way the driver doesn't have to rely on DMA_ERROR_CODE, which
is not a public API and is going away.

Signed-off-by: Christoph Hellwig 
Acked-by: David S. Miller 
---
 drivers/net/ethernet/ibm/ibmveth.c | 159 +
 1 file changed, 74 insertions(+), 85 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c
index 72ab7b6bf20b..3ac27f59e595 100644
--- a/drivers/net/ethernet/ibm/ibmveth.c
+++ b/drivers/net/ethernet/ibm/ibmveth.c
@@ -467,56 +467,6 @@ static void ibmveth_rxq_harvest_buffer(struct ibmveth_adapter *adapter)
}
 }
 
-static void ibmveth_cleanup(struct ibmveth_adapter *adapter)
-{
-   int i;
-   struct device *dev = &adapter->vdev->dev;
-
-   if (adapter->buffer_list_addr != NULL) {
-   if (!dma_mapping_error(dev, adapter->buffer_list_dma)) {
-   dma_unmap_single(dev, adapter->buffer_list_dma, 4096,
-   DMA_BIDIRECTIONAL);
-   adapter->buffer_list_dma = DMA_ERROR_CODE;
-   }
-   free_page((unsigned long)adapter->buffer_list_addr);
-   adapter->buffer_list_addr = NULL;
-   }
-
-   if (adapter->filter_list_addr != NULL) {
-   if (!dma_mapping_error(dev, adapter->filter_list_dma)) {
-   dma_unmap_single(dev, adapter->filter_list_dma, 4096,
-   DMA_BIDIRECTIONAL);
-   adapter->filter_list_dma = DMA_ERROR_CODE;
-   }
-   free_page((unsigned long)adapter->filter_list_addr);
-   adapter->filter_list_addr = NULL;
-   }
-
-   if (adapter->rx_queue.queue_addr != NULL) {
-   dma_free_coherent(dev, adapter->rx_queue.queue_len,
- adapter->rx_queue.queue_addr,
- adapter->rx_queue.queue_dma);
-   adapter->rx_queue.queue_addr = NULL;
-   }
-
-   for (i = 0; i < IBMVETH_NUM_BUFF_POOLS; i++)
-   if (adapter->rx_buff_pool[i].active)
-   ibmveth_free_buffer_pool(adapter,
-   &adapter->rx_buff_pool[i]);
-
-   if (adapter->bounce_buffer != NULL) {
-   if (!dma_mapping_error(dev, adapter->bounce_buffer_dma)) {
-   dma_unmap_single(&adapter->vdev->dev,
-   adapter->bounce_buffer_dma,
-   adapter->netdev->mtu + IBMVETH_BUFF_OH,
-   DMA_BIDIRECTIONAL);
-   adapter->bounce_buffer_dma = DMA_ERROR_CODE;
-   }
-   kfree(adapter->bounce_buffer);
-   adapter->bounce_buffer = NULL;
-   }
-}
-
 static int ibmveth_register_logical_lan(struct ibmveth_adapter *adapter,
 union ibmveth_buf_desc rxq_desc, u64 mac_address)
 {
@@ -573,14 +523,17 @@ static int ibmveth_open(struct net_device *netdev)
for(i = 0; i < IBMVETH_NUM_BUFF_POOLS; i++)
rxq_entries += adapter->rx_buff_pool[i].size;
 
+   rc = -ENOMEM;
adapter->buffer_list_addr = (void*) get_zeroed_page(GFP_KERNEL);
-   adapter->filter_list_addr = (void*) get_zeroed_page(GFP_KERNEL);
+   if (!adapter->buffer_list_addr) {
+   netdev_err(netdev, "unable to allocate list pages\n");
+   goto out;
+   }
 
-   if (!adapter->buffer_list_addr || !adapter->filter_list_addr) {
-   netdev_err(netdev, "unable to allocate filter or buffer list "
-  "pages\n");
-   rc = -ENOMEM;
-   goto err_out;
+   adapter->filter_list_addr = (void*) get_zeroed_page(GFP_KERNEL);
+   if (!adapter->filter_list_addr) {
+   netdev_err(netdev, "unable to allocate filter pages\n");
+   goto out_free_buffer_list;
}
 
dev = >vdev->dev;
@@ -590,22 +543,21 @@ static int ibmveth_open(struct net_device *netdev)
adapter->rx_queue.queue_addr =
dma_alloc_coherent(dev, adapter->rx_queue.queue_len,
   &adapter->rx_queue.queue_dma, GFP_KERNEL);
-   if (!adapter->rx_queue.queue_addr) {
-   rc = -ENOMEM;
-   goto err_out;
-   }
+   if (!adapter->rx_queue.queue_addr)
+   goto out_free_filter_list;
 
adapter->buffer_list_dma = dma_map_single(dev,
adapter->buffer_list_addr, 4096, DMA_BIDIRECTIONAL);
+   if (dma_mapping_error(dev, adapter->buffer_list_dma)) {
+   netdev_err(netdev, "unable to map buffer list pages\n");
+   goto out_free_queue_mem;
+   }
+
adapter->filter_list_dma = dma_map_single(dev,
adapter->filter_list_addr, 4096, DMA_BIDIRECTIONAL);
-
-   if ((dma_mapping_error(dev, adapter->buffer_list_dma)) ||
- 

[PATCH 06/44] iommu/dma: don't rely on DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
DMA_ERROR_CODE is not a public API and will go away soon.  The dma-iommu
driver already implements a proper ->mapping_error method, so it is only
using the value internally.  Add a new local define using the value that
arm64, the only current user of dma-iommu, relies on.

Signed-off-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 62618e77bedc..9403336f1fa6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -31,6 +31,8 @@
 #include 
 #include 
 
+#define IOMMU_MAPPING_ERROR	0
+
 struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
@@ -500,7 +502,7 @@ void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
 {
__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle, size);
__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
-   *handle = DMA_ERROR_CODE;
+   *handle = IOMMU_MAPPING_ERROR;
 }
 
 /**
@@ -533,7 +535,7 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
dma_addr_t iova;
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
 
-   *handle = DMA_ERROR_CODE;
+   *handle = IOMMU_MAPPING_ERROR;
 
min_size = alloc_sizes & -alloc_sizes;
if (min_size < PAGE_SIZE) {
@@ -627,11 +629,11 @@ static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
 
iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
if (!iova)
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
 
if (iommu_map(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
-   return DMA_ERROR_CODE;
+   return IOMMU_MAPPING_ERROR;
}
return iova + iova_off;
 }
@@ -671,7 +673,7 @@ static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
 
s->offset += s_iova_off;
s->length = s_length;
-   sg_dma_address(s) = DMA_ERROR_CODE;
+   sg_dma_address(s) = IOMMU_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
/*
@@ -714,11 +716,11 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)
int i;
 
for_each_sg(sg, s, nents, i) {
-   if (sg_dma_address(s) != DMA_ERROR_CODE)
+   if (sg_dma_address(s) != IOMMU_MAPPING_ERROR)
s->offset += sg_dma_address(s);
if (sg_dma_len(s))
s->length = sg_dma_len(s);
-   sg_dma_address(s) = DMA_ERROR_CODE;
+   sg_dma_address(s) = IOMMU_MAPPING_ERROR;
sg_dma_len(s) = 0;
}
 }
@@ -836,7 +838,7 @@ void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
 
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
-   return dma_addr == DMA_ERROR_CODE;
+   return dma_addr == IOMMU_MAPPING_ERROR;
 }
 
 static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
-- 
2.11.0



[PATCH 09/44] c6x: remove DMA_ERROR_CODE

2017-06-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/c6x/include/asm/dma-mapping.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/arch/c6x/include/asm/dma-mapping.h b/arch/c6x/include/asm/dma-mapping.h
index aca9f755e4f8..05daf1038111 100644
--- a/arch/c6x/include/asm/dma-mapping.h
+++ b/arch/c6x/include/asm/dma-mapping.h
@@ -12,11 +12,6 @@
 #ifndef _ASM_C6X_DMA_MAPPING_H
 #define _ASM_C6X_DMA_MAPPING_H
 
-/*
- * DMA errors are defined by all-bits-set in the DMA address.
- */
-#define DMA_ERROR_CODE ~0
-
 extern const struct dma_map_ops c6x_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
-- 
2.11.0



clean up and modularize arch dma_mapping interface V2

2017-06-16 Thread Christoph Hellwig
Hi all,

for a while we have had a generic implementation of the dma mapping
routines that calls into per-arch or per-device operations.  But right
now there still are various bits in the interfaces that don't clearly
operate on these ops.  This series tries to clean up a lot of those
(not all yet, but the series is big enough).  It gets rid of the DMA_ERROR_CODE
way of signaling failures of the mapping routines from the
implementations to the generic code (and cleans up various drivers that
were incorrectly using it), and gets rid of the ->set_dma_mask routine
in favor of relying on the ->dma_supported method that can be used in
the same way, but which requires less code duplication.

I've got a good number of reviews last time, but a few are still missing.
I'd love to not have to re-spam everyone with this patchbomb, so early
ACKs (or complaints) are welcome.

I plan to create a new dma-mapping tree to collect all this work.
Any volunteers for co-maintainers, especially from the iommu gang?

The whole series is also available in git:

git://git.infradead.org/users/hch/misc.git dma-map

Gitweb:

http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-map

Changes since V1:
 - remove two lines of code from arm dmabounce
 - a few commit message tweaks
 - lots of ACKs


[PATCH net-next 05/21] net: introduce a new function dst_dev_put()

2017-06-16 Thread Wei Wang
From: Wei Wang 

This function should be called when removing routes from the fib tree,
now that the dst gc is no longer in use.
We first mark DST_OBSOLETE_DEAD on this dst to make sure the next
dst_ops->check() fails and returns NULL.
Secondly, as we no longer keep the gc_list, we need to properly
release dst->dev right at the moment the dst is removed from the
fib/fib6 tree.
It does the following:
1. change dst->input and output pointers to dst_discard/dst_discard_out to
   discard all packets
2. replace dst->dev with loopback interface

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 include/net/dst.h |  1 +
 net/core/dst.c| 23 +++
 2 files changed, 24 insertions(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index 2735d5a1e774..11d779803c0d 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -428,6 +428,7 @@ void dst_init(struct dst_entry *dst, struct dst_ops *ops,
  unsigned short flags);
 void __dst_free(struct dst_entry *dst);
 struct dst_entry *dst_destroy(struct dst_entry *dst);
+void dst_dev_put(struct dst_entry *dst);
 
 static inline void dst_free(struct dst_entry *dst)
 {
diff --git a/net/core/dst.c b/net/core/dst.c
index 551834c3363f..2031f778bf2a 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -296,6 +296,29 @@ static void dst_destroy_rcu(struct rcu_head *head)
__dst_free(dst);
 }
 
+/* Operations to mark dst as DEAD and clean up the net device referenced
+ * by dst:
+ * 1. put the dst under loopback interface and discard all tx/rx packets
+ *on this route.
+ * 2. release the net_device
+ * This function should be called when removing routes from the fib tree
+ * in preparation for a NETDEV_DOWN/NETDEV_UNREGISTER event and also to
+ * make the next dst_ops->check() fail.
+ */
+void dst_dev_put(struct dst_entry *dst)
+{
+   struct net_device *dev = dst->dev;
+
+   dst->obsolete = DST_OBSOLETE_DEAD;
+   if (dst->ops->ifdown)
+   dst->ops->ifdown(dst, dev, true);
+   dst->input = dst_discard;
+   dst->output = dst_discard_out;
+   dst->dev = dev_net(dst->dev)->loopback_dev;
+   dev_hold(dst->dev);
+   dev_put(dev);
+}
+
 void dst_release(struct dst_entry *dst)
 {
if (dst) {
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 02/21] udp: call dst_hold_safe() in udp_sk_rx_set_dst()

2017-06-16 Thread Wei Wang
From: Wei Wang 

In udp_v4/6_early_demux() code, we try to hold dst->__refcnt for
dst with the DST_NOCACHE flag. This is because later, in the
udp_sk_rx_dst_set() function, we will try to cache this dst in the sk
for the connected case. However, a better way to achieve this is to not
try to hold the dst in early_demux(), but to call dst_hold_safe() in
udp_sk_rx_dst_set(). This approach is also more consistent with how tcp
handles it, and it

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv4/udp.c | 22 ++
 net/ipv6/udp.c | 14 ++
 2 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 2bc638c48b86..99fb1fb90ad3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1977,9 +1977,10 @@ static void udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst)
 {
struct dst_entry *old;
 
-   dst_hold(dst);
-   old = xchg(&sk->sk_rx_dst, dst);
-   dst_release(old);
+   if (dst_hold_safe(dst)) {
+   old = xchg(&sk->sk_rx_dst, dst);
+   dst_release(old);
+   }
 }
 
 /*
@@ -2302,15 +2303,12 @@ void udp_v4_early_demux(struct sk_buff *skb)
 
if (dst)
dst = dst_check(dst, 0);
-   if (dst) {
-   /* DST_NOCACHE can not be used without taking a reference */
-   if (dst->flags & DST_NOCACHE) {
-   if (likely(atomic_inc_not_zero(&dst->__refcnt)))
-   skb_dst_set(skb, dst);
-   } else {
-   skb_dst_set_noref(skb, dst);
-   }
-   }
+   if (dst)
+   /* set noref for now.
+* any place which wants to hold dst has to call
+* dst_hold_safe()
+*/
+   skb_dst_set_noref(skb, dst);
 }
 
 int udp_rcv(struct sk_buff *skb)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 2e9b52bded2d..a2152e2138ff 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -919,14 +919,12 @@ static void udp_v6_early_demux(struct sk_buff *skb)
 
if (dst)
dst = dst_check(dst, inet6_sk(sk)->rx_dst_cookie);
-   if (dst) {
-   if (dst->flags & DST_NOCACHE) {
-   if (likely(atomic_inc_not_zero(&dst->__refcnt)))
-   skb_dst_set(skb, dst);
-   } else {
-   skb_dst_set_noref(skb, dst);
-   }
-   }
+   if (dst)
+   /* set noref for now.
+* any place which wants to hold dst has to call
+* dst_hold_safe()
+*/
+   skb_dst_set_noref(skb, dst);
 }
 
 static __inline__ int udpv6_rcv(struct sk_buff *skb)
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 04/21] net: introduce DST_NOGC in dst_release() to destroy dst based on refcnt

2017-06-16 Thread Wei Wang
From: Wei Wang 

The current mechanism of freeing a dst is a bit complicated. The dst has a
ref count, and when a user grabs a reference to the dst, the ref count is
properly taken in most cases, except in the IPv4/IPv6/decnet/xfrm routing
code, for historical reasons.

If the reference to dst is always taken properly, we should be able to
simplify the logic in dst_release() to destroy dst when dst->__refcnt
drops from 1 to 0. And this should be the only condition to determine
if we can call dst_destroy().
And as the dst is always ref counted, there is no need for a dst garbage
list to hold the dst entries that have already been removed by the routing
code but are still held by other users. The task that periodically
checks the list and frees dst entries whose ref count has dropped to 0 is
also no longer needed.

This patch introduces a temporary flag DST_NOGC(no garbage collector).
If it is set in the dst, dst_release() will call dst_destroy() when
dst->__refcnt drops to 0. dst_hold_safe() will also check for this flag
and do atomic_inc_not_zero(), similar to DST_NOCACHE, to avoid a
double-free issue.
This temporary flag is mainly used so that we can make the transition
component by component without breaking other parts.
This flag will be removed after all components are properly transitioned.

This patch also introduces a new function dst_release_immediate() which
destroys dst without waiting on the rcu when refcnt drops to 0. It will
be used in later patches.

Follow-up patches will correct all the places to properly take ref count
on dst and mark DST_NOGC. dst_release() or dst_release_immediate() will
be used to release the dst instead of dst_free() and its related
functions.
And a final clean-up patch will remove the DST_NOGC flag.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 include/net/dst.h |  5 -
 net/core/dst.c| 20 ++--
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 1969008783d8..2735d5a1e774 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -58,6 +58,7 @@ struct dst_entry {
 #define DST_XFRM_TUNNEL0x0080
 #define DST_XFRM_QUEUE 0x0100
 #define DST_METADATA   0x0200
+#define DST_NOGC   0x0400
 
short   error;
 
@@ -278,6 +279,8 @@ static inline struct dst_entry *dst_clone(struct dst_entry 
*dst)
 
 void dst_release(struct dst_entry *dst);
 
+void dst_release_immediate(struct dst_entry *dst);
+
 static inline void refdst_drop(unsigned long refdst)
 {
if (!(refdst & SKB_DST_NOREF))
@@ -334,7 +337,7 @@ static inline void skb_dst_force(struct sk_buff *skb)
  */
 static inline bool dst_hold_safe(struct dst_entry *dst)
 {
-   if (dst->flags & DST_NOCACHE)
+   if (dst->flags & (DST_NOCACHE | DST_NOGC))
return atomic_inc_not_zero(&dst->__refcnt);
dst_hold(dst);
return true;
diff --git a/net/core/dst.c b/net/core/dst.c
index 13ba4a090c41..551834c3363f 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -300,18 +300,34 @@ void dst_release(struct dst_entry *dst)
 {
if (dst) {
int newrefcnt;
-   unsigned short nocache = dst->flags & DST_NOCACHE;
+   unsigned short destroy_after_rcu = dst->flags &
+  (DST_NOCACHE | DST_NOGC);
 
newrefcnt = atomic_dec_return(&dst->__refcnt);
if (unlikely(newrefcnt < 0))
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 __func__, dst, newrefcnt);
-   if (!newrefcnt && unlikely(nocache))
+   if (!newrefcnt && unlikely(destroy_after_rcu))
call_rcu(&dst->rcu_head, dst_destroy_rcu);
}
 }
 EXPORT_SYMBOL(dst_release);
 
+void dst_release_immediate(struct dst_entry *dst)
+{
+   if (dst) {
+   int newrefcnt;
+
+   newrefcnt = atomic_dec_return(&dst->__refcnt);
+   if (unlikely(newrefcnt < 0))
+   net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
+__func__, dst, newrefcnt);
+   if (!newrefcnt)
+   dst_destroy(dst);
+   }
+}
+EXPORT_SYMBOL(dst_release_immediate);
+
 u32 *dst_cow_metrics_generic(struct dst_entry *dst, unsigned long old)
 {
struct dst_metrics *p = kmalloc(sizeof(*p), GFP_ATOMIC);
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 21/21] net: add debug atomic_inc_not_zero() in dst_hold()

2017-06-16 Thread Wei Wang
From: Wei Wang 

This patch adds a debug warning for the situation where a dst is
held while it is already in its destroy phase, which could potentially
cause a double free of the dst.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 include/net/dst.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index d912b44d2dcb..f73611ec4017 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -251,7 +251,7 @@ static inline void dst_hold(struct dst_entry *dst)
 * __pad_to_align_refcnt declaration in struct dst_entry
 */
BUILD_BUG_ON(offsetof(struct dst_entry, __refcnt) & 63);
-   atomic_inc(&dst->__refcnt);
+   WARN_ON(atomic_inc_not_zero(&dst->__refcnt) == 0);
 }
 
 static inline void dst_use(struct dst_entry *dst, unsigned long time)
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 11/21] ipv6: call dst_dev_put() properly

2017-06-16 Thread Wei Wang
From: Wei Wang 

As the intent of this patch series is to completely remove dst gc,
we need to call dst_dev_put() to release the reference to dst->dev
when removing routes from fib because we won't keep the gc list anymore
and will lose the dst pointer right after removing the routes.
Without the gc list, there is no way to find all the dst's that have
dst->dev pointing to the going-down dev.
Hence, we are doing dst_dev_put() immediately before we lose the last
reference of the dst from the routing code. The next dst_check() will
trigger a route re-lookup to find another route (if there is any).

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv6/ip6_fib.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 3b728bcb1301..265401abb98e 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -172,6 +172,7 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
ppcpu_rt = per_cpu_ptr(non_pcpu_rt->rt6i_pcpu, cpu);
pcpu_rt = *ppcpu_rt;
if (pcpu_rt) {
+   dst_dev_put(&pcpu_rt->dst);
dst_release(&pcpu_rt->dst);
rt6_rcu_free(pcpu_rt);
*ppcpu_rt = NULL;
@@ -186,6 +187,7 @@ static void rt6_release(struct rt6_info *rt)
 {
if (atomic_dec_and_test(&rt->rt6i_ref)) {
rt6_free_pcpu(rt);
+   dst_dev_put(&rt->dst);
dst_release(&rt->dst);
rt6_rcu_free(rt);
}
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 16/21] decnet: take dst->__refcnt when struct dn_route is created

2017-06-16 Thread Wei Wang
From: Wei Wang 

struct dn_route is inserted into dn_rt_hash_table but no dst->__refcnt
is taken.
This patch makes sure the dn_rt_hash_table's reference to the dst is ref
counted.

As the dst is always ref counted properly, we can safely mark
DST_NOGC flag so dst_release() will release dst based on refcnt only.
And dst gc is no longer needed and all dst_free() or its related
function calls should be replaced with dst_release() or
dst_release_immediate(). And dst_dev_put() is called when removing dst
from the hash table to release the reference on dst->dev before we lose
pointer to it.

Also, correct the logic in dn_dst_check_expire() and dn_dst_gc() to
treat dst->__refcnt > 1, rather than > 0, as the indication that the
dst is referenced by other users.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/decnet/dn_route.c | 36 +++-
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index 6f95612b4d32..f467c4e3205b 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -183,11 +183,6 @@ static __inline__ unsigned int dn_hash(__le16 src, __le16 
dst)
return dn_rt_hash_mask & (unsigned int)tmp;
 }
 
-static inline void dnrt_free(struct dn_route *rt)
-{
-   call_rcu_bh(&rt->dst.rcu_head, dst_rcu_free);
-}
-
 static void dn_dst_check_expire(unsigned long dummy)
 {
int i;
@@ -202,14 +197,15 @@ static void dn_dst_check_expire(unsigned long dummy)
spin_lock(&dn_rt_hash_table[i].lock);
while ((rt = rcu_dereference_protected(*rtp,
lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) {
-   if (atomic_read(&rt->dst.__refcnt) ||
-   (now - rt->dst.lastuse) < expire) {
+   if (atomic_read(&rt->dst.__refcnt) > 1 ||
+   (now - rt->dst.lastuse) < expire) {
rtp = &rt->dst.dn_next;
continue;
}
*rtp = rt->dst.dn_next;
rt->dst.dn_next = NULL;
-   dnrt_free(rt);
+   dst_dev_put(&rt->dst);
+   dst_release(&rt->dst);
}
spin_unlock(&dn_rt_hash_table[i].lock);
 
@@ -235,14 +231,15 @@ static int dn_dst_gc(struct dst_ops *ops)
 
while ((rt = rcu_dereference_protected(*rtp,
lockdep_is_held(&dn_rt_hash_table[i].lock))) != NULL) {
-   if (atomic_read(&rt->dst.__refcnt) ||
-   (now - rt->dst.lastuse) < expire) {
+   if (atomic_read(&rt->dst.__refcnt) > 1 ||
+   (now - rt->dst.lastuse) < expire) {
rtp = &rt->dst.dn_next;
continue;
}
*rtp = rt->dst.dn_next;
rt->dst.dn_next = NULL;
-   dnrt_free(rt);
+   dst_dev_put(&rt->dst);
+   dst_release(&rt->dst);
break;
}
spin_unlock_bh(&dn_rt_hash_table[i].lock);
@@ -344,7 +341,7 @@ static int dn_insert_route(struct dn_route *rt, unsigned 
int hash, struct dn_rou
dst_use(&rth->dst, now);
spin_unlock_bh(&dn_rt_hash_table[hash].lock);

-   dst_free(&rt->dst);
+   dst_release_immediate(&rt->dst);
*rp = rth;
return 0;
}
@@ -374,7 +371,8 @@ static void dn_run_flush(unsigned long dummy)
for(; rt; rt = next) {
next = rcu_dereference_raw(rt->dst.dn_next);
RCU_INIT_POINTER(rt->dst.dn_next, NULL);
-   dnrt_free(rt);
+   dst_dev_put(&rt->dst);
+   dst_release(&rt->dst);
}
 
 nothing_to_declare:
@@ -1181,7 +1179,8 @@ static int dn_route_output_slow(struct dst_entry **pprt, 
const struct flowidn *o
if (dev_out->flags & IFF_LOOPBACK)
flags |= RTCF_LOCAL;
 
-   rt = dst_alloc(&dn_dst_ops, dev_out, 0, DST_OBSOLETE_NONE, DST_HOST);
+   rt = dst_alloc(&dn_dst_ops, dev_out, 1, DST_OBSOLETE_NONE,
+  DST_HOST | DST_NOGC);
if (rt == NULL)
goto e_nobufs;
 
@@ -1215,6 +1214,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, 
const struct flowidn *o
goto e_neighbour;
 
hash = dn_hash(rt->fld.saddr, rt->fld.daddr);
+   /* dn_insert_route() increments dst->__refcnt */
dn_insert_route(rt, hash, (struct dn_route **)pprt);
 
 done:
@@ -1237,7 +1237,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, 
const struct flowidn *o

[PATCH net-next 09/21] ipv4: mark DST_NOGC and remove the operation of dst_free()

2017-06-16 Thread Wei Wang
From: Wei Wang 

With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv4 code and release dst based on refcnt only.
So this patch adds the DST_NOGC flag for all IPv4 dsts and removes the
calls to dst_free().
At this point, all dst created in ipv4 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv4/fib_semantics.c |  6 ++
 net/ipv4/route.c | 15 +++
 2 files changed, 5 insertions(+), 16 deletions(-)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index f163fa0a1164..ff47ea1408fe 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -153,8 +153,7 @@ static void rt_fibinfo_free(struct rtable __rcu **rtp)
 */
 
dst_dev_put(&rt->dst);
-   dst_release(&rt->dst);
-   dst_free(&rt->dst);
+   dst_release_immediate(&rt->dst);
 }
 
 static void free_nh_exceptions(struct fib_nh *nh)
@@ -198,8 +197,7 @@ static void rt_fibinfo_free_cpus(struct rtable __rcu * 
__percpu *rtp)
rt = rcu_dereference_protected(*per_cpu_ptr(rtp, cpu), 1);
if (rt) {
dst_dev_put(&rt->dst);
-   dst_release(&rt->dst);
-   dst_free(&rt->dst);
+   dst_release_immediate(&rt->dst);
}
}
free_percpu(rtp);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 903a12c601ac..80b30c2bf47d 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -589,11 +589,6 @@ static void ip_rt_build_flow_key(struct flowi4 *fl4, const 
struct sock *sk,
build_sk_flow_key(fl4, sk);
 }
 
-static inline void rt_free(struct rtable *rt)
-{
-   call_rcu(&rt->dst.rcu_head, dst_rcu_free);
-}
-
 static DEFINE_SPINLOCK(fnhe_lock);
 
 static void fnhe_flush_routes(struct fib_nh_exception *fnhe)
@@ -605,14 +600,12 @@ static void fnhe_flush_routes(struct fib_nh_exception 
*fnhe)
RCU_INIT_POINTER(fnhe->fnhe_rth_input, NULL);
dst_dev_put(&rt->dst);
dst_release(&rt->dst);
-   rt_free(rt);
}
rt = rcu_dereference(fnhe->fnhe_rth_output);
if (rt) {
RCU_INIT_POINTER(fnhe->fnhe_rth_output, NULL);
dst_dev_put(&rt->dst);
dst_release(&rt->dst);
-   rt_free(rt);
}
 }
 
@@ -1341,7 +1334,6 @@ static bool rt_bind_exception(struct rtable *rt, struct 
fib_nh_exception *fnhe,
if (orig) {
dst_dev_put(&orig->dst);
dst_release(&orig->dst);
-   rt_free(orig);
}
ret = true;
}
@@ -1374,7 +1366,6 @@ static bool rt_cache_route(struct fib_nh *nh, struct 
rtable *rt)
if (orig) {
dst_dev_put(&orig->dst);
dst_release(&orig->dst);
-   rt_free(orig);
}
} else {
dst_release(&rt->dst);
@@ -1505,7 +1496,8 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
rt = dst_alloc(&ipv4_dst_ops, dev, 1, DST_OBSOLETE_FORCE_CHK,
   (will_cache ? 0 : (DST_HOST | DST_NOCACHE)) |
   (nopolicy ? DST_NOPOLICY : 0) |
-  (noxfrm ? DST_NOXFRM : 0));
+  (noxfrm ? DST_NOXFRM : 0) |
+  DST_NOGC);
 
if (rt) {
rt->rt_genid = rt_genid_ipv4(dev_net(dev));
@@ -2511,7 +2503,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, 
struct dst_entry *dst_or
struct rtable *ort = (struct rtable *) dst_orig;
struct rtable *rt;
 
-   rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, 1, DST_OBSOLETE_NONE, 0);
+   rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, 1, DST_OBSOLETE_NONE, DST_NOGC);
if (rt) {
struct dst_entry *new = &rt->dst;
 
@@ -2534,7 +2526,6 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, 
struct dst_entry *dst_or
rt->rt_uses_gateway = ort->rt_uses_gateway;
 
INIT_LIST_HEAD(&rt->rt_uncached);
-   dst_free(new);
}
 
dst_release(dst_orig);
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 07/21] ipv4: call dst_dev_put() properly

2017-06-16 Thread Wei Wang
From: Wei Wang 

As the intent of this patch series is to completely remove dst gc,
we need to call dst_dev_put() to release the reference to dst->dev
when removing routes from fib because we won't keep the gc list anymore
and will lose the dst pointer right after removing the routes.
Without the gc list, there is no way to find all the dst's that have
dst->dev pointing to the going-down dev.
Hence, we are doing dst_dev_put() immediately before we lose the last
reference of the dst from the routing code. The next dst_check() will
trigger a route re-lookup to find another route (if there is any).

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv4/fib_semantics.c | 2 ++
 net/ipv4/route.c | 4 
 2 files changed, 6 insertions(+)

diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
index 53b3e9c2da4c..f163fa0a1164 100644
--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -152,6 +152,7 @@ static void rt_fibinfo_free(struct rtable __rcu **rtp)
 * free_fib_info_rcu()
 */
 
+   dst_dev_put(&rt->dst);
dst_release(&rt->dst);
dst_free(&rt->dst);
 }
@@ -196,6 +197,7 @@ static void rt_fibinfo_free_cpus(struct rtable __rcu * 
__percpu *rtp)
 
rt = rcu_dereference_protected(*per_cpu_ptr(rtp, cpu), 1);
if (rt) {
+   dst_dev_put(&rt->dst);
dst_release(&rt->dst);
dst_free(&rt->dst);
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3dee0043117e..d986d80258d2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -603,12 +603,14 @@ static void fnhe_flush_routes(struct fib_nh_exception 
*fnhe)
rt = rcu_dereference(fnhe->fnhe_rth_input);
if (rt) {
RCU_INIT_POINTER(fnhe->fnhe_rth_input, NULL);
+   dst_dev_put(&rt->dst);
dst_release(&rt->dst);
rt_free(rt);
}
rt = rcu_dereference(fnhe->fnhe_rth_output);
if (rt) {
RCU_INIT_POINTER(fnhe->fnhe_rth_output, NULL);
+   dst_dev_put(&rt->dst);
dst_release(&rt->dst);
rt_free(rt);
}
@@ -1337,6 +1339,7 @@ static bool rt_bind_exception(struct rtable *rt, struct 
fib_nh_exception *fnhe,
dst_hold(&rt->dst);
rcu_assign_pointer(*porig, rt);
if (orig) {
+   dst_dev_put(&orig->dst);
dst_release(&orig->dst);
rt_free(orig);
}
@@ -1369,6 +1372,7 @@ static bool rt_cache_route(struct fib_nh *nh, struct 
rtable *rt)
prev = cmpxchg(p, orig, rt);
if (prev == orig) {
if (orig) {
+   dst_dev_put(&orig->dst);
dst_release(&orig->dst);
rt_free(orig);
}
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 13/21] ipv6: mark DST_NOGC and remove the operation of dst_free()

2017-06-16 Thread Wei Wang
From: Wei Wang 

With the previous preparation patches, we are ready to get rid of the
dst gc operation in ipv6 code and release dst based on refcnt only.
So this patch adds the DST_NOGC flag for all IPv6 dsts and removes the
calls to dst_free() and its related functions.
At this point, all dst created in ipv6 code do not use the dst gc
anymore and will be destroyed at the point when refcnt drops to 0.

Also, as the icmp6 dst route is ref counted at creation and freed by the
user via dst_release(), there is no need to add this dst to the icmp6 gc
list.
Instead, we need to add it to the uncached list so that when a
NETDEV_DOWN/NETDEV_UNREGISTER event comes, we can walk these icmp6 dsts
as well and properly release the reference on the net device.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv6/ip6_fib.c | 15 ++-
 net/ipv6/route.c   | 49 +
 2 files changed, 19 insertions(+), 45 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 265401abb98e..e3b35e146eef 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -153,11 +153,6 @@ static void node_free(struct fib6_node *fn)
kmem_cache_free(fib6_node_kmem, fn);
 }
 
-static void rt6_rcu_free(struct rt6_info *rt)
-{
-   call_rcu(&rt->dst.rcu_head, dst_rcu_free);
-}
-
 static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
 {
int cpu;
@@ -174,7 +169,6 @@ static void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
if (pcpu_rt) {
dst_dev_put(&pcpu_rt->dst);
dst_release(&pcpu_rt->dst);
-   rt6_rcu_free(pcpu_rt);
*ppcpu_rt = NULL;
}
}
@@ -189,7 +183,6 @@ static void rt6_release(struct rt6_info *rt)
rt6_free_pcpu(rt);
dst_dev_put(&rt->dst);
dst_release(&rt->dst);
-   rt6_rcu_free(rt);
}
 }
 
@@ -1108,9 +1101,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt,
/* Always release dst as dst->__refcnt is guaranteed
 * to be taken before entering this function
 */
-   dst_release(&rt->dst);
-   if (!(rt->dst.flags & DST_NOCACHE))
-   dst_free(&rt->dst);
+   dst_release_immediate(&rt->dst);
}
return err;
 
@@ -1124,9 +1115,7 @@ int fib6_add(struct fib6_node *root, struct rt6_info *rt,
/* Always release dst as dst->__refcnt is guaranteed
 * to be taken before entering this function
 */
-   dst_release(&rt->dst);
-   if (!(rt->dst.flags & DST_NOCACHE))
-   dst_free(&rt->dst);
+   dst_release_immediate(&rt->dst);
return err;
 #endif
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c52c51908881..5f859ee67172 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -354,7 +354,8 @@ static struct rt6_info *__ip6_dst_alloc(struct net *net,
int flags)
 {
struct rt6_info *rt = dst_alloc(&net->ipv6.ip6_dst_ops, dev,
-   1, DST_OBSOLETE_FORCE_CHK, flags);
+   1, DST_OBSOLETE_FORCE_CHK,
+   flags | DST_NOGC);
 
if (rt)
rt6_info_init(rt);
@@ -381,9 +382,7 @@ struct rt6_info *ip6_dst_alloc(struct net *net,
*p =  NULL;
}
} else {
-   dst_release(&rt->dst);
-   if (!(flags & DST_NOCACHE))
-   dst_destroy((struct dst_entry *)rt);
+   dst_release_immediate(&rt->dst);
return NULL;
}
}
@@ -1053,8 +1052,7 @@ static struct rt6_info *rt6_make_pcpu_route(struct 
rt6_info *rt)
prev = cmpxchg(p, NULL, pcpu_rt);
if (prev) {
/* If someone did it before us, return prev instead */
-   dst_release(&pcpu_rt->dst);
-   dst_destroy(&pcpu_rt->dst);
+   dst_release_immediate(&pcpu_rt->dst);
pcpu_rt = prev;
}
} else {
@@ -1064,8 +1062,7 @@ static struct rt6_info *rt6_make_pcpu_route(struct 
rt6_info *rt)
 * since rt is going away anyway.  The next
 * dst_check() will trigger a re-lookup.
 */
-   dst_release(&pcpu_rt->dst);
-   dst_destroy(&pcpu_rt->dst);
+   dst_release_immediate(&pcpu_rt->dst);
pcpu_rt = rt;
}
dst_hold(&pcpu_rt->dst);
@@ -1257,9 +1254,8 @@ struct dst_entry *ip6_blackhole_route(struct net *net, 
struct dst_entry *dst_ori
struct net_device *loopback_dev = net->loopback_dev;
struct dst_entry *new = NULL;
 
-
rt = 

[PATCH net-next 12/21] ipv6: call dst_hold_safe() properly

2017-06-16 Thread Wei Wang
From: Wei Wang 

Similar to ipv4, the ipv6 path also needs to call dst_hold_safe() where
necessary to avoid a double-free issue on the dst.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv6/addrconf.c | 4 ++--
 net/ipv6/route.c| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 0aa36b093013..2a6397714d70 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -5576,8 +5576,8 @@ static void __ipv6_ifa_notify(int event, struct 
inet6_ifaddr *ifp)
ip6_del_rt(rt);
}
if (ifp->rt) {
-   dst_hold(&ifp->rt->dst);
-   ip6_del_rt(ifp->rt);
+   if (dst_hold_safe(&ifp->rt->dst))
+   ip6_del_rt(ifp->rt);
}
rt_genid_bump_ipv6(net);
break;
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 908b71188c57..c52c51908881 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1366,8 +1366,8 @@ static void ip6_link_failure(struct sk_buff *skb)
rt = (struct rt6_info *) skb_dst(skb);
if (rt) {
if (rt->rt6i_flags & RTF_CACHE) {
-   dst_hold(&rt->dst);
-   ip6_del_rt(rt);
+   if (dst_hold_safe(&rt->dst))
+   ip6_del_rt(rt);
} else if (rt->rt6i_node && (rt->rt6i_flags & RTF_DEFAULT)) {
rt->rt6i_node->fn_sernum = -1;
}
-- 
2.13.1.518.g3df882009-goog



[PATCH net-next 18/21] net: remove DST_NOGC flag

2017-06-16 Thread Wei Wang
From: Wei Wang 

Now that all the components have been changed to release dst based on
refcnt only and not depend on dst gc anymore, we can remove the
temporary flag DST_NOGC.

Note that we also need to remove the DST_NOCACHE check in dst_release()
and dst_hold_safe(), because now all dsts are released based on refcnt
and behave as DST_NOCACHE did.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 include/net/dst.h  | 6 +-
 net/core/dst.c | 4 +---
 net/decnet/dn_route.c  | 6 ++
 net/ipv4/route.c   | 5 ++---
 net/ipv6/route.c   | 5 ++---
 net/xfrm/xfrm_policy.c | 2 +-
 6 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 0c56d1fc4d7f..1be82f672c37 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -57,7 +57,6 @@ struct dst_entry {
 #define DST_XFRM_TUNNEL0x0080
 #define DST_XFRM_QUEUE 0x0100
 #define DST_METADATA   0x0200
-#define DST_NOGC   0x0400
 
short   error;
 
@@ -336,10 +335,7 @@ static inline void skb_dst_force(struct sk_buff *skb)
  */
 static inline bool dst_hold_safe(struct dst_entry *dst)
 {
-   if (dst->flags & (DST_NOCACHE | DST_NOGC))
-   return atomic_inc_not_zero(&dst->__refcnt);
-   dst_hold(dst);
-   return true;
+   return atomic_inc_not_zero(&dst->__refcnt);
 }
 
 /**
diff --git a/net/core/dst.c b/net/core/dst.c
index cd61291fb0a7..573dcf21b0af 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -178,14 +178,12 @@ void dst_release(struct dst_entry *dst)
 {
if (dst) {
int newrefcnt;
-   unsigned short destroy_after_rcu = dst->flags &
-  (DST_NOCACHE | DST_NOGC);
 
newrefcnt = atomic_dec_return(&dst->__refcnt);
if (unlikely(newrefcnt < 0))
net_warn_ratelimited("%s: dst:%p refcnt:%d\n",
 __func__, dst, newrefcnt);
-   if (!newrefcnt && unlikely(destroy_after_rcu))
+   if (!newrefcnt)
call_rcu(&dst->rcu_head, dst_destroy_rcu);
}
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index f467c4e3205b..5d17d843ac86 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1179,8 +1179,7 @@ static int dn_route_output_slow(struct dst_entry **pprt, 
const struct flowidn *o
if (dev_out->flags & IFF_LOOPBACK)
flags |= RTCF_LOCAL;
 
-   rt = dst_alloc(&dn_dst_ops, dev_out, 1, DST_OBSOLETE_NONE,
-  DST_HOST | DST_NOGC);
+   rt = dst_alloc(&dn_dst_ops, dev_out, 1, DST_OBSOLETE_NONE, DST_HOST);
if (rt == NULL)
goto e_nobufs;
 
@@ -1445,8 +1444,7 @@ static int dn_route_input_slow(struct sk_buff *skb)
}
 
 make_route:
-   rt = dst_alloc(&dn_dst_ops, out_dev, 1, DST_OBSOLETE_NONE,
-  DST_HOST | DST_NOGC);
+   rt = dst_alloc(&dn_dst_ops, out_dev, 1, DST_OBSOLETE_NONE, DST_HOST);
if (rt == NULL)
goto e_nobufs;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 80b30c2bf47d..9a0f496f8bf4 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1496,8 +1496,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
rt = dst_alloc(&ipv4_dst_ops, dev, 1, DST_OBSOLETE_FORCE_CHK,
   (will_cache ? 0 : (DST_HOST | DST_NOCACHE)) |
   (nopolicy ? DST_NOPOLICY : 0) |
-  (noxfrm ? DST_NOXFRM : 0) |
-  DST_NOGC);
+  (noxfrm ? DST_NOXFRM : 0));
 
if (rt) {
rt->rt_genid = rt_genid_ipv4(dev_net(dev));
@@ -2503,7 +2502,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, 
struct dst_entry *dst_or
struct rtable *ort = (struct rtable *) dst_orig;
struct rtable *rt;
 
-   rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, 1, DST_OBSOLETE_NONE, DST_NOGC);
+   rt = dst_alloc(&ipv4_dst_blackhole_ops, NULL, 1, DST_OBSOLETE_NONE, 0);
if (rt) {
struct dst_entry *new = &rt->dst;
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c88044b8fa7c..6b6528fa3292 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -354,8 +354,7 @@ static struct rt6_info *__ip6_dst_alloc(struct net *net,
int flags)
 {
struct rt6_info *rt = dst_alloc(&net->ipv6.ip6_dst_ops, dev,
-   1, DST_OBSOLETE_FORCE_CHK,
-   flags | DST_NOGC);
+   1, DST_OBSOLETE_FORCE_CHK, flags);
 
if (rt)
rt6_info_init(rt);
@@ -1255,7 +1254,7 @@ struct dst_entry *ip6_blackhole_route(struct net *net, 
struct dst_entry *dst_ori
struct dst_entry *new = NULL;
 
rt = dst_alloc(&ip6_dst_blackhole_ops, loopback_dev, 1,
-   

[PATCH net-next 03/21] net: use loopback dev when generating blackhole route

2017-06-16 Thread Wei Wang
From: Wei Wang 

The existing ipv4/6_blackhole_route() code generates a blackhole route
whose dst->dev points to the device of the passed-in dst.
It is not necessary to hold a reference to that device
because the packets going through this route are dropped anyway.
The loopback interface is good enough, and using it means we don't need
to worry about releasing dst->dev when that device goes down.

Signed-off-by: Wei Wang 
Acked-by: Martin KaFai Lau 
---
 net/ipv4/route.c | 2 +-
 net/ipv6/route.c | 9 +
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9b38cf18144e..0a843ef2b709 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2504,7 +2504,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, 
struct dst_entry *dst_or
new->input = dst_discard;
new->output = dst_discard_out;
 
-   new->dev = ort->dst.dev;
+   new->dev = net->loopback_dev;
if (new->dev)
dev_hold(new->dev);
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 18fe6e2b88d5..bc1bc91bb969 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1245,9 +1245,12 @@ EXPORT_SYMBOL_GPL(ip6_route_output_flags);
 struct dst_entry *ip6_blackhole_route(struct net *net, struct dst_entry 
*dst_orig)
 {
struct rt6_info *rt, *ort = (struct rt6_info *) dst_orig;
+   struct net_device *loopback_dev = net->loopback_dev;
struct dst_entry *new = NULL;
 
-   rt = dst_alloc(&ip6_dst_blackhole_ops, ort->dst.dev, 1, DST_OBSOLETE_NONE, 0);
+
+   rt = dst_alloc(&ip6_dst_blackhole_ops, loopback_dev, 1,
+  DST_OBSOLETE_NONE, 0);
if (rt) {
rt6_info_init(rt);
 
@@ -1257,10 +1260,8 @@ struct dst_entry *ip6_blackhole_route(struct net *net, 
struct dst_entry *dst_ori
new->output = dst_discard_out;
 
dst_copy_metrics(new, &ort->dst);
-   rt->rt6i_idev = ort->rt6i_idev;
-   if (rt->rt6i_idev)
-   in6_dev_hold(rt->rt6i_idev);
 
+   rt->rt6i_idev = in6_dev_get(loopback_dev);
rt->rt6i_gateway = ort->rt6i_gateway;
rt->rt6i_flags = ort->rt6i_flags & ~RTF_PCPU;
rt->rt6i_metric = 0;
-- 
2.13.1.518.g3df882009-goog


