RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-07 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, May 8, 2016 1:41
> To: Dexuan Cui 
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan ; Haiyang Zhang ;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui 
> Date: Sat, 7 May 2016 10:49:25 +
>
> > I should be able to make 'send', 'recv' here to pointers and use vmalloc()
> > to allocate the memory for them.  I will do this.
>
> That's still unswappable kernel memory.
Hi David,
My understanding is: kernel pages are not swappable in Linux, so it looks I
can't avoid unswappable kernel memory here?

> People can open N sockets, where N is something on the order of the FD
> limit the process has, per process.  This allows someone to quickly
> eat up a lot of memory and hold onto it nearly indefinitely.

Thanks for pointing this out!
I understand, so I think I should add a module parameter, e.g.,
"hv_sock.max_socket_number" with a default value, say, 1024?

1 established hv_sock connection takes less than 20 pages, including 10
pages for VMBus ringbuffer, 6 pages for send/recv buffers(I'll use
vmalloc() for this), etc.
Here the recv buf needs a size of 5 pages because potentially the host
can send the guest a VMBus packet with an up-to-5-page payload, i..e,
the VMBus inbound ringbuffer size.

1024 hv_sock connections take less than 20*4KB * 1K = 80MB memory.

A user who needs more connections can change the module parameter
without reboot.

hv_sock connection is designed  to work only between the host and the
guest. I think 1024 connections seem pretty enough.

BTW, a user can't create hv_sock connections without enough privilege.
Please see

+static int hvsock_create(struct net *net, struct socket *sock,
+int protocol, int kern)
+{
+   if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
+   return -EPERM;

David, does this make sense to you?

Thanks,
-- Dexuan


Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread Shmulik Ladkani
Hi,

On Sat, 7 May 2016 20:25:40 -0600 David Ahern  wrote:
> >   - On which circumstances we end up entering
> > l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master?
> > If I got it right, we enter 'ip_rcv_finish' on a slave device,
> > the callback is invoked and eventually sets skb->dev and skb->skb_iif
> > to the VRF device; then ip_rcv_finish continues processing the
> > altered skb (with the changed skb->dev).
> > So on which cicumstances do we enter 'ip_rcv_finish' where the
> > skb->dev is ALREADY a master device?  
> 
> If you look at the full patchset I posted on 5/4 the patch after PKTINFO 
> allows local traffic. That change needs the netif_is_l3_master().

I see. Thanks David.


Re: [PATCH v2 net-next] ifb: support more features

2016-05-07 Thread Eric Dumazet
On Fri, 2016-05-06 at 18:19 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> When using ifb+netem on ingress on SIT/IPIP/GRE traffic,
> GRO packets are not properly processed.
> 
> Segmentation should not be forced, since ifb is already adding
> quite a performance hit.
> 
> Signed-off-by: Eric Dumazet 
> ---
>  drivers/net/ifb.c |3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
> index cc56fac3c3f8..66c0eeafcb5d 100644
> --- a/drivers/net/ifb.c
> +++ b/drivers/net/ifb.c
> @@ -196,6 +196,7 @@ static const struct net_device_ops ifb_netdev_ops = {
>  
>  #define IFB_FEATURES (NETIF_F_HW_CSUM | NETIF_F_SG  | NETIF_F_FRAGLIST   
> | \
> NETIF_F_TSO_ECN | NETIF_F_TSO | NETIF_F_TSO6  | \
> +   NETIF_F_GSO_ENCAP_ALL | \
> NETIF_F_HIGHDMA | NETIF_F_HW_VLAN_CTAG_TX | \
> NETIF_F_HW_VLAN_STAG_TX)
>  
> @@ -224,6 +225,8 @@ static void ifb_setup(struct net_device *dev)
>   dev->tx_queue_len = TX_Q_LIMIT;
>  
>   dev->features |= IFB_FEATURES;
> + dev->hw_features |= dev->features;
> + dev->hw_enc_features |= dev->features;
>   dev->vlan_features |= IFB_FEATURES & ~(NETIF_F_HW_VLAN_CTAG_TX |
>  NETIF_F_HW_VLAN_STAG_TX);
>  
> 


BTW, encapsulated GRO traffic going through mirred+ifb is dropped
because segments get an incorrect skb->mac_len 

(If TSO/GSO is disabled on ifb, as before the above patch)

SIT traffic for example : segments get mac_len set to 34 instead of 14

What do you think of :

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e561f9f07d6d..bec5c32b2fe9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3176,7 +3176,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
__copy_skb_header(nskb, head_skb);
 
skb_headers_offset_update(nskb, skb_headroom(nskb) - headroom);
-   skb_reset_mac_len(nskb);
+   nskb->mac_len = head_skb->mac_len;
 
skb_copy_from_linear_data_offset(head_skb, -tnl_hlen,
 nskb->data - tnl_hlen,




Re: [PATCH] Add support for configuring Infiniband GUIDs

2016-05-07 Thread Leon Romanovsky
On Fri, May 06, 2016 at 10:43:25AM -0500, Eli Cohen wrote:
> Add two NLA's that allow configuration of Infiniband node or port GUIDs
> by referencing the IPoIB net device set over then physical function. The
> format to be used is as follows:
> 
> ip link set dev ib0 vf 0 node_guid 00:02:c9:03:00:21:6e:70
> ip link set dev ib0 vf 0 port_guid 00:02:c9:03:00:21:6e:78
> 
> Issue: 702759
> Change-Id: I5ffb54d6de7bfa8650bf5818f484279914991d6e

These two lines are not needed to be part of commit message.


signature.asc
Description: Digital signature


[PATCH net-next 2/2] net: original ingress device index in PKTINFO

2016-05-07 Thread David Ahern
Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
add the skb_iif to inet_skb_parm and set it in ipv4 code after clearing
the skb control buffer similar to IPv6. From there the pktinfo can just
pull it from cb with the PKTINFO_SKB_CB cast.

The previous patch moving the skb->dev change to L3 means nothing else
is needed for IPv6; it just works.

Signed-off-by: David Ahern 
---
 include/net/ip.h   | 1 +
 net/ipv4/ip_input.c| 1 +
 net/ipv4/ip_sockglue.c | 7 ++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 247ac82e9cf2..37165fba3741 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -36,6 +36,7 @@
 struct sock;
 
 struct inet_skb_parm {
+   int iif;
struct ip_options   opt;/* Compiled IP options  
*/
unsigned char   flags;
 
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 37375eedeef9..4b351af3e67b 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -478,6 +478,7 @@ int ip_rcv(struct sk_buff *skb, struct net_device *dev, 
struct packet_type *pt,
 
/* Remove any debris in the socket control block */
memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
+   IPCB(skb)->iif = skb->skb_iif;
 
/* Must drop socket now because of tproxy. */
skb_orphan(skb);
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index bdb222c0c6a2..5805762d7fc7 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
   ipv6_sk_rxinfo(sk);
 
if (prepare && skb_rtable(skb)) {
-   pktinfo->ipi_ifindex = inet_iif(skb);
+   /* skb->cb is overloaded: prior to this point it is IP{6}CB
+* which has interface index (iif) as the first member of the
+* underlying inet{6}_skb_parm struct. This code then overlays
+* PKTINFO_SKB_CB and in_pktinfo also has iif as the first
+* element so the iif is picked up from the prior IPCB
+*/
pktinfo->ipi_spec_dst.s_addr = fib_compute_spec_dst(skb);
} else {
pktinfo->ipi_ifindex = 0;
-- 
2.1.4



[PATCH net-next v3 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread David Ahern
Currently the VRF driver uses the rx_handler to switch the skb device
to the VRF device. Switching the dev prior to the ip / ipv6 layer
means the VRF driver has to duplicate IP/IPv6 processing which adds
overhead and makes features such as retaining the ingress device index
more complicated than necessary.

This patch moves the hook to the L3 layer just after the first NF_HOOK
for PRE_ROUTING. This location makes exposing the original ingress device
trivial (next patch) and allows adding other NF_HOOKs to the VRF driver
in the future.

dev_queue_xmit_nit is exported so that the VRF driver can cycle the skb
with the switched device through the packet taps to maintain current
behavior (tcpdump can be used on either the vrf device or the enslaved
devices).

Signed-off-by: David Ahern 
---
v3
- reverted direct skb->data accesses caused by inadvertent drop of
  65c38aa653c1 with forward port of this patch

v2
- add skb_l3mdev_slave helper for inet6_iif and tcp_v6_iif rather
  than open coding the if case. Added benefit that the change compiles
  out if CONFIG_NET_L3_MASTER_DEV is not enabled.

 drivers/net/vrf.c | 189 ++
 include/linux/ipv6.h  |  17 -
 include/linux/netdevice.h |   2 +
 include/net/l3mdev.h  |  43 +++
 include/net/tcp.h |   4 +-
 net/core/dev.c|   3 +-
 net/ipv4/ip_input.c   |   7 ++
 net/ipv6/ip6_input.c  |   7 ++
 8 files changed, 171 insertions(+), 101 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4b2461ae5d3b..309efbf8da0e 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -42,9 +42,6 @@
 #define DRV_NAME   "vrf"
 #define DRV_VERSION"1.0"
 
-#define vrf_master_get_rcu(dev) \
-   ((struct net_device *)rcu_dereference(dev->rx_handler_data))
-
 struct net_vrf {
struct rtable   *rth;
struct rt6_info *rt6;
@@ -60,90 +57,12 @@ struct pcpu_dstats {
struct u64_stats_sync   syncp;
 };
 
-/* neighbor handling is done with actual device; do not want
- * to flip skb->dev for those ndisc packets. This really fails
- * for multiple next protocols (e.g., NEXTHDR_HOP). But it is
- * a start.
- */
-#if IS_ENABLED(CONFIG_IPV6)
-static bool check_ipv6_frame(const struct sk_buff *skb)
-{
-   const struct ipv6hdr *ipv6h;
-   struct ipv6hdr _ipv6h;
-   bool rc = true;
-
-   ipv6h = skb_header_pointer(skb, 0, sizeof(_ipv6h), &_ipv6h);
-   if (!ipv6h)
-   goto out;
-
-   if (ipv6h->nexthdr == NEXTHDR_ICMP) {
-   const struct icmp6hdr *icmph;
-   struct icmp6hdr _icmph;
-
-   icmph = skb_header_pointer(skb, sizeof(_ipv6h),
-  sizeof(_icmph), &_icmph);
-   if (!icmph)
-   goto out;
-
-   switch (icmph->icmp6_type) {
-   case NDISC_ROUTER_SOLICITATION:
-   case NDISC_ROUTER_ADVERTISEMENT:
-   case NDISC_NEIGHBOUR_SOLICITATION:
-   case NDISC_NEIGHBOUR_ADVERTISEMENT:
-   case NDISC_REDIRECT:
-   rc = false;
-   break;
-   }
-   }
-
-out:
-   return rc;
-}
-#else
-static bool check_ipv6_frame(const struct sk_buff *skb)
-{
-   return false;
-}
-#endif
-
-static bool is_ip_rx_frame(struct sk_buff *skb)
-{
-   switch (skb->protocol) {
-   case htons(ETH_P_IP):
-   return true;
-   case htons(ETH_P_IPV6):
-   return check_ipv6_frame(skb);
-   }
-   return false;
-}
-
 static void vrf_tx_error(struct net_device *vrf_dev, struct sk_buff *skb)
 {
vrf_dev->stats.tx_errors++;
kfree_skb(skb);
 }
 
-/* note: already called with rcu_read_lock */
-static rx_handler_result_t vrf_handle_frame(struct sk_buff **pskb)
-{
-   struct sk_buff *skb = *pskb;
-
-   if (is_ip_rx_frame(skb)) {
-   struct net_device *dev = vrf_master_get_rcu(skb->dev);
-   struct pcpu_dstats *dstats = this_cpu_ptr(dev->dstats);
-
-   u64_stats_update_begin(&dstats->syncp);
-   dstats->rx_pkts++;
-   dstats->rx_bytes += skb->len;
-   u64_stats_update_end(&dstats->syncp);
-
-   skb->dev = dev;
-
-   return RX_HANDLER_ANOTHER;
-   }
-   return RX_HANDLER_PASS;
-}
-
 static struct rtnl_link_stats64 *vrf_get_stats64(struct net_device *dev,
 struct rtnl_link_stats64 
*stats)
 {
@@ -506,28 +425,14 @@ static int do_vrf_add_slave(struct net_device *dev, 
struct net_device *port_dev)
 {
int ret;
 
-   /* register the packet handler for slave ports */
-   ret = netdev_rx_handler_register(port_dev, vrf_handle_frame, dev);
-   if (ret) {
-   netdev_err(port_dev,
-  "Device %s failed to register rx_handler\n",
-  port_dev->nam

[PATCH net-next v3 0/2] net: vrf: Fixup PKTINFO to return enslaved device index

2016-05-07 Thread David Ahern
Applications such as OSPF and BFD need the original ingress device not
the VRF device; the latter can be derived from the former. To that end
move the packet intercept from an rx handler that is invoked by
__netif_receive_skb_core to the ipv4 and ipv6 receive processing.

IPv6 already saves the skb_iif to the control buffer in ipv6_rcv. Since
the skb->dev has not been switched the cb has the enslaved device. Make
the same happen for IPv4 by adding the skb_iif to inet_skb_parm and set
it in ipv4 code after clearing the skb control buffer similar to IPv6.
>From there the pktinfo can just pull it from cb with the PKTINFO_SKB_CB
cast.

David Ahern (2):
  net: l3mdev: Add hook in ip and ipv6
  net: original ingress device index in PKTINFO

 drivers/net/vrf.c | 189 ++
 include/linux/ipv6.h  |  17 -
 include/linux/netdevice.h |   2 +
 include/net/ip.h  |   1 +
 include/net/l3mdev.h  |  43 +++
 include/net/tcp.h |   4 +-
 net/core/dev.c|   3 +-
 net/ipv4/ip_input.c   |   8 ++
 net/ipv4/ip_sockglue.c|   7 +-
 net/ipv6/ip6_input.c  |   7 ++
 10 files changed, 179 insertions(+), 102 deletions(-)

-- 
2.1.4



Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread David Ahern

On 5/7/16 12:32 PM, Shmulik Ladkani wrote:

Hi David,

On Sat, 7 May 2016 08:50:49 -0600 David Ahern  wrote:

+static inline
+struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
+{
+   struct net_device *master = NULL;
+
+   if (netif_is_l3_slave(skb->dev))
+   master = netdev_master_upper_dev_get_rcu(skb->dev);
+
+   else if (netif_is_l3_master(skb->dev))
+   master = skb->dev;
+
+   if (master && master->l3mdev_ops->l3mdev_l3_rcv)
+   skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);


In the case where netif_is_l3_master(skb->dev) is true, can you explain
why we need to pass it through the l3mdev_l3_rcv callback again?


what do you mean again? This is only time the l3mdev_l3_rcv method is
called on a packet.


You have the following:

if (netif_is_l3_slave(skb->dev))
master = netdev_master_upper_dev_get_rcu(skb->dev);
else if (netif_is_l3_master(skb->dev))
master = skb->dev;
if (master && master->l3mdev_ops->l3mdev_l3_rcv)
skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);

So in both conditions (skb->dev being the slave or the master) the skb
is passed to master's l3mdev_l3_rcv callback.

Appreciate if you can elaborate:
  - Why callback needs to be invoked when skb->dev is the L3 master?


Every l3mdev_ops has converged on that ordering -- if l3 slave else if 
l3 master.



  - On which circumstances we end up entering
l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master?
If I got it right, we enter 'ip_rcv_finish' on a slave device,
the callback is invoked and eventually sets skb->dev and skb->skb_iif
to the VRF device; then ip_rcv_finish continues processing the
altered skb (with the changed skb->dev).
So on which cicumstances do we enter 'ip_rcv_finish' where the
skb->dev is ALREADY a master device?


If you look at the full patchset I posted on 5/4 the patch after PKTINFO 
allows local traffic. That change needs the netif_is_l3_master().




Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings

2016-05-07 Thread Ben Hutchings
On Sun, 2016-05-08 at 00:56 +0200, Philippe Reynes wrote:
> On 07/05/16 13:59, Ben Hutchings wrote:
> > 
> > On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote:
> > > 
> > > The callback {get|set}_link_ksettings are often defined
> > > in a very close way. There are mainly two differences in
> > > those callback:
> > > - the name of the netdev private structure
> > > - the name of the struct phydev in the private structure
> > > 
> > > We add two defines ethtool_phy_{get|set}_link_ksettings
> > > to avoid writing severals times almost the same function.
> > [...]
> > 
> > I don't think there's no need to access a private structure, as there's
> > a phydev pointer in struct net_device.  If some drivers don't maintain
> > that pointer, they should be changed to do so.  Then they can
> > use generic implementations of {get,set}_link_ksettings provided by
> > phylib.
> If we could use the phydev in the struct net_device, we could write a
> generic function for {get|set}_link_ksettings. It's a good idea.
> 
> But I've quickly looked and a lot of ethernet driver use the private
> structure to store the phydev. If the ethernet driver may use the
> struct net_device for phydev, do you know why so many drivers use
> the private structure ?

Maybe just because no-one bothered to update them after it was added to
net_device.

Ben.

> If everybody agree, I can send a new version with a generic 
> {get|set}_link_ksettings
> and a update of fec to use the phydev store in the structure net_device.

-- 
Ben Hutchings
I haven't lost my mind; it's backed up on tape somewhere.

signature.asc
Description: This is a digitally signed message part


[PATCH net-next v2 0/2] net: l3mdev: Allow send on enslaved interface

2016-05-07 Thread David Ahern
First patch preps for the second. The second is required for several use
cases such as ping on an interface and BFD that need to send packets on
a specific interface, including ones enslaved to a VRF device.

v2
- fixed brackets on both patches per comment from DaveM

David Ahern (2):
  net: l3mdev: Move get_saddr and rt6_dst
  net: l3mdev: Allow send on enslaved interface

 drivers/net/vrf.c|  2 ++
 include/net/l3mdev.h | 56 +++---
 net/ipv4/route.c |  4 
 net/ipv6/route.c |  2 +-
 net/l3mdev/l3mdev.c  | 63 
 5 files changed, 73 insertions(+), 54 deletions(-)

-- 
2.1.4



[PATCH net-next 2/2] net: l3mdev: Allow send on enslaved interface

2016-05-07 Thread David Ahern
Allow udp and raw sockets to send by oif that is an enslaved interface
versus the l3mdev/VRF device. For example, this allows BFD to use ifindex
from IP_PKTINFO on a receive to send a response without the need to
convert to the VRF index. It also allows ping and ping6 to work when
specifying an enslaved interface (e.g., ping -I swp1 ) which is
a natural use case.

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c   |  2 ++
 net/ipv4/route.c|  4 
 net/l3mdev/l3mdev.c | 17 +
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 4b2461ae5d3b..c8db55aa8280 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -648,6 +648,8 @@ static int vrf_get_saddr(struct net_device *dev, struct 
flowi4 *fl4)
 
fl4->flowi4_flags |= FLOWI_FLAG_SKIP_NH_OIF;
fl4->flowi4_iif = LOOPBACK_IFINDEX;
+   /* make sure oif is set to VRF device for lookup */
+   fl4->flowi4_oif = dev->ifindex;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
 RT_SCOPE_LINK : RT_SCOPE_UNIVERSE);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 8c8c655bb2c4..a1f2830d8110 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2146,6 +2146,7 @@ struct rtable *__ip_route_output_key_hash(struct net 
*net, struct flowi4 *fl4,
unsigned int flags = 0;
struct fib_result res;
struct rtable *rth;
+   int master_idx;
int orig_oif;
int err = -ENETUNREACH;
 
@@ -2155,6 +2156,9 @@ struct rtable *__ip_route_output_key_hash(struct net 
*net, struct flowi4 *fl4,
 
orig_oif = fl4->flowi4_oif;
 
+   master_idx = l3mdev_master_ifindex_by_index(net, fl4->flowi4_oif);
+   if (master_idx)
+   fl4->flowi4_oif = master_idx;
fl4->flowi4_iif = LOOPBACK_IFINDEX;
fl4->flowi4_tos = tos & IPTOS_RT_MASK;
fl4->flowi4_scope = ((tos & RTO_ONLINK) ?
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index 898d01e0f87b..6651a78e100c 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -112,12 +112,18 @@ struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
struct dst_entry *dst = NULL;
struct net_device *dev;
 
-   dev = dev_get_by_index(net, fl6->flowi6_oif);
-   if (dev) {
-   if (netif_is_l3_master(dev) &&
+   if (fl6->flowi6_oif) {
+   rcu_read_lock();
+
+   dev = dev_get_by_index_rcu(net, fl6->flowi6_oif);
+   if (dev && netif_is_l3_slave(dev))
+   dev = netdev_master_upper_dev_get_rcu(dev);
+
+   if (dev && netif_is_l3_master(dev) &&
dev->l3mdev_ops->l3mdev_get_rt6_dst)
dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
-   dev_put(dev);
+
+   rcu_read_unlock();
}
 
return dst;
@@ -141,6 +147,9 @@ int l3mdev_get_saddr(struct net *net, int ifindex, struct 
flowi4 *fl4)
rcu_read_lock();
 
dev = dev_get_by_index_rcu(net, ifindex);
+   if (dev && netif_is_l3_slave(dev))
+   dev = netdev_master_upper_dev_get_rcu(dev);
+
if (dev && netif_is_l3_master(dev) &&
dev->l3mdev_ops->l3mdev_get_saddr)
rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
-- 
2.1.4



[PATCH net-next 1/2] net: l3mdev: Move get_saddr and rt6_dst

2016-05-07 Thread David Ahern
Move l3mdev_rt6_dst_by_oif and l3mdev_get_saddr to l3mdev.c. Collapse
l3mdev_get_rt6_dst into l3mdev_rt6_dst_by_oif since it is the only
user and keep the l3mdev_get_rt6_dst name for consistency with other
hooks.

A follow-on patch adds more code to these functions making them long
for inlined functions.

Signed-off-by: David Ahern 
---
 include/net/l3mdev.h | 56 +++-
 net/ipv6/route.c |  2 +-
 net/l3mdev/l3mdev.c  | 54 ++
 3 files changed, 58 insertions(+), 54 deletions(-)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index c43a9c73de5e..78872bd1dc2c 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -130,52 +130,9 @@ static inline bool netif_index_is_l3_master(struct net 
*net, int ifindex)
return rc;
 }
 
-static inline int l3mdev_get_saddr(struct net *net, int ifindex,
-  struct flowi4 *fl4)
-{
-   struct net_device *dev;
-   int rc = 0;
-
-   if (ifindex) {
-
-   rcu_read_lock();
-
-   dev = dev_get_by_index_rcu(net, ifindex);
-   if (dev && netif_is_l3_master(dev) &&
-   dev->l3mdev_ops->l3mdev_get_saddr) {
-   rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
-   }
-
-   rcu_read_unlock();
-   }
-
-   return rc;
-}
+int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4);
 
-static inline struct dst_entry *l3mdev_get_rt6_dst(const struct net_device 
*dev,
-  const struct flowi6 *fl6)
-{
-   if (netif_is_l3_master(dev) && dev->l3mdev_ops->l3mdev_get_rt6_dst)
-   return dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
-
-   return NULL;
-}
-
-static inline
-struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
-   const struct flowi6 *fl6)
-{
-   struct dst_entry *dst = NULL;
-   struct net_device *dev;
-
-   dev = dev_get_by_index(net, fl6->flowi6_oif);
-   if (dev) {
-   dst = l3mdev_get_rt6_dst(dev, fl6);
-   dev_put(dev);
-   }
-
-   return dst;
-}
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 
*fl6);
 
 #else
 
@@ -233,14 +190,7 @@ static inline int l3mdev_get_saddr(struct net *net, int 
ifindex,
 }
 
 static inline
-struct dst_entry *l3mdev_get_rt6_dst(const struct net_device *dev,
-const struct flowi6 *fl6)
-{
-   return NULL;
-}
-static inline
-struct dst_entry *l3mdev_rt6_dst_by_oif(struct net *net,
-   const struct flowi6 *fl6)
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net, const struct flowi6 *fl6)
 {
return NULL;
 }
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index af46e19205f5..c42fa1deb152 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1190,7 +1190,7 @@ struct dst_entry *ip6_route_output_flags(struct net *net, 
const struct sock *sk,
struct dst_entry *dst;
bool any_src;
 
-   dst = l3mdev_rt6_dst_by_oif(net, fl6);
+   dst = l3mdev_get_rt6_dst(net, fl6);
if (dst)
return dst;
 
diff --git a/net/l3mdev/l3mdev.c b/net/l3mdev/l3mdev.c
index e925037fa0df..898d01e0f87b 100644
--- a/net/l3mdev/l3mdev.c
+++ b/net/l3mdev/l3mdev.c
@@ -97,3 +97,57 @@ u32 l3mdev_fib_table_by_index(struct net *net, int ifindex)
return tb_id;
 }
 EXPORT_SYMBOL_GPL(l3mdev_fib_table_by_index);
+
+/**
+ * l3mdev_get_rt6_dst - IPv6 route lookup based on flow. Returns
+ *  cached route for L3 master device if relevant
+ *  to flow
+ * @net: network namespace for device index lookup
+ * @fl6: IPv6 flow struct for lookup
+ */
+
+struct dst_entry *l3mdev_get_rt6_dst(struct net *net,
+const struct flowi6 *fl6)
+{
+   struct dst_entry *dst = NULL;
+   struct net_device *dev;
+
+   dev = dev_get_by_index(net, fl6->flowi6_oif);
+   if (dev) {
+   if (netif_is_l3_master(dev) &&
+   dev->l3mdev_ops->l3mdev_get_rt6_dst)
+   dst = dev->l3mdev_ops->l3mdev_get_rt6_dst(dev, fl6);
+   dev_put(dev);
+   }
+
+   return dst;
+}
+EXPORT_SYMBOL_GPL(l3mdev_get_rt6_dst);
+
+/**
+ * l3mdev_get_saddr - get source address for a flow based on an interface
+ *enslaved to an L3 master device
+ * @net: network namespace for device index lookup
+ * @ifindex: Interface index
+ * @fl4: IPv4 flow struct
+ */
+
+int l3mdev_get_saddr(struct net *net, int ifindex, struct flowi4 *fl4)
+{
+   struct net_device *dev;
+   int rc = 0;
+
+   if (ifindex) {
+   rcu_read_lock();
+
+   dev = dev_get_by_index_rcu(net, ifindex);
+   if (dev && netif_is_l3_master(dev) &&
+   

Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings

2016-05-07 Thread Philippe Reynes

On 07/05/16 13:59, Ben Hutchings wrote:

On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote:

The callback {get|set}_link_ksettings are often defined
in a very close way. There are mainly two differences in
those callback:
- the name of the netdev private structure
- the name of the struct phydev in the private structure

We add two defines ethtool_phy_{get|set}_link_ksettings
to avoid writing severals times almost the same function.

[...]

I don't think there's no need to access a private structure, as there's
a phydev pointer in struct net_device.  If some drivers don't maintain
that pointer, they should be changed to do so.  Then they can
use generic implementations of {get,set}_link_ksettings provided by
phylib.


If we could use the phydev in the struct net_device, we could write a
generic function for {get|set}_link_ksettings. It's a good idea.

But I've quickly looked and a lot of ethernet driver use the private
structure to store the phydev. If the ethernet driver may use the
struct net_device for phydev, do you know why so many drivers use
the private structure ?

If everybody agree, I can send a new version with a generic 
{get|set}_link_ksettings
and a update of fec to use the phydev store in the structure net_device.

Philippe




[PATCH 2/2] sh_eth: reuse sh_eth_chip_reset()

2016-05-07 Thread Sergei Shtylyov
All the chip_reset() methods repeat the code writing to the ARSTR register
and delaying for 1 ms, so that we can reuse sh_eth_chip_reset() twice.

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/renesas/sh_eth.c |   11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -537,11 +537,7 @@ static struct sh_eth_cpu_data r7s72100_d
 
 static void sh_eth_chip_reset_r8a7740(struct net_device *ndev)
 {
-   struct sh_eth_private *mdp = netdev_priv(ndev);
-
-   /* reset device */
-   sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR);
-   mdelay(1);
+   sh_eth_chip_reset(ndev);
 
sh_eth_select_mii(ndev);
 }
@@ -725,7 +721,6 @@ static struct sh_eth_cpu_data sh7757_dat
 #define GIGA_MAHR(port)(SH_GIGA_ETH_BASE + 0x800 * (port) + 
0x05c0)
 static void sh_eth_chip_reset_giga(struct net_device *ndev)
 {
-   struct sh_eth_private *mdp = netdev_priv(ndev);
u32 mahr[2], malr[2];
int i;
 
@@ -735,9 +730,7 @@ static void sh_eth_chip_reset_giga(struc
mahr[i] = ioread32((void *)GIGA_MAHR(i));
}
 
-   /* reset device */
-   sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR);
-   mdelay(1);
+   sh_eth_chip_reset(ndev);
 
/* restore MAHR and MALR */
for (i = 0; i < 2; i++) {



[PATCH 1/2] sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga()

2016-05-07 Thread Sergei Shtylyov
sh_eth_chip_reset_giga() doesn't really need to use direct iowrite32() when
writing  to the ARSTR register,  it can use sh_eth_tsu_write() as all other
chip_reset() methods.

Signed-off-by: Sergei Shtylyov 

---
 drivers/net/ethernet/renesas/sh_eth.c |5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: net-next/drivers/net/ethernet/renesas/sh_eth.c
===
--- net-next.orig/drivers/net/ethernet/renesas/sh_eth.c
+++ net-next/drivers/net/ethernet/renesas/sh_eth.c
@@ -725,8 +725,9 @@ static struct sh_eth_cpu_data sh7757_dat
 #define GIGA_MAHR(port)(SH_GIGA_ETH_BASE + 0x800 * (port) + 
0x05c0)
 static void sh_eth_chip_reset_giga(struct net_device *ndev)
 {
-   int i;
+   struct sh_eth_private *mdp = netdev_priv(ndev);
u32 mahr[2], malr[2];
+   int i;
 
/* save MAHR and MALR */
for (i = 0; i < 2; i++) {
@@ -735,7 +736,7 @@ static void sh_eth_chip_reset_giga(struc
}
 
/* reset device */
-   iowrite32(ARSTR_ARST, (void *)(SH_GIGA_ETH_BASE + 0x1800));
+   sh_eth_tsu_write(mdp, ARSTR_ARST, ARSTR);
mdelay(1);
 
/* restore MAHR and MALR */




[PATCH 0/2] sh_eth: couple of software reset bit cleanups

2016-05-07 Thread Sergei Shtylyov
Hello.

   Here's a set of 2 patches against DaveM's 'net-next.git' repo. We can save
on the repetitive chip reset code...

[1/2] sh_eth: call sh_eth_tsu_write() from sh_eth_chip_reset_giga()
[2/2] sh_eth: reuse sh_eth_chip_reset()

MBR, Sergei



Re: [PATCH net-next 15/21] net: dsa: mv88e6xxx: factorize VLAN Ethertype

2016-05-07 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> On Fri, May 06, 2016 at 05:57:17PM -0400, Vivien Didelot wrote:
>> The 6131 switch models have a Core Tag Type register. Add a
>> MV88E6XXX_FLAG_CORE_TAG_TYPE flag and set the VLAN Ethertype to 0x8100
>> in the shared setup code if it is present.
>
> Do you have any idea what the core tag is?

Core Tag Type is the global register 0x19 present only on 6185 and
similar. It is used when egressing double tagged frames. This value is
also checked when ingressing frames on ports in UseCoreTag mode.

Note that we might remove this flag and setup condition, because 0x8100
is the reset value anyway.

Vivien


Re: [PATCH net-next 06/21] net: dsa: mv88e6xxx: factorize MAC address setting

2016-05-07 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

>> @@ -378,6 +385,7 @@ enum mv88e6xxx_cap {
>>  #define MV88E6XXX_FLAG_EEPROM   BIT(MV88E6XXX_CAP_EEPROM)
>>  #define MV88E6XXX_FLAG_PPU  BIT(MV88E6XXX_CAP_PPU)
>>  #define MV88E6XXX_FLAG_SMI_PHY  BIT(MV88E6XXX_CAP_SMI_PHY)
>> +#define MV88E6XXX_FLAG_SWITCH_MAC   BIT(MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF)
>>  #define MV88E6XXX_FLAG_TEMP BIT(MV88E6XXX_CAP_TEMP)
>>  #define MV88E6XXX_FLAG_TEMP_LIMIT   BIT(MV88E6XXX_CAP_TEMP_LIMIT)
>
> There is a general pattern here that the flag has a name derived from
> the capability. Except you dropped the WOL_WOF here. It would probably
> be better to not have WOL_WOF at all.

Indeed, I did that because the global 2 register 0x0D "Switch
MAC/WoL/WoF" is used to indirectly configure the switch MAC address, the
Wake on Lan and Wake on Frame. So I explicitly named the capability
MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF and the flag for the switch MAC
MV88E6XXX_FLAG_SWITCH_MAC.

So if we add support for WoL, we can then define:

#define MV88E6XXX_FLAG_WOL BIT(MV88E6XXX_CAP_SWITCH_MAC_WOL_WOF)

But I can get rid of it if it feels confusing.

Thanks,

Vivien


[PATCH] pxa168_eth: mdiobus_scan() doesn't return NULL anymore

2016-05-07 Thread Sergei Shtylyov
Now that mdiobus_scan() doesn't return NULL on failure anymore, this driver
no  longer needs to check for it...

Signed-off-by: Sergei Shtylyov 

---
The patch is against DaveM's 'net-next.git' repo.

 drivers/net/ethernet/marvell/pxa168_eth.c |2 --
 1 file changed, 2 deletions(-)

Index: net-next/drivers/net/ethernet/marvell/pxa168_eth.c
===
--- net-next.orig/drivers/net/ethernet/marvell/pxa168_eth.c
+++ net-next/drivers/net/ethernet/marvell/pxa168_eth.c
@@ -981,8 +981,6 @@ static int pxa168_init_phy(struct net_de
pep->phy = mdiobus_scan(pep->smi_bus, pep->phy_addr);
if (IS_ERR(pep->phy))
return PTR_ERR(pep->phy);
-   if (!pep->phy)
-   return -ENODEV;
 
err = phy_connect_direct(dev, pep->phy, pxa168_eth_adjust_link,
 pep->phy_intf);



Re: [PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()

2016-05-07 Thread Geert Uytterhoeven
On Sat, May 7, 2016 at 8:39 PM, Sergei Shtylyov
 wrote:
> On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote:
>
>> When reopening the network device on ra7795/salvator-x, e.g. after a
>> DHCP timeout:
>>
>>  IP-Config: Reopening network devices...
>>  genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.
>>  (eth0:ch24:emac)
>>  ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac
>
>Er, this can't be a message from the kernel built from net.git. That
> driver used 'ndev->name'
> to request both IRQs...

Oops, I copied the error message from the second patch.
I don't think it matters that much, though...

>>  IP-Config: Failed to open eth0
>>  IP-Config: No network devices available
>>
>> The "mismatch" is due to requesting an IRQ that is already in use,
>> while IRQF_PROBE_SHARED wasn't set.
>>
>> However, the real cause is that ravb_close() doesn't release the R-Car
>> Gen3-specific secondary IRQ.
>>
>> Add the missing free_irq() call to fix this.
>>
>> Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC")
>> Signed-off-by: Geert Uytterhoeven 
>
>
> Acked-by: Sergei Shtylyov 

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [patch] netxen: netxen_rom_fast_read() doesn't return -1

2016-05-07 Thread David Miller
From: Dan Carpenter 
Date: Thu, 5 May 2016 16:20:20 +0300

> The error handling is broken here.  netxen_rom_fast_read() returns zero
> on success and -EIO on error.  It never returns -1.
> 
> Signed-off-by: Dan Carpenter 

Applied.


Re: [patch 2/2] netxen: reversed condition in netxen_nic_set_link_parameters()

2016-05-07 Thread David Miller
From: Dan Carpenter 
Date: Thu, 5 May 2016 16:19:44 +0300

> My static checker complains that we are using "autoneg" without
> initializing it.  The problem is the ->phy_read() condition is reversed
> so we only set this on error instead of success.
> 
> Signed-off-by: Dan Carpenter 

Applied.


Re: [patch 1/2] netxen: fix error handling in netxen_get_flash_block()

2016-05-07 Thread David Miller
From: Dan Carpenter 
Date: Thu, 5 May 2016 16:18:46 +0300

> My static checker complained that "v" can be used unintialized if
> netxen_rom_fast_read() returns -EIO.  That function never actually
> returns -1.
> 
> Signed-off-by: Dan Carpenter 

Applied.


Re: [PATCH net-next] cxgb4: Reset dcb state machine and tx queue prio only if dcb is enabled

2016-05-07 Thread David Miller
From: Hariprasad Shenai 
Date: Thu,  5 May 2016 11:05:39 +0530

> When cxgb4 is enabled with CONFIG_CHELSIO_T4_DCB set, VI enable command
> gets called with DCB enabled. But when we have a back to back setup with
> DCB enabled on one side and non-DCB on the Peer side. Firmware doesn't
> send any DCB_L2_CFG, and DCB priority is never set for Tx queue.
> But driver resets the queue priority and state machine whenever there
> is a link down, this patch fixes it by adding a check to reset only if
> cxgb4_dcb_enabled() returns true.
> 
> Signed-off-by: Hariprasad Shenai 

Applied.


Re: [PATCH net-next 2/4] xen-netback: add control protocol implementation

2016-05-07 Thread David Miller
From: Paul Durrant 
Date: Thu, 5 May 2016 12:19:28 +0100

> +struct xenvif_hash_cache {
> + rwlock_t lock;

You really don't want to lock on every SKB hash computation like
this, turn this into a spin lock for locking the write side and
use RCU locking for lookup and usage.

THanks.


Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO

2016-05-07 Thread Shmulik Ladkani
Hi,

On Sat, 7 May 2016 08:53:44 -0600 David Ahern  wrote:
> >> @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, 
> >> struct sk_buff *skb)
> >>   ipv6_sk_rxinfo(sk);
> >>
> >>if (prepare && skb_rtable(skb)) {
> >> -  pktinfo->ipi_ifindex = inet_iif(skb);
> >> +  /* skb->cb is overloaded: prior to this point it is IP{6}CB
> >> +   * which has interface index (iif) as the first member of the
> >> +   * underlying inet{6}_skb_parm struct. This code then overlays
> >> +   * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
> >> +   * element so the iif is picked up from the prior IPCB
> >> +   */  
> >
> > Better if there was a guarantee in the code that inet_skb_parm layout stays
> > that way. Or instead just explicitly assign the iif.  
> 
> At this point inet_iif points to the vrf device so can't use it.

Initially I was thinking about explicitly getting the iif out of the
IPCB first, then assign to ipi_ifindex. Seems more readable, and less
fragile. However this depends on the IPCB/IP6CB layout relationship as
well (iif being first on both).

I assume documenting the IP{6}CB/PKTINFO_SKB_CB layout relationship at the
struct definitions would be beneficial.

Regards,
Shmulik


Re: [PATCH net-next 1/2] net: l3mdev: Move get_saddr and rt6_dst

2016-05-07 Thread David Miller
From: David Ahern 
Date: Wed,  4 May 2016 21:54:09 -0700

> + if (dev && netif_is_l3_master(dev) &&
> + dev->l3mdev_ops->l3mdev_get_saddr) {
> + rc = dev->l3mdev_ops->l3mdev_get_saddr(dev, fl4);
> + }

Please do not use braces for single statement basic blocks.

The same issue exists in patch #2 so please fix it there as well.

Thanks.


Re: [PATCH net 1/1] net: fec: update dirty_tx even if no skb

2016-05-07 Thread Troy Kisky
On 4/21/2016 10:59 PM, Fugang Duan wrote:
> From: Troy Kisky  Sent: Friday, April 22, 
> 2016 10:01 AM
>> To: netdev@vger.kernel.org; da...@davemloft.net; Fugang Duan
>> ; lzn...@gmail.com
>> Cc: Fabio Estevam ; l.st...@pengutronix.de;
>> and...@lunn.ch; trem...@gmail.com; g...@uclinux.org; linux-arm-
>> ker...@lists.infradead.org; johan...@sipsolutions.net;
>> stillcompil...@gmail.com; sergei.shtyl...@cogentembedded.com;
>> a...@arndb.de; holgerschu...@gmail.com; Troy Kisky
>> 
>> Subject: [PATCH net 1/1] net: fec: update dirty_tx even if no skb
>>
>> If dirty_tx isn't updated, then dma_unmap_single will be called twice.
>>
>> This fixes a
>> [   58.420980] [ cut here ]
>> [   58.425667] WARNING: CPU: 0 PID: 377 at /home/schurig/d/mkarm/linux-
>> 4.5/lib/dma-debug.c:1096 check_unmap+0x9d0/0xab8()
>> [   58.436405] fec 2188000.ethernet: DMA-API: device driver tries to free DMA
>> memory it has not allocated [device address=0x] [size=66
>> bytes]
>>
>> encountered by Holger
>>
>> Signed-off-by: Troy Kisky 
>> Tested-by: 
>> ---
>>  drivers/net/ethernet/freescale/fec_main.c | 8 +++-
>>  1 file changed, 3 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>> b/drivers/net/ethernet/freescale/fec_main.c
>> index 08243c2..b71654c 100644
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -1197,10 +1197,8 @@ fec_enet_tx_queue(struct net_device *ndev, u16
>> queue_id)
>>   fec16_to_cpu(bdp->cbd_datlen),
>>   DMA_TO_DEVICE);
>>  bdp->cbd_bufaddr = cpu_to_fec32(0);
>> -if (!skb) {
>> -bdp = fec_enet_get_nextdesc(bdp, &txq->bd);
>> -continue;
>> -}
>> +if (!skb)
>> +goto skb_done;
>>
>>  /* Check for errors. */
>>  if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC | @@ -1239,7
>> +1237,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
>>
>>  /* Free the sk buffer associated with this last transmit */
>>  dev_kfree_skb_any(skb);
>> -
>> +skb_done:
>>  /* Make sure the update to bdp and tx_skbuff are performed
>>   * before dirty_tx
>>   */
>> --
>> 2.5.0
> 
> The patch is fine for me.
> Can you review below patch that also fix the issue. It can take much effort 
> due to less rmb() and READ_ONCE() operation that is very sensitive for duplex 
> Gbps test for i.MX6SX/i.MX7d SOC. (i.MX6SX can reach at 1.4Gbps, i.MX7D can 
> reach at 1.8Gbps.)
> 


Am I supposed to do anything else to get this patch into net ?



pull request: bluetooth-next 2016-05-07

2016-05-07 Thread Johan Hedberg
Hi Dave,

Here are a few more Bluetooth patches for the 4.7 kernel:

 - NULL pointer fix in hci_intel driver
 - New Intel Bluetooth controller id in btusb driver
 - Added device tree binding documentation for Marvel's bt-sd8xxx
 - Platform specific wakeup interrupt support for btmrvl driver

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 582a1db98892ef4c1f34c7338b272331994d44ab:

  Merge branch 'qed-selftests' (2016-05-02 00:16:45 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git 
for-upstream

for you to fetch changes up to a0af53b511423cca93900066512379e21586d7dd:

  Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b] 
(2016-05-06 21:52:35 +0200)


Loic Poulain (1):
  Bluetooth: hci_intel: Fix null gpio desc pointer dereference

Tedd Ho-Jeong An (1):
  Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b]

Xinming Hu (2):
  dt: bindings: add MARVELL's bt-sd8xxx wireless device
  btmrvl: add platform specific wakeup interrupt support

 Documentation/devicetree/bindings/btmrvl.txt   | 29 
 .../devicetree/bindings/net/marvell-bt-sd8xxx.txt  | 56 +++
 drivers/bluetooth/btmrvl_drv.h | 11 +++
 drivers/bluetooth/btmrvl_main.c| 35 +
 drivers/bluetooth/btmrvl_sdio.c| 79 +
 drivers/bluetooth/btmrvl_sdio.h|  6 ++
 drivers/bluetooth/btusb.c  | 11 +--
 drivers/bluetooth/hci_intel.c  |  6 +-
 8 files changed, 180 insertions(+), 53 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/btmrvl.txt
 create mode 100644 Documentation/devicetree/bindings/net/marvell-bt-sd8xxx.txt


signature.asc
Description: PGP signature


Re: [PATCH v2 2/2 -net-next] ravb: Add missing free_irq() calls to ravb_close()

2016-05-07 Thread Sergei Shtylyov

Hello.

On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote:


When reopening the network device on ra7795/salvator-x, e.g. after a
DHCP timeout:

 IP-Config: Reopening network devices...
 genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.  (


   Unwrapped line this time? :-)


 ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac
 IP-Config: Failed to open eth0
 IP-Config: No network devices available

The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.

However, the real cause is that ravb_close() doesn't release any of the
R-Car Gen3-specific secondary IRQs.

Add the missing free_irq() calls to fix this.

Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC")
Fixes: f51bdc236b6c5835 ("ravb: Add dma queue interrupt support")
Signed-off-by: Geert Uytterhoeven 


Acked-by: Sergei Shtylyov 

MBR, Sergei



Re: [PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()

2016-05-07 Thread Sergei Shtylyov

Hello.

On 05/07/2016 02:17 PM, Geert Uytterhoeven wrote:


When reopening the network device on ra7795/salvator-x, e.g. after a
DHCP timeout:

 IP-Config: Reopening network devices...
 genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.  
(eth0:ch24:emac)
 ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac


   Er, this can't be a message from the kernel built from net.git. That 
driver used 'ndev->name'

to request both IRQs...


 IP-Config: Failed to open eth0
 IP-Config: No network devices available

The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.

However, the real cause is that ravb_close() doesn't release the R-Car
Gen3-specific secondary IRQ.

Add the missing free_irq() call to fix this.

Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC")
Signed-off-by: Geert Uytterhoeven 


Acked-by: Sergei Shtylyov 

MBR, Sergei



Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread Shmulik Ladkani
Hi David,

On Sat, 7 May 2016 08:50:49 -0600 David Ahern  wrote:
> >> +static inline
> >> +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
> >> +{
> >> +  struct net_device *master = NULL;
> >> +
> >> +  if (netif_is_l3_slave(skb->dev))
> >> +  master = netdev_master_upper_dev_get_rcu(skb->dev);
> >> +
> >> +  else if (netif_is_l3_master(skb->dev))
> >> +  master = skb->dev;
> >> +
> >> +  if (master && master->l3mdev_ops->l3mdev_l3_rcv)
> >> +  skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);  
> >
> > In the case where netif_is_l3_master(skb->dev) is true, can you explain
> > why we need to pass it through the l3mdev_l3_rcv callback again?  
> 
> what do you mean again? This is only time the l3mdev_l3_rcv method is 
> called on a packet.

You have the following:

if (netif_is_l3_slave(skb->dev))
master = netdev_master_upper_dev_get_rcu(skb->dev);
else if (netif_is_l3_master(skb->dev))
master = skb->dev;
if (master && master->l3mdev_ops->l3mdev_l3_rcv)
skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);

So in both conditions (skb->dev being the slave or the master) the skb
is passed to master's l3mdev_l3_rcv callback.

Appreciate if you can elaborate:
 - Why callback needs to be invoked when skb->dev is the L3 master?
 - On which circumstances we end up entering
   l3mdev_ip_rcv/l3mdev_ip6_rcv where skb->dev is the master?
   If I got it right, we enter 'ip_rcv_finish' on a slave device,
   the callback is invoked and eventually sets skb->dev and skb->skb_iif
   to the VRF device; then ip_rcv_finish continues processing the
   altered skb (with the changed skb->dev).
   So on which cicumstances do we enter 'ip_rcv_finish' where the
   skb->dev is ALREADY a master device?

Thanks,
Shmulik


[PATCH net] macsec: key identifier is 128 bits, not 64

2016-05-07 Thread Sabrina Dubroca
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.

IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").

Signed-off-by: Sabrina Dubroca 
Acked-by: Hannes Frederic Sowa 
---
Let's make uapi consistent now, before macsec lands in an official
kernel release.

 drivers/net/macsec.c   | 19 +--
 include/uapi/linux/if_macsec.h |  4 +++-
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/macsec.c b/drivers/net/macsec.c
index c6385617bfb2..92eaab95ae2b 100644
--- a/drivers/net/macsec.c
+++ b/drivers/net/macsec.c
@@ -85,7 +85,7 @@ struct gcm_iv {
  * @tfm: crypto struct, key storage
  */
 struct macsec_key {
-   u64 id;
+   u8 id[MACSEC_KEYID_LEN];
struct crypto_aead *tfm;
 };
 
@@ -1529,7 +1529,8 @@ static const struct nla_policy 
macsec_genl_sa_policy[NUM_MACSEC_SA_ATTR] = {
[MACSEC_SA_ATTR_AN] = { .type = NLA_U8 },
[MACSEC_SA_ATTR_ACTIVE] = { .type = NLA_U8 },
[MACSEC_SA_ATTR_PN] = { .type = NLA_U32 },
-   [MACSEC_SA_ATTR_KEYID] = { .type = NLA_U64 },
+   [MACSEC_SA_ATTR_KEYID] = { .type = NLA_BINARY,
+  .len = MACSEC_KEYID_LEN, },
[MACSEC_SA_ATTR_KEY] = { .type = NLA_BINARY,
 .len = MACSEC_MAX_KEY_LEN, },
 };
@@ -1576,6 +1577,9 @@ static bool validate_add_rxsa(struct nlattr **attrs)
return false;
}
 
+   if (nla_len(attrs[MACSEC_SA_ATTR_KEYID]) != MACSEC_KEYID_LEN)
+   return false;
+
return true;
 }
 
@@ -1641,7 +1645,7 @@ static int macsec_add_rxsa(struct sk_buff *skb, struct 
genl_info *info)
if (tb_sa[MACSEC_SA_ATTR_ACTIVE])
rx_sa->active = !!nla_get_u8(tb_sa[MACSEC_SA_ATTR_ACTIVE]);
 
-   rx_sa->key.id = nla_get_u64(tb_sa[MACSEC_SA_ATTR_KEYID]);
+   nla_memcpy(rx_sa->key.id, tb_sa[MACSEC_SA_ATTR_KEY], MACSEC_KEYID_LEN);
rx_sa->sc = rx_sc;
rcu_assign_pointer(rx_sc->sa[assoc_num], rx_sa);
 
@@ -1722,6 +1726,9 @@ static bool validate_add_txsa(struct nlattr **attrs)
return false;
}
 
+   if (nla_len(attrs[MACSEC_SA_ATTR_KEYID]) != MACSEC_KEYID_LEN)
+   return false;
+
return true;
 }
 
@@ -1777,7 +1784,7 @@ static int macsec_add_txsa(struct sk_buff *skb, struct 
genl_info *info)
return -ENOMEM;
}
 
-   tx_sa->key.id = nla_get_u64(tb_sa[MACSEC_SA_ATTR_KEYID]);
+   nla_memcpy(tx_sa->key.id, tb_sa[MACSEC_SA_ATTR_KEY], MACSEC_KEYID_LEN);
 
spin_lock_bh(&tx_sa->lock);
tx_sa->next_pn = nla_get_u32(tb_sa[MACSEC_SA_ATTR_PN]);
@@ -2318,7 +2325,7 @@ static int dump_secy(struct macsec_secy *secy, struct 
net_device *dev,
 
if (nla_put_u8(skb, MACSEC_SA_ATTR_AN, i) ||
nla_put_u32(skb, MACSEC_SA_ATTR_PN, tx_sa->next_pn) ||
-   nla_put_u64(skb, MACSEC_SA_ATTR_KEYID, tx_sa->key.id) ||
+   nla_put(skb, MACSEC_SA_ATTR_KEYID, MACSEC_KEYID_LEN, 
tx_sa->key.id) ||
nla_put_u8(skb, MACSEC_SA_ATTR_ACTIVE, tx_sa->active)) {
nla_nest_cancel(skb, txsa_nest);
nla_nest_cancel(skb, txsa_list);
@@ -2419,7 +2426,7 @@ static int dump_secy(struct macsec_secy *secy, struct 
net_device *dev,
 
if (nla_put_u8(skb, MACSEC_SA_ATTR_AN, i) ||
nla_put_u32(skb, MACSEC_SA_ATTR_PN, rx_sa->next_pn) 
||
-   nla_put_u64(skb, MACSEC_SA_ATTR_KEYID, 
rx_sa->key.id) ||
+   nla_put(skb, MACSEC_SA_ATTR_KEYID, 
MACSEC_KEYID_LEN, rx_sa->key.id) ||
nla_put_u8(skb, MACSEC_SA_ATTR_ACTIVE, 
rx_sa->active)) {
nla_nest_cancel(skb, rxsa_nest);
nla_nest_cancel(skb, rxsc_nest);
diff --git a/include/uapi/linux/if_macsec.h b/include/uapi/linux/if_macsec.h
index 4c58d9917aa4..3411ed06b9c0 100644
--- a/include/uapi/linux/if_macsec.h
+++ b/include/uapi/linux/if_macsec.h
@@ -19,6 +19,8 @@
 
 #define MACSEC_MAX_KEY_LEN 128
 
+#define MACSEC_KEYID_LEN 16
+
 #define MACSEC_DEFAULT_CIPHER_ID   0x008002000101ULL
 #define MACSEC_DEFAULT_CIPHER_ALT  0x0080C2000101ULL
 
@@ -77,7 +79,7 @@ enum macsec_sa_attrs {
MACSEC_SA_ATTR_ACTIVE, /* config/dump, u8 0..1 */
MACSEC_SA_ATTR_PN, /* config/dump, u32 */
MACSEC_SA_ATTR_KEY,/* config, data */
-   MACSEC_SA_ATTR_KEYID,  /* config/dump, u64 */
+   MACSEC_SA_ATTR_KEYID,  /* config/dump, 128-bit */
MACSEC_SA_ATTR_STATS,  /* dump, nested, macsec_sa_stats_attr */
__MACSEC_SA_ATTR_END,
NUM_MACSEC_SA_ATTR = __MACSEC_SA_ATTR_END,
-- 
2.8.2



Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations

2016-05-07 Thread David Miller
From: Ben Hutchings 
Date: Sat, 07 May 2016 13:04:46 +0100

> On Fri, 2016-05-06 at 16:12 -0700, Shrikrishna Khare wrote:
> [...]
>> +static int
>> +vmxnet3_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec)
>> +{
> [...]
>> +switch (ec->rx_coalesce_usecs) {
>> +case VMXNET3_COALESCE_DEFAULT:
>> +case VMXNET3_COALESCE_DISABLED:
>> +case VMXNET3_COALESCE_ADAPT:
>> +if (ec->tx_max_coalesced_frames ||
>> +ec->tx_max_coalesced_frames_irq ||
>> +ec->rx_max_coalesced_frames_irq) {
>> +return -EINVAL;
>> +}
>> +memset(adapter->coal_conf, 0, sizeof(*adapter->coal_conf));
>> +adapter->coal_conf->coalMode = ec->rx_coalesce_usecs;
>> +break;
>> +case VMXNET3_COALESCE_STATIC:
> [...]
> 
> I don't want to see drivers introducing magic values for fields that
> are denominated in microseconds (especially not for 0, which is the
> correct way to specify 'no coalescing').  If the current
> ethtool_coalesce structure is inadequate, propose an extension.

Agreed.


Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-07 Thread David Miller
From: Dexuan Cui 
Date: Sat, 7 May 2016 10:49:25 +

> I should be able to make 'send', 'recv' here to pointers and use vmalloc()
> to allocate the memory for them.  I will do this.

That's still unswappable kernel memory.

People can open N sockets, where N is something on the order of the FD
limit the process has, per process.  This allows someone to quickly
eat up a lot of memory and hold onto it nearly indefinitely.


Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO

2016-05-07 Thread David Ahern

On 5/7/16 2:41 AM, Shmulik Ladkani wrote:

Hi David,

On Fri,  6 May 2016 18:49:41 -0700 David Ahern  wrote:

Applications such as OSPF and BFD need the original ingress device not
the VRF device;


Would you consider this true for any IP_PKTINFO users in VRF setups?


yes. I was just giving specific examples, but certainly every app I am 
aware of wants the enslaved index. If an app pops up that wants the vrf 
index it can be derived from the enslaved index.





@@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, struct 
sk_buff *skb)
   ipv6_sk_rxinfo(sk);

if (prepare && skb_rtable(skb)) {
-   pktinfo->ipi_ifindex = inet_iif(skb);
+   /* skb->cb is overloaded: prior to this point it is IP{6}CB
+* which has interface index (iif) as the first member of the
+* underlying inet{6}_skb_parm struct. This code then overlays
+* PKTINFO_SKB_CB and in_pktinfo also has iif as the first
+* element so the iif is picked up from the prior IPCB
+*/


Better if there was a guarantee in the code that inet_skb_parm layout stays
that way. Or instead just explicitly assign the iif.


At this point inet_iif points to the vrf device so can't use it.


Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread David Ahern

On 5/7/16 2:30 AM, Shmulik Ladkani wrote:

Hi David,

On Fri,  6 May 2016 18:49:40 -0700 David Ahern  wrote:

+static bool ipv6_ndisc_frame(const struct sk_buff *skb)
+{
+   const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data;
+   size_t hlen = sizeof(*ipv6h);
+   bool rc = false;
+
+   if (ipv6h->nexthdr == NEXTHDR_ICMP) {
+   const struct icmp6hdr *icmph;
+
+   if (skb->len < hlen + sizeof(*icmph))
+   goto out;
+
+   icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h));
+   switch (icmph->icmp6_type) {


Don't we need an additional pskb_may_pull here?

If I get it right, 'ipv6_rcv' only assures sizeof(ipv6hdr) to be in the
linear header (unless it's a NEXTHDR_HOP, which is not the case here).


yes, I inadvertently dropped this:

commit 65c38aa653c14df49e19faad74bd375f36e61c57
Author: David Ahern 
Date:   Tue Feb 23 10:10:26 2016 -0800

net: vrf: Remove direct access to skb->data

when I forward ported this patch. Will fix and re-send.




+static inline
+struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
+{
+   struct net_device *master = NULL;
+
+   if (netif_is_l3_slave(skb->dev))
+   master = netdev_master_upper_dev_get_rcu(skb->dev);
+
+   else if (netif_is_l3_master(skb->dev))
+   master = skb->dev;
+
+   if (master && master->l3mdev_ops->l3mdev_l3_rcv)
+   skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);


In the case where netif_is_l3_master(skb->dev) is true, can you explain
why we need to pass it through the l3mdev_l3_rcv callback again?


what do you mean again? This is only time the l3mdev_l3_rcv method is 
called on a packet.





[iproute2 PATCH 1/1] tc: don't ignore ok as an action branch

2016-05-07 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

This is what used to happen before:

tc filter add dev tap1 parent : protocol 0xfefe prio 10 \
 u32 match u32 0 0 flowid 1:16 \
 action ife decode allow mark ok

tc -s filter ls dev tap1 parent :
filter protocol [65278] pref 10 u32
filter protocol [65278] pref 10 u32 fh 800: ht divisor 1
filter protocol [65278] pref 10 u32 fh 800::800 order 2048 key ht 800
bkt 0 flowid 1:16
  match / at 0
action order 1: ife decode action pipe
 index 2 ref 1 bind 1 installed 4 sec used 4 sec
 type: 0x0
 Metadata: allow mark
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 2: gact action pass
 random type none pass val 0
 index 1 ref 1 bind 1 installed 4 sec used 4 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Note the extra action added at the end..

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_connmark.c | 3 ++-
 tc/m_csum.c | 3 ++-
 tc/m_ife.c  | 3 ++-
 tc/m_mirred.c   | 3 ++-
 tc/m_nat.c  | 3 ++-
 tc/m_pedit.c| 3 ++-
 tc/m_skbedit.c  | 3 ++-
 tc/m_vlan.c | 3 ++-
 8 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tc/m_connmark.c b/tc/m_connmark.c
index b1c7d3a..143d75d 100644
--- a/tc/m_connmark.c
+++ b/tc/m_connmark.c
@@ -99,7 +99,8 @@ parse_connmark(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id,
sel.action = TC_ACT_UNSPEC;
argc--;
argv++;
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
sel.action = TC_ACT_OK;
argc--;
argv++;
diff --git a/tc/m_csum.c b/tc/m_csum.c
index 36181fa..fb1183a 100644
--- a/tc/m_csum.c
+++ b/tc/m_csum.c
@@ -140,7 +140,8 @@ parse_csum(struct action_util *a, int *argc_p,
sel.action = TC_ACT_UNSPEC;
argc--;
argv++;
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
sel.action = TC_ACT_OK;
argc--;
argv++;
diff --git a/tc/m_ife.c b/tc/m_ife.c
index 839e370..ed01ff7 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -167,7 +167,8 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
p.action = TC_ACT_UNSPEC;
argc--;
argv++;
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
p.action = TC_ACT_OK;
argc--;
argv++;
diff --git a/tc/m_mirred.c b/tc/m_mirred.c
index e7e69df..64aad4d 100644
--- a/tc/m_mirred.c
+++ b/tc/m_mirred.c
@@ -172,7 +172,8 @@ parse_egress(struct action_util *a, int *argc_p, char 
***argv_p,
} else if (matches(*argv, "continue") == 0) {
p.action = TC_POLICE_UNSPEC;
NEXT_ARG();
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
p.action = TC_POLICE_OK;
NEXT_ARG();
}
diff --git a/tc/m_nat.c b/tc/m_nat.c
index 4b90121..4d1b1ed 100644
--- a/tc/m_nat.c
+++ b/tc/m_nat.c
@@ -135,7 +135,8 @@ parse_nat(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id, struct
sel.action = TC_ACT_UNSPEC;
argc--;
argv++;
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
sel.action = TC_ACT_OK;
argc--;
argv++;
diff --git a/tc/m_pedit.c b/tc/m_pedit.c
index 2a94dfb..a539b68 100644
--- a/tc/m_pedit.c
+++ b/tc/m_pedit.c
@@ -495,7 +495,8 @@ parse_pedit(struct action_util *a, int *argc_p, char 
***argv_p, int tca_id, stru
} else if (matches(*argv, "continue") == 0) {
sel.sel.action = TC_ACT_UNSPEC;
NEXT_ARG();
-   } else if (matches(*argv, "pass") == 0) {
+   } else if (matches(*argv, "pass") == 0 ||
+  matches(*argv, "ok") == 0) {
sel.sel.action = TC_ACT_OK;
NEXT_ARG();
}
diff --git a/tc/m_skbedit.c 

[iproute2 PATCH v4 1/1] tc: introduce IFE action

2016-05-07 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

This action allows for a sending side to encapsulate arbitrary metadata
which is decapsulated by the receiving end.
The sender runs in encoding mode and the receiver in decode mode.
Both sender and receiver must specify the same ethertype.
At some point we hope to have a registered ethertype and we'll
then provide a default so the user doesnt have to specify it.
For now we enforce the user specify it.

Described in netdev01 paper:
   "Distributing Linux Traffic Control Classifier-Action Subsystem"
Authors: Jamal Hadi Salim and Damascene M. Joachimpillai

Also refer to IETF draft-ietf-forces-interfelfb-04.txt

Lets show example usage where we encode icmp from a sender towards
a receiver with an skbmark of 17; both sender and receiver use
ethertype of 0xdead to interop.

: Lets start with Receiver-side policy config:
xxx: add an ingress qdisc
sudo tc qdisc add dev $ETH ingress

xxx: any packets with ethertype 0xdead will be subjected to ife decoding
xxx: we then restart the classification so we can match on icmp at prio 3
sudo $TC filter add dev $ETH parent : prio 2 protocol 0xdead \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

xxx: on restarting the classification from above if it was an icmp
xxx: packet, then match it here and continue to the next rule at prio 4
xxx: which will match based on skb mark of 17
sudo tc filter add dev $ETH parent : prio 3 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue

xxx: match on skbmark of 0x11 (decimal 17) and accept
sudo tc filter add dev $ETH parent : prio 4 protocol ip \
handle 0x11 fw flowid 1:1 \
action ok

xxx: Lets show the decoding policy
sudo tc -s filter ls dev $ETH parent : protocol 0xdead
xxx:
filter pref 2 u32
filter pref 2 u32 fh 800: ht divisor 1
filter pref 2 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1  (rule hit 
0 success 0)
  match / at 0 (success 0 )
action order 1: ife decode action reclassify type 0x0
 allow mark allow prio
 index 11 ref 1 bind 1 installed 45 sec used 45 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

xxx:
Observe that above lists all metadatum it can decode. Typically these
submodules will already be compiled into a monolithic kernel or
loaded as modules

: Lets show the sender side now ..
xxx: Add an egress qdisc on the sender netdev
sudo tc qdisc add dev $ETH root handle 1: prio
xxx:
xxx: Match all icmp packets to 192.168.122.237/24, then
xxx: tag the packet with skb mark of decimal 17, then
xxx: Encode it with:
xxx:ethertype 0xdead
xxx:add skb->mark to whitelist of metadatum to send
xxx:rewrite target dst MAC address to 02:15:15:15:15:15
xxx:
sudo $TC filter add dev $ETH parent 1: protocol ip prio 10  u32 \
match ip dst 192.168.122.237/24 \
match ip protocol 1 0xff \
flowid 1:2 \
action skbedit mark 17 \
action ife encode \
type 0xDEAD \
allow mark \
dst 02:15:15:15:15:15

xxx: Lets show the encoding policy
filter pref 10 u32
filter pref 10 u32 fh 800: ht divisor 1
filter pref 10 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:2  (rule 
hit 118 success 0)
  match c0a87a00/ff00 at 16 (success 0 )
  match 0001/00ff at 8 (success 0 )
action order 1:  skbedit mark 17
 index 11 ref 1 bind 1 installed 3 sec used 3 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

action order 2: ife encode action pipe type 0xDEAD
 allow mark dst 02:15:15:15:15:15
 index 12 ref 1 bind 1 installed 3 sec used 3 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
xxx:

Now test by sending ping from sender to destination

Signed-off-by: Jamal Hadi Salim 
---
 tc/Makefile   |   1 +
 tc/m_ife.c| 341 ++
 4 files changed, 418 insertions(+)
 create mode 100644 include/linux/tc_act/tc_ife.h
 create mode 100644 include/linux/tc_ife.h
 create mode 100644 tc/m_ife.c

diff --git a/tc/Makefile b/tc/Makefile
index f5bea87..20f5110 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -43,6 +43,7 @@ TCMODULES += m_gact.o
 TCMODULES += m_mirred.o
 TCMODULES += m_nat.o
 TCMODULES += m_pedit.o
+TCMODULES += m_ife.o
 TCMODULES += m_skbedit.o
 TCMODULES += m_csum.o
 TCMODULES += m_simple.o
diff --git a/tc/m_ife.c b/tc/m_ife.c
new file mode 100644
index 000..839e370
--- /dev/null
+++ b/tc/m_ife.c
@@ -0,0 +1,341 @@
+/*
+ * m_ife.c IFE actions module
+ *
+ * This program is free software; you can distribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Auth

[net-next PATCH 1/1] export tc ife uapi header

2016-05-07 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/tc_act/Kbuild | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/linux/tc_act/Kbuild b/include/uapi/linux/tc_act/Kbuild
index 242cf0c..e3969bd 100644
--- a/include/uapi/linux/tc_act/Kbuild
+++ b/include/uapi/linux/tc_act/Kbuild
@@ -10,3 +10,4 @@ header-y += tc_skbedit.h
 header-y += tc_vlan.h
 header-y += tc_bpf.h
 header-y += tc_connmark.h
+header-y += tc_ife.h
-- 
1.9.1



Re: [iproute2 PATCH v3 1/1] tc: introduce IFE action

2016-05-07 Thread Jamal Hadi Salim

On 16-04-25 08:28 AM, Jamal Hadi Salim wrote:

On 16-04-22 01:00 PM, Stephen Hemminger wrote:

On Thu, 21 Apr 2016 17:40:14 -0400




All header files for iproute2 (in include/linux) should come from
santized kernel
headers. I do not see the file tc_ife.h in include/uapi/linux in
current net-next
kernel source tree.



I am assuming based on your other email this is taken care of?



your patch didnt make it. I have time - will take care of this.

cheers,
jamal


Re: [PATCH net-next] net: make sch_handle_ingress() drop monitor ready

2016-05-07 Thread Jamal Hadi Salim

On 16-05-06 06:55 PM, Eric Dumazet wrote:

From: Eric Dumazet 

TC_ACT_STOLEN is used when ingress traffic is mirred/redirected
to say ifb.

Packet is not dropped, but consumed.

Only TC_ACT_SHOT is a clear indication something went wrong.

Signed-off-by: Eric Dumazet 


Acked-by: Jamal Hadi Salim 


cheers,
jamal


Re: [PATCH net-next 5/7] Driver: Vmxnet3: Add support for get_coalesce, set_coalesce ethtool operations

2016-05-07 Thread Ben Hutchings
On Fri, 2016-05-06 at 16:12 -0700, Shrikrishna Khare wrote:
[...]
> +static int
> +vmxnet3_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec)
> +{
[...]
> + switch (ec->rx_coalesce_usecs) {
> + case VMXNET3_COALESCE_DEFAULT:
> + case VMXNET3_COALESCE_DISABLED:
> + case VMXNET3_COALESCE_ADAPT:
> + if (ec->tx_max_coalesced_frames ||
> + ec->tx_max_coalesced_frames_irq ||
> + ec->rx_max_coalesced_frames_irq) {
> + return -EINVAL;
> + }
> + memset(adapter->coal_conf, 0, sizeof(*adapter->coal_conf));
> + adapter->coal_conf->coalMode = ec->rx_coalesce_usecs;
> + break;
> + case VMXNET3_COALESCE_STATIC:
[...]

I don't want to see drivers introducing magic values for fields that
are denominated in microseconds (especially not for 0, which is the
correct way to specify 'no coalescing').  If the current
ethtool_coalesce structure is inadequate, propose an extension.

Ben.

-- 
Ben Hutchings
Editing code like this is akin to sticking plasters on the bleeding stump
of a severed limb. - me, 29 June 1999

signature.asc
Description: This is a digitally signed message part


Re: [PATCH 1/2] net: phy: add ethtool_phy_{get|set}_link_ksettings

2016-05-07 Thread Ben Hutchings
On Sat, 2016-05-07 at 01:18 +0200, Philippe Reynes wrote:
> The callback {get|set}_link_ksettings are often defined
> in a very close way. There are mainly two differences in
> those callback:
> - the name of the netdev private structure
> - the name of the struct phydev in the private structure
> 
> We add two defines ethtool_phy_{get|set}_link_ksettings
> to avoid writing severals times almost the same function.
[...]

I don't think there's no need to access a private structure, as there's
a phydev pointer in struct net_device.  If some drivers don't maintain
that pointer, they should be changed to do so.  Then they can 
use generic implementations of {get,set}_link_ksettings provided by
phylib.

Ben.

-- 
Ben Hutchings
Editing code like this is akin to sticking plasters on the bleeding stump
of a severed limb. - me, 29 June 1999

signature.asc
Description: This is a digitally signed message part


[PATCH v2 1/2 -net] ravb: Add missing free_irq() call to ravb_close()

2016-05-07 Thread Geert Uytterhoeven
When reopening the network device on ra7795/salvator-x, e.g. after a
DHCP timeout:

IP-Config: Reopening network devices...
genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.  
(eth0:ch24:emac)
ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac
IP-Config: Failed to open eth0
IP-Config: No network devices available

The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.

However, the real cause is that ravb_close() doesn't release the R-Car
Gen3-specific secondary IRQ.

Add the missing free_irq() call to fix this.

Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC")
Signed-off-by: Geert Uytterhoeven 
---
This version is against net.git.

v2:
  - New.
---
 drivers/net/ethernet/renesas/ravb_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 9e2a0bd8f5a88803..4277d0c12101fef7 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1506,6 +1506,8 @@ static int ravb_close(struct net_device *ndev)
priv->phydev = NULL;
}
 
+   if (priv->chip_id == RCAR_GEN3)
+   free_irq(priv->emac_irq, ndev);
free_irq(ndev->irq, ndev);
 
napi_disable(&priv->napi[RAVB_NC]);
-- 
1.9.1



[PATCH v2 2/2 -net-next] ravb: Add missing free_irq() calls to ravb_close()

2016-05-07 Thread Geert Uytterhoeven
When reopening the network device on ra7795/salvator-x, e.g. after a
DHCP timeout:

IP-Config: Reopening network devices...
genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.  (
ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac
IP-Config: Failed to open eth0
IP-Config: No network devices available

The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.

However, the real cause is that ravb_close() doesn't release any of the
R-Car Gen3-specific secondary IRQs.

Add the missing free_irq() calls to fix this.

Fixes: 22d4df8ff3a3cc72 ("ravb: Add support for r8a7795 SoC")
Fixes: f51bdc236b6c5835 ("ravb: Add dma queue interrupt support")
Signed-off-by: Geert Uytterhoeven 
---
This version is against net-next.git.

v2:
  - Clearly state which version this patch is against,
  - Add Fixes tags.
---
 drivers/net/ethernet/renesas/ravb_main.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 34066e0649f5c673..867caf6e7a5a65ad 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -1667,6 +1667,13 @@ static int ravb_close(struct net_device *ndev)
priv->phydev = NULL;
}
 
+   if (priv->chip_id != RCAR_GEN2) {
+   free_irq(priv->tx_irqs[RAVB_NC], ndev);
+   free_irq(priv->rx_irqs[RAVB_NC], ndev);
+   free_irq(priv->tx_irqs[RAVB_BE], ndev);
+   free_irq(priv->rx_irqs[RAVB_BE], ndev);
+   free_irq(priv->emac_irq, ndev);
+   }
free_irq(ndev->irq, ndev);
 
napi_disable(&priv->napi[RAVB_NC]);
-- 
1.9.1



[PATCH v2 0/2] ravb: Add missing free_irq() calls to ravb_close()

2016-05-07 Thread Geert Uytterhoeven
Hi Dave,

When reopening the network device on ra7795/salvator-x, e.g. after a
DHCP timeout:

IP-Config: Reopening network devices...
genirq: Flags mismatch irq 139.  (eth0:ch24:emac) vs.  (
ravb e680.ethernet eth0: cannot request IRQ eth0:ch24:emac
IP-Config: Failed to open eth0
IP-Config: No network devices available

The "mismatch" is due to requesting an IRQ that is already in use,
while IRQF_PROBE_SHARED wasn't set.

However, the real cause is that ravb_close() doesn't release any of the
R-Car Gen3-specific secondary IRQs. Hence the following two patches add
the missing free_irq() calls to fix this.

As the code has been changed in net-next.git to support more interrupts,
and these weren't freed neither, I'm sending two patches:
  1. The first patch applies to net.git,
  2. The second patch applies to net-next.git.

If you prefer the second patch to be replaced by a version that applies
to net-next.git after you will have applied the first patch to net.git,
and merged net.git into net-next.git, just let me know.

Chances compared to the previous version:
  - Add version against net.git,
  - Clearly state which version the second patch is against,
  - Add Fixes tags.

Thanks!

Geert Uytterhoeven (1):
  ravb: Add missing free_irq() call to ravb_close()
  ravb: Add missing free_irq() calls to ravb_close()

 drivers/net/ethernet/renesas/ravb_main.c   | 9 +++
 1 files changed, 9 insertions(+)

-- 
1.9.1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-07 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Saturday, May 7, 2016 1:04
> To: Dexuan Cui 
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan ; Haiyang Zhang ;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui 
> Date: Wed,  4 May 2016 09:56:57 -0700
> 
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ
>   VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
>  ...
> > +struct hvsock_sock {
>  ...
> > +   /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must
> > +* be consecutive: see hvsock_send_data() and hvsock_recv_data().
> > +*/
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[HVSOCK_SND_BUF_SZ];
> > +   } send;
> > +
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[HVSOCK_RCV_BUF_SZ];
> > +
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +   } recv;
> 
> I don't think allocating 5 pages of unswappable memory for every Hyper-V
> socket
> created is reasonable.

Thanks for the comment, David!

I should be able to make 'send', 'recv' here to pointers and use vmalloc()
to allocate the memory for them.  I will do this.

Thanks,
-- Dexuan


Re: OpenWRT wrong adjustment of fq_codel defaults (Was: [Codel] fq_codel_drop vs a udp flood)

2016-05-07 Thread Kevin Darbyshire-Bryant


On 06/05/16 10:42, Jesper Dangaard Brouer wrote:
> Hi Felix,
>
> This is an important fix for OpenWRT, please read!
>
> OpenWRT changed the default fq_codel sch->limit from 10240 to 1024,
> without also adjusting q->flows_cnt.  Eric explains below that you must
> also adjust the buckets (q->flows_cnt) for this not to break. (Just
> adjust it to 128)
>
> Problematic OpenWRT commit in question:
>  http://git.openwrt.org/?p=openwrt.git;a=patch;h=12cd6578084e
>  12cd6578084e ("kernel: revert fq_codel quantum override to prevent it from 
> causing too much cpu load with higher speed (#21326)")
I 'pull requested' this to the lede-staging tree on github.
https://github.com/lede-project/staging/pull/11

One way or another Felix & co should see the change :-)
>
>
> I also highly recommend you cherry-pick this very recent commit:
>  net-next: 9d18562a2278 ("fq_codel: add batch ability to fq_codel_drop()")
>  https://git.kernel.org/davem/net-next/c/9d18562a227
>
> This should fix very high CPU usage in-case fq_codel goes into drop mode.
> The problem is that drop mode was considered rare, and implementation
> wise it was chosen to be more expensive (to save cycles on normal mode).
> Unfortunately is it easy to trigger with an UDP flood. Drop mode is
> especially expensive for smaller devices, as it scans a 4K big array,
> thus 64 cache misses for small devices!
>
> The fix is to allow drop-mode to bulk-drop more packets when entering
> drop-mode (default 64 bulk drop).  That way we don't suddenly
> experience a significantly higher processing cost per packet, but
> instead can amortize this.
I haven't done the above cherry-pick patch & backport patch creation for
4.4/4.1/3.18 yet - maybe if $dayjob permits time and no one else beats
me to it :-)

Kevin



smime.p7s
Description: S/MIME Cryptographic Signature


[PATCH net v4] vlan: Propagate MAC address to VLANs

2016-05-07 Thread Mike Manning
The MAC address of the physical interface is only copied to the VLAN
when it is first created, resulting in an inconsistency after MAC
address changes of only newly created VLANs having an up-to-date MAC.

The VLANs should continue inheriting the MAC address of the physical
interface until the VLAN MAC address is explicitly set to any value. 
This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
to the MAC of the physical interface and thus for DAD to behave as
expected.

Signed-off-by: Mike Manning 
---
 net/8021q/vlan.c |7 +++
 net/8021q/vlan_dev.c |   14 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -291,6 +291,12 @@ static void vlan_sync_address(struct net
if (ether_addr_equal(vlan->real_dev_addr, dev->dev_addr))
return;
 
+   /* vlan continues to inherit address of parent interface */
+   if (vlandev->addr_assign_type == NET_ADDR_STOLEN) {
+   ether_addr_copy(vlandev->dev_addr, dev->dev_addr);
+   goto out;
+   }
+
/* vlan address was different from the old address and is equal to
 * the new address */
if (!ether_addr_equal(vlandev->dev_addr, vlan->real_dev_addr) &&
@@ -303,6 +309,7 @@ static void vlan_sync_address(struct net
!ether_addr_equal(vlandev->dev_addr, dev->dev_addr))
dev_uc_add(dev, vlandev->dev_addr);
 
+out:
ether_addr_copy(vlan->real_dev_addr, dev->dev_addr);
 }
 
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -255,9 +255,13 @@ static int vlan_dev_open(struct net_devi
return -ENETDOWN;
 
if (!ether_addr_equal(dev->dev_addr, real_dev->dev_addr)) {
-   err = dev_uc_add(real_dev, dev->dev_addr);
-   if (err < 0)
-   goto out;
+   if (dev->addr_assign_type == NET_ADDR_STOLEN) {
+   ether_addr_copy(dev->dev_addr, real_dev->dev_addr);
+   } else {
+   err = dev_uc_add(real_dev, dev->dev_addr);
+   if (err < 0)
+   goto out;
+   }
}
 
if (dev->flags & IFF_ALLMULTI) {
@@ -558,8 +562,10 @@ static int vlan_dev_init(struct net_devi
/* ipv6 shared card related stuff */
dev->dev_id = real_dev->dev_id;
 
-   if (is_zero_ether_addr(dev->dev_addr))
+   if (is_zero_ether_addr(dev->dev_addr)) {
eth_hw_addr_inherit(dev, real_dev);
+   dev->addr_assign_type = NET_ADDR_STOLEN;
+   }
if (is_zero_ether_addr(dev->broadcast))
memcpy(dev->broadcast, real_dev->broadcast, dev->addr_len);

-- 
1.7.10.4

























Re: [PATCH net v3] vlan: Propagate MAC address to VLANs

2016-05-07 Thread Mike Manning
On 05/06/2016 08:48 PM, Alexander Duyck wrote:
> On Fri, May 6, 2016 at 12:36 PM, Mike Manning  wrote:
>> On 05/06/2016 06:02 PM, Alexander Duyck wrote:
>>> On Fri, May 6, 2016 at 6:26 AM, Mike Manning  wrote:
 The MAC address of the physical interface is only copied to the VLAN
 when it is first created, resulting in an inconsistency after MAC
 address changes of only newly created VLANs having an up-to-date MAC.

 The VLANs should continue inheriting the MAC address of the physical
 interface, unless explicitly changed to be different from this.
 This allows IPv6 EUI64 addresses for the VLAN to reflect any changes
 to the MAC of the physical interface and thus for DAD to behave as
 expected.

 Signed-off-by: Mike Manning 
 ---
  include/linux/if_vlan.h |2 ++
  net/8021q/vlan.c|   17 +++--
  net/8021q/vlan_dev.c|   13 ++---
  3 files changed, 23 insertions(+), 9 deletions(-)

 --- a/include/linux/if_vlan.h
 +++ b/include/linux/if_vlan.h
 @@ -138,6 +138,7 @@ struct netpoll;
   * @flags: device flags
   * @real_dev: underlying netdevice
   * @real_dev_addr: address of underlying netdevice
 + * @addr_assign_type: address assignment type
   * @dent: proc dir entry
   * @vlan_pcpu_stats: ptr to percpu rx stats
   */
 @@ -153,6 +154,7 @@ struct vlan_dev_priv {

 struct net_device   *real_dev;
 unsigned char   real_dev_addr[ETH_ALEN];
 +   unsigned char   addr_assign_type;

 struct proc_dir_entry   *dent;
 struct vlan_pcpu_stats __percpu *vlan_pcpu_stats;
>>>
>>> Please don't start adding new members to structures when it already
>>> exists in the net_device.  If anything you should be able to drop
>>> read_dev_addr if you do this correctly because you shouldn't need to
>>> clone the lower dev address to watch for changes.  All you will need
>>> to do is watch NET_ADDR_STOLEN.
>>>
>>
>> Thanks for the detailed review. I had initially used the existing type
>> in net_device, but the problem with this was that it got overwritten to
>> NET_ADDR_SET in dev_set_mac_address(), which I was reluctant to modify.
>> It would just be a case of setting the type earlier in that function
>> (and caching the previous value in case there is an error).
>>
>> However, based on your later comment, it seems I should not bother with
>> the approach I have here, namely that if the VLAN MAC is set to the same
>> value as that of the lower device MAC, that is to be considered as
>> resetting it and thus for MAC inheritance to resume. Instead, I will just
>> make this a 1-shot transition, i.e. the VLAN MAC starts off as inherited,
>> and if it is set to anything (even the value of the lower device MAC),
>> inheritance is stopped. I agree this makes for a far simpler changeset.
>>
>> I don't think I can remove real_dev_addr, as that is still needed for
>> the existing functionality in vlan_sync_address() to determine if the sync
>> should be done, also as a way of caching it for handling in vlan_dev_open().
> 
> The thing is that logic isn't really needed anymore though if you are
> going to be following the lower dev.  If you follow the code what it
> is doing is adding the address via dev_uc_add if the lower address
> moves away from the VLAN address.  With your changes you are updating
> the VLAN MAC address to the lower value in the NET_ADDR_STOLEN case so
> you don't need to add or remove an extra unicast address.  If the user
> sets the MAC address you can then use the vlandev->dev_addr as the
> address you add/remove from the unicast list and you probably don't
> need to bother with tracking the lower device state anyway.
>

I agree that this logic is not needed at all for the NET_ADDR_STOLEN case.
However, once the VLAN MAC has been explicitly set, the situation reverts
to the existing functionality, whereby real_dev_addr is used to ensure that
dev_uc_add/del are not incorrectly called multiple times for the same MACs.
As an example, if the lower device MAC is different from the VLAN MAC and
is repeatedly set to the same value, then without this check on
real_dev_addr, the refcnt for the VLAN MAC would keep getting incremented.
The checks on the lower device MAC need to remain in place so as to call
dev_uc_add/del if this is changed to be different/same as the VLAN MAC.
For this reason, I have left this logic unchanged. Also, to ensure that
the value of real_dev_addr is correct on the transition to NET_ADDR_SET,
I keep this up-to-date (the alternative would be to add some code in
vlan_dev_set_mac_address() to copy dev_addr to real_dev_addr if the type on
entry is NET_ADDR_STOLEN).

>> As a matter of interest, what is the advantage of not updating the VLAN
>> MAC when it is down? I appreciate that one should 

Re: A couple of questions about the SKB fragments

2016-05-07 Thread Ilya Matveychikov
2016-05-05 14:13 GMT+03:00 Edward Cree :
> On 05/05/16 08:40, Ilya Matveychikov wrote:
> > Is there any docs except the kernel sources itself to refer to?

> davem has some docs up at http://vger.kernel.org/~davem/skb.html and
> http://vger.kernel.org/~davem/skb_data.html
> In particular note the following:
> "The frag_list is used to maintain a chain of SKBs organized for
>  fragmentation purposes, it is _not_ used for maintaining paged data."
> So my reading would suggest there is no way to multiple-layer-fragment
> an SKB; the frags are page pointers and offsets, not entire sk_buff
> structs in their own right.
>

Seems that the docs is slightly outdated. I think the structure of SKB does
not impose any restrictions on the nesting of the fragments. But is there
any of them in the kernel's code?


Re: [PATCH net] netfilter: nf_conntrack: Use net_mutex for helper unregistration.

2016-05-07 Thread Florian Westphal
Joe Stringer  wrote:
> > If so, probably I can append this as comment to this function so we
> > don't forget. If we ever have .exit callbacks (I don't expect so), we
> > would need to wait for worker completion.
> 
> Sounds reasonable to me.
> 
> I see there's a bunch of other unregister locations like
> nf_nat_l3proto_clean(), nf_nat_l4proto_clean(), nf_unregister_hook()
> which might need similar treatment?

I think they are fine, hook entries are duplicated per netns so we
should not access data in a removed module.

However, we might be able to trigger the

WARN(1, "nf_unregister_net_hook: hook not found!\n");

part in nf_unregister_net_hook():

[ destroy netns -> destruction queued -> rmmod -> all hooks are
destroyed -> netns workq runs -> nf_unregister_net_hook gets called
-> hook already gone ]

For nf_nat_l3|4proto_clean I don't see a problem either, if netns
is gone all these conntracks will be zapped once the workqueue runs, even if
the iteration in those function did not see the netns anymore.

Cheers,
Florian


Re: [PATCH net-next 2/2] net: original ingress device index in PKTINFO

2016-05-07 Thread Shmulik Ladkani
Hi David,

On Fri,  6 May 2016 18:49:41 -0700 David Ahern  wrote:
> Applications such as OSPF and BFD need the original ingress device not
> the VRF device;

Would you consider this true for any IP_PKTINFO users in VRF setups?

> @@ -1193,7 +1193,12 @@ void ipv4_pktinfo_prepare(const struct sock *sk, 
> struct sk_buff *skb)
>  ipv6_sk_rxinfo(sk);
>  
>   if (prepare && skb_rtable(skb)) {
> - pktinfo->ipi_ifindex = inet_iif(skb);
> + /* skb->cb is overloaded: prior to this point it is IP{6}CB
> +  * which has interface index (iif) as the first member of the
> +  * underlying inet{6}_skb_parm struct. This code then overlays
> +  * PKTINFO_SKB_CB and in_pktinfo also has iif as the first
> +  * element so the iif is picked up from the prior IPCB
> +  */

Better if there was a guarantee in the code that inet_skb_parm layout stays
that way. Or instead just explicitly assign the iif.

Regards,
Shmulik


Re: rtk8168 driver help needed

2016-05-07 Thread Francois Romieu
Murali Karicheri  :
[...]
> I am trying to integrate the rtl8168 PCIe card to have Ethernet functional
> on my Keystone EVM.

Which (EVM) one ?

> I purchased the rtl8111c Gib card from Amazon. The Card is detected
> by the RC and I can see it is enumerated and show up when doing lspci command.

What does "the RC" mean ? A different PC ?

> However I can't get the Ethernet port functional.

> Does this need MSI interrupt ?

No.

> I can't see it has requested any.

Yes, something went really, really wrong. See below.

[...]
> [2.303965] PCI host bridge /soc/pcie@2180 ranges:
> [2.309108]   No bus range found for /soc/pcie@2180, using [bus 00-ff]
> [2.316269]IO 0x2325..0x23253fff -> 0x
> [2.321499]   MEM 0x5000..0x5fff -> 0x5000
> [2.331666] keystone-pcie 21801000.pcie: PCI host bridge to bus :00
> [2.338283] pci_bus :00: root bus resource [bus 00-ff]
> [2.343937] pci_bus :00: root bus resource [io  0x-0x3fff]
> [2.350114] pci_bus :00: root bus resource [mem 0x5000-0x5fff]
> [2.357095] pci :00:00.0: [104c:b00b] type 01 class 0x060400
> [2.357665] PCI: bus0: Fast back to back transfers disabled
> [2.363717] pci :01:00.0: [10ec:8168] type 00 class 0x02
> [2.363809] pci :01:00.0: reg 0x10: [io  0x-0x00ff]
> [2.363867] pci :01:00.0: reg 0x18: [mem 0x-0x0fff 64bit]
> [2.363909] pci :01:00.0: reg 0x20: [mem 0x-0x 64bit 
> pref]
> [2.363939] pci :01:00.0: reg 0x30: [mem 0x-0x0001 pref]
> [2.364099] pci :01:00.0: supports D1 D2
> [2.364116] pci :01:00.0: PME# supported from D0 D1 D2 D3hot D3cold
> [2.381251] PCI: bus1: Fast back to back transfers disabled
> [2.386989] pci :00:00.0: BAR 8: assigned [mem 0x5000-0x500f]
> [2.393937] pci :00:00.0: BAR 9: assigned [mem 0x5010-0x501f 
> pref]
> [2.401221] pci :00:00.0: BAR 7: assigned [io  0x1000-0x1fff]
> [2.407320] pci :01:00.0: BAR 6: assigned [mem 0x5010-0x5011 
> pref]
> [2.414597] pci :01:00.0: BAR 4: assigned [mem 0x5012-0x5012 
> 64bit pref]
> [2.422380] pci :01:00.0: BAR 2: assigned [mem 0x5000-0x5fff 
> 64bit]
> [2.429702] pci :01:00.0: BAR 0: assigned [io  0x1000-0x10ff]
> [2.435821] pci :00:00.0: PCI bridge to [bus 01]
> [2.440783] pci :00:00.0:   bridge window [io  0x1000-0x1fff]
> [2.446896] pci :00:00.0:   bridge window [mem 0x5000-0x500f]
> [2.453699] pci :00:00.0:   bridge window [mem 0x5010-0x501f 
> pref]
> [2.461453] pcieport :00:00.0: Signaling PME through PCIe PME interrupt
> [2.468411] pci :01:00.0: Signaling PME through PCIe PME interrupt
> [2.475075] pcie_pme :00:00.0:pcie01: service driver pcie_pme loaded
> [2.475392] aer :00:00.0:pcie02: service driver aer loaded
> [2.475652] r8169 Gigabit Ethernet driver 2.3LK-NAPI loaded
> [2.481419] r8169 :01:00.0: enabling device (0140 -> 0143)
> [2.488865] r8169 :01:00.0 eth0: RTL8169 at 0xf0d6a000, 
> 00:00:00:00:00:00, XID  IRQ 286

No need to go further, there is a serious problem here.

Most of the XID bits are read from a mapped register. They're definitely not
expected to be null as the driver requires it to identify a proper chipset
(the common PCI configuration registers only tell that you aren't dealing
with a coffee machine).

Any read returning zeroes would not surprize me.

[...]
> Can someone help me figure out what is missing ?

Hardly at this point. I can only suggest to switch power off, plug
the 8168 device in a x86 (32 or 64 bits) system and check how it behaves.

-- 
Ueimor


Re: [PATCH net-next v2 1/2] net: l3mdev: Add hook in ip and ipv6

2016-05-07 Thread Shmulik Ladkani
Hi David,

On Fri,  6 May 2016 18:49:40 -0700 David Ahern  wrote:
> +static bool ipv6_ndisc_frame(const struct sk_buff *skb)
> +{
> + const struct ipv6hdr *ipv6h = (struct ipv6hdr *)skb->data;
> + size_t hlen = sizeof(*ipv6h);
> + bool rc = false;
> +
> + if (ipv6h->nexthdr == NEXTHDR_ICMP) {
> + const struct icmp6hdr *icmph;
> +
> + if (skb->len < hlen + sizeof(*icmph))
> + goto out;
> +
> + icmph = (struct icmp6hdr *)(skb->data + sizeof(*ipv6h));
> + switch (icmph->icmp6_type) {

Don't we need an additional pskb_may_pull here?

If I get it right, 'ipv6_rcv' only assures sizeof(ipv6hdr) to be in the
linear header (unless it's a NEXTHDR_HOP, which is not the case here).

> +static inline
> +struct sk_buff *l3mdev_l3_rcv(struct sk_buff *skb, u16 proto)
> +{
> + struct net_device *master = NULL;
> +
> + if (netif_is_l3_slave(skb->dev))
> + master = netdev_master_upper_dev_get_rcu(skb->dev);
> +
> + else if (netif_is_l3_master(skb->dev))
> + master = skb->dev;
> +
> + if (master && master->l3mdev_ops->l3mdev_l3_rcv)
> + skb = master->l3mdev_ops->l3mdev_l3_rcv(master, skb, proto);

In the case where netif_is_l3_master(skb->dev) is true, can you explain
why we need to pass it through the l3mdev_l3_rcv callback again?

Regards,
Shmulik


Re: [PATCH RFC 1/5] net: phy: sun8i-h3-ephy: Add bindings for Allwinner H3 Ethernet PHY

2016-05-07 Thread Hans de Goede

Hi,

On 07-05-16 07:30, Chen-Yu Tsai wrote:

Hi,

On Tue, Apr 12, 2016 at 9:38 AM, Chen-Yu Tsai  wrote:

On Tue, Apr 12, 2016 at 3:23 AM, Florian Fainelli  wrote:

On 04/04/16 09:22, Chen-Yu Tsai wrote:

The Allwinner H3 SoC incorporates an Ethernet PHY. This is enabled and
configured through a memory mapped hardware register.

This same register also configures the MAC interface mode and TX clock
source. Also covered by the register, but not supported in these bindings,
are TX/RX clock delay chains and inverters, and an RMII module.

Signed-off-by: Chen-Yu Tsai 
---
 .../bindings/net/allwinner,sun8i-h3-ephy.txt   | 44 ++
 1 file changed, 44 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt

diff --git a/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt 
b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
new file mode 100644
index ..146f227e6d88
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/allwinner,sun8i-h3-ephy.txt
@@ -0,0 +1,44 @@
+* Allwinner H3 E(thernet) PHY
+
+The Allwinner H3 integrates an MII ethernet PHY. As with external PHYs,
+before it can be configured over the MDIO bus and used, certain hardware
+features must be configured, such as the PHY address and LED polarity.


Is the internal PHY address really configurable? Not that there is
anything wrong with it, this is good.


It is. Things that are configured or provided to a discrete PHY are routed
to registers in the SoC, things such as PHY address, clocks, resets.


+The PHY must also be powered on and brought out of reset.
+
+This is accomplished with regulators and pull-up/downs for external PHYs.
+For this internal PHY, a hardware register is programmed.
+
+The same hardware register also contains clock and interface controls
+for the MAC. This is also present in earlier SoCs, and is covered by
+"allwinner,sun7i-a20-gmac-clk". The controls in the H3 are slightly
+different due to the inclusion of what appears to be an RMII-MII
+bridge.
+
+Required properties:
+- compatible: should be "allwinner,sun8i-h3-ephy"
+- reg: address and length of the register set for the device
+- clocks: A phandle to the reference clock for this device
+- resets: A phandle to the reset control for this device
+
+Ethernet PHY related properties:
+- allwinner,ephy-addr: the MDIO bus address the PHY should respond to.
+If this is not set, the external PHY is used, and
+everything else in this section is ignored.


So we are going to put the same value here, and in the actual Ethernet
PHY device tree node that should be hanging off the EMAC/MDIO bus
controller, this is confusing and error prone.


Yes, that would be an issue when writing the DTS.



+- allwinner,leds-active-low: LEDs are active low. Without this, LEDs are
+  active high.
+
+Ethernet MAC clock related properties:
+- #clock-cells: should be 0
+- clock-output-names: "mac_tx"
+
+Example:
+
+ethernet-phy@01c00030 {
+ compatible = "allwinner,sun8i-h3-ephy";
+ reg = <0x01c00030 0x4>;


Looking at this register space this looks more like an internal PHY SHIM
that is needed to be configured before the internal PHY can be access,
not a proper Ethernet PHY per-se, see replies in aptch 2.

Should not this block be a second cell associated with the Ethernet MAC
block? One or the other are not going to be very useful without
knowledge of each other.


True. However the lower half of the same register also controls the
MAC interface mode and TX clock source and delays. This we had a clock
driver that was used in conjuction with stmmac on earlier SoCs. I was
hoping to keep that model with the newer EMAC. At the time it was
argued that what seemed like a clock should be handled by a clock
driver, instead of just a "syscon". If this is reaching too far to
handle this new use case, I will happily just provide patches to merge
this into the MAC.


Maxime, Hans, any thoughts?

It seems like it'd be easier to just fold this into the EMAC driver.
The register is not part of the clock controller in these new SoCs,
so it's nicer than what we had in A20/A31. It's also not just a clock
control, but a bunch of various controls.


I don't know the emac stuff well enough to really have an opinion,
but with that said I've no objections against just folding this into
the driver. If you believe that is best, go for it.

Regards,

Hans


Re: [PATCH net] macvtap: segmented packet is consumed

2016-05-07 Thread Shmulik Ladkani
Hi,

On Fri, 06 May 2016 05:58:21 -0700 Eric Dumazet  wrote:
> From: Eric Dumazet 
> 
> If GSO packet is segmented and its segments are properly queued,
> we call consume_skb() instead of kfree_skb() to be drop monitor
> friendly.
> 
> Fixes: 3e4f8b7873709 ("macvtap: Perform GSO on forwarding path.")
> Signed-off-by: Eric Dumazet 
> Cc: Vlad Yasevich 

Reviewed-by: Shmulik Ladkani