Re: [GIT PULL net-next] rxrpc: Improve conn/call lookup and fix call number generation [ver #3]

2016-07-08 Thread David Howells
David Miller  wrote:

> > Can you pull this into net-next please?
> 
> I'll pull, but this is not how I want you to operate.
> 
> If you change stuff, you must repost the entire series.  And this is
> one of many reasons I want people to keep patch sets small, so that
> this is less painful.
> 
> But the whole series repost is absolutely required.
> 
> I don't care if you just add a "." to the end of a sentence, I want to
> see the series reposted for review in its entirety.

That's why I asked if you'd prefer me to post the whole lot instead.  I
checked the netdev FAQ, but it doesn't express your preference.

David


Re: [PATCH] sctp: fix panic when sending auth chunks

2016-07-08 Thread David Miller
From: Marcelo Ricardo Leitner 
Date: Thu,  7 Jul 2016 09:39:29 -0300

> When we introduced GSO support, if auth was in use the auth chunk was
> left queued on the packet even after the final segment was generated.
> Later, sctp_transmit_packet calls sctp_packet_reset, which zeroed the
> packet len without accounting for this left-over. This caused more
> space to be used in the next packet due to the chunk still being queued,
> but that space was never allocated, as its size was not accounted for.
> 
> The fix is to only queue it back when we know that we are going to
> generate another segment.
> 
> Fixes: 90017accff61 ("sctp: Add GSO support")
> Signed-off-by: Marcelo Ricardo Leitner 

Applied to net-next.

Please make the target tree for your patch explicit in future
submissions.

Thanks.


Re: [patch -next] bnxt: fix a condition

2016-07-08 Thread David Miller
From: Dan Carpenter 
Date: Thu, 7 Jul 2016 11:23:09 +0300

> This code generates a static checker warning because htons(ETH_P_IPV6)
> is always true.  From the context it looks like the && was intended to
> be !=.
> 
> Fixes: 94758f8de037 ('bnxt_en: Add GRO logic for BCM5731X chips.')
> Signed-off-by: Dan Carpenter 

Applied.


Re: [PATCH net-next] bpf: introduce bpf_get_current_task() helper

2016-07-08 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 6 Jul 2016 22:38:36 -0700

> Over time there were multiple requests to access different data
> structures and fields of task_struct 'current', so finally add a
> helper to access 'current' as-is. Tracing bpf programs will do the
> rest of the pointer walking via bpf_probe_read().
> Note that current can be NULL and the bpf program has to deal with
> that, but even naively passing NULL into bpf_probe_read() is still safe.
> 
> Suggested-by: Brendan Gregg 
> Signed-off-by: Alexei Starovoitov 
> Acked-by: Daniel Borkmann 

Applied.


Re: [PATCH net-next] net: dsa: initialize the routing table

2016-07-08 Thread David Miller
From: Vivien Didelot 
Date: Wed,  6 Jul 2016 20:03:54 -0400

> The routing table of every switch in a tree is currently initialized to
> all zeros. This is an issue since 0 is a valid port number.
> 
> Add a DSA_RTABLE_NONE=-1 constant to initialize the signed values of the
> routing table pointing to other switches.
> 
> This fixes the device mapping of the mv88e6xxx driver where the port
> pointing to the switch itself and to non-existent switches was wrongly
> configured to be 0. It is now set to the expected 0xf value.
> 
> Signed-off-by: Vivien Didelot 

Applied.


[PATCH v2 net-next] rtnl: Add GFP flag argument to rtnl_unicast()

2016-07-08 Thread Masashi Honma
This commit extends rtnl_unicast() to take a GFP flags argument.

This commit depends on Eric Dumazet's commits below.
ipv4: do not abuse GFP_ATOMIC in inet_netconf_notify_devconf()
ipv6: do not abuse GFP_ATOMIC in inet6_netconf_notify_devconf()

Signed-off-by: Masashi Honma 
---
 include/linux/netlink.h   |  2 ++
 include/linux/rtnetlink.h |  3 ++-
 net/core/net_namespace.c  |  2 +-
 net/core/rtnetlink.c  | 16 +++-
 net/dcb/dcbnl.c   |  2 +-
 net/decnet/dn_route.c |  3 ++-
 net/ipv4/devinet.c|  2 +-
 net/ipv4/ipmr.c   |  6 --
 net/ipv4/route.c  |  2 +-
 net/ipv6/addrconf.c   |  4 ++--
 net/ipv6/addrlabel.c  |  2 +-
 net/ipv6/ip6mr.c  |  6 --
 net/ipv6/route.c  |  2 +-
 net/netlink/af_netlink.c  | 12 +---
 net/sched/act_api.c   |  2 +-
 15 files changed, 43 insertions(+), 23 deletions(-)

diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index da14ab6..5b8e34d 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -69,6 +69,8 @@ extern void __netlink_clear_multicast_users(struct sock *sk, unsigned int group)
 extern void netlink_ack(struct sk_buff *in_skb, struct nlmsghdr *nlh, int err);
 extern int netlink_has_listeners(struct sock *sk, unsigned int group);
 
+extern int __netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 portid,
+int nonblock, gfp_t allocation);
 extern int netlink_unicast(struct sock *ssk, struct sk_buff *skb, __u32 portid, int nonblock);
 extern int netlink_broadcast(struct sock *ssk, struct sk_buff *skb, __u32 portid,
 __u32 group, gfp_t allocation);
diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2daece8..132730f 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -8,7 +8,8 @@
 #include 
 
 extern int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, u32 group, int echo);
-extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid);
+extern int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid,
+   gfp_t flags);
 extern void rtnl_notify(struct sk_buff *skb, struct net *net, u32 pid,
u32 group, struct nlmsghdr *nlh, gfp_t flags);
 extern void rtnl_set_sk_err(struct net *net, u32 group, int error);
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 2c2eb1b..28eed58 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -646,7 +646,7 @@ static int rtnl_net_getid(struct sk_buff *skb, struct nlmsghdr *nlh)
if (err < 0)
goto err_out;
 
-   err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(msg, net, NETLINK_CB(skb).portid, GFP_KERNEL);
goto out;
 
 err_out:
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eb49ca2..42e6c5c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,11 +653,15 @@ int rtnetlink_send(struct sk_buff *skb, struct net *net, u32 pid, unsigned int g
return err;
 }
 
-int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid)
+int rtnl_unicast(struct sk_buff *skb, struct net *net, u32 pid, gfp_t flags)
 {
-   struct sock *rtnl = net->rtnl;
+   int err;
 
-   return nlmsg_unicast(rtnl, skb, pid);
+   err = __netlink_unicast(net->rtnl, skb, pid, MSG_DONTWAIT, flags);
+   if (err > 0)
+   err = 0;
+
+   return err;
 }
 EXPORT_SYMBOL(rtnl_unicast);
 
@@ -2565,7 +2569,8 @@ static int rtnl_getlink(struct sk_buff *skb, struct nlmsghdr* nlh)
WARN_ON(err == -EMSGSIZE);
kfree_skb(nskb);
} else
-   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid,
+  GFP_KERNEL);
 
return err;
 }
@@ -3601,7 +3606,8 @@ static int rtnl_stats_get(struct sk_buff *skb, struct nlmsghdr *nlh)
WARN_ON(err == -EMSGSIZE);
kfree_skb(nskb);
} else {
-   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid);
+   err = rtnl_unicast(nskb, net, NETLINK_CB(skb).portid,
+  GFP_KERNEL);
}
 
return err;
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index 4f6c186..e4de9fe 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -1749,7 +1749,7 @@ static int dcb_doit(struct sk_buff *skb, struct nlmsghdr *nlh)
 
nlmsg_end(reply_skb, reply_nlh);
 
-   ret = rtnl_unicast(reply_skb, net, portid);
+   ret = rtnl_unicast(reply_skb, net, portid, GFP_KERNEL);
 out:
return ret;
 }
diff --git a/net/decnet/dn_route.c b/net/decnet/dn_route.c
index b1dc096..6fe02bb 100644
--- a/net/decnet/dn_route.c
+++ b/net/decnet/dn_route.c
@@ -1714,7 +1714,8 @@ static int dn_cache_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh)

Re: [PATCH net-next] tun: Don't assume type tun in tun_device_event

2016-07-08 Thread David Miller
From: Craig Gallek 
Date: Wed,  6 Jul 2016 18:44:20 -0400

> From: Craig Gallek 
> 
> The referenced change added a netlink notifier for processing
> device queue size events.  These events are fired for all devices
> but the registered callback assumed they only occurred for tun
> devices.  This fix adds a check (borrowed from macvtap.c) to discard
> non-tun device events.
> 
> For reference, this fixes the following splat:
 ...
> Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
> Signed-off-by: Craig Gallek 

Applied, thanks.


Re: pull-request: mac80211 2016-07-06

2016-07-08 Thread David Miller
From: Johannes Berg 
Date: Wed,  6 Jul 2016 14:27:43 +0200

> People found two more bugs, so I have two more fixes. Both are related to
> memory - one's a leak, the other is a missing allocation failure check.
> 
> These are both tagged for stable already, and shouldn't conflict with any
> other patches, so if they won't go in any more it won't be a big deal.
> 
> Let me know if there's any problem.

Pulled, thanks.


Re: [RFC 0/7] netlink: Add allocation flag to netlink_unicast()

2016-07-08 Thread Masashi Honma
On 2016-07-06 09:28, Masashi Honma wrote:
> Though netlink_broadcast() ...

Thanks to David Miller, Eric Dumazet, and David Teigland for their replies.

Based on their comments, only rtnl_unicast() looks like it needs a gfp
flag argument, so I will drop all of the patches except 0005.

I will send patch v2 to a more limited set of recipients.



Re: [RFC 0/7] netlink: Add allocation flag to netlink_unicast()

2016-07-08 Thread Masashi Honma

On 2016-07-09 01:08, David Teigland wrote:

On Thu, Jul 07, 2016 at 09:35:45AM +0900, Masashi Honma wrote:

In fs/dlm/netlink.c, dlm_timeout_warn() has prepare_data() allocate a
buffer with GFP_NOFS, and send_data() then sends that buffer.

But send_data() uses GFP_KERNEL or GFP_ATOMIC internally.
Should those be replaced by GFP_NOFS?

That's old code that's never been used so it doesn't really matter.


I see. Thank you.



Re: [GIT PULL net-next] rxrpc: Improve conn/call lookup and fix call number generation [ver #3]

2016-07-08 Thread David Miller
From: David Howells 
Date: Wed, 06 Jul 2016 11:48:15 +0100

> Hi Dave,
> 
> Can you pull this into net-next please?

I'll pull, but this is not how I want you to operate.

If you change stuff, you must repost the entire series.  And this is
one of many reasons I want people to keep patch sets small, so that
this is less painful.

But the whole series repost is absolutely required.

I don't care if you just add a "." to the end of a sentence, I want to
see the series reposted for review in its entirety.

Thanks.


Re: [PATCH net] r8152: remove the setting of LAN_WAKE_EN

2016-07-08 Thread David Miller
From: Hayes Wang 
Date: Wed, 6 Jul 2016 17:03:29 +0800

> The LAN_WAKE_EN bit is not used to determine whether the device supports
> WOL. It is used to signal a GPIO pin when a WOL event occurs. WOL
> still works even when LAN_WAKE_EN is disabled.
> 
> Signed-off-by: Hayes Wang 

Applied.


Re: [Patch net] ppp: defer netns reference release for ppp channel

2016-07-08 Thread David Miller
From: Cong Wang 
Date: Tue,  5 Jul 2016 22:12:36 -0700

> Matt reported that we have a NULL pointer dereference
> in ppp_pernet() from ppp_connect_channel(),
> i.e. pch->chan_net is NULL.
> 
> This is because a parallel ppp_unregister_channel() can happen
> while we are in ppp_connect_channel(), during which pch->chan_net
> is set to NULL. Since we need a reference to the net per channel,
> it makes sense to tie the refcount to the lifetime of the channel;
> therefore we should release this reference when we destroy the
> channel.
> 
> Fixes: 1f461dcdd296 ("ppp: take reference on channels netns")
> Reported-by: Matt Bennett 
> Cc: Paul Mackerras 
> Cc: linux-...@vger.kernel.org
> Cc: Guillaume Nault 
> Cc: Cyrill Gorcunov 
> Signed-off-by: Cong Wang 

Applied and queued up for -stable.


Re: [PATCH net] net: mvneta: set real interrupt per packet for tx_done

2016-07-08 Thread David Miller
From: Marcin Wojtas 
Date: Wed,  6 Jul 2016 04:18:58 +0200

> From: Dmitri Epshtein 
> 
> Commit aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay") intended to
> set coalescing threshold to a value guaranteeing interrupt generation
> per each sent packet, so that buffers can be released with no delay.
> 
> In fact, setting the threshold to '1' was wrong, because it causes an
> interrupt every two packets. According to the documentation, the reason
> is the following: the interrupt occurs once the sent-buffers counter
> reaches a value higher than the one specified in MVNETA_TXQ_SIZE_REG(q).
> This behavior was confirmed during tests. Also, when testing the SoC
> working as a NAS device, better performance was observed with
> interrupt-per-packet, as it strongly depends on all transmitted packets
> being released immediately.
> 
> This commit makes the NETA controller work in interrupt-per-sent-packet
> mode by setting the coalescing threshold to 0.
> 
> Signed-off-by: Dmitri Epshtein 
> Signed-off-by: Marcin Wojtas 
> Cc:  # v3.10+
> Fixes: aebea2ba0f74 ("net: mvneta: fix Tx interrupt delay")

Applied, thanks.


Re: [PATCH RESEND net-next] netvsc: Use the new in-place consumption APIs in the rx path

2016-07-08 Thread David Miller
From: k...@exchange.microsoft.com
Date: Tue,  5 Jul 2016 16:52:46 -0700

> From: K. Y. Srinivasan 
> 
> Use the new APIs for eliminating a copy on the receive path. These new APIs 
> also
> help in minimizing the number of memory barriers we end up issuing (in the
> ringbuffer code) since we can better control when we want to expose the ring
> state to the host.
> 
> The patch is being resent to address earlier email issues.
> 
> Signed-off-by: K. Y. Srinivasan 

Applied.


Re: [PATCH] net: ip_finish_output_gso: If skb_gso_network_seglen exceeds MTU, do segmentation even for non IPSKB_FORWARDED skbs

2016-07-08 Thread David Miller
From: Shmulik Ladkani 
Date: Tue, 5 Jul 2016 17:05:41 +0300

> On Tue, 5 Jul 2016 15:03:27 +0200, f...@strlen.de wrote:
>> (Or did I misunderstand this setup...?)
> 
> tap0 bridged with vxlan0.
> route to vxlan0's remote peer is via eth0, configured with small mtu.

Florian, any more comments?


Re: [PATCH -next] hfsc: reduce hfsc_sched to 14 cachelines

2016-07-08 Thread David Miller
From: Florian Westphal 
Date: Mon,  4 Jul 2016 16:22:20 +0200

> hfsc_sched is huge (size: 920, cachelines: 15), but we can get it to 14
> cachelines by placing level after filter_cnt (covering 4 byte hole) and
> reducing period/nactive/flags to u32 (period is just a counter,
> incremented when class becomes active -- 2**32 is plenty for this
> purpose, also, long is only 32bit wide on 32bit platforms anyway).
> 
> cl_vtperiod is exported to userspace via tc_hfsc_stats, but its period
> member is already u32, so no precision is lost there either.
> 
> Cc: Michal Soltys 
> Signed-off-by: Florian Westphal 

Applied, thanks.


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Alexander Duyck
On Fri, Jul 8, 2016 at 4:04 PM, Hannes Frederic Sowa
 wrote:
> On 08.07.2016 18:11, Alexander Duyck wrote:
>> On Fri, Jul 8, 2016 at 2:51 PM, Hannes Frederic Sowa
>>  wrote:
>>> On 08.07.2016 17:27, Alexander Duyck wrote:
 On Fri, Jul 8, 2016 at 1:57 PM, Hannes Frederic Sowa
  wrote:
> On 08.07.2016 16:17, Shmulik Ladkani wrote:
>> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
>>  wrote:
>>> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
 With udp tunnel offload in place, the kernel can do GRO for some udp 
 tunnels
 at the ingress device level. Currently both the geneve and the vxlan 
 drivers
 implement an additional GRO aggregation point via gro_cells.
 The latter takes effect for tunnels using zero checksum udp packets, 
 which are
 currently explicitly not aggregated by the udp offload layer.

 This patch series adapts the udp tunnel offload to process also zero 
 checksum
 udp packets, if the tunnel's socket allow it. Aggregation, if possible 
 is always
 performed at the ingress device level.

 Then the gro_cells hooks, in both vxlan and geneve driver are removed.
>>>
>>> I think removing the gro_cells hooks may be taking things one step too 
>>> far.
>>
>> +1
>>
>>> I get that there is an impression that it is redundant but there are a
>>> number of paths that could lead to VXLAN or GENEVE frames being
>>> received that are not aggregated via GRO.
>>
>> There's the case where the vxlan/geneve datagrams get IP fragmented, and
>> IP frags are not GROed.
>> GRO aggregation at the vxlan/geneve level is beneficial for this case.
>
> Isn't this a misconfiguration? TCP should not fragment at all, not even
> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
> anyway.

 The problem is that the DF bit of the outer header is what matters,
 not the inner one.  I believe the default for many of the UDP tunnels
 is to not set the DF bit on the outer header.  The fact is
 fragmentation shouldn't happen, but it can and we need to keep that in
 mind when we work on these sort of things.
>>>
>>> "Old" tunnel protocols inherit the outer DF bit from the inner one,
>>> geneve and vxlan do not. I think we simply never should set DF bit
>>> because vxlan pmtu with source port rss perturbation breaks pmtu
>>> discovery anyway. On the other hand doing so and not having end-to-end
>>> protection of the VNI because of proposed 0 checksum is also...
>>> otherwise at least the Ethernet FCS could protect the frame.
>>
>> The "Old" tunnel protocols can be configured to behave the same way as
>> well.  It is just that most of them default to a mode where they
>> support getting a path MTU if I recall correctly.
>
> That actually only happened very recently, about one month ago (for gre
> at least). The other (ipip) doesn't support that.
>
 There have been a number of cases in the past with tunnels where "it
 works for me" has been how things have been implemented and we need to
 avoid that as it creates a significant amount of pain for end users
 when they do things and they don't work because they have strayed off
 of the one use case that has been tested.
>>>
>>> We certainly don't want to break fragmentation with vxlan and this patch
>>> doesn't change so.
>>>
>>> I really do wonder if GRO on top of fragmentation does have any effect.
>>> Would be great if someone has data for that already?
>>
>> I think that logic is kind of backwards.  It is already there.
>> Instead of asking people to prove that this change is invalid the onus
>> should be on the submitter to prove the change causes no harm.
>
> Of course, sorry, I didn't want to make the impression others should do
> that. I asked because Shmulik made the impression on me he had
> experience with GRO+fragmentation on vxlan and/or geneve and could
> provide some data, maybe even just anecdotal.
>
>> The whole argument is kind of moot anyway based on the comment from
>> Tom.  This is based around being able to aggregate frames with a zero
>> checksum which we can already do, but it requires that the hardware
>> has validated the inner checksum.  What this patch set is doing is
>> trying to take frames that have no checksum and force us to perform a
>> checksum for the inner headers in software.  Really we don't do that
>> now for general GRO, why should we do it in the case of tunnels?  Also
>> I think there is a test escape for the case of a frame with an outer
>> checksum that was not validated by hardware.  In that case the
>> checksum isn't converted until you hit the UDP handler which will then
>> convert it to checksum complete and it then would rely on the GRO
>> cells to merge the frames after we have already validated the checksum
>> for the outer header.
>
> I do agree that Tom's comment changes things a little bit, but currently
> I tend towards removing the check completely, which needs further
> benchmark data.

Re: [PATCH net] udp: prevent bugcheck if filter truncates packet too much

2016-07-08 Thread Alexei Starovoitov
On Sat, Jul 09, 2016 at 01:31:40AM +0200, Eric Dumazet wrote:
> On Fri, 2016-07-08 at 17:52 +0200, Michal Kubecek wrote:
> > If a socket filter truncates a UDP packet below the length of the UDP
> > header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger
> > a BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash
> > if the kernel is configured that way) can easily be triggered by an
> > unprivileged user, which was reported as CVE-2016-6162. For a
> > reproducer, see http://seclists.org/oss-sec/2016/q3/8
> > 
> > Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
> > Reported-by: Marco Grassi 
> > Signed-off-by: Michal Kubecek 
> > ---
> >  net/ipv4/udp.c | 2 ++
> >  net/ipv6/udp.c | 2 ++
> >  2 files changed, 4 insertions(+)
> > 
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index ca5e8ea29538..4aed8fc23d32 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1583,6 +1583,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> >  
> > if (sk_filter(sk, skb))
> > goto drop;
> > +   if (unlikely(skb->len < sizeof(struct udphdr)))
> > +   goto drop;
> >  
> > udp_csum_pull_header(skb);
> > if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
> > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > index 005dc82c2138..acc09705618b 100644
> > --- a/net/ipv6/udp.c
> > +++ b/net/ipv6/udp.c
> > @@ -620,6 +620,8 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
> >  
> > if (sk_filter(sk, skb))
> > goto drop;
> > +   if (unlikely(skb->len < sizeof(struct udphdr)))
> > +   goto drop;
> >  
> > udp_csum_pull_header(skb);
> > if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
> 
> 
> Arg :(
> 
> Acked-by: Eric Dumazet 

This is an incomplete fix. Please do not apply. See the discussion at security@kernel



Re: [PATCH net-next 6/9] net: dsa: mv88e6xxx: rework Switch MAC setter

2016-07-08 Thread Vivien Didelot
Hi Andrew,

On Jul 7, 2016, at 9:52 AM, Andrew Lunn and...@lunn.ch wrote:

>> Also, note that this indirect access is a single-register which doesn't
>> require to wait for the operation to complete (like Switch MAC, Trunk
>> Mapping, etc.), in contrary to multi-registers indirect accesses with
>> several busy operations (like ATU, VTU, etc.).
> 
> Are you sure about this? The DSDT polls bit 15 of the register.

Every single-register operation (with an "Update" bit, "pointer" bits and
"data" bits) executes in a single write operation and doesn't need to wait
for completion.

But multiple-register operations like ATU, VTU, etc., with a "Busy" bit,
operation bits and data registers, do require waiting for the operation to
complete (by polling the busy bit or via interrupt), as the documentation
explicitly mentions.

We could add checks, but it doesn't sound necessary: we are not doing it
for other Update operations, and a badly set switch MAC address would be
easily identifiable.

Thanks,

Vivien


Re: [PATCH net] udp: prevent bugcheck if filter truncates packet too much

2016-07-08 Thread Eric Dumazet
On Fri, 2016-07-08 at 17:52 +0200, Michal Kubecek wrote:
> If a socket filter truncates a UDP packet below the length of the UDP
> header in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger
> a BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash
> if the kernel is configured that way) can easily be triggered by an
> unprivileged user, which was reported as CVE-2016-6162. For a
> reproducer, see http://seclists.org/oss-sec/2016/q3/8
> 
> Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
> Reported-by: Marco Grassi 
> Signed-off-by: Michal Kubecek 
> ---
>  net/ipv4/udp.c | 2 ++
>  net/ipv6/udp.c | 2 ++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index ca5e8ea29538..4aed8fc23d32 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1583,6 +1583,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  
>   if (sk_filter(sk, skb))
>   goto drop;
> + if (unlikely(skb->len < sizeof(struct udphdr)))
> + goto drop;
>  
>   udp_csum_pull_header(skb);
>   if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 005dc82c2138..acc09705618b 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -620,6 +620,8 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
>  
>   if (sk_filter(sk, skb))
>   goto drop;
> + if (unlikely(skb->len < sizeof(struct udphdr)))
> + goto drop;
>  
>   udp_csum_pull_header(skb);
>   if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {


Arg :(

Acked-by: Eric Dumazet 

Thanks !




Re: [PATCH nf-next 3/3] netfilter: replace list_head with single linked list

2016-07-08 Thread Florian Westphal
Aaron Conole  wrote:
> --- a/net/netfilter/core.c
> +++ b/net/netfilter/core.c
[..]
> +#define nf_entry_dereference(e) \
> + rcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))
>  
> -static struct list_head *nf_find_hook_list(struct net *net,
> -const struct nf_hook_ops *reg)
> +static struct nf_hook_entry *nf_find_hook_list(struct net *net,
> +const struct nf_hook_ops *reg)
>  {
> - struct list_head *hook_list = NULL;
> + struct nf_hook_entry *hook_list = NULL;
>  
>   if (reg->pf != NFPROTO_NETDEV)
> - hook_list = &net->nf.hooks[reg->pf][reg->hooknum];
> + hook_list = rcu_dereference(net->nf.hooks[reg->pf]
> + [reg->hooknum]);
>   else if (reg->hooknum == NF_NETDEV_INGRESS) {
>  #ifdef CONFIG_NETFILTER_INGRESS
>   if (reg->dev && dev_net(reg->dev) == net)
> - hook_list = ®->dev->nf_hooks_ingress;
> + hook_list =
> + rcu_dereference(reg->dev->nf_hooks_ingress);

Both of these should use nf_entry_dereference() to avoid the lockdep
splat reported by the kbuild robot:

net/netfilter/core.c:75 suspicious rcu_dereference_check() usage!
2 locks held by swapper/1:
#0:  (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x20
#1:  (nf_hook_mutex){+.+...}, at: [] 
nf_register_net_hook+0xcb/0x240



Re: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-07-08 Thread Alexey Kuznetsov
Hello!

I can tell you why it was not done initially.

The main problem was with IP options, which can be present in a raw packet.
They have to be properly fragmented, and some options must be deleted from
fragments. Not that it is too complicated; it is just boring and ugly, and
inconsistent with the IP_HDRINCL logic.

So it was done in a logically consistent way: you ordered IP_HDRINCL?
Then please deal with fragments yourself.

Alexey


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 18:11, Alexander Duyck wrote:
> On Fri, Jul 8, 2016 at 2:51 PM, Hannes Frederic Sowa
>  wrote:
>> On 08.07.2016 17:27, Alexander Duyck wrote:
>>> On Fri, Jul 8, 2016 at 1:57 PM, Hannes Frederic Sowa
>>>  wrote:
 On 08.07.2016 16:17, Shmulik Ladkani wrote:
> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
>  wrote:
>> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
>>> With udp tunnel offload in place, the kernel can do GRO for some udp 
>>> tunnels
>>> at the ingress device level. Currently both the geneve and the vxlan 
>>> drivers
>>> implement an additional GRO aggregation point via gro_cells.
>>> The latter takes effect for tunnels using zero checksum udp packets, 
>>> which are
>>> currently explicitly not aggregated by the udp offload layer.
>>>
>>> This patch series adapts the udp tunnel offload to process also zero 
>>> checksum
>>> udp packets, if the tunnel's socket allow it. Aggregation, if possible 
>>> is always
>>> performed at the ingress device level.
>>>
>>> Then the gro_cells hooks, in both vxlan and geneve driver are removed.
>>
>> I think removing the gro_cells hooks may be taking things one step too 
>> far.
>
> +1
>
>> I get that there is an impression that it is redundant but there are a
>> number of paths that could lead to VXLAN or GENEVE frames being
>> received that are not aggregated via GRO.
>
> There's the case where the vxlan/geneve datagrams get IP fragmented, and
> IP frags are not GROed.
> GRO aggregation at the vxlan/geneve level is beneficial for this case.

 Isn't this a misconfiguration? TCP should not fragment at all, not even
 in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
 anyway.
>>>
>>> The problem is that the DF bit of the outer header is what matters,
>>> not the inner one.  I believe the default for many of the UDP tunnels
>>> is to not set the DF bit on the outer header.  The fact is
>>> fragmentation shouldn't happen, but it can and we need to keep that in
>>> mind when we work on these sort of things.
>>
>> "Old" tunnel protocols inherit the outer DF bit from the inner one,
>> geneve and vxlan do not. I think we simply never should set DF bit
>> because vxlan pmtu with source port rss perturbation breaks pmtu
>> discovery anyway. On the other hand doing so and not having end-to-end
>> protection of the VNI because of proposed 0 checksum is also...
>> otherwise at least the Ethernet FCS could protect the frame.
> 
> The "Old" tunnel protocols can be configured to behave the same way as
> well.  It is just that most of them default to a mode where they
> support getting a path MTU if I recall correctly.

That actually only happened very recently, about one month ago (for gre
at least). The other (ipip) doesn't support that.

>>> There have been a number of cases in the past with tunnels where "it
>>> works for me" has been how things have been implemented and we need to
>>> avoid that as it creates a significant amount of pain for end users
>>> when they do things and they don't work because they have strayed off
>>> of the one use case that has been tested.
>>
>> We certainly don't want to break fragmentation with vxlan and this patch
>> doesn't change so.
>>
>> I really do wonder if GRO on top of fragmentation does have any effect.
>> Would be great if someone has data for that already?
> 
> I think that logic is kind of backwards.  It is already there.
> Instead of asking people to prove that this change is invalid the onus
> should be on the submitter to prove the change causes no harm.

Of course, sorry, I didn't want to make the impression others should do
that. I asked because Shmulik made the impression on me he had
experience with GRO+fragmentation on vxlan and/or geneve and could
provide some data, maybe even just anecdotal.

> The whole argument is kind of moot anyway based on the comment from
> Tom.  This is based around being able to aggregate frames with a zero
> checksum which we can already do, but it requires that the hardware
> has validated the inner checksum.  What this patch set is doing is
> trying to take frames that have no checksum and force us to perform a
> checksum for the inner headers in software.  Really we don't do that
> now for general GRO, why should we do it in the case of tunnels?  Also
> I think there is a test escape for the case of a frame with an outer
> checksum that was not validated by hardware.  In that case the
> checksum isn't converted until you hit the UDP handler which will then
> convert it to checksum complete and it then would rely on the GRO
> cells to merge the frames after we have already validated the checksum
> for the outer header.

I do agree that Tom's comment changes things a little bit, but currently
I tend towards removing the check completely, which needs further
benchmark data. I wonder what difference it sh

Re: [PATCH] Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"

2016-07-08 Thread Florian Fainelli
On 07/08/2016 03:54 PM, Philippe Reynes wrote:
> This reverts commit 4386f5662e63 ("net: ethernet: bcmgenet: use
> phy_ethtool_{get|set}_link_ksettings")
> 
> That patch is wrong: the phy_ethtool_{get|set}_link_ksettings functions
> don't check whether the device is running, but the bcmgenet driver needs
> this check.
> 
> The {get|set}_settings functions need to access the mdio bus, and this
> bus may only be used when the device is running. Otherwise, the clock
> is disabled and an mdio access will fail.
> 
> Signed-off-by: Philippe Reynes 

Acked-by: Florian Fainelli 

Thanks!
-- 
Florian


[PATCH] Revert "net: ethernet: bcmgenet: use phy_ethtool_{get|set}_link_ksettings"

2016-07-08 Thread Philippe Reynes
This reverts commit 4386f5662e63 ("net: ethernet: bcmgenet: use
phy_ethtool_{get|set}_link_ksettings")

The reverted patch was wrong: the phy_ethtool_{get|set}_link_ksettings
functions don't check whether the device is running, but the bcmgenet
driver needs this check.

The {get|set}_settings functions need to access the mdio bus, and this
bus may only be used when the device is running. Otherwise, the clock
is disabled and an mdio access will fail.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/broadcom/genet/bcmgenet.c |   28 ++-
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 76ed6df..8d4f849 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -450,6 +450,30 @@ static inline void bcmgenet_rdma_ring_writel(struct 
bcmgenet_priv *priv,
genet_dma_ring_regs[r]);
 }
 
+static int bcmgenet_get_settings(struct net_device *dev,
+struct ethtool_cmd *cmd)
+{
+   if (!netif_running(dev))
+   return -EINVAL;
+
+   if (!dev->phydev)
+   return -ENODEV;
+
+   return phy_ethtool_gset(dev->phydev, cmd);
+}
+
+static int bcmgenet_set_settings(struct net_device *dev,
+struct ethtool_cmd *cmd)
+{
+   if (!netif_running(dev))
+   return -EINVAL;
+
+   if (!dev->phydev)
+   return -ENODEV;
+
+   return phy_ethtool_sset(dev->phydev, cmd);
+}
+
 static int bcmgenet_set_rx_csum(struct net_device *dev,
netdev_features_t wanted)
 {
@@ -953,6 +977,8 @@ static struct ethtool_ops bcmgenet_ethtool_ops = {
.get_strings= bcmgenet_get_strings,
.get_sset_count = bcmgenet_get_sset_count,
.get_ethtool_stats  = bcmgenet_get_ethtool_stats,
+   .get_settings   = bcmgenet_get_settings,
+   .set_settings   = bcmgenet_set_settings,
.get_drvinfo= bcmgenet_get_drvinfo,
.get_link   = ethtool_op_get_link,
.get_msglevel   = bcmgenet_get_msglevel,
@@ -964,8 +990,6 @@ static struct ethtool_ops bcmgenet_ethtool_ops = {
.nway_reset = bcmgenet_nway_reset,
.get_coalesce   = bcmgenet_get_coalesce,
.set_coalesce   = bcmgenet_set_coalesce,
-   .get_link_ksettings = phy_ethtool_get_link_ksettings,
-   .set_link_ksettings = phy_ethtool_set_link_ksettings,
 };
 
 /* Power down the unimac, based on mode. */
-- 
1.7.4.4



Re: [PATCH v2 0/6] net: ethernet: bgmac: Add platform device support

2016-07-08 Thread David Miller
From: Jon Mason 
Date: Fri, 8 Jul 2016 11:52:42 -0400

> Oops.  I didn't send out the 7th patch in this series.  Sending out
> shortly as 7/7.

Please don't do things like this.

If there is a mistake, always resend the entire series as a completely
new set of postings.

That way there is never any ambiguity about anything, people don't
have to search a complicated set of threads to figure out what this
special new posting means, etc.


Re: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-07-08 Thread David Miller
From: Paul Jakma 
Date: Fri, 8 Jul 2016 13:55:11 +0100 (BST)

> On Wed, 15 Jun 2016, Alan Davey wrote:
> 
>> The only case that would break is that where an application relies on
>> the existing (documented as a bug) feature of getting an EMSGSIZE
>> return code in the case of an over-sized packet.  Applications that
>> perform their own fragmentation would be unaffected.
> 
> If this doesn't break existing applications that are doing
> fragmentation in userspace on raw sockets (e.g. Quagga ospfd), that's
> better.
> 
> As per previous email, I'd love to be able to get rid of that code and
> have the kernel do it for me. However, I also don't want to have to do
> anything else non-trivial to that code either. :)
> 
> The issue for us is, how would we know on any given host whether the
> kernel will do the fragmentation or whether ospfd has to do it? We
> need to be able to probe for that capability, surely?

The fact is, regardless of whether you could probe for the capability
or not, you have to keep the fragmentation code around forever.

And that is yet another reason I do not want to add this change at all.

It doesn't make any existing server any simpler, in fact it makes them
all more complicated because not only do they keep the fragmentation
code, they also get new logic to test for the feature that would allow
them to avoid using it.

Sorry, there is no way I am adding this, it's a net lose.


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Alexander Duyck
On Fri, Jul 8, 2016 at 2:51 PM, Hannes Frederic Sowa
 wrote:
> On 08.07.2016 17:27, Alexander Duyck wrote:
>> On Fri, Jul 8, 2016 at 1:57 PM, Hannes Frederic Sowa
>>  wrote:
>>> On 08.07.2016 16:17, Shmulik Ladkani wrote:
 On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
  wrote:
> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
>> With udp tunnel offload in place, the kernel can do GRO for some udp 
>> tunnels
>> at the ingress device level. Currently both the geneve and the vxlan 
>> drivers
>> implement an additional GRO aggregation point via gro_cells.
>> The latter takes effect for tunnels using zero checksum udp packets, 
>> which are
>> currently explicitly not aggregated by the udp offload layer.
>>
>> This patch series adapts the udp tunnel offload to process also zero 
>> checksum
>> udp packets, if the tunnel's socket allow it. Aggregation, if possible 
>> is always
>> performed at the ingress device level.
>>
>> Then the gro_cells hooks, in both vxlan and geneve driver are removed.
>
> I think removing the gro_cells hooks may be taking things one step too 
> far.

 +1

> I get that there is an impression that it is redundant but there are a
> number of paths that could lead to VXLAN or GENEVE frames being
> received that are not aggregated via GRO.

 There's the case where the vxlan/geneve datagrams get IP fragmented, and
 IP frags are not GROed.
 GRO aggregation at the vxlan/geneve level is beneficial for this case.
>>>
>>> Isn't this a misconfiguration? TCP should not fragment at all, not even
>>> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
>>> anyway.
>>
>> The problem is that the DF bit of the outer header is what matters,
>> not the inner one.  I believe the default for many of the UDP tunnels
>> is to not set the DF bit on the outer header.  The fact is
>> fragmentation shouldn't happen, but it can and we need to keep that in
>> mind when we work on these sort of things.
>
> "Old" tunnel protocols inherit the outer DF bit from the inner one,
> geneve and vxlan do not. I think we simply never should set DF bit
> because vxlan pmtu with source port rss perturbation breaks pmtu
> discovery anyway. On the other hand doing so and not having end-to-end
> protection of the VNI because of proposed 0 checksum is also...
> otherwise at least the Ethernet FCS could protect the frame.

The "Old" tunnel protocols can be configured to behave the same way as
well.  It is just that most of them default to a mode where they
support getting a path MTU if I recall correctly.

>> There have been a number of cases in the past with tunnels where "it
>> works for me" has been how things have been implemented and we need to
>> avoid that as it creates a significant amount of pain for end users
>> when they do things and they don't work because they have strayed off
>> of the one use case that has been tested.
>
> We certainly don't want to break fragmentation with vxlan and this patch
> doesn't change so.
>
> I really do wonder if GRO on top of fragmentation does have any effect.
> Would be great if someone has data for that already?

I think that logic is kind of backwards.  It is already there.
Instead of asking people to prove that this change is invalid the onus
should be on the submitter to prove the change causes no harm.

The whole argument is kind of moot anyway based on the comment from
Tom.  This is based around being able to aggregate frames with a zero
checksum which we can already do, but it requires that the hardware
has validated the inner checksum.  What this patch set is doing is
trying to take frames that have no checksum and force us to perform a
checksum for the inner headers in software.  Really we don't do that
now for general GRO, why should we do it in the case of tunnels?  Also
I think there is a test escape for the case of a frame with an outer
checksum that was not validated by hardware.  In that case the
checksum isn't converted until you hit the UDP handler which will then
convert it to checksum complete and it then would rely on the GRO
cells to merge the frames after we have already validated the checksum
for the outer header.

- Alex


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 17:19, Shmulik Ladkani wrote:
> On Fri, 8 Jul 2016 16:57:10 -0400 Hannes Frederic Sowa 
>  wrote:
>> On 08.07.2016 16:17, Shmulik Ladkani wrote:
>>> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
>>>  wrote:  
 I get that there is an impression that it is redundant but there are a
 number of paths that could lead to VXLAN or GENEVE frames being
 received that are not aggregated via GRO.  
>>>
>>> There's the case where the vxlan/geneve datagrams get IP fragmented, and
>>> IP frags are not GROed.
>>> GRO aggregation at the vxlan/geneve level is beneficial for this case.  
>>
>> Isn't this a misconfiguration? TCP should not fragment at all, not even
>> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
>> anyway.
> 
> It's not an ideal configuration, but it is a valid one.
> 
> Imagine TCP within vxlan/geneve, that gets properly segmented and
> encapsulated.
> 
> The vxlan/geneve datagrams go out the wire, and these can occasionally
> be fragmented on the way (e.g. when we can't control the MTUs along the
> path, or if unable to use PMTUD for whatever reason).

PMTUD doesn't work with vxlan in most situations anyway. But you can
still control the mtu/mss with ip route and you don't need to modify the
mid-hosts.
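As an illustrative sketch of that per-route control (the prefix, device, and
numbers here are hypothetical, assuming roughly 90 bytes of headroom for the
encapsulation overhead):

```shell
# Clamp the route MTU and the advertised TCP MSS for traffic that will be
# VXLAN-encapsulated on this host; no changes are needed on mid-hosts.
ip route change 10.0.0.0/24 dev eth0 mtu 1410 advmss 1360
```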

> At the receiving vxlan/geneve termination, these IP frags are not GROed.
> 
> Instead they get reassembled by the IP stack, then handed to UDP and to
> the vxlan/geneve drivers.
> 
> From that point, GROing at the vxlan/geneve device, which aggregates
> the TCP segments into a TCP super-packet still make sense and has
> benefits.

Given the spreading with which those fragments will be sent, I wonder if
GRO on top of the tunnels will really aggregate them?

Bye,
Hannes




Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 17:27, Alexander Duyck wrote:
> On Fri, Jul 8, 2016 at 1:57 PM, Hannes Frederic Sowa
>  wrote:
>> On 08.07.2016 16:17, Shmulik Ladkani wrote:
>>> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
>>>  wrote:
 On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
> With udp tunnel offload in place, the kernel can do GRO for some udp 
> tunnels
> at the ingress device level. Currently both the geneve and the vxlan 
> drivers
> implement an additional GRO aggregation point via gro_cells.
> The latter takes effect for tunnels using zero checksum udp packets, 
> which are
> currently explicitly not aggregated by the udp offload layer.
>
> This patch series adapts the udp tunnel offload to process also zero 
> checksum
> udp packets, if the tunnel's socket allow it. Aggregation, if possible is 
> always
> performed at the ingress device level.
>
> Then the gro_cells hooks, in both vxlan and geneve driver are removed.

 I think removing the gro_cells hooks may be taking things one step too far.
>>>
>>> +1
>>>
 I get that there is an impression that it is redundant but there are a
 number of paths that could lead to VXLAN or GENEVE frames being
 received that are not aggregated via GRO.
>>>
>>> There's the case where the vxlan/geneve datagrams get IP fragmented, and
>>> IP frags are not GROed.
>>> GRO aggregation at the vxlan/geneve level is beneficial for this case.
>>
>> Isn't this a misconfiguration? TCP should not fragment at all, not even
>> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
>> anyway.
> 
> The problem is that the DF bit of the outer header is what matters,
> not the inner one.  I believe the default for many of the UDP tunnels
> is to not set the DF bit on the outer header.  The fact is
> fragmentation shouldn't happen, but it can and we need to keep that in
> mind when we work on these sort of things.

"Old" tunnel protocols inherit the outer DF bit from the inner one,
geneve and vxlan do not. I think we simply never should set DF bit
because vxlan pmtu with source port rss perturbation breaks pmtu
discovery anyway. On the other hand doing so and not having end-to-end
protection of the VNI because of proposed 0 checksum is also...
otherwise at least the Ethernet FCS could protect the frame.
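A minimal sketch of the inheritance difference described above (the flag value
is the real IPv4 DF bit; the function and policy flag are illustrative, not
kernel identifiers):

```python
IP_DF = 0x4000  # IPv4 "don't fragment" flag in the frag_off field

def outer_df_bit(inner_flags, inherit):
    # "Old" tunnel protocols copy the inner DF bit into the outer header;
    # vxlan/geneve instead apply a fixed policy (default: DF clear), so the
    # outer datagram stays fragmentable regardless of the inner flag.
    return (inner_flags & IP_DF) if inherit else 0
```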

> There have been a number of cases in the past with tunnels where "it
> works for me" has been how things have been implemented and we need to
> avoid that as it creates a significant amount of pain for end users
> when they do things and they don't work because they have strayed off
> of the one use case that has been tested.

We certainly don't want to break fragmentation with vxlan and this patch
doesn't change so.

I really do wonder if GRO on top of fragmentation does have any effect.
Would be great if someone has data for that already?

Bye,
Hannes



Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Alexander Duyck
On Fri, Jul 8, 2016 at 1:57 PM, Hannes Frederic Sowa
 wrote:
> On 08.07.2016 16:17, Shmulik Ladkani wrote:
>> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
>>  wrote:
>>> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
 With udp tunnel offload in place, the kernel can do GRO for some udp 
 tunnels
 at the ingress device level. Currently both the geneve and the vxlan 
 drivers
 implement an additional GRO aggregation point via gro_cells.
 The latter takes effect for tunnels using zero checksum udp packets, which 
 are
 currently explicitly not aggregated by the udp offload layer.

 This patch series adapts the udp tunnel offload to process also zero 
 checksum
 udp packets, if the tunnel's socket allow it. Aggregation, if possible is 
 always
 performed at the ingress device level.

 Then the gro_cells hooks, in both vxlan and geneve driver are removed.
>>>
>>> I think removing the gro_cells hooks may be taking things one step too far.
>>
>> +1
>>
>>> I get that there is an impression that it is redundant but there are a
>>> number of paths that could lead to VXLAN or GENEVE frames being
>>> received that are not aggregated via GRO.
>>
>> There's the case where the vxlan/geneve datagrams get IP fragmented, and
>> IP frags are not GROed.
>> GRO aggregation at the vxlan/geneve level is beneficial for this case.
>
> Isn't this a misconfiguration? TCP should not fragment at all, not even
> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
> anyway.

The problem is that the DF bit of the outer header is what matters,
not the inner one.  I believe the default for many of the UDP tunnels
is to not set the DF bit on the outer header.  The fact is
fragmentation shouldn't happen, but it can and we need to keep that in
mind when we work on these sort of things.

There have been a number of cases in the past with tunnels where "it
works for me" has been how things have been implemented and we need to
avoid that as it creates a significant amount of pain for end users
when they do things and they don't work because they have strayed off
of the one use case that has been tested.

- Alex


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Shmulik Ladkani
On Fri, 8 Jul 2016 16:57:10 -0400 Hannes Frederic Sowa 
 wrote:
> On 08.07.2016 16:17, Shmulik Ladkani wrote:
> > On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck 
> >  wrote:  
> >> I get that there is an impression that it is redundant but there are a
> >> number of paths that could lead to VXLAN or GENEVE frames being
> >> received that are not aggregated via GRO.  
> > 
> > There's the case where the vxlan/geneve datagrams get IP fragmented, and
> > IP frags are not GROed.
> > GRO aggregation at the vxlan/geneve level is beneficial for this case.  
> 
> Isn't this a misconfiguration? TCP should not fragment at all, not even
> in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
> anyway.

It's not an ideal configuration, but it is a valid one.

Imagine TCP within vxlan/geneve, that gets properly segmented and
encapsulated.

The vxlan/geneve datagrams go out the wire, and these can occasionally
be fragmented on the way (e.g. when we can't control the MTUs along the
path, or if unable to use PMTUD for whatever reason).

At the receiving vxlan/geneve termination, these IP frags are not GROed.

Instead they get reassembled by the IP stack, then handed to UDP and to
the vxlan/geneve drivers.

From that point, GROing at the vxlan/geneve device, which aggregates
the TCP segments into a TCP super-packet still make sense and has
benefits.
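The arithmetic behind that scenario can be sketched as follows (the overhead
figures assume VXLAN over IPv4 carrying an inner Ethernet frame; they are
illustrative, not taken from the thread):

```python
def ipv4_fragment_count(payload_len, mtu, ihl=20):
    """Number of IPv4 fragments needed for payload_len bytes of payload."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_frag = (mtu - ihl) & ~7
    return -(-payload_len // per_frag)  # ceiling division

# A full-size inner IP packet (1500 B) gains UDP(8) + VXLAN(8) + inner
# Ethernet(14) bytes of encapsulation, so the outer IPv4 payload is 1530 B,
# which no longer fits a 1500 B path MTU and is split in two.
outer_payload = 1500 + 8 + 8 + 14
print(ipv4_fragment_count(outer_payload, 1500))  # -> 2
```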

Regards,
Shmulik


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Roland Dreier
On Fri, Jul 8, 2016 at 9:51 AM, Jason Gunthorpe
 wrote:
> So, it appears, the dst and neigh can be used for all performances cases.
>
> For the non performance dst == null case, can we just burn cycles and
> stuff the daddr in front of the packet at hardheader time, even if we
> have to copy?

OK, sounds interesting.

Unfortunately the scope of this work has gotten to the point where I
can't take it on right now.  My system is running 4.4.y for now
(before struct skb_gso_cb grew) so I think shrinking struct skb_gso_cb
to 8 bytes plus changing SKB_SGO_CB_OFFSET to 20 will work for now.
Hope someone is able to come up with a real fix before I need to
upgrade to 4.10.y...

 - R.


[PATCH net] tcp_timer.c: Add kernel-doc function descriptions

2016-07-08 Thread Richard Sailer
This adds kernel-doc style descriptions for 6 functions and
fixes 1 typo.

Signed-off-by: Richard Sailer 
---
 net/ipv4/tcp_timer.c | 66 +---
 1 file changed, 57 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index debdd8b..bdccd67 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -24,6 +24,13 @@
 
 int sysctl_tcp_thin_linear_timeouts __read_mostly;
 
+/**
+ *   tcp_write_err() - close socket and save error info.
+ *   @sk:  The socket the error has appeared on.
+ *
+ *   Returns: Nothing (void)
+ */
+
 static void tcp_write_err(struct sock *sk)
 {
sk->sk_err = sk->sk_err_soft ? : ETIMEDOUT;
@@ -33,7 +40,12 @@ static void tcp_write_err(struct sock *sk)
__NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPABORTONTIMEOUT);
 }
 
-/* Do not allow orphaned sockets to eat all our resources.
+/**
+ * tcp_out_of_resources() - Close socket if out of resources
+ * @sk:pointer to current socket
+ * @do_reset:  send a last packet with reset flag
+ *
+ * Do not allow orphaned sockets to eat all our resources.
  * This is direct violation of TCP specs, but it is required
  * to prevent DoS attacks. It is called when a retransmission timeout
  * or zero probe timeout occurs on orphaned socket.
@@ -74,7 +86,12 @@ static int tcp_out_of_resources(struct sock *sk, bool 
do_reset)
return 0;
 }
 
-/* Calculate maximal number or retries on an orphaned socket. */
+/**
+ * tcp_orphan_retries() - Returns maximal number of retries on an orphaned 
socket
+ * @sk:Pointer to the current socket.
+ * @alive: bool, socket alive state
+ *
+ */
 static int tcp_orphan_retries(struct sock *sk, bool alive)
 {
int retries = sock_net(sk)->ipv4.sysctl_tcp_orphan_retries; /* May be 
zero. */
@@ -115,10 +132,22 @@ static void tcp_mtu_probing(struct inet_connection_sock 
*icsk, struct sock *sk)
}
 }
 
-/* This function calculates a "timeout" which is equivalent to the timeout of a
- * TCP connection after "boundary" unsuccessful, exponentially backed-off
+
+/**
+ *  retransmits_timed_out() - returns true if this connection has timed out
+ *  @sk:   The current socket
+ *  @boundary: max number of retransmissions
+ *  @timeout:  A custom timeout value.
+ * If set to 0 the default timeout is calculated and used.
+ * Using TCP_RTO_MIN and the number of unsuccessful retransmits.
+ *  @syn_set:  true if the SYN Bit was set.
+ *
+ * The default "timeout" value this function can calculate and use
+ * is equivalent to the timeout of a TCP Connection
+ * after "boundary" unsuccessful, exponentially backed-off
  * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
  * syn_set flag is set.
+ *
  */
 static bool retransmits_timed_out(struct sock *sk,
  unsigned int boundary,
@@ -257,6 +286,16 @@ out:
sk_mem_reclaim(sk);
 }
 
+
+/**
+ * tcp_delack_timer() - The TCP delayed ACK timeout handler.
+ * @data:  Pointer to the current socket. (gets casted to struct sock *)
+ *
+ * This function gets (indirectly) called when the kernel timer for a TCP 
packet
+ * of this socket expires. Calls tcp_delack_timer_handler() to do the actual 
work.
+ *
+ * Returns: Nothing (void)
+ */
 static void tcp_delack_timer(unsigned long data)
 {
struct sock *sk = (struct sock *)data;
@@ -350,10 +389,18 @@ static void tcp_fastopen_synack_timer(struct sock *sk)
  TCP_TIMEOUT_INIT << req->num_timeout, TCP_RTO_MAX);
 }
 
-/*
- * The TCP retransmit timer.
- */
 
+/**
+ * tcp_retransmit_timer() - The TCP retransmit timeout handler.
+ * @sk:  Pointer to the current socket.
+ *
+ * This function gets called when the kernel timer for a TCP packet
+ * of this socket expires.
+ *
+ * It handles retransmission, timer adjustment and other necessary measures.
+ *
+ * Returns: Nothing (void)
+ */
 void tcp_retransmit_timer(struct sock *sk)
 {
struct tcp_sock *tp = tcp_sk(sk);
@@ -494,7 +541,8 @@ out_reset_timer:
 out:;
 }
 
-/* Called with BH disabled */
+/* Called with bottom-half processing disabled.
+   Called by tcp_write_timer() */
 void tcp_write_timer_handler(struct sock *sk)
 {
struct inet_connection_sock *icsk = inet_csk(sk);
@@ -539,7 +587,7 @@ static void tcp_write_timer(unsigned long data)
if (!sock_owned_by_user(sk)) {
tcp_write_timer_handler(sk);
} else {
-   /* deleguate our work to tcp_release_cb() */
+   /* delegate our work to tcp_release_cb() */
if (!test_and_set_bit(TCP_WRITE_TIMER_DEFERRED, 
&tcp_sk(sk)->tsq_flags))
sock_hold(sk);
}
-- 
2.8.1
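For reference, the default timeout documented for retransmits_timed_out()
above can be modeled in userspace like this (a rough sketch in seconds rather
than jiffies; constants match TCP_RTO_MIN = 200 ms, TCP_RTO_MAX = 120 s,
TCP_TIMEOUT_INIT = 1 s):

```python
import math

TCP_RTO_MIN = 0.2       # seconds (HZ / 5 in the kernel)
TCP_RTO_MAX = 120.0     # seconds
TCP_TIMEOUT_INIT = 1.0  # seconds

def default_timeout(boundary, syn_set=False):
    # The RTO doubles on each retransmission until it saturates at
    # TCP_RTO_MAX, after which the total grows linearly.
    rto_base = TCP_TIMEOUT_INIT if syn_set else TCP_RTO_MIN
    thresh = int(math.log2(TCP_RTO_MAX / rto_base))
    if boundary <= thresh:
        return ((2 << boundary) - 1) * rto_base
    return ((2 << thresh) - 1) * rto_base + (boundary - thresh) * TCP_RTO_MAX
```

With the default tcp_retries2 = 15 this yields the familiar ~924.6 s write
timeout.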



Re: [PATCH net-next 2/4] udp offload: allow GRO on 0 checksum packets

2016-07-08 Thread Tom Herbert
On Thu, Jul 7, 2016 at 10:58 AM, Paolo Abeni  wrote:
> currently, UDP packets with zero checksum are not allowed to
> use udp offload's GRO. This patch admits such packets to
> GRO, if the related socket settings allow it: ipv6 packets
> are not admitted if the sockets don't have the no_check6_rx
> flag set.
>
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv4/udp_offload.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 9c37338..ac783f4 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -257,7 +257,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, 
> struct sk_buff *skb,
> struct sock *sk;
>
> if (NAPI_GRO_CB(skb)->encap_mark ||
> -   (skb->ip_summed != CHECKSUM_PARTIAL &&
> +   (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
>  NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>  !NAPI_GRO_CB(skb)->csum_valid))

Paolo,

I think you might be misunderstanding the intent of this conditional.
It is trying to deduce that that the inner TCP checksum has likely
been validated or can be validated without doing  packet checksum
calculation. This is trying to avoid doing host side checksum
calculation in the GRO path and really has little to do with rather
uh->check is zero or not. The assumption was that we shouldn't compute
whole packet checksums in the GRO path because of performance. If this
assumption is no longer valid (i.e. there's good data saying doing
checksums in GRO path is a benefit) then all the checksum parts of
this conditional should be removed.
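Restating that reading as a sketch (argument names are descriptive stand-ins
for the NAPI_GRO_CB fields, not kernel identifiers):

```python
def skip_udp_gro(encap_mark, check_nonzero, csum_partial, csum_cnt, csum_valid):
    # Mirrors the conditional quoted above, as patched: bail out of GRO when
    # we are already inside an encapsulation, or when the UDP checksum is
    # nonzero but neither hardware nor the stack has validated it yet.
    # A zero checksum (check_nonzero == False) no longer forces a bail-out.
    if encap_mark:
        return True
    return (check_nonzero and not csum_partial
            and csum_cnt == 0 and not csum_valid)
```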

Tom

> goto out;
> @@ -271,6 +271,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, 
> struct sk_buff *skb,
> if (!sk || !udp_sk(sk)->gro_receive)
> goto out_unlock;
>
> +   if (!uh->check && skb->protocol == cpu_to_be16(ETH_P_IPV6) &&
> +   !udp_sk(sk)->no_check6_rx)
> +   goto out_unlock;
> +
> flush = 0;
>
> for (p = *head; p; p = p->next) {
> --
> 1.8.3.1
>


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 16:17, Shmulik Ladkani wrote:
> On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck  
> wrote:
>> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
>>> With udp tunnel offload in place, the kernel can do GRO for some udp tunnels
>>> at the ingress device level. Currently both the geneve and the vxlan drivers
>>> implement an additional GRO aggregation point via gro_cells.
>>> The latter takes effect for tunnels using zero checksum udp packets, which 
>>> are
>>> currently explicitly not aggregated by the udp offload layer.
>>>
>>> This patch series adapts the udp tunnel offload to process also zero 
>>> checksum
>>> udp packets, if the tunnel's socket allow it. Aggregation, if possible is 
>>> always
>>> performed at the ingress device level.
>>>
>>> Then the gro_cells hooks, in both vxlan and geneve driver are removed.  
>>
>> I think removing the gro_cells hooks may be taking things one step too far.
> 
> +1
> 
>> I get that there is an impression that it is redundant but there are a
>> number of paths that could lead to VXLAN or GENEVE frames being
>> received that are not aggregated via GRO.
> 
> There's the case where the vxlan/geneve datagrams get IP fragmented, and
> IP frags are not GROed.
> GRO aggregation at the vxlan/geneve level is beneficial for this case.

Isn't this a misconfiguration? TCP should not fragment at all, not even
in vxlan/geneve if one cares about performance? And UDP is not GRO'ed
anyway.

Bye,
Hannes



Re: [PATCH net-next 0/2] net: dsa: b53: Add Broadcom NSP switch support

2016-07-08 Thread Andrew Lunn
On Fri, Jul 08, 2016 at 11:39:11AM -0700, Florian Fainelli wrote:
> Hi all,
> 
> This patch series updates the B53 driver to support Broadcom's Northstar Plus
> Soc integrated switch.

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Shmulik Ladkani
On Fri, 8 Jul 2016 09:21:40 -0700 Alexander Duyck  
wrote:
> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
> > With udp tunnel offload in place, the kernel can do GRO for some udp tunnels
> > at the ingress device level. Currently both the geneve and the vxlan drivers
> > implement an additional GRO aggregation point via gro_cells.
> > The latter takes effect for tunnels using zero checksum udp packets, which 
> > are
> > currently explicitly not aggregated by the udp offload layer.
> >
> > This patch series adapts the udp tunnel offload to process also zero 
> > checksum
> > udp packets, if the tunnel's socket allow it. Aggregation, if possible is 
> > always
> > performed at the ingress device level.
> >
> > Then the gro_cells hooks, in both vxlan and geneve driver are removed.  
> 
> I think removing the gro_cells hooks may be taking things one step too far.

+1

> I get that there is an impression that it is redundant but there are a
> number of paths that could lead to VXLAN or GENEVE frames being
> received that are not aggregated via GRO.

There's the case where the vxlan/geneve datagrams get IP fragmented, and
IP frags are not GROed.
GRO aggregation at the vxlan/geneve level is beneficial for this case.

Regards,
Shmulik


Re: [PATCH 0/2] ARM: dts: NSP: Add built-in Ethernet switch nodes

2016-07-08 Thread Andrew Lunn
On Fri, Jul 08, 2016 at 11:49:27AM -0700, Florian Fainelli wrote:
> This patch series is based on Broadcom/stblinux/devicetree/next which
> contains proper support for the BCM958625HR board. To get working
> Ethernet switch and CPU Ethernet support, the following dependencies
> based on David Miller's net-next tree are required:

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH] bnxt_en: initialize rc to zero to avoid returning garbage

2016-07-08 Thread Michael Chan
On Fri, Jul 8, 2016 at 8:42 AM, Colin King  wrote:
> From: Colin Ian King 
>
> rc is not initialized so it can contain garbage if it is not
> set by the call to bnxt_read_sfp_module_eeprom_info. Ensure
> garbage is not returned by initializing rc to 0.
>
> Signed-off-by: Colin Ian King 

Thanks.

Acked-by: Michael Chan 


Re: [PATCH 0/2] ARM: dts: NSP: Add built-in Ethernet switch nodes

2016-07-08 Thread Florian Fainelli
On 07/08/2016 11:48 AM, Florian Fainelli wrote:
> This patch series is based on Broadcom/stblinux/devicetree/next which
> contains proper support for the BCM958625HR board. To get working
> Ethernet switch and CPU Ethernet support, the following dependencies
> based on David Miller's net-next tree are required:
> 
> - Jon Mason's BGMAC/AMAC support: https://marc.info/?t=14679330832&r=1&w=3
> - dsa/b53 support for NSP switch: 
> https://marc.info/?l=linux-netdev&m=146800324531914&w=3

Please ignore this series, it contained patches from a previous
submission, see this one instead:
1468003769-26959-1-git-send-email-f.faine...@gmail.com>

> 
> Florian Fainelli (2):
>   ARM: dts: NSP: Add Switch Register Access Block node
>   ARM: dts: NSP: Add BCM958625HR switch ports
> 
>  arch/arm/boot/dts/bcm-nsp.dtsi| 11 +
>  arch/arm/boot/dts/bcm958625hr.dts | 49 
> +++
>  2 files changed, 60 insertions(+)
> 


-- 
Florian


[PATCH 2/2] ARM: dts: NSP: Add BCM958625HR switch ports

2016-07-08 Thread Florian Fainelli
Add the layout of the switch ports found on the BCM958625HR reference
board. The CPU port is hooked up to the AMAC0 Ethernet controller,
so we also enable it.

Signed-off-by: Florian Fainelli 
---
 arch/arm/boot/dts/bcm958625hr.dts | 49 +++
 1 file changed, 49 insertions(+)

diff --git a/arch/arm/boot/dts/bcm958625hr.dts 
b/arch/arm/boot/dts/bcm958625hr.dts
index 03b8bbeb694f..4239e58cf97f 100644
--- a/arch/arm/boot/dts/bcm958625hr.dts
+++ b/arch/arm/boot/dts/bcm958625hr.dts
@@ -109,3 +109,52 @@
groups = "nand_grp";
};
 };
+
+&amac0 {
+   status = "okay";
+};
+
+&srab {
+   compatible = "brcm,bcm58625-srab", "brcm,nsp-srab";
+   status = "okay";
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   label = "port0";
+   reg = <0>;
+   };
+
+   port@1 {
+   label = "port1";
+   reg = <1>;
+   };
+
+   port@2 {
+   label = "port2";
+   reg = <2>;
+   };
+
+   port@3 {
+   label = "port3";
+   reg = <3>;
+   };
+
+   port@4 {
+   label = "port4";
+   reg = <4>;
+   };
+
+   port@5 {
+   ethernet = <&amac0>;
+   label = "cpu";
+   reg = <5>;
+   fixed-link {
+   speed = <1000>;
+   full-duplex;
+   };
+   };
+   };
+};
-- 
2.7.4



[PATCH 1/2] ARM: dts: NSP: Add Switch Register Access Block node

2016-07-08 Thread Florian Fainelli
Add the Switch Register Access Block node. This peripheral is identical
to the one in the BCM5301x Northstar SoC, but we utilize the SoC-wide
"brcm,nsp-srab" compatible string to illustrate the integration
difference here.

Signed-off-by: Florian Fainelli 
---
 arch/arm/boot/dts/bcm-nsp.dtsi | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi
index 8d7b35a4b5f1..983fdba905e3 100644
--- a/arch/arm/boot/dts/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/bcm-nsp.dtsi
@@ -241,6 +241,17 @@
clock-names = "apb_pclk";
};
 
+   srab: srab@36000 {
+   compatible = "brcm,nsp-srab";
+   reg = <0x36000 0x1000>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   status = "disabled";
+
+   /* ports are defined in board DTS */
+   };
+
i2c0: i2c@38000 {
compatible = "brcm,iproc-i2c";
reg = <0x38000 0x50>;
-- 
2.7.4



[PATCH 0/2] ARM: dts: NSP: Add built-in Ethernet switch nodes

2016-07-08 Thread Florian Fainelli
This patch series is based on Broadcom/stblinux/devicetree/next which
contains proper support for the BCM958625HR board. To get working
Ethernet switch and CPU Ethernet support, the following dependencies
based on David Miller's net-next tree are required:

- Jon Mason's BGMAC/AMAC support: https://marc.info/?t=14679330832&r=1&w=3
- dsa/b53 support for NSP switch: 
https://marc.info/?l=linux-netdev&m=146800324531914&w=3

Florian Fainelli (2):
  ARM: dts: NSP: Add Switch Register Access Block node
  ARM: dts: NSP: Add BCM958625HR switch ports

 arch/arm/boot/dts/bcm-nsp.dtsi| 11 +
 arch/arm/boot/dts/bcm958625hr.dts | 49 +++
 2 files changed, 60 insertions(+)

-- 
2.7.4



[PATCH 1/3] ARM: dts: Enable SRAB switch and GMACs on 5301x DTS

2016-07-08 Thread Florian Fainelli
Add the Switch Register Access Block which is a special piece of
hardware allowing us to perform indirect read/writes towards the
integrated BCM5301X Ethernet switch.

We also add the 4 Gigabit MAC Device Tree nodes within the brcm,bus-axi
bus node to get proper binding between the BCMA instantiated core and
the Device Tree nodes. We will need that to be able to reference
Ethernet Device Tree nodes in a future patch adding the switch ports
layout.

Signed-off-by: Florian Fainelli 
---
 arch/arm/boot/dts/bcm5301x.dtsi | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi
index 7d4d29bf0ed3..9fb565841004 100644
--- a/arch/arm/boot/dts/bcm5301x.dtsi
+++ b/arch/arm/boot/dts/bcm5301x.dtsi
@@ -239,6 +239,22 @@
status = "disabled";
};
};
+
+   gmac0: ethernet@24000 {
+   reg = <0x24000 0x800>;
+   };
+
+   gmac1: ethernet@25000 {
+   reg = <0x25000 0x800>;
+   };
+
+   gmac2: ethernet@26000 {
+   reg = <0x26000 0x800>;
+   };
+
+   gmac3: ethernet@27000 {
+   reg = <0x27000 0x800>;
+   };
};
 
lcpll0: lcpll0@1800c100 {
@@ -260,6 +276,17 @@
 "sata2";
};
 
+   srab: srab@18007000 {
+   compatible = "brcm,bcm5301x-srab";
+   reg = <0x18007000 0x1000>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   status = "disabled";
+
+   /* ports are defined in board DTS */
+   };
+
nand: nand@18028000 {
compatible = "brcm,nand-iproc", "brcm,brcmnand-v6.1", 
"brcm,brcmnand";
reg = <0x18028000 0x600>, <0x1811a408 0x600>, <0x18028f00 0x20>;
-- 
2.7.4



[PATCH 2/3] ARM: dts: BCM5301X: Add SRAB interrupts

2016-07-08 Thread Florian Fainelli
Add interrupt mapping for the Switch Register Access Block. Only 12
interrupts are usable at the moment even though up to 32 are dedicated
to the SRAB.

Signed-off-by: Florian Fainelli 
---
 arch/arm/boot/dts/bcm5301x.dtsi | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm/boot/dts/bcm5301x.dtsi b/arch/arm/boot/dts/bcm5301x.dtsi
index 9fb565841004..a20ebd2ac9a2 100644
--- a/arch/arm/boot/dts/bcm5301x.dtsi
+++ b/arch/arm/boot/dts/bcm5301x.dtsi
@@ -153,6 +153,21 @@
/* ChipCommon */
<0x 0 &gic GIC_SPI 85 IRQ_TYPE_LEVEL_HIGH>,
 
+   /* Switch Register Access Block */
+   <0x7000 0 &gic GIC_SPI 95 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 1 &gic GIC_SPI 96 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 2 &gic GIC_SPI 97 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 3 &gic GIC_SPI 98 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 4 &gic GIC_SPI 99 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 5 &gic GIC_SPI 100 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 6 &gic GIC_SPI 101 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 7 &gic GIC_SPI 102 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 8 &gic GIC_SPI 103 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 9 &gic GIC_SPI 104 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 10 &gic GIC_SPI 105 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 11 &gic GIC_SPI 106 IRQ_TYPE_LEVEL_HIGH>,
+   <0x7000 12 &gic GIC_SPI 107 IRQ_TYPE_LEVEL_HIGH>,
+
/* PCIe Controller 0 */
<0x00012000 0 &gic GIC_SPI 126 IRQ_TYPE_LEVEL_HIGH>,
<0x00012000 1 &gic GIC_SPI 127 IRQ_TYPE_LEVEL_HIGH>,
-- 
2.7.4



[PATCH net-next 2/2] net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch

2016-07-08 Thread Florian Fainelli
Update the SRAB, core driver and binding document to support the
BCM585xx/586xx/88312 integrated switch (Northstar Plus SoCs family).

Signed-off-by: Florian Fainelli 
---
 Documentation/devicetree/bindings/net/dsa/b53.txt |  9 +
 drivers/net/dsa/b53/b53_common.c  | 12 
 drivers/net/dsa/b53/b53_priv.h|  1 +
 drivers/net/dsa/b53/b53_srab.c|  8 
 4 files changed, 30 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/dsa/b53.txt 
b/Documentation/devicetree/bindings/net/dsa/b53.txt
index ca752db14dff..d6c6e41648d4 100644
--- a/Documentation/devicetree/bindings/net/dsa/b53.txt
+++ b/Documentation/devicetree/bindings/net/dsa/b53.txt
@@ -20,6 +20,15 @@ Required properties:
   "brcm,bcm53018-srab"
   "brcm,bcm53019-srab" and the mandatory "brcm,bcm5301x-srab" string
 
+  For the BCM585xx/586xx/88312 SoCs with an integrated switch, must be one of:
+  "brcm,bcm58522-srab"
+  "brcm,bcm58523-srab"
+  "brcm,bcm58525-srab"
+  "brcm,bcm58622-srab"
+  "brcm,bcm58623-srab"
+  "brcm,bcm58625-srab"
+  "brcm,bcm88312-srab" and the mandatory "brcm,nsp-srab" string
+
   For the BCM63xx/33xx SoCs with an integrated switch, must be one of:
   "brcm,bcm3384-switch"
   "brcm,bcm6328-switch"
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 444de7b9..bda37d336736 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1581,6 +1581,18 @@ static const struct b53_chip_data b53_switch_chips[] = {
.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
},
+   {
+   .chip_id = BCM58XX_DEVICE_ID,
+   .dev_name = "BCM585xx/586xx/88312",
+   .vlans  = 4096,
+   .enabled_ports = 0x1ff,
+   .arl_entries = 4,
+   .cpu_port = B53_CPU_PORT_25,
+   .vta_regs = B53_VTA_REGS,
+   .duplex_reg = B53_DUPLEX_STAT_GE,
+   .jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+   .jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+   },
 };
 
 static int b53_switch_init(struct b53_device *dev)
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 5d8c602fb877..835a744f206e 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -59,6 +59,7 @@ enum {
BCM53012_DEVICE_ID = 0x53012,
BCM53018_DEVICE_ID = 0x53018,
BCM53019_DEVICE_ID = 0x53019,
+   BCM58XX_DEVICE_ID = 0x5800,
 };
 
 #define B53_N_PORTS9
diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index de2b9e710041..2b304eaeb8e8 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -364,6 +364,14 @@ static const struct of_device_id b53_srab_of_match[] = {
{ .compatible = "brcm,bcm53018-srab" },
{ .compatible = "brcm,bcm53019-srab" },
{ .compatible = "brcm,bcm5301x-srab" },
+   { .compatible = "brcm,bcm58522-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm58525-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm58535-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm58622-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm58623-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm58625-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,bcm88312-srab", .data = (void *)BCM58XX_DEVICE_ID 
},
+   { .compatible = "brcm,nsp-srab", .data = (void *)BCM58XX_DEVICE_ID },
{ /* sentinel */ },
 };
 MODULE_DEVICE_TABLE(of, b53_srab_of_match);
-- 
2.7.4



[PATCH net-next 0/2] net: dsa: b53: Add Broadcom NSP switch support

2016-07-08 Thread Florian Fainelli
Hi all,

This patch series updates the B53 driver to support Broadcom's Northstar Plus
Soc integrated switch.

Unlike the version of the core present in BCM5301x/Northstar, we cannot read the
full chip id of the switch, so we need to get the information about our switch
id from Device Tree.

Other than that, this is a regular Broadcom Ethernet switch which is register
compatible for all practical purposes with the existing switch driver.

Since DSA requires a working CPU Ethernet MAC driver this depends on Jon
Mason's AMAC/BGMAC driver changes to support NSP. Board specific changes depend
on patches present in Broadcom's ARM SoC branches and will be posted in a short
while.

Florian Fainelli (2):
  net: dsa: b53: Allow SRAB driver to specify platform data
  net: dsa: b53: Add support for BCM585xx/586xx/88312 integrated switch

 Documentation/devicetree/bindings/net/dsa/b53.txt |  9 +
 drivers/net/dsa/b53/b53_common.c  | 12 ++
 drivers/net/dsa/b53/b53_priv.h|  1 +
 drivers/net/dsa/b53/b53_srab.c| 47 ++-
 4 files changed, 59 insertions(+), 10 deletions(-)

-- 
2.7.4



[PATCH net-next 1/2] net: dsa: b53: Allow SRAB driver to specify platform data

2016-07-08 Thread Florian Fainelli
For Northstar Plus SoCs, we cannot detect the switch because only the
revision information is provided in the Management page; instead, rely on
Device Tree to tell us the chip id, and pass it down using the
b53_platform_data structure.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_srab.c | 39 +--
 1 file changed, 29 insertions(+), 10 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index 70fd47284535..de2b9e710041 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "b53_priv.h"
 
@@ -356,12 +357,37 @@ static struct b53_io_ops b53_srab_ops = {
.write64 = b53_srab_write64,
 };
 
+static const struct of_device_id b53_srab_of_match[] = {
+   { .compatible = "brcm,bcm53010-srab" },
+   { .compatible = "brcm,bcm53011-srab" },
+   { .compatible = "brcm,bcm53012-srab" },
+   { .compatible = "brcm,bcm53018-srab" },
+   { .compatible = "brcm,bcm53019-srab" },
+   { .compatible = "brcm,bcm5301x-srab" },
+   { /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, b53_srab_of_match);
+
 static int b53_srab_probe(struct platform_device *pdev)
 {
+   struct b53_platform_data *pdata = pdev->dev.platform_data;
+   struct device_node *dn = pdev->dev.of_node;
+   const struct of_device_id *of_id = NULL;
struct b53_srab_priv *priv;
struct b53_device *dev;
struct resource *r;
 
+   if (dn)
+   of_id = of_match_node(b53_srab_of_match, dn);
+
+   if (of_id) {
+   pdata = devm_kzalloc(&pdev->dev, sizeof(*pdata), GFP_KERNEL);
+   if (!pdata)
+   return -ENOMEM;
+
+   pdata->chip_id = (u32)of_id->data;
+   }
+
priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
return -ENOMEM;
@@ -375,6 +401,9 @@ static int b53_srab_probe(struct platform_device *pdev)
if (!dev)
return -ENOMEM;
 
+   if (pdata)
+   dev->pdata = pdata;
+
platform_set_drvdata(pdev, dev);
 
return b53_switch_register(dev);
@@ -390,16 +419,6 @@ static int b53_srab_remove(struct platform_device *pdev)
return 0;
 }
 
-static const struct of_device_id b53_srab_of_match[] = {
-   { .compatible = "brcm,bcm53010-srab" },
-   { .compatible = "brcm,bcm53011-srab" },
-   { .compatible = "brcm,bcm53012-srab" },
-   { .compatible = "brcm,bcm53018-srab" },
-   { .compatible = "brcm,bcm53019-srab" },
-   { .compatible = "brcm,bcm5301x-srab" },
-   { /* sentinel */ },
-};
-
 static struct platform_driver b53_srab_driver = {
.probe = b53_srab_probe,
.remove = b53_srab_remove,
-- 
2.7.4



Re: [PATCH v2 6/6] dt-bindings: net: bgmac: add bindings documentation for bgmac

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> Signed-off-by: Jon Mason 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 4/6] net: ethernet: bgmac: convert to feature flags

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> The bgmac driver is using the bcma provides device ID and revision, as
> well as the SoC ID and package, to determine which features are
> necessary to enable, reset, etc in the driver.   In anticipation of
> removing the bcma requirement for this driver, these must be changed to
> not reference that struct.  In place of that, each "feature" has been
> given a flag, and the flags are enabled for their respective device and
> SoC.
> 
> Signed-off-by: Jon Mason 
> Acked-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 2/6] net: ethernet: bgmac: add dma_dev pointer

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> The dma buffer allocation, etc references a dma_dev device pointer from
> the bcma core.  In anticipation of removing the bcma requirement for
> this driver, these must be changed to not reference that struct.  Add a
> dma_dev device pointer to the bgmac struct and reference that instead.
> 
> Signed-off-by: Jon Mason 
> Acked-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 3/6] net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> Move the BCMA MDIO phy into a separate file, as it is very tightly
> coupled with the BCMA bus.  This will help with the upcoming BCMA
> removal from the bgmac driver.  Optimally, this should be moved into
> phy drivers, but it is too tightly coupled with the bgmac driver to
> effectively move it without more changes to the driver.
> 
> Note: the phy_reset was intentionally removed, as the mdio phy subsystem
> automatically resets the phy if a reset function pointer is present.  In
> addition to the moving of the driver, this reset function is added.
> 
> Signed-off-by: Jon Mason 
> Acked-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 5/6] net: ethernet: bgmac: Add platform device support

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> The bcma portion of the driver has been split off into a bcma specific
> driver.  This has been mirrored for the platform driver.  The last
> references to the bcma core struct have been changed into a generic
> function call.  These function calls are wrappers to either the original
> bcma code or new platform functions that access the same areas via MMIO.
> This necessitated adding function pointers for both platform and bcma to
> hide which backend is being used from the generic bgmac code.
> 
> Signed-off-by: Jon Mason 
> Acked-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 1/6] net: ethernet: bgmac: change bgmac_* prints to dev_* prints

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> The bgmac_* print wrappers call dev_* prints with the dev pointer from
> the bcma core.  In anticipation of removing the bcma requirement for
> this driver, these must be changed to not reference that struct.  So,
> simply change all of the bgmac_* prints to their dev_* counterparts.  In
> some cases netdev_* prints are more appropriate, so change those as
> well.
> 
> Signed-off-by: Jon Mason 
> Acked-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 
-- 
Florian


Re: [PATCH v2 0/6] net: ethernet: bgmac: Add platform device support

2016-07-08 Thread Florian Fainelli
On 07/07/2016 04:08 PM, Jon Mason wrote:
> David Miller, Please consider including patches 1-5 in net-next
> 
> Florian Fainelli, Please consider including patches 6 & 7 in
>   devicetree/next

David should pick all 6 patches, including the binding documentation, as
this comes with the driver, I will take the DT patch (patch 7) through
Broadcom's arm-soc.

For this entire series, on BCM953012ER and BCM958625HR:

Reviewed-by: Florian Fainelli 
Tested-by: Florian Fainelli 

> 
> Changes in v2:
> * Made device tree binding changes suggested by Sergei Shtylyov,
>   Ray Jui, Rob Herring, Florian Fainelli, and Arnd Bergmann
> * Removed devm_* error paths in the bgmac_platform.c suggested by
>   Florian Fainelli
> * Added Arnd Bergmann's Acked-by to the first 5 (there were changes
>   outlined in the bullets above, but I believe them to be minor enough
>   for him to not revoke his acks)
> 
> 
> This patch series adds support for other, non-bcma iProc SoC's to the
> bgmac driver.  This series only adds NSP support, but we are interested
> in adding support for the Cygnus and NS2 families (with more possible
> down the road).
> 
> To support non-bcma enabled SoCs, we need to add the standard device
> tree "platform device" support.  Unfortunately, this driver is very
> tighly coupled with the bcma bus and much unwinding is needed.  I tried
> to break this up into a number of patches to make it more obvious what
> was being done to add platform device support.  I was able to verify
> that the bcma code still works using a 53012K board (NS SoC), and that
> the platform code works using a 58625K board (NSP SoC).
> 
> Thanks,
> Jon
> 
> 
> Jon Mason (6):
>   net: ethernet: bgmac: change bgmac_* prints to dev_* prints
>   net: ethernet: bgmac: add dma_dev pointer
>   net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file
>   net: ethernet: bgmac: convert to feature flags
>   net: ethernet: bgmac: Add platform device support
>   dt-bindings: net: bgmac: add bindings documentation for bgmac
> 
>  .../devicetree/bindings/net/brcm,amac.txt  |  24 +
>  .../devicetree/bindings/net/brcm,bgmac-nsp.txt |  24 +
>  drivers/net/ethernet/broadcom/Kconfig  |  23 +-
>  drivers/net/ethernet/broadcom/Makefile |   2 +
>  drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c| 266 +
>  drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 ++
>  drivers/net/ethernet/broadcom/bgmac-platform.c | 189 ++
>  drivers/net/ethernet/broadcom/bgmac.c  | 658 +
>  drivers/net/ethernet/broadcom/bgmac.h  | 112 +++-
>  9 files changed, 1097 insertions(+), 516 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,amac.txt
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c
> 


-- 
Florian


Re: XDP seeking input from NIC hardware vendors

2016-07-08 Thread Jakub Kicinski
On Fri, 8 Jul 2016 09:45:25 -0700, John Fastabend wrote:
> The only distinction between VFs and queue groupings on my side is VFs
> provide RSS where as queue groupings have to be selected explicitly.
> In a programmable NIC world the distinction might be lost if a "RSS"
> program can be loaded into the NIC to select queues but for existing
> hardware the distinction is there.

To do BPF RSS we need a way to select the queue which I think is all
Jesper wanted.  So we will have to tackle the queue selection at some
point.  The main obstacle with it for me is to define what queue
selection means when program is not offloaded to HW...  Implementing
queue selection on HW side is trivial.

> If you demux using a eBPF program or via a filter model like
> flow_director or cls_{u32|flower} I think we can support both. And this
> just depends on the programmability of the hardware. Note flow_director
> and cls_{u32|flower} steering to VFs is already in place.

Yes, for steering to VFs we could potentially reuse a lot of existing
infrastructure.

> The question I have is should the "filter" part of the eBPF program
> be a separate program from the XDP program and loaded using specific
> semantics (e.g. "load_hardware_demux" ndo op) at the risk of building
> a ever growing set of "ndo" ops. If you are running multiple XDP
> programs on the same NIC hardware then I think this actually makes
> sense otherwise how would the hardware and even software find the
> "demux" logic. In this model there is a "demux" program that selects
> a queue/VF and a program that runs on the netdev queues.

I don't think we should enforce the separation here.  What we may want
to do before forwarding to the VF can be much more complicated than
pure demux/filtering (simple eg - pop VLAN/tunnel).  VF representative
model works well here as fallback - if program could not be offloaded
it will be run on the host and "trombone" packets via VFR into the VF.

If we have a chain of BPF programs we can order them in increasing
level of complexity/features required and then HW could transparently
offload the first parts - the easier ones - leaving more complex
processing on the host.

This should probably be paired with some sort of "skip-sw" flag to let
user space enforce the HW offload on the fast path part.


[net-next 5/6] libcxgb: export ppm release and tagmask set api

2016-07-08 Thread Varun Prakash
Export cxgbi_ppm_release() to release
ppod manager and cxgbi_tagmask_set() to
set tag mask, they are used by cxgb3i, cxgb4i
and cxgbit.

Signed-off-by: Varun Prakash 
---
 drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c | 2 ++
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 1 +
 drivers/scsi/cxgbi/libcxgbi.c  | 1 +
 drivers/target/iscsi/cxgbit/cxgbit_main.c  | 2 ++
 4 files changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c 
b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
index aa9a9bb..3d6970a 100644
--- a/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
@@ -313,6 +313,7 @@ int cxgbi_ppm_release(struct cxgbi_ppm *ppm)
}
return 1;
 }
+EXPORT_SYMBOL(cxgbi_ppm_release);
 
 static struct cxgbi_ppm_pool *ppm_alloc_cpu_pool(unsigned int *total,
 unsigned int *pcpu_ppmax)
@@ -466,6 +467,7 @@ unsigned int cxgbi_tagmask_set(unsigned int ppmax)
 
return 1 << (bits + PPOD_IDX_SHIFT);
 }
+EXPORT_SYMBOL(cxgbi_tagmask_set);
 
 static int __init libcxgb_init(void)
 {
diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index d9092c0..44a424d 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -1234,6 +1234,7 @@ static int cxgb3i_ddp_init(struct cxgbi_device *cdev)
}
 
ppmax = (uinfo.ulimit - uinfo.llimit + 1) >> PPOD_SIZE_SHIFT;
+   tagmask = cxgbi_tagmask_set(ppmax);
 
pr_info("T3 %s: 0x%x~0x%x, 0x%x, tagmask 0x%x -> 0x%x.\n",
ndev->name, uinfo.llimit, uinfo.ulimit, ppmax, uinfo.tagmask,
diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
index 9d425a7..d142113 100644
--- a/drivers/scsi/cxgbi/libcxgbi.c
+++ b/drivers/scsi/cxgbi/libcxgbi.c
@@ -121,6 +121,7 @@ static inline void cxgbi_device_destroy(struct cxgbi_device 
*cdev)
"cdev 0x%p, p# %u.\n", cdev, cdev->nports);
cxgbi_hbas_remove(cdev);
cxgbi_device_portmap_cleanup(cdev);
+   cxgbi_ppm_release(cdev->cdev2ppm(cdev));
if (cdev->pmap.max_connect)
cxgbi_free_big_mem(cdev->pmap.port_csk);
kfree(cdev);
diff --git a/drivers/target/iscsi/cxgbit/cxgbit_main.c 
b/drivers/target/iscsi/cxgbit/cxgbit_main.c
index 60dccd0..27dd11a 100644
--- a/drivers/target/iscsi/cxgbit/cxgbit_main.c
+++ b/drivers/target/iscsi/cxgbit/cxgbit_main.c
@@ -26,6 +26,8 @@ void _cxgbit_free_cdev(struct kref *kref)
struct cxgbit_device *cdev;
 
cdev = container_of(kref, struct cxgbit_device, kref);
+
+   cxgbi_ppm_release(cdev2ppm(cdev));
kfree(cdev);
 }
 
-- 
2.0.2



[net-next 6/6] cxgb3i,cxgb4i: fix symbol not declared sparse warning

2016-07-08 Thread Varun Prakash
Fix following sparse warnings
warning: symbol 'cxgb3i_ofld_init' was not declared. Should it be static?
warning: symbol 'cxgb4i_cplhandlers' was not declared. Should it be static?
warning: symbol 'cxgb4i_ofld_init' was not declared. Should it be static?

Signed-off-by: Varun Prakash 
---
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c 
b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index 44a424d..40d30bd 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -1028,7 +1028,7 @@ cxgb3_cpl_handler_func cxgb3i_cpl_handlers[NUM_CPL_CMDS] 
= {
  * cxgb3i_ofld_init - allocate and initialize resources for each adapter found
  * @cdev:  cxgbi adapter
  */
-int cxgb3i_ofld_init(struct cxgbi_device *cdev)
+static int cxgb3i_ofld_init(struct cxgbi_device *cdev)
 {
struct t3cdev *t3dev = (struct t3cdev *)cdev->lldev;
struct adap_ports port;
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c 
b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 521f9e4..e4ba2d2 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1503,7 +1503,7 @@ rel_resource_without_clip:
return -EINVAL;
 }
 
-cxgb4i_cplhandler_func cxgb4i_cplhandlers[NUM_CPL_CMDS] = {
+static cxgb4i_cplhandler_func cxgb4i_cplhandlers[NUM_CPL_CMDS] = {
[CPL_ACT_ESTABLISH] = do_act_establish,
[CPL_ACT_OPEN_RPL] = do_act_open_rpl,
[CPL_PEER_CLOSE] = do_peer_close,
@@ -1519,7 +1519,7 @@ cxgb4i_cplhandler_func cxgb4i_cplhandlers[NUM_CPL_CMDS] = 
{
[CPL_RX_DATA] = do_rx_data,
 };
 
-int cxgb4i_ofld_init(struct cxgbi_device *cdev)
+static int cxgb4i_ofld_init(struct cxgbi_device *cdev)
 {
int rc;
 
-- 
2.0.2



[net-next 3/6] cxgb4i,libcxgbi: add iSCSI DDP support

2016-07-08 Thread Varun Prakash
Add iSCSI DDP support in cxgb4i driver
using common iSCSI DDP Page Pod Manager.

Signed-off-by: Varun Prakash 
---
 drivers/scsi/cxgbi/Makefile|   2 +
 drivers/scsi/cxgbi/cxgb3i/Kbuild   |   1 +
 drivers/scsi/cxgbi/cxgb3i/Kconfig  |   1 +
 drivers/scsi/cxgbi/cxgb4i/Kbuild   |   1 +
 drivers/scsi/cxgbi/cxgb4i/Kconfig  |   1 +
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 145 
 drivers/scsi/cxgbi/libcxgbi.c  | 331 +
 drivers/scsi/cxgbi/libcxgbi.h  |  27 ++-
 8 files changed, 507 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/cxgbi/Makefile b/drivers/scsi/cxgbi/Makefile
index 86007e3..a73781a 100644
--- a/drivers/scsi/cxgbi/Makefile
+++ b/drivers/scsi/cxgbi/Makefile
@@ -1,2 +1,4 @@
+ccflags-y += -Idrivers/net/ethernet/chelsio/libcxgb
+
 obj-$(CONFIG_SCSI_CXGB3_ISCSI) += libcxgbi.o cxgb3i/
 obj-$(CONFIG_SCSI_CXGB4_ISCSI) += libcxgbi.o cxgb4i/
diff --git a/drivers/scsi/cxgbi/cxgb3i/Kbuild b/drivers/scsi/cxgbi/cxgb3i/Kbuild
index 961a12f..663c52e 100644
--- a/drivers/scsi/cxgbi/cxgb3i/Kbuild
+++ b/drivers/scsi/cxgbi/cxgb3i/Kbuild
@@ -1,3 +1,4 @@
 ccflags-y += -I$(srctree)/drivers/net/ethernet/chelsio/cxgb3
+ccflags-y += -I$(srctree)/drivers/net/ethernet/chelsio/libcxgb
 
 obj-$(CONFIG_SCSI_CXGB3_ISCSI) += cxgb3i.o
diff --git a/drivers/scsi/cxgbi/cxgb3i/Kconfig 
b/drivers/scsi/cxgbi/cxgb3i/Kconfig
index e460398..f68c871 100644
--- a/drivers/scsi/cxgbi/cxgb3i/Kconfig
+++ b/drivers/scsi/cxgbi/cxgb3i/Kconfig
@@ -5,6 +5,7 @@ config SCSI_CXGB3_ISCSI
select ETHERNET
select NET_VENDOR_CHELSIO
select CHELSIO_T3
+   select CHELSIO_LIB
select SCSI_ISCSI_ATTRS
---help---
  This driver supports iSCSI offload for the Chelsio T3 devices.
diff --git a/drivers/scsi/cxgbi/cxgb4i/Kbuild b/drivers/scsi/cxgbi/cxgb4i/Kbuild
index 3745864..38e03c2 100644
--- a/drivers/scsi/cxgbi/cxgb4i/Kbuild
+++ b/drivers/scsi/cxgbi/cxgb4i/Kbuild
@@ -1,3 +1,4 @@
 ccflags-y += -I$(srctree)/drivers/net/ethernet/chelsio/cxgb4
+ccflags-y += -I$(srctree)/drivers/net/ethernet/chelsio/libcxgb
 
 obj-$(CONFIG_SCSI_CXGB4_ISCSI) += cxgb4i.o
diff --git a/drivers/scsi/cxgbi/cxgb4i/Kconfig 
b/drivers/scsi/cxgbi/cxgb4i/Kconfig
index 8c4e423..594f593 100644
--- a/drivers/scsi/cxgbi/cxgb4i/Kconfig
+++ b/drivers/scsi/cxgbi/cxgb4i/Kconfig
@@ -5,6 +5,7 @@ config SCSI_CXGB4_ISCSI
select ETHERNET
select NET_VENDOR_CHELSIO
select CHELSIO_T4
+   select CHELSIO_LIB
select SCSI_ISCSI_ATTRS
---help---
  This driver supports iSCSI offload for the Chelsio T4 devices.
diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c 
b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
index 2911214..521f9e4 100644
--- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
+++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c
@@ -1543,6 +1543,115 @@ int cxgb4i_ofld_init(struct cxgbi_device *cdev)
return 0;
 }
 
+static inline void
+ulp_mem_io_set_hdr(struct cxgbi_device *cdev,
+  struct ulp_mem_io *req,
+  unsigned int wr_len, unsigned int dlen,
+  unsigned int pm_addr,
+  int tid)
+{
+   struct cxgb4_lld_info *lldi = cxgbi_cdev_priv(cdev);
+   struct ulptx_idata *idata = (struct ulptx_idata *)(req + 1);
+
+   INIT_ULPTX_WR(req, wr_len, 0, tid);
+   req->wr.wr_hi = htonl(FW_WR_OP_V(FW_ULPTX_WR) |
+   FW_WR_ATOMIC_V(0));
+   req->cmd = htonl(ULPTX_CMD_V(ULP_TX_MEM_WRITE) |
+   ULP_MEMIO_ORDER_V(is_t4(lldi->adapter_type)) |
+   T5_ULP_MEMIO_IMM_V(!is_t4(lldi->adapter_type)));
+   req->dlen = htonl(ULP_MEMIO_DATA_LEN_V(dlen >> 5));
+   req->lock_addr = htonl(ULP_MEMIO_ADDR_V(pm_addr >> 5));
+   req->len16 = htonl(DIV_ROUND_UP(wr_len - sizeof(req->wr), 16));
+
+   idata->cmd_more = htonl(ULPTX_CMD_V(ULP_TX_SC_IMM));
+   idata->len = htonl(dlen);
+}
+
+static struct sk_buff *
+ddp_ppod_init_idata(struct cxgbi_device *cdev,
+   struct cxgbi_ppm *ppm,
+   unsigned int idx, unsigned int npods,
+   unsigned int tid)
+{
+   unsigned int pm_addr = (idx << PPOD_SIZE_SHIFT) + ppm->llimit;
+   unsigned int dlen = npods << PPOD_SIZE_SHIFT;
+   unsigned int wr_len = roundup(sizeof(struct ulp_mem_io) +
+   sizeof(struct ulptx_idata) + dlen, 16);
+   struct sk_buff *skb = alloc_wr(wr_len, 0, GFP_ATOMIC);
+
+   if (!skb) {
+   pr_err("%s: %s idx %u, npods %u, OOM.\n",
+  __func__, ppm->ndev->name, idx, npods);
+   return NULL;
+   }
+
+   ulp_mem_io_set_hdr(cdev, (struct ulp_mem_io *)skb->head, wr_len, dlen,
+  pm_addr, tid);
+
+   return skb;
+}
+
+static int ddp_ppod_write_idata(struct cxgbi_ppm *ppm, struct cxgbi_sock *csk,
+   struct cxgbi_task_tag_info *ttinfo,
+   unsigned int idx, unsigned int npods,
+ 

[net-next 2/6] cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support

2016-07-08 Thread Varun Prakash
Remove old ddp code from cxgb3i,cxgb4i,libcxgbi.

Next two commits adds DDP support using
common iSCSI DDP Page Pod Manager.

Signed-off-by: Varun Prakash 
---
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 128 
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 142 -
 drivers/scsi/cxgbi/libcxgbi.c  | 578 -
 drivers/scsi/cxgbi/libcxgbi.h  | 161 ---
 4 files changed, 1009 deletions(-)

diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index e22a268..fda0234 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -1076,65 +1076,6 @@ static inline void ulp_mem_io_set_hdr(struct sk_buff *skb, unsigned int addr)
req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS));
req->cmd_lock_addr = htonl(V_ULP_MEMIO_ADDR(addr >> 5) |
   V_ULPTX_CMD(ULP_MEM_WRITE));
-   req->len = htonl(V_ULP_MEMIO_DATA_LEN(PPOD_SIZE >> 5) |
-V_ULPTX_NFLITS((PPOD_SIZE >> 3) + 1));
-}
-
-static int ddp_set_map(struct cxgbi_sock *csk, struct cxgbi_pagepod_hdr *hdr,
-   unsigned int idx, unsigned int npods,
-   struct cxgbi_gather_list *gl)
-{
-   struct cxgbi_device *cdev = csk->cdev;
-   struct cxgbi_ddp_info *ddp = cdev->ddp;
-   unsigned int pm_addr = (idx << PPOD_SIZE_SHIFT) + ddp->llimit;
-   int i;
-
-   log_debug(1 << CXGBI_DBG_DDP,
-   "csk 0x%p, idx %u, npods %u, gl 0x%p.\n",
-   csk, idx, npods, gl);
-
-   for (i = 0; i < npods; i++, idx++, pm_addr += PPOD_SIZE) {
-   struct sk_buff *skb = alloc_wr(sizeof(struct ulp_mem_io) +
-   PPOD_SIZE, 0, GFP_ATOMIC);
-
-   if (!skb)
-   return -ENOMEM;
-
-   ulp_mem_io_set_hdr(skb, pm_addr);
-   cxgbi_ddp_ppod_set((struct cxgbi_pagepod *)(skb->head +
-   sizeof(struct ulp_mem_io)),
-  hdr, gl, i * PPOD_PAGES_MAX);
-   skb->priority = CPL_PRIORITY_CONTROL;
-   cxgb3_ofld_send(cdev->lldev, skb);
-   }
-   return 0;
-}
-
-static void ddp_clear_map(struct cxgbi_hba *chba, unsigned int tag,
- unsigned int idx, unsigned int npods)
-{
-   struct cxgbi_device *cdev = chba->cdev;
-   struct cxgbi_ddp_info *ddp = cdev->ddp;
-   unsigned int pm_addr = (idx << PPOD_SIZE_SHIFT) + ddp->llimit;
-   int i;
-
-   log_debug(1 << CXGBI_DBG_DDP,
-   "cdev 0x%p, idx %u, npods %u, tag 0x%x.\n",
-   cdev, idx, npods, tag);
-
-   for (i = 0; i < npods; i++, idx++, pm_addr += PPOD_SIZE) {
-   struct sk_buff *skb = alloc_wr(sizeof(struct ulp_mem_io) +
-   PPOD_SIZE, 0, GFP_ATOMIC);
-
-   if (!skb) {
-   pr_err("tag 0x%x, 0x%x, %d/%u, skb OOM.\n",
-   tag, idx, i, npods);
-   continue;
-   }
-   ulp_mem_io_set_hdr(skb, pm_addr);
-   skb->priority = CPL_PRIORITY_CONTROL;
-   cxgb3_ofld_send(cdev->lldev, skb);
-   }
 }
 
 static int ddp_setup_conn_pgidx(struct cxgbi_sock *csk,
@@ -1203,82 +1144,14 @@ static int ddp_setup_conn_digest(struct cxgbi_sock *csk, unsigned int tid,
 }
 
 /**
- * t3_ddp_cleanup - release the cxgb3 adapter's ddp resource
- * @cdev: cxgb3i adapter
- * release all the resource held by the ddp pagepod manager for a given
- * adapter if needed
- */
-
-static void t3_ddp_cleanup(struct cxgbi_device *cdev)
-{
-   struct t3cdev *tdev = (struct t3cdev *)cdev->lldev;
-
-   if (cxgbi_ddp_cleanup(cdev)) {
-   pr_info("t3dev 0x%p, ulp_iscsi no more user.\n", tdev);
-   tdev->ulp_iscsi = NULL;
-   }
-}
-
-/**
  * ddp_init - initialize the cxgb3 adapter's ddp resource
  * @cdev: cxgb3i adapter
  * initialize the ddp pagepod manager for a given adapter
  */
 static int cxgb3i_ddp_init(struct cxgbi_device *cdev)
 {
-   struct t3cdev *tdev = (struct t3cdev *)cdev->lldev;
-   struct cxgbi_ddp_info *ddp = tdev->ulp_iscsi;
-   struct ulp_iscsi_info uinfo;
-   unsigned int pgsz_factor[4];
-   int i, err;
-
-   if (ddp) {
-   kref_get(&ddp->refcnt);
-   pr_warn("t3dev 0x%p, ddp 0x%p already set up.\n",
-   tdev, tdev->ulp_iscsi);
-   cdev->ddp = ddp;
-   return -EALREADY;
-   }
-
-   err = tdev->ctl(tdev, ULP_ISCSI_GET_PARAMS, &uinfo);
-   if (err < 0) {
-   pr_err("%s, failed to get iscsi param err=%d.\n",
-tdev->name, err);
-   return err;
-   }
-
-   err = cxgbi_ddp_init(cdev, uinfo.llimit, uinfo.ulimit,
-   uinfo.max_txsz, uinfo.max_rxsz);
-   if (err < 

[net-next 1/6] libcxgb: add library module for Chelsio drivers

2016-07-08 Thread Varun Prakash
Add common library module (libcxgb.ko) for
Chelsio drivers to remove duplicate code.

Code for iSCSI DDP Page Pod Manager is moved
from cxgb4.ko to libcxgb.ko. Earlier only cxgbit.ko
was using this code; now cxgb3i and cxgb4i will
also use common Page Pod manager code.

In future this module will have common connection
management and hardware specific code that can be
shared by multiple Chelsio drivers.

Signed-off-by: Varun Prakash 
---
 drivers/net/ethernet/chelsio/Kconfig   | 18 ++--
 drivers/net/ethernet/chelsio/Makefile  |  1 +
 drivers/net/ethernet/chelsio/cxgb4/Makefile|  1 -
 drivers/net/ethernet/chelsio/libcxgb/Makefile  |  3 +++
 .../{cxgb4/cxgb4_ppm.c => libcxgb/libcxgb_ppm.c}   | 25 --
 .../{cxgb4/cxgb4_ppm.h => libcxgb/libcxgb_ppm.h}   |  8 +++
 drivers/target/iscsi/cxgbit/Kconfig|  2 +-
 drivers/target/iscsi/cxgbit/Makefile   |  1 +
 drivers/target/iscsi/cxgbit/cxgbit.h   |  2 +-
 9 files changed, 41 insertions(+), 20 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/libcxgb/Makefile
 rename drivers/net/ethernet/chelsio/{cxgb4/cxgb4_ppm.c => libcxgb/libcxgb_ppm.c} (95%)
 rename drivers/net/ethernet/chelsio/{cxgb4/cxgb4_ppm.h => libcxgb/libcxgb_ppm.h} (98%)

diff --git a/drivers/net/ethernet/chelsio/Kconfig b/drivers/net/ethernet/chelsio/Kconfig
index 4686a85..1a5ce1e 100644
--- a/drivers/net/ethernet/chelsio/Kconfig
+++ b/drivers/net/ethernet/chelsio/Kconfig
@@ -96,17 +96,6 @@ config CHELSIO_T4_DCB
 
  If unsure, say N.
 
-config CHELSIO_T4_UWIRE
-   bool "Unified Wire Support for Chelsio T5 cards"
-   default n
-   depends on CHELSIO_T4
-   ---help---
- Enable unified-wire offload features.
- Say Y here if you want to enable unified-wire over Ethernet
- in the driver.
-
- If unsure, say N.
-
 config CHELSIO_T4_FCOE
bool "Fibre Channel over Ethernet (FCoE) Support for Chelsio T5 cards"
default n
@@ -137,4 +126,11 @@ config CHELSIO_T4VF
  To compile this driver as a module choose M here; the module
  will be called cxgb4vf.
 
+config CHELSIO_LIB
+   tristate "Chelsio common library"
+   default n
+   ---help---
+ This is common library module for Chelsio T3/T4/T5/T6
+ drivers.
+
 endif # NET_VENDOR_CHELSIO
diff --git a/drivers/net/ethernet/chelsio/Makefile b/drivers/net/ethernet/chelsio/Makefile
index 390510b..b6a5eec 100644
--- a/drivers/net/ethernet/chelsio/Makefile
+++ b/drivers/net/ethernet/chelsio/Makefile
@@ -6,3 +6,4 @@ obj-$(CONFIG_CHELSIO_T1) += cxgb/
 obj-$(CONFIG_CHELSIO_T3) += cxgb3/
 obj-$(CONFIG_CHELSIO_T4) += cxgb4/
 obj-$(CONFIG_CHELSIO_T4VF) += cxgb4vf/
+obj-$(CONFIG_CHELSIO_LIB) += libcxgb/
diff --git a/drivers/net/ethernet/chelsio/cxgb4/Makefile b/drivers/net/ethernet/chelsio/cxgb4/Makefile
index 85c9282..ace0ab9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/Makefile
+++ b/drivers/net/ethernet/chelsio/cxgb4/Makefile
@@ -7,5 +7,4 @@ obj-$(CONFIG_CHELSIO_T4) += cxgb4.o
 cxgb4-objs := cxgb4_main.o l2t.o t4_hw.o sge.o clip_tbl.o cxgb4_ethtool.o
 cxgb4-$(CONFIG_CHELSIO_T4_DCB) +=  cxgb4_dcb.o
 cxgb4-$(CONFIG_CHELSIO_T4_FCOE) +=  cxgb4_fcoe.o
-cxgb4-$(CONFIG_CHELSIO_T4_UWIRE) +=  cxgb4_ppm.o
 cxgb4-$(CONFIG_DEBUG_FS) += cxgb4_debugfs.o
diff --git a/drivers/net/ethernet/chelsio/libcxgb/Makefile b/drivers/net/ethernet/chelsio/libcxgb/Makefile
new file mode 100644
index 000..2362230
--- /dev/null
+++ b/drivers/net/ethernet/chelsio/libcxgb/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_CHELSIO_LIB) += libcxgb.o
+
+libcxgb-y := libcxgb_ppm.o
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ppm.c b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
similarity index 95%
rename from drivers/net/ethernet/chelsio/cxgb4/cxgb4_ppm.c
rename to drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
index d88a7a7..aa9a9bb 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_ppm.c
+++ b/drivers/net/ethernet/chelsio/libcxgb/libcxgb_ppm.c
@@ -1,5 +1,5 @@
 /*
- * cxgb4_ppm.c: Chelsio common library for T4/T5 iSCSI PagePod Manager
+ * libcxgb_ppm.c: Chelsio common library for T3/T4/T5 iSCSI PagePod Manager
  *
  * Copyright (c) 2016 Chelsio Communications, Inc. All rights reserved.
  *
@@ -10,6 +10,10 @@
  * Written by: Karen Xie (k...@chelsio.com)
  */
 
+#define DRV_NAME "libcxgb"
+#define DRV_VERSION "1.0.0-ko"
+#define pr_fmt(fmt) DRV_NAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -22,7 +26,7 @@
 #include 
 #include 
 
-#include "cxgb4_ppm.h"
+#include "libcxgb_ppm.h"
 
 /* Direct Data Placement -
  * Directly place the iSCSI Data-In or Data-Out PDU's payload into
@@ -462,3 +466,20 @@ unsigned int cxgbi_tagmask_set(unsigned int ppmax)
 
return 1 << (bits + PPOD_IDX_SHIFT);
 }
+
+static int __init libcxgb_init(void)
+{
+   return 0;
+}
+
+static void __exit libcxgb_exit(void)
+{
+}
+
+module_init(libcxgb_init);
+module_exit(libcxgb_exit);

[net-next 4/6] cxgb3i: add iSCSI DDP support

2016-07-08 Thread Varun Prakash
Add iSCSI DDP support in cxgb3i driver
using common iSCSI DDP Page Pod Manager.

Signed-off-by: Varun Prakash 
---
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 119 -
 1 file changed, 118 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index fda0234..d9092c0 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -1076,6 +1076,70 @@ static inline void ulp_mem_io_set_hdr(struct sk_buff *skb, unsigned int addr)
req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_BYPASS));
req->cmd_lock_addr = htonl(V_ULP_MEMIO_ADDR(addr >> 5) |
   V_ULPTX_CMD(ULP_MEM_WRITE));
+   req->len = htonl(V_ULP_MEMIO_DATA_LEN(IPPOD_SIZE >> 5) |
+V_ULPTX_NFLITS((IPPOD_SIZE >> 3) + 1));
+}
+
+static struct cxgbi_ppm *cdev2ppm(struct cxgbi_device *cdev)
+{
+   return ((struct t3cdev *)cdev->lldev)->ulp_iscsi;
+}
+
+static int ddp_set_map(struct cxgbi_ppm *ppm, struct cxgbi_sock *csk,
+  struct cxgbi_task_tag_info *ttinfo)
+{
+   unsigned int idx = ttinfo->idx;
+   unsigned int npods = ttinfo->npods;
+   struct scatterlist *sg = ttinfo->sgl;
+   struct cxgbi_pagepod *ppod;
+   struct ulp_mem_io *req;
+   unsigned int sg_off;
+   unsigned int pm_addr = (idx << PPOD_SIZE_SHIFT) + ppm->llimit;
+   int i;
+
+   for (i = 0; i < npods; i++, idx++, pm_addr += IPPOD_SIZE) {
+   struct sk_buff *skb = alloc_wr(sizeof(struct ulp_mem_io) +
+  IPPOD_SIZE, 0, GFP_ATOMIC);
+
+   if (!skb)
+   return -ENOMEM;
+   ulp_mem_io_set_hdr(skb, pm_addr);
+   req = (struct ulp_mem_io *)skb->head;
+   ppod = (struct cxgbi_pagepod *)(req + 1);
+   sg_off = i * PPOD_PAGES_MAX;
+   cxgbi_ddp_set_one_ppod(ppod, ttinfo, &sg,
+  &sg_off);
+   skb->priority = CPL_PRIORITY_CONTROL;
+   cxgb3_ofld_send(ppm->lldev, skb);
+   }
+   return 0;
+}
+
+static void ddp_clear_map(struct cxgbi_device *cdev, struct cxgbi_ppm *ppm,
+ struct cxgbi_task_tag_info *ttinfo)
+{
+   unsigned int idx = ttinfo->idx;
+   unsigned int pm_addr = (idx << PPOD_SIZE_SHIFT) + ppm->llimit;
+   unsigned int npods = ttinfo->npods;
+   int i;
+
+   log_debug(1 << CXGBI_DBG_DDP,
+ "cdev 0x%p, clear idx %u, npods %u.\n",
+ cdev, idx, npods);
+
+   for (i = 0; i < npods; i++, idx++, pm_addr += IPPOD_SIZE) {
+   struct sk_buff *skb = alloc_wr(sizeof(struct ulp_mem_io) +
+  IPPOD_SIZE, 0, GFP_ATOMIC);
+
+   if (!skb) {
+   pr_err("cdev 0x%p, clear ddp, %u,%d/%u, skb OOM.\n",
+  cdev, idx, i, npods);
+   continue;
+   }
+   ulp_mem_io_set_hdr(skb, pm_addr);
+   skb->priority = CPL_PRIORITY_CONTROL;
+   cxgb3_ofld_send(ppm->lldev, skb);
+   }
 }
 
 static int ddp_setup_conn_pgidx(struct cxgbi_sock *csk,
@@ -1144,14 +1208,67 @@ static int ddp_setup_conn_digest(struct cxgbi_sock *csk, unsigned int tid,
 }
 
 /**
- * ddp_init - initialize the cxgb3 adapter's ddp resource
+ * cxgb3i_ddp_init - initialize the cxgb3 adapter's ddp resource
  * @cdev: cxgb3i adapter
  * initialize the ddp pagepod manager for a given adapter
  */
 static int cxgb3i_ddp_init(struct cxgbi_device *cdev)
 {
+   struct t3cdev *tdev = (struct t3cdev *)cdev->lldev;
+   struct net_device *ndev = cdev->ports[0];
+   struct cxgbi_tag_format tformat;
+   unsigned int ppmax, tagmask;
+   struct ulp_iscsi_info uinfo;
+   int i, err;
+
+   err = tdev->ctl(tdev, ULP_ISCSI_GET_PARAMS, &uinfo);
+   if (err < 0) {
+   pr_err("%s, failed to get iscsi param %d.\n",
+  ndev->name, err);
+   return err;
+   }
+   if (uinfo.llimit >= uinfo.ulimit) {
+   pr_warn("T3 %s, iscsi NOT enabled %u ~ %u!\n",
+   ndev->name, uinfo.llimit, uinfo.ulimit);
+   return -EACCES;
+   }
+
+   ppmax = (uinfo.ulimit - uinfo.llimit + 1) >> PPOD_SIZE_SHIFT;
+
+   pr_info("T3 %s: 0x%x~0x%x, 0x%x, tagmask 0x%x -> 0x%x.\n",
+   ndev->name, uinfo.llimit, uinfo.ulimit, ppmax, uinfo.tagmask,
+   tagmask);
+
+   memset(&tformat, 0, sizeof(struct cxgbi_tag_format));
+   for (i = 0; i < 4; i++)
+   tformat.pgsz_order[i] = uinfo.pgsz_factor[i];
+   cxgbi_tagmask_check(tagmask, &tformat);
+
+   cxgbi_ddp_ppm_setup(&tdev->ulp_iscsi, cdev, &tformat, ppmax,
+   uinfo.llimit, uinfo.llimit, 0);
+   if (!(cdev->flags & CXGBI_FLAG_DDP_OFF)) {
+

[net-next 0/6] common library for Chelsio drivers

2016-07-08 Thread Varun Prakash
Hi,

 This patch series adds common library module (libcxgb.ko)
for Chelsio drivers to remove duplicate code.

This series moves the common iSCSI DDP Page Pod manager
code from cxgb4.ko to libcxgb.ko. Earlier this code
was used only by cxgbit.ko; now it is used by
three Chelsio iSCSI drivers: cxgb3i, cxgb4i, and cxgbit.

In future this module will have common connection
management and hardware specific code that can
be shared by multiple Chelsio drivers(cxgb4,
csiostor, iw_cxgb4, cxgb4i, cxgbit).

Please review.

Thanks
Varun

Varun Prakash (6):
  libcxgb: add library module for Chelsio drivers
  cxgb3i,cxgb4i,libcxgbi: remove iSCSI DDP support
  cxgb4i,libcxgbi: add iSCSI DDP support
  cxgb3i: add iSCSI DDP support
  libcxgb: export ppm release and tagmask set api
  cxgb3i,cxgb4i: fix symbol not declared sparse warning

 drivers/net/ethernet/chelsio/Kconfig   |  18 +-
 drivers/net/ethernet/chelsio/Makefile  |   1 +
 drivers/net/ethernet/chelsio/cxgb4/Makefile|   1 -
 drivers/net/ethernet/chelsio/libcxgb/Makefile  |   3 +
 .../{cxgb4/cxgb4_ppm.c => libcxgb/libcxgb_ppm.c}   |  27 +-
 .../{cxgb4/cxgb4_ppm.h => libcxgb/libcxgb_ppm.h}   |   8 +-
 drivers/scsi/cxgbi/Makefile|   2 +
 drivers/scsi/cxgbi/cxgb3i/Kbuild   |   1 +
 drivers/scsi/cxgbi/cxgb3i/Kconfig  |   1 +
 drivers/scsi/cxgbi/cxgb3i/cxgb3i.c | 164 +++--
 drivers/scsi/cxgbi/cxgb4i/Kbuild   |   1 +
 drivers/scsi/cxgbi/cxgb4i/Kconfig  |   1 +
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 203 +++---
 drivers/scsi/cxgbi/libcxgbi.c  | 734 +++--
 drivers/scsi/cxgbi/libcxgbi.h  | 188 +-
 drivers/target/iscsi/cxgbit/Kconfig|   2 +-
 drivers/target/iscsi/cxgbit/Makefile   |   1 +
 drivers/target/iscsi/cxgbit/cxgbit.h   |   2 +-
 drivers/target/iscsi/cxgbit/cxgbit_main.c  |   2 +
 19 files changed, 500 insertions(+), 860 deletions(-)
 create mode 100644 drivers/net/ethernet/chelsio/libcxgb/Makefile
 rename drivers/net/ethernet/chelsio/{cxgb4/cxgb4_ppm.c => libcxgb/libcxgb_ppm.c} (95%)
 rename drivers/net/ethernet/chelsio/{cxgb4/cxgb4_ppm.h => libcxgb/libcxgb_ppm.h} (98%)

-- 
2.0.2



Re: [iproute PATCH 0/2] Netns performance improvements

2016-07-08 Thread Rick Jones

On 07/08/2016 01:01 AM, Nicolas Dichtel wrote:

Those 300 routers will each have at least one namespace along with the dhcp
namespaces.  Depending on the nature of the routers (Distributed versus
Centralized Virtual Routers - DVR vs CVR) and whether the routers are supposed
to be "HA" there can be more than one namespace for a given router.

300 routers is far from the upper limit/goal.  Back in HP Public Cloud, we were
running as many as 700 routers per network node (*), and more than four network
nodes. (back then it was just the one namespace per router and network). Mileage
will of course vary based on the "oomph" of one's network node(s).

Thank you for the details.

Do you have a script or something else to easily reproduce this problem?


Do you mean for my much older, slightly different stuff done in HP 
Public Cloud, or for what Phil (?) is doing presently?  I believe Phil 
posted something several messages back in the thread.


happy benchmarking,

rick jones


Re: [PATCH net-next 2/4] udp offload: allow GRO on 0 checksum packets

2016-07-08 Thread Alexander Duyck
On Fri, Jul 8, 2016 at 9:56 AM, Hannes Frederic Sowa wrote:
> On 08.07.2016 12:46, Alexander Duyck wrote:
>> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
>>> currently, UDP packets with zero checksum are not allowed to
>>> use udp offload's GRO. This patch admits such packets to
>>> GRO, if the related socket settings allow it: ipv6 packets
>>> are not admitted if the sockets don't have the no_check6_rx
>>> flag set.
>>>
>>> Signed-off-by: Paolo Abeni 
>>> ---
>>>  net/ipv4/udp_offload.c | 6 +-
>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>> index 9c37338..ac783f4 100644
>>> --- a/net/ipv4/udp_offload.c
>>> +++ b/net/ipv4/udp_offload.c
>>> @@ -257,7 +257,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>>> struct sock *sk;
>>>
>>> if (NAPI_GRO_CB(skb)->encap_mark ||
>>> -   (skb->ip_summed != CHECKSUM_PARTIAL &&
>>> +   (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
>>>  NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>>  !NAPI_GRO_CB(skb)->csum_valid))
>>> goto out;
>>
>> So now all zero checksum UDP traffic will be targeted for GRO if I am
>> understanding this right.  Have you looked into how much of an impact
>> this might have on performance for non-tunnel UDP traffic using a zero
>> checksum?  I'm thinking it will be negative.  The issue is you are now
>> going to be performing an extra socket lookup for all incoming UDP
>> frames that have a zero checksum.
>
> Are zero checksummed UDP protocols rare and only happen in case where we
> have tunneling protocols, which need the socket lookup anyway? That
> said, we haven't really focused on the impact here and thought it
> shouldn't matter to try to speed up zero-checksum UDP protocols over
> non-zero ones.

I'm not sure how rare they are, but I know they are used for more than
just tunnels, especially in the case of IPv4.  What I suspect will
happen with this being implemented is that we will end up with all
sorts of people coming forward complaining about performance
regressions when we add an extra socket lookup to their fast-path.
I'm sure Jesper's pktgen tests would show some negatives with
something like this as pktgen uses a 0 UDP checksum as I recall.
However I would suspect he probably runs such tests with GRO already
disabled.

>> Also in the line below this line we are setting the encap_mark.  That
>> will probably need to be moved down to the point just before we call
>> gro_receive so that we can avoid setting unneeded data in the
>> NAPI_GRO_CB.
>
> Ack.
>
>>> @@ -271,6 +271,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>>> if (!sk || !udp_sk(sk)->gro_receive)
>>> goto out_unlock;
>>>
>>> +   if (!uh->check && skb->protocol == cpu_to_be16(ETH_P_IPV6) &&
>>> +   !udp_sk(sk)->no_check6_rx)
>>> +   goto out_unlock;
>>> +
>>> flush = 0;
>>>
>>> for (p = *head; p; p = p->next) {
>>
>> So I am pretty sure this check doesn't pass the sniff test.
>> Specifically I don't believe you can use skb->protocol like you
>> currently are as it could be an 8021q frame for instance that is being
>> aggregated so the skb->protocol would be invalid.  I think what you
>> should probably be using is NAPI_GRO_CB(skb)->is_ipv6 although it
>> occurs to me that in the case of tunnels I don't know if that value is
>> being reset for IPv4 like it should be.

I just looked at the function and verified the v4 path for UDP GRO is
setting this to zero as it should.

> Thanks, we probably should switch to sk->sk_family (we don't allow dual
> family sockets with tunnel drivers so far)?

I don't know what the situation there is.  I just know that for v4 vs
v6 UDP the NAPI_GRO_CB has a field called is_ipv6 which is populated
just before calling into the tunnel GRO path.  If you use that you can
guarantee that you are looking at the right type for the protocol
instead of guessing at it based on skb->protocol.

- Alex


Re: [PATCH v2 1/2] libxt_hashlimit: Prepare libxt_hashlimit.c for revision 2

2016-07-08 Thread Vishwanath Pai
On 07/08/2016 12:54 PM, Vishwanath Pai wrote:
> On 07/08/2016 12:37 PM, David Laight wrote:
>> If you think some users would still want 32bit limits, then you should
>> (probably) use a _64 suffix for the new functions.
>>
>>  David
> 
> I am proposing a new revision for hashlimit that supports a higher rate
> along with a few other changes/fixes (in separate patches). Hence the
> prefix _v2 for the new functions.
> 
> - Vish
> 

Sorry, I meant _v1 for the old functions; the new functions (for rev 2) don't
have a suffix.


RE: [Intel-wired-lan] [PATCH] (resend) ixgbe: always initialize setup_fc

2016-07-08 Thread Tantilov, Emil S
>-Original Message-
>From: Patrick McLean [mailto:patri...@gaikai.com]
>Sent: Friday, July 08, 2016 9:45 AM
>To: zhuyj 
>Cc: Tantilov, Emil S ; Rustad, Mark D
>; netdev ; intel-wired-lan
>
>Subject: Re: [Intel-wired-lan] [PATCH] (resend) ixgbe: always initialize
>setup_fc
>
>How about just initializing it when the rest of the struct is
>initialized? This is what is done for every other model.

That works as well.

Thanks,
Emil



Re: [PATCH net-next 2/4] udp offload: allow GRO on 0 checksum packets

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 12:46, Alexander Duyck wrote:
> On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
>> currently, UDP packets with zero checksum are not allowed to
>> use udp offload's GRO. This patch admits such packets to
>> GRO, if the related socket settings allow it: ipv6 packets
>> are not admitted if the sockets don't have the no_check6_rx
>> flag set.
>>
>> Signed-off-by: Paolo Abeni 
>> ---
>>  net/ipv4/udp_offload.c | 6 +-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index 9c37338..ac783f4 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -257,7 +257,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>> struct sock *sk;
>>
>> if (NAPI_GRO_CB(skb)->encap_mark ||
>> -   (skb->ip_summed != CHECKSUM_PARTIAL &&
>> +   (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
>>  NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>  !NAPI_GRO_CB(skb)->csum_valid))
>> goto out;
> 
> So now all zero checksum UDP traffic will be targeted for GRO if I am
> understanding this right.  Have you looked into how much of an impact
> this might have on performance for non-tunnel UDP traffic using a zero
> checksum?  I'm thinking it will be negative.  The issue is you are now
> going to be performing an extra socket lookup for all incoming UDP
> frames that have a zero checksum.

Are zero checksummed UDP protocols rare and only happen in case where we
have tunneling protocols, which need the socket lookup anyway? That
said, we haven't really focused on the impact here and thought it
shouldn't matter to try to speed up zero-checksum UDP protocols over
non-zero ones.

> Also in the line below this line we are setting the encap_mark.  That
> will probably need to be moved down to the point just before we call
> gro_receive so that we can avoid setting unneeded data in the
> NAPI_GRO_CB.

Ack.

>> @@ -271,6 +271,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, struct sk_buff *skb,
>> if (!sk || !udp_sk(sk)->gro_receive)
>> goto out_unlock;
>>
>> +   if (!uh->check && skb->protocol == cpu_to_be16(ETH_P_IPV6) &&
>> +   !udp_sk(sk)->no_check6_rx)
>> +   goto out_unlock;
>> +
>> flush = 0;
>>
>> for (p = *head; p; p = p->next) {
> 
> So I am pretty sure this check doesn't pass the sniff test.
> Specifically I don't believe you can use skb->protocol like you
> currently are as it could be an 8021q frame for instance that is being
> aggregated so the skb->protocol would be invalid.  I think what you
> should probably be using is NAPI_GRO_CB(skb)->is_ipv6 although it
> occurs to me that in the case of tunnels I don't know if that value is
> being reset for IPv4 like it should be.

Thanks, we probably should switch to sk->sk_family (we don't allow dual
family sockets with tunnel drivers so far)?

Bye,
Hannes



Re: [PATCH v2 1/2] libxt_hashlimit: Prepare libxt_hashlimit.c for revision 2

2016-07-08 Thread Vishwanath Pai
On 07/08/2016 12:37 PM, David Laight wrote:
> If you think some users would still want 32bit limits, then you should
> (probably) use a _64 suffix for the new functions.
> 
>   David

I am proposing a new revision for hashlimit that supports a higher rate
along with a few other changes/fixes (in separate patches). Hence the
prefix _v2 for the new functions.

- Vish


Re: Resurrecting due to huge ipoib perf regression - [BUG] skb corruption and kernel panic at forwarding with fragmentation

2016-07-08 Thread Jason Gunthorpe
On Fri, Jul 08, 2016 at 07:18:11AM -0700, Roland Dreier wrote:
> On Thu, Jul 7, 2016 at 4:14 PM, Jason Gunthorpe wrote:
> > We have neighbour_priv, and ndo_neigh_construct/destruct now ..
> >
> > A first blush that would seem to be enough to let ipoib store the AH
> > and other path information in the neigh and avoid the cb? At least the
> > example in clip sure looks like what ipoib needs to do.
> 
> Do you think those new facilities let us go back to using the neigh
> and still avoid the issues that led to commit b63b70d87741 ("IPoIB:
> Use a private hash table for path lookup in xmit path")?

Well, the priv stuff were brought up in the discussion around
b63b70d87741 but never fully analyzed. Maybe it could have been used
to solve that problem, who knows.. I guess it doesn't help this exact
issue because we don't have a dst at hard header time anyhow.

But, DaveM suggested how to handle our current problem in the above thread:

http://marc.info/?l=linux-rdma&m=132813323907877&w=2

Which is the same route CLIP took:

331 struct dst_entry *dst = skb_dst(skb);
347 rt = (struct rtable *) dst;
348 if (rt->rt_gateway)
349 daddr = &rt->rt_gateway;
350 else
351 daddr = &ip_hdr(skb)->daddr;
352 n = dst_neigh_lookup(dst, daddr);

(DaveM said it should be &ip/ipv6_hdr(skb)->daddr, not the rtable cast)

Last time this was brought up you were concerned about ARP; ARP
sets skb_dst after calling dev_hard_header:

310 skb = arp_create(type, ptype, dest_ip, dev, src_ip,
311  dest_hw, src_hw, target_hw);
312 if (!skb)
313 return;
314
315 skb_dst_set(skb, dst_clone(dst));

However, there is at least one fringe case (arp_send) where the dst is
left NULL. Presumably there are other fringe cases too..

So, it appears, the dst and neigh can be used for all performance cases.

For the non performance dst == null case, can we just burn cycles and
stuff the daddr in front of the packet at hardheader time, even if we
have to copy?

Jason


Re: [PATCH v6 12/12] net/mlx4_en: add prefetch in xdp rx path

2016-07-08 Thread Brenden Blanco
On Fri, Jul 08, 2016 at 08:56:45AM +0200, Eric Dumazet wrote:
> On Thu, 2016-07-07 at 21:16 -0700, Alexei Starovoitov wrote:
> 
> > I've tried this style of prefetching in the past for normal stack
> > and it didn't help at all.
> 
> This is very nice, but my experience showed opposite numbers.
> So I guess you did not choose the proper prefetch strategy.
> 
> prefetching in mlx4 gave me good results, once I made sure our compiler
> was not moving the actual prefetch operations on x86_64 (ie forcing use
> of asm volatile as in x86_32 instead of the builtin prefetch). You might
> check if your compiler does the proper thing because this really hurt me
> in the past.
> 
> In my case, I was using 40Gbit NIC, and prefetching 128 bytes instead of
> 64 bytes allowed to remove one stall in GRO engine when using TCP with
> TS (total header size : 66 bytes), or tunnels.
> 
> The problem with prefetch is that it works well assuming a given rate
> (in pps), and given cpus, as prefetch behavior is varying among flavors.
> 
> Brenden chose to prefetch N+3, based on some experiments, on some
> hardware,
> 
> prefetch N+3 can actually slow down if you receive a moderate load,
> which is the case 99% of the time in typical workloads on modern servers
> with multi queue NIC.
Thanks for the feedback Eric!

This particular patch in the series is meant to be standalone exactly
for this reason. I don't pretend to assert that this optimization will
work for everybody, or even for a future version of me with different
hardware. But, it passes my internal criteria for usefulness:
1. It provides a measurable gain in the experiments that I have at hand
2. The code is easy to review
3. The change does not negatively impact non-XDP users

I would love to have a solution for all mlx4 driver users, but this
patch set is focused on a different goal. So, without munging a
different set of changes for the universal use case, and probably
violating criteria #2 or #3, I went with what you see.

In hopes of not derailing the whole patch series, what is an actionable
next step for this patch #12?
Ideas:
Pick a safer N? (I saw improvements with N=1 as well)
Drop this patch?

One thing I definitely don't want to do is go into the weeds trying to
get a universal prefetch logic in order to merge the XDP framework, even
though I agree the net result would benefit everybody.
> 
> This is why it was hard to upstream such changes, because they focus on
> max throughput instead of low latencies.
> 
> 
> 


Re: XDP seeking input from NIC hardware vendors

2016-07-08 Thread John Fastabend
On 16-07-08 09:07 AM, Jakub Kicinski wrote:
> On Fri, 8 Jul 2016 17:19:43 +0200, Jesper Dangaard Brouer wrote:
>> On Fri, 8 Jul 2016 14:44:53 +0100 Jakub Kicinski wrote:
>>> On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
> If the goal is to just separate XDP traffic from non-XDP traffic
> you could accomplish this with a combination of SR-IOV/macvlan to
> separate the device queues into multiple netdevs and then run XDP
> on just one of the netdevs. Then use flow director (ethtool) or
> 'tc cls_u32/flower' to steer traffic to the netdev. This is how
> we support multiple networking stacks on one device; by the way, it
> is called the bifurcated driver. It's not too far of a stretch to
> think we could offload some simple XDP programs to program the
> splitting of traffic instead of cls_u32/flower/flow_director and
> then you would have a stack of XDP programs. One running in
> hardware and a set running on the queues in software.  


 the above sounds like a much better approach than Jesper's/my
 prog_per_ring stuff.

 If we can split the nic via sriov and have dedicated netdev via VF
 just for XDP that's way cleaner approach. I guess we won't need to
 do xdp_rxqmask after all.
>>>
>>> +1
>>>
>>> I was thinking about using eBPF to direct to NIC queues but concluded
>>> that doing a redirect to a VF is cleaner.  Especially if the PF driver
>>> supports VF representatives we could potentially just use
>>> bpf_redirect(VFR netdev) and the VF doesn't even have to be handled by
>>> the same stack.  
>>
>> I actually disagree.
>>
>> I _do_ want to use the "filter" part of eBPF to direct to NIC queues, and
>> then run a single/specific XDP program on that queue.
>>
>> Why do I want this?
>>
>> This is part of solving a very fundamental CS problem (early demux) when
>> wanting to support Zero-copy on RX.  The basic problem is that the NIC
>> driver needs to map RX pages into the RX ring prior to receiving
>> packets. Thus, we need HW support to steer packets, to gain enough
>> isolation (e.g. between tenant domains) to allow zero-copy.
>>
>>
>> Based on the flexibility of the HW-filter, the granularity achievable
>> for isolation (e.g. application-specific) is much more flexible than
>> splitting up the entire NIC with SR-IOV, VFs or macvlans.
> 
> I think of SR-IOV VFs as a way of grouping queues.  If HW is capable of
> directing to a queue it's usually capable of directing to a VF as well.
> And the VF could have all other traffic disabled so you would get only
> packets directed to it by the (BPF) filter - same as you would for the
> queue.  Does that make sense for zero copy apps?
> 

The only distinction between VFs and queue groupings on my side is that
VFs provide RSS, whereas queue groupings have to be selected explicitly.
In a programmable NIC world the distinction might be lost if an "RSS"
program can be loaded into the NIC to select queues, but for existing
hardware the distinction is there.

If you demux using an eBPF program or via a filter model like
flow_director or cls_{u32|flower}, I think we can support both. And this
just depends on the programmability of the hardware. Note that
flow_director and cls_{u32|flower} steering to VFs is already in place.

The question I have is: should the "filter" part of the eBPF program
be a separate program from the XDP program, loaded using specific
semantics (e.g. a "load_hardware_demux" ndo op), at the risk of building
an ever-growing set of "ndo" ops? If you are running multiple XDP
programs on the same NIC hardware then I think this actually makes
sense; otherwise, how would the hardware (and even software) find the
"demux" logic? In this model there is a "demux" program that selects
a queue/VF and a program that runs on the netdev queues.

Any thoughts?

.John


Re: [PATCH net-next 2/4] udp offload: allow GRO on 0 checksum packets

2016-07-08 Thread Alexander Duyck
On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
> currently, UDP packets with zero checksum are not allowed to
> use udp offload's GRO. This patch admits such packets to
> GRO, if the related socket settings allow it: ipv6 packets
> are not admitted if the sockets don't have the no_check6_rx
> flag set.
>
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv4/udp_offload.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 9c37338..ac783f4 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -257,7 +257,7 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, 
> struct sk_buff *skb,
> struct sock *sk;
>
> if (NAPI_GRO_CB(skb)->encap_mark ||
> -   (skb->ip_summed != CHECKSUM_PARTIAL &&
> +   (uh->check && skb->ip_summed != CHECKSUM_PARTIAL &&
>  NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>  !NAPI_GRO_CB(skb)->csum_valid))
> goto out;

So now all zero checksum UDP traffic will be targeted for GRO if I am
understanding this right.  Have you looked into how much of an impact
this might have on performance for non-tunnel UDP traffic using a zero
checksum?  I'm thinking it will be negative.  The issue is you are now
going to be performing an extra socket lookup for all incoming UDP
frames that have a zero checksum.

Also in the line below this line we are setting the encap_mark.  That
will probably need to be moved down to the point just before we call
gro_receive so that we can avoid setting unneeded data in the
NAPI_GRO_CB.

> @@ -271,6 +271,10 @@ struct sk_buff **udp_gro_receive(struct sk_buff **head, 
> struct sk_buff *skb,
> if (!sk || !udp_sk(sk)->gro_receive)
> goto out_unlock;
>
> +   if (!uh->check && skb->protocol == cpu_to_be16(ETH_P_IPV6) &&
> +   !udp_sk(sk)->no_check6_rx)
> +   goto out_unlock;
> +
> flush = 0;
>
> for (p = *head; p; p = p->next) {

So I am pretty sure this check doesn't pass the sniff test.
Specifically I don't believe you can use skb->protocol like you
currently are as it could be an 8021q frame for instance that is being
aggregated so the skb->protocol would be invalid.  I think what you
should probably be using is NAPI_GRO_CB(skb)->is_ipv6 although it
occurs to me that in the case of tunnels I don't know if that value is
being reset for IPv4 like it should be.

- Alex


Re: [Intel-wired-lan] [PATCH] (resend) ixgbe: always initialize setup_fc

2016-07-08 Thread Patrick McLean
How about just initializing it when the rest of the struct is
initialized? This is what is done for every other model.

On Fri, Jul 8, 2016 at 2:47 AM, zhuyj  wrote:
> Sure. setup_fc should not be null. Emil, your patch can fix it well.
>
> On Fri, Jul 8, 2016 at 8:18 AM, Tantilov, Emil S 
> wrote:
>>
>> >-Original Message-
>> >From: Intel-wired-lan [mailto:intel-wired-lan-boun...@lists.osuosl.org]
>> > On
>> >Behalf Of Rustad, Mark D
>> >Sent: Wednesday, July 06, 2016 4:01 PM
>> >To: Patrick McLean 
>> >Cc: netdev ; intel-wired-lan > >l...@lists.osuosl.org>
>> >Subject: Re: [Intel-wired-lan] [PATCH] (resend) ixgbe: always initialize
>> >setup_fc
>> >
>> >Patrick McLean  wrote:
>> >
>> >> Gmail mangled my first message, sorry about that. Second attempt.
>> >>
>> >> In ixgbe_init_mac_link_ops_X550em, the code has a special case for
>> >> backplane media type, but does not fall through to the default case,
>> >> so the setup_fc never gets initialized. This causes a panic when it
>> >> later tries to set up the card, and the kernel dereferences the null
>> >> pointer.
>> >>
>> >> This patch lets the function fall through, which initializes
>> >> setup_fc properly.
>> >
>> >I don't think that this is the right fix. My memory is that fc autoneg is
>>
>> setup_fc() does not configure FC autoneg and it should always be set.
>>
>> I posted an alternative patch that simply sets setup_fc at the beginning
>> of
>> the function. The fall-through in the switch statement is not a good
>> solution
>> because it won't work in case we need to add another case.
>>
>> http://patchwork.ozlabs.org/patch/646228/
>>
>> Thanks,
>> Emil
>>
>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
index 19b75cd..cfc814a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c
@@ -2915,7 +2915,7 @@ static const struct ixgbe_mac_operations mac_ops_X550EM_x = {
 	.acquire_swfw_sync	= &ixgbe_acquire_swfw_sync_X550em,
 	.release_swfw_sync	= &ixgbe_release_swfw_sync_X550em,
 	.init_swfw_sync		= &ixgbe_init_swfw_sync_X540,
-	.setup_fc		= NULL, /* defined later */
+	.setup_fc		= ixgbe_setup_fc_x550em,
 	.read_iosf_sb_reg	= ixgbe_read_iosf_sb_reg_x550,
 	.write_iosf_sb_reg	= ixgbe_write_iosf_sb_reg_x550,
 };


RE: [PATCH v2 1/2] libxt_hashlimit: Prepare libxt_hashlimit.c for revision 2

2016-07-08 Thread David Laight
From: Vishwanath Pai
> Sent: 08 July 2016 00:34
> I am planning to add a revision 2 for the hashlimit xtables module to
> support higher packets per second rates. This patch renames all the
> functions and variables related to revision 1 by adding _v1 at the end of
> the names.

Sounds backwards.
If you need to change the default, and are changing all the callers,
then just change it.

If you think some users would still want 32bit limits, then you should
(probably) use a _64 suffix for the new functions.

David



[PATCH net] tcp: make challenge acks less predictable

2016-07-08 Thread Eric Dumazet
From: Eric Dumazet 

Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.

This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.

Based on initial analysis and patch from Linus.

Note that we also have per socket rate limiting, so it is tempting
to remove the host limit. This might be done later.

Fixes: 282f23c6ee34 ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao 
Signed-off-by: Eric Dumazet 
Suggested-by: Linus Torvalds 
Cc: Yuchung Cheng 
Cc: Neal Cardwell 
---
diff --git a/Documentation/networking/ip-sysctl.txt 
b/Documentation/networking/ip-sysctl.txt
index 9ae929395b24..391ed93a8e49 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -732,7 +732,7 @@ tcp_limit_output_bytes - INTEGER
 tcp_challenge_ack_limit - INTEGER
Limits number of Challenge ACK sent per second, as recommended
in RFC 5961 (Improving TCP's Robustness to Blind In-Window Attacks)
-   Default: 100
+   Default: 1000
 
 UDP variables:
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d6c8f4cd0800..25f95a41090a 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -87,7 +87,7 @@ int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
 
 /* rfc5961 challenge ack rate limiting */
-int sysctl_tcp_challenge_ack_limit = 100;
+int sysctl_tcp_challenge_ack_limit = 1000;
 
 int sysctl_tcp_stdurg __read_mostly;
 int sysctl_tcp_rfc1337 __read_mostly;
@@ -3455,10 +3455,11 @@ not_rate_limited:
 static void tcp_send_challenge_ack(struct sock *sk, const struct sk_buff *skb)
 {
/* unprotected vars, we dont care of overwrites */
-   static u32 challenge_timestamp;
+   static unsigned int challenge_window = HZ;
+   static unsigned long challenge_timestamp;
static unsigned int challenge_count;
struct tcp_sock *tp = tcp_sk(sk);
-   u32 now;
+   unsigned long now;
 
/* First check our per-socket dupack rate limit. */
if (tcp_oow_rate_limited(sock_net(sk), skb,
@@ -3467,9 +3468,11 @@ static void tcp_send_challenge_ack(struct sock *sk, 
const struct sk_buff *skb)
return;
 
/* Then check the check host-wide RFC 5961 rate limit. */
-   now = jiffies / HZ;
-   if (now != challenge_timestamp) {
+   now = jiffies;
+   if (time_before(now, challenge_timestamp) ||
+   time_after_eq(now, challenge_timestamp + challenge_window)) {
challenge_timestamp = now;
+   challenge_window = HZ/2 + prandom_u32_max(HZ);
challenge_count = 0;
}
if (++challenge_count <= sysctl_tcp_challenge_ack_limit) {




Re: [PATCH net-next 0/4] net: cleanup for UDP tunnel's GRO

2016-07-08 Thread Alexander Duyck
On Thu, Jul 7, 2016 at 8:58 AM, Paolo Abeni  wrote:
> With udp tunnel offload in place, the kernel can do GRO for some udp tunnels
> at the ingress device level. Currently both the geneve and the vxlan drivers
> implement an additional GRO aggregation point via gro_cells.
> The latter takes effect for tunnels using zero checksum udp packets, which are
> currently explicitly not aggregated by the udp offload layer.
>
> This patch series adapts the udp tunnel offload to also process zero-checksum
> udp packets, if the tunnel's socket allows it. Aggregation, if possible, is
> always performed at the ingress device level.
>
> Then the gro_cells hooks, in both vxlan and geneve driver are removed.

I think removing the gro_cells hooks may be taking things one step too far.

I get that there is an impression that it is redundant but there are a
number of paths that could lead to VXLAN or GENEVE frames being
received that are not aggregated via GRO.  My concern here is that you
are optimizing for one specific use case while at the same time
possibly negatively impacting other use cases.  It would be useful to
provide some data on what the advantages and disadvantages are
expected to be.

- Alex


Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Toralf Förster
On 07/08/2016 05:38 PM, Eric Dumazet wrote:
> With IPv4, a server can typically absorb 10 Mpps SYN without major
> disruption on linux-4.6
Well, this particular server even survived >900 MBit/sec w/o any service
disruption on IPv4 ([1]),
but yesterday, with a much smaller attack, the IPv6 issue was bothering me.

[1] https://www.zwiebeltoralf.de/torserver/ddos_sysstat_example.txt

-- 
Toralf
PGP: C4EACDDE 0076E94E, OTR: 420E74C8 30246EE7


Re: [PATCH] xen-netback: prefer xenbus_write() over xenbus_printf() where possible

2016-07-08 Thread Wei Liu
On Fri, Jul 08, 2016 at 05:13:49PM +0100, Wei Liu wrote:
> On Thu, Jul 07, 2016 at 01:58:18AM -0600, Jan Beulich wrote:
> > ... as being the simpler variant.
> > 
> > Signed-off-by: Jan Beulich 
> 
> Acked-by: Wei Liu 

Please ignore this, I acked v2 instead.

Only v2 is needed.


Re: [Xen-devel] [PATCH v2 4/4] xen-netback: prefer xenbus_scanf() over xenbus_gather()

2016-07-08 Thread Wei Liu
On Fri, Jul 08, 2016 at 06:28:58AM -0600, Jan Beulich wrote:
> For single items being collected this should be preferred as being more
> typesafe (as the compiler can check format string and to-be-written-to
> variable match) and more efficient (requiring one less parameter to be
> passed).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Wei Liu 


Re: [PATCH net-next 3/4] vxlan: remove gro_cell support

2016-07-08 Thread Eric Dumazet
On Fri, 2016-07-08 at 11:55 -0400, Hannes Frederic Sowa wrote:

> Exactly, thus we are also only touching UDP tunneling protocols at the
> moment. Did you nack the removal of gro_cell support from the udp
> protocols or are you fine with it, given that we won't take away the
> functionality to spread out skb_checksum to multiple CPUs during GRO for
> other protocols and didn't plan to do so?

I am fine with it, but could you rephrase the changelog, otherwise some
people will think they can copy/paste this to other tunnels ?





Re: [PATCH 7/7] ARM: dts: NSP: Add bgmac entries

2016-07-08 Thread Scott Branden

Hi Jon,

In the Subject line, you mean add amac entries.


On 16-07-08 08:56 AM, Jon Mason wrote:

Add device tree entries for the ethernet devices present on the
Broadcom Northstar Plus SoCs

Signed-off-by: Jon Mason 
---
  arch/arm/boot/dts/bcm-nsp.dtsi   | 18 ++
  arch/arm/boot/dts/bcm958625k.dts |  8 
  2 files changed, 26 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi
index def9e78..f6d5abe 100644
--- a/arch/arm/boot/dts/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/bcm-nsp.dtsi
@@ -192,6 +192,24 @@
status = "disabled";
};

+   amac0: ethernet@22000 {
+   compatible = "brcm,nsp-amac";
+   reg = <0x022000 0x1000>,
+ <0x11 0x1000>;
+   reg-names = "amac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
+   amac1: ethernet@23000 {
+   compatible = "brcm,nsp-amac";
+   reg = <0x023000 0x1000>,
+ <0x111000 0x1000>;
+   reg-names = "amac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
nand: nand@26000 {
compatible = "brcm,nand-iproc", "brcm,brcmnand-v6.1";
reg = <0x026000 0x600>,
diff --git a/arch/arm/boot/dts/bcm958625k.dts b/arch/arm/boot/dts/bcm958625k.dts
index e298450..f41a13b 100644
--- a/arch/arm/boot/dts/bcm958625k.dts
+++ b/arch/arm/boot/dts/bcm958625k.dts
@@ -56,6 +56,14 @@
status = "okay";
  };

+&amac0 {
+   status = "okay";
+};
+
+&amac1 {
+   status = "okay";
+};
+
  &pcie0 {
status = "okay";
  };



Regards,
 Scott


Re: [PATCH] xen-netback: prefer xenbus_write() over xenbus_printf() where possible

2016-07-08 Thread Wei Liu
On Thu, Jul 07, 2016 at 01:58:18AM -0600, Jan Beulich wrote:
> ... as being the simpler variant.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Wei Liu 


Re: XDP seeking input from NIC hardware vendors

2016-07-08 Thread Jakub Kicinski
On Fri, 8 Jul 2016 17:19:43 +0200, Jesper Dangaard Brouer wrote:
> On Fri, 8 Jul 2016 14:44:53 +0100 Jakub Kicinski 
>  wrote:
> > On Thu, 7 Jul 2016 19:22:12 -0700, Alexei Starovoitov wrote:
> > > > If the goal is to just separate XDP traffic from non-XDP traffic
> > > > you could accomplish this with a combination of SR-IOV/macvlan to
> > > > separate the device queues into multiple netdevs and then run XDP
> > > > on just one of the netdevs. Then use flow director (ethtool) or
> > > > 'tc cls_u32/flower' to steer traffic to the netdev. This is how
> > > > we support multiple networking stacks on one device, by the way; it
> > > > is called the bifurcated driver. It's not too far of a stretch to
> > > > think we could offload some simple XDP programs to program the
> > > > splitting of traffic instead of cls_u32/flower/flow_director and
> > > > then you would have a stack of XDP programs. One running in
> > > > hardware and a set running on the queues in software.  
> > > 
> > >
> > > the above sounds like a much better approach than Jesper's/my
> > > prog_per_ring stuff.
> > >
> > > If we can split the nic via sriov and have dedicated netdev via VF
> > > just for XDP that's way cleaner approach. I guess we won't need to
> > > do xdp_rxqmask after all.
> > 
> > +1
> > 
> > I was thinking about using eBPF to direct to NIC queues but concluded
> > that doing a redirect to a VF is cleaner.  Especially if the PF driver
> > supports VF representatives we could potentially just use
> > bpf_redirect(VFR netdev) and the VF doesn't even have to be handled by
> > the same stack.  
> 
> I actually disagree.
> 
> I _do_ want to use the "filter" part of eBPF to direct to NIC queues, and
> then run a single/specific XDP program on that queue.
> 
> Why do I want this?
> 
> This is part of solving a very fundamental CS problem (early demux) when
> wanting to support Zero-copy on RX.  The basic problem is that the NIC
> driver needs to map RX pages into the RX ring prior to receiving
> packets. Thus, we need HW support to steer packets, to gain enough
> isolation (e.g. between tenant domains) to allow zero-copy.
> 
> 
> Based on the flexibility of the HW-filter, the granularity achievable
> for isolation (e.g. application-specific) is much more flexible than
> splitting up the entire NIC with SR-IOV, VFs or macvlans.

I think of SR-IOV VFs as a way of grouping queues.  If HW is capable of
directing to a queue it's usually capable of directing to a VF as well.
And the VF could have all other traffic disabled so you would get only
packets directed to it by the (BPF) filter - same as you would for the
queue.  Does that make sense for zero copy apps?


[PATCH net] udp: prevent bugcheck if filter truncates packet too much

2016-07-08 Thread Michal Kubecek
If socket filter truncates an udp packet below the length of UDP header
in udpv6_queue_rcv_skb() or udp_queue_rcv_skb(), it will trigger a
BUG_ON in skb_pull_rcsum(). This BUG_ON (and therefore a system crash if
kernel is configured that way) can be easily enforced by an unprivileged
user which was reported as CVE-2016-6162. For a reproducer, see
http://seclists.org/oss-sec/2016/q3/8

Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Reported-by: Marco Grassi 
Signed-off-by: Michal Kubecek 
---
 net/ipv4/udp.c | 2 ++
 net/ipv6/udp.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca5e8ea29538..4aed8fc23d32 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1583,6 +1583,8 @@ int udp_queue_rcv_skb(struct sock *sk, struct sk_buff 
*skb)
 
if (sk_filter(sk, skb))
goto drop;
+   if (unlikely(skb->len < sizeof(struct udphdr)))
+   goto drop;
 
udp_csum_pull_header(skb);
if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 005dc82c2138..acc09705618b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -620,6 +620,8 @@ int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff 
*skb)
 
if (sk_filter(sk, skb))
goto drop;
+   if (unlikely(skb->len < sizeof(struct udphdr)))
+   goto drop;
 
udp_csum_pull_header(skb);
if (sk_rcvqueues_full(sk, sk->sk_rcvbuf)) {
-- 
2.9.0



[net-next PATCH RFC] mlx4: RX prefetch loop

2016-07-08 Thread Jesper Dangaard Brouer
This patch is about prefetching without being opportunistic.
The idea is only to start prefetching on packets that are marked as
ready/completed in the RX ring.

This is achieved by splitting the napi_poll call mlx4_en_process_rx_cq()
loop into two.  The first loop extracts completed CQEs and starts
prefetching on data and RX descriptors. The second loop processes the
real packets.

Details: The batching of CQEs is limited to 8 in order to avoid
stressing the LFB (Line Fill Buffer) and cache usage.

I've left some opportunities for prefetching CQE descriptors.


The performance improvements on my platform are huge, as I tested this
on a CPU without DDIO.  The performance for XDP is the same as with
Brenden's prefetch hack.

Signed-off-by: Jesper Dangaard Brouer 
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c |   70 +---
 1 file changed, 62 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 41c76fe00a7f..c5efe03e31ce 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -782,7 +782,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
int doorbell_pending;
struct sk_buff *skb;
int tx_index;
-   int index;
+   int index, saved_index, i;
int nr;
unsigned int length;
int polled = 0;
@@ -790,6 +790,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
int factor = priv->cqe_factor;
u64 timestamp;
bool l2_tunnel;
+#define PREFETCH_BATCH 8
+   struct mlx4_cqe *cqe_array[PREFETCH_BATCH];
+   int cqe_idx;
+   bool cqe_more;
 
if (!priv->port_up)
return 0;
@@ -801,24 +805,75 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
doorbell_pending = 0;
tx_index = (priv->tx_ring_num - priv->rsv_tx_rings) + cq->ring;
 
+next_prefetch_batch:
+   cqe_idx = 0;
+   cqe_more = false;
+
/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx
 * descriptor offset can be deduced from the CQE index instead of
 * reading 'cqe->index' */
index = cq->mcq.cons_index & ring->size_mask;
+   saved_index = index;
cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
 
-   /* Process all completed CQEs */
+   /* Extract and prefetch completed CQEs */
while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
cq->mcq.cons_index & cq->size)) {
+   void *data;
 
frags = ring->rx_info + (index << priv->log_rx_info);
rx_desc = ring->buf + (index << ring->log_stride);
+   prefetch(rx_desc);
 
/*
 * make sure we read the CQE after we read the ownership bit
 */
dma_rmb();
 
+   cqe_array[cqe_idx++] = cqe;
+
+   /* Base error handling here, free handled in next loop */
+   if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
+MLX4_CQE_OPCODE_ERROR))
+   goto skip;
+
+   data = page_address(frags[0].page) + frags[0].page_offset;
+   prefetch(data);
+   skip:
+   ++cq->mcq.cons_index;
+   index = (cq->mcq.cons_index) & ring->size_mask;
+   cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
+   /* likely too slow prefetching CQE here ... do look-a-head ? */
+   //prefetch(cqe + priv->cqe_size * 3);
+
+   if (++polled == budget) {
+   cqe_more = false;
+   break;
+   }
+   if (cqe_idx == PREFETCH_BATCH) {
+   cqe_more = true;
+   // IDEA: Opportunistic prefetch CQEs for 
next_prefetch_batch?
+   //for (i = 0; i < PREFETCH_BATCH; i++) {
+   //  prefetch(cqe + priv->cqe_size * i);
+   //}
+   break;
+   }
+   }
+   /* Hint: The cqe_idx will be number of packets, it can be used
+* for bulk allocating SKBs
+*/
+
+   /* Now, index function as index for rx_desc */
+   index = saved_index;
+
+   /* Process completed CQEs in cqe_array */
+   for (i = 0; i < cqe_idx; i++) {
+
+   cqe = cqe_array[i];
+
+   frags = ring->rx_info + (index << priv->log_rx_info);
+   rx_desc = ring->buf + (index << ring->log_stride);
+
/* Drop packet on bad receive or bad checksum */
if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
MLX4_CQE_OPCODE_ERROR)) {
@@ -1065,14 +1120,13 @@ next:

Re: [PATCH v2 0/6] net: ethernet: bgmac: Add platform device support

2016-07-08 Thread Jon Mason
On Thu, Jul 7, 2016 at 7:08 PM, Jon Mason  wrote:
> David Miller, Please consider including patches 1-5 in net-next
>
> Florian Fainelli, Please consider including patches 6 & 7 in
>   devicetree/next

Oops.  I didn't send out the 7th patch in this series.  Sending out
shortly as 7/7.

Thanks,
Jon

> Changes in v2:
> * Made device tree binding changes suggested by Sergei Shtylyov,
>   Ray Jui, Rob Herring, Florian Fainelli, and Arnd Bergmann
> * Removed devm_* error paths in the bgmac_platform.c suggested by
>   Florian Fainelli
> * Added Arnd Bergmann's Acked-by to the first 5 (there were changes
>   outlined in the bullets above, but I believe them to be minor enough
>   for him to not revoke his acks)
>
>
> This patch series adds support for other, non-bcma iProc SoC's to the
> bgmac driver.  This series only adds NSP support, but we are interested
> in adding support for the Cygnus and NS2 families (with more possible
> down the road).
>
> To support non-bcma enabled SoCs, we need to add the standard device
> tree "platform device" support.  Unfortunately, this driver is very
> tightly coupled with the bcma bus and much unwinding is needed.  I tried
> to break this up into a number of patches to make it more obvious what
> was being done to add platform device support.  I was able to verify
> that the bcma code still works using a 53012K board (NS SoC), and that
> the platform code works using a 58625K board (NSP SoC).
>
> Thanks,
> Jon
>
>
> Jon Mason (6):
>   net: ethernet: bgmac: change bgmac_* prints to dev_* prints
>   net: ethernet: bgmac: add dma_dev pointer
>   net: ethernet: bgmac: move BCMA MDIO Phy code into a separate file
>   net: ethernet: bgmac: convert to feature flags
>   net: ethernet: bgmac: Add platform device support
>   dt-bindings: net: bgmac: add bindings documentation for bgmac
>
>  .../devicetree/bindings/net/brcm,amac.txt  |  24 +
>  .../devicetree/bindings/net/brcm,bgmac-nsp.txt |  24 +
>  drivers/net/ethernet/broadcom/Kconfig  |  23 +-
>  drivers/net/ethernet/broadcom/Makefile |   2 +
>  drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c| 266 +
>  drivers/net/ethernet/broadcom/bgmac-bcma.c | 315 ++
>  drivers/net/ethernet/broadcom/bgmac-platform.c | 189 ++
>  drivers/net/ethernet/broadcom/bgmac.c  | 658 
> +
>  drivers/net/ethernet/broadcom/bgmac.h  | 112 +++-
>  9 files changed, 1097 insertions(+), 516 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,amac.txt
>  create mode 100644 Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma-mdio.c
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-bcma.c
>  create mode 100644 drivers/net/ethernet/broadcom/bgmac-platform.c
>
> --
> 1.9.1
>


[PATCH 7/7] ARM: dts: NSP: Add bgmac entries

2016-07-08 Thread Jon Mason
Add device tree entries for the ethernet devices present on the
Broadcom Northstar Plus SoCs

Signed-off-by: Jon Mason 
---
 arch/arm/boot/dts/bcm-nsp.dtsi   | 18 ++
 arch/arm/boot/dts/bcm958625k.dts |  8 
 2 files changed, 26 insertions(+)

diff --git a/arch/arm/boot/dts/bcm-nsp.dtsi b/arch/arm/boot/dts/bcm-nsp.dtsi
index def9e78..f6d5abe 100644
--- a/arch/arm/boot/dts/bcm-nsp.dtsi
+++ b/arch/arm/boot/dts/bcm-nsp.dtsi
@@ -192,6 +192,24 @@
status = "disabled";
};
 
+   amac0: ethernet@22000 {
+   compatible = "brcm,nsp-amac";
+   reg = <0x022000 0x1000>,
+ <0x11 0x1000>;
+   reg-names = "amac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
+   amac1: ethernet@23000 {
+   compatible = "brcm,nsp-amac";
+   reg = <0x023000 0x1000>,
+ <0x111000 0x1000>;
+   reg-names = "amac_base", "idm_base";
+   interrupts = ;
+   status = "disabled";
+   };
+
nand: nand@26000 {
compatible = "brcm,nand-iproc", "brcm,brcmnand-v6.1";
reg = <0x026000 0x600>,
diff --git a/arch/arm/boot/dts/bcm958625k.dts b/arch/arm/boot/dts/bcm958625k.dts
index e298450..f41a13b 100644
--- a/arch/arm/boot/dts/bcm958625k.dts
+++ b/arch/arm/boot/dts/bcm958625k.dts
@@ -56,6 +56,14 @@
status = "okay";
 };
 
+&amac0 {
+   status = "okay";
+};
+
+&amac1 {
+   status = "okay";
+};
+
 &pcie0 {
status = "okay";
 };
-- 
1.9.1



Re: [PATCH net-next 3/4] vxlan: remove gro_cell support

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 11:33, Eric Dumazet wrote:
> On Fri, 2016-07-08 at 11:12 -0400, Hannes Frederic Sowa wrote:
>> Hi Eric,
>>
>> On 07.07.2016 12:13, Eric Dumazet wrote:
>>> On Thu, 2016-07-07 at 17:58 +0200, Paolo Abeni wrote:
 GRO is now handled entirely by the udp_offload layer and  there is no need
 for trying it again at the device level. We can drop gro_cell usage,
 simplifying the driver a bit, while maintaining the same performance for
 TCP and improving slightly for UDP.
 This basically reverts the commit 58ce31cca1ff ("vxlan: GRO support
 at tunnel layer")
>>>
>>> Note that gro_cells provide GRO support after RPS, so this helps when we
>>> must perform TCP checksum computation, if NIC lacks CHECKSUM_COMPLETE
>>>
>>> (Say we receive packets all steered to a single RX queue due to RSS hash
>>> being computed on outer header only)
>>>
>>> Some people disable GRO on the physical device, but enable GRO on the
>>> tunnels.
>>
>> we are currently discussing your feedback and wonder how much it makes
>> sense to support such a scenario?
>>
>> We have part of the inner hash in the outer UDP source port. So even the
>> outer hash does provide enough entropy to get frames of one tunnel on
>> multiple CPUs via hardware hashing - given that you don't care about OoO
>> for UDP (I infer that from the fact that RPS will also reorder UDP
>> frames in case of fragmentation).
>>
>> I wonder why it makes sense to still take single RX queue nics into
>> consideration? We already provide support for multiqueue devices for
>> most VM-related interfaces as well. Can you describe why someone would
>> do such a scenario?
> 
> I was simply pointing out there are some use cases where the ability to
> split incoming traffic on multiple cpus can help, especially with dumb
> NIC.
> 
> Fact that GRO is already handled on the NIC itself is not something that
> is hard coded. GRO can be enabled or disabled.
>
> If you remove GRO support at tunnel, then you remove some flexibility.
>
> For example, when GRO for GRE was added by Jerry Chu, we did not remove
> GRO on GRE devices, because mlx4 NICs for example are unable to compute
> TCP checksum when GRE encapsulation is used. A single CPU can not decap
> at line rate on 40Gbit NIC without RX checksum offloading. An admin can
> choose to use RPS to  split traffic coming on a single RX queue to X
> cpus, and enable GRO after RPS, instead of before.
>
> UDP might be different, if the sender properly adds entropy on outer
> header (which is not something you can do with GRE)

Exactly, thus we are also only touching UDP tunneling protocols at the
moment. Did you nack the removal of gro_cell support from the udp
protocols or are you fine with it, given that we won't take away the
functionality to spread out skb_checksum to multiple CPUs during GRO for
other protocols and didn't plan to do so?

> You probably could default GRO on tunnels to off, since by default GRO
> would already happen at the physical interface.

Thanks!



Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Eric Dumazet
On Fri, 2016-07-08 at 11:43 -0400, Hannes Frederic Sowa wrote:

> Exactly, I can very well imagine that the stack becomes unresponsive
> during DDoS, but after the DDoS I don't see a reason why services
> shouldn't come back up like in IPv4.

If the service uses different listeners, one for IPV4, one (or more) for
IPv6, it is possible that an IPV6 flood leaves the IPV6 part in a sad
state (this can also be an application bug, dealing with some backlog,
like DNS requests)





Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 11:38, Eric Dumazet wrote:
> On Fri, 2016-07-08 at 11:28 -0400, Hannes Frederic Sowa wrote:
>> On 08.07.2016 10:14, Eric Dumazet wrote:
>>> On Fri, 2016-07-08 at 15:51 +0200, Toralf Förster wrote:
 I do run a 4.6.3 hardened Gentoo kernel at a commodity i7 server. A
 DDoS with about 300 MBit/sec over 5 mins resulted in an issue for ipv6 at
 that system.

 The IPv6 monitoring from my ISP told me that the to-be-monitored
 services (80, 443, 5) weren't reachable any longer at ipv6 (at
 ipv4 there was no issue). Restarting the NIC brought back green lights
 for the services at the ipv6 ports too.
>>>
>>> Hard to tell without knowing DDOS details, but IPv6 lacks some
>>> scalability improvements found in IPv4.
>>>
>>> IPv4 no longer has a routing cache, but IPv6 still has one.
>>
>> The difference is that routing exceptions are stored in "the" trie
>> instead of hash tables in the fib nodes. IPv4 limits that by the size of
>> the hash tables, in IPv6 we grow to ipv6/route/max_size, which is pretty
>> low.
>>
>> Only redirects and mtu updates could potentially increase its size.
>> Redirects are limited to the same L2 network, MTU updates must hit the
>> socket to be acted upon appropriately. All limited to max_size, so I
>> currently see no major problem in the routing code.
>>
>> Unfortunately your report does not have enough details to pinpoint a
>> specific problem in the kernel.
> 
> Well, the typical DDoS simply uses a SYN flood, right ? ;)

Exactly, I can very well imagine that the stack becomes unresponsive
during DDoS, but after the DDoS I don't see a reason why services
shouldn't come back up like they did with IPv4.

> With IPv4, a server can typically absorb 10 Mpps SYN without major
> disruption on linux-4.6
> 
> With IPv6, kernel hits the route rwlock quite hard, at less than 2 Mpps
> 
> 30.44%  [kernel]  [k] ip6_pol_route.isra.49 
> 12.93%  [kernel]  [k] fib6_lookup   
> 12.35%  [kernel]  [k] fib6_get_table
> 10.36%  [kernel]  [k] _raw_read_lock_bh 
>  8.29%  [kernel]  [k] _raw_read_unlock_bh   
>  2.02%  [kernel]  [k] dst_release   
>  1.86%  [kernel]  [k] memcpy_erms   
> 
> I guess that switching to plain spinlock could help a bit, before major
> surgery.

Hmm, interesting idea.
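The IPv6 route-cache ceiling mentioned above is a regular sysctl; a quick way to inspect it (paths as in mainline kernels of this era):

```shell
# Ceiling on cached IPv6 routes (exceptions, redirects, PMTU entries)
cat /proc/sys/net/ipv6/route/max_size

# Garbage-collection threshold controlling when the cache gets pruned
cat /proc/sys/net/ipv6/route/gc_thresh
```

Raising max_size only enlarges the cache; it does nothing about the read-lock contention shown in the profile.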




[PATCH] bnxt_en: initialize rc to zero to avoid returning garbage

2016-07-08 Thread Colin King
From: Colin Ian King 

rc is not initialized so it can contain garbage if it is not
set by the call to bnxt_read_sfp_module_eeprom_info. Ensure
garbage is not returned by initializing rc to 0.

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 0f7dd86..64466f5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1684,7 +1684,7 @@ static int bnxt_get_module_eeprom(struct net_device *dev,
 {
struct bnxt *bp = netdev_priv(dev);
u16  start = eeprom->offset, length = eeprom->len;
-   int rc;
+   int rc = 0;
 
memset(data, 0, eeprom->len);
 
-- 
2.8.1



Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Eric Dumazet
On Fri, 2016-07-08 at 16:34 +0200, Toralf Förster wrote:
> On 07/08/2016 04:14 PM, Eric Dumazet wrote:
> > Are you sure conntrack is needed at all ?
> 
> Erm, I didn't mention conntrack - but yes, I do have it in the firewall rules.
> 
> It is my understanding that conntrack is best practice, right ?

It depends on what you want to protect.

linux TCP stack should work quite well without conntrack.

If you are aware of any known defect, we should fix the TCP stack instead
of working around it by adding a very expensive framework.





Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Eric Dumazet
On Fri, 2016-07-08 at 11:28 -0400, Hannes Frederic Sowa wrote:
> On 08.07.2016 10:14, Eric Dumazet wrote:
> > On Fri, 2016-07-08 at 15:51 +0200, Toralf Förster wrote:
> >> I do run a 4.6.3 hardened Gentoo kernel at a commodity i7 server. A
> >> DDoS with about 300 MBit/sec over 5 mins resulted in an issue for ipv6 at
> >> that system.
> >>
> >> The IPv6 monitoring from my ISP told me that the to-be-monitored
> >> services (80, 443, 5) weren't reachable any longer at ipv6 (at
> >> ipv4 there was no issue). Restarting the NIC brought back green lights
> >> for the services at the ipv6 ports too.
> > 
> > Hard to tell without knowing DDOS details, but IPv6 lacks some
> > scalability improvements found in IPv4.
> > 
> > IPv4 no longer has a routing cache, but IPv6 still has one.
> 
> The difference is that routing exceptions are stored in "the" trie
> instead of hash tables in the fib nodes. IPv4 limits that by the size of
> the hash tables, in IPv6 we grow to ipv6/route/max_size, which is pretty
> low.
> 
> Only redirects and mtu updates could potentially increase its size.
> Redirects are limited to the same L2 network, MTU updates must hit the
> socket to be acted upon appropriately. All limited to max_size, so I
> currently see no major problem in the routing code.
> 
> Unfortunately your report does not have enough details to pinpoint a
> specific problem in the kernel.

Well, the typical DDoS simply uses a SYN flood, right ? ;)

With IPv4, a server can typically absorb 10 Mpps SYN without major
disruption on linux-4.6

With IPv6, kernel hits the route rwlock quite hard, at less than 2 Mpps

30.44%  [kernel]  [k] ip6_pol_route.isra.49 
12.93%  [kernel]  [k] fib6_lookup   
12.35%  [kernel]  [k] fib6_get_table
10.36%  [kernel]  [k] _raw_read_lock_bh 
 8.29%  [kernel]  [k] _raw_read_unlock_bh   
 2.02%  [kernel]  [k] dst_release   
 1.86%  [kernel]  [k] memcpy_erms   

I guess that switching to plain spinlock could help a bit, before major
surgery.






Re: ipv6 issues after an DDoS for kernel 4.6.3

2016-07-08 Thread Hannes Frederic Sowa
On 08.07.2016 10:14, Eric Dumazet wrote:
> On Fri, 2016-07-08 at 15:51 +0200, Toralf Förster wrote:
>> I do run a 4.6.3 hardened Gentoo kernel at a commodity i7 server. A
>> DDoS with about 300 MBit/sec over 5 mins resulted in an issue for ipv6 at
>> that system.
>>
>> The IPv6 monitoring from my ISP told me that the to-be-monitored
>> services (80, 443, 5) weren't reachable any longer at ipv6 (at
>> ipv4 there was no issue). Restarting the NIC brought back green lights
>> for the services at the ipv6 ports too.
> 
> Hard to tell without knowing DDOS details, but IPv6 lacks some
> scalability improvements found in IPv4.
> 
> IPv4 no longer has a routing cache, but IPv6 still has one.

The difference is that routing exceptions are stored in "the" trie
instead of hash tables in the fib nodes. IPv4 limits that by the size of
the hash tables, in IPv6 we grow to ipv6/route/max_size, which is pretty
low.

Only redirects and mtu updates could potentially increase its size.
Redirects are limited to the same L2 network, MTU updates must hit the
socket to be acted upon appropriately. All limited to max_size, so I
currently see no major problem in the routing code.

Unfortunately your report does not have enough details to pinpoint a
specific problem in the kernel.

Bye,
Hannes


