from:"David Ahern"

Re: [PATCH] neighbor: tracing: Move pin6 inside CONFIG_IPV6=y section

2023-10-17 Thread David Ahern

On 10/16/23 6:49 AM, Geert Uytterhoeven wrote:
> When CONFIG_IPV6=n, and building with W=1:
> 
> In file included from include/trace/define_trace.h:102,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function 
> ‘trace_event_raw_event_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/trace_events.h:402:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>   402 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> In file included from include/trace/define_trace.h:103,
>from include/trace/events/neigh.h:255,
>from net/core/net-traces.c:51:
> include/trace/events/neigh.h: In function ‘perf_trace_neigh_create’:
> include/trace/events/neigh.h:42:34: error: variable ‘pin6’ set but not 
> used [-Werror=unused-but-set-variable]
>42 | struct in6_addr *pin6;
> |  ^~~~
> include/trace/perf.h:51:11: note: in definition of macro 
> ‘DECLARE_EVENT_CLASS’
>51 | { assign; }   
>   \
> |   ^~
> include/trace/trace_events.h:44:30: note: in expansion of macro ‘PARAMS’
>44 |  PARAMS(assign),   \
> |  ^~
> include/trace/events/neigh.h:23:1: note: in expansion of macro 
> ‘TRACE_EVENT’
>23 | TRACE_EVENT(neigh_create,
> | ^~~
> include/trace/events/neigh.h:41:9: note: in expansion of macro 
> ‘TP_fast_assign’
>41 | TP_fast_assign(
> | ^~
> 
> Indeed, the variable pin6 is declared and initialized unconditionally,
> while it is only used and needlessly re-initialized when support for
> IPv6 is enabled.
> 
> Fix this by dropping the unused variable initialization, and moving the
> variable declaration inside the existing section protected by a check
> for CONFIG_IPV6.
> 
> Fixes: fc651001d2c5ca4f ("neighbor: Add tracepoint to __neigh_create")
> Signed-off-by: Geert Uytterhoeven 
> ---
> No changes in generated code.
> 
>  include/trace/events/neigh.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Reviewed-by: David Ahern 

Google put your email in the spam folder, BTW.

Re: PROBLEM: DoS Attack on Fragment Cache

2021-04-16 Thread David Ahern

[ cc author of 648700f76b03b7e8149d13cc2bdb3355035258a9 ]

On 4/16/21 3:58 PM, Keyu Man wrote:
> Hi,
> 
>  
> 
>     My name is Keyu Man. We are a group of researchers from University
> of California, Riverside. Zhiyun Qian is my advisor. We found the code
> in processing IPv4/IPv6 fragments will potentially lead to DoS Attacks.
> Specifically, after the latest kernel receives an IPv4 fragment, it will
> try to fit it into a queue by calling function
> 
>  
> 
>     struct inet_frag_queue *inet_frag_find(struct fqdir *fqdir, void
> *key) in net/ipv4/inet_fragment.c.
> 
>  
> 
>     However, this function will first check if the existing fragment
> memory exceeds the fqdir->high_thresh. If it exceeds, then drop the
> fragment regardless whether it belongs to a new queue or an existing queue.
> 
> Chances are that an attacker can fill the cache with fragments that
> will never be assembled (i.e., only sends the first fragment with new
> IPIDs every time) to exceed the threshold so that all future incoming
> fragmented IPv4 traffic would be blocked and dropped. Since there is no
> GC mechanism, the victim host has to wait for 30s when the fragments are
> expired to continue receive incoming fragments normally.
> 
>     In practice, given the 4MB fragment cache, the attacker only needs
> to send 1766 fragments to exhaust the cache and DoS the victim for 30s,
> whose cost is pretty low. Besides, IPv6 would also be affected since the
> issue resides in inet part.
> 
> This issue is introduced in commit
> 648700f76b03b7e8149d13cc2bdb3355035258a9 (inet: frags: use rhashtables
> for reassembly units) which removes fqdir->low_thresh, and GC worker as
> well. We would gently request to bring GC worker back to the kernel to
> prevent the DoS attacks.
> 
> Looking forward to hear from you
> 
>  
> 
>     Thanks,
> 
> Keyu Man
>

Re: [RFC net-next 0/1] seg6: Counters for SRv6 Behaviors

2021-04-07 Thread David Ahern

Since this is a single patch set, just put this good cover letter
content as the message in the patch.

Re: [RFC net-next 1/1] seg6: add counters support for SRv6 Behaviors

2021-04-07 Thread David Ahern

On 4/7/21 12:03 PM, Andrea Mayer wrote:
> diff --git a/include/uapi/linux/seg6_local.h b/include/uapi/linux/seg6_local.h
> index 3b39ef1dbb46..ae5e3fd12b73 100644
> --- a/include/uapi/linux/seg6_local.h
> +++ b/include/uapi/linux/seg6_local.h
> @@ -27,6 +27,7 @@ enum {
>   SEG6_LOCAL_OIF,
>   SEG6_LOCAL_BPF,
>   SEG6_LOCAL_VRFTABLE,
> + SEG6_LOCAL_COUNTERS,
>   __SEG6_LOCAL_MAX,
>  };
>  #define SEG6_LOCAL_MAX (__SEG6_LOCAL_MAX - 1)
> @@ -78,4 +79,11 @@ enum {
>  
>  #define SEG6_LOCAL_BPF_PROG_MAX (__SEG6_LOCAL_BPF_PROG_MAX - 1)
>  
> +/* SRv6 Behavior counters */
> +struct seg6_local_counters {
> + __u64 rx_packets;
> + __u64 rx_bytes;
> + __u64 rx_errors;
> +};
> +
>  #endif

It's highly likely that more stats would get added over time. It would
be good to document that here for interested parties and then make sure
iproute2 can handle different sized stats structs. e.g., commit support
to your repo, then add a new one (e.g, rx_drops) and verify the
combinations handle it. e.g., old kernel - new iproute2, new kernel -
old iproute, old - old and new-new.

Re: [PATCH net] net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);

2021-04-01 Thread David Ahern

On 4/1/21 12:59 AM, Norman Maurer wrote:
> From: Norman Maurer 
> 
> Support for UDP_GRO was added in the past but the implementation for
> getsockopt was missed which did lead to an error when we tried to
> retrieve the setting for UDP_GRO. This patch adds the missing switch
> case for UDP_GRO
> 
> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
> Signed-off-by: Norman Maurer 
> ---
>  net/ipv4/udp.c | 4 
>  1 file changed, 4 insertions(+)
> 

Reviewed-by: David Ahern

Re: [PATCH] udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...)

2021-03-31 Thread David Ahern

On 3/31/21 7:10 AM, Norman Maurer wrote:
> Friendly ping… 
> 
> As this missing change was most likely an oversight in the original commit I 
> do think it should go into 5.12 and subsequently stable as well. That’s also 
> the reason why I didn’t send a v2 and changed the commit message / subject 
> for the patch. For me it clearly is a bug and not a new feature.
> 
> 

I agree that it should be added to net

If you do not see it here:
  https://patchwork.kernel.org/project/netdevbpf/list/

you need to re-send and clearly mark it as [PATCH net]. Make sure it has
a Fixes tag.

Re: [PATCH] sit: use min

2021-03-27 Thread David Ahern

On 3/27/21 3:29 AM, Julia Lawall wrote:
> From: kernel test robot 
> 
> Opportunity for min()
> 
> Generated by: scripts/coccinelle/misc/minmax.cocci
> 
> CC: Denis Efremov 
> Reported-by: kernel test robot 
> Signed-off-by: kernel test robot 
> ---
> 
>  sit.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: David Ahern

Re: [PATCH 2/2] net: ipv4: route.c: Remove unnecessary if()

2021-03-25 Thread David Ahern

On 3/23/21 9:10 PM, Yejune Deng wrote:
> negative_advice handler is only called when dst is non-NULL hence the
> 'if (rt)' check can be removed. 'if' and 'else if' can be merged together.
> And use container_of() instead of (struct rtable *).
> 
> Signed-off-by: Yejune Deng 
> ---
>  net/ipv4/route.c | 16 ++--
>  1 file changed, 6 insertions(+), 10 deletions(-)
> 


Reviewed-by: David Ahern

Re: [PATCH net-next 0/6] page_pool: recycle buffers

2021-03-23 Thread David Ahern

On 3/22/21 11:02 AM, Matteo Croce wrote:
> From: Matteo Croce 
> 
> This series enables recycling of the buffers allocated with the page_pool API.
> The first two patches are just prerequisite to save space in a struct and
> avoid recycling pages allocated with other API.
> Patch 2 was based on a previous idea from Jonathan Lemon.
> 
> The third one is the real recycling, 4 fixes the compilation of 
> __skb_frag_unref
> users, and 5,6 enable the recycling on two drivers.

patch 4 should be folded into 3; each patch should build without errors.

> 
> In the last two patches I reported the improvement I have with the series.
> 
> The recycling as is can't be used with drivers like mlx5 which do page split,
> but this is documented in a comment.
> In the future, a refcount can be used so to support mlx5 with no changes.

Is the end goal of the page_pool changes to remove driver private caches?

Re: [PATCH] net: ipv4: route.c: Remove unnecessary {else} if()

2021-03-23 Thread David Ahern

subject line should have net-next as the target branch


On 3/23/21 4:20 AM, Yejune Deng wrote:
> Put if and else if together, and remove unnecessary judgments, because
> it's caller can make sure it is true. And add likely() in
> ipv4_confirm_neigh().
> 
> Signed-off-by: Yejune Deng 
> ---
>  net/ipv4/route.c | 18 +++---
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index fa68c2612252..f4ba07c5c1b1 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -440,7 +440,7 @@ static void ipv4_confirm_neigh(const struct dst_entry 
> *dst, const void *daddr)
>   struct net_device *dev = dst->dev;
>   const __be32 *pkey = daddr;
>  
> - if (rt->rt_gw_family == AF_INET) {
> + if (likely(rt->rt_gw_family == AF_INET)) {
>   pkey = (const __be32 *)>rt_gw4;
>   } else if (rt->rt_gw_family == AF_INET6) {
>   return __ipv6_confirm_neigh_stub(dev, >rt_gw6);
> @@ -814,19 +814,15 @@ static void ip_do_redirect(struct dst_entry *dst, 
> struct sock *sk, struct sk_buf
>  
>  static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
>  {
> - struct rtable *rt = (struct rtable *)dst;
> + struct rtable *rt = container_of(dst, struct rtable, dst);
>   struct dst_entry *ret = dst;
>  
> - if (rt) {
> - if (dst->obsolete > 0) {
> - ip_rt_put(rt);
> - ret = NULL;
> - } else if ((rt->rt_flags & RTCF_REDIRECTED) ||
> -rt->dst.expires) {
> - ip_rt_put(rt);
> - ret = NULL;
> - }
> + if (dst->obsolete > 0 || rt->dst.expires ||
> + (rt->rt_flags & RTCF_REDIRECTED)) {
> + ip_rt_put(rt);
> + ret = NULL;
>   }
> +
>   return ret;
>  }
>  
> 

This should be 2 separate patches since they are unrelated changes.

For the ipv4_negative_advice, the changelog should note that
negative_advice handler is only called when dst is non-NULL hence the
'if (rt)' check can be removed.

Re: [PATCH] ipv4/raw: support binding to nonlocal addresses

2021-03-21 Thread David Ahern

On 3/20/21 6:20 PM, Riccardo Paolo Bestetti wrote:
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index 50a73178d63a..734c0332b54b 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -717,6 +717,7 @@ static int raw_bind(struct sock *sk, struct sockaddr 
> *uaddr, int addr_len)
>  {
>   struct inet_sock *inet = inet_sk(sk);
>   struct sockaddr_in *addr = (struct sockaddr_in *) uaddr;
> + struct net *net = sock_net(sk);
>   u32 tb_id = RT_TABLE_LOCAL;
>   int ret = -EINVAL;
>   int chk_addr_ret;
> @@ -732,7 +733,8 @@ static int raw_bind(struct sock *sk, struct sockaddr 
> *uaddr, int addr_len)
>   tb_id);
>  
>   ret = -EADDRNOTAVAIL;
> - if (addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
> + if (!inet_can_nonlocal_bind(net, inet) &&
> + addr->sin_addr.s_addr && chk_addr_ret != RTN_LOCAL &&
>   chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST)
>   goto out;
>   inet->inet_rcv_saddr = inet->inet_saddr = addr->sin_addr.s_addr;
> 


Please add test cases to ipv4_addr_bind and ipv6_addr_bind in
tools/testing/selftests/net/fcnal-test.sh. The latter will verify if
IPv6 works the same or needs a change.

Also, this check duplicates the ones in __inet_bind and __inet6_bind; it
would be good to use an inline helper to reduce the duplication.

Re: [PATCH] net: ipv4: Fixed some styling issues.

2021-03-17 Thread David Ahern

On 3/17/21 9:07 AM, Anish Udupa wrote:
> Ran checkpatch and found these warnings. Fixed some of them in this patch.
> a) Added a space before '='.
> b) Removed the space before the tab.
> 
> Signed-off-by: Anish Udupa H 
> ---
>  net/ipv4/route.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 02d81d79deeb..0b9024584fde 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -2236,7 +2236,7 @@ out: return err;
>   if (!rth)
>   goto e_nobufs;
> 
> - rth->dst.output= ip_rt_bug;
> + rth->dst.output = ip_rt_bug;
>  #ifdef CONFIG_IP_ROUTE_CLASSID
>   rth->dst.tclassid = itag;
>  #endif
> @@ -2244,9 +2244,9 @@ out: return err;
> 
>   RT_CACHE_STAT_INC(in_slow_tot);
>   if (res->type == RTN_UNREACHABLE) {
> - rth->dst.input= ip_error;
> - rth->dst.error= -err;
> - rth->rt_flags &= ~RTCF_LOCAL;
> + rth->dst.input = ip_error;
> + rth->dst.error = -err;
> + rth->rt_flags &= ~RTCF_LOCAL;
>   }
> 
>   if (do_cache) {
> 

your patch seems to have lost one or more tabs at the beginning of each
line.

Re: [PATCH] net: add net namespace inode for all net_dev events

2021-03-09 Thread David Ahern

On 3/9/21 1:02 PM, Steven Rostedt wrote:
> On Tue, 9 Mar 2021 12:53:37 -0700
> David Ahern  wrote:
> 
>> Changing the order of the fields will impact any bpf programs expecting
>> the existing format
> 
> I thought bpf programs were not API. And why are they not parsing this
> information? They have these offsets hard coded Why would they do that!
> The information to extract the data where ever it is has been there from
> day 1! Way before BPF ever had access to trace events.

BPF programs attached to a tracepoint are passed a context - a structure
based on the format for the tracepoint. To take an in-tree example, look
at samples/bpf/offwaketime_kern.c:

...

/* taken from /sys/kernel/debug/tracing/events/sched/sched_switch/format */
struct sched_switch_args {
unsigned long long pad;
char prev_comm[16];
int prev_pid;
int prev_prio;
long long prev_state;
char next_comm[16];
int next_pid;
int next_prio;
};
SEC("tracepoint/sched/sched_switch")
int oncpu(struct sched_switch_args *ctx)
{

...

Production systems do not typically have toolchains installed, so
dynamic generation of the program based on the 'format' file on the
running system is not realistic. That means creating the programs on a
development machine and installing on the production box. Further, there
is an expectation that a bpf program compiled against version X works on
version Y. Changing the order of the fields will break such programs in
non-obvious ways.

Re: [PATCH] net: add net namespace inode for all net_dev events

2021-03-09 Thread David Ahern

On 3/9/21 10:40 AM, Steven Rostedt wrote:
> The order of the fields is important. Don't worry about breaking API by
> fixing it. The parsing code uses this output to find where the binary data
> is.

Changing the order of the fields will impact any bpf programs expecting
the existing format.

Re: [PATCH net v3 2/2] net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0

2021-03-09 Thread David Ahern

On 3/9/21 4:31 AM, Balazs Nemeth wrote:
> A packet with skb_inner_network_header(skb) == skb_network_header(skb)
> and ETH_P_MPLS_UC will prevent mpls_gso_segment from pulling any headers
> from the packet. Subsequently, the call to skb_mac_gso_segment will
> again call mpls_gso_segment with the same packet leading to an infinite
> loop. In addition, ensure that the header length is a multiple of four,
> which should hold irrespective of the number of stacked labels.
> 
> Signed-off-by: Balazs Nemeth 
> ---
>  net/mpls/mpls_gso.c | 3 +++
>  1 file changed, 3 insertions(+)
> 


Reviewed-by: David Ahern

Re: [PATCH v2 2/2] net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0

2021-03-08 Thread David Ahern

On 3/8/21 9:26 AM, Balazs Nemeth wrote:
> On Mon, 2021-03-08 at 09:17 -0700, David Ahern wrote:
>> On 3/8/21 9:07 AM, Willem de Bruijn wrote:
>>>> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
>>>> index b1690149b6fa..cc1b6457fc93 100644
>>>> --- a/net/mpls/mpls_gso.c
>>>> +++ b/net/mpls/mpls_gso.c
>>>> @@ -27,7 +27,7 @@ static struct sk_buff *mpls_gso_segment(struct
>>>> sk_buff *skb,
>>>>
>>>>     skb_reset_network_header(skb);
>>>>     mpls_hlen = skb_inner_network_header(skb) -
>>>> skb_network_header(skb);
>>>> -   if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
>>>> +   if (unlikely(!mpls_hlen || !pskb_may_pull(skb,
>>>> mpls_hlen)))
>>>>     goto out;
>>>
>>> Good cathc. Besides length zero, this can be more strict: a label
>>> is
>>> 4B, so mpls_hlen needs to be >= 4B.
>>>
>>> Perhaps even aligned to 4B, too, but not if there may be other
>>> encap on top.
>>>
>>> Unfortunately there is no struct or type definition that we can use
>>> a
>>> sizeof instead of open coding the raw constant.
>>>
>>
>> MPLS_HLEN can be used here.
>>
> 
> What about sizeof(struct mpls_label), like in net/ipv4/tunnel4.c?
> 

I was thinking MPLS_HLEN because of its consistent use with skb
manipulations. net/mpls code uses mpls_shim_hdr over mpls_label.

Looks like the MPLS code could use some cleanups to make this consistent.

Re: [PATCH v2 2/2] net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0

2021-03-08 Thread David Ahern

On 3/8/21 9:07 AM, Willem de Bruijn wrote:
>> diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
>> index b1690149b6fa..cc1b6457fc93 100644
>> --- a/net/mpls/mpls_gso.c
>> +++ b/net/mpls/mpls_gso.c
>> @@ -27,7 +27,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff 
>> *skb,
>>
>> skb_reset_network_header(skb);
>> mpls_hlen = skb_inner_network_header(skb) - skb_network_header(skb);
>> -   if (unlikely(!pskb_may_pull(skb, mpls_hlen)))
>> +   if (unlikely(!mpls_hlen || !pskb_may_pull(skb, mpls_hlen)))
>> goto out;
> 
> Good cathc. Besides length zero, this can be more strict: a label is
> 4B, so mpls_hlen needs to be >= 4B.
> 
> Perhaps even aligned to 4B, too, but not if there may be other encap on top.
> 
> Unfortunately there is no struct or type definition that we can use a
> sizeof instead of open coding the raw constant.
> 

MPLS_HLEN can be used here.

Re: [PATCH] net:ipv4: Packet is not forwarded if bc_forwarding not configured on ingress interface

2021-02-28 Thread David Ahern

On 2/28/21 5:53 PM, Henry Shen wrote:
> When an IPv4 packet with a destination address of broadcast is received
> on an ingress interface, it will not be forwarded out of the egress
> interface if the ingress interface is not configured with bc_forwarding 
> but the egress interface is. If both the ingress and egress interfaces
> are configured with bc_forwarding, the packet can be forwarded
> successfully.
> 
> This patch is to be inline with Cisco's implementation that packet can be 
> forwarded if ingress interface is NOT configured with bc_forwarding, 
> but egress interface is.
> 

In Linux, forwarding decisions are made based on the ingress device, not
the egress device.

Re: [PATCH] ipv6: Honor route mtu if it is within limit of dev mtu

2021-02-24 Thread David Ahern

On 2/22/21 9:32 AM, Kaustubh Pandey wrote:
> When netdevice MTU is increased via sysfs, NETDEV_CHANGEMTU is raised.
> 
> addrconf_notify -> rt6_mtu_change -> rt6_mtu_change_route ->
> fib6_nh_mtu_change
> 
> As part of handling NETDEV_CHANGEMTU notification we land up on a
> condition where if route mtu is less than dev mtu and route mtu equals
> ipv6_devconf mtu, route mtu gets updated.
> 
> Due to this v6 traffic end up using wrong MTU then configured earlier.
> This commit fixes this by removing comparison with ipv6_devconf
> and updating route mtu only when it is greater than incoming dev mtu.
> 
> This can be easily reproduced with below script:
> pre-condition:
> device up(mtu = 1500) and route mtu for both v4 and v6 is 1500
> 
> test-script:
> ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1400
> ip -6 route change 2001::/64 dev eth0 metric 256 mtu 1400
> echo 1400 > /sys/class/net/eth0/mtu
> ip route change 192.168.0.0/24 dev eth0 src 192.168.0.1 mtu 1500
> echo 1500 > /sys/class/net/eth0/mtu
> 
> Signed-off-by: Kaustubh Pandey 
> ---
>  net/ipv6/route.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 1536f49..653b6c7 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4813,8 +4813,7 @@ static int fib6_nh_mtu_change(struct fib6_nh *nh, void 
> *_arg)
>   struct inet6_dev *idev = __in6_dev_get(arg->dev);
>   u32 mtu = f6i->fib6_pmtu;
>  
> - if (mtu >= arg->mtu ||
> - (mtu < arg->mtu && mtu == idev->cnf.mtu6))
> + if (mtu >= arg->mtu)
>   fib6_metric_set(f6i, RTAX_MTU, arg->mtu);
>  
>   spin_lock_bh(_exception_lock);
> 

The existing logic mirrors what is done for exceptions, see
rt6_mtu_change_route_allowed and commit e9fa1495d738.

It seems right to me to drop the mtu == idev->cnf.mtu6 comparison in
which case the exceptions should do the same.

Added author of e9fa1495d738 in case I am overlooking something.

Test case should be added to tools/testing/selftests/net/pmtu.sh, and
did you run that script with the proposed change?

Re: [PATCH] arp: Remove the arp_hh_ops structure

2021-02-22 Thread David Ahern

On 2/22/21 1:37 AM, Eric Dumazet wrote:
> 
> 
> On 2/22/21 4:15 AM, Yejune Deng wrote:
>> The arp_hh_ops structure is similar to the arp_generic_ops structure.
>> but the latter is more general,so remove the arp_hh_ops structure.
>>
>> Fix when took out the neigh->ops assignment:
>> 8.973653] #PF: supervisor read access in kernel mode
>> [8.975027] #PF: error_code(0x) - not-present page
>> [8.976310] PGD 0 P4D 0
>> [8.977036] Oops:  [#1] SMP PTI
>> [8.977973] CPU: 1 PID: 210 Comm: sd-resolve Not tainted 
>> 5.11.0-rc7-02046-g4591591ab715 #1
>> [8.979998] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
>> 1.12.0-1 04/01/2014
>> [8.981996] RIP: 0010:neigh_probe 
>> (kbuild/src/consumer/net/core/neighbour.c:1009)
>>
> 
> I have a hard time understanding this patch submission.
> 
> This seems a mix of a net-next and net material ?

It is net-next at best.

> 
> 
> 
>> Reported-by: kernel test robot 
> 
> If this is a bug fix, we want a Fixes: tag
> 
> This will really help us. Please don't let us guess what is going on.
> 

This patch is a v2. v1 was clearly wrong and not tested; I responded as
such 12 hours before the robot test.

Re: [PATCH] arp: Remove the arp_hh_ops structure

2021-02-20 Thread David Ahern

On 2/19/21 9:32 PM, Yejune Deng wrote:
>  static const struct neigh_ops arp_direct_ops = {
>   .family =   AF_INET,
>   .output =   neigh_direct_output,
> @@ -277,15 +269,10 @@ static int arp_constructor(struct neighbour *neigh)
>   memcpy(neigh->ha, dev->broadcast, dev->addr_len);
>   }
>  
> - if (dev->header_ops->cache)
> - neigh->ops = _hh_ops;
> - else
> - neigh->ops = _generic_ops;

How did you test this?

you took out the neigh->ops assignment, so all of the neigh->ops in
net/core/neighbour.c are going to cause a NULL dereference.


> -
> - if (neigh->nud_state & NUD_VALID)
> - neigh->output = neigh->ops->connected_output;
> + if (!dev->header_ops->cache && (neigh->nud_state & NUD_VALID))
> + neigh->output = arp_generic_ops.connected_output;
>   else
> - neigh->output = neigh->ops->output;
> + neigh->output = arp_generic_ops.output;
>   }
>   return 0;
>  }
>

Re: [PATCH iproute2-next V4] devlink: add support for port params get/set

2021-02-11 Thread David Ahern

On 2/9/21 3:31 AM, Oleksandr Mazur wrote:
> Add implementation for the port parameters
> getting/setting.
> Add bash completion for port param.
> Add man description for port param.
> 
> Signed-off-by: Oleksandr Mazur 
> ---

applied to iproute2-next.

In the future, please add example commands - get/set/show, and the show
commands should have examples for normal and json.

Re: [PATCH iproute2-next V3] devlink: add support for port params get/set

2021-02-08 Thread David Ahern

On 2/2/21 6:04 AM, Oleksandr Mazur wrote:
> Add implementation for the port parameters
> getting/setting.
> Add bash completion for port param.
> Add man description for port param.
> 
> Example:
> $ devlink dev param set netdevsim/netdevsim0/0 name test_port_parameter value 
> false cmode runtime
> 
> $ devlink port param show netdevsim/netdevsim0/0 name test_port_parameter
> netdevsim/netdevsim0/0:
>   name test_port_parameter type driver-specific
> values:
>   cmode runtime value false
> 
> $ devlink port  -jp param show netdevsim/netdevsim0/0 name test_port_parameter
> {
> "param": {
> "netdevsim/netdevsim0/0": [ {
> "name": "test_port_parameter",
> "type": "driver-specific",
> "values": [ {
> "cmode": "runtime",
> "value": false
> } ]
> } ]
> }
> }
> 
> Signed-off-by: Oleksandr Mazur 
> ---
> V3:
> 1) Add usage example;
> 2) Remove stray newline in code;
> V2:
> 1) Add bash completion for port param;
> 2) Add man decsription / examples for port param;
> 
>  bash-completion/devlink |  55 
>  devlink/devlink.c   | 274 +++-
>  man/man8/devlink-port.8 |  65 ++
>  3 files changed, 388 insertions(+), 6 deletions(-)
> 

does not apply to iproute2-next. please rebase

Re: [PATCH net-next v2] seg6: fool-proof the processing of SRv6 behavior attributes

2021-02-07 Thread David Ahern

On 2/6/21 10:09 AM, Andrea Mayer wrote:
> The set of required attributes for a given SRv6 behavior is identified
> using a bitmap stored in an unsigned long, since the initial design of SRv6
> networking in Linux. Recently the same approach has been used for
> identifying the optional attributes.
> 
> However, the number of attributes supported by SRv6 behaviors depends on
> the size of the unsigned long type which changes with the architecture.
> Indeed, on a 64-bit architecture, an SRv6 behavior can support up to 64
> attributes while on a 32-bit architecture it can support at most 32
> attributes.
> 
> To fool-proof the processing of SRv6 behaviors we verify, at compile time,
> that the set of all supported SRv6 attributes can be encoded into a bitmap
> stored in an unsigned long. Otherwise, kernel build fails forcing
> developers to reconsider adding a new attribute or extend the total
> number of supported attributes by the SRv6 behaviors.
> 
> Moreover, we replace all patterns (1 << i) with the macro SEG6_F_ATTR(i) in
> order to address potential overflow issues caused by 32-bit signed
> arithmetic.
> 
> Thanks to Colin Ian King for catching the overflow problem, providing a
> solution and inspiring this patch.
> Thanks to Jakub Kicinski for his useful suggestions during the design of
> this patch.
> 
> v2:
>  - remove the SEG6_LOCAL_MAX_SUPP which is not strictly needed: it can
>be derived from the unsigned long type. Thanks to David Ahern for
>pointing it out.
> 
> Signed-off-by: Andrea Mayer 
> ---
>  net/ipv6/seg6_local.c | 67 +++++--
>  1 file changed, 39 insertions(+), 28 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH iproute2-next V3] devlink: add support for port params get/set

2021-02-04 Thread David Ahern

Jiri/Parav: does this look ok to you?

Re: [PATCH net-next] seg6: fool-proof the processing of SRv6 behavior attributes

2021-02-03 Thread David Ahern

On 2/3/21 7:27 PM, Andrea Mayer wrote:
> 
> I think there is an issue here because BITS_PER_TYPE(unsigned long) is greater
> than the SEG6_LOCAL_MAX (currently = 9).
> 
> I think it should be like this:
> 
>  BUILD_BUG_ON(SEG6_LOCAL_MAX + 1 > BITS_PER_TYPE(unsigned long))
> 
> I will send a v2 with the changes discussed so far.

yes, I had that backwards.

Re: [PATCH net-next] seg6: fool-proof the processing of SRv6 behavior attributes

2021-02-03 Thread David Ahern

On 2/2/21 11:56 AM, Andrea Mayer wrote:
> diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c
> index b07f7c1c82a4..7cc50d506902 100644
> --- a/net/ipv6/seg6_local.c
> +++ b/net/ipv6/seg6_local.c
> @@ -31,6 +31,9 @@
>  #include 
>  #include 
>  
> +#define SEG6_F_ATTR(i)   BIT(i)
> +#define SEG6_LOCAL_MAX_SUPP  32
> +

SEG6_LOCAL_MAX_SUPP should not be needed; it can be derived from the type:

BUILD_BUG_ON(BITS_PER_TYPE(unsigned long) > SEG6_LOCAL_MAX)

The use of BIT() looks fine.

Re: [PATCH iproute-next v2] devlink: add support for port params get/set

2021-01-29 Thread David Ahern

On 1/25/21 6:48 AM, Oleksandr Mazur wrote:
> Add implementation for the port parameters getting/setting.
> Add bash completion for port param.
> Add man description for port param.
> 

Add example commands here - both set and show. Include a json version of
the show.

> Signed-off-by: Oleksandr Mazur 
> ---
> V2:
> 1) Add bash completion for port param;
> 2) Add man decsription / examples for port param;
> 
>  bash-completion/devlink |  55 
>  devlink/devlink.c   | 275 +++-
>  man/man8/devlink-port.8 |  65 ++
>  3 files changed, 389 insertions(+), 6 deletions(-)
> 

> diff --git a/devlink/devlink.c b/devlink/devlink.c
> index a2e06644..0fc1d4f0 100644
> --- a/devlink/devlink.c
> +++ b/devlink/devlink.c
> @@ -2706,7 +2706,8 @@ static void pr_out_param_value(struct dl *dl, const 
> char *nla_name,
>   }
>  }
>  
> -static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array)
> +static void pr_out_param(struct dl *dl, struct nlattr **tb, bool array,
> +  bool is_port_param)
>  {
>   struct nlattr *nla_param[DEVLINK_ATTR_MAX + 1] = {};
>   struct nlattr *param_value_attr;
> @@ -2714,6 +2715,7 @@ static void pr_out_param(struct dl *dl, struct nlattr 
> **tb, bool array)
>   int nla_type;
>   int err;
>  
> +

stray newline here

Re: [PATCH 4/4] net: l3mdev: use obj-$(CONFIG_NET_L3_MASTER_DEV) form in net/Makefile

2021-01-26 Thread David Ahern

On 1/25/21 4:16 PM, Masahiro Yamada wrote:
> CONFIG_NET_L3_MASTER_DEV is a bool option. Change the ifeq conditional
> to the standard obj-$(CONFIG_NET_L3_MASTER_DEV) form.
> 
> Use obj-y in net/l3mdev/Makefile because Kbuild visits this Makefile
> only when CONFIG_NET_L3_MASTER_DEV=y.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  net/Makefile| 4 +---
>  net/l3mdev/Makefile | 2 +-
>  2 files changed, 2 insertions(+), 4 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH v4 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-26 Thread David Ahern

On 1/25/21 2:44 PM, Praveen Chaudhary wrote:
> For IPv4, default route is learned via DHCPv4 and user is allowed to change
> metric using config etc/network/interfaces. But for IPv6, default route can
> be learned via RA, for which, currently a fixed metric value 1024 is used.
> 
> Ideally, user should be able to configure metric on default route for IPv6
> similar to IPv4. This fix adds sysctl for the same.
> 
> Signed-off-by: Praveen Chaudhary 
> Signed-off-by: Zhenggen Xu 
> 
> Reviewed-by: David Ahern 
> 
> Changes in v1.
> 1.) Correct the call to rt6_add_dflt_router.
> 
> Changes in v2.
> 1.) Replace accept_ra_defrtr_metric to ra_defrtr_metric.
> 2.) Change Type to __u32 instead of __s32.
> 3.) Change description in Documentation/networking/ip-sysctl.rst.
> 4.) Use proc_douintvec instead of proc_dointvec.
> 5.) Code style in ndisc_router_discovery().
> 6.) Change Type to u32 instead of unsigned int.
> 
> Changes in v3:
> 1.) Removed '---' and '```' from description.
> 2.) Remove stray ' after accept_ra_defrtr.
> 3.) Fix tab in net/ipv6/addrconf.c.
> 
> Changes in v4:
> 1.) Remove special case of 0 and use IP6_RT_PRIO_USER as default.
> 2.) Do not allow 0.
> 3.) Change Documentation accordingly.
> 4.) Remove extra brackets and compare with zero in ndisc_router_discovery().
> 5.) Remove compare with zero in rt6_add_dflt_router().
> 
> Logs:
> 
> For IPv4:
> 
> Config in etc/network/interfaces:
> auto eth0
> iface eth0 inet dhcp
> metric 4261413864
> 
> IPv4 Kernel Route Table:
> $ ip route list
> default via 172.21.47.1 dev eth0 metric 4261413864
> 
> FRR Table, if a static route is configured:
> [In real scenario, it is useful to prefer BGP learned default route over 
> DHCPv4 default route.]
> Codes: K - kernel route, C - connected, S - static, R - RIP,
>O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
>T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
>> - selected route, * - FIB route
> 
> S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
> K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m
> 
> i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
> Similar behavior is not possible for IPv6, without this fix.
> 
> After fix [for IPv6]:
> sudo sysctl -w 
> net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705
> 
> IP monitor: [When IPv6 RA is received]
> default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 1996489705  
> pref high
> 
> Kernel IPv6 routing table
> $ ip -6 route list
> default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 
> expires 21sec hoplimit 64 pref high
> 
> FRR Table, if a static route is configured:
> [In real scenario, it is useful to prefer BGP learned default route over IPv6 
> RA default route.]
> Codes: K - kernel route, C - connected, S - static, R - RIPng,
>O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
>v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
>> - selected route, * - FIB route
> 
> S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
> K   ::/0 [119/1001] via fe80::xx16::feb3:ce8e, eth0, 6d07h43m
> 
> If the metric is changed later, the effect will be seen only when next IPv6
> RA is received, because the default route must be fully controlled by RA msg.
> Below metric is changed from 1996489705 to 1996489704.
> 
> $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
> net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704
> 
> IP monitor:
> [On next IPv6 RA msg, Kernel deletes prev route and installs new route with 
> updated metric]
> 
> Deleted default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 
> 1996489705  expires 3sec hoplimit 64 pref high
> default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 1996489704  
> pref high
> ---
>  Documentation/networking/ip-sysctl.rst | 10 ++
>  include/linux/ipv6.h   |  1 +
>  include/net/ip6_route.h|  3 ++-
>  include/uapi/linux/ipv6.h  |  1 +
>  include/uapi/linux/sysctl.h|  1 +
>  net/ipv6/addrconf.c| 11 +++
>  net/ipv6/ndisc.c   | 12 
>  net/ipv6/route.c   |  5 +++--
>  8 files changed, 37 insertions(+), 7 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH v3 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-23 Thread David Ahern

On 1/23/21 1:00 PM, Jakub Kicinski wrote:
> On Fri, 22 Jan 2021 22:16:41 -0700 David Ahern wrote:
>> On 1/22/21 9:02 PM, Maciej Żenczykowski wrote:
>>> Why can't we get rid of the special case for 0 and simply make 1024 the
>>> default value?  
>>
>> That would work too.
> 
> Should we drop it then? Easier to bring it back than to change the
> interpretation later. It doesn't seem to serve any clear purpose right
> now.
> 
> (Praveen if you post v4 please take a look at the checkpatch --strict
> warnings and address the ones which make sense, e.g. drop the brackets
> around comparisons, those are just noise, basic grasp of C operator
> precedence can be assumed in readers of kernel code).
> 

let's do a v4.

Praveen: set the initial value to IP6_RT_PRIO_USER, do not allow 0,
remove the checks on value and don't forget to update documentation.

Oh and cc me on the next otherwise the review depends on me finding time
to scan netdev.

Re: [PATCH v3 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-22 Thread David Ahern

On 1/22/21 9:02 PM, Maciej Żenczykowski wrote:
> Why can't we get rid of the special case for 0 and simply make 1024 the
> default value?

That would work too.

> 
> As for making it an RA option: it's not clear how that would work, the
> use case I see for this is for example two connections to the internet,
> of which one is clearly better (higher throughput, lower latency, lower
> packet loss, etc) then the other.
> 
> The upstream routers would have to somehow coordinate with each other
> the metric values... that seems impossible to achieve in practice -
> unless they do something like report expected down/up
> bandwidth, latency, etc...  While some sort of policy on the machine
> itself seems much more feasible (for example wired interface > wireless
> interface > cell interface or something like that)

I was thinking the admin of the network controls the RAs and knows which
paths are preferred over the admin of the node receiving the RA (not
practical for a mobile setup with cell vs wifi, but is for a DC which is
the driving use case).

But it takes an extension to IPv6/ndisc to add metric as an RA option,
so not realistic in a reasonable time frame.

Re: [PATCH v3 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-22 Thread David Ahern

On 1/19/21 2:29 PM, Praveen Chaudhary wrote:
> For IPv4, default route is learned via DHCPv4 and user is allowed to change
> metric using config etc/network/interfaces. But for IPv6, default route can
> be learned via RA, for which, currently a fixed metric value 1024 is used.
> 
> Ideally, user should be able to configure metric on default route for IPv6
> similar to IPv4. This fix adds sysctl for the same.
> 
> Signed-off-by: Praveen Chaudhary 
> Signed-off-by: Zhenggen Xu 
> 
> Changes in v1.
> 1.) Correct the call to rt6_add_dflt_router.
> 
> Changes in v2.
> 1.) Replace accept_ra_defrtr_metric to ra_defrtr_metric.
> 2.) Change Type to __u32 instead of __s32.
> 3.) Change description in Documentation/networking/ip-sysctl.rst.
> 4.) Use proc_douintvec instead of proc_dointvec.
> 5.) Code style in ndisc_router_discovery().
> 6.) Change Type to u32 instead of unsigned int.
> 
> Changes in v3:
> 1.) Removed '---' and '```' from description.
> 2.) Remove stray ' after accept_ra_defrtr.
> 3.) Fix tab in net/ipv6/addrconf.c.
> 
> Logs:
> 
> For IPv4:
> 
> Config in etc/network/interfaces:
> auto eth0
> iface eth0 inet dhcp
> metric 4261413864
> 
> IPv4 Kernel Route Table:
> $ ip route list
> default via 172.21.47.1 dev eth0 metric 4261413864
> 
> FRR Table, if a static route is configured:
> [In real scenario, it is useful to prefer BGP learned default route over 
> DHCPv4 default route.]
> Codes: K - kernel route, C - connected, S - static, R - RIP,
>O - OSPF, I - IS-IS, B - BGP, P - PIM, E - EIGRP, N - NHRP,
>T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
>> - selected route, * - FIB route
> 
> S>* 0.0.0.0/0 [20/0] is directly connected, eth0, 00:00:03
> K   0.0.0.0/0 [254/1000] via 172.21.47.1, eth0, 6d08h51m
> 
> i.e. User can prefer Default Router learned via Routing Protocol in IPv4.
> Similar behavior is not possible for IPv6, without this fix.
> 
> After fix [for IPv6]:
> sudo sysctl -w 
> net.ipv6.conf.eth0.net.ipv6.conf.eth0.ra_defrtr_metric=1996489705
> 
> IP monitor: [When IPv6 RA is received]
> default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 1996489705  
> pref high
> 
> Kernel IPv6 routing table
> $ ip -6 route list
> default via fe80::be16:65ff:feb3:ce8e dev eth0 proto ra metric 1996489705 
> expires 21sec hoplimit 64 pref high
> 
> FRR Table, if a static route is configured:
> [In real scenario, it is useful to prefer BGP learned default route over IPv6 
> RA default route.]
> Codes: K - kernel route, C - connected, S - static, R - RIPng,
>O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
>v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
>> - selected route, * - FIB route
> 
> S>* ::/0 [20/0] is directly connected, eth0, 00:00:06
> K   ::/0 [119/1001] via fe80::xx16::feb3:ce8e, eth0, 6d07h43m
> 
> If the metric is changed later, the effect will be seen only when next IPv6
> RA is received, because the default route must be fully controlled by RA msg.
> Below metric is changed from 1996489705 to 1996489704.
> 
> $ sudo sysctl -w net.ipv6.conf.eth0.ra_defrtr_metric=1996489704
> net.ipv6.conf.eth0.ra_defrtr_metric = 1996489704
> 
> IP monitor:
> [On next IPv6 RA msg, Kernel deletes prev route and installs new route with 
> updated metric]
> 
> Deleted default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 
> 1996489705  expires 3sec hoplimit 64 pref high
> default via fe80::xx16::feb3:ce8e dev eth0 proto ra metric 1996489704  
> pref high
> ---
>  Documentation/networking/ip-sysctl.rst | 12 
>  include/linux/ipv6.h   |  1 +
>  include/net/ip6_route.h|  3 ++-
>  include/uapi/linux/ipv6.h  |  1 +
>  include/uapi/linux/sysctl.h|  1 +
>  net/ipv6/addrconf.c| 10 ++++++
>  net/ipv6/ndisc.c   | 14 ++
>  net/ipv6/route.c   |  5 +++--
>  8 files changed, 40 insertions(+), 7 deletions(-)
> 

LGTM. I can't think of a better way to do this than a sysctl. Shame that
the metric/priority is not an RA option.

Reviewed-by: David Ahern

Re: [PATCH v2 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-19 Thread David Ahern

On 1/19/21 3:17 PM, praveen chaudhary wrote:
>>> 
>>> For IPv4:
>>> 
>>>
>>> Config in etc/network/interfaces
>>> 
>>> ```
>>> auto eth0
>>> iface eth0 inet dhcp
>>>metric 4261413864
>>
>> how does that work for IPv4? Is the metric passed to the dhclient and it
>> inserts the route with the given metric or is a dhclient script used to
>> replace the route after insert?
>>
>>
> 
> Yes, DHCP client picks config under “iface eth0 inet dhcp” line and if metric 
> is configured, then it adds the metric for all added routes.

As I recall ifupdown{2} forks dhclient as a process to handle dhcp
config, and I believe there is a script that handles adding the default
route with metric. Meaning ... it is not comparable to an RA.

> 
> Thanks a lot again for spending time for this Review,
> This feature will help SONiC OS [and others Linux flavors] for better IPv6 
> support, so thanks again.

I think SONiC is an abomination, so that is definitely not the
motivation for my reviews. :-)

Re: [PATCH net-next v3] bonding: add a vlan+srcmac tx hashing option

2021-01-18 Thread David Ahern

On 1/15/21 12:21 PM, Jarod Wilson wrote:
> diff --git a/Documentation/networking/bonding.rst 
> b/Documentation/networking/bonding.rst
> index adc314639085..36562dcd3e1e 100644
> --- a/Documentation/networking/bonding.rst
> +++ b/Documentation/networking/bonding.rst
> @@ -951,6 +951,19 @@ xmit_hash_policy
>   packets will be distributed according to the encapsulated
>   flows.
>  
> + vlan+srcmac
> +
> + This policy uses a very rudimentary vland ID and source mac

s/vland/vlan/

> + ID hash to load-balance traffic per-vlan, with failover

drop ID on this line; just 'source mac'.


> + should one leg fail. The intended use case is for a bond
> + shared by multiple virtual machines, all configured to
> + use their own vlan, to give lacp-like functionality
> + without requiring lacp-capable switching hardware.
> +
> + The formula for the hash is simply
> +
> + hash = (vlan ID) XOR (source MAC vendor) XOR (source MAC dev)
> +
>   The default value is layer2.  This option was added in bonding
>   version 2.6.3.  In earlier versions of bonding, this parameter
>   does not exist, and the layer2 policy is the only policy.  The

Re: [PATCH v2 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-16 Thread David Ahern

On 1/15/21 1:02 AM, Praveen Chaudhary wrote:
> For IPv4, default route is learned via DHCPv4 and user is allowed to change
> metric using config etc/network/interfaces. But for IPv6, default route can
> be learned via RA, for which, currently a fixed metric value 1024 is used.
> 
> Ideally, user should be able to configure metric on default route for IPv6
> similar to IPv4. This fix adds sysctl for the same.
> 
> Signed-off-by: Praveen Chaudhary 
> Signed-off-by: Zhenggen Xu 
> 
> Changes in v1.
> ---

your trying to be too fancy in the log messages; everything after this
first '---' is dropped. Just Remove all of the '---' lines and '```' tags.

> 1.) Correct the call to rt6_add_dflt_router.
> ---
> 
> Changes in v2.
> [Refer: lkml.org/lkml/2021/1/14/1400]
> ---
> 1.) Replace accept_ra_defrtr_metric to ra_defrtr_metric.
> 2.) Change Type to __u32 instead of __s32.
> 3.) Change description in Documentation/networking/ip-sysctl.rst.
> 4.) Use proc_douintvec instead of proc_dointvec.
> 5.) Code style in ndisc_router_discovery().
> 6.) Change Type to u32 instead of unsigned int.
> ---
> 
> Logs:
> 
> For IPv4:
> 
> 
> Config in etc/network/interfaces
> 
> ```
> auto eth0
> iface eth0 inet dhcp
> metric 4261413864

how does that work for IPv4? Is the metric passed to the dhclient and it
inserts the route with the given metric or is a dhclient script used to
replace the route after insert?


> diff --git a/Documentation/networking/ip-sysctl.rst 
> b/Documentation/networking/ip-sysctl.rst
> index dd2b12a32b73..c4b8d4b8d213 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1871,6 +1871,18 @@ accept_ra_defrtr - BOOLEAN
>   - enabled if accept_ra is enabled.
>   - disabled if accept_ra is disabled.
>  
> +ra_defrtr_metric - INTEGER
> + Route metric for default route learned in Router Advertisement. This 
> value
> + will be assigned as metric for the default route learned via IPv6 Router
> + Advertisement. Takes affect only if accept_ra_defrtr' is enabled.

stray ' after accept_ra_defrtr

> +
> + Possible values are:
> + 0:
> + default value will be used for route metric
> + i.e. IP6_RT_PRIO_USER 1024.
> + 1 to 0x:
> + current value will be used for route metric.
> +
>  accept_ra_from_local - BOOLEAN
>   Accept RA with source-address that is found on local machine
>   if the RA is otherwise proper and able to be accepted.



> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index eff2cacd5209..b13d3213e58f 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -205,6 +205,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = {
>   .max_desync_factor  = MAX_DESYNC_FACTOR,
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
> + .ra_defrtr_metric = 0,

make the the '=' align column wise with the existing entries; seems like
your new line is missing a tab

>   .accept_ra_from_local   = 0,
>   .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
> @@ -260,6 +261,7 @@ static struct ipv6_devconf ipv6_devconf_dflt 
> __read_mostly = {
>   .max_desync_factor  = MAX_DESYNC_FACTOR,
>   .max_addresses  = IPV6_MAX_ADDRESSES,
>   .accept_ra_defrtr   = 1,
> + .ra_defrtr_metric = 0,

same here

>   .accept_ra_from_local   = 0,
>   .accept_ra_min_hop_limit= 1,
>   .accept_ra_pinfo= 1,
> @@ -5475,6 +5477,7 @@ static inline void ipv6_store_devconf(struct 
> ipv6_devconf *cnf,
>   array[DEVCONF_MAX_DESYNC_FACTOR] = cnf->max_desync_factor;
>   array[DEVCONF_MAX_ADDRESSES] = cnf->max_addresses;
>   array[DEVCONF_ACCEPT_RA_DEFRTR] = cnf->accept_ra_defrtr;
> + array[DEVCONF_RA_DEFRTR_METRIC] = cnf->ra_defrtr_metric;
>   array[DEVCONF_ACCEPT_RA_MIN_HOP_LIMIT] = cnf->accept_ra_min_hop_limit;
>   array[DEVCONF_ACCEPT_RA_PINFO] = cnf->accept_ra_pinfo;
>  #ifdef CONFIG_IPV6_ROUTER_PREF
> @@ -6667,6 +6670,13 @@ static const struct ctl_table addrconf_sysctl[] = {
>   .mode   = 0644,
>   .proc_handler   = proc_dointvec,
>   },
> + {
> + .procname   = "ra_defrtr_metric",
> + .data   = _devconf.ra_defrtr_metric,
> + .maxlen = sizeof(u32),
> + .mode   = 0644,
> + .proc_handler   = proc_douintvec,
> + },
>   {
>   .procname   = "accept_ra_min_hop_limit",
>   .data   = _devconf.accept_ra_min_hop_limit,
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 76717478f173..2bffed49f5c0 100644
> --- a/net/ipv6/ndisc.c
>

Re: [PATCH net 0/2] ipv6: fixes for the multicast routes

2021-01-15 Thread David Ahern

On 1/15/21 4:12 PM, Matteo Croce wrote:
> On Fri, Jan 15, 2021 at 11:50 PM Jakub Kicinski  wrote:
>>
>> On Fri, 15 Jan 2021 19:42:07 +0100 Matteo Croce wrote:
>>> From: Matteo Croce 
>>>
>>> Fix two wrong flags in the IPv6 multicast routes created
>>> by the autoconf code.
>>
>> Any chance for Fixes tags here?
> 
> Right.
> For 1/2 I don't know exactly, that code was touched last time in
> 86872cb57925 ("[IPv6] route: FIB6 configuration using struct
> fib6_config"), but it was only refactored. Before 86872cb57925, it was
> pushed in the git import commit by Linus: 1da177e4c3f4
> ("Linux-2.6.12-rc2").
> BTW, according the history repo, it entered the tree in the 2.4.0
> import, so I'd say it's here since the beginning.
> 
> While for 2/2 I'd say:
> 
> Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info")
> 

As I recall (memory jogging from commit description) my patch only moved
the setting from ip6_route_info_create default to here.

The change is correct, just thinking it goes back beyond 4.16. If
someone has a system running a 4.14 or earlier kernel it should be easy
to know if this was the default prior.

Re: [PATCH net 1/2] ipv6: create multicast route with RTPROT_KERNEL

2021-01-15 Thread David Ahern

On 1/15/21 11:42 AM, Matteo Croce wrote:
> From: Matteo Croce 
> 
> The ff00::/8 multicast route is created without specifying the fc_protocol
> field, so the default RTPROT_BOOT value is used:
> 
>   $ ip -6 -d route
>   unicast ::1 dev lo proto kernel scope global metric 256 pref medium
>   unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
>   unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium
> 
> As the documentation says, this value identifies routes installed during
> boot, but the route is created when interface is set up.
> Change the value to RTPROT_KERNEL which is a better value.
> 
> Signed-off-by: Matteo Croce 
> ---
>  net/ipv6/addrconf.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index eff2cacd5209..19bf6822911c 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2469,6 +2469,7 @@ static void addrconf_add_mroute(struct net_device *dev)
>   .fc_flags = RTF_UP,
>   .fc_type = RTN_UNICAST,
>   .fc_nlinfo.nl_net = dev_net(dev),
> + .fc_protocol = RTPROT_KERNEL,
>   };
>  
>   ipv6_addr_set(_dst, htonl(0xFF00), 0, 0, 0);
> 


What's the motivation for changing this? ie., what s/w cares that it is
kernel vs boot?

Re: [PATCH v1 net-next 1/1] Allow user to set metric on default route learned via Router Advertisement.

2021-01-14 Thread David Ahern

On 1/12/21 6:50 PM, Praveen Chaudhary wrote:
> For IPv4, default route is learned via DHCPv4 and user is allowed to change
> metric using config etc/network/interfaces. But for IPv6, default route can
> be learned via RA, for which, currently a fixed metric value 1024 is used.
> 
> Ideally, user should be able to configure metric on default route for IPv6
> similar to IPv4. This fix adds sysctl for the same.

This is a single patch set, so the details you have in the cover letter
should be in this description here. Also, please just 'ip' commands in
the patch description; 'route' command is a dinosaur that needs to be
retired.

> 
> Signed-off-by: Praveen Chaudhary
> Signed-off-by: Zhenggen Xu
> 
> Changes in v1.
> ---
> 1.) Correct the call to rt6_add_dflt_router.
> ---
> 
> ---
>  Documentation/networking/ip-sysctl.rst | 18 ++
>  include/linux/ipv6.h   |  1 +
>  include/net/ip6_route.h|  3 ++-
>  include/uapi/linux/ipv6.h  |  1 +
>  include/uapi/linux/sysctl.h|  1 +
>  net/ipv6/addrconf.c| 10 ++
>  net/ipv6/ndisc.c   | 14 ++
>  net/ipv6/route.c   |  5 +++--
>  8 files changed, 46 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/networking/ip-sysctl.rst 
> b/Documentation/networking/ip-sysctl.rst
> index dd2b12a32b73..384159081d91 100644
> --- a/Documentation/networking/ip-sysctl.rst
> +++ b/Documentation/networking/ip-sysctl.rst
> @@ -1871,6 +1871,24 @@ accept_ra_defrtr - BOOLEAN
>   - enabled if accept_ra is enabled.
>   - disabled if accept_ra is disabled.
>  
> +accept_ra_defrtr_metric - INTEGER

Drop the 'accept' part; ra_defrtr_metric is sufficiently long. Since the
value is not from the RA, it is not really about accepting data from the RA.

> + Route metric for default route learned in Router Advertisement. This
> + value will be assigned as metric for the route learned via IPv6 Router
> + Advertisement.
> +
> + Possible values are:
> + 0:
> + Use default value i.e. IP6_RT_PRIO_USER 1024.
> + 0x to -1:
> + -ve values represent high route metric, value will be 
> treated as
> + unsigned value. This behaviour is inline with current 
> IPv4 metric
> + shown with commands such as "route -n" or "ip route 
> list".
> + 1 to 0x7FF:
> + +ve values will be used as is for route metric.

route metric is a u32, so these ranges should not be needed. 'ip route
list' shows metric as a positive number only.


> +
> + Functional default: enabled if accept_ra_defrtr is enabled.
> + disabled if accept_ra_defrtr is disabled.

Alignment problem, but I think this can be moved above to the
description and changed to something like 'only takes affect if
accept_ra_defrtr' is enabled.

> +
>  accept_ra_from_local - BOOLEAN
>   Accept RA with source-address that is found on local machine
>   if the RA is otherwise proper and able to be accepted.
> diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
> index dda61d150a13..19af90c77200 100644
> --- a/include/linux/ipv6.h
> +++ b/include/linux/ipv6.h
> @@ -31,6 +31,7 @@ struct ipv6_devconf {
>   __s32   max_desync_factor;
>   __s32   max_addresses;
>   __s32   accept_ra_defrtr;
> + __s32   accept_ra_defrtr_metric;

__u32 and drop the 'accept_' prefix.


>   __s32   accept_ra_min_hop_limit;
>   __s32   accept_ra_pinfo;
>   __s32   ignore_routes_with_linkdown;
> diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
> index 2a5277758379..a470bdab2420 100644
> --- a/include/net/ip6_route.h
> +++ b/include/net/ip6_route.h
> @@ -174,7 +174,8 @@ struct fib6_info *rt6_get_dflt_router(struct net *net,
>struct net_device *dev);
>  struct fib6_info *rt6_add_dflt_router(struct net *net,
>const struct in6_addr *gwaddr,
> -  struct net_device *dev, unsigned int pref);
> +  struct net_device *dev, unsigned int pref,
> +  unsigned int defrtr_usr_metric);

u32 for consistency

>  
>  void rt6_purge_dflt_routers(struct net *net);
>  
> diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
> index 13e8751bf24a..945de5de5144 100644
> --- a/include/uapi/linux/ipv6.h
> +++ b/include/uapi/linux/ipv6.h
> @@ -189,6 +189,7 @@ enum {
>   DEVCONF_ACCEPT_RA_RT_INFO_MIN_PLEN,
>   DEVCONF_NDISC_TCLASS,
>   DEVCONF_RPL_SEG_ENABLED,
> + DEVCONF_ACCEPT_RA_DEFRTR_METRIC,

Drop 'ACCEPT_'

>   DEVCONF_MAX
>  };
>  
> diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> index 458179df9b27..5e79c196e33c 100644
> ---

Re: [PATCH] bpf: fix: address of local auto-variable assigned to a function parameter.

2020-12-24 Thread David Ahern

On 12/24/20 12:01 AM, YANG LI wrote:
> Assigning local variable txq to the outputting parameter xdp->txq is not
> safe, txq will be released after the end of the function call. 
> Then the result of using xdp is unpredictable.

txq can only be accessed in this devmap context. Was it actually hit
during runtime or is this report based on code analysis?


> 
> Fix this error by defining the struct xdp_txq_info in function
> dev_map_run_prog() as a static type.
> 
> Signed-off-by: YANG LI 
> Reported-by: Abaci 
> ---
>  kernel/bpf/devmap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index f6e9c68..af6f004 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -454,7 +454,7 @@ static struct xdp_buff *dev_map_run_prog(struct 
> net_device *dev,
>struct xdp_buff *xdp,
>struct bpf_prog *xdp_prog)
>  {
> - struct xdp_txq_info txq = { .dev = dev };
> + static struct xdp_txq_info txq = { .dev = dev };
>   u32 act;
>  
>   xdp_set_data_meta_invalid(xdp);
>

Re: [PATCH net-next] vrf: handle CONFIG_IPV6 not set for vrf_add_mac_header_if_unset()

2020-12-08 Thread David Ahern

On 12/8/20 10:52 AM, Andrea Mayer wrote:
> The vrf_add_mac_header_if_unset() is defined within a conditional
> compilation block which depends on the CONFIG_IPV6 macro.
> However, the vrf_add_mac_header_if_unset() needs to be called also by IPv4
> related code and when the CONFIG_IPV6 is not set, this function is missing.
> As a consequence, the build process stops reporting the error:
> 
>  ERROR: implicit declaration of function 'vrf_add_mac_header_if_unset'
> 
> The problem is solved by *only* moving functions
> vrf_add_mac_header_if_unset() and vrf_prepare_mac_header() out of the
> conditional block.
> 
> Reported-by: kernel test robot 
> Fixes: 0489390882202 ("vrf: add mac header for tunneled packets when sniffer 
> is attached")
> Signed-off-by: Andrea Mayer 
> ---
>  drivers/net/vrf.c | 110 +++---
>  1 file changed, 55 insertions(+), 55 deletions(-)
> 


I should have caught that in my review.

Reviewed-by: David Ahern

Re: [resend/standalone PATCH v4] Add auxiliary bus support

2020-12-05 Thread David Ahern

On 12/2/20 5:54 PM, Dan Williams wrote:
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 8d7001712062..040be48ce046 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -1,6 +1,9 @@
>  # SPDX-License-Identifier: GPL-2.0
>  menu "Generic Driver Options"
>  
> +config AUXILIARY_BUS
> + bool
> +
>  config UEVENT_HELPER
>   bool "Support for uevent helper"
>   help

Missing a description and without it does not appear in menuconfig or in
the config file.

Could use a blurb in the help as well.

Re: [PATCH net] ipv4: fix error return code in rtm_to_fib_config()

2020-12-04 Thread David Ahern

On 12/4/20 1:48 AM, Zhang Changzhong wrote:
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: d15662682db2 ("ipv4: Allow ipv6 gateway with ipv4 routes")
> Reported-by: Hulk Robot 
> Signed-off-by: Zhang Changzhong 
> ---
>  net/ipv4/fib_frontend.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index b87140a..cdf6ec5 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -825,7 +825,7 @@ static int rtm_to_fib_config(struct net *net, struct 
> sk_buff *skb,
>   if (has_gw && has_via) {
>   NL_SET_ERR_MSG(extack,
>  "Nexthop configuration can not contain both 
> GATEWAY and VIA");
> - goto errout;
> +     return -EINVAL;
>   }
>  
>   return 0;
> 

Thanks for the patch.

Reviewed-by: David Ahern

Re: [PATCH v3 iproute2] bridge: add support for L2 multicast groups

2020-11-29 Thread David Ahern

On 11/25/20 7:36 AM, Vladimir Oltean wrote:
> Extend the 'bridge mdb' command for the following syntax:
> bridge mdb add dev br0 port swp0 grp 01:02:03:04:05:06 permanent
> 
> Signed-off-by: Vladimir Oltean 
> ---
> Changes in v3:
> - Using rt_addr_n2a_r instead of inet_ntop/ll_addr_n2a directly.
> - Updated the bridge manpage.
> 
> Changes in v2:
> - Removed the const void casts.
> - Removed MDB_FLAGS_L2 from the UAPI to be in sync with the latest
>   kernel patch:
>   
> https://patchwork.ozlabs.org/project/netdev/patch/20201028233831.610076-1-vladimir.olt...@nxp.com/
> 
>  bridge/mdb.c   | 54 ++
>  include/uapi/linux/if_bridge.h |  1 +
>  man/man8/bridge.8  |  8 ++---
>  3 files changed, 46 insertions(+), 17 deletions(-)
> 

applied to iproute2-next

Re: [net-next v3 0/8] seg6: add support for SRv6 End.DT4/DT6 behavior

2020-11-25 Thread David Ahern

On 11/25/20 9:47 AM, Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 21:37:18 -0700 David Ahern wrote:
>> On 11/24/20 6:58 PM, Jakub Kicinski wrote:
>>> But it's generally not a huge issue for applying the patch. I just like
>>> to see the build bot result, to make sure we're not adding W=1 C=1
>>> warnings.  
>>
>> ah, the build bot part is new. got it.
> 
> BTW I was wondering for the longest time how to structure things so
> that build bot can also build iproute2 in case we want to run tests
> attached to the series and the tests depend on iproute2 changes...
> 
> But let's cross that bridge when we get there.
> 

Why not cross it now? You handled the switch over to new a patchworks
with a build bot, so we can take advantage of automation.

Seems like the bot needs to detect 'net', 'net-next', 'bpf' and
'bpf-next' as they are all different trees for the kernel patches.
iproute2 is just another tree, so it should be able to put those in a
different bucket for automated builds - even if it means a 'set' crosses
trees.

Re: [net-next v3 0/8] seg6: add support for SRv6 End.DT4/DT6 behavior

2020-11-24 Thread David Ahern

On 11/24/20 6:58 PM, Jakub Kicinski wrote:
> But it's generally not a huge issue for applying the patch. I just like
> to see the build bot result, to make sure we're not adding W=1 C=1
> warnings.

ah, the build bot part is new. got it.

Re: [net-next v3 0/8] seg6: add support for SRv6 End.DT4/DT6 behavior

2020-11-24 Thread David Ahern

On 11/24/20 4:49 PM, Jakub Kicinski wrote:
> 
> LGTM! Please address the nit and repost without the iproute2 patch.
> Mixing the iproute2 patch in has confused patchwork:
> 
> https://patchwork.kernel.org/project/netdevbpf/list/?series=389667=*
> 
> Note how it thinks that the iproute2 patch is part of the kernel
> series. This build bot-y thing is pretty new. I'll add a suggestion 
> to our process documentation not to mix patches.

That was me - I suggested doing that. I have done that in the past as
has several other people. I don't recall DaveM having a problem, so
maybe it is the new patchworks that is not liking it?

> 
>> I would like to thank David Ahern for his support during the development of
>> this patchset.
> 
> Should I take this to mean that David has review the code off-list?
> 

reviews and general guidance.

Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-20 Thread David Ahern

On 11/20/20 10:27 AM, Andreas Roeseler wrote:
> On Thu, 2020-11-19 at 21:01 -0700, David Ahern wrote:
>> On 11/19/20 8:51 PM, David Ahern wrote:
>>> On 11/17/20 5:46 PM, Andreas Roeseler wrote:
>>>> The popular utility ping has several severe limitations such as
>>>> the
>>>> inability to query specific  interfaces on a node and requiring
>>>> bidirectional connectivity between the probing and the probed
>>>> interfaces. RFC8335 attempts to solve these limitations by
>>>> creating the
>>>> new utility PROBE which is a specialized ICMP message that makes
>>>> use of
>>>> the ICMP Extension Structure outlined in RFC4884.
>>>>
>>>> This patchset adds definitions for the ICMP Extended Echo Request
>>>> and
>>>> Reply (PROBE) types for both IPv4 and IPv6. It also expands the
>>>> list of
>>>> supported ICMP messages to accommodate PROBEs.
>>>>
>>>
>>> You are updating the send, but what about the response side?
>>>
>>
>> you also are not setting 'ICMP Extension Structure'. From:
>> https://tools.ietf.org/html/rfc8335
>>
>>    o  ICMP Extension Structure: The ICMP Extension Structure
>> identifies
>>   the probed interface.
>>
>>    Section 7 of [RFC4884] defines the ICMP Extension Structure.  As
>> per
>>    RFC 4884, the Extension Structure contains exactly one Extension
>>    Header followed by one or more objects.  When applied to the ICMP
>>    Extended Echo Request message, the ICMP Extension Structure MUST
>>    contain exactly one instance of the Interface Identification
>> Object
>>    (see Section 2.1).
> 
> I am currently finishing testing and polishing the response side and
> hope to be sendding out v1 of the patch in the upcoming few weeks.

send the response side with the request side -- 1 set of patches for the
entire feature.

> 
> As for the 'ICMP Extension Structure', I have been working with the
> iputils package to add a command to send PROBE messages, and the
> changes included in this patchset are all that are necessary to be able
> to send PROBEs using the existing ping framework.
> 

right.

Re: [PATCH] bpf: Check the return value of dev_get_by_index_rcu()

2020-11-20 Thread David Ahern

On 11/20/20 8:13 AM, Daniel Borkmann wrote:
> [ +David ]
> 
> On 11/19/20 8:04 AM, xiakaixu1...@gmail.com wrote:
>> From: Kaixu Xia 
>>
>> The return value of dev_get_by_index_rcu() can be NULL, so here it
>> is need to check the return value and return error code if it is NULL.
>>
>> Signed-off-by: Kaixu Xia 
>> ---
>>   net/core/filter.c | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 2ca5eecebacf..1263fe07170a 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -5573,6 +5573,8 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *,
>> skb,
>>   struct net_device *dev;
>>     dev = dev_get_by_index_rcu(net, params->ifindex);
>> +    if (unlikely(!dev))
>> +    return -EINVAL;
>>   if (!is_skb_forwardable(dev, skb))
>>   rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;

rcu lock is held right? It is impossible for dev to return NULL here.

> 
> The above logic is quite ugly anyway given we fetched the dev pointer
> already earlier
> in bpf_ipv{4,6}_fib_lookup() and now need to redo it again ... so yeah

evolved from the different needs of the xdp and tc paths.

> there could be
> a tiny race in here. We wanted do bring this logic closer to what XDP
> does anyway,
> something like below, for example. David, thoughts? Thx
> 
> Subject: [PATCH] diff mtu check
> 
> Signed-off-by: Daniel Borkmann 
> ---
>  net/core/filter.c | 22 +-
>  1 file changed, 5 insertions(+), 17 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2ca5eecebacf..3bab0a97fa38 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5547,9 +5547,6 @@ static const struct bpf_func_proto
> bpf_xdp_fib_lookup_proto = {
>  BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *, skb,
>     struct bpf_fib_lookup *, params, int, plen, u32, flags)
>  {
> -    struct net *net = dev_net(skb->dev);
> -    int rc = -EAFNOSUPPORT;
> -
>  if (plen < sizeof(*params))
>  return -EINVAL;
> 
> @@ -5559,25 +5556,16 @@ BPF_CALL_4(bpf_skb_fib_lookup, struct sk_buff *,
> skb,
>  switch (params->family) {
>  #if IS_ENABLED(CONFIG_INET)
>  case AF_INET:
> -    rc = bpf_ipv4_fib_lookup(net, params, flags, false);
> -    break;
> +    return bpf_ipv4_fib_lookup(dev_net(skb->dev), params, flags,
> +   !skb_is_gso(skb));
>  #endif
>  #if IS_ENABLED(CONFIG_IPV6)
>  case AF_INET6:
> -    rc = bpf_ipv6_fib_lookup(net, params, flags, false);
> -    break;
> +    return bpf_ipv6_fib_lookup(dev_net(skb->dev), params, flags,
> +   !skb_is_gso(skb));

seems ok.


>  #endif
>  }
> -
> -    if (!rc) {
> -    struct net_device *dev;
> -
> -    dev = dev_get_by_index_rcu(net, params->ifindex);
> -    if (!is_skb_forwardable(dev, skb))
> -    rc = BPF_FIB_LKUP_RET_FRAG_NEEDED;
> -    }
> -
> -    return rc;
> +    return -EAFNOSUPPORT;
>  }
> 
>  static const struct bpf_func_proto bpf_skb_fib_lookup_proto = {

Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-19 Thread David Ahern

On 11/19/20 8:51 PM, David Ahern wrote:
> On 11/17/20 5:46 PM, Andreas Roeseler wrote:
>> The popular utility ping has several severe limitations such as the
>> inability to query specific  interfaces on a node and requiring
>> bidirectional connectivity between the probing and the probed
>> interfaces. RFC8335 attempts to solve these limitations by creating the
>> new utility PROBE which is a specialized ICMP message that makes use of
>> the ICMP Extension Structure outlined in RFC4884.
>>
>> This patchset adds definitions for the ICMP Extended Echo Request and
>> Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
>> supported ICMP messages to accommodate PROBEs.
>>
> 
> You are updating the send, but what about the response side?
> 

you also are not setting 'ICMP Extension Structure'. From:
https://tools.ietf.org/html/rfc8335

   o  ICMP Extension Structure: The ICMP Extension Structure identifies
  the probed interface.

   Section 7 of [RFC4884] defines the ICMP Extension Structure.  As per
   RFC 4884, the Extension Structure contains exactly one Extension
   Header followed by one or more objects.  When applied to the ICMP
   Extended Echo Request message, the ICMP Extension Structure MUST
   contain exactly one instance of the Interface Identification Object
   (see Section 2.1).

Re: [PATCH v4 net-next 0/3] add support for sending RFC8335 PROBE

2020-11-19 Thread David Ahern

On 11/17/20 5:46 PM, Andreas Roeseler wrote:
> The popular utility ping has several severe limitations such as the
> inability to query specific  interfaces on a node and requiring
> bidirectional connectivity between the probing and the probed
> interfaces. RFC8335 attempts to solve these limitations by creating the
> new utility PROBE which is a specialized ICMP message that makes use of
> the ICMP Extension Structure outlined in RFC4884.
> 
> This patchset adds definitions for the ICMP Extended Echo Request and
> Reply (PROBE) types for both IPv4 and IPv6. It also expands the list of
> supported ICMP messages to accommodate PROBEs.
> 

You are updating the send, but what about the response side?

Re: [PATCH v4 bpf-next 3/5] kbuild: build kernel module BTFs if BTF is enabled and pahole supports it

2020-11-16 Thread David Ahern

On 11/9/20 6:19 PM, Andrii Nakryiko wrote:
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index d7a7bc3b6098..1e78faaf20a5 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -274,6 +274,15 @@ config DEBUG_INFO_BTF
> Turning this on expects presence of pahole tool, which will convert
> DWARF type info into equivalent deduplicated BTF type info.
>  
> +config PAHOLE_HAS_SPLIT_BTF
> + def_bool $(success, test `$(PAHOLE) --version | sed -E 
> 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
> +
> +config DEBUG_INFO_BTF_MODULES
> + def_bool y
> + depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
> + help
> +   Generate compact split BTF type information for kernel modules.
> +
>  config GDB_SCRIPTS
>   bool "Provide GDB scripts for kernel debugging"
>   help

Thank you for adding a config option for this feature vs bumping the
required pahole version in scripts/link-vmlinux.sh. This is a much more
friendly way of handling kernel features that require support from build
tools.

Re: [net-next,v2,4/5] seg6: add support for the SRv6 End.DT4 behavior

2020-11-13 Thread David Ahern

On 11/13/20 7:29 PM, Andrea Mayer wrote:
> Hi Jakub,
> 
> On Fri, 13 Nov 2020 18:01:26 -0800
> Jakub Kicinski  wrote:
> 
>>> UAPI solution 2
>>>
>>> we turn "table" into an optional parameter and we add the "vrftable" 
>>> optional
>>> parameter. DT4 can only be used with the "vrftable" (hence it is a required
>>> parameter for DT4).
>>> DT6 can be used with "vrftable" (new vrf mode) or with "table" (legacy mode)
>>> (hence it is an optional parameter for DT6).
>>>
>>> UAPI solution 2 examples:
>>>
>>> ip -6 route add 2001:db8::1/128 encap seg6local action End.DT4 vrftable 100 
>>> dev eth0
>>> ip -6 route add 2001:db8::1/128 encap seg6local action End.DT6 vrftable 100 
>>> dev eth0
>>> ip -6 route add 2001:db8::1/128 encap seg6local action End.DT6 table 100 
>>> dev eth0
>>>
>>> IMO solution 2 is nicer from UAPI POV because we always have only one 
>>> parameter, maybe solution 1 is slightly easier to implement, all in all 
>>> we prefer solution 2 but we can go for 1 if you prefer.
>>
>> Agreed, 2 looks better to me as well. But let's not conflate uABI with
>> iproute2's command line. I'm more concerned about the kernel ABI.
> 
> Sorry I was a little imprecise here. I reported only the user command 
> perspective.
> From the kernel point of view in solution 2 the vrftable will be a new
> [SEG6_LOCAL_VRFTABLE] optional parameter.
> 
>> BTW you prefer to operate on tables (and therefore require
>> net.vrf.strict_mode=1) because that's closer to the spirit of the RFC,
>> correct? As I said from the implementation perspective passing any VRF
>> ifindex down from user space to the kernel should be fine?
> 
> Yes, I definitely prefer to operate on tables (and so on the table ID) due to
> the spirit of the RFC. We have discussed in depth this design choice with
> David Ahern when implementing the DT4 patch and we are confident that 
> operating
> with VRF strict mode is a sound approach also for DT6. 
> 

I like the vrftable option. Straightforward extension from current table
argument.

Re: [net-next,v2,4/5] seg6: add support for the SRv6 End.DT4 behavior

2020-11-13 Thread David Ahern

On 11/13/20 10:02 AM, Stefano Salsano wrote:
> Il 2020-11-13 17:55, Jakub Kicinski ha scritto:
>> On Thu, 12 Nov 2020 18:49:17 -0700 David Ahern wrote:
>>> On 11/12/20 6:28 PM, Andrea Mayer wrote:
>>>> The implementation of SRv6 End.DT4 differs from the the
>>>> implementation of SRv6
>>>> End.DT6 due to the different *route input* lookup functions. For
>>>> IPv6 is it
>>>> possible to force the routing lookup specifying a routing table
>>>> through the
>>>> ip6_pol_route() function (as it is done in the
>>>> seg6_lookup_any_nexthop()).
>>>
>>> It is unfortunate that the IPv6 variant got in without the VRF piece.
>>
>> Should we make it a requirement for this series to also extend the v6
>> version to support the preferred VRF-based operation? Given VRF is
>> better and we require v4 features to be implemented for v6?
> 
> I think it is better to separate the two aspects... adding a missing
> feature in IPv4 datapath should not depend on improving the quality of
> the implementation of the IPv6 datapath :-)
> 
> I think that Andrea is willing to work on improving the IPv6
> implementation, but this should be considered after this patchset...
> 

agreed. The v6 variant has existed for a while. The v4 version is
independent.

Re: [net-next,v2,4/5] seg6: add support for the SRv6 End.DT4 behavior

2020-11-12 Thread David Ahern

On 11/12/20 6:28 PM, Andrea Mayer wrote:
> The implementation of SRv6 End.DT4 differs from the the implementation of SRv6
> End.DT6 due to the different *route input* lookup functions. For IPv6 is it
> possible to force the routing lookup specifying a routing table through the
> ip6_pol_route() function (as it is done in the seg6_lookup_any_nexthop()).

It is unfortunate that the IPv6 variant got in without the VRF piece.

Re: [PATCH v2 iproute2-next] bridge: add support for L2 multicast groups

2020-11-06 Thread David Ahern

On 10/29/20 4:28 PM, Vladimir Oltean wrote:
> @@ -168,9 +176,14 @@ static void print_mdb_entry(FILE *f, int ifindex, const 
> struct br_mdb_entry *e,
>   print_string(PRINT_ANY, "port", " port %s",
>ll_index_to_name(e->ifindex));
>  
> + if (af == AF_INET || af == AF_INET6)
> + addr = inet_ntop(af, grp, abuf, sizeof(abuf));
> + else
> + addr = ll_addr_n2a(grp, ETH_ALEN, 0, abuf, sizeof(abuf));
> +

The above can be replaced with a single call to rt_addr_n2a_r.

>   print_color_string(PRINT_ANY, ifa_family_color(af),
> - "grp", " grp %s",
> - inet_ntop(af, grp, abuf, sizeof(abuf)));
> + "grp", " grp %s", addr);
> +
>   if (tb && tb[MDBA_MDB_EATTR_SOURCE]) {
>   src = (const void *)RTA_DATA(tb[MDBA_MDB_EATTR_SOURCE]);
>   print_color_string(PRINT_ANY, ifa_family_color(af),

I think the rest is ok.

Re: [net-next,v1,5/5] selftests: add selftest for the SRv6 End.DT4 behavior

2020-11-03 Thread David Ahern

--+
> +# |fc00:12:200::6004|apply SRv6 End.DT4 table 200|
> +# +--+
> +#
> +# rt-2: VRF tenant 100 (table 100)
> +# +---+
> +# |host   |Action |
> +# +---+
> +# |10.0.0.1   |apply seg6 encap segs fc00:21:100::6004|
> +# +-------+
> +# |10.0.0.0/24|forward to dev veth_t100   |
> +# +---+
> +#
> +# rt-2: VRF tenant 200 (table 200)
> +# +---+
> +# |host   |Action |
> +# +---+
> +# |10.0.0.3   |apply seg6 encap segs fc00:21:200::6004|
> +# +---+
> +# |10.0.0.0/24|forward to dev veth_t200   |
> +# +---+
> +#
> +
>

thanks for creating the very well documented test case.

Reviewed-by: David Ahern

Re: [net-next,v1,1/5] vrf: add mac header for tunneled packets when sniffer is attached

2020-11-03 Thread David Ahern

On 11/3/20 5:52 AM, Andrea Mayer wrote:
> Before this patch, a sniffer attached to a VRF used as the receiving
> interface of L3 tunneled packets detects them as malformed packets and
> it complains about that (i.e.: tcpdump shows bogus packets).
> 
> The reason is that a tunneled L3 packet does not carry any L2
> information and when the VRF is set as the receiving interface of a
> decapsulated L3 packet, no mac header is currently set or valid.
> Therefore, the purpose of this patch consists of adding a MAC header to
> any packet which is directly received on the VRF interface ONLY IF:
> 
>  i) a sniffer is attached on the VRF and ii) the mac header is not set.
> 
> In this case, the mac address of the VRF is copied in both the
> destination and the source address of the ethernet header. The protocol
> type is set either to IPv4 or IPv6, depending on which L3 packet is
> received.
> 
> Signed-off-by: Andrea Mayer 
> ---
>  drivers/net/vrf.c | 78 +++
>  1 file changed, 72 insertions(+), 6 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH AUTOSEL 5.9 035/111] ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table (v2)

2020-10-18 Thread David Ahern

On 10/18/20 1:40 PM, Jakub Kicinski wrote:
> This one got applied a few days ago, and the urgency is low so it may be
> worth letting it see at least one -rc release ;)

agreed

Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes

2020-10-12 Thread David Ahern

On 10/12/20 7:10 AM, Mathieu Desnoyers wrote:
> - On Oct 12, 2020, at 9:45 AM, David Ahern dsah...@gmail.com wrote:
> 
>> On 10/12/20 5:57 AM, Mathieu Desnoyers wrote:
>>> OK, do you want to pick up the RFC patch series, or should I re-send it
>>> without RFC tag ?
>>
>> you need to re-send for Dave or Jakub to pick them up via patchworks
> 
> OK. Can I have your Acked-by or Reviewed-by for all three patches ?
> 


sure.
Reviewed-by: David Ahern

Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes

2020-10-12 Thread David Ahern

On 10/12/20 5:57 AM, Mathieu Desnoyers wrote:
> OK, do you want to pick up the RFC patch series, or should I re-send it
> without RFC tag ?

you need to re-send for Dave or Jakub to pick them up via patchworks

Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes

2020-10-11 Thread David Ahern

On 10/5/20 9:30 AM, David Ahern wrote:
> On 9/25/20 1:04 PM, Mathieu Desnoyers wrote:
>> Hi,
>>
>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
>> the route lookup is performed on the right routing table in VRF
>> configurations when sending TTL expired icmp errors (useful for
>> traceroute).
>>
>> It includes tests for both ipv4 and ipv6.
>>
>> These fixes address specifically address the code paths involved in
>> sending TTL expired icmp errors. As detailed in the individual commit
>> messages, those fixes do not address similar icmp errors related to
>> network namespaces and unreachable / fragmentation needed messages,
>> which appear to use different code paths.
>>
>> The main changes since the last round are updates to the selftests.
>>
> 
> This looks fine to me. I noticed the IPv6 large packet test case is
> failing; the fib6 tracepoint is showing the loopback as the iif which is
> wrong:
> 
> ping6  8488 [004]   502.015817: fib6:fib6_table_lookup: table 255 oif 0
> iif 1 proto 58 ::/0 -> 2001:db8:16:1::1/0 tos 0 scope 0 flags 0 ==> dev
> lo gw :: err -113
> 
> I will dig into it later this week.
> 

I see the problem here -- source address selection is picking ::1. I do
not have a solution to the problem yet, but its resolution is
independent of the change in this set so I think this one is good to go.

Re: [RFC PATCH 0/3] l3mdev icmp error route lookup fixes

2020-10-05 Thread David Ahern

On 9/25/20 1:04 PM, Mathieu Desnoyers wrote:
> Hi,
> 
> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
> the route lookup is performed on the right routing table in VRF
> configurations when sending TTL expired icmp errors (useful for
> traceroute).
> 
> It includes tests for both ipv4 and ipv6.
> 
> These fixes address specifically address the code paths involved in
> sending TTL expired icmp errors. As detailed in the individual commit
> messages, those fixes do not address similar icmp errors related to
> network namespaces and unreachable / fragmentation needed messages,
> which appear to use different code paths.
> 
> The main changes since the last round are updates to the selftests.
> 

This looks fine to me. I noticed the IPv6 large packet test case is
failing; the fib6 tracepoint is showing the loopback as the iif which is
wrong:

ping6  8488 [004]   502.015817: fib6:fib6_table_lookup: table 255 oif 0
iif 1 proto 58 ::/0 -> 2001:db8:16:1::1/0 tos 0 scope 0 flags 0 ==> dev
lo gw :: err -113

I will dig into it later this week.

Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-23 Thread David Ahern

On 9/23/20 1:12 PM, Michael Jeanson wrote:
> 
> Just a final clarification, the asymmetric setup would have no return
> route in VRF 2 and only test the TTL case since the others would fail?

correct. add a statement about it representing a customer setup so it is
clear such a config is a 1-off

Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-23 Thread David Ahern

On 9/23/20 11:03 AM, Michael Jeanson wrote:
> On 2020-09-23 12 h 04, Michael Jeanson wrote:
>>> It should work without asymmetric routing; adding the return route to
>>> the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It
>>> should work for TTL as well.
>>>
>>> Adding a second pass on the tests with the return through r2 is fine,
>>> but add a first pass for the more typical case.
>>
>> Hi,
>>
>> Before writing new tests I just want to make sure we are trying to fix
>> the same issue. If I add a return route to the red VRF then we don't
>> need this patchset because whether the ICMP error are routed using the
>> table from the source or destination interface they will reach the
>> source host.
>>
>> The issue for which this patchset was sent only happens when the
>> destination interface's VRF doesn't have a route back to the source
>> host. I guess we might question if this is actually a bug or not.
>>
>> So the question really is, when a packet is forwarded between VRFs
>> through route leaking and an icmp error is generated, which table
>> should be used for the route lookup? And does it depend on the type of
>> icmp error? (e.g. TTL=1 happens before forwarding, but fragmentation
>> needed happens after when on the destination interface)
> 
> As a side note, I don't mind reworking the tests as you requested even
> if the patchset as a whole ends up not being needed and if you think
> they are still useful. I just wanted to make sure we understood each other.
> 

if you are leaking from VRF 1 to VRF 2 and you do not configure VRF 2
with how to send to errors back to source - MTU or TTL - then I will
argue that is a configuration problem, not a bug.

Now the TTL problem is interesting. You need the FIB lookup to know that
the packet is forwarded, and by the time of the ttl check in ip_forward
skb->dev points to the ingress VRF and dst points to the egress VRF. So
I think the change is warranted.

Let's do this for the tests:
1 pass through all of the tests (TTL and MTU, v4 and v6) with symmetric
routing configured and make sure they all pass. ie., keep all of them
and make sure all tests pass. No sense losing the tests and the thoughts
behind them.

Add a second pass with the asymmetric routing per the customer setup
since it motivated the investigation.

Rename the test to something like vrf_route_leaking.sh. It can be
expanded with more tests related to route leaking as they come up.

Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-22 Thread David Ahern

On 9/22/20 7:52 AM, Michael Jeanson wrote:
>>>
>>> the test setup is bad. You have r1 dropping the MTU in VRF red, but not
>>> telling VRF red how to send back the ICMP. e.g., for IPv4 add:
>>>
>>>ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue
>>>
>>> do the same for v6.
>>>
>>> Also, I do not see a reason for r2; I suggest dropping it. What you are
>>> testing is icmp crossing VRF with route leaking, so there should not be
>>> a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and
>>> the return via r2).
> 
> The objective of the test was to replicate a clients environment where
> packets are crossing from a VRF which has a route back to the source to
> one which doesn't while reaching a ttl of 0. If the route lookup for the
> icmp error is done on the interface in the first VRF, it can be routed to
> the source but not on the interface in the second VRF which is the
> current behaviour for icmp errors generated while crossing between VRFs.
> 
> There may be a better test case that doesn't involve asymmetric routing
> to test this but it's the only way I found to replicate this.
> 

It should work without asymmetric routing; adding the return route to
the second vrf as I mentioned above fixes the FRAG_NEEDED problem. It
should work for TTL as well.

Adding a second pass on the tests with the return through r2 is fine,
but add a first pass for the more typical case.

Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-21 Thread David Ahern

On 9/21/20 12:44 PM, Mathieu Desnoyers wrote:
> - On Sep 21, 2020, at 2:36 PM, David Ahern dsah...@gmail.com wrote:
> 
>> On 9/18/20 12:17 PM, Mathieu Desnoyers wrote:
>>> Hi,
>>>
>>> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
>>> the route lookup is performed on the right routing table in VRF
>>> configurations when sending TTL expired icmp errors (useful for
>>> traceroute).
>>>
>>> It includes tests for both ipv4 and ipv6.
>>>
>>> These fixes address specifically address the code paths involved in
>>> sending TTL expired icmp errors. As detailed in the individual commit
>>> messages, those fixes do not address similar issues related to network
>>> namespaces and unreachable / fragmentation needed messages, which appear
>>> to use different code paths.
>>>
>>
>> New selftests are failing:
>> TEST: Ping received ICMP frag needed   [FAIL]
>>
>> Both IPv4 and IPv6 versions are failing.
> 
> Indeed, this situation is discussed in each patch commit message:
> 
> ipv4:
> 
> [ It has also been pointed out that a similar issue exists with
>   unreachable / fragmentation needed messages, which can be triggered by
>   changing the MTU of eth1 in r1 to 1400 and running:
> 
>   ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2
> 
>   Some investigation points to raw_icmp_error() and raw_err() as being
>   involved in this last scenario. The focus of this patch is TTL expired
>   ICMP messages, which go through icmp_route_lookup.
>   Investigation of failure modes related to raw_icmp_error() is beyond
>   this investigation's scope. ]
> 
> ipv6:
> 
> [ Testing shows that similar issues exist with ipv6 unreachable /
>   fragmentation needed messages.  However, investigation of this
>   additional failure mode is beyond this investigation's scope. ]
> 
> I do not have the time to investigate further unfortunately, so I
> thought it best to post what I have.
> 

the test setup is bad. You have r1 dropping the MTU in VRF red, but not
telling VRF red how to send back the ICMP. e.g., for IPv4 add:

ip -netns r1 ro add vrf red 172.16.1.0/24 dev blue

do the same for v6.

Also, I do not see a reason for r2; I suggest dropping it. What you are
testing is icmp crossing VRF with route leaking, so there should not be
a need for r2 which leads to asymmetrical routing (172.16.1.0 via r1 and
the return via r2).

Re: [RFC PATCH v2 0/3] l3mdev icmp error route lookup fixes

2020-09-21 Thread David Ahern

On 9/18/20 12:17 PM, Mathieu Desnoyers wrote:
> Hi,
> 
> Here is an updated series of fixes for ipv4 and ipv6 which which ensure
> the route lookup is performed on the right routing table in VRF
> configurations when sending TTL expired icmp errors (useful for
> traceroute).
> 
> It includes tests for both ipv4 and ipv6.
> 
> These fixes address specifically address the code paths involved in
> sending TTL expired icmp errors. As detailed in the individual commit
> messages, those fixes do not address similar issues related to network
> namespaces and unreachable / fragmentation needed messages, which appear
> to use different code paths.
> 

New selftests are failing:
TEST: Ping received ICMP frag needed   [FAIL]

Both IPv4 and IPv6 versions are failing.

Re: [PATCH] ipv6: remove redundant assignment to variable err

2020-09-11 Thread David Ahern

On 9/11/20 4:35 AM, Colin King wrote:
> From: Colin Ian King 
> 
> The variable err is being initialized with a value that is never read and
> it is being updated later with a new value. The initialization is redundant
> and can be removed.  Also re-order variable declarations in reverse
> Christmas tree ordering.
> 
> Addresses-Coverity: ("Unused value")
> Signed-off-by: Colin Ian King 
> ---
>  net/ipv6/route.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 

Reviewed-by: David Ahern

Re: [PATCH] net: ipv6: fix __rt6_purge_dflt_routers when forwarding is not set on all ifaces

2020-09-01 Thread David Ahern

On 9/1/20 9:50 AM, Brian Vazquez wrote:
> Hey David,
> 
> On Tue, Sep 1, 2020 at 7:57 AM David Ahern  wrote:
>>
>> On 9/1/20 1:56 AM, Eric Dumazet wrote:
>>> On Tue, Sep 1, 2020 at 8:58 AM Brian Vazquez  wrote:
>>>>
>>>> The problem is exposed when the system has multiple ifaces and
>>>> forwarding is enabled on a subset of them, __rt6_purge_dflt_routers will
>>>> clean the default route on all the ifaces which is not desired.
>>>>
>>>> This patches fixes that by cleaning only the routes where the iface has
>>>> forwarding enabled.
>>>>
>>>> Fixes: z ("net: ipv6: Fix processing of RAs in presence of VRF")
>>
>> are you sure that is a Fixes tag for this problem? looking at that
>> change it only handles RA for tables beyond the main table; it does not
>> change the logic of how many or which routes are purged.
> 
> That commit also added RT6_TABLE_HAS_DFLT_ROUTER so I thought that was
> the commit needed to be mentioned. But probably it shouldn't?

nah. That flag was added as an optimization. The patch referenced
earlier changed the code from looking at one table to looking at all of
them. The flag indicates which table have an RA based default route to
avoid unnecessary walks.

You could probably change it to a counter to handle the case of multiple
default route entries.

> Also Am I missing something or this is only called on on the sysctl path?

It is only called when accept_ra sysctl is enabled as I recall. That
setting requires forwarding to be disabled or overridden. See
Documentation/networking/ip-sysctl.rst.

It should be fairly easy to create a selftest using radvd and network
namespaces.

Re: [PATCH] net: ipv6: fix __rt6_purge_dflt_routers when forwarding is not set on all ifaces

2020-09-01 Thread David Ahern

On 9/1/20 1:56 AM, Eric Dumazet wrote:
> On Tue, Sep 1, 2020 at 8:58 AM Brian Vazquez  wrote:
>>
>> The problem is exposed when the system has multiple ifaces and
>> forwarding is enabled on a subset of them, __rt6_purge_dflt_routers will
>> clean the default route on all the ifaces which is not desired.
>>
>> This patches fixes that by cleaning only the routes where the iface has
>> forwarding enabled.
>>
>> Fixes: 830218c1add1 ("net: ipv6: Fix processing of RAs in presence of VRF")

are you sure that is a Fixes tag for this problem? looking at that
change it only handles RA for tables beyond the main table; it does not
change the logic of how many or which routes are purged.

Re: [PATCH v2] net: ipv4: remove unused arg exact_dif in compute_score

2020-08-31 Thread David Ahern

On 8/31/20 12:26 AM, Miaohe Lin wrote:
> The arg exact_dif is not used anymore, remove it. inet_exact_dif_match()
> is no longer needed after the above is removed, so remove it too.
> 
> Signed-off-by: Miaohe Lin 
> ---
>  include/net/tcp.h  | 10 --
>  net/ipv4/inet_hashtables.c |  6 ++
>  2 files changed, 2 insertions(+), 14 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH v2] net: ipv6: remove unused arg exact_dif in compute_score

2020-08-31 Thread David Ahern

On 8/31/20 12:26 AM, Miaohe Lin wrote:
> The arg exact_dif is not used anymore, remove it. inet6_exact_dif_match()
> is no longer needed after the above is removed, remove it too.
> 
> Signed-off-by: Miaohe Lin 
> ---
>  include/linux/ipv6.h| 11 ---
>  net/ipv6/inet6_hashtables.c |  6 ++
>  2 files changed, 2 insertions(+), 15 deletions(-)
> 

Reviewed-by: David Ahern

Re: [PATCH] net: ipv6: remove unused arg exact_dif in compute_score

2020-08-29 Thread David Ahern

On 8/29/20 3:04 AM, Miaohe Lin wrote:
> @@ -138,15 +138,13 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
>   const __be16 sport, const struct in6_addr *daddr,
>   const unsigned short hnum, const int dif, const int sdif)
>  {
> - bool exact_dif = inet6_exact_dif_match(net, skb);

inet6_exact_dif_match is no longer needed after the above is removed.

Re: [PATCH] net: ipv4: remove unused arg exact_dif in compute_score

2020-08-29 Thread David Ahern

On 8/29/20 3:01 AM, Miaohe Lin wrote:
> @@ -277,15 +277,13 @@ static struct sock *inet_lhash2_lookup(struct net *net,
>   const __be32 daddr, const unsigned short hnum,
>   const int dif, const int sdif)
>  {
> - bool exact_dif = inet_exact_dif_match(net, skb);

inet_exact_dif_match is no longer needed after the above is removed.

Re: [net-next v5 1/2] seg6: inherit DSCP of inner IPv4 packets

2020-08-26 Thread David Ahern

On 8/26/20 6:12 AM, Ahmed Abdelsalam wrote:
> 
> On 26/08/2020 02:45, David Ahern wrote:
>> On 8/25/20 5:45 PM, Ahmed Abdelsalam wrote:
>>>
>>> Hi David
>>>
>>> The seg6 encap is implemented through the seg6_lwt rather than
>>> seg6_local_lwt.
>>
>> ok. I don't know the seg6 code; just taking a guess from a quick look.
>>
>>> We can add a flag(SEG6_IPTUNNEL_DSCP) in seg6_iptunnel.h if we do not
>>> want to go the sysctl direction.
>>
>> sysctl is just a big hammer with side effects.
>>
>> It struck me that the DSCP propagation is very similar to the TTL
>> propagation with MPLS which is per route entry (MPLS_IPTUNNEL_TTL and
>> stored as ttl_propagate in mpls_iptunnel_encap). Hence the question of
>> whether SR could make this a per route attribute. Consistency across
>> implementations is best.
>> SRv6 does not have an issue of having this per route.
> Actually, as SRv6 leverage IPv6 encapsulation, I would say it should
> consistent with ip6_tunnel not MPLS.
> 
> In ip6_tunnel, both ttl and flowinfo (tclass and flowlabel) are provided.
> 
> Ideally, SRv6 code should have done the same with:
> TTL   := VLAUE | DEFAULT | inherit.
> TCLASS    := 0x00 .. 0xFF | inherit
> FLOWLABEL := { 0x0 .. 0xf | inherit | compute.
> 

New attributes get added all the time. Why does something like this now
work for these features:

diff --git a/include/uapi/linux/seg6_iptunnel.h
b/include/uapi/linux/seg6_iptunnel.h
index eb815e0d0ac3..b628333ba100 100644
--- a/include/uapi/linux/seg6_iptunnel.h
+++ b/include/uapi/linux/seg6_iptunnel.h
@@ -20,6 +20,8 @@
 enum {
SEG6_IPTUNNEL_UNSPEC,
SEG6_IPTUNNEL_SRH,
+   SEG6_IPTUNNEL_TTL,  /* u8 */
+   SEG6_IPTUNNEL_TCLASS,   /* u8 */
__SEG6_IPTUNNEL_MAX,
 };
 #define SEG6_IPTUNNEL_MAX (__SEG6_IPTUNNEL_MAX - 1)
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 897fa59c47de..7cb512b65bc3 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -46,6 +46,11 @@ static size_t seg6_lwt_headroom(struct
seg6_iptunnel_encap *tuninfo)

 struct seg6_lwt {
struct dst_cache cache;
+   u8  ttl_propagate;  /* propagate ttl from inner header */
+   u8  default_ttl;/* ttl value to use */
+   u8  tclass_inherit; /* inherit tclass from inner header */
+   u8  tclass; /* tclass value to use */
+
struct seg6_iptunnel_encap tuninfo[];
 };

@@ -61,7 +66,10 @@ seg6_encap_lwtunnel(struct lwtunnel_state *lwt)
 }

 static const struct nla_policy seg6_iptunnel_policy[SEG6_IPTUNNEL_MAX +
1] = {
-   [SEG6_IPTUNNEL_SRH] = { .type = NLA_BINARY },
+   [SEG6_IPTUNNEL_UNSPEC]  = { .strict_start_type =
SEG6_IPTUNNEL_SRH + 1 },
+   [SEG6_IPTUNNEL_SRH] = { .type = NLA_BINARY },
+   [SEG6_IPTUNNEL_TTL] = { .type = NLA_U8 },
+   [SEG6_IPTUNNEL_TCLASS]  = { .type = NLA_U8 },
 };

 static int nla_put_srh(struct sk_buff *skb, int attrtype,
@@ -460,6 +468,22 @@ static int seg6_build_state(struct net *net, struct
nlattr *nla,

memcpy(>tuninfo, tuninfo, tuninfo_len);

+   if (tb[SEG6_IPTUNNEL_TTL]) {
+   slwt->default_ttl = nla_get_u8(tb[SEG6_IPTUNNEL_TTL]);
+   slwt->ttl_propagate = slwt->default_ttl ? 0 : 1;
+   }
+   if (tb[SEG6_IPTUNNEL_TCLASS]) {
+   u32 tmp = nla_get_u32(tb[SEG6_IPTUNNEL_TCLASS]);
+
+   if (tmp == (u32)-1) {
+   slwt->tclass_inherit = true;
+   } else if (tmp & ) {
+   error
+   } else {
+   slwt->tclass = ...
+   }
+   }
+
newts->type = LWTUNNEL_ENCAP_SEG6;
newts->flags |= LWTUNNEL_STATE_INPUT_REDIRECT;


And the use the values in slwt as needed.

Re: [net-next v5 1/2] seg6: inherit DSCP of inner IPv4 packets

2020-08-25 Thread David Ahern

On 8/25/20 5:45 PM, Ahmed Abdelsalam wrote:
> 
> Hi David
> 
> The seg6 encap is implemented through the seg6_lwt rather than
> seg6_local_lwt.

ok. I don't know the seg6 code; just taking a guess from a quick look.

> We can add a flag(SEG6_IPTUNNEL_DSCP) in seg6_iptunnel.h if we do not
> want to go the sysctl direction.

sysctl is just a big hammer with side effects.

It struck me that the DSCP propagation is very similar to the TTL
propagation with MPLS which is per route entry (MPLS_IPTUNNEL_TTL and
stored as ttl_propagate in mpls_iptunnel_encap). Hence the question of
whether SR could make this a per route attribute. Consistency across
implementations is best.

> Perhaps this would require various changes to seg6 infrastructure
> including seg6_iptunnel_policy, seg6_build_state, fill_encap,
> get_encap_size, etc.
> 
> We have proposed a patch before to support optional parameters for SRv6
> behaviors [1].
> Unfortunately, this patch was rejected.
> 

not sure I follow why the patch was rejected. Does it change behavior of
existing code?

I would expect that new attributes can be added without affecting
handling of current ones. Looking at seg6_iptunnel.c the new attribute
would be ignored on older kernels but should be fine on new ones and
forward.

###

Since seg6 does not have strict attribute checking the only way to find
out if it is supported is to send down the config and then read it back.
If the attribute is missing, the kernel does not support. Ugly, but one
way to determine support. The next time an attribute is added to seg6
code, strict checking should be enabled so that going forward as new
attributes are added older kernels with strict checking would reject it.

Re: [net-next v5 1/2] seg6: inherit DSCP of inner IPv4 packets

2020-08-25 Thread David Ahern

On 8/25/20 10:02 AM, Ahmed Abdelsalam wrote:
> This patch allows SRv6 encapsulation to inherit the DSCP value of
> the inner IPv4 packet.
> 
> This allows forwarding packet across the SRv6 fabric based on their
> original traffic class.
> 
> The option is controlled through a sysctl (seg6_inherit_inner_ipv4_dscp).
> The sysctl has to be set to 1 to enable this feature.
> 

rather than adding another sysctl, can this be done as a SEG6_LOCAL
attribute and managed via seg6_local_lwt?

Re: general protection fault in fib_dump_info (2)

2020-08-21 Thread David Ahern

On 8/21/20 10:00 AM, Nikolay Aleksandrov wrote:
> 
> This seems like a much older bug to me, the code allows to pass 0 groups
> and
> thus we end up without any nh_grp_entry pointers. I reproduced it with a
> modified iproute2 that sends an empty NHA_GROUP and then just uses the new
> nexthop in any way (e.g. add a route with it). This is the same bug as the
> earlier report for: "general protection fault in fib_check_nexthop"

hmmm empty NHA_GROUP should not be allowed.

> 
> I have a patch but I'll be able to send it tomorrow.
> 

thanks.

[PATCH v2] perf sched timehist: Fix use of CPU list with summary option

2020-08-17 Thread David Ahern

Do not update thread stats or show idle summary unless CPU is in
the list of interest.

Fixes: c30d630d1bcf ("perf sched timehist: Add support for filtering on CPU")
Signed-off-by: David Ahern 
---
v2
- check that cpu_list is set before checking cpu_bitmap in 
timehist_print_summary
 
 tools/perf/builtin-sched.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0c7d599fa555..e6fc297cee91 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2584,7 +2584,8 @@ static int timehist_sched_change_event(struct perf_tool 
*tool,
}
 
if (!sched->idle_hist || thread->tid == 0) {
-   timehist_update_runtime_stats(tr, t, tprev);
+   if (!cpu_list || test_bit(sample->cpu, cpu_bitmap))
+   timehist_update_runtime_stats(tr, t, tprev);
 
if (sched->idle_hist) {
struct idle_thread_runtime *itr = (void *)tr;
@@ -2857,6 +2858,9 @@ static void timehist_print_summary(struct perf_sched 
*sched,
 
printf("\nIdle stats:\n");
for (i = 0; i < idle_max_cpu; ++i) {
+   if (cpu_list && !test_bit(i, cpu_bitmap))
+   continue;
+
t = idle_threads[i];
if (!t)
continue;
-- 
2.17.1

Re: [PATCH 3/3] ipv6/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-13 Thread David Ahern

On 8/11/20 1:50 PM, Mathieu Desnoyers wrote:
> As per RFC4443, the destination address field for ICMPv6 error messages
> is copied from the source address field of the invoking packet.
> 
> In configurations with Virtual Routing and Forwarding tables, looking up
> which routing table to use for sending ICMPv6 error messages is
> currently done by using the destination net_device.
> 
> If the source and destination interfaces are within separate VRFs, or
> one in the global routing table and the other in a VRF, looking up the
> source address of the invoking packet in the destination interface's
> routing table will fail if the destination interface's routing table
> contains no route to the invoking packet's source address.
> 
> One observable effect of this issue is that traceroute6 does not work in
> the following cases:
> 
> - Route leaking between global routing table and VRF
> - Route leaking between VRFs
> 
> Preferably use the source device routing table when sending ICMPv6 error
> messages. If no source device is set, fall-back on the destination
> device routing table.
> 
> Link: https://tools.ietf.org/html/rfc4443
> Signed-off-by: Mathieu Desnoyers 
> Cc: David Ahern 
> Cc: David S. Miller 
> Cc: net...@vger.kernel.org
> ---
>  net/ipv6/icmp.c   | 15 +--
>  net/ipv6/ip6_output.c |  2 --
>  2 files changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
> index a4e4912ad607..a971b58b0371 100644
> --- a/net/ipv6/icmp.c
> +++ b/net/ipv6/icmp.c
> @@ -501,8 +501,19 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, 
> __u32 info,
>   if (__ipv6_addr_needs_scope_id(addr_type)) {
>   iif = icmp6_iif(skb);
>   } else {
> - dst = skb_dst(skb);
> - iif = l3mdev_master_ifindex(dst ? dst->dev : skb->dev);
> + struct net_device *route_lookup_dev = NULL;
> +
> + /*
> +  * The device used for looking up which routing table to use is
> +  * preferably the source whenever it is set, which should
> +  * ensure the icmp error can be sent to the source host, else
> +  * fallback on the destination device.
> +  */
> + if (skb->dev)
> + route_lookup_dev = skb->dev;

top of icmp6_send there is a check that skb->dev is set.


> + else if (skb_dst(skb))
> + route_lookup_dev = skb_dst(skb)->dev;
> + iif = l3mdev_master_ifindex(route_lookup_dev);
>   }
>  
>   /*
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index c78e67d7747f..cd623068de53 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -468,8 +468,6 @@ int ip6_forward(struct sk_buff *skb)
>*  check and decrement ttl
>*/
>   if (hdr->hop_limit <= 1) {
> - /* Force OUTPUT device used as source address */
> - skb->dev = dst->dev;

I *think* this ok. Not clear to me why the forward path would change the
skb->dev like that. Goes back to beginning of the git history.

>   icmpv6_send(skb, ICMPV6_TIME_EXCEED, ICMPV6_EXC_HOPLIMIT, 0);
>   __IP6_INC_STATS(net, idev, IPSTATS_MIB_INHDRERRORS);
>  
>

Re: [PATCH 1/3] selftests: Add VRF icmp error route lookup test

2020-08-13 Thread David Ahern

On 8/11/20 1:50 PM, Mathieu Desnoyers wrote:
> +run_cmd()
> +{
> + local cmd="$*"
> + local out
> + local rc
> +
> + if [ "$VERBOSE" = "1" ]; then
> + echo "COMMAND: $cmd"
> + fi
> +
> + out=$(eval $cmd 2>&1)
> + rc=$?
> + if [ "$VERBOSE" = "1" ] && [ -n "$out" ]; then
> + echo "$out"
> + fi
> +
> + [ "$VERBOSE" = "1" ] && echo
> +
> + return $rc
> +}
> +

...

> +ipv6_ping()
> +{
> + log_section "IPv6: VRF ICMP error route lookup ping"
> +
> + setup
> +
> + # verify connectivity
> + if ! check_connectivity6; then
> + echo "Error: Basic connectivity is broken"
> + ret=1
> + return
> + fi
> +
> + if [ "$VERBOSE" = "1" ]; then
> + echo "Command to check for ICMP ttl exceeded:"
> + run_cmd ip netns exec h1 "${ping6}" -t1 -c1 -W2 ${H2_N2_IP6}
> + fi
> +
> + ip netns exec h1 "${ping6}" -t1 -c1 -W2 ${H2_N2_IP6} | grep -q "Time 
> exceeded: Hop limit"

run_cmd runs the command and if VERBOSE is set to 1 shows the command to
the user. Something is off with this script and passing the -v arg -- I
do not get a command list. This applies to the whole script.

Since you need to check for output, I suggest modifying run_cmd to
search the output for the given string.


> + log_test $? 0 "Ping received ICMP ttl exceeded"
> +}
> +

missing newline between '}' and ''

> +# usage
> +
> +usage()
> +{
> +cat < +usage: ${0##*/} OPTS
> +
> + -4  IPv4 tests only
> + -6  IPv6 tests only
> + -p  Pause on fail
> + -v  verbose mode (show commands and output)
> +EOF
> +}
> +
> +
> +# main
> +
> +# Some systems don't have a ping6 binary anymore
> +command -v ping6 > /dev/null 2>&1 && ping6=$(command -v ping6) || 
> ping6=$(command -v ping)
> +
> +TESTS_IPV4="ipv4_ping ipv4_traceroute"
> +TESTS_IPV6="ipv6_ping ipv6_traceroute"
> +
> +ret=0
> +nsuccess=0
> +nfail=0
> +setup=0
> +
> +while getopts :46pvh o
> +do
> + case $o in
> + 4) TESTS=ipv4;;
> + 6) TESTS=ipv6;;
> +p) PAUSE_ON_FAIL=yes;;
> +v) VERBOSE=1;;
> + h) usage; exit 0;;
> +*) usage; exit 1;;

indentation issues; not using tabs

> + esac
> +done
> +
> +#
> +# show user test config
> +#
> +if [ -z "$TESTS" ]; then
> +TESTS="$TESTS_IPV4 $TESTS_IPV6"
> +elif [ "$TESTS" = "ipv4" ]; then
> +TESTS="$TESTS_IPV4"
> +elif [ "$TESTS" = "ipv6" ]; then
> +TESTS="$TESTS_IPV6"
> +fi
> +
> +for t in $TESTS
> +do
> + case $t in
> + ipv4_ping|ping) ipv4_ping;;
> + ipv4_traceroute|traceroute) ipv4_traceroute;;
> +
> + ipv6_ping|ping) ipv6_ping;;
> + ipv6_traceroute|traceroute) ipv6_traceroute;;
> +
> + # setup namespaces and config, but do not run any tests
> + setup)  setup; exit 0;;

you don't allow '-t setup' so you can remove this part

> +
> + help)   echo "Test names: $TESTS"; exit 0;;
> + esac
> +done
> +
> +cleanup
> +
> +printf "\nTests passed: %3d\n" ${nsuccess}
> +printf "Tests failed: %3d\n"   ${nfail}
> +
> +exit $ret
>

Re: [PATCH 2/3] ipv4/icmp: l3mdev: Perform icmp error route lookup on source device routing table

2020-08-13 Thread David Ahern

On 8/11/20 1:50 PM, Mathieu Desnoyers wrote:
> As per RFC792, ICMP errors should be sent to the source host.
> 
> However, in configurations with Virtual Routing and Forwarding tables,
> looking up which routing table to use is currently done by using the
> destination net_device.
> 
> commit 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to
> determine L3 domain") changes the interface passed to
> l3mdev_master_ifindex() and inet_addr_type_dev_table() from skb_in->dev
> to skb_dst(skb_in)->dev. This effectively uses the destination device
> rather than the source device for choosing which routing table should be
> used to lookup where to send the ICMP error.
> 
> Therefore, if the source and destination interfaces are within separate
> VRFs, or one in the global routing table and the other in a VRF, looking
> up the source host in the destination interface's routing table will
> fail if the destination interface's routing table contains no route to
> the source host.
> 
> One observable effect of this issue is that traceroute does not work in
> the following cases:
> 
> - Route leaking between global routing table and VRF
> - Route leaking between VRFs
> 
> Preferably use the source device routing table when sending ICMP error
> messages. If no source device is set, fall-back on the destination
> device routing table.
> 
> Fixes: 9d1a6c4ea43e ("net: icmp_route_lookup should use rt dev to determine 
> L3 domain")
> Link: https://tools.ietf.org/html/rfc792
> Signed-off-by: Mathieu Desnoyers 
> Cc: David Ahern 
> Cc: David S. Miller 
> Cc: net...@vger.kernel.org
> ---
>  net/ipv4/icmp.c | 15 +--
>  1 file changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
> index cf36f955bfe6..1eb83d82ec68 100644
> --- a/net/ipv4/icmp.c
> +++ b/net/ipv4/icmp.c
> @@ -465,6 +465,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>   int type, int code,
>   struct icmp_bxm *param)
>  {
> + struct net_device *route_lookup_dev = NULL;
>   struct rtable *rt, *rt2;
>   struct flowi4 fl4_dec;
>   int err;
> @@ -479,7 +480,17 @@ static struct rtable *icmp_route_lookup(struct net *net,
>   fl4->flowi4_proto = IPPROTO_ICMP;
>   fl4->fl4_icmp_type = type;
>   fl4->fl4_icmp_code = code;
> - fl4->flowi4_oif = l3mdev_master_ifindex(skb_dst(skb_in)->dev);
> + /*
> +  * The device used for looking up which routing table to use is
> +  * preferably the source whenever it is set, which should ensure
> +  * the icmp error can be sent to the source host, else fallback
> +  * on the destination device.
> +  */
> + if (skb_in->dev)
> + route_lookup_dev = skb_in->dev;
> + else if (skb_dst(skb_in))
> + route_lookup_dev = skb_dst(skb_in)->dev;
> + fl4->flowi4_oif = l3mdev_master_ifindex(route_lookup_dev);
>  
>   security_skb_classify_flow(skb_in, flowi4_to_flowi(fl4));
>   rt = ip_route_output_key_hash(net, fl4, skb_in);
> @@ -503,7 +514,7 @@ static struct rtable *icmp_route_lookup(struct net *net,
>   if (err)
>   goto relookup_failed;
>  
> - if (inet_addr_type_dev_table(net, skb_dst(skb_in)->dev,
> + if (inet_addr_type_dev_table(net, route_lookup_dev,
>fl4_dec.saddr) == RTN_LOCAL) {
>   rt2 = __ip_route_output_key(net, _dec);
>   if (IS_ERR(rt2))
> 

ICMP's can be generated in many locations:
1. forward path - I think the skb_in dev is always set,

2. ingress and upper layer protocols -  dev is dropped prior to
transport layers, so, for example, UDP sending port unreachable calls
icmp_send with skb_in->dev set to NULL.

3. local packets and egress - e.g., link failures and here I believe skb
dev is set.

If in and out are in the same L3 domain, either device works where for
VRF route leaking with the forward path in and out are in separate
domains so yes you want the ingress device.

This change seems fine to me and I have not seen any issues with
existing selftests.

Reviewed-by: David Ahern 


But I did notice that unreachable / fragmentation needed messages are
NOT working with this change. You can see that by changing the MTU of
eth1 in r1 to 1400 and running:
   ip netns exec h1 ping -s 1450 -Mdo -c1 172.16.2.2

You really should get that working as well with VRF route leaking.

Re: [RFC PATCH 0/5] Introduce /proc/all/ to gather stats from all processes

2020-08-12 Thread David Ahern

On 8/12/20 1:51 AM, Andrei Vagin wrote:
> 
> I rebased the task_diag patches on top of v5.8:
> https://github.com/avagin/linux-task-diag/tree/v5.8-task-diag

Thanks for updating the patches.

> 
> /proc/pid files have three major limitations:
> * Requires at least three syscalls per process per file
>   open(), read(), close()
> * Variety of formats, mostly text based
>   The kernel spent time to encode binary data into a text format and
>   then tools like top and ps spent time to decode them back to a binary
>   format.
> * Sometimes slow due to extra attributes
>   For example, /proc/PID/smaps contains a lot of useful informations
>   about memory mappings and memory consumption for each of them. But
>   even if we don't need memory consumption fields, the kernel will
>   spend time to collect this information.

that's what I recall as well.

> 
> More details and numbers are in this article:
> https://avagin.github.io/how-fast-is-procfs
> 
> This new interface doesn't have only one of these limitations, but
> task_diag doesn't have all of them.
> 
> And I compared how fast each of these interfaces:
> 
> The test environment:
> CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
> RAM: 16GB
> kernel: v5.8 with task_diag and /proc/all patches.
> 100K processes:
> $ ps ax | wc -l
> 10228

100k processes but showing 10k here??

> 
> $ time cat /proc/all/status > /dev/null
> 
> real  0m0.577s
> user  0m0.017s
> sys   0m0.559s
> 
> task_proc_all is used to read /proc/pid/status for all tasks:
> https://github.com/avagin/linux-task-diag/blob/master/tools/testing/selftests/task_diag/task_proc_all.c
> 
> $ time ./task_proc_all status
> tasks: 100230
> 
> real  0m0.924s
> user  0m0.054s
> sys   0m0.858s
> 
> 
> /proc/all/status is about 40% faster than /proc/*/status.
> 
> Now let's take a look at the perf output:
> 
> $ time perf record -g cat /proc/all/status > /dev/null
> $ perf report
> -   98.08% 1.38%  cat  [kernel.vmlinux]  [k] entry_SYSCALL_64
>- 96.70% entry_SYSCALL_64
>   - do_syscall_64
>  - 94.97% ksys_read
> - 94.80% vfs_read
>- 94.58% proc_reg_read
>   - seq_read
>  - 87.95% proc_pid_status
> + 13.10% seq_put_decimal_ull_width
> - 11.69% task_mem
>+ 9.48% seq_put_decimal_ull_width
> + 10.63% seq_printf
> - 10.35% cpuset_task_status_allowed
>+ seq_printf
> - 9.84% render_sigset_t
>  1.61% seq_putc
>+ 1.61% seq_puts
> + 4.99% proc_task_name
> + 4.11% seq_puts
> - 3.76% render_cap_t
>  2.38% seq_put_hex_ll
>+ 1.25% seq_puts
>   2.64% __task_pid_nr_ns
> + 1.54% get_task_mm
> + 1.34% __lock_task_sighand
> + 0.70% from_kuid_munged
>   0.61% get_task_cred
>   0.56% seq_putc
>   0.52% hugetlb_report_usage
>   0.52% from_kgid_munged
>  + 4.30% proc_all_next
>  + 0.82% _copy_to_user 
> 
> We can see that the kernel spent more than 50% of the time to encode binary
> data into a text format.
> 
> Now let's see how fast task_diag:
> 
> $ time ./task_diag_all all -c -q
> 
> real  0m0.087s
> user  0m0.001s
> sys   0m0.082s
> 
> Maybe we need resurrect the task_diag series instead of inventing
> another less-effective interface...

I think the netlink message design is the better way to go. As system
sizes continue to increase (> 100 cpus is common now) you need to be
able to pass the right data to userspace as fast as possible to keep up
with what can be a very dynamic userspace and set of processes.

When you first proposed this idea I was working on systems with >= 1k
cpus and the netlink option was able to keep up with a 'make -j N' on
those systems. `perf record` walking /proc would never finish
initializing - I had to add a "done initializing" message to know when
to start a test. With the task_diag approach, perf could collect the
data in short order and move on to recording data.

Re: [PATCH 2/2] perf sched timehist: Fix use of CPU list with summary option

2020-08-11 Thread David Ahern

On 8/11/20 12:42 AM, Namhyung Kim wrote:
>> @@ -2575,7 +2575,8 @@ static int timehist_sched_change_event(struct 
>> perf_tool *tool,
>> }
>>
>> if (!sched->idle_hist || thread->tid == 0) {
>> -   timehist_update_runtime_stats(tr, t, tprev);
>> +   if (!cpu_list || test_bit(sample->cpu, cpu_bitmap))
>> +   timehist_update_runtime_stats(tr, t, tprev);
>>
>> if (sched->idle_hist) {
>> struct idle_thread_runtime *itr = (void *)tr;
>> @@ -2848,6 +2849,9 @@ static void timehist_print_summary(struct perf_sched 
>> *sched,
>>
>> printf("\nIdle stats:\n");
>> for (i = 0; i < idle_max_cpu; ++i) {
>> +   if (!test_bit(i, cpu_bitmap))
> 
> Shouldn't it check cpu_list as well?
> 

oh, right. will fix.

Re: [PATCH] selftests: Add VRF icmp error route lookup test

2020-08-11 Thread David Ahern

On 8/11/20 1:11 PM, Mathieu Desnoyers wrote:
> One thing I am missing before this series can be considered for upstreaming
> is an Acked-by of the 2 fixes for ipv4 and ipv6 from you, as maintainer
> of l3mdev, if you think the approach I am taking with those fixes makes sense.

Send the set, and I will review as vrf/l3mdev maintainer. I need working
tests and patches to see the before and after.

Re: [PATCH] selftests: Add VRF icmp error route lookup test

2020-08-11 Thread David Ahern

On 8/11/20 11:28 AM, David Miller wrote:
> From: Michael Jeanson 
> Date: Thu,  6 Aug 2020 14:51:21 -0400
> 
>> The objective is to check that the incoming vrf routing table is selected
>> to send an ICMP error back to the source when the ttl of a packet reaches 1
>> while it is forwarded between different vrfs.
>>
>> The first test sends a ping with a ttl of 1 from h1 to h2 and parses the
>> output of the command to check that a ttl expired error is received.
>>
>> [This may be flaky, I'm open to suggestions of a more robust approch.]
>>
>> The second test runs traceroute from h1 to h2 and parses the output to
>> check for a hop on r1.
>>
>> Signed-off-by: Michael Jeanson 
> 
> This patch does not apply cleanly to the current net tree.
> 

It is also out of context since the tests fail on current net and net-next.

The tests along with the patches that fix the problem should be sent
together.

Re: [RFC PATCH 0/7] metricfs metric file system and examples

2020-08-08 Thread David Ahern

On 8/7/20 8:06 PM, Andrew Lunn wrote:
> So i personally don't think netdev statistics is a good idea, i doubt
> it scales.

+1

Re: [PATCH 1/2] perf sched: Prefer sched_waking event when it exists

2020-08-07 Thread David Ahern

On 8/7/20 1:43 PM, Arnaldo Carvalho de Melo wrote:
>> @@ -2958,9 +2967,10 @@ static int timehist_check_attr(struct perf_sched 
>> *sched,
>>  
>>  static int perf_sched__timehist(struct perf_sched *sched)
>>  {
>> -const struct evsel_str_handler handlers[] = {
>> +struct evsel_str_handler handlers[] = {
>>  { "sched:sched_switch",   timehist_sched_switch_event, },
>>  { "sched:sched_wakeup",   timehist_sched_wakeup_event, },
>> +{ "sched:sched_waking",   timehist_sched_wakeup_event, },
>>  { "sched:sched_wakeup_new",   timehist_sched_wakeup_event, },
>>  };
>>  const struct evsel_str_handler migrate_handlers[] = {
>> @@ -3018,6 +3028,11 @@ static int perf_sched__timehist(struct perf_sched 
>> *sched)
>>  
>>  setup_pager();
>>  
>> +/* prefer sched_waking if it is captured */
>> +if (perf_evlist__find_tracepoint_by_name(session->evlist,
>> +  "sched:sched_waking"))
>> +handlers[1].handler = timehist_sched_wakeup_ignore;
>> +
> 
> 
> ouch, can't we figure out if its present and then don't ask for the
> wakeup one to be recorded?
> 

This is the analysis side. If someone recorded with sched:* we do not
want to analyze both sched_wakeup and sched_waking. Rather, it should
prefer the latter and ignore the former.

[PATCH 2/2] perf sched timehist: Fix use of CPU list with summary option

2020-08-07 Thread David Ahern

Do not update thread stats or show idle summary unless CPU is in
the list of interest.

Fixes: c30d630d1bcf ("perf sched timehist: Add support for filtering on CPU")
Signed-off-by: David Ahern 
---
 tools/perf/builtin-sched.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 0c7d599fa555..82ee0dfd1831 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2575,7 +2575,8 @@ static int timehist_sched_change_event(struct perf_tool 
*tool,
}
 
if (!sched->idle_hist || thread->tid == 0) {
-   timehist_update_runtime_stats(tr, t, tprev);
+   if (!cpu_list || test_bit(sample->cpu, cpu_bitmap))
+   timehist_update_runtime_stats(tr, t, tprev);
 
if (sched->idle_hist) {
struct idle_thread_runtime *itr = (void *)tr;
@@ -2848,6 +2849,9 @@ static void timehist_print_summary(struct perf_sched 
*sched,
 
printf("\nIdle stats:\n");
for (i = 0; i < idle_max_cpu; ++i) {
+   if (!test_bit(i, cpu_bitmap))
+   continue;
+
t = idle_threads[i];
if (!t)
continue;
-- 
2.17.1

[PATCH 1/2] perf sched: Prefer sched_waking event when it exists

2020-08-07 Thread David Ahern

Commit fbd705a0c618 ("sched: Introduce the 'trace_sched_waking' tracepoint")
added sched_waking tracepoint which should be preferred over sched_wakeup
when analyzing scheduling delays.

Update 'perf sched record' to collect sched_waking events if it exists
and fallback to sched_wakeup if it does not. Similarly, update timehist
command to skip sched_wakeup events if the session includes
sched_waking (ie., sched_waking is preferred over sched_wakeup).

Signed-off-by: David Ahern 
---
 tools/perf/builtin-sched.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 459e4229945e..0c7d599fa555 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -2398,6 +2398,15 @@ static void timehist_print_wakeup_event(struct 
perf_sched *sched,
printf("\n");
 }
 
+static int timehist_sched_wakeup_ignore(struct perf_tool *tool __maybe_unused,
+   union perf_event *event __maybe_unused,
+   struct evsel *evsel __maybe_unused,
+   struct perf_sample *sample 
__maybe_unused,
+   struct machine *machine __maybe_unused)
+{
+   return 0;
+}
+
 static int timehist_sched_wakeup_event(struct perf_tool *tool,
   union perf_event *event __maybe_unused,
   struct evsel *evsel,
@@ -2958,9 +2967,10 @@ static int timehist_check_attr(struct perf_sched *sched,
 
 static int perf_sched__timehist(struct perf_sched *sched)
 {
-   const struct evsel_str_handler handlers[] = {
+   struct evsel_str_handler handlers[] = {
{ "sched:sched_switch",   timehist_sched_switch_event, },
{ "sched:sched_wakeup",   timehist_sched_wakeup_event, },
+   { "sched:sched_waking",   timehist_sched_wakeup_event, },
{ "sched:sched_wakeup_new",   timehist_sched_wakeup_event, },
};
const struct evsel_str_handler migrate_handlers[] = {
@@ -3018,6 +3028,11 @@ static int perf_sched__timehist(struct perf_sched *sched)
 
setup_pager();
 
+   /* prefer sched_waking if it is captured */
+   if (perf_evlist__find_tracepoint_by_name(session->evlist,
+ "sched:sched_waking"))
+   handlers[1].handler = timehist_sched_wakeup_ignore;
+
/* setup per-evsel handlers */
if (perf_session__set_tracepoints_handlers(session, handlers))
goto out;
@@ -3330,12 +3345,16 @@ static int __cmd_record(int argc, const char **argv)
"-e", "sched:sched_stat_iowait",
"-e", "sched:sched_stat_runtime",
"-e", "sched:sched_process_fork",
-   "-e", "sched:sched_wakeup",
"-e", "sched:sched_wakeup_new",
"-e", "sched:sched_migrate_task",
};
+   struct tep_event *waking_event;
 
-   rec_argc = ARRAY_SIZE(record_args) + argc - 1;
+   /*
+* +2 for either "-e", "sched:sched_wakeup" or
+* "-e", "sched:sched_waking"
+*/
+   rec_argc = ARRAY_SIZE(record_args) + 2 + argc - 1;
rec_argv = calloc(rec_argc + 1, sizeof(char *));
 
if (rec_argv == NULL)
@@ -3344,6 +3363,13 @@ static int __cmd_record(int argc, const char **argv)
for (i = 0; i < ARRAY_SIZE(record_args); i++)
rec_argv[i] = strdup(record_args[i]);
 
+   rec_argv[i++] = "-e";
+   waking_event = trace_event__tp_format("sched", "sched_waking");
+   if (!IS_ERR(waking_event))
+   rec_argv[i++] = strdup("sched:sched_waking");
+   else
+   rec_argv[i++] = strdup("sched:sched_wakeup");
+
for (j = 1; j < (unsigned int)argc; j++, i++)
rec_argv[i] = argv[j];
 
-- 
2.17.1

Re: [net-next iproute2 PATCH v3 1/2] iplink: hsr: add support for creating PRP device similar to HSR

2020-08-06 Thread David Ahern

On 8/6/20 10:04 AM, Murali Karicheri wrote:
> that the maintainers are different than the netdev maintainers. My bad.
> The PRP driver support in kernel is merged by Dave to net-next and this
> iproute2 change has to go with it. So please review and apply this if it
> looks good. The kernel part merged is at

there was a long delay between iproute2 and commit to net-next. You need
to re-send the iproute2 patches.

Re: [RFC PATCH 1/3] selftests: Add VRF icmp error route lookup test

2020-08-06 Thread David Ahern

On 7/29/20 3:12 PM, Mathieu Desnoyers wrote:
> From: Michael Jeanson 
> 
> The objective is to check that the incoming vrf routing table is selected
> to send an ICMP error back to the source when the ttl of a packet reaches 1
> while it is forwarded between different vrfs.
> 
> The first test sends a ping with a ttl of 1 from h1 to h2 and parses the
> output of the command to check that a ttl expired error is received.
> 
> [This may be flaky, I'm open to suggestions of a more robust approch.]
> 
> The second test runs traceroute from h1 to h2 and parses the output to check
> for a hop on r1.
> 
> Signed-off-by: Michael Jeanson 
> Cc: David Ahern 

Update the address to dsah...@kernel.org


> Cc: David S. Miller 
> Cc: net...@vger.kernel.org
> ---
>  tools/testing/selftests/net/Makefile  |   1 +
>  .../selftests/net/vrf_icmp_error_route.sh | 461 ++
>  2 files changed, 462 insertions(+)
>  create mode 100755 tools/testing/selftests/net/vrf_icmp_error_route.sh
> 

Test seems fine to me. you copied icmp_redirect.sh which is fine but
please clean up comments and functions not needed for this test.

Re: [PATCH 0/6] perf tools: Add wallclock time conversion support

2020-07-31 Thread David Ahern

On 7/31/20 12:05 PM, pet...@infradead.org wrote:
> On Fri, Jul 31, 2020 at 08:36:12AM -0700, Andi Kleen wrote:
>>> yep, we have a customer that needs to compare data from multiple servers
>>
>> It's also needed to correlate over different guests on the same machine.
>> This is an important use case.
> 
> Both these cases you want to sync up CLOCK_MONOTONIC, using walltime is
> just utterly misguided.

Every userspace component logs in walltime. You can say that is
misguided, but that is the way it is. The missing piece is the ability
to correlate kernel events to userspace logs.

> 
> What happens if the servers have (per accident or otherwise) different
> DST settings, or someone does a clock_setttime() for giggles.

Yes, someone *could* change the time. Someone *could* start ntpd or
other time server in the middle of a session. While technically such
things can happen, that is not real life in most environments (e.g.,
Data center servers). ntpd (or other) is started at boot, and it is just
the little misc adjustments that happen over time.

We could add tracepoints and detect the changes and invalidate the
reference time. We could add tracepoints to track the adjustments and
update the reference time. In my experience over 9+ years using this
tool (out of tree patches) that has never been the problem.

> 
> All you really want is a clock that runs at the same rate but is not
> subject to random jumps and user foibles.
> 

All I want is to compare user logs to a kernel event via timestamps.

Re: [PATCH 2/6] perf tools: Store clock references for -k/--clockid option

2020-07-31 Thread David Ahern

On 7/31/20 10:15 AM, Jiri Olsa wrote:
>> It might also want to be implemented in a loop and iteration with minimal
>> time delta is chosen to improve synchronization accuracy and also mitigate
>> possible context switches between gettimeofday() and clock_gettime() calls.
> right, we could make this more accurate.. I'll post some follow up
> change with that

If it is handled through vdso (e.g., x86) both calls should be very fast
with no context switches and the margin of error is well less than the
usec resolution in the output.

Re: [PATCH 0/6] perf tools: Add wallclock time conversion support

2020-07-30 Thread David Ahern

On 7/30/20 4:14 PM, pet...@infradead.org wrote:
> On Thu, Jul 30, 2020 at 11:39:44PM +0200, Jiri Olsa wrote:
> 
>> The patchset is adding the ability to display TOD/wallclock timestamp
>> in 'perf script' output and in 'perf data convert --to-ctf' subcommand,
>> so the converted CTF data contain TOD/wallclock timestamps.
> 
> But why? Wallclock is a horrible piece of crap. Why would you want to do
> this?
> 

Same reason I brought this up 9+ years ago: userspace lives on
time-of-day, and troubleshooting is based on correlating timestamps from
multiple sources. To correlate a perf event to syslog or an application
log, we need time-of-day.

Re: [PATCH] ipv6: Fix nexthop refcnt leak when creating ipv6 route info

2020-07-25 Thread David Ahern

On 7/25/20 2:02 AM, Xiyu Yang wrote:
> ip6_route_info_create() invokes nexthop_get(), which increases the
> refcount of the "nh".
> 
> When ip6_route_info_create() returns, local variable "nh" becomes
> invalid, so the refcount should be decreased to keep refcount balanced.

I forgot to write the test case for this very code path in
tools/testing/selftests/net/fib_nexthops.sh. If you have the time, it
goes in ipv6_fcnal_runtime() - see the last 'TO-DO' item.

> 
> The reference counting issue happens in one exception handling path of
> ip6_route_info_create(). When nexthops can not be used with source
> routing, the function forgets to decrease the refcnt increased by
> nexthop_get(), causing a refcnt leak.
> 
> Fix this issue by pulling up the error source routing handling when
> nexthops can not be used with source routing.
> 

Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a
fib6_info")

> Signed-off-by: Xiyu Yang 
> Signed-off-by: Xin Tan 
> ---
>  net/ipv6/route.c | 8 ----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 


Reviewed-by: David Ahern

Re: linux-next: Tree for Jul 22 (drivers/net/vrf)

2020-07-22 Thread David Ahern

On 7/22/20 10:35 AM, Randy Dunlap wrote:
> On 7/22/20 6:16 AM, Stephen Rothwell wrote:
>> Hi all,
>>
>> Changes since 20200721:
>>
> 
> on i386:
> when CONFIG_SYSCTL is not set/enabled:
> 
> ERROR: modpost: "sysctl_vals" [drivers/net/vrf.ko] undefined!
> 
> 

thanks for the report; I'll send a patch when I get a few minutes.

Re: [PATCH] selftests: fib_nexthop_multiprefix: fix cleanup() netns deletion

2020-07-14 Thread David Ahern

On 7/14/20 9:40 AM, Paolo Pisati wrote:
> During setup():
> ...
> for ns in h0 r1 h1 h2 h3
> do
> create_ns ${ns}
> done
> ...
> 
> while in cleanup():
> ...
> for n in h1 r1 h2 h3 h4
> do
> ip netns del ${n} 2>/dev/null
> done
> ...
> 
> and after removing the stderr redirection in cleanup():
> 
> $ sudo ./fib_nexthop_multiprefix.sh
> ...
> TEST: IPv4: host 0 to host 3, mtu 1400  [ OK ]
> TEST: IPv6: host 0 to host 3, mtu 1400  [ OK ]
> Cannot remove namespace file "/run/netns/h4": No such file or directory
> $ echo $?
> 1
> 
> and a non-zero return code, make kselftests fail (even if the test
> itself is fine):
> 
> ...
> not ok 34 selftests: net: fib_nexthop_multiprefix.sh # exit=1
> ...
> 
> Signed-off-by: Paolo Pisati 
> ---
>  tools/testing/selftests/net/fib_nexthop_multiprefix.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/testing/selftests/net/fib_nexthop_multiprefix.sh 
> b/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
> index 9dc35a16e415..51df5e305855 100755
> --- a/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
> +++ b/tools/testing/selftests/net/fib_nexthop_multiprefix.sh
> @@ -144,7 +144,7 @@ setup()
>  
>  cleanup()
>  {
> -     for n in h1 r1 h2 h3 h4
> + for n in h0 r1 h1 h2 h3
>   do
>   ip netns del ${n} 2>/dev/null
>   done
> 

Reviewed-by: David Ahern

Re: [v2,iproute2-next 1/2] action police: change the print message quotes style

2020-07-05 Thread David Ahern

On 6/28/20 8:04 PM, Po Liu wrote:
> Change the double quotes to single quotes in fprintf message to make it
> more readable.
> 
> Signed-off-by: Po Liu 
> ---
> v1->v2 changes:
> - Patch new added
> 
>  tc/m_police.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 

applied both to iproute2-next. Thanks

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3566 matches

Mail list logo