Re: [ovs-discuss] Encapsulate VXLAN and then process other flows

2024-02-21 Thread Ilya Maximets via discuss
On 2/19/24 17:19, Ilya Maximets wrote:
> On 2/7/24 03:21, Lim, Derrick wrote:
>> Hi Ilya Maximets,
>>
>> From the tcpdump, with or without the rewrite, the link-local address was 
>> used.
>>
>> ===
>> $ ovs-tcpdump -nn -i exit_p0
>> 11:10:26.323938 IP6 fe80::dc03:37ff:fee2:1fef.51513 > 
>> 2403:400:31da:::18:3.4789: VXLAN, flags [I] (0x08), vni 1
>> IP 100.87.18.60 > 192.168.1.33: ICMP echo request, id 70, seq 1, length 64
>> 11:10:27.326875 IP6 fe80::dc03:37ff:fee2:1fef.51513 > 
>> 2403:400:31da:::18:3.4789: VXLAN, flags [I] (0x08), vni 1
>> IP 100.87.18.60 > 192.168.1.33: ICMP echo request, id 70, seq 2, length 64
>> ===
>>
>> Here is the output of the trace without the rewrite.
>> ===
>> $ ovs-appctl ofproto/trace --names br-int 'in_port=dpdk-vm101,
>> eth_src=52:54:00:3d:cd:0c,eth_dst=00:00:00:00:00:01,eth_type=0x0800,
>> nw_src=100.87.18.60,nw_dst=192.168.1.33,nw_proto=1,nw_ttl=64,nw_frag=no,
>> icmp_type=8,icmp_code=0'
>> Flow: 
>> icmp,in_port="dpdk-vm101",vlan_tci=0x,dl_src=52:54:00:3d:cd:0c,dl_dst=00:00:00:00:00:01,nw_src=100.87.18.60,nw_dst=192.168.1.33,nw_tos=0,nw_ecn=0,nw_ttl=64,nw_frag=no,icmp_type=8,icmp_code=0
>>
>> bridge("br-int")
>> --
>> 0. in_port="dpdk-vm101", priority 32768
>>     output:vxlan0
>>  -> output to native tunnel
>>  -> tunneling to fe80::920a:84ff:fe9e:9570 via br-phy
>>  -> tunneling from de:03:37:e2:1f:ef fe80::dc03:37ff:fee2:1fef to 
>> 90:0a:84:9e:95:70 fe80::920a:84ff:fe9e:9570
>>
>> bridge("br-phy")
>> ---
>> 0. priority 10
>>     NORMAL
>>  -> forwarding to learned port
>>
>> Final flow: unchanged
>> Megaflow: recirc_id=0,eth,ip,in_port="dpdk-vm101",nw_ecn=0,nw_frag=no
>> Datapath actions: 
>> tnl_push(tnl_port(vxlan_sys_4789),header(size=70,type=4,eth(dst=90:0a:84:9e:95:70,src=de:03:37:e2:1f:ef,dl_type=0x86dd),ipv6(src=fe80::dc03:37ff:fee2:1fef,dst=2403:400:31da:::18:3,label=0,proto=17,tclass=0x0,hlimit=64),udp(src=0,dst=4789,csum=0x),vxlan(flags=0x800,vni=0x1)),out_port(br-phy)),push_vlan(vid=304,pcp=0),exit_p0
>> ===
>>
>> The "tunneling to fe80::920a:84ff:fe9e:9570 via br-phy" looks a bit curious.
>> I'm not sure why this was picked instead of the `remote_ip` specified in the
>> tunnel configuration. But then the final datapath actions show the correct
>> `dst` address.
> 
> Hi.  Sorry for the late reply, was caught up in the releases.
> 
> The 'tunneling to' message may be a little misleading: it prints out the
> result of a route lookup, and we only use the device name from it while
> building a tunnel header.  The correct remote IP will be taken from the tunnel
> configuration, not the IP from a route lookup.  Maybe the wording in the
> trace needs some adjustment.

On a second look, it does seem a little strange.  The likely cause of
having fe80::920a:84ff:fe9e:9570 instead of the configured remote_ip
is that OVS found a route to 2403:400:31da:::18:3 via the gateway
fe80::920a:84ff:fe9e:9570.  But in one of the previous route lookups
you provided, fe80::920a:84ff:fe9e:9570 was indeed the gateway IP,
so it checks out.  The correct remote_ip is used in the actions because,
although we're sending the packet via the gateway, we're not sending it
to the gateway.  The gateway IP is only needed to get the destination MAC.

'tunneling to fe80::920a:84ff:fe9e:9570 via br-phy' should probably be
'tunneling via fe80::920a:84ff:fe9e:9570 and br-phy' in this case.

> 
>> Why is the `local_ip` specified in the VXLAN tunnel options
>> not considered?
> I see there is a bug in the tunnel lookup code that doesn't take the
> IPv6 local_ip into account.  It only checks for the IPv4 one.  The
> following change should fix it:
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 1cf4d5f7c..89f183182 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -3815,6 +3815,8 @@ native_tunnel_output(struct xlate_ctx *ctx, const struct xport *xport,
>  
>      if (flow->tunnel.ip_src) {
>          in6_addr_set_mapped_ipv4(&s_ip6, flow->tunnel.ip_src);
> +    } else if (ipv6_addr_is_set(&flow->tunnel.ipv6_src)) {
> +        s_ip6 = flow->tunnel.ipv6_src;
>      }
>  
>      err = tnl_route_lookup_flow(ctx, flow, &d_ip6, &s_ip6, &out_dev);
> ---
> 
> Could you try it in your setup?
> 
> Without this change the route lookup is performed without taking the
> local_ip into account and later the local_ip is not used for the packet
> header.
> 
> I'll work on a proper patch for this.

FWIW, I posted a fix here:
  
https://patchwork.ozlabs.org/project/openvswitch/patch/20240220223547.2368878-4-i.maxim...@ovn.org/

> 
> Best regards, Ilya Maximets.
> 
>>
>> Here is the output of the trace with the rewrite. It seems the flow entry was
>> matched but the rewrite didn't happen.
>>
>> ===
>> $ ovs-appctl ofproto/trace --names br-int 'in_port=dpdk-vm101,
>> eth_src=52:54:00:3d:cd:0c,eth_dst=00:00:00:00:00:01,eth_type=0x0800,
>> 

Re: [ovs-discuss] ovs-vswitchd core at revalidator_sweep__

2024-02-21 Thread Eelco Chaudron via discuss


On 21 Feb 2024, at 4:26, LIU Yulong wrote:

> Thank you very much for your reply.
>
> The problem is not easy to reproduce; we have to wait a random, long time to
> see if the issue happens again. It can take a day or longer.
> OVS 2.17 with DPDK 20.11 had crashed with a core dump before, so it's hard
> to say if it is related to DPDK.
> I'm running OVS without offload to see if the issue happens again in the
> coming days.
>
> And again, TL;DR, here are more thread call stacks.
> Most of the threads are in sched_yield, nanosleep,
> epoll_wait and poll.

If this looks like a memory trashing issue, it might be hard to figure out.
Does the ukey show any kind of pattern, i.e., does the trashed data look like
anything known?
Maybe it's a use after free, so you could add some debugging code that
logs/records every xmalloc and free of the ukey structure, to check whether
the ukey was actually still allocated when it crashed.

Hope this helps you get started.

//Eelco

> The following threads are in a working state, so hopefully they give
> some clues for the investigation.
>
> Thread 14 (Thread 0x7fd34002b700 (LWP 91928)):
> #0  0x7fd344487b6d in recvmsg () at ../sysdeps/unix/syscall-template.S:81
> #1  0x562773cb8d03 in mp_handle ()
> #2  0x7fd344480e65 in start_thread (arg=0x7fd34002b700) at
> pthread_create.c:307
> #3  0x7fd34260988d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 13 (Thread 0x7fd3359d7700 (LWP 91929)):
> #0  0x7fd34448799d in accept () at ../sysdeps/unix/syscall-template.S:81
> #1  0x562773cd8f3c in socket_listener ()
> #2  0x7fd344480e65 in start_thread (arg=0x7fd3359d7700) at
> pthread_create.c:307
> #3  0x7fd34260988d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 6 (Thread 0x7fd304663700 (LWP 91965)):
> #0  0x7fd34448771d in read () at ../sysdeps/unix/syscall-template.S:81
> #1  0x7fd343b42bfb in _mlx5dv_devx_get_event () from /lib64/libmlx5.so.1
> #2  0x562773936d86 in mlx5_vdpa_event_handle ()
> #3  0x7fd344480e65 in start_thread (arg=0x7fd304663700) at
> pthread_create.c:307
> #4  0x7fd34260988d in clone () at
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> Thread 2 (Thread 0x7fd305730700 (LWP 91943)):
> #0  ccmap_find_slot_protected (count=,
> hash=hash@entry=1669671676, b=b@entry=0x7fd2f8012a80) at
> lib/ccmap.c:278
> #1  ccmap_inc_bucket_existing (b=b@entry=0x7fd2f8012a80,
> hash=hash@entry=1669671676, inc=inc@entry=1) at lib/ccmap.c:281
> #2  0x562773d4b015 in ccmap_try_inc
> (impl=impl@entry=0x7fd2f8012a40, hash=hash@entry=1669671676,
> inc=inc@entry=1) at lib/ccmap.c:464
> #3  0x562773d4b224 in ccmap_inc (ccmap=ccmap@entry=0x7fd2f802a7e8,
> hash=1669671676) at lib/ccmap.c:485
> #4  0x562773d4975a in classifier_replace (cls=,
> rule=rule@entry=0x7fd2fac70e28, version=,
> conjs=, n_conjs=)
> at lib/classifier.c:579
> #5  0x562773d49e99 in classifier_insert (cls=,
> rule=rule@entry=0x7fd2fac70e28, version=,
> conj=, n_conj=)
> at lib/classifier.c:694
> #6  0x562773d00fc8 in replace_rule_start
> (ofproto=ofproto@entry=0x5627778cc420, ofm=ofm@entry=0x7fd3057235f0,
> old_rule=, new_rule=new_rule@entry=0x7fd2fac70e20)
> at ofproto/ofproto.c:5645
> #7  0x562773d010e4 in add_flow_start (ofproto=0x5627778cc420,
> ofm=0x7fd3057235f0) at ofproto/ofproto.c:5256
> #8  0x562773d0122d in modify_flows_start__
> (ofproto=ofproto@entry=0x5627778cc420, ofm=ofm@entry=0x7fd3057235f0)
> at ofproto/ofproto.c:5824
> #9  0x562773d01eac in modify_flow_start_strict
> (ofm=0x7fd3057235f0, ofproto=0x5627778cc420) at ofproto/ofproto.c:5953
> #10 ofproto_flow_mod_start (ofproto=0x5627778cc420,
> ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:8112
> #11 0x562773d0225a in ofproto_flow_mod_learn_start
> (ofm=ofm@entry=0x7fd3057235f0) at ofproto/ofproto.c:5491
> #12 0x562773d040ad in ofproto_flow_mod_learn
> (ofm=ofm@entry=0x7fd3057235f0, keep_ref=,
> limit=, below_limitp=below_limitp@entry=0x7fd305723510)
> at ofproto/ofproto.c:5576
> #13 0x562773d2641e in xlate_learn_action
> (ctx=ctx@entry=0x7fd305729a60, learn=learn@entry=0x562777db4618) at
> ofproto/ofproto-dpif-xlate.c:5547
> #14 0x562773d2aafb in do_xlate_actions (ofpacts=,
> ofpacts_len=, ctx=0x7fd305729a60,
> is_last_action=, group_bucket_action=)
> at ofproto/ofproto-dpif-xlate.c:7232
> #15 0x562773d26c85 in xlate_recursively
> (actions_xlator=0x562773d29490 ,
> is_last_action=true, deepens=false, rule=0x562777db4470,
> ctx=0x7fd305729a60)
> at ofproto/ofproto-dpif-xlate.c:4383
> #16 xlate_table_action (ctx=0x7fd305729a60, in_port=,
> table_id=, may_packet_in=,
> honor_table_miss=, with_ct_orig=,
> is_last_action=true, xlator=0x562773d29490 ) at
> ofproto/ofproto-dpif-xlate.c:4512
> #17 0x562773d2ab8d in xlate_ofpact_resubmit
> (resubmit=0x56277781db28, resubmit=0x56277781db28,
> resubmit=0x56277781db28, is_last_action=true, ctx=0x7fd305729a60)
> at